Showing posts from February 2019

Exchange data between Zeppelin pyspark and spark sessions

Problem: DataFrames are not shared?

As of Zeppelin 0.7.0, Spark DataFrames are not shared between the %pyspark (Python) and %spark (Scala) sessions.

Solution: exchange them through a temporary table

In the Python session, register the DataFrame as a temporary table:

    #%pyspark
    somedf.registerTempTable("somedftable")

Then rebuild the DataFrame in the Scala session:

    //%scala
    val somedf = sqlContext.table("somedftable")
    z.show(somedf.limit(20))
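The name-based lookup behind this workaround can be sketched in plain Scala. This is only an illustration: the `TempTableExchange` object and its `publish` and `fetch` helpers are hypothetical names, not part of Zeppelin or Spark, and the sketch assumes both interpreters are backed by the same SQLContext, which is what would make the temporary table visible from either side. Under that assumption the exchange should also work in the opposite direction, from %spark to %pyspark.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical helper, for illustration only.
object TempTableExchange {
  // Publish a DataFrame under a name in the temporary-table catalog.
  // registerTempTable matches the call used in the post; on Spark 2.x it is
  // deprecated in favour of createOrReplaceTempView.
  def publish(df: DataFrame, name: String): Unit =
    df.registerTempTable(name)

  // Rebuild the DataFrame from its registered name.
  def fetch(sqlContext: SQLContext, name: String): DataFrame =
    sqlContext.table(name)
}

// Assumed usage from a %spark paragraph, sharing towards %pyspark:
//   TempTableExchange.publish(somedf, "somedftable")
// and then, in a %pyspark paragraph:
//   somedf = sqlContext.table("somedftable")
```

Only the table name crosses the language boundary, so each side keeps working with its own native DataFrame object.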

Scala case class and No TypeTag available

No TypeTag available when compiling…

The following code works well in a Zeppelin paragraph:

    val textRdd = sc.textFile("hdfs://nameservice1/user/myname/bank/bank.csv")
    case class TextLine(lineText: String)
    val modelDates = textRdd.map( s => TextLine(s.trim)).toDF()
    modelDates.sort(col("lineText").desc).as[String].first()

But if I turn it into a function, it fails to compile:

    def maxValueIn(hdfsPath: String) = {
      val textRdd = sc.textFile(hdfsPath)
      case class TextLine(lineText: String)
      val modelDates = textRdd.map( s => TextLine(s.trim)).toDF()
      modelDates.sort(col("lineText").desc).as[String].first()
    }

Solution: move the case class out of the def

As described in the Stack Overflow question, once the case class is moved out of the method, the code finally compiles.

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.functions.col
    trait HdfsTextFile {
      val spark: SparkContext
      val sqlContext: HiveContext
      impo...
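A minimal sketch of the same fix as a standalone object, not the post's own `HdfsTextFile` trait. The `MaxValue` object and the choice to pass `sc` and `sqlContext` in as parameters are assumptions made for illustration. The point it shows is that `TextLine` is declared at the top level, so the compiler can produce the TypeTag that `toDF()` needs.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.col

// Declared outside any method: a case class local to a def has no TypeTag,
// which is what triggers "No TypeTag available" when calling toDF().
case class TextLine(lineText: String)

// Hypothetical wrapper object, for illustration only.
object MaxValue {
  // Returns the lexicographically largest trimmed line of the text file.
  def maxValueIn(sc: SparkContext, sqlContext: SQLContext, hdfsPath: String): String = {
    // Brings toDF() and the String encoder used by as[String] into scope.
    import sqlContext.implicits._

    val modelDates = sc.textFile(hdfsPath).map(s => TextLine(s.trim)).toDF()
    modelDates.sort(col("lineText").desc).as[String].first()
  }
}
```

In a Zeppelin paragraph, where `sc` and `sqlContext` are already provided, the call would look like `MaxValue.maxValueIn(sc, sqlContext, "hdfs://nameservice1/user/myname/bank/bank.csv")`.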

Cloudera 5.11.x: Spark action on Oozie fails to access Hive table

Spark workflow fails because it cannot access a Hive table

This is an odd issue. The same Spark program fails to access the Hive table when it is scheduled by Oozie, but runs well when it is launched manually with spark-submit.

Solution

Add hive-site.xml to the Spark workflow to make sure the Hive context is correctly initialized.

Put hive-site.xml into HDFS
On one of the cluster nodes, find the Hive configuration file ‘hive-site.xml’ in /etc/hive/conf. Copy it to somewhere on HDFS.

Add hive-site.xml as one of the “FILES” of the Spark workflow
In the Hue workflow editor, click the plus sign on “FILES” to add a new “FILE” element. Enter the HDFS path of the ‘hive-site.xml’ you just copied.