
Exchange data between Zeppelin pyspark and spark sessions

Problem: DataFrames are not shared

As of Zeppelin 0.7.0, Spark DataFrames are not shared between the %pyspark (Python) and %spark (Scala) sessions. A DataFrame variable defined in one cannot be referenced directly from the other.

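Solution: exchange data through a temporary table

Both interpreters share the same SQLContext, so a temporary table registered in one is visible by name in the other. In the Python session, register the DataFrame as a temporary table: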

#%pyspark

somedf.registerTempTable("somedftable")

Then rebuild the DataFrame from that table in the Scala session:

//%spark

val somedf = sqlContext.table("somedftable")

z.show(somedf.limit(20))
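One caveat: since Spark 2.0, registerTempTable is deprecated in favor of createOrReplaceTempView. If the same notebook has to run on both older and newer Spark versions, the %pyspark side could be wrapped in a small helper like the sketch below. The name share_dataframe is made up for this post and is not part of Zeppelin or Spark; it only assumes that the object passed in has one of the two registration methods.

```python
def share_dataframe(df, table_name):
    """Register df under table_name so another interpreter can look it up.

    Hypothetical helper for illustration, not a Zeppelin or Spark API.
    """
    if hasattr(df, "createOrReplaceTempView"):
        # Spark 2.0 and later: the non-deprecated method name
        df.createOrReplaceTempView(table_name)
    else:
        # Older Spark: only registerTempTable is available
        df.registerTempTable(table_name)
    return table_name
```

In the %pyspark paragraph this would be called as share_dataframe(somedf, "somedftable"), and the Scala side stays the same.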

 
