Learn the pyspark API through pictures and simple examples
# flatMap
x = sc.parallelize([1,2,3])
y = x.flatMap(lambda x: (x, 100*x, x**2))
print(x.collect())
print(y.collect())
[1, 2, 3]
[1, 100, 1, 2, 200, 4, 3, 300, 9]
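To see why the output above is one flat list rather than a list of tuples, here is a minimal plain-Python sketch of the same idea that runs without Spark. The helper names `flat_map` and `expand` are hypothetical and used only for illustration; they are not part of the pyspark API, and the sketch ignores partitioning and lazy evaluation.

```python
from itertools import chain

def flat_map(func, items):
    """Apply func to each item, then flatten the resulting iterables by one level."""
    return list(chain.from_iterable(func(item) for item in items))

def expand(n):
    # Same function as the lambda in the Spark example above
    return (n, 100 * n, n ** 2)

data = [1, 2, 3]
mapped = [expand(n) for n in data]    # map-like: one tuple per input element
flattened = flat_map(expand, data)    # flatMap-like: tuples concatenated into one list

print(mapped)
print(flattened)
```

The `mapped` list keeps one tuple per input element, while `flattened` holds the same values in a single list, which is the shape `y.collect()` returns in the Spark example.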
- install Spark
- install IPython notebook
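- start pyspark inside IPython notebook: IPYTHON_OPTS="notebook" pyspark
  (Spark 2.0 and later removed IPYTHON_OPTS; use PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark instead)
- open browser to notebook link
- open pyspark-pictures.ipynb or pyspark-pictures-dataframes.ipynb
- edit example code, press ctrl + enter to run each cell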
Contributors are welcome
Original images are here. Download them as PDF, then convert to SVG with: genSVD (pdf2svg)