Treat PySpark DataFrames like pandas.
Turn something like this:
# Pure PySpark API; df is of type pyspark.sql.DataFrame
from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType

def multiply(n):
    # Build a UDF that multiplies a column by n
    return udf(lambda col: col * n, FloatType())

# withColumn expects a Column, not a DataFrame, so apply the UDF directly
df = df.withColumn('new_col', multiply(2)(df['other_col']))
...into this:
# Using pontem.core.DataFrame object.
df['new_col'] = df['other_col'] * 2
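Under the hood, a wrapper like this mostly needs to intercept item access and assignment. The sketch below is a minimal, hypothetical illustration of the idea, not pontem's actual source (the class layout and method bodies here are assumptions): `__getitem__` hands back a pyspark.sql.Column, which already supports arithmetic operators like `*`, and `__setitem__` forwards the assignment to `withColumn`.

# Hypothetical sketch of a pandas-style facade over PySpark;
# pontem's real internals may differ.
from pyspark.sql import DataFrame as SparkDataFrame

class DataFrame:
    """Thin pandas-like wrapper around a pyspark.sql.DataFrame."""

    def __init__(self, spark_df: SparkDataFrame):
        self._df = spark_df

    def __getitem__(self, name):
        # Returns a pyspark.sql.Column, which supports
        # arithmetic operators such as * and +.
        return self._df[name]

    def __setitem__(self, name, col):
        # Assignment becomes withColumn on the wrapped DataFrame.
        self._df = self._df.withColumn(name, col)

With a wrapper like this, `df['other_col'] * 2` evaluates to a new Column expression, and the assignment to `df['new_col']` attaches it to the underlying Spark DataFrame, so no UDF is needed for simple arithmetic at all.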