Project based on the application of Azure Databricks
This code demonstrates how to integrate PySpark with datasets and perform simple data transformations. It loads a sample dataset using PySpark's built-in functionality, or reads data from an external source, and converts it into a PySpark DataFrame for distributed processing and manipulation.
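A minimal sketch of that workflow, assuming a local CSV file; the file name `employees.csv` and its columns are illustrative, not from the original project:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start a local Spark session
spark = SparkSession.builder.appName("simple-transformations").getOrCreate()

# Read an external CSV file into a distributed DataFrame
# (file name and columns are placeholders)
df = spark.read.csv("employees.csv", header=True, inferSchema=True)

# A simple transformation: filter rows and add a derived column
result = (
    df.filter(F.col("salary") > 50000)
      .withColumn("salary_k", F.col("salary") / 1000)
)

result.show(5)
```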
Generate a synthetic dataset with one million records of employee information from a fictional company, load it into a PostgreSQL database, create analytical reports using PySpark and large-scale data analysis techniques, and implement machine learning models to predict trends in hiring and layoffs on a monthly and yearly basis.
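A hedged sketch of the first two steps, generating the synthetic records with Spark and writing them to PostgreSQL over JDBC. The connection URL, table name, credentials, and column choices are placeholder assumptions, and the PostgreSQL JDBC driver must be available on the Spark classpath:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("synthetic-employees").getOrCreate()

# Generate one million synthetic employee records directly on the cluster
df = (
    spark.range(1_000_000).withColumnRenamed("id", "employee_id")
         # Assign a department by cycling through a small illustrative list
         .withColumn("department",
                     F.expr("element_at(array('Engineering','Sales','HR','Finance'), "
                            "cast(employee_id % 4 + 1 as int))"))
         .withColumn("salary", (F.rand() * 120000 + 30000).cast("int"))
         .withColumn("hire_year", (F.floor(F.rand() * 9) + 2015).cast("int"))
)

# Write to PostgreSQL over JDBC; URL, table, and credentials are placeholders
(df.write.format("jdbc")
   .option("url", "jdbc:postgresql://localhost:5432/company")
   .option("dbtable", "employees")
   .option("user", "postgres")
   .option("password", "postgres")
   .option("driver", "org.postgresql.Driver")
   .mode("overwrite")
   .save())
```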
This script builds a linear regression model using PySpark to predict student admissions at Unicorn University.
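A minimal PySpark MLlib sketch of such a model; the input file, feature columns, and label column are assumed for illustration and are not taken from the original script:

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("admissions-regression").getOrCreate()

# Load historical admissions data; file name and columns are illustrative
data = spark.read.csv("admissions.csv", header=True, inferSchema=True)

# Assemble the feature columns into a single vector column
assembler = VectorAssembler(
    inputCols=["gpa", "test_score", "applications"],
    outputCol="features",
)
train = assembler.transform(data)

# Fit a linear regression model predicting the number of admissions
lr = LinearRegression(featuresCol="features", labelCol="admissions")
model = lr.fit(train)

print("Coefficients:", model.coefficients)
print("RMSE:", model.summary.rootMeanSquaredError)
```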
Objective: Perform word count tasks and joins using Spark SQL within a Docker container
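A sketch of both tasks with Spark SQL; the input file and the two example tables are assumptions, and running the script inside a Docker container (for example, an official Spark image) does not change the code itself:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("wordcount-sql").getOrCreate()

# Word count: split lines into words and aggregate with Spark SQL
lines = spark.read.text("sample.txt")  # file name is a placeholder
words = lines.select(F.explode(F.split(F.col("value"), r"\s+")).alias("word"))
words.createOrReplaceTempView("words")
spark.sql(
    "SELECT word, COUNT(*) AS count FROM words "
    "WHERE word <> '' GROUP BY word ORDER BY count DESC"
).show(10)

# A simple join expressed in Spark SQL over two registered views
spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"]) \
     .createOrReplaceTempView("users")
spark.createDataFrame([(1, "pyspark"), (2, "sql")], ["user_id", "topic"]) \
     .createOrReplaceTempView("topics")
spark.sql(
    "SELECT u.name, t.topic FROM users u JOIN topics t ON u.id = t.user_id"
).show()
```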
Worked on PySpark file streaming
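A minimal Structured Streaming sketch of file streaming in PySpark; the input directory, schema, and console sink are assumptions for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("file-streaming").getOrCreate()

# Streaming file sources require the schema to be declared up front
schema = StructType([
    StructField("event", StringType()),
    StructField("value", IntegerType()),
])

# Watch a directory for new CSV files and process them incrementally
# (the path "input_dir/" is a placeholder)
stream = spark.readStream.schema(schema).csv("input_dir/")

# A simple streaming aggregation written to the console sink
query = (
    stream.groupBy("event")
          .agg(F.sum("value").alias("total"))
          .writeStream
          .outputMode("complete")
          .format("console")
          .start()
)

query.awaitTermination()
```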
Repository for following the Udemy course "Airflow2.0 De 0 a Héroe", from the "Datapath" academy.