Project based on the application of Azure Databricks
This code demonstrates how to integrate PySpark with datasets and perform simple data transformations. It loads a sample dataset using PySpark's built-in functionality, or reads data from an external source, and converts it into a PySpark DataFrame for distributed processing and manipulation.
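A minimal sketch of that workflow, assuming a local CSV file; the file name `employees.csv` and its columns are illustrative, not from the original project:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start a local Spark session
spark = SparkSession.builder.appName("simple-transformations").getOrCreate()

# Read an external CSV file into a distributed DataFrame
# (file name and columns are placeholders)
df = spark.read.csv("employees.csv", header=True, inferSchema=True)

# A simple transformation: filter rows and add a derived column
result = (
    df.filter(F.col("salary") > 50000)
      .withColumn("salary_k", F.col("salary") / 1000)
)

result.show(5)
```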
Generate a synthetic dataset with one million records of employee information from a fictional company, load it into a PostgreSQL database, create analytical reports using PySpark and large-scale data analysis techniques, and implement machine learning models to predict trends in hiring and layoffs on a monthly and yearly basis.
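A hedged sketch of the first two steps, generating the synthetic records with Spark and writing them to PostgreSQL over JDBC. The connection URL, table name, credentials, and column choices are placeholder assumptions, and the PostgreSQL JDBC driver must be available on the Spark classpath:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("synthetic-employees").getOrCreate()

# Generate one million synthetic employee records directly on the cluster
df = (
    spark.range(1_000_000).withColumnRenamed("id", "employee_id")
         # Assign a department by cycling through a small illustrative list
         .withColumn("department",
                     F.expr("element_at(array('Engineering','Sales','HR','Finance'), "
                            "cast(employee_id % 4 + 1 as int))"))
         .withColumn("salary", (F.rand() * 120000 + 30000).cast("int"))
         .withColumn("hire_year", (F.floor(F.rand() * 9) + 2015).cast("int"))
)

# Write to PostgreSQL over JDBC; URL, table, and credentials are placeholders
(df.write.format("jdbc")
   .option("url", "jdbc:postgresql://localhost:5432/company")
   .option("dbtable", "employees")
   .option("user", "postgres")
   .option("password", "postgres")
   .option("driver", "org.postgresql.Driver")
   .mode("overwrite")
   .save())
```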
This script builds a linear regression model using PySpark to predict student admissions at Unicorn University.
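A minimal PySpark MLlib sketch of such a model; the input file, feature columns, and label column are assumed for illustration and are not taken from the original script:

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("admissions-regression").getOrCreate()

# Load historical admissions data; file name and columns are illustrative
data = spark.read.csv("admissions.csv", header=True, inferSchema=True)

# Assemble the feature columns into a single vector column
assembler = VectorAssembler(
    inputCols=["gpa", "test_score", "applications"],
    outputCol="features",
)
train = assembler.transform(data)

# Fit a linear regression model predicting the number of admissions
lr = LinearRegression(featuresCol="features", labelCol="admissions")
model = lr.fit(train)

print("Coefficients:", model.coefficients)
print("RMSE:", model.summary.rootMeanSquaredError)
```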
Objective: Perform word count tasks and joins using Spark SQL within a Docker container
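A sketch of both tasks with Spark SQL; the input file and the two example tables are assumptions, and running the script inside a Docker container (for example, an official Spark image) does not change the code itself:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("wordcount-sql").getOrCreate()

# Word count: split lines into words and aggregate with Spark SQL
lines = spark.read.text("sample.txt")  # file name is a placeholder
words = lines.select(F.explode(F.split(F.col("value"), r"\s+")).alias("word"))
words.createOrReplaceTempView("words")
spark.sql(
    "SELECT word, COUNT(*) AS count FROM words "
    "WHERE word <> '' GROUP BY word ORDER BY count DESC"
).show(10)

# A simple join expressed in Spark SQL over two registered views
spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"]) \
     .createOrReplaceTempView("users")
spark.createDataFrame([(1, "pyspark"), (2, "sql")], ["user_id", "topic"]) \
     .createOrReplaceTempView("topics")
spark.sql(
    "SELECT u.name, t.topic FROM users u JOIN topics t ON u.id = t.user_id"
).show()
```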
Worked on PySpark file streaming
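A minimal Structured Streaming sketch of file streaming in PySpark; the input directory, schema, and console sink are assumptions for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("file-streaming").getOrCreate()

# Streaming file sources require the schema to be declared up front
schema = StructType([
    StructField("event", StringType()),
    StructField("value", IntegerType()),
])

# Watch a directory for new CSV files and process them incrementally
# (the path "input_dir/" is a placeholder)
stream = spark.readStream.schema(schema).csv("input_dir/")

# A simple streaming aggregation written to the console sink
query = (
    stream.groupBy("event")
          .agg(F.sum("value").alias("total"))
          .writeStream
          .outputMode("complete")
          .format("console")
          .start()
)

query.awaitTermination()
```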
Repository for following the Udemy course "Airflow2.0 De 0 a Héroe", from the "Datapath" academy.