
ETL_with_Pyspark_-_SparkSQL

A sample project designed to demonstrate ETL process using Pyspark & Spark SQL API in Apache Spark.

In this project I used Apache Spark's PySpark and Spark SQL APIs to implement the ETL process on the data and then load the transformed data to a destination.
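
The snippet below is a minimal sketch of that extract-transform-load flow with PySpark and Spark SQL. The file paths, column names, and the aggregation itself are illustrative assumptions, not the project's actual logic.

```python
# Minimal ETL sketch: extract with PySpark, transform with Spark SQL, load to a destination.
# Paths and column names are placeholders, not the repository's real data.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("etl_sketch").getOrCreate()

# Extract: read raw data from a hypothetical source path
raw_df = spark.read.option("header", "true").csv("/mnt/raw/orders.csv")

# Transform: register a temp view and apply a Spark SQL transformation
raw_df.createOrReplaceTempView("orders")
transformed_df = spark.sql("""
    SELECT customer_id,
           COUNT(*)                    AS order_count,
           SUM(CAST(amount AS DOUBLE)) AS total_amount
    FROM orders
    GROUP BY customer_id
""")

# Load: write the transformed data to a hypothetical destination path
transformed_df.write.mode("overwrite").parquet("/mnt/curated/orders_summary")
```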

I have used Azure Databricks to run the notebooks and to create jobs for them. To orchestrate the entire workflow, I have used Azure Data Factory to create the pipelines.

Note: Any resources deployed in Azure have an associated cost. Users are wholly responsible for creating and deploying resources to Azure, and for any charges that are incurred.

-------------------************************-------------------

main_latest branch:

This branch contains the updated code of the main project that lives in the main_old branch.

New implementations/changes:

Compared with the code in the main_old branch, the number of notebooks and the number of lines of code have been reduced. The goal is to automate the entire ETL process with a single generic notebook that performs the transformations on the data, as sketched below.
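
Here is a sketch of how such a generic notebook can be parameterized in Azure Databricks. The widget names, formats, and paths are assumptions for illustration only; the actual parameters used by this project may differ. `dbutils` is available only inside a Databricks notebook.

```python
# Sketch of a single generic transformation notebook driven by parameters.
# Widget names and paths are hypothetical; dbutils is provided by Databricks.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Parameters supplied by a Databricks job or an Azure Data Factory pipeline run
dbutils.widgets.text("source_path", "")
dbutils.widgets.text("destination_path", "")
dbutils.widgets.text("transform_sql", "SELECT * FROM source_view")

source_path = dbutils.widgets.get("source_path")
destination_path = dbutils.widgets.get("destination_path")
transform_sql = dbutils.widgets.get("transform_sql")

# Extract the source data, apply the supplied SQL transformation, and load the result
df = spark.read.format("delta").load(source_path)
df.createOrReplaceTempView("source_view")
spark.sql(transform_sql).write.mode("overwrite").format("delta").save(destination_path)
```

Because the source, destination, and transformation SQL all arrive as parameters, the same notebook can be reused for every dataset in the pipeline instead of maintaining one notebook per table.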

I will be updating this README soon with links to the Medium post and YouTube video where I explain the changes made to the old notebooks/code.
