Welcome to this repository, which aims to help MCTs (Microsoft Certified Trainers) deploy a demo environment (see instructions here). It builds on the work already done in the official DP-203: Azure Data Engineer git repository, which contains instructions and assets for hands-on exercises.
The dp-203.bicep template can be used to deploy all the required resources (a scripted deployment sketch follows the list):
- Data Lake with containers
- Synapse workspace
- Dedicated SQL pool (paused every 2 hours by the logic app listed below)
- Spark pool (auto-pause after 15 minutes of inactivity)
- Data Explorer pool (paused every 2 hours by the logic app listed below)
- SQL Server with AdventureWorksLT database
- Eventhub + namespace
- Purview account
- Logic app to pause dedicated pool and data explorer pool every 2 hours
- Cosmos DB Account (no database yet)
- Databricks (without cluster)
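For reference, the whole template can be deployed with a single CLI call (`az deployment group create --resource-group rg-dp-203 --template-file dp-203.bicep`). If you prefer scripting it, below is a minimal Python sketch using the Azure SDK; the subscription ID and deployment name are placeholders, the resource group is assumed to exist, and any parameters the template declares would need to be added to the properties dict.

```python
# Minimal sketch, assuming dp-203.bicep was compiled to dp-203.json
# (az bicep build --file dp-203.bicep) and you are logged in via az login.
# pip install azure-identity azure-mgmt-resource
import json
from azure.identity import AzureCliCredential
from azure.mgmt.resource import ResourceManagementClient

subscription_id = "<your-subscription-id>"  # placeholder
client = ResourceManagementClient(AzureCliCredential(), subscription_id)

with open("dp-203.json") as f:
    template = json.load(f)

# Incremental mode leaves resources outside the template untouched
poller = client.deployments.begin_create_or_update(
    "rg-dp-203",   # resource group (assumed to exist)
    "dp203-demo",  # arbitrary deployment name
    {"properties": {"template": template, "mode": "Incremental"}},
)
print(poller.result().properties.provisioning_state)
```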
Additionally, you can find below a set of mindmaps that summarize each of the Learn modules. Click a mindmap to get the SVG file. Each lab entry contains a high-level summary of what needs to be done.
Introduction to data engineering on Azure
Introduction to Azure Data Lake Storage Gen2
Introduction to Azure Synapse Analytics
Lab 1: Explore Azure Synapse Analytics, 90 min
- Explore Synapse Studio
- Ingest data with a pipeline
- Use a serverless SQL pool to analyze data
- Use a Spark pool to analyze data
- Use a dedicated SQL pool to query a data warehouse
- Explore data with a Data Explorer pool
Use Azure Synapse serverless SQL pool to query files in a data lake
Demo 2: Query files using a serverless SQL pool, 45 min
- Query data in files
- Access external data in a database
- Visualize query results
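To give an idea of what the demo covers, here is a hedged sketch of running such a query from Python over ODBC; the serverless endpoint, storage account, path, and authentication mode are assumptions, not values from the lab.

```python
# Sketch only: query CSV files in the data lake through the serverless SQL pool.
# pip install pyodbc (plus the Microsoft ODBC Driver 18 for SQL Server)
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=synapsexxxxxxx-ondemand.sql.azuresynapse.net;"  # serverless endpoint
    "Database=master;"
    "Authentication=ActiveDirectoryInteractive;"
)

query = """
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://datalakexxxxxxx.dfs.core.windows.net/files/sales/csv/*.csv',
    FORMAT = 'CSV', PARSER_VERSION = '2.0', HEADER_ROW = TRUE
) AS rows;
"""
for row in conn.execute(query):
    print(row)
```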
Use Azure Synapse serverless SQL pools to transform data in a data lake
Lab 3: Transform data using a serverless SQL pool, 30 min
- Query data in files
- Transform data using CREATE EXTERNAL TABLE AS SELECT (CETAS) statements
- Encapsulate data transformation in a stored procedure
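The heart of this lab is the CETAS statement itself. A hedged sketch follows, reusing the connection pattern from the previous sketch; all object, column, and storage names are assumptions, and the statement must run in a user database where the external data source and file format have been created (as the lab does).

```python
# Sketch only: CETAS materializes a query result as parquet files in the lake
# and registers an external table over them. Names are assumed, not the lab's.
cetas = """
CREATE EXTERNAL TABLE dbo.SpecialOrders
WITH (
    LOCATION = 'special_orders/',   -- folder the result files are written to
    DATA_SOURCE = files,            -- external data source over the data lake
    FILE_FORMAT = ParquetFormat     -- external file format (parquet)
)
AS
SELECT SalesOrderNumber, CustomerName, TotalDue
FROM OPENROWSET(
    BULK 'https://datalakexxxxxxx.dfs.core.windows.net/files/sales/csv/*.csv',
    FORMAT = 'CSV', PARSER_VERSION = '2.0', HEADER_ROW = TRUE
) AS source_data
WHERE TotalDue > 1000;
"""
conn.execute(cetas)  # conn as in the previous sketch, but on a user database
```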
Create a lake database in Azure Synapse Analytics
Demo 4: Analyze data in a lake database, 45 min
- Create a lake database (RetailDB in files/labs-dp-203/04/RetailDB)
- Create a table (Customer + Load csv + query SQL)
- Create a table from a database template (RetailProduct -> Product -> Load csv + query SQL)
- Create a table from existing data (Load salesorder.csv, Create table SalesOrder, define fields, create relationships)
- Work with lake database tables
Analyze data with Apache Spark in Azure Synapse Analytics
Demo 5: Analyze data in a data lake with Spark, 45 min
- Query data in files
- Analyze data in a dataframe
- Query data using Spark SQL
- Visualize data with Spark
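A minimal sketch of the dataframe and Spark SQL steps as they might look in a Synapse notebook cell; `spark` is the session the notebook runtime provides, and the path and column names are assumptions.

```python
# Synapse notebook cell sketch; 'spark' is predefined by the notebook runtime.
df = spark.read.load(
    "abfss://files@datalakexxxxxxx.dfs.core.windows.net/sales/csv/*.csv",
    format="csv", header=True, inferSchema=True,
)
df.printSchema()

# Expose the dataframe to Spark SQL and aggregate (column names assumed)
df.createOrReplaceTempView("sales")
spark.sql("""
    SELECT Item, SUM(Quantity) AS TotalQuantity
    FROM sales
    GROUP BY Item
    ORDER BY TotalQuantity DESC
""").show(10)
```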
Transform data with Spark in Azure Synapse Analytics
Lab 6: Transform data using Spark in Synapse Analytics, 30 min
- Use a Spark notebook to transform data
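Transformations in this lab typically end with a partitioned write back to the lake. A sketch, continuing from a dataframe like the one in the Spark sketch above (column names and paths are assumptions):

```python
from pyspark.sql.functions import col, month, year

# Derive partition columns from an assumed OrderDate column
transformed = (df
    .withColumn("Year", year(col("OrderDate")))
    .withColumn("Month", month(col("OrderDate"))))

# One folder per Year/Month combination, written as parquet
(transformed.write
    .partitionBy("Year", "Month")
    .mode("overwrite")
    .parquet("abfss://files@datalakexxxxxxx.dfs.core.windows.net/transformed/orders"))
```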
Use Delta Lake in Azure Synapse Analytics
Lab 7: Use Delta Lake in Azure Synapse Analytics, 40 min
- Create delta tables
- Create catalog tables
- Use delta tables for streaming data
- Query a delta table from a serverless SQL pool
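A sketch of the delta and catalog-table steps, again assuming a dataframe `df` and a hypothetical path; Delta Lake support is built into Synapse Spark pools.

```python
# Persist a dataframe as a delta table (path is an assumption)
delta_path = "abfss://files@datalakexxxxxxx.dfs.core.windows.net/delta/products"
df.write.format("delta").mode("overwrite").save(delta_path)

# Register a catalog (metastore) table over the same files; serverless SQL
# can then also read them with OPENROWSET(..., FORMAT = 'DELTA')
spark.sql(f"CREATE TABLE IF NOT EXISTS ProductsDelta USING DELTA LOCATION '{delta_path}'")
spark.read.format("delta").load(delta_path).show(5)
```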
Analyze data in a relational data warehouse
Demo 8: Explore a relational data warehouse, 45 min
- Explore the data warehouse schema
- Query the data warehouse tables
- Challenge - Analyze reseller sales
Load data into a relational data warehouse
Lab 9: Load Data into a Relational Data Warehouse, 30 min
- Prepare to load data
- Load data warehouse tables
- Perform post-load optimization
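The load step revolves around the dedicated pool's COPY statement. A hedged sketch from Python; the endpoint, database, table, and storage names are assumptions.

```python
# Sketch only: bulk-load a staging table in the dedicated SQL pool via COPY.
# pip install pyodbc (plus the Microsoft ODBC Driver 18 for SQL Server)
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=synapsexxxxxxx.sql.azuresynapse.net;"  # dedicated pool endpoint
    "Database=sqlxxxxxxx;"
    "Authentication=ActiveDirectoryInteractive;",
    autocommit=True,
)
conn.execute("""
COPY INTO dbo.StageProduct
FROM 'https://datalakexxxxxxx.blob.core.windows.net/files/data/Product.csv'
WITH (FILE_TYPE = 'CSV', FIRSTROW = 2);  -- FIRSTROW = 2 skips the header row
""")
```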
Build a data pipeline in Azure Synapse Analytics
Lab 10: Build a data pipeline in Azure Synapse Analytics, 45 min
- View source and destination data stores
- Implement a pipeline
- Debug the Data Flow
- Publish and run the pipeline
Use Spark Notebooks in an Azure Synapse Pipeline
Lab 11: Use an Apache Spark notebook in a pipeline, 30 min
- Run a Spark notebook interactively
- Run the notebook in a pipeline
Plan hybrid transactional and analytical processing using Azure Synapse Analytics
Implement Azure Synapse Link with Azure Cosmos DB
Lab 14: Use Azure Synapse Link for Azure Cosmos DB, 35 min
- Configure Synapse Link in Azure Cosmos DB
- Configure Synapse Link in Azure Synapse Analytics
- Query Azure Cosmos DB from Azure Synapse Analytics
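Once Synapse Link is enabled on both sides, the Cosmos DB analytical store can be read from a Spark notebook with the `cosmos.olap` format. A sketch; the linked service and container names are assumptions.

```python
# Synapse notebook cell sketch: read the Cosmos DB analytical store.
# 'CosmosDbLink' and 'Sales' are assumed linked-service / container names.
df = (spark.read
    .format("cosmos.olap")
    .option("spark.synapse.linkedService", "CosmosDbLink")
    .option("spark.cosmos.container", "Sales")
    .load())

df.show(5)
```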
Implement Azure Synapse Link for SQL
Demo 15: Use Azure Synapse Link for SQL, 35 min
- Configure Azure SQL Database
- Explore the transactional database
- Configure Azure Synapse Link
Get started with Azure Stream Analytics
Demo 17: Get started with Azure Stream Analytics, 15 min
```powershell
# Cloud Shell (PowerShell) setup for the streaming demo
rm -r dp-203 -f
git clone https://github.com/MicrosoftLearning/dp-203-azure-data-engineer dp-203
cd dp-203/Allfiles/labs/17

# Open the prepared client in the editor...
code setup.txt

# ...retrieve the event hub connection string, paste it into setup.txt,
# then save (ctrl+s) and close (ctrl+q) the editor
az eventhubs namespace authorization-rule keys list --name RootManageSharedAccessKey --namespace-name events-michals --resource-group rg-dp-203 | jq '.primaryConnectionString'

# Use the edited file as the order client and run it
cp setup.txt orderclient.js
npm install @azure/event-hubs | Out-Null
node ~/dp-203/Allfiles/labs/17/orderclient
```
- View the streaming data source
- Create an Azure Stream Analytics job
- Create an input for the event stream
- Create an output for the blob store
- Create a query
- Run the streaming job
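If you prefer Python over the node client, the same kind of test event can be sent with the azure-eventhub SDK; the event hub name and payload below are assumptions, and the connection string is the one printed by the `az eventhubs` command in the setup block.

```python
# Hedged Python alternative to orderclient.js (pip install azure-eventhub).
from azure.eventhub import EventData, EventHubProducerClient

conn_str = "<primaryConnectionString>"  # from the az command in the setup block
producer = EventHubProducerClient.from_connection_string(
    conn_str, eventhub_name="eventhubxxxxxxx"  # assumed event hub name
)
with producer:
    batch = producer.create_batch()
    batch.add(EventData('{"ProductID": 14, "Quantity": 2}'))  # sample order event
    producer.send_batch(batch)
```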
Ingest streaming data using Azure Stream Analytics and Azure Synapse Analytics
Lab 18: Ingest real-time data with Azure Stream Analytics and Azure Synapse Analytics, 45 min
- Ingest streaming data into a dedicated SQL pool
- Summarize streaming data in a data lake
Visualize real-time data with Azure Stream Analytics and Power BI
Demo 19: Create a real-time report with Azure Stream Analytics and Microsoft Power BI, 45 min
- Create a Power BI workspace
- Use Azure Stream Analytics to process streaming data
- Visualize the streaming data in Power BI
Introduction to Microsoft Purview
Integrate Microsoft Purview and Azure Synapse Analytics
Lab 22: Use Microsoft Purview with Azure Synapse Analytics, 40 min
- Catalog Azure Synapse Analytics data assets in Microsoft Purview
- Integrate Microsoft Purview with Azure Synapse Analytics
Explore Azure Databricks
Demo 23: Explore Azure Databricks, 30 min
- Create a cluster
- Use Spark to analyze a data file
- Create and query a database table
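A sketch of the notebook steps; `spark` is provided by the cluster runtime, and a tiny inline dataframe keeps the example self-contained (reading a file would look like `spark.read.csv('/databricks-datasets/...', header=True)`).

```python
# Databricks notebook cell sketch; 'spark' is provided by the cluster runtime.
df = spark.createDataFrame(
    [("Widget", 12), ("Gadget", 7), ("Widget", 3)],
    ["Product", "Quantity"],
)

# Persist as a managed table (delta by default on Databricks), then query it
df.write.mode("overwrite").saveAsTable("demo_products")
spark.sql("""
    SELECT Product, SUM(Quantity) AS Total
    FROM demo_products
    GROUP BY Product
""").show()
```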
Use Apache Spark in Azure Databricks
Lab 24: Use Spark in Azure Databricks, 45 min
- Provision an Azure Databricks workspace
- Create a cluster
- Explore data using a notebook
Use Delta Lake in Azure Databricks
Demo 25: Use Delta Lake in Azure Databricks, 40 min
- Provision an Azure Databricks workspace
- Create a cluster
- Explore data using a notebook
Use SQL Warehouses in Azure Databricks
Demo 26: Use a SQL Warehouse in Azure Databricks, 30 min
- View and start a SQL Warehouse
- Create a database
- Create a table
- Create a query
- Create a dashboard
Run Azure Databricks Notebooks with Azure Data Factory
Demo 27: Automate an Azure Databricks Notebook with Azure Data Factory, 40 min
- Import a notebook
- Enable Azure Databricks integration with Azure Data Factory
- Use a pipeline to run the Azure Databricks notebook