michalsadowski/dp-203

DP-203 Setup for trainers (MCT)

Welcome to this repository, which helps Microsoft Certified Trainers (MCTs) deploy a demo environment (see instructions here). It builds on the work already done in the official DP-203: Azure Data Engineer git repository, which contains instructions and assets for the hands-on exercises.

The dp-203.bicep template can be used to deploy all the required resources:

  • Data Lake with containers
  • Synapse workspace
  • Dedicated pool (will be paused every 2 hours)
  • Spark pool (auto pause after 15 minutes of inactivity)
  • Data explorer pool (will be paused every 2 hours)
  • SQL Server with AdventureWorksLT database
  • Event Hub + namespace
  • Purview account
  • Logic app to pause dedicated pool and data explorer pool every 2 hours
  • Cosmos DB Account (no database yet)
  • Databricks (without cluster)
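
A deployment of the template with the Azure CLI might look like the sketch below. The resource group name `rg-dp-203` matches the one used in the Demo 17 commands in this README. The region `westeurope` and the `DRY_RUN` switch are assumptions added for illustration, and any parameters the template may require are not shown. With `DRY_RUN=1` (the default here) the script only prints the commands, so they can be reviewed first.

```shell
#!/usr/bin/env bash
set -euo pipefail

RESOURCE_GROUP="${RESOURCE_GROUP:-rg-dp-203}"    # name used elsewhere in this README
LOCATION="${LOCATION:-westeurope}"               # assumption: choose your own region
TEMPLATE_FILE="${TEMPLATE_FILE:-dp-203.bicep}"   # template named in this README
DRY_RUN="${DRY_RUN:-1}"                          # hypothetical switch: 1 = print only

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Create the resource group, then deploy the Bicep template into it.
run az group create --name "$RESOURCE_GROUP" --location "$LOCATION"
run az deployment group create --resource-group "$RESOURCE_GROUP" --template-file "$TEMPLATE_FILE"
```

Running the script with `DRY_RUN=0` executes the two commands against the currently selected subscription.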

Whiteboards and labs

Below is a set of mindmaps, one summarizing each of the Learn modules. Click a mindmap to get the SVG file. Each lab or demo entry gives a high-level summary of what needs to be done.

DP-203-01-Intro to data engineering

Introduction to data engineering on Azure

mindmap

Introduction to Azure Data Lake Storage Gen2

mindmap

Introduction to Azure Synapse Analytics

mindmap

Lab 1: Explore Azure Synapse Analytics, 90 min

  1. Explore Synapse Studio
  2. Ingest data with a pipeline
  3. Use a serverless SQL pool to analyze data
  4. Use a Spark pool to analyze data
  5. Use a dedicated SQL pool to query a data warehouse
  6. Explore data with a Data Explorer pool

DP-203-02-Serverless SQL

Use Azure Synapse serverless SQL pool to query files in a data lake

mindmap

Demo 2: Query files using a serverless SQL pool, 45 min

  1. Query data in files
  2. Access external data in a database
  3. Visualize query results

Use Azure Synapse serverless SQL pools to transform data in a data lake

mindmap

Lab 3: Transform data using a serverless SQL pool, 30 min

  1. Query data in files
  2. Transform data using CREATE EXTERNAL TABLE AS SELECT (CETAS) statements
  3. Encapsulate data transformation in a stored procedure

Create a lake database in Azure Synapse Analytics

mindmap

Demo 4: Analyze data in a lake database, 45 min

  1. Create a lake database (RetailDB in files/labs-dp-203/04/RetailDB)
  2. Create a table (Customer + Load csv + query SQL)
  3. Create a table from a database template (RetailProduct -> Product -> Load csv + query SQL)
  4. Create a table from existing data (Load salesorder.csv, Create table SalesOrder, define fields, create relationships)
  5. Work with lake database tables

DP-203-03-Spark

Analyze data with Apache Spark in Azure Synapse Analytics

mindmap

Demo 5: Analyze data in a data lake with Spark, 45 min

  1. Query data in files
  2. Analyze data in a dataframe
  3. Query data using Spark SQL
  4. Visualize data with Spark

Transform data with Spark in Azure Synapse Analytics

mindmap

Lab 6: Transform data using Spark in Synapse Analytics, 30 min

  1. Use a Spark notebook to transform data

Use Delta Lake in Azure Synapse Analytics

mindmap

Lab 7: Use Delta Lake in Azure Synapse Analytics, 40 min

  1. Create delta tables
  2. Create catalog tables
  3. Use delta tables for streaming data
  4. Query a delta table from a serverless SQL pool

DP-203-04-Data Warehouse

Analyze data in a relational data warehouse

mindmap

Demo 8: Explore a relational data warehouse, 45 min

  1. Explore the data warehouse schema
  2. Query the data warehouse tables
  3. Challenge - Analyze reseller sales

Load data into a relational data warehouse

mindmap

Lab 9: Load Data into a Relational Data Warehouse, 30 min

  1. Prepare to load data
  2. Load data warehouse tables
  3. Perform post-load optimization

DP-203-05-Pipelines

Build a data pipeline in Azure Synapse Analytics

mindmap

Lab 10: Build a data pipeline in Azure Synapse Analytics, 45 min

  1. View source and destination data stores
  2. Implement a pipeline
  3. Debug the Data Flow
  4. Publish and run the pipeline

Use Spark Notebooks in an Azure Synapse Pipeline

mindmap

Lab 11: Use an Apache Spark notebook in a pipeline, 30 min

  1. Run a Spark notebook interactively
  2. Run the notebook in a pipeline

DP-203-06-HTAP

Plan hybrid transactional and analytical processing using Azure Synapse Analytics

mindmap

Implement Azure Synapse Link with Azure Cosmos DB

mindmap

Lab 14: Use Azure Synapse Link for Azure Cosmos DB, 35 min

  1. Configure Synapse Link in Azure Cosmos DB
  2. Configure Synapse Link in Azure Synapse Analytics
  3. Query Azure Cosmos DB from Azure Synapse Analytics

Implement Azure Synapse Link for SQL

mindmap

Demo 15: Use Azure Synapse Link for SQL, 35 min

  1. Configure Azure SQL Database
  2. Explore the transactional database
  3. Configure Azure Synapse Link

DP-203-07-Stream Analytics

Get started with Azure Stream Analytics

mindmap

Demo 17: Get started with Azure Stream Analytics, 15 min

rm -r dp-203 -f
git clone https://github.com/MicrosoftLearning/dp-203-azure-data-engineer dp-203
cd dp-203/Allfiles/labs/17
code setup.txt
az eventhubs namespace authorization-rule keys list --name RootManageSharedAccessKey --namespace-name events-michals --resource-group rg-dp-203 | jq '.primaryConnectionString'
# in the editor: save (Ctrl+S), then quit (Ctrl+Q)
cp setup.txt orderclient.js
npm install @azure/event-hubs | Out-Null
node ~/dp-203/Allfiles/labs/17/orderclient
  1. View the streaming data source
  2. Create an Azure Stream Analytics job
  3. Create an input for the event stream
  4. Create an output for the blob store
  5. Create a query
  6. Run the streaming job
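
As a variation on the key lookup in the commands above, the Azure CLI's global `--query` and `--output tsv` options can return the connection string without surrounding quotes, which makes it easier to paste and removes the `jq` step. The sketch below reuses the namespace and resource group names from those commands. The `DRY_RUN` switch and the function name `get_connection_string` are hypothetical helpers added for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

NAMESPACE="${NAMESPACE:-events-michals}"        # namespace name from the commands above
RESOURCE_GROUP="${RESOURCE_GROUP:-rg-dp-203}"   # resource group from the commands above
DRY_RUN="${DRY_RUN:-1}"                         # hypothetical switch: 1 = print only

# Build the key lookup; --query selects one field, --output tsv drops the quotes.
get_connection_string() {
  local cmd=(az eventhubs namespace authorization-rule keys list
    --name RootManageSharedAccessKey
    --namespace-name "$NAMESPACE"
    --resource-group "$RESOURCE_GROUP"
    --query primaryConnectionString
    --output tsv)
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

get_connection_string
```

With `DRY_RUN=0` the function prints the connection string itself, so it can be captured with `CONN="$(DRY_RUN=0 ./script.sh)"`, where `script.sh` is whatever name the sketch is saved under.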

Ingest streaming data using Azure Stream Analytics and Azure Synapse Analytics

mindmap

Lab 18: Ingest realtime data with Azure Stream Analytics and Azure Synapse Analytics, 45 min

  1. Ingest streaming data into a dedicated SQL pool
  2. Summarize streaming data in a data lake

Visualize real-time data with Azure Stream Analytics and Power BI

mindmap

Demo 19: Create a realtime report with Azure Stream Analytics and Microsoft Power BI, 45 min

  1. Create a Power BI workspace
  2. Use Azure Stream Analytics to process streaming data
  3. Visualize the streaming data in Power BI

DP-203-08-Purview

Introduction to Microsoft Purview

mindmap

Integrate Microsoft Purview and Azure Synapse Analytics

mindmap

Lab 22: Use Microsoft Purview with Azure Synapse Analytics, 40 min

  1. Catalog Azure Synapse Analytics data assets in Microsoft Purview
  2. Integrate Microsoft Purview with Azure Synapse Analytics

DP-203-09-Databricks

Explore Azure Databricks

mindmap

Demo 23: Explore Azure Databricks, 30 min

  1. Create a cluster
  2. Use Spark to analyze a data file
  3. Create and query a database table

Use Apache Spark in Azure Databricks

mindmap

Lab 24: Use Spark in Azure Databricks, 45 min

  1. Provision an Azure Databricks workspace
  2. Create a cluster
  3. Explore data using a notebook

Run Azure Databricks Notebooks with Azure Data Factory

mindmap

Demo 25: Use Delta Lake in Azure Databricks, 40 min

  1. Provision an Azure Databricks workspace
  2. Create a cluster
  3. Explore data using a notebook

Demo 26: Use a SQL Warehouse in Azure Databricks, 30 min

  1. View and start a SQL Warehouse
  2. Create a database
  3. Create a table
  4. Create a query
  5. Create a dashboard

Demo 27: Automate an Azure Databricks Notebook with Azure Data Factory, 40 min

  1. Import a notebook
  2. Enable Azure Databricks integration with Azure Data Factory
  3. Use a pipeline to run the Azure Databricks notebook
