Skip to content

miridih-jslee02/amazon-sagemaker-feature-store-end-to-end-workshop

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SageMaker Feature Store Workshop

workshop

Please Note: This repository includes a github submodule (ml-lineage-helper) which must also be cloned for certain notebook examples to run properly. Therefore, you must include the --recursive option when running git clone, like this:

~$ git clone --recursive https://github.com/aws-samples/amazon-sagemaker-feature-store-end-to-end-workshop.git

If you have already cloned the repository and need to pull the submodule code, you can run this command from the top-level directory of the repo:

~$ git submodule update --init --recursive

You should notice these lines of output during the clone of the submodule:

Submodule 'ml-lineage-helper' (https://github.com/aws-samples/ml-lineage-helper.git) registered for path 'ml-lineage-helper' Cloning into '/home/sagemaker-user/workshops/amazon-sagemaker-feature-store-end-to-end-workshop/ml-lineage-helper'...

  • Module 1: Feature Store Foundations

    • Topics:
      • Dataset introduction
      • Creating a feature group
      • Ingesting a Pandas DataFrame into Online/Offline feature store
      • GetRecord, ListFeatureGroups, DescribeFeatureGroup
  • Module 2: Working with the Offline Store

    • Topics:
      • Look at data in S3 console (Offline feature store)
      • Athena query for dataset extraction (via Athena console)
      • Athena query for dataset extraction (programmatically using SageMaker SDK)
      • Extract a training dataset and storing in S3
      • Apache Iceberg and offline file compaction
  • Module 3: Model training and batch scroing using extracted dataset from the Offline feature store

    • Topics:
      • Training a model using feature sets derived from the Offline feature store
      • Perform batch scoring using feature sets derived from Offline feature store in CSV and Parquet format
  • Module 4: Leveraging the Online feature store

    • Topics:
      • Get record from Online feature store during single inference
      • Get multiple records from Online store using BatchGet during batch inference
  • Module 5: Scalable batch ingestion using distributed processing

    • Topics:
      • Batch ingestion via SageMaker Processing job
      • Batch ingestion via SageMaker Processing PySpark job
      • SageMaker Data Wrangler export job to feature store
      • Use the Feature Store Spark Connector to incrementally materialize the latest features to the online store.
  • Module 6: Automate feature engineering pipelines with Amazon SageMaker

    • Topics:
      • Leverage Amazon SageMaker Data Wrangler, Amazon SageMaker Feature Store, and Amazon SageMaker Pipelines alongside AWS Lambda to automate feature transformation.
  • Module 7: Feature Monitoring

    • Topics:
      • Feature Group Monitoring Preparation, DataBrew Dataset Creation
      • Run Feature Group Monitoring using DataBrew Profile Job
      • Visualization of Feature Group Statistics and Feature Drift
  • Module 8: Create, Delete and Query ML Lineage Tracking with Amazon SageMaker

    • Topics:
      • Create/Delete ML Lineage.
      • Query ML Lineage by SageMaker Model Name or SageMaker Inference Endpoint
      • Given a SageMaker Model name or artifact ARN, you can find associated Feature Groups
      • Given a Feature Group ARN, and find associated SageMaker Models
      • Given a data source's S3 URI or Artifact ARN, you can find associated SageMaker Feature Groups
      • Given a Feature Group ARN, and find associated data sources
  • Module 9: Feature Security

    • Topics:
      • Setup of granular access control to Offline Feature Store using AWS Lake Formation
      • Testing of the access control using SageMaker Feature Store SDK
      • Cross-account feature groups sharing using AWS Resource Access Manager
  • Module 10: Compliance

    • Topics:
      • Hard Delete records from Feature Store using DeleteRecord API and Iceberg compaction procedures

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 82.0%
  • Python 18.0%