Feature/dlp [WIP] #68

Merged
merged 159 commits into from
Feb 10, 2022

Conversation

eugenemiretsky
Contributor

Adding support for DLP. I will add more detailed docs in a bit, but at a high level, for both ODS and HDS:

  1. Check whether it is time to run DLP: it runs on first ingestion and then on a schedule (there is no point in running it every day, and it costs too much).
  2. Run a DLP scan using a provided DLP inspection template and wait for the results to be written to a temp table in BQ.
  3. Read the results, parse them to determine whether any of the columns are sensitive, and update the table schema with column policy tags if required.
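
The steps above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names, the flattened shape of the findings rows (`field_name`, `info_type`), the `min_count` threshold, and the policy tag resource name are all assumptions made for the example. The only piece taken from BigQuery itself is the `policyTags: {"names": [...]}` layout of a schema field. Step 2 (the actual DLP job) is left out because it depends on the inspection template and the client setup.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional, Set


def should_run_dlp(last_run: Optional[datetime], now: datetime,
                   interval: timedelta) -> bool:
    """Step 1: run on first ingestion (no previous run) or once the
    scheduled interval has elapsed since the last scan."""
    return last_run is None or now - last_run >= interval


def sensitive_columns(findings: Iterable[Dict[str, str]],
                      min_count: int = 1) -> Set[str]:
    """Step 3a: count findings per column and keep the columns that have
    at least `min_count` findings. The row shape is a simplifying
    assumption, not the real DLP output table layout."""
    counts: Dict[str, int] = {}
    for row in findings:
        counts[row["field_name"]] = counts.get(row["field_name"], 0) + 1
    return {col for col, n in counts.items() if n >= min_count}


def apply_policy_tags(schema: List[dict], columns: Set[str],
                      policy_tag: str) -> List[dict]:
    """Step 3b: return a copy of the schema in which every sensitive
    column carries the given policy tag. The input schema is not mutated."""
    tagged = []
    for field in schema:
        field = dict(field)  # shallow copy so the caller's schema is untouched
        if field["name"] in columns:
            field["policyTags"] = {"names": [policy_tag]}
        tagged.append(field)
    return tagged


# Hypothetical inputs for illustration only.
POLICY_TAG = "projects/my-project/locations/us/taxonomies/123/policyTags/456"
findings = [
    {"field_name": "email", "info_type": "EMAIL_ADDRESS"},
    {"field_name": "email", "info_type": "EMAIL_ADDRESS"},
    {"field_name": "notes", "info_type": "PHONE_NUMBER"},
]
schema = [
    {"name": "id", "type": "INTEGER"},
    {"name": "email", "type": "STRING"},
    {"name": "notes", "type": "STRING"},
]

if should_run_dlp(last_run=None, now=datetime(2022, 2, 10),
                  interval=timedelta(days=7)):
    columns = sensitive_columns(findings, min_count=2)
    new_schema = apply_policy_tags(schema, columns, POLICY_TAG)
```

Keeping the schedule check and the parsing as pure functions makes them easy to unit test without touching DLP or BigQuery.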

Also did some other cleanup:

  1. Added docker-compose and provided instructions to run it
  2. Added DAGs to the dag folder that read from a publicly accessible GCS bucket
  3. Added unit tests for parsing DAGs from AF-8 Operators for LandingZone to BQ for ODS #2
  4. Cleaned up some of the GCSSource code (just whatever I needed to get the sample DAGs running)
  5. Made the clustering key optional (previously, table creation failed if the clustering keys were empty)
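
As an illustration of the last item, one way to make clustering optional is to add the `clustering` entry to the table definition only when keys are actually provided. The sketch below is an assumption about the approach, not the code in this PR: `build_table_resource` and its parameters are hypothetical names, and the dictionary is a partial table resource that only mirrors the `schema.fields` and `clustering.fields` layout of the BigQuery REST API.

```python
from typing import List, Optional, Sequence


def build_table_resource(schema: List[dict],
                         clustering_fields: Optional[Sequence[str]] = None) -> dict:
    """Build a partial BigQuery table resource (hypothetical helper).

    The `clustering` key is omitted entirely when no clustering fields
    are given, instead of sending an empty list.
    """
    resource = {"schema": {"fields": schema}}
    if clustering_fields:  # skips both None and an empty sequence
        resource["clustering"] = {"fields": list(clustering_fields)}
    return resource


example_schema = [{"name": "id", "type": "INTEGER"}]
unclustered = build_table_resource(example_schema, [])
clustered = build_table_resource(example_schema, ["id"])
```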

@eugenemiretsky eugenemiretsky merged commit 4e58257 into main Feb 10, 2022

3 participants