Introducing Reconcile databricks app #1509

Draft
wants to merge 9 commits into
base: main
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -22,3 +22,4 @@ remorph_transpile/
/linter/src/main/antlr4/library/gen/
.databricks-login.json
/core/src/main/antlr4/com/databricks/labs/remorph/parsers/*/gen/
src/databricks/labs/remorph/app/.env
55 changes: 55 additions & 0 deletions src/databricks/labs/remorph/app/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
# Prerequisites

1. Set up the Databricks CLI.
2. Clone the Remorph repo.
3. Check out the `feature/reconcile-databricks-app` branch.
4. Open `app.yaml` in the app module.

![YAML Path](assets/yaml_path.png)

5. Update the `REMORPH_METADATA_SCHEMA` value with your Remorph reconcile schema (use the same schema you used when installing Remorph).
6. If you don't have Remorph installed, use any other `<catalog.schema>` that you have access to.




# Steps to deploy app

1. Create the app

>> databricks apps create <reconcile-app-name>

2. Sync the local app directory to the Workspace. First, navigate to the app directory:

>> cd src/databricks/labs/remorph/app/


Then upload the app files to the Workspace. Do this in a new terminal tab and leave it open so the files keep syncing.

>> databricks sync --watch . /Workspace/Users/user..name@databricks.com/<reconcile-app-name>


3. Deploy the app

>> databricks apps deploy <reconcile-app-name> \
   --source-code-path /Workspace/Users/user..name@databricks.com/<reconcile-app-name>
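
The three CLI steps above can also be collected in a small helper. This is only a sketch: `deploy_commands` is a hypothetical name, and it assumes the app name is the last segment of the Workspace path under the user's home folder. It only builds and prints the commands, because `databricks sync --watch` keeps running and is meant to stay open in its own terminal.

```python
import shlex


def deploy_commands(app_name: str, workspace_user: str) -> list[list[str]]:
    """Return the create, sync, and deploy invocations as argument lists."""
    # Assumption: the app files live directly under the user's Workspace home.
    workspace_path = f"/Workspace/Users/{workspace_user}/{app_name}"
    return [
        ["databricks", "apps", "create", app_name],
        ["databricks", "sync", "--watch", ".", workspace_path],
        ["databricks", "apps", "deploy", app_name,
         "--source-code-path", workspace_path],
    ]


if __name__ == "__main__":
    # Placeholder values; replace them with your own app name and user.
    for command in deploy_commands("my-reconcile-app", "someone@example.com"):
        print(shlex.join(command))
```

Each list can be passed to `subprocess.run` once the printed commands look right.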





# Fix permission issues
(TODO: Do this programmatically)

1. Copy the service principal ID of your app. Go to Compute > Apps > your app > Authorization tab.

![Fix permission](assets/app_service_principle.png)



2. Grant this service principal access to your Remorph schema. Data Editor access should be sufficient:

![Catalog permission](assets/catalog_permission.png)

Once done, launch the app. You should see a message saying that some tables have been created; if so, the app has been deployed successfully.
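
As a possible starting point for the TODO above, the grants could be generated as SQL statements instead of being clicked through in the UI. This is a sketch under assumptions: `grant_statements` is a hypothetical helper, and the privilege list is a guess at what Data Editor access needs to cover for this app, not an official mapping. The statements would still have to be run by someone allowed to grant on the schema.

```python
def grant_statements(
    schema: str,
    service_principal_id: str,
    privileges=("USE SCHEMA", "SELECT", "MODIFY", "CREATE TABLE"),
) -> list[str]:
    """Build GRANT statements for a `catalog.schema` and a service principal."""
    catalog, _, schema_name = schema.partition(".")
    if not catalog or not schema_name:
        raise ValueError(f"Expected <catalog.schema>, got {schema!r}")
    # The principal is quoted with backticks because the ID contains hyphens.
    principal = f"`{service_principal_id}`"
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {principal}",
        f"GRANT {', '.join(privileges)} ON SCHEMA {catalog}.{schema_name} TO {principal}",
    ]


if __name__ == "__main__":
    # Placeholder schema and ID; use the values from your own workspace.
    for statement in grant_statements("my_catalog.reconcile", "1234-abcd"):
        print(statement)
```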

21 changes: 21 additions & 0 deletions src/databricks/labs/remorph/app/app.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
from src.resources.web_components.homepage import render_homepage
from src.services.spark_service import initialize_app

selected_option = render_homepage()
initialize_app()

# Routing
if selected_option == "Home":
    from src.routes.home import main as page_main
elif selected_option == "Recon Executor":
    from src.routes.recon_executor import main as page_main
elif selected_option == "Secret Manager":
    from src.routes.secret_manager import main as page_main
elif selected_option == "Config Manager":
    from src.routes.config_manager import main as page_main
elif selected_option == "Dashboard":
    from src.routes.dashboard import main as page_main
elif selected_option == "About":
    from src.routes.about import main as page_main

page_main()
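
One thing to watch in the routing above: if `selected_option` matches none of the branches, `page_main` is never bound and the final call raises a `NameError`. A lookup table with a fallback is one way around that. The sketch below reuses the module paths from `app.py`; `ROUTES`, `resolve_route`, and `load_page` are hypothetical names, and falling back to the home page is an assumption rather than the PR's behaviour.

```python
import importlib

# Menu label -> module path, copied from the if/elif chain in app.py.
ROUTES = {
    "Home": "src.routes.home",
    "Recon Executor": "src.routes.recon_executor",
    "Secret Manager": "src.routes.secret_manager",
    "Config Manager": "src.routes.config_manager",
    "Dashboard": "src.routes.dashboard",
    "About": "src.routes.about",
}


def resolve_route(selected_option, default="Home"):
    """Return the module path for a menu selection, or the default page's path."""
    return ROUTES.get(selected_option, ROUTES[default])


def load_page(selected_option):
    """Import the page module lazily and return its main() callable."""
    module = importlib.import_module(resolve_route(selected_option))
    return module.main
```

With this in place, the routing in `app.py` would reduce to `load_page(selected_option)()`.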
18 changes: 18 additions & 0 deletions src/databricks/labs/remorph/app/app.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
command: [
  "streamlit",
  "run",
  "app.py"
]

env:
  - name: "REMORPH_METADATA_SCHEMA"
    value: "kushagra_remorph.reconcile"
  - name: STREAMLIT_BROWSER_GATHER_USAGE_STATS
    value: "false"
  - name: DATABRICKS_CLUSTER_ID
    value: "0709-132523-cnhxf2p6"
  - name: RECON_CONFIG_TABLE_NAME
    value: "recon_app_config_table"
  - name: RECON_JOB_RUN_DETAILS_TABLE_NAME
    value: "job_run_details"

11 changes: 11 additions & 0 deletions src/databricks/labs/remorph/app/requirements.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
streamlit==1.41.0
databricks-connect==15.4.0
streamlit-dynamic-filters
streamlit-aggrid
altair==5.0.0
pandas~=2.2.2
databricks-sdk~=0.41.0
streamlit_option_menu
python-dotenv~=1.0.1
pandas-stubs
Empty file.
Empty file.
17 changes: 17 additions & 0 deletions src/databricks/labs/remorph/app/src/config/settings.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
import os
from dotenv import load_dotenv
from databricks.connect.session import DatabricksSession


class Settings:
    def __init__(self):
        load_dotenv()
        self.DATABRICKS_CLUSTER_ID = os.getenv('DATABRICKS_CLUSTER_ID')
        self.REMORPH_METADATA_SCHEMA = os.getenv('REMORPH_METADATA_SCHEMA')
        self.RECON_CONFIG_TABLE_NAME = os.getenv('RECON_CONFIG_TABLE_NAME')
        self.RECON_JOB_RUN_DETAILS_TABLE_NAME = os.getenv('RECON_JOB_RUN_DETAILS_TABLE_NAME')
        # self.LOG_LEVEL = os.getenv('LOG_LEVEL', 'INFO')
        self.spark = DatabricksSession.builder.clusterId(self.DATABRICKS_CLUSTER_ID).getOrCreate()


settings = Settings()
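
Because `os.getenv` returns `None` for an unset variable, a missing entry in `app.yaml` or `.env` would only show up later, for example when the Spark session is built. A fail-fast check is one option. The sketch below uses only the standard library; `REQUIRED_VARS` and `read_required_env` are hypothetical names, and treating all four variables as mandatory is an assumption.

```python
import os

# The variables that settings.py reads; assumed here to all be mandatory.
REQUIRED_VARS = (
    "DATABRICKS_CLUSTER_ID",
    "REMORPH_METADATA_SCHEMA",
    "RECON_CONFIG_TABLE_NAME",
    "RECON_JOB_RUN_DETAILS_TABLE_NAME",
)


def read_required_env(environ=None) -> dict[str, str]:
    """Return the required settings, raising if any are unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

`Settings.__init__` could call this right after `load_dotenv()` and read its attributes from the returned mapping.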
Empty file.
Empty file.
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
CREATE TABLE IF NOT EXISTS {RECON_CONFIG_TABLE_NAME}
(
    config_id      INT PRIMARY KEY,
    source_catalog STRING NOT NULL,
    source_schema  STRING NOT NULL,
    target_catalog STRING NOT NULL,
    target_schema  STRING NOT NULL,
    tables         ARRAY<STRUCT<
        source_name: STRING,
        target_name: STRING,
        drop_columns: ARRAY<STRING>,
        join_columns: ARRAY<STRING>,
        transformations: ARRAY<STRUCT<
            column_name: STRING,
            source: STRING,
            target: STRING
        >>,
        jdbc_reader_options: MAP<STRING, STRING>
    >>
)
USING DELTA
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
CREATE TABLE IF NOT EXISTS {RECON_JOB_RUN_DETAILS_TABLE_NAME}
(
    job_run_id        BIGINT PRIMARY KEY,
    start_time        TIMESTAMP,
    end_time          TIMESTAMP,
    user_name         STRING,
    duration          BIGINT, -- Store duration in seconds
    source_dialect    STRING,
    workspace_id      STRING,
    workspace_name    STRING,
    status            STRING,
    exception_message STRING,
    created_at        TIMESTAMP,
    updated_at        TIMESTAMP
) USING DELTA;
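
The DDL files above are templates: the `{...}` placeholders have to be filled in before the statement reaches Spark. How the app does this is not visible in this diff, so the following is only a sketch of one way it could work. `qualified_table` and `render_sql` are hypothetical helpers, and prefixing the table name with the metadata schema is an assumption. Identifiers are validated first because they are spliced directly into the SQL text.

```python
import re

# Conservative pattern for unquoted identifiers (an assumption of this sketch).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def qualified_table(schema: str, table: str) -> str:
    """Join `catalog.schema` and a table name after validating every part."""
    parts = schema.split(".") + [table]
    for part in parts:
        if not _IDENTIFIER.fullmatch(part):
            raise ValueError(f"Unsafe identifier: {part!r}")
    return ".".join(parts)


def render_sql(template: str, **names: str) -> str:
    """Fill the {PLACEHOLDER} fields of a SQL template."""
    return template.format(**names)


# Shortened stand-in for the job-run-details DDL file.
DDL_TEMPLATE = (
    "CREATE TABLE IF NOT EXISTS {RECON_JOB_RUN_DETAILS_TABLE_NAME} "
    "(job_run_id BIGINT) USING DELTA"
)

if __name__ == "__main__":
    table = qualified_table("my_catalog.reconcile", "job_run_details")
    print(render_sql(DDL_TEMPLATE, RECON_JOB_RUN_DETAILS_TABLE_NAME=table))
```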
Empty file.
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
select * from {REMORPH_METADATA_SCHEMA}.recon_app_config_table;
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
SELECT DISTINCT recon_type FROM {REMORPH_METADATA_SCHEMA}.details
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
WITH tmp AS (
    SELECT recon_table_id,
           inserted_ts,
           EXPLODE(data) AS schema_data
    FROM {REMORPH_METADATA_SCHEMA}.details
    WHERE
        recon_type = 'schema'
)
SELECT main.recon_id,
       main.source_table.`catalog` AS source_catalog,
       main.source_table.`schema` AS source_schema,
       main.source_table.table_name AS source_table_name,
       IF(
           ISNULL(source_catalog),
           CONCAT_WS('.', source_schema, source_table_name),
           CONCAT_WS('.', source_catalog, source_schema, source_table_name)
       ) AS source_table,
       main.target_table.`catalog` AS target_catalog,
       main.target_table.`schema` AS target_schema,
       main.target_table.table_name AS target_table_name,
       CONCAT(
           main.target_table.catalog,
           '.',
           main.target_table.schema,
           '.',
           main.target_table.table_name
       ) AS target_table,
       schema_data['source_column'] AS source_column,
       schema_data['source_datatype'] AS source_datatype,
       schema_data['databricks_column'] AS databricks_column,
       schema_data['databricks_datatype'] AS databricks_datatype,
       schema_data['is_valid'] AS is_valid
FROM {REMORPH_METADATA_SCHEMA}.main AS main
INNER JOIN tmp
    ON main.recon_table_id = tmp.recon_table_id
ORDER BY
    tmp.inserted_ts DESC,
    main.recon_id,
    main.target_table
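
The query above builds `source_table` with `CONCAT_WS` and `target_table` with `CONCAT`, and the two treat NULLs differently in Spark SQL: `CONCAT_WS` skips NULL arguments, while `CONCAT` returns NULL as soon as any argument is NULL. The pure-Python stand-ins below (hypothetical helpers, with `None` playing the role of NULL) illustrate the difference. They also suggest that the `IF(ISNULL(...))` guard around `CONCAT_WS` may be redundant.

```python
def concat_ws(sep, *parts):
    """Mimic Spark SQL CONCAT_WS: None (NULL) parts are skipped."""
    return sep.join(part for part in parts if part is not None)


def concat(*parts):
    """Mimic Spark SQL CONCAT: any None (NULL) part makes the result None."""
    if any(part is None for part in parts):
        return None
    return "".join(parts)


if __name__ == "__main__":
    # A source without a catalog still gets a usable two-part name...
    print(concat_ws(".", None, "sales", "orders"))
    # ...whereas the CONCAT form collapses to NULL in the same situation.
    print(concat(None, ".", "sales", ".", "orders"))
```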
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
SELECT DISTINCT source_type FROM {REMORPH_METADATA_SCHEMA}.main
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
SELECT main.recon_id,
       main.source_type,
       main.report_type,
       main.source_table.`catalog` AS source_catalog,
       main.source_table.`schema` AS source_schema,
       main.source_table.table_name AS source_table_name,
       IF(
           ISNULL(source_catalog),
           CONCAT_WS('.', source_schema, source_table_name),
           CONCAT_WS('.', source_catalog, source_schema, source_table_name)
       ) AS source_table,
       main.target_table.`catalog` AS target_catalog,
       main.target_table.`schema` AS target_schema,
       main.target_table.table_name AS target_table_name,
       CONCAT(
           main.target_table.catalog,
           '.',
           main.target_table.schema,
           '.',
           main.target_table.table_name
       ) AS target_table,
       metrics.run_metrics.status AS status,
       metrics.run_metrics.exception_message AS exception,
       metrics.recon_metrics.row_comparison.missing_in_source AS missing_in_source,
       metrics.recon_metrics.row_comparison.missing_in_target AS missing_in_target,
       metrics.recon_metrics.column_comparison.absolute_mismatch AS absolute_mismatch,
       metrics.recon_metrics.column_comparison.threshold_mismatch AS threshold_mismatch,
       metrics.recon_metrics.column_comparison.mismatch_columns AS mismatch_columns,
       metrics.recon_metrics.schema_comparison AS schema_comparison,
       metrics.run_metrics.run_by_user AS executed_by,
       main.start_ts AS start_ts,
       main.end_ts AS end_ts
FROM {REMORPH_METADATA_SCHEMA}.main AS main
INNER JOIN {REMORPH_METADATA_SCHEMA}.metrics AS metrics
    ON main.recon_table_id = metrics.recon_table_id
ORDER BY
    metrics.inserted_ts DESC,
    main.recon_id,
    main.target_table.table_name
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
SELECT main.recon_id AS rec_id,
       CAST(main.start_ts AS DATE) AS start_date
FROM {REMORPH_METADATA_SCHEMA}.main AS main
INNER JOIN {REMORPH_METADATA_SCHEMA}.metrics AS metrics
    ON main.recon_table_id = metrics.recon_table_id
WHERE
    metrics.run_metrics.status = FALSE
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
SELECT CONCAT_WS('.', main.target_table.catalog, main.target_table.schema, main.target_table.table_name) AS t_table,
       CAST(main.start_ts AS DATE) AS start_date
FROM {REMORPH_METADATA_SCHEMA}.main AS main
INNER JOIN {REMORPH_METADATA_SCHEMA}.metrics AS metrics
    ON main.recon_table_id = metrics.recon_table_id
WHERE
    metrics.run_metrics.status = FALSE
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
SELECT CONCAT_WS('.', main.target_table.catalog, main.target_table.schema, main.target_table.table_name) AS t_table,
       CAST(main.start_ts AS DATE) AS start_date
FROM {REMORPH_METADATA_SCHEMA}.main AS main
INNER JOIN {REMORPH_METADATA_SCHEMA}.metrics AS metrics
    ON main.recon_table_id = metrics.recon_table_id
WHERE
    metrics.run_metrics.status = TRUE
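
The three queries above each return one row per run as a (name, `start_date`) pair, split by whether `run_metrics.status` is true or false. How the dashboard consumes them is not shown in this diff. A plausible use is a per-day count for a trend chart, which the sketch below illustrates on plain Python tuples; `runs_per_day` is a hypothetical helper and the sample rows are made up.

```python
from collections import Counter
from datetime import date


def runs_per_day(rows):
    """Count rows per start_date; each row is a (name, start_date) pair."""
    return Counter(start_date for _, start_date in rows)


if __name__ == "__main__":
    # Made-up rows in the shape returned by the failed-tables query.
    failed = [
        ("main.sales.orders", date(2025, 1, 6)),
        ("main.sales.items", date(2025, 1, 6)),
        ("main.hr.people", date(2025, 1, 7)),
    ]
    for day, count in sorted(runs_per_day(failed).items()):
        print(day.isoformat(), count)
```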
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
SELECT DISTINCT run_metrics.run_by_user FROM {REMORPH_METADATA_SCHEMA}.metrics