add ability to switch off/on creation of parquet dwh #1074
base: master
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@             Coverage Diff              @@
##             master    #1074      +/-   ##
============================================
- Coverage     52.03%   51.99%   -0.04%
  Complexity      653      653
============================================
  Files            89       89
  Lines          5396     5402       +6
  Branches        708      710       +2
============================================
+ Hits           2808     2809       +1
- Misses         2325     2328       +3
- Partials        263      265       +2

☔ View full report in Codecov by Sentry.
Force-pushed from 2f13701 to 9ffab3a
cc @bashir2
Thanks @mozzy11 for this change. I had a look and made some comments, but in general I feel we need to think deeper about the implications of skipping Parquet file generation; I feel there are still scenarios not covered in your change (beyond what I have commented below) but I need to think more about this.
pipelines/batch/src/main/java/com/google/fhir/analytics/FhirEtlOptions.java
@@ -25,6 +25,8 @@ fhirdata:
   # fhirServerUrl: "http://hapi-server:8080/fhir"
   dbConfig: "config/hapi-postgres-config_local.json"
   dwhRootPrefix: "/dwh/controller_DWH"
+  # Whether to create a Parquet DWH or not
You can probably drop this comment, as we have a reference to pipelines/controller/config/application.yaml at the top for all comments.
@@ -0,0 +1,59 @@
#
It seems that most of the content of this directory is a copy of the config dir. Can you reuse those config files and only override the values that you need to change, e.g., with command-line arguments?
id: 'Bring down controller and Spark containers for FHIR server to FHIR server sync'
args: [ '-f', './docker/compose-controller-spark-sql-single.yaml', 'down', '-v']

# Resetting Sink FHIR server
It seems that these new tests are adding 15+ minutes to the e2e test run-time; I think changes in PR #947 had a similar effect too, and we should try to reduce this. How about doing the sync test in only one of the scenarios and seeing how much that reduces the run-time? Maybe we can have only one scenario where sync is on and Parquet generation is off. Please also make sure that the incremental mode is tested in that scenario.
@@ -200,7 +202,7 @@ public void setup() throws SQLException, ProfileException {
         oAuthClientSecret,
         fhirContext);
     fhirSearchUtil = new FhirSearchUtil(fetchUtil);
-    if (!Strings.isNullOrEmpty(parquetFile)) {
+    if (createParquetDwh) {
Do we have a sanity check if createParquetDwh is true but parquetFile is null or empty?
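One way such a guard could look, as a minimal sketch: it assumes Guava's Strings and Preconditions (already used by the surrounding code), and the helper class and method names are hypothetical rather than part of this PR.

import com.google.common.base.Preconditions;
import com.google.common.base.Strings;

// Hypothetical helper illustrating the check discussed above: if Parquet DWH
// creation is requested, a Parquet output path must also be configured.
final class ParquetOptionsCheck {

  static void validate(boolean createParquetDwh, String parquetFile) {
    if (createParquetDwh) {
      Preconditions.checkArgument(
          !Strings.isNullOrEmpty(parquetFile),
          "createParquetDwh is enabled but the Parquet output path is null or empty.");
    }
  }

  private ParquetOptionsCheck() {}
}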
@@ -138,6 +143,7 @@ void validateProperties() {
       logger.info("Using FHIR-search mode since dbConfig is not set.");
     }
     Preconditions.checkState(!createHiveResourceTables || !thriftserverHiveConfig.isEmpty());
+    Preconditions.checkState(!createHiveResourceTables || createParquetDwh);
I think there are more config sanity checks that need to be done, e.g., when we are not generating Parquet files, generation of views should be disabled as well.
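As a hedged sketch, an extra check in validateProperties() could mirror the Preconditions style of the lines above; note that createParquetViews is an assumed name for whatever property actually controls view generation.

    // Sketch only: createParquetViews is a hypothetical flag name for view generation.
    Preconditions.checkState(
        !createParquetViews || createParquetDwh,
        "Generating views requires createParquetDwh to be true.");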
@@ -213,6 +219,8 @@ PipelineConfig createBatchOptions() {
         Instant.now().toString().replace(":", "-").replace("-", "_").replace(".", "_");
     options.setOutputParquetPath(dwhRootPrefix + TIMESTAMP_PREFIX + timestampSuffix);

+    options.setCreateParquetDwh(createParquetDwh);
Have you tested the incremental pipeline when this flag is turned off? In particular, does the mergerPipelines logic here work fine? I think we need extra logic in PipelineManager to handle these edge cases.
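A rough sketch of the kind of branch that might be needed; the method and field names below are hypothetical and only illustrate the idea, the real PipelineManager API may differ.

    // Sketch only: in incremental mode there is no new Parquet output to merge
    // with the existing DWH when createParquetDwh is false, so the merger step
    // could be skipped outright.
    if (!dataProperties.isCreateParquetDwh()) {
      logger.info("createParquetDwh is false; skipping the Parquet merger pipelines.");
    } else {
      runMergerPipelines(existingDwhRoot, newIncrementalRoot); // hypothetical call
    }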
Co-authored-by: Bashir Sadjad <bashir@google.com>
…lOptions.java Co-authored-by: Bashir Sadjad <bashir@google.com>
Fixes #1073
Add the ability to switch off creation of a Parquet DWH when syncing between FHIR servers.
Added a createParquetDwh flag to the controller to switch creation of the Parquet DWH on or off.

E2E tests
Added e2e tests for syncing from one HAPI FHIR server to another using the pipeline controller, in both FULL and INCREMENTAL modes, with Parquet DWH creation switched on and off.

TESTED:
Tested locally, syncing between FHIR servers with Parquet DWH creation switched both off and on.
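For context, this is roughly how such a flag is declared as an Apache Beam pipeline option; the snippet below is an illustrative sketch rather than the exact change in FhirEtlOptions.java, and the default value and description text are assumptions.

import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;

// Illustrative interface; the real FhirEtlOptions declares many more options.
public interface ParquetDwhOptions extends PipelineOptions {

  @Description("Whether to create a Parquet DWH or not")
  @Default.Boolean(true)
  Boolean getCreateParquetDwh();

  void setCreateParquetDwh(Boolean value);
}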
Checklist: I completed these to help reviewers :)
I have read and will follow the review process.
I am familiar with Google Style Guides for the language I have coded in.
No? Please take some time and review Java and Python style guides.
My IDE is configured to follow the Google code styles.
No? Unsure? -> configure your IDE.
I have added tests to cover my changes. (If you refactored existing code that was well tested you do not have to add tests)
I ran mvn clean package right before creating this pull request and added all formatting changes to my commit.
All new and existing tests passed.
My pull request is based on the latest changes of the master branch.
No? Unsure? -> execute command
git pull --rebase upstream master