Truncate staging tables after ingestion #778
Merged
This change truncates the cloud staging tables, openstack_staging_table and generic_cloud_staging_table, at the end of the jobs-cloud-extract-openstack pipeline. These tables were not truncated before because the event_id generated on the staging table was used as part of a unique key on the event table to prevent duplicates. The event_id is now generated on the event table instead of the staging table, and uniqueness on the event table is now based on the resource id, instance id, event time, event type and host id.
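The idea above can be sketched as follows. This is a minimal illustration only, not the code in this PR: it uses Python's sqlite3 module as a stand-in database, the table and column names (staging, event, event_type and so on) are simplified assumptions, and DELETE FROM stands in for TRUNCATE because SQLite has no TRUNCATE statement.

```python
import sqlite3

# Hypothetical, simplified schema: the real tables have more columns.
SCHEMA = """
CREATE TABLE staging (
    resource_id INTEGER, instance_id INTEGER,
    event_time TEXT, event_type TEXT, host_id INTEGER
);
CREATE TABLE event (
    event_id    INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated here, not on staging
    resource_id INTEGER NOT NULL,
    instance_id INTEGER NOT NULL,
    event_time  TEXT NOT NULL,
    event_type  TEXT NOT NULL,
    host_id     INTEGER NOT NULL,
    UNIQUE (resource_id, instance_id, event_time, event_type, host_id)
);
"""

COLUMNS = "resource_id, instance_id, event_time, event_type, host_id"


def ingest(conn, rows):
    """Load rows into staging, copy unseen events to event, then empty staging."""
    conn.executemany(
        f"INSERT INTO staging ({COLUMNS}) VALUES (?, ?, ?, ?, ?)", rows
    )
    # Rows that collide with the natural unique key are skipped, so
    # re-ingesting the same data does not create duplicate events.
    conn.execute(
        f"INSERT OR IGNORE INTO event ({COLUMNS}) SELECT {COLUMNS} FROM staging"
    )
    # Stand-in for TRUNCATE: staging no longer has to keep rows around,
    # because event_id is no longer generated there.
    conn.execute("DELETE FROM staging")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

rows = [
    (1, 10, "2018-04-17 10:00:00", "start", 5),
    (1, 10, "2018-04-17 11:00:00", "stop", 5),
]
ingest(conn, rows)
ingest(conn, rows)  # ingest the same data a second time

event_count = conn.execute("SELECT COUNT(*) FROM event").fetchone()[0]
staging_count = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
```

After both runs the event table should hold only the two distinct events and the staging table should be empty.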
The event_asset table is also changed to prevent duplicate rows from being added to it, and its queries are updated to account for the event_id being created on the event table instead of the staging table.
Populating the instance_data table is also moved into its own action instead of residing in the OpenStackEventIngestor and GenericCloudEventIngestor. This is because the instance_data action needs to run after the event table is populated and the event_id is generated.
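The ordering constraint can be illustrated with a small sketch. Everything here is hypothetical: the two functions stand in for separate pipeline actions, the schema is reduced to a few columns, and first_event_id is an invented column used only to show a value that depends on event_id already existing. It again uses sqlite3 rather than the project's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging (instance_id INTEGER, event_time TEXT, event_type TEXT);
CREATE TABLE event (
    event_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    instance_id INTEGER NOT NULL,
    event_time  TEXT NOT NULL,
    event_type  TEXT NOT NULL,
    UNIQUE (instance_id, event_time, event_type)
);
-- first_event_id is a made-up column that needs event_id to exist.
CREATE TABLE instance_data (
    instance_id    INTEGER PRIMARY KEY,
    first_event_id INTEGER NOT NULL
);
""")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?, ?)",
    [
        (10, "2018-04-17 10:00:00", "start"),
        (10, "2018-04-17 11:00:00", "stop"),
        (11, "2018-04-17 12:00:00", "start"),
    ],
)


def populate_event(conn):
    """First action: copy staged rows into event, which assigns event_id."""
    conn.execute(
        "INSERT OR IGNORE INTO event (instance_id, event_time, event_type) "
        "SELECT instance_id, event_time, event_type FROM staging "
        "ORDER BY event_time"
    )


def populate_instance_data(conn):
    """Second action: derive per-instance rows from event, using event_id."""
    conn.execute(
        "INSERT OR REPLACE INTO instance_data (instance_id, first_event_id) "
        "SELECT instance_id, MIN(event_id) FROM event GROUP BY instance_id"
    )


# The order matters: instance_data can only be built once event_id exists.
for action in (populate_event, populate_instance_data):
    action(conn)
conn.commit()

instance_rows = conn.execute(
    "SELECT instance_id, first_event_id FROM instance_data ORDER BY instance_id"
).fetchall()
```

If populate_instance_data ran first, the event table would still be empty and no instance_data rows would be produced, which is why a separate action that runs later is the simpler arrangement.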
Tests performed
All component, integration and regression tests were run and passed. Manually tested in Docker to make sure values were correct and consistent when using multiple resources. Also verified that duplicate data is not loaded into the event table.