Change SQL connection reuse strategy #1604
Conversation
Description

We were seeing database timeouts when running the upgrade procedure. It was thought that ubccr#1602 fixed some of the problems, but actually it didn't fix anything, since the `DB::factory('datawarehouse');` function can return a stale database handle. Calling it multiple times will not return a "fresh" handle; you just get the same one (which could be stale).

This commit has three changes:

- Fix the DatabasesMigration to run the table existence checks first, then run each migration step as a separate EtlV2 run.
- Change the EtlV2 data endpoint so that the initial connect call is guaranteed to get a fresh database connection and not a stale existing one. (Looking at the code design, it may be that the original author expected the DB::factory() function to actually return a 'fresh' connection, since the connection details are cached in the class.)
- Change Maintenance/ExecuteSql.php to use a fresh connection for each file. The same connection is still reused for different statements within a file.

Tests performed

These changes were tested as follows:

In the Docker setup, set the MySQL timeout to 30s:

```
echo "wait_timeout = 30" >> /etc/my.cnf.d/server.cnf
```

Then apply the following patch to deliberately make a couple of SQL file runs take a "long" time:

```
diff --git a/configuration/etl/etl_sql.d/migrations/9.5.0-10.0.0/mod_shredder/update_storage_datetimes.sql b/configuration/etl/etl_sql.d/migrations/9.5.0-10.0.0/mod_shredder/update_storage_datetimes.sql
index cb7c496a..28efba12 100644
--- a/configuration/etl/etl_sql.d/migrations/9.5.0-10.0.0/mod_shredder/update_storage_datetimes.sql
+++ b/configuration/etl/etl_sql.d/migrations/9.5.0-10.0.0/mod_shredder/update_storage_datetimes.sql
@@ -2,3 +2,4 @@
 UPDATE mod_shredder.staging_storage_usage SET dt = CONCAT(DATE(dt), 'T', TIME_FORMAT(dt, "%T"), 'Z');
+SELECT SLEEP(45);
diff --git a/configuration/etl/etl_sql.d/migrations/9.5.0-10.0.0/modw_cloud/update_image_index.sql b/configuration/etl/etl_sql.d/migrations/9.5.0-10.0.0/modw_cloud/update_image_index.sql
index 6b3cd725..b3cd67b2 100644
--- a/configuration/etl/etl_sql.d/migrations/9.5.0-10.0.0/modw_cloud/update_image_index.sql
+++ b/configuration/etl/etl_sql.d/migrations/9.5.0-10.0.0/modw_cloud/update_image_index.sql
@@ -44,3 +44,4 @@
 DROP INDEX image_resource_idx ON modw_cloud.image;
 DROP INDEX image_resource_idx ON modw_cloud.instance_data;
 UNLOCK TABLES;
+SELECT SLEEP(45);
```

When you run the original code you get the following error:

```
2022-02-01 21:00:26 [notice] Finished processing section 'xdmod.storage-table-definition-update-9-5-0_10-0-0'
SQLSTATE[HY000]: General error: 2006 MySQL server has gone away
```

You do not see that error with this change in place.

As a different test, set the database timeout to 10 seconds (and do not apply the SLEEP(45) patch). The original code gives this error:

```
2022-02-01 21:56:45 [warning] Stopping ETL due to exception in xdmod.migration-9_5_0-10_0_0.alter-shredded_job_slurm (ETL\Maintenance\ExecuteSql)
xdmod.migration-9_5_0-10_0_0.alter-shredded_job_slurm (ETL\Maintenance\ExecuteSql): Error executing SQL
Exception: 'SQLSTATE[HY000]: General error: 2006 MySQL server has gone away'
```

That error is also not seen with this change in place.
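To make the stale-handle behaviour described above concrete, here is a minimal, hypothetical PHP sketch. It is not XDMoD's actual `DB` class: the `CachedConnectionFactory` name, the `$forceNew` flag, and the DSN/credentials are placeholders. It only illustrates why a factory that caches handles can hand back a connection the server has already closed once `wait_timeout` has passed, and how asking for a fresh connection per SQL file avoids that.

```
<?php
// Illustration only (not XDMoD code): a factory that memoises its handles,
// similar in spirit to a DB::factory() call that caches connection details.

class CachedConnectionFactory
{
    /** @var PDO[] cache keyed by config section */
    private static $cache = [];

    public static function factory(string $section, bool $forceNew = false): PDO
    {
        // $forceNew = true models "give me a genuinely fresh connection";
        // without it, repeated calls return the same (possibly stale) handle.
        if ($forceNew || !isset(self::$cache[$section])) {
            self::$cache[$section] = new PDO(
                'mysql:host=localhost;dbname=modw',   // placeholder DSN
                'xdmod',                              // placeholder user
                'password',                           // placeholder password
                [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
            );
        }
        return self::$cache[$section];
    }
}

// Simulate one SQL file per iteration. With a cached handle and a 30s
// wait_timeout, the second iteration would fail with
// "SQLSTATE[HY000]: General error: 2006 MySQL server has gone away".
foreach (['file1.sql', 'file2.sql'] as $sqlFile) {
    // Requesting a fresh connection per file mirrors the behaviour this PR
    // describes for Maintenance/ExecuteSql.php.
    $dbh = CachedConnectionFactory::factory('datawarehouse', true);
    $dbh->query('SELECT 1'); // stands in for the statements in $sqlFile
    sleep(35);               // longer than the 30s wait_timeout used in the test
}
```

The trade-off is the one stated in the description: statements within a single file still share one handle, but each file starts from a connection that is known to be alive.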
@eiffel777 I've slightly changed the migration file here so that the table-exists checks are all performed first. Can you please confirm that this is OK to do and that you didn't intentionally have a table-exists check that needed to run after a migration had completed?
@jpwhite4 That change is fine. The more important thing is that the order in which the pipelines run remains the same.
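For context on the exchange above, here is a minimal, hypothetical sketch of the kind of table-existence check that is now run up front, before the individual migration steps. This is not the actual DatabasesMigration code: the helper name, credentials, and the choice to abort on a missing table are illustrative only; the table names are taken from the patch above.

```
<?php
// Hypothetical helper: check whether a table exists via information_schema.
function tableExists(PDO $dbh, string $schema, string $table): bool
{
    $stmt = $dbh->prepare(
        'SELECT COUNT(*) FROM information_schema.tables
         WHERE table_schema = :schema AND table_name = :table'
    );
    $stmt->execute([':schema' => $schema, ':table' => $table]);
    return (int) $stmt->fetchColumn() > 0;
}

$dbh = new PDO(
    'mysql:host=localhost', // placeholder DSN
    'xdmod',                // placeholder user
    'password',             // placeholder password
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
);

// Run every existence check first, then let each migration step run as its
// own EtlV2 pipeline invocation, as described in the PR.
foreach ([['mod_shredder', 'staging_storage_usage'], ['modw_cloud', 'instance_data']] as [$schema, $table]) {
    if (!tableExists($dbh, $schema, $table)) {
        // Whether to skip, warn, or abort on a missing table is up to the
        // migration; aborting here is just for illustration.
        throw new RuntimeException("Required table $schema.$table does not exist");
    }
}
```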