bug: activate_version deletes rows inserted in earlier batches (and deletes despite hard_delete not being set) #2103
Comments
One thing I was unsure of is whether activate_version is expected to be called multiple times during EL. Is there a chance the tap is the one misbehaving? (Although the default would still be wrong in that case.)
Hi @msg555! Thanks for logging this and for the detailed investigation into the issue. This does seem like a significant problem that could lead to unexpected data loss.
Yeah, I think that's right.
I'm trying to think about what users would experience if we changed the default to not hard delete. Would they just start seeing data upserted/soft-deleted instead of removed, and would that cause issues for downstream data modeling? I still think we should apply your suggested patch, but we should call out this change in the release notes of both the SDK and downstream targets.
I think there would be at most one activate_version message for each stream in the tap, but I could be wrong. What's the tap in question? cc @pnadolny13: curious if you've seen this problem come up with target-snowflake. And fwiw, target-postgres seems to do the right thing.
Ah, apologies if I've identified the wrong repository. Perhaps the code you linked that's directly in target-snowflake is what is relevant in my case; it still seems to use the <= rather than < operator. Perhaps I should create a new issue over there.
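To make the `<=` versus `<` distinction above concrete, here is a minimal in-memory sketch. It is not the SDK or target-snowflake code: the helper `rows_to_delete` is hypothetical, and it assumes each row carries a `_sdc_table_version` value that is compared against the version from the activate_version message.

```python
# Hypothetical helper, for illustration only: it mimics the row selection a
# version-based cleanup would perform, using a plain list instead of SQL.
def rows_to_delete(rows, new_version, inclusive):
    """Return the rows a cleanup for `new_version` would remove."""
    if inclusive:
        # `<=` also matches rows written with the version being activated.
        return [r for r in rows if r["_sdc_table_version"] <= new_version]
    # `<` only matches rows left over from older versions.
    return [r for r in rows if r["_sdc_table_version"] < new_version]


rows = [
    {"id": 1, "_sdc_table_version": 1},  # stale row from a previous sync
    {"id": 2, "_sdc_table_version": 2},  # current sync, earlier batch
    {"id": 3, "_sdc_table_version": 2},  # current sync, later batch
]

inclusive_deleted = rows_to_delete(rows, new_version=2, inclusive=True)
strict_deleted = rows_to_delete(rows, new_version=2, inclusive=False)
```

Under these assumptions, the inclusive comparison selects all three rows, including the ones just loaded, while the strict comparison selects only the stale row with `id` 1.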
The tap is pipelinewise-tap-mysql==1.5.6

- name: tap-mysql
  variant: transferwise
  pip_url: pipelinewise-tap-mysql~=1.5.0
  config:
    host: ${TAP_MYSQL_HOST}
    port: ${TAP_MYSQL_PORT}
    user: ${TAP_MYSQL_USER}
    password: ${TAP_MYSQL_PASSWORD}
    filter_dbs: my_db
    session_sqls:
    - SET @@session.max_statement_time=0
    - SET @@session.net_read_timeout=3600
    - SET @@session.net_write_timeout=3600
    - SET @@session.wait_timeout=28800
    - SET @@session.innodb_lock_wait_timeout=3600
  select:
  ...
The
Thanks! The tap does seem to emit a single message per stream: https://github.com/transferwise/pipelinewise-tap-mysql/blob/572e08a3576702895e2a9edae188773ec9d7a096/tap_mysql/sync_strategies/full_table.py#L137-L138
On second thought, an issue and PR are probably needed for target-snowflake since failing tests are blocking bumping the SDK: MeltanoLabs/target-snowflake#105
@edgarrmondragon I haven't noticed, to be honest. I don't use activate version for anything really. I agree though that seeing hard_delete defaulting to true seems weird. I also don't think In terms of breaking changes, I also agree that it's better to break someone by getting them back to the behavior they expect vs leaving the bug.
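As a rough illustration of what changing the hard_delete default would mean for users, the sketch below contrasts the two outcomes on an in-memory table. Everything here is an assumption made for the example: the function `activate_version`, its `default_hard_delete` parameter, and the use of a `_sdc_deleted_at` marker for soft deletes are hypothetical stand-ins, not the SDK's actual implementation.

```python
from datetime import datetime, timezone


# Hypothetical stand-in for a target's cleanup step; not the SDK's code.
def activate_version(rows, new_version, config, default_hard_delete=False):
    """Return the table contents after activating `new_version`."""

    def is_stale(row):
        # Strict comparison: rows written with `new_version` are kept.
        return row["_sdc_table_version"] < new_version

    if config.get("hard_delete", default_hard_delete):
        # Hard delete: stale rows are physically removed.
        return [r for r in rows if not is_stale(r)]

    # Soft delete: stale rows stay in place but are marked as deleted.
    now = datetime.now(timezone.utc).isoformat()
    return [
        {**r, "_sdc_deleted_at": now}
        if is_stale(r) and r.get("_sdc_deleted_at") is None
        else r
        for r in rows
    ]


table = [
    {"id": 1, "_sdc_table_version": 1, "_sdc_deleted_at": None},
    {"id": 2, "_sdc_table_version": 2, "_sdc_deleted_at": None},
]

# hard_delete is absent from the config, so the default decides the outcome.
soft = activate_version(table, 2, config={}, default_hard_delete=False)
hard = activate_version(table, 2, config={}, default_hard_delete=True)
```

With the default set to False, both rows survive and the stale one is only marked; with the default set to True, the stale row disappears even though the user never asked for hard deletes, which is the surprise discussed above.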
Singer SDK Version
0.30.0
Is this a regression?
Python Version
3.9
Bug scope
Targets (data type handling, batching, SQL object generation, etc.)
Operating System
Linux
Description
I was attempting to transition to using meltanolabs-target-snowflake (version 0.5.1) from the pipelinewise variant.
When I run EL from a tap_mysql (pipelinewise-tap-mysql) to this target on a table that has 21k rows, I find that after EL completes the destination table has only 1k rows. If I turn off hard_delete, I instead end up with the full 21k rows. From some investigation, it appears that this code snippet is the problem:
sdk/singer_sdk/sinks/sql.py, lines 381 to 389 in 299acc0
I see two problems here:
Proposed patch might look like
Loader config
Code
No response