How not to ingest a file more than once? #355
Unanswered
alberto-lanfranco-storebrand
asked this question in Q&A
Replies: 1 comment 3 replies
-
Yes, need to document this better.

**Based on file timestamp:**

```yaml
source: azure
target: my_db

defaults:
  update_key: _sling_loaded_at # <-- tells sling to use the file timestamp for comparison
  object: my_schema.{stream_file_name}

streams:
  "path/to/my/folder/*.csv":

env:
  SLING_LOADED_AT_COLUMN: unix
```

**Based on a column in a file** (not what you're asking; this will scan all files again, but stream only rows after the latest `max(update_key)`):

```yaml
source: local
target: postgres

defaults:
  mode: incremental
  update_key: create_dt
  primary_key: id
  object: public.incremental_csv
  target_options:
    adjust_column_type: true

streams:
  cmd/sling/tests/files/test1.csv:
  cmd/sling/tests/files/test1.upsert.csv:
```
-
Hello,

I'm currently implementing Sling in my architecture. I'm ingesting from an Azure blob storage with a lot of big CSV files, but at each iteration I only want to ingest the new files since the last execution.

What is the best practice for implementing this in Sling? I have a feeling it might involve the `_SLING_STREAM_URL` column and `update_key` in the replication file, but I don't know exactly how to make it work.