Support for loading data from stages #35
Hi,
Would it be possible to support loading of data from a stage? (https://docs.snowflake.com/en/user-guide/data-load-considerations-load)
I am not sure from the docs whether this is even within the scope of this project, but it would help us a lot to test our complete ETL workflow.

---
Hi @tharwan, this might be possible ... could you share example SQL statements you'd like supported, both for creating the stage and loading from it?

---
Loading from a stage looks something like this:

```sql
COPY INTO my_table
FROM @my_stage/my_file
FILE_FORMAT = (FIELD_DELIMITER = ';')
```

Creating a stage is not so relevant in our case, but looks like this:

```sql
CREATE STAGE my_ext_stage
URL='azure://myaccount.blob.core.windows.net/load/files/'
STORAGE_INTEGRATION = myint;
```

maybe related: tobymao/sqlglot#2463

---
Thanks! From the above I gather you want support for CSV files. I think creating the stage would be needed for fakesnow to know where to find the files to load.

---
Another option could be to just look locally for a file and ignore the @my_stage part, e.g.

```sql
COPY INTO my_table
FROM @my_stage/my_file
FILE_FORMAT = (FIELD_DELIMITER = ';')
```

would translate to

```sql
COPY my_table
FROM my_file
(FORMAT CSV, DELIMITER ';');
```
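A minimal sketch of that rewrite, assuming plain string matching rather than a real SQL parser; it handles exactly the statement shape shown above and nothing else, and the helper name is invented for illustration:

```python
import re

# Hypothetical helper: rewrite a Snowflake COPY INTO that reads from a
# stage into a duckdb COPY that reads a local file, dropping the stage name.
def rewrite_copy_into(sql: str) -> str:
    pattern = re.compile(
        r"COPY INTO (?P<table>\w+)\s+"
        r"FROM @(?P<stage>\w+)/(?P<file>\S+)\s+"
        r"FILE_FORMAT = \(FIELD_DELIMITER = '(?P<delim>[^']*)'\)",
        re.IGNORECASE,
    )
    m = pattern.search(sql)
    if not m:
        raise ValueError("unsupported COPY INTO shape")
    # Ignore the @my_stage part and keep only the file name.
    return (
        f"COPY {m.group('table')} FROM '{m.group('file')}' "
        f"(FORMAT CSV, DELIMITER '{m.group('delim')}');"
    )

print(rewrite_copy_into(
    "COPY INTO my_table\n"
    "FROM @my_stage/my_file\n"
    "FILE_FORMAT = (FIELD_DELIMITER = ';')"
))
# -> COPY my_table FROM 'my_file' (FORMAT CSV, DELIMITER ';');
```

---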
We recently found ourselves wanting this at $job. I may look into a relatively simple initial implementation somewhat soon.

My personal plan/preference would be for fakesnow to simply track creations of stages internally, and then translate COPY statements against the service in question into boto3 calls + insert statements (which ought to work for all the supported cloud providers, iirc). It'd then be the job of the user to set up the actual storage the stage points at.

Like I said, I'll probably be looking into the feasibility of this strategy soon, but I expect sqlglot to be the short-term limiting factor in terms of minimally parsing the integration/stage/copy statement syntaxes respectively.

---
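A rough sketch of that plan, under heavy assumptions: the stage registry, both function names, and the row-by-row CSV handling are all invented for illustration, and parsing of the actual statements is hand-waved away:

```python
import csv
import io

import boto3

# Hypothetical in-memory registry, populated whenever a CREATE STAGE is seen.
stages: dict[str, str] = {}

def create_stage(name: str, url: str) -> None:
    # e.g. create_stage("my_ext_stage", "s3://my-bucket/load/files")
    stages[name] = url.rstrip("/")

def copy_into(conn, table: str, stage: str, path: str, delimiter: str = ",") -> None:
    """Translate COPY INTO <table> FROM @<stage>/<path> into a boto3
    download plus inserts. `conn` is assumed to be a duckdb connection."""
    url = f"{stages[stage]}/{path}"
    assert url.startswith("s3://"), "only s3 is sketched here"
    bucket, _, key = url.removeprefix("s3://").partition("/")

    # Fetch the staged file; the user is responsible for the bucket
    # existing, whether real or mocked (e.g. with moto).
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()

    # Insert row by row for clarity; a real implementation would batch,
    # or hand the bytes straight to duckdb's CSV reader instead.
    for row in csv.reader(io.StringIO(body.decode()), delimiter=delimiter):
        placeholders = ", ".join("?" for _ in row)
        conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", row)
```

---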
If the use case were just testing, then I thought for the first iteration of this we could just support a local file via a file:// url. It would avoid the need to set up actual s3/azure storage, or moto to mock s3. Would that work?

---
My personal use case requires a storage integration/s3, but presumably implementing that will be easier if there is already general support for a staged file:// url.

---
Yes, good point - if we transform a COPY INTO into an INSERT INTO ... SELECT in duckdb, then we could support both file:// and s3:// urls via the same mechanism relatively easily, using duckdb's support for s3 via the httpfs extension. If you're keen to tackle this, a PR would be welcome!
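A minimal sketch of what the transformed statement could run as, assuming a CSV file; the table name and delimiter are carried over from the examples above, while the schema and file paths are invented:

```python
import duckdb

conn = duckdb.connect()
conn.execute("CREATE TABLE my_table (a INTEGER, b VARCHAR)")  # assumed schema

# s3:// urls need the httpfs extension (plus credentials, e.g. from the
# environment); plain local paths work out of the box.
conn.execute("INSTALL httpfs")
conn.execute("LOAD httpfs")

# What COPY INTO my_table FROM @my_stage/my_file ... could be rewritten to,
# once the stage is resolved to a local path or an s3 url:
path = "my_file.csv"  # or "s3://my-bucket/load/files/my_file.csv"
conn.execute(
    f"INSERT INTO my_table SELECT * FROM read_csv_auto('{path}', delim=';', header=false)"
)
```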