Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[motherduck-demo] it's not clear how a user could configure this project to run locally #21

Open
maxestorr opened this issue Apr 27, 2024 · 16 comments
Assignees

Comments

@maxestorr
Copy link
Contributor

maxestorr commented Apr 27, 2024

Description

There are two configuration files that have to be modified to allow this project to run locally (./.env and ./reports/sources/dagster_hybrid_demo/connection.yaml), and this isn't very clear to the user.

Firstly, setting DUCKDB_DATABASE="local.duckdb" as is explained in the .env.example file will result in the dbt assets failing to materialize as I believe dbt assumes this path is relative to its runtime directory, meaning it can't locate the database. Instead, it's mandatory to use the absolute path in the .env file like so: DUCKDB_DATABASE="/home/maxisq/dev/dagster-devrel-fork/motherduck-dagster-hybrid-compute/local.duckdb".

Secondly, the EVIDENCE_SOURCE__DAGSTER_HYBRID_DEMO__FILENAME environment variable isn't referenced in the evidence project files isn't respected by evidence meaning it will fail to find the local database and dagster will fail to materialize the evidence assets. Because of this it's necessary to modify the connection.yaml file in the evidence sources directory, where you unintuitively have to use a relative path, using an absolute paths will result in evidence modifying the path provided at run time which causes the same error.

Proposed Solution

Firstly we can change the instructions in .env.example to properly indicate how to configure the project for local use, using absolute paths in the .env file.

Secondly we can either find a way to configure evidence such that it reads from the EVIDENCE_SOURCE__DAGSTER_HYBRID_DEMO__FILENAME environment variable properly, or give users instructions to modify the connection.yaml file to point to the local duckdb file using a relative path.

@maxestorr
Copy link
Contributor Author

Will submit a PR fix tomorrow.

@maxestorr
Copy link
Contributor Author

From the Evidence documentation it's clear that the evidence environment variable is meant to be automatically picked up by evidence, but it's not respecting it for some reason, may look into this further.

https://docs.evidence.dev/reference/cli/#environment-variables

@cmpadden
Copy link
Collaborator

From the Evidence documentation it's clear that the evidence environment variable is meant to be automatically picked up by evidence, but it's not respecting it for some reason, may look into this further.

https://docs.evidence.dev/reference/cli/#environment-variables

Regarding the evidence environment variable issue -- I also experienced this in that Evidence was not respecting the environment variables being set despite it matching their documentation. I got around this by defining the connection in the /settings page of the Evidence project, but it would be much more ideal if the environment variable could be used, and the user did not need to perform any manual steps.

The evidence team is very responsive on Slack, and I'll shoot a message there way to get their input.

@cmpadden
Copy link
Collaborator

I've started a thread here if you would like to follow along:
https://evidencedev.slack.com/archives/C023LRB9Z40/p1714315726200319

@cmpadden cmpadden self-assigned this Apr 28, 2024
@maxestorr
Copy link
Contributor Author

Unfortunately I'd have to ask the evidence team for an invite:

image

@maxestorr
Copy link
Contributor Author

I noticed from their documentation the env variable is case sensitive, it's defined as EVIDENCE_SOURCE__DAGSTER_HYBRID_DEMO__FILENAME where maybe it should be EVIDENCE_SOURCE__dagster_hybrid_demo__filename, I'm testing a few other changes at the moment but will try that shortly.

@maxestorr
Copy link
Contributor Author

Example below, seems like the case sensitivity may be the issue. Tested this running the commands myself but will now try changing the env variable in the .env file to see if dagster properly exports it before running evidence.

(venv) maxisq@MS-V-Desktop:~/dev/dagster-devrel-fork/motherduck-dagster-hybrid-compute$ export EVIDENCE_SOURCE__DAGSTER_HYBRID_DEMO__FILENAME="../../../data/db/local.duckdb"
(venv) maxisq@MS-V-Desktop:~/dev/dagster-devrel-fork/motherduck-dagster-hybrid-compute$ echo $EVIDENCE_SOURCE__DAGSTER_HYBRID_DEMO__FILENAME
../../../data/db/local.duckdb
(venv) maxisq@MS-V-Desktop:~/dev/dagster-devrel-fork/motherduck-dagster-hybrid-compute$ make evidence-sources
npm --prefix ./reports run sources

> my-evidence-project@0.0.1 sources
> evidence sources

(node:28943) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
Processing dagster_hybrid_demo
Missing required duckdb option 'filename' (/home/maxisq/dev/dagster-devrel-fork/motherduck-dagster-hybrid-compute/reports/sources/dagster_hybrid_demo)
  all_ducks ✖ Catalog Error: Table with name all_birds does not exist!
Did you mean "system.information_schema.tables"?
LINE 20: from main.all_birds
              ^
  top_ducks_by_region ✖ Catalog Error: Table with name top_ducks_by_region does not exist!
Did you mean "duckdb_indexes"?
LINE 8: from main.top_ducks_by_region
             ^
  top_ducks_by_state ✖ Catalog Error: Table with name top_ducks_by_state does not exist!
Did you mean "duckdb_tables"?
LINE 7: from main.top_ducks_by_state
             ^
  top_ducks_by_year ✖ Catalog Error: Table with name top_ducks_by_year does not exist!
Did you mean "duckdb_types"?
LINE 9: from main.top_ducks_by_year
             ^
(venv) maxisq@MS-V-Desktop:~/dev/dagster-devrel-fork/motherduck-dagster-hybrid-compute$ export EVIDENCE_SOURCE__DAGSTER_HYBRID_DEMO__FILENAME="../../../data/db/local.duckdb"
export EVIDENCE_SOURCE__dagster_hybrid_demo__filename="../../../data/db/local.duckdb"
(venv) maxisq@MS-V-Desktop:~/dev/dagster-devrel-fork/motherduck-dagster-hybrid-compute$ echo $EVIDENCE_SOURCE__dagster_hybrid_demo__filename
../../../data/db/local.duckdb
(venv) maxisq@MS-V-Desktop:~/dev/dagster-devrel-fork/motherduck-dagster-hybrid-compute$ make evidence-sources
npm --prefix ./reports run sources

> my-evidence-project@0.0.1 sources
> evidence sources

(node:32066) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
Processing dagster_hybrid_demo
  all_ducks ✔ Finished. 12782 rows
  top_ducks_by_region ✔ Finished. 30 rows
  top_ducks_by_state ✔ Finished. 891 rows
  top_ducks_by_year ✔ Finished. 1738 rows
(venv) maxisq@MS-V-Desktop:~/dev/dagster-devrel-fork/motherduck-dagster-hybrid-compute$

@maxestorr
Copy link
Contributor Author

maxestorr commented Apr 28, 2024

Okay setting the environment variables like so in the .env file fixed it, will submit a PR with proper instructions:

EVIDENCE_SOURCE__dagster_hybrid_demo__type="duckdb"
EVIDENCE_SOURCE__dagster_hybrid_demo__filename="../../../data/db/local.duckdb"

@cmpadden
Copy link
Collaborator

Okay setting the environment variables like so in the .env file fixed it, will submit a PR with proper instructions:


EVIDENCE_SOURCE__dagster_hybrid_demo__type="duckdb"

EVIDENCE_SOURCE__dagster_hybrid_demo__filename="../../../data/db/local.duckdb"

Nice! Fantastic job getting to the bottom of that. That really simplifies getting up and running.

@maxestorr
Copy link
Contributor Author

maxestorr commented Apr 28, 2024

cmpadden it'd still be really cool if the evidence team provided some advice on absolute paths, they'd be preferable because having to specify the environment variable as a relative path from the reports/sources/project_name/ folder isn't super user friendly, their docs say that your datasource should be stored in that same directory but that's not really realistic for projects where you're using multiple tools.

For reference this is what happens if you use an absolute path:

(venv) maxisq@MS-V-Desktop:~/dev/dagster-devrel-fork/motherduck-dagster-hybrid-compute$ export EVIDENCE_SOURCE__dagster_hybrid_demo__filename="/home/maxisq/dev/dagster-devrel-fork/motherduck-dagster-hybrid-compute/data/db/local.duckdb"
(venv) maxisq@MS-V-Desktop:~/dev/dagster-devrel-fork/motherduck-dagster-hybrid-compute$ make evidence-sources
npm --prefix ./reports run sources

> my-evidence-project@0.0.1 sources
> evidence sources

(node:31327) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
Processing dagster_hybrid_demo
[!] dagster_hybrid_demo failed to connect; Catalog Error: Cannot open database "/home/maxisq/dev/dagster-devrel-fork/motherduck-dagster-hybrid-compute/reports/sources/dagster_hybrid_demo/home/maxisq/dev/dagster-devrel-fork/motherduck-dagster-hybrid-compute/data/db/local.duckdb" in read-only mode: database does not exist
make: *** [Makefile:23: evidence-sources] Error 1
(venv) maxisq@MS-V-Desktop:~/dev/dagster-devrel-fork/motherduck-dagster-hybrid-compute$ echo $EVIDENCE_SOURCE__dagster_hybrid_demo__filename
/home/maxisq/dev/dagster-devrel-fork/motherduck-dagster-hybrid-compute/data/db/local.duckdb

You can see that for some reason evidence injects the source directory into the path, which is redundant if you're doing an absolute path, probably something to do with their docs saying that they assume the datasource will be stored in that same source directory so their default method is just to prepend that path themselves rather than checking the one provided first.

maxestorr added a commit to maxestorr/devrel-project-demos that referenced this issue Apr 28, 2024
export case sensitive evidence environment variables.

also provided instructions in .env.example on how to configure project
for local use.

removed `dagster_hybrid_demo.` from schema name in FROM statement as the
tables created by duckdb exist in the `main` schema, was causing error
where evidence couldn't find the data sources.
@cmpadden
Copy link
Collaborator

Unfortunately I'd have to ask the evidence team for an invite:

image

Missed this message, but you should be able to join at this link - https://slack.evidence.dev/

@maxestorr
Copy link
Contributor Author

maxestorr commented Apr 28, 2024

Dropping a timestamped video link here as an option for solving this issue, not 100% certain this could solve it just thinking out loud.

We could specify a LOCAL and PROD dictionary for environment variables that may enable us to seamlessly deploy, just an idea.

https://www.youtube.com/live/kpco5u1zG9Y?si=zfSZIrlZR3e4pW1d?t=30m14s

@maxestorr
Copy link
Contributor Author

Seems like the IO managers and having separate environment variables for local vs prod may be the way after watching that vid, will test it out myself when i get time outside my day job.

@cmpadden
Copy link
Collaborator

Seems like the IO managers and having separate environment variables for local vs prod may be the way after watching that vid, will test it out myself when i get time outside my day job.

Just one note here is that we are de-emphasizing the use of IO managers, so I don't think I'd necessarily want to start leveraging that in this demo. But I think we should be able to do the local/prod environment variables with our resources.

@maxestorr
Copy link
Contributor Author

Oh interesting, why is it that you're de-emphasising their use? Not to worry will work out another way.

@cmpadden
Copy link
Collaborator

Oh interesting, why is it that you're de-emphasising their use? Not to worry will work out another way.

They're still a totally valid approach, but I think they can be a bit "magic" in how they operator, so they're maybe not the most intuitive demonstration for people who are new to Dagster. I'm not opposed to using them, but want to make sure the code remains easy to understand!

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants