Some python and notebook versions of examples have diverged #357

Open
eordentlich opened this issue Feb 2, 2024 · 2 comments
@eordentlich
Collaborator

Describe the bug
I'm not sure this is the case for all examples, but the mortgage ETL + XGBoost example has some non-trivial discrepancies. For example:
The Python script defines UDFs: https://github.com/NVIDIA/spark-rapids-examples/blob/main/examples/XGBoost-Examples/mortgage/python/com/nvidia/spark/examples/mortgage/etl.py#L22-L23
while the notebook(s) implement the same logic directly in Spark SQL:
https://github.com/NVIDIA/spark-rapids-examples/blob/main/examples/XGBoost-Examples/mortgage/notebooks/python/MortgageETL.ipynb?short_path=2af22cf#L454-L478
There are other differences as well. It looks like the scripts may be lagging behind the notebooks.

Steps/Code to reproduce bug
N/A

Expected behavior
The notebook and Python script versions should ideally be aligned, or the reasons for any differences should at least be documented.
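
One way such divergence might be detected automatically is sketched below, using only the Python standard library. It compares the function names defined in a script with those defined in a notebook's code cells. The helper names (`defined_functions`, `notebook_source`, `compare`) and the sample inputs are hypothetical and not part of this repository. The sketch assumes the notebook is nbformat 4 JSON, and it only catches differences in function names, not in logic.

```python
import ast
import json


def defined_functions(source):
    """Return the set of function names defined anywhere in the given Python source."""
    tree = ast.parse(source)
    return {node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)}


def notebook_source(notebook_json):
    """Concatenate the code cells of a notebook (assumed nbformat 4 JSON) into one string."""
    notebook = json.loads(notebook_json)
    cells = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        text = "".join(src) if isinstance(src, list) else src
        # Drop IPython magics and shell escapes, which are not valid Python.
        lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith(("%", "!"))]
        cells.append("\n".join(lines))
    return "\n\n".join(cells)


def compare(script_src, notebook_json):
    """Report function names that appear in only one of the two versions."""
    script_funcs = defined_functions(script_src)
    nb_funcs = defined_functions(notebook_source(notebook_json))
    return {
        "only_in_script": sorted(script_funcs - nb_funcs),
        "only_in_notebook": sorted(nb_funcs - script_funcs),
    }


# Hypothetical inputs standing in for etl.py and MortgageETL.ipynb.
sample_script = "def extract_quarter(path):\n    return path\n\ndef run_etl():\n    pass\n"
sample_notebook = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# ETL"]},
        {"cell_type": "code", "source": ["%matplotlib inline\n", "def run_etl():\n", "    pass\n"]},
    ]
})

if __name__ == "__main__":
    print(compare(sample_script, sample_notebook))
```

A check like this could run in CI to flag when one version gains or loses a helper. It would not show that two same-named functions behave identically.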

Environment details (please complete the following information)
N/A

@GaryShen2008
Collaborator

@nvliyuan Do you remember who wrote these examples? I can't recall the reason for the difference, but there should be one.

@nvliyuan
Collaborator

Yes, different implementations of the same example should keep the same logic. I will draft a PR to fix it.


3 participants