Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[xgboost] Remove default parameters #226

Merged
merged 3 commits into from
Sep 7, 2022

Conversation

wbo4958
Copy link
Collaborator

@wbo4958 wbo4958 commented Aug 31, 2022

This PR tries to remove the default XGBoost parameters.

@wbo4958 wbo4958 force-pushed the remove-default-para branch from 6ab0e24 to 1d4c6a6 Compare August 31, 2022 12:17
@nvliyuan nvliyuan self-requested a review September 2, 2022 02:32
@nvliyuan
Copy link
Collaborator

nvliyuan commented Sep 2, 2022

why we need to remove these default params?

@nvliyuan
Copy link
Collaborator

nvliyuan commented Sep 2, 2022

should we also remove the same config in python-codes/notebooks-codes to keep consistent?

@wbo4958
Copy link
Collaborator Author

wbo4958 commented Sep 2, 2022

@nvliyuan The python code is deprecated for the new xgboost pyspark. I will submit the whole APP. so right now it's ok to just remove the default parameters of scala.

@wbo4958
Copy link
Collaborator Author

wbo4958 commented Sep 6, 2022

@GaryShen2008 @nvliyuan I just checked the notebook of scala and all the default parameters have already been removed.

@GaryShen2008
Copy link
Collaborator

Will this change impact the number in https://github.com/NVIDIA/spark-rapids-examples/blob/branch-22.10/examples/XGBoost-Examples/README.md?
If so, why do we change the config?

@nvliyuan
Copy link
Collaborator

nvliyuan commented Sep 6, 2022

@GaryShen2008 @nvliyuan I just checked the notebook of scala and all the default parameters have already been removed.

I found a useless variable in mortgage/notebooks/scala/mortgage-ETL.ipynb:val commParamMapcan you also help remove it? thanks

@wbo4958
Copy link
Collaborator Author

wbo4958 commented Sep 6, 2022

Will this change impact the number in https://github.com/NVIDIA/spark-rapids-examples/blob/branch-22.10/examples/XGBoost-Examples/README.md? If so, why do we change the config?

I think there might be a slight affection for that figure since from what I tested, the performance for the default xgboost parameters should be better than the parameters hardcoded by the sample for the mortgage case (I didn't test taxi).

BTW, the README seems just to list a final performance picture, which xgboost/rapids version and which parameters of xgboost, and which conf of spark can be used to repro it? Seems it's hard for user to repro this number.

BTW, seems this performance is tested by nvidia-internal-xgboost version instead of based on dmlc/xgboost. Please correct me. Since I remember I just had the PR switching the dependency from nvidia version to dmlc version. So from that point of view, I think removing parameters is not bad. On the other hand, The performance should be more meaningful for default parameters instead of tuning much, since some parameters may like GPU more, and vice versa.

@nvliyuan
Copy link
Collaborator

nvliyuan commented Sep 6, 2022

@GaryShen2008 I think we can merge this pr first, then I will file an issue and retest the xgboost performance, if the affection is not large, I will draft a pr to update(if there is no affection then no need to draft pr); if the affection is unexpected, we can revert this pr. Any insights?

@nvliyuan
Copy link
Collaborator

nvliyuan commented Sep 6, 2022

BTW, seems this performance is tested by nvidia-internal-xgboost version instead of based on dmlc/xgboost. Please correct me.

@wbo4958 yes, the performance is based on xgboost4j_3.0-1.4.2-0.3.0; but changing the config still might have an affection; I will double check the perf result

@wbo4958
Copy link
Collaborator Author

wbo4958 commented Sep 7, 2022

@GaryShen2008 any suggestions?

@GaryShen2008
Copy link
Collaborator

GaryShen2008 commented Sep 7, 2022

@GaryShen2008 any suggestions?

Since Yuan accepts it, I'm ok with this PR.
The sample here is mainly to demonstrate the capability of xgboost training with GPU. It's not for the benchmark directly.
Anyway, removing some parameters makes it simpler.

@wbo4958 wbo4958 merged commit ea2b7a4 into NVIDIA:branch-22.10 Sep 7, 2022
@wbo4958 wbo4958 deleted the remove-default-para branch September 7, 2022 02:38
nvliyuan added a commit that referenced this pull request Oct 28, 2022
* Init 22.10.0-SNAPSHOT (#214)

Signed-off-by: Peixin Li <pxli@nyu.edu>

Signed-off-by: Peixin Li <pxli@nyu.edu>

* update version and fix some document error, add more comments for running xgboost notebooks on GCP (#215) (#222)

Signed-off-by: liyuan <yuali@nvidia.com>

Signed-off-by: liyuan <yuali@nvidia.com>

Signed-off-by: liyuan <yuali@nvidia.com>

* update version and fix some document error, add more comments for running xgboost notebooks on GCP (#215) (#224)

Signed-off-by: liyuan <yuali@nvidia.com>

Signed-off-by: liyuan <yuali@nvidia.com>

Signed-off-by: liyuan <yuali@nvidia.com>

* Update default cmake to 3.23.X in udf exmaple dockerfile (#227)

Signed-off-by: Peixin Li <pxli@nyu.edu>

Signed-off-by: Peixin Li <pxli@nyu.edu>

* [xgboost] Remove default parameters (#226)

* remove the default parameters for xgboost examples

* remove the default parameters

Signed-off-by: Bobby Wang <wbo4958@gmail.com>

* remove unused variables for mortgage-ETL

Signed-off-by: Bobby Wang <wbo4958@gmail.com>

* add more details/notes for the mortgage perforamcne tests (#229)

* add more details/notes for the mortgage perforamcne tests

Signed-off-by: liyuan <yuali@nvidia.com>

* Update examples/XGBoost-Examples/README.md

Co-authored-by: Hao Zhu <9665750+viadea@users.noreply.github.com>

* Update examples/XGBoost-Examples/README.md

Co-authored-by: Hao Zhu <9665750+viadea@users.noreply.github.com>

* Update examples/XGBoost-Examples/README.md

Co-authored-by: Hao Zhu <9665750+viadea@users.noreply.github.com>

Signed-off-by: liyuan <yuali@nvidia.com>
Co-authored-by: Hao Zhu <9665750+viadea@users.noreply.github.com>

* Enable automerge from 22.10 to 22.12 (#230)

Signed-off-by: Peixin Li <pxli@nyu.edu>

Signed-off-by: Peixin Li <pxli@nyu.edu>

* update versions for v22.10 release (#235)

Signed-off-by: liyuan <yuali@nvidia.com>

Signed-off-by: liyuan <yuali@nvidia.com>

Signed-off-by: Peixin Li <pxli@nyu.edu>
Signed-off-by: liyuan <yuali@nvidia.com>
Signed-off-by: Bobby Wang <wbo4958@gmail.com>
Co-authored-by: Jenkins Automation <70000568+nvauto@users.noreply.github.com>
Co-authored-by: Peixin <pxli@nyu.edu>
Co-authored-by: Bobby Wang <wbo4958@gmail.com>
Co-authored-by: Hao Zhu <9665750+viadea@users.noreply.github.com>
nvliyuan added a commit that referenced this pull request Feb 15, 2023
* merge dev-2210 branch to Main branch (#237)

* Init 22.10.0-SNAPSHOT (#214)

Signed-off-by: Peixin Li <pxli@nyu.edu>

Signed-off-by: Peixin Li <pxli@nyu.edu>

* update version and fix some document error, add more comments for running xgboost notebooks on GCP (#215) (#222)

Signed-off-by: liyuan <yuali@nvidia.com>

Signed-off-by: liyuan <yuali@nvidia.com>

Signed-off-by: liyuan <yuali@nvidia.com>

* update version and fix some document error, add more comments for running xgboost notebooks on GCP (#215) (#224)

Signed-off-by: liyuan <yuali@nvidia.com>

Signed-off-by: liyuan <yuali@nvidia.com>

Signed-off-by: liyuan <yuali@nvidia.com>

* Update default cmake to 3.23.X in udf exmaple dockerfile (#227)

Signed-off-by: Peixin Li <pxli@nyu.edu>

Signed-off-by: Peixin Li <pxli@nyu.edu>

* [xgboost] Remove default parameters (#226)

* remove the default parameters for xgboost examples

* remove the default parameters

Signed-off-by: Bobby Wang <wbo4958@gmail.com>

* remove unused variables for mortgage-ETL

Signed-off-by: Bobby Wang <wbo4958@gmail.com>

* add more details/notes for the mortgage perforamcne tests (#229)

* add more details/notes for the mortgage perforamcne tests

Signed-off-by: liyuan <yuali@nvidia.com>

* Update examples/XGBoost-Examples/README.md

Co-authored-by: Hao Zhu <9665750+viadea@users.noreply.github.com>

* Update examples/XGBoost-Examples/README.md

Co-authored-by: Hao Zhu <9665750+viadea@users.noreply.github.com>

* Update examples/XGBoost-Examples/README.md

Co-authored-by: Hao Zhu <9665750+viadea@users.noreply.github.com>

Signed-off-by: liyuan <yuali@nvidia.com>
Co-authored-by: Hao Zhu <9665750+viadea@users.noreply.github.com>

* Enable automerge from 22.10 to 22.12 (#230)

Signed-off-by: Peixin Li <pxli@nyu.edu>

Signed-off-by: Peixin Li <pxli@nyu.edu>

* update versions for v22.10 release (#235)

Signed-off-by: liyuan <yuali@nvidia.com>

Signed-off-by: liyuan <yuali@nvidia.com>

Signed-off-by: Peixin Li <pxli@nyu.edu>
Signed-off-by: liyuan <yuali@nvidia.com>
Signed-off-by: Bobby Wang <wbo4958@gmail.com>
Co-authored-by: Jenkins Automation <70000568+nvauto@users.noreply.github.com>
Co-authored-by: Peixin <pxli@nyu.edu>
Co-authored-by: Bobby Wang <wbo4958@gmail.com>
Co-authored-by: Hao Zhu <9665750+viadea@users.noreply.github.com>

* Adding Databricks tool demo notebooks for qualification and profiling (#249)

* Adding Databricks tool demo notebooks for basic qualification and profiling

Signed-off-by: Matt Ahrens <matthewahrens@gmail.com>

* Adding README updates for the databricks tool notebooks

Signed-off-by: Matt Ahrens <matthewahrens@gmail.com>

* Fixing typo in OUTPUT_DIR in qual notebook

Signed-off-by: Matt Ahrens <matthewahrens@gmail.com>

Signed-off-by: Matt Ahrens <matthewahrens@gmail.com>

* Revert "Adding Databricks tool demo notebooks for qualification and profiling (#249)" (#251)

This reverts commit 5f77070.

* Updated code on Retail Analytics Spark RAPIDS

* Update DataGen.ipynb

* Update DataGen.ipynb

* Updated code

* Update Retail Analytics.ipynb

* Update README.md

---------

Signed-off-by: Peixin Li <pxli@nyu.edu>
Signed-off-by: liyuan <yuali@nvidia.com>
Signed-off-by: Bobby Wang <wbo4958@gmail.com>
Signed-off-by: Matt Ahrens <matthewahrens@gmail.com>
Co-authored-by: liyuan <84758614+nvliyuan@users.noreply.github.com>
Co-authored-by: Jenkins Automation <70000568+nvauto@users.noreply.github.com>
Co-authored-by: Peixin <pxli@nyu.edu>
Co-authored-by: Bobby Wang <wbo4958@gmail.com>
Co-authored-by: Hao Zhu <9665750+viadea@users.noreply.github.com>
Co-authored-by: Matt Ahrens <matthewahrens@gmail.com>
Co-authored-by: liyuan <yuali@nvidia.com>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants