-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
[xgboost] Remove default parameters #226
Conversation
Signed-off-by: Bobby Wang <wbo4958@gmail.com>
6ab0e24
to
1d4c6a6
Compare
why we need to remove these default params? |
should we also remove the same config in python-codes/notebooks-codes to keep consistent? |
@nvliyuan The python code is deprecated for the new xgboost pyspark. I will submit the whole APP. so right now it's ok to just remove the default parameters of scala. |
@GaryShen2008 @nvliyuan I just checked the notebook of scala and all the default parameters have already been removed. |
Will this change impact the number in https://github.com/NVIDIA/spark-rapids-examples/blob/branch-22.10/examples/XGBoost-Examples/README.md? |
I found a useless variable in mortgage/notebooks/scala/mortgage-ETL.ipynb: |
I think there might be a slight affection for that figure since from what I tested, the performance for the default xgboost parameters should be better than the parameters hardcoded by the sample for the mortgage case (I didn't test taxi). BTW, the README seems just to list a final performance picture, which xgboost/rapids version and which parameters of xgboost, and which conf of spark can be used to repro it? Seems it's hard for user to repro this number. BTW, seems this performance is tested by nvidia-internal-xgboost version instead of based on dmlc/xgboost. Please correct me. Since I remember I just had the PR switching the dependency from nvidia version to dmlc version. So from that point of view, I think removing parameters is not bad. On the other hand, The performance should be more meaningful for default parameters instead of tuning much, since some parameters may like GPU more, and vice versa. |
@GaryShen2008 I think we can merge this pr first, then I will file an issue and retest the xgboost performance, if the affection is not large, I will draft a pr to update(if there is no affection then no need to draft pr); if the affection is unexpected, we can revert this pr. Any insights? |
@wbo4958 yes, the performance is based on xgboost4j_3.0-1.4.2-0.3.0; but changing the config still might have an affection; I will double check the perf result |
@GaryShen2008 any suggestions? |
Since Yuan accepts it, I'm ok with this PR. |
* Init 22.10.0-SNAPSHOT (#214) Signed-off-by: Peixin Li <pxli@nyu.edu> Signed-off-by: Peixin Li <pxli@nyu.edu> * update version and fix some document error, add more comments for running xgboost notebooks on GCP (#215) (#222) Signed-off-by: liyuan <yuali@nvidia.com> Signed-off-by: liyuan <yuali@nvidia.com> Signed-off-by: liyuan <yuali@nvidia.com> * update version and fix some document error, add more comments for running xgboost notebooks on GCP (#215) (#224) Signed-off-by: liyuan <yuali@nvidia.com> Signed-off-by: liyuan <yuali@nvidia.com> Signed-off-by: liyuan <yuali@nvidia.com> * Update default cmake to 3.23.X in udf exmaple dockerfile (#227) Signed-off-by: Peixin Li <pxli@nyu.edu> Signed-off-by: Peixin Li <pxli@nyu.edu> * [xgboost] Remove default parameters (#226) * remove the default parameters for xgboost examples * remove the default parameters Signed-off-by: Bobby Wang <wbo4958@gmail.com> * remove unused variables for mortgage-ETL Signed-off-by: Bobby Wang <wbo4958@gmail.com> * add more details/notes for the mortgage perforamcne tests (#229) * add more details/notes for the mortgage perforamcne tests Signed-off-by: liyuan <yuali@nvidia.com> * Update examples/XGBoost-Examples/README.md Co-authored-by: Hao Zhu <9665750+viadea@users.noreply.github.com> * Update examples/XGBoost-Examples/README.md Co-authored-by: Hao Zhu <9665750+viadea@users.noreply.github.com> * Update examples/XGBoost-Examples/README.md Co-authored-by: Hao Zhu <9665750+viadea@users.noreply.github.com> Signed-off-by: liyuan <yuali@nvidia.com> Co-authored-by: Hao Zhu <9665750+viadea@users.noreply.github.com> * Enable automerge from 22.10 to 22.12 (#230) Signed-off-by: Peixin Li <pxli@nyu.edu> Signed-off-by: Peixin Li <pxli@nyu.edu> * update versions for v22.10 release (#235) Signed-off-by: liyuan <yuali@nvidia.com> Signed-off-by: liyuan <yuali@nvidia.com> Signed-off-by: Peixin Li <pxli@nyu.edu> Signed-off-by: liyuan <yuali@nvidia.com> Signed-off-by: Bobby Wang <wbo4958@gmail.com> Co-authored-by: Jenkins Automation <70000568+nvauto@users.noreply.github.com> Co-authored-by: Peixin <pxli@nyu.edu> Co-authored-by: Bobby Wang <wbo4958@gmail.com> Co-authored-by: Hao Zhu <9665750+viadea@users.noreply.github.com>
* merge dev-2210 branch to Main branch (#237) * Init 22.10.0-SNAPSHOT (#214) Signed-off-by: Peixin Li <pxli@nyu.edu> Signed-off-by: Peixin Li <pxli@nyu.edu> * update version and fix some document error, add more comments for running xgboost notebooks on GCP (#215) (#222) Signed-off-by: liyuan <yuali@nvidia.com> Signed-off-by: liyuan <yuali@nvidia.com> Signed-off-by: liyuan <yuali@nvidia.com> * update version and fix some document error, add more comments for running xgboost notebooks on GCP (#215) (#224) Signed-off-by: liyuan <yuali@nvidia.com> Signed-off-by: liyuan <yuali@nvidia.com> Signed-off-by: liyuan <yuali@nvidia.com> * Update default cmake to 3.23.X in udf exmaple dockerfile (#227) Signed-off-by: Peixin Li <pxli@nyu.edu> Signed-off-by: Peixin Li <pxli@nyu.edu> * [xgboost] Remove default parameters (#226) * remove the default parameters for xgboost examples * remove the default parameters Signed-off-by: Bobby Wang <wbo4958@gmail.com> * remove unused variables for mortgage-ETL Signed-off-by: Bobby Wang <wbo4958@gmail.com> * add more details/notes for the mortgage perforamcne tests (#229) * add more details/notes for the mortgage perforamcne tests Signed-off-by: liyuan <yuali@nvidia.com> * Update examples/XGBoost-Examples/README.md Co-authored-by: Hao Zhu <9665750+viadea@users.noreply.github.com> * Update examples/XGBoost-Examples/README.md Co-authored-by: Hao Zhu <9665750+viadea@users.noreply.github.com> * Update examples/XGBoost-Examples/README.md Co-authored-by: Hao Zhu <9665750+viadea@users.noreply.github.com> Signed-off-by: liyuan <yuali@nvidia.com> Co-authored-by: Hao Zhu <9665750+viadea@users.noreply.github.com> * Enable automerge from 22.10 to 22.12 (#230) Signed-off-by: Peixin Li <pxli@nyu.edu> Signed-off-by: Peixin Li <pxli@nyu.edu> * update versions for v22.10 release (#235) Signed-off-by: liyuan <yuali@nvidia.com> Signed-off-by: liyuan <yuali@nvidia.com> Signed-off-by: Peixin Li <pxli@nyu.edu> Signed-off-by: liyuan <yuali@nvidia.com> Signed-off-by: Bobby Wang <wbo4958@gmail.com> Co-authored-by: Jenkins Automation <70000568+nvauto@users.noreply.github.com> Co-authored-by: Peixin <pxli@nyu.edu> Co-authored-by: Bobby Wang <wbo4958@gmail.com> Co-authored-by: Hao Zhu <9665750+viadea@users.noreply.github.com> * Adding Databricks tool demo notebooks for qualification and profiling (#249) * Adding Databricks tool demo notebooks for basic qualification and profiling Signed-off-by: Matt Ahrens <matthewahrens@gmail.com> * Adding README updates for the databricks tool notebooks Signed-off-by: Matt Ahrens <matthewahrens@gmail.com> * Fixing typo in OUTPUT_DIR in qual notebook Signed-off-by: Matt Ahrens <matthewahrens@gmail.com> Signed-off-by: Matt Ahrens <matthewahrens@gmail.com> * Revert "Adding Databricks tool demo notebooks for qualification and profiling (#249)" (#251) This reverts commit 5f77070. * Updated code on Retail Analytics Spark RAPIDS * Update DataGen.ipynb * Update DataGen.ipynb * Updated code * Update Retail Analytics.ipynb * Update README.md --------- Signed-off-by: Peixin Li <pxli@nyu.edu> Signed-off-by: liyuan <yuali@nvidia.com> Signed-off-by: Bobby Wang <wbo4958@gmail.com> Signed-off-by: Matt Ahrens <matthewahrens@gmail.com> Co-authored-by: liyuan <84758614+nvliyuan@users.noreply.github.com> Co-authored-by: Jenkins Automation <70000568+nvauto@users.noreply.github.com> Co-authored-by: Peixin <pxli@nyu.edu> Co-authored-by: Bobby Wang <wbo4958@gmail.com> Co-authored-by: Hao Zhu <9665750+viadea@users.noreply.github.com> Co-authored-by: Matt Ahrens <matthewahrens@gmail.com> Co-authored-by: liyuan <yuali@nvidia.com>
This PR tries to remove the default XGBoost parameters.