[Doc] Update 22.08 documentation #6216

Merged: viadea merged 12 commits into NVIDIA:branch-22.08 on Aug 4, 2022

@viadea (Collaborator) commented Aug 3, 2022

Update the 22.08 documentation.

  1. Add release notes for 22.08
  2. Change the diagram from Spark 3.0 to Spark 3.x

Note: To avoid link-check failures, this PR does not add the download links for the 22.08 jars or update the Tools jar links.
I will do that in a separate PR right before the final merge.

Signed-off-by: Hao Zhu <9665750+viadea@users.noreply.github.com>
@viadea added the documentation label (Improvements or additions to documentation) Aug 3, 2022
Signed-off-by: Hao Zhu <9665750+viadea@users.noreply.github.com>
viadea and others added 2 commits August 3, 2022 13:22
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
Signed-off-by: Hao Zhu <9665750+viadea@users.noreply.github.com>
@jlowe previously approved these changes Aug 3, 2022

@sameerz (Collaborator) left a comment

Other changes:

docs/download.md (outdated)
* Binary read support from parquet
* Apache Iceberg 0.13 support
* Array function support: array_intersect, array_union, array_except and arrays_overlap
* Function from_json support
Collaborator:

from_json will be in 22.10 (PR #6211).

Collaborator Author:

Removed.

Collaborator Author:

Fixed all of the above.

@sameerz (Collaborator) commented Aug 4, 2022

Can we update https://github.com/NVIDIA/spark-rapids/blob/branch-22.08/docs/configs.md (which is a generated file) to remove --conf spark.rapids.sql.incompatibleOps.enabled=true since it is now true by default?

@nvliyuan (Collaborator) commented Aug 4, 2022

If a customer sets up a Databricks cluster, logs in to one of the cluster nodes, and runs code through pyspark/spark-shell with the RAPIDS plugin, it complains that the Spark shim loader cannot be found. This is because the Databricks shim to load is chosen based on spark.databricks.clusterUsageTags.sparkVersion, so customers need to set an additional config: --conf spark.rapids.shims-provider-override=com.nvidia.spark.rapids.shims.spark3XXdb.SparkShimServiceProvider. This is set automatically in Databricks notebooks but not for scripts run this way. Maybe we should document it in the Databricks getting-started guide.

Co-authored-by: Sameer Raheja <sameerz@users.noreply.github.com>
viadea and others added 4 commits August 4, 2022 11:43
Co-authored-by: Sameer Raheja <sameerz@users.noreply.github.com>
Co-authored-by: Sameer Raheja <sameerz@users.noreply.github.com>
Signed-off-by: Hao Zhu <9665750+viadea@users.noreply.github.com>
Signed-off-by: Hao Zhu <9665750+viadea@users.noreply.github.com>
@viadea (Collaborator Author) commented Aug 4, 2022

Good information. However, I doubt Databricks users will really want to SSH into the nodes and run spark-shell.

Signed-off-by: Hao Zhu <9665750+viadea@users.noreply.github.com>
@viadea (Collaborator Author) commented Aug 4, 2022

> Can we update https://github.com/NVIDIA/spark-rapids/blob/branch-22.08/docs/configs.md (which is a generated file) to remove --conf spark.rapids.sql.incompatibleOps.enabled=true since it is now true by default?

@sameerz I modified the following file to use a different parameter, spark.rapids.sql.concurrentGpuTasks, in place of spark.rapids.sql.incompatibleOps.enabled:
./sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala

Also fixed one typo: spark -> spark-shell.

Tom will fix ./spark2-sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala in a separate PR.

viadea added 2 commits August 4, 2022 12:59
…/rapids/RapidsConf.scala

Signed-off-by: Hao Zhu <9665750+viadea@users.noreply.github.com>
Signed-off-by: Hao Zhu <9665750+viadea@users.noreply.github.com>
@sameerz (Collaborator) left a comment

Looks good. I am going to remove the [skip ci] since there are changes to RapidsConf.scala.

@sameerz changed the title [Doc]Update 22.08 documentation[skip ci] [Doc]Update 22.08 documentation Aug 4, 2022
@sameerz changed the title [Doc]Update 22.08 documentation [Doc] Update 22.08 documentation Aug 4, 2022
@sameerz (Collaborator) commented Aug 4, 2022

build

@viadea merged commit c6bb1d9 into NVIDIA:branch-22.08 Aug 4, 2022