[Doc] Update 22.08 documentation #6216

Merged · 12 commits · Aug 4, 2022
10 changes: 5 additions & 5 deletions docs/compatibility.md
@@ -65,7 +65,7 @@ conditions within the computation itself the result may not be the same each time it is
run. This is inherent in how the plugin speeds up the calculations and cannot be "fixed." If a query
joins on a floating point value, which is not wise to do anyway, and the value is the result of a
floating point aggregation then the join may fail to work properly with the plugin but would have
worked with plain Spark. As of 22.06 this behavior is enabled by default but can be disabled with
worked with plain Spark. Starting from 22.06 this behavior is enabled by default but can be disabled with
the config
[`spark.rapids.sql.variableFloatAgg.enabled`](configs.md#sql.variableFloatAgg.enabled).
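
To make the toggle concrete, here is a minimal sketch of disabling this behavior at submit time. It assumes a standard `spark-submit` launch with the plugin jar already on the classpath; `my_app.py` is a hypothetical application script.

```bash
# Sketch: run with the RAPIDS plugin active but with variable floating-point
# aggregation disabled, so such aggregations are not run on the GPU and
# results match plain Spark (at some performance cost).
# my_app.py is a placeholder for your own application.
spark-submit \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.variableFloatAgg.enabled=false \
  my_app.py
```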

@@ -807,7 +807,7 @@ leads to restrictions:
* Float values cannot be larger than `1e18` or smaller than `-1e18` after conversion.
* The results produced by the GPU differ slightly from the default results of Spark.

As of 22.06 this conf is enabled; to disable this operation on the GPU when using Spark 3.1.0 or
Starting from 22.06 this conf is enabled; to disable this operation on the GPU when using Spark 3.1.0 or
later, set
[`spark.rapids.sql.castFloatToDecimal.enabled`](configs.md#sql.castFloatToDecimal.enabled) to `false`.

@@ -819,7 +819,7 @@ Spark 3.1.0 the MIN and MAX values were floating-point values such as `Int.MaxValue.toFloat` but
starting with 3.1.0 these are now integral types such as `Int.MaxValue`, so the valid range of
values has shifted slightly and now differs from the behavior on GPU in some cases.

As of 22.06 this conf is enabled; to disable this operation on the GPU when using Spark 3.1.0 or later, set
Starting from 22.06 this conf is enabled; to disable this operation on the GPU when using Spark 3.1.0 or later, set
[`spark.rapids.sql.castFloatToIntegralTypes.enabled`](configs.md#sql.castFloatToIntegralTypes.enabled)
to `false`.

@@ -831,7 +831,7 @@ The GPU will use different precision than Java's toString method when converting floating-point data
types to strings. The GPU uses a lowercase `e` prefix for an exponent while Spark uses uppercase
`E`. As a result the computed string can differ from the default behavior in Spark.

As of 22.06 this conf is enabled by default; to disable this operation on the GPU, set
Starting from 22.06 this conf is enabled by default; to disable this operation on the GPU, set
[`spark.rapids.sql.castFloatToString.enabled`](configs.md#sql.castFloatToString.enabled) to `false`.
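
To make the casing difference concrete, here is a hedged sketch using the `spark-sql` shell; the exact digits depend on the input value, but the documented difference is the exponent letter.

```bash
# Sketch: cast a small float to string. With plain Spark (Java toString)
# the expected result is "1.0E-4" (uppercase E); with the plugin enabled
# the GPU is documented to use a lowercase 'e', i.e. "1.0e-4".
spark-sql -e "SELECT CAST(CAST(0.0001 AS FLOAT) AS STRING)"
```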

### String to Float
@@ -845,7 +845,7 @@ default behavior in Apache Spark is to return `+Infinity` and `-Infinity`, respectively.

Also, the GPU does not support casting from strings containing hex values.

As of 22.06 this conf is enabled by default; to disable this operation on the GPU, set
Starting from 22.06 this conf is enabled by default; to disable this operation on the GPU, set
[`spark.rapids.sql.castStringToFloat.enabled`](configs.md#sql.castStringToFloat.enabled) to `false`.
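
For context, the out-of-range behavior mentioned above can be probed with strings just beyond the float range; a sketch, assuming Spark's default string-to-float parsing:

```bash
# Sketch: strings just outside the range of a 32-bit float. Plain Spark
# returns +Infinity and -Infinity for these, which is the default behavior
# the paragraph above refers to; results may differ with the GPU cast.
spark-sql -e "SELECT CAST('3.5e+38' AS FLOAT), CAST('-3.5e+38' AS FLOAT)"
```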

### String to Date
2 changes: 1 addition & 1 deletion docs/demo/Databricks/generate-init-script.ipynb
@@ -3,7 +3,7 @@
{
"cell_type":"code",
"source":[
"dbutils.fs.mkdirs(\"dbfs:/databricks/init_scripts/\")\n \ndbutils.fs.put(\"/databricks/init_scripts/init.sh\",\"\"\"\n#!/bin/bash\nsudo wget -O /databricks/jars/rapids-4-spark_2.12-22.06.0-cuda11.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.06.0/rapids-4-spark_2.12-22.06.0-cuda11.jar\n\"\"\", True)"
"dbutils.fs.mkdirs(\"dbfs:/databricks/init_scripts/\")\n \ndbutils.fs.put(\"/databricks/init_scripts/init.sh\",\"\"\"\n#!/bin/bash\nsudo wget -O /databricks/jars/rapids-4-spark_2.12-22.08.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.08.0/rapids-4-spark_2.12-22.08.0.jar\n\"\"\", True)"
],
"metadata":{

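For readability, the init script that the updated cell writes to `dbfs:/databricks/init_scripts/init.sh` unescapes to the following (reconstructed verbatim from the JSON string above):

```bash
#!/bin/bash
sudo wget -O /databricks/jars/rapids-4-spark_2.12-22.08.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.08.0/rapids-4-spark_2.12-22.08.0.jar
```
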
41 changes: 41 additions & 0 deletions docs/download.md
@@ -18,6 +18,47 @@ cuDF jar, that is either preinstalled in the Spark classpath on all nodes or submitted with each job
that uses the RAPIDS Accelerator For Apache Spark. See the [getting-started
guide](https://nvidia.github.io/spark-rapids/Getting-Started/) for more details.

## Release v22.08.0
Hardware Requirements:

The plugin is tested on the following architectures:

GPU Models: NVIDIA V100, T4, and A2/A10/A30/A100 GPUs

Software Requirements:

OS: Ubuntu 18.04, Ubuntu 20.04, CentOS 7, or Rocky Linux 8

CUDA & NVIDIA Drivers*: 11.x & v450.80.02+

Apache Spark 3.1.1, 3.1.2, 3.1.3, 3.2.0, 3.2.1, 3.3.0, Databricks 9.1 ML LTS or 10.4 ML LTS Runtime, and GCP Dataproc 2.0

Python 3.6+, Scala 2.12, Java 8

*Some hardware may have a minimum driver version greater than v450.80.02. Check the GPU spec sheet
for your hardware's minimum driver version.

*For Cloudera and EMR support, please refer to the
[Distributions](./FAQ.md#which-distributions-are-supported) section of the FAQ.

### Release Notes
New functionality and performance improvements for this release include:
* Rocky Linux 8 support
* Ability to build Spark RAPIDS jar using JDK 11
* ZSTD Parquet read support
* Binary read support from Parquet
* Apache Iceberg 0.14 support
* Array function support: array_intersect, array_union, array_except, and arrays_overlap (see the sketch after this list)
* from_json function support
Collaborator: from_json will be in 22.10 (PR #6211)

Author (Collaborator): Removed.

Author (Collaborator): Fixed all above.

* Support for nth_value, first, and last in window functions
* Alluxio auto mount for AWS S3 buckets
* Qualification tool:
* SQL level qualification
* Add application details view
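
As a quick illustration of the new array functions, here is a hedged sketch using the `spark-sql` shell (these are standard Spark SQL functions; with the plugin enabled they become candidates for GPU execution):

```bash
# Sketch: the four array functions called out above, with expected results.
spark-sql -e "SELECT
  array_intersect(array(1,2,3), array(2,3,4)),  -- [2,3]
  array_union(array(1,2), array(2,3)),          -- [1,2,3]
  array_except(array(1,2,3), array(2)),         -- [1,3]
  arrays_overlap(array(1,2), array(2,5))        -- true
"
```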

For a detailed list of changes, please refer to the
[CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md).

## Release v22.06.0
Hardware Requirements:

2 changes: 1 addition & 1 deletion docs/get-started/getting-started-databricks.md
@@ -162,7 +162,7 @@ cluster.
```bash
spark.rapids.sql.python.gpu.enabled true
spark.python.daemon.module rapids.daemon_databricks
spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-22.06.0.jar:/databricks/spark/python
spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-22.08.0.jar:/databricks/spark/python
```

7. Once you’ve added the Spark config, click “Confirm and Restart”.
2 changes: 1 addition & 1 deletion docs/get-started/gpu_dataproc_packages_ubuntu_sample.sh
@@ -139,7 +139,7 @@ EOF
systemctl start dataproc-cgroup-device-permissions
}

readonly DEFAULT_SPARK_RAPIDS_VERSION="22.06.0"
readonly DEFAULT_SPARK_RAPIDS_VERSION="22.08.0"
readonly DEFAULT_CUDA_VERSION="11.0"
readonly DEFAULT_XGBOOST_VERSION="1.6.1"
readonly SPARK_VERSION="3.0"
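
For context on how such a pinned default is typically consumed, a hypothetical sketch follows; `DEFAULT_SPARK_RAPIDS_VERSION` is the real variable shown above, but the download path and URL construction here are illustrative assumptions, not quoted from the script.

```bash
# Hypothetical illustration only: fetch the plugin jar matching the pinned
# version from Maven Central into the cluster's Spark jars directory.
SPARK_RAPIDS_VERSION="${DEFAULT_SPARK_RAPIDS_VERSION}"
wget -O "/usr/lib/spark/jars/rapids-4-spark_2.12-${SPARK_RAPIDS_VERSION}.jar" \
  "https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/${SPARK_RAPIDS_VERSION}/rapids-4-spark_2.12-${SPARK_RAPIDS_VERSION}.jar"
```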