NVIDIA · jlowe · Jul 26, 2022 · Apr 26, 2022 · May 19, 2022 · May 19, 2022
diff --git a/NOTICE b/NOTICE
@@ -1,6 +1,8 @@
 RAPIDS plugin for Apache Spark
 Copyright (c) 2019-2022, NVIDIA CORPORATION
 
+--------------------------------------------------------------------------------
+
 // ------------------------------------------------------------------
 // NOTICE file corresponding to the section 4d of The Apache License,
 // Version 2.0, in this case for
@@ -12,6 +14,34 @@ Copyright 2014 and onwards The Apache Software Foundation
 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).
 
+--------------------------------------------------------------------------------
+
+Apache Iceberg
+Copyright 2017-2022 The Apache Software Foundation
+
+This product includes software developed at
+The Apache Software Foundation (http://www.apache.org/).
+
+--------------------------------------------------------------------------------
+
+This project includes code from Kite, developed at Cloudera, Inc. with
+the following copyright notice:
+
+| Copyright 2013 Cloudera Inc.
+|
+| Licensed under the Apache License, Version 2.0 (the "License");
+| you may not use this file except in compliance with the License.
+| You may obtain a copy of the License at
+|
+|   http://www.apache.org/licenses/LICENSE-2.0
+|
+| Unless required by applicable law or agreed to in writing, software
+| distributed under the License is distributed on an "AS IS" BASIS,
+| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+| See the License for the specific language governing permissions and
+| limitations under the License.
+
+--------------------------------------------------------------------------------
 
 This product bundles various third-party components under other open source licenses.
 

diff --git a/NOTICE-binary b/NOTICE-binary
@@ -12,7 +12,35 @@ Copyright 2014 and onwards The Apache Software Foundation
 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).
 
----------------------------------------------------------------------
+--------------------------------------------------------------------------------
+
+Apache Iceberg
+Copyright 2017-2022 The Apache Software Foundation
+
+This product includes software developed at
+The Apache Software Foundation (http://www.apache.org/).
+
+--------------------------------------------------------------------------------
+
+This project includes code from Kite, developed at Cloudera, Inc. with
+the following copyright notice:
+
+| Copyright 2013 Cloudera Inc.
+|
+| Licensed under the Apache License, Version 2.0 (the "License");
+| you may not use this file except in compliance with the License.
+| You may obtain a copy of the License at
+|
+|   http://www.apache.org/licenses/LICENSE-2.0
+|
+| Unless required by applicable law or agreed to in writing, software
+| distributed under the License is distributed on an "AS IS" BASIS,
+| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+| See the License for the specific language governing permissions and
+| limitations under the License.
+
+--------------------------------------------------------------------------------
+
 UCF Consortium - Unified Communication X (UCX)
 
 Copyright (c) 2014-2015      UT-Battelle, LLC. All rights reserved.

diff --git a/docs/additional-functionality/iceberg-support.md b/docs/additional-functionality/iceberg-support.md
@@ -0,0 +1,62 @@
+---
+layout: page
+title: Apache Iceberg Support
+parent: Additional Functionality
+nav_order: 7
+---
+
+# Apache Iceberg Support
+
+The RAPIDS Accelerator for Apache Spark provides limited support for Apache Iceberg tables.
+This document details the Apache Iceberg features that are supported.
+
+## Apache Iceberg Versions
+
+The RAPIDS Accelerator supports Apache Iceberg 0.13.x. Earlier versions of Apache Iceberg are
+not supported.
+
+## Reading Tables
+
+### Metadata Queries
+
+Reads of Apache Iceberg metadata, i.e., the `history`, `snapshots`, and other metadata tables
+associated with a table, are not GPU-accelerated. The CPU continues to process these
+metadata-level queries.
+
+### Row-level Delete and Update Support
+
+Apache Iceberg supports row-level deletions and updates. Tables configured with
+`write.delete.mode=merge-on-read` are not supported.
+
+### Schema Evolution
+
+Columns that are added or removed at the top level of the table schema are supported. Columns
+that are added or removed within struct columns are not supported.
+
+### Data Formats
+
+Apache Iceberg can store data in various formats. The sections below detail the level of support
+for each of the underlying data formats.
+
+#### Parquet
+
+Data stored in Parquet is supported with the same limitations that apply to loading data from raw
+Parquet files. See the [Input/Output](../supported_ops.md#inputoutput) documentation for details.
+The following compression codecs applied to the Parquet data are supported:
+- gzip (Apache Iceberg default)
+- snappy
+- uncompressed
+- zstd
+
+#### ORC
+
+The RAPIDS Accelerator does not support Apache Iceberg tables using the ORC data format.
+
+#### Avro
+
+The RAPIDS Accelerator does not support Apache Iceberg tables using the Avro data format.
+
+## Writing Tables
+
+The RAPIDS Accelerator for Apache Spark does not accelerate Apache Iceberg writes. Writes
+to Iceberg tables will be processed by the CPU.
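
As an illustration of the read-side limitations documented in `iceberg-support.md`, the rules can be written as a small check over a table's properties. This is a minimal sketch, not code from the plugin: `gpu_read_blockers` is a hypothetical helper, and it assumes the Iceberg table properties `write.format.default`, `write.parquet.compression-codec`, and `write.delete.mode` describe the table. Those properties only set write defaults, so existing data files may differ from what they report.

```python
# Hypothetical helper for illustration only; not part of the RAPIDS Accelerator.
# Codecs listed as supported in the Parquet section of iceberg-support.md.
SUPPORTED_PARQUET_CODECS = {"gzip", "snappy", "uncompressed", "zstd"}


def gpu_read_blockers(table_properties):
    """Return the documented reasons a read of this table would stay on the CPU.

    `table_properties` is a plain dict of Iceberg table properties.
    An empty list means none of the documented limitations apply.
    """
    blockers = []

    # Only Parquet is supported; ORC and Avro are documented as unsupported.
    data_format = table_properties.get("write.format.default", "parquet").lower()
    if data_format != "parquet":
        blockers.append(f"data format {data_format!r} is not supported")
    else:
        # The docs name gzip as the Apache Iceberg default codec.
        codec = table_properties.get("write.parquet.compression-codec", "gzip").lower()
        if codec not in SUPPORTED_PARQUET_CODECS:
            blockers.append(f"Parquet codec {codec!r} is not supported")

    # Merge-on-read row-level deletes are documented as unsupported.
    delete_mode = table_properties.get("write.delete.mode", "copy-on-write").lower()
    if delete_mode == "merge-on-read":
        blockers.append("write.delete.mode=merge-on-read is not supported")

    return blockers
```

Calling `gpu_read_blockers({"write.format.default": "orc"})` returns a single entry naming the ORC format, while an empty property dict returns an empty list because the defaults fall within the documented support.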
diff --git a/docs/configs.md b/docs/configs.md
@@ -82,6 +82,8 @@ Name | Description | Default Value
 <a name="sql.format.avro.reader.type"></a>spark.rapids.sql.format.avro.reader.type|Sets the Avro reader type. We support different types that are optimized for different environments. The original Spark style reader can be selected by setting this to PERFILE which individually reads and copies files to the GPU. Loading many small files individually has high overhead, and using either COALESCING or MULTITHREADED is recommended instead. The COALESCING reader is good when using a local file system where the executors are on the same nodes or close to the nodes the data is being read on. This reader coalesces all the files assigned to a task into a single host buffer before sending it down to the GPU. It copies blocks from a single file into a host buffer in separate threads in parallel, see spark.rapids.sql.multiThreadedRead.numThreads. MULTITHREADED is good for cloud environments where you are reading from a blobstore that is totally separate and likely has a higher I/O read cost. Many times the cloud environments also get better throughput when you have multiple readers in parallel. This reader uses multiple threads to read each file in parallel and each file is sent to the GPU separately. This allows the CPU to keep reading while GPU is also doing work. See spark.rapids.sql.multiThreadedRead.numThreads and spark.rapids.sql.format.avro.multiThreadedRead.maxNumFilesParallel to control the number of threads and amount of memory used. By default this is set to AUTO so we select the reader we think is best. This will either be the COALESCING or the MULTITHREADED based on whether we think the file is in the cloud. See spark.rapids.cloudSchemes.|AUTO
 <a name="sql.format.csv.enabled"></a>spark.rapids.sql.format.csv.enabled|When set to false disables all csv input and output acceleration. (only input is currently supported anyways)|true
 <a name="sql.format.csv.read.enabled"></a>spark.rapids.sql.format.csv.read.enabled|When set to false disables csv input acceleration|true
+<a name="sql.format.iceberg.enabled"></a>spark.rapids.sql.format.iceberg.enabled|When set to false disables all Iceberg acceleration|true
+<a name="sql.format.iceberg.read.enabled"></a>spark.rapids.sql.format.iceberg.read.enabled|When set to false disables Iceberg input acceleration|true
 <a name="sql.format.json.enabled"></a>spark.rapids.sql.format.json.enabled|When set to true enables all json input and output acceleration. (only input is currently supported anyways)|false
 <a name="sql.format.json.read.enabled"></a>spark.rapids.sql.format.json.read.enabled|When set to true enables json input acceleration|false
 <a name="sql.format.orc.enabled"></a>spark.rapids.sql.format.orc.enabled|When set to false disables all orc input and output acceleration|true
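
The two new settings in `docs/configs.md` both default to `true`, so they only need to be set when disabling acceleration. A minimal sketch of turning them into `spark-submit` style `--conf` arguments follows; `iceberg_conf_args` is a hypothetical helper name, and only the two configuration keys come from this change.

```python
# Hypothetical helper for illustration only; the config keys are the ones
# added to docs/configs.md in this change.
def iceberg_conf_args(accelerate_reads=True):
    """Build `--conf key=value` arguments for the Iceberg acceleration settings."""
    conf = {
        # Master switch for all Iceberg acceleration (default: true).
        "spark.rapids.sql.format.iceberg.enabled": "true",
        # Switch for Iceberg input acceleration only (default: true).
        "spark.rapids.sql.format.iceberg.read.enabled": str(accelerate_reads).lower(),
    }
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args
```

Passing `accelerate_reads=False` leaves the master switch on and disables only Iceberg reads on the GPU, which mirrors how the existing per-format `read.enabled` settings are split from their `enabled` counterparts.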

diff --git a/docs/supported_ops.md b/docs/supported_ops.md
@@ -18307,6 +18307,49 @@ dates or timestamps, or for a lack of type coercion support.
 <td> </td>
 </tr>
 <tr>
+<th rowSpan="2">Iceberg</th>
+<th>Read</th>
+<td>S</td>
+<td>S</td>
+<td>S</td>
+<td>S</td>
+<td>S</td>
+<td>S</td>
+<td>S</td>
+<td>S</td>
+<td><em>PS<br/>UTC is only supported TZ for TIMESTAMP</em></td>
+<td>S</td>
+<td>S</td>
+<td> </td>
+<td><b>NS</b></td>
+<td> </td>
+<td><em>PS<br/>UTC is only supported TZ for child TIMESTAMP;<br/>unsupported child types BINARY, UDT</em></td>
+<td><em>PS<br/>UTC is only supported TZ for child TIMESTAMP;<br/>unsupported child types BINARY, UDT</em></td>
+<td><em>PS<br/>UTC is only supported TZ for child TIMESTAMP;<br/>unsupported child types BINARY, UDT</em></td>
+<td><b>NS</b></td>
+</tr>
+<tr>
+<th>Write</th>
+<td><b>NS</b></td>
+<td><b>NS</b></td>
+<td><b>NS</b></td>
+<td><b>NS</b></td>
+<td><b>NS</b></td>
+<td><b>NS</b></td>
+<td><b>NS</b></td>
+<td><b>NS</b></td>
+<td><b>NS</b></td>
+<td><b>NS</b></td>
+<td><b>NS</b></td>
+<td> </td>
+<td><b>NS</b></td>
+<td> </td>
+<td><b>NS</b></td>
+<td><b>NS</b></td>
+<td><b>NS</b></td>
+<td><b>NS</b></td>
+</tr>
+<tr>
 <th rowSpan="2">JSON</th>
 <th>Read</th>
 <td>S</td>
@@ -18436,3 +18479,7 @@ dates or timestamps, or for a lack of type coercion support.
 <td><b>NS</b></td>
 </tr>
 </table>
+
+### Apache Iceberg Support
+Support for Apache Iceberg has additional limitations. See the
+[Apache Iceberg Support](additional-functionality/iceberg-support.md) document.