Commit 53b2bea

Move configuration information out of example usage page
1 parent e693ed7 commit 53b2bea

5 files changed

+178
-133
lines changed

datafusion/core/src/lib.rs

+7
Original file line numberDiff line numberDiff line change
@@ -620,6 +620,13 @@ doc_comment::doctest!(
620620
user_guide_example_usage
621621
);
622622

623+
#[cfg(doctest)]
624+
doc_comment::doctest!(
625+
"../../../docs/source/user-guide/crate-configuration.md",
626+
user_guide_crate_configuration
627+
);
628+
629+
623630
#[cfg(doctest)]
624631
doc_comment::doctest!(
625632
"../../../docs/source/user-guide/configs.md",

docs/source/index.rst

+6-2
Original file line numberDiff line numberDiff line change
@@ -41,13 +41,16 @@ DataFusion offers SQL and Dataframe APIs, excellent
4141
CSV, Parquet, JSON, and Avro, extensive customization, and a great
4242
community.
4343

44-
To get started with examples, see the `example usage`_ section of the user guide and the `datafusion-examples`_ directory.
44+
To get started, see
4545

46-
See the `developer’s guide`_ for contributing and `communication`_ for getting in touch with us.
46+
* The `example usage`_ section of the user guide and the `datafusion-examples`_ directory.
47+
* The `library user guide`_ for examples of using DataFusion's extension APIs.
48+
* The `developer’s guide`_ for contributing and `communication`_ for getting in touch with us.
4749

4850
.. _example usage: user-guide/example-usage.html
4951
.. _datafusion-examples: https://github.com/apache/datafusion/tree/main/datafusion-examples
5052
.. _developer’s guide: contributor-guide/index.html#developer-s-guide
53+
.. _library user guide: library-user-guide/index.html
5154
.. _communication: contributor-guide/communication.html
5255

5356
.. _toc.asf-links:
@@ -79,6 +82,7 @@ See the `developer’s guide`_ for contributing and `communication`_ for getting
7982

8083
user-guide/introduction
8184
user-guide/example-usage
85+
user-guide/crate-configuration
8286
user-guide/cli/index
8387
user-guide/dataframe
8488
user-guide/expressions

docs/source/library-user-guide/index.md

+19-2
Original file line numberDiff line numberDiff line change
@@ -19,8 +19,25 @@
1919

2020
# Introduction
2121

22-
The library user guide explains how to use the DataFusion library as a dependency in your Rust project. Please check out the user-guide for more details on how to use DataFusion's SQL and DataFrame APIs, or the contributor guide for details on how to contribute to DataFusion.
22+
The library user guide explains how to use the DataFusion library as a
23+
dependency in your Rust project and customize its behavior using its extension APIs.
2324

24-
If you haven't reviewed the [architecture section in the docs][docs], it's a useful place to get the lay of the land before starting down a specific path.
25+
Please check out the [user guide] for getting started using
26+
DataFusion's SQL and DataFrame APIs, or the [contributor guide]
27+
for details on how to contribute to DataFusion.
2528

29+
If you haven't reviewed the [architecture section in the docs][docs], it's a
30+
useful place to get the lay of the land before starting down a specific path.
31+
32+
DataFusion is designed to be extensible at all points, including
33+
34+
- [x] User Defined Functions (UDFs)
35+
- [x] User Defined Aggregate Functions (UDAFs)
36+
- [x] User Defined Table Source (`TableProvider`) for tables
37+
- [x] User Defined `Optimizer` passes (plan rewrites)
38+
- [x] User Defined `LogicalPlan` nodes
39+
- [x] User Defined `ExecutionPlan` nodes
40+
41+
[user guide]: ../user-guide/example-usage.md
42+
[contributor guide]: ../contributor-guide/index.md
2643
[docs]: https://docs.rs/datafusion/latest/datafusion/#architecture

docs/source/user-guide/crate-configuration.md

+146
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,146 @@
1+
<!---
2+
Licensed to the Apache Software Foundation (ASF) under one
3+
or more contributor license agreements. See the NOTICE file
4+
distributed with this work for additional information
5+
regarding copyright ownership. The ASF licenses this file
6+
to you under the Apache License, Version 2.0 (the
7+
"License"); you may not use this file except in compliance
8+
with the License. You may obtain a copy of the License at
9+
10+
http://www.apache.org/licenses/LICENSE-2.0
11+
12+
Unless required by applicable law or agreed to in writing,
13+
software distributed under the License is distributed on an
14+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15+
KIND, either express or implied. See the License for the
16+
specific language governing permissions and limitations
17+
under the License.
18+
-->
19+
20+
# Crate Configuration
21+
22+
This section contains information on how to configure DataFusion in your Rust
23+
project. See the [Configuration Settings] section for a list of options that
24+
control DataFusion's behavior.
25+
26+
[configuration settings]: configs.md
27+
28+
## Add the latest unpublished DataFusion dependency
29+
30+
DataFusion changes are published to `crates.io` according to the [release schedule](https://github.com/apache/datafusion/blob/main/dev/release/README.md#release-process)
31+
32+
If you would like to test out DataFusion changes which are merged but not yet
33+
published, Cargo supports adding a dependency directly on a GitHub branch:
34+
35+
```toml
36+
datafusion = { git = "https://github.com/apache/datafusion", branch = "main"}
37+
```
38+
39+
This also works at the package level:
40+
41+
```toml
42+
datafusion-common = { git = "https://github.com/apache/datafusion", branch = "main", package = "datafusion-common"}
43+
```
44+
45+
And with features:
46+
47+
```toml
48+
datafusion = { git = "https://github.com/apache/datafusion", branch = "main", default-features = false, features = ["unicode_expressions"] }
49+
```
50+
51+
More on [Cargo dependencies](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies)
52+
53+
## Optimized Configuration
54+
55+
For an optimized build several steps are required. First, use the below in your `Cargo.toml`. It is
56+
worth noting that using the settings in the `[profile.release]` section will significantly increase the build time.
57+
58+
```toml
59+
[dependencies]
60+
datafusion = { version = "22.0" }
61+
tokio = { version = "^1.0", features = ["rt-multi-thread"] }
62+
snmalloc-rs = "0.3"
63+
64+
[profile.release]
65+
lto = true
66+
codegen-units = 1
67+
```
68+
69+
Then, in `main.rs`, update the memory allocator with the below after your imports:
70+
71+
```rust ,ignore
72+
use datafusion::prelude::*;
73+
74+
#[global_allocator]
75+
static ALLOC: snmalloc_rs::SnMalloc = snmalloc_rs::SnMalloc;
76+
77+
#[tokio::main]
78+
async fn main() -> datafusion::error::Result<()> {
79+
Ok(())
80+
}
81+
```
82+
83+
Based on the instruction set architecture you are building on you will want to configure the `target-cpu` as well, ideally
84+
with `native` or at least `avx2`.
85+
86+
```shell
87+
RUSTFLAGS='-C target-cpu=native' cargo run --release
88+
```
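
As a rough illustration of what `target-cpu` changes, the sketch below compares the instruction set features the binary was compiled with against what the CPU reports at run time. It uses only the Rust standard library; `compiled_with_avx2` and `cpu_supports_avx2` are hypothetical helper names, not part of DataFusion.

```rust
/// True when the compiler was allowed to emit AVX2 instructions,
/// for example via `-C target-cpu=native` on an AVX2-capable machine.
fn compiled_with_avx2() -> bool {
    cfg!(target_feature = "avx2")
}

/// True when the CPU running this binary reports AVX2 support.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_supports_avx2() -> bool {
    std::arch::is_x86_feature_detected!("avx2")
}

/// On non-x86 targets this sketch simply reports no AVX2 support.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_supports_avx2() -> bool {
    false
}

fn main() {
    println!("compiled with avx2: {}", compiled_with_avx2());
    println!("cpu supports avx2:  {}", cpu_supports_avx2());
}
```

If the first line prints `false` while the second prints `true`, the build is probably not using the `RUSTFLAGS` setting shown above.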
89+
90+
## Enable backtraces
91+
92+
By default, DataFusion returns errors as a plain message. There is an option to enable more verbose details about the error,
93+
such as an error backtrace. To enable a backtrace, add the DataFusion `backtrace` feature to your `Cargo.toml` file:
94+
95+
```toml
96+
datafusion = { version = "31.0.0", features = ["backtrace"]}
97+
```
98+
99+
Set environment [variables](https://doc.rust-lang.org/std/backtrace/index.html#environment-variables)
100+
101+
```bash
102+
RUST_BACKTRACE=1 ./target/debug/datafusion-cli
103+
DataFusion CLI v31.0.0
104+
> select row_numer() over (partition by a order by a) from (select 1 a);
105+
Error during planning: Invalid function 'row_numer'.
106+
Did you mean 'ROW_NUMBER'?
107+
108+
backtrace: 0: std::backtrace_rs::backtrace::libunwind::trace
109+
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
110+
1: std::backtrace_rs::backtrace::trace_unsynchronized
111+
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
112+
2: std::backtrace::Backtrace::create
113+
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/backtrace.rs:332:13
114+
3: std::backtrace::Backtrace::capture
115+
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/backtrace.rs:298:9
116+
4: datafusion_common::error::DataFusionError::get_back_trace
117+
at /datafusion/datafusion/common/src/error.rs:436:30
118+
5: datafusion_sql::expr::function::<impl datafusion_sql::planner::SqlToRel<S>>::sql_function_to_expr
119+
............
120+
```
121+
122+
Backtraces are useful when debugging code. For example, suppose there is a test in `datafusion/core/src/physical_planner.rs`:
123+
124+
```
125+
#[tokio::test]
126+
async fn test_get_backtrace_for_failed_code() -> Result<()> {
127+
let ctx = SessionContext::new();
128+
129+
let sql = "
130+
select row_numer() over (partition by a order by a) from (select 1 a);
131+
";
132+
133+
let _ = ctx.sql(sql).await?.collect().await?;
134+
135+
Ok(())
136+
}
137+
```
138+
139+
To obtain a backtrace:
140+
141+
```bash
142+
cargo build --features=backtrace
143+
RUST_BACKTRACE=1 cargo test --features=backtrace --package datafusion --lib -- physical_planner::tests::test_get_backtrace_for_failed_code --exact --nocapture
144+
```
145+
146+
Note: The backtrace is wrapped in system calls, so some frames at the top of the backtrace can be ignored.
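
The stack frames in the CLI output above show the backtrace being captured through `std::backtrace::Backtrace::capture`, which only records frames when `RUST_BACKTRACE` (or `RUST_LIB_BACKTRACE`) is set. The following standard-library-only sketch shows that behavior in isolation; `capture_status` and `forced_status` are hypothetical helper names, and this is not DataFusion's own code.

```rust
use std::backtrace::{Backtrace, BacktraceStatus};

/// Status of a capture that honors RUST_BACKTRACE / RUST_LIB_BACKTRACE.
fn capture_status() -> BacktraceStatus {
    Backtrace::capture().status()
}

/// Status of a capture that ignores the environment variables.
fn forced_status() -> BacktraceStatus {
    Backtrace::force_capture().status()
}

fn main() {
    match capture_status() {
        BacktraceStatus::Captured => println!("backtrace captured"),
        BacktraceStatus::Disabled => println!("backtrace disabled: set RUST_BACKTRACE=1"),
        // BacktraceStatus is non-exhaustive, so a catch-all arm is required.
        _ => println!("backtraces are not supported on this platform"),
    }
    println!("forced capture status: {:?}", forced_status());
}
```

Running the program with and without `RUST_BACKTRACE=1` should switch the first message between the captured and disabled cases on platforms that support backtraces.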

docs/source/user-guide/example-usage.md

-129
Original file line numberDiff line numberDiff line change
@@ -33,29 +33,6 @@ datafusion = "latest_version"
3333
tokio = { version = "1.0", features = ["rt-multi-thread"] }
3434
```
3535

36-
## Add latest non published DataFusion dependency
37-
38-
DataFusion changes are published to `crates.io` according to [release schedule](https://github.com/apache/datafusion/blob/main/dev/release/README.md#release-process)
39-
In case if it is required to test out DataFusion changes which are merged but yet to be published, Cargo supports adding dependency directly to GitHub branch
40-
41-
```toml
42-
datafusion = { git = "https://github.com/apache/datafusion", branch = "main"}
43-
```
44-
45-
Also it works on the package level
46-
47-
```toml
48-
datafusion-common = { git = "https://github.com/apache/datafusion", branch = "main", package = "datafusion-common"}
49-
```
50-
51-
And with features
52-
53-
```toml
54-
datafusion = { git = "https://github.com/apache/datafusion", branch = "main", default-features = false, features = ["unicode_expressions"] }
55-
```
56-
57-
More on [Cargo dependencies](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies)
58-
5936
## Run a SQL query against data stored in a CSV
6037

6138
```rust
@@ -201,109 +178,3 @@ async fn main() -> datafusion::error::Result<()> {
201178
| 1 | 2 |
202179
+---+--------+
203180
```
204-
205-
## Extensibility
206-
207-
DataFusion is designed to be extensible at all points. To that end, you can provide your own custom:
208-
209-
- [x] User Defined Functions (UDFs)
210-
- [x] User Defined Aggregate Functions (UDAFs)
211-
- [x] User Defined Table Source (`TableProvider`) for tables
212-
- [x] User Defined `Optimizer` passes (plan rewrites)
213-
- [x] User Defined `LogicalPlan` nodes
214-
- [x] User Defined `ExecutionPlan` nodes
215-
216-
## Optimized Configuration
217-
218-
For an optimized build several steps are required. First, use the below in your `Cargo.toml`. It is
219-
worth noting that using the settings in the `[profile.release]` section will significantly increase the build time.
220-
221-
```toml
222-
[dependencies]
223-
datafusion = { version = "22.0" }
224-
tokio = { version = "^1.0", features = ["rt-multi-thread"] }
225-
snmalloc-rs = "0.3"
226-
227-
[profile.release]
228-
lto = true
229-
codegen-units = 1
230-
```
231-
232-
Then, in `main.rs.` update the memory allocator with the below after your imports:
233-
234-
```rust ,ignore
235-
use datafusion::prelude::*;
236-
237-
#[global_allocator]
238-
static ALLOC: snmalloc_rs::SnMalloc = snmalloc_rs::SnMalloc;
239-
240-
#[tokio::main]
241-
async fn main() -> datafusion::error::Result<()> {
242-
Ok(())
243-
}
244-
```
245-
246-
Based on the instruction set architecture you are building on you will want to configure the `target-cpu` as well, ideally
247-
with `native` or at least `avx2`.
248-
249-
```shell
250-
RUSTFLAGS='-C target-cpu=native' cargo run --release
251-
```
252-
253-
## Enable backtraces
254-
255-
By default Datafusion returns errors as a plain message. There is option to enable more verbose details about the error,
256-
like error backtrace. To enable a backtrace you need to add Datafusion `backtrace` feature to your `Cargo.toml` file:
257-
258-
```toml
259-
datafusion = { version = "31.0.0", features = ["backtrace"]}
260-
```
261-
262-
Set environment [variables](https://doc.rust-lang.org/std/backtrace/index.html#environment-variables)
263-
264-
```bash
265-
RUST_BACKTRACE=1 ./target/debug/datafusion-cli
266-
DataFusion CLI v31.0.0
267-
> select row_numer() over (partition by a order by a) from (select 1 a);
268-
Error during planning: Invalid function 'row_numer'.
269-
Did you mean 'ROW_NUMBER'?
270-
271-
backtrace: 0: std::backtrace_rs::backtrace::libunwind::trace
272-
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
273-
1: std::backtrace_rs::backtrace::trace_unsynchronized
274-
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
275-
2: std::backtrace::Backtrace::create
276-
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/backtrace.rs:332:13
277-
3: std::backtrace::Backtrace::capture
278-
at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/backtrace.rs:298:9
279-
4: datafusion_common::error::DataFusionError::get_back_trace
280-
at /datafusion/datafusion/common/src/error.rs:436:30
281-
5: datafusion_sql::expr::function::<impl datafusion_sql::planner::SqlToRel<S>>::sql_function_to_expr
282-
............
283-
```
284-
285-
The backtraces are useful when debugging code. If there is a test in `datafusion/core/src/physical_planner.rs`
286-
287-
```
288-
#[tokio::test]
289-
async fn test_get_backtrace_for_failed_code() -> Result<()> {
290-
let ctx = SessionContext::new();
291-
292-
let sql = "
293-
select row_numer() over (partition by a order by a) from (select 1 a);
294-
";
295-
296-
let _ = ctx.sql(sql).await?.collect().await?;
297-
298-
Ok(())
299-
}
300-
```
301-
302-
To obtain a backtrace:
303-
304-
```bash
305-
cargo build --features=backtrace
306-
RUST_BACKTRACE=1 cargo test --features=backtrace --package datafusion --lib -- physical_planner::tests::test_get_backtrace_for_failed_code --exact --nocapture
307-
```
308-
309-
Note: The backtrace wrapped into systems calls, so some steps on top of the backtrace can be ignored
