
Document CREATE EXTERNAL TABLE ... OPTIONS #10451

Open
alamb opened this issue May 10, 2024 · 3 comments
Labels: documentation, enhancement

Comments

@alamb
Contributor

alamb commented May 10, 2024

Is your feature request related to a problem or challenge?

While reviewing https://github.com/apache/datafusion/pull/10404/files, I could not find any documentation describing the syntax of the OPTIONS clause or the settings it accepts.

For example, there are generic options such as:

```sql
CREATE EXTERNAL TABLE
....
  OPTIONS(
    NULL_VALUE 'NAN'
  )
```

There are also format-specific options, such as the newly added 'format.has_header':

```sql
CREATE EXTERNAL TABLE
....
  OPTIONS(
    NULL_VALUE 'NAN',
    'format.has_header' 'true'
  )
```

Neither appears to be documented.
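
To make the syntax question concrete, here is a small Python sketch that renders an OPTIONS clause in the form used by the examples above: each key and value is a single-quoted string, key and value are separated by whitespace, and the pairs are separated by commas. The helper names `quote_sql_literal` and `render_options` are hypothetical and not part of DataFusion, and quoting every key (including `NULL_VALUE`) is an assumption based on the `'format.has_header'` example.

```python
def quote_sql_literal(text: str) -> str:
    """Wrap text in single quotes, doubling any embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def render_options(options: dict[str, str]) -> str:
    """Render an OPTIONS clause (hypothetical helper).

    Each key and value is quoted and separated by a space; the key/value
    pairs themselves are separated by commas, with no comma between a key
    and its value.
    """
    pairs = [
        f"{quote_sql_literal(key)} {quote_sql_literal(value)}"
        for key, value in options.items()
    ]
    return "OPTIONS(" + ", ".join(pairs) + ")"


if __name__ == "__main__":
    print(render_options({"NULL_VALUE": "NAN", "format.has_header": "true"}))
```

Running it prints `OPTIONS('NULL_VALUE' 'NAN', 'format.has_header' 'true')`.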

Describe the solution you'd like

It would be great (in a separate PR) to document them similarly to how @devinjdangelo documented the write options:

https://datafusion.apache.org/user-guide/sql/write_options.html#available-options

The page could have a section for the generic options that apply to all formats, followed by a list of options for each specific format.
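
The proposed split between generic and format-specific options can be sketched as a Python helper that sorts a set of options into those two groups. This is a hypothetical illustration: the function `split_options` is invented here, and treating any key that starts with `format.` as format-specific (and everything else, such as `NULL_VALUE`, as generic) is an assumption drawn only from the two examples in this issue, not a documented DataFusion rule.

```python
# Assumed marker for format-specific keys, based on 'format.has_header'.
FORMAT_PREFIX = "format."


def split_options(options: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Split OPTIONS keys into (generic, format_specific) groups.

    Assumption: a key is format-specific when it starts with 'format.'
    (compared case-insensitively); all other keys are treated as generic.
    """
    generic: dict[str, str] = {}
    format_specific: dict[str, str] = {}
    for key, value in options.items():
        if key.lower().startswith(FORMAT_PREFIX):
            format_specific[key] = value
        else:
            generic[key] = value
    return generic, format_specific
```

For the two example options in this issue, it returns `({'NULL_VALUE': 'NAN'}, {'format.has_header': 'true'})`.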

Describe alternatives you've considered

No response

Additional context

No response

@marvelshan
Contributor

take

@marvelshan
Contributor

marvelshan commented Apr 12, 2025

Before proceeding with the implementation, I'd like to confirm that my approach is correct. I'm planning to create a new file named options.md dedicated to documenting the available options for the OPTIONS clause.

Rationale:

  • Current documentation gap: The existing ddl.md only briefly introduces the basic syntax of CREATE EXTERNAL TABLE without detailed explanations of the OPTIONS clause.

  • Different focus in existing files: While write_options.md does cover some options related to COPY and INSERT INTO operations, its focus is different from what's needed for comprehensive OPTIONS clause documentation.

  • Clearer organization: A new dedicated file would provide complete documentation for the OPTIONS clause without confusing users by mixing it with other command documentation.

@alamb
Contributor Author

alamb commented Apr 12, 2025

Hi @marvelshan -- I also took a look at the docs. It actually looks like the format options are largely documented, but the documentation could be improved.

I suggest:

  1. Rename write_options.md to format_options.md and make it clear that these options apply to both reading and writing. Also, link to that page clearly from the CREATE EXTERNAL TABLE section.

  2. Then, under each format's subheading (JSON format specific options, etc.), add an example such as:

```sql
CREATE EXTERNAL TABLE t
STORED AS JSON
LOCATION '/tmp/foo.dat'
OPTIONS('COMPRESSION' 'bzip2');
```

I also suggest raising the heading level of the JSON, CSV, etc. option sections so they appear in the table of contents.


Projects
None yet
Development

No branches or pull requests

2 participants