Automate build and publish of the user guide #5500

Closed
2 tasks
alamb opened this issue Mar 7, 2023 · 7 comments · Fixed by #5670
Labels
documentation Improvements or additions to documentation enhancement New feature or request help wanted Extra attention is needed

Comments

@alamb
Contributor

alamb commented Mar 7, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The main DataFusion documentation pages at https://arrow.apache.org/datafusion are good because:

  1. They look good and follow best practices for open source project documentation
  2. They are clearly associated with the overall arrow.apache.org site

The documentation source is at https://github.com/apache/arrow-datafusion/tree/main/docs.

https://arrow.apache.org/datafusion is updated (typically by @andygrove) when new releases of arrow-datafusion are published to crates.io (for example, apache/arrow-site#313)

However, the current setup has a few notable issues:

  1. The site lags behind what is in the repo because it is only updated at each release (every 2 weeks at the time of writing)
  2. The content of the repo's landing page (README.md, https://github.com/apache/arrow-datafusion) has diverged from the user guide (I believe because developers want the latest content, and README.md is the only way to see it without building the docs locally)
  3. The manual update process is somewhat cumbersome

I think if the user guide were updated more promptly, people would be more likely to contribute to it as well.

Describe the solution you'd like
I would like some mechanism to see the latest, rendered version of the user guide as a webpage.

  • On every commit to main, the site would be updated with the latest version of the user guide
  • The main README.md page in arrow-datafusion would redirect to the hosted site
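
The "update the site on every commit to main" step could be sketched roughly as below. This is only an illustration under stated assumptions, not the workflow that was eventually merged: `mirror_docs`, `build_dir`, and `site_dir` are hypothetical names, and the function simply mirrors a directory of freshly built HTML into the directory that gets published, removing pages that no longer exist in the build (similar in spirit to `rsync --delete`).

```python
import shutil
from pathlib import Path


def mirror_docs(build_dir: Path, site_dir: Path) -> list[str]:
    """Mirror freshly built HTML from build_dir into site_dir.

    Hypothetical helper: both directory names are assumptions made for
    this sketch. Files in site_dir that are absent from build_dir are
    deleted; empty directories left behind are not cleaned up.
    """
    site_dir.mkdir(parents=True, exist_ok=True)

    # Relative paths of every file produced by the docs build.
    built = {p.relative_to(build_dir) for p in build_dir.rglob("*") if p.is_file()}

    # Collect the published files first, then delete the stale ones,
    # so the tree is not modified while it is being walked.
    published = [p for p in site_dir.rglob("*") if p.is_file()]
    for old in published:
        if old.relative_to(site_dir) not in built:
            old.unlink()

    # Copy (or overwrite) every built file into the published tree.
    for rel in built:
        target = site_dir / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(build_dir / rel, target)

    return sorted(rel.as_posix() for rel in built)
```

A CI job triggered by pushes to main would presumably build the docs first, call something like `mirror_docs(Path("docs/build/html"), Path("site"))` (both paths are placeholders), and then commit the result to whichever branch the site is served from.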

Bonus points for

Describe alternatives you've considered
Perhaps we could make a GitHub Pages site (https://pages.github.com/)?

Additional context

@alamb alamb added enhancement New feature or request documentation Improvements or additions to documentation help wanted Extra attention is needed labels Mar 7, 2023
@alamb
Contributor Author

alamb commented Mar 8, 2023

Leaving a note to myself (and maybe others) from @martin-g to check out how the apache-datafusion-python module works

Maybe we can make it like the apache-datafusion-python module

@martin-g
Member

Just to make sure I understand correctly: the DataFusion team prefers that commits to main update the main site immediately, right?

@alamb
Contributor Author

alamb commented Mar 17, 2023

> Just to make sure I understand correctly: the DataFusion team prefers that commits to main update the main site immediately, right?

This is my preference. Any thoughts, @andygrove, @Dandandan, or @thinkharderdev?

Eventually, perhaps when we have longer-term stable versions of DataFusion, hosting snapshots of the docs for older releases might be useful. But I think the most important first step is to get the most up-to-date docs published.

martin-g added a commit to martin-g/arrow-datafusion that referenced this issue Mar 17, 2023
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@martin-g
Member

The docs build fails for me with:

./build.sh
Running Sphinx v6.1.3
making output directory... done
[autosummary] generating autosummary for: contributor-guide/communication.md, contributor-guide/index.md, contributor-guide/quarterly_roadmap.md, contributor-guide/roadmap.md, contributor-guide/specification/index.rst, contributor-guide/specification/invariants.md, contributor-guide/specification/output-field-name-semantic.md, index.rst, user-guide/cli.md, user-guide/configs.md, ..., user-guide/sql/aggregate_functions.md, user-guide/sql/data_types.md, user-guide/sql/ddl.md, user-guide/sql/explain.md, user-guide/sql/index.rst, user-guide/sql/information_schema.md, user-guide/sql/scalar_functions.md, user-guide/sql/select.md, user-guide/sql/sql_status.md, user-guide/sql/subqueries.md
myst v1.0.0: MdParserConfig(commonmark_only=False, gfm_only=False, enable_extensions={'tasklist'}, disable_syntax=[], all_links_external=False, url_schemes=('http', 'https', 'mailto', 'ftp'), ref_domains=None, fence_as_directive=set(), number_code_blocks=[], title_to_header=False, heading_anchors=3, heading_slug_func=None, html_meta={}, footnote_transition=True, words_per_minute=200, substitutions={}, linkify_fuzzy_links=True, dmath_allow_labels=True, dmath_allow_space=True, dmath_allow_digits=True, dmath_double_inline=False, update_mathjax=True, mathjax_classes='tex2jax_process|mathjax_process|math|output_area', enable_checkboxes=False, suppress_warnings=[], highlight_code_blocks=True)
building [mo]: targets for 0 po files that are out of date
writing output... 
building [html]: targets for 26 source files that are out of date
updating environment: [new config] 26 added, 0 changed, 0 removed
reading sources... [100%] user-guide/sql/subqueries                                                                                                                                                                 
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [  3%] contributor-guide/communication                                                                                                                                                            
Theme error:
An error happened in rendering the page contributor-guide/communication.
Reason: UndefinedError("'logo' is undefined")
make: *** [Makefile:38: html] Error 2

https://github.com/martin-g/arrow-datafusion/actions/runs/4448158011/jobs/7810591628?pr=1

Do I need to do something more than https://github.com/martin-g/arrow-datafusion/pull/1/files#diff-d54d69dbb27e75dae25cb4b2384310cb57707e419377cf572d5cb0ecc1f16877R31-R43 ?

@martin-g
Member

I've temporarily removed the usage of the logo to be able to build: martin-g@a3a3107

martin-g added a commit to martin-g/arrow-datafusion that referenced this issue Mar 17, 2023
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@thinkharderdev
Contributor

> > Just to make sure I understand correctly: the DataFusion team prefers that commits to main update the main site immediately, right?
>
> This is my preference. Any thoughts, @andygrove, @Dandandan, or @thinkharderdev?
>
> Eventually, perhaps when we have longer-term stable versions of DataFusion, hosting snapshots of the docs for older releases might be useful. But I think the most important first step is to get the most up-to-date docs published.

Agreed

martin-g added a commit to martin-g/arrow-datafusion that referenced this issue Mar 21, 2023
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
martin-g added a commit to martin-g/arrow-datafusion that referenced this issue Mar 22, 2023
…support

Suggested-by @kou at apache#5670 (comment)

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
alamb pushed a commit that referenced this issue Mar 22, 2023
* Fixes #5500 - Add a Github Actions workflow that builds the docs

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

* Change target branch to "main"

Co-authored-by: Sutou Kouhei <kou@cozmixng.org>

* Use rsync to copy the new content

Co-authored-by: Sutou Kouhei <kou@cozmixng.org>

* Issue #5500 - Add a new line at the bottom of .asf.yaml

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

* Issue #5500 - Add .nojekyll to explicitly disable Github Pages support

Suggested-by @kou at #5670 (comment)

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

---------

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
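
The `.nojekyll` step mentioned in the commit message above can be illustrated with a small sketch. As far as I know, an empty `.nojekyll` file at the root of a published tree is the conventional marker telling GitHub Pages not to run Jekyll over the content, which matters for Sphinx output because directories such as `_static` start with an underscore. The helper name `ensure_nojekyll` and the `site_dir` argument are hypothetical and not taken from the merged workflow.

```python
from pathlib import Path


def ensure_nojekyll(site_dir: Path) -> Path:
    """Create an empty .nojekyll marker at the root of site_dir.

    Hypothetical helper for illustration only. Calling it more than
    once is harmless because touch() keeps an existing file.
    """
    site_dir.mkdir(parents=True, exist_ok=True)
    marker = site_dir / ".nojekyll"
    marker.touch(exist_ok=True)
    return marker
```

In a publish script this would run after the built HTML has been copied into place, so that the marker is not lost if the copy step deletes files missing from the build.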
@alamb
Contributor Author

alamb commented Mar 22, 2023

I am happy to report I have tested this and confirmed that the process is working as expected!

Here is an example PR #5684

And the content has appeared at https://arrow.apache.org/datafusion/user-guide/introduction.html 🎉


kou pushed a commit to apache/arrow-site that referenced this issue Mar 22, 2023
…thon` (#337)

Per @kou 's suggestion
#336 (comment)

We are now serving datafusion content from the datafusion-repo -- see
apache/datafusion#5500