docs: update ray integration and move schema evolution doc to a separate doc (#3530)

* Move `object store config` into a new page
* Update ray doc to include official lance sink / source
* Move `schema evolution` to separate doc
eddyxu authored Mar 11, 2025
1 parent c12fc3b commit eddb670
Showing 8 changed files with 1,069 additions and 1,059 deletions.
1 change: 1 addition & 0 deletions docs/conf.py
@@ -55,6 +55,7 @@ def setup(app):
"numpy": ("https://numpy.org/doc/stable/", None),
"pyarrow": ("https://arrow.apache.org/docs/", None),
"pandas": ("https://pandas.pydata.org/pandas-docs/stable/", None),
"ray": ("https://docs.ray.io/en/latest/", None),
}
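The added `"ray"` entry is what lets Sphinx roles such as `ray.data.read_lance` in the Ray page link to Ray's own API docs. A minimal sketch of the relevant `conf.py` fragment follows, assuming the `sphinx.ext.intersphinx` extension is enabled; the `extensions` list shown here is an assumption, since the diff does not display it.

```python
# Sketch of a docs/conf.py fragment (the extensions list is assumed, not shown in the diff).
extensions = ["sphinx.ext.intersphinx"]

# Each key names an external project; the tuple holds the base URL of its docs
# and None, which tells Sphinx to fetch objects.inv from that URL.
intersphinx_mapping = {
    "numpy": ("https://numpy.org/doc/stable/", None),
    "pyarrow": ("https://arrow.apache.org/docs/", None),
    "pandas": ("https://pandas.pydata.org/pandas-docs/stable/", None),
    "ray": ("https://docs.ray.io/en/latest/", None),
}
```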


4 changes: 3 additions & 1 deletion docs/index.rst
@@ -43,14 +43,16 @@ Preview releases receive the same level of testing as regular releases.
:maxdepth: 2

Quickstart <./notebooks/quickstart>
./read_and_write
./introduction/read_and_write
./introduction/schema_evolution

.. toctree::
:caption: Advanced Usage
:maxdepth: 1

Lance Format Spec <./format>
Blob API <./blob>
Object Store Configuration <./object_store>
Performance Guide <./performance>
Tokenizer <./tokenizer>
Extension Arrays <./arrays>
34 changes: 21 additions & 13 deletions docs/integrations/ray.rst
@@ -1,27 +1,35 @@
Lance ❤️ Ray
--------------------

Ray effortlessly scales up ML workloads to large distributed compute environments.
`Ray <https://www.anyscale.com/product/open-source/ray>`_ effortlessly scales up ML workloads to large distributed
compute environments.

`Ray Data <https://docs.ray.io/en/latest/data/data.html>`_ can be directly written in Lance format by using the
:class:`lance.ray.sink.LanceDatasink` class. For example:
Lance format is one of the official `Ray data sources <https://docs.ray.io/en/latest/data/api/input_output.html#lance>`_:

.. code-block:: bash
* Lance Data Source :py:meth:`ray.data.read_lance`
* Lance Data Sink :py:meth:`ray.data.Dataset.write_lance`

pip install pylance[ray]
.. testsetup::

shutil.rmtree("./alice_bob_and_charlie.lance", ignore_errors=True)

``Ray Data Dataset`` can be written to Lance format using the following code:

.. code-block:: python
.. testcode::

import ray
from lance.ray.sink import LanceDatasink

ray.init()

sink = LanceDatasink("s3://bucket/to/data.lance")
ray.data.range(10).map(
lambda x: {"id": x["id"], "str": f"str-{x['id']}"}
).write_datasink(sink)
data = [
{"id": 1, "name": "alice"},
{"id": 2, "name": "bob"},
{"id": 3, "name": "charlie"}
]
ray.data.from_items(data).write_lance("./alice_bob_and_charlie.lance")

# It can be read via lance directly
tbl = lance.dataset("./alice_bob_and_charlie.lance").to_table()
assert tbl == pa.Table.from_pylist(data)

# Or via ray.data.read_lance
pd_df = ray.data.read_lance("./alice_bob_and_charlie.lance").to_pandas()
assert tbl == pa.Table.from_pandas(pd_df)
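Because the diff view interleaves removed and added lines, the new example is easier to follow as a standalone sketch. The version below assumes `ray` and `pylance` are installed. `write_and_read` is a hypothetical helper name, not part of either library, and the results are sorted by `id` because row order after a distributed write is not assumed here.

```python
# Sample rows taken from the updated ray.rst example.
data = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
    {"id": 3, "name": "charlie"},
]


def write_and_read(path, rows):
    """Write rows to a Lance dataset with Ray, then read them back two ways.

    Hypothetical helper for illustration; not part of Lance or Ray.
    """
    # Imported inside the function so this module loads without Ray or Lance.
    import lance
    import ray

    # Sink: Ray Data writes the rows as a Lance dataset at `path`.
    ray.data.from_items(rows).write_lance(path)

    # Read back directly with Lance, as a PyArrow table.
    via_lance = lance.dataset(path).to_table().sort_by("id")

    # Read back through the Ray data source, as a pandas DataFrame.
    via_ray = ray.data.read_lance(path).to_pandas().sort_values("id")

    return via_lance, via_ray
```

Calling `write_and_read("./alice_bob_and_charlie.lance", data)` would mirror the documented round trip. The `testsetup` step in the diff removes that directory first, presumably so each run starts from a clean path.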