docs: update ray integration and move schema evolution doc to a separate doc (#3530)
* Move `object store config` into a new page
* Update ray doc to include official lance sink / source
* Move `schema evolution` to separate doc
Showing 8 changed files with 1,069 additions and 1,059 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,27 +1,35 @@ | ||
Lance ❤️ Ray | ||
-------------------- | ||
|
||
Ray effortlessly scales up ML workloads to large distributed compute environments. | ||
`Ray <https://www.anyscale.com/product/open-source/ray>`_ effortlessly scales up ML workloads to large distributed | ||
compute environments. | ||
|
||
`Ray Data <https://docs.ray.io/en/latest/data/data.html>`_ can be directly written in Lance format by using the | ||
:class:`lance.ray.sink.LanceDatasink` class. For example: | ||
Lance format is one of the official `Ray data sources <https://docs.ray.io/en/latest/data/api/input_output.html#lance>`_: | ||
|
||
.. code-block:: bash | ||
* Lance Data Source :py:meth:`ray.data.read_lance` | ||
* Lance Data Sink :py:meth:`ray.data.Dataset.write_lance` | ||
|
||
pip install pylance[ray] | ||
.. testsetup:: | ||
|
||
shutil.rmtree("./alice_bob_and_charlie.lance", ignore_errors=True) | ||
|
||
``Ray Data Dataset`` can be written to Lance format using the following code: | ||
|
||
.. code-block:: python | ||
.. testcode:: | ||
|
||
import ray | ||
from lance.ray.sink import LanceDatasink | ||
|
||
ray.init() | ||
|
||
sink = LanceDatasink("s3://bucket/to/data.lance") | ||
ray.data.range(10).map( | ||
lambda x: {"id": x["id"], "str": f"str-{x['id']}"} | ||
).write_datasink(sink) | ||
data = [ | ||
{"id": 1, "name": "alice"}, | ||
{"id": 2, "name": "bob"}, | ||
{"id": 3, "name": "charlie"} | ||
] | ||
ray.data.from_items(data).write_lance("./alice_bob_and_charlie.lance") | ||
|
||
# It can be read via lance directly | ||
tbl = lance.dataset("./alice_bob_and_charlie.lance").to_table() | ||
assert tbl == pa.Table.from_pylist(data) | ||
|
||
# Or via ray.data.read_lance | ||
pd_df = ray.data.read_lance("./alice_bob_and_charlie.lance").to_pandas() | ||
assert tbl == pa.Table.from_pandas(pd_df) |
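The updated snippet relies on names (`lance`, `pa`, `shutil`) that are presumably supplied by the docs' shared doctest setup. As a rough standalone sketch of the same round trip, the version below spells out its imports and compares plain Python rows instead of Arrow tables. The helper names `sort_rows` and `roundtrip`, the temporary directory, and the `people.lance` file name are illustrative assumptions, not part of the commit; it also assumes a Ray release that ships `read_lance` / `write_lance` and an installed `pylance`.

```python
import shutil
import tempfile
from pathlib import Path

# Sample rows taken from the diff above.
ROWS = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
    {"id": 3, "name": "charlie"},
]


def sort_rows(rows):
    """Return rows ordered by id so comparisons do not depend on block order."""
    return sorted(rows, key=lambda r: r["id"])


def roundtrip(rows, path):
    """Hypothetical helper: write rows with Ray, read back via Lance and via Ray."""
    # Imported lazily so this module still loads without ray / pylance installed.
    import lance
    import ray

    # Write through the Ray Data sink.
    ray.data.from_items(rows).write_lance(path)

    # Read the same dataset through Lance directly and through the Ray source.
    via_lance = lance.dataset(path).to_table().to_pylist()
    via_ray = ray.data.read_lance(path).take_all()
    return sort_rows(via_lance), sort_rows(via_ray)


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    try:
        via_lance, via_ray = roundtrip(ROWS, str(workdir / "people.lance"))
        assert via_lance == sort_rows(ROWS)
        assert via_ray == sort_rows(ROWS)
    finally:
        # Clean up the temporary dataset directory.
        shutil.rmtree(workdir, ignore_errors=True)
```

Sorting by `id` before comparing is a defensive choice in this sketch: it avoids assuming that rows come back in the order they were written.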