Text Extensions for Pandas includes extension types for representing spans and tensors inside Pandas DataFrames. This section describes the Python classes that implement these types.

The :class:`SpanDtype` extension data type stores span data in a Pandas Series. Each span is represented by begin and end character offsets into a target document, and these offsets are held internally in dense NumPy arrays for efficient storage.
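
Conceptually, this layout can be pictured with plain NumPy. The sketch below is an illustrative model only, not the library's internal code: the example text and offsets are made up, and ``covered_text`` is a hypothetical helper standing in for the accessors that :class:`SpanArray` provides.

.. code-block:: python

   import numpy as np

   # Illustrative model only: a column of spans over one target document,
   # stored as two dense integer arrays of character offsets.
   text = "Text Extensions for Pandas adds spans to DataFrames."
   begins = np.array([0, 5, 20], dtype=np.int32)  # made-up example offsets
   ends = np.array([4, 15, 26], dtype=np.int32)

   def covered_text(text, begins, ends):
       """Hypothetical helper: return the substring covered by each span."""
       return [text[b:e] for b, e in zip(begins, ends)]

   # Per-span computations are vectorized over the dense arrays.
   lengths = ends - begins

   print(covered_text(text, begins, ends))  # ['Text', 'Extensions', 'Pandas']
   print(lengths.tolist())                  # [4, 10, 6]

In this model, keeping the offsets in fixed-width integer arrays, rather than one Python object per span, is what lets whole-column operations such as the length computation run as single NumPy calls.
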

.. autoclass:: text_extensions_for_pandas.SpanDtype

.. autoclass:: text_extensions_for_pandas.SpanArray
   :members:

   .. autoclasstoc::

.. autoclass:: text_extensions_for_pandas.Span
   :members:

   .. autoclasstoc::

The :class:`TokenSpanDtype` extension data type is similar to :class:`SpanDtype`, except that it represents spans using begin and end offsets into the tokens of a target document. These tokens are stored in a (shared) :class:`SpanArray` object.
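
To illustrate how token offsets relate to character offsets, here is a minimal NumPy model, again not the library's implementation. The two token-boundary arrays stand in for the shared :class:`SpanArray` of tokens; all data is made up, and the model assumes every token span covers at least one token.

.. code-block:: python

   import numpy as np

   # Illustrative model only, with made-up data.
   text = "Pandas stores tensors too."
   # Character offsets of each token; these play the role of the shared
   # token SpanArray: "Pandas", "stores", "tensors", "too", "."
   token_begins = np.array([0, 7, 14, 22, 25])
   token_ends = np.array([6, 13, 21, 25, 26])

   # Two token spans as [begin_token, end_token) ranges: tokens 0-1 and 2-3.
   begin_tokens = np.array([0, 2])
   end_tokens = np.array([2, 4])

   # A token span starts where its first token begins and ends where its
   # last token (end_token - 1) ends. Assumes non-empty token spans.
   char_begins = token_begins[begin_tokens]
   char_ends = token_ends[end_tokens - 1]

   spans_text = [text[b:e] for b, e in zip(char_begins, char_ends)]
   print(spans_text)  # ['Pandas stores', 'tensors too']

Because every token span in this model refers to the same token arrays, the token boundaries are stored once, however many spans point into them.
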

.. autoclass:: text_extensions_for_pandas.TokenSpanDtype

.. autoclass:: text_extensions_for_pandas.TokenSpanArray
   :members:

   .. autoclasstoc::

.. autoclass:: text_extensions_for_pandas.TokenSpan
   :members:

   .. autoclasstoc::


The :class:`TensorDtype` extension data type stores a tensor in each row of a Pandas Series. For efficiency, all of the tensors in a Series are stored together in a single NumPy array.
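
The single-array layout can be sketched in plain NumPy as follows. This is an illustrative model, not the library's code, and the example tensors are made up. Note that ``np.stack`` requires every row's tensor to have the same shape.

.. code-block:: python

   import numpy as np

   # Illustrative model only: one 2x2 tensor per Series row (made-up data),
   # packed into a single array whose first axis indexes the rows.
   row_tensors = [np.eye(2), np.ones((2, 2)), np.zeros((2, 2))]
   storage = np.stack(row_tensors)  # shape (3, 2, 2)

   # Row i is a view into the shared array, not a separate copy.
   second_row = storage[1]

   # Whole-column operations become single vectorized NumPy calls.
   row_sums = storage.sum(axis=(1, 2))
   print(storage.shape, row_sums)  # (3, 2, 2) [2. 4. 0.]
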

.. autoclass:: text_extensions_for_pandas.TensorDtype

.. autoclass:: text_extensions_for_pandas.TensorArray
   :members:

   .. autoclasstoc::

.. autoclass:: text_extensions_for_pandas.TensorElement
   :members:

   .. autoclasstoc::