Added `filter_table_by_query` #894

srivarra · 2025-03-03T07:22:06Z

Added filtering by a table query as discussed in #626. Added both a standalone function sd.filter_table_by_query and a method sd.SpatialData.filter_table_by_query.

Function signature

class SpatialData:
	...

    def filter_by_table_query(
        self,
        table_name: str,
        filter_tables: bool = True,
        elements: list[str] | None = None,
        obs_expr: Predicates | None = None,
        var_expr: Predicates | None = None,
        x_expr: Predicates | None = None,
        obs_names_expr: Predicates | None = None,
        var_names_expr: Predicates | None = None,
        layer: str | None = None,
        how: Literal["left", "left_exclusive", "inner", "right", "right_exclusive"] = "right",
    ) -> SpatialData:

sd.filter_by_table_query is the same, but instead of self, you have to provide the SpatialData object of interest.

What expressions can you use?

Several methods are supported by narwhals. As long as the method doesn't aggregate.
- I know that the following work: >,>=,<,<=, ==, is_in,
- And from Expr.str contains, starts_with, ends_with work.

What parts can you filter on?

You can filter on the obs and var DataFrame attributes of AnnData.

You can filter on obs_names and var_names. (uses an.obs_names, and an.var_names instead of an.col)

You can filter on the expression matrix X w.r.t layers as well.

Some Examples

# Using the mibitof dataset cause it's small and has a table which covers multiple spatialdata elements.

import spatialdata as sd
import annsel as an
from upath import UPath

mibitof_path = UPath("~/Downloads/mibitof-dataset.zarr")

sdata = sd.read_zarr(mibitof_path)

sdata

SpatialData Repr

SpatialData object, with associated Zarr store: [/Users/srivarra/Downloads/mibitof-dataset.zarr](https://file+.vscode-resource.vscode-cdn.net/Users/srivarra/Downloads/mibitof-dataset.zarr)
├── Images
│     ├── 'point8_image': DataArray[cyx] (3, 1024, 1024)
│     ├── 'point16_image': DataArray[cyx] (3, 1024, 1024)
│     └── 'point23_image': DataArray[cyx] (3, 1024, 1024)
├── Labels
│     ├── 'point8_labels': DataArray[yx] (1024, 1024)
│     ├── 'point16_labels': DataArray[yx] (1024, 1024)
│     └── 'point23_labels': DataArray[yx] (1024, 1024)
└── Tables
      └── 'table': AnnData (3309, 36)
with coordinate systems:
    ▸ 'point8', with elements:
        point8_image (Images), point8_labels (Labels)
    ▸ 'point16', with elements:
        point16_image (Images), point16_labels (Labels)
    ▸ 'point23', with elements:
        point23_image (Images), point23_labels (Labels)

For context here is what the table looks like:

AnnData object with n_obs × n_vars = 3309 × 36
    obs: 'row_num', 'point', 'cell_id', 'X1', 'center_rowcoord', 'center_colcoord', 'cell_size', 'category', 'donor', 'Cluster', 'batch', 'library_id'
    uns: 'spatialdata_attrs'
    obsm: 'X_scanorama', 'X_umap', 'spatial'

Filter with respect the donor "21d7", and filter var_names where we have "ASCT2", "ATP5A" and any marker that starts with "CD".

sd.filter_by_table_query(
    sdata,
    table_name="table",
    obs_expr=an.col("donor") == "21d7",
    var_names_expr=(
        an.var_names.is_in(["ASCT2", "ATP5A"])
        | an.var_names.str.starts_with("CD")
    ),
    x_expr=None,
)

Output

SpatialData object
├── Labels
│     └── 'point23_labels': DataArray[yx] (1024, 1024)
└── Tables
      └── 'table': AnnData (1241, 14)
with coordinate systems:
    ▸ 'point23', with elements:
        point23_labels (Labels)

Filter by batches "0" and "1".

sdata.filter_by_table_query(
    table_name="table",
    obs_expr=an.col("batch").is_in(["1", "0"]),
)

Output

SpatialData object
├── Labels
│     ├── 'point8_labels': DataArray[yx] (1024, 1024)
│     └── 'point23_labels': DataArray[yx] (1024, 1024)
└── Tables
      └── 'table': AnnData (2286, 36)
with coordinate systems:
    ▸ 'point8', with elements:
        point8_labels (Labels)
    ▸ 'point23', with elements:
        point23_labels (Labels)

Filter by obs_names which start with "9"

sd.filter_by_table_query(
    sdata,
    table_name="table",
    obs_names_expr=an.obs_names.str.starts_with("9")
)

Output

SpatialData object
├── Labels
│     └── 'point8_labels': DataArray[yx] (1024, 1024)
└── Tables
      └── 'table': AnnData (624, 36)
with coordinate systems:
    ▸ 'point8', with elements:
        point8_labels (Labels)

Note that tuples of Expressions applies an & operator

sd.filter_by_table_query(
    sdata,
    table_name="table",
    var_names_expr=(an.var_names.str.contains("CD"), an.var_names == "CD8"),
    x_expr=None,
)

Output

SpatialData object
├── Labels
│     ├── 'point8_labels': DataArray[yx] (1024, 1024)
│     ├── 'point16_labels': DataArray[yx] (1024, 1024)
│     └── 'point23_labels': DataArray[yx] (1024, 1024)
└── Tables
      └── 'table': AnnData (3309, 1)
with coordinate systems:
    ▸ 'point8', with elements:
        point8_labels (Labels)
    ▸ 'point16', with elements:
        point16_labels (Labels)
    ▸ 'point23', with elements:
        point23_labels (Labels)

You can make it as complicated as you want.

sd.filter_by_table_query(
    sdata,
    table_name="table",
    # Filter observations (rows) based on multiple conditions
    obs_expr=(
        # Cells from donor 21d7 OR 90de
        an.col("donor").is_in(["21d7", "90de"])
        # AND cells with size greater than 400
        & (an.col("cell_size") > 400)
        # AND cells that are either Epithelial or contain "Tcell" in their cluster name
        & (an.col("Cluster") == "Epithelial")
        | (an.col("Cluster").str.contains("Tcell"))
    ),
    # Filter variables (columns) based on multiple conditions
    var_names_expr=(
        # Select columns that start with CD
        an.var_names.str.starts_with("CD")
        # OR columns that contain "ATP"
        | an.var_names.str.contains("ATP")
        # OR specific columns
        | an.var_names.is_in(["ASCT2", "PKM2", "SMA"])
    ),
    # Filter based on expression values
    x_expr=(
        # Keep cells where ASCT2 is greater than 0.1
        (an.col("ASCT2") > 0.1)
        # AND less than 2 for ASCT2
        & (an.col("ASCT2") < 2)
    ),
    # Additional parameters
    how="right",
    elements=["point23_labels", "point8_labels"],
)

Output

SpatialData object
├── Labels
│     ├── 'point8_labels': DataArray[yx] (1024, 1024)
│     └── 'point23_labels': DataArray[yx] (1024, 1024)
└── Tables
      └── 'table': AnnData (268, 17)
with coordinate systems:
    ▸ 'point8', with elements:
        point8_labels (Labels)
    ▸ 'point23', with elements:
        point23_labels (Labels)

codecov · 2025-03-03T07:29:53Z

Codecov Report

Attention: Patch coverage is 44.44444% with 5 lines in your changes missing coverage. Please review.

Project coverage is 92.05%. Comparing base (62356a2) to head (3140fc0).

Files with missing lines	Patch %	Lines
src/spatialdata/_core/query/relational_query.py	40.00%	3 Missing ⚠️
src/spatialdata/_core/spatialdata.py	50.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #894      +/-   ##
==========================================
- Coverage   92.11%   92.05%   -0.06%     
==========================================
  Files          48       48              
  Lines        7429     7438       +9     
==========================================
+ Hits         6843     6847       +4     
- Misses        586      591       +5

Files with missing lines	Coverage Δ
src/spatialdata/__init__.py	`96.42% <ø> (ø)`
src/spatialdata/_core/spatialdata.py	`91.29% <50.00%> (-0.17%)`	⬇️
src/spatialdata/_core/query/relational_query.py	`90.59% <40.00%> (-0.55%)`	⬇️

added filter_table_by_query

3140fc0

added tests

7b9ec9b

srivarra marked this pull request as draft March 6, 2025 05:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added `filter_table_by_query` #894

Added `filter_table_by_query` #894

srivarra commented Mar 3, 2025 •

edited

Loading

codecov bot commented Mar 3, 2025 •

edited

Loading

Added filter_table_by_query #894

Are you sure you want to change the base?

Added filter_table_by_query #894

Conversation

srivarra commented Mar 3, 2025 • edited Loading

Some Examples

codecov bot commented Mar 3, 2025 • edited Loading

Codecov Report

Added `filter_table_by_query` #894

Added `filter_table_by_query` #894

srivarra commented Mar 3, 2025 •

edited

Loading

codecov bot commented Mar 3, 2025 •

edited

Loading