PolarsGrouper is a Rust-based extension for Polars that provides efficient graph analysis capabilities, with a focus on component grouping and network analysis.
- `super_merger`: Easy-to-use wrapper for grouping connected components
- `super_merger_weighted`: Component grouping with weight thresholds
- Efficient implementation using Rust and Polars
- Works with both eager and lazy Polars DataFrames
- Shortest Path Analysis: Find shortest paths between nodes
- PageRank: Calculate node importance scores
- Betweenness Centrality: Identify key bridge nodes
- Association Rules: Discover item relationships and patterns
```bash
pip install polars-grouper

# For development:
python -m venv .venv
source .venv/bin/activate
maturin develop
```
The core functionality uses `super_merger` to identify connected components:
```python
import polars as pl
from polars_grouper import super_merger

df = pl.DataFrame({
    "from": ["A", "B", "C", "D", "E", "F"],
    "to": ["B", "C", "A", "E", "F", "D"],
    "value": [1, 2, 3, 4, 5, 6]
})

result = super_merger(df, "from", "to")
print(result)
```
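Conceptually, component grouping is a connected-components computation over the edge list. The real work happens in Rust, but the idea can be sketched in pure Python with a union-find (disjoint-set) structure:

```python
def connected_components(edges):
    """Assign a group id to every node, one id per connected component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b in edges:
        union(a, b)

    # Map each root to a stable, 1-based group id
    roots, groups = {}, {}
    for node in parent:
        r = find(node)
        roots.setdefault(r, len(roots) + 1)
        groups[node] = roots[r]
    return groups

edges = [("A", "B"), ("B", "C"), ("C", "A"),
         ("D", "E"), ("E", "F"), ("F", "D")]
print(connected_components(edges))  # A/B/C share one id, D/E/F another
```

This mirrors the example above: the six edges form two components, so `super_merger` would tag A, B, C with one group and D, E, F with another.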
For cases where edge weights matter:

```python
import polars as pl
from polars_grouper import super_merger_weighted

df = pl.DataFrame({
    "from": ["A", "B", "C", "D", "E"],
    "to": ["B", "C", "D", "E", "A"],
    "weight": [0.9, 0.2, 0.05, 0.8, 0.3]
})

result = super_merger_weighted(
    df,
    "from",
    "to",
    "weight",
    weight_threshold=0.3
)
print(result)
```
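The idea behind the weighted variant is that edges below the threshold are ignored when forming components, so weakly linked nodes fall into separate groups. A tiny illustration of the filtering step (whether the threshold is inclusive is an assumption here, not documented behavior):

```python
edges = [("A", "B", 0.9), ("B", "C", 0.2), ("C", "D", 0.05),
         ("D", "E", 0.8), ("E", "A", 0.3)]
threshold = 0.3

# Edges below the threshold do not connect components (inclusive cut assumed)
kept = [(a, b) for a, b, w in edges if w >= threshold]
print(kept)  # the B-C and C-D edges drop out, isolating C
```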
Find shortest paths between nodes:

```python
import polars as pl
from polars_grouper import calculate_shortest_path

df = pl.DataFrame({
    "from": ["A", "A", "B", "C"],
    "to": ["B", "C", "C", "D"],
    "weight": [1.0, 2.0, 1.0, 1.5]
})

paths = df.select(
    calculate_shortest_path(
        pl.col("from"),
        pl.col("to"),
        pl.col("weight"),
        directed=False
    ).alias("paths")
).unnest("paths")
print(paths)
```
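For intuition, shortest paths on a weighted graph like this one can be computed with Dijkstra's algorithm; a self-contained sketch (not the library's actual implementation):

```python
import heapq

def dijkstra(edges, source, directed=False):
    """Shortest distances from source over a (from, to, weight) edge list."""
    adj = {}
    for a, b, w in edges:
        adj.setdefault(a, []).append((b, w))
        if not directed:
            adj.setdefault(b, []).append((a, w))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, w in adj.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist

edges = [("A", "B", 1.0), ("A", "C", 2.0), ("B", "C", 1.0), ("C", "D", 1.5)]
print(dijkstra(edges, "A"))  # shortest distance A→D is 3.5
```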
Calculate node importance:

```python
import polars as pl
from polars_grouper import page_rank

df = pl.DataFrame({
    "from": ["A", "A", "B", "C", "D"],
    "to": ["B", "C", "C", "A", "B"]
})

rankings = df.select(
    page_rank(
        pl.col("from"),
        pl.col("to"),
        damping_factor=0.85
    ).alias("pagerank")
).unnest("pagerank")
print(rankings)
```
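As a point of reference, PageRank itself is a damped power iteration over the out-link structure; a minimal pure-Python sketch (not the library's Rust implementation):

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed (from, to) edge list."""
    nodes = sorted({n for e in edges for n in e})
    out = {n: [] for n in nodes}
    for a, b in edges:
        out[a].append(b)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            if out[node]:
                share = damping * rank[node] / len(out[node])
                for t in out[node]:
                    new[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes
                for t in nodes:
                    new[t] += damping * rank[node] / n
        rank = new
    return rank

edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("D", "B")]
ranks = pagerank(edges)
print(ranks)
```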
Discover item relationships:

```python
import polars as pl
from polars_grouper import graph_association_rules

transactions = pl.DataFrame({
    "transaction_id": [1, 1, 1, 2, 2, 3],
    "item_id": ["A", "B", "C", "B", "D", "A"],
    "frequency": [1, 2, 1, 1, 1, 1]
})

rules = transactions.select(
    graph_association_rules(
        pl.col("transaction_id"),
        pl.col("item_id"),
        pl.col("frequency"),
        min_support=0.1
    ).alias("rules")
).unnest("rules")
print(rules)
```
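The core quantities behind association rules are support (how often a pair co-occurs) and confidence (co-occurrence given the antecedent). A minimal sketch for item pairs — the library's exact metrics and output schema may differ:

```python
from itertools import combinations

def pair_rules(transactions, min_support=0.1):
    """Support and confidence for item pairs across (tid, item) records."""
    baskets = {}
    for tid, item in transactions:
        baskets.setdefault(tid, set()).add(item)
    n = len(baskets)
    item_count, pair_count = {}, {}
    for items in baskets.values():
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1
    rules = []
    for (a, b), c in pair_count.items():
        support = c / n
        if support >= min_support:
            # (antecedent, consequent, support, confidence)
            rules.append((a, b, support, c / item_count[a]))
    return rules

tx = [(1, "A"), (1, "B"), (1, "C"), (2, "B"), (2, "D"), (3, "A")]
for rule in pair_rules(tx):
    print(rule)
```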
Identify bridge nodes:

```python
import polars as pl
from polars_grouper import betweenness_centrality

df = pl.DataFrame({
    "from": ["A", "A", "B", "C", "D", "E"],
    "to": ["B", "C", "C", "D", "E", "A"]
})

centrality = df.select(
    betweenness_centrality(
        pl.col("from"),
        pl.col("to"),
        normalized=True
    ).alias("centrality")
).unnest("centrality")
print(centrality)
```
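Betweenness centrality counts how often a node sits on shortest paths between other nodes; the standard way to compute it is Brandes' algorithm. A sketch for an undirected, unweighted graph (assumptions: the library's normalization convention may differ):

```python
from collections import deque

def betweenness(edges, normalized=True):
    """Brandes' betweenness centrality, undirected and unweighted."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    nodes = list(adj)
    bc = {v: 0.0 for v in nodes}
    for s in nodes:
        stack, preds = [], {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}; sigma[s] = 1   # shortest-path counts
        dist = {v: -1 for v in nodes}; dist[s] = 0
        queue = deque([s])
        while queue:  # BFS from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in nodes}
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(nodes)
    for v in nodes:
        bc[v] /= 2.0  # undirected: each pair counted twice
        if normalized and n > 2:
            bc[v] /= (n - 1) * (n - 2) / 2.0
    return bc

print(betweenness([("A", "B"), ("B", "C")]))  # B bridges A and C
```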
The library is implemented in Rust for high performance:
- Efficient memory usage
- Fast computation for large graphs
- Seamless integration with Polars' lazy evaluation
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.