Releases: quixio/quix-streams
v3.15.0
What's Changed
💎 New streaming join: StreamingDataFrame.join_asof
With `StreamingDataFrame.join_asof()`, you can join two topics into a new stream where each left record is merged with the latest right record that has the same key and a timestamp less than or equal to the left record's timestamp.
This join is built with time-series enrichment use cases in mind, where the left side represents measurements and the right side represents events.
Some examples:
- Matching sensor measurements with events in the system.
- Joining purchases with the effective prices of goods.
```python
from datetime import timedelta

from quixstreams import Application

app = Application(...)

sdf_measurements = app.dataframe(app.topic("measurements"))
sdf_metadata = app.dataframe(app.topic("metadata"))

# Join records from the topic "measurements"
# with the latest effective records from the topic "metadata",
# using the "inner" join strategy and keeping the "metadata" records
# stored for 14 days in event time.
sdf_joined = sdf_measurements.join_asof(
    right=sdf_metadata,
    how="inner",  # Emit updates only if a match is found in the store.
    on_merge="keep-left",  # Prefer columns from the left dataframe if they overlap with the right.
    grace_ms=timedelta(days=14),  # Keep the state for 14 days (measured in event time, similar to windows).
)

if __name__ == '__main__':
    app.run()
```
Learn more about it on the Joins docs page.
By @gwaramadze and @daniil-quix in #874 #841
State improvements
- Enable fsync in RocksDB by default by @gwaramadze in #883
- RocksDBStorePartition: log the number of bytes written by @daniil-quix in #890
- Optimize state operations by @daniil-quix in #891
- Add a parameter to clear corrupted RocksDBs by @gwaramadze in #888.
To re-create a corrupted RocksDB state store automatically, use the `RocksDBOptions` object:
```python
from quixstreams import Application
from quixstreams.state.rocksdb import RocksDBOptions

app = Application(..., rocksdb_options=RocksDBOptions(on_corrupted_recreate=True))
```
Dependencies
- Bump types-protobuf from 6.30.2.20250503 to 6.30.2.20250516 by @dependabot in #885
Full Changelog: v3.14.1...v3.15.0
v3.14.1
What's Changed
- PostgreSQLSink: allow dynamic table name selection based on record by @tim-quix in #867
- Fix: deserialization propagation after sliding window by @gwaramadze in #875
Full Changelog: v3.14.0...v3.14.1
v3.14.0
💎 New features
StreamingDataFrame.concat() for combining multiple topics and branches
In this release, we introduce a new API to combine multiple StreamingDataFrames together and process the underlying data as a single stream - `StreamingDataFrame.concat()`.
You can use it to either combine different topics or branches of the same dataframe:
```python
from datetime import timedelta

from quixstreams import Application
from quixstreams.dataframe.windows import Mean

# Example: Aggregate e-commerce orders from different locations into one stream
# and calculate the average order amount in 1h windows.
app = Application(...)

# Define the topics with e-commerce orders
topic_uk = app.topic("orders-uk")
topic_de = app.topic("orders-de")

# Create StreamingDataFrames for each location
orders_uk = app.dataframe(topic_uk)
orders_de = app.dataframe(topic_de)

# Simulate the currency conversion step for each topic before concatenating them.
# convert_currency() is a user-defined function returning a conversion callable.
orders_uk["amount_usd"] = orders_uk["amount"].apply(convert_currency("GBP", "USD"))
orders_de["amount_usd"] = orders_de["amount"].apply(convert_currency("EUR", "USD"))

# Concatenate the orders from different locations into a new StreamingDataFrame.
# The new dataframe will have all records from both topics.
orders_combined = orders_uk.concat(orders_de)

# Calculate the average order size in USD within a 1h tumbling window.
orders_combined = (
    orders_combined.tumbling_window(timedelta(hours=1))
    .agg(avg_amount_usd=Mean("amount_usd"))
    .final()
)

# Print the aggregated results
orders_combined.print()

if __name__ == '__main__':
    app.run()
```
See the Concatenating Topics and Branching StreamingDataFrames pages to learn more about processing multiple topics and merging branches.
By @daniil-quix in #802 #866 #865 #857
New aggregations
Added new built-in aggregations to `quixstreams.dataframe.windows.aggregations`:
- `Earliest` and `Latest` - to store the values with the earliest or the latest timestamp within each window.
- `First` and `Last` - to store the first and last values within each window. These aggregations work based on the processing order and are agnostic of the timestamps.
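For example, they can be combined via the `.agg()` API introduced in v3.13.0. A minimal sketch (the "events" topic and "value" column are assumptions for illustration):
```python
from datetime import timedelta

from quixstreams import Application
from quixstreams.dataframe.windows.aggregations import Earliest, First, Last, Latest

app = Application(...)
sdf = app.dataframe(app.topic("events"))

sdf = (
    sdf.tumbling_window(timedelta(minutes=1))
    .agg(
        earliest=Earliest("value"),  # value with the earliest timestamp in the window
        latest=Latest("value"),      # value with the latest timestamp in the window
        first=First("value"),        # first value in processing order
        last=Last("value"),          # last value in processing order
    )
    .final()
)
```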
Other improvements
- Skip the repartition topic when using `StreamingDataFrame.group_by()` with single-partition topics by @ovv in #836
- Accept multiple columns in `StreamingDataFrame.contains()` by @ovv in #830
- Add `State.get_bytes` and `State.set_bytes` methods to persist raw bytes in state stores by @ovv in #834 (see the sketch after this list)
- Add `Quix__State__Dir` environment variable to configure the app's state directory in Quix Cloud deployments by @ovv in #844
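A minimal sketch of the raw-bytes state API inside a stateful callback (the key name, payload handling, and the `default=` argument are assumptions for illustration):
```python
from quixstreams import Application, State

app = Application(...)
sdf = app.dataframe(app.topic("events"))

def track_payload_size(value: dict, state: State) -> dict:
    # Read the previously stored raw bytes (empty if nothing was stored yet)
    previous = state.get_bytes("last_payload", default=b"")
    # Persist the current payload as raw bytes
    state.set_bytes("last_payload", repr(value).encode("utf-8"))
    return {**value, "previous_payload_size": len(previous)}

sdf = sdf.apply(track_payload_size, stateful=True)
```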
🦠 Bugfixes
- Fix SyntaxWarning by @gwaramadze in #827
🔌 Connectors
- Updated file source by @tim-quix in #787
- Add additional doc info about adding metadata with `ListSink` by @tim-quix in #814
- Connectors: update the default sink connect error to avoid error str clashes by @tim-quix in #831
- `MongoDBSink`: fix issue with dumping a headers dict that is None by @tim-quix in #829
- `MongoDBSink`: simplify auth style by @tim-quix in #854
- `InfluxDB3Sink`: improve timestamp type handling by @tim-quix in #811
- New connector - `InfluxDB3Source` by @tim-quix in #788
- `FileSource`: fix requiring all optional file source deps to use a file source by @tim-quix in #859
🔧 Internal
- Include py.typed file to mark quixstreams as a typed package by @ovv in #816
- clean up (and fix) doc build list by @tim-quix in #813
- Remove usage of Self type hint by @ovv in #826
- Fix some documentation hyperlinks + update CI workflow by @ovv in #835
- CI: Bump Python version to 3.13 by @ovv in #843
- Add topic_type to the Topic objects by @daniil-quix in #847
- Refactor `RowConsumer` and `RowProducer` by @daniil-quix in #861
- Bump mypy-extensions from 1.0.0 to 1.1.0 by @dependabot in #858
- Update pre-commit requirement from <4.1,>=3.4 to >=3.4,<4.3 by @dependabot in #840
- Bump testcontainers from 4.8.2 to 4.10.0 by @dependabot in #821
- Merge PausingManager into RowConsumer by @daniil-quix in #853
Dependencies
- Bump types-protobuf from 5.29.1.20250403 to 6.30.2.20250503 by @dependabot in #864
- Update protobuf requirement from <6.0,>=5.27.2 to >=5.27.2,<7.0 by @dependabot in #822
- Remove schema_registry extra by @gwaramadze in #825
- Update confluent-kafka[avro,json,protobuf,schemaregistry] requirement from <2.9,>=2.8.2 to >=2.8.2,<2.10 by @dependabot in #820
- Update pydantic-settings requirement from <2.9,>=2.3 to >=2.3,<2.10 by @dependabot in #851
- Update rich requirement from <14,>=13 to >=13,<15 by @dependabot in #839
- Update pydantic requirement from <2.11,>=2.7 to >=2.7,<2.12 by @dependabot in #837
Full Changelog: v3.13.1...v3.14.0
v3.13.1
🦠 Bugfixes
- Fix the initial recovery getting stuck when `auto_offset_reset="latest"` in #817 by @daniil-quix
Full Changelog: v3.13.0...v3.13.1
v3.13.0
💎 More flexible API for windowed aggregations
Introducing a new API to perform windowed aggregations via the `.agg()` method:
```python
from datetime import timedelta

from quixstreams import Application
from quixstreams.dataframe.windows import Min, Max, Count, Mean

app = Application(...)
sdf = app.dataframe(...)

sdf = (
    # Define a tumbling window of 10 minutes
    sdf.tumbling_window(timedelta(minutes=10))
    # Configure the aggregations to perform
    .agg(
        min_temp=Min("temperature"),
        max_temp=Max("temperature"),
        avg_temp=Mean("temperature"),
        total_events=Count(),
    )
    # Emit results only for closed windows
    .final()
)

# Output:
# {
#     'start': <window start>,
#     'end': <window end>,
#     'min_temp': 1,
#     'max_temp': 999,
#     'avg_temp': 34.32,
#     'total_events': 999,
# }
```
Main benefits of `.agg()`:
- The keyword parameters are mapped to the output columns (no `"value"` column in the window results anymore, unless you define it this way)
- Support for aggregations on specific columns
- Support for multiple aggregations on the same window
- You can also implement a custom aggregation class and re-use it across the code base (see the sketch after this list)
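As a rough illustration of the last point, a custom aggregation might look like this (a sketch assuming custom aggregations subclass `Aggregator` and implement `initialize`, `agg`, and `result`; check the Aggregations docs for the exact interface):
```python
from quixstreams.dataframe.windows.aggregations import Aggregator

class SumOfSquares(Aggregator):
    """Sum the squares of a column's values within each window."""

    def initialize(self):
        # Initial aggregated value for a new window
        return 0.0

    def agg(self, old, new, timestamp: int):
        # Extract the target column if one was passed, e.g. SumOfSquares("temperature")
        if self.column is not None:
            new = new[self.column]
        return old + new ** 2

    def result(self, value):
        # Value emitted in the window output
        return value

# Usage: sdf.tumbling_window(timedelta(minutes=10)).agg(ssq=SumOfSquares("temperature"))
```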
See the Aggregations docs to learn more about `.agg()`.
The previous API for windowed aggregations (e.g., `sdf.tumbling_window().reduce()`) remains available but is considered deprecated.
💎 New method StreamingDataFrame.fill to fill missing data
Added a new method to fill missing data - `StreamingDataFrame.fill()`.
Use it to adjust the schema of the incoming messages to the desired shape when you expect some keys to be missing in the values.
```python
sdf: StreamingDataFrame

# Input {"x": 1} becomes {"x": 1, "y": None}
sdf.fill("y")

# Input {"x": 1} becomes {"x": 1, "y": 0}
sdf.fill(y=0)
```
See the Missing Data docs to learn more.
By @gwaramadze in #792
💎 New Sink: Elasticsearch
Added a new sink to write data to Elasticsearch:
```python
from quixstreams import Application
from quixstreams.sinks.community.elasticsearch import ElasticsearchSink

app = Application(...)
sdf = app.dataframe(...)

# Configure the sink
elasticsearch_sink = ElasticsearchSink(
    url="http://localhost:9200",
    index="my_index",
)

sdf.sink(elasticsearch_sink)
```
Docs - ElasticsearchSink
🦠 Bugfixes
- Check if topic already exists before creating by @gwaramadze in #800
Dependencies
- Bump types-jsonschema from 4.23.0.20240813 to 4.23.0.20241208 by @dependabot in #684
- Bump types-requests from 2.32.0.20241016 to 2.32.0.20250328 by @dependabot in #808
Full Changelog: v3.12.0...v3.13.0
v3.12.0
What's Changed
- Gracefully handle `None` values in built-in windowed aggregations like `sum()` and `count()` by @gwaramadze in #789
- Add a flag to `InfluxDB3Sink` to automatically convert integers to floats by @tim-quix in #793
- Upgrade confluent-kafka to >=2.8.2,<2.9 by @gwaramadze in #799
- Include optional dependencies for confluent-kafka and fix anaconda dependencies by @gwaramadze in #804
Full Changelog: v3.11.0...v3.12.0
v3.11.0
What's Changed
💎 Stop conditions for Application.run()
Application.run() now accepts additional `count` and `timeout` parameters to stop the app once the condition is met.
It is intended for interactive debugging of applications on smaller portions of data.
How it works:
- `count` - the number of messages to process from the main SDF input topics (default 0 == infinite)
- `timeout` - the maximum time in seconds to wait for a new message to appear (default 0.0 == infinite)
Example:
```python
from quixstreams import Application

app = Application(...)

# ... some processing happening here ...

app.run(
    count=20,   # Process 20 messages from input topics
    timeout=5,  # Wait for 5s if fewer than 20 messages are available in the topics
)
```
For more info, see the Inspecting Data & Debugging docs page.
[breaking] 💥 Changes to Sink.flush()
This release introduces breaking changes to the `Sink.flush()` method and the Sinks API overall in order to accommodate future features like joins.
- Sinks are now flushed first on each checkpoint, before producing changelogs, to minimize potential over-production in case of a Sink failure. Previously, changelogs were produced first, and only then were Sinks flushed.
- `Sink.flush()` is now expected to flush all the accumulated data for all topic partitions, so custom Sink implementations need to be updated. Previously, `Sink.flush()` was called for each processed topic partition.
- `SinkBackpressureError` now pauses the whole assignment instead of only certain partitions.
By @daniil-quix in #786
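For custom Sinks, the update is roughly this shape (a sketch; `send_to_destination` is a hypothetical helper, and the exact `BaseSink.add()` signature should be checked against the Sinks docs):
```python
from quixstreams.sinks import BaseSink

class MyBufferingSink(BaseSink):
    """Illustrative sink buffering records across all topic partitions."""

    def __init__(self):
        super().__init__()
        self._buffer = []

    def add(self, value, key, timestamp, headers, topic, partition, offset):
        # Accumulate records from every assigned topic partition
        self._buffer.append(value)

    def flush(self):
        # New contract: flush ALL accumulated data on each checkpoint,
        # not once per topic partition as before.
        send_to_destination(self._buffer)
        self._buffer.clear()
```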
🛠️ Other changes
- Refactor recovery to support Stores belonging to multiple topics by @daniil-quix in #774
- windows: Split Aggregations and collectors into classes in #772
- tweak Source to have a defined setup method for easier simple use cases by @tim-quix in #783
- remove all source and sink setup abstracts by @tim-quix in #784
- InfluxDB v3 sink tags improvements by @tomas-quix in #795
⚠️ Upgrade considerations
In #774, the format of the changelog message headers was changed.
When updating an existing application running with `processing_guarantee="at-least-once"` (default), ensure the app is stopped normally and the last checkpoint is committed successfully before upgrading to the new version.
See #774 for more details.
Full Changelog: v3.10.0...v3.11.0
v3.10.0
What's Changed
💎 Window closing strategies
Previously, windows used the `"key"` closing strategy.
With this strategy, messages advance time and close only windows with the same message key.
It helps to capture more data when it's not aligned in time (e.g., some keys are produced irregularly), but the latest windows can remain unprocessed until a message with the same key is received.
In this release, we added a new `"partition"` strategy and an API to configure the strategy for tumbling and hopping windows (sliding windows don't support it yet).
With the `"partition"` closing strategy, messages advance time and close windows for the whole partition to which the key belongs.
It helps to close windows faster because different keys advance time, at the cost of potentially skipping more out-of-order messages.
Example:
```python
from datetime import timedelta

from quixstreams import Application

app = Application(...)
sdf = app.dataframe(...)

# Define a window with the "partition" closing strategy.
sdf = sdf.tumbling_window(timedelta(seconds=10)).sum().final(closing_strategy="partition")
```
Learn more about closing strategies in the docs - https://quix.io/docs/quix-streams/windowing.html#closing-strategies
Added by @quentin-quix in #747
💎 Connectors status callbacks
Sinks and Sources now accept optional `on_client_connect_success` and `on_client_connect_failure` callbacks and can trigger them to inform about the Connector status during setup (see the sketch below).
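A minimal sketch of wiring up these callbacks (`SomeSink` is a placeholder for any connector accepting them; the callback signatures shown here are assumptions):
```python
import logging

logger = logging.getLogger(__name__)

def on_connect_success():
    logger.info("Connector is set up successfully")

def on_connect_failure(exception: Exception):
    logger.error(f"Connector failed to set up: {exception}")

# `SomeSink` stands in for a real Sink or Source class
sink = SomeSink(
    on_client_connect_success=on_connect_success,
    on_client_connect_failure=on_connect_failure,
)
```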
🦠 Bugfixes
- fix bad timestamps in test_app by @tim-quix in #768
- Bound protobuf<6.0 in tests by @quentin-quix in #773
- Bugfix for recovering from exactly 1 changelog message by @tim-quix in #769
- print_table method handles non-dict values by @gwaramadze in #767
🛠️ Other changes
- Create topics eagerly the moment they are defined by @daniil-quix in #763
- Increase default timeout and retries for `Producer.produce` by @quentin-quix in #771
- Add rich license by @gwaramadze in #776
- update docs and tutorials based on connect callback addition by @tim-quix in #775
- typing: make state protocols and ABCs generic by @quentin-quix in #777
Full Changelog: v3.9.0...v3.10.0
v3.9.0
What's Changed
💎 Table-style printing of StreamingDataFrames
You can now examine the incoming data streams in a table-like format with the new `StreamingDataFrame.print_table()` feature.
For interactive terminals, it can print new rows one by one in a live mode with an artificial delay, allowing you to glance at the data stream easily.
For non-interactive environments (stdout, file, etc.) or if `live=False`, it will print rows in batches as soon as the data is available to the application.
This is an experimental feature, so feel free to submit an issue with your feedback 👍
See the docs to learn more about `StreamingDataFrame.print_table()`.
```python
sdf = app.dataframe(...)

# some SDF transformations happening here ...

# Print the last 5 records with metadata columns in live mode
sdf.print_table(size=5, title="My Stream", live=True)

# For wide datasets, limit columns to improve readability
sdf.print_table(
    size=5,
    title="My Stream",
    columns=["id", "name", "value"],
    column_widths={"name": 20},
)
```
Live output:
```
                              My Stream
┏━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ _key       ┃ _timestamp ┃ id     ┃ name                 ┃ value   ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ b'53fe8e4' │ 1738685136 │ 876    │ Charlie              │ 42.5    │
│ b'91bde51' │ 1738685137 │ 11     │ Alice                │ 18.3    │
│ b'6617dfe' │ 1738685138 │ 133    │ Bob                  │ 73.1    │
│ b'f47ac93' │ 1738685139 │ 244    │ David                │ 55.7    │
│ b'038e524' │ 1738685140 │ 567    │ Eve                  │ 31.9    │
└────────────┴────────────┴────────┴──────────────────────┴─────────┘
```
By @gwaramadze in #740, #760
Bugfixes
- ⚠️ Fix default state dir for Quix Cloud apps by @gwaramadze in #759.
  Please note that the state may be recovered to a different directory when updating existing deployments in Quix Cloud if `state_dir` is not set.
- [Issue #440] ignore errors in rmtree by @ulisesojeda in #753
- Fix `QuixPortalApiService` failing in multiprocessing environment by @daniil-quix in #755
Docs
- Add missing "how to install" section for `PandasDataFrameSource` by @daniil-quix in #751
New Contributors
- @ulisesojeda made their first contribution in #753
Full Changelog: v3.8.1...v3.9.0
v3.8.1
What's Changed
- New PandasDataFrameSource connector to stream data from pandas.DataFrames during development and debugging by @JotaBlanco and @daniil-quix in #748 (see the sketch after this list)
- Made logging of common Kafka ACL issues more helpful by providing potentially missing ACLs and topic names by @tim-quix in #742
- Fix docs for MongoDBSink by @tim-quix in #746
- Bump mypy from 1.13.0 to 1.15.0 by @dependabot in #744
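A rough sketch of how the new connector might be used (the import path and constructor parameters here are assumptions for illustration - see the connector docs for the exact API):
```python
import pandas as pd

from quixstreams import Application
from quixstreams.sources.community.pandas import PandasDataFrameSource

df = pd.DataFrame({
    "session_id": ["a", "a", "b"],
    "timestamp": [1700000000000, 1700000001000, 1700000002000],
    "value": [1.0, 2.0, 3.0],
})

app = Application(broker_address="localhost:9092")

# Stream the DataFrame rows as messages, using "session_id" as the message key
source = PandasDataFrameSource(
    df=df,
    key_column="session_id",
    timestamp_column="timestamp",
    name="pandas-source",
)

sdf = app.dataframe(source=source)
sdf.print()

if __name__ == "__main__":
    app.run()
```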
Full Changelog: v3.8.0...v3.8.1