ThreadFactory is a concurrency framework for Python 3.13+ (No-GIL). It provides custom Work future objects and thread-safe collections, laying the foundation for scalable parallel execution in modern Python.

Synaptic724/ThreadFactory

ThreadFactory

High-performance thread-safe (No-GIL-friendly) data structures and parallel operations for Python 3.13+.

NOTE
ThreadFactory is designed and tested against Python 3.13+ in No-GIL mode.
This library will only function on 3.13 and higher.
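
Before installing, it can help to confirm what the current interpreter offers. The sketch below uses only the standard library and is not part of ThreadFactory; `describe_runtime` is a hypothetical helper name. It relies on `sys._is_gil_enabled()`, which exists from Python 3.13 onward, and on the `Py_GIL_DISABLED` build variable.

```python
import sys
import sysconfig


def describe_runtime() -> dict:
    """Report whether this interpreter is 3.13+ and whether the GIL is active."""
    # Py_GIL_DISABLED is 1 on free-threaded builds and 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() was added in 3.13; older interpreters always have a GIL.
    gil_probe = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = gil_probe() if gil_probe is not None else True
    return {
        "version_ok": sys.version_info >= (3, 13),
        "free_threaded_build": free_threaded_build,
        "gil_enabled": gil_enabled,
    }


if __name__ == "__main__":
    print(describe_runtime())
```

A free-threaded build can still run with the GIL re-enabled at startup, which is why the sketch reports the build flag and the runtime state separately.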


All benchmark tests below are available if you clone the library and run the tests. See the Benchmark Details 🚀 for more benchmark stats.

🔥 Benchmark Results (10,000,000 ops, 10 producers / 10 consumers)

| Queue Type | Time (sec) | Throughput (ops/sec) | Notes |
| --- | --- | --- | --- |
| multiprocessing.Queue | 119.99 | ~83,336 | Not suited for thread-only workloads; incurs unnecessary overhead. |
| thread_factory.ConcurrentBuffer | 23.27 | ~429,651 | ⚡ Dominant here. Consistent and efficient under moderate concurrency. |
| thread_factory.ConcurrentQueue | 37.87 | ~264,014 | Performs solidly. Shows stable behavior even at higher operation counts. |
| collections.deque | 64.16 | ~155,876 | Suffers from contention. Simplicity comes at the cost of throughput. |

✅ Highlights:

  • ConcurrentBuffer outperformed multiprocessing.Queue by 96.72 seconds.
  • ConcurrentBuffer outperformed ConcurrentQueue by 14.6 seconds.
  • ConcurrentBuffer outperformed collections.deque by 40.89 seconds.

💡 Observations:

  • ConcurrentBuffer continues to be the best performer under moderate concurrency.
  • ConcurrentQueue performs consistently but is outperformed by ConcurrentBuffer.
  • All queues emptied correctly (final length = 0).
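
For a feel of how a run like this is structured, here is a minimal producer/consumer timing harness built only on the standard library. It is a sketch, not the repository's benchmark code: `run_benchmark` and its parameters are hypothetical, the queue is a plain `collections.deque` guarded by one lock, and `total_ops` is assumed to divide evenly among producers and among consumers.

```python
import threading
import time
from collections import deque


def run_benchmark(total_ops: int, producers: int, consumers: int):
    """Push total_ops items through a lock-guarded deque; return (seconds, final length)."""
    queue = deque()
    lock = threading.Lock()
    per_producer = total_ops // producers
    per_consumer = total_ops // consumers

    def produce() -> None:
        for i in range(per_producer):
            with lock:
                queue.append(i)

    def consume() -> None:
        taken = 0
        while taken < per_consumer:
            with lock:
                if queue:
                    queue.popleft()
                    taken += 1
                    continue
            time.sleep(0)  # queue was empty: yield so producers can run

    threads = [threading.Thread(target=produce) for _ in range(producers)]
    threads += [threading.Thread(target=consume) for _ in range(consumers)]
    started = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - started, len(queue)


if __name__ == "__main__":
    seconds, remaining = run_benchmark(100_000, producers=10, consumers=10)
    print(f"{100_000 / seconds:,.0f} ops/sec, final length = {remaining}")
```

Throughput is simply the operation count divided by the elapsed time, and the final length check mirrors the "emptied correctly" observation above.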

🔥 Benchmark Results (20,000,000 ops, 20 producers / 20 consumers)

| Queue Type | Time (sec) | Throughput (ops/sec) | Notes |
| --- | --- | --- | --- |
| multiprocessing.Queue | 249.92 | ~80,020 | Severely limited by thread-unfriendly IPC locks. |
| thread_factory.ConcurrentBuffer (10 shards) | 138.64 | ~144,270 | Solid under moderate producer-consumer balance. Benefits from shard windowing. |
| thread_factory.ConcurrentBuffer (20 shards) | 173.89 | ~115,010 | Too many shards increased internal complexity, leading to lower throughput. |
| thread_factory.ConcurrentQueue | 77.69 | ~257,450 | ⚡ Fastest overall. Ideal for large-scale multi-producer, multi-consumer scenarios. |
| collections.deque | 190.91 | ~104,771 | Still usable, but scalability is poor compared to specialized implementations. |

✅ Notes:

  • ConcurrentBuffer performs better with 10 shards than 20 shards at this concurrency level.
  • ConcurrentQueue continues to be the most stable performer under moderate-to-high thread counts.
  • multiprocessing.Queue remains unfit for threaded-only workloads due to its heavy IPC-oriented design.

💡 Observations:

  • Shard count tuning in ConcurrentBuffer is crucial: too many shards can reduce performance.
  • Bit-flip balancing in ConcurrentBuffer helps under moderate concurrency but hits diminishing returns with excessive sharding.
  • ConcurrentQueue is the general-purpose winner for most balanced threaded workloads.
  • At ~40 threads, doubling ConcurrentBuffer's shard count from 10 to 20 cuts throughput by ~20% (run time grows by ~25%) due to increased dequeue complexity.
  • All queues emptied correctly (final length = 0).
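
The cost of the extra shards can be checked directly from the two ConcurrentBuffer timings in the table. The short calculation below assumes the 138.64 s row is the 10-shard run and the 173.89 s row is the 20-shard run, as the notes imply; the variable names are illustrative only.

```python
OPS = 20_000_000  # total operations in the 20 producer / 20 consumer run


def throughput(ops: int, seconds: float) -> float:
    """Operations per second for a run that took `seconds`."""
    return ops / seconds


ten_shards = throughput(OPS, 138.64)     # assumed to be the 10-shard run
twenty_shards = throughput(OPS, 173.89)  # assumed to be the 20-shard run

# Relative loss in throughput versus relative growth in run time.
throughput_drop = (ten_shards - twenty_shards) / ten_shards
runtime_increase = (173.89 - 138.64) / 138.64

print(f"throughput drop:  {throughput_drop:.1%}")
print(f"runtime increase: {runtime_increase:.1%}")
```

The two percentages differ because one is measured against the faster run's throughput and the other against its run time: roughly a 20% loss in throughput corresponds to roughly a 25% longer run.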

🚀 Features

Concurrent Data Structures

ConcurrentBag

  • A thread-safe “multiset” collection that allows duplicates.
  • Methods like add, remove, discard, etc.
  • Ideal for collections where duplicate elements matter.
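
To illustrate what multiset semantics mean under concurrency, here is a minimal standard-library sketch. It is not ThreadFactory's implementation: `BagSketch` is a hypothetical class, and raising `KeyError` from `remove` on a missing item is an assumption made for the example.

```python
import threading
from collections import Counter


class BagSketch:
    """A lock-guarded multiset: each item is stored with its occurrence count."""

    def __init__(self) -> None:
        self._counts = Counter()
        self._lock = threading.Lock()

    def add(self, item) -> None:
        with self._lock:
            self._counts[item] += 1

    def remove(self, item) -> None:
        """Remove one occurrence; raise KeyError if the item is absent (assumed behavior)."""
        with self._lock:
            if self._counts[item] <= 0:  # Counter returns 0 for missing keys
                raise KeyError(item)
            self._counts[item] -= 1
            if self._counts[item] == 0:
                del self._counts[item]

    def discard(self, item) -> None:
        """Remove one occurrence if present; do nothing otherwise."""
        try:
            self.remove(item)
        except KeyError:
            pass

    def count(self, item) -> int:
        with self._lock:
            return self._counts[item]

    def __len__(self) -> int:
        with self._lock:
            return sum(self._counts.values())
```

Because every read-modify-write happens under the lock, concurrent `add` calls for the same item never lose an increment.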

ConcurrentDict

  • A thread-safe dictionary.
  • Supports typical dict operations (update, popitem, etc.).
  • Provides map, filter, and reduce for safe, bulk operations.
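
The value of bulk operations on a shared dictionary is that the whole pass sees one consistent snapshot. The sketch below shows that idea with the standard library only; `DictSketch` and the callback signatures (`func(key, value)`, `func(accumulator, (key, value))`) are assumptions for illustration, not the library's API.

```python
import threading
from functools import reduce as fold


class DictSketch:
    """A dict whose bulk operations each run under a single lock acquisition."""

    def __init__(self, initial=None) -> None:
        self._data = dict(initial or {})
        # Re-entrant so a callback may safely call back into the same object.
        self._lock = threading.RLock()

    def update(self, other) -> None:
        with self._lock:
            self._data.update(other)

    def map(self, func) -> dict:
        """Return a new plain dict with func(key, value) applied to every value."""
        with self._lock:
            return {key: func(key, value) for key, value in self._data.items()}

    def filter(self, predicate) -> dict:
        """Return a new plain dict with the entries for which predicate is true."""
        with self._lock:
            return {k: v for k, v in self._data.items() if predicate(k, v)}

    def reduce(self, func, initial):
        """Fold func(accumulator, (key, value)) over all entries."""
        with self._lock:
            return fold(func, self._data.items(), initial)


if __name__ == "__main__":
    stock = DictSketch({"apples": 3, "pears": 0})
    print(stock.filter(lambda name, amount: amount > 0))
```

Holding the lock for the whole pass trades some concurrency for consistency: no other thread can insert or delete entries halfway through a `map` or `reduce`.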

ConcurrentList

  • A thread-safe list supporting concurrent access and modification.
  • Slice assignment, in-place operators (+=, *=), and advanced operations (map, filter, reduce).

ConcurrentQueue

  • A thread-safe FIFO queue built atop collections.deque.
  • Outperforms a bare deque by up to 64% in our benchmark.
  • Supports enqueue, dequeue, peek, map, filter, and reduce.
  • Raises Empty when dequeue or peek is called on an empty queue.
  • Outperforms multiprocessing queues by over 400% in some cases; clone the repository and run the unit tests to verify.
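
The basic shape of a FIFO queue built on `collections.deque` can be sketched in a few lines. This is an illustration using the standard library, not the library's source: `QueueSketch` is hypothetical, and it borrows `queue.Empty` from the standard library as a stand-in for whichever `Empty` exception ThreadFactory defines.

```python
import threading
from collections import deque
from queue import Empty  # stand-in for the library's own Empty exception


class QueueSketch:
    """A FIFO queue: one lock guarding a deque."""

    def __init__(self) -> None:
        self._items = deque()
        self._lock = threading.Lock()

    def enqueue(self, item) -> None:
        with self._lock:
            self._items.append(item)

    def dequeue(self):
        """Remove and return the oldest item, or raise Empty."""
        with self._lock:
            if not self._items:
                raise Empty
            return self._items.popleft()

    def peek(self):
        """Return the oldest item without removing it, or raise Empty."""
        with self._lock:
            if not self._items:
                raise Empty
            return self._items[0]

    def __len__(self) -> int:
        with self._lock:
            return len(self._items)


if __name__ == "__main__":
    jobs = QueueSketch()
    jobs.enqueue("first")
    jobs.enqueue("second")
    print(jobs.peek(), jobs.dequeue(), len(jobs))
```

Checking for emptiness and popping inside the same lock acquisition is what makes `dequeue` safe; a separate "is it empty?" call followed by a pop would race with other consumers.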

ConcurrentStack

  • A thread-safe LIFO stack.
  • Supports push, pop, peek operations.
  • Ideal for last-in, first-out (LIFO) workloads.
  • Built on deque for fast appends and pops.
  • Performs similarly to ConcurrentQueue.

ConcurrentBuffer

  • A high-performance, thread-safe buffer using sharded deques for low-contention access.
  • Designed to handle massive producer/consumer loads with better throughput than standard queues.
  • Supports enqueue, dequeue, peek, clear, and bulk operations (map, filter, reduce).
  • Timestamp-based ordering ensures approximate FIFO behavior across shards.
  • Outperforms ConcurrentQueue by up to 60% at mid-range concurrency, with equal numbers of producer and consumer threads and 10 shards.
  • Automatically balances items across shards; ideal for parallel pipelines and low-latency workloads.
  • Works best with shard_count ≈ thread_count / 2, capped at 10 shards.
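
The combination of sharded deques and timestamp ordering described above can be sketched as follows. This is a simplified standard-library illustration, not ThreadFactory's algorithm: `ShardedBufferSketch`, its `shard_count` argument, and the way a shard is picked (hashing the thread id with a clock reading) are all assumptions, and `queue.Empty` stands in for the library's exception.

```python
import threading
import time
from collections import deque
from queue import Empty


class ShardedBufferSketch:
    """Approximate-FIFO buffer: items are spread over independently locked deques."""

    def __init__(self, shard_count: int = 4) -> None:
        if shard_count < 1:
            raise ValueError("shard_count must be at least 1")
        self._shards = [deque() for _ in range(shard_count)]
        self._locks = [threading.Lock() for _ in range(shard_count)]

    def enqueue(self, item) -> None:
        # Spread writers across shards so they rarely compete for the same lock.
        index = hash((threading.get_ident(), time.monotonic_ns())) % len(self._shards)
        with self._locks[index]:
            # Stamp inside the lock so each shard stays sorted by timestamp.
            self._shards[index].append((time.monotonic_ns(), item))

    def dequeue(self):
        while True:
            oldest_index = None
            oldest_stamp = None
            # Find the shard whose head item carries the oldest timestamp.
            for index, shard in enumerate(self._shards):
                with self._locks[index]:
                    if shard and (oldest_stamp is None or shard[0][0] < oldest_stamp):
                        oldest_index, oldest_stamp = index, shard[0][0]
            if oldest_index is None:
                raise Empty
            with self._locks[oldest_index]:
                shard = self._shards[oldest_index]
                if shard and shard[0][0] == oldest_stamp:
                    return shard.popleft()[1]
            # Another consumer removed that head first; scan again.

    def __len__(self) -> int:
        total = 0
        for index, shard in enumerate(self._shards):
            with self._locks[index]:
                total += len(shard)
        return total
```

The sketch also hints at why more shards are not always better: every `dequeue` visits each shard once, so the scan cost grows with the shard count while the contention savings level off.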

ConcurrentCollection

  • An unordered, thread-safe alternative to ConcurrentBuffer.
  • Optimized for high-concurrency scenarios where strict FIFO is not required.
  • Uses fair circular scans seeded by bit-mixed monotonic clocks to distribute dequeues evenly.
  • Benchmarks (10 producers / 20 consumers, 2M ops) show ~5.6% higher throughput than ConcurrentBuffer:
    • ConcurrentCollection: 108,235 ops/sec
    • ConcurrentBuffer: 102,494 ops/sec
    • Better scaling under thread contention.
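
One way to picture a circular scan seeded by a bit-mixed clock is shown below. This is a sketch under stated assumptions, not the library's code: `mix_bits` uses the SplitMix64 finalizer as a stand-in mixer, and `scan_order`, `UnorderedCollectionSketch`, and its method names are hypothetical.

```python
import threading
import time
from collections import deque
from queue import Empty
from typing import List, Optional

_MASK64 = (1 << 64) - 1


def mix_bits(value: int) -> int:
    """Scramble a 64-bit integer (SplitMix64 finalizer, used here as a stand-in)."""
    value &= _MASK64
    value = ((value ^ (value >> 30)) * 0xBF58476D1CE4E5B9) & _MASK64
    value = ((value ^ (value >> 27)) * 0x94D049BB133111EB) & _MASK64
    return value ^ (value >> 31)


def scan_order(shard_count: int, seed: Optional[int] = None) -> List[int]:
    """Visit every shard once, starting at a clock-derived position and wrapping."""
    if seed is None:
        seed = time.monotonic_ns()
    start = mix_bits(seed) % shard_count
    return [(start + offset) % shard_count for offset in range(shard_count)]


class UnorderedCollectionSketch:
    """Unordered buffer: dequeue takes from the first non-empty shard in scan order."""

    def __init__(self, shard_count: int = 4) -> None:
        self._shards = [deque() for _ in range(shard_count)]
        self._locks = [threading.Lock() for _ in range(shard_count)]

    def enqueue(self, item) -> None:
        index = scan_order(len(self._shards))[0]
        with self._locks[index]:
            self._shards[index].append(item)

    def dequeue(self):
        for index in scan_order(len(self._shards)):
            with self._locks[index]:
                if self._shards[index]:
                    return self._shards[index].popleft()
        raise Empty
```

Because no timestamps are compared, a consumer can stop at the first non-empty shard instead of inspecting all of them, which is the trade the text describes: less ordering, less work per dequeue.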

Parallel Utilities

ThreadFactory provides a collection of parallel programming utilities inspired by .NET's Task Parallel Library (TPL).

parallel_for

  • Executes a traditional for loop in parallel across multiple threads.
  • Accepts start, stop, and a body function to apply to each index.
  • Supports:
    • Automatic chunking to balance load.
    • Optional local_init / local_finalize for per-thread local state.
    • Optional stop_on_exception to abort on the first error.
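
The chunking and local-state ideas can be sketched with `concurrent.futures` from the standard library. This is not ThreadFactory's `parallel_for`: the function name, the `body(index, state)` calling convention when `local_init` is given, and creating the local state once per chunk rather than once per thread are all simplifying assumptions. `stop_on_exception` is left out.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_for_sketch(start, stop, body, *, max_workers=None,
                        local_init=None, local_finalize=None):
    """Call body for every index in [start, stop), split into chunks across threads."""
    workers = max_workers or os.cpu_count() or 1
    total = stop - start
    if total <= 0:
        return
    # Aim for about four chunks per worker so uneven chunks can be balanced out.
    chunk = max(1, -(-total // (workers * 4)))  # ceiling division
    spans = [(low, min(low + chunk, stop)) for low in range(start, stop, chunk)]

    def run_chunk(span):
        low, high = span
        state = local_init() if local_init is not None else None
        for index in range(low, high):
            if local_init is not None:
                body(index, state)
            else:
                body(index)
        if local_finalize is not None:
            local_finalize(state)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consuming the iterator re-raises the first exception raised by a chunk.
        for _ in pool.map(run_chunk, spans):
            pass


if __name__ == "__main__":
    squares = [0] * 10

    def store_square(i):
        squares[i] = i * i  # each index is written by exactly one task

    parallel_for_sketch(0, 10, store_square)
    print(squares)
```

Local state lets each chunk accumulate privately and publish once in `local_finalize`, so the shared result is touched a handful of times instead of once per index.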

parallel_foreach

  • Executes an action function on each item of an iterable in parallel.
  • Supports:
    • Both pre-known-length and streaming iterables.
    • Optional chunk_size to tune batch sizes.
    • Optional stop_on_exception to halt execution when an exception occurs.
    • Efficient when processing large datasets or streaming data without loading everything into memory.
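
Handling a streaming iterable without materializing it comes down to pulling fixed-size batches and limiting how many are in flight. The sketch below shows one way to do that with the standard library; it is not the library's implementation, and `parallel_foreach_sketch`, its default `chunk_size` of 64, and the "two batches per worker" cap are illustrative assumptions.

```python
import os
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice


def parallel_foreach_sketch(iterable, action, *, chunk_size=64, max_workers=None):
    """Apply action to every item, reading the iterable lazily in batches."""
    workers = max_workers or os.cpu_count() or 1
    iterator = iter(iterable)

    def run_batch(batch):
        for item in batch:
            action(item)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = set()
        while True:
            batch = list(islice(iterator, chunk_size))
            if not batch:
                break
            pending.add(pool.submit(run_batch, batch))
            # Cap the number of queued batches so a huge stream is never fully buffered.
            if len(pending) >= workers * 2:
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                for future in done:
                    future.result()  # re-raises an exception from the batch
        for future in pending:
            future.result()


if __name__ == "__main__":
    parallel_foreach_sketch((n for n in range(5)), print, chunk_size=2)
```

Only about `2 × workers × chunk_size` items are held in memory at any time, regardless of how long the input stream is.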

parallel_invoke

  • Executes multiple independent functions concurrently.
  • Accepts an arbitrary number of functions as arguments.
  • Returns a list of futures representing the execution of each function.
  • Optionally waits for all functions to finish (or fail).
  • Simplifies running unrelated tasks in parallel with easy error propagation.
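
The behavior described above maps closely onto standard-library futures. The following sketch is an approximation, not ThreadFactory's `parallel_invoke`: the function name and the `wait_for_all` flag are hypothetical, and it returns `concurrent.futures.Future` objects rather than the library's own future type.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def parallel_invoke_sketch(*functions, wait_for_all=True):
    """Run each zero-argument function on its own thread and return their futures."""
    if not functions:
        return []
    pool = ThreadPoolExecutor(max_workers=len(functions))
    futures = [pool.submit(function) for function in functions]
    # Already-submitted work still runs; the threads are released as tasks finish.
    pool.shutdown(wait=False)
    if wait_for_all:
        wait(futures)  # blocks until every function has finished or failed
    return futures


if __name__ == "__main__":
    results = parallel_invoke_sketch(lambda: "loaded config", lambda: 6 * 7)
    print([future.result() for future in results])
```

Waiting does not raise by itself; an exception from a function surfaces when `result()` is called on its future, or can be inspected with `exception()`.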

parallel_map

  • Parallel equivalent of Python's built-in map().
  • Applies a transform function to each item in an iterable concurrently.
  • Maintains the order of results.
  • Automatically splits the work into chunks for efficient multi-threaded execution.
  • Returns a fully materialized list of results.
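
Order preservation with chunked work can be sketched as below, again using only the standard library. `parallel_map_sketch` and its parameters are hypothetical and the chunk sizing is an assumption; the point is that `ThreadPoolExecutor.map` yields chunk results in submission order, so flattening them restores the input order.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_map_sketch(iterable, transform, *, max_workers=None, chunk_size=None):
    """Return [transform(x) for x in iterable], computed chunk by chunk on threads."""
    items = list(iterable)
    if not items:
        return []
    workers = max_workers or os.cpu_count() or 1
    if chunk_size is None:
        # About four chunks per worker (ceiling division), never smaller than 1.
        chunk_size = max(1, -(-len(items) // (workers * 4)))
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

    def transform_chunk(chunk):
        return [transform(item) for item in chunk]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the order the chunks were submitted.
        mapped = pool.map(transform_chunk, chunks)
        return [result for chunk in mapped for result in chunk]


if __name__ == "__main__":
    print(parallel_map_sketch(["a", "b", "c"], str.upper))
```

Chunks may finish in any order internally, but the caller only ever sees them in the original sequence.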

Notes

  • All utilities automatically default to max_workers = os.cpu_count() if unspecified.
  • chunk_size can be manually tuned or defaults to roughly 4 × #workers for balanced performance.
  • Exceptions raised inside tasks are properly propagated to the caller.

📖 Documentation

Full API reference and usage examples are available at:

➡️ https://threadfactory.readthedocs.io


⚙️ Installation

Option 1: Clone and Install Locally (Recommended for Development)

# Clone the repository
git clone https://github.com/Synaptic724/ThreadFactory.git
cd ThreadFactory

# Create a Python 3.13+ virtual environment (No-GIL / free-threaded build recommended)
python -m venv .venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows

# Install the library in editable mode
pip install -e .

Option 2: Install the library from PyPI

# Install the latest release from PyPI
pip install threadfactory
