Merge pull request #11 from LaurentRDC/master
Preparation for release 0.2
LaurentRDC authored Aug 5, 2017
2 parents 7b28cb6 + 1c714b5 commit b210c9f
Showing 41 changed files with 943 additions and 443 deletions.
10 changes: 5 additions & 5 deletions .gitignore
@@ -10,11 +10,8 @@ __pycache__/
*.vs/
*.vscode/

# Protein DataBank files and cache folder
*.ent

# Structure file caches
*_cache/
# autogenerated documentation
docs/source/functions/

# Jupyter notebooks
notebooks/
@@ -33,6 +30,7 @@ lib64/
parts/
sdist/
var/

*.egg-info/
.installed.cfg
*.egg
@@ -103,4 +101,6 @@ ENV/

# PyCharm
.idea/

# others
benchmark.py
104 changes: 84 additions & 20 deletions README.rst
@@ -70,36 +70,100 @@ npstreams comes with some streaming functions built-in. Some examples:

All routines are documented in the `API Reference on readthedocs.io <http://npstreams.readthedocs.io>`_.

Making your own Streaming Functions
-----------------------------------
Example: Streaming Maximum
--------------------------

Let's create a streaming maximum function for a stream. First, we have to choose
how to handle NaNs:

* If we want to propagate NaNs, we should use :code:`numpy.maximum`
* If we want to ignore NaNs, we should use :code:`numpy.fmax`

Both of those functions are binary ufuncs, so we can use :code:`ireduce_ufunc`. We will
also want to make sure that anything in the stream that isn't an array will be made into one
using the :code:`array_stream` decorator.
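
The difference between the two ufuncs can be checked with plain NumPy. The snippet below is only an illustration; the arrays ``a`` and ``b`` are made-up inputs, not part of npstreams:

```python
import numpy as np

# Two illustrative inputs (hypothetical data), each with a NaN in a different slot
a = np.array([1.0, np.nan, 3.0])
b = np.array([2.0, 0.5, np.nan])

# numpy.maximum propagates NaNs: the result is NaN wherever either input is NaN
propagated = np.maximum(a, b)

# numpy.fmax ignores NaNs: the result is NaN only where both inputs are NaN
ignored = np.fmax(a, b)

print(propagated)
print(ignored)
```

Both are binary ufuncs, so either one can be applied pairwise to successive arrays of a stream.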

Putting it all together::

from npstreams import array_stream, ireduce_ufunc
from numpy import maximum, fmax

@array_stream
def imax(arrays, axis = -1, ignore_nan = False, **kwargs):
    """
    Streaming maximum along an axis.

    Parameters
    ----------
    arrays : iterable
        Stream of arrays to be compared.
    axis : int or None, optional
        Axis along which to compute the maximum. If None,
        arrays are flattened before reduction.
    ignore_nan : bool, optional
        If True, NaNs are ignored. Default is False.

    Yields
    ------
    online_max : ndarray
    """
    ufunc = fmax if ignore_nan else maximum
    yield from ireduce_ufunc(arrays, ufunc, axis = axis, **kwargs)

This will provide us with a streaming function, meaning that we can look at the progress
as it is being computed. We can also create a function that returns the max of the stream
like :code:`numpy.ndarray.max()` using the :code:`npstreams.last` function::

Any NumPy reduction function can be transformed into a streaming function using the
:code:`stream_reduce` function. For example::
from npstreams import last

from npstreams import stream_reduce
from numpy import prod
def smax(*args, **kwargs): # s for stream
    """
    Maximum of all arrays in a stream, along an axis.

def streaming_prod(stream, axis, **kwargs):
""" Streaming product along axis """
yield from stream_reduce(stream, npfunc = prod, axis = axis, **kwargs)
    Parameters
    ----------
    arrays : iterable
        Stream of arrays to be compared.
    axis : int or None, optional
        Axis along which to compute the maximum. If None,
        arrays are flattened before reduction.
    ignore_nan : bool, optional
        If True, NaNs are ignored. Default is False.

    Returns
    -------
    max : scalar or ndarray
    """
    return last(imax(*args, **kwargs))
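
To make the idea of a running result concrete, here is a minimal sketch that uses only NumPy and the standard library. It is not how npstreams is implemented; ``running_max`` and ``final`` are hypothetical helpers written for this illustration, and the reduction is element-wise across the arrays of the stream:

```python
from itertools import accumulate

import numpy as np


def running_max(arrays, ignore_nan=False):
    """Hypothetical helper: yield the element-wise maximum of all arrays seen so far."""
    ufunc = np.fmax if ignore_nan else np.maximum
    # accumulate applies the binary ufunc pairwise and yields every intermediate result
    yield from accumulate((np.asarray(a) for a in arrays), ufunc)


def final(stream):
    """Hypothetical stand-in for a 'last item' helper: exhaust the stream, keep the last item."""
    item = None
    for item in stream:
        pass
    return item


# Illustrative stream of three small arrays
stream = [np.array([1, 5, 2]), np.array([4, 0, 3]), np.array([2, 2, 9])]

steps = list(running_max(stream))   # one intermediate result per incoming array
result = final(running_max(stream))  # only the final maximum
```

Only the running result and the current array need to be held at any time, which is where the memory savings of a streaming approach come from.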

The above :code:`streaming_prod` will accumulate (and yield) the result of the operation
as arrays come in the stream.
Benchmark
---------

The two following snippets should return the same result::
Let's look at a simple benchmark. Let's compare two snippets that sum the following data::

from numpy import prod, stack
dense = stack(stream, axis = -1)
from_numpy = prod(dense, axis = 0) # numpy.prod = numpy.multiply.reduce
def stream():
    for _ in range(100):
        # np.int was an alias of the builtin int and was removed in NumPy 1.24
        yield np.empty((2048, 2048), dtype = int)

.. code::
Snippet 1: dense arrays only. Note that I count the creation of the dense array::

from npstreams import last
import numpy as np

stack = np.stack(list(stream()), axis = -1)
s = np.sum(stack, axis = -1)

On my machine, this takes 7 seconds and ~3 GB of memory.
Snippet 2: streaming arrays. This also includes the creation of the stream::

# snippet 2
import npstreams as nps
s = nps.last(nps.isum(stream(), axis = -1))

from_stream = last(streaming_prod(stream, axis = 0))
On my machine, this takes 8 seconds and 95 MB of memory.

However, :code:`streaming_prod` will work on 100 GB of data in a single line of code.
Bottom line: for raw speed, use NumPy. If you want to minimize memory usage, use streams.
If you want to process data in parallel, you'll want to minimize memory usage.
If your data is large (think 10 000 images), you are better off using streams as well.
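
As a rough sanity check on the memory figures above, the size of the array buffers can be estimated by hand. This is a back-of-the-envelope sketch, assuming 8-byte integers (the default integer size depends on the platform) and counting only array data, not interpreter or allocator overhead:

```python
# Assumptions: 8-byte integers, 100 arrays of shape (2048, 2048)
shape = (2048, 2048)
n_arrays = 100
itemsize = 8  # bytes per element (assumed)

bytes_per_array = shape[0] * shape[1] * itemsize

# Dense approach: every array is held in memory at once
dense_bytes = n_arrays * bytes_per_array

# Streaming approach: roughly one running sum plus the current array
streaming_bytes = 2 * bytes_per_array

print(f"dense:     {dense_bytes / 2**30:.3f} GiB")
print(f"streaming: {streaming_bytes / 2**20:.1f} MiB")
```

Under these assumptions the dense stack needs about 3.1 GiB for its data alone, while the streaming sum needs on the order of tens of MiB, which is in line with the measurements quoted above.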

Future Work
-----------
44 changes: 29 additions & 15 deletions docs/source/api.rst
@@ -35,16 +35,41 @@ Numerics
inanprod
isub

Others
------
Linear Algebra
--------------
.. autosummary::
:toctree: functions/

idot
iinner
itensordot
ieinsum

Control Flow
------------
.. autosummary::
:toctree: functions/

ipipe

Comparisons
-----------
.. autosummary::
:toctree: functions/

iany
iall
imax
imin

Stacking
--------
.. autosummary::
:toctree: functions/

istack
iflatten

Iterator Utilities
------------------
.. autosummary::
@@ -61,15 +86,4 @@ Parallelization
:toctree: functions/

pmap
preduce

General Stream reduction
------------------------

You can assemble your own streaming reduction using the following generator:

.. autofunction:: stream_reduce

This decorator will ensure that streams will be transformed into streams of NumPy arrays

.. autofunction:: array_stream
preduce
6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.chunked.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.iall.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.iany.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.iaverage.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.iflatten.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.imean.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.inanmean.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.inanprod.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.inanstd.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.inansum.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.inanvar.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.iprod.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.isem.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.istack.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.istd.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.isub.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.isum.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.ivar.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.last.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.linspace.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.multilinspace.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.pmap.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/source/functions/npstreams.pprod.rst

This file was deleted.
