Commit f3212b1
GH-77265: Document NaN handling in statistics functions that sort or count (GH-94676) (#94726)
1 parent e5c8ad3 commit f3212b1

1 file changed: +29 -0 lines changed

Doc/library/statistics.rst

@@ -35,6 +35,35 @@ and implementation-dependent. If your input data consists of mixed types,
 you may be able to use :func:`map` to ensure a consistent result, for
 example: ``map(float, input_data)``.

+Some datasets use ``NaN`` (not a number) values to represent missing data.
+Since NaNs have unusual comparison semantics, they cause surprising or
+undefined behaviors in the statistics functions that sort data or that count
+occurrences. The functions affected are ``median()``, ``median_low()``,
+``median_high()``, ``median_grouped()``, ``mode()``, ``multimode()``, and
+``quantiles()``. The ``NaN`` values should be stripped before calling these
+functions::
+
+    >>> from statistics import median
+    >>> from math import isnan
+    >>> from itertools import filterfalse
+
+    >>> data = [20.7, float('NaN'),19.2, 18.3, float('NaN'), 14.4]
+    >>> sorted(data) # This has surprising behavior
+    [20.7, nan, 14.4, 18.3, 19.2, nan]
+    >>> median(data) # This result is unexpected
+    16.35
+
+    >>> sum(map(isnan, data)) # Number of missing values
+    2
+    >>> clean = list(filterfalse(isnan, data)) # Strip NaN values
+    >>> clean
+    [20.7, 19.2, 18.3, 14.4]
+    >>> sorted(clean) # Sorting now works as expected
+    [14.4, 18.3, 19.2, 20.7]
+    >>> median(clean) # This result is now well defined
+    18.75
+
+
 Averages and measures of central location
 -----------------------------------------
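
The added documentation only demonstrates ``median()``. The minimal sketch below applies the same strip-first approach to two of the other listed functions, ``multimode()`` and ``quantiles()``. It is an illustration under stated assumptions and not part of the commit: the helper name ``strip_nans`` is hypothetical, and the claim that distinct NaN objects are counted as separate values reflects CPython's dictionary-based counting.

```python
from itertools import filterfalse
from math import isnan
from statistics import median, multimode, quantiles


def strip_nans(values):
    """Hypothetical helper (not in the statistics module): drop NaN values."""
    return list(filterfalse(isnan, values))


nan = float('NaN')

# Every ordering or equality comparison involving NaN is False,
# which is why sorting and counting behave unexpectedly.
comparisons = (nan < 1.0, nan > 1.0, nan == nan)

# Counting: separately created NaN objects never compare equal,
# so each one is tallied as its own distinct value.
modes_with_nans = multimode([float('NaN'), float('NaN'), 1.0])

# Strip the NaNs first, then call the sorting-based functions.
data = [20.7, nan, 19.2, 18.3, nan, 14.4]
clean = strip_nans(data)
cut_points = quantiles(clean, n=4)  # three quartile cut points
middle = median(clean)
```

Stripping once and reusing ``clean`` keeps the count of missing values (``len(data) - len(clean)``) available for reporting alongside the statistics.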
