@@ -35,6 +35,35 @@ and implementation-dependent. If your input data consists of mixed types,
 you may be able to use :func:`map` to ensure a consistent result, for
 example: ``map(float, input_data)``.

+Some datasets use ``NaN`` (not a number) values to represent missing data.
+Since NaNs have unusual comparison semantics, they cause surprising or
+undefined behaviors in the statistics functions that sort data or that count
+occurrences. The functions affected are ``median()``, ``median_low()``,
+``median_high()``, ``median_grouped()``, ``mode()``, ``multimode()``, and
+``quantiles()``. The ``NaN`` values should be stripped before calling these
+functions::
+
+    >>> from statistics import median
+    >>> from math import isnan
+    >>> from itertools import filterfalse
+
+    >>> data = [20.7, float('NaN'), 19.2, 18.3, float('NaN'), 14.4]
+    >>> sorted(data)  # This has surprising behavior
+    [20.7, nan, 14.4, 18.3, 19.2, nan]
+    >>> median(data)  # This result is unexpected
+    16.35
+
+    >>> sum(map(isnan, data))  # Number of missing values
+    2
+    >>> clean = list(filterfalse(isnan, data))  # Strip NaN values
+    >>> clean
+    [20.7, 19.2, 18.3, 14.4]
+    >>> sorted(clean)  # Sorting now works as expected
+    [14.4, 18.3, 19.2, 20.7]
+    >>> median(clean)  # This result is now well defined
+    18.75
+
+
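As a supplementary sketch (not part of the patch itself), the snippet below illustrates why NaN values upset both sorting and counting: every ordering or equality comparison against a NaN is false, and two separately created NaN objects are counted as distinct values. The helper name `strip_nans` is hypothetical and introduced only for illustration; it does the same job as the `filterfalse(isnan, data)` call in the added documentation.

```python
from collections import Counter
from math import isnan, nan
from statistics import median

# Every ordering and equality comparison involving NaN is false, so a
# sort has no consistent position for it.
assert not (nan < 1.0)
assert not (nan > 1.0)
assert not (nan == nan)

# Two separately created NaN objects compare unequal, so counting-based
# functions see them as two different values.
first_nan, second_nan = float('nan'), float('nan')
distinct_nans = len(Counter([first_nan, second_nan]))


def strip_nans(values):
    """Hypothetical helper: return the non-NaN values as a list."""
    return [x for x in values if not isnan(x)]


data = [20.7, float('NaN'), 19.2, 18.3, float('NaN'), 14.4]
clean = strip_nans(data)
result = median(clean)
```

A list comprehension is used here instead of `filterfalse` purely as a matter of taste; either approach leaves the original list untouched and preserves the order of the remaining values.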
 Averages and measures of central location
 -----------------------------------------