3.2 Expression contexts.py
# -*- coding: utf-8 -*-
"""
Created on Thu Nov 24 23:09:25 2022
@author: xuanQS
"""
import polars as pl
import numpy as np
np.random.seed(12)
df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names": ["foo", "ham", "spam", "egg", None],
        "random": np.random.rand(5),
        "groups": ["A", "A", "B", "C", "B"],
    }
)
df
"""
You cannot use an expression just anywhere.
An expression needs a context.
The available contexts are:
selection: df.select([..])
groupby aggregation: df.groupby(..).agg([..])
hstack / add columns: df.with_columns([..])
Note: the [..] holds the expressions; the method that receives them
(select, agg, with_columns) is the context.
"""
#%% 3.2 Expression contexts
#%%% Syntactic sugar
"""
The reason a context is required
is that you are actually using the Polars lazy API,
even when you call it in eager mode. For instance, this snippet:
df.groupby("foo").agg([pl.col("bar").sum()])
actually desugars to:
(df.lazy().groupby("foo").agg([pl.col("bar").sum()])).collect()
In other words, the eager call is automatically translated
into its lazy equivalent.
This allows Polars to push the expression into
the query engine,
do optimizations, and cache intermediate results.
Rust differs from Python somewhat in this respect.
Where Python's eager mode is little more than
a thin veneer over the lazy API,
Rust's eager mode is closer to an implementation detail,
and isn't really recommended for end-user use.
It is possible that the eager API in Rust will be
scoped private sometime in the future.
Therefore, for the remainder of this document,
assume that the Rust examples are using the lazy API.
"""
#%%% Select context
#%%%% Select context
"""
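The expressions in this context must produce Series that
are all the same length or have a length of 1.
A Series of length 1 will be broadcast
to match the height of the DataFrame.
Notes:
select operates on columns.
Each expression produces a Series; the lengths must either
all be equal or be 1.
A result of length 1 is expanded to the height (row count)
of the DataFrame built from the results.
(The results are assembled into a DataFrame,
i.e. type(out) is a DataFrame.)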
"""
out = df.select(
    [
        pl.sum("nrs"),  # sum; the single value is broadcast automatically
        pl.col("names").sort(),
        pl.col("names").first().alias("first name"),
        # first() returns the first element
        (pl.mean("nrs") * 10).alias("10xnrs"),
    ]
)
out
type(out)
#%%%% Adding new columns: df.with_columns
"""
Adding columns to a DataFrame using with_columns
is also the selection context.
"""
out = df.with_columns(
    [
        pl.sum("nrs").alias("nrs_sum"),
        pl.col("random").count().alias("count"),
    ]
)
out
# The example code contains an error
#%%% Groupby context
#%%%% Groupby context: df.groupby().agg()
"""
In the groupby context expressions work on groups and
thus may yield results of any length
(a group may have many members).
"""
out = df.groupby("groups").agg(
    [
        pl.sum("nrs"),
        # sum nrs within each group
        pl.col("random").count().alias("count"),
        # count the members of each group (via the random column)
        # sum random where names is not null:
        pl.col("random").filter(
            pl.col("names").is_not_null()
        ).sum().suffix("_sum"),
        # 1. select the random column
        # 2. keep only the values whose names entry is not null,
        #    then sum them and append a suffix to the column name
        pl.col("names").reverse().alias("reversed names"),
        # reverse the order of names within each group
    ]
)
out
"""
Besides the standard groupby,
groupby_dynamic
and groupby_rolling
are also entry points to the groupby context.
"""