<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title><![CDATA[鸟窝]]></title>
<subtitle><![CDATA[Great truths are simple: Simplicity is the ultimate form of sophistication]]></subtitle>
<link href="/atom.xml" rel="self"/>
<link href="https://colobu.com/"/>
<updated>2025-02-01T15:46:14.351Z</updated>
<id>https://colobu.com/</id>
<author>
<name><![CDATA[smallnest]]></name>
</author>
<generator uri="http://zespia.tw/hexo/">Hexo</generator>
<entry>
<title><![CDATA[When will Go get official SIMD support?]]></title>
<link href="https://colobu.com/2025/02/01/the-state-of-simd-in-go/"/>
<id>https://colobu.com/2025/02/01/the-state-of-simd-in-go/</id>
<published>2025-02-01T15:39:40.000Z</published>
<updated>2025-02-01T15:45:58.244Z</updated>
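<content type="html"><![CDATA[<p>Single Instruction, Multiple Data (<code>SIMD</code>) is a parallel computing technique in which one instruction operates on multiple data elements at once. SIMD is widely used in modern CPUs and can significantly speed up compute-intensive workloads such as image processing, machine learning, and scientific computing. As Go is used more and more in high-performance computing, SIMD support has become a focus of developer attention.</p>
<p>Many mainstream and newer languages, such as C++, Rust, and Zig, already have <code>simd</code> libraries, but official <code>simd</code> support in Go is still under discussion (<a href="https://github.com/golang/go/issues/67520" target="_blank" rel="external">issue#67520</a>). Go is designed for simplicity and portability, whereas SIMD implementations usually have to be tuned for each hardware architecture, which conflicts to some extent with those goals. As a result, SIMD support in Go has long been contentious.<br>The discussion on this issue has picked up again in recent weeks, and I hope support arrives soon.</p>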
<a id="more"></a>
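<h2 id="1-_Go语言与SIMD的背景">1. Background: Go and SIMD</h2>
<h3 id="1-1_Go语言的性能追求">1.1 Go's pursuit of performance</h3>
<p>Go has won wide adoption thanks to its concise syntax, efficient concurrency model, and fast compilation. Performance optimization remains a challenge, however, especially in workloads that process large amounts of data. Because SIMD can significantly speed up such computation, calls from the Go community for SIMD support keep growing.</p>
<p>Without SIMD, we miss out on many potential optimizations. Here is a non-exhaustive list of concrete things that could improve performance in everyday scenarios:</p>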
<ul>
<li><a href="https://github.com/simdjson/simdjson" target="_blank" rel="external">simdjson</a></li>
<li><a href="https://people.csail.mit.edu/jshun/6886-s19/lectures/lecture19-1.pdf" target="_blank" rel="external">Decoding billions of integers per second through vectorization</a></li>
<li><a href="https://opensource.googleblog.com/2022/06/Vectorized%20and%20performance%20portable%20Quicksort.html" target="_blank" rel="external">Vectorized and performance-portable Quicksort</a></li>
<li><a href="https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-to-hyperscan.html" target="_blank" rel="external">Introduction to Hyperscan</a></li>
<li><a href="https://sourcegraph.com/blog/slow-to-simd" target="_blank" rel="external">From slow to SIMD: A Go optimization story</a></li>
<li><a href="https://gorse.io/posts/avx512-in-golang.html#convert-assembly" target="_blank" rel="external">How to Use AVX512 in Golang via C Compiler</a></li>
</ul>
<p>In addition, it would make these existing packages more portable and easier to maintain:</p>
<ul>
<li><a href="https://github.com/minio/simdjson-go" target="_blank" rel="external">simdjson-go</a></li>
<li><a href="https://github.com/minio/sha256-simd" target="_blank" rel="external">SHA256-SIMD</a></li>
<li><a href="https://github.com/minio/md5-simd" target="_blank" rel="external">MD5-SIMD</a></li>
</ul>
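<p>Go 1.24, due out this month, replaces the built-in map implementation with Swiss Tables, and the Swiss Tables code uses <a href="https://go-review.googlesource.com/c/go/+/626277" target="_blank" rel="external">SIMD code</a> on AMD64. Is this the first time SIMD instructions have entered the official Go code base?</p>
<p>Someone previously implemented a SIMD-accelerated encoding/hex, and it <a href="https://go-review.googlesource.com/c/go/+/110195" target="_blank" rel="external">was rejected</a>. The reasoning was sound: the speedup was impressive, but the code looked too complex and went against Go's commitment to simplicity.<br>A similar case is <a href="https://go-review.googlesource.com/c/go/+/535838" target="_blank" rel="external">unicode/utf8: make Valid use AVX2 on amd64</a>.</p>
<p>In fact, the Go team had already used SIMD instructions in the standard library's crypto/sha256 back in 2023: <a href="https://go-review.googlesource.com/c/go/+/408795" target="_blank" rel="external">crypto/sha256: add sha-ni implementation</a>.</p>
<h3 id="1-2_SIMD的基本概念">1.2 SIMD basics</h3>
<p>SIMD processes multiple data elements with a single instruction and is typically used for vectorized computation. Modern CPUs provide SIMD instruction sets (for example <code>SSE/AVX</code> on Intel and <code>NEON</code> on ARM) that let developers accelerate computation with dedicated instructions. Using them directly, however, usually means writing assembly or calling compiler-specific intrinsics, which demands a lot from developers.</p>
<h4 id="1-2-1_SIMD的核心思想">1.2.1 The core idea of SIMD</h4>
<p>The core idea of SIMD is to process multiple data elements with a single instruction. A traditional scalar add instruction handles only one pair of numbers at a time, whereas a SIMD add instruction can handle several pairs at once (4, 8, or even more). This parallelism can significantly speed up compute-intensive tasks.</p>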
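<p>The contrast between scalar and SIMD addition can be sketched in plain Go. This is only a model, not real SIMD: Go exposes no vector types, so the hypothetical <code>addLanes4</code> below merely mimics how one 4-lane vector instruction would cover four elements per step, with a scalar tail for the leftovers. The function names and the lane count are assumptions made for illustration.</p>

```go
package main

import "fmt"

// lanes is the number of elements one hypothetical vector register holds.
const lanes = 4

// addScalar performs one addition per loop iteration.
func addScalar(dst, a, b []float32) {
	for i := range dst {
		dst[i] = a[i] + b[i]
	}
}

// addLanes4 models a 4-lane SIMD add: each iteration of the first loop
// stands for ONE vector instruction doing four independent additions.
// Real SIMD would issue them as a single instruction; here they are
// still four scalar additions.
func addLanes4(dst, a, b []float32) {
	i := 0
	for ; i+lanes <= len(dst); i += lanes {
		dst[i+0] = a[i+0] + b[i+0]
		dst[i+1] = a[i+1] + b[i+1]
		dst[i+2] = a[i+2] + b[i+2]
		dst[i+3] = a[i+3] + b[i+3]
	}
	// Scalar tail for lengths that are not a multiple of the lane count.
	for ; i < len(dst); i++ {
		dst[i] = a[i] + b[i]
	}
}

func main() {
	a := []float32{1, 2, 3, 4, 5, 6, 7}
	b := []float32{10, 20, 30, 40, 50, 60, 70}
	dst := make([]float32, len(a))
	addLanes4(dst, a, b)
	fmt.Println(dst) // prints [11 22 33 44 55 66 77]
}
```

<p>With seven elements, the model issues one "vector" step for the first four and falls back to the scalar tail for the remaining three.</p>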
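<h4 id="1-2-2_SIMD指令集的组成">1.2.2 What a SIMD instruction set contains</h4>
<p>A SIMD instruction set usually includes the following kinds of instructions:</p>
<ul>
<li><strong>Arithmetic</strong>: addition, subtraction, multiplication, division, and so on.</li>
<li><strong>Logic</strong>: AND, OR, NOT, XOR, and so on.</li>
<li><strong>Data movement</strong>: loading, storing, and shuffling data.</li>
<li><strong>Comparison</strong>: comparing multiple data elements and producing a mask.</li>
<li><strong>Special operations</strong>: square root, absolute value, maximum, minimum, and so on.</li>
</ul>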
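<p>The "compare and produce a mask" category is the least intuitive one, so here is a small emulation in plain Go. The <code>vec4</code> type and the <code>cmpGT</code>, <code>blend</code>, and <code>maxVec</code> helpers are hypothetical names invented for this sketch; they only imitate what vector compare and blend instructions do lane by lane.</p>

```go
package main

import "fmt"

// vec4 stands in for a 128-bit register holding four int32 lanes.
type vec4 [4]int32

// cmpGT imitates a vector compare: each lane of the result is all ones
// (-1) where a > b and all zeros otherwise.
func cmpGT(a, b vec4) vec4 {
	var m vec4
	for i := range m {
		if a[i] > b[i] {
			m[i] = -1
		}
	}
	return m
}

// blend selects a[i] where the mask lane is all ones and b[i] where it
// is zero, using only bitwise operations and no per-lane branch.
func blend(m, a, b vec4) vec4 {
	var r vec4
	for i := range r {
		r[i] = (a[i] & m[i]) | (b[i] &^ m[i])
	}
	return r
}

// maxVec builds a lane-wise maximum out of compare plus blend.
func maxVec(a, b vec4) vec4 {
	return blend(cmpGT(a, b), a, b)
}

func main() {
	a := vec4{1, 9, -3, 4}
	b := vec4{5, 2, -7, 4}
	fmt.Println(cmpGT(a, b))  // prints [0 -1 -1 0]
	fmt.Println(maxVec(a, b)) // prints [5 9 -3 4]
}
```

<p>The point of the mask is that the selection involves no data-dependent branch, which is what lets a real vector unit handle all lanes in lockstep.</p>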
<h3 id="1-3_常见的指令集">1.3 Common instruction sets</h3>
<h4 id="1-3-1_Intel的SIMD指令集">1.3.1 Intel SIMD instruction sets</h4>
<h5 id="1-3-1-1_MMX(MultiMedia_eXtensions)">1.3.1.1 MMX (MultiMedia eXtensions)</h5>
<ul>
<li><strong>Introduced</strong>: 1996</li>
<li><strong>Register width</strong>: 64 bits</li>
<li><strong>Data types</strong>: integers (8-bit, 16-bit, 32-bit)</li>
<li><strong>Features</strong>:<ul>
<li>Aimed mainly at multimedia processing.</li>
<li>Introduced eight 64-bit registers (MM0-MM7).</li>
<li>No floating-point support.</li>
</ul>
</li>
</ul>
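<h5 id="1-3-1-2_SSE(Streaming_SIMD_Extensions)">1.3.1.2 SSE (Streaming SIMD Extensions)</h5>
<ul>
<li><strong>Introduced</strong>: 1999</li>
<li><strong>Register width</strong>: 128 bits</li>
<li><strong>Data types</strong>: single-precision floats (32-bit); integers (8-bit, 16-bit, 32-bit, 64-bit) in the XMM registers from SSE2 onward</li>
<li><strong>Features</strong>:<ul>
<li>Introduced eight 128-bit registers (XMM0-XMM7).</li>
<li>Supports floating-point arithmetic, making it suitable for scientific computing and graphics.</li>
<li>Later versions (SSE2, SSE3, SSSE3, SSE4) added more instructions and capabilities.</li>
</ul>
</li>
</ul>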
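<h5 id="1-3-1-3_AVX(Advanced_Vector_Extensions)">1.3.1.3 AVX (Advanced Vector Extensions)</h5>
<ul>
<li><strong>Introduced</strong>: 2011</li>
<li><strong>Register width</strong>: 256 bits</li>
<li><strong>Data types</strong>: single-precision floats (32-bit), double-precision floats (64-bit); 256-bit integer operations (8-bit, 16-bit, 32-bit, 64-bit) from AVX2 onward</li>
<li><strong>Features</strong>:<ul>
<li>Introduced sixteen 256-bit registers (YMM0-YMM15).</li>
<li>Supports wider vector operations for a further performance gain.</li>
<li>Later extensions (AVX2, AVX-512) support more complex operations, and AVX-512 widens the registers to 512 bits.</li>
</ul>
</li>
</ul>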
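<h5 id="1-3-1-4_AVX-512">1.3.1.4 AVX-512</h5>
<ul>
<li><strong>Introduced</strong>: 2016</li>
<li><strong>Register width</strong>: 512 bits</li>
<li><strong>Data types</strong>: single-precision floats (32-bit), double-precision floats (64-bit), integers (8-bit, 16-bit, 32-bit, 64-bit)</li>
<li><strong>Features</strong>:<ul>
<li>Introduced thirty-two 512-bit registers (ZMM0-ZMM31).</li>
<li>Supports more complex operations such as masking and broadcasting.</li>
<li>Used mainly in high-performance computing and artificial intelligence.</li>
</ul>
</li>
</ul>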
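<h4 id="1-3-2_ARM的SIMD指令集">1.3.2 ARM SIMD instruction sets</h4>
<h5 id="1-3-2-1_NEON">1.3.2.1 NEON</h5>
<ul>
<li><strong>Introduced</strong>: 2005</li>
<li><strong>Register width</strong>: 128 bits</li>
<li><strong>Data types</strong>: single-precision floats (32-bit), integers (8-bit, 16-bit, 32-bit, 64-bit)</li>
<li><strong>Features</strong>:<ul>
<li>Widely used in mobile devices and embedded systems.</li>
<li>Provides sixteen 128-bit registers (Q0-Q15) on 32-bit ARM; AArch64 provides thirty-two (V0-V31).</li>
<li>Well suited to multimedia processing, signal processing, and similar workloads.</li>
</ul>
</li>
</ul>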
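<h5 id="1-3-2-2_SVE(Scalable_Vector_Extension)">1.3.2.2 SVE (Scalable Vector Extension)</h5>
<ul>
<li><strong>Introduced</strong>: 2016</li>
<li><strong>Register width</strong>: variable (128 to 2048 bits)</li>
<li><strong>Data types</strong>: single-precision floats (32-bit), double-precision floats (64-bit), integers (8-bit, 16-bit, 32-bit, 64-bit)</li>
<li><strong>Features</strong>:<ul>
<li>Supports variable-length vector operations that adapt to different hardware configurations.</li>
<li>Introduced predicate registers, which enable conditional (per-lane) execution.</li>
<li>Used mainly in high-performance computing and machine learning.</li>
</ul>
</li>
</ul>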
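<h3 id="1-4_编译器内置函数">1.4 Compiler intrinsics</h3>
<p>Most modern compilers (GCC, Clang, MSVC) provide intrinsics for the SIMD instruction sets, so developers can invoke SIMD instructions through function calls without writing assembly.</p>
<h3 id="1-5_自动向量化">1.5 Auto-vectorization</h3>
<p>Some compilers support auto-vectorization, which automatically turns scalar code into SIMD code. With GCC, for example, auto-vectorization is enabled when a program is compiled as follows:</p>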
<figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">gcc -O3 -mavx2 -o program program.c</div></pre></td></tr></table></figure>
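<h2 id="2-_Go语言中的SIMD支持现状">2. The current state of SIMD support in Go</h2>
<h3 id="2-1_Go语言标准库的SIMD支持">2.1 SIMD support in the Go standard library</h3>
<p>The Go standard library does not yet offer direct SIMD support. The Go compiler (gc) has no auto-vectorization either, which means developers cannot rely on the compiler to generate SIMD code for them the way they can in C/C++.</p>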
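<p>Since gc will not vectorize a loop, one thing that can still be done in pure Go is manual unrolling with independent accumulators. The sketch below is not SIMD and is not taken from any of the linked projects; <code>dotScalar</code> and <code>dotUnrolled4</code> are hypothetical names, and whether the unrolled form is actually faster depends on the CPU and Go version, so it should be benchmarked rather than assumed.</p>

```go
package main

import "fmt"

// dotScalar is the straightforward loop: one multiply-add per iteration,
// each one depending on the previous value of sum.
func dotScalar(a, b []float32) float32 {
	var sum float32
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// dotUnrolled4 processes four elements per iteration into four separate
// accumulators, so the four multiply-adds do not depend on each other.
// It still executes scalar instructions only.
func dotUnrolled4(a, b []float32) float32 {
	if len(a) != len(b) {
		panic("slices must have equal length")
	}
	var s0, s1, s2, s3 float32
	i := 0
	for ; i+4 <= len(a); i += 4 {
		s0 += a[i] * b[i]
		s1 += a[i+1] * b[i+1]
		s2 += a[i+2] * b[i+2]
		s3 += a[i+3] * b[i+3]
	}
	sum := s0 + s1 + s2 + s3
	// Scalar tail for the last len(a) % 4 elements.
	for ; i < len(a); i++ {
		sum += a[i] * b[i]
	}
	return sum
}

func main() {
	a := []float32{1, 2, 3, 4, 5, 6, 7, 8, 9}
	b := []float32{2, 2, 2, 2, 2, 2, 2, 2, 2}
	fmt.Println(dotScalar(a, b), dotUnrolled4(a, b)) // prints 90 90
}
```

<p>Note that the unrolled version adds the products in a different order, so for general floating-point inputs its result can differ from the scalar loop in the last bits.</p>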
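<p>In issue <a href="https://github.com/golang/go/issues/67520" target="_blank" rel="external">#67520</a>, the discussion is still dragging on and often drifts into the details of how it should be implemented (build tags).</p>
<h3 id="2-2_第三方库与解决方案">2.2 Third-party libraries and workarounds</h3>
<p>Although the Go standard library lacks direct SIMD support, the community has built third-party libraries and tools that help developers use SIMD instruction sets from Go. In the <a href="https://github.com/golang/go/issues/67520" target="_blank" rel="external">#67520</a> discussion, Clement Jean also offered a proof-of-concept implementation: <a href="https://github.com/Clement-Jean/simd-go-POC" target="_blank" rel="external">simd-go-POC</a>.</p>
<p>Here are some third-party implementations (libraries that expose SIMD operations themselves, as opposed to libraries built on top of SIMD such as sonic and simdjson-go):</p>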
<h4 id="2-2-1_kelindar/simd">2.2.1 <code>kelindar/simd</code></h4>
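<p><a href="https://github.com/kelindar/simd" target="_blank" rel="external">kelindar/simd</a> contains a set of vectorized math functions that are auto-vectorized by the clang compiler and then translated into Go's Plan 9 assembly. A generic version is also provided for CPUs that do not support vectorization or for which the library has no generated code.</p>
<p>It currently supports only AVX2, but generating code for AVX512 and SVE (for ARM) should be straightforward. Most of the code in the library is generated automatically, which helps with maintenance.</p>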
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">sum := simd.SumFloat32s([]<span class="typename">float32</span><span class="number">{1</span>,<span class="number"> 2</span>,<span class="number"> 3</span>,<span class="number"> 4</span>,<span class="number"> 5</span>})</div></pre></td></tr></table></figure>
<h4 id="2-2-2_alivanz/go-simd">2.2.2 <code>alivanz/go-simd</code></h4>
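<p><a href="https://github.com/alivanz/go-simd" target="_blank" rel="external">alivanz/go-simd</a> implements SIMD (Single Instruction, Multiple Data) operations for Go, optimized specifically for the ARM NEON architecture. Its goal is to provide optimized parallel processing for specific computing tasks.<br>Here is an example of addition and multiplication:</p>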
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">package</span> main</div><div class="line"></div><div class="line"><span class="keyword">import</span> (</div><div class="line">    <span class="string">"log"</span></div><div class="line"></div><div class="line">    <span class="string">"github.com/alivanz/go-simd/arm"</span></div><div class="line">    <span class="string">"github.com/alivanz/go-simd/arm/neon"</span></div><div class="line">)</div><div class="line"></div><div class="line"><span class="keyword">func</span> main() {</div><div class="line">    <span class="keyword">var</span> a, b arm.Int8X8</div><div class="line">    <span class="keyword">var</span> add, mul arm.Int16X8</div><div class="line">    <span class="keyword">for</span> i := <span class="number">0</span>; i < <span class="number">8</span>; i++ {</div><div class="line">        a[i] = arm.Int8(i)</div><div class="line">        b[i] = arm.Int8(i * i)</div><div class="line">    }</div><div class="line">    log.Printf(<span class="string">"a = %+v"</span>, a)</div><div class="line">    log.Printf(<span class="string">"b = %+v"</span>, b)</div><div class="line">    neon.VaddlS8(&add, &a, &b)</div><div class="line">    neon.VmullS8(&mul, &a, &b)</div><div class="line">    log.Printf(<span class="string">"add = %+v"</span>, add)</div><div class="line">    log.Printf(<span class="string">"mul = %+v"</span>, mul)</div><div class="line">}</div></pre></td></tr></table></figure>
<h4 id="2-2-3_pehringer/simd">2.2.3 <code>pehringer/simd</code></h4>
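<p><a href="https://github.com/pehringer/simd" target="_blank" rel="external">pehringer/simd</a> provides SIMD support through Go assembly, implementing arithmetic operations, bitwise operations, and maximum and minimum operations. It enables parallel element-wise computation, yielding speedups of 100% to 400%. It currently supports AMD64 (x86_64) and ARM64 processors.</p>
<h3 id="2-3_Go汇编与SIMD">2.3 Go assembly and SIMD</h3>
<p>Go lets you call CPU instructions directly through assembly code, which makes SIMD implementations possible. Developers can write Go assembly that uses a specific SIMD instruction set (SSE, AVX, and so on) to achieve high-performance vectorized computation. Writing and maintaining assembly, however, demands a lot from developers, and the resulting code is not very portable.</p>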
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div></pre></td><td class="code"><pre><div class="line"><span class="comment">// A simple Go assembly example that uses AVX instructions for vector addition</span></div><div class="line">TEXT ·add(SB), $0-32</div><div class="line">    MOVQ a+0(FP), DI</div><div class="line">    MOVQ b+8(FP), SI</div><div class="line">    MOVQ result+16(FP), DX</div><div class="line">    MOVQ len+24(FP), CX</div><div class="line"></div><div class="line">    TESTQ CX, CX <span class="comment">// check whether the length is 0</span></div><div class="line">    JZ done <span class="comment">// if it is 0, return immediately</span></div><div class="line"></div><div class="line">    XORQ R9, R9 <span class="comment">// element index, starting at 0 (set before any jump to remainder)</span></div><div class="line">    MOVQ CX, R8 <span class="comment">// save the original length</span></div><div class="line">    SHRQ $2, CX <span class="comment">// divide by 4 to get the loop count</span></div><div class="line">    JZ remainder <span class="comment">// fewer than 4 elements: go handle the remainder</span></div><div class="line"></div><div class="line">loop:</div><div class="line">    VMOVUPD (DI)(R9*8), Y0</div><div class="line">    VMOVUPD (SI)(R9*8), Y1</div><div class="line">    VADDPD Y0, Y1, Y0</div><div class="line">    VMOVUPD Y0, (DX)(R9*8)</div><div class="line">    ADDQ $4, R9</div><div class="line">    DECQ CX</div><div class="line">    JNZ loop</div><div class="line"></div><div class="line">remainder: <span class="comment">// handle the remaining elements</span></div><div class="line">    ANDQ $3, R8 <span class="comment">// compute the remainder (length mod 4)</span></div><div class="line">    JZ done</div><div class="line">    <span class="comment">// add the code that handles the remainder here</span></div><div class="line"></div><div class="line">done:</div><div class="line">    RET</div></pre></td></tr></table></figure>
<p>For the best performance, the addresses of the a, b, and result arrays should of course be aligned.</p>
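<p>To make the control flow of the assembly easier to follow, here is a pure-Go reference that mirrors it step by step, including the remainder handling that the assembly leaves as a placeholder. <code>addRef</code> is a hypothetical name; the Go-side declaration shown in the comment is an assumption inferred from the <code>$0-32</code> frame (four 8-byte arguments) and is not taken from the original post.</p>

```go
package main

import "fmt"

// The assembly routine would need a Go declaration in the same package,
// presumably something like (assumption, inferred from the 32-byte
// argument frame and the a/b/result/len names used with FP):
//
//	//go:noescape
//	func add(a, b, result *float64, len int)

// addRef is a pure-Go reference with the same structure as the assembly:
// a main loop over blocks of 4 float64 values (one 256-bit YMM register),
// followed by a scalar loop for the remaining 0 to 3 elements.
func addRef(a, b, result []float64) {
	n := len(result)
	i := 0                                       // XORQ R9, R9
	for blocks := n >> 2; blocks > 0; blocks-- { // SHRQ $2, CX ... DECQ CX / JNZ loop
		// One VADDPD adds these four lanes at once.
		result[i+0] = a[i+0] + b[i+0]
		result[i+1] = a[i+1] + b[i+1]
		result[i+2] = a[i+2] + b[i+2]
		result[i+3] = a[i+3] + b[i+3]
		i += 4 // ADDQ $4, R9
	}
	for rem := n & 3; rem > 0; rem-- { // ANDQ $3, R8
		result[i] = a[i] + b[i]
		i++
	}
}

func main() {
	a := []float64{1, 2, 3, 4, 5, 6, 7}
	b := []float64{10, 20, 30, 40, 50, 60, 70}
	result := make([]float64, len(a))
	addRef(a, b, result)
	fmt.Println(result) // prints [11 22 33 44 55 66 77]
}
```

<p>A reference like this is also handy as the fallback on platforms without the assembly and as the oracle in tests for the assembly version.</p>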
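<h2 id="结论">Conclusion</h2>
<p>Although Go's SIMD support is still incomplete, the community already offers some solutions through third-party libraries and assembly code. As the Go compiler improves and the standard library gains support (I believe the Go team will support it eventually), more of Go's potential in high-performance computing will be unlocked. For developers, mastering SIMD will help in writing more efficient Go code and in coping with increasingly complex computing tasks.</p>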
]]></content>
<summary type="html">
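<![CDATA[<p>Single Instruction, Multiple Data (<code>SIMD</code>) is a parallel computing technique in which one instruction operates on multiple data elements at once. SIMD is widely used in modern CPUs and can significantly speed up compute-intensive workloads such as image processing, machine learning, and scientific computing. As Go is used more and more in high-performance computing, SIMD support has become a focus of developer attention.</p>
<p>Many mainstream and newer languages, such as C++, Rust, and Zig, already have <code>simd</code> libraries, but official <code>simd</code> support in Go is still under discussion (<a href="https://github.com/golang/go/issues/67520" target="_blank" rel="external">issue#67520</a>). Go is designed for simplicity and portability, whereas SIMD implementations usually have to be tuned for each hardware architecture, which conflicts to some extent with those goals. As a result, SIMD support in Go has long been contentious.<br>The discussion on this issue has picked up again in recent weeks, and I hope support arrives soon.</p>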
]]>
</summary>
<category term="Go" scheme="https://colobu.com/categories/Go/"/>
</entry>
<entry>
<title><![CDATA[DeepSeek's database was exposed? Let's scan: surely it isn't the only one!]]></title>
<link href="https://colobu.com/2025/01/31/scan-clickhouse-service/"/>
<id>https://colobu.com/2025/01/31/scan-clickhouse-service/</id>
<published>2025-01-31T04:02:33.000Z</published>
<updated>2025-01-31T04:04:52.695Z</updated>
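<content type="html"><![CDATA[<p>DeepSeek has taken off, and the whole AI community is talking about it.</p>
<p>Meanwhile, the security community has not been idle. Attacks originating from the United States forced DeepSeek to suspend sign-ups for phone numbers outside mainland China, and people have also been scrutinizing the security of its website and services. Wiz Research found that its database was exposed to the public internet and accessible without any authentication, on two hosts: oauth2callback.deepseek.com:9000 and dev.deepseek.com:9000.</p>
<p>Core AI technology needs those Tsinghua and Peking University geniuses to do the research, and the product needs professionals to polish it. If a company as professional as DeepSeek can have a hole like this, I assumed there must be quite a few database instances on the internet exposed without a password (in practice I found only 2).</p>
<p>Building on the previous post, "How long does it take to scan every public IP in China?", we adapt the code so that it probes ClickHouse's port 9000 using <code>tcp_syn</code>.</p>
<p>First, a disclaimer: all of this is a demonstration of how to write low-level network programs in Go. It is not an introduction to security or attacks, so I will not use mature port and IP scanners such as zmap, rustscan, nmap, masscan, Advanced IP Scanner, Angry IP Scanner, or unicornscan.</p>
<p>Nor am I chasing speed. I test on a 100 Mbps home network with a 4-core Linux machine more than ten years old, and simply try to get it to produce results. I usually start it in the evening and check the results after breakfast the next morning.</p>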
<a id="more"></a>
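<p>I want to split this experiment into two parts:</p>
<ol>
<li>Find public IPs in mainland China that expose port 9000</li>
<li>Check whether those IPs run ClickHouse and can be accessed without a password</li>
</ol>
<p>Let's start with the first part.</p>
<h2 id="寻找暴露端口9000的IP">Finding IPs that expose port 9000</h2>
<p>We need to adapt the code from the previous post so that it scans with TCP instead of ICMP, and scans only port 9000.</p>
<p>To scan more efficiently, I made the following optimizations:</p>
<ol>
<li>Start from the live IPs found by the ICMP scan, a little over fifty million in total</li>
<li>Send a TCP SYN to simulate the handshake that opens a TCP connection, so that the target server replies with a SYN+ACK packet</li>
<li>The probing machine automatically answers with an RST. There is no point in keeping the target server hanging, which would be rude, so we promptly tell it to stop waiting for us</li>
</ol>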
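<p>The three-step exchange above comes down to looking at the flag bits of the reply. The sketch below shows one way to classify a reply; the constant names and the <code>classify</code> helper are hypothetical, while the bit values are the standard TCP flag bits (so SYN is <code>0x002</code> and SYN+ACK is <code>0x012</code>).</p>

```go
package main

import "fmt"

// Standard TCP flag bits (low bits of the flags field).
const (
	flagFIN = 0x01
	flagSYN = 0x02
	flagRST = 0x04
	flagPSH = 0x08
	flagACK = 0x10

	flagSYNACK = flagSYN | flagACK // 0x12, the reply of a listening port
)

// classify interprets the flags of a reply to a SYN probe:
// SYN+ACK means something is listening, RST means the port is closed.
func classify(flags uint16) string {
	switch {
	case flags&flagRST != 0:
		return "closed"
	case flags&flagSYNACK == flagSYNACK:
		return "open"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(classify(0x012)) // SYN+ACK, prints open
	fmt.Println(classify(0x014)) // RST+ACK, prints closed
}
```

<p>A host that sends nothing back at all (a filtered port) never reaches this function, which is why a scanner also needs some form of timeout.</p>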
<p>As before, we define a <code>TCPScanner</code> struct that probes by means of the TCP handshake. If you have read the previous post, the overall pattern should look familiar.</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">package</span> fishfinding</div><div class="line"></div><div class="line"><span class="keyword">import</span> (</div><div class="line"> <span class="string">"net"</span></div><div class="line"> <span class="string">"os"</span></div><div class="line"> <span class="string">"time"</span></div><div class="line"></div><div class="line"> <span class="string">"github.com/kataras/golog"</span></div><div class="line"> <span class="string">"golang.org/x/net/bpf"</span></div><div class="line"> <span class="string">"golang.org/x/net/ipv4"</span></div><div class="line">)</div><div class="line"></div><div class="line"></div><div class="line"><span class="keyword">type</span> TCPScanner <span class="keyword">struct</span> {</div><div class="line"> src net.IP</div><div class="line"> srcPort <span class="typename">int</span></div><div class="line"> dstPort <span class="typename">int</span></div><div class="line"> 
input <span class="keyword">chan</span> <span class="typename">string</span></div><div class="line"> output <span class="keyword">chan</span> <span class="typename">string</span></div><div class="line">}</div><div class="line"></div><div class="line"><span class="keyword">func</span> NewTCPScanner(srcPort, dstPort <span class="typename">int</span>, input <span class="keyword">chan</span> <span class="typename">string</span>, output <span class="keyword">chan</span> <span class="typename">string</span>) *TCPScanner {</div><div class="line"> localIP := GetLocalIP()</div><div class="line"> s := &TCPScanner{</div><div class="line"> input: input,</div><div class="line"> output: output,</div><div class="line"> src: net.ParseIP(localIP).To4(),</div><div class="line"> srcPort: srcPort,</div><div class="line"> dstPort: dstPort,</div><div class="line"> }</div><div class="line"> <span class="keyword">return</span> s</div><div class="line">}</div><div class="line"></div><div class="line"><span class="keyword">func</span> (s *TCPScanner) Scan() {</div><div class="line"> <span class="keyword">go</span> s.recv()</div><div class="line"> <span class="keyword">go</span> s.send(s.input)</div><div class="line">}</div></pre></td></tr></table></figure>
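<p>This defines a <code>TCPScanner</code> struct with a <code>Scan</code> method that starts two goroutines, one for receiving and one for sending. The receiving goroutine receives the target servers' replies (SYN+ACK packets), and the sending goroutine sends the TCP SYN packets.</p>
<h3 id="发送逻辑">Sending logic</h3>
<p>The sending goroutine first creates a raw socket with <code>net.ListenPacket</code>, using <code>ip4:tcp</code>, and then simply sends TCP packets.</p>
<p>I did not use the gopacket library to build the TCP packets and built them by hand instead: gopacket felt too heavy, and since all I need is a TCP header, constructing one myself is not hard.</p>
<p>For the seq number we use the current process's PID, so that when a reply arrives we can use it to tell whether the reply belongs to a packet we sent.</p>
<p>Note that we compute the TCP checksum ourselves rather than relying on the NIC's TCP/IP checksum offload, because the NIC in my machine is very old and lacks that feature.</p>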
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div><div class="line">46</div><div class="line">47</div><div class="line">48</div><div class="line">49</div><div class="line">50</div><div class="line">51</div><div class="line">52</div><div class="line">53</div><div class="line">54</div><div class="line">55</div><div class="line">56</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">func</span> (s *TCPScanner) send(input <span class="keyword">chan</span> <span class="typename">string</span>) error {</div><div class="line">    <span class="keyword">defer</span> <span class="keyword">func</span>() {</div><div class="line">        time.Sleep(<span class="number">5</span> * time.Second)</div><div class="line">        <span class="built_in">close</span>(s.output)</div><div class="line">        golog.Infof(<span class="string">"send goroutine exit"</span>)</div><div class="line">    }()</div><div class="line"></div><div class="line">    <span class="comment">// Create a raw socket</span></div><div class="line">    conn, err := net.ListenPacket(<span class="string">"ip4:tcp"</span>, s.src.To4().String())</div><div class="line">    <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line">        golog.Fatal(err)</div><div class="line">    }</div><div class="line">    <span class="keyword">defer</span> conn.Close()</div><div class="line"></div><div class="line">    pconn := ipv4.NewPacketConn(conn)</div><div class="line">    <span class="comment">// This socket only sends, so do not receive any data on it</span></div><div class="line">    filter := createEmptyFilter()</div><div class="line">    <span class="keyword">if</span> assembled, err := bpf.Assemble(filter); err == <span class="constant">nil</span> {</div><div class="line">        pconn.SetBPF(assembled)</div><div class="line">    }</div><div class="line"></div><div class="line">    seq := <span class="typename">uint32</span>(os.Getpid())</div><div class="line">    <span class="keyword">for</span> ip := <span class="keyword">range</span> input {</div><div class="line">        dstIP := net.ParseIP(ip)</div><div class="line">        <span class="keyword">if</span> dstIP == <span class="constant">nil</span> {</div><div class="line">            golog.Errorf(<span class="string">"failed to resolve IP address %s"</span>, ip)</div><div class="line">            <span class="keyword">continue</span></div><div class="line">        }</div><div class="line"></div><div class="line">        <span class="comment">// Build the TCP SYN packet</span></div><div class="line">        tcpHeader := &TCPHeader{</div><div class="line">            Source: <span class="typename">uint16</span>(s.srcPort), <span class="comment">// source port</span></div><div class="line">            Destination: <span class="typename">uint16</span>(s.dstPort), <span class="comment">// destination port (the port being probed)</span></div><div class="line">            SeqNum: seq,</div><div class="line">            AckNum: <span class="number">0</span>,</div><div class="line">            Flags: <span class="number">0x002</span>, <span class="comment">// SYN</span></div><div class="line">            Window: <span class="number">65535</span>,</div><div class="line">            Checksum: <span class="number">0</span>,</div><div class="line">            Urgent: <span class="number">0</span>,</div><div class="line">        }</div><div class="line"></div><div class="line">        <span class="comment">// Compute the checksum</span></div><div class="line">        tcpHeader.Checksum = tcpChecksum(tcpHeader, s.src, dstIP)</div><div class="line"></div><div class="line">        <span class="comment">// Serialize the TCP header</span></div><div class="line">        packet := tcpHeader.Marshal()</div><div class="line"></div><div class="line">        <span class="comment">// Send the TCP SYN packet</span></div><div class="line">        _, err = conn.WriteTo(packet, &net.IPAddr{IP: dstIP})</div><div class="line">        <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line">            golog.Errorf(<span class="string">"failed to send TCP packet: %v"</span>, err)</div><div class="line">        }</div><div class="line">    }</div><div class="line"></div><div class="line">    <span class="keyword">return</span> <span class="constant">nil</span></div><div class="line">}</div></pre></td></tr></table></figure>
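<h3 id="接收逻辑">Receiving logic</h3>
<p>The receiving goroutine first creates a raw socket with <code>net.ListenIP</code>, then wraps it with <code>ipv4.NewPacketConn</code> to get an <code>ipv4.PacketConn</code> and sets <code>ipv4.FlagSrc|ipv4.FlagDst|ipv4.FlagInterface</code> so that the source IP, destination IP, and interface information are available.<br>These flags have to be set, otherwise the target server's IP cannot be obtained; <code>ipv4.FlagDst</code>, though, is not actually needed.</p>
<p>After receiving data, we parse the TCP header and check, by comparing the ports, whether the packet is a reply to one of our probes.</p>
<p>If it is, we check whether it is a SYN+ACK packet and whether its ACK number corresponds to the seq we sent (that is, seq+1); if so, we send the target IP to the <code>output</code> channel.</p>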
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">func</span> (s *TCPScanner) recv() error {</div><div class="line"> <span class="keyword">defer</span> <span class="built_in">recover</span>()</div><div class="line"></div><div class="line"> <span class="comment">// 创建原始套接字</span></div><div class="line"> conn, err := net.ListenIP(<span class="string">"ip4:tcp"</span>, &net.IPAddr{IP: s.src})</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> golog.Fatal(err)</div><div class="line"> }</div><div class="line"> <span class="keyword">defer</span> conn.Close()</div><div class="line"></div><div class="line"> pconn := ipv4.NewPacketConn(conn)</div><div class="line"> <span class="keyword">if</span> err := 
pconn.SetControlMessage(ipv4.FlagSrc|ipv4.FlagDst|ipv4.FlagInterface, <span class="constant">true</span>); err != <span class="constant">nil</span> {</div><div class="line"> golog.Fatalf(<span class="string">"set control message error: %v\n"</span>, err)</div><div class="line"> }</div><div class="line"></div><div class="line"> seq := <span class="typename">uint32</span>(os.Getpid()) +<span class="number"> 1</span></div><div class="line"></div><div class="line"> buf := <span class="built_in">make</span>([]<span class="typename">byte</span>,<span class="number"> 1024</span>)</div><div class="line"> <span class="keyword">for</span> {</div><div class="line"> n, peer, err := conn.ReadFrom(buf)</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> golog.Errorf(<span class="string">"failed to read: %v"</span>, err)</div><div class="line"> <span class="keyword">continue</span></div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="keyword">if</span> n < tcpHeaderLength {</div><div class="line"> <span class="keyword">continue</span></div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// 解析 TCP 头</span></div><div class="line"> tcpHeader := ParseTCPHeader(buf[:n])</div><div class="line"></div><div class="line"> <span class="keyword">if</span> tcpHeader.Destination != <span class="typename">uint16</span>(s.srcPort) || tcpHeader.Source != <span class="typename">uint16</span>(s.dstPort) {</div><div class="line"> <span class="keyword">continue</span></div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// golog.Info("peer: %s, flags: %d", peer.String(), tcpHeader.Flags)</span></div><div class="line"></div><div class="line"> <span class="comment">// 检查是否是 SYN+ACK, 同时检查ACK是否和发送的seq对应</span></div><div class="line"> <span class="keyword">if</span> tcpHeader.Flags ==<span class="number"> 0</span>x012 
&& tcpHeader.AckNum == seq { <span class="comment">// SYN + ACK</span></div><div class="line"> s.output <- peer.String()</div><div class="line"> }</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
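<p>The SYN+ACK test in the receive loop can be spelled out with named flag bits. The following is a minimal sketch, not the code from the repository: <code>isSynAck</code>, the constant names, and the <code>sentSeq</code> parameter are hypothetical, and it assumes the probe was sent with sequence number <code>sentSeq</code>, so that a matching reply acknowledges <code>sentSeq + 1</code>. Unlike the exact comparison with <code>0x012</code>, it masks the flags, so a reply that also carries other bits (for example ECE) still matches.</p>

```go
package main

import "fmt"

// TCP flag bits as laid out in the TCP header.
const (
	flagSYN = 0x02
	flagACK = 0x10
	// synAckMask is 0x12, the value the scanner compares against.
	synAckMask = flagSYN | flagACK
)

// isSynAck reports whether a reply has both SYN and ACK set and
// acknowledges the probe we sent. sentSeq is the sequence number of
// our SYN (an assumption of this sketch); the peer acknowledges
// sentSeq+1 because a SYN consumes one sequence number.
func isSynAck(flags uint16, ackNum, sentSeq uint32) bool {
	return flags&synAckMask == synAckMask && ackNum == sentSeq+1
}

func main() {
	fmt.Println(isSynAck(0x12, 101, 100)) // SYN+ACK with matching ack
	fmt.Println(isSynAck(0x14, 101, 100)) // RST+ACK: port closed
	fmt.Println(isSynAck(0x12, 100, 100)) // ack does not match
}
```

<p>Whether to accept extra flag bits is a design choice; the exact comparison is stricter and simpler, the mask is more tolerant.</p>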
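<p>The complete code is available <a href="https://github.com/smallnest/fishfinder" target="_blank" rel="external">here</a>.</p>
<p>In the end I saved the IPs that accepted connections on port 9000 to a file: a little over 970 IPs in total.</p>
<h2 id="检查没有身份验证clickhouse">Checking for ClickHouse without authentication</h2>
<p>Next we need to check whether these IPs are running ClickHouse, and whether they do so without authentication.</p>
<p>Using a similar approach, we define a <code>ClickHouseChecker</code> struct to check whether these IPs are ClickHouse services.</p>
<p>It tries to open a ClickHouse connection to each IP on port 9000. If the connection succeeds and a call to <code>Ping()</code> also succeeds, we treat the IP as a ClickHouse service.</p>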
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div><div class="line">46</div><div class="line">47</div><div class="line">48</div><div class="line">49</div><div class="line">50</div><div class="line">51</div><div class="line">52</div><div class="line">53</div><div class="line">54</div><div class="line">55</div><div class="line">56</div><div class="line">57</div><div class="line">58</div><div class="line">59</div><div class="line">60</div><div class="line">61</div><div class="line">62</div><div class="line">63</div><div class="line">64</div><div class="line">65</div><div class="line">66</div><div class="line">67</div><div class="line">68</div><div class="line">69</div><div class="line">70</div><div class="line">71</div><div class="line">72</div><div class="line">73</div><div class="line">74</div><div 
class="line">75</div><div class="line">76</div><div class="line">77</div><div class="line">78</div><div class="line">79</div><div class="line">80</div><div class="line">81</div><div class="line">82</div><div class="line">83</div><div class="line">84</div><div class="line">85</div><div class="line">86</div><div class="line">87</div><div class="line">88</div><div class="line">89</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">package</span> fishfinding</div><div class="line"></div><div class="line"><span class="keyword">import</span> (</div><div class="line"> <span class="string">"context"</span></div><div class="line"> <span class="string">"fmt"</span></div><div class="line"> <span class="string">"runtime"</span></div><div class="line"> <span class="string">"sync"</span></div><div class="line"> <span class="string">"time"</span></div><div class="line"></div><div class="line"> <span class="string">"github.com/ClickHouse/clickhouse-go/v2"</span></div><div class="line"> _ <span class="string">"github.com/ClickHouse/clickhouse-go/v2"</span></div><div class="line"> <span class="string">"github.com/kataras/golog"</span></div><div class="line">)</div><div class="line"></div><div class="line"><span class="keyword">type</span> ClickHouseChecker <span class="keyword">struct</span> {</div><div class="line"> wg *sync.WaitGroup</div><div class="line"> port <span class="typename">int</span></div><div class="line"></div><div class="line"> input <span class="keyword">chan</span> <span class="typename">string</span></div><div class="line"> output <span class="keyword">chan</span> <span class="typename">string</span></div><div class="line">}</div><div class="line"></div><div class="line"><span class="keyword">func</span> NewClickHouseChecker(port <span class="typename">int</span>, input <span class="keyword">chan</span> <span class="typename">string</span>, output <span class="keyword">chan</span> <span class="typename">string</span>, wg *sync.WaitGroup) 
*ClickHouseChecker {</div><div class="line"> s := &ClickHouseChecker{</div><div class="line"> port: port,</div><div class="line"> input: input,</div><div class="line"> output: output,</div><div class="line"> wg: wg,</div><div class="line"> }</div><div class="line"> <span class="keyword">return</span> s</div><div class="line">}</div><div class="line"></div><div class="line"><span class="keyword">func</span> (s *ClickHouseChecker) Check() {</div><div class="line"> parallel := runtime.NumCPU()</div><div class="line"></div><div class="line"> <span class="keyword">for</span> i :=<span class="number"> 0</span>; i < parallel; i++ {</div><div class="line"> s.wg.Add<span class="number">(1</span>)</div><div class="line"> <span class="keyword">go</span> s.check()</div><div class="line"> }</div><div class="line">}</div><div class="line"></div><div class="line"><span class="keyword">func</span> (s *ClickHouseChecker) check() {</div><div class="line"> <span class="keyword">defer</span> s.wg.Done()</div><div class="line"></div><div class="line"> <span class="keyword">for</span> ip := <span class="keyword">range</span> s.input {</div><div class="line"> <span class="keyword">if</span> ip == <span class="string">"splitting"</span> || ip == <span class="string">"failed"</span> {</div><div class="line"> <span class="keyword">continue</span></div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="keyword">if</span> isClickHouse(ip, s.port) {</div><div class="line"> s.output <- ip</div><div class="line"> }</div><div class="line"> }</div><div class="line">}</div><div class="line"></div><div class="line"><span class="keyword">func</span> isClickHouse(ip <span class="typename">string</span>, port <span class="typename">int</span>) <span class="typename">bool</span> {</div><div class="line"> conn, err := clickhouse.Open(&clickhouse.Options{</div><div class="line"> Addr: []<span class="typename">string</span>{fmt.Sprintf(<span class="string">"%s:%d"</span>, ip, 
port)},</div><div class="line"> <span class="comment">// Auth: clickhouse.Auth{</span></div><div class="line"> <span class="comment">// Database: "default",</span></div><div class="line"> <span class="comment">// Username: "default",</span></div><div class="line"> <span class="comment">// Password: "",</span></div><div class="line"> <span class="comment">// },</span></div><div class="line"> Settings: clickhouse.Settings{</div><div class="line"> <span class="string">"max_execution_time"</span>:<span class="number"> 1</span>,</div><div class="line"> },</div><div class="line"> DialTimeout: time.Second,</div><div class="line"> MaxOpenConns: <span class="number"> 1</span>,</div><div class="line"> MaxIdleConns: <span class="number"> 1</span>,</div><div class="line"> ConnMaxLifetime: time.Duration<span class="number">(1</span>) * time.Minute,</div><div class="line"> ConnOpenStrategy: clickhouse.ConnOpenInOrder,</div><div class="line"> BlockBufferSize: <span class="number"> 10</span>,</div><div class="line"> MaxCompressionBuffer:<span class="number"> 1024</span>,</div><div class="line"> })</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> golog.Errorf(<span class="string">"open %s:%d failed: %v"</span>, ip, port, err)</div><div class="line"> <span class="keyword">return</span> <span class="constant">false</span></div><div class="line"> }</div><div class="line"></div><div class="line"> ctx, cancel := context.WithTimeout(context.Background(), time.Second)</div><div class="line"> <span class="keyword">defer</span> cancel()</div><div class="line"> err = conn.Ping(ctx)</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> golog.Warnf(<span class="string">"ping %s:%d failed: %v"</span>, ip, port, err)</div><div class="line"> <span class="keyword">return</span> <span class="constant">false</span></div><div class="line"> }</div><div 
class="line"></div><div class="line"> <span class="keyword">return</span> <span class="constant">true</span></div><div class="line">}</div></pre></td></tr></table></figure>
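<p>In the actual scan, port 9000 on almost all of these IPs either timed out or was not a ClickHouse service. Only 4 IPs were ClickHouse services, and all of them required authentication, failing with <code>default: Authentication failed: password is incorrect, or there is no user with such name.</code></p>
<p>That is a good thing: at least the ClickHouse services found exposed on the public internet in this scan require authentication.</p>
<p>Of course, it is also possible that a ClickHouse server is configured with an IP allowlist that only permits internal access, in which case we cannot reach it. The ClickHouse port may also have been changed to something other than 9000, in which case this scan misses it.</p>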
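<h2 id="有必要扫描一下全网的IP和它们的9000端口了">Time to scan the IPs of the whole network and their port 9000</h2>
<p>The existing programs are all we need. First we pull the address block information for the whole network.</p>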
<figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">wget -c -O- http://ftp.apnic.net/stats/apnic/delegated-apnic-latest | awk -F <span class="string">'|'</span> <span class="string">'/ipv4/ {print $4 "/" 32-log($5)/log(2)}'</span> | cat > ipv4.txt</div></pre></td></tr></table></figure>
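<p>The <code>awk</code> expression above turns an address count into a prefix length: a block of 2<sup>k</sup> addresses has prefix <code>32 - k</code>. The same conversion can be sketched in Go. <code>cidrFromRecord</code> is a hypothetical helper, the sample record is illustrative, and the field positions simply mirror the <code>$4</code> (start address) and <code>$5</code> (address count) used by the command. Unlike the <code>awk</code> one-liner, this sketch skips counts that are not a power of two instead of printing a fractional prefix.</p>

```go
package main

import (
	"fmt"
	"math/bits"
	"strconv"
	"strings"
)

// cidrFromRecord converts one "|"-separated record into CIDR notation.
// It returns false for records that are not IPv4 blocks or whose
// address count is not a power of two.
func cidrFromRecord(line string) (string, bool) {
	fields := strings.Split(line, "|")
	if len(fields) < 5 || fields[2] != "ipv4" || fields[3] == "*" {
		return "", false
	}
	count, err := strconv.ParseUint(fields[4], 10, 64)
	if err != nil || count == 0 || count > 1<<32 || count&(count-1) != 0 {
		return "", false
	}
	// For a power of two, the number of trailing zero bits is log2(count).
	prefix := 32 - bits.TrailingZeros64(count)
	return fmt.Sprintf("%s/%d", fields[3], prefix), true
}

func main() {
	cidr, ok := cidrFromRecord("apnic|CN|ipv4|1.0.1.0|256|20110414|allocated")
	fmt.Println(cidr, ok)
}
```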
<p>First we use <code>icmp_scan</code> to find the IP addresses that are reachable from the public internet:</p>
<figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div></pre></td><td class="code"><pre><div class="line">......</div><div class="line"></div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">03</span>:<span class="number">56</span> <span class="number">223.255</span>.<span class="number">250.221</span> is alive</div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">03</span>:<span class="number">56</span> <span class="number">223.255</span>.<span class="number">233.1</span> is alive</div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">03</span>:<span class="number">56</span> <span class="number">223.255</span>.<span class="number">240.91</span> is alive</div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">03</span>:<span class="number">56</span> <span class="number">223.255</span>.<span class="number">233.10</span> is alive</div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">03</span>:<span class="number">56</span> <span class="number">223.255</span>.<span class="number">233.15</span> is alive</div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">03</span>:<span class="number">56</span> <span 
class="number">223.255</span>.<span class="number">233.11</span> is alive</div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">03</span>:<span class="number">56</span> <span class="number">223.255</span>.<span class="number">233.115</span> is alive</div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">03</span>:<span class="number">56</span> <span class="number">223.255</span>.<span class="number">233.100</span> is alive</div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">03</span>:<span class="number">56</span> send goroutine <span class="keyword">exit</span></div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">03</span>:<span class="number">56</span> total: <span class="number">884686592</span>, alive: <span class="number">15500888</span>, time: <span class="number">2</span>h35m28.<span class="number">930123788</span>s</div></pre></td></tr></table></figure>
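<p>Out of more than 880 million IPs in total, about 15.5 million responded to ping. The scan took a little over two and a half hours.</p>
<blockquote>
<p>According to a reader's comment on the previous post, the United States alone has more than 800 million IPs.<br>I asked DeepSeek, which says there are 3.7 billion IPs worldwide and 900 million in the US. That sounds more plausible, and the 880 million I scanned is far fewer than that. I also suspect the number of active IPs is much larger than 15 million.<br>None of this matters much here, though. What I want is to find ClickHouse services that allow login without a password, and to see whether any of these IPs qualify.</p>
</blockquote>
<p>Next we use <code>tcp_scan</code> to scan port 9000 on these IPs:</p>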
<figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div></pre></td><td class="code"><pre><div class="line">......</div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">08</span>:<span class="number">47</span> <span class="number">223.197</span>.<span class="number">222.126</span> is alive</div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">08</span>:<span class="number">47</span> <span class="number">223.197</span>.<span class="number">219.60</span> is alive</div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">08</span>:<span class="number">47</span> <span class="number">223.220</span>.<span class="number">171.218</span> is alive</div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">08</span>:<span class="number">47</span> <span class="number">223.221</span>.<span class="number">238.176</span> is alive</div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">08</span>:<span class="number">47</span> <span class="number">223.197</span>.<span class="number">235.26</span> is alive</div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">08</span>:<span class="number">47</span> <span class="number">223.197</span>.<span class="number">225.240</span> 
is alive</div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">08</span>:<span class="number">47</span> <span class="number">223.197</span>.<span class="number">225.208</span> is alive</div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">08</span>:<span class="number">47</span> <span class="number">223.197</span>.<span class="number">219.139</span> is alive</div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">08</span>:<span class="number">47</span> send goroutine <span class="keyword">exit</span></div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">08</span>:<span class="number">47</span> total: <span class="number">15500890</span>, alive: <span class="number">3953</span>, time: <span class="number">2</span>m41.<span class="number">23585658</span>s</div></pre></td></tr></table></figure>
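<p>Of these 15.5 million IPs, 3,953 have port 9000 open. An open port does not tell us whether ClickHouse operations are actually possible, so a further check is needed.</p>
<p>Next we use <code>clickhouse_check</code> to check whether these IPs are ClickHouse services:</p>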
<figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div></pre></td><td class="code"><pre><div class="line">......</div><div class="line"></div><div class="line">[WARN] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">11</span>:<span class="number">47</span> ping <span class="number">223.197</span>.<span class="number">222.126</span>:<span class="number">9000</span> failed: <span class="built_in">read</span>: <span class="built_in">read</span> tcp <span class="number">192.168</span>.<span class="number">1.5</span>:<span class="number">53494</span>-><span class="number">223.197</span>.<span class="number">222.126</span>:<span class="number">9000</span>: i/o timeout</div><div class="line">[WARN] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">11</span>:<span class="number">47</span> ping <span class="number">223.197</span>.<span class="number">219.60</span>:<span class="number">9000</span> failed: <span class="built_in">read</span>: <span class="built_in">read</span> tcp <span class="number">192.168</span>.<span class="number">1.5</span>:<span class="number">49718</span>-><span class="number">223.197</span>.<span class="number">219.60</span>:<span class="number">9000</span>: i/o timeout</div><div class="line">[WARN] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">11</span>:<span class="number">47</span> ping <span class="number">223.221</span>.<span class="number">238.176</span>:<span class="number">9000</span> failed: <span class="built_in">read</span>: <span 
class="built_in">read</span> tcp <span class="number">192.168</span>.<span class="number">1.5</span>:<span class="number">56662</span>-><span class="number">223.221</span>.<span class="number">238.176</span>:<span class="number">9000</span>: i/o timeout</div><div class="line">[WARN] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">11</span>:<span class="number">47</span> ping <span class="number">223.197</span>.<span class="number">235.26</span>:<span class="number">9000</span> failed: <span class="built_in">read</span>: <span class="built_in">read</span> tcp <span class="number">192.168</span>.<span class="number">1.5</span>:<span class="number">47676</span>-><span class="number">223.197</span>.<span class="number">235.26</span>:<span class="number">9000</span>: i/o timeout</div><div class="line">[WARN] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">11</span>:<span class="number">47</span> ping send:<span class="number">9000</span> failed: dial tcp: lookup send on <span class="number">127.0</span>.<span class="number">0.53</span>:<span class="number">53</span>: server misbehaving</div><div class="line">[WARN] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">11</span>:<span class="number">47</span> ping total::<span class="number">9000</span> failed: dial tcp: address total::<span class="number">9000</span>: too many colons <span class="keyword">in</span> address</div><div class="line">[WARN] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">11</span>:<span class="number">47</span> ping <span class="number">223.197</span>.<span class="number">225.240</span>:<span class="number">9000</span> failed: <span class="built_in">read</span>: <span class="built_in">read</span> tcp <span 
class="number">192.168</span>.<span class="number">1.5</span>:<span class="number">55342</span>-><span class="number">223.197</span>.<span class="number">225.240</span>:<span class="number">9000</span>: i/o timeout</div><div class="line">[WARN] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">11</span>:<span class="number">47</span> ping <span class="number">223.197</span>.<span class="number">225.208</span>:<span class="number">9000</span> failed: <span class="built_in">read</span>: <span class="built_in">read</span> tcp <span class="number">192.168</span>.<span class="number">1.5</span>:<span class="number">43300</span>-><span class="number">223.197</span>.<span class="number">225.208</span>:<span class="number">9000</span>: i/o timeout</div><div class="line">[WARN] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">11</span>:<span class="number">47</span> ping <span class="number">223.197</span>.<span class="number">219.139</span>:<span class="number">9000</span> failed: <span class="built_in">read</span>: <span class="built_in">read</span> tcp <span class="number">192.168</span>.<span class="number">1.5</span>:<span class="number">57552</span>-><span class="number">223.197</span>.<span class="number">219.139</span>:<span class="number">9000</span>: i/o timeout</div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">31</span> <span class="number">11</span>:<span class="number">47</span> total: <span class="number">2</span>, time: <span class="number">4</span>m20.<span class="number">744235925</span>s</div></pre></td></tr></table></figure>
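<p>The <code>ping send:9000</code> and <code>ping total::9000</code> warnings in this log suggest that a few non-IP tokens from the previous step's output reached the checker. One way to guard against that is to validate every input before dialing. The sketch below is only an illustration: <code>isIPv4</code> is a hypothetical helper that is not part of the original program, and it relies on <code>net/netip</code> from the standard library (Go 1.18+).</p>

```go
package main

import (
	"fmt"
	"net/netip"
)

// isIPv4 reports whether s is a syntactically valid IPv4 address.
// Tokens such as "send" or "total:" fail to parse and are rejected.
func isIPv4(s string) bool {
	addr, err := netip.ParseAddr(s)
	return err == nil && addr.Is4()
}

func main() {
	for _, tok := range []string{"223.197.222.126", "send", "total:"} {
		fmt.Printf("%q valid=%v\n", tok, isIPv4(tok))
	}
}
```

<p>A check like this could replace the string comparisons against specific log words in the worker loop, since it rejects anything that is not an address.</p>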
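<p>The check finished in a little over 4 minutes. In the end it really did find two IPs whose port 9000 is a ClickHouse service that requires no password.</p>
<p>In the same way we could check the security of other services such as Redis and MySQL.</p>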
]]></content>
<summary type="html">
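<![CDATA[<p>DeepSeek has become hugely popular since its release, and the whole AI community is enthusiastically discussing it.</p>
<p>The security community has not been idle either. Attacks coming from the United States forced DeepSeek to suspend registration for phone numbers outside mainland China, and people have also been scrutinizing the security of its website and services. Wiz Research found that its databases were exposed to the public internet and could be accessed without any authentication, on two hosts: oauth2callback.deepseek.com:9000 and dev.deepseek.com:9000.</p>
<p>The core technology of AI needs those geniuses from Tsinghua and Peking University to research it, and the product needs professionals to polish it. If a company as professional as DeepSeek can have a hole like this, I expected that passwordless databases exposed on the internet would not be rare (in practice I found only 2).</p>
<p>Building on the previous post, "How long does it take to scan the public IPs of the whole country", we adapt the code so that it probes ClickHouse port 9000 using <code>tcp_syn</code>.</p>
<p>To be clear up front: all of the techniques here are a demonstration of writing low-level network programs in Go, not an introduction to security or attacks, so I will not use mature port and IP scanning tools such as zmap, rustscan, nmap, masscan, Advanced IP Scanner, Angry IP Scanner, or unicornscan.</p>
<p>I am also not chasing speed. I only test on my 100M home network with a 4-core Linux machine that is more than 10 years old, and the goal is simply to get a result. I usually start it in the evening and check the results after breakfast the next morning.</p>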
]]>
</summary>
<category term="Go" scheme="https://colobu.com/categories/Go/"/>
</entry>
<entry>
<title><![CDATA[A quick tour of Go's io/fs package over the holidays]]></title>
<link href="https://colobu.com/2025/01/30/some-notes-about-go-io-fs-package/"/>
<id>https://colobu.com/2025/01/30/some-notes-about-go-io-fs-package/</id>
<published>2025-01-29T16:44:01.000Z</published>
<updated>2025-01-29T17:30:00.416Z</updated>
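<content type="html"><![CDATA[<p>Go's <code>io/fs</code> package is a standard library package introduced in Go 1.16 that defines abstract interfaces for file systems. It provides a uniform way to access <strong>different kinds of file systems</strong>, including the local file system, in-memory file systems, and zip archives.</p>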
<a id="more"></a>
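<h3 id="io/fs_包的主要作用">What the <code>io/fs</code> package is for</h3>
<ul>
<li><strong>Abstracting the file system:</strong> <code>io/fs</code> defines a set of interfaces that describe basic file system operations, such as opening a file or reading a directory. With these interfaces we can write code that does not depend on a concrete file system.</li>
<li><strong>Uniform access:</strong> Whatever the underlying file system is, as long as it implements the interfaces defined in <code>io/fs</code>, the same code can access it.</li>
<li><strong>Better testability:</strong> With <code>io/fs</code> it is easy to mock the file system, which makes code easier to test.</li>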
</ul>
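<h3 id="io/fs_包的核心接口">Core interfaces of the <code>io/fs</code> package</h3>
<ul>
<li><strong><code>fs.FS</code>:</strong> Represents a file system. It defines a single method, <code>Open</code>, for opening a file.</li>
<li><strong><code>fs.File</code>:</strong> Represents an open file. It defines the <code>Stat</code>, <code>Read</code>, and <code>Close</code> methods. The interface is read-only and has no write method.</li>
<li><strong><code>fs.FileInfo</code>:</strong> Represents a file's metadata, including its name, size, and modification time.</li>
<li><strong><code>fs.DirEntry</code>:</strong> Represents a directory entry, which can be a file or a subdirectory.</li>
<li><strong><code>fs.FileMode</code>:</strong> A type that represents a file's permissions and kind. It is a bitmask.</li>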
</ul>
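<p>There are also several interfaces and types built on top of <code>fs.FS</code> and <code>fs.File</code>:</p>
<ul>
<li><code>fs.GlobFS</code> extends <code>fs.FS</code> with the method <code>Glob(pattern string) ([]string, error)</code>, which matches files and directories against a wildcard pattern.</li>
<li><code>fs.ReadDirFS</code> extends <code>fs.FS</code> with the method <code>ReadDir(name string) ([]fs.DirEntry, error)</code>, which reads all files and subdirectories of the named directory.</li>
<li><code>fs.ReadDirFile</code> extends <code>fs.File</code> with the method <code>ReadDir(n int) ([]fs.DirEntry, error)</code>. It is implemented by open directories and returns their contents as a list of <code>fs.DirEntry</code> values. The helper <code>fs.ReadDir</code> falls back to it when the file system does not implement <code>fs.ReadDirFS</code>.</li>
<li><code>fs.ReadFileFS</code> extends <code>fs.FS</code> with the method <code>ReadFile(name string) ([]byte, error)</code>, which reads the whole content of the named file and returns it as a byte slice. It is a more convenient way to read a file, since it avoids opening the file first and then reading it.</li>
<li><code>fs.StatFS</code> extends <code>fs.FS</code> with the method <code>Stat(name string) (fs.FileInfo, error)</code>, which returns the metadata of the named file as an <code>fs.FileInfo</code>.</li>
<li><code>fs.SubFS</code> extends <code>fs.FS</code> with the method <code>Sub(dir string) (fs.FS, error)</code>, which creates a new file system rooted at a subdirectory of the original one. This is useful when access should be limited to a specific part of a file system.</li>
<li><code>fs.WalkDirFunc</code> is a type that defines the signature of the callback used by the <code>fs.WalkDir</code> function.</li>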
</ul>
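<h3 id="io/fs_包的应用场景">Use cases for the <code>io/fs</code> package</h3>
<ul>
<li><strong>Accessing different kinds of file systems:</strong> The same code can access the local file system, in-memory file systems, zip archives, and so on.</li>
<li><strong>Testing:</strong> The file system can easily be mocked, which makes code easier to test.</li>
<li><strong>Embedding resources:</strong> Static resources can be embedded into the program and accessed through <code>io/fs</code>.</li>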
</ul>
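<h3 id="示例代码">Example code</h3>
<h4 id="示例代码一:fs-FS_接口">Example 1: the <code>fs.FS</code> interface</h4>
<p>The <code>fs.FS</code> interface is the core of the <code>io/fs</code> package and represents a file system. The most common way to obtain one is <code>os.DirFS</code>, which returns a file system rooted at a directory of the local file system.</p>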
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">package</span> main</div><div class="line"></div><div class="line"><span class="keyword">import</span> (</div><div class="line"> <span class="string">"fmt"</span></div><div class="line"> <span class="string">"io/fs"</span></div><div class="line"> <span class="string">"log"</span></div><div class="line"> <span class="string">"os"</span></div><div class="line">)</div><div class="line"></div><div class="line"><span class="keyword">func</span> main() {</div><div class="line"> <span class="comment">// Create a file system rooted at the current directory (declared as fs.FS so the io/fs import is used)</span></div><div class="line"> <span class="keyword">var</span> fsys fs.FS = os.DirFS(<span class="string">"."</span>)</div><div class="line"></div><div class="line"> <span class="comment">// Open a file</span></div><div class="line"> f, err := fsys.Open(<span class="string">"README.md"</span>)</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> log.Fatal(err)</div><div class="line"> }</div><div class="line"> <span class="keyword">defer</span> f.Close()</div><div class="line"></div><div class="line"> <span class="comment">// Read up to the first 100 bytes of the file</span></div><div class="line"> 
data := <span class="built_in">make</span>([]<span class="typename">byte</span>,<span class="number"> 100</span>)</div><div class="line"> n, err := f.Read(data)</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> log.Fatal(err)</div><div class="line"> }</div><div class="line"></div><div class="line"> fmt.Println(<span class="typename">string</span>(data[:n]))</div><div class="line">}</div></pre></td></tr></table></figure>
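<p>This example shows how to create a file system with <code>os.DirFS</code>, then open a file with <code>fsys.Open</code> and read its content.</p>
<h4 id="示例代码二:fs-File_接口">Example 2: the <code>fs.File</code> interface</h4>
<p>The <code>fs.File</code> interface represents an open file. It provides the <code>Stat</code>, <code>Read</code>, and <code>Close</code> methods. It does not include a write method.</p>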
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">package</span> main</div><div class="line"></div><div class="line"><span class="keyword">import</span> (</div><div class="line"> <span class="string">"fmt"</span></div><div class="line"> <span class="string">"io/fs"</span></div><div class="line"> <span class="string">"log"</span></div><div class="line"> <span class="string">"os"</span></div><div class="line">)</div><div class="line"></div><div class="line"><span class="keyword">func</span> main() {</div><div class="line"> <span class="keyword">var</span> fsys fs.FS = os.DirFS(<span class="string">"."</span>)</div><div class="line"></div><div class="line"> f, err := fsys.Open(<span class="string">"README.md"</span>)</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> log.Fatal(err)</div><div class="line"> }</div><div class="line"> <span class="keyword">defer</span> f.Close()</div><div class="line"></div><div class="line"> <span class="comment">// Get the file's metadata</span></div><div class="line"> info, err := f.Stat()</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> log.Fatal(err)</div><div class="line"> }</div><div 
class="line"></div><div class="line"> fmt.Println(<span class="string">"File size:"</span>, info.Size())</div><div class="line">}</div></pre></td></tr></table></figure>
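<p>This example shows how to use the <code>f.Stat</code> method to get a file's metadata.</p>
<h4 id="示例代码三:fs-DirEntry_接口">Example 3: the <code>fs.DirEntry</code> interface</h4>
<p>The <code>fs.DirEntry</code> interface represents a directory entry, which can be a file or a subdirectory.</p>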
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">package</span> main</div><div class="line"></div><div class="line"><span class="keyword">import</span> (</div><div class="line"> <span class="string">"fmt"</span></div><div class="line"> <span class="string">"io/fs"</span></div><div class="line"> <span class="string">"log"</span></div><div class="line"> <span class="string">"os"</span></div><div class="line">)</div><div class="line"></div><div class="line"><span class="keyword">func</span> main() {</div><div class="line"> fsys := os.DirFS(<span class="string">"."</span>)</div><div class="line"></div><div class="line"> entries, err := fs.ReadDir(fsys, <span class="string">"."</span>)</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> log.Fatal(err)</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="keyword">for</span> _, entry := <span class="keyword">range</span> entries {</div><div class="line"> fmt.Println(<span class="string">"Name:"</span>, entry.Name())</div><div class="line"> fmt.Println(<span class="string">"Is directory:"</span>, entry.IsDir())</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
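<p>This example reads every entry of a directory with the <code>fs.ReadDir</code> function, then calls <code>entry.Name</code> and <code>entry.IsDir</code> to get each entry's name and whether it is a directory.</p>
<h4 id="示例代码四:fs-GlobFS_接口">Example 4: the <code>fs.GlobFS</code> interface</h4>
<p>The <code>fs.GlobFS</code> interface extends <code>fs.FS</code> with a <code>Glob</code> method, which matches files and directories against a wildcard pattern.</p>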
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">package</span> main</div><div class="line"></div><div class="line"><span class="keyword">import</span> (</div><div class="line"> <span class="string">"fmt"</span></div><div class="line"> <span class="string">"io/fs"</span></div><div class="line"> <span class="string">"log"</span></div><div class="line"> <span class="string">"os"</span></div><div class="line">)</div><div class="line"></div><div class="line"><span class="keyword">func</span> main() {</div><div class="line"> fsys := os.DirFS(<span class="string">"."</span>)</div><div class="line"></div><div class="line"> <span class="keyword">if</span> globFS, ok := fsys.(fs.GlobFS); ok {</div><div class="line"> matches, err := globFS.Glob(<span class="string">"*.go"</span>)</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> log.Fatal(err)</div><div class="line"> }</div><div class="line"></div><div class="line"> fmt.Println(<span class="string">"Go files:"</span>, matches)</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
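<p>This example calls the <code>Glob</code> method of <code>fs.GlobFS</code> to find all files ending in <code>.go</code>. The type assertion only succeeds when the concrete file system implements <code>fs.GlobFS</code>; if it does not, the example prints nothing. The package-level <code>fs.Glob</code> function accepts any <code>fs.FS</code>.</p>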
<h4 id="示例代码五:fs-ReadDirFS_接口">Example 5: the <code>fs.ReadDirFS</code> interface</h4>
<p>The <code>fs.ReadDirFS</code> interface also extends <code>fs.FS</code>. It adds a <code>ReadDir</code> method that reads all files and subdirectories of the named directory.</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">package</span> main</div><div class="line"></div><div class="line"><span class="keyword">import</span> (</div><div class="line"> <span class="string">"fmt"</span></div><div class="line"> <span class="string">"io/fs"</span></div><div class="line"> <span class="string">"log"</span></div><div class="line"> <span class="string">"os"</span></div><div class="line">)</div><div class="line"></div><div class="line"><span class="keyword">func</span> main() {</div><div class="line"> fsys := os.DirFS(<span class="string">"."</span>)</div><div class="line"></div><div class="line"> <span class="keyword">if</span> readDirFS, ok := fsys.(fs.ReadDirFS); ok {</div><div class="line"> entries, err := readDirFS.ReadDir(<span class="string">"."</span>)</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> log.Fatal(err)</div><div class="line"> }</div><div class="line"></div><div class="line"> fmt.Println(<span class="string">"Directory contents:"</span>)</div><div class="line"> <span class="keyword">for</span> _, entry := <span class="keyword">range</span> entries {</div><div class="line"> fmt.Println(entry.Name())</div><div class="line"> }</div><div class="line"> }</div><div 
class="line">}</div></pre></td></tr></table></figure>
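<p>This example calls the <code>ReadDir</code> method of <code>fs.ReadDirFS</code> to read all entries of a directory. The package-level <code>fs.ReadDir</code> function does the same for any <code>fs.FS</code>.</p>
<h4 id="示例代码六:fs-SubFS_接口">Example 6: the <code>fs.SubFS</code> interface</h4>
<p>The <code>fs.SubFS</code> interface also extends <code>fs.FS</code>. It adds a <code>Sub</code> method that returns a new file system rooted at a subdirectory of the original one.</p>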
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">package</span> main</div><div class="line"></div><div class="line"><span class="keyword">import</span> (</div><div class="line"> <span class="string">"fmt"</span></div><div class="line"> <span class="string">"io/fs"</span></div><div class="line"> <span class="string">"log"</span></div><div class="line"> <span class="string">"os"</span></div><div class="line">)</div><div class="line"></div><div class="line"><span class="keyword">func</span> main() {</div><div class="line"> fsys := os.DirFS(<span class="string">"."</span>)</div><div class="line"></div><div class="line"> <span class="keyword">if</span> subFS, ok := fsys.(fs.SubFS); ok {</div><div class="line"> sub, err := subFS.Sub(<span class="string">"subdir"</span>)</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> log.Fatal(err)</div><div class="line"> }</div><div class="line"></div><div class="line"> fmt.Println(<span class="string">"Sub directory contents:"</span>)</div><div class="line"> entries, err := fs.ReadDir(sub, <span class="string">"."</span>)</div><div class="line"> 
<span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> log.Fatal(err)</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="keyword">for</span> _, entry := <span class="keyword">range</span> entries {</div><div class="line"> fmt.Println(entry.Name())</div><div class="line"> }</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
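<p>This example calls the <code>Sub</code> method of <code>fs.SubFS</code> to create a file system that represents a subdirectory, then reads its contents. As with <code>fs.GlobFS</code>, the body only runs when the concrete file system implements <code>fs.SubFS</code>; the package-level <code>fs.Sub</code> function works with any <code>fs.FS</code>.</p>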
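<h4 id="示例代码七:fs-WalkDirFunc_接口">Example 7: the <code>fs.WalkDirFunc</code> type</h4>
<p><code>fs.WalkDirFunc</code> is a function type, not an interface. It defines the signature of the callback that <code>fs.WalkDir</code> invokes for every file and directory it visits.</p>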
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">package</span> main</div><div class="line"></div><div class="line"><span class="keyword">import</span> (</div><div class="line"> <span class="string">"fmt"</span></div><div class="line"> <span class="string">"io/fs"</span></div><div class="line"> <span class="string">"log"</span></div><div class="line"> <span class="string">"os"</span></div><div class="line">)</div><div class="line"></div><div class="line"><span class="keyword">func</span> main() {</div><div class="line"> fsys := os.DirFS(<span class="string">"."</span>)</div><div class="line"></div><div class="line"> err := fs.WalkDir(fsys, <span class="string">"."</span>, <span class="keyword">func</span>(path <span class="typename">string</span>, d fs.DirEntry, err error) error {</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> <span class="keyword">return</span> err</div><div class="line"> }</div><div class="line"></div><div class="line"> fmt.Println(<span class="string">"Walking:"</span>, path)</div><div class="line"> <span class="keyword">return</span> <span class="constant">nil</span></div><div class="line"> })</div><div class="line"></div><div class="line"> <span 
class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> log.Fatal(err)</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
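<p>This example walks a directory tree with <code>fs.WalkDir</code> and passes an <code>fs.WalkDirFunc</code> callback that prints the path of every file and directory.</p>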
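<h3 id="那些有趣的文件系统">Some interesting file systems</h3>
<h4 id="内存文件系统">In-memory file systems</h4>
<p>An in-memory file system is a virtual file system that keeps its files in memory instead of on disk. It is typically used for temporary data, or for testing and debugging.<br>It is very fast, but the data is lost when the program exits. Go's <code>testing/fstest</code> package provides the <code>MapFS</code> type, which makes it easy to create an in-memory file system.</p>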
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">package</span> main</div><div class="line"></div><div class="line"><span class="keyword">import</span> (</div><div class="line"> <span class="string">"fmt"</span></div><div class="line"> <span class="string">"io/fs"</span></div><div class="line"> <span class="string">"log"</span></div><div class="line"> <span class="string">"testing/fstest"</span></div><div class="line">)</div><div class="line"></div><div class="line"><span class="keyword">func</span> main() {</div><div class="line"> <span class="comment">// Create an in-memory file system</span></div><div class="line"> fsys := fstest.MapFS{</div><div class="line"> <span 
class="string">"file1.txt"</span>: {Data: []<span class="typename">byte</span>(<span class="string">"Hello, world!"</span>)},</div><div class="line"> <span class="string">"dir1/file2.txt"</span>: {Data: []<span class="typename">byte</span>(<span class="string">"This is file2."</span>)},</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// 打开一个文件</span></div><div class="line"> f, err := fsys.Open(<span class="string">"file1.txt"</span>)</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> log.Fatal(err)</div><div class="line"> }</div><div class="line"> <span class="keyword">defer</span> f.Close()</div><div class="line"></div><div class="line"> <span class="comment">// 读取文件内容</span></div><div class="line"> data := <span class="built_in">make</span>([]<span class="typename">byte</span>,<span class="number"> 100</span>)</div><div class="line"> n, err := f.Read(data)</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> log.Fatal(err)</div><div class="line"> }</div><div class="line"></div><div class="line"> fmt.Println(<span class="typename">string</span>(data[:n]))</div><div class="line"></div><div class="line"> <span class="comment">// 遍历文件系统</span></div><div class="line"> err = fs.WalkDir(fsys, <span class="string">"."</span>, <span class="keyword">func</span>(path <span class="typename">string</span>, d fs.DirEntry, err error) error {</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> <span class="keyword">return</span> err</div><div class="line"> }</div><div class="line"> fmt.Println(<span class="string">"Walking:"</span>, path)</div><div class="line"> <span class="keyword">return</span> <span class="constant">nil</span></div><div class="line"> })</div><div class="line"> <span class="keyword">if</span> err != 
<span class="constant">nil</span> {</div><div class="line"> log.Fatal(err)</div><div class="line"> }</div><div class="line"></div><div class="line">}</div></pre></td></tr></table></figure>
<p>There are also third-party libraries that implement an in-memory file system, such as <a href="https://github.com/psanford/memfs" target="_blank" rel="external">psanford/memfs</a>. Here is an example that uses it:</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">package</span> main</div><div class="line"></div><div class="line"><span class="keyword">import</span> (</div><div class="line"> <span class="string">"fmt"</span></div><div class="line"> <span class="string">"io/fs"</span></div><div class="line"></div><div class="line"> <span class="string">"github.com/psanford/memfs"</span></div><div class="line">)</div><div class="line"></div><div class="line"><span class="keyword">func</span> main() {</div><div class="line"> rootFS := memfs.New()</div><div class="line"></div><div class="line"> err := rootFS.MkdirAll(<span class="string">"dir1/dir2"</span>,<span class="number"> 0777</span>)</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> <span class="built_in">panic</span>(err)</div><div class="line"> }</div><div class="line"></div><div class="line"> err = rootFS.WriteFile(<span 
class="string">"dir1/dir2/f1.txt"</span>, []<span class="typename">byte</span>(<span class="string">"incinerating-unsubstantial"</span>),<span class="number"> 0755</span>)</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> <span class="built_in">panic</span>(err)</div><div class="line"> }</div><div class="line"></div><div class="line"> err = fs.WalkDir(rootFS, <span class="string">"."</span>, <span class="keyword">func</span>(path <span class="typename">string</span>, d fs.DirEntry, err error) error {</div><div class="line"> fmt.Println(path)</div><div class="line"> <span class="keyword">return</span> <span class="constant">nil</span></div><div class="line"> })</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> <span class="built_in">panic</span>(err)</div><div class="line"> }</div><div class="line"></div><div class="line"> content, err := fs.ReadFile(rootFS, <span class="string">"dir1/dir2/f1.txt"</span>)</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> <span class="built_in">panic</span>(err)</div><div class="line"> }</div><div class="line"> fmt.Printf(<span class="string">"%s\n"</span>, content)</div><div class="line">}</div></pre></td></tr></table></figure>
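<h4 id="嵌入式文件系统">Embedded file systems</h4>
<p>An embedded file system compiles files into the program binary, which makes it easy to ship static assets together with the program. The Go standard library provides the <code>embed</code> package for creating embedded file systems.</p>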
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">package</span> main</div><div class="line"></div><div class="line"><span class="keyword">import</span> (</div><div class="line"> <span class="string">"embed"</span></div><div class="line"> <span class="string">"fmt"</span></div><div class="line"> <span class="string">"io/fs"</span></div><div class="line"> <span class="string">"log"</span></div><div class="line">)</div><div class="line"></div><div class="line"><span class="comment">//go:embed static</span></div><div class="line"><span class="keyword">var</span> staticFiles embed.FS</div><div class="line"></div><div class="line"><span class="keyword">func</span> main() {</div><div class="line"> <span class="comment">// 打开一个嵌入的文件</span></div><div class="line"> f, err := staticFiles.Open(<span 
class="string">"static/file1.txt"</span>)</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> log.Fatal(err)</div><div class="line"> }</div><div class="line"> <span class="keyword">defer</span> f.Close()</div><div class="line"></div><div class="line"> <span class="comment">// 读取文件内容</span></div><div class="line"> data := <span class="built_in">make</span>([]<span class="typename">byte</span>,<span class="number"> 100</span>)</div><div class="line"> n, err := f.Read(data)</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> log.Fatal(err)</div><div class="line"> }</div><div class="line"></div><div class="line"> fmt.Println(<span class="typename">string</span>(data[:n]))</div><div class="line"></div><div class="line"></div><div class="line"> <span class="comment">// 遍历嵌入式文件系统</span></div><div class="line"> err = fs.WalkDir(staticFiles, <span class="string">"static"</span>, <span class="keyword">func</span>(path <span class="typename">string</span>, d fs.DirEntry, err error) error {</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> <span class="keyword">return</span> err</div><div class="line"> }</div><div class="line"> fmt.Println(<span class="string">"Walking:"</span>, path)</div><div class="line"> <span class="keyword">return</span> <span class="constant">nil</span></div><div class="line"> })</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> log.Fatal(err)</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
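<p>This example uses the <code>embed.FS</code> type to create an embedded file system and calls <code>staticFiles.Open</code> to open an embedded file.</p>
<h4 id="云存储文件系统">Cloud storage file systems</h4>
<p>Some third-party libraries expose a cloud storage bucket, such as Amazon S3 or Google Cloud Storage (GCS), through the <code>fs.FS</code> interface, so objects in the bucket can be read much like local files. For example, the <code>go-cloud</code> library provides a unified API for several cloud storage services, including S3 and GCS. The example below uses a GCS bucket.</p>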
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">package</span> main</div><div class="line"></div><div class="line"><span class="keyword">import</span> (</div><div class="line"> <span class="string">"context"</span></div><div class="line"> <span class="string">"fmt"</span></div><div class="line"> <span class="string">"io/fs"</span></div><div class="line"> <span class="string">"log"</span></div><div class="line"></div><div class="line"> <span class="string">"gocloud.dev/blob"</span></div><div class="line"> _ <span class="string">"gocloud.dev/blob/gcsblob"</span> <span class="comment">// Register the GCS driver; import the matching driver for other cloud storage services</span></div><div class="line">)</div><div class="line"></div><div class="line"><span class="keyword">func</span> main() {</div><div class="line"> <span class="comment">// URL of the GCS bucket</span></div><div class="line"> bucketURL := <span class="string">"gs://my-bucket"</span></div><div class="line"></div><div 
class="line"> <span class="comment">// 创建一个 blob.Bucket</span></div><div class="line"> bucket, err := blob.OpenBucket(context.Background(), bucketURL)</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> log.Fatal(err)</div><div class="line"> }</div><div class="line"> <span class="keyword">defer</span> bucket.Close()</div><div class="line"></div><div class="line"> <span class="comment">// 创建一个 fs.FS</span></div><div class="line"> fsys := blob.NewFS(bucket)</div><div class="line"></div><div class="line"> <span class="comment">// 现在可以使用 fsys 进行文件操作</span></div><div class="line"> err = fs.WalkDir(fsys, <span class="string">"."</span>, <span class="keyword">func</span>(path <span class="typename">string</span>, d fs.DirEntry, err error) error {</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> <span class="keyword">return</span> err</div><div class="line"> }</div><div class="line"> fmt.Println(<span class="string">"Walking:"</span>, path)</div><div class="line"> <span class="keyword">return</span> <span class="constant">nil</span></div><div class="line"> })</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> log.Fatal(err)</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
<p>This example uses the <code>gocloud.dev/blob</code> package to expose a Google Cloud Storage (GCS) bucket as a file system and walks the objects in the bucket with <code>fs.WalkDir</code>.</p>
]]></content>
<summary type="html">
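<![CDATA[<p>Go's <code>io/fs</code> package is a standard library package introduced in Go 1.16. It defines abstract interfaces for file systems and offers a uniform way to access <strong>different kinds of file systems</strong>, including the local file system, in-memory file systems, zip files, and more.</p>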
]]>
</summary>
<category term="Go" scheme="https://colobu.com/tags/Go/"/>
<category term="Go" scheme="https://colobu.com/categories/Go/"/>
</entry>
<entry>
<title><![CDATA[How long does it take to scan every public IP in China?]]></title>
<link href="https://colobu.com/2025/01/27/how-long-to-scan-all-IPs-of-cn/"/>
<id>https://colobu.com/2025/01/27/how-long-to-scan-all-IPs-of-cn/</id>
<published>2025-01-26T16:38:47.000Z</published>
<updated>2025-01-26T16:42:06.652Z</updated>
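<content type="html"><![CDATA[<p>Since joining Baidu to work on monitoring for the physical network, I have spent most of my time writing all kinds of low-level network programs. In my spare time I also write network programs for fun, partly out of interest and partly to explore technical options for some future business need.</p>
<p>This time I wanted to know how long it takes to scan every public IP address in China (mainland China) from my ten-year-old 4-core mini PC on a home network.</p>
<p>With such a scan I can see how well I am connected to the carriers and cloud providers in every province and city, whether a carrier's egress is failing, and whether any IPs are being hijacked by a carrier or by the authorities.</p>
<p>TL;DR: the scan covered about <strong>343 million</strong> addresses (343142912), of which <strong>5.92 million</strong> (5923768) currently answer ping, and it took about <strong>1 hour</strong> (1h2m57.973755197s).</p>
<p>For this I refactored an earlier program of mine that scans public IPs. The old program used gopacket both to build packets and to send and receive them. One annoying thing about gopacket is that it depends on the libpcap library, so it cannot be built with CGO disabled.</p>
<p>In fact, Go's extension packages <code>icmp</code> and <code>ipv4</code> (under <code>golang.org/x/net</code>) are enough to implement this without gopacket. This article describes how.</p>
<p>The full source code is at <a href="https://github.com/smallnest/fishfinder" target="_blank" rel="external">https://github.com/smallnest/fishfinder</a></p>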
<a id="more"></a>
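<h2 id="程序的主要架构">Overall architecture</h2>
<p>The program probes hosts with the ICMP protocol.</p>
<p>First it starts a goroutine that parses the list of IP ranges for the whole country. Each line of the IP file is one network prefix. The goroutine expands each prefix into a batch of IP addresses and puts the batch on the input channel.</p>
<p>A sending goroutine receives IP addresses from the input channel, builds an ICMP echo packet for each address and sends it. It only sends, and it returns once every address has been sent.</p>
<p>A receiving goroutine handles the ICMP reply packets that come back and writes the results to the output channel.</p>
<p>The main program keeps reading the IPs that replied from output and logs them, and exits once all IPs have been processed.</p>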
<p><img src="flow.png" alt=""></p>
<p>This raises a concurrency question: how do these goroutines coordinate?</p>
<ul>
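<li>The goroutine that parses IPs and dispatches work talks to the sending goroutine through input. After it has handled all IPs, it closes input to notify the sending goroutine.</li>
<li>When the sending goroutine sees that input is closed, it knows all IPs have been handed over. After sending the last IP, it closes output.</li>
<li>The receiving goroutine sends every IP that replied to output. If output is already closed, sending to it panics, and the program recovers from that panic. In practice the main program should have exited before this happens.</li>
<li>The main program reads IPs from output. As soon as output is closed, it prints the statistics and exits.</li>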
</ul>
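<p>The close-based hand-off described above can be sketched without any networking. This is a simplified illustration, not the program's actual code: it merges the sending and receiving goroutines into one worker stage, and <code>scan</code>, <code>probe</code> and the sample addresses are hypothetical names and values used only here.</p>

```go
package main

import "fmt"

// scan pushes batches through input, probes every address in a worker
// goroutine, and counts the addresses that the worker reports on output.
// probe is a stand-in for "send an ICMP echo and wait for the reply".
func scan(batches [][]string, probe func(ip string) bool) (total, alive int) {
	input := make(chan []string, 16)
	output := make(chan string, 16)

	// Dispatcher: hands over every batch, then closes input to signal
	// that no more work is coming.
	go func() {
		defer close(input)
		for _, batch := range batches {
			input <- batch
		}
	}()

	// Worker: drains input until it is closed, then closes output so the
	// consumer's range loop below can finish.
	go func() {
		defer close(output)
		for ips := range input {
			total += len(ips)
			for _, ip := range ips {
				if probe(ip) {
					output <- ip
				}
			}
		}
	}()

	// Consumer: the loop ends when output is closed. total is only read
	// after that point, so the worker's writes to it are visible here.
	for range output {
		alive++
	}
	return total, alive
}

func main() {
	batches := [][]string{
		{"192.0.2.1", "192.0.2.2"},
		{"192.0.2.3"},
	}
	// Pretend that every address except 192.0.2.2 answers.
	total, alive := scan(batches, func(ip string) bool { return ip != "192.0.2.2" })
	fmt.Printf("total: %d, alive: %d\n", total, alive)
}
```

<p>Only the goroutine that writes to a channel closes it in this sketch, which avoids the send-on-closed-channel panic mentioned in the list above.</p>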
<blockquote>
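<p>If you have questions about concurrent programming in Go, you can read my column 《Go并发编程实战课》 on Geek Time (极客时间), or the book 《深入理解Go并发编程》.<br>If you are a Rust programmer, I will soon publish a sister column to 《Go并发编程实战课》 that covers concurrent programming in Rust.<br>If you are interested in network programming, this year I also hope to publish a column or book titled 《深入理解网络编程》. If that interests you, feel free to discuss it with me.</p>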
</blockquote>
<p>The code of the main program is as follows:</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div><div class="line">46</div><div class="line">47</div><div class="line">48</div><div class="line">49</div><div class="line">50</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">package</span> main</div><div class="line"></div><div class="line"><span class="keyword">import</span> (</div><div class="line"> <span class="string">"flag"</span></div><div class="line"> <span class="string">"time"</span></div><div class="line"></div><div class="line"> <span class="string">"github.com/kataras/golog"</span></div><div class="line">)</div><div class="line"></div><div class="line"><span class="keyword">var</span> (</div><div class="line"> protocol = flag.String(<span class="string">"p"</span>, <span class="string">"icmp"</span>, <span 
class="string">"The protocol to use (icmp, tcp or udp)"</span>)</div><div class="line">)</div><div class="line"></div><div class="line"><span class="comment">// 嵌入ip.sh</span></div><div class="line"></div><div class="line"><span class="keyword">func</span> main() {</div><div class="line"> flag.Parse()</div><div class="line"></div><div class="line"> input := <span class="built_in">make</span>(<span class="keyword">chan</span> []<span class="typename">string</span>,<span class="number"> 1024</span>)</div><div class="line"> output := <span class="built_in">make</span>(<span class="keyword">chan</span> <span class="typename">string</span>,<span class="number"> 1024</span>)</div><div class="line"> scanner := NewICMPScanner(input, output)</div><div class="line"></div><div class="line"> <span class="keyword">var</span> total <span class="typename">int</span></div><div class="line"> <span class="keyword">var</span> alive <span class="typename">int</span></div><div class="line"></div><div class="line"> golog.Infof(<span class="string">"start scanning"</span>)</div><div class="line"></div><div class="line"> start := time.Now()</div><div class="line"> <span class="comment">// 将待探测的IP发送给send goroutine</span></div><div class="line"> <span class="keyword">go</span> <span class="keyword">func</span>() {</div><div class="line"> lines := readIPList()</div><div class="line"> <span class="keyword">for</span> _, line := <span class="keyword">range</span> lines {</div><div class="line"> ips := cidr2IPList(line)</div><div class="line"> input <- ips</div><div class="line"> total += <span class="built_in">len</span>(ips)</div><div class="line"> }</div><div class="line"> <span class="built_in">close</span>(input)</div><div class="line"> }()</div><div class="line"></div><div class="line"> <span class="comment">// 启动 send goroutine</span></div><div class="line"> scanner.Scan()</div><div class="line"></div><div class="line"> <span class="comment">// 接收 send goroutine 发送的结果, 
直到发送之后5秒结束</span></div><div class="line"> <span class="keyword">for</span> ip := <span class="keyword">range</span> output {</div><div class="line"> golog.Infof(<span class="string">"%s is alive"</span>, ip)</div><div class="line"> alive++</div><div class="line"> }</div><div class="line"></div><div class="line"> golog.Infof(<span class="string">"total: %d, alive: %d, time: %v"</span>, total, alive, time.Since(start))</div><div class="line">}</div></pre></td></tr></table></figure>
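<p>The following sections describe the logic of the three main goroutines.</p>
<h2 id="公网IP获取以及任务分发">Getting the public IPs and dispatching work</h2>
<p>First you need to download all IP ranges registered for mainland China. The data comes from APNIC, the Asia-Pacific Network Information Centre. The program can in fact probe IPs anywhere in the world; the public IPs of mainland China are used here as the example.</p>
<p>The following script converts the data into a list of network prefixes:</p>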
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line"><span class="shebang">#!/bin/bash</span></div><div class="line"></div><div class="line">wget -c -O- http://ftp.apnic.net/stats/apnic/delegated-apnic-latest | awk -F <span class="string">'|'</span> <span class="string">'/CN/&&/ipv4/ {print $4 "/" 32-log($5)/log(2)}'</span> | cat > ipv4.txt</div></pre></td></tr></table></figure>
<p>Each line of ipv4.txt is one CIDR range:</p>
<figure class="highlight txt"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div></pre></td><td class="code"><pre><div class="line"><span class="number">1.0</span><span class="number">.1</span><span class="number">.0</span>/<span class="number">24</span></div><div class="line"><span class="number">1.0</span><span class="number">.2</span><span class="number">.0</span>/<span class="number">23</span></div><div class="line"><span class="number">1.0</span><span class="number">.8</span><span class="number">.0</span>/<span class="number">21</span></div><div class="line"><span class="number">1.0</span><span class="number">.32</span><span class="number">.0</span>/<span class="number">19</span></div><div class="line"><span class="number">1.1</span><span class="number">.0</span><span class="number">.0</span>/<span class="number">24</span></div><div class="line"><span class="number">1.1</span><span class="number">.2</span><span class="number">.0</span>/<span class="number">23</span></div><div class="line"><span class="number">1.1</span><span class="number">.4</span><span class="number">.0</span>/<span class="number">22</span></div><div class="line"><span class="keyword">...</span></div></pre></td></tr></table></figure>
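<p>The awk expression <code>32-log($5)/log(2)</code> turns the address count in each APNIC record into a prefix length. As a minimal sketch of the same arithmetic in Go (the helper name <code>prefixLen</code> is hypothetical and not part of the original program), assuming the count is a power of two:</p>

```go
package main

import (
	"fmt"
	"math/bits"
)

// prefixLen converts the address count of a delegation record into a CIDR
// prefix length, mirroring the awk expression 32-log($5)/log(2).
// It returns ok=false when count is not a power of two, because such a
// block cannot be written as a single CIDR range.
func prefixLen(count uint32) (length int, ok bool) {
	if count == 0 || count&(count-1) != 0 {
		return 0, false
	}
	// For a power of two, the number of trailing zero bits equals log2(count).
	return 32 - bits.TrailingZeros32(count), true
}

func main() {
	for _, c := range []uint32{256, 512, 2048, 8192} {
		n, _ := prefixLen(c)
		fmt.Printf("%d addresses -> /%d\n", c, n)
	}
}
```

<p>For counts of 256, 512, 2048 and 8192 this yields /24, /23, /21 and /19, the prefix lengths seen in the sample lines above.</p>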
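<p>The data set is small, so we read it all into memory (if it were much larger, we would stream it instead).<br>We parse the range on each line, expand it into a list of IP addresses, and send that list to the input channel.<br>Once every range has been processed, we close the input channel.</p>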
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">go</span> <span class="keyword">func</span>() {</div><div class="line"> lines := readIPList()</div><div class="line"> <span class="keyword">for</span> _, line := <span class="keyword">range</span> lines {</div><div class="line"> ips := cidr2IPList(line)</div><div class="line"> input <- ips</div><div class="line"> total += <span class="built_in">len</span>(ips)</div><div class="line"> }</div><div class="line"> <span class="built_in">close</span>(input)</div><div class="line">}()</div></pre></td></tr></table></figure>
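<p>The body of <code>cidr2IPList</code> is not shown in this article. A minimal sketch of such an expansion, assuming the standard library's <code>net/netip</code> package (Go 1.18+) and a hypothetical name <code>expandCIDR</code>, could look like this. It returns every address in the range, including the network and broadcast addresses:</p>

```go
package main

import (
	"fmt"
	"net/netip"
)

// expandCIDR is a hypothetical stand-in for the article's cidr2IPList helper.
// It returns every address contained in the given CIDR range as a string.
func expandCIDR(cidr string) ([]string, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, err
	}
	prefix = prefix.Masked() // normalize to the network address

	var ips []string
	// Next returns the zero Addr after the last possible address, and
	// Contains reports false for the zero Addr, so the loop always ends.
	for addr := prefix.Addr(); prefix.Contains(addr); addr = addr.Next() {
		ips = append(ips, addr.String())
	}
	return ips, nil
}

func main() {
	ips, err := expandCIDR("1.0.1.0/30")
	if err != nil {
		panic(err)
	}
	fmt.Println(ips)
}
```

<p>Materializing each range as a slice is simple but memory hungry for large ranges such as a /10; a streaming variant would send addresses one by one instead.</p>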
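<h2 id="发送逻辑">Sending logic</h2>
<p>I use the <code>ICMPScanner</code> struct to manage the sending and receiving logic. As the name suggests, we could later add scanners that probe with other protocols such as TCP or UDP.</p>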
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">type</span> ICMPScanner <span class="keyword">struct</span> {</div><div class="line"> src net.IP</div><div class="line"></div><div class="line"> input <span class="keyword">chan</span> []<span class="typename">string</span></div><div class="line"> output <span class="keyword">chan</span> <span class="typename">string</span></div><div class="line">}</div><div class="line"></div><div class="line"><span class="comment">// increase the socket buffer sizes</span></div><div class="line"><span class="comment">// sysctl net.core.rmem_max</span></div><div class="line"><span class="comment">// sysctl net.core.wmem_max</span></div><div class="line"></div><div class="line"><span class="keyword">func</span> NewICMPScanner(input <span class="keyword">chan</span> []<span class="typename">string</span>, output <span class="keyword">chan</span> <span class="typename">string</span>) *ICMPScanner {</div><div class="line"> localIP := getLocalIP()</div><div class="line"> s := &ICMPScanner{</div><div class="line"> input: input,</div><div class="line"> output: output,</div><div class="line"> src: net.ParseIP(localIP),</div><div class="line"> }</div><div class="line"> <span class="keyword">return</span> s</div><div class="line">}</div><div
class="line"></div><div class="line"><span class="keyword">func</span> (s *ICMPScanner) Scan() {</div><div class="line"> <span class="keyword">go</span> s.recv()</div><div class="line"> <span class="keyword">go</span> s.send(s.input)</div><div class="line">}</div></pre></td></tr></table></figure>
<p>The <code>send</code> method is responsible for sending ICMP packets, and the <code>recv</code> method for receiving them.</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div></pre></td><td class="code"><pre><div class="line"><span class="comment">// send sends a single ICMP echo request packet for each ip in the input channel.</span></div><div class="line"><span class="keyword">func</span> (s *ICMPScanner) send(input <span class="keyword">chan</span> []<span class="typename">string</span>) error {</div><div class="line"> <span class="keyword">defer</span> <span class="keyword">func</span>() {</div><div class="line"> time.Sleep<span class="number">(5</span> * time.Second)</div><div class="line"> <span class="built_in">close</span>(s.output)</div><div class="line"> golog.Infof(<span class="string">"send goroutine exit"</span>)</div><div class="line"> }()</div><div class="line"></div><div class="line"> id := os.Getpid() &<span class="number"> 0</span>xffff</div><div class="line"></div><div class="line"> <span class="comment">// create the ICMP connection</span></div><div class="line"> conn, err := icmp.ListenPacket(<span class="string">"ip4:icmp"</span>, s.src.String())</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> log.Fatal(err)</div><div class="line"> }</div><div class="line"> <span class="keyword">defer</span> conn.Close()</div><div
class="line"></div><div class="line"> <span class="comment">// this connection does not receive data</span></div><div class="line"> filter := createEmptyFilter()</div><div class="line"> <span class="keyword">if</span> assembled, err := bpf.Assemble(filter); err == <span class="constant">nil</span> {</div><div class="line"> conn.IPv4PacketConn().SetBPF(assembled)</div><div class="line"> }</div><div class="line"></div><div class="line"> ... <span class="comment">// omitted for now, covered below</span></div><div class="line"></div><div class="line"> <span class="keyword">return</span> <span class="constant">nil</span></div><div class="line">}</div></pre></td></tr></table></figure>
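<p>In <code>send</code>, we first create an ICMP connection with the icmp package and then attach a BPF filter to it that drops every incoming packet.<br>This is a small trick: on this connection we only care about sending, not about received packets, so we install an empty filter.</p>
<p>The design is meant to prepare for future performance scaling, since multiple connections could be created to send faster. For now we use only one sending connection, so it could actually be shared with the receiving goroutine, but the current design keeps separate connections for sending and receiving.</p>
<p>Next comes the sending logic itself, the part omitted above:</p>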
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div></pre></td><td class="code"><pre><div class="line">seq := <span class="typename">uint16</span><span class="number">(0</span>)</div><div class="line"><span class="keyword">for</span> ips := <span class="keyword">range</span> input {</div><div class="line"> <span class="keyword">for</span> _, ip := <span class="keyword">range</span> ips {</div><div class="line"> dst, err := net.ResolveIPAddr(<span class="string">"ip"</span>, ip)</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> golog.Fatalf(<span class="string">"failed to resolve IP address: %v"</span>, err)</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// build the ICMP message</span></div><div class="line"> msg := &icmp.Message{</div><div class="line"> Type: ipv4.ICMPTypeEcho,</div><div class="line"> Code:<span class="number"> 0</span>,</div><div class="line"> Body: &icmp.Echo{</div><div class="line"> ID: id,</div><div class="line"> Seq: <span class="typename">int</span>(seq),</div><div class="line"> Data: []<span class="typename">byte</span>(<span
class="string">"Hello, are you there!"</span>),</div><div class="line"> },</div><div class="line"> }</div><div class="line"> msgBytes, err := msg.Marshal(<span class="constant">nil</span>)</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> golog.Errorf(<span class="string">"failed to marshal ICMP message: %v"</span>, err)</div><div class="line"> <span class="keyword">continue</span></div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// send the ICMP message</span></div><div class="line"> _, err = conn.WriteTo(msgBytes, dst)</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> golog.Errorf(<span class="string">"failed to send ICMP message: %v"</span>, err)</div><div class="line"> }</div><div class="line"> seq++</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
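<p>The send loop reads IP addresses from the input channel, builds an ICMP echo message for each one, and sends it to the target address.</p>
<ul>
<li>Read a list of IPs from the input channel</li>
<li>For each IP:<ol>
<li>Resolve the IP address</li>
<li>Build the ICMP echo request message</li>
<li>Serialize the message</li>
<li>Send it to the target address</li>
</ol>
</li>
</ul>
<p>We set the ID in the ICMP message to the low 16 bits of the process PID, so that on the receiving side we can tell whether a reply answers one of our requests.</p>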
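<p>To make the "build and serialize" steps concrete, here is a minimal sketch of the ICMPv4 echo request wire format using only the standard library. It is not the icmp package's implementation; the helper names <code>checksum</code> and <code>marshalEcho</code> are hypothetical, and the program itself simply calls <code>msg.Marshal(nil)</code>:</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"
)

// checksum computes the Internet checksum (RFC 1071) over b.
func checksum(b []byte) uint16 {
	var sum uint32
	for i := 0; i+1 < len(b); i += 2 {
		sum += uint32(binary.BigEndian.Uint16(b[i:]))
	}
	if len(b)%2 == 1 {
		sum += uint32(b[len(b)-1]) << 8 // pad the odd trailing byte with zero
	}
	for sum>>16 != 0 {
		sum = sum&0xffff + sum>>16 // fold the carries back in
	}
	return ^uint16(sum)
}

// marshalEcho builds an ICMPv4 echo request:
// type (8), code (0), checksum, identifier, sequence number, payload.
func marshalEcho(id, seq uint16, payload []byte) []byte {
	b := make([]byte, 8+len(payload))
	b[0] = 8 // type: echo request
	b[1] = 0 // code
	binary.BigEndian.PutUint16(b[4:], id)
	binary.BigEndian.PutUint16(b[6:], seq)
	copy(b[8:], payload)
	// The checksum is computed with the checksum field still zero.
	binary.BigEndian.PutUint16(b[2:], checksum(b))
	return b
}

func main() {
	id := uint16(os.Getpid() & 0xffff) // same ID scheme as in the article
	pkt := marshalEcho(id, 0, []byte("Hello, are you there!"))
	fmt.Printf("% x\n", pkt)
}
```

<p>A quick sanity check of such a packet is that running the checksum over the finished message, checksum field included, gives zero.</p>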
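<h2 id="接收逻辑">Receiving logic</h2>
<p>The receiving logic is fairly simple: receive ICMP replies, extract the peer's IP address, and send it to the output channel.</p>
<p>Concretely, we first create an ICMP connection and then read replies from it in a loop.</p>
<p>We only handle replies of type ICMPTypeEchoReply and check whether the ID matches the one we sent; if it does, we send the peer's IP to the output channel.</p>
<p>For our scenario, matching on the ID is enough; there is no need to also check the seq or the payload.</p>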
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">func</span> (s *ICMPScanner) recv() error {</div><div class="line"> <span class="keyword">defer</span> <span class="keyword">func</span>() { <span class="built_in">recover</span>() }()</div><div class="line"></div><div class="line"> id := os.Getpid() &<span class="number"> 0</span>xffff</div><div class="line"></div><div class="line"> <span class="comment">// create the ICMP connection</span></div><div class="line"> conn, err := icmp.ListenPacket(<span class="string">"ip4:icmp"</span>, <span class="string">"0.0.0.0"</span>)</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> log.Fatal(err)</div><div class="line"> }</div><div class="line"> <span class="keyword">defer</span> conn.Close()</div><div class="line"></div><div class="line"> <span class="comment">// receive ICMP messages</span></div><div class="line"> reply := <span class="built_in">make</span>([]<span class="typename">byte</span>,<span class="number"> 1500</span>)</div><div class="line"> <span class="keyword">for</span> {</div><div class="line"> n, peer, err := conn.ReadFrom(reply)</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> log.Fatal(err)</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// parse the ICMP message</span></div><div class="line"> msg, err := icmp.ParseMessage(protocolICMP, reply[:n])</div><div class="line"> <span class="keyword">if</span> err != <span class="constant">nil</span> {</div><div class="line"> golog.Errorf(<span class="string">"failed to parse ICMP message: %v"</span>, err)</div><div class="line"> <span class="keyword">continue</span></div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// report the result</span></div><div class="line"> <span class="keyword">switch</span> msg.Type {</div><div class="line"> <span class="keyword">case</span> ipv4.ICMPTypeEchoReply:</div><div class="line"> echoReply, ok := msg.Body.(*icmp.Echo)</div><div class="line"> <span class="keyword">if</span> !ok {</div><div class="line"> <span class="keyword">continue</span></div><div class="line"> }</div><div class="line"> <span class="keyword">if</span> echoReply.ID == id {</div><div class="line"> s.output <- peer.String()</div><div class="line"> }</div><div class="line"> }</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
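<p>The ID check described above boils down to reading two bytes of the ICMP header. A minimal standard-library sketch, assuming the buffer already starts at the ICMP header and using hypothetical helper names (the program itself relies on <code>icmp.ParseMessage</code>), might be:</p>

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const icmpTypeEchoReply = 0 // ICMPv4 echo reply type

// echoReplyID returns the identifier of an ICMPv4 echo reply.
// It does not validate the checksum; this sketch only shows the ID match.
func echoReplyID(b []byte) (uint16, error) {
	if len(b) < 8 {
		return 0, errors.New("short ICMP message")
	}
	if b[0] != icmpTypeEchoReply {
		return 0, errors.New("not an echo reply")
	}
	return binary.BigEndian.Uint16(b[4:6]), nil
}

// isOurReply reports whether b is an echo reply carrying our identifier.
func isOurReply(b []byte, id uint16) bool {
	got, err := echoReplyID(b)
	return err == nil && got == id
}

func main() {
	// type=0, code=0, checksum=0 (ignored here), id=0x1234, seq=1
	reply := []byte{0, 0, 0, 0, 0x12, 0x34, 0, 1}
	fmt.Println(isOurReply(reply, 0x1234))
	fmt.Println(isOurReply(reply, 0x4321))
}
```

<p>Because the ID is only 16 bits, another process pinging from the same host could in principle collide with it; for a one-off scan this is an acceptable simplification.</p>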
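<p>As you can see, about 200 lines of code are enough for a program that scans all public IPs in the country. You can also try scanning the global IP address space and see how long that takes.</p>
<p>By the way, here is the output from my run of the program:</p>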
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">...</span></div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">26</span> <span class="number">22</span>:<span class="number">01</span> <span class="number">223.255</span><span class="number">.236</span><span class="number">.221</span> is alive</div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">26</span> <span class="number">22</span>:<span class="number">01</span> <span class="number">223.255</span><span class="number">.252</span><span class="number">.9</span> is alive</div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">26</span> <span class="number">22</span>:<span class="number">01</span> send goroutine exit</div><div class="line">[INFO] <span class="number">2025</span>/<span class="number">01</span>/<span class="number">26</span> <span class="number">22</span>:<span class="number">01</span> total: <span class="number">343142912</span>, alive: <span class="number">5923768</span>, time: 1h2m57.973755197s</div></pre></td></tr></table></figure>
]]></content>
<summary type="html">
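<![CDATA[<p>Since joining Baidu to work on monitoring of the physical network, I have spent most of my time writing all kinds of low-level network programs. In my spare time I also write fun network programs, not only out of interest but also to explore technical approaches for some future business need.</p>
<p>This time, I wanted to know how long it would take to scan all public IP addresses in the country (mainland China) from my home network, on a small 10-year-old, 4-core mini PC.</p>
<p>With it, I can check connectivity to the carriers and cloud providers in every province and city, and see whether any carrier's egress has failed or whether IPs have been hijacked by a carrier or the authorities.</p>
<p>TL;DR: the scan covered about <strong>300 million</strong> addresses (343142912), of which <strong>5.92 million</strong> (5923768) currently respond to ping, and it took about <strong>1 hour</strong> (1h2m57.973755197s).</p>
<p>This time I refactored an earlier program of mine that scanned public IPs. The earlier program used gopacket both to send and receive packets and to assemble them. An annoying thing about gopacket is that it depends on libpcap, so it cannot be built with CGO disabled.</p>
<p>In fact, with Go's extension packages icmp and ipv4 we can implement this without gopacket at all; this article describes the implementation.</p>
<p>The full code is at: <a href="https://github.com/smallnest/fishfinder" target="_blank" rel="external">https://github.com/smallnest/fishfinder</a></p>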
]]>
</summary>
<category term="Go" scheme="https://colobu.com/tags/Go/"/>
<category term="Go" scheme="https://colobu.com/categories/Go/"/>
</entry>
<entry>
<title><![CDATA[Go's Unadvertised Data Structures: The 4-ary Heap, Not the Ordinary Binary Heap]]></title>
<link href="https://colobu.com/2024/11/18/go-internal-ds-4-ary-heap/"/>
<id>https://colobu.com/2024/11/18/go-internal-ds-4-ary-heap/</id>
<published>2024-11-18T14:47:50.000Z</published>
<updated>2024-11-18T15:00:08.681Z</updated>
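<content type="html"><![CDATA[<p>In Go, Timer and the related timers such as Ticker, time.After, and time.AfterFunc are ultimately stored in a 4-ary heap.</p>
<p>The global timer heap has also gone through three major stages of evolution.</p>
<ul>
<li>Up to Go 1.9, all timers were maintained by a single global 4-ary heap, with heavy contention between goroutines.</li>
<li>From Go 1.10 to 1.13, 64 global 4-ary heaps maintained all timers. Sharding reduced the contention, but it did not fundamentally solve the problem of the earlier versions.</li>
<li>Since Go 1.14, each P maintains its own 4-ary heap, largely avoiding contention between goroutines. (The per-P data structure is covered later.)</li>
</ul>
<p>Heaps are usually implemented as binary heaps. So why do Go timers use a 4-ary heap?</p>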
<a id="more"></a>
<p>Taking a min-heap as an example, the figure below shows the difference between a binary heap and a 4-ary heap:</p>
<p><img src="bheap-vs-d-ary-heap.png" alt=""></p>
<ul>
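<li>Binary heap: each node has at most 2 children; 4-ary heap: each node has at most 4 children.</li>
<li>For the same number of nodes, the 4-ary heap is shallower, about half the height of the binary heap (log₄n vs log₂n).</li>
<li>In a min-heap, the value of a parent node is less than or equal to the values of its children.</li>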
</ul>
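<p>The "about half the height" claim is easy to check numerically. The sketch below (the <code>height</code> helper is hypothetical, written for this illustration) counts how many parent steps separate the last node from the root in a complete d-ary heap stored in an array:</p>

```go
package main

import "fmt"

// height returns the number of levels below the root in a complete d-ary
// heap with n nodes, by walking from the last node up to the root.
func height(n, d int) int {
	h := 0
	for i := n - 1; i > 0; i = (i - 1) / d {
		h++
	}
	return h
}

func main() {
	for _, n := range []int{1, 100, 10_000, 1_000_000} {
		fmt.Printf("n=%d binary=%d 4-ary=%d\n", n, height(n, 2), height(n, 4))
	}
}
```

<p>For one million nodes this gives a height of 19 for the binary heap and 10 for the 4-ary heap. Note that a sift-down in the 4-ary heap compares up to four children per level, so the gain comes from fewer levels (and fewer cache-unfriendly jumps), not from fewer comparisons overall.</p>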
<p>The parent and child index calculations also differ slightly. For a binary heap:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line">parent = (i - <span class="number">1</span>) // <span class="number">2</span></div><div class="line">left_child = <span class="number">2</span> * i + <span class="number">1</span></div><div class="line">right_child = <span class="number">2</span> * i + <span class="number">2</span></div></pre></td></tr></table></figure>
<p>For a 4-ary heap:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line">parent = (i - <span class="number">1</span>) // <span class="number">4</span></div><div class="line">first_child = <span class="number">4</span> * i + <span class="number">1</span></div><div class="line">last_child = <span class="number">4</span> * i + <span class="number">4</span></div></pre></td></tr></table></figure>
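<p>The same index arithmetic can be written in Go. This is an illustrative sketch with hypothetical helper names, not runtime code. One difference from the Python formulas: Go's integer division truncates toward zero, so <code>(0-1)/4</code> is 0 rather than -1, and <code>parent</code> should only be called for <code>i &gt; 0</code>:</p>

```go
package main

import "fmt"

const d = 4 // number of children per node

// parent returns the index of i's parent. Only meaningful for i > 0.
func parent(i int) int { return (i - 1) / d }

// childRange returns the half-open index range [lo, hi) of i's children
// in a heap with n nodes. The range is empty when i is a leaf.
func childRange(i, n int) (lo, hi int) {
	lo = d*i + 1
	hi = lo + d
	if lo > n {
		lo = n
	}
	if hi > n {
		hi = n
	}
	return lo, hi
}

func main() {
	n := 10
	for i := 0; i < 4; i++ {
		lo, hi := childRange(i, n)
		fmt.Printf("node %d: children [%d, %d)\n", i, lo, hi)
	}
	fmt.Println("parent of 9:", parent(9))
}
```

<p>Returning a half-open range makes it natural to loop over the children with a slice expression such as <code>heap[lo:hi]</code>.</p>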
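<p>Time complexity of their operations:<br><img src="b-vs-4.png" alt=""></p>
<p>Because a 4-ary heap is shallower, it suits scenarios with very large amounts of data where reducing the tree height matters. Go switched its timers to a 4-ary heap long ago (11 years ago), and the Go developers chose it based on benchmark results. The original change can be found at: <a href="https://codereview.appspot.com/13094043/#ps1" target="_blank" rel="external">code review 13094043: time: make timers heap 4-ary (Closed)</a></p>
<p>In the Go runtime, the 4-ary heap is implemented in <code>src/runtime/time.go</code>, where you can read the source. The <code>timers</code> data structure represents a set of timers; each P has one timers instance that maintains all the timers of that P.</p>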
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div></pre></td><td class="code"><pre><div class="line"><span class="comment">// A timers is a per-P set of timers.</span></div><div class="line"><span class="keyword">type</span> timers <span class="keyword">struct</span> {</div><div class="line"> <span class="comment">// mu protects timers; timers are per-P, but the scheduler can access the timers of another P, so we have to lock.</span></div><div class="line"> mu mutex</div><div class="line"></div><div class="line"> <span class="comment">// heap is the set of timers, ordered by heap[i].when. It is a 4-ary heap, although that is not stated explicitly.</span></div><div class="line"> <span class="comment">// The lock must be held to access the heap.</span></div><div class="line"> heap []timerWhen</div><div class="line"></div><div class="line"> <span class="comment">// len is an atomic copy of len(heap).</span></div><div class="line"> <span class="built_in">len</span> atomic.Uint32</div><div class="line"></div><div class="line"> <span class="comment">// zombies is the number of timers in the heap that are marked for removal.</span></div><div class="line"> zombies atomic.Int32</div><div class="line"></div><div class="line"> raceCtx <span class="typename">uintptr</span></div><div class="line"></div><div class="line"> <span class="comment">// minWhenHeap is the minimum heap[i].when value (= heap[0].when).</span></div><div class="line"> <span class="comment">// The wakeTime method uses minWhenHeap and minWhenModified to determine the next wake time.</span></div><div class="line"> <span class="comment">// If minWhenHeap = 0, there are no timers in the heap.</span></div><div class="line"> minWhenHeap atomic.Int64</div><div class="line"></div><div class="line"> <span class="comment">// minWhenModified is a lower bound on the minimum heap[i].when over timers with the timerModified bit set.</span></div><div class="line"> <span class="comment">// If minWhenModified = 0, there are no timerModified timers in the heap.</span></div><div class="line"> minWhenModified atomic.Int64</div><div class="line">}</div><div class="line"></div><div class="line"><span class="keyword">type</span> timerWhen <span class="keyword">struct</span> {</div><div class="line"> timer *timer</div><div class="line"> when <span class="typename">int64</span></div><div class="line">}</div><div class="line"></div><div class="line"><span class="keyword">func</span> (ts *timers) lock() {</div><div class="line"> lock(&ts.mu)</div><div class="line">}</div><div class="line"></div><div class="line"><span class="keyword">func</span> (ts *timers) unlock() {</div><div class="line"> ts.<span class="built_in">len</span>.Store(<span class="typename">uint32</span>(<span class="built_in">len</span>(ts.heap)))</div><div class="line"> unlock(&ts.mu)</div><div class="line">}</div></pre></td></tr></table></figure>
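<p>The <code>timer</code> struct in turn holds a reference to <code>timers</code>, so each refers to the other. This makes timers easier to manage: creating, deleting, and running a timer all go through <code>timers</code>.</p>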
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">type</span> timer <span class="keyword">struct</span> {</div><div class="line"> mu mutex</div><div class="line"> astate atomic.Uint8 </div><div class="line"> state <span class="typename">uint8</span> </div><div class="line"> isChan <span class="typename">bool</span> </div><div class="line"> blocked <span class="typename">uint32</span></div><div class="line"></div><div class="line"></div><div class="line"> when <span class="typename">int64</span></div><div class="line"> period <span class="typename">int64</span></div><div class="line"> f <span class="keyword">func</span>(arg any, seq <span class="typename">uintptr</span>, delay <span class="typename">int64</span>)</div><div class="line"> arg any</div><div class="line"> seq <span class="typename">uintptr</span></div><div class="line"></div><div class="line"> ts *timers <span class="comment">// note this back-reference to the owning timers</span></div><div class="line"></div><div class="line"> sendLock mutex</div><div class="line"> isSending atomic.Int32</div><div class="line">}</div></pre></td></tr></table></figure>
<p>Let's look at some of the methods that operate on this heap.</p>
<p><code>timerHeapN</code> defines the heap as a 4-ary heap, meaning each node has at most 4 children.</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">const</span> timerHeapN =<span class="number"> 4</span></div></pre></td></tr></table></figure>
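<p>The usual helper methods of a heap are <code>siftUp</code> and <code>siftDown</code>, which move an element up and down respectively.</p>
<p>Below is the sift-up method, with some tracing and checking code removed. Overall the code is fairly simple: keep moving up until the right position is found.</p>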
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div></pre></td><td class="code"><pre><div class="line"><span class="comment">// siftUp puts the timer at position i in the right place in the heap by moving it up toward the top of the heap.</span></div><div class="line"><span class="keyword">func</span> (ts *timers) siftUp(i <span class="typename">int</span>) {</div><div class="line"> heap := ts.heap</div><div class="line"> <span class="keyword">if</span> i >= <span class="built_in">len</span>(heap) {</div><div class="line"> badTimer()</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// note: the next two lines save the timer at i and its when value</span></div><div class="line"> tw := heap[i] </div><div class="line"> when := tw.when</div><div class="line"> <span class="keyword">if</span> when <=<span class="number"> 0</span> {</div><div class="line"> badTimer()</div><div class="line"> }</div><div class="line"> <span class="keyword">for</span> i ><span class="number"> 0</span> {</div><div class="line"> p := <span class="typename">int</span>(<span class="typename">uint</span>(i<span class="number">-1</span>) / timerHeapN) <span class="comment">// parent: (i-1)/4</span></div><div class="line"> <span class="keyword">if</span> when >= heap[p].when { <span class="comment">// the parent's when is not larger than this node's when, so stop moving up</span></div><div class="line"> <span
class="keyword">break</span></div><div class="line"> }</div><div class="line"> heap[i] = heap[p] <span class="comment">// move the parent down to position i</span></div><div class="line"> i = p <span class="comment">// i now points at the parent; keep checking upward</span></div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// if any move happened, finally place tw at the position it rose to</span></div><div class="line"> <span class="keyword">if</span> heap[i].timer != tw.timer {</div><div class="line"> heap[i] = tw</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
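<p>To experiment with sift-up outside the runtime, here is a simplified sketch on a plain <code>[]int64</code> of "when" values. It is an illustration under that simplification (no timers, locks, or sanity checks), and <code>push</code> is a hypothetical helper showing how sift-up is typically used when adding an element:</p>

```go
package main

import "fmt"

const d = 4 // 4-ary heap, like timerHeapN

// siftUp moves heap[i] toward the root until its parent is not larger.
func siftUp(heap []int64, i int) {
	v := heap[i] // remember the value being moved
	for i > 0 {
		p := (i - 1) / d
		if v >= heap[p] {
			break
		}
		heap[i] = heap[p] // pull the parent down into the hole
		i = p
	}
	heap[i] = v
}

// push appends v to the heap and restores the min-heap order.
func push(heap []int64, v int64) []int64 {
	heap = append(heap, v)
	siftUp(heap, len(heap)-1)
	return heap
}

func main() {
	var heap []int64
	for _, when := range []int64{50, 30, 40, 10, 20, 5} {
		heap = push(heap, when)
	}
	fmt.Println(heap[0]) // prints 5, the smallest value
}
```

<p>Like the runtime version, the sketch keeps the moved value in a local variable and writes it only once at the end, instead of swapping at every level.</p>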
<p>Similarly, here is the sift-down method:</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div><div class="line">46</div><div class="line">47</div><div class="line">48</div></pre></td><td class="code"><pre><div class="line"><span class="comment">// siftDown puts the timer at position i in the right place in the heap by moving it down toward the bottom of the heap.</span></div><div class="line"><span class="keyword">func</span> (ts *timers) siftDown(i <span class="typename">int</span>) {</div><div class="line"> heap := ts.heap</div><div class="line"> n := <span class="built_in">len</span>(heap)</div><div class="line"> <span class="keyword">if</span> i >= n {</div><div class="line"> badTimer()</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// already a leaf node, nothing to sift down</span></div><div class="line"> <span class="keyword">if</span> i*timerHeapN<span class="number">+1</span> >= n {</div><div
class="line"> <span class="keyword">return</span></div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// save the timer at i and its when value</span></div><div class="line"> tw := heap[i]</div><div class="line"> when := tw.when</div><div class="line"> <span class="keyword">if</span> when <=<span class="number"> 0</span> {</div><div class="line"> badTimer()</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// starting from the leftmost child, find the smallest when value and sift the current node down to that position</span></div><div class="line"> <span class="keyword">for</span> {</div><div class="line"> leftChild := i*timerHeapN +<span class="number"> 1</span> <span class="comment">// leftmost child</span></div><div class="line"> <span class="keyword">if</span> leftChild >= n {</div><div class="line"> <span class="keyword">break</span></div><div class="line"> }</div><div class="line"> w := when</div><div class="line"> c :=<span class="number"> -1</span></div><div class="line"> <span class="keyword">for</span> j, tw := <span class="keyword">range</span> heap[leftChild:min(leftChild+timerHeapN, n)] { <span class="comment">// scan the children from the leftmost one for the smallest child whose when is less than w</span></div><div class="line"> <span class="keyword">if</span> tw.when < w {</div><div class="line"> w = tw.when</div><div class="line"> c = leftChild + j</div><div class="line"> }</div><div class="line"> }</div><div class="line"> <span class="keyword">if</span> c <<span class="number"> 0</span> { <span class="comment">// no child is smaller than the current node, so stop</span></div><div class="line"> <span class="keyword">break</span></div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// sift the current node down to the smallest child</span></div><div class="line"> heap[i] = heap[c]</div><div class="line"> i = c</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// if any move happened, finally place tw at the position it sank to</span></div><div class="line"> <span class="keyword">if</span> heap[i].timer != tw.timer {</div><div
class="line"> heap[i] = tw</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
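<p>Sifting down is slightly more involved than sifting up, because at each level we have to find the smallest of the (up to four) child nodes and move the current node down into that child's position.</p>
<p>Any slice can be turned into a 4-ary heap in place, as follows:</p>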
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">func</span> (ts *timers) initHeap() {</div><div class="line"> <span class="keyword">if</span> <span class="built_in">len</span>(ts.heap) <=<span class="number"> 1</span> {</div><div class="line"> <span class="keyword">return</span></div><div class="line"> }</div><div class="line"> </div><div class="line"> <span class="comment">// 从最后一个非叶子节点开始,依次下沉</span></div><div class="line"> <span class="keyword">for</span> i := <span class="typename">int</span>(<span class="typename">uint</span>(<span class="built_in">len</span>(ts.heap<span class="number">)-1</span><span class="number">-1</span>) / timerHeapN); i >=<span class="number"> 0</span>; i-- {</div><div class="line"> ts.siftDown(i)</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
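<p>To look at sift-down and heap initialization in isolation, here is a minimal, self-contained sketch of the same idea on a plain <code>[]int</code>. It is not the runtime's code: the names <code>heapN</code>, <code>siftDown</code>, <code>initHeap</code> and <code>popMin</code> are chosen for illustration only, and the <code>badTimer</code> sanity checks are left out.</p>

```go
package main

import "fmt"

// heapN is the branching factor (the runtime calls it timerHeapN).
const heapN = 4

// siftDown moves h[i] toward the leaves until none of its children is smaller.
func siftDown(h []int, i int) {
	n := len(h)
	v := h[i]
	for {
		left := i*heapN + 1 // index of the leftmost child
		if left >= n {
			break // i is a leaf
		}
		end := left + heapN
		if end > n {
			end = n
		}
		// Find the smallest child that is also smaller than v.
		c, w := -1, v
		for j := left; j < end; j++ {
			if h[j] < w {
				w, c = h[j], j
			}
		}
		if c < 0 {
			break // no child is smaller, v stays here
		}
		h[i] = h[c]
		i = c
	}
	h[i] = v
}

// initHeap turns an arbitrary slice into a 4-ary min-heap in place,
// sifting down every non-leaf node, starting from the last one.
func initHeap(h []int) {
	if len(h) <= 1 {
		return
	}
	for i := (len(h) - 2) / heapN; i >= 0; i-- {
		siftDown(h, i)
	}
}

// popMin removes the smallest element and returns it with the shrunken heap.
func popMin(h []int) (int, []int) {
	top := h[0]
	last := len(h) - 1
	h[0] = h[last]
	h = h[:last]
	if last > 0 {
		siftDown(h, 0)
	}
	return top, h
}

func main() {
	h := []int{9, 4, 7, 1, 8, 2, 6, 3, 5}
	initHeap(h)
	for len(h) > 0 {
		var v int
		v, h = popMin(h)
		fmt.Println(v)
	}
}
```

<p>Popping until the heap is empty yields the values in ascending order, which is a quick way to check that both the initialization and the sift-down are correct.</p>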
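<p><code>timers</code> also has a number of helper methods for handling individual timers. Most of them have nothing to do with the 4-ary heap, so I will not go through them one by one; I will only cover the few that operate on the heap.</p>
<blockquote>
<p>A small complaint: the code in <code>time.go</code> is poorly organized, with the methods of <code>timer</code> and <code>timers</code> interleaved. Ideally the two sets of methods would be grouped separately, or <code>timers</code> would be moved into a file of its own.</p>
</blockquote>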
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">func</span> (ts *timers) deleteMin() {</div><div class="line"> <span class="comment">// 得到堆顶元素</span></div><div class="line"> t := ts.heap<span class="number">[0</span>].timer</div><div class="line"> <span class="keyword">if</span> t.ts != ts {</div><div class="line"> throw(<span class="string">"wrong timers"</span>)</div><div class="line"> }</div><div class="line"> t.ts = <span class="constant">nil</span> <span class="comment">// 将timer的ts置为nil,自此和ts一别两宽,再无瓜葛</span></div><div class="line"></div><div class="line"> <span class="comment">// 将最后一个元素设置为堆顶</span></div><div class="line"> last := <span class="built_in">len</span>(ts.heap) -<span class="number"> 1</span></div><div class="line"> <span class="keyword">if</span> last ><span class="number"> 0</span> {</div><div class="line"> ts.heap<span class="number">[0</span>] = ts.heap[last]</div><div class="line"> }</div><div class="line"> ts.heap[last] = timerWhen{} <span class="comment">// 将最后一个元素置为空</span></div><div class="line"> ts.heap = ts.heap[:last] <span class="comment">// 缩减slice,剔除最后的空元素</span></div><div class="line"> <span class="keyword">if</span> last ><span class="number"> 0</span> { <span class="comment">// 将堆顶元素下沉</span></div><div class="line"> ts.siftDown<span 
class="number">(0</span>)</div><div class="line"> }</div><div class="line"> ts.updateMinWhenHeap()</div><div class="line"> <span class="keyword">if</span> last ==<span class="number"> 0</span> {</div><div class="line"> ts.minWhenModified.Store<span class="number">(0</span>)</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
<p>Adding a timer to the heap:</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">func</span> (ts *timers) addHeap(t *timer) {</div><div class="line"> <span class="keyword">if</span> netpollInited.Load() ==<span class="number"> 0</span> {</div><div class="line"> netpollGenericInit()</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="keyword">if</span> t.ts != <span class="constant">nil</span> {</div><div class="line"> throw(<span class="string">"ts set in timer"</span>)</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// 设置timer的ts为当前的timers,从此执子之手,笑傲江湖</span></div><div class="line"> t.ts = ts</div><div class="line"> <span class="comment">// 添加到最后</span></div><div class="line"> ts.heap = <span class="built_in">append</span>(ts.heap, timerWhen{t, t.when})</div><div class="line"> ts.siftUp(<span class="built_in">len</span>(ts.heap) -<span class="number"> 1</span>) <span class="comment">// 上浮它到合适的位置</span></div><div class="line"> <span class="keyword">if</span> t == ts.heap<span class="number">[0</span>].timer {</div><div class="line"> ts.updateMinWhenHeap()</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
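<h2 id="n叉堆">n-ary heaps</h2>
<p>The <strong><em>d-ary</em> heap</strong>, or <strong><em>d-heap</em></strong>, is a priority-queue data structure that generalizes the binary heap: each node has d children instead of 2. A binary heap is thus a 2-heap, and a <strong>ternary heap</strong> is a 3-heap. According to Tarjan and to Jensen et al., the d-ary heap was invented by Donald B. Johnson in 1975.</p>
<p>This data structure allows decrease-priority operations to run faster than in a binary heap (because the tree is shallower), at the cost of a slower delete-min. The trade-off yields better running times for algorithms in which decrease-priority operations are more common than delete-min operations (Dijkstra's shortest-path algorithm is the usual example). In addition, d-ary heaps have better memory-cache behavior than binary heaps, so they often run faster in practice even though their theoretical worst-case running time is larger. Like a binary heap, a d-ary heap is an in-place data structure: it uses no storage beyond the array that holds the heap's items.</p>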
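<p>The trade-off can be made concrete with a little index arithmetic. The sketch below (helper names <code>parent</code>, <code>firstChild</code> and <code>height</code> are illustrative, not taken from any library) computes the height of a d-ary heap stored in an array, and from it a rough upper bound on comparisons: a sift-up needs about one comparison per level, while a sift-down may inspect all d children on every level.</p>

```go
package main

import "fmt"

// parent returns the index of the parent of node i in an array-backed d-ary heap.
func parent(i, d int) int { return (i - 1) / d }

// firstChild returns the index of the leftmost child of node i.
func firstChild(i, d int) int { return i*d + 1 }

// height returns the number of edges between the root and the last element
// of a d-ary heap holding n elements.
func height(n, d int) int {
	h := 0
	for i := n - 1; i > 0; i = parent(i, d) {
		h++
	}
	return h
}

func main() {
	const n = 1_000_000
	for _, d := range []int{2, 4, 8} {
		h := height(n, d)
		fmt.Printf("d=%d height=%d sift-up<=%d sift-down<=%d comparisons\n", d, h, h, d*h)
	}
}
```

<p>For a million elements the height drops from 19 (binary) to 10 (4-ary), which roughly halves the work of a sift-up, while the worst-case sift-down bound grows from about 38 to about 40 comparisons. These are coarse bounds, not measurements.</p>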
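<p>The Go ecosystem already has libraries that implement this data structure, for example <a href="https://github.com/ahrav/go-d-ary-heap" target="_blank" rel="external">ahrav/go-d-ary-heap</a>, so if you have a similar use case, or want to run comparison benchmarks, you can use it.</p>
<p>Import the library:</p>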
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">import</span> <span class="string">"github.com/ahrav/go-d-ary-heap"</span></div></pre></td></tr></table></figure>
<p>The following example creates a ternary min-heap and a 4-ary max-heap:</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">package</span> main</div><div class="line"></div><div class="line"><span class="keyword">import</span> (</div><div class="line"> <span class="string">"fmt"</span></div><div class="line"> <span class="string">"github.com/ahrav/go-d-ary-heap"</span></div><div class="line">)</div><div class="line"></div><div class="line"><span class="keyword">func</span> main() {</div><div class="line"> <span class="comment">// Create a min-heap for integers with a branching factor of 3.</span></div><div class="line"> minHeap := heap.NewHeap[<span class="typename">int</span>]<span class="number">(3</span>, <span class="keyword">func</span>(a, b <span class="typename">int</span>) <span class="typename">bool</span> { <span class="keyword">return</span> a < b })</div><div class="line"></div><div class="line"> <span class="comment">// Create a max-heap for integers with a branching factor of 4.</span></div><div class="line"> maxHeap := heap.NewHeap[<span class="typename">int</span>]<span class="number">(4</span>, <span class="keyword">func</span>(a, b <span class="typename">int</span>) <span class="typename">bool</span> { <span class="keyword">return</span> a > b })</div><div class="line">}</div></pre></td></tr></table></figure>
<p>Adding elements to a heap:</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div></pre></td><td class="code"><pre><div class="line">minHeap.Push<span class="number">(10</span>)</div><div class="line">minHeap.Push<span class="number">(5</span>)</div><div class="line">minHeap.Push<span class="number">(15</span>)</div><div class="line"></div><div class="line">maxHeap.Push<span class="number">(10</span>)</div><div class="line">maxHeap.Push<span class="number">(5</span>)</div><div class="line">maxHeap.Push<span class="number">(15</span>)</div></pre></td></tr></table></figure>
<p>Removing the extreme value (the minimum or the maximum) from a heap:</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">fmt.Println(minHeap.Pop()) <span class="comment">// Outputs: 5</span></div><div class="line">fmt.Println(maxHeap.Pop()) <span class="comment">// Outputs: 15</span></div></pre></td></tr></table></figure>
<p>Returning the extreme value without removing it:</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">fmt.Println(minHeap.Peek()) <span class="comment">// Assuming more elements were added, outputs the smallest</span></div><div class="line">fmt.Println(maxHeap.Peek()) <span class="comment">// Assuming more elements were added, outputs the largest</span></div></pre></td></tr></table></figure>
]]></content>
<summary type="html">
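<![CDATA[<p>In Go, <code>Timer</code> and the related timers (<code>Ticker</code>, <code>time.After</code>, <code>time.AfterFunc</code>, and so on) are ultimately stored in a 4-ary heap.</p>
<p>The global timer heap has gone through three major stages:</p>
<ul>
<li>Up to Go 1.9, all timers were maintained in a single global 4-ary heap, with heavy contention between goroutines.</li>
<li>From Go 1.10 to 1.13, all timers were maintained in 64 global 4-ary heaps. Sharding reduced the contention, but did not fundamentally solve the problem of the earlier versions.</li>
<li>From Go 1.14 onward, each P maintains its own 4-ary heap, which avoids the contention between goroutines. (The per-P data structure will be covered later.)</li>
</ul>
<p>Heaps are most commonly implemented as binary heaps. So why do Go timers use a 4-ary heap?</p>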
]]>
</summary>
<category term="Go" scheme="https://colobu.com/categories/Go/"/>
</entry>
<entry>
<title><![CDATA[HeapMap: a hybrid data structure, implemented in Go]]></title>
<link href="https://colobu.com/2024/11/17/heapmap/"/>
<id>https://colobu.com/2024/11/17/heapmap/</id>
<published>2024-11-17T09:17:13.000Z</published>
<updated>2024-11-17T09:18:38.769Z</updated>
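<content type="html"><![CDATA[<p>Today, while preparing the next article in the "Little-Known Data Structures in Go" series, my thoughts wandered: combining the features of a heap with those of a hash table (map) would give a new data structure, which I could then use to replace the aggregation structure in my aggregation program. I searched online and found that this data structure has existed for a long time. It is called <code>HeapMap</code>.</p>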
<a id="more"></a>
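<p><code>HeapMap</code> (also called <code>PriorityMap</code>) is a data structure that combines a <strong>heap</strong> with a <strong>hash map</strong>. It is typically used when entries must be kept in order and also looked up efficiently by key. It is a priority queue with a hash map layered on top for fast access and updates: the heap provides the <strong>ordering</strong>, and the hash table provides <strong>constant-time lookup</strong>.</p>
<p>Rob Pike, one of the creators of Go, puts it this way in rule 5 of <a href="https://users.ece.utexas.edu/~adnan/pike.html" target="_blank" rel="external">Rob Pike's 5 Rules of Programming</a>:</p>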
<blockquote>
<ul>
<li>Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. <em>Data structures, not algorithms, are central to programming.</em></li>
</ul>
</blockquote>
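<p>So in the right scenario, a HeapMap whose characteristics match the problem can get you twice the result for half the effort.</p>
<h3 id="HeapMap_的主要特点">Key characteristics of <code>HeapMap</code></h3>
<ol>
<li><strong>Heap properties</strong>: <code>HeapMap</code> maintains the ordering internally with a heap, so the smallest or largest entry can be retrieved quickly. Inserting an element and removing the top of the heap both take <code>O(log n)</code> time.</li>
<li><strong>Hash map properties</strong>: <code>HeapMap</code> also uses a hash map to support fast lookup. Lookup, insertion and deletion in the hash map take <code>O(1)</code> time in the ideal case.</li>
<li><strong>Use cases</strong>: <code>HeapMap</code> suits scenarios that need <em>frequent ordered access together with fast lookup</em>, such as caches with priorities, schedulers, and prioritized task queues.</li>
</ol>
<h3 id="HeapMap_的基本结构">Basic structure of <code>HeapMap</code></h3>
<ul>
<li><strong>Heap</strong>: keeps the entries in order. It can be a min-heap or a max-heap, depending on the requirement.</li>
<li><strong>Hash map (Map)</strong>: stores every key/value pair and supports fast lookup of an element by key.</li>
</ul>
<p>It is easy to build a <code>HeapMap</code> yourself from <code>container/heap</code> plus a <code>map</code>, but there is no need to reinvent the wheel: other languages such as Rust and Java have ready-made implementations, and Go has a good one too: <a href="https://github.com/nemars/heapmap" target="_blank" rel="external">nemars/heapmap</a></p>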
<h3 id="HeapMap_的实现">How <code>HeapMap</code> is implemented</h3>
<p>The <code>nemars/heapmap</code> library was added to GitHub last year, and I was the first person to star it. Let's see how it is implemented.</p>
<h4 id="结构体定义">Struct definitions</h4>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">type</span> Entry[K comparable, V, P any] <span class="keyword">struct</span> {</div><div class="line"> Key K</div><div class="line"> Value V</div><div class="line"> Priority P</div><div class="line"> index <span class="typename">int</span></div><div class="line">}</div><div class="line"></div><div class="line"><span class="keyword">type</span> heapmap[K comparable, V, P any] <span class="keyword">struct</span> {</div><div class="line"> h pq[K, V, P]</div><div class="line"> m <span class="keyword">map</span>[K]*Entry[K, V, P]</div><div class="line">}</div></pre></td></tr></table></figure>
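<p><code>Entry</code> represents one node (element, entry) in the data structure. It holds the key, the value and the priority, and <code>index</code> records its position in the array that backs the heap.</p>
<p><code>heapmap</code> is the implementation of <code>HeapMap</code> and has two fields. The first is the heap: to make generics convenient, the library defines its own heap type <code>pq</code>, which is driven through <code>heap.Push</code>, <code>heap.Pop</code> and <code>heap.Fix</code> in the methods below. The second field is simply a map.</p>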
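<p>The <code>pq</code> type itself is not shown in the excerpt above. To illustrate the part that makes the combination work, namely keeping each entry's <code>index</code> up to date whenever the heap swaps two elements, here is a simplified, non-generic sketch built on <code>container/heap</code>. The types <code>entry</code>, <code>pq</code> and <code>HeapMap</code> below are hypothetical stand-ins written for this example (string keys, int priorities, min-heap), not the library's actual code.</p>

```go
package main

import (
	"container/heap"
	"fmt"
)

// entry is one key/priority pair; index is its position in the heap slice.
type entry struct {
	key      string
	priority int
	index    int
}

// pq implements heap.Interface as a min-heap ordered by priority.
type pq []*entry

func (q pq) Len() int           { return len(q) }
func (q pq) Less(i, j int) bool { return q[i].priority < q[j].priority }

// Swap must keep the stored indexes in sync, otherwise heap.Fix would
// later be called with a stale position.
func (q pq) Swap(i, j int) {
	q[i], q[j] = q[j], q[i]
	q[i].index = i
	q[j].index = j
}

func (q *pq) Push(x any) {
	e := x.(*entry)
	e.index = len(*q)
	*q = append(*q, e)
}

func (q *pq) Pop() any {
	old := *q
	n := len(old)
	e := old[n-1]
	old[n-1] = nil // drop the reference so it can be collected
	*q = old[:n-1]
	return e
}

// HeapMap pairs the heap with a map from key to entry.
type HeapMap struct {
	h pq
	m map[string]*entry
}

func NewHeapMap() *HeapMap { return &HeapMap{m: make(map[string]*entry)} }

// Set inserts a key or updates the priority of an existing one.
func (hm *HeapMap) Set(key string, priority int) {
	if e, ok := hm.m[key]; ok {
		e.priority = priority
		heap.Fix(&hm.h, e.index)
		return
	}
	e := &entry{key: key, priority: priority}
	heap.Push(&hm.h, e)
	hm.m[key] = e
}

// Get returns the priority stored for key.
func (hm *HeapMap) Get(key string) (int, bool) {
	e, ok := hm.m[key]
	if !ok {
		return 0, false
	}
	return e.priority, true
}

// Pop removes the entry with the smallest priority from both heap and map.
func (hm *HeapMap) Pop() (string, int, bool) {
	if len(hm.h) == 0 {
		return "", 0, false
	}
	e := heap.Pop(&hm.h).(*entry)
	delete(hm.m, e.key)
	return e.key, e.priority, true
}

func main() {
	hm := NewHeapMap()
	hm.Set("a", 3)
	hm.Set("b", 1)
	hm.Set("c", 2)
	hm.Set("a", 0) // update: "a" now has the smallest priority
	for {
		k, p, ok := hm.Pop()
		if !ok {
			break
		}
		fmt.Println(k, p) // a 0, then b 1, then c 2
	}
}
```

<p>Because the map stores pointers to the same entries that live in the heap, an update only has to change the priority and call <code>heap.Fix</code> with the recorded index; no search through the heap is needed.</p>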
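<h4 id="典型的方法">Typical methods</h4>
<p>With the data structures defined, the methods can be implemented. The library provides a number of convenience methods; we only need to look at a few of them.</p>
<h5 id="Len_方法">The Len method</h5>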
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">func</span> (hm *heapmap[K, V, P]) Len() <span class="typename">int</span> {</div><div class="line"> <span class="keyword">return</span> <span class="built_in">len</span>(hm.m)</div><div class="line">}</div></pre></td></tr></table></figure>
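<p>Reading the length of either the <code>h</code> field or the <code>m</code> field would work, because both always hold the same set of entries.</p>
<h5 id="Peek_方法">The Peek method</h5>
<p>Returns the root element.<br>For a min-heap that is the smallest element; for a max-heap it is the largest.</p>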
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">func</span> (hm *heapmap[K, V, P]) Peek() (Entry[K, V, P], <span class="typename">bool</span>) {</div><div class="line"> <span class="keyword">if</span> hm.Empty() {</div><div class="line"> <span class="keyword">return</span> Entry[K, V, P]{}, <span class="constant">false</span></div><div class="line"> }</div><div class="line"> <span class="keyword">return</span> *hm.h.entries<span class="number">[0</span>], <span class="constant">true</span></div><div class="line">}</div></pre></td></tr></table></figure>
<h5 id="Pop_方法">The Pop method</h5>
<p>Pops the root element.</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">func</span> (hm *heapmap[K, V, P]) Pop() (Entry[K, V, P], <span class="typename">bool</span>) {</div><div class="line"> <span class="keyword">if</span> hm.Empty() {</div><div class="line"> <span class="keyword">return</span> Entry[K, V, P]{}, <span class="constant">false</span></div><div class="line"> }</div><div class="line"> e := *heap.Pop(&hm.h).(*Entry[K, V, P])</div><div class="line"> <span class="built_in">delete</span>(hm.m, e.Key)</div><div class="line"> <span class="keyword">return</span> e, <span class="constant">true</span></div><div class="line">}</div></pre></td></tr></table></figure>
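<p>Note that whenever an element is removed from the heap, it must be deleted from the map as well.</p>
<h5 id="Push_方法_(Set_方法)">The Push method (the Set method)</h5>
<p>The author did not actually implement a Push method; insertion is done through Set.</p>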
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">func</span> (hm *heapmap[K, V, P]) Set(key K, value V, priority P) {</div><div class="line"> <span class="keyword">if</span> e, ok := hm.m[key]; ok {</div><div class="line"> e.Value = value</div><div class="line"> e.Priority = priority</div><div class="line"> heap.Fix(&hm.h, e.index)</div><div class="line"> <span class="keyword">return</span></div><div class="line"> }</div><div class="line"> e := &Entry[K, V, P]{</div><div class="line"> Key: key,</div><div class="line"> Value: value,</div><div class="line"> Priority: priority,</div><div class="line"> }</div><div class="line"> heap.Push(&hm.h, e)</div><div class="line"> hm.m[key] = e</div><div class="line">}</div></pre></td></tr></table></figure>
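<p>Set does one of two things. If the key already exists, it updates the entry and restores the heap order according to the new priority.<br>If the key does not exist, it inserts a new entry.</p>
<h5 id="Get_方法">The Get method</h5>
<p>Get returns the entry for an arbitrary key.</p>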
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="keyword">func</span> (hm *heapmap[K, V, P]) Get(key K) (Entry[K, V, P], <span class="typename">bool</span>) {</div><div class="line"> <span class="keyword">if</span> e, ok := hm.m[key]; ok {</div><div class="line"> <span class="keyword">return</span> *e, <span class="constant">true</span></div><div class="line"> }</div><div class="line"> <span class="keyword">return</span> Entry[K, V, P]{}, <span class="constant">false</span></div><div class="line">}</div></pre></td></tr></table></figure>
<p>One thing to keep in mind: this data structure is not safe for concurrent use. If several goroutines need to access it, protect it with a <code>sync.Mutex</code> or a <code>sync.RWMutex</code>.</p>
]]></content>
<summary type="html">
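<![CDATA[<p>Today, while preparing the next article in the "Little-Known Data Structures in Go" series, my thoughts wandered: combining the features of a heap with those of a hash table (map) would give a new data structure, which I could then use to replace the aggregation structure in my aggregation program. I searched online and found that this data structure has existed for a long time. It is called <code>HeapMap</code>.</p>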
]]>
</summary>
<category term="Go" scheme="https://colobu.com/categories/Go/"/>
</entry>
<entry>
<title><![CDATA[Little-known data structures in Go: CacheLinePad, fine-grained optimization]]></title>
<link href="https://colobu.com/2024/11/17/go-internal-ds-cacheline/"/>
<id>https://colobu.com/2024/11/17/go-internal-ds-cacheline/</id>
<published>2024-11-17T08:19:01.000Z</published>
<updated>2024-11-17T09:16:18.533Z</updated>
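<content type="html"><![CDATA[<p>On modern multi-core processors, an efficient cache hierarchy greatly improves program performance, yet the "false sharing" problem often makes that hierarchy work inefficiently.</p>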
<a id="more"></a>
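<h2 id="1-_背景">1. Background</h2>
<blockquote>
<p>In this article, "cacheline" and "cache line" are used interchangeably.</p>
</blockquote>
<p>On modern multi-core processors, the CPU cache is usually organized in three levels, L1, L2 and L3, which differ in size, speed and the way they are shared:</p>
<ul>
<li><p><strong>L1 cache</strong>: the fastest cache. Each CPU core usually has its own L1 cache, split into two parts: one for instructions (L1I) and one for data (L1D). L1 is small (typically 32KB - 64KB) but extremely fast, serving the core with very low latency.</p>
</li>
<li><p><strong>L2 cache</strong>: usually somewhat larger than L1, typically around 256KB - 1MB, and usually also private to each core. It is slightly slower than L1 but still far faster than main memory.</p>
</li>
<li><p><strong>L3 cache</strong>: the largest of the three levels, typically 8MB - 64MB or more. L3 is usually shared by all cores and mainly serves to reduce the latency of moving data between cores. It is slower than L1 and L2 but still fast compared with main memory. On a multi-core processor, L3 is an important link for cooperation between cores.<br><img src="network-1.png" alt=""></p>
</li>
</ul>
<p>The CPU cache divides data into <code>cacheline</code>s, so that when the CPU accesses a particular piece of data it loads or stores it one cacheline at a time. The size of a <code>cacheline</code> is usually fixed: 64 bytes is common on x86, while some ARM processors, such as the Apple M series, may use 128 bytes.</p>
<p>When the CPU executes a program and the data is found in some cache level, the access is served from that level. If the data is not in L1, the CPU looks in L2, then L3, and finally in main memory, and the <code>cacheline</code> that contains it is loaded into the cache. Because the <code>cacheline</code> is the basic unit of cache operation, every such transfer moves at least one whole <code>cacheline</code>.</p>
<p>For example, on a Mac mini with an M2 chip we can see that the CPU's cache line size is 128 bytes:<br><img src="network-2.png" alt=""></p>
<p>On Linux, on a different machine, the cache line size at every level is 64 bytes:<br><img src="network-3.png" alt=""></p>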
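<h3 id="1-1_伪共享_(False_Sharing)">1.1 False sharing</h3>
<p><strong>False sharing</strong> occurs when several threads access different variables that live in the same cache line, which causes frequent cache invalidation and can severely degrade performance. It typically shows up in multi-threaded code: if two or more threads operate on variables that sit in the same cache line but are not logically shared, every write by one thread invalidates the copy of that line held by the other threads. The cores then keep writing the line back and reloading it, which wastes resources for no benefit.</p>
<p>Suppose two threads each operate on one of two independent variables, <code>x</code> and <code>y</code>:</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">type</span> Data <span class="keyword">struct</span> {</div><div class="line"> x <span class="typename">int64</span> <span class="comment">// updated by thread A</span></div><div class="line"> y <span class="typename">int64</span> <span class="comment">// updated by thread B</span></div><div class="line">}</div></pre></td></tr></table></figure>
<p>If <code>x</code> and <code>y</code> are in the same cache line, then after thread A updates <code>x</code>, thread B has to reload <code>y</code> because its copy of the line has been invalidated, even though B never uses the value of <code>x</code>. The two variables are not shared in any logical sense, yet every write to one invalidates the other side's cache. That is false sharing.</p>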
<h3 id="1-2_如何避免伪共享?">1.2 How to avoid false sharing</h3>
<p>False sharing can hurt performance badly, but it can be mitigated in several ways:</p>
<ol>
<li><strong>Padding</strong>: expand each variable so that it occupies a full <code>cacheline</code>, which prevents several threads from touching the same <code>cacheline</code>. For example, padding can be inserted between variables to push them into different <code>cacheline</code>s (assuming the CPU's cache line is 64 bytes):</li>
</ol>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">type</span> Data <span class="keyword">struct</span> {</div><div class="line"> x <span class="typename">int64</span> <span class="comment">// updated by thread A</span></div><div class="line"> _ <span class="number">[7</span>]<span class="typename">int64</span> <span class="comment">// 7 int64 values of padding, filling out a 64-byte cache line</span></div><div class="line"> y <span class="typename">int64</span> <span class="comment">// updated by thread B</span></div><div class="line">}</div></pre></td></tr></table></figure>
<ol start="2">
<li><strong>Split the variables across different structs</strong>: variables that are frequently updated by several threads can be moved into separate structs, so that no single struct is hammered by several threads at once.</li>
<li><strong>Use atomic variables</strong>: in some cases updates can be made through atomic variables. This does not eliminate false sharing, although it may in some cases reduce the overhead caused by cache coherence.</li>
<li><strong>CPU affinity</strong>: threads can be pinned to specific CPU cores, which lowers the chance that several threads touch data in the same cache at the same time.</li>
</ol>
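<h3 id="1-3_单线程的缓存行污染问题">1.3 Cache-line pollution in single-threaded programs</h3>
<p>A single-threaded program cannot suffer from false sharing, but it still has room for cache optimization:</p>
<ul>
<li><strong>Avoid cache-line pollution</strong>: in a single-threaded program, if frequently accessed variables are spread over different cache lines, lines are evicted and reloaded often, which increases cache overhead. Keeping frequently used data within the same cache line reduces how often the CPU has to load from memory.</li>
<li><strong>Optimize the data layout</strong>: a single-threaded program can also make better use of the cache by adjusting how its data is laid out in memory. Placing data that is accessed together in contiguous memory improves the cache hit rate.<br>Take the following benchmark as an example:</li>
</ul>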
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div><div class="line">46</div><div class="line">47</div><div class="line">48</div><div class="line">49</div><div class="line">50</div><div class="line">51</div><div class="line">52</div><div class="line">53</div><div class="line">54</div><div class="line">55</div><div class="line">56</div><div class="line">57</div><div class="line">58</div><div class="line">59</div><div class="line">60</div><div class="line">61</div><div class="line">62</div><div class="line">63</div><div class="line">64</div><div class="line">65</div><div class="line">66</div><div class="line">67</div><div class="line">68</div><div class="line">69</div><div class="line">70</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">package</span> main</div><div 
class="line"></div><div class="line"><span class="keyword">import</span> (</div><div class="line"> <span class="string">"testing"</span></div><div class="line">)</div><div class="line"></div><div class="line"><span class="comment">// NonAlignedStruct 未对齐的结构体,补充后占24个字节</span></div><div class="line"><span class="keyword">type</span> NonAlignedStruct <span class="keyword">struct</span> {</div><div class="line"> a <span class="typename">byte</span> <span class="comment">// 1字节,补齐7字节</span></div><div class="line"> b <span class="typename">int64</span> <span class="comment">// 8字节</span></div><div class="line"> c <span class="typename">byte</span> <span class="comment">// 1字节,补齐7字节</span></div><div class="line">}</div><div class="line"></div><div class="line"><span class="comment">// AlignedStruct 已对齐的结构体,补充后占16个字节</span></div><div class="line"><span class="keyword">type</span> AlignedStruct <span class="keyword">struct</span> {</div><div class="line"> b <span class="typename">int64</span> <span class="comment">// 8字节</span></div><div class="line"> a <span class="typename">byte</span> <span class="comment">// 1字节</span></div><div class="line"> c <span class="typename">byte</span> <span class="comment">// 1字节</span></div><div class="line"> _ <span class="number">[6</span>]<span class="typename">byte</span> <span class="comment">// 填充6个字节,总共16个字节</span></div><div class="line">}</div><div class="line"></div><div class="line"><span class="keyword">const</span> arraySize =<span class="number"> 1024</span> *<span class="number"> 1024</span></div><div class="line"></div><div class="line"><span class="keyword">var</span> (</div><div class="line"> nonAlignedArray [arraySize]NonAlignedStruct</div><div class="line"> alignedArray [arraySize]AlignedStruct</div><div class="line"> result <span class="typename">int64</span></div><div class="line">)</div><div class="line"></div><div class="line"><span class="comment">// 初始化数组</span></div><div class="line"><span 
class="keyword">func</span> init() {</div><div class="line"> <span class="keyword">for</span> i :=<span class="number"> 0</span>; i < arraySize; i++ {</div><div class="line"> nonAlignedArray[i] = NonAlignedStruct{</div><div class="line"> a: <span class="typename">byte</span>(i),</div><div class="line"> b: <span class="typename">int64</span>(i),</div><div class="line"> c: <span class="typename">byte</span>(i),</div><div class="line"> }</div><div class="line"> alignedArray[i] = AlignedStruct{</div><div class="line"> a: <span class="typename">byte</span>(i),</div><div class="line"> b: <span class="typename">int64</span>(i),</div><div class="line"> c: <span class="typename">byte</span>(i),</div><div class="line"> }</div><div class="line"> }</div><div class="line">}</div><div class="line"></div><div class="line"><span class="comment">// BenchmarkNonAligned 测试未对齐结构体的性能</span></div><div class="line"><span class="keyword">func</span> BenchmarkNonAligned(b *testing.B) {</div><div class="line"> <span class="keyword">var</span> sum <span class="typename">int64</span></div><div class="line"> b.ResetTimer()</div><div class="line"></div><div class="line"> <span class="keyword">for</span> i :=<span class="number"> 0</span>; i < b.N; i++ {</div><div class="line"> <span class="keyword">for</span> j :=<span class="number"> 0</span>; j < arraySize; j++ {</div><div class="line"> sum += nonAlignedArray[j].b <span class="comment">// 读取未对齐结构体的字段</span></div><div class="line"> }</div><div class="line"> }</div><div class="line"> result = sum <span class="comment">// 防止编译器优化</span></div><div class="line">}</div><div class="line"></div><div class="line"><span class="comment">// BenchmarkAligned 测试已对齐结构体的性能</span></div><div class="line"><span class="keyword">func</span> BenchmarkAligned(b *testing.B) {</div><div class="line"> <span class="keyword">var</span> sum <span class="typename">int64</span></div><div class="line"> b.ResetTimer()</div><div class="line"></div><div class="line"> <span 
class="keyword">for</span> i :=<span class="number"> 0</span>; i < b.N; i++ {</div><div class="line"> <span class="keyword">for</span> j :=<span class="number"> 0</span>; j < arraySize; j++ {</div><div class="line"> sum += alignedArray[j].b <span class="comment">// 读取已对齐结构体的字段</span></div><div class="line"> }</div><div class="line"> }</div><div class="line"> result = sum <span class="comment">// 防止编译器优化</span></div><div class="line">}</div></pre></td></tr></table></figure>
<p><img src="network-4.png" alt=""></p>
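<p>As the results show, reading the "aligned" struct performs far better than reading the "non-aligned" one. Note that every field is naturally aligned in both layouts; what differs is the size. Reordering the fields shrinks each element from 24 bytes to 16, so more elements fit in each cache line.</p>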
<p><img src="network-5.png" alt=""><br><img src="network-6.png" alt=""></p>
<p>Many high-performance libraries use cache-line-optimized data structures, the well-known LMAX Disruptor in the Java ecosystem being one example. The Go standard library has similar optimizations; let's look at how they are implemented and where they are used.</p>
<h2 id="2-_Go_运行时中的_CacheLine">2. CacheLine in the Go runtime</h2>
<h3 id="2-1_运行时中的_CacheLinePad">2.1 CacheLinePad in the runtime</h3>
<p>As we know, Go supports many CPU architectures, and the cache line size may differ between them. How does Go handle this uniformly?<br>The approach is simple: define a cache line size for each CPU architecture.</p>
<p>First, a common type and variable are defined:</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div></pre></td><td class="code"><pre><div class="line"><span class="comment">// CacheLinePad is used to pad structs to avoid false sharing.</span></div><div class="line"><span class="keyword">type</span> CacheLinePad <span class="keyword">struct</span>{ _ [CacheLinePadSize]<span class="typename">byte</span> }</div><div class="line"></div><div class="line"><span class="comment">// CacheLineSize is the CPU's cache line size, which may differ between CPU architectures.</span></div><div class="line"><span class="comment">// The Go runtime currently does not detect the real cache line size, so the code uses the per-GOARCH constant CacheLinePadSize as an approximation.</span></div><div class="line"><span class="keyword">var</span> CacheLineSize <span class="typename">uintptr</span> = CacheLinePadSize</div></pre></td></tr></table></figure>
<p>Then the cache line size is defined for each CPU architecture.<br>For arm64 CPUs, for example, the file <code>go/src/internal/cpu/cpu_arm64.go</code> defines the cache line size as 128 bytes:</p>
<figure class="highlight go"><figcaption><span>go/src/internal/cpu/cpu_arm64.go</span></figcaption><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line"><span class="comment">// CacheLinePadSize is used to prevent false sharing of cache lines.</span></div><div class="line"><span class="comment">// We choose 128 because Apple Silicon, a.k.a. M1, has 128-byte cache line size.</span></div><div class="line"><span class="comment">// It doesn't cost much and is much more future-proof.</span></div><div class="line"><span class="keyword">const</span> CacheLinePadSize =<span class="number"> 128</span></div></pre></td></tr></table></figure>
<p>64-bit Loongson (loong64) has 64-byte cache lines, so the file <code>go/src/internal/cpu/cpu_loong64.go</code> sets <code>CacheLinePadSize</code> to 64 bytes:</p>
<figure class="highlight go"><figcaption><span>go/src/internal/cpu/cpu_loong64.go</span></figcaption><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line"><span class="comment">// CacheLinePadSize is used to prevent false sharing of cache lines.</span></div><div class="line"><span class="comment">// We choose 64 because Loongson 3A5000 the L1 Dcache is 4-way 256-line 64-byte-per-line.</span></div><div class="line"><span class="keyword">const</span> CacheLinePadSize =<span class="number"> 64</span></div></pre></td></tr></table></figure>
<p>Likewise, x86 (386) and amd64 CPUs have 64-byte cache lines, so the file <code>go/src/internal/cpu/cpu_x86.go</code> sets <code>CacheLinePadSize</code> to 64 bytes:</p>
<figure class="highlight go"><figcaption><span>go/src/internal/cpu/cpu_x86.go</span></figcaption><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line"><span class="comment">//go:build 386 || amd64</span></div><div class="line"></div><div class="line"><span class="keyword">package</span> cpu</div><div class="line"></div><div class="line"><span class="keyword">const</span> CacheLinePadSize =<span class="number"> 64</span></div></pre></td></tr></table></figure>
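<p>So the Go runtime defines a cache line size for each CPU architecture it supports, and uses it to avoid false sharing.</p>
<p>However, this type lives in the Go runtime's <code>internal</code> package and is not exposed to user code. So how can we use it?</p>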
<h3 id="2-2_golang-org/x/sys/cpu">2.2 golang.org/x/sys/cpu</h3>
<p>No problem: Go's extension module <code>golang.org/x/sys/cpu</code> provides a definition of <code>CacheLinePad</code> that we can use directly.</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">type</span> CacheLinePad <span class="keyword">struct</span>{ _ [cacheLineSize]<span class="typename">byte</span> }</div></pre></td></tr></table></figure>
<p>Its implementation is the same as the one in the Go runtime, except that <code>CacheLinePad</code> is exported, so we can use it directly in our own projects.</p>
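<p>A minimal sketch of how such a pad can separate two independently updated counters. To stay self-contained it does not import <code>golang.org/x/sys/cpu</code>; the local <code>CacheLinePad</code> type, the 64-byte <code>cacheLinePadSize</code> constant, and the <code>counters</code> struct are assumptions made for illustration only.</p>

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

// cacheLinePadSize is an assumed value (64 bytes, as on amd64).
// The real constant is chosen per GOARCH inside the Go tree.
const cacheLinePadSize = 64

// CacheLinePad is a local stand-in with the same shape as cpu.CacheLinePad.
type CacheLinePad struct{ _ [cacheLinePadSize]byte }

// counters is a hypothetical struct: hits and misses are written by
// different goroutines, so a pad keeps them on different cache lines.
type counters struct {
	hits   atomic.Uint64
	_      CacheLinePad
	misses atomic.Uint64
}

// fieldGap reports the distance in bytes between the two counters.
func fieldGap() uintptr {
	var c counters
	return unsafe.Offsetof(c.misses) - unsafe.Offsetof(c.hits)
}

func main() {
	var c counters
	c.hits.Add(1)
	c.misses.Add(2)
	fmt.Println(c.hits.Load(), c.misses.Load(), fieldGap())
}
```

<p>Because the two fields start at least one assumed cache line apart, a write to one of them cannot invalidate the line that holds the other.</p>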
<h3 id="2-3_Go运行时中的应用场景">2.3 Use cases in the Go runtime</h3>
<p>In the previous article of this series we introduced the <code>treap</code>. The <code>treap</code> is used in <code>semTable</code>, a Go runtime data structure that manages the wait queues of <code>semaphore</code>s.</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">type</span> semaRoot <span class="keyword">struct</span> {</div><div class="line"> lock mutex</div><div class="line"> treap *sudog <span class="comment">// root of balanced tree of unique waiters.</span></div><div class="line"> nwait atomic.Uint32 <span class="comment">// Number of waiters. Read w/o the lock.</span></div><div class="line">}</div><div class="line"></div><div class="line"><span class="keyword">var</span> semtable semTable</div><div class="line"></div><div class="line"><span class="comment">// Prime to not correlate with any user patterns.</span></div><div class="line"><span class="keyword">const</span> semTabSize =<span class="number"> 251</span></div><div class="line"></div><div class="line"><span class="keyword">type</span> semTable [semTabSize]<span class="keyword">struct</span> {</div><div class="line"> root semaRoot</div><div class="line"> pad [cpu.CacheLinePadSize - unsafe.Sizeof(semaRoot{})]<span class="typename">byte</span></div><div class="line">}</div></pre></td></tr></table></figure>
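<p>When <code>semTable</code> is accessed concurrently, the <code>root</code> of each element is a <code>semaRoot</code> struct with fields such as the <code>mutex</code> and the <code>treap</code> pointer. Different CPU cores may touch these fields in neighbouring elements at the same time, and if those elements shared a cache line the result would be false sharing.<br>To avoid this, each element carries a <code>pad</code> field that rounds its size up to <code>CacheLinePadSize</code>. This relies on <code>semaRoot</code> never being larger than one cache line, which holds here; otherwise the array length of <code>pad</code> would be negative and the code would not compile.</p>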
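<p>The same padding trick can be sketched outside the runtime. The names below (<code>bucketRoot</code>, <code>paddedBucket</code>, <code>table</code>, <code>bucketSize</code>) and the 64-byte line size are hypothetical; only the pattern of subtracting <code>unsafe.Sizeof</code> from the line size is taken from <code>semTable</code>.</p>

```go
package main

import (
	"fmt"
	"unsafe"
)

// cacheLinePadSize is an assumed 64-byte line; the runtime takes the real
// value from internal/cpu for each GOARCH.
const cacheLinePadSize = 64

// bucketRoot is a hypothetical stand-in for semaRoot: a little per-bucket state.
type bucketRoot struct {
	lock  uint32
	head  *int
	nwait uint32
}

// paddedBucket grows bucketRoot to exactly one cache line. The array length
// is a constant expression, so a bucketRoot larger than the line size would
// be rejected at compile time.
type paddedBucket struct {
	root bucketRoot
	pad  [cacheLinePadSize - unsafe.Sizeof(bucketRoot{})]byte
}

// table mirrors the shape of semTable: a fixed array of padded buckets.
var table [251]paddedBucket

// bucketSize returns the size in bytes of one padded array element.
func bucketSize() uintptr { return unsafe.Sizeof(table[0]) }

func main() {
	fmt.Println(unsafe.Sizeof(bucketRoot{}), bucketSize(), len(table))
}
```

<p>With every element exactly one line long, neighbouring buckets never overlap on a line as long as the array itself starts on a line boundary.</p>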
<p>The <code>mheap</code> struct shows another scenario: a <code>CacheLinePad</code> separates part of the fields, avoiding false sharing between the <code>arenas</code> field and the fields above it.</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">type</span> mheap <span class="keyword">struct</span> {</div><div class="line"> _ sys.NotInHeap</div><div class="line"></div><div class="line"></div><div class="line"> lock mutex</div><div class="line"></div><div class="line"> pages pageAlloc <span class="comment">// page allocation data structure</span></div><div class="line"></div><div class="line"> sweepgen <span class="typename">uint32</span> <span class="comment">// sweep generation, see comment in mspan; written during STW</span></div><div class="line"></div><div class="line"> allspans []*mspan <span class="comment">// all spans out there</span></div><div class="line"></div><div class="line"> pagesInUse atomic.Uintptr <span class="comment">// pages of spans in stats mSpanInUse</span></div><div class="line"> pagesSwept atomic.Uint64 <span class="comment">// pages swept this cycle</span></div><div class="line"> pagesSweptBasis atomic.Uint64 <span class="comment">// pagesSwept to use as the origin of the sweep ratio</span></div><div class="line"> sweepHeapLiveBasis <span class="typename">uint64</span> <span class="comment">// value of 
gcController.heapLive to use as the origin of sweep ratio; written with lock, read without</span></div><div class="line"> sweepPagesPerByte <span class="typename">float64</span> <span class="comment">// proportional sweep ratio; written with lock, read without</span></div><div class="line"></div><div class="line"> reclaimIndex atomic.Uint64</div><div class="line"></div><div class="line"> reclaimCredit atomic.Uintptr</div><div class="line"></div><div class="line"> _ cpu.CacheLinePad <span class="comment">// prevents false-sharing between arenas and preceding variables</span></div><div class="line"></div><div class="line"> </div><div class="line"> arenas <span class="number">[1</span> << arenaL1Bits]*<span class="number">[1</span> << arenaL2Bits]*heapArena</div><div class="line"></div><div class="line"> ...</div><div class="line">}</div></pre></td></tr></table></figure>
<p>The <code>stackpool</code> variable in <code>go/src/runtime/stack.go</code> also pads its elements with <code>CacheLinePadSize</code>, and shows yet another usage:</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">var</span> stackpool [_NumStackOrders]<span class="keyword">struct</span> {</div><div class="line"> item stackpoolItem</div><div class="line"> _ [(cpu.CacheLinePadSize - unsafe.Sizeof(stackpoolItem{})%cpu.CacheLinePadSize) % cpu.CacheLinePadSize]<span class="typename">byte</span></div><div class="line">}</div></pre></td></tr></table></figure>
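<p>Because the size of <code>item</code> is not fixed and may be smaller or larger than one cache line, the pad length is computed modulo <code>CacheLinePadSize</code>. Only a pad of fewer than <code>CacheLinePadSize</code> bytes is added, which brings the element size up to the next multiple of the cache line size.</p>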
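<p>The modulo arithmetic can be checked with a small worked example. The helper <code>padBytes</code> is hypothetical and the 64-byte line size is an assumption; the formula is the one used in the <code>stackpool</code> declaration above.</p>

```go
package main

import "fmt"

// cacheLinePadSize is an assumed 64-byte cache line.
const cacheLinePadSize = 64

// padBytes returns how many bytes must follow an item of the given size so
// that item plus pad is a multiple of the cache line size. The outer modulo
// turns a full-line pad into zero when size is already a multiple.
func padBytes(size uintptr) uintptr {
	return (cacheLinePadSize - size%cacheLinePadSize) % cacheLinePadSize
}

func main() {
	for _, size := range []uintptr{24, 64, 72, 128} {
		pad := padBytes(size)
		fmt.Printf("size=%d pad=%d total=%d\n", size, pad, size+pad)
	}
}
```

<p>For a 24-byte item the pad is 40 bytes, for a 72-byte item it is 56 bytes, and for items that are already 64 or 128 bytes long no pad is added.</p>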
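<p>In everyday software development we do not need to care about these details. When we do need to optimize performance, though, understanding these low-level implementations helps us understand and tune our programs better.</p>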
]]></content>
<summary type="html">
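<![CDATA[<p>In modern multi-core processors, an efficient cache greatly improves program performance, yet the "false sharing" problem often makes the cache inefficient.</p>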
]]>
</summary>
<category term="Go" scheme="https://colobu.com/categories/Go/"/>
</entry>
<entry>
<title><![CDATA[Go's Little-Known Data Structures: Treap, a Randomized Binary Search Tree]]></title>
<link href="https://colobu.com/2024/11/17/go-internal-ds-treap/"/>
<id>https://colobu.com/2024/11/17/go-internal-ds-treap/</id>
<published>2024-11-17T08:19:00.000Z</published>
<updated>2024-11-17T08:58:11.942Z</updated>
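<content type="html"><![CDATA[<p>A <code>treap</code> is a binary tree that maintains both the binary search tree (BST) property and the heap property, hence the name (tree + heap ⇒ treap).</p>
<p>Formally, a treap (tree + heap) is a binary tree whose nodes hold two values, a <em>key</em> and a <em>priority</em>: the <strong>key</strong> maintains the BST property, and the <strong>priority</strong> is a random value that maintains the heap property (whether it is a max-heap or a min-heap does not matter). Compared with other balanced binary search trees, a treap is simple to implement and still yields a structure that is balanced in expectation. It belongs to the weakly balanced trees.</p>
<p>The <code>treap</code> was proposed by Raimund Seidel and Cecilia Aragon in 1989.</p>
<p>A treap is also often called a "Cartesian tree", because it embeds naturally into the Cartesian plane:<br><img src="Pasted-image-20241026111752.png" alt=""></p>
<p>More precisely, a <code>treap</code> is a data structure that stores pairs <code>(X,Y)</code> in a binary tree such that it is a binary search tree by <code>X</code> and a binary heap by <code>Y</code>. If a node of the tree contains the values <code>(X₀,Y₀)</code>, then:</p>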
<ul>
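<li>all nodes in the left subtree have <code>X ≤ X₀</code> (BST property)</li>
<li>all nodes in the right subtree have <code>X₀ ≤ X</code> (BST property)</li>
<li>all nodes in both subtrees have <code>Y ≤ Y₀</code> (heap property, taking a max-heap as the example)</li>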
</ul>
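<p>The three conditions above can be turned into a small checker. This is only a sketch: the <code>node</code> type and the <code>isTreap</code> and <code>within</code> helpers are hypothetical names, and a max-heap on the priority is assumed, as in the list.</p>

```go
package main

import (
	"fmt"
	"math"
)

// node is a hypothetical treap node: key is X, priority is Y.
type node struct {
	key, priority int
	left, right   *node
}

// within checks that every key in the subtree lies in [lo, hi] and that no
// priority exceeds maxPrio, narrowing the bounds while descending.
func within(n *node, lo, hi, maxPrio int) bool {
	if n == nil {
		return true
	}
	if n.key < lo || n.key > hi || n.priority > maxPrio {
		return false
	}
	return within(n.left, lo, n.key, n.priority) &&
		within(n.right, n.key, hi, n.priority)
}

// isTreap reports whether root satisfies the BST and max-heap properties.
func isTreap(root *node) bool {
	return within(root, math.MinInt, math.MaxInt, math.MaxInt)
}

func main() {
	root := &node{key: 5, priority: 9,
		left:  &node{key: 3, priority: 4},
		right: &node{key: 8, priority: 7},
	}
	fmt.Println(isTreap(root))
	root.left.priority = 10 // child outranks its parent: heap property broken
	fmt.Println(isTreap(root))
}
```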
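<p>In this formulation, X is the key (and also the value stored in the treap), and Y is called the <strong>priority</strong>. Without priorities, a treap would be an ordinary binary search tree.</p>
<p>What makes the priorities special (provided every node has a distinct priority) is that they determine the final shape of the tree uniquely, regardless of the order in which the data is inserted. This can be proved by a corresponding theorem.<br>Here is the clever part: if the priorities are assigned at random, the tree is reasonably balanced on average and does not degenerate into a linked list. This keeps the expected time complexity of the main operations (search, insert, delete and so on) at O(log N).<br>Because of this random assignment of priorities, the data structure is also called a "randomized binary search tree".</p>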
<p><img src="Pasted-image-20241026113542.png" alt=""></p>
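<p>A treap maintains the heap property with rotations, and it needs only two kinds of rotation, so it is somewhat simpler to program than a red-black tree or an AVL tree.</p>
<p>Treap operations:<br><strong>Insert</strong><br><em>Taking a max-heap as the example</em><br>Assign the node a random priority. As with insertion into a binary search tree, first insert the new node as a leaf, then restore the heap property as follows:</p>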
<ol>
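<li>If the priority of the current node is greater than that of its parent, apply step 2 or step 3, and repeat until this is no longer the case.</li>
<li>If the current node is the left child of its parent, rotate right.</li>
<li>If the current node is the right child of its parent, rotate left.</li>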
</ol>
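<p>The insertion steps above can be sketched recursively. This is an illustrative max-heap treap without parent pointers, not the Go runtime's code; the <code>node</code> type and the <code>insert</code>, <code>rotateLeft</code>, <code>rotateRight</code> and <code>inorder</code> helpers are hypothetical. The priority is passed in explicitly so the behaviour can be reproduced; real code would draw it at random, as <code>main</code> does.</p>

```go
package main

import (
	"fmt"
	"math/rand"
)

type node struct {
	key, priority int
	left, right   *node
}

// rotateRight lifts the left child of y above y and returns the new subtree root.
func rotateRight(y *node) *node {
	x := y.left
	y.left = x.right
	x.right = y
	return x
}

// rotateLeft lifts the right child of x above x and returns the new subtree root.
func rotateLeft(x *node) *node {
	y := x.right
	x.right = y.left
	y.left = x
	return y
}

// insert adds key as a leaf by BST rules, then rotates it upward on the way
// back out of the recursion while it outranks its parent.
func insert(n *node, key, priority int) *node {
	if n == nil {
		return &node{key: key, priority: priority}
	}
	if key < n.key {
		n.left = insert(n.left, key, priority)
		if n.left.priority > n.priority {
			n = rotateRight(n) // new node is a left child: rotate right
		}
	} else {
		n.right = insert(n.right, key, priority)
		if n.right.priority > n.priority {
			n = rotateLeft(n) // new node is a right child: rotate left
		}
	}
	return n
}

// inorder appends the keys of the subtree in sorted order.
func inorder(n *node, out []int) []int {
	if n == nil {
		return out
	}
	out = inorder(n.left, out)
	out = append(out, n.key)
	return inorder(n.right, out)
}

func main() {
	var root *node
	for _, k := range []int{5, 1, 4, 2, 3} {
		root = insert(root, k, rand.Int())
	}
	fmt.Println(inorder(root, nil))
}
```

<p>Whatever priorities are drawn, the in-order traversal stays sorted, because rotations never change the BST order of the keys.</p>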
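<p><strong>Delete</strong></p>
<p>Because a treap satisfies the heap property, it is enough to rotate the node to be deleted down to a leaf and then remove it directly. Concretely, at each step pick the child with the higher priority and rotate in the opposite direction (a higher left child means a right rotation, and vice versa), until the node has become a leaf, then delete it.</p>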
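<p>A minimal sketch of this deletion for a max-heap treap, again with hypothetical names (<code>node</code>, <code>remove</code>) and without parent pointers, so it is not the runtime's implementation.</p>

```go
package main

import "fmt"

type node struct {
	key, priority int
	left, right   *node
}

func rotateRight(y *node) *node {
	x := y.left
	y.left = x.right
	x.right = y
	return x
}

func rotateLeft(x *node) *node {
	y := x.right
	x.right = y.left
	y.left = x
	return y
}

// remove deletes key from the subtree rooted at n and returns the new root.
// The target is rotated downward, always lifting its higher-priority child,
// until it is a leaf and can simply be dropped.
func remove(n *node, key int) *node {
	if n == nil {
		return nil
	}
	switch {
	case key < n.key:
		n.left = remove(n.left, key)
	case key > n.key:
		n.right = remove(n.right, key)
	case n.left == nil && n.right == nil:
		return nil // the target is a leaf: unlink it
	case n.right == nil || (n.left != nil && n.left.priority > n.right.priority):
		n = rotateRight(n) // left child ranks higher: it moves up
		n.right = remove(n.right, key)
	default:
		n = rotateLeft(n) // right child ranks higher: it moves up
		n.left = remove(n.left, key)
	}
	return n
}

func main() {
	root := &node{key: 5, priority: 9,
		left:  &node{key: 3, priority: 4},
		right: &node{key: 8, priority: 7},
	}
	root = remove(root, 5)
	fmt.Println(root.key, root.left.key)
}
```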
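<p><strong>Search</strong></p>
<p>Searching works as in an ordinary binary search tree. Thanks to the randomized structure of the treap, the expected complexity of a search is <code>O(logn)</code>.</p>
<p>That covers the background on the treap data structure. If you want to learn more about treaps, see:</p>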
<ul>
<li><a href="https://en.wikipedia.org/wiki/Treap" target="_blank" rel="external">https://en.wikipedia.org/wiki/Treap</a></li>
<li><a href="https://medium.com/carpanese/a-visual-introduction-to-treap-data-structure-part-1-6196d6cc12ee" target="_blank" rel="external">https://medium.com/carpanese/a-visual-introduction-to-treap-data-structure-part-1-6196d6cc12ee</a></li>
<li><a href="https://cp-algorithms.com/data_structures/treap.html" target="_blank" rel="external">https://cp-algorithms.com/data_structures/treap.html</a></li>
</ul>
<h2 id="Go_运行时的_treap_和用途">The treap in the Go runtime and its use</h2>
<p>In the Go runtime, <a href="https://github.com/golang/go/blob/master/src/runtime/sema.go#L40" target="_blank" rel="external">sema.go#semaRoot</a> defines a data structure called <code>semaRoot</code>:</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">type</span> semaRoot <span class="keyword">struct</span> {</div><div class="line"> lock mutex</div><div class="line"> treap *sudog <span class="comment">// root of the balanced tree (treap) of unique waiters (goroutines)</span></div><div class="line"> nwait atomic.Uint32 <span class="comment">// number of waiters (goroutines)</span></div><div class="line">}</div><div class="line"></div><div class="line"><span class="keyword">type</span> sudog <span class="keyword">struct</span> {</div><div class="line"> g *g</div><div class="line"></div><div class="line"> next *sudog</div><div class="line"> prev *sudog</div><div class="line"> elem unsafe.Pointer <span class="comment">// data element (may point to stack)</span></div><div class="line"></div><div class="line"> acquiretime <span class="typename">int64</span></div><div class="line"> releasetime <span class="typename">int64</span></div><div class="line"> ticket <span class="typename">uint32</span></div><div class="line"></div><div class="line"> isSelect <span class="typename">bool</span></div><div class="line"> success <span class="typename">bool</span></div><div class="line"></div><div class="line"> waiters <span class="typename">uint16</span></div><div class="line"></div><div class="line"> parent *sudog <span class="comment">// semaRoot binary tree</span></div><div class="line"> waitlink *sudog <span class="comment">// g.waiting list or semaRoot</span></div><div class="line"> waittail *sudog <span class="comment">// semaRoot</span></div><div class="line"> c *hchan <span class="comment">// channel</span></div><div class="line">}</div></pre></td></tr></table></figure>
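<p>This is a key data structure in the underlying implementation of Go's mutex (Mutex); it manages the queue of goroutines waiting to acquire a mutex. As we already know, when a goroutine tries to acquire a <code>sync.Mutex</code> that is already held by another goroutine, the requesting goroutine blocks and is placed into this data structure (which also tells you that the goroutines in it are unique, with no duplicates).</p>
<p><code>semaRoot</code> holds a balanced tree whose <code>sudog</code> nodes all have distinct addresses <code>(s.elem)</code>. Each <code>sudog</code> may in turn point, through <code>s.waitlink</code>, to a list of other <code>sudog</code>s waiting on the same address. Operations on the inner list of <code>sudog</code>s with the same address are all O(1). Scanning the top-level semaRoot list is <code>O(log n)</code>, where <code>n</code> is the number of distinct addresses with blocked goroutines that hash to the given semaRoot.</p>
<p>The <code>treap *sudog</code> field of <code>semaRoot</code> is in fact a treap. Let's look at how it is implemented.</p>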
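<h2 id="增加一个元素(入队)">Adding an element (enqueue)</h2>
<p><code>queue</code> adds a waiting goroutine (<code>sudog</code>) to the <code>treap</code> of a <code>semaRoot</code>. If the address is already present in the tree as a node <code>t</code>, then with <code>lifo</code> set to <code>true</code> the new <code>s</code> takes the place of <code>t</code>; otherwise <code>s</code> is appended to the end of <code>t</code>'s wait list.</p>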
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div><div class="line">46</div><div class="line">47</div><div class="line">48</div><div class="line">49</div><div class="line">50</div><div class="line">51</div><div class="line">52</div><div class="line">53</div><div class="line">54</div><div class="line">55</div><div class="line">56</div><div class="line">57</div><div class="line">58</div><div class="line">59</div><div class="line">60</div><div class="line">61</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">func</span> (root *semaRoot) queue(addr *<span class="typename">uint32</span>, s *sudog, lifo <span class="typename">bool</span>) {</div><div class="line"> <span class="comment">// Initialize the node that is being added</span></div><div class="line"> s.g = getg()</div><div class="line"> s.elem = unsafe.Pointer(addr)</div><div class="line"> s.next = <span class="constant">nil</span></div><div class="line"> s.prev = <span class="constant">nil</span></div><div class="line"> s.waiters =<span class="number"> 0</span></div><div class="line"></div><div class="line"> <span class="keyword">var</span> last *sudog</div><div class="line"> pt := &root.treap</div><div class="line"> <span class="comment">// Start from the root</span></div><div class="line"> <span class="keyword">for</span> t := *pt; t != <span class="constant">nil</span>; t = *pt { <span class="comment">// ①</span></div><div class="line"> <span class="comment">// The address is already in the tree: join the list for this address</span></div><div class="line"> <span class="keyword">if</span> t.elem == unsafe.Pointer(addr) {</div><div class="line"> <span class="comment">// The address is already present and LIFO was requested: this is a replacement</span></div><div class="line"> <span class="keyword">if</span> lifo {</div><div class="line"> <span class="comment">// Substitute s in t's place in the treap</span></div><div class="line"> *pt = s</div><div class="line"> s.ticket = t.ticket</div><div class="line"> ... <span class="comment">// copy the rest of t's state to s</span></div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"> <span class="comment">// Append s to the end of t's wait list</span></div><div class="line"> <span class="keyword">if</span> t.waittail == <span class="constant">nil</span> {</div><div class="line"> t.waitlink = s</div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"> t.waittail.waitlink = s</div><div class="line"> }</div><div class="line"> t.waittail = s</div><div class="line"> s.waitlink = <span class="constant">nil</span></div><div class="line"> <span class="keyword">if</span> t.waiters<span class="number">+1</span> !=<span class="number"> 0</span> {</div><div class="line"> t.waiters++</div><div class="line"> }</div><div class="line"> }</div><div class="line"> <span class="keyword">return</span></div><div class="line"> }</div><div class="line"> last = t</div><div class="line"> <span class="comment">// Binary search tree descent</span></div><div class="line"> <span class="keyword">if</span> <span class="typename">uintptr</span>(unsafe.Pointer(addr)) < <span class="typename">uintptr</span>(t.elem) { <span class="comment">// ②</span></div><div class="line"> pt = &t.prev</div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"> pt = &t.next</div><div class="line"> }</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// Set the ticket of the new node. The ticket is a random value used as the heap priority that keeps the treap balanced.</span></div><div class="line"> s.ticket = cheaprand() |<span class="number"> 1</span> <span class="comment">// ③</span></div><div class="line"> s.parent = last</div><div class="line"> *pt = s</div><div class="line"></div><div class="line"> <span class="comment">// Rotate by priority (ticket) to keep the treap balanced</span></div><div class="line"> <span class="keyword">for</span> s.parent != <span class="constant">nil</span> && s.parent.ticket > s.ticket { <span class="comment">// ④</span></div><div class="line"> <span class="keyword">if</span> s.parent.prev == s {</div><div class="line"> root.rotateRight(s.parent) <span class="comment">// ⑤</span></div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"> <span class="keyword">if</span> s.parent.next != s {</div><div class="line"> <span class="built_in">panic</span>(<span class="string">"semaRoot queue"</span>)</div><div class="line"> }</div><div class="line"> root.rotateLeft(s.parent) <span class="comment">// ⑥</span></div><div class="line"> }</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
<p>① walks the treap, which is done as a binary search tree lookup. <code>addr</code> is the treap key we described at the beginning, that is, <code>s.elem</code>.<br>It first checks whether <code>addr</code> is already in the treap. If it is, <code>s</code> is either added to the <code>sudog</code> list for that <code>addr</code> or replaces the <code>sudog</code> for that <code>addr</code>.</p>
<p>For a <code>sync.Mutex</code>, this <code>addr</code> is the address of the <code>Mutex.sema</code> field.</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">type</span> Mutex <span class="keyword">struct</span> {</div><div class="line"> state <span class="typename">int32</span></div><div class="line"> sema <span class="typename">uint32</span></div><div class="line">}</div></pre></td></tr></table></figure>
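<p>So goroutines blocked on the same <code>sync.Mutex</code> share the same <code>addr</code> and are added to the same <code>sudog</code> list.<br>Different <code>sync.Mutex</code> locks have different <code>addr</code>s, so their waiters are added to different nodes of the treap.</p>
<p>From this you can see that a <code>semaRoot</code> maintains the wait queues of many <code>sync.Mutex</code> values: it can quickly find the wait queue of a particular <code>sync.Mutex</code>, and it also maintains the order of waiters within a single <code>sync.Mutex</code>.<br>This is a useful pattern; if you have a similar need, you can model your solution on this implementation.</p>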
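<p>The two-level idea (an index keyed by address, with one queue of waiters per key) can be sketched in a few lines. Note that this sketch deliberately swaps the treap for a plain Go map and the <code>sudog</code> list for a slice, so it only illustrates the shape of the structure and the <code>lifo</code> flag; <code>waitTable</code>, <code>queue</code> and <code>dequeue</code> are hypothetical names and the integer IDs stand in for goroutines.</p>

```go
package main

import "fmt"

// waitTable maps a key (standing in for a semaphore address) to the ordered
// list of waiter IDs blocked on that key.
type waitTable struct {
	queues map[uintptr][]int
}

func newWaitTable() *waitTable {
	return &waitTable{queues: make(map[uintptr][]int)}
}

// queue registers a waiter for addr. With lifo the waiter goes to the head
// of the list, otherwise to the tail.
func (w *waitTable) queue(addr uintptr, id int, lifo bool) {
	q := w.queues[addr]
	if lifo {
		q = append([]int{id}, q...)
	} else {
		q = append(q, id)
	}
	w.queues[addr] = q
}

// dequeue removes and returns the first waiter for addr, if there is one.
// When the last waiter leaves, the key itself is removed from the index.
func (w *waitTable) dequeue(addr uintptr) (int, bool) {
	q := w.queues[addr]
	if len(q) == 0 {
		return 0, false
	}
	id := q[0]
	if len(q) == 1 {
		delete(w.queues, addr)
	} else {
		w.queues[addr] = q[1:]
	}
	return id, true
}

func main() {
	w := newWaitTable()
	w.queue(1, 10, false)
	w.queue(1, 11, false)
	w.queue(2, 20, false)
	w.queue(1, 12, true)
	for {
		id, ok := w.dequeue(1)
		if !ok {
			break
		}
		fmt.Println(id)
	}
}
```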
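<p>③ sets the priority of the node. It is a random value used to keep the treap balanced. There is a small trick here: the lowest bit of the priority is always set to 1, which guarantees that the priority is never 0. The ticket is compared with 0 in several places, so forcing the lowest bit to 1 marks the priority as having been set.</p>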
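<p>The effect of <code>| 1</code> is easy to check in isolation. In this sketch <code>rand.Uint32</code> from <code>math/rand</code> stands in for the runtime's <code>cheaprand</code>, which user code cannot call; <code>withLowBit</code> and <code>newTicket</code> are hypothetical helpers.</p>

```go
package main

import (
	"fmt"
	"math/rand"
)

// withLowBit forces the lowest bit to 1, so the result is always odd and
// therefore never zero.
func withLowBit(x uint32) uint32 { return x | 1 }

// newTicket draws a random priority that is guaranteed to be non-zero.
// rand.Uint32 is an assumption standing in for the runtime's cheaprand.
func newTicket() uint32 { return withLowBit(rand.Uint32()) }

func main() {
	fmt.Println(withLowBit(0), withLowBit(4), withLowBit(7))
	fmt.Println(newTicket() != 0)
}
```

<p>Only one bit of randomness is given up, so the quality of the random priorities is barely affected.</p>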
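<p>④ rotates the newly added node into its proper place to keep the treap balanced: as long as the parent's ticket is greater than its own, the node is rotated upward, so smaller tickets end up closer to the root (a min-heap on <code>ticket</code>). The rotations (⑤ and ⑥) are the right and left rotations mentioned above; we will look at them shortly.</p>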
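<h2 id="移除一个元素(出队)">Removing an element (dequeue)</h2>
<p>Correspondingly, there is a dequeue operation. It removes a node from the treap, and that node is a waiting goroutine (<code>sudog</code>).</p>
<p><code>dequeue</code> searches for and finds the first <code>goroutine</code> in the <code>semaRoot</code> that is blocked on <code>addr</code>.<br>It is used, for example, when a goroutine needs to be woken up so that it can continue running (either by handing the lock to it directly or by waking it up to compete for the lock).</p>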
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">func</span> (root *semaRoot) dequeue(addr *<span class="typename">uint32</span>) (found *sudog, now, tailtime <span class="typename">int64</span>) {</div><div class="line"> ps := &root.treap</div><div class="line"> s := *ps</div><div class="line"> <span class="keyword">for</span> ; s != <span class="constant">nil</span>; s = *ps { <span class="comment">// ①, binary search tree lookup</span></div><div class="line"> <span class="keyword">if</span> s.elem == unsafe.Pointer(addr) { <span class="comment">// ②</span></div><div class="line"> <span class="keyword">goto</span> Found</div><div class="line"> }</div><div class="line"> <span class="keyword">if</span> <span class="typename">uintptr</span>(unsafe.Pointer(addr)) < <span class="typename">uintptr</span>(s.elem) {</div><div class="line"> ps = &s.prev</div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"> ps = &s.next</div><div class="line"> }</div><div class="line"> }</div><div class="line"> <span class="keyword">return</span> <span class="constant">nil</span>,<span class="number"> 0</span>,<span class="number"> 0</span></div><div class="line"></div><div class="line">Found: <span class="comment">// ③</span></div><div class="line"> ...</div><div class="line"> <span class="keyword">if</span> t := s.waitlink; t != <span class="constant">nil</span> { <span class="comment">// ④</span></div><div class="line"> *ps = t</div><div class="line"> ...</div><div class="line"> } <span class="keyword">else</span> { <span class="comment">// ⑤</span></div><div class="line"> <span class="comment">// Rotate s down to a leaf so that it can be removed</span></div><div class="line"> <span class="keyword">for</span> s.next != <span class="constant">nil</span> || s.prev != <span class="constant">nil</span> {</div><div class="line"> <span class="keyword">if</span> s.next == <span class="constant">nil</span> || s.prev != <span class="constant">nil</span> && s.prev.ticket < s.next.ticket {</div><div class="line"> root.rotateRight(s)</div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"> root.rotateLeft(s)</div><div class="line"> }</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// ⑤ Remove s</span></div><div class="line"> <span class="keyword">if</span> s.parent != <span class="constant">nil</span> {</div><div class="line"> <span class="keyword">if</span> s.parent.prev == s {</div><div class="line"> s.parent.prev = <span class="constant">nil</span></div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"> s.parent.next = <span class="constant">nil</span></div><div class="line"> }</div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"> root.treap = <span class="constant">nil</span></div><div class="line"> }</div><div class="line"> tailtime = s.acquiretime</div><div class="line"> }</div><div class="line"> ... <span class="comment">// clear the fields of s that are no longer needed</span></div><div class="line"> <span class="keyword">return</span> s, now, tailtime</div><div class="line">}</div></pre></td></tr></table></figure>
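<p>① walks the treap, which is done as a binary search tree lookup. <code>addr</code> is the treap key we described at the beginning, that is, <code>s.elem</code>. If the key is found, control jumps to the <code>Found</code> label. If it is not found, <code>nil</code> is returned.</p>
<p>④ checks whether more than one goroutine is waiting on this address. If so, the next <code>sudog</code> in the list takes this node's place in the treap, and the original node is removed and returned.<br>If there is only one goroutine (⑤), the node is first rotated down until it becomes a leaf, and is then removed from the treap.</p>
<p>The rotations used here are the left and right rotations mentioned above.</p>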
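<h2 id="左旋_rotateLeft">Left rotation: rotateLeft</h2>
<p>The <code>rotateLeft</code> function rotates the subtree rooted at <code>x</code> to the left, so that it becomes a subtree rooted at <code>y</code>.<br>Before the rotation the structure is <code>(x a (y b c))</code>; afterwards it is <code>(y (x a b) c)</code>.</p>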
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">func</span> (root *semaRoot) rotateLeft(x *sudog) {</div><div class="line"> <span class="comment">// p -> (x a (y b c))</span></div><div class="line"> p := x.parent</div><div class="line"> y := x.next</div><div class="line"> b := y.prev</div><div class="line"></div><div class="line"> y.prev = x <span class="comment">// ①</span></div><div class="line"> x.parent = y <span class="comment">// ②</span></div><div class="line"> x.next = b <span class="comment">// ③</span></div><div class="line"> <span class="keyword">if</span> b != <span class="constant">nil</span> {</div><div class="line"> b.parent = x <span class="comment">// ④</span></div><div class="line"> }</div><div class="line"></div><div class="line"> y.parent = p <span class="comment">// ⑤</span></div><div class="line"> <span class="keyword">if</span> p == <span class="constant">nil</span> {</div><div class="line"> root.treap = y <span class="comment">// ⑥</span></div><div class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> p.prev == x { <span class="comment">// ⑦</span></div><div class="line"> p.prev = y</div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"> <span class="keyword">if</span> 
p.next != x {</div><div class="line"> throw(<span class="string">"semaRoot rotateLeft"</span>)</div><div class="line"> }</div><div class="line"> p.next = y</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
<p>Step by step:</p>
<ul>
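<li>Make <code>y</code> the parent of <code>x</code> (②) and <code>x</code> the left child of <code>y</code> (①).</li>
<li>Make <code>b</code> the right child of <code>x</code> (③) and, if <code>b</code> is not nil, set its parent to <code>x</code> (④).</li>
<li>Set the parent of <code>y</code> to <code>p</code> (⑤), the original parent of <code>x</code>. If <code>p</code> is nil, <code>y</code> becomes the new root of the tree (⑥).</li>
<li>Otherwise, depending on whether <code>x</code> was the left or the right child of <code>p</code>, point the corresponding child pointer of <code>p</code> at <code>y</code> (⑦).</li>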
</ul>
<p><img src="Pasted-image-20241026130741.png" alt=""><br>after a left rotation becomes<br><img src="Pasted-image-20241026130908.png" alt=""></p>
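<p>To see the left rotation work outside the runtime, here is a self-contained sketch on a simplified node type. The <code>node</code>, <code>tree</code>, <code>link</code>, <code>example</code> and <code>shape</code> names are hypothetical; the body of <code>rotateLeft</code> follows the steps listed above, and <code>shape</code> prints a tree in the same <code>(x a (y b c))</code> notation.</p>

```go
package main

import "fmt"

// node is a simplified stand-in for sudog: prev is the left child, next the right.
type node struct {
	name               string
	parent, prev, next *node
}

type tree struct{ root *node }

// rotateLeft turns (x a (y b c)) into (y (x a b) c).
func (tr *tree) rotateLeft(x *node) {
	p := x.parent
	y := x.next
	b := y.prev

	y.prev = x
	x.parent = y
	x.next = b
	if b != nil {
		b.parent = x
	}

	y.parent = p
	if p == nil {
		tr.root = y
	} else if p.prev == x {
		p.prev = y
	} else {
		p.next = y
	}
}

// link attaches left and right as the children of parent.
func link(parent, left, right *node) {
	parent.prev, parent.next = left, right
	if left != nil {
		left.parent = parent
	}
	if right != nil {
		right.parent = parent
	}
}

// example builds the tree (x a (y b c)) and returns it together with x.
func example() (*tree, *node) {
	x, y := &node{name: "x"}, &node{name: "y"}
	a, b, c := &node{name: "a"}, &node{name: "b"}, &node{name: "c"}
	link(y, b, c)
	link(x, a, y)
	return &tree{root: x}, x
}

// shape renders a subtree as (name left right); leaves print as their name.
func shape(n *node) string {
	if n == nil {
		return "_"
	}
	if n.prev == nil && n.next == nil {
		return n.name
	}
	return "(" + n.name + " " + shape(n.prev) + " " + shape(n.next) + ")"
}

func main() {
	tr, x := example()
	fmt.Println(shape(tr.root))
	tr.rotateLeft(x)
	fmt.Println(shape(tr.root))
}
```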
<h2 id="右旋_rotateRight">Right rotation: rotateRight</h2>
<p><code>rotateRight</code> rotates the tree rooted at node <code>y</code>.<br>It turns <code>(y (x a b) c)</code> into <code>(x a (y b c))</code>.</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">func</span> (root *semaRoot) rotateRight(y *sudog) {</div><div class="line"> <span class="comment">// p -> (y (x a b) c)</span></div><div class="line"> p := y.parent</div><div class="line"> x := y.prev</div><div class="line"> b := x.next</div><div class="line"></div><div class="line"> x.next = y <span class="comment">// ①</span></div><div class="line"> y.parent = x <span class="comment">// ②</span></div><div class="line"> y.prev = b <span class="comment">// ③</span></div><div class="line"> <span class="keyword">if</span> b != <span class="constant">nil</span> {</div><div class="line"> b.parent = y <span class="comment">// ④</span></div><div class="line"> }</div><div class="line"></div><div class="line"> x.parent = p <span class="comment">// ⑤</span></div><div class="line"> <span class="keyword">if</span> p == <span class="constant">nil</span> {</div><div class="line"> root.treap = x <span class="comment">// ⑥</span></div><div class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> p.prev == y { <span class="comment">// ⑦</span></div><div class="line"> p.prev = x</div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"> <span 
class="keyword">if</span> p.next != y {</div><div class="line"> throw(<span class="string">"semaRoot rotateRight"</span>)</div><div class="line"> }</div><div class="line"> p.next = x</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
<p>Step by step:</p>
<ul>
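<li>Make <code>y</code> the right child of <code>x</code> (①) and <code>x</code> the parent of <code>y</code> (②).</li>
<li>Make <code>b</code> the left child of <code>y</code> (③) and, if <code>b</code> is not nil, set its parent to <code>y</code> (④).</li>
<li>Set the parent of <code>x</code> to <code>p</code> (⑤), the original parent of <code>y</code>. If <code>p</code> is nil, <code>x</code> becomes the new root of the tree (⑥).</li>
<li>Otherwise, depending on whether <code>y</code> was the left or the right child of <code>p</code>, point the corresponding child pointer of <code>p</code> at <code>x</code> (⑦).</li>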
</ul>
<p><img src="Pasted-image-20241026132048.png" alt=""><br>After the right rotation:<br><img src="Pasted-image-20241026132245.png" alt=""></p>
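<p>The rotation can also be tried outside the runtime. The following is a minimal sketch, assuming a hypothetical <code>node</code> type that keeps only the <code>parent</code>, <code>prev</code> and <code>next</code> links of a <code>sudog</code>, and a hypothetical <code>tree</code> type in place of <code>semaRoot</code>. Neither type exists in the Go runtime, and the consistency check that ends in <code>throw</code> is left out.</p>

```go
package main

import "fmt"

// node is a hypothetical stand-in for the runtime's sudog; only the
// three tree links used by the rotation are kept.
type node struct {
	name               string
	parent, prev, next *node
}

// tree is a hypothetical stand-in for semaRoot.
type tree struct{ root *node }

// rotateRight rotates the subtree rooted at y to the right:
// (y (x a b) c) becomes (x a (y b c)).
func (t *tree) rotateRight(y *node) {
	p := y.parent
	x := y.prev
	b := x.next

	x.next = y
	y.parent = x
	y.prev = b
	if b != nil {
		b.parent = y
	}

	x.parent = p
	switch {
	case p == nil:
		t.root = x
	case p.prev == y:
		p.prev = x
	default:
		p.next = x
	}
}

// build returns the tree (y (x a b) c) together with its nodes by name.
func build() (*tree, map[string]*node) {
	n := map[string]*node{}
	for _, s := range []string{"a", "b", "c", "x", "y"} {
		n[s] = &node{name: s}
	}
	n["y"].prev, n["y"].next = n["x"], n["c"]
	n["x"].prev, n["x"].next = n["a"], n["b"]
	n["x"].parent, n["c"].parent = n["y"], n["y"]
	n["a"].parent, n["b"].parent = n["x"], n["x"]
	return &tree{root: n["y"]}, n
}

func main() {
	tr, n := build()
	tr.rotateRight(n["y"])
	// New root, its right child, and y's new left child.
	fmt.Println(tr.root.name, tr.root.next.name, n["y"].prev.name) // x y b
}
```

<p>The in-order sequence a, x, b, y, c is the same before and after the rotation, which is why rotations are safe to use on a binary search tree.</p>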
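<p>Once you understand left and right rotation, you can see why this part of the dequeue code rotates the current node down until it is a leaf:</p>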
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div></pre></td><td class="code"><pre><div class="line"><span class="comment">// Rotate s down to a leaf so that it can be removed</span></div><div class="line"><span class="keyword">for</span> s.next != <span class="constant">nil</span> || s.prev != <span class="constant">nil</span> {</div><div class="line"> <span class="keyword">if</span> s.next == <span class="constant">nil</span> || s.prev != <span class="constant">nil</span> && s.prev.ticket < s.next.ticket {</div><div class="line"> root.rotateRight(s)</div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"> root.rotateLeft(s)</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
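<p>Overall, the treap really is a simple and maintainable data structure. The left and right rotations take very little code and are easy to follow alongside the diagrams. The enqueue and dequeue code is just as simple: ordinary binary search tree operations plus rotations.</p>
<p>This is the third article in my series on Go's unadvertised data structures; I hope you enjoyed it. Which other data structures in the Go runtime and standard library would you like to see covered? Leave a comment.</p>
<p>From time to time I will pick one reader who follows me and has liked the article and send them a book donated by publishers and their editors. You are welcome to take part.</p>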
]]></content>
<summary type="html">
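<![CDATA[<p>A <code>treap</code> is a binary tree that maintains the properties of both a binary search tree (BST) and a heap, hence the name (tree + heap ⇒ treap).</p>
<p>Formally, a treap (tree + heap) is a binary tree whose nodes hold two values, a <e]]>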
</summary>
<category term="Go" scheme="https://colobu.com/categories/Go/"/>
</entry>
<entry>
<title><![CDATA[Go's Unadvertised Data Structures: BitVec, Bit Vectors as a Resource Optimization Technique]]></title>
<link href="https://colobu.com/2024/11/17/go-internal-ds-bitvec/"/>
<id>https://colobu.com/2024/11/17/go-internal-ds-bitvec/</id>
<published>2024-11-17T08:18:48.000Z</published>
<updated>2024-11-17T08:58:20.629Z</updated>
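<content type="html"><![CDATA[<p>A bitmap is an elegant and efficient data structure that builds directly on the computer's lowest-level bit operations. Think of it as a huge array of switches, each either on or off; that is the essence of a bitmap. Every bit can be controlled independently, yet whole groups of bits can be processed at once with bitwise operations.</p>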
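<p>In practice, bitmaps are remarkably effective. Suppose you need to find duplicate numbers in a huge data set: a traditional hash table or array takes a lot of memory, whereas a bitmap marks the presence of each number with a single bit, which shrinks the storage dramatically. For <strong>one billion distinct integers</strong> drawn from a range of that size, a bitmap needs only about <strong>125 MB of memory</strong> (10<sup>9</sup> bits / 8), while other data structures easily need several GB.</p>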
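<p>As a rough illustration of the duplicate-detection idea, here is a minimal sketch. The <code>bitmap</code> type and the <code>duplicates</code> helper are names invented for this example, and it assumes every value is smaller than a known upper bound <code>n</code>.</p>

```go
package main

import "fmt"

// bitmap marks which values in [0, n) have been seen, one bit per value.
type bitmap []uint64

// newBitmap allocates enough 64-bit words to hold n bits.
func newBitmap(n int) bitmap { return make(bitmap, (n+63)/64) }

// testAndSet reports whether bit v was already set, and then sets it.
func (b bitmap) testAndSet(v uint32) bool {
	word, mask := v/64, uint64(1)<<(v%64)
	seen := b[word]&mask != 0
	b[word] |= mask
	return seen
}

// duplicates appends a value each time it is encountered again.
// All values must be smaller than n.
func duplicates(values []uint32, n int) []uint32 {
	seen := newBitmap(n)
	var dups []uint32
	for _, v := range values {
		if seen.testAndSet(v) {
			dups = append(dups, v)
		}
	}
	return dups
}

func main() {
	fmt.Println(duplicates([]uint32{3, 7, 3, 100, 7, 5}, 128)) // [3 7]
	// One bit per value: a range of 10^9 values needs 10^9/8 bytes.
	fmt.Println(1_000_000_000/8/1_000_000, "MB") // 125 MB
}
```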
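<p>Bitmaps also appear in the database systems we use every day. Databases use bitmap indexes to speed up queries, especially on low-cardinality enumerated columns such as gender or status, where one bitmap quickly locates the matching rows. In an e-commerce system, for example, finding the products that are "on sale and in stock" comes down to a simple bitwise AND of two bitmap indexes.</p>
<p>Bitmaps are equally attractive for access control in large systems. Each of a user's permissions is encoded in its own bit, and checking a permission takes a single bitwise operation, which is both efficient and easy to read. A CMS, for example, can represent a user's entire permission state, including read, write, admin and other dimensions, in one 32-bit integer.</p>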
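<p>A permission set of this kind might look like the sketch below. The <code>Perm</code> type and the three permission names are hypothetical, chosen only to illustrate the bit layout.</p>

```go
package main

import "fmt"

// Perm packs up to 32 independent permissions into one integer.
// The permission names are hypothetical, chosen for illustration.
type Perm uint32

const (
	PermRead Perm = 1 << iota
	PermWrite
	PermAdmin
)

// Has reports whether every permission in want is granted.
func (p Perm) Has(want Perm) bool { return p&want == want }

// Grant returns p with the given permissions added.
func (p Perm) Grant(q Perm) Perm { return p | q }

// Revoke returns p with the given permissions removed.
func (p Perm) Revoke(q Perm) Perm { return p &^ q }

func main() {
	editor := PermRead.Grant(PermWrite)
	fmt.Println(editor.Has(PermRead | PermWrite))        // true
	fmt.Println(editor.Has(PermAdmin))                   // false
	fmt.Println(editor.Revoke(PermWrite).Has(PermWrite)) // false
}
```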
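<p>The <strong>Bloom filter</strong> is an especially clever application of the bitmap idea. It uses several hash functions to mark data in a bitmap and can tell, at a very small memory cost, whether an element may be present. It is widely used in web crawlers, spam filtering and similar scenarios. It can produce false positives with a small probability, but in practice that is usually an acceptable trade-off.</p>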
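<p>To make the idea concrete, here is a minimal Bloom filter sketch. It is an illustration under stated assumptions rather than a production design: the <code>bloom</code> type is invented for this example, and the k probe positions are derived from two FNV-1a hash values by double hashing instead of k independent hash functions.</p>

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a minimal Bloom filter: m bits and k probe positions per key.
type bloom struct {
	bits []uint64
	m, k uint64
}

func newBloom(m, k uint64) *bloom {
	return &bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// positions derives k bit positions from two 64-bit hashes
// (double hashing: h1 + i*h2), an assumption made to keep the sketch short.
func (b *bloom) positions(key string) []uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	h1 := h.Sum64()
	h.Write([]byte{0xff}) // extend the input to get a second hash value
	h2 := h.Sum64() | 1   // force odd so the probes do not collapse
	pos := make([]uint64, b.k)
	for i := range pos {
		pos[i] = (h1 + uint64(i)*h2) % b.m
	}
	return pos
}

// Add sets the k bits that belong to key.
func (b *bloom) Add(key string) {
	for _, p := range b.positions(key) {
		b.bits[p/64] |= 1 << (p % 64)
	}
}

// MayContain returns false only if key was definitely never added.
func (b *bloom) MayContain(key string) bool {
	for _, p := range b.positions(key) {
		if b.bits[p/64]&(1<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	f := newBloom(1024, 3)
	f.Add("https://colobu.com/")
	fmt.Println(f.MayContain("https://colobu.com/")) // true: no false negatives
}
```

<p>A <code>true</code> result only means "possibly present"; a <code>false</code> result is always definitive.</p>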
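<p>These properties make bitmaps a strong fit for <strong>massive data sets, state flags, data compression and fast counting</strong>. They solve hard problems with very simple means, which is part of the beauty of computer science.</p>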
<a id="more"></a>
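<p><code>BitVec</code> and <code>BitMap</code> are similar and differ mainly in emphasis. A <strong>BitVec</strong> is closer to an abstract data type for bit manipulation and stresses vectorized bitwise operations; in Rust, for example, <a href="https://crates.io/crates/bitvec" target="_blank" rel="external">bitvec</a> offers a range of convenient interfaces for working with bits. A <strong>Bitmap</strong> stresses its role as a "map", typically using a fixed-size bit array to record which elements of a set are present.</p>
<p>BitVec has the following advantages:</p>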
<ul>
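<li><strong>Space efficient</strong> - each element occupies exactly one bit and represents one of two states, 0 or 1</li>
<li><strong>Fast bitwise operations</strong> - AND, OR, XOR and similar operations are very fast and can even be accelerated with SIMD</li>
<li><strong>Fast random access</strong> - any bit can be located in O(1) time</li>
<li><strong>Compact storage</strong> - one byte stores eight bits of information</li>
<li><strong>Small memory footprint</strong> - a good fit when there is a lot of data but each item's state is simple</li>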
</ul>
<h2 id="Go_内部实现的_BitVec">BitVec in Go's Internals</h2>
<p>Inside the Go compiler, <a href="https://github.com/golang/go/blob/989eed28497cde7145958985f50bb3dd6ab698b6/src/cmd/compile/internal/bitvec/bv.go#L21" target="_blank" rel="external">cmd/compile/internal/bitvec</a> implements a bit vector data structure, <code>BitVec</code>, which is used in the SSA liveness analysis (bvecSet wraps BitVec). The runtime has its own <code>bitvector</code>, implemented in <a href="https://github.com/golang/go/blob/master/src/runtime/stack.go#L595" target="_blank" rel="external">runtime/stack.go</a> and used in memory management.</p>
<p>We will focus on <code>BitVec</code>, because it has the more complete set of methods.</p>
<p>The BitVec struct is defined as follows:</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">type</span> BitVec <span class="keyword">struct</span> {</div><div class="line"> N <span class="typename">int32</span> <span class="comment">// number of bits in this vector</span></div><div class="line"> B []<span class="typename">uint32</span> <span class="comment">// words that hold the bits</span></div><div class="line">}</div><div class="line"></div><div class="line"><span class="keyword">func</span> New(n <span class="typename">int32</span>) BitVec {</div><div class="line"> nword := (n + wordBits -<span class="number"> 1</span>) / wordBits <span class="comment">// minimum number of words needed to hold n bits</span></div><div class="line"> <span class="keyword">return</span> BitVec{n, <span class="built_in">make</span>([]<span class="typename">uint32</span>, nword)}</div><div class="line">}</div></pre></td></tr></table></figure>
<p>It then defines a set of bit-manipulation methods:</p>
<ul>
<li><a href="https://pkg.go.dev/cmd/compile/internal/bitvec@go1.23.2#BitVec.And" target="_blank" rel="external">func (dst BitVec) And(src1, src2 BitVec)</a>: ANDs two bit vectors and stores the result in the dst bit vector</li>
<li><a href="https://pkg.go.dev/cmd/compile/internal/bitvec@go1.23.2#BitVec.AndNot" target="_blank" rel="external">func (dst BitVec) AndNot(src1, src2 BitVec)</a></li>
<li><a href="https://pkg.go.dev/cmd/compile/internal/bitvec@go1.23.2#BitVec.Clear" target="_blank" rel="external">func (bv BitVec) Clear()</a></li>
<li><a href="https://pkg.go.dev/cmd/compile/internal/bitvec@go1.23.2#BitVec.Copy" target="_blank" rel="external">func (dst BitVec) Copy(src BitVec)</a></li>
<li><a href="https://pkg.go.dev/cmd/compile/internal/bitvec@go1.23.2#BitVec.Count" target="_blank" rel="external">func (bv BitVec) Count() int</a></li>
<li><a href="https://pkg.go.dev/cmd/compile/internal/bitvec@go1.23.2#BitVec.Eq" target="_blank" rel="external">func (bv1 BitVec) Eq(bv2 BitVec) bool</a></li>
<li><a href="https://pkg.go.dev/cmd/compile/internal/bitvec@go1.23.2#BitVec.Get" target="_blank" rel="external">func (bv BitVec) Get(i int32) bool</a></li>
<li><a href="https://pkg.go.dev/cmd/compile/internal/bitvec@go1.23.2#BitVec.IsEmpty" target="_blank" rel="external">func (bv BitVec) IsEmpty() bool</a></li>
<li><a href="https://pkg.go.dev/cmd/compile/internal/bitvec@go1.23.2#BitVec.Next" target="_blank" rel="external">func (bv BitVec) Next(i int32) int32</a></li>
<li><a href="https://pkg.go.dev/cmd/compile/internal/bitvec@go1.23.2#BitVec.Not" target="_blank" rel="external">func (bv BitVec) Not()</a></li>
<li><a href="https://pkg.go.dev/cmd/compile/internal/bitvec@go1.23.2#BitVec.Or" target="_blank" rel="external">func (dst BitVec) Or(src1, src2 BitVec)</a></li>
<li><a href="https://pkg.go.dev/cmd/compile/internal/bitvec@go1.23.2#BitVec.Set" target="_blank" rel="external">func (bv BitVec) Set(i int32)</a></li>
<li><a href="https://pkg.go.dev/cmd/compile/internal/bitvec@go1.23.2#BitVec.String" target="_blank" rel="external">func (bv BitVec) String() string</a></li>
<li><a href="https://pkg.go.dev/cmd/compile/internal/bitvec@go1.23.2#BitVec.Unset" target="_blank" rel="external">func (bv BitVec) Unset(i int32)</a></li>
</ul>
<blockquote>
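<p>Note that even Go's internal code is not entirely "by the book": the receiver names are inconsistent, with three different names in use, dst, bv and bv1. This looks deliberate. dst denotes the bit vector that receives the result of the operation. bv1 is harder to justify, although the intent is understandable: it pairs with the parameter bv2.</p>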
</blockquote>
<p><img src="Pasted-image-20241103115255.png" alt=""></p>
<p>Let's look at how a few of these methods are implemented.</p>
<p>Take the <code>And</code> method:</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">func</span> (dst BitVec) And(src1, src2 BitVec) {</div><div class="line"> <span class="keyword">if</span> <span class="built_in">len</span>(src1.B) ==<span class="number"> 0</span> {</div><div class="line"> <span class="keyword">return</span></div><div class="line"> }</div><div class="line"> _, _ = dst.B[<span class="built_in">len</span>(src1.B<span class="number">)-1</span>], src2.B[<span class="built_in">len</span>(src1.B<span class="number">)-1</span>] <span class="comment">// hoist bounds checks out of the loop</span></div><div class="line"></div><div class="line"> <span class="keyword">for</span> i, x := <span class="keyword">range</span> src1.B {</div><div class="line"> dst.B[i] = x & src2.B[i]</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
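<p>It computes the intersection of two bit vectors with the bitwise operator <code>&amp;</code>: the words are ANDed pair by pair and each result is stored in dst. The blank assignment before the loop touches the last index once, so the bounds checks are hoisted out of the loop.</p>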
<blockquote>
<p>A loop like this is a natural candidate for SIMD instructions, which could speed it up considerably.</p>
</blockquote>
<p>Next, the <code>Not</code> method:</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">func</span> (bv BitVec) Not() {</div><div class="line"> <span class="keyword">for</span> i, x := <span class="keyword">range</span> bv.B {</div><div class="line"> bv.B[i] = ^x</div><div class="line"> }</div><div class="line"> <span class="keyword">if</span> bv.N%wordBits !=<span class="number"> 0</span> {</div><div class="line"> bv.B[<span class="built_in">len</span>(bv.B<span class="number">)-1</span>] &=<span class="number"> 1</span><<<span class="typename">uint</span>(bv.N%wordBits) -<span class="number"> 1</span> <span class="comment">// clear bits past N in the last word</span></div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
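<p>This inverts the bit vector with the bitwise operator <code>^</code> and then gives the last word special treatment to clear the surplus bits.<br>The statement <code>bv.B[len(bv.B)-1] &amp;= 1&lt;&lt;uint(bv.N%wordBits) - 1</code> may be hard to read. When N is not a multiple of wordBits, the last word has unused high bits, and inverting sets them to 1. In Go, <code>&lt;&lt;</code> binds tighter than <code>-</code>, so <code>1&lt;&lt;uint(bv.N%wordBits) - 1</code> is a mask with only the low N%wordBits bits set, and ANDing with it clears the surplus bits.</p>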
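<p>The mask arithmetic is easy to check by hand. In the sketch below, <code>lastWordMask</code> is a helper written only for this illustration (it is not part of the bitvec package), and <code>wordBits</code> is assumed to be 32, matching the <code>[]uint32</code> backing slice.</p>

```go
package main

import "fmt"

const wordBits = 32

// lastWordMask returns the mask that keeps the low n%wordBits bits of the
// last word, mirroring the expression used in Not. It is a helper written
// for this illustration, not part of the bitvec package.
func lastWordMask(n int32) uint32 {
	// In Go, << binds tighter than -, so this is (1 << k) - 1.
	return 1<<uint(n%wordBits) - 1
}

func main() {
	// A 37-bit vector needs two words; only 5 bits of the second are valid.
	fmt.Printf("%032b\n", lastWordMask(37)) // only the low five bits are set

	last := ^uint32(0) // every bit set, as after inverting an all-zero word
	fmt.Printf("%032b\n", last&lastWordMask(37))
}
```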
<p>And the <code>Count</code> method:</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">func</span> (bv BitVec) Count() <span class="typename">int</span> {</div><div class="line"> n :=<span class="number"> 0</span></div><div class="line"> <span class="keyword">for</span> _, x := <span class="keyword">range</span> bv.B {</div><div class="line"> n += bits.OnesCount32(x)</div><div class="line"> }</div><div class="line"> <span class="keyword">return</span> n</div><div class="line">}</div></pre></td></tr></table></figure>
<p>This counts the number of 1 bits in the bit vector using <code>bits.OnesCount32</code>, which quickly computes the number of set bits in a uint32.</p>
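<p>A small standalone example of the same counting loop, using the standard library's <code>math/bits</code> package; the <code>count</code> function is a free-standing copy written for this illustration.</p>

```go
package main

import (
	"fmt"
	"math/bits"
)

// count returns the number of set bits across all words.
func count(words []uint32) int {
	n := 0
	for _, x := range words {
		n += bits.OnesCount32(x)
	}
	return n
}

func main() {
	fmt.Println(bits.OnesCount32(0b1011))         // 3
	fmt.Println(count([]uint32{0xFFFFFFFF, 0x1})) // 33
}
```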
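<p>These implementations are all fairly simple, yet bit vector operations are very efficient in practice and solve a wide range of problems.</p>
<p>If your project has this kind of need, for example implementing a Bloom filter or cuckoo filter, or building an efficient access control system, a bit vector is a very good choice.</p>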
]]></content>
<summary type="html">
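<![CDATA[<p>A bitmap is an elegant and efficient data structure that builds directly on the computer's lowest-level bit operations. Think of it as a huge array of switches, each either on or off; that is the essence of a bitmap. Every bit can be controlled independently, yet whole groups of bits can be processed at once with bitwise operations.</p>
<p>In practice, bitmaps are remarkably effective. Suppose you need to find duplicate numbers in a huge data set: a traditional hash table or array takes a lot of memory, whereas a bitmap marks the presence of each number with a single bit, which shrinks the storage dramatically. For <strong>one billion distinct integers</strong> drawn from a range of that size, a bitmap needs only about <strong>125 MB of memory</strong> (10<sup>9</sup> bits / 8), while other data structures easily need several GB.</p>
<p>Bitmaps also appear in the database systems we use every day. Databases use bitmap indexes to speed up queries, especially on low-cardinality enumerated columns such as gender or status, where one bitmap quickly locates the matching rows. In an e-commerce system, for example, finding the products that are "on sale and in stock" comes down to a simple bitwise AND of two bitmap indexes.</p>
<p>Bitmaps are equally attractive for access control in large systems. Each of a user's permissions is encoded in its own bit, and checking a permission takes a single bitwise operation, which is both efficient and easy to read. A CMS, for example, can represent a user's entire permission state, including read, write, admin and other dimensions, in one 32-bit integer.</p>
<p>The <strong>Bloom filter</strong> is an especially clever application of the bitmap idea. It uses several hash functions to mark data in a bitmap and can tell, at a very small memory cost, whether an element may be present. It is widely used in web crawlers, spam filtering and similar scenarios. It can produce false positives with a small probability, but in practice that is usually an acceptable trade-off.</p>
<p>These properties make bitmaps a strong fit for <strong>massive data sets, state flags, data compression and fast counting</strong>. They solve hard problems with very simple means, which is part of the beauty of computer science.</p>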
]]>
</summary>
<category term="Go" scheme="https://colobu.com/categories/Go/"/>
</entry>
<entry>
<title><![CDATA[Go's Unadvertised Data Structures: runq, No Wonder the Runtime Schedules So Well]]></title>
<link href="https://colobu.com/2024/10/20/go-internal-ds-runq/"/>
<id>https://colobu.com/2024/10/20/go-internal-ds-runq/</id>
<published>2024-10-20T04:17:47.000Z</published>
<updated>2024-10-20T04:19:42.045Z</updated>
<content type="html"><![CDATA[<p>First, let's review the Go runtime's GPM model. There is already a great deal of material about it online, but a brief recap does no harm:</p>
<blockquote>
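<p>In the GPM model, G stands for goroutine. Each goroutine starts with only a few KB of memory, so creating tens of thousands of them is easy. A G holds the goroutine's stack, instruction pointer and other information, such as the channel wait queue it is blocked on.</p>
<p>P stands for processor and can be thought of as an abstract CPU core. By default the number of Ps equals the number of CPU cores, but it can be changed through the <code>GOMAXPROCS</code> environment variable. A P maintains a local goroutine queue, executes goroutines and manages the context associated with them.</p>
<p>M stands for machine, an operating system thread. An M must be bound to a P before it can run goroutines. When an M blocks, the runtime creates a new M or reuses an idle one to take over its P, so that up to GOMAXPROCS Ps keep running goroutines and the CPUs stay fully used.</p>
<p>In this model, P is the link between the other two. It connects G and M and maps user-level goroutines onto operating system threads. This design lets Go schedule in user space, avoids frequent system calls and greatly improves concurrency efficiency.</p>
<p>During scheduling, a newly created goroutine is placed on a P's local queue or on the global queue. If the P's local queue is full, some goroutines are moved to the global queue. When a P finishes running the current goroutine, it first tries to take the next one from its local queue. If the local queue is empty, the P tries to take goroutines from the global queue or steal them from another P's queue.</p>
<p>This work-stealing algorithm keeps the load balanced dynamically. When a P's local queue is empty, it can steal half of the goroutines from another P's queue, which effectively evens out the workload across the Ps.</p>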
</blockquote>
<a id="more"></a>
<p><img src="gpm.png" alt=""></p>
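<p>The Go runtime does this mainly to reduce contention among Ps when they fetch goroutines. The local queue runq is read and written mainly by the P that owns it; concurrent access arises only when another P steals from it, which is relatively rare. The runtime therefore uses an efficient <code>runq</code> data structure designed for exactly this scenario. In practice it closely resembles the PoolDequeue introduced earlier.</p>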
<blockquote>
<p>This article also touches on the global queue and related data structures, but they are not its focus.</p>
</blockquote>
<h2 id="runq">runq</h2>
<p>In the runtime, <code>P</code> is a complex data structure. The fields relevant to this article are listed below:</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div></pre></td><td class="code"><pre><div class="line"><span class="comment">// 一个goroutine的指针</span></div><div class="line"><span class="keyword">type</span> guintptr <span class="typename">uintptr</span></div><div class="line"></div><div class="line"><span class="comment">//go:nosplit</span></div><div class="line"><span class="keyword">func</span> (gp guintptr) ptr() *g { <span class="keyword">return</span> (*g)(unsafe.Pointer(gp)) }</div><div class="line"></div><div class="line"><span class="comment">//go:nosplit</span></div><div class="line"><span class="keyword">func</span> (gp *guintptr) set(g *g) { *gp = guintptr(unsafe.Pointer(g)) }</div><div class="line"></div><div class="line"><span class="comment">//go:nosplit</span></div><div class="line"><span class="keyword">func</span> (gp *guintptr) cas(old, <span 
class="built_in">new</span> guintptr) <span class="typename">bool</span> {</div><div class="line"> <span class="keyword">return</span> atomic.Casuintptr((*<span class="typename">uintptr</span>)(unsafe.Pointer(gp)), <span class="typename">uintptr</span>(old), <span class="typename">uintptr</span>(<span class="built_in">new</span>))</div><div class="line">}</div><div class="line"></div><div class="line"><span class="keyword">type</span> p <span class="keyword">struct</span> {</div><div class="line"> id <span class="typename">int32</span></div><div class="line"> status <span class="typename">uint32</span> <span class="comment">// one of pidle/prunning/...</span></div><div class="line"> link puintptr</div><div class="line"> schedtick <span class="typename">uint32</span> <span class="comment">// incremented on every scheduler call</span></div><div class="line"> syscalltick <span class="typename">uint32</span> <span class="comment">// incremented on every system call</span></div><div class="line"> sysmontick sysmontick <span class="comment">// last tick observed by sysmon</span></div><div class="line"> m muintptr <span class="comment">// back-link to associated m (nil if idle)</span></div><div class="line"> mcache *mcache</div><div class="line"> pcache pageCache</div><div class="line"> raceprocctx <span class="typename">uintptr</span></div><div class="line"></div><div class="line"> deferpool []*_defer <span class="comment">// pool of available defer structs (see panic.go)</span></div><div class="line"> deferpoolbuf <span class="number">[32</span>]*_defer</div><div class="line"></div><div class="line"> <span class="comment">// Cache of goroutine ids, amortizes accesses to runtime·sched.goidgen.</span></div><div class="line"> goidcache <span class="typename">uint64</span></div><div class="line"> goidcacheend <span class="typename">uint64</span></div><div class="line"></div><div class="line"> <span class="comment">// 本地运行的无锁循环队列</span></div><div class="line"> runqhead <span 
class="typename">uint32</span></div><div class="line"> runqtail <span class="typename">uint32</span></div><div class="line"> runq <span class="number">[256</span>]guintptr</div><div class="line"></div><div class="line"> <span class="comment">// 如果非nil,是一个可优先运行的G</span></div><div class="line"> runnext guintptr</div><div class="line"></div><div class="line"> ...</div><div class="line">}</div></pre></td></tr></table></figure>
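<p><code>runq</code> is a lock-free circular queue backed by an array of fixed length 256; it is never resized. <code>runqhead</code> and <code>runqtail</code> mark the head and the tail of the queue. Both are counters that only ever increase, and a counter is mapped to an array slot by taking it modulo the array length.<br>Each element of the <code>runq</code> array is a <code>guintptr</code>, a defined type whose underlying type is <code>uintptr</code>, used to store a pointer to a <code>g</code>.</p>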
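<p>The head and tail arithmetic keeps working even when the <code>uint32</code> counters overflow, because 256 divides 2<sup>32</sup>. The sketch below demonstrates this; <code>length</code> and <code>slot</code> are helper names invented for the example and do not exist in the runtime.</p>

```go
package main

import "fmt"

// size matches the capacity of runq. It must divide 2^32 so that slot
// numbers stay consecutive when the counters wrap around.
const size = 256

// length returns the number of queued elements given the two counters.
func length(head, tail uint32) uint32 { return tail - head }

// slot maps a counter to an index in the backing array.
func slot(counter uint32) uint32 { return counter % size }

func main() {
	// Counters close to the uint32 limit: tail wraps around, head has not yet.
	head := uint32(0xFFFFFFFE)
	tail := head + 5 // overflows to 3

	fmt.Println(tail, length(head, tail)) // 3 5
	fmt.Println(slot(head), slot(tail-1)) // 254 2
}
```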
<p>The main operations on <code>runq</code> are <code>runqput</code>, <code>runqputslow</code>, <code>runqputbatch</code>, <code>runqget</code>, <code>runqdrain</code>, <code>runqgrab</code> and <code>runqsteal</code>.</p>
<p>Next, let's go through the key ones to see how they achieve efficient concurrent reads and writes.</p>
<h3 id="runqput">runqput</h3>
<p><code>runqput</code> adds a <code>g</code> to <code>runq</code>. Its fast path is lock-free and does not block. Here is the implementation:</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div></pre></td><td class="code"><pre><div class="line"><span class="comment">// runqput tries to put g on the local runnable queue.</span></div><div class="line"><span class="comment">// If next is false, runqput adds g to the tail of the runnable queue.</span></div><div class="line"><span class="comment">// If next is true, runqput puts g in the pp.runnext slot.</span></div><div class="line"><span class="comment">// If the run queue is full, runqput puts g on the global queue.</span></div><div class="line"><span class="comment">// Executed only by the owner P.</span></div><div class="line"><span class="keyword">func</span> runqput(pp *p, gp *g, next <span class="typename">bool</span>) {</div><div class="line"> <span class="keyword">if</span> !haveSysmon && next {</div><div class="line"> <span class="comment">// Without sysmon we must avoid runnext entirely, or it causes starvation.</span></div><div 
class="line"> next = <span class="constant">false</span></div><div class="line"> }</div><div class="line"> <span class="keyword">if</span> randomizeScheduler && next && randn<span class="number">(2</span>) ==<span class="number"> 0</span> {</div><div class="line"> <span class="comment">// With the randomized scheduler enabled, skip runnext half of the time</span></div><div class="line"> next = <span class="constant">false</span></div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// If next is true, handle runnext first</span></div><div class="line"> <span class="comment">// Put the new goroutine in runnext; if runnext already held a goroutine, move that one to runq</span></div><div class="line"> <span class="keyword">if</span> next {</div><div class="line"> retryNext:</div><div class="line"> oldnext := pp.runnext</div><div class="line"> <span class="keyword">if</span> !pp.runnext.cas(oldnext, guintptr(unsafe.Pointer(gp))) {</div><div class="line"> <span class="keyword">goto</span> retryNext</div><div class="line"> }</div><div class="line"> <span class="keyword">if</span> oldnext ==<span class="number"> 0</span> {</div><div class="line"> <span class="keyword">return</span></div><div class="line"> }</div><div class="line"> <span class="comment">// Kick the old runnext out to the regular run queue.</span></div><div class="line"> gp = oldnext.ptr()</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// The key part: put the goroutine into runq</span></div><div class="line">retry:</div><div class="line"> h := atomic.LoadAcq(&pp.runqhead) <span class="comment">// ①</span></div><div class="line"> t := pp.runqtail</div><div class="line"> <span class="keyword">if</span> t-h < <span class="typename">uint32</span>(<span class="built_in">len</span>(pp.runq)) { <span class="comment">// ② the queue is not full</span></div><div class="line"> pp.runq[t%<span class="typename">uint32</span>(<span class="built_in">len</span>(pp.runq))].set(gp) <span class="comment">// ③ store the goroutine in the queue</span></div><div 
class="line"> atomic.StoreRel(&pp.runqtail, t<span class="number">+1</span>) <span class="comment">// ④ advance the tail</span></div><div class="line"> <span class="keyword">return</span></div><div class="line"> }</div><div class="line"> <span class="keyword">if</span> runqputslow(pp, gp, h, t) { <span class="comment">// ⑤ the queue is full: runqputslow tries to move goroutines to the global queue</span></div><div class="line"> <span class="keyword">return</span></div><div class="line"> }</div><div class="line"> <span class="comment">// The queue is no longer full, so the put above should now succeed; retry</span></div><div class="line"> <span class="keyword">goto</span> retry</div><div class="line">}</div></pre></td></tr></table></figure>
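<p>The implementation of <code>runqput</code> is straightforward. It first decides whether <code>runnext</code> should be used. If so, it stores the new <code>g</code> in <code>runnext</code> and, if <code>runnext</code> already held a goroutine, moves that displaced goroutine into <code>runq</code>; otherwise the new <code>g</code> goes into <code>runq</code> directly.<br>Operations on <code>runq</code> are lock-free and rely on the atomic operations of the runtime's internal <code>atomic</code> package.<br>These are finer-grained atomic operations used inside the runtime, which I will cover in a dedicated later article. For now, just think of ① and ④ as <code>Load</code> and <code>Store</code>.</p>
<p>② and ⑤ handle the cases where the local queue is not full and where it is full. If the queue is not full, the <code>g</code> is stored in the queue and the tail is advanced; if it is full, <code>runqputslow</code> is called to move the <code>g</code> to the global queue.</p>
<p>At ③ the <code>g</code> is written into the slot directly, because only the owning <code>P</code> ever writes at the tail of <code>runq</code> (other Ps only advance the head when they steal), so there is no write conflict.<br>Note also that insertion always happens at the tail and that <code>t</code> only ever increases; the modulo operation is what makes the array behave as a circular queue.</p>
<p><code>runqputslow</code> moves half of the <code>g</code>s in the local queue, together with the <code>g</code> being added, to the global queue. Anything that touches the global queue involves contention, and the Go runtime guards it with a lock, so <code>runqputslow</code> is the slow path and a potential performance bottleneck.</p>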
<h3 id="runqputbatch">runqputbatch</h3>
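<p><code>func runqputbatch(pp *p, q *gQueue, qsize int)</code> puts a batch of <code>g</code>s on the local queue in one go. It is called when a P has a whole list of runnable <code>g</code>s on hand, for example when the runtime injects a list of goroutines that have become ready, and needs to place them on its local queue. Here is the implementation:</p>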
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div></pre></td><td class="code"><pre><div class="line"><span class="comment">// runqputbatch tries to put all the Gs on q on the local runnable queue.</span></div><div class="line"><span class="comment">// If the queue is full, they are put on the global queue; in that case this temporarily acquires the scheduler lock.</span></div><div class="line"><span class="comment">// Executed only by the owner P.</span></div><div class="line"><span class="keyword">func</span> runqputbatch(pp *p, q *gQueue, qsize <span class="typename">int</span>) {</div><div class="line"> h := atomic.LoadAcq(&pp.runqhead) <span class="comment">// ①</span></div><div class="line"> t := pp.runqtail</div><div class="line"> n := <span class="typename">uint32</span><span class="number">(0</span>)</div><div class="line"> <span class="keyword">for</span> !q.empty() && t-h < <span class="typename">uint32</span>(<span class="built_in">len</span>(pp.runq)) { <span class="comment">// ② the batch is not empty and the local queue still has room</span></div><div class="line"> gp := q.pop()</div><div class="line"> pp.runq[t%<span class="typename">uint32</span>(<span class="built_in">len</span>(pp.runq))].set(gp)</div><div class="line"> t++</div><div class="line"> n++</div><div class="line"> 
}</div><div class="line"> qsize -= <span class="typename">int</span>(n)</div><div class="line"></div><div class="line"> <span class="keyword">if</span> randomizeScheduler { <span class="comment">// ③ randomized scheduler: shuffle the new entries</span></div><div class="line"> off := <span class="keyword">func</span>(o <span class="typename">uint32</span>) <span class="typename">uint32</span> {</div><div class="line"> <span class="keyword">return</span> (pp.runqtail + o) % <span class="typename">uint32</span>(<span class="built_in">len</span>(pp.runq))</div><div class="line"> }</div><div class="line"> <span class="keyword">for</span> i := <span class="typename">uint32</span><span class="number">(1</span>); i < n; i++ {</div><div class="line"> j := cheaprandn(i +<span class="number"> 1</span>)</div><div class="line"> pp.runq[off(i)], pp.runq[off(j)] = pp.runq[off(j)], pp.runq[off(i)]</div><div class="line"> }</div><div class="line"> }</div><div class="line"></div><div class="line"> atomic.StoreRel(&pp.runqtail, t) <span class="comment">// ④ advance the tail</span></div><div class="line"> <span class="keyword">if</span> !q.empty() {</div><div class="line"> lock(&sched.lock)</div><div class="line"> globrunqputbatch(q, <span class="typename">int32</span>(qsize))</div><div class="line"> unlock(&sched.lock)</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
<p>① reads the queue head with an atomic load.</p>
<blockquote>
<p>The line below it reads the queue tail. Think about why it does not need <code>atomic.LoadAcq</code>.</p>
</blockquote>
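<p>② moves the <code>g</code>s into the queue one by one until the batch is exhausted or the queue is full.</p>
<p>③ With the randomized scheduler enabled, the newly added <code>g</code>s are shuffled into random order.</p>
<p>Finally, ④ publishes the new tail, and if any <code>g</code>s are left over, <code>globrunqputbatch</code> is called to put them on the global queue.</p>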
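<p>The shuffle loop at ③ has the same shape as a Fisher-Yates shuffle that walks forward through the batch. The sketch below reproduces that loop on a plain slice; <code>rand.Intn</code> from <code>math/rand</code> stands in for the runtime-internal <code>cheaprandn</code>, and the <code>shuffle</code> function is a name invented for this example.</p>

```go
package main

import (
	"fmt"
	"math/rand"
)

// shuffle permutes s in place using the same loop shape as runqputbatch:
// element i is swapped with a random element in [0, i].
// rand.Intn stands in for the runtime-internal cheaprandn.
func shuffle(s []int) {
	for i := 1; i < len(s); i++ {
		j := rand.Intn(i + 1)
		s[i], s[j] = s[j], s[i]
	}
}

func main() {
	batch := []int{1, 2, 3, 4, 5, 6}
	shuffle(batch)
	fmt.Println(batch) // some permutation of 1..6
}
```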
<h3 id="runqget">runqget</h3>
<p><code>runqget</code> takes one <code>g</code> from <code>runq</code>. It is lock-free and does not block. Here is the implementation:</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div></pre></td><td class="code"><pre><div class="line"><span class="comment">// runqget gets a G from the local runnable queue.</span></div><div class="line"><span class="comment">// If inheritTime is true, gp should inherit the remaining time in the current time slice.</span></div><div class="line"><span class="comment">// Otherwise, it should start a new time slice.</span></div><div class="line"><span class="comment">// Executed only by the owner P.</span></div><div class="line"><span class="keyword">func</span> runqget(pp *p) (gp *g, inheritTime <span class="typename">bool</span>) {</div><div class="line"> next := pp.runnext</div><div class="line"> <span class="comment">// If runnext is set, it takes priority</span></div><div class="line"> <span class="keyword">if</span> next !=<span class="number"> 0</span> && pp.runnext.cas(next,<span class="number"> 0</span>) { <span class="comment">// ①</span></div><div class="line"> <span class="keyword">return</span> next.ptr(), <span class="constant">true</span></div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="keyword">for</span> {</div><div class="line"> h := atomic.LoadAcq(&pp.runqhead) <span class="comment">// ② read the head</span></div><div class="line"> t := pp.runqtail</div><div class="line"> <span class="keyword">if</span> t == h { <span class="comment">// ③ the queue is empty</span></div><div class="line"> <span class="keyword">return</span> <span class="constant">nil</span>, <span 
class="constant">false</span></div><div class="line"> }</div><div class="line"> gp := pp.runq[h%<span class="typename">uint32</span>(<span class="built_in">len</span>(pp.runq))].ptr() <span class="comment">// ④ read the goroutine at the head</span></div><div class="line"> <span class="keyword">if</span> atomic.CasRel(&pp.runqhead, h, h<span class="number">+1</span>) { <span class="comment">// ⑤ advance the head</span></div><div class="line"> <span class="keyword">return</span> gp, <span class="constant">false</span></div><div class="line"> }</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
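<p>① If <code>runnext</code> is set, it takes priority: the <code>g</code> in <code>runnext</code> is taken out and returned.</p>
<p>② reads the queue head. ③ If the queue is empty, the function returns immediately.</p>
<p>④ reads the <code>g</code> at the head of the queue; this is the one to return.</p>
<p>⑤ advances the head with <code>atomic.CasRel</code>, an atomic <code>Compare-And-Swap</code> operation. If the CAS fails because another P stole from the queue in the meantime, the loop retries.</p>
<p>Note that only the queue head, <code>runqhead</code>, is modified here.</p>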
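<p>Putting <code>runqput</code> and <code>runqget</code> side by side, the core of the design is a fixed-size ring with one producer and CAS-based consumers. The following is a simplified model under loud assumptions: the <code>ring</code> type is hypothetical, the capacity is 8 instead of 256, the public <code>sync/atomic</code> types stand in for the runtime-internal <code>LoadAcq</code>, <code>StoreRel</code> and <code>CasRel</code>, and <code>runnext</code> and the global-queue slow path are left out.</p>

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// ring is a simplified, hypothetical model of runq: a fixed-size ring with
// one producer (the owner) and any number of consumers. sync/atomic stands
// in for the runtime-internal LoadAcq/StoreRel/CasRel.
type ring struct {
	head atomic.Uint32 // advanced by consumers via CAS
	tail atomic.Uint32 // advanced only by the owner
	buf  [8]atomic.Int64
}

// put appends v and reports false when the ring is full. Owner only.
func (r *ring) put(v int64) bool {
	h := r.head.Load()
	t := r.tail.Load()
	if t-h >= uint32(len(r.buf)) {
		return false
	}
	r.buf[t%uint32(len(r.buf))].Store(v)
	r.tail.Store(t + 1) // publish the slot to consumers
	return true
}

// get removes the oldest element and reports false when the ring is empty.
func (r *ring) get() (int64, bool) {
	for {
		h := r.head.Load()
		t := r.tail.Load()
		if t == h {
			return 0, false
		}
		v := r.buf[h%uint32(len(r.buf))].Load()
		if r.head.CompareAndSwap(h, h+1) { // a lost race simply retries
			return v, true
		}
	}
}

func main() {
	var r ring
	for i := int64(1); i <= 3; i++ {
		r.put(i * 10)
	}
	v, ok := r.get()
	fmt.Println(v, ok) // 10 true
}
```

<p>The slot is read before the CAS and only trusted if the CAS succeeds, which mirrors the order of ④ and ⑤ in <code>runqget</code>.</p>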
<h3 id="runqdrain">runqdrain</h3>
<p><code>runqdrain</code> takes all the <code>g</code>s out of <code>runq</code>. It is lock-free and does not block. Here is the implementation:</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div></pre></td><td class="code"><pre><div class="line"><span class="comment">// runqdrain takes all the Gs from pp's local runnable queue and returns them.</span></div><div class="line"><span class="comment">// Executed only by the owner P.</span></div><div class="line"><span class="keyword">func</span> runqdrain(pp *p) (drainQ gQueue, n <span class="typename">uint32</span>) {</div><div class="line"> oldNext := pp.runnext</div><div class="line"> <span class="keyword">if</span> oldNext !=<span class="number"> 0</span> && pp.runnext.cas(oldNext,<span class="number"> 0</span>) {</div><div class="line"> drainQ.pushBack(oldNext.ptr()) <span class="comment">// ① put the goroutine from runnext into drainQ</span></div><div class="line"> n++</div><div class="line"> }</div><div class="line"></div><div class="line">retry:</div><div class="line"> h := atomic.LoadAcq(&pp.runqhead) <span class="comment">// ② read the head</span></div><div class="line"> t := pp.runqtail</div><div class="line"> qn := t - h</div><div class="line"> <span class="keyword">if</span> qn ==<span class="number"> 0</span> {</div><div class="line"> <span class="keyword">return</span></div><div class="line"> }</div><div 
class="line"> <span class="keyword">if</span> qn > <span class="typename">uint32</span>(<span class="built_in">len</span>(pp.runq)) { <span class="comment">// ③ more than the queue can hold? h and t were read inconsistently</span></div><div class="line"> <span class="keyword">goto</span> retry</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="keyword">if</span> !atomic.CasRel(&pp.runqhead, h, h+qn) { <span class="comment">// ④ advance the head</span></div><div class="line"> <span class="keyword">goto</span> retry</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// ⑤ move the goroutines from the queue into drainQ</span></div><div class="line"> <span class="keyword">for</span> i := <span class="typename">uint32</span><span class="number">(0</span>); i < qn; i++ {</div><div class="line"> gp := pp.runq[(h+i)%<span class="typename">uint32</span>(<span class="built_in">len</span>(pp.runq))].ptr()</div><div class="line"> drainQ.pushBack(gp)</div><div class="line"> n++</div><div class="line"> }</div><div class="line"> <span class="keyword">return</span></div><div class="line">}</div></pre></td></tr></table></figure>
<h3 id="runqgrab">runqgrab</h3>
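<p><code>runqgrab</code> takes half of the <code>g</code>s from <code>runq</code>. It is lock-free: it acquires no lock and retries on contention instead of blocking. It is implemented as follows:</p>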
<figure class="highlight go"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div></pre></td><td class="code"><pre><div class="line"><span class="comment">// runqgrab grabs half of the Gs from pp's local runnable queue into batch.</span></div><div class="line"><span class="comment">// Batch is a ring buffer starting at batchHead.</span></div><div class="line"><span class="comment">// Returns the number of grabbed goroutines.</span></div><div class="line"><span class="comment">// Can be executed by any P.</span></div><div class="line"><span class="keyword">func</span> runqgrab(pp *p, batch *<span class="number">[256</span>]guintptr, batchHead <span class="typename">uint32</span>, stealRunNextG <span class="typename">bool</span>) <span class="typename">uint32</span> {</div><div class="line"> <span class="keyword">for</span> {</div><div class="line"> h := atomic.LoadAcq(&pp.runqhead) <span class="comment">// load-acquire, synchronize 
with other consumers</span></div><div class="line"> t := atomic.LoadAcq(&pp.runqtail) <span class="comment">// load-acquire, synchronize with the producer</span></div><div class="line"> n := t - h</div><div class="line"> n = n - n<span class="number">/2</span> <span class="comment">// ① take half of the goroutines, rounded up</span></div><div class="line"> <span class="keyword">if</span> n ==<span class="number"> 0</span> {</div><div class="line"> <span class="keyword">if</span> stealRunNextG {</div><div class="line"> <span class="comment">// ② try to steal the goroutine in runnext</span></div><div class="line"> <span class="keyword">if</span> next := pp.runnext; next !=<span class="number"> 0</span> {</div><div class="line"> <span class="keyword">if</span> pp.status == _Prunning {</div><div class="line"> <span class="comment">// ② back off briefly so pp gets a chance to run runnext itself</span></div><div class="line"> <span class="keyword">if</span> !osHasLowResTimer {</div><div class="line"> usleep<span class="number">(3</span>)</div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"> osyield()</div><div class="line"> }</div><div class="line"> }</div><div class="line"> <span class="keyword">if</span> !pp.runnext.cas(next,<span class="number"> 0</span>) {</div><div class="line"> <span class="keyword">continue</span></div><div class="line"> }</div><div class="line"> batch[batchHead%<span class="typename">uint32</span>(<span class="built_in">len</span>(batch))] = next</div><div class="line"> <span class="keyword">return</span><span class="number"> 1</span></div><div class="line"> }</div><div class="line"> }</div><div class="line"> <span class="keyword">return</span><span class="number"> 0</span></div><div class="line"> }</div><div class="line"> <span class="keyword">if</span> n > <span class="typename">uint32</span>(<span class="built_in">len</span>(pp.runq)<span class="number">/2</span>) { <span class="comment">// ③ more than half the capacity: h and t were read inconsistently, retry</span></div><div class="line"> <span 
class="keyword">continue</span></div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// ④ copy at most half of the queued goroutines into batch</span></div><div class="line"> <span class="keyword">for</span> i := <span class="typename">uint32</span><span class="number">(0</span>); i < n; i++ {</div><div class="line"> g := pp.runq[(h+i)%<span class="typename">uint32</span>(<span class="built_in">len</span>(pp.runq))]</div><div class="line"> batch[(batchHead+i)%<span class="typename">uint32</span>(<span class="built_in">len</span>(batch))] = g</div><div class="line"> }</div><div class="line"> <span class="keyword">if</span> atomic.CasRel(&pp.runqhead, h, h+n) { <span class="comment">// ⑤ advance the queue head</span></div><div class="line"> <span class="keyword">return</span> n</div><div class="line"> }</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
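<p>① Take half of the <code>g</code>s: <code>n = n - n/2</code> is half of the queued count, rounded up (5 queued gives 3).</p>
<p>② If the queue is empty and <code>stealRunNextG</code> is set, try to steal the <code>g</code> in <code>runnext</code>. When the victim P is running, the thief first backs off briefly (<code>usleep(3)</code>, or <code>osyield()</code> on systems with low-resolution timers) so that P gets a chance to run that <code>g</code> itself, and only then claims it with a CAS.</p>
<p>③ If <code>n</code> exceeds half of the queue capacity, <code>h</code> and <code>t</code> were read inconsistently (the queue changed between the two loads), so retry.</p>
<p>④ Copy at most half of the queued <code>g</code>s into <code>batch</code>.</p>
<p>⑤ Advance the queue head with <code>atomic.CasRel</code>, an atomic <code>Compare-And-Swap</code> with release semantics. A successful CAS commits the steal; if it fails, another consumer has moved the head in the meantime and the loop retries.</p>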
<h3 id="runqsteal">runqsteal</h3>