Merge gpugraph to develop #48507

Merged on Dec 21, 2022 (453 commits)
949fdb7
merge gpugraph to develop, fix code style
lxsbupt Dec 6, 2022
125b08c
update for untrainable params for stage3. (#48577)
wuhuachaocoding Dec 6, 2022
de1e3da
merge gpugraph to develop, trigger ci
lxsbupt Dec 6, 2022
b7718f3
[CodeStyle][isort][Dy2St] sort imports in test_error (#48746)
SigureMo Dec 6, 2022
a26c439
Clear extra input (Bias, ResidualData) in OpMaker of conv2d (#47579)
zyfncg Dec 6, 2022
426bb6e
make bilinear interpolate stable. (#48644)
2742195759 Dec 6, 2022
dbe0595
clear tmp var in ptq (#48660)
ceci3 Dec 6, 2022
deeefb9
merge gpugraph to develop, fix py-api comment
lxsbupt Dec 6, 2022
edd4f56
merge gpugraph to develop, fix mac-python3
lxsbupt Dec 7, 2022
7a451ca
merge gpugraph to develop, fix mac-python3
lxsbupt Dec 7, 2022
fb12bde
[Dy2St] replace deprecated `load_module` with `exec_module` (#48679)
SigureMo Dec 7, 2022
772b0aa
merge gpugraph to develop, fix mac-python3
lxsbupt Dec 7, 2022
1d6f12c
modify d2d copy to xpu::copy in xpu kernel, test=kunlun (#48710)
zhangyk0314 Dec 7, 2022
50f22da
rm _test_eager_guard (#48767)
veyron95 Dec 7, 2022
c490276
delete sampling_id api (#48543)
201716010711 Dec 7, 2022
e6389e1
[NPU] add FLAGS_npu_storage_format env to enable npu storage format, …
qili93 Dec 7, 2022
ef765ea
optimize nchw<->nhwc kernel in fp16 model (#48692)
zhoutianzi666 Dec 7, 2022
81c9bc4
fix: oss just support sm>=75 (#48731)
Dec 7, 2022
424eb29
update kl1 op list and optimize matmul unitest for kunlun (#48775)
QingshuChen Dec 7, 2022
50e7947
Fix accuracy fp16 kernel return fp32 tensor error (#48803)
0x45f Dec 7, 2022
894c407
[phi::DenseTensor] Replace Tensor with phi::DenseTensor (#48682)
Liyulingyue Dec 7, 2022
d4fb5de
[Zero-Dim] Support 0D for paddle.diagflat (#48735)
Courtesy-Xs Dec 7, 2022
9733913
【fluid api clear】Move batch norm1 (#47965)
xiaoguoguo626807 Dec 7, 2022
317f1f9
[remove fluid] PRelu BilinearTensorProduct Conv2DTranspose SequenceCo…
wangzhen38 Dec 7, 2022
899e042
merge gpugraph to develop, rollback graph_send_recv
lxsbupt Dec 7, 2022
a9f17b9
fix ci (#48730)
zhwesky2010 Dec 7, 2022
c9c1685
Remove reduntant numpy output in Example code (1/3), test=document_fi…
kevinng77 Dec 7, 2022
e63a765
Revised the English API documentation (#48219)
Atlantisming Dec 7, 2022
49c5459
[PHI] Migrate squeeze and squeeze_grad kernels (#48634)
Silv3S Dec 7, 2022
34f5bf5
Fix API docs under the paddle.nn.functional and paddle.nn packages (#48581)
huajiao-hjyp Dec 7, 2022
683e69b
assign cve number to pdsa, test=document_fix (#48846)
VigiZhang Dec 7, 2022
48327e8
[fluid remove]: remove paddle.fluid.layers.yolo_box and paddle.fluid.…
zhengqiwen1997 Dec 7, 2022
4fa11d0
merge gpugraph to develop, fix windows compile
lxsbupt Dec 7, 2022
b42a6b0
merge gpugraph to develop, fix windows compile
lxsbupt Dec 7, 2022
613f3d7
merge gpugraph to develop, fix windows compile
lxsbupt Dec 8, 2022
a37f87a
Try add eval() to speedup the eigen performance. (#48855)
Xreki Dec 8, 2022
b2be092
[Fluid Clean]move inplace_apis_indygraph_only from paddle.flud.dygrap…
risemeup1 Dec 8, 2022
67c9c40
merge gpugraph to develop, fix windows compile
lxsbupt Dec 8, 2022
da8fb52
clean fluid task: transfer gaussian random api (#48529)
201716010711 Dec 8, 2022
c05dee7
Delete duplicate quant nodes in QAT (#48751)
yghstill Dec 8, 2022
9c8aba8
rm autograd func dynamic eager tests (#48788)
yjjiang11 Dec 8, 2022
79a37ca
Setuptools optimization (#48770)
risemeup1 Dec 8, 2022
4ee9e1f
[CodeStyle][F811] fix some test cases shadowed by the same name (#48745)
SigureMo Dec 8, 2022
09ec758
set free_when_no_cache_hit default value to true (#48815)
wanghuancoder Dec 8, 2022
44d0523
[Clean Fluid] Rm and mv some fluid dygrah apis (#48576)
sljlp Dec 8, 2022
c464488
[Inference] inference add cinn interface (#48741)
jiweibo Dec 8, 2022
49c1dfd
Clean and migrate fluid APIs of paddle.fluid.layers.control_flow (#48…
GhostScreaming Dec 8, 2022
b6329a0
remove gpu_info.h from phi dependencies (#48811)
Patrick-Star125 Dec 8, 2022
3313074
[Paddle Inference] Add add onehot trt converter (#48655)
zrr1999 Dec 8, 2022
05c45e4
[PHI decoupling] remove bbox_util.h from phi dependencies (#48761)
Patrick-Star125 Dec 8, 2022
c11612c
Optimize Paddle diagonal (#47904)
201716010711 Dec 8, 2022
fab9f0c
[API Clean]Clean __all__ to avoid exposing usless API (#48713)
Aurelius84 Dec 8, 2022
142eced
Clean fluid APIs in distributed and fleet files (#48851)
GhostScreaming Dec 8, 2022
9af653a
rm kunlun xpu2_op_list (#48826)
QingshuChen Dec 8, 2022
318e58a
remove detection_output, iou_similarity and bipartite_match (#48773)
zhengqiwen1997 Dec 8, 2022
5fa9f3c
Set WaiterType of kGpuSync to kCPU (#48758)
From00 Dec 8, 2022
94d15ea
[Migrate Fluid] Migrate Decoder, BeamSearchDecoder (#48754)
FrostML Dec 8, 2022
2fd0109
[Inference] Enable infer shape cache. (#48312)
jiweibo Dec 8, 2022
d7f5506
[Fluid Clean] remove unfold, deformable_roi_pooling, shard_index, har…
heyanru01 Dec 8, 2022
469bcc2
fix-gpups setup.py (#48888)
tianshuo78520a Dec 8, 2022
dccc42e
[PHI decoupling] move cuda_graph from fluid to phi (#48686)
huangjiyi Dec 8, 2022
4a52dd6
fix english docs typo errors (#48599)
enkilee Dec 8, 2022
055e038
[XPU] add load op into oplist. (#48860)
houj04 Dec 8, 2022
01f7717
【fluid clean】remove fluid.dygraph.rnn.lstmcell and fluid.dygraph.rnn.…
lugimzzz Dec 8, 2022
c409eca
refine bsd doc (#48882)
FrostML Dec 8, 2022
f41809b
[Paddle Inference] General optimization for no_varlen embedding layer…
Wangzheee Dec 8, 2022
ae61a7f
fix tmp directories (#48863)
sneaxiy Dec 8, 2022
082886c
rm dygraph_to_static eager guard tests part2 minst2ptb_lm (#48793)
yjjiang11 Dec 8, 2022
29acb99
merge gpugraph to develop, fix the_one_ps.py for gpups
lxsbupt Dec 8, 2022
99a9dcd
[remove fluid] under unittesets of linear api (#48564)
wangzhen38 Dec 8, 2022
81771d1
[remove fluid.layers.cross_entropy] remove unit tests (part 1) (#48726)
kangguangli Dec 8, 2022
e6a5486
proper fix (#48360)
jakpiase Dec 8, 2022
2083b31
[remove fluid.layers.matmul] remove fluid.layers.matmul in example co…
kangguangli Dec 8, 2022
26634c7
remove test_auto_search_dist_matmul_op.py (#48794)
kangguangli Dec 8, 2022
64bbb4e
delete mean api (#48764)
201716010711 Dec 8, 2022
fa65fed
clean test_op_name_conflict (#48704)
jiahy0825 Dec 8, 2022
3fbbee7
opt kernel_selection error msg (#48864)
jiahy0825 Dec 8, 2022
9f1b2b1
rewrite delete_weight_dequant_linear_op_encoder/decoder pass (#48650)
RichardWooSJTU Dec 8, 2022
ccbc5a4
[XPU] add set_value and set_value_grad (#48845)
HarperCy Dec 8, 2022
c4d70e2
merge gpugraph to develop, fix gpups ut
lxsbupt Dec 8, 2022
816065f
Add QuantizedMatmul in QAT (#47997)
RachelXu7 Dec 8, 2022
90d11c5
fix 'BlasAXPBY unimplemented' error with custom device (#48762)
USTCKAY Dec 8, 2022
02a8fed
first commit (#38143)
JamesLim-sy Dec 8, 2022
4385ad1
[Auto Parallel] Add cluster partition and dm to pm (#48320)
CjhHa1 Dec 8, 2022
1569f19
fix paddle2cinn float16 type support bug (#48249)
thisjiang Dec 8, 2022
23e83ab
remove pool2d from fluid (#48512)
ccrrong Dec 8, 2022
329ee31
[fluid remove]: remove paddle.fluid.layers.detection_map, paddle.flui…
zhengqiwen1997 Dec 8, 2022
124cc7d
[PHI decoupling] move "flags.h" from fluid to phi (#48696)
AndPuQing Dec 9, 2022
b4ca40d
Merge remote-tracking branch 'develop/develop' into merge_gpugraph
lxsbupt Dec 9, 2022
5c52e38
add set_lr & get_lr for stage2 optimizer. (#48857)
wuhuachaocoding Dec 9, 2022
47c2821
move share_buffer kernel to phi (#48858)
zhiqiu Dec 9, 2022
0c6a823
[Kernel Selection] Simplify kernel selection process in phi, reduce s…
jiahy0825 Dec 9, 2022
c6d2a2f
Support static graph code-gen for scalar and int_array (#48792)
zyfncg Dec 9, 2022
a0385cf
clean unittest test_model_cast_to_bf16 (#48705)
jiahy0825 Dec 9, 2022
842008c
rm dy2static eager tests part1 bert2loop (#48790)
yjjiang11 Dec 9, 2022
7dd8bed
rm dygraph_to_static eager guard tests part3 reinforce2yolo (#48795)
yjjiang11 Dec 9, 2022
5cb67ed
rm distribution uniform eager guard test (#48768)
yjjiang11 Dec 9, 2022
59ddcb5
replace cross_entropy in python/paddle/fluid/tests/unittests/test_[a-…
kangguangli Dec 9, 2022
424c612
replace cross_entropy except in python/paddle/fluid/tests/unittests/*…
kangguangli Dec 9, 2022
fd5124a
[Paddle Inference]add cutlass act set in conv_elementwise_add_act_fus…
zhoutianzi666 Dec 9, 2022
fd77ed5
move fluid.layers.create_global_var to static.create_global_var (#48777)
cyber-pioneer Dec 9, 2022
3d35a3a
Modified the Kernel policy. When the compute is NHWC (#48563)
AnnaTrainingG Dec 9, 2022
5ebb634
temporally disable set_value (#48942)
HarperCy Dec 9, 2022
3c4e040
xpu support inplace flatten (#48909)
XiaociZhang Dec 9, 2022
d1986c9
fix:vit_attention ut (#48884)
Dec 9, 2022
515fcb5
mv fused_bias_dropout_residual_ln to fluid manual dir (#48824)
veyron95 Dec 9, 2022
814d825
bug fix (#48829)
b3602sss Dec 9, 2022
9896b29
move ops_extra_info_gen.py from phi to fluid (#48926)
huangjiyi Dec 9, 2022
b561b32
fix scale type in alpha and beta (#48887)
MARD1NO Dec 9, 2022
e0cce16
[inference][trt] upgrade prelu op (#48528)
zhangjun Dec 9, 2022
8d1262c
Revise several docs as requested; corresponds to #5453 in the Chinese docs (#48886)
yjphhw Dec 9, 2022
793b76b
replace cross_entropy in python/paddle/fluid/tests/unittests/*.py exc…
kangguangli Dec 9, 2022
a988eec
[remove fluid] Remove fluid APIs (#48641)
FrostML Dec 9, 2022
67795d4
[CodeStyle] fix renamed files not being monitored by Codestyle Check …
SigureMo Dec 9, 2022
e181a10
[fluid remove]: remove paddle.fluid.layers.box_coder and paddle.fluid…
zhengqiwen1997 Dec 9, 2022
9e6b632
[Custom XPU Support] Custom extension support xpu backend (#48733)
jiahy0825 Dec 9, 2022
df43ce7
rm mlu ops eager guard tests (#48769)
yjjiang11 Dec 9, 2022
a9507bc
rm npu instance_np op for eager guard tests (#48785)
yjjiang11 Dec 9, 2022
e2f765a
remove xpu eager guard tests (#48786)
yjjiang11 Dec 9, 2022
7425d22
[remove fluid.layers.cross_entropy] remove unit tests (part 3) (#48918)
kangguangli Dec 9, 2022
0b53147
[Inference] optimize some code and fix some bug (#48780)
yuanlehome Dec 9, 2022
957bf79
[PHI] Migrate reshape kernel (#48749)
Silv3S Dec 9, 2022
baffe5c
support py3 in setup.py (#48905)
risemeup1 Dec 9, 2022
5f3aacf
[Paddle-TRT] add cast between int64 tensor and Paddle-TRT (#45547)
zhoutianzi666 Dec 10, 2022
a4d8f11
fix sharding_stage1 amp O2 decorate bug (#48960)
sneaxiy Dec 10, 2022
96be852
[remove fluid] fluid dygraph Embedding (#48806)
wangzhen38 Dec 10, 2022
b4ea24d
fix for mkldnn (#48852)
jiweibo Dec 11, 2022
2d8bd16
H2D data transfer optimization with usage of structure type for stack…
JamesLim-sy Dec 11, 2022
09c7512
rm accuracy and auc in extra __all__ (#48986)
sljlp Dec 12, 2022
28db99a
Add dynamic checks for collective communication on NCCL (#48915)
HermitSun Dec 12, 2022
789e764
support sharding in fp16 on xpu, (#48897)
sljlp Dec 12, 2022
fe211ea
Support cross-step stream synchronization for standalone executor (#4…
From00 Dec 12, 2022
3bdfe12
Generate static graph code of some ops by yaml (#48771)
heavyrain-lzy Dec 12, 2022
e887f93
fix a bug in GetTrtWeight (#48993)
zhoutianzi666 Dec 12, 2022
3a7337b
add static_ops.yaml for static op (#48991)
zyfncg Dec 12, 2022
30068ff
[PHI decoupling] move norm_utils.cu.h from fluid to phi and remove no…
huangjiyi Dec 12, 2022
04424a3
forbid conv op whose weight is not a persistable weight into Paddle-T…
zhoutianzi666 Dec 12, 2022
92d57d8
fix: Move the pass location to the appropriate location (#48951)
Dec 12, 2022
30b1c1a
Enhance check_nan_inf implementation for CPU. (#48591)
Xreki Dec 12, 2022
321b719
[PHI] OneDNN version of Copy (#48539)
paulinagacek Dec 12, 2022
d550a0a
fix: there are some bugs with trt 8.0 (#48921)
Dec 12, 2022
5bc27b6
Optimization of Eigh op with ssyevj_batched runtime api (#48560)
Courtesy-Xs Dec 12, 2022
4794e8e
replace cross_entropy in python/paddle/fluid/tests/unittests/*/*.py e…
kangguangli Dec 12, 2022
e671824
[PHI decoupling] replace dependency of inclusive_scan.h from phi (#48…
Patrick-Star125 Dec 12, 2022
0a5af21
fluid API magration : Assert, increment, cond (#48885)
feifei-111 Dec 12, 2022
9d530e1
[Clean fluid] Add inner function _elementwise_op_with_axis (#48748)
jiahy0825 Dec 12, 2022
46665f0
test_convert_to_mixed_precision.py use tempfile for temporary models/…
jiweibo Dec 12, 2022
cc204bc
Tighten the Interception strategy (#48947)
YuanRisheng Dec 12, 2022
4a88d7c
[CodeStyle][isort][F401] fix some regression issues (#48936)
SigureMo Dec 12, 2022
2f500d4
rm multinode eager guard tests (#48766)
yjjiang11 Dec 12, 2022
5b048c4
rm unittests eager guard tests part5 dataloader2dygraph_mnist (#48816)
yjjiang11 Dec 12, 2022
49647cf
[PHI]Add new Tensor type and migrate save_combine kernel (#47856)
YuanRisheng Dec 12, 2022
6b89cc2
[Fluid Clean]move BatchNorm from flud.dygraph.nn to paddle.nn.layer.n…
risemeup1 Dec 12, 2022
263a352
[Setup] Ignore @PADDLE_BINARY_DIR@ files (#49002)
Aurelius84 Dec 12, 2022
293f746
reshape onednn test reimplemented (#48850)
jczaja Dec 12, 2022
0fdc140
update fused_multi_transformer_encoder_pass support GPT new matmul AP…
RichardWooSJTU Dec 12, 2022
8c900da
Revert "set free_when_no_cache_hit default value to true (#48815)" (#…
wanghuancoder Dec 12, 2022
f1e36cd
[Paddle Inference]fix some transformer unitest (#48929)
Wangzheee Dec 13, 2022
382c1d8
Enable Generic-Plugin support FP16 (#48807)
weishengying Dec 13, 2022
5aceddd
support conv1d quant & skip calibrate zero-size tensor (#48912)
yghstill Dec 13, 2022
a484bc6
enable custom device save model on device memory && fix conflict (#48…
engineer1109 Dec 13, 2022
3b2f754
[api move] cvm (#48989)
wangzhen38 Dec 13, 2022
4c3725b
Bugfix: xpu now only support single node multi-card, bkcl_comm_num sh…
XiaociZhang Dec 13, 2022
fe380f7
Merge remote-tracking branch 'develop/develop' into merge_gpugraph
lxsbupt Dec 13, 2022
28b26cf
rm unittests eager guard tests part23 where2zeros (#48895)
yjjiang11 Dec 13, 2022
4097e31
rm unittests eager guard tests part17 number2pool1d (#48840)
yjjiang11 Dec 13, 2022
ba1a8f3
[NPU] fix FLAGS_npu_storage_format flag in python, test=develop (#48976)
qili93 Dec 13, 2022
68e48a7
remove fleet eager guard tests (#48765)
yjjiang11 Dec 13, 2022
9231334
rm unittests eager guard tests part6 eager_run2expand_v2 (#48817)
yjjiang11 Dec 13, 2022
106ec71
rm unittests eager guard tests part12 imperative_optimizer2resnet (#4…
yjjiang11 Dec 13, 2022
6dc3383
[fluid clean] remove 4 fluid.layers api and imigrate 2 fluid.layer ap…
lugimzzz Dec 13, 2022
60ef229
remove reset reference in unittest for `fluid.layers.cross_entropy` (…
kangguangli Dec 13, 2022
8f8d1fd
replace cross_entropy in test*.py except python/paddle/fluid/tests/un…
kangguangli Dec 13, 2022
cb56853
remove linear_chain_crf and crf_decoding from fluid (#48996)
ccrrong Dec 13, 2022
7d473fc
Generate static graph code of some ops by yaml (#48977)
heavyrain-lzy Dec 13, 2022
7392963
[tools] Update summary env (#48627)
gouzil Dec 13, 2022
0ebdb7e
[Dy2St] transforms.RandomVerticalFlip Support static mode (#49024)
DrRyanHuang Dec 13, 2022
96d3ca9
Merge remote-tracking branch 'develop/develop' into merge_gpugraph
lxsbupt Dec 13, 2022
e11b2ca
Save fused_attention op memory when dropout_rate = 0.0 (#48902)
sneaxiy Dec 13, 2022
1ca6072
Correct multiple inputs and outputs (#48872)
wozna Dec 13, 2022
3d35959
[CodeStyle][isort][Dy2St] sort imports for paddle.jit (#48637)
SigureMo Dec 13, 2022
ee334fb
remove non-public apis from __all__ (#48952)
zoooo0820 Dec 13, 2022
f27584a
fix rmsprop_ yaml bug (#49026)
wanghuancoder Dec 13, 2022
ef5d812
Fixed the dead link bug in the API documentation (#48969)
jjyaoao Dec 13, 2022
ce966e3
Merge remote-tracking branch 'develop/develop' into merge_gpugraph
lxsbupt Dec 14, 2022
8ffb13b
Change mutable_data to ctx.Alloc. (#49001)
Xreki Dec 14, 2022
fb7c7a6
[inference][trt] add more unary op and square (#48534)
zhangjun Dec 14, 2022
315179f
Support ninja (#48932)
risemeup1 Dec 14, 2022
b800464
Deleted mkldnn_inplace_pass code (#47818)
HulekJakub Dec 14, 2022
4554eef
hide log (#49045)
tianshuo78520a Dec 14, 2022
348f131
[Sparse]Optimize performance of sparse conv on T4 (#49009)
Dec 14, 2022
75ffc0f
modify cmake file for cuda11.8 compile (#49020)
zhengqiwen1997 Dec 14, 2022
3ccc7f5
remove dropout from fluid (#48319)
ccrrong Dec 14, 2022
38b38d4
Merge remote-tracking branch 'develop/develop' into merge_gpugraph
lxsbupt Dec 14, 2022
3cbbd40
nullptr bugfix for XPU pg mode (#49043)
XiaociZhang Dec 14, 2022
82e890b
Divide elementwise case from BroadcastKernel and refine transpose aut…
JamesLim-sy Dec 14, 2022
9774b3f
add condition of skipif (#48791)
XieYunshen Dec 14, 2022
1065a7e
rm unittests eager guard tests part9 histogram2imperative_dataloader …
yjjiang11 Dec 14, 2022
4f81a30
rm unittests eager guard test part14 initializer2layer_norm (#48835)
yjjiang11 Dec 14, 2022
20b476d
[Bugfix] recompute dep filter param (#49010)
JZ-LIANG Dec 14, 2022
1f3bdfb
[Paddle Inference] rewrite convert_to_mixed_precision (#48853)
yuanlehome Dec 14, 2022
3b12162
[CodeStyle] fix c++17-extensions warning on macos (#49017)
AndPuQing Dec 14, 2022
03fbd81
Add custom CUDNN finding paths for 64bit Windows (#49066)
laitingsheng Dec 14, 2022
1798789
remove prior_box (#49006)
zhengqiwen1997 Dec 14, 2022
d2d3908
InstanceNorm1D、InstanceNorm2D、InstanceNorm3D (#48940)
Ayuan2021 Dec 14, 2022
1314aa8
[AutoParallel] recompute tuning (#48608)
zhaoyinglia Dec 14, 2022
22ab027
fluid API magration : array_read, array_write (#49022)
feifei-111 Dec 14, 2022
2fc38e3
Keep double-buffer reader for static mode (#49068)
sljlp Dec 14, 2022
0148968
Fix nullptr to TestFuseGemmEpilogueReluBWDFP* (#48997)
mingxu1067 Dec 14, 2022
19b6ed2
support fp16 index sample (#47897)
wangxn12138 Dec 14, 2022
0706407
rm unittest eager guard tests part20 sparse_mv2split (#48879)
yjjiang11 Dec 15, 2022
604c79b
rm unittests eager guard tests part11 imperative_layer2ocr (#48828)
yjjiang11 Dec 15, 2022
fd3fafb
rm eager guard tests part3_1 (#49059)
yjjiang11 Dec 15, 2022
2c1ff88
fix: gloo compatible (#49084)
HermitSun Dec 15, 2022
4ab87f9
rm eager guard tests part3_3 (#49061)
yjjiang11 Dec 15, 2022
2a170ff
fix bug (#49081)
haohongxiang Dec 15, 2022
0c4c757
[Inference] memory_optimize and mkdlnn problem (#49054)
jiweibo Dec 15, 2022
6295339
Remove/move 16 fluid APIs (#48377)
HydrogenSulfate Dec 15, 2022
14e8431
fix embedding multihead (#49085)
Wangzheee Dec 15, 2022
4c61ecf
SetDeviceId in StreamSafeCUDAAllocation (#49080)
From00 Dec 15, 2022
9735b82
[PHI decoupling] Remove fluid imports from MKLDNN code (#48981)
Silv3S Dec 15, 2022
fa79270
replace cross_entropy in python/paddle/fluid/tests/unittests/*.py (#4…
kangguangli Dec 15, 2022
88898fd
Fix docs for paddle.amp.decorate and other APIs (#48983)
kk-2000 Dec 15, 2022
dd900b3
Update some docs per online-documentation requests 61~70 (#49014)
LearningPawn Dec 15, 2022
b51f905
Merge remote-tracking branch 'develop/develop' into merge_gpugraph
lxsbupt Dec 15, 2022
c6f5ea0
merge gpugraph to develop, fix gloo wrapper
lxsbupt Dec 15, 2022
9856fe0
merge gpugraph to develop, fix ci
lxsbupt Dec 15, 2022
a0cfc49
Merge remote-tracking branch 'develop/develop' into merge_gpugraph
lxsbupt Dec 9, 2022
badb67b
Merge remote-tracking branch 'develop/develop' into merge_gpugraph
lxsbupt Dec 13, 2022
5f5d47c
Merge remote-tracking branch 'develop/develop' into merge_gpugraph
lxsbupt Dec 13, 2022
1733b3f
Merge remote-tracking branch 'develop/develop' into merge_gpugraph
lxsbupt Dec 14, 2022
c665963
Merge remote-tracking branch 'develop/develop' into merge_gpugraph
lxsbupt Dec 14, 2022
a62907a
Merge remote-tracking branch 'develop/develop' into merge_gpugraph
lxsbupt Dec 15, 2022
3d9520f
merge gpugraph to develop, fix gloo wrapper
lxsbupt Dec 15, 2022
41b60b1
merge gpugraph to develop, fix ci
lxsbupt Dec 15, 2022
53ba9d7
Merge branch 'merge_gpugraph' of https://github.com/lxsbupt/Paddle in…
lxsbupt Dec 15, 2022
396f861
merge gpugraph to develop, fix fleet.py
lxsbupt Dec 16, 2022
d6442fb
merge gpugraph to develop, merge latest conflicts
lxsbupt Dec 17, 2022
a1f88bc
merge gpugraph to develop, fix merge error
lxsbupt Dec 17, 2022
46c56b8
merge gpugraph to develop, fix merge error
lxsbupt Dec 17, 2022
387661a
merge gpugraph to develop, fix conflicts
lxsbupt Dec 19, 2022
54faf3d
merge gpugraph to develop, add python ut
lxsbupt Dec 19, 2022
4efd55b
merge gpugraph to develop, fix code style
lxsbupt Dec 20, 2022
fcc7b8a
merge gpugraph to develop, add c++ ut
lxsbupt Dec 20, 2022
80adad3
merge gpugraph to develop, fix code style
lxsbupt Dec 20, 2022
2c70dc6
merge gpugraph to develop, fix data_feed.h
lxsbupt Dec 20, 2022
96d785c
merge gpugraph to develop, fix code style
lxsbupt Dec 20, 2022
71b4f94
merge gpugraph to develop, fix code style
lxsbupt Dec 20, 2022
18e106e
merge gpugraph to develop, fix code style
lxsbupt Dec 21, 2022
d84cf2e
merge gpugraph to develop, fix code style
lxsbupt Dec 21, 2022
35 changes: 35 additions & 0 deletions cmake/external/jemalloc.cmake
@@ -0,0 +1,35 @@
include(ExternalProject)

set(JEMALLOC_PROJECT "extern_jemalloc")
set(JEMALLOC_URL
https://github.com/jemalloc/jemalloc/releases/download/5.1.0/jemalloc-5.1.0.tar.bz2
)
set(JEMALLOC_BUILD ${THIRD_PARTY_PATH}/jemalloc/src/extern_jemalloc)
set(JEMALLOC_SOURCE_DIR "${THIRD_PARTY_PATH}/jemalloc")
set(JEMALLOC_INSTALL ${THIRD_PARTY_PATH}/install/jemalloc)
set(JEMALLOC_INCLUDE_DIR ${JEMALLOC_INSTALL}/include)
set(JEMALLOC_DOWNLOAD_DIR "${JEMALLOC_SOURCE_DIR}/src/${JEMALLOC_PROJECT}")

set(JEMALLOC_STATIC_LIBRARIES
${THIRD_PARTY_PATH}/install/jemalloc/lib/libjemalloc_pic.a)
set(JEMALLOC_LIBRARIES
${THIRD_PARTY_PATH}/install/jemalloc/lib/libjemalloc_pic.a)

ExternalProject_Add(
extern_jemalloc
PREFIX ${JEMALLOC_SOURCE_DIR}
URL ${JEMALLOC_URL}
INSTALL_DIR ${JEMALLOC_INSTALL}
DOWNLOAD_DIR "${JEMALLOC_DOWNLOAD_DIR}"
BUILD_COMMAND $(MAKE)
BUILD_IN_SOURCE 1
INSTALL_COMMAND $(MAKE) install
CONFIGURE_COMMAND "${JEMALLOC_DOWNLOAD_DIR}/configure"
--prefix=${JEMALLOC_INSTALL} --disable-initial-exec-tls)

add_library(jemalloc STATIC IMPORTED GLOBAL)
set_property(TARGET jemalloc PROPERTY IMPORTED_LOCATION
${JEMALLOC_STATIC_LIBRARIES})

include_directories(${JEMALLOC_INCLUDE_DIR})
add_dependencies(jemalloc extern_jemalloc)
31 changes: 28 additions & 3 deletions cmake/external/rocksdb.cmake
@@ -14,6 +14,13 @@

include(ExternalProject)

# find_package(jemalloc REQUIRED)

set(JEMALLOC_INCLUDE_DIR ${THIRD_PARTY_PATH}/install/jemalloc/include)
set(JEMALLOC_LIBRARIES
${THIRD_PARTY_PATH}/install/jemalloc/lib/libjemalloc_pic.a)
message(STATUS "rocksdb jemalloc:" ${JEMALLOC_LIBRARIES})

set(ROCKSDB_PREFIX_DIR ${THIRD_PARTY_PATH}/rocksdb)
set(ROCKSDB_INSTALL_DIR ${THIRD_PARTY_PATH}/install/rocksdb)
set(ROCKSDB_INCLUDE_DIR
@@ -22,21 +29,39 @@ set(ROCKSDB_INCLUDE_DIR
set(ROCKSDB_LIBRARIES
"${ROCKSDB_INSTALL_DIR}/lib/librocksdb.a"
CACHE FILEPATH "rocksdb library." FORCE)
-set(ROCKSDB_CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fPIC")
+set(ROCKSDB_COMMON_FLAGS
+"-g -pipe -O2 -W -Wall -Wno-unused-parameter -fPIC -fno-builtin-memcmp -fno-omit-frame-pointer"
+)
+set(ROCKSDB_FLAGS
+"-DNDEBUG -DROCKSDB_JEMALLOC -DJEMALLOC_NO_DEMANGLE -DROCKSDB_PLATFORM_POSIX -DROCKSDB_LIB_IO_POSIX -DOS_LINUX -DROCKSDB_FALLOCATE_PRESENT -DHAVE_SSE42 -DHAVE_PCLMUL -DZLIB -DROCKSDB_MALLOC_USABLE_SIZE -DROCKSDB_PTHREAD_ADAPTIVE_MUTEX -DROCKSDB_BACKTRACE -DROCKSDB_SUPPORT_THREAD_LOCAL -DROCKSDB_USE_RTTI -DROCKSDB_SCHED_GETCPU_PRESENT -DROCKSDB_RANGESYNC_PRESENT -DROCKSDB_AUXV_GETAUXVAL_PRESENT"
+)
+set(ROCKSDB_CMAKE_CXX_FLAGS
+"${ROCKSDB_COMMON_FLAGS} -DROCKSDB_LIBAIO_PRESENT -msse -msse4.2 -mpclmul ${ROCKSDB_FLAGS} -fPIC -I${JEMALLOC_INCLUDE_DIR} -lz -ldl"
+)
+set(ROCKSDB_CMAKE_C_FLAGS
+"${ROCKSDB_COMMON_FLAGS} ${ROCKSDB_FLAGS} -DROCKSDB_LIBAIO_PRESENT -fPIC -I${JEMALLOC_INCLUDE_DIR}"
+)
include_directories(${ROCKSDB_INCLUDE_DIR})

set(CMAKE_CXX_LINK_EXECUTABLE
"${CMAKE_CXX_LINK_EXECUTABLE} -pthread -ldl -lrt -lz")
ExternalProject_Add(
extern_rocksdb
${EXTERNAL_PROJECT_LOG_ARGS}
PREFIX ${ROCKSDB_PREFIX_DIR}
-GIT_REPOSITORY "https://github.com/facebook/rocksdb"
-GIT_TAG v6.10.1
+GIT_REPOSITORY "https://github.com/Thunderbrook/rocksdb"
+GIT_TAG 6.19.fb
UPDATE_COMMAND ""
CMAKE_ARGS -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
-DCMAKE_C_COMPILER=${CMAKE_C_COMPILER}
-DWITH_BZ2=OFF
-DPORTABLE=1
-DWITH_GFLAGS=OFF
-DWITH_TESTS=OFF
-DWITH_JEMALLOC=ON
-DWITH_BENCHMARK_TOOLS=OFF
-DJeMalloc_LIBRARIES=${JEMALLOC_LIBRARIES}
-DJeMalloc_INCLUDE_DIRS=${JEMALLOC_INCLUDE_DIR}
-DCMAKE_CXX_FLAGS=${ROCKSDB_CMAKE_CXX_FLAGS}
-DCMAKE_C_FLAGS=${CMAKE_C_FLAGS}
INSTALL_COMMAND
3 changes: 3 additions & 0 deletions cmake/third_party.cmake
@@ -423,6 +423,9 @@ if(WITH_PSCORE)

include(external/rocksdb) # download, build, install rocksdb
list(APPEND third_party_deps extern_rocksdb)

include(external/jemalloc) # download, build, install jemalloc
list(APPEND third_party_deps extern_jemalloc)
endif()

if(WITH_RPC
12 changes: 12 additions & 0 deletions paddle/fluid/distributed/common/afs_warpper.h
@@ -66,6 +66,10 @@ class FsReadChannel {
return 0;
}

inline int read(char* data, size_t size) {
return fread(data, 1, size, _file.get());
}

private:
uint32_t _buffer_size;
FsChannelConfig _config;
@@ -114,6 +118,14 @@ class FsWriteChannel {
return write_line(data.c_str(), data.size());
}

inline uint32_t write(const char* data, size_t size) {
size_t write_count = fwrite(data, 1, size, _file.get());
if (write_count != size) {
return -1;
}
return 0;
}

private:
uint32_t _buffer_size;
FsChannelConfig _config;
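
Note on the two helpers added above: `write` is declared to return `uint32_t` but returns `-1` on a short write, which a caller would observe as 4294967295, and `read` narrows the `size_t` result of `fread` to `int`. A minimal sketch of the same raw read/write pattern with a signed status code is shown below. `FileCloser`, `FilePtr`, `WriteAll`, `ReadSome` and `RoundTrip` are hypothetical names for illustration only; they are not part of `afs_warpper.h`, and the real `_file` member type is not visible in this hunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <memory>

// Hypothetical stand-in for the channel's file handle; the real
// FsReadChannel/FsWriteChannel members are not shown in this diff.
struct FileCloser {
  void operator()(std::FILE* f) const {
    if (f != nullptr) std::fclose(f);
  }
};
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Writes exactly `size` bytes. Returns 0 on success, -1 on a short write.
// The return type is signed so that -1 is representable as written.
inline int WriteAll(std::FILE* file, const char* data, std::size_t size) {
  const std::size_t written = std::fwrite(data, 1, size, file);
  return written == size ? 0 : -1;
}

// Reads up to `size` bytes and returns the number of bytes actually read,
// which is smaller than `size` at end-of-file or on error.
inline std::size_t ReadSome(std::FILE* file, char* data, std::size_t size) {
  return std::fread(data, 1, size, file);
}

// Round-trips a buffer through a temporary file; returns true on success.
inline bool RoundTrip(const char* data, std::size_t size, char* out) {
  FilePtr file(std::tmpfile());
  if (!file) return false;
  if (WriteAll(file.get(), data, size) != 0) return false;
  std::rewind(file.get());
  return ReadSome(file.get(), out, size) == size;
}
```

Whether the PR's callers rely on the unsigned return value is not visible here, so this is a suggestion for comparison rather than a drop-in replacement.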
15 changes: 13 additions & 2 deletions paddle/fluid/distributed/ps/service/ps_client.h
@@ -148,10 +148,12 @@ class PSClient {
return fut;
}

-virtual ::std::future<int32_t> PullSparsePtr(char **select_values,
+virtual ::std::future<int32_t> PullSparsePtr(int shard_id,
+char **select_values,
size_t table_id,
const uint64_t *keys,
-size_t num) {
+size_t num,
+uint16_t pass_id) {
VLOG(0) << "Did not implement";
std::promise<int32_t> promise;
std::future<int> fut = promise.get_future();
@@ -160,6 +162,15 @@ class PSClient {
}

virtual std::future<int32_t> PrintTableStat(uint32_t table_id) = 0;
virtual std::future<int32_t> SaveCacheTable(uint32_t table_id,
uint16_t pass_id,
size_t threshold) {
VLOG(0) << "Did not implement";
std::promise<int32_t> promise;
std::future<int> fut = promise.get_future();
promise.set_value(-1);
return fut;
}

// Ensure all buffered requests are sent
virtual std::future<int32_t> Flush() = 0;
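
Note on the default bodies of `PullSparsePtr` and `SaveCacheTable` above: both return a future that is already fulfilled with `-1`, so a client that does not override them reports failure without blocking. The sketch below isolates that pattern. `ReadyFuture`, `ClientBase` and `LocalClient` are hypothetical names for illustration and do not appear in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>

// Returns a future that is already fulfilled with `code`. The shared state
// outlives the promise, so the caller can call get() immediately.
inline std::future<int32_t> ReadyFuture(int32_t code) {
  std::promise<int32_t> promise;
  std::future<int32_t> fut = promise.get_future();
  promise.set_value(code);
  return fut;
}

// Hypothetical base class (not the real PSClient): the default body
// reports "not implemented" as -1 instead of being pure virtual.
class ClientBase {
 public:
  virtual ~ClientBase() = default;
  virtual std::future<int32_t> SaveCacheTable(uint32_t /*table_id*/,
                                              uint16_t /*pass_id*/,
                                              std::size_t /*threshold*/) {
    return ReadyFuture(-1);
  }
};

// Hypothetical subclass that implements the call and reports success.
class LocalClient : public ClientBase {
 public:
  std::future<int32_t> SaveCacheTable(uint32_t /*table_id*/,
                                      uint16_t /*pass_id*/,
                                      std::size_t /*threshold*/) override {
    return ReadyFuture(0);
  }
};
```

A default body keeps existing subclasses compiling when a new virtual is added, at the cost of moving the "not implemented" failure from compile time to run time.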
30 changes: 28 additions & 2 deletions paddle/fluid/distributed/ps/service/ps_local_client.cc
@@ -260,10 +260,12 @@ ::std::future<int32_t> PsLocalClient::PushDense(const Region* regions,
// return done();
//}

-::std::future<int32_t> PsLocalClient::PullSparsePtr(char** select_values,
+::std::future<int32_t> PsLocalClient::PullSparsePtr(int shard_id,
+char** select_values,
size_t table_id,
const uint64_t* keys,
-size_t num) {
+size_t num,
+uint16_t pass_id) {
// FIXME
// auto timer =
// std::make_shared<CostTimer>("pslib_downpour_client_pull_sparse");
@@ -278,13 +280,37 @@ ::std::future<int32_t> PsLocalClient::PullSparsePtr(char** select_values,
table_context.pull_context.ptr_values = select_values;
table_context.use_ptr = true;
table_context.num = num;
table_context.shard_id = shard_id;
table_context.pass_id = pass_id;

// table_ptr->PullSparsePtr(select_values, keys, num);
table_ptr->Pull(table_context);

return done();
}

::std::future<int32_t> PsLocalClient::PrintTableStat(uint32_t table_id) {
auto* table_ptr = GetTable(table_id);
std::pair<int64_t, int64_t> ret = table_ptr->PrintTableStat();
VLOG(0) << "table id: " << table_id << ", feasign size: " << ret.first
<< ", mf size: " << ret.second;
return done();
}

::std::future<int32_t> PsLocalClient::SaveCacheTable(uint32_t table_id,
uint16_t pass_id,
size_t threshold) {
auto* table_ptr = GetTable(table_id);
std::pair<int64_t, int64_t> ret = table_ptr->PrintTableStat();
VLOG(0) << "table id: " << table_id << ", feasign size: " << ret.first
<< ", mf size: " << ret.second;
if (ret.first > (int64_t)threshold) {
VLOG(0) << "run cache table";
table_ptr->CacheTable(pass_id);
}
return done();
}

::std::future<int32_t> PsLocalClient::PushSparseRawGradient(
size_t table_id,
const uint64_t* keys,
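
Note on `PsLocalClient::SaveCacheTable` above: it reads the table statistics and calls `CacheTable(pass_id)` only when the feasign count is strictly greater than `threshold`. The sketch below reproduces that gate against a stub so the boundary case is easy to check. `FakeTable` and `MaybeCacheTable` are hypothetical; the real `Table` interface is not shown in this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for a table; it only records what was called.
struct FakeTable {
  int64_t feasign_size = 0;
  int64_t mf_size = 0;
  int cache_calls = 0;
  uint16_t last_pass_id = 0;

  // Mirrors the (feasign size, mf size) pair used in the diff.
  std::pair<int64_t, int64_t> PrintTableStat() const {
    return {feasign_size, mf_size};
  }
  void CacheTable(uint16_t pass_id) {
    ++cache_calls;
    last_pass_id = pass_id;
  }
};

// Triggers caching only when the feasign count strictly exceeds `threshold`.
// Returns true if CacheTable was called.
inline bool MaybeCacheTable(FakeTable* table,
                            uint16_t pass_id,
                            std::size_t threshold) {
  const std::pair<int64_t, int64_t> stat = table->PrintTableStat();
  if (stat.first > static_cast<int64_t>(threshold)) {
    table->CacheTable(pass_id);
    return true;
  }
  return false;
}
```

With a feasign count equal to the threshold the gate stays closed, matching the `>` comparison in the PR.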
17 changes: 9 additions & 8 deletions paddle/fluid/distributed/ps/service/ps_local_client.h
@@ -76,18 +76,19 @@ class PsLocalClient : public PSClient {
return fut;
}

-virtual ::std::future<int32_t> PullSparsePtr(char** select_values,
+virtual ::std::future<int32_t> PullSparsePtr(int shard_id,
+char** select_values,
size_t table_id,
const uint64_t* keys,
-size_t num);
+size_t num,
+uint16_t pass_id);

-virtual ::std::future<int32_t> PrintTableStat(uint32_t table_id) {
-std::promise<int32_t> prom;
-std::future<int32_t> fut = prom.get_future();
-prom.set_value(0);
+virtual ::std::future<int32_t> PrintTableStat(uint32_t table_id);

+virtual ::std::future<int32_t> SaveCacheTable(uint32_t table_id,
+uint16_t pass_id,
+size_t threshold);

-return fut;
-}
virtual ::std::future<int32_t> PushSparse(size_t table_id,
const uint64_t* keys,
const float** update_values,
11 changes: 0 additions & 11 deletions paddle/fluid/distributed/ps/table/CMakeLists.txt
@@ -53,16 +53,6 @@ cc_library(
set_source_files_properties(
tensor_accessor.cc PROPERTIES COMPILE_FLAGS ${DISTRIBUTE_COMPILE_FLAGS})

-cc_library(
-tensor_table
-SRCS
-DEPS eigen3
-ps_framework_proto
-executor
-scope
-device_context
-tensor
-${TABLE_DEPS})
set_source_files_properties(table.cc PROPERTIES COMPILE_FLAGS
${DISTRIBUTE_COMPILE_FLAGS})

@@ -98,7 +88,6 @@ cc_library(
table.cc
DEPS ${TABLE_DEPS}
common_table
-tensor_table
ps_framework_proto
string_helper
device_context
14 changes: 13 additions & 1 deletion paddle/fluid/distributed/ps/table/accessor.h
@@ -121,6 +121,9 @@ class ValueAccessor {
virtual void UpdateStatAfterSave(float* value, int param) {}
// 判断该value是否保存到ssd
virtual bool SaveSSD(float* value) = 0;
// 判断热启时是否过滤slot对应的feasign
virtual bool FilterSlot(float* value) { return false; }

//
virtual bool SaveCache(float* value,
int param,
@@ -162,9 +165,18 @@ class ValueAccessor {
return 0;
}

virtual bool SaveMemCache(float* value,
int param,
double global_cache_threshold,
uint16_t pass_id) {
return true;
}

virtual void UpdatePassId(float* value, uint16_t pass_id) {}

virtual float GetField(float* value, const std::string& name) { return 0.0; }
#define DEFINE_GET_INDEX(class, field) \
-virtual int get_##field##_index() override { return class ::field##_index(); }
+virtual int get_##field##_index() { return class ::field##_index(); }

protected:
size_t _value_size;
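
Note on `DEFINE_GET_INDEX` above: the macro uses `##` token pasting to generate one `get_<field>_index()` virtual per field, each forwarding to a static `<field>_index()` on the named class. The change drops `override` from the generated method, presumably so the macro can be used where the base class declares no matching virtual; that motivation is an assumption, as the PR text does not state it. The sketch below shows the expansion with hypothetical types. `DemoValue` and `DemoAccessor` are not from Paddle, and the macro parameter is renamed to `klass` here.

```cpp
#include <cassert>

// Same shape as the macro in the diff, without `override`.
#define DEFINE_GET_INDEX(klass, field) \
  virtual int get_##field##_index() { return klass ::field##_index(); }

// Hypothetical value layout exposing one static index per field.
struct DemoValue {
  static int show_index() { return 0; }
  static int click_index() { return 1; }
};

// Hypothetical accessor. Each macro use expands to one virtual method,
// e.g. get_show_index(), which returns DemoValue::show_index().
class DemoAccessor {
 public:
  virtual ~DemoAccessor() = default;
  DEFINE_GET_INDEX(DemoValue, show)
  DEFINE_GET_INDEX(DemoValue, click)
};
```

Because the base class here declares no `get_show_index`, the same macro with `override` would fail to compile, which is the case the relaxed form permits.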