Commit 620d28b

recipes/mobile_perf translation (#421)
translate mobile_perf, recipes_index
1 parent a9f03c6 commit 620d28b

2 files changed: +90 -90 lines changed
recipes_source/mobile_perf.rst (+88 -88)
@@ -1,43 +1,43 @@
-Pytorch Mobile Performance Recipes
-==================================
+PyTorch Mobile Performance Recipes
+==================================

-Introduction
-----------------
-Performance (aka latency) is crucial to most, if not all,
-applications and use-cases of ML model inference on mobile devices.
+Introduction
+------------
+Performance (latency) is crucial to most, if not all,
+applications and use cases of ML model inference on mobile devices.

-Today, PyTorch executes the models on the CPU backend pending availability
-of other hardware backends such as GPU, DSP, and NPU.
+Today, PyTorch executes models on the CPU backend until other hardware
+backends such as GPU, DSP, and NPU become available.

-In this recipe, you will learn:
+In this recipe, you will learn:

-- How to optimize your model to help decrease execution time (higher performance, lower latency) on the mobile device.
-- How to benchmark (to check if optimizations helped your use case).
+- How to optimize your model to help decrease execution time (higher performance, lower latency) on the mobile device.
+- How to benchmark (to check whether the optimizations helped your use case).


-Model preparation
------------------
+Model preparation
+-----------------

-We will start with preparing to optimize your model to help decrease execution time
-(higher performance, lower latency) on the mobile device.
+We start by preparing to optimize your model to help decrease execution time
+(higher performance, lower latency) on the mobile device.


-Setup
-^^^^^^^
+Setup
+^^^^^

-First we need to installed pytorch using conda or pip with version at least 1.5.0.
+First, install PyTorch version 1.5.0 or later using conda or pip.

 ::

     conda install pytorch torchvision -c pytorch

-or
+or

 ::

     pip install torch torchvision

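As a quick sanity check, the installed version can be printed to confirm it meets the 1.5.0 requirement; a minimal sketch, assuming a standard Python environment::

    import torch

    # The recipe assumes PyTorch 1.5.0 or newer.
    print(torch.__version__)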
-Code your model:
+Code your model:

 ::

@@ -65,82 +65,82 @@ Code your model:
     model = AnnotatedConvBnReLUModel()


-``torch.quantization.QuantStub`` and ``torch.quantization.DeQuantStub()`` are no-op stubs, which will be used for quantization step.
+``torch.quantization.QuantStub`` and ``torch.quantization.DeQuantStub()`` are no-op stubs that are used in the quantization step.


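The class body of ``AnnotatedConvBnReLUModel`` is elided by the hunk above; a minimal sketch of a module wired with these stubs, consistent with the fuse step below (layer sizes here are illustrative assumptions)::

    import torch

    class AnnotatedConvBnReLUModel(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = torch.nn.Conv2d(3, 5, 3, bias=False)
            self.bn = torch.nn.BatchNorm2d(5)
            self.relu = torch.nn.ReLU(inplace=True)
            self.quant = torch.quantization.QuantStub()
            self.dequant = torch.quantization.DeQuantStub()

        def forward(self, x):
            x = self.quant(x)    # tensor enters the quantized region here
            x = self.relu(self.bn(self.conv(x)))
            x = self.dequant(x)  # tensor leaves the quantized region here
            return x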
-1. Fuse operators using ``torch.quantization.fuse_modules``
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+1. Fuse operators using ``torch.quantization.fuse_modules``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Do not be confused that fuse_modules is in the quantization package.
-It works for all ``torch.nn.Module``.
+Do not be confused by the fact that fuse_modules lives in the quantization package.
+It works for any ``torch.nn.Module``.

-``torch.quantization.fuse_modules`` fuses a list of modules into a single module.
-It fuses only the following sequence of modules:
+``torch.quantization.fuse_modules`` fuses a list of modules into a single module.
+It fuses only the following sequences of modules:

 - Convolution, Batch normalization
 - Convolution, Batch normalization, Relu
 - Convolution, Relu
 - Linear, Relu

-This script will fuse Convolution, Batch Normalization and Relu in previously declared model.
+This script fuses Convolution, Batch Normalization, and Relu in the previously declared model.

 ::

     torch.quantization.fuse_modules(model, [['conv', 'bn', 'relu']], inplace=True)


-2. Quantize your model
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+2. Quantize your model
+^^^^^^^^^^^^^^^^^^^^^^

-You can find more about PyTorch quantization in
-`the dedicated tutorial <https://pytorch.org/blog/introduction-to-quantization-on-pytorch/>`_.
+You can find more about PyTorch quantization in
+`the dedicated tutorial <https://pytorch.org/blog/introduction-to-quantization-on-pytorch/>`_.

-Quantization of the model not only moves computation to int8,
-but also reduces the size of your model on a disk.
-That size reduction helps to reduce disk read operations during the first load of the model and decreases the amount of RAM.
-Both of those resources can be crucial for the performance of mobile applications.
-This code does quantization, using stub for model calibration function, you can find more about it `here <https://tutorials.pytorch.kr/advanced/static_quantization_tutorial.html#post-training-static-quantization>`__.
+Quantizing the model not only moves computation to int8,
+but also reduces the size of your model on disk.
+That size reduction helps to reduce disk read operations during the first load of the model and decreases the amount of RAM.
+Both of those resources can be crucial for the performance of mobile applications.
+This code performs quantization, using a stub for the model calibration function; you can find more about it `here <https://tutorials.pytorch.kr/advanced/static_quantization_tutorial.html#post-training-static-quantization>`__.

 ::

     model.qconfig = torch.quantization.get_default_qconfig('qnnpack')
     torch.quantization.prepare(model, inplace=True)
-    # Calibrate your model
+    # Calibrate your model
     def calibrate(model, calibration_data):
-        # Your calibration code here
+        # Your calibration code here
         return
     calibrate(model, [])
     torch.quantization.convert(model, inplace=True)



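The ``calibrate`` stub above simply returns; in practice, post-training static quantization calibrates by running forward passes over representative inputs so that the observers inserted by ``prepare`` can record activation ranges. A sketch of such a loop, using dummy batches as stand-ins for real calibration data::

    import torch

    def calibrate(model, calibration_data):
        # Run representative inputs through the prepared model.
        model.eval()
        with torch.no_grad():
            for sample in calibration_data:
                model(sample)

    # A few correctly shaped dummy batches as placeholder calibration data.
    calibrate(model, [torch.randn(1, 3, 224, 224) for _ in range(10)])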
-3. Use torch.utils.mobile_optimizer
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+3. Use torch.utils.mobile_optimizer
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Torch mobile_optimizer package does several optimizations with the scripted model,
-which will help to conv2d and linear operations.
-It pre-packs model weights in an optimized format and fuses ops above with relu
-if it is the next operation.
+The Torch mobile_optimizer package performs several optimizations on the scripted model,
+which help conv2d and linear operations.
+It pre-packs model weights in an optimized format and fuses the ops above with relu
+when relu is the next operation.

-First we script the result model from previous step:
+First, we script the resulting model from the previous step:

 ::

     torchscript_model = torch.jit.script(model)

-Next we call ``optimize_for_mobile`` and save model on the disk.
+Next, we call ``optimize_for_mobile`` and save the model to disk.

 ::

     torchscript_model_optimized = optimize_for_mobile(torchscript_model)
     torch.jit.save(torchscript_model_optimized, "model.pt")

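``optimize_for_mobile`` lives in ``torch.utils.mobile_optimizer`` and needs to be imported before it can be called; a self-contained version of the two snippets above::

    import torch
    from torch.utils.mobile_optimizer import optimize_for_mobile

    torchscript_model = torch.jit.script(model)
    torchscript_model_optimized = optimize_for_mobile(torchscript_model)
    torch.jit.save(torchscript_model_optimized, "model.pt")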
-4. Prefer Using Channels Last Tensor memory format
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+4. Prefer using the Channels Last Tensor memory format
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Channels Last(NHWC) memory format was introduced in PyTorch 1.4.0. It is supported only for four-dimensional tensors. This memory format gives a better memory locality for most operators, especially convolution. Our measurements showed a 3x speedup of MobileNetV2 model compared with the default Channels First(NCHW) format.
+The Channels Last (NHWC) memory format was introduced in PyTorch 1.4.0. It is supported only for four-dimensional tensors. This memory format gives better memory locality for most operators, especially convolution. Our measurements showed a 3x speedup of the MobileNetV2 model compared with the default Channels First (NCHW) format.

-At the moment of writing this recipe, PyTorch Android java API does not support using inputs in Channels Last memory format. But it can be used on the TorchScript model level, by adding the conversion to it for model inputs.
+At the time of writing this recipe, the PyTorch Android Java API does not support inputs in the Channels Last memory format. It can, however, be used at the TorchScript model level by adding the conversion for model inputs.

 .. code-block:: python

@@ -149,32 +149,32 @@ At the moment of writing this recipe, PyTorch Android java API does not support
     ...


-This conversion is zero cost if your input is already in Channels Last memory format. After it, all operators will work preserving ChannelsLast memory format.
+This conversion is zero cost if your input is already in the Channels Last memory format. After it, all operators work while preserving the Channels Last memory format.

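The conversion referred to above is ``Tensor.contiguous(memory_format=torch.channels_last)``; a minimal sketch of converting a model input (the input shape is an example)::

    import torch

    x = torch.rand(1, 3, 224, 224)  # NCHW input
    x = x.contiguous(memory_format=torch.channels_last)
    # Free if x is already channels-last; downstream operators such as
    # convolutions keep the channels-last layout.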
-5. Android - Reusing tensors for forward
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+5. Android - Reusing tensors for forward
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-This part of the recipe is Android only.
+This part of the recipe applies to Android only.

-Memory is a critical resource for android performance, especially on old devices.
-Tensors can need a significant amount of memory.
-For example, standard computer vision tensor contains 1*3*224*224 elements,
-assuming that data type is float and will need 588Kb of memory.
+Memory is a critical resource for Android performance, especially on old devices.
+Tensors can need a significant amount of memory.
+For example, a standard computer vision tensor contains 1*3*224*224 elements;
+assuming the data type is float, it will need 588Kb of memory.

 ::

     FloatBuffer buffer = Tensor.allocateFloatBuffer(1*3*224*224);
     Tensor tensor = Tensor.fromBlob(buffer, new long[]{1, 3, 224, 224});


-Here we allocate native memory as ``java.nio.FloatBuffer`` and creating ``org.pytorch.Tensor`` which storage will be pointing to the memory of the allocated buffer.
+Here we allocate native memory as a ``java.nio.FloatBuffer`` and create an ``org.pytorch.Tensor`` whose storage points to the memory of the allocated buffer.

-For most of the use cases, we do not do model forward only once, repeating it with some frequency or as fast as possible.
+In most use cases, we do not run the model forward only once; we repeat it at some frequency or as fast as possible.

-If we are doing new memory allocation for every module forward - that will be suboptimal.
-Instead of this, we can reuse the same memory that we allocated on the previous step, fill it with new data, and run module forward again on the same tensor object.
+If we allocate new memory for every module forward, that will be suboptimal.
+Instead, we can reuse the memory allocated in the previous step, fill it with new data, and run the module forward again on the same tensor object.

-You can check how it looks in code in `pytorch android application example <https://github.com/pytorch/android-demo-app/blob/master/PyTorchDemoApp/app/src/main/java/org/pytorch/demo/vision/ImageClassificationActivity.java#L174>`_.
+You can check how this looks in code in the `pytorch android application example <https://github.com/pytorch/android-demo-app/blob/master/PyTorchDemoApp/app/src/main/java/org/pytorch/demo/vision/ImageClassificationActivity.java#L174>`_.

 ::

@@ -196,44 +196,44 @@ You can check how it looks in code in `pytorch android application example <http
         Tensor outputTensor = mModule.forward(IValue.from(mInputTensor)).toTensor();
     }

-Member fields ``mModule``, ``mInputTensorBuffer`` and ``mInputTensor`` are initialized only once
-and buffer is refilled using ``org.pytorch.torchvision.TensorImageUtils.imageYUV420CenterCropToFloatBuffer``.
+The member fields ``mModule``, ``mInputTensorBuffer``, and ``mInputTensor`` are initialized only once,
+and the buffer is refilled using ``org.pytorch.torchvision.TensorImageUtils.imageYUV420CenterCropToFloatBuffer``.

-Benchmarking
-------------
+Benchmarking
+------------

-The best way to benchmark (to check if optimizations helped your use case) - is to measure your particular use case that you want to optimize, as performance behavior can vary in different environments.
+The best way to benchmark (to check whether the optimizations helped your use case) is to measure the particular use case that you want to optimize, as performance behavior can vary between environments.

-PyTorch distribution provides a way to benchmark naked binary that runs the model forward,
-this approach can give more stable measurements rather than testing inside the application.
+The PyTorch distribution provides a way to benchmark a bare (naked) binary that runs the model forward;
+this approach can give more stable measurements than testing inside the application.


-Android - Benchmarking Setup
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Android - Benchmarking Setup
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-This part of the recipe is Android only.
+This part of the recipe applies to Android only.

-For this you first need to build benchmark binary:
+For this, you first need to build the benchmark binary:

 ::

     <from-your-root-pytorch-dir>
     rm -rf build_android
     BUILD_PYTORCH_MOBILE=1 ANDROID_ABI=arm64-v8a ./scripts/build_android.sh -DBUILD_BINARY=ON

-You should have arm64 binary at: ``build_android/bin/speed_benchmark_torch``.
-This binary takes ``--model=<path-to-model>``, ``--input_dim="1,3,224,224"`` as dimension information for the input and ``--input_type="float"`` as the type of the input as arguments.
+You should now have an arm64 binary at ``build_android/bin/speed_benchmark_torch``.
+This binary takes ``--model=<path-to-model>``, ``--input_dim="1,3,224,224"`` as the dimension information for the input, and ``--input_type="float"`` as the type of the input as arguments.

-Once you have your android device connected,
-push speedbenchark_torch binary and your model to the phone:
+Once your Android device is connected,
+push the speedbenchark_torch binary and your model to the phone:

 ::

     adb push <speedbenchmark-torch> /data/local/tmp
     adb push <path-to-scripted-model> /data/local/tmp


-Now we are ready to benchmark your model:
+Now we are ready to benchmark your model:

 ::

@@ -245,12 +245,12 @@ Now we are ready to benchmark your model:
     Main run finished. Microseconds per iter: 121318. Iters per second: 8.24281


-iOS - Benchmarking Setup
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+iOS - Benchmarking Setup
+^^^^^^^^^^^^^^^^^^^^^^^^

-For iOS, we'll be using our `TestApp <https://github.com/pytorch/pytorch/tree/master/ios/TestApp>`_ as the benchmarking tool.
+For iOS, we will use the `TestApp <https://github.com/pytorch/pytorch/tree/master/ios/TestApp>`_ as the benchmarking tool.

-To begin with, let's apply the ``optimize_for_mobile`` method to our python script located at `TestApp/benchmark/trace_model.py <https://github.com/pytorch/pytorch/blob/master/ios/TestApp/benchmark/trace_model.py>`_. Simply modify the code as below.
+To begin with, apply the ``optimize_for_mobile`` method to the Python script located at `TestApp/benchmark/trace_model.py <https://github.com/pytorch/pytorch/blob/master/ios/TestApp/benchmark/trace_model.py>`_. Simply modify the code as below.

 ::

@@ -265,21 +265,21 @@ To begin with, let's apply the ``optimize_for_mobile`` method to our python scri
     torchscript_model_optimized = optimize_for_mobile(traced_script_module)
     torch.jit.save(torchscript_model_optimized, "model.pt")

-Now let's run ``python trace_model.py``. If everything works well, we should be able to generate our optimized model in the benchmark directory.
+Now let's run ``python trace_model.py``. If everything works well, we should be able to generate the optimized model in the benchmark directory.

-Next, we're going to build the PyTorch libraries from source.
+Next, we build the PyTorch libraries from source.

 ::

     BUILD_PYTORCH_MOBILE=1 IOS_ARCH=arm64 ./scripts/build_ios.sh

-Now that we have the optimized model and PyTorch ready, it's time to generate our XCode project and do benchmarking. To do that, we'll be using a ruby script - `setup.rb` which does the heavy lifting jobs of setting up the XCode project.
+Now that we have the optimized model and PyTorch ready, it's time to generate the XCode project and do the benchmarking. To do that, we use a Ruby script, `setup.rb`, which does the heavy lifting of setting up the XCode project.

 ::

     ruby setup.rb

-Now open the `TestApp.xcodeproj` and plug in your iPhone, you're ready to go. Below is an example result from iPhoneX
+Now open `TestApp.xcodeproj` and plug in your iPhone; you're ready to go. Below is an example result from iPhoneX.

 ::

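Putting the steps of this recipe together, a condensed end-to-end sketch of the model preparation pipeline (the model class, calibration data, and file name are placeholders)::

    import torch
    from torch.utils.mobile_optimizer import optimize_for_mobile

    model = AnnotatedConvBnReLUModel()   # eager-mode model as defined earlier
    model.eval()

    # 1. Fuse conv + bn + relu.
    torch.quantization.fuse_modules(model, [['conv', 'bn', 'relu']], inplace=True)

    # 2. Post-training static quantization with the qnnpack backend.
    model.qconfig = torch.quantization.get_default_qconfig('qnnpack')
    torch.quantization.prepare(model, inplace=True)
    with torch.no_grad():
        for sample in [torch.randn(1, 3, 224, 224) for _ in range(10)]:
            model(sample)                # placeholder calibration data
    torch.quantization.convert(model, inplace=True)

    # 3. Script, run the mobile optimizer, and save for the mobile runtime.
    torchscript_model = torch.jit.script(model)
    torchscript_model_optimized = optimize_for_mobile(torchscript_model)
    torch.jit.save(torchscript_model_optimized, "model.pt")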
recipes_source/recipes_index.rst (+2 -2)
@@ -166,8 +166,8 @@ Recipes are bite-sized bite-sized, actionable examples of how to use specific Py
     :tags: Production,TorchScript

 .. customcarditem::
-   :header: PyTorch Mobile Performance Recipes
-   :card_description: List of recipes for performance optimizations for using PyTorch on Mobile (Android and iOS).
+   :header: PyTorch Mobile Performance Recipes
+   :card_description: A list of recipes for performance optimizations when using PyTorch on mobile (Android and iOS).
    :image: ../_static/img/thumbnails/cropped/mobile.png
    :link: ../recipes/mobile_perf.html
    :tags: Mobile,Model-Optimization
