-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathDeviceAdapter.tex
354 lines (304 loc) · 18.2 KB
/
DeviceAdapter.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
% -*- latex -*-
\chapter{Device Adapters}
\label{chap:DeviceAdapter}
\index{device adapter|(}
As multiple vendors vie to provide accelerator-type processors, a great
variance in the computer architecture exists. Likewise, there exist
multiple compiler environments and libraries for these devices such as
CUDA, OpenMP, and various threading libraries. These compiler technologies
also vary from system to system.
To make porting among these systems at all feasible, we require a base
language support, and the language we use is C++. The majority of the code
in VTK-m is constrained to the standard C++ language constructs to minimize
the specialization from one system to the next.
Each device and device technology requires some level of code
specialization, and that specialization is encapsulated in a unit called a
\keyterm{device adapter}. Thus, porting VTK-m to a new architecture can be
done by adding only a device adapter.
The device adapter is shown diagrammatically as the connection between the
control and execution environments in Figure~\ref{fig:VTKmDiagram} on
page~\pageref{fig:VTKmDiagram}. The functionality of the device adapter
comprises two main parts: a collection of parallel algorithms run in the
execution environment and a module to transfer data between the control and
execution environments.
This chapter describes how tags are used to specify which devices to use
for operations within VTK-m. The chapter also outlines the features provided
by a device adapter that allow you to directly control a device. Finally,
we document how to create a new device adapter to port or specialize VTK-m
for a different device or system.
\section{Device Adapter Tag}
\label{sec:DeviceAdapterTag}
\index{device adapter tag|(}
\index{tag!device adapter|(}
A device adapter is identified by a \keyterm{device adapter tag}. This tag,
which is simply an empty struct type, is used as the template parameter for
several classes in the VTK-m control environment and causes these classes
to direct their work to a particular device.
There are two ways to select a device adapter. The first is to make a
global selection of a default device adapter. The second is to specify a
specific device adapter as a template parameter.
\subsection{Default Device Adapter}
\label{sec:DefaultDeviceAdapter}
A default device adapter tag is specified in
\vtkmheader{vtkm/cont}{DeviceAdapter.h}. If no other
information is given, VTK-m attempts to choose a default device adapter
that is a best fit for the system it is compiled on. VTK-m currently select
the default device adapter with the following sequence of conditions.
\begin{itemize}
\item \index{CUDA}
If the source code is being compiled by CUDA, the CUDA device is used.
%% \item \index{OpenMP} If the CUDA compiler is not being used and the current
%% compiler supports OpenMP, then the OpenMP device is used.
\item \index{Intel Threading Building Blocks} \index{TBB}
If the compiler does not support CUDA and VTK-m was configured to use Intel Threading Building Blocks, then that device is used.
\item \index{serial}
If no parallel device adapters are found, then VTK-m falls back to a serial device.
\end{itemize}
You can also set the default device adapter specifically by setting the
\vtkmmacro{VTKM\_DEVICE\_ADAPTER} macro. This macro must be set
\emph{before} including any VTK-m files. You can set
\vtkmmacro{VTKM\_DEVICE\_ADAPTER} to any one of the following.
\begin{description}
\item[\vtkmmacro{VTKM\_DEVICE\_ADAPTER\_SERIAL}] Performs all computation on
the same single thread as the control environment. This device is useful
for debugging. This device is always available.
\item[\vtkmmacro{VTKM\_DEVICE\_ADAPTER\_CUDA}] Uses a CUDA capable GPU
device. For this device to work, VTK-m must be configured to use CUDA and
the code must be compiled by the CUDA \textfilename{nvcc} compiler.
%% \item[\vtkmmacro{VTKM\_DEVICE\_ADAPTER\_OPENMP}] Uses OpenMP compiler
%% extensions to run algorithms on multiple threads. For this device to
%% work, VTK-m must be configured to use OpenMP and the code must be
%% compiled with a compiler that supports OpenMP pragmas.
\item[\vtkmmacro{VTKM\_DEVICE\_ADAPTER\_TBB}] Uses the Intel Threading
Building Blocks library to run algorithms on multiple threads. For this
device to work, VTK-m must be configured to use TBB and the executable
must be linked to the TBB library.
\end{description}
These macros provide a useful mechanism for quickly reconfiguring code to
run on different devices. The following example shows a typical block of
code at the top of a source file that could be used for quick
reconfigurations.
\vtkmlisting{Macros to port VTK-m code among different devices}{DefaultDeviceAdapter.cxx}
\begin{didyouknow}
Filters do not actually use the default device adapter tag. They have a
more sophisticated device selection mechanism that determines the devices
available at runtime and will attempt running on multiple devices.
\end{didyouknow}
The default device adapter can always be overridden by specifying a device
adapter tag, as described in the next section. There is actually one more
internal default device adapter named
\vtkmmacro{VTKM\_DEVICE\_ADAPTER\_ERROR} that will cause a compile error if
any component attempts to use the default device adapter. This feature is
most often used in testing code to check when device adapters should be
specified.
\subsection{Specifying Device Adapter Tags}
In addition to setting a global default device adapter, it is possible to
explicitly set which device adapter to use in any feature provided by
VTK-m. This is done by providing a device adapter tag as a template
argument to VTK-m templated objects. The following device adapter tags are
available in VTK-m.
\index{device adapter tag!provided|(}
\index{tag!device adapter!provided|(}
\begin{description}
\item[\vtkmcont{DeviceAdapterTagSerial}] \index{serial} Performs all
computation on the same single thread as the control environment. This
device is useful for debugging. This device is always available. This tag
is defined in \vtkmheader{vtkm/cont}{DeviceAdapterSerial.h}.
\item[\vtkmcont{DeviceAdapterTagCuda}] \index{CUDA} Uses a CUDA capable
GPU device. For this device to work, VTK-m must be configured to use CUDA
and the code must be compiled by the CUDA \textfilename{nvcc}
compiler. This tag is defined in
\vtkmheader{vtkm/cont/cuda}{DeviceAdapterCuda.h}.
%% \item[\vtkmcont{DeviceAdapterTagOpenMP}] \index{OpenMP} Uses OpenMP
%% compiler extensions to run algorithms on multiple threads. For this
%% device to work, VTK-m must be configured to use OpenMP and the code must be
%% compiled with a compiler that supports OpenMP pragmas. This tag is
%% defined in \vtkmheader{vtkm/openmp/cont}{DeviceAdapterOpenMP.h}.
\item[\vtkmcont{DeviceAdapterTagTBB}]
\index{Intel Threading Building Blocks} \index{TBB} Uses the Intel
Threading Building Blocks library to run algorithms on multiple
threads. For this device to work, VTK-m must be configured to use TBB and
the executable must be linked to the TBB library. This tag is defined in
\vtkmheader{vtkm/cont/tbb}{DeviceAdapterTBB.h}.
\end{description}
\index{tag!device adapter!provided|)}
\index{device adapter tag!provided|)}
The following example uses the tag for the Intel Threading Building blocks
device adapter to prepare an output array for that device. In this case,
the device adapter tag is passed as a parameter for the
\textcode{PrepareForOutput} \index{PrepareForOutput} method of
\vtkmcont{ArrayHandle}.
\vtkmlisting{Specifying a device using a device adapter tag.}{SpecifyDeviceAdapter.cxx}
When structuring your code to always specify a particular device adapter,
consider setting the default device adapter (with the
\vtkmmacro{VTKM\_DEVICE\_ADAPTER} macro) to
\vtkmmacro{VTKM\_DEVICE\_ADAPTER\_ERROR}. This will cause the compiler to
produce an error if any object is instantiated with the default device
adapter, which checks to make sure the code properly specifies every device
adapter parameter.
VTK-m also defines a macro named
\vtkmmacro{VTKM\_DEFAULT\_DEVICE\_ADAPTER\_TAG}, which can be used in place
of an explicit device adapter tag to use the default tag. This macro is
used to create new templates that have template parameters for device
adapters that can use the default. The following example defines a functor
to be used with the \textcode{Schedule} operation (to be described later)
that is templated on the device it uses.
\vtkmlisting[ex:DefaultDeviceTemplateArg]{Specifying a default device for template parameters.}{DefaultDeviceTemplateArg.cxx}
\begin{commonerrors}
A device adapter tag is a class just like every other type in C++. Thus
it is possible to accidently use a type that is not a device adapter tag
when one is expected. This leads to unexpected errors in strange parts of
the code. To help identify these errors, it is good practice to use the
\vtkmmacro{VTKM\_IS\_DEVICE\_ADAPTER\_TAG} macro to verify the type is a
valid device adapter tag. Example~\ref{ex:DefaultDeviceTemplateArg} uses
this macro on line 4.
\end{commonerrors}
\index{tag!device adapter|)}
\index{device adapter tag|)}
\section{Device Adapter Algorithms}
\label{sec:DeviceAdapterAlgorithms}
\index{device adapter!algorithm|(}
\index{algorithm|(}
VTK-m comes with the templated class \vtkmcont{DeviceAdapterAlgorithm} that
provides a set of algorithms that can be invoked in the control environment
and are run on the execution environment. The template has a single
argument that specifies the device adapter tag.
\vtkmlisting{Prototype for \protect\vtkmcont{DeviceAdapterAlgorithm}.}{DeviceAdapterAlgorithmPrototype.cxx}
\textidentifier{DeviceAdapterAlgorithm} contains no state. It only has a
set of static methods that implement its algorithms. The following methods
are available.
\begin{didyouknow}
Many of the following device adapter algorithms take input and output
\textidentifier{ArrayHandle}s, and these functions will handle their own
memory management. This means that it is unnecessary to allocate output
arrays. \index{Allocate} For example, it is unnecessary to call
\textidentifier{ArrayHandle}\textcode{::Allocate()} for the output array
passed to the \textcode{Copy} method.
\end{didyouknow}
\begin{description}
\item[\textcode{Copy}] \index{copy} Copies data from an input array to an
output array. The copy takes place in the execution environment.
\item[\textcode{LowerBounds}] \index{lower bounds} The
\textcode{LowerBounds} method takes three arguments. The first argument
is an \textidentifier{ArrayHandle} of sorted values. The second argument
is another \textidentifier{ArrayHandle} of items to find in the first
array. \textcode{LowerBounds} find the index of the first item that is
greater than or equal to the target value, much like the
\textcode{std::lower\_bound} STL algorithm. The results are returned in
an \textidentifier{ArrayHandle} given in the third argument.
There are two specializations of \textcode{LowerBounds}. The first takes
an additional comparison function that defines the less-than
operation. The second takes only two parameters. The first is an
\textidentifier{ArrayHandle} of sorted \vtkm{Id}s and the second is an
\textidentifier{ArrayHandle} of \vtkm{Id}s to find in the first list. The
results are written back out to the second array. This second
specialization is useful for inverting index maps.
\item[\textcode{Reduce}] \index{reduce} The \textcode{Reduce} method takes
an input array, initial value, and a binary function and computes a
``total'' of applying the binary function to all entries in the array.
The provided binary function must be associative (but it need not be
commutative). There is a specialization of \textcode{Reduce} that does
not take a binary function and computes the sum.
\item[\textcode{ReduceByKey}] \index{reduce by key} The
\textcode{ReduceByKey} method works similarly to the \textcode{Reduce}
method except that it takes an additional array of keys, which must be
the same length as the values being reduced. The arrays are partitioned
into segments that have identical adjacent keys, and a separate reduction
is performed on each partition. The unique keys and reduced values are
returned in separate arrays.
\item[\textcode{ScanInclusive}] \index{scan!inclusive} The
\textcode{ScanInclusive} method takes an input and an output
\textidentifier{ArrayHandle} and performs a running sum on the input
array. The first value in the output is the same as the first value in
the input. The second value in the output is the sum of the first two
values in the input. The third value in the output is the sum of the
first three values of the input, and so on. \textcode{ScanInclusive}
returns the sum of all values in the input. There are two forms of
\textcode{ScanInclusive}: one performs the sum using addition whereas the
other accepts a custom binary function to use as the sum operator.
\item[\textcode{ScanExclusive}] \index{scan!exclusive} The
\textcode{ScanExclusive} method takes an input and an output
\textidentifier{ArrayHandle} and performs a running sum on the input
array. The first value in the output is always 0. The second value in the
output is the same as the first value in the input. The third value in
the output is the sum of the first two values in the input. The fourth
value in the output is the sum of the first three values of the input,
and so on. \textcode{ScanExclusive} returns the sum of all values in the
input. There are two forms of \textcode{ScanExclusive}: one performs the
sum using addition whereas the other accepts a custom binary function to
use as the sum operator and a custom initial value to use in the running
sum.
\item[\textcode{Schedule}] \index{schedule} The \textcode{Schedule} method
takes a functor as its first argument and invokes it a number of times
specified by the second argument. It should be assumed that each
invocation of the functor occurs on a separate thread although in
practice there could be some thread sharing.
There are two versions of the \textcode{Schedule} method. The first
version takes a \vtkm{Id} and invokes the functor that number of
times. The second version takes a \vtkm{Id3} and invokes the functor once
for every entry in a 3D array of the given dimensions.
The functor is expected to be an object with a const overloaded
parentheses operator. The operator takes as a parameter the index of the
invocation, which is either a \vtkm{Id} or a \vtkm{Id3} depending on what
version of \textcode{Schedule} is being used. The functor must also
subclass \vtkmexec{FunctorBase}, which provides the error handling
facilities for the execution environment. \textidentifier{FunctorBase}
contains a public method named \index{RaiseError}
\index{errors!execution environment} \textcode{RaiseError} that takes a
message and will cause a \vtkmcont{ErrorExecution} exception to be thrown
in the control environment.
\item[\textcode{Sort}] \index{sort} The \textcode{Sort} method provides an
unstable sort of an array. There are two forms of the \textcode{Sort}
method. The first takes an \textidentifier{ArrayHandle} and sorts the
values in place. The second takes an additional argument that is a
functor that provides the comparison operation for the sort.
\item[\textcode{SortByKey}] \index{sort!by key} The \textcode{SortByKey}
method works similarly to the \textcode{Sort} method except that it takes
two \textidentifier{ArrayHandle}s: an array of keys and a corresponding
array of values. The sort orders the array of keys in ascending values
and also reorders the values so they remain paired with the same
key. Like \textcode{Sort}, \textcode{SortByKey} has a version that sorts
by the default less-than operator and a version that accepts a custom
comparison functor.
\item[\textcode{StreamCompact}] \index{stream compact} The
\textcode{StreamCompact} method selectively removes values from an
array. The first argument is an \textidentifier{ArrayHandle} to be
compacted and the second argument is an \textidentifier{ArrayHandle} of
equal size with flags indicating whether the corresponding input value is
to be copied to the output. The third argument is an output
\textidentifier{ArrayHandle} whose length is set to the number of true
flags in the stencil and the passed values are put in order to the output
array.
There is also a second form of \textidentifier{StreamCompact} that only
has the stencil and output as arguments. In this version, the output gets
the corresponding index of where the input should be taken from.
\item[\textcode{Synchronize}] \index{synchronize} The
\textidentifier{Synchronize} method waits for any asynchronous operations
running on the device to complete and then returns.
\item[\textcode{Unique}] \index{unique} The \textcode{Unique} method
removes all duplicate values in an \textidentifier{ArrayHandle}. The
method will only find duplicates if they are adjacent to each other in
the array. The easiest way to ensure that duplicate values are adjacent
is to sort the array first.
There are two versions of \textcode{Unique}. The first uses the equals
operator to compare entries. The second accepts a binary functor to
perform the comparisons.
\item[\textcode{UpperBounds}] \index{upper bounds} The
\textcode{UpperBounds} method takes three arguments. The first argument
is an \textidentifier{ArrayHandle} of sorted values. The second argument
is another \textidentifier{ArrayHandle} of items to find in the first
array. \textcode{UpperBounds} find the index of the first item that is
greater than to the target value, much like the
\textcode{std::upper\_bound} STL algorithm. The results are returned in
an \textidentifier{ArrayHandle} given in the third argument.
There are two specializations of \textcode{UpperBounds}. The first takes
an additional comparison function that defines the less-than
operation. The second takes only two parameters. The first is an
\textidentifier{ArrayHandle} of sorted \vtkm{Id}s and the second is an
\textidentifier{ArrayHandle} of \vtkm{Id}s to find in the first list. The
results are written back out to the second array. This second
specialization is useful for inverting index maps.
\end{description}
\index{algorithm|)}
\index{device adapter!algorithm|)}
\index{device adapter|)}