-
Notifications
You must be signed in to change notification settings - Fork 77
/
Copy pathindex.html
261 lines (237 loc) · 7.57 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en ">
<head>
<link rel="icon" type="image/png" href="https://wiki.postgresql.org/images/3/30/PostgreSQL_logo.3colors.120x120.png" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>pg_bulkload: Project Home Page</title>
<link rel="stylesheet" TYPE="text/css"href="style.css">
</head>
<body>
<Div Align="right"><h2><a href="index_ja.html"> Japanese</a></h2></Div>
<center><img style="border: none; margin-left: auto; margin-right: auto; " src="https://wiki.postgresql.org/images/3/30/PostgreSQL_logo.3colors.120x120.png" height="75" width="75" />
<hr />
<h1>Welcome to the pg_bulkload Project Home Page</h1>
<hr />
</center>
<p>
pg_bulkload is a high speed data loading utility for PostgreSQL.
</p>
<p>
<a href="index_ja.html">日本語ページはこちら</a>
</p>
<hr />
<h1 id="documentation">Documentation</h1>
<ul>
<li>
<a href="pg_bulkload.html">High speed data loader: <strong>pg_bulkload</strong> documentation</a>
<ul>
<li>pg_bulkload provides high-speed data loading capability to PostgreSQL users.</li>
</ul>
</li>
<li>
<a href="pg_timestamp.html">Optional tool : <strong>pg_timestamp</strong> documentation</a>
<ul>
<li>pg_timestamp ia an optional tool to skip parsing overhead of timestamp string.</li>
</ul>
</li>
</ul>
<hr />
<h1 id="performance">Performance Results</h1>
<p>
Here is a comparison of the performance of PostgreSQL's COPY and pg_bulkload.
Performance was measured with basic-tuned PostgreSQL server.
</p>
<dl>
<dt>Table definition</dt>
<dd>
Customer table in <a href="http://www.osdl.net/lab_activities/kernel_testing/osdl_database_test_suite/osdl_dbt-2/">DBT-2</a> benchmark, that is an implementation of <a href="http://www.tpc.org/tpcc/">TPC-C</a>.</li>
<dt>Index definition</dt>
<dd>
There are 2 indexes.
The first one is a primary key with one ascending integer column.
The second one is a non-unique index with one random integer column.
</dd>
</dl>
<p>
There are the following measurement patterns.
</p>
<ol>
<li>Initial data loading to an empty table for 4GB of data</li>
<li>Appended data loading to a table with 4GB of data for 1GB of new data</li>
<li>Performance efficiencies by maintenance_work_mem and FILTER features</li>
</ol>
<h2>Result 1: Initial data loading</h2>
<p>
Pg_bulkload on WRITER = PARALLEL mode can load data with an almost half time against COPY.
In PARALLEL mode, performance will be improved on multi-CPU server because reading an input file and writing rows to a table are done with two processes.
</p>
<p>
COPY also can be more fast by loading into table without indexes and create indexes after it.
</p>
<table style="float:left">
<caption>Initial Load (4GB) </caption>
<thead>
<tr>
<th>Item</th>
<th>Duration</th>
<th>Duration comparison</th>
</tr>
</thead>
<tbody>
<tr>
<td>COPY with indexes</td>
<td align="right">500 sec</td>
<td align="center">-</td>
</tr>
<tr>
<td>COPY without indexes<br>+ CREATE INDEX</td>
<td align="right">333 sec <br>(229 sec + 51 sec+ 53 sec)</td>
<td align="right">66.7 %</td>
</tr>
<tr>
<td>pg_bulkload (DIRECT)<br>with indexes</td>
<td align="right">334 sec</td>
<td align="right">66.8 %</td>
</tr>
<tr>
<td>pg_bulkload (PARALLEL)<br>with indexes</td>
<td align="right">221 sec</td>
<td align="right">44.2 %</td>
</tr>
</tbody>
</table>
<div style="float:left">
<img src="./img/initial_load_93.png">
</div>
<h2 style="clear:both">Result 2: Appended data loading</h2>
<p>
Pg_bulkload on WRITER = PARALLEL mode can load data with an almost half time against COPY in this case also.
COPY without indexes is not faster than COPY with indexes. Because it has to create indexes for total records of the table from initial.
</p>
<table style="float:left">
<caption>Appended Load (1GB)</caption>
<thead>
<tr>
<th>Item</th>
<th>Duration</th>
<th>Duration comparison</th>
</tr>
</thead>
<tbody>
<tr>
<td>COPY with indexes</td>
<td align="right">140 sec</td>
<td align="center">-</td>
</tr>
<tr>
<td>COPY without indexes<br>+ CREATE INDEX</td>
<td align="right">187 sec <br>(62 sec + 60 sec + 65 sec)</td>
<td align="right">133.6 %</td>
</tr>
<tr>
<td>pg_bulkload (DIRECT)<br>with indexes</td>
<td align="right">93 sec</td>
<td align="right">66.4 %</td>
</tr>
<tr>
<td>pg_bulkload (PARALLEL)<br>with indexes</td>
<td align="right">70 sec</td>
<td align="right">50.0 %</td>
</tr>
</tbody>
</table>
<div style="float:left">
<img src="./img/appended_load_93.png">
</div>
<h2 style="clear:both">Result 3: Influence from parameters and features</h2>
<p>
The maintenece_work_mem, PostgreSQL parameter, affects the performance of pg_bulkload.
The duration becomes almost 15 % shorter if this parameter is changed from 64 MB to 1 GB.
</p>
<p>
FILTER feature transforms input data in various operations, but it's not free.
The actual measurement shows the loading time is increased to almost 240 % with SQL functions and almost 140 % with C functions.
</p>
<table style="float:left">
<caption>pg_bulkload features</caption>
<thead>
<tr>
<th rowspan="2" colspan="2">項目</th>
<th colspan="2">時間</th>
</tr>
<tr>
<th>Initial (4GB)</th>
<th>Appended (1GB)</th>
</tr>
</thead>
<tbody>
<tr>
<td>MWM = 64 MB</td>
<td>DIRECT</td>
<td align="right">397 sec</td>
<td align="right">109 sec</td>
</tr>
<tr>
<td rowspan="4">MWM = 1 GB</td>
</tr>
<tr>
<td>DIRECT</td>
<td align="right">334 sec</td>
<td align="right">93 sec</td>
</tr>
<tr>
<td>DIRECT with SQL FILTER</td>
<td align="right">801 sec</td>
<td align="right">216 sec</td>
</tr>
<tr>
<td>DIRECT with C FILTER</td>
<td align="right">456 sec</td>
<td align="right">126 sec</td>
</tr>
</tbody>
</table>
<div style="float:left">
<img src="./img/filter.png">
</div>
<h2 style="clear:both">Conditions</h2>
<table>
<thead>
<tr><th>Item</th><th>Value</th></tr>
</thead>
<tbody>
<tr><td>Server</td><td>Dell PowerEdge R410</td></tr>
<tr><td>CPU</td><td>Intel Xeon E5645 (2.4GHz) 12 core * 2</td></tr>
<tr><td>Memory</td><td>32GB</td></tr>
<tr><td>Disks</td><td>SAS 10000rpm 2TB * 4</td></tr>
<tr><td>OS</td><td>CentOS 6.2 (64bit)</td></tr>
<tr><td>PostgreSQL version</td><td>9.3.4</td></tr>
<tr><td>pg_bulkload version</td><td>3.1.6</td></tr>
<tr><td>shared_buffers</td><td>3210MB</td></tr>
<tr><td>checkpoint_segments</td><td>300</td></tr>
<tr><td>checkpoint_timeout</td><td>15min</td></tr>
<tr><td>work_mem</td><td>1MB</td></tr>
<tr><td>maintenance_work_mem</td><td>1GB</td></tr>
<tr><td>Table definition</td><td>DBT-2 customer table</td></tr>
<tr><td rowspan="2">Indexed columns</td><td>c_id (PRIMARY KEY)</td></tr>
<tr><td>c_d_id (non-unique B-Tree)</td></tr>
<tr><td>Constraints</td><td>NOT NULL for all columns</td></tr>
<tr><td>Input file format</td><td>CSV</td></tr>
</tbody>
</table>
<hr />
<p class="footer">Copyright (c) 2007-2025, NIPPON TELEGRAPH AND TELEPHONE CORPORATION</p>
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
try {
var pageTracker = _gat._getTracker("UA-10244036-1");
pageTracker._trackPageview();
} catch(err) {}</script>
</body>
</html>