
Commit 295ea4c

docs(guide): convert /tutorias/gridfs/* to rst
Fixes NODE-2206 Fixes NODE-2207
1 parent 31a5c9b commit 295ea4c

1 file changed: +207 −0 lines changed

docs/guide/tutorials/gridfs.txt

======
GridFS
======

:manual:`GridFS </core/gridfs/>` is a specification for storing and
retrieving files that exceed the :manual:`BSON-document size limit </reference/limits/#limit-bson-document-size>`
of 16 megabytes.

Instead of storing a file in a single document, GridFS divides a file into parts, or chunks,
and stores each of those chunks as a separate document. By default, GridFS limits chunk size
to 255 kilobytes. GridFS uses two collections to store files: the ``chunks`` collection, which
stores the file chunks, and the ``files`` collection, which stores the file metadata.

When you query a GridFS store for a file, the driver or client will reassemble the chunks as
needed. GridFS is useful not only for storing files that exceed 16 megabytes but also for
storing any files that you want to access without having to load the entire file into memory.

The Node.js driver supports GridFS with an API that is compatible with
`Node Streams <https://nodejs.org/dist/latest/docs/api/stream.html>`_, so you can ``.pipe()``
directly from file streams to MongoDB. In this tutorial, you will see how to use the GridFS
streaming API to upload
`a CC-licensed 28 MB recording of the overture from Richard Wagner's opera *Die Meistersinger von Nurnberg* <https://musopen.org/music/213/richard-wagner/die-meistersinger-von-nurnberg-overture/>`_
to MongoDB using streams.
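
As a quick illustration of that claim, a direct ``.pipe()`` from a file read stream into an
upload stream looks roughly like the sketch below. It assumes a ``bucket`` (an existing
``GridFSBucket``) and the ``fs`` module are already in scope; the full setup is shown in the
next section.

.. code-block:: js

   // Minimal sketch: pipe a local file straight into GridFS.
   // `bucket` is assumed to be an existing GridFSBucket instance.
   fs.createReadStream('./meistersinger.mp3')
     .pipe(bucket.openUploadStream('meistersinger.mp3'))
     .on('error', err => console.error(err))
     .on('finish', () => console.log('upload complete'));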

Uploading a File
----------------

You can use GridFS to upload a file to MongoDB. This example
assumes that you have a file named ``meistersinger.mp3`` in the
root directory of your project. You can use whichever file you want, or you
can just download a `\ *Die Meistersinger* Overture mp3 <https://musopen.org/music/213/richard-wagner/die-meistersinger-von-nurnberg-overture/>`_.

In order to use the streaming GridFS API, you first need to create
a ``GridFSBucket``.

.. code-block:: js

   const { MongoClient, GridFSBucket } = require('mongodb');
   const { createReadStream, createWriteStream } = require('fs');
   const { pipeline } = require('stream');
   const { promisify } = require('util');

   // Allows us to use async/await with streams
   const pipelineAsync = promisify(pipeline);

   const uri = 'mongodb://localhost:27017';

   const client = new MongoClient(uri);

   async function main(client) {
     const db = client.db('test');
     const bucket = new GridFSBucket(db);
   }

   // Function to connect to the server and run your code
   async function run() {
     try {
       // Connect the client to the server
       await client.connect();
       console.log('Connected successfully to server');

       await main(client);
     } finally {
       // Ensures that the client will close when you finish/error
       await client.close();
     }
   }

   // Runs your code
   run();

The bucket has an ``openUploadStream()`` method that creates an upload stream for a given
file name. You can pipe a Node.js ``fs`` read stream to the upload stream.

.. code-block:: js

   async function main(client) {
     const db = client.db('test');
     const bucket = new GridFSBucket(db);

     await pipelineAsync(
       createReadStream('./meistersinger.mp3'),
       bucket.openUploadStream('meistersinger.mp3')
     );
     console.log('done!');
   }

Assuming that your ``test`` database was empty, you should see that the above
script created 2 collections in your ``test`` database: ``fs.chunks`` and
``fs.files``. The ``fs.files`` collection contains high-level metadata about
the files stored in this bucket. For instance, the file you just uploaded
has a document that looks like what you see below.

.. code-block:: js

   > db.fs.files.findOne()
   {
     "_id" : ObjectId("561fc381e81346c82d6397bb"),
     "length" : 27847575,
     "chunkSize" : 261120,
     "uploadDate" : ISODate("2015-10-15T15:17:21.819Z"),
     "md5" : "2459f1cdec4d9af39117c3424326d5e5",
     "filename" : "meistersinger.mp3"
   }

The above document indicates that the file is named ``meistersinger.mp3``, and tells
you its size in bytes, when it was uploaded, and the
`md5 <https://en.wikipedia.org/wiki/MD5>`_ hash of the contents. There's also a
``chunkSize`` field indicating that the file is
broken up into chunks of 255 kilobytes, which is the
default.

.. code-block:: js

   > db.fs.chunks.count()
   107

Not surprisingly, 27847575/261120 is approximately 106.65, so the ``fs.chunks``
collection contains 106 chunks of size 255 KB and 1 chunk that's roughly
255 KB * 0.65. Each individual chunk document is similar to the document below.

.. code-block:: js

   > db.fs.chunks.findOne({}, { data: 0 })
   {
     "_id" : ObjectId("561fc381e81346c82d6397bc"),
     "files_id" : ObjectId("561fc381e81346c82d6397bb"),
     "n" : 0
   }

The chunk document keeps track of which file it belongs to and its order in
the list of chunks. The chunk document also has a ``data`` field that contains
the raw bytes of the file.
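
If you want to inspect this structure yourself, one way (shown as a rough sketch, assuming the
default ``fs`` prefix and an existing connected ``db`` handle; the ``listChunks`` helper name is
just for illustration) is to query the ``fs.chunks`` collection directly, matching on
``files_id`` and sorting by ``n`` to recover the chunk order:

.. code-block:: js

   // Rough sketch: list a file's chunks in order, omitting the raw bytes.
   // Assumes `db` is a connected Db instance and the default bucket name ('fs').
   async function listChunks(db, filename) {
     const file = await db.collection('fs.files').findOne({ filename });
     if (!file) throw new Error(`no GridFS file named ${filename}`);
     return db.collection('fs.chunks')
       .find({ files_id: file._id }, { projection: { data: 0 } })
       .sort({ n: 1 })
       .toArray();
   }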

You can configure both the chunk size and the ``fs`` prefix for the files and
chunks collections at the bucket level. For instance, if you specify the
``chunkSizeBytes`` and ``bucketName`` options as shown below, you'll get
27195 chunks in the ``songs.chunks`` collection.

.. code-block:: js

   async function main(client) {
     const db = client.db('test');
     const bucket = new GridFSBucket(db, {
       chunkSizeBytes: 1024,
       bucketName: 'songs'
     });

     await pipelineAsync(
       createReadStream('./meistersinger.mp3'),
       bucket.openUploadStream('meistersinger.mp3')
     );
     console.log('done!');
   }

Downloading a File
------------------

Congratulations, you've successfully uploaded a file to MongoDB! However,
a file sitting in MongoDB isn't particularly useful. In order to stream the
file to your hard drive, an HTTP response, or to npm modules like
`speaker <https://www.npmjs.com/package/speaker>`_\ , you're going to need
a download stream. The easiest way to get a download stream is
the ``openDownloadStreamByName()`` method.

.. code-block:: js

   async function main(client) {
     const db = client.db('test');
     const bucket = new GridFSBucket(db, {
       chunkSizeBytes: 1024,
       bucketName: 'songs'
     });

     await pipelineAsync(
       bucket.openDownloadStreamByName('meistersinger.mp3'),
       createWriteStream('./output.mp3')
     );
     console.log('done!');
   }

Now, you have an ``output.mp3`` file that's a copy of the original
``meistersinger.mp3`` file. The download stream also enables you to do some
neat tricks. For instance, you can cut off the beginning of the song by
specifying a number of bytes to skip. You can cut off the first 41 seconds of
the mp3 and skip right to the good part of the song as shown below.

.. code-block:: js

   async function main(client) {
     const db = client.db('test');
     const bucket = new GridFSBucket(db, {
       chunkSizeBytes: 1024,
       bucketName: 'songs'
     });

     await pipelineAsync(
       bucket.openDownloadStreamByName('meistersinger.mp3').start(1024 * 1585),
       createWriteStream('./output.mp3')
     );
     console.log('done!');
   }

An important point to be aware of regarding performance is that the GridFS
streaming API can't load partial chunks. When a download stream needs to pull a
chunk from MongoDB, it pulls the entire chunk into memory. The 255 kilobyte default
chunk size is usually sufficient, but you can reduce the chunk size to reduce
memory overhead.
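
For example, if your application is memory-constrained and serves many concurrent downloads,
you might create the bucket with a smaller ``chunkSizeBytes``. The sketch below assumes a ``db``
handle as in the earlier examples; the 64 KB figure is purely illustrative, not a recommendation.

.. code-block:: js

   // Sketch: a bucket with a smaller chunk size to lower per-chunk memory use.
   // The 64 KB value here is illustrative, not a recommendation.
   const bucket = new GridFSBucket(db, {
     chunkSizeBytes: 64 * 1024
   });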
