======
GridFS
======

:manual:`GridFS </core/gridfs/>` is a specification for storing and
retrieving files that exceed the :manual:`BSON-document size limit </reference/limits/#limit-bson-document-size>`
of 16 megabytes.

Instead of storing a file in a single document, GridFS divides a file into parts, or
chunks, and stores each of those chunks as a separate document. By default, GridFS limits
chunk size to 255 kilobytes. GridFS uses two collections to store files: the ``chunks``
collection, which stores the file chunks, and the ``files`` collection, which stores the
file metadata.

When you query a GridFS store for a file, the driver or client reassembles the chunks as
needed. GridFS is useful not only for storing files that exceed 16 megabytes but also for
storing any file that you want to access without loading it entirely into memory.

The Node.js driver supports GridFS with an API that is compatible with
`Node Streams <https://nodejs.org/dist/latest/docs/api/stream.html>`_, so you can ``.pipe()``
directly from file streams to MongoDB. In this tutorial, you will see how to use the GridFS
streaming API to upload
`a CC-licensed 28 MB recording of the overture from Richard Wagner's opera Die Meistersinger von Nurnberg <https://musopen.org/music/213/richard-wagner/die-meistersinger-von-nurnberg-overture/>`_
to MongoDB using streams.

Uploading a File
----------------

You can use GridFS to upload a file to MongoDB. This example assumes that you have a file
named ``meistersinger.mp3`` in the root directory of your project. You can use whichever
file you want, or you can download the
`Die Meistersinger Overture mp3 <https://musopen.org/music/213/richard-wagner/die-meistersinger-von-nurnberg-overture/>`_.

To use the streaming GridFS API, you first need to create a ``GridFSBucket``.

.. code-block:: js

   const { MongoClient, GridFSBucket } = require('mongodb');
   const { createReadStream, createWriteStream } = require('fs');
   const { pipeline } = require('stream');
   const { promisify } = require('util');

   // Promisify pipeline so we can use async/await with streams
   const pipelineAsync = promisify(pipeline);

   const uri = 'mongodb://localhost:27017';

   const client = new MongoClient(uri);

   async function main(client) {
     const db = client.db('test');
     const bucket = new GridFSBucket(db);
   }

   // Connect to the server, run main(), and always clean up
   async function run() {
     try {
       // Connect the client to the server
       await client.connect();
       console.log('Connected successfully to server');

       await main(client);
     } finally {
       // Ensures that the client closes when you finish or hit an error
       await client.close();
     }
   }

   run();

The bucket has an ``openUploadStream()`` method that creates an upload stream for a given
file name. You can pipe a Node.js ``fs`` read stream into the upload stream.

.. code-block:: js

   async function main(client) {
     const db = client.db('test');
     const bucket = new GridFSBucket(db);

     await pipelineAsync(
       createReadStream('./meistersinger.mp3'),
       bucket.openUploadStream('meistersinger.mp3')
     );
     console.log('done!');
   }

Assuming that your ``test`` database was empty, you should see that the script above
created two collections in your ``test`` database: ``fs.chunks`` and ``fs.files``. The
``fs.files`` collection contains high-level metadata about the files stored in this
bucket. For instance, the file you just uploaded has a document that looks like the one
below.

.. code-block:: js

   > db.fs.files.findOne()
   {
     "_id" : ObjectId("561fc381e81346c82d6397bb"),
     "length" : 27847575,
     "chunkSize" : 261120,
     "uploadDate" : ISODate("2015-10-15T15:17:21.819Z"),
     "md5" : "2459f1cdec4d9af39117c3424326d5e5",
     "filename" : "meistersinger.mp3"
   }

The document above indicates that the file is named ``meistersinger.mp3``, and tells you
its size in bytes, when it was uploaded, and the
`md5 <https://en.wikipedia.org/wiki/MD5>`_ checksum of its contents. There's also a
``chunkSize`` field indicating that the file is broken up into chunks of 255 kilobytes,
which is the default.

.. code-block:: js

   > db.fs.chunks.count()
   107
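
The chunk count follows directly from the file's ``length`` and ``chunkSize``. As a quick
sanity check, here is a minimal sketch of the arithmetic, using the values from the
``fs.files`` document above:

.. code-block:: js

   // Values copied from the fs.files document shown earlier
   const length = 27847575;  // total file size in bytes
   const chunkSize = 261120; // 255 KB, the default chunk size

   // GridFS needs ceil(length / chunkSize) chunks to hold the whole file
   console.log(Math.ceil(length / chunkSize)); // 107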

Not surprisingly, 27847575/261120 is approximately 106.64, so the ``fs.chunks``
collection contains 106 chunks of 255 KB and one final chunk that's roughly
255 KB * 0.64. Each individual chunk document is similar to the document below.

.. code-block:: js

   > db.fs.chunks.findOne({}, { data: 0 })
   {
     "_id" : ObjectId("561fc381e81346c82d6397bc"),
     "files_id" : ObjectId("561fc381e81346c82d6397bb"),
     "n" : 0
   }

The chunk document keeps track of which file it belongs to (``files_id``) and its order in
the list of chunks (``n``). It also has a ``data`` field that contains the raw bytes for
that chunk of the file.
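
If you want to inspect the chunks for a given file yourself, you can query the chunks
collection by ``files_id`` and sort on ``n``. A minimal sketch, assuming ``fileId`` holds
the ``_id`` value from the ``fs.files`` document:

.. code-block:: js

   // Fetch every chunk for one file, in upload order, omitting the raw bytes
   const chunks = await db.collection('fs.chunks')
     .find({ files_id: fileId }, { projection: { data: 0 } })
     .sort({ n: 1 })
     .toArray();
   console.log(chunks.length); // 107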

You can configure both the chunk size and the ``fs`` prefix for the files and chunks
collections at the bucket level. For instance, if you specify the ``chunkSizeBytes`` and
``bucketName`` options as shown below, you'll get 27195 chunks in the ``songs.chunks``
collection.

.. code-block:: js

   async function main(client) {
     const db = client.db('test');
     const bucket = new GridFSBucket(db, {
       chunkSizeBytes: 1024,
       bucketName: 'songs'
     });

     await pipelineAsync(
       createReadStream('./meistersinger.mp3'),
       bucket.openUploadStream('meistersinger.mp3')
     );
     console.log('done!');
   }
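
You can verify the chunk count once the upload completes. A minimal sketch, assuming it
runs inside ``main()`` after ``pipelineAsync`` resolves:

.. code-block:: js

   // bucketName: 'songs' changes the collection prefix from 'fs' to 'songs'
   const count = await db.collection('songs.chunks').countDocuments();
   console.log(count); // 27195, since ceil(27847575 / 1024) === 27195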

Downloading a File
------------------

Congratulations, you've successfully uploaded a file to MongoDB! However, a file sitting
in MongoDB isn't particularly useful. To stream the file to your hard drive, to an HTTP
response, or to npm modules like `speaker <https://www.npmjs.com/package/speaker>`_,
you need a download stream. The easiest way to get one is the
``openDownloadStreamByName()`` method.

.. code-block:: js

   async function main(client) {
     const db = client.db('test');
     const bucket = new GridFSBucket(db, {
       chunkSizeBytes: 1024,
       bucketName: 'songs'
     });

     await pipelineAsync(
       bucket.openDownloadStreamByName('meistersinger.mp3'),
       createWriteStream('./output.mp3')
     );
     console.log('done!');
   }

Now you have an ``output.mp3`` file that's a copy of the original ``meistersinger.mp3``
file. The download stream also enables some neat tricks. For instance, you can cut off
the beginning of the song by specifying a number of bytes to skip. The example below skips
the first 41 seconds of the mp3 and jumps right to the good part of the song.

.. code-block:: js

   async function main(client) {
     const db = client.db('test');
     const bucket = new GridFSBucket(db, {
       chunkSizeBytes: 1024,
       bucketName: 'songs'
     });

     await pipelineAsync(
       bucket.openDownloadStreamByName('meistersinger.mp3').start(1024 * 1585),
       createWriteStream('./output.mp3')
     );
     console.log('done!');
   }
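
The download stream also has an ``end()`` method, so you can combine ``start()`` and
``end()`` to extract just a byte range. A minimal sketch that writes roughly one megabyte
starting after the skipped intro (the offsets and the ``clip.mp3`` name are illustrative):

.. code-block:: js

   await pipelineAsync(
     bucket.openDownloadStreamByName('meistersinger.mp3')
       .start(1024 * 1585)              // skip the first ~1.6 MB
       .end(1024 * 1585 + 1024 * 1024), // stop about 1 MB later
     createWriteStream('./clip.mp3')
   );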

One important performance consideration: the GridFS streaming API can't load partial
chunks. When a download stream needs to pull a chunk from MongoDB, it pulls the entire
chunk into memory. The 255 kilobyte default chunk size is usually sufficient, but you can
reduce the chunk size to reduce memory overhead.
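
For workloads where that per-chunk memory cost matters, you can create the bucket with a
smaller ``chunkSizeBytes``. A minimal sketch, assuming a 64 KB chunk size is an acceptable
trade-off between memory use and document count for your files:

.. code-block:: js

   // Smaller chunks mean less memory per read, but more chunk documents per file
   const bucket = new GridFSBucket(db, {
     chunkSizeBytes: 64 * 1024,
     bucketName: 'songs'
   });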