Commit 51d497c

DOCSP-45207 Transform data with aggregation (#131)
1 parent e9ead1c commit 51d497c

3 files changed: +324 -5 lines changed

source/aggregation.txt

+258
@@ -0,0 +1,258 @@
.. _ruby-aggregation:

====================================
Transform Your Data with Aggregation
====================================

.. facet::
   :name: genre
   :values: reference

.. meta::
   :keywords: code example, transform, computed, pipeline
   :description: Learn how to use the Ruby driver to perform aggregation operations.

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 2
   :class: singlecol

.. TODO:
   .. toctree::
      :titlesonly:
      :maxdepth: 1

      /aggregation/aggregation-tutorials

Overview
--------

In this guide, you can learn how to use the {+driver-short+} to perform
**aggregation operations**.

Aggregation operations process data in your MongoDB collections and
return computed results. The MongoDB Aggregation framework, which is
part of the Query API, is modeled on the concept of data processing
pipelines. Documents enter a pipeline that contains one or more stages,
and this pipeline transforms the documents into an aggregated result.

An aggregation operation is similar to a car factory. A car factory has
an assembly line, which contains assembly stations with specialized
tools to do specific jobs, like drills and welders. Raw parts enter the
factory, and then the assembly line transforms and assembles them into a
finished product.

The **aggregation pipeline** is the assembly line, **aggregation stages** are the
assembly stations, and **operator expressions** are the
specialized tools.
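
The assembly-line idea can be sketched in plain Ruby without a database. This is only an analogy and does not use the {+driver-short+}: each "stage" is a lambda that receives an array of documents and returns a new array, and the "pipeline" is the ordered list of those lambdas. The sample documents and the ``label`` field are hypothetical.

```ruby
# Plain-Ruby analogy only: this does not call the MongoDB driver.
# Each "stage" takes an array of documents and returns a new array.
keep_bakeries = ->(docs) { docs.select { |doc| doc['cuisine'] == 'Bakery' } }
add_label = lambda do |docs|
  docs.map { |doc| doc.merge('label' => "#{doc['name']} (#{doc['borough']})") }
end

# The "pipeline" is an ordered list of stages.
stages = [keep_bakeries, add_label]

# Hypothetical sample documents.
documents = [
  { 'name' => 'Rye & Co', 'cuisine' => 'Bakery', 'borough' => 'Queens' },
  { 'name' => 'Noodle Bar', 'cuisine' => 'Chinese', 'borough' => 'Bronx' }
]

# Feed the documents through each stage in order.
result = stages.reduce(documents) { |docs, stage| stage.call(docs) }
puts result.first['label']
```

Because each stage only sees the output of the stage before it, reordering the list changes the result, which is also true of a real aggregation pipeline.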

Compare Aggregation and Find Operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following table lists the different tasks that find
operations can perform and compares them to what aggregation
operations can perform. The aggregation framework provides
expanded functionality that allows you to transform and manipulate
your data.

.. list-table::
   :header-rows: 1
   :widths: 50 50

   * - Find Operations
     - Aggregation Operations

   * - | Select certain documents to return
       | Select which fields to return
       | Sort the results
       | Limit the results
       | Count the results
     - | Select certain documents to return
       | Select which fields to return
       | Sort the results
       | Limit the results
       | Count the results
       | Rename fields
       | Compute new fields
       | Summarize data
       | Connect and merge data sets

Limitations
~~~~~~~~~~~

Consider the following limitations when performing aggregation operations:

- Returned documents must not exceed the
  :manual:`BSON document size limit </reference/limits/#mongodb-limit-BSON-Document-Size>`
  of 16 megabytes.
- Pipeline stages have a memory limit of 100 megabytes by default. You can exceed this
  limit by passing a value of ``true`` to the ``allow_disk_use`` method and chaining the
  method to ``aggregate``.
- The :manual:`$graphLookup </reference/operator/aggregation/graphLookup/>`
  stage has a strict memory limit of 100 megabytes and ignores the
  value passed to the ``allow_disk_use`` method.

.. _ruby-run-aggregation:

Run Aggregation Operations
--------------------------

.. note:: Sample Data

   The examples in this guide use the ``restaurants`` collection in the ``sample_restaurants``
   database from the :atlas:`Atlas sample datasets </sample-data>`. To learn how to create a
   free MongoDB Atlas cluster and load the sample datasets, see the :atlas:`Get Started with Atlas
   </getting-started>` guide.

To perform an aggregation, define each pipeline stage as a Ruby ``Hash``, collect the
stages in an array, and then pass that array to the ``aggregate`` method.
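
As a minimal sketch of that shape, the following snippet builds a pipeline as plain Ruby data and prints the stage names. It runs without a database connection; the ``collection`` name in the final comment is an assumption that stands in for whatever collection object your application uses.

```ruby
# Each stage is a Hash with a single key: the stage name.
match_stage = { '$match' => { 'cuisine' => 'Bakery' } }
limit_stage = { '$limit' => 5 }

# The pipeline is an Array of stage Hashes; the order of the stages matters.
pipeline = [match_stage, limit_stage]

pipeline.each_with_index do |stage, index|
  puts "Stage #{index + 1}: #{stage.keys.first}"
end

# With a connected collection object (assumed here to be named `collection`),
# the pipeline would be passed along as: collection.aggregate(pipeline)
```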

.. _ruby-aggregation-example:

Aggregation Example
~~~~~~~~~~~~~~~~~~~

The following code example produces a count of the number of bakeries in each
borough of New York. To do so, it uses an aggregation pipeline with the
following stages:

- A :manual:`$match </reference/operator/aggregation/match/>` stage to filter for documents whose ``cuisine`` field contains
  the value ``"Bakery"``.
- A :manual:`$group </reference/operator/aggregation/group/>` stage to group the matching documents by the ``borough`` field,
  accumulating a count of documents for each distinct value.

.. io-code-block::
   :copyable:

   .. input:: /includes/aggregation.rb
      :start-after: start-aggregation
      :end-before: end-aggregation
      :language: ruby
      :dedent:

   .. output::
      :visible: false

      {"_id"=>"Bronx", "count"=>71}
      {"_id"=>"Manhattan", "count"=>221}
      {"_id"=>"Queens", "count"=>204}
      {"_id"=>"Missing", "count"=>2}
      {"_id"=>"Staten Island", "count"=>20}
      {"_id"=>"Brooklyn", "count"=>173}
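
To see what the ``$match`` and ``$group`` stages compute, the same logic can be approximated in plain Ruby over an in-memory array. This is a rough analogue, not how the server executes the pipeline, and the four sample documents are hypothetical rather than taken from the dataset.

```ruby
# Hypothetical sample documents (not the real sample_restaurants data).
restaurants = [
  { 'name' => 'A', 'cuisine' => 'Bakery', 'borough' => 'Bronx' },
  { 'name' => 'B', 'cuisine' => 'Bakery', 'borough' => 'Queens' },
  { 'name' => 'C', 'cuisine' => 'Pizza',  'borough' => 'Queens' },
  { 'name' => 'D', 'cuisine' => 'Bakery', 'borough' => 'Queens' }
]

# Analogue of { '$match' => { 'cuisine' => 'Bakery' } }
bakeries = restaurants.select { |doc| doc['cuisine'] == 'Bakery' }

# Analogue of { '$group' => { '_id' => '$borough', 'count' => { '$sum' => 1 } } }
counts = bakeries
  .group_by { |doc| doc['borough'] }
  .map { |borough, docs| { '_id' => borough, 'count' => docs.size } }

counts.each { |doc| puts "#{doc['_id']}: #{doc['count']}" }
# Prints:
#   Bronx: 1
#   Queens: 2
```

Running the filter before the grouping step keeps the grouped set small, which is the same reason ``$match`` usually comes early in a pipeline.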

Explain an Aggregation
~~~~~~~~~~~~~~~~~~~~~~

To view information about how MongoDB executes your operation, you can instruct
the MongoDB :manual:`query planner </core/query-plans>` to **explain** it. When
MongoDB explains an operation, it returns **execution plans** and performance
statistics. An execution plan is a potential way in which MongoDB can complete
an operation. When you instruct MongoDB to explain an operation, it returns both
the plan MongoDB executed and any rejected execution plans by default.

To explain an aggregation operation, chain the ``explain`` method to the
``aggregate`` method.

The following example instructs MongoDB to explain the aggregation operation
from the preceding :ref:`ruby-aggregation-example`:

.. io-code-block::
   :copyable:

   .. input:: /includes/aggregation.rb
      :start-after: start-explain-aggregation
      :end-before: end-explain-aggregation
      :language: ruby
      :dedent:

   .. output::
      :visible: false

      {"explainVersion"=>"2", "queryPlanner"=>{"namespace"=>"sample_restaurants.restaurants",
      "parsedQuery"=>{"cuisine"=> {"$eq"=> "Bakery"}}, "indexFilterSet"=>false,
      "planCacheKey"=>"6104204B", "optimizedPipeline"=>true, "maxIndexedOrSolutionsReached"=>false,
      "maxIndexedAndSolutionsReached"=>false, "maxScansToExplodeReached"=>false,
      "prunedSimilarIndexes"=>false, "winningPlan"=>{"isCached"=>false,
      "queryPlan"=>{"stage"=>"GROUP", "planNodeId"=>3,
      "inputStage"=>{"stage"=>"COLLSCAN", "planNodeId"=>1, "filter"=>{},
      "direction"=>"forward"}},...}
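
The explain output is a nested hash, so it can be easier to read after you pull out the parts you need. The following sketch uses a hand-written, abbreviated copy of the output above and a hypothetical ``plan_stages`` helper to list the stages of the winning plan. The exact shape of explain output depends on the server version, so treat the key names as assumptions that match only the output shown in this guide.

```ruby
# Abbreviated copy of the explain output shown above, written as a Ruby Hash.
explanation = {
  'explainVersion' => '2',
  'queryPlanner' => {
    'namespace' => 'sample_restaurants.restaurants',
    'winningPlan' => {
      'isCached' => false,
      'queryPlan' => {
        'stage' => 'GROUP',
        'inputStage' => { 'stage' => 'COLLSCAN', 'direction' => 'forward' }
      }
    }
  }
}

# Hypothetical helper: walks the nested inputStage entries and collects
# stage names from the outermost stage to the innermost one.
def plan_stages(node)
  stages = []
  while node
    stages << node['stage']
    node = node['inputStage']
  end
  stages
end

query_plan = explanation.dig('queryPlanner', 'winningPlan', 'queryPlan')
puts plan_stages(query_plan).join(' <- ')
# Prints: GROUP <- COLLSCAN
```

Here ``COLLSCAN`` as the innermost stage indicates that the plan reads the whole collection and then groups the results.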

Run an Atlas Full-Text Search
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. note:: Only Available on Atlas for MongoDB v4.2 and later

   The ``$search`` aggregation pipeline stage is available only for collections hosted
   on :atlas:`MongoDB Atlas </>` clusters running v4.2 or later that are
   covered by an :atlas:`Atlas Search index </reference/atlas-search/index-definitions/>`.

To specify a full-text search of one or more fields, you can create a
``$search`` pipeline stage.

This example creates pipeline stages to perform the following actions:

- Search the ``name`` field for the term ``"Salt"``
- Project only the ``_id`` and the ``name`` values of matching documents

.. important::

   To run the following example, you must create an Atlas Search index on the ``restaurants``
   collection that covers the ``name`` field. Then, replace the ``"<your_search_index_name>"``
   placeholder with the name of the index.

.. TODO: Add a link in the callout to the Atlas Search index creation guide.

.. io-code-block::
   :copyable:

   .. input:: /includes/aggregation.rb
      :start-after: start-search-aggregation
      :end-before: end-search-aggregation
      :language: ruby
      :dedent:

   .. output::
      :visible: false

      {"_id"=> {"$oid"=> "..."}, "name"=> "Fresh Salt"}
      {"_id"=> {"$oid"=> "..."}, "name"=> "Salt & Pepper"}
      {"_id"=> {"$oid"=> "..."}, "name"=> "Salt + Charcoal"}
      {"_id"=> {"$oid"=> "..."}, "name"=> "A Salt & Battery"}
      {"_id"=> {"$oid"=> "..."}, "name"=> "Salt And Fat"}
      {"_id"=> {"$oid"=> "..."}, "name"=> "Salt And Pepper Diner"}
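
If you run searches like this against several fields or indexes, it might help to build the pipeline from parameters. The ``build_search_pipeline`` method below is a hypothetical helper, not part of the {+driver-short+}; it only assembles the same ``$search`` and ``$project`` stages as plain Ruby data and does not contact Atlas.

```ruby
# Hypothetical helper, not part of the driver: builds a $search stage
# followed by a $project stage that keeps the searched field and any
# additional fields you list.
def build_search_pipeline(index_name:, query:, path:, fields: ['_id'])
  projection = (fields + [path]).uniq.to_h { |field| [field, 1] }
  [
    { '$search' => { 'index' => index_name, 'text' => { 'query' => query, 'path' => path } } },
    { '$project' => projection }
  ]
end

search_pipeline = build_search_pipeline(
  index_name: '<your_search_index_name>',
  query: 'Salt',
  path: 'name'
)

puts search_pipeline.last['$project'].keys.join(', ')
# Prints: _id, name
```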

Additional Information
----------------------

MongoDB Server Manual
~~~~~~~~~~~~~~~~~~~~~

To learn more about the topics discussed in this guide, see the following
pages in the {+mdb-server+} manual:

- To view a full list of expression operators, see :manual:`Aggregation
  Operators </reference/operator/aggregation/>`.

- To learn about assembling an aggregation pipeline and to view examples, see
  :manual:`Aggregation Pipeline </core/aggregation-pipeline/>`.

- To learn more about creating pipeline stages, see :manual:`Aggregation
  Stages </reference/operator/aggregation-pipeline/>`.

- To learn more about explaining MongoDB operations, see
  :manual:`Explain Output </reference/explain-results/>` and
  :manual:`Query Plans </core/query-plans/>`.

.. TODO:
   Aggregation Tutorials
   ~~~~~~~~~~~~~~~~~~~~~

.. To view step-by-step explanations of common aggregation tasks, see
.. :ref:`ruby-aggregation-tutorials-landing`.

API Documentation
~~~~~~~~~~~~~~~~~

To learn more about the Ruby driver's aggregation methods, see the
API documentation for `Aggregation <{+api-root+}/Mongo/Collection/View/Aggregation.html>`__.

source/includes/aggregation.rb

+61
@@ -0,0 +1,61 @@
require 'bundler/inline'
gemfile do
  source 'https://rubygems.org'
  gem 'mongo'
end

uri = '<connection string URI>'

Mongo::Client.new(uri) do |client|
  #start-aggregation
  database = client.use('sample_restaurants')
  restaurants_collection = database[:restaurants]

  # Keep only bakeries, then count the matching documents per borough
  pipeline = [
    { '$match' => { 'cuisine' => 'Bakery' } },
    { '$group' => {
        '_id' => '$borough',
        'count' => { '$sum' => 1 }
      }
    }
  ]

  aggregation = restaurants_collection.aggregate(pipeline)

  aggregation.each do |doc|
    puts doc
  end
  #end-aggregation

  #start-explain-aggregation
  explanation = restaurants_collection.aggregate(pipeline).explain

  puts explanation
  #end-explain-aggregation

  #start-search-aggregation
  # Search the name field for "Salt", then return only _id and name
  search_pipeline = [
    {
      '$search' => {
        'index' => '<your_search_index_name>',
        'text' => {
          'query' => 'Salt',
          'path' => 'name'
        },
      }
    },
    {
      '$project' => {
        '_id' => 1,
        'name' => 1
      }
    }
  ]

  # Use the collection defined in the first example
  results = restaurants_collection.aggregate(search_pipeline)

  results.each do |document|
    puts document
  end
  #end-search-aggregation
end

source/index.txt

+5-5
@@ -18,6 +18,7 @@
    Read Data </read>
    Operations on Replica Sets </read-write-pref>
    Indexes </indexes>
+   Data Aggregation </aggregation>
    Security </security>
    Data Formats </data-formats>
    View the Source <https://github.com/mongodb/mongo-ruby-driver>
@@ -29,7 +30,6 @@
 .. TODO:
    Write Data </write>
    Monitor Your Application </monitoring>
-   Data Aggregation </aggregation>
    Security </security>
    Issues & Help </issues-and-help>
    Upgrade </upgrade>
@@ -80,11 +80,11 @@ Learn how to configure read and write operations on a replica set in the
 .. Learn how to work with common types of indexes in the :ref:`ruby-indexes`
 .. section.
 
-.. Transform Your Data with Aggregation
-.. ------------------------------------
+Transform Your Data with Aggregation
+------------------------------------
 
-.. Learn how to use the {+driver-short+} to perform aggregation operations in the
-.. :ref:`ruby-aggregation` section.
+Learn how to use the {+driver-short+} to perform aggregation operations in the
+:ref:`ruby-aggregation` section.
 
 .. Secure Your Data
 .. ----------------
