Disclaimer: having only used MongoDB on and off for a few years, I’m sure there’s a more efficient way to do this. But it’s a problem that I struggled to find a good answer for, and this solution seems fairly readable. Do let me know if there’s a better way to do it.

It’s not uncommon to have documents / records / items that can be ordered in several ways: by name, by date, by length, etc. It’s also not uncommon to select those documents by more than one criteria. In my particular case, I needed to:

  • group some documents by name
  • get the most recent document (i.e. the one with the latest timestamp) in each of those groups

This seems like it would be a common of multi-criteria searches, but I struggled to find a clear implementation for a few hours. Hopefully I can remedy that now.

MongoDB’s aggregation pipeline

In SQL, one would probably accomplish multi-criteria searches with some combination of joins, aggregate functions, and nested queries, which can get ugly very quickly. Mongo, on the other hand, uses what it calls an aggregation pipeline (see here for a condensed version).

This allows several “stages” of queries and document manipulations to be chained together in whatever order you need. Each stage returns a set of documents that can be filtered or manipulated by future stages, making it easy to break the query logic into distinct, readable steps.

The solution

To give you a visual representation, let’s say you have a bookshelf with books of several colors, and you want to get the most recently published book of each color. In Mongo, the data might look something like this:

{ "_id": 1, "color": "red", "date": ISODate("2013-05-10T00:00:00Z") }
{ "_id": 2, "color": "blue", "date": ISODate("2016-10-20T00:00:00Z") }
{ "_id": 3, "color": "green", "date": ISODate("2015-09-14T00:00:00Z") }
{ "_id": 4, "color": "green", "date": ISODate("2015-07-19T00:00:00Z") }
{ "_id": 5, "color": "red", "date": ISODate("2017-01-04T00:00:00Z") }
{ "_id": 6, "color": "blue", "date": ISODate("2014-03-06T00:00:00Z") }

In this particular case, you need books 2, 3, and 5.

Using db.books.aggregate(), here’s the approach:

  1. Sort the books by date in descending order.
    • { $sort: { "date": -1 } }

  2. Group the books by color. Add each document in its entirety to the group.
    • { $group: { _id: "$color", books: { $push: "$$ROOT" } } }
    • $group creates a totally new set of documents. Here, we’re saying that the _id of each group will come from the color field of our book documents. So we’ll end with three groups with _ids “red”, “green”, and “blue”.
    • books: { $push: "$$ROOT" } creates a books array in each color group, and adds each original document to the array in the correct group. For example, books in the red document would contain books 1 and 5 above. $$ROOT is used to specify that we want to add the whole document to the array (not just a single attribute).

  3. Since we sorted the books in descending date order before we grouped them, we know that the list of books in each color group will also be in descending date order. To get the most recent book, we just need to grab the first element of the books array.
    • { $replaceRoot: { newRoot: { $arrayElemAt: ["$books", 0] } } }
    • $replaceRoot means that we want to replace the entire document in question with the document specified by newRoot. For example, the red document will be completely replaced by the first element of red.books (book 5 above).

Putting it all together:

result = db.books.aggregate([
    { $sort: { "date": -1 } },
    { $group: {
        _id: "$color",
        books: { $push: "$$ROOT" }
    { $replaceRoot: {
        newRoot: { $arrayElemAt: ["$books", 0] }

Done! All the brackets are a bit of an eyesore, but I’ll take that over SQL any day :P

Docs for all the pipeline aggregation operators can be found at the links provided above.