Q: Assume there is a document with nested arrays that looks like the one below. How can you insert a “room” that has the name “Room 44” and size of “50” for a particular “house” that belongs to this user? { "_id": "682263", "userName" : "sherif", "email": "sharief@aucegypt.edu", "password": "67834783ujk", "houses": [ { "_id": "2178123", "name": "New Mansion", "rooms": [ { "name": "4th bedroom", "size": "12" }, { "name": "kitchen", "size": "100" } ] } ] }

We can do so with the following code, commented inline: db.users.update( { "_id": ObjectId("682263"), "houses._id":"2178123" // identify the id for the house that we want to update }, { "$push": { "houses.$.rooms": // identify the array we want to push items into { "name": "Room 44", // this is the payload that needs to be pushed "size": "50" } } } )

Question 1

What are the pros and cons of normalizing data in a MongoDB database?

Accepted Answer

Just like in traditional RDBMSes, updating documents is fast for normalized data and relatively slower for denormalized data. On the other hand, reading documents is fast in denormalized data and slower for normalized data. Denormalized data is harder to keep in sync and takes up more space.

Note that in MongoDB, denormalized data is a more common expectation. This is because RDBMSes have inherent support for normalization and allow data to be managed as a separate concern, whereas NoSQL DBMSes like MongoDB do not inherently support normalization.

Instead, normalization requires that client applications carefully maintain integrity themselves. To help with this, it’s possible to run audits to ensure that app data conforms to expected patterns of referential integrity.

Question 2

How are sharding and replication similar to each other in MongoDB, and how are they different?

Accepted Answer

Both sharding and replication involve using more than one instance to host the database.

Replicas are MongoDB instances with the same data, hence the name. We use replicas to increase redundancy and availability.

With sharding, on the other hand, each shard instance has different data from its neighbors. We use sharding for horizontal scaling.

Question 3

What is the difference between the save and insert commands in MongoDB, and when do they act similarly?

Accepted Answer

Whether we provide an _id determines the expected result for both of these commands. Here is the expected outcome for each case. save command while providing an _id: In this case, the newly provided document replaces the document found with a matching _id. save command while not providing an _id: Inserts a new document. insert command while providing an _id: Gives a E11000 duplicate key error listing the collection, index, and duplicate key. insert command while not providing an _id: Inserts a new document. As you can see, both the insert and save commands act similarly only when we do not provide an _id. For example, the commands below would give us the same result:

db.cars.save({motor:"6-cylinder",color:"black"})
db.cars.insert({motor:"6-cylinder",color:"black"})

Question 4

What are some differences between BSON documents used in MongoDB and JSON documents in general?

Accepted Answer

JSON (JavaScript Object Notation)—like XML, for example—is a human-readable standard used for data exchange. JSON has become the most widely used standard for data exchange on the web. JSON supports data types like booleans, numbers, strings, and arrays.

BSON, however, is the binary encoding that MongoDB uses to store its documents. It is similar to JSON, but it extends JSON to support more data types, like Date. BSON documents, unlike JSON documents, are ordered. BSON usually takes less space than JSON and is faster to traverse. BSON, since it is binary, is also quicker to encode and decode.

Question 5

What is the difference between the $all operator and the $in operator?

Accepted Answer

Both the $all operator and the $in operator are used to filter documents in a subarray based on a conditional. Let us assume we have the following documents in a collection.

[{
        "name": "Youssef",
        "sports": [
            "Boxing",
            "Wrestling",
            "Football"
        ]
    },
    {
        "name": "Kevin",
        "sports": [
            "Wrestling",
            "Football"
        ]
    },
    {
        "name": "Eva",
        "sports": [
            "Boxing",
            "Football"
        ]
    }
]

Using $all as shown below will return only the first two documents:

db.users.find({
    sports: {
        $all: ["Wrestling", "Football"]
    }
})

Using $in will return all three documents:

db.users.find({
    skills: {
        $in: ["Wrestling", "Football"]
    }
})

The $all operator is stricter than the $in operator. $all is comparable to an AND conditional, and likewise $in resembles an OR conditional. That is to say, $all retrieves documents that satisfy all conditions in the query array, whereas $in retrieves documents that meet any condition in the query array.

Question 6

Assume there is a document with nested arrays that looks like the one below. How can you insert a “room” that has the name “Room 44” and size of “50” for a particular “house” that belongs to this user?

{
    "_id": "682263",
    "userName" : "sherif",
    "email": "sharief@aucegypt.edu",
    "password": "67834783ujk",
    "houses": [
        {
        "_id": "2178123",
        "name": "New Mansion",
        "rooms": [
            {
            "name": "4th bedroom",
            "size": "12"
            },
            {
            "name": "kitchen",
            "size": "100"
            }
        ]
        }
  ]
}

Accepted Answer

We can do so with the following code, commented inline:

db.users.update(
    { 
        "_id": ObjectId("682263"),
        "houses._id":"2178123"     // identify the id for the house that we want to update
    },
    { "$push":   
        {
            "houses.$.rooms":      // identify the array we want to push items into
                {                  
                    "name": "Room 44",      // this is the payload that needs to be pushed 
                    "size": "50"
                }
        }
    }
)

Question 7

Assume there is a collection named users that looks like the one below. How can you get all houses in the “Rabia” neighborhood?

[
    {
        "_id" : ObjectId("5d011c94ee66e13d34c7c388"),
        "userName" : "kevin",
        "email" : "kevin@toptal.com",
        "password" : "affdsg342",
        "houses" : [
            {
                "name" : "Big Villa",
                "neighborhood" : "Zew Ine"
            },
            {
                "name" : "Small Villa",
                "neighborhood" : "Rabia"
            }
        ]
    },

    {
        "_id" : ObjectId("5d011c94ee66e13d34c7c387"),
        "userName" : "sherif",
        "email" : "sharief@toptal.com",
        "password" : "67834783ujk",
        "houses" : [
            {
                "name" : "New Mansion",
                "neighborhood" : "Nasr City"
            },
            {
                "name" : "Old Villa",
                "neighborhood" : "Rabia"
            }
        ]
    },

]

Accepted Answer

Use the $filter aggregation operator. The query is:

db.users.aggregate([
    { $match: { 'houses.neighborhood': 'Rabia' } },
    {
        $project: {
            filteredHouses: {   // This is just an alias 
                $filter: {
                    input: '$houses', // The field name we are checking
                    as: 'houseAlias', // just an alias
                    cond: { $eq: ['$$houseAlias.neighborhood', 'Rabia'] }
                }
            },
            _id: 0
        }
    }

])

The first match query will return all documents that have a house with the name Rabia. The first query in the pipeline, {$match: {'houses.neighborhood': 'Rabia'}}, will return the whole collection. This is because both users have one house in the neighborhood “Rabia”. This is the return for the first query in the pipeline

[
    {
        "_id" : ObjectId("5d011c94ee66e13d34c7c388"),
        "userName" : "kevin",
        "email" : "kevin@toptal.com",
        "password" : "affdsg342",
        "houses" : [
            {
                "name" : "Big Villa",
                "neighborhood" : "Zew Ine"
            },
            {
                "name" : "Small Villa",
                "neighborhood" : "Rabia"
            }
        ]
    },

    {
        "_id" : ObjectId("5d011c94ee66e13d34c7c387"),
        "userName" : "sherif",
        "email" : "sharief@toptal.com",
        "password" : "67834783ujk",
        "houses" : [
            {
                "name" : "New Mansion",
                "neighborhood" : "Nasr City"
            },
            {
                "name" : "Old Villa",
                "neighborhood" : "Rabia"
            }
        ]
    },

]

We do not want to display other user details nor display houses other than those in Rabia, so we will use the $filter operator inside the $project operator:

{
    $project: {
        filteredHouses: {   // This is just an alias 
            $filter: {
                input: '$houses', // The field name we check
                as: 'houseAlias', // just an alias
                cond: { $eq: ['$$houseAlias.neighborhood', 'Rabia'] }
            }
        },
        _id: 0
    }
}

The $$ prefix is required on houseAlias (instead of simply one $) due to nesting. Here is the result we obtain at the end of the pipeline:

[
    {
        "filteredHouses" : [
            {
                "name" : "Old Villa",
                "neighborhood" : "Rabia"
            }
        ]
    },
    {
        "filteredHouses" : [
            {
                "name" : "Small Villa",
                "neighborhood" : "Rabia"
            }
        ]
    }
]

Question 8

Could you catch how the two queries below are different?

dealers.find({
    "$and": [
        {
            "length": {
                "$gt": 2000
            }
        },
        {
            "cars.weight": {
                "$gte": 800
            }
        }
    ]
});

dealers.find({
    "length": {
        "$gt": 2000
    },

    "cars.weight": {
        "$gte": 800
    }
});

Accepted Answer

Actually, they are exactly the same. MongoDB implicitly uses the $and operator for comma-separated queries. Which one to use is more a matter of preference than best practices.

Question 9

“Both writes and reads become faster when you add more slaves to a replica set.” Is this statement true or false? Why?

Accepted Answer

False. All write operations are done only on the master. On the other hand, read operations can be made on any instance—slave or master. Therefore, only reads become faster when you add more slaves to a replica set.

Question 10

A staple feature of relational database systems is the JOIN clause. What is the equivalent in MongoDB, and does it have any known limitations?

Accepted Answer

The $lookup operator is the equivalent of JOIN. Here is an example of a nested lookup in MongoDB. Assume we have three collections (authors, authorInfo, and userRole) with the following data:

// authors collection
[
    {
        "_id" : ObjectId("5d0127aaee66e13d34c7c389"),
        "address" : "32 Makram Ebeid Street",
        "isActive" : true,
        "authorId" : "121"
    }
]

// authorInfo collection
[
    {
        "_id" : ObjectId("5d0f726bac65f929d0fa98b2"),
        "authorId" : "121",
        "description" : "A description"
    }
]

// userRole collection
[
    {
        "_id" : ObjectId("5d012a08ee66e13d34c7c38f"),
        "userId" : "121",
        "role" : "manager"
    }
]

What if we want to join the authors from all three collections? In the SQL world, a JOIN query for this might look like:

SELECT a._id, a.address, b.description, c.role
  FROM authors a
  INNER JOIN "authorInfo" b ON b."authorId" = a."authorId"
  INNER JOIN "userRole" c ON c."userId" = a."authorId"

But in MongoDB, here is the equivalent query:

db.authors.aggregate([

    // Join with authorInfo table
    {
        $lookup:{
            from: "authorInfo",       // connecting authorInfo collection
            localField: "authorId",   // name of field in the authors collection
            foreignField: "authorId", // name of field in the authorInfo collection
            as: "authorInfoAlias"     // any alias
        }
    },
    {   $unwind:"$authorInfoAlias" }, // use the alias here

    // Join with userRole collection
    {
        $lookup:{
            from: "userRole", 
            localField: "authorId", 
            foreignField: "userId",
            as: "authorRoleAlias"
        }
    },
    {   $unwind:"$authorRoleAlias" },
    {   
        $project: {                                          // Just projecting our data.
            _id : 1,
            address : 1,
            description : "$authorInfoAlias.description",
            role : "$authorRoleAlias.role",
        } 
    }

The $ prefix is required for aliases to work. The result of this query is the following:

[
    {
        "_id" : ObjectId("5d0127aaee66e13d34c7c389"),
        "address" : "32 Makram Ebeid Street",
        "description" : "A description",
        "role" : "manager"
    }
]

The major drawback of the $lookup operator is that it does not work in sharded collections. It’s worth noting that, instead of looking for a direct equivalent to JOIN, a more common approach with MongoDB developers is to simple denormalize the data, precluding the need for a JOIN equivalent.

10 Essential MongoDB Interview Questions *

Submit an interview question