MongoDB indexes: $in operator under the hood

Danya Sliusar
2 min readNov 28, 2020

At Wimark we are developing a platform to manage access points. They are our basic “access control” objects - if you have an access point in your access list, all linked data (such as clients, statistics, SSID, and so on) will be accessed to you.

List of access points (aka CPE — customer premises equipment) in location “demo”

We use MongoDB and of course, all clients are stored in the same collection (with MAC address as ID and link to an access point in cpe_id field). And for optimal queries, it has an index on the field “cpe_id”.

Client object (partly) from the collection “clients”

To get all clients from your access points you need the next query:

db.clients.find({cpe_id: {$in: [...]}})

On the small list of clients above query has great performance, but on 1M (million) client collection with 200 access points in the query, we have very slow performance. Let’s find why.

Find all clients from one access point

In the above example with clients from one access point, MongoDB will search for accurate values in the index with O(logN) time (where N stands for a collection size). Let’s find clients for two access points:

Find all clients from two access points

Wait.. we see two ranges for bounds to find in the index. It will cost us 2*O(logN) time. And if the array size will increase this estimation will be growth. To find all clients from M access points we need M*O(logN) time!

The only way to decrease this estimation is to use the additional field (like “group_id”) with an index on it instead of an array of access points.

Ba-Dum-Tss! Better to know your DB’s internals before design data model. 😏

--

--