Nicolas Martyanoff – Brain dump About

Pagination for database objects

So you have a server storing objects in a relational database, and an API, nowadays probably HTTP but it does not matter. Clients can fetch objects using the API. Obviously you do not want them loading too many objects at the same time, that would cause all kinds of performance issues. So you need a way to let clients request a manageable subset of all objects and iterate to finally obtain all objects. This is pagination.

I used to think it was a trivial problem. But given the number of complex and inefficient solutions I have seen over the years, I was clearly mistaken. Let us find out how to do it easily and efficiently.

Limit and offset

One of the first things you learn with SQL is how to use LIMIT and OFFSET to restrict the number of rows to retrieve.

Let us define a user table and load 10 users after the first 20:

CREATE TABLE users
  (id SERIAL PRIMARY KEY,
   name VARCHAR NOT NULL);

SELECT * FROM users
  LIMIT 10 OFFSET 20;

Have your clients send a limit and offset with each query, increment the offset to iterate. Easy, but there is a problem: the highest the offset is, the slower the request will be. LIMIT and OFFSET are applied after filtering, so if you have a million users, selecting the last 10 ones will still require processing the 999'990 before them. Not great.

Worse, this method can and will provide incorrect results when retrieving pages one after the other: adding rows to the table between each selection will make subsequent pages either miss rows or include some which we already returned in previous pages.

SQL Cursors

It can be tempting to use SQL cursors. After all, they let you consume rows iteratively without loading the whole result set in memory. But they are really not designed for the task: you would need to keep the associated database connection open between requests. Next idea.

Key-based pagination

Sometimes called “keyset” pagination, this method uses a key from the last row of the page you just loaded to obtain the next one efficiently.

Let us load the first page:

SELECT * FROM users
  ORDER BY id
  LIMIT 10;

Then for the next page, have the client provide the identifier of the last user in the page —let us say 42— and use it to load the second page:

SELECT * FROM users
  ORDER BY id
  WHERE id > 42
  LIMIT 10;

Continue the same way for subsequent pages.

As long as you are filtering on an indexed expression, the database will efficiently retrieve the rows for the page you requested. And you can use the exact same method to order results in the other direction, just invert the WHERE expression.

You will also avoid the inconsistencies of the LIMIT/OFFSET method: if the table is modified between two page retrieval operations, you may still miss rows (nothing you can do about it unless you are willing to retrieve the entire table in one operation), but you will not see rows from previous pages reappear in subsequent pages. Your users will thank you (or more accurately they will open less bug reports; even better).

Sorting

So you started using key-based pagination and everything looks fine, but now you need to sort results based on something else than the primary key. Your first idea could be to create an index on the column you want to use to sort, but it will not help: if multiple users have the same name, how will you select your pages?

We could use an index on two columns, name and id to obtain the strict total order we need then filter on these two columns:

CREATE INDEX users_name_id_idx
  ON users (name, id);

SELECT * FROM users
  WHERE (name, id) > ('bob', 42)
  ORDER BY name, id
  LIMIT 10;

But this is not standard SQL and would only work with databases supporting row-value comparisons.

Fortunately SQL lets you create indices on expressions and not just columns. So we create our own single-value keys by concatenating values. Of course we need a separator to avoid collisions: without it ('Bob', 42) and ('Bob4', 2) would both yield the same 'Bob42' key). The ASCII character set includes several separator characters which are perfect for this use case since they are not supposed to end up in your data (if they do, use another character); let us use the unit separator 0x1f:

CREATE INDEX users_name_pagination_idx
  ON users ((name || E'\x1f' || id));

Load the first page:

SELECT * FROM users
  ORDER BY name || E'\x1f' || id
  LIMIT 10;

And assuming the last user of the first page had the name Bob and the identifier 42, Load the second page.

SELECT * FROM users
  WHERE name || E'\x1f' || id > E'Bob\x1f42'
  ORDER BY name || E'\x1f' || id
  LIMIT 10;

Annoying to type but efficient and easy to generate. Of course you will want to make it easy for your clients: when returning a page, compute the key for the previous and next page and return them along the objects. The client can then trivially fetch the previous or next page.

Need to support multiple sort criteria? Just create an index for each of them.

That wasn’t so hard right?

Share the word!

Liked my article? Follow me on Twitter or on Mastodon to see what I'm up to.