Become a member!

PostgreSQL COUNT(*): The Simplest Query Can Become Your Worst Enemy

  • 👉 This article is available in english too.
  • 👉 Este artículo también está disponible en español.
  • 👉 Questo articolo è disponibile anche in italiano.

How It All Started

I recently faced an interesting problem with our product DMSContainer. As installations increased worldwide, the centralized license management system (which, by the way, is itself a DMSContainer module!) started showing signs of slowdown. Nothing serious in absolute terms, but multiplied by thousands of queries per minute, it was creating a problem. The culprit? A seemingly innocent query: SELECT COUNT(*) FROM licenses (the actual query is a bit more complex, but for our purposes, we can consider this simplified version).

This experience made me reflect on how often the impact of counting operations in databases is underestimated. It’s a problem that sooner or later affects every growing application, whether it’s a document management system, an e-commerce application, or any other system that accumulates data over time.

The Problem Is Not Just About Speed

The first reaction when a COUNT(*) becomes slow is to think it’s just a performance issue. But the real problem is much more insidious: the blocking of other operations.

Let’s do a practical experiment. Assuming we have a “very_big_table” containing millions of records, open two PostgreSQL terminals and try this:

-- Terminal 1
BEGIN;
SELECT COUNT(*) FROM my_table WHERE status = 'active'; -- millions of records

-- Terminal 2
UPDATE my_table SET status = 'inactive' WHERE id = 1;

For many, it might seem strange, but the UPDATE waits until the count completes. Clearly, the effect is more evident the more records are in the table. But even with few records, you could have the same problem in case of high concurrency. But why does this happen?

The Mechanism Behind the Scenes

PostgreSQL uses a system called MVCC (Multi-Version Concurrency Control) to manage concurrent transactions. When you execute a COUNT(*), the database must:

  1. Create a consistent “snapshot” of the data
  2. Keep this snapshot unchanged during the counting
  3. Verify each row for visibility in the current transaction

It’s as if, to count people in a store, you had to temporarily block all doors - nobody can enter or exit until you finish counting.

The First Warning Signs

How do you recognize if COUNT(*) is becoming a problem? Here are some signals:

  • Count queries taking more than a few seconds
  • Sporadic timeouts on update operations
  • Users reporting momentary application “freezes”
  • Reports or dashboards becoming increasingly slower

Basic Solutions

The first thing many try is adding an index:

CREATE INDEX idx_my_table_status ON my_table(status);

This can help, but it doesn’t solve the fundamental blocking problem. Another quick solution is using PostgreSQL statistics:

SELECT reltuples::bigint AS estimate 
FROM pg_class 
WHERE relname = 'my_table';

This query is lightning fast because it doesn’t actually count rows but uses database statistics. The trade-off? It’s not 100% accurate.

When These Solutions Are Not Enough

Basic solutions work well up to a point, but there are situations where more sophisticated approaches are needed:

  • When you need precise and real-time counts
  • When your count queries include multiple conditions or JOINs
  • When your data volume continues to grow
  • When you can’t afford blocks, even brief ones

Towards More Advanced Solutions

At this point, it’s clear that more sophisticated strategies are needed. The questions we need to ask are:

  • How can we maintain accurate counts without blocking the database?
  • How do we handle complex counts with multiple conditions?
  • How do we balance accuracy and performance?
  • How do we scale as data continues to grow?

[The article continues on Patreon…]


Want to know how to handle these problems professionally? In the Patreon version of this article, I’ll delve into:

  • Advanced counting techniques with triggers and support tables
  • Intelligent caching strategies for complex queries
  • Strategic denormalization patterns
  • Optimized count handling in high-concurrency scenarios
  • Continuous performance monitoring and optimization

Join the supporters on Patreon to access the full content and other in-depth technical articles!

Comments

comments powered by Disqus