PostgreSQL COUNT(*): The Simplest Query Can Become Your Worst Enemy
How It All Started
I recently faced an interesting problem with our product DMSContainer. As installations increased worldwide, the centralized license management system (which, by the way, is itself a DMSContainer module!) started showing signs of slowdown. Nothing serious in absolute terms, but multiplied by thousands of queries per minute, it was creating a problem. The culprit? A seemingly innocent query: SELECT COUNT(*) FROM licenses
(the actual query is a bit more complex, but for our purposes, we can consider this simplified version).
This experience made me reflect on how often the impact of counting operations in databases is underestimated. It’s a problem that sooner or later affects every growing application, whether it’s a document management system, an e-commerce application, or any other system that accumulates data over time.
The Problem Is Not Just About Speed
The first reaction when a COUNT(*)
becomes slow is to think it’s just a performance issue. But the real problem is much more insidious: the blocking of other operations.
Let’s do a practical experiment. Assuming we have a “very_big_table” containing millions of records, open two PostgreSQL terminals and try this:
-- Terminal 1
BEGIN;
SELECT COUNT(*) FROM my_table WHERE status = 'active'; -- millions of records
-- Terminal 2
UPDATE my_table SET status = 'inactive' WHERE id = 1;
For many, it might seem strange, but the UPDATE waits until the count completes. Clearly, the effect is more evident the more records are in the table. But even with few records, you could have the same problem in case of high concurrency. But why does this happen?
The Mechanism Behind the Scenes
PostgreSQL uses a system called MVCC (Multi-Version Concurrency Control) to manage concurrent transactions. When you execute a COUNT(*), the database must:
- Create a consistent “snapshot” of the data
- Keep this snapshot unchanged during the counting
- Verify each row for visibility in the current transaction
It’s as if, to count people in a store, you had to temporarily block all doors - nobody can enter or exit until you finish counting.
The First Warning Signs
How do you recognize if COUNT(*) is becoming a problem? Here are some signals:
- Count queries taking more than a few seconds
- Sporadic timeouts on update operations
- Users reporting momentary application “freezes”
- Reports or dashboards becoming increasingly slower
Basic Solutions
The first thing many try is adding an index:
CREATE INDEX idx_my_table_status ON my_table(status);
This can help, but it doesn’t solve the fundamental blocking problem. Another quick solution is using PostgreSQL statistics:
SELECT reltuples::bigint AS estimate
FROM pg_class
WHERE relname = 'my_table';
This query is lightning fast because it doesn’t actually count rows but uses database statistics. The trade-off? It’s not 100% accurate.
When These Solutions Are Not Enough
Basic solutions work well up to a point, but there are situations where more sophisticated approaches are needed:
- When you need precise and real-time counts
- When your count queries include multiple conditions or JOINs
- When your data volume continues to grow
- When you can’t afford blocks, even brief ones
Towards More Advanced Solutions
At this point, it’s clear that more sophisticated strategies are needed. The questions we need to ask are:
- How can we maintain accurate counts without blocking the database?
- How do we handle complex counts with multiple conditions?
- How do we balance accuracy and performance?
- How do we scale as data continues to grow?
[The article continues on Patreon…]
Want to know how to handle these problems professionally? In the Patreon version of this article, I’ll delve into:
- Advanced counting techniques with triggers and support tables
- Intelligent caching strategies for complex queries
- Strategic denormalization patterns
- Optimized count handling in high-concurrency scenarios
- Continuous performance monitoring and optimization
Join the supporters on Patreon to access the full content and other in-depth technical articles!
Comments
comments powered by Disqus