What Happens Behind the Scenes When You Modify a Row in PostgreSQL?
Data is often called the new gold, and databases are where we store and manage this precious resource as it constantly changes and grows. At first glance, updating data might seem like a simple task—just modify a row. But behind the scenes, it’s more complex to ensure that data remains consistent and accessible. In today’s blog, I’ll answer some frequently asked questions from our customers and dive into why PostgreSQL relies on a process called VACUUM to efficiently manage data updates.Updating a row in PostgreSQL isn’t as straightforward as directly changing the existing data. Instead, PostgreSQL avoids in-place updates, meaning it doesn’t overwrite rows directly. But what does this actually mean? When an update occurs, PostgreSQL creates a new row, inserts the updated data there, and marks this new row as the latest version. The old row, meanwhile, is flagged as obsolete row.A similar process applies to deletes, where rows are marked as outdated rather than removed immediately. This raises an interesting question on why did PostgreSQL choose this more complex approach for handling updates and deletes? The answer lies in its design philosophy, which is rooted in Multi-Version Concurrency Control (MVCC). MVCC ensures data consistency and allows for high concurrency.What is Multi-Version Concurrency Control (MVCC)?Multi-Version Concurrency Control (MVCC) allows PostgreSQL to manage multiple transactions at once, enabling consistent data views for each transaction without interference.Imagine a library with a single book titled The Ultimate Guide to Databases. Without Multi-Version Concurrency Control (MVCC), if two people want to check out the book at the same time, one would have to wait for the other to finish reading it. This is similar to a traditional database where transactions can block each other, preventing simultaneous access.With MVCC, the process works differently. When the first person checks out the book, the library creates a copy just for them. The second person can also check out the book at the same time, but they receive their own copy. Both individuals can read and make notes in their respective copies without affecting the other’s experience. Once they’re done, they return the copies, and the library can clean them up for future use.In this analogy, the book represents a data record in a database, and the copies of the book are like different versions of that data. MVCC allows PostgreSQL to create and manage multiple versions of data, enabling multiple transactions to access and modify the data concurrently without interfering with each other. This ensures that each transaction gets a consistent view of the data while allowing for high performance and concurrency.However, just like the library ends up with multiple copies of the book that are no longer being read, PostgreSQL ends up with versions of the data that are no longer needed, called dead tuples. These dead tuples are like outdated copies of the book that no one is checking out anymore. Over time, as more transactions occur, these dead tuples accumulate, taking up space and potentially slowing down the system. This is where the process of vacuuming comes in—just like the library regularly clears out old, unused books to make room for new ones, PostgreSQL uses vacuuming to clean up dead tuples, reclaim storage, and maintain optimal performance.