MVCC (Multi Version Concurrency Control)

Rohit Sharma
Sep 6, 2021
3 min read

Updated: Sep 18, 2021

Hello Readers,

Hope you all are doing well!

we are back with another interesting topic in the database which is MVCC.

most of the backend engineers must hear about this jargon so we are just gonna redefine it in our own way.

MVCC(Multi-Version Concurrency Control ) simply stands for "One record or row of data having multiple versions".

Yeah, that's all MVCC is but let's deep dive into this.

Suppose there is a table named users_name having the following columns

									 users_name
id (primary key)  ||     name(varchar(60))

Now we insert a single record so our SQL statement will be like

insert into users_name (id,name) values (1, 'Rohit');

after inserting this record our table will look like this

									users_name
id (primary key)  ||     name(varchar(60))  
1    ||      Rohit

Hmm, Simple now let's talk about Lil bit inside database internal concept whatever we save into the database is saved into HDD (Hard disk drive) and in hard disk, we have data pages that store the data in our case when we have inserted one row or record so that record will be saved in dedicated data pages. (If you want to know more about paging in a database or about how pages sizes can be managed and other concepts do let me know in the comment section. :-) )

Till now I hope we are clear now.

So now run another query which is an update query

update users_name set name='Database noobs' where id=1

So after running the update query the table will look like this

users_name
id (primary key)  ||     name(varchar(60))  
1             ||      Database noobs

Simple we are good here so you must be wonder where does the MVCC comes from it's simple to insert and update operations but if we pull up the curtain and check into database internals the following this happened when we execute the update query

Instead of overwriting the current data pages in HDD its create a new data page and saved the updated data i.e.;(Database noobs)

from the above statement, we understand that we have two data pages one for 'Rohit' and another for 'Database noobs' so just quickly go back and read our first statement i.e.;

"One record or row of data having multiple versions".

So yeah here we go we have only one record having id=1 but if we see at database paging level we have two versions one is after update operation and one is at insert operation.

So yeah that's all the MVCC concept is.

So suppose now we are running our favourite SELECT query i.e;

	SELECT name FROM users_name;

(Note:- After updating the record whenever we run the select query we always gonna get the updated version because the database marked the previous version as a dead tuple)

So we gonna get 'Database noobs' so what will happen about our previous version which was 'Rohit' what will be the use of old records? why its still in database?

So let's clear all these queries here:-

So suppose we have our favourite table users_name and multiple users are concurrently accessing the record with id=1 so the following things will be happening:-

User A is reading record with id=1 i.e.; 'Database noobs' and at the same time User B is updating the record with id=1 with value 'We know MVCC'
So what are we gonna show to user A should be shown the value which is about to be updated i.e; 'We know MVCC'?
But it might be the case that the value that is about to be updated (We know MVCC) may fail and we go into READ UNCOMMITTED or DIRTY READ isolation level (These are transactions isolation levels let me know if you wanna learn about the same).
So what we can do is apply a pessimistic lock on the table. (This lock will block all the further read and write operations on the table).
We can do so but this will be not the more optimized way.
So here comes the MVCC in the picture when User B still in update operation till now the User A gonna see the older version i.e.; 'Database noobs'.
Once the User B update operation executes successfully then only we will be able to see the updated value i.e; 'We know MVCC'.
In this way, the reading process or transaction never blocks the writing process or transactions

finally, after so much long journey, we got to know what is MVCC? and what its use cases?

Till now we discuss pros let's discuss cons.

The main drawback of MVCC is storage consumption as it stored all the older versions of updated records so it requires more storage space
You can clean the older versions by using cleaning commands like VACCUM but it will take lot of time and maintenance as on those tables for removing the fragmentation(This is the process of inserting records in pages in order so that each) generated by an old version of data.

Database which supports MVCC:-

PostgreSQL
ORACLE
MySQL (with InnoDB engine)

That's all for this blog!

Stay tuned and keep learning!

3 Comments

Punya Yadav

Sep 11, 2021

I really want to know more about paging in a database or about how pages sizes can be managed

rohit sharma

Sep 12, 2021

Replying to

Thanks for your comment ! I will definitely cover this topic in upcoming blogs!

Ruby Pandey

Sep 06, 2021

Useful Information 👍 Nice.