The chemistry of MySQL transactions: Part 2

Transaction Isolation Levels

Kumar Swapnil

--

In the first part we have discussed about the Atomicity and Consistency aspects of a transaction. In this post we will see the Isolation aspect of a transaction.

Isolation in the sense of ACID means that concurrently executing transactions are isolated from each other; they cannot step on each other’s toes. The classic textbook formalize isolation as serializability, which means that each transaction can pretend that it is the only transaction running on the entire database. The database ensures that when the transactions have committed, the result is the same as if they had run serially (one after another), even though in reality they may have run concurrently.

Now, before a deep dive to different isolation levels that innoDB offers, let’s take a detour and understand the problems which are inherent to concurrent transactions.

  • Lost Updates: The Lost update problem can occur if an application reads some value from the database, modifies it, and writes back the modified value (a read-modify-write cycle). If two transactions do this concurrently, one of the modifications can be lost, because the second write doesn’t include the first modification. This pattern occurs in various scenarios like incrementing a counter, updating an account balance or updating a JSON document (requires parsing the document, making the change and writing back the modified document).
  • Dirty Read: Imagine a transaction has written some data to the database, but the transaction has not yet committed or aborted. Can another transaction see the uncommitted data? If yes, that is called a dirty read. There are a few reasons why it’s useful to prevent dirty reads:
Dirty Read
  1. If a transaction needs to update several objects, a dirty read means that another transaction may see some of the updates but not others. For example, in the above figure, the user sees the new unread mail but not the updated counter. This is a dirty read of the email. Seeing the database in a partially updated state is confusing for users and may cause other transactions to take incorrect decisions.
  2. If a transaction aborts, any writes it has made need to be rolled back. If the database allows dirty reads, that means a transaction may see data that was never really committed to the database.
  • Non Repeatable Read: A nonrepeatable read occurs when a transaction reads the same row twice but gets different data each time. For example, suppose transaction 1 reads a row. Transaction 2 updates or deletes that row and commits the update or delete. If transaction 1 rereads the row, it retrieves different row values or discovers that the row has been deleted.
Read skew: Alice observes the database in an inconsistent state.

The above diagram illustrates the Non repeatable read problem. Say Alice has $1000 of savings at a bank, split across two accounts with $500 each. Now a transaction transfers $100 from one of her accounts to the other. If she is unlucky enough to look at her list of account balances in the same moment as that transaction is being processed, she may see one account balance at a time before the incoming payment has arrived (with a balance of $500), and the other account after the outgoing transfer has been made (the new balance being $400). To Alice, it now appears as though she only has a total of $900 in her accounts — it seems that $100 has vanished in thin air.

  • Phantoms: A phantom is a row that matches the search criteria but is not initially seen. For example, suppose transaction 1 reads a set of rows that satisfy some search criteria. Transaction 2 generates a new row (through either an update or an insert) that matches the search criteria for transaction 1. If transaction 1 reexecutes the statement that reads the rows, it gets a different set of rows.

InnoDB offers all four transaction isolation levels described by the SQL:1992 standard: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE. The default isolation level for InnoDB is REPEATABLE READ.

Read Uncommitted: Transactions are not isolated from each other. If the DBMS supports other transaction isolation levels, it ignores whatever mechanism it uses to implement those levels. So that they do not adversely affect other transactions, transactions running at the Read Uncommitted level are usually read-only.

Read Committed: The most basic level of transaction isolation is read committed. It makes two guarantees:

  1. When reading data from the database, you will only see data that has been committed (no dirty reads).
  2. When writing to the database, you will only overwrite data that has been committed (no dirty writes).

Repeatable Read: This is the default isolation level for InnoDB. Consistent reads within the same transaction read the snapshot established by the first read. This means that if you issue several plain (nonlocking) SELECT statements within the same transaction, these SELECT statements are consistent also with respect to each other.

For locking reads (SELECT with FOR UPDATE or FOR SHARE), UPDATE, and DELETE statements, locking depends on whether the statement uses a unique index with a unique search condition, or a range-type search condition.

  • For a unique index with a unique search condition, InnoDB locks only the index record found, not the gap before it.
  • For other search conditions, InnoDB locks the index range scanned, using gap locks or next-key locks to block insertions by other sessions into the gaps covered by the range.

Serializable: This level is like REPEATABLE READ, but InnoDB implicitly converts all plain SELECT statements to SELECT ... FOR SHARE if autocommit is disabled. If autocommit is enabled, the SELECT is its own transaction. It therefore is known to be read only and can be serialized if performed as a consistent (nonlocking) read and need not block for other transactions. (To force a plain SELECT to block if other transactions have modified the selected rows, disable autocommit.)

The different isolation levels and the phenomenon which can occur for a given level is depicted in the below diagram:

In the table, an “X” marks each phenomenon that can occur.

Serializable is the epitome of isolation, yet it is rarely used in practice, because it carries a performance penalty (since it locks the whole table). Some popular databases, such as Oracle 11g, don’t even implement it. In Oracle, there is an isolation level called “serializable”, but it actually implements something called snapshot isolation, which is a weaker guarantee than serializability.

I would be covering Durability in detail in final post. Till then, adios!

References:

--

--