Saturday, April 30, 2016

PostgreSQL Transaction Internal Code Flow

This post is in continuation of my previous post. Here I would explain main transaction working from code flow perspective. Sub-transaction, MVCC and other related details will be covered in subsequent posts. Request to see my previous post Basic of Transaction in order to get better understanding. 
Internally each transaction is represented as unique number in increasing order with an exception in-case of overflow. Also whole of transaction flow is tracked using various state as explained below:

Command Execution:

Usually each command execution has 3 steps as:
  • StartTransactionCommand
  • Do the operation specific to command
  • CommitTransactionCommmand
As mentioned in my earlier post, if transaction is not started explicitly then it will be started internally for any command execution. So this forms two way of startign the transaction. Below it explain both of the case and corresponding state transition involved. 

Meaning  of below nomenclature
                    X ====action====>Y
Current state is X, in this it does operation "action" and gets transferred to state "Y". In-case there is no action mentioned, means in current state it does not do any operation, it directly moves to state "Y".
There are mainly 3 actions mentioned below, the purpose of each actions are as below:
  • StartTransaction: Assign resources in terms of memory, initializes the guc variable specific, initialize the transaction properties like read-only transaction or read-write, create a new resource owner etc.
  • CommitTransaction: Undo all initialization done by StartTransaction. Execute any pending triggers, handle all on commit action if any. Generate a COMMIT WAL record and insert same in WAL buffer, Update commit TS data. Then depending on synchronous commit or asynchronous commit configured, wait for response from standby node or directly flush the transaction to CLOG respectively.
  • AbortTransaction: Almost similar to CommitTransaction except it writes an ABORT WAL, marks the transaction as ABORTED.

State transition mentioned in subsequent section are states from client queries perspective. In addition to these state, there are few states from server perspective also, these are:
  • TRANS_DEFAULT: Idle, its default state.
  • TRANS_START: This is the state in which transaction initialization happens (StartTransaction)
  • TRANS_INPROGRESS: Means transaction has been started.
  • TRANS_COMMIT: This state shows transaction commit is in progress (CommitTransaction)
  • TRANS_ABORT:This state shows transaction abort is in progress (AbortTransaction)

Case-1: Explicit Start of Transaction:


Execution of START TRANSACTION command:

  • StartTransactionCommand (TBLOCK_DEFAULT)====StartTransaction====>TBLOCK_STARTED
  • Processing this command:  BeginTransactionBlock(TBLOCK_STARTED) =====> TBLOCK_BEGIN
  • CommitTransactionCommmand(TBLOCK_BEGIN) ====> TBLOCK_INPROGRESS


Execution of a normal command:

  • StartTransactionCommand (TBLOCK_INPROGRESS): Nothing to do.
  • Processing this command: Till this point no transaction ID has been assigned, as was not sure if any command requiring transaction id going to be executed. So now call the function AssignTransactionId to assign a new transaction id. Current number of transaction is maintained in a shared variable ShmemVariableCache->nextXid; So the current value of this variable is taken as transaction id for current transaction. Then value of ShmemVariableCache->nextXid is incremented (taking care of overflow case). 
        Also each transaction information needs to be made durable (its one of the properties), for which it maintains:
  1. Commit log for each transaction (Called CLOG stored in clog page)
  2. Each transaction commit timestamp (call CommitTs stores in separate page)
        If current XID is going to be stored in a new page (either because its first transaction in the system or existing page is full), then it needs reset whole content of new page with zero. This should be done for all pages used for storing these information.

        Finally it should:
  1. Also each session maintains MyPgXact, which maintains transaction information in memory. This is used by all other session for taking various decision. So assign this transaction to MyPgXact.
  2. The new transaction id is stored in the each tuple being created (more on this in coming post related to MVCC).
  3. Stores the current command id in each tuple being created.
  4. Then continue with normal command operation.

  • CommitTransactionCommmand(TBLOCK_INPROGRESS): Does command counter increment (CommandCounterIncrement) i.e. increments the command id so that if multiple commands running in same transaction then next command can see the operation done by previous command. No state transition.


Executing again a command:

  • StartTransactionCommand (TBLOCK_INPROGRESS): Nothing to do.
  • Processing this command: Since already transaction assigned nothing to do. Just continue with command execution.
  • CommitTransactionCommmand(TBLOCK_INPROGRESS): Does command counter increment (CommandCounterIncrement) i.e. increments the command id so that if multiple commands running in same transaction then next command can see the operation done by previous command. No state transition.


Executing COMMIT/END command:

  • StartTransactionCommand (TBLOCK_INPROGRESS): Nothing to do.
  • Processing this command: EndTransactionBlock(TBLOCK_INPROGRESS) ====> TBLOCK_END
  • CommitTransactionCommmand(TBLOCK_END)====CommitTransaction====> TBLOCK_DEFAULT


Executing ROLLBACK command:

  • StartTransactionCommand (TBLOCK_INPROGRESS): Nothing to do.
  • Processing this command: UserAbortTransactionBlock(TBLOCK_INPROGRESS) ====> TBLOCK_ABORT_PENDING
  • CommitTransactionCommmand(TBLOCK_ABORT_PENDING)====AbortTransaction====> TBLOCK_DEFAULT

Case-2: Implicit Start of Transaction:

Transaction is started automatically in-case a transaction block start command was not executed. This transaction is valid only for the current command and it gets committed or aborted automatically once command gets executed successfully or gets fail respectively


Command Execution- Success:

  • StartTransactionCommand (TBLOCK_DEFAULT)====StartTransaction====>TBLOCK_STARTED
  • Do the actual operation
  • CommitTransactionCommmand(TBLOCK_STARTED)====CommitTransaction====> TBLOCK_DEFAULT


Command Execution - Fail:

  • StartTransactionCommand (TBLOCK_DEFAULT)====StartTransaction====>TBLOCK_STARTED
  • Do the actual operation
  • AbortCurrentTransaction(TBLOCK_STARTED) ====AbortTransaction====> TBLOCK_DEFAULT

In my subsequent post, I would cover some more details about transaction and their implementation details.

Any comment/query/suggestion welcome.

Basic of Transaction

Have you ever though about how below classic example works:
           You swipe your credit card in a shop and money gets debited from account but overall billing fails. In that case your money gets credited back to your account instantly.

If you can already relate this to one of the key property of Database called Transaction, then you already have some head start for this topic. If not then no need to worry, I will take you from very basic of Transactions.

Background

Transactions are one of the most fundamental key concept for any database including PostgreSQL. It enables multiple step actions to be grouped together in such a way that either all step executions are accepted or all are rejected. If all of step executions are success then as a group whole actions will be considered as success. If any of the step execution fails in between, then it will make sure to nullify the effect of all other previous successful steps. 
Consider an example of bank fund transfer, where in there is table consisting of customer name, their balance etc. There is a customer 'A' who wants to transfer 1000Rs to customer 'B'. In terms of SQL it will look like:

            UPDATE accounts SET balance = balance - 1000 where cust_name='A';
            UPDATE accounts SET balance = balance + 1000 where cust_name='B';

There are two steps involved in overall fund transfer activity and as per the above definition of transaction, either both of them should succeed i.e. once 1000Rs debited from A's account it must be credited to B's account or there should not be even debit from A's account. Also there should not be case where debit from A's account failed but B's account got credited with 1000Rs, in which case bank will not be happy.
Just explained property of relation is called as "Atomicity", there are other 3 additional properties of each transaction explained below:

Properties

There are four mandatory properties of each transaction:

Atomicity: 

This property make sure that all changes to data are performed as if they are a single operation. That is, all the changes are performed, or none of them. This is already explained above with example.

Consistency: 

This property ensures that database is in consistent state when transaction state and when it ends i.e. it just bring the database from one valid state to another. In above example balance data will remain in consistent state when transaction ends.

Isolation: 

This property insures that the intermediate state of a transaction is invisible to other concurrent running transactions. As a result, transactions that run concurrently appear to be serialized. In above example, if there is any concurrent transaction executing the below SQL command:

               SELECT sum(balance) FROM account;

Then it should always show the same amount i.e. it should not be able to see intermediate stage where 1000Rs has been debited from A's account but not credited to B's account.

Durability: 

This property related to action after transaction has completed and it ensures that once a transaction completes changes to data persists and are not undone even in event of system failure. In above example, once 1000Rs debited from A's account, even on system restart changed balance will be shown for A's account.

Major Commands

Below are the major commands related to transaction:

1. START TRANSACTION / BEGIN: This command start a new transaction. 
2. COMMIT: This command ends already running transaction.
3. ROLLBACK: This command is used to undo all changes done by the statements running in the current transaction. Some depending on your business logic you might need to execute this command.

In order to execute multiple statements in a single transaction, then before writing any statement, first command-1 needs to be executed and then after all statements are executed, command -2 or command -3 should be executed.
e.g..
START TRANSACTION
SQL-STMT-1
SQL-STMT-2
..
..
SQL-STMT-3
COMMIT/ROLLBACK.

Note: If an explicit START TRANSACTION/BEGIN command is not given before a SQL-statement then explicitly BEGIN command will be executed for that SQL-statement and once this statement finishes, an explicit COMMIT command will be executed.

I hope by now you understand how swiping credit card example works. Here debiting of money from your card, generation of bill all happens in the same transaction. So even if first step of debiting money success but final billing failed then whole transaction will be rollbacked and hence debited money will be credited back.

Do visit my next post to see how transaction works in PostgreSQL from code flow perspective.

Any comment/query/suggestion welcome.