Wednesday, March 1, 2017

Excited about upcoming PGConf India 2017

Yet another exciting conference PGConf 2017 is just one day away (its on 3rd March). I have been associated with this conference for last 3 years and I must say it is becoming bigger and bigger every year. This year also promises to be really great with many new exciting things:

  1. First time it is going to have multiple parallel track.
  2. Two of the top major contributor and committer are going to present.
  3. Training day, one day prior to conference on 2nd March.
Apart from above there are many exciting papers ranging from developer oriented to user specific. So be there to witness all actions happening there...:-)

Looking forward to learn new exciting things about PostgreSQL. Also excited to meet and greet new guys and re-connect with known guys.

Sunday, October 9, 2016

Major enhancement in PostgreSQL 9.6

A much awaited release of PostgrsSQL 9.6 was officially announced on 29th September 2016. This release is not only having new major features addition but also a major technology makeshift in terms of parallel processing capabilities. There has been always complain of relatively lower performance with respect to other similar category open source databases, which can be addressed by many of the performance improvement packed in this release.

Below are highlights of major addition/changes in this release:

Parallel Query

In recent few years hardware has changed a lot in terms of better execution capability and it is very much needed to twin our database to utilize maximum new capability. One of such improvement  is now each single machine comes with hundreds of cores and each of them can work independently. Earlier even though server deployed was having many cores, query processing used to happen sequentially by only one core and remaining core remained idle or busy doing some other work.
With the parallel query processing feature added, now each query can be divided in multiple parts and each part can be executed independently by each cores. Though in current release this parallelism is strictly for read-only queries but it is good start as base for supporting large parallel queries. Following read-only queries are supported for parallelism in this release:
1. Sequential scan on base table.
2. Hash Join
3. Nested Loop Join
4. Aggregation. 

Currently parallel query is not enabled by default. To enable set the new configuration parameter max_parallel_workers_per_gather to value greater than zero. Changing value of this parameter more than maximum core available may not be useful.


This works also makes PostgreSQL aligned with trend of increasing number of cores.
Earlier each transaction used to have a major bottleneck on a specific lock (i.e. ProcArrayLock), which does not allowed to efficiently utilize all core and hence degraded the performance of concurrent query execution. 
A lot of work has been done in this area to reduce the lock contention and hence efficiently use all possible cores in the single physical machine or virtual machine.

Though this improvement does not change the way user used to use earlier version but it is important to understand this so that they can deploy more powerful server (with maximum number of cores) in order to achieve better performance.

Multiple Synchronous Standby

This is also one of the major enhancement specially for the user interested in better availability compared to performance. 
Synchronous standby feature makes sure that at-least one of the standby nodes receives master data before master can commit and return to client. This has been available in earlier PostgreSQL versions also.
From this release, this feature has been extended to guarantees receipt of data at-least on the configured number of synchronous standby before master node can commit.

This functionality gets enabled by changing the configuration parameter synchronous_standby_names. Its syntax is either of them

        num_sync ( standby_name [, ...] )
        standby_name [, ...]

First syntax is newly introduced and second is existing one to keep backward compatibility. 
E.g. Suppose there are 5 standby servers s1, s2, s3, s4 and s5, 2 (s1, s5, s2, s3, s4) means first two of standby s1 and s5 will be active synchronous standby and at-least they need confirm receipt of data before master commit. If any or both of the active synchronous standby goes down, then correspondingly new standbys will be chosen as active in the order given.

Be careful in choosing synchronous standby server. Configure as many as really required as it has impact on performance.

Foreign Data Wrapper (postgres_fdw)

This also one the much awaited feature of PostgreSQL 9.6. Now following operations are allowed directly on foreign server:
  1. Sorting: Now sorting (ORDER BY) can be pushed to remote server so that order data can be used for merge join.
  2. Join: Earlier join of two remove tables were done by fetching all data of two tables and then performing join on the local node. This causes many unecessary tuple to flow. Now with PostgreSQL 9.6, a relatively striaghtforward join between two tables can be pushed to remove server itself and get the result of join. Currently SEMI and ANTI joins are not pushed.
  3. UPDATE: Earlier remote UPDATE involved sending a SELECT FOR UPDATE command and then updating or deleting the selected rows one by one. While that is still necessary if the operation requires any local processing, it can now be done remotely if all elements of the query are safe to send to the remote server.

These are my favorite pick from the current release. To find complete list of feature added/modified , please refer the PostgreSQL 9.6 release notes.
Overall PostgreSQL 9.6 looks to be very promising and comes with great news for existing customer as well as awaited customers.

Looking forward for your comment/feedback/response.

Saturday, June 18, 2016

Advanced Transaction: Savepoint And Prepared Transaction

Hope you found my two previous posts on transaction interesting and useful. Based on some of readers request, I would cover few advanced transaction topic in my subsequent posts. In this post, I am going to cover following advanced variant of traditional transaction:
     1. Savepoint
     2. Prepared Transaction


In order to understand its usage, imagine one of your customer work-load as below:
            Perform operation-1
            Perform operation-2
            Perform operation-3
Perform operation-4

As per the business logic it is likely that operation-3 and operation-4 are going to fail and also failure of these operations does not impact operation-1 and operation-2. Now as per transaction properties (recall the Basic of Transaction), whole of operation will be roll backed in-case of any of the four operation fails. So in the given business scenario though operation-1 and operation-2 should not be rollbacked but still it will be. 

In order to avoid this behavior  and control the main transaction life span, savepoint is being introduced.

Savepoint can be created only inside a transaction block. It creates a saving point for overall transaction i.e in-case if any commands results in error inside a transaction block, then instead of rollback whole transaction, it allows to rollback till the point where savepoint was created. E.g. Consider following example:

        INSERT INTO TBL VALUES(100, 100);
        INSERT INTO TBL VALUES(200, 200);
        SAVEPOINT first_save;
        INSERT INTO TBL VALUES(200, 300); <=== This will fail because of unique index.

Related Comamnds

1. SAVEPOINT savepoint_name: This command create a new savepoint inside a transaction block as shown in above example. There can be nested savepoint also i.e. savepoint can be created with-in already created savepoint.

2. ROLLBACK TO [SAVEPOINT ] savepoint_name: This command rollback all operation done from the point savepoint was created as shown in above example. Rollback to savepoint does not destroy the given savepoint but it destroys any savepoint nested with-in the given savepoint. E.g.

        INSERT INTO TBL VALUES(100, 100);
        INSERT INTO TBL VALUES(200, 200);
        SAVEPOINT first_save;
        INSERT INTO TBL VALUES(300, 300);
        SAVEPOINT second_save;
        INSERT INTO TBL VALUES(400, 400);
        ROLLBACK TO SAVEPOINT first_save;   <=== Will destroy second_save but first_save will remain active.

3. RELEASE [SAVEPOINT] savepoint_name: As the name suggest, it just removes/destroys the savepoint created. It does not have any impact on any other command executed after this savepoint. Similar to ROLLBACK TO SAVEPOINT, this command also destroy all savepoint nested with-in the savepoint getting released.

        INSERT INTO TBL VALUES(100, 100);
        INSERT INTO TBL VALUES(200, 200);
        SAVEPOINT first_save;
        INSERT INTO TBL VALUES(300, 300);
        SAVEPOINT second_save;
        INSERT INTO TBL VALUES(400, 400);
        RELEASE SAVEPOINT first_save; <=== Destroy both savepoints but no impact on records inserted.
        INSERT INTO TBL VALUES(400, 500); <=== As no savepoint active, so whole transaction gets rollbacked.

NOTE: There is no command like COMMIT TO SAVEPOINT as there is no meaning to commit only part of transaction (As it will be against transaction basic properties).

Prepare Transaction

Before jumping to define this variation, it is important to understand two-phase commit. As the name suggest it does commit of any transaction in two steps.

1.     First step is to verify all condition which needs to be satisfied in order to commit this transaction. This is called prepare stage.
2.    Second step is the actual commit, which is almost same as traditional commit.

This is more useful for a cluster based database or any similar variations, where in various nodes (databases) are responsible to run a part of bigger operation. So it is important that each node does either COMMIT or ROLLBACK at the same time in order to maintain overall transaction property. Many cluster solution achieve this using a dedicated transaction manager or just coordinator, whose tasks are:
  1. Step-1: will probe all nodes to check if commit can be done and in turns all nodes will respond with either positive or negative acknowledgment.
  2. Step-2: Once coordinator or transaction manager gets response from all nodes, it will issue COMMIT or ROLLBACK to all nodes.

I will cover more detailed two-phase commit in my future blogs.

So now coming back to understand Prepare Transaction, it actually prepares a normal current running transaction for two-phase commit. Once a transaction is prepared, it dissociates from current session and its gets stored on disk. Since its state gets stored on disk, it is almost certain that this transaction can be committed successfully even in-case of any database crash happened before explicit issue of commit.
Prepare transaction is given a unique identifier, which is used later to commit or rollback this transaction. Once prepare transaction is done, transaction gets dissociated from current session. Subsequently this transaction can be committed or rollbacked from any session using the name given during prepare.
Like traditional transaction, unless prepared transaction gets committed or rollbacked, impact of any operation done will not be visible by any session including the session which has performed this transaction.
This is not recommended to be used by application or any interactive sessions. Rather it should be used where there is need of any external transaction manager, which performs some validation or has some other dependency before actual commit.

Related Comamnds

  1. PREPARE TRANSACTION name: Prepares the current transaction and immediately dissociates from the current session. A unique name is given to this prepared transaction, so that later it can be committed/rollbacked using the same name.
  2. COMMIT PREPARED name: Commit the already prepared transaction. It’s like performing second phase of two phase commit protocol.
  3. ROLLBACK PREPARE name: Rollback the already prepared transaction.
        INSERT INTO TBL VALUES(100, 100);
        INSERT INTO TBL VALUES(200, 200);
        PREPARE TRANSACTION 'first_prepare';
        SELLECT * FROM TBL  <=== This will return zero record, as transaction is not yet committed

        COMMIT PREPARED 'first_prepare';
        SELLECT * FROM TBL  <=== This will return both records.

        INSERT INTO TBL VALUES(100, 100);
        INSERT INTO TBL VALUES(200, 200);
        PREPARE TRANSACTION 'first_prepare';
        ROLLBACK PREPARED 'first_prepare';
        SELLECT * FROM TBL  <=== This will return zero records as transaction rollbacked.

It is highly recommended to take PostgreSQL package from and try all of the commands described. 

Subscribe to my blog to stay informed about my new upcoming post.  

Please feel free to drop your query/suggestion/comments.

Saturday, May 14, 2016

Client connected to itself...Isn't it Strange (effect of TCP Simultaneous Connect)

Yes I observed this strange behavior during one of my work, which I would like to share. The symptom of the issue was that client was getting some unexpected junk response, which server never sent.


While working on a system, where client application has logic to keep reconnecting to server till connection becomes successful. At one point of time it was observed that the client connection has been established successfully even though server has not yet come up. Later when Server was brought up, it failed to start saying "Address already in use". Which gave us hint that client must have connected to any other application running on same address as the server.

Then we saw the result of netstat command at this point of time and below were the result:

    netstat -nA inet | fgrep :32246
    tcp        0      0         ESTABLISHED

Which shows client has connected to itself (source and destination IP:Port is same). Also there is no additional application listening on 32246 port. Address was the address on which client was trying to connect.

Cause Analysis

In our case we were trying to reconnect to server repetitively and every time it keeps failing as server was still down. As per TCP, every time client tries to re-connect it is assigned one new source port in increasing order from the range of ephemeral ports(Its range is defined in file /proc/sys/net/ipv4/ip_local_port_range). So at one point of time, it may use the same source port as used for destination (if destination port is also in the range of ephemeral ports).
Below was the range of ephemeral port on the machine where issue being observed:

    cat /proc/sys/net/ipv4/ip_local_port_range
    9000    65500

And the server port used in my set-up was 32246. So server port was also in the range of ephemeral ports. So it was possible that at one point of time source port will be same as that of destination port.


Now you may think that even though client has chosen same port as of server, still how it can connect as no server is listening on that port. This is because of the one of the TCP feature called simultaneous connect documented in RFC793. This allows two clients to connect to each other without anyone entering into listening state. The connection establishment as per this approach is different from the usual 3-way handshake. Here both clients performs an active OPEN as shown in below table:
Table-1: Synchronous Connect
This gets triggered only if two clients are trying to reach each other (In our case though one client is trying to connect to server but since port being same it gets treated as if both are trying to connect to each other and hence simultaneous connect gets triggered.)


Execute below command snippet:
while true
   telnet 32775

Even though there is no telnet server running on port 32775 but at some point of time it will succeed.
telnet: connect to address Connection refused
telnet: connect to address Connection refused
Connected to
Escape character is '^]'.


Once connection is successful, we can get the port number dynamically being assigned to client and compare the same with destination port. If it matches then we can disconnect the current connection and continue to retry.

    struct sockaddr_in cAddr;
    socklen_t cAddrLen = sizeof(cAddr);

    // Get the current client dynamic information allocated.
    int e = getsockname(fd, (sockaddr *)&cAddr, &cAddrLen);
    if (e != 0)


Though this is very rare scenario but very much possible. So any application having infinite reconnect logic and there is no authentication info exchange between client and server in order to finalize the connection, then it should take care of this.
In-case of PostgreSQL, client expect authentication request from server once connection is established. So issue related to self-connect will not happen.
But if there is any third party tool working on top of PostgreSQL and they have their own way of connection mechanism, then they should evaluate to check if this issue is applicable to them.

Please share your comment/feedback or if you have some other idea to address this issue.

Saturday, April 30, 2016

PostgreSQL Transaction Internal Code Flow

This post is in continuation of my previous post. Here I would explain main transaction working from code flow perspective. Sub-transaction, MVCC and other related details will be covered in subsequent posts. Request to see my previous post Basic of Transaction in order to get better understanding. 
Internally each transaction is represented as unique number in increasing order with an exception in-case of overflow. Also whole of transaction flow is tracked using various state as explained below:

Command Execution:

Usually each command execution has 3 steps as:
  • StartTransactionCommand
  • Do the operation specific to command
  • CommitTransactionCommmand
As mentioned in my earlier post, if transaction is not started explicitly then it will be started internally for any command execution. So this forms two way of startign the transaction. Below it explain both of the case and corresponding state transition involved. 

Meaning  of below nomenclature
                    X ====action====>Y
Current state is X, in this it does operation "action" and gets transferred to state "Y". In-case there is no action mentioned, means in current state it does not do any operation, it directly moves to state "Y".
There are mainly 3 actions mentioned below, the purpose of each actions are as below:
  • StartTransaction: Assign resources in terms of memory, initializes the guc variable specific, initialize the transaction properties like read-only transaction or read-write, create a new resource owner etc.
  • CommitTransaction: Undo all initialization done by StartTransaction. Execute any pending triggers, handle all on commit action if any. Generate a COMMIT WAL record and insert same in WAL buffer, Update commit TS data. Then depending on synchronous commit or asynchronous commit configured, wait for response from standby node or directly flush the transaction to CLOG respectively.
  • AbortTransaction: Almost similar to CommitTransaction except it writes an ABORT WAL, marks the transaction as ABORTED.

State transition mentioned in subsequent section are states from client queries perspective. In addition to these state, there are few states from server perspective also, these are:
  • TRANS_DEFAULT: Idle, its default state.
  • TRANS_START: This is the state in which transaction initialization happens (StartTransaction)
  • TRANS_INPROGRESS: Means transaction has been started.
  • TRANS_COMMIT: This state shows transaction commit is in progress (CommitTransaction)
  • TRANS_ABORT:This state shows transaction abort is in progress (AbortTransaction)

Case-1: Explicit Start of Transaction:

Execution of START TRANSACTION command:

  • StartTransactionCommand (TBLOCK_DEFAULT)====StartTransaction====>TBLOCK_STARTED
  • Processing this command:  BeginTransactionBlock(TBLOCK_STARTED) =====> TBLOCK_BEGIN
  • CommitTransactionCommmand(TBLOCK_BEGIN) ====> TBLOCK_INPROGRESS

Execution of a normal command:

  • StartTransactionCommand (TBLOCK_INPROGRESS): Nothing to do.
  • Processing this command: Till this point no transaction ID has been assigned, as was not sure if any command requiring transaction id going to be executed. So now call the function AssignTransactionId to assign a new transaction id. Current number of transaction is maintained in a shared variable ShmemVariableCache->nextXid; So the current value of this variable is taken as transaction id for current transaction. Then value of ShmemVariableCache->nextXid is incremented (taking care of overflow case). 
        Also each transaction information needs to be made durable (its one of the properties), for which it maintains:
  1. Commit log for each transaction (Called CLOG stored in clog page)
  2. Each transaction commit timestamp (call CommitTs stores in separate page)
        If current XID is going to be stored in a new page (either because its first transaction in the system or existing page is full), then it needs reset whole content of new page with zero. This should be done for all pages used for storing these information.

        Finally it should:
  1. Also each session maintains MyPgXact, which maintains transaction information in memory. This is used by all other session for taking various decision. So assign this transaction to MyPgXact.
  2. The new transaction id is stored in the each tuple being created (more on this in coming post related to MVCC).
  3. Stores the current command id in each tuple being created.
  4. Then continue with normal command operation.

  • CommitTransactionCommmand(TBLOCK_INPROGRESS): Does command counter increment (CommandCounterIncrement) i.e. increments the command id so that if multiple commands running in same transaction then next command can see the operation done by previous command. No state transition.

Executing again a command:

  • StartTransactionCommand (TBLOCK_INPROGRESS): Nothing to do.
  • Processing this command: Since already transaction assigned nothing to do. Just continue with command execution.
  • CommitTransactionCommmand(TBLOCK_INPROGRESS): Does command counter increment (CommandCounterIncrement) i.e. increments the command id so that if multiple commands running in same transaction then next command can see the operation done by previous command. No state transition.

Executing COMMIT/END command:

  • StartTransactionCommand (TBLOCK_INPROGRESS): Nothing to do.
  • Processing this command: EndTransactionBlock(TBLOCK_INPROGRESS) ====> TBLOCK_END
  • CommitTransactionCommmand(TBLOCK_END)====CommitTransaction====> TBLOCK_DEFAULT

Executing ROLLBACK command:

  • StartTransactionCommand (TBLOCK_INPROGRESS): Nothing to do.
  • Processing this command: UserAbortTransactionBlock(TBLOCK_INPROGRESS) ====> TBLOCK_ABORT_PENDING
  • CommitTransactionCommmand(TBLOCK_ABORT_PENDING)====AbortTransaction====> TBLOCK_DEFAULT

Case-2: Implicit Start of Transaction:

Transaction is started automatically in-case a transaction block start command was not executed. This transaction is valid only for the current command and it gets committed or aborted automatically once command gets executed successfully or gets fail respectively

Command Execution- Success:

  • StartTransactionCommand (TBLOCK_DEFAULT)====StartTransaction====>TBLOCK_STARTED
  • Do the actual operation
  • CommitTransactionCommmand(TBLOCK_STARTED)====CommitTransaction====> TBLOCK_DEFAULT

Command Execution - Fail:

  • StartTransactionCommand (TBLOCK_DEFAULT)====StartTransaction====>TBLOCK_STARTED
  • Do the actual operation
  • AbortCurrentTransaction(TBLOCK_STARTED) ====AbortTransaction====> TBLOCK_DEFAULT

In my subsequent post, I would cover some more details about transaction and their implementation details.

Any comment/query/suggestion welcome.

Basic of Transaction

Have you ever though about how below classic example works:
           You swipe your credit card in a shop and money gets debited from account but overall billing fails. In that case your money gets credited back to your account instantly.

If you can already relate this to one of the key property of Database called Transaction, then you already have some head start for this topic. If not then no need to worry, I will take you from very basic of Transactions.


Transactions are one of the most fundamental key concept for any database including PostgreSQL. It enables multiple step actions to be grouped together in such a way that either all step executions are accepted or all are rejected. If all of step executions are success then as a group whole actions will be considered as success. If any of the step execution fails in between, then it will make sure to nullify the effect of all other previous successful steps. 
Consider an example of bank fund transfer, where in there is table consisting of customer name, their balance etc. There is a customer 'A' who wants to transfer 1000Rs to customer 'B'. In terms of SQL it will look like:

            UPDATE accounts SET balance = balance - 1000 where cust_name='A';
            UPDATE accounts SET balance = balance + 1000 where cust_name='B';

There are two steps involved in overall fund transfer activity and as per the above definition of transaction, either both of them should succeed i.e. once 1000Rs debited from A's account it must be credited to B's account or there should not be even debit from A's account. Also there should not be case where debit from A's account failed but B's account got credited with 1000Rs, in which case bank will not be happy.
Just explained property of relation is called as "Atomicity", there are other 3 additional properties of each transaction explained below:


There are four mandatory properties of each transaction:


This property make sure that all changes to data are performed as if they are a single operation. That is, all the changes are performed, or none of them. This is already explained above with example.


This property ensures that database is in consistent state when transaction state and when it ends i.e. it just bring the database from one valid state to another. In above example balance data will remain in consistent state when transaction ends.


This property insures that the intermediate state of a transaction is invisible to other concurrent running transactions. As a result, transactions that run concurrently appear to be serialized. In above example, if there is any concurrent transaction executing the below SQL command:

               SELECT sum(balance) FROM account;

Then it should always show the same amount i.e. it should not be able to see intermediate stage where 1000Rs has been debited from A's account but not credited to B's account.


This property related to action after transaction has completed and it ensures that once a transaction completes changes to data persists and are not undone even in event of system failure. In above example, once 1000Rs debited from A's account, even on system restart changed balance will be shown for A's account.

Major Commands

Below are the major commands related to transaction:

1. START TRANSACTION / BEGIN: This command start a new transaction. 
2. COMMIT: This command ends already running transaction.
3. ROLLBACK: This command is used to undo all changes done by the statements running in the current transaction. Some depending on your business logic you might need to execute this command.

In order to execute multiple statements in a single transaction, then before writing any statement, first command-1 needs to be executed and then after all statements are executed, command -2 or command -3 should be executed.

Note: If an explicit START TRANSACTION/BEGIN command is not given before a SQL-statement then explicitly BEGIN command will be executed for that SQL-statement and once this statement finishes, an explicit COMMIT command will be executed.

I hope by now you understand how swiping credit card example works. Here debiting of money from your card, generation of bill all happens in the same transaction. So even if first step of debiting money success but final billing failed then whole transaction will be rollbacked and hence debited money will be credited back.

Do visit my next post to see how transaction works in PostgreSQL from code flow perspective.

Any comment/query/suggestion welcome.

Tuesday, March 22, 2016

Native Compilation Part-2 @ PGDay Asia 2016 Conference

Finally first PGDay Asia held in Singapore has successfully finished. I am also one of the organizer of this conference. It took months of hard work by many specially Sameer Kumar to achieve this.

I had also opportunity to present my paper on Native Compilation part-2, it was subsequent to my paper presented in last PGCon 2015 (Procedure compilation is additional).

Summary of this talk (For detail talk visit Native Compilation at PGDay Asia 2016):

Why Native Compilation Required:

Seeing the current hardware trend, I/O is no more bottleneck for the executor mainly because of below reasons:
1. Increase in RAM Size
2. Prevalence of High speed SSD
At the same time, though lot of work has happened in hardware but there is no focus on improving the CPU efficiency. Because of this, current executor biggest bottleneck is CPU usage efficiency not I/O. So in order to tackle CPU bottleneck without compromising on any feature, we need mechanism to execute lesser number of instruction and still getting all functionality.

What is Native Compilation:

Native Compilation is a methodology to reduce CPU instructions by executing only instruction specific to given query/objects unlike interpreted execution. Steps are:
 1. Generate C-code specific to objects/query.
 2. Compile C-code to generate DLL and load with server executable.
 3. Call specialized function instead of generalized function.

Native compilation can be applied to any entity which are inefficient in terms of CPU e.g.:
 1. Table
 2. Procedure
 3. Query

Table Native Compilation (Aka Schema Binding):

Since most of the properties of a particular table remains same once it is created, so its data gets stored and accessed in the similar pattern irrespective of any data. So instead of accessing tuples for same relation in generic way, we create a specialized access function for the particular relation during its creation and the same gets used for further query containing that table. This approach eliminates the need for various comparison (e.g. data-type check, length check, number of attribute check) during the access of tuple. More the deforming of tuples contribute to total run of query, more performance will be observed. Generation of code can be done in 3-ways, details of each of them is available in presentation.

Fig-1: Table compilation

Once a create table command is issued, a C-file with all specialized access function is generated, which is in turns gets loaded as DLL. These loaded functions are used by all SQL query accessing the compiled table.

Performance Improvement:

Below is the performance improvement observed on standard TPC-H benchmark.
The system configuration is as below:
SUSE Linux Enterprise Server 11 (x86_64), 2 Cores, 10 sockets per core
TPC-H Configuration: Default

Graph-1: TPC-H Performance

Table-1: TPC-H Performance 

With the above performance result, we observed that there is more than 70% reduction in CPU instruction and upto 36% performance improvement on TPC-H queries.

Procedure Compilation:

There are two parts in any of the procedure:
 1. SQL Query statements
 2. Non-SQL statement (e.g. loop, conditional, simple expression etc).

In this presentation, we focused only on 2nd part. As part of this whenever we create the procedure, we compile the Pl/Pgsql function into corresponding normal C-function. While compiling:
 1. Transform the Pl/Pgsql procedure signature to normal C-function signature.
 2. Transform all variable declaration as C-variable declaration.
 3. Transform all other non-SQL statement in corresponding normal C-statement.
 4. All SQL statements are transformed in the format SPI_xxx as below:
if (plan == NULL)
stmt = SPI_prepare(modified_query, number_of_variables, type_of_variable used);
plan = stmt;
SPI_execute_plan(plan, value,  isnull,  false, row_count);
 5. Finally compile the generated C-function using a traditional compiler to generate DLL and link  the same with SERVER executable.
 6. So on subsequent execution, directly C-function will be called instead of Pl/Pgsql function.

Fig-2: Procedure Compilation

Fig-3: Compiled Procedure invocation

Fig-2 highlights at what step and how procedure will be compiled.  Once the parsing of procedure is done, we will have all information about the procedure in PLpgSQL_function pointer, which we can traverse as planner does and for each statement corresponding C-code can be generated.
Fig-3 explain, how compiled function will be invoked.

How number of instruction reduced:

Consider a statment as x=2+y in Pl/Pgsql procedure. This statement will be executed as a traditional SELECT command as "SELECT 2+y;". So everytime this procedure gets executed, this query will be executed as if it was normal SQL query and hence many CPU instruction are executed.
But if we convert this to C-based statment, it execution will be directly as 2+y and hence will execute very few instruction comapred to original one.

Performance Improvement:

Below is the performance improvement observed on standard TPC-C benchmark:
The system configuration is as below:
SUSE Linux Enterprise Server 11 (x86_64), 2 Cores, 10 sockets per core
TPC-C Configuration: runMins=0, runTxnsPerTerminal=200000
Checkpoint_segment = 100

Graph-2: TPC-C Performance

Table-2: TPC-C Performance

With some basic compilation of procedure, we are able to get around 23% performance improvement.

Query Compilation:

Though this was not part of my presentation but would like to give some details to do this.
Unlike above two compilation, in order to compile query we need to use LLVM as the compilation time is very important while executing the query.
LLVM is open source compilation framework for very efficient and fast compilation. The LLVM compiler can only understand the IR (Internal representation something similar to assembly code). So before compilation using LLVM, we need to generate IR form of the code, which can be either of the two ways:
1. Generate C-code and generate IR for this using Clang (a tool built for C-code)
2. Or just use the IR-Builder interface function provided by LLVM to generate code directly in IR format.
The steps to compile query is as below:
1. Generate the Query plan as it is.
2. Choose the plan node to be compiled (if whole query to be compiled, then all nodes), generate a specific code corresponding to the node. This can be done in either of the ways explained above.
3. Compile the generated code using LLVM.
4. The code compiled using LLVM will return an object handler, which can be attached to the corresponding node.
5. So while executing that node, we can call compiled stored object if the compilation object stored for the corresponding node.

The most of the benefit is expected out of this query compilation. Once query compilation is done, even the SQL query statement of procedure also can be compiled in the same way (perhaps nothing explicit will be required for this...).


Seeing the industry trend, we have implemented two way of specialization, which resulted in up to 36% and 23% of performance improvement on standard benchmark TPC-H and TPC-C respectively.

Any feedback/comments/queries are welcome.