Sunday, October 9, 2016

Major enhancement in PostgreSQL 9.6

A much awaited release of PostgrsSQL 9.6 was officially announced on 29th September 2016. This release is not only having new major features addition but also a major technology makeshift in terms of parallel processing capabilities. There has been always complain of relatively lower performance with respect to other similar category open source databases, which can be addressed by many of the performance improvement packed in this release.

Below are highlights of major addition/changes in this release:

Parallel Query

In recent few years hardware has changed a lot in terms of better execution capability and it is very much needed to twin our database to utilize maximum new capability. One of such improvement  is now each single machine comes with hundreds of cores and each of them can work independently. Earlier even though server deployed was having many cores, query processing used to happen sequentially by only one core and remaining core remained idle or busy doing some other work.
With the parallel query processing feature added, now each query can be divided in multiple parts and each part can be executed independently by each cores. Though in current release this parallelism is strictly for read-only queries but it is good start as base for supporting large parallel queries. Following read-only queries are supported for parallelism in this release:
1. Sequential scan on base table.
2. Hash Join
3. Nested Loop Join
4. Aggregation. 

Currently parallel query is not enabled by default. To enable set the new configuration parameter max_parallel_workers_per_gather to value greater than zero. Changing value of this parameter more than maximum core available may not be useful.

Locking

This works also makes PostgreSQL aligned with trend of increasing number of cores.
Earlier each transaction used to have a major bottleneck on a specific lock (i.e. ProcArrayLock), which does not allowed to efficiently utilize all core and hence degraded the performance of concurrent query execution. 
A lot of work has been done in this area to reduce the lock contention and hence efficiently use all possible cores in the single physical machine or virtual machine.

Though this improvement does not change the way user used to use earlier version but it is important to understand this so that they can deploy more powerful server (with maximum number of cores) in order to achieve better performance.

Multiple Synchronous Standby

This is also one of the major enhancement specially for the user interested in better availability compared to performance. 
Synchronous standby feature makes sure that at-least one of the standby nodes receives master data before master can commit and return to client. This has been available in earlier PostgreSQL versions also.
From this release, this feature has been extended to guarantees receipt of data at-least on the configured number of synchronous standby before master node can commit.

This functionality gets enabled by changing the configuration parameter synchronous_standby_names. Its syntax is either of them

        num_sync ( standby_name [, ...] )
        standby_name [, ...]

First syntax is newly introduced and second is existing one to keep backward compatibility. 
E.g. Suppose there are 5 standby servers s1, s2, s3, s4 and s5, 2 (s1, s5, s2, s3, s4) means first two of standby s1 and s5 will be active synchronous standby and at-least they need confirm receipt of data before master commit. If any or both of the active synchronous standby goes down, then correspondingly new standbys will be chosen as active in the order given.

Be careful in choosing synchronous standby server. Configure as many as really required as it has impact on performance.

Foreign Data Wrapper (postgres_fdw)

This also one the much awaited feature of PostgreSQL 9.6. Now following operations are allowed directly on foreign server:
  1. Sorting: Now sorting (ORDER BY) can be pushed to remote server so that order data can be used for merge join.
  2. Join: Earlier join of two remote tables were done by fetching all data of two tables and then performing join on the local node. This causes many unnecessary tuple to flow. Now with PostgreSQL 9.6, a relatively straightforward join between two tables can be pushed to remote server itself and get the result of join. Currently SEMI and ANTI joins are not pushed.
  3. UPDATE: Earlier remote UPDATE involved sending a SELECT FOR UPDATE command and then updating or deleting the selected rows one by one. While that is still necessary if the operation requires any local processing, it can now be done remotely if all elements of the query are safe to send to the remote server.

These are my favorite pick from the current release. To find complete list of feature added/modified , please refer the PostgreSQL 9.6 release notes.
Overall PostgreSQL 9.6 looks to be very promising and comes with great news for existing customer as well as awaited customers.

Looking forward for your comment/feedback/response.

No comments:

Post a Comment