All about Database

Tuesday, February 13, 2018

Yet another exciting conference PGConf 2018 is just few days away. I have been associated with this conference for last 4 years and I must say it is becoming bigger and bigger every year. This year also promises to be really great with many new exciting things:

First time it is going to have 2 days of conference on 22nd and 23rd Feb.
Training day, one day prior to conference on 21st Feb.
It has got its own brand new logo.
Top notch guys from many big companies.

Apart from above there are many exciting papers ranging from developer oriented to user specific. So be there to witness all actions happening there...:-)

Looking forward to learn new exciting things about PostgreSQL. Also excited to meet and greet new guys and re-connect with known guys.

Friday, February 9, 2018

Many of the PostgreSQL developers as well as users wonder when "bitmap scan" (here onward BMS) gets selected as winning plan. This confusion is result of general understanding that if very few percentage of records being selected then index scan will be winning plan otherwise sequence scan will be chosen.

So in order to understand when BMS be will be chosen, lets first understand the root of the confusion i.e. why index scan gets selected in case of only very few percentage of records selection otherwise not.

Sequence Scan: Execution of this plan causes all pages (also called heap) to be scanned sequentially and on top of that applies qualification if given. So even after qualification apply if we get only some percentage of records, still it needs to scan all pages. But one good thing here is that all pages are read sequentially and sequence I/O is much faster. (Explanation of Sequence I/O is faster than random I/O is in itself a big topic, which I will cover in some future blog).

Index Scan: Execution of this plan finds the exact B-Tree(containing index data) page and gets the data. As using the B-Tree finding the exact location of a key is very fast so getting index data is very efficient. But there is one problem here: As B-Tree contains only index data (i.e. key), then for remaining part of data, it needs to get from heap page. Though index data keep a pointer to indicate exact location of data in heap pages (in terms of page number and offset within that), it needs to load the new page. So effectively for each index key, order will be: Fetch from B-Tree page => Fetch from heap page => Fetch from B-Tree page => Fetch from heap page =>.............so on. So this causes random page access and hence random I/O, which is costly compared to sequential I/O.

So random I/O cost is major differentiation i.e. in order to select index scan, random I/O cost should be able to outperform sequential scanning of all pages.

Now imagine a plan, which gets benefit of Index Scan but still avoid random I/O to a great extent. So obviously as per the above discussion that plan will be best and this plan is called BMS.
Now you might be thinking if that is the case then why not always select BMS. As you know nothing comes for free. In-case of BMS, cost is paid in terms of two scan. Let's briefly understand how BMS works.

BIHS: Please see the below plan:

QUERY PLAN
---------------------------------------------------------------------------------
Bitmap Heap Scan on blogtable (cost=64264.25..148070.07 rows=2313266 width=15)
Recheck Cond: (num < '230'::numeric)
-> Bitmap Index Scan on idx (cost=0.00..63685.93 rows=2313266 width=0)
Index Cond: (num < '230'::numeric)
(4 rows)

As you can see as part of plan, first node is Bitmap Heap Scan and its child node is Bitmap Index scan. Following steps are taken to execute this:

Bitmap Heap scan ask for a tuple from Bitmap Index Scan.
Bitmap Index Scan scan the index as per the condition (x<230) almost in the same way as done in normal Index Scan. But instead of returning TID (consisting of page no and offset within that) corresponding to heap data, it adds those TID in a bitmap. For simple understanding, you can consider this bitmap contains hash of all pages (hashed based on page no) and each page entry contains array of all offset within that page (Complete implementation details would make this blog post far longer than it already is, so I will leave it for another time).
Then Bitmap Heap Scan reads through the bitmap to get heap data corresponding to stored page number and offset. Then it check for visibility, qualification etc and returns the tuple based on outcome of all these checks.

It is clear that here neither all pages are read sequentially nor heap is accessed corresponding to each index tuple but it has to pay price for double scan.

So none of the plan can be clear winner all the time. It all depends on how many rows are getting selected and PostgreSQL planner does decent job to find the cost to choose best plan based on that.

In general, BMS is selected when number of records selected are neither too less for index scan nor too large for sequence scan. Let's analyze it using below examples:

NOTE: TID corresponding to each index data may be in random order e.g. for first index data I1, heap data will point to {blkno-10, offset = 20}, for I2 points to {blkno-1, offset = 30}, for I3 points to {blkno-20, offset=40} and so on. So you might wonder that the actual heap scan will still use random I/O. But that is not the case. Once bit mask is formed, all page lists gets sorted and hence sequential scan will use sequential I/O only.

Examples:

CREATE TABLE blogtable (num numeric, id int);

CREATE INDEX idx ON blogtable(num);

INSERT INTO blogtable SELECT random() * 1000, 2 FROM generate_series(1, 10000000);

ANALYZE;
VACUUM;

Plan - 1:
postgres=# explain SELECT * FROM blogtable WHERE num = 23000;
QUERY PLAN
----------------------------------------------------------------------
Index Scan using idx on blogtable (cost=0.44..8.45 rows=1 width=15)
Index Cond: (num = '23000'::numeric)

(2 rows)

Plan - 2:
postgres=# explain SELECT * FROM blogtable WHERE num < 23000;
QUERY PLAN
----------------------------------------------------------------------
Seq Scan on blogtable (cost=0.00..179890.00 rows=10000000 width=15)
Filter: (num < '23000'::numeric)

(2 rows)

Plan - 3:

postgres=# explain SELECT * FROM blogtable WHERE num < 230;
QUERY PLAN
---------------------------------------------------------------------------------
Bitmap Heap Scan on blogtable (cost=64264.25..148070.07 rows=2313266 width=15)
Recheck Cond: (num < '230'::numeric)
-> Bitmap Index Scan on idx (cost=0.00..63685.93 rows=2313266 width=0)
Index Cond: (num < '230'::numeric)
(4 rows)

Plan - 4:
postgres=# explain SELECT num FROM blogtable WHERE num < 230;
QUERY PLAN
-------------------------------------------------------------------------------------
Index Only Scan using idx on blogtable (cost=0.44..86818.59 rows=2313266 width
=11)
Index Cond: (num < '230'::numeric)

(2 rows)

Plan -1 to Plan -3 varies depending on the number of records being selected. Plan - 1 takes index scan as only one row needs to be selected, whereas plan -3 takes BMS as almost 2.3M out of 10M records are selected.

Can you please pause here and try to think why plan-4 takes Index Only Scan even though number of records selected are same as in plan -3.

I hope you found answer yourself. If not let me explain. As discussed above main villain which makes Index Scan costly is random I/O and random I/O happens in order to fetch heap data corresponding to each index key. But in-case of plan-4, if you observe we are selecting only index column and hence there is no need to fetch heap data and hence no random I/O.

I hope by now you have got some basic understanding of Bitmap Scan.
Any queries/comments/suggestions are welcome.

Wednesday, March 1, 2017

Excited about upcoming PGConf India 2017

Yet another exciting conference PGConf 2017 is just one day away (its on 3rd March). I have been associated with this conference for last 3 years and I must say it is becoming bigger and bigger every year. This year also promises to be really great with many new exciting things:

First time it is going to have multiple parallel track.
Two of the top major contributor and committer are going to present.
Training day, one day prior to conference on 2nd March.

Apart from above there are many exciting papers ranging from developer oriented to user specific. So be there to witness all actions happening there...:-)

Looking forward to learn new exciting things about PostgreSQL. Also excited to meet and greet new guys and re-connect with known guys.

Sunday, October 9, 2016

Major enhancement in PostgreSQL 9.6

A much awaited release of PostgrsSQL 9.6 was officially announced on 29th September 2016. This release is not only having new major features addition but also a major technology makeshift in terms of parallel processing capabilities. There has been always complain of relatively lower performance with respect to other similar category open source databases, which can be addressed by many of the performance improvement packed in this release.

Below are highlights of major addition/changes in this release:

Parallel Query

In recent few years hardware has changed a lot in terms of better execution capability and it is very much needed to twin our database to utilize maximum new capability. One of such improvement is now each single machine comes with hundreds of cores and each of them can work independently. Earlier even though server deployed was having many cores, query processing used to happen sequentially by only one core and remaining core remained idle or busy doing some other work.

With the parallel query processing feature added, now each query can be divided in multiple parts and each part can be executed independently by each cores. Though in current release this parallelism is strictly for read-only queries but it is good start as base for supporting large parallel queries. Following read-only queries are supported for parallelism in this release:

1. Sequential scan on base table.

2. Hash Join

3. Nested Loop Join

4. Aggregation.

Currently parallel query is not enabled by default. To enable set the new configuration parameter max_parallel_workers_per_gather to value greater than zero. Changing value of this parameter more than maximum core available may not be useful.

Locking

This works also makes PostgreSQL aligned with trend of increasing number of cores.

Earlier each transaction used to have a major bottleneck on a specific lock (i.e. ProcArrayLock), which does not allowed to efficiently utilize all core and hence degraded the performance of concurrent query execution.

A lot of work has been done in this area to reduce the lock contention and hence efficiently use all possible cores in the single physical machine or virtual machine.

Though this improvement does not change the way user used to use earlier version but it is important to understand this so that they can deploy more powerful server (with maximum number of cores) in order to achieve better performance.

Multiple Synchronous Standby

This is also one of the major enhancement specially for the user interested in better availability compared to performance.

Synchronous standby feature makes sure that at-least one of the standby nodes receives master data before master can commit and return to client. This has been available in earlier PostgreSQL versions also.

From this release, this feature has been extended to guarantees receipt of data at-least on the configured number of synchronous standby before master node can commit.

This functionality gets enabled by changing the configuration parameter synchronous_standby_names. Its syntax is either of them

num_sync ( standby_name [, ...] )

standby_name [, ...]

First syntax is newly introduced and second is existing one to keep backward compatibility.

E.g. Suppose there are 5 standby servers s1, s2, s3, s4 and s5, 2 (s1, s5, s2, s3, s4) means first two of standby s1 and s5 will be active synchronous standby and at-least they need confirm receipt of data before master commit. If any or both of the active synchronous standby goes down, then correspondingly new standbys will be chosen as active in the order given.

Be careful in choosing synchronous standby server. Configure as many as really required as it has impact on performance.

Foreign Data Wrapper (postgres_fdw)

This also one the much awaited feature of PostgreSQL 9.6. Now following operations are allowed directly on foreign server:

Sorting: Now sorting (ORDER BY) can be pushed to remote server so that order data can be used for merge join.
Join: Earlier join of two remote tables were done by fetching all data of two tables and then performing join on the local node. This causes many unnecessary tuple to flow. Now with PostgreSQL 9.6, a relatively straightforward join between two tables can be pushed to remote server itself and get the result of join. Currently SEMI and ANTI joins are not pushed.
UPDATE: Earlier remote UPDATE involved sending a SELECT FOR UPDATE command and then updating or deleting the selected rows one by one. While that is still necessary if the operation requires any local processing, it can now be done remotely if all elements of the query are safe to send to the remote server.

These are my favorite pick from the current release. To find complete list of feature added/modified , please refer the PostgreSQL 9.6 release notes.
Overall PostgreSQL 9.6 looks to be very promising and comes with great news for existing customer as well as awaited customers.

Looking forward for your comment/feedback/response.

Saturday, June 18, 2016

Advanced Transaction: Savepoint And Prepared Transaction

Hope you found my two previous posts on transaction interesting and useful. Based on some of readers request, I would cover few advanced transaction topic in my subsequent posts. In this post, I am going to cover following advanced variant of traditional transaction:

1. Savepoint

2. Prepared Transaction

Savepoint

In order to understand its usage, imagine one of your customer work-load as below:

START TRANSACTION;

Perform operation-1

Perform operation-2

Perform operation-3

Perform operation-4

COMMIT;

As per the business logic it is likely that operation-3 and operation-4 are going to fail and also failure of these operations does not impact operation-1 and operation-2. Now as per transaction properties (recall the Basic of Transaction), whole of operation will be roll backed in-case of any of the four operation fails. So in the given business scenario though operation-1 and operation-2 should not be rollbacked but still it will be.

In order to avoid this behavior  and control the main transaction life span, savepoint is being introduced.

Savepoint can be created only inside a transaction block. It creates a saving point for overall transaction i.e in-case if any commands results in error inside a transaction block, then instead of rollback whole transaction, it allows to rollback till the point where savepoint was created. E.g. Consider following example:
  CREATE TABLE TBL(ID INT, ID2 INT);
CREATE UNIQUE INDEX IDX ON TBL(ID);

START TRANSACTION
INSERT INTO TBL VALUES(100, 100);
INSERT INTO TBL VALUES(200, 200);
  SAVEPOINT first_save;
INSERT INTO TBL VALUES(200, 300); <=== This will fail because of unique index.

Related Comamnds

1. SAVEPOINT savepoint_name: This command create a new savepoint inside a transaction block as shown in above example. There can be nested savepoint also i.e. savepoint can be created with-in already created savepoint.

2. ROLLBACK TO [SAVEPOINT ] savepoint_name: This command rollback all operation done from the point savepoint was created as shown in above example. Rollback to savepoint does not destroy the given savepoint but it destroys any savepoint nested with-in the given savepoint. E.g.

START TRANSACTION

INSERT INTO TBL VALUES(100, 100);

INSERT INTO TBL VALUES(200, 200);

SAVEPOINT first_save;

INSERT INTO TBL VALUES(300, 300);

SAVEPOINT second_save;

INSERT INTO TBL VALUES(400, 400);

ROLLBACK TO SAVEPOINT first_save; <=== Will destroy second_save but first_save will remain active.

3. RELEASE [SAVEPOINT] savepoint_name: As the name suggest, it just removes/destroys the savepoint created. It does not have any impact on any other command executed after this savepoint. Similar to ROLLBACK TO SAVEPOINT, this command also destroy all savepoint nested with-in the savepoint getting released.

START TRANSACTION

INSERT INTO TBL VALUES(100, 100);

INSERT INTO TBL VALUES(200, 200);

SAVEPOINT first_save;

INSERT INTO TBL VALUES(300, 300);

SAVEPOINT second_save;

INSERT INTO TBL VALUES(400, 400);

RELEASE SAVEPOINT first_save; <=== Destroy both savepoints but no impact on records inserted.

INSERT INTO TBL VALUES(400, 500); <=== As no savepoint active, so whole transaction gets rollbacked.

NOTE: There is no command like COMMIT TO SAVEPOINT as there is no meaning to commit only part of transaction (As it will be against transaction basic properties).

Prepare Transaction

Before jumping to define this variation, it is important to understand two-phase commit. As the name suggest it does commit of any transaction in two steps.

1. First step is to verify all condition which needs to be satisfied in order to commit this transaction. This is called prepare stage.

2. Second step is the actual commit, which is almost same as traditional commit.

This is more useful for a cluster based database or any similar variations, where in various nodes (databases) are responsible to run a part of bigger operation. So it is important that each node does either COMMIT or ROLLBACK at the same time in order to maintain overall transaction property. Many cluster solution achieve this using a dedicated transaction manager or just coordinator, whose tasks are:

Step-1: will probe all nodes to check if commit can be done and in turns all nodes will respond with either positive or negative acknowledgment.
Step-2: Once coordinator or transaction manager gets response from all nodes, it will issue COMMIT or ROLLBACK to all nodes.

I will cover more detailed two-phase commit in my future blogs.

So now coming back to understand Prepare Transaction, it actually prepares a normal current running transaction for two-phase commit. Once a transaction is prepared, it dissociates from current session and its gets stored on disk. Since its state gets stored on disk, it is almost certain that this transaction can be committed successfully even in-case of any database crash happened before explicit issue of commit.

Prepare transaction is given a unique identifier, which is used later to commit or rollback this transaction. Once prepare transaction is done, transaction gets dissociated from current session. Subsequently this transaction can be committed or rollbacked from any session using the name given during prepare.

Like traditional transaction, unless prepared transaction gets committed or rollbacked, impact of any operation done will not be visible by any session including the session which has performed this transaction.

This is not recommended to be used by application or any interactive sessions. Rather it should be used where there is need of any external transaction manager, which performs some validation or has some other dependency before actual commit.

Related Comamnds

PREPARE TRANSACTION name: Prepares the current transaction and immediately dissociates from the current session. A unique name is given to this prepared transaction, so that later it can be committed/rollbacked using the same name.
COMMIT PREPARED name: Commit the already prepared transaction. It’s like performing second phase of two phase commit protocol.
ROLLBACK PREPARE name: Rollback the already prepared transaction.

Example:

START TRANSACTION

INSERT INTO TBL VALUES(100, 100);

INSERT INTO TBL VALUES(200, 200);

PREPARE TRANSACTION 'first_prepare';

SELLECT * FROM TBL <=== This will return zero record, as transaction is not yet committed

COMMIT PREPARED 'first_prepare';

SELLECT * FROM TBL <=== This will return both records.

START TRANSACTION

INSERT INTO TBL VALUES(100, 100);

INSERT INTO TBL VALUES(200, 200);

PREPARE TRANSACTION 'first_prepare';

ROLLBACK PREPARED 'first_prepare';

SELLECT * FROM TBL <=== This will return zero records as transaction rollbacked.

It is highly recommended to take PostgreSQL package from https://www.postgresql.org/download/ and try all of the commands described.

Subscribe to my blog to stay informed about my new upcoming post.

Please feel free to drop your query/suggestion/comments.

Saturday, May 14, 2016

Client connected to itself...Isn't it Strange (effect of TCP Simultaneous Connect)

Yes I observed this strange behavior during one of my work, which I would like to share. The symptom of the issue was that client was getting some unexpected junk response, which server never sent.

Background

While working on a system, where client application has logic to keep reconnecting to server till connection becomes successful. At one point of time it was observed that the client connection has been established successfully even though server has not yet come up. Later when Server was brought up, it failed to start saying "Address already in use". Which gave us hint that client must have connected to any other application running on same address as the server.

Then we saw the result of netstat command at this point of time and below were the result:

netstat -nA inet | fgrep :32246

tcp 0 0 127.0.0.1:32246 127.0.0.1:32246 ESTABLISHED

Which shows client has connected to itself (source and destination IP:Port is same). Also there is no additional application listening on 32246 port. Address 127.0.0.1:32246 was the address on which client was trying to connect.

Cause Analysis

In our case we were trying to reconnect to server repetitively and every time it keeps failing as server was still down. As per TCP, every time client tries to re-connect it is assigned one new source port in increasing order from the range of ephemeral ports(Its range is defined in file /proc/sys/net/ipv4/ip_local_port_range). So at one point of time, it may use the same source port as used for destination (if destination port is also in the range of ephemeral ports).

Below was the range of ephemeral port on the machine where issue being observed:

cat /proc/sys/net/ipv4/ip_local_port_range

9000 65500

And the server port used in my set-up was 32246. So server port was also in the range of ephemeral ports. So it was possible that at one point of time source port will be same as that of destination port.

Reason

Now you may think that even though client has chosen same port as of server, still how it can connect as no server is listening on that port. This is because of the one of the TCP feature called simultaneous connect documented in RFC793. This allows two clients to connect to each other without anyone entering into listening state. The connection establishment as per this approach is different from the usual 3-way handshake. Here both clients performs an active OPEN as shown in below table:

Table-1: Synchronous Connect

This gets triggered only if two clients are trying to reach each other (In our case though one client is trying to connect to server but since port being same it gets treated as if both are trying to connect to each other and hence simultaneous connect gets triggered.)

Experiment

Execute below command snippet:

while true

telnet 127.0.0.1 32775

done

Even though there is no telnet server running on port 32775 but at some point of time it will succeed.

telnet: connect to address 127.0.0.1: Connection refused

Trying 127.0.0.1...

telnet: connect to address 127.0.0.1: Connection refused

Trying 127.0.0.1...

Connected to 127.0.0.1.

Escape character is '^]'.

hello

Solution

Once connection is successful, we can get the port number dynamically being assigned to client and compare the same with destination port. If it matches then we can disconnect the current connection and continue to retry.

struct sockaddr_in cAddr;

socklen_t cAddrLen = sizeof(cAddr);

// Get the current client dynamic information allocated.

int e = getsockname(fd, (sockaddr *)&cAddr, &cAddrLen);

if (e != 0)

{

close(fd);

}

Conclusion

Though this is very rare scenario but very much possible. So any application having infinite reconnect logic and there is no authentication info exchange between client and server in order to finalize the connection, then it should take care of this.

In-case of PostgreSQL, client expect authentication request from server once connection is established. So issue related to self-connect will not happen.

But if there is any third party tool working on top of PostgreSQL and they have their own way of connection mechanism, then they should evaluate to check if this issue is applicable to them.

Please share your comment/feedback or if you have some other idea to address this issue.

Saturday, April 30, 2016

PostgreSQL Transaction Internal Code Flow

This post is in continuation of my previous post. Here I would explain main transaction working from code flow perspective. Sub-transaction, MVCC and other related details will be covered in subsequent posts. Request to see my previous post Basic of Transaction in order to get better understanding.

Internally each transaction is represented as unique number in increasing order with an exception in-case of overflow. Also whole of transaction flow is tracked using various state as explained below:

Command Execution:

Usually each command execution has 3 steps as:

StartTransactionCommand
Do the operation specific to command
CommitTransactionCommmand

As mentioned in my earlier post, if transaction is not started explicitly then it will be started internally for any command execution. So this forms two way of startign the transaction. Below it explain both of the case and corresponding state transition involved.

Meaning of below nomenclature

X ====action====>Y

Current state is X, in this it does operation "action" and gets transferred to state "Y". In-case there is no action mentioned, means in current state it does not do any operation, it directly moves to state "Y".

There are mainly 3 actions mentioned below, the purpose of each actions are as below:

StartTransaction: Assign resources in terms of memory, initializes the guc variable specific, initialize the transaction properties like read-only transaction or read-write, create a new resource owner etc.
CommitTransaction: Undo all initialization done by StartTransaction. Execute any pending triggers, handle all on commit action if any. Generate a COMMIT WAL record and insert same in WAL buffer, Update commit TS data. Then depending on synchronous commit or asynchronous commit configured, wait for response from standby node or directly flush the transaction to CLOG respectively.
AbortTransaction: Almost similar to CommitTransaction except it writes an ABORT WAL, marks the transaction as ABORTED.

State transition mentioned in subsequent section are states from client queries perspective. In addition to these state, there are few states from server perspective also, these are:

TRANS_DEFAULT: Idle, its default state.
TRANS_START: This is the state in which transaction initialization happens (StartTransaction)
TRANS_INPROGRESS: Means transaction has been started.
TRANS_COMMIT: This state shows transaction commit is in progress (CommitTransaction)
TRANS_ABORT:This state shows transaction abort is in progress (AbortTransaction)

Case-1: Explicit Start of Transaction:

Execution of START TRANSACTION command:

StartTransactionCommand (TBLOCK_DEFAULT)====StartTransaction====>TBLOCK_STARTED
Processing this command: BeginTransactionBlock(TBLOCK_STARTED) =====> TBLOCK_BEGIN.
CommitTransactionCommmand(TBLOCK_BEGIN) ====> TBLOCK_INPROGRESS

Execution of a normal command:

StartTransactionCommand (TBLOCK_INPROGRESS): Nothing to do.
Processing this command: Till this point no transaction ID has been assigned, as was not sure if any command requiring transaction id going to be executed. So now call the function AssignTransactionId to assign a new transaction id. Current number of transaction is maintained in a shared variable ShmemVariableCache->nextXid; So the current value of this variable is taken as transaction id for current transaction. Then value of ShmemVariableCache->nextXid is incremented (taking care of overflow case).

Also each transaction information needs to be made durable (its one of the properties), for which it maintains:

Commit log for each transaction (Called CLOG stored in clog page)
Each transaction commit timestamp (call CommitTs stores in separate page)

If current XID is going to be stored in a new page (either because its first transaction in the system or existing page is full), then it needs reset whole content of new page with zero. This should be done for all pages used for storing these information.

Finally it should:

Also each session maintains MyPgXact, which maintains transaction information in memory. This is used by all other session for taking various decision. So assign this transaction to MyPgXact.
The new transaction id is stored in the each tuple being created (more on this in coming post related to MVCC).
Stores the current command id in each tuple being created.
Then continue with normal command operation.

CommitTransactionCommmand(TBLOCK_INPROGRESS): Does command counter increment (CommandCounterIncrement) i.e. increments the command id so that if multiple commands running in same transaction then next command can see the operation done by previous command. No state transition.

Executing again a command:

StartTransactionCommand (TBLOCK_INPROGRESS): Nothing to do.
Processing this command: Since already transaction assigned nothing to do. Just continue with command execution.
CommitTransactionCommmand(TBLOCK_INPROGRESS): Does command counter increment (CommandCounterIncrement) i.e. increments the command id so that if multiple commands running in same transaction then next command can see the operation done by previous command. No state transition.

Executing COMMIT/END command:

StartTransactionCommand (TBLOCK_INPROGRESS): Nothing to do.
Processing this command: EndTransactionBlock(TBLOCK_INPROGRESS) ====> TBLOCK_END
CommitTransactionCommmand(TBLOCK_END)====CommitTransaction====> TBLOCK_DEFAULT

Executing ROLLBACK command:

StartTransactionCommand (TBLOCK_INPROGRESS): Nothing to do.
Processing this command: UserAbortTransactionBlock(TBLOCK_INPROGRESS) ====> TBLOCK_ABORT_PENDING
CommitTransactionCommmand(TBLOCK_ABORT_PENDING)====AbortTransaction====> TBLOCK_DEFAULT

Case-2: Implicit Start of Transaction:

Transaction is started automatically in-case a transaction block start command was not executed. This transaction is valid only for the current command and it gets committed or aborted automatically once command gets executed successfully or gets fail respectively

Command Execution- Success:

StartTransactionCommand (TBLOCK_DEFAULT)====StartTransaction====>TBLOCK_STARTED
Do the actual operation
CommitTransactionCommmand(TBLOCK_STARTED)====CommitTransaction====> TBLOCK_DEFAULT

Command Execution - Fail:

StartTransactionCommand (TBLOCK_DEFAULT)====StartTransaction====>TBLOCK_STARTED
Do the actual operation
AbortCurrentTransaction(TBLOCK_STARTED) ====AbortTransaction====> TBLOCK_DEFAULT

In my subsequent post, I would cover some more details about transaction and their implementation details.

Any comment/query/suggestion welcome.

Tuesday, February 13, 2018

PGConf India 2018

Yet another exciting conference PGConf 2018 is just few days away. I have been associated with this conference for last 4 years and I must say it is becoming bigger and bigger every year. This year also promises to be really great with many new exciting things:

Friday, February 9, 2018

Bitmap Scan In PostgreSQL

Wednesday, March 1, 2017

Excited about upcoming PGConf India 2017

Sunday, October 9, 2016

Major enhancement in PostgreSQL 9.6

Parallel Query

Locking

Multiple Synchronous Standby

Foreign Data Wrapper (postgres_fdw)

Saturday, June 18, 2016

Advanced Transaction: Savepoint And Prepared Transaction

Savepoint

Related Comamnds

Prepare Transaction

Related Comamnds

Saturday, May 14, 2016

Client connected to itself...Isn't it Strange (effect of TCP Simultaneous Connect)

Background

Cause Analysis

Reason

Experiment

Solution

Conclusion

Saturday, April 30, 2016

PostgreSQL Transaction Internal Code Flow

Command Execution:

Case-1: Explicit Start of Transaction:

Execution of START TRANSACTION command:

Execution of a normal command:

CommitTransactionCommmand(TBLOCK_INPROGRESS): Does command counter increment (CommandCounterIncrement) i.e. increments the command id so that if multiple commands running in same transaction then next command can see the operation done by previous command. No state transition.

Executing again a command:

Executing COMMIT/END command:

Executing ROLLBACK command:

Case-2: Implicit Start of Transaction:

Command Execution- Success:

Command Execution - Fail: