Tuesday, February 13, 2018

PGConf India 2018

Yet another exciting conference, PGConf India 2018, is just a few days away. I have been associated with this conference for the last 4 years, and I must say it is getting bigger every year. This year also promises to be really great, with many new and exciting things:

  1. For the first time, it will be a two-day conference, on 22nd and 23rd Feb.
  2. A training day, one day prior to the conference, on 21st Feb.
  3. It has got its own brand new logo.
  4. Top-notch speakers from many big companies.
Apart from the above, there are many exciting talks, ranging from developer-oriented to user-specific. So be there to witness all the action...:-)

Looking forward to learning new and exciting things about PostgreSQL. Also excited to meet new people and re-connect with familiar faces.

Friday, February 9, 2018

Bitmap Scan In PostgreSQL

Many PostgreSQL developers as well as users wonder when a "bitmap scan" (BMS from here onward) gets selected as the winning plan. The confusion stems from the general understanding that if only a very small percentage of records is being selected then an index scan will be the winning plan, otherwise a sequential scan will be chosen.

So in order to understand when BMS will be chosen, let's first understand the root of the confusion, i.e. why an index scan gets selected only when a very small percentage of records is selected.

Sequential Scan: Execution of this plan scans all pages of the table (the heap) sequentially and applies the qualification, if any, on top of that. So even if the qualification leaves only a small percentage of records, all pages still need to be scanned. The one good thing here is that all pages are read sequentially, and sequential I/O is much faster. (Why sequential I/O is faster than random I/O is a big topic in itself, which I will cover in a future blog.)

Index Scan: Execution of this plan locates the exact B-Tree page (containing the index data) and fetches the data from it. Since finding the exact location of a key in a B-Tree is very fast, getting the index data is very efficient. But there is one problem: the B-Tree contains only index data (i.e. the key), so the remaining part of the row has to be fetched from the heap page. Though each index entry keeps a pointer to the exact location of the row in the heap (in terms of page number and offset within it), fetching it requires loading a new page. So effectively, for each index key, the order is: fetch from B-Tree page => fetch from heap page => fetch from B-Tree page => fetch from heap page => ... and so on. This causes random page access and hence random I/O, which is costly compared to sequential I/O.

So the random I/O cost is the major differentiator, i.e. for an index scan to be selected, its random I/O cost should be able to outperform sequentially scanning all pages.
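
As a side note, the planner quantifies this trade-off through two cost parameters, seq_page_cost and random_page_cost (defaults 1.0 and 4.0 respectively). Below is a minimal sketch of how to inspect them on your own server; tuning them is workload- and hardware-specific, so treat the commented-out SET purely as an illustration:

       -- Planner's relative cost of a sequential vs. a random page read
       SHOW seq_page_cost;       -- 1.0 by default
       SHOW random_page_cost;    -- 4.0 by default

       -- Illustration only: on fast SSDs some installations lower random_page_cost,
       -- which makes index scans (and bitmap scans) look cheaper to the planner.
       -- SET random_page_cost = 1.1;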

Now imagine a plan which gets the benefit of an index scan but still avoids random I/O to a great extent. As per the above discussion, such a plan would obviously be the best, and this plan is called BMS.
Now you might be thinking: if that is the case, then why not always select BMS? As you know, nothing comes for free. In case of BMS, the cost is paid in terms of two scans. Let's briefly understand how BMS works.

BMS: Please see the plan below:

                                   QUERY PLAN
---------------------------------------------------------------------------------
 Bitmap Heap Scan on blogtable  (cost=64264.25..148070.07 rows=2313266 width=15)
   Recheck Cond: (num < '230'::numeric)
   ->  Bitmap Index Scan on idx  (cost=0.00..63685.93 rows=2313266 width=0)
         Index Cond: (num < '230'::numeric)
(4 rows)

As you can see, as part of this plan the first node is a Bitmap Heap Scan and its child node is a Bitmap Index Scan. The following steps are taken to execute it:
  1. The Bitmap Heap Scan asks for a tuple from the Bitmap Index Scan.
  2. The Bitmap Index Scan scans the index as per the condition (num < 230) in almost the same way as a normal index scan does. But instead of returning the TID (consisting of page number and offset within it) corresponding to the heap data, it adds that TID to a bitmap. For a simple understanding, you can consider this bitmap to contain a hash of all pages (hashed by page number), where each page entry contains an array of all offsets within that page. (The complete implementation details would make this blog post far longer than it already is, so I will leave them for another time.)
  3. The Bitmap Heap Scan then reads through the bitmap to get the heap data corresponding to the stored page numbers and offsets. It then checks visibility, qualification etc., and returns the tuple based on the outcome of all these checks; a way to observe this on a live table is sketched right after this list.
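
If you want to watch these steps on your own data, EXPLAIN ANALYZE on the same query reports, under the Bitmap Heap Scan node, how many heap pages were actually visited (blogtable is created in the Examples section below; exact figures will vary per machine):

       -- Observe the bitmap heap scan at work; output will vary.
       EXPLAIN (ANALYZE, BUFFERS)
       SELECT * FROM blogtable WHERE num < 230;

       -- Things to look for in the output:
       --   "Heap Blocks: exact=..."        heap pages visited by the Bitmap Heap Scan
       --   "Rows Removed by Index Recheck" shows up only when work_mem is too small,
       --                                   the bitmap turns lossy, and the Recheck Cond
       --                                   has to be re-applied to whole pages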

It is clear that here neither are all pages read sequentially, nor is the heap accessed for each and every index tuple; but the price paid is a double scan.
So no single plan can be the clear winner all the time. It all depends on how many rows are getting selected, and the PostgreSQL planner does a decent job of estimating the costs and choosing the best plan based on that.
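
If you are curious what the planner would pick if a bitmap scan were not available, you can disable it for the current session and compare the estimated costs. This is a session-local experiment for understanding the costing, not a production setting (it uses the standard enable_bitmapscan planner switch):

       -- Compare plans with and without bitmap scans (session-local experiment)
       SET enable_bitmapscan = off;
       EXPLAIN SELECT * FROM blogtable WHERE num < 230;  -- planner falls back to a
                                                         -- seq scan or index scan
       RESET enable_bitmapscan;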

In general, BMS is selected when the number of records selected is neither small enough to favor an index scan nor large enough to favor a sequential scan. Let's analyze it using the examples below:

NOTE: The TIDs corresponding to the index entries may be in random order, e.g. for the first index entry I1 the heap data may point to {blkno=10, offset=20}, for I2 to {blkno=1, offset=30}, for I3 to {blkno=20, offset=40}, and so on. So you might wonder whether the actual heap scan will still use random I/O. But that is not the case: once the bitmap is formed, the list of pages gets sorted, and hence the heap scan reads pages in sequential order and uses sequential I/O.

Examples:
       CREATE TABLE blogtable (num numeric, id int);
       CREATE INDEX idx ON blogtable(num);
       INSERT INTO blogtable SELECT random() * 1000, 2 FROM generate_series(1, 10000000);
       ANALYZE;   -- refresh planner statistics
       VACUUM;    -- also updates the visibility map (relevant for Plan-4 below)

Plan - 1: 
postgres=# explain SELECT * FROM blogtable WHERE num = 23000;
                              QUERY PLAN
----------------------------------------------------------------------
 Index Scan using idx on blogtable  (cost=0.44..8.45 rows=1 width=15)
   Index Cond: (num = '23000'::numeric)
(2 rows)

Plan - 2: 
postgres=# explain SELECT * FROM blogtable WHERE num < 23000;
                              QUERY PLAN
----------------------------------------------------------------------
 Seq Scan on blogtable  (cost=0.00..179890.00 rows=10000000 width=15)
   Filter: (num < '23000'::numeric)
(2 rows)

Plan - 3: 
postgres=# explain SELECT * FROM blogtable WHERE num < 230;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Bitmap Heap Scan on blogtable  (cost=64264.25..148070.07 rows=2313266 width=15)
   Recheck Cond: (num < '230'::numeric)
   ->  Bitmap Index Scan on idx  (cost=0.00..63685.93 rows=2313266 width=0)
         Index Cond: (num < '230'::numeric)
(4 rows)

Plan - 4: 
postgres=# explain SELECT num FROM blogtable WHERE num < 230;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Index Only Scan using idx on blogtable  (cost=0.44..86818.59 rows=2313266 width=11)
   Index Cond: (num < '230'::numeric)
(2 rows)

Plans 1 to 3 vary depending on the number of records being selected. Plan-1 uses an index scan as only one row needs to be selected; Plan-2 falls back to a sequential scan because practically all 10M records match; and Plan-3 uses BMS as almost 2.3M out of 10M records are selected.

Can you pause here and try to think about why Plan-4 uses an Index Only Scan even though the number of records selected is the same as in Plan-3?

I hope you found the answer yourself. If not, let me explain. As discussed above, the main villain that makes an index scan costly is random I/O, and that random I/O happens in order to fetch the heap data corresponding to each index key. But in case of Plan-4, if you observe, we are selecting only the index column, so there is no need to fetch heap data at all, and hence no random I/O.
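
A quick way to confirm this on your own setup (a sketch; exact numbers will vary): EXPLAIN ANALYZE on the Plan-4 query prints a "Heap Fetches" counter under the Index Only Scan node. After the VACUUM in the setup above, it should stay close to zero, meaning the visibility map answered the visibility checks and the heap was (almost) never touched:

       -- Verify that Plan-4 really avoids heap access; output will vary.
       EXPLAIN ANALYZE SELECT num FROM blogtable WHERE num < 230;
       -- Look for "Heap Fetches: N" under the Index Only Scan node:
       -- a value near 0 means (almost) no random heap I/O was needed.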

I hope by now you have got some basic understanding of Bitmap Scan.
Any queries/comments/suggestions are welcome.