Many of the PostgreSQL developers as well as users wonder when "bitmap scan" (here onward BMS) gets selected as winning plan. This confusion is result of general understanding that if very few percentage of records being selected then index scan will be winning plan otherwise sequence scan will be chosen.
So in order to understand when BMS be will be chosen, lets first understand the root of the confusion i.e. why index scan gets selected in case of only very few percentage of records selection otherwise not.
Sequence Scan: Execution of this plan causes all pages (also called heap) to be scanned sequentially and on top of that applies qualification if given. So even after qualification apply if we get only some percentage of records, still it needs to scan all pages. But one good thing here is that all pages are read sequentially and sequence I/O is much faster. (Explanation of Sequence I/O is faster than random I/O is in itself a big topic, which I will cover in some future blog).
Index Scan: Execution of this plan finds the exact B-Tree(containing index data) page and gets the data. As using the B-Tree finding the exact location of a key is very fast so getting index data is very efficient. But there is one problem here: As B-Tree contains only index data (i.e. key), then for remaining part of data, it needs to get from heap page. Though index data keep a pointer to indicate exact location of data in heap pages (in terms of page number and offset within that), it needs to load the new page. So effectively for each index key, order will be: Fetch from B-Tree page => Fetch from heap page => Fetch from B-Tree page => Fetch from heap page =>.............so on. So this causes random page access and hence random I/O, which is costly compared to sequential I/O.
So random I/O cost is major differentiation i.e. in order to select index scan, random I/O cost should be able to outperform sequential scanning of all pages.
Now imagine a plan, which gets benefit of Index Scan but still avoid random I/O to a great extent. So obviously as per the above discussion that plan will be best and this plan is called BMS.
Now you might be thinking if that is the case then why not always select BMS. As you know nothing comes for free. In-case of BMS, cost is paid in terms of two scan. Let's briefly understand how BMS works.
BIHS: Please see the below plan:
QUERY PLAN
---------------------------------------------------------------------------------
Bitmap Heap Scan on blogtable (cost=64264.25..148070.07 rows=2313266 width=15)
Recheck Cond: (num < '230'::numeric)
-> Bitmap Index Scan on idx (cost=0.00..63685.93 rows=2313266 width=0)
Index Cond: (num < '230'::numeric)
(4 rows)
As you can see as part of plan, first node is Bitmap Heap Scan and its child node is Bitmap Index scan. Following steps are taken to execute this:
In general, BMS is selected when number of records selected are neither too less for index scan nor too large for sequence scan. Let's analyze it using below examples:
NOTE: TID corresponding to each index data may be in random order e.g. for first index data I1, heap data will point to {blkno-10, offset = 20}, for I2 points to {blkno-1, offset = 30}, for I3 points to {blkno-20, offset=40} and so on. So you might wonder that the actual heap scan will still use random I/O. But that is not the case. Once bit mask is formed, all page lists gets sorted and hence sequential scan will use sequential I/O only.
Examples:
VACUUM;
Plan - 1:
postgres=# explain SELECT * FROM blogtable WHERE num = 23000;
QUERY PLAN
----------------------------------------------------------------------
Index Scan using idx on blogtable (cost=0.44..8.45 rows=1 width=15)
Index Cond: (num = '23000'::numeric)
(2 rows)
Plan - 2:
postgres=# explain SELECT * FROM blogtable WHERE num < 23000;
QUERY PLAN
----------------------------------------------------------------------
Seq Scan on blogtable (cost=0.00..179890.00 rows=10000000 width=15)
Filter: (num < '23000'::numeric)
(2 rows)
QUERY PLAN
---------------------------------------------------------------------------------
Bitmap Heap Scan on blogtable (cost=64264.25..148070.07 rows=2313266 width=15)
Recheck Cond: (num < '230'::numeric)
-> Bitmap Index Scan on idx (cost=0.00..63685.93 rows=2313266 width=0)
Index Cond: (num < '230'::numeric)
(4 rows)
Plan - 4:
postgres=# explain SELECT num FROM blogtable WHERE num < 230;
QUERY PLAN
-------------------------------------------------------------------------------------
Index Only Scan using idx on blogtable (cost=0.44..86818.59 rows=2313266 width
=11)
Index Cond: (num < '230'::numeric)
(2 rows)
Plan -1 to Plan -3 varies depending on the number of records being selected. Plan - 1 takes index scan as only one row needs to be selected, whereas plan -3 takes BMS as almost 2.3M out of 10M records are selected.
Can you please pause here and try to think why plan-4 takes Index Only Scan even though number of records selected are same as in plan -3.
I hope you found answer yourself. If not let me explain. As discussed above main villain which makes Index Scan costly is random I/O and random I/O happens in order to fetch heap data corresponding to each index key. But in-case of plan-4, if you observe we are selecting only index column and hence there is no need to fetch heap data and hence no random I/O.
I hope by now you have got some basic understanding of Bitmap Scan.
Any queries/comments/suggestions are welcome.
So in order to understand when BMS be will be chosen, lets first understand the root of the confusion i.e. why index scan gets selected in case of only very few percentage of records selection otherwise not.
Sequence Scan: Execution of this plan causes all pages (also called heap) to be scanned sequentially and on top of that applies qualification if given. So even after qualification apply if we get only some percentage of records, still it needs to scan all pages. But one good thing here is that all pages are read sequentially and sequence I/O is much faster. (Explanation of Sequence I/O is faster than random I/O is in itself a big topic, which I will cover in some future blog).
Index Scan: Execution of this plan finds the exact B-Tree(containing index data) page and gets the data. As using the B-Tree finding the exact location of a key is very fast so getting index data is very efficient. But there is one problem here: As B-Tree contains only index data (i.e. key), then for remaining part of data, it needs to get from heap page. Though index data keep a pointer to indicate exact location of data in heap pages (in terms of page number and offset within that), it needs to load the new page. So effectively for each index key, order will be: Fetch from B-Tree page => Fetch from heap page => Fetch from B-Tree page => Fetch from heap page =>.............so on. So this causes random page access and hence random I/O, which is costly compared to sequential I/O.
So random I/O cost is major differentiation i.e. in order to select index scan, random I/O cost should be able to outperform sequential scanning of all pages.
Now imagine a plan, which gets benefit of Index Scan but still avoid random I/O to a great extent. So obviously as per the above discussion that plan will be best and this plan is called BMS.
Now you might be thinking if that is the case then why not always select BMS. As you know nothing comes for free. In-case of BMS, cost is paid in terms of two scan. Let's briefly understand how BMS works.
BIHS: Please see the below plan:
QUERY PLAN
---------------------------------------------------------------------------------
Bitmap Heap Scan on blogtable (cost=64264.25..148070.07 rows=2313266 width=15)
Recheck Cond: (num < '230'::numeric)
-> Bitmap Index Scan on idx (cost=0.00..63685.93 rows=2313266 width=0)
Index Cond: (num < '230'::numeric)
(4 rows)
As you can see as part of plan, first node is Bitmap Heap Scan and its child node is Bitmap Index scan. Following steps are taken to execute this:
- Bitmap Heap scan ask for a tuple from Bitmap Index Scan.
- Bitmap Index Scan scan the index as per the condition (x<230) almost in the same way as done in normal Index Scan. But instead of returning TID (consisting of page no and offset within that) corresponding to heap data, it adds those TID in a bitmap. For simple understanding, you can consider this bitmap contains hash of all pages (hashed based on page no) and each page entry contains array of all offset within that page (Complete implementation details would make this blog post far longer than it already is, so I will leave it for another time).
- Then Bitmap Heap Scan reads through the bitmap to get heap data corresponding to stored page number and offset. Then it check for visibility, qualification etc and returns the tuple based on outcome of all these checks.
It is clear that here neither all pages are read sequentially nor heap is accessed corresponding to each index tuple but it has to pay price for double scan.
So none of the plan can be clear winner all the time. It all depends on how many rows are getting selected and PostgreSQL planner does decent job to find the cost to choose best plan based on that.
In general, BMS is selected when number of records selected are neither too less for index scan nor too large for sequence scan. Let's analyze it using below examples:
NOTE: TID corresponding to each index data may be in random order e.g. for first index data I1, heap data will point to {blkno-10, offset = 20}, for I2 points to {blkno-1, offset = 30}, for I3 points to {blkno-20, offset=40} and so on. So you might wonder that the actual heap scan will still use random I/O. But that is not the case. Once bit mask is formed, all page lists gets sorted and hence sequential scan will use sequential I/O only.
Examples:
CREATE TABLE blogtable (num numeric, id int);
CREATE INDEX idx ON blogtable(num);
INSERT INTO blogtable SELECT random() * 1000, 2 FROM generate_series(1, 10000000);
ANALYZE;VACUUM;
Plan - 1:
postgres=# explain SELECT * FROM blogtable WHERE num = 23000;
QUERY PLAN
----------------------------------------------------------------------
Index Scan using idx on blogtable (cost=0.44..8.45 rows=1 width=15)
Index Cond: (num = '23000'::numeric)
(2 rows)
Plan - 2:
postgres=# explain SELECT * FROM blogtable WHERE num < 23000;
QUERY PLAN
----------------------------------------------------------------------
Seq Scan on blogtable (cost=0.00..179890.00 rows=10000000 width=15)
Filter: (num < '23000'::numeric)
(2 rows)
Plan - 3:
postgres=# explain SELECT * FROM blogtable WHERE num < 230;QUERY PLAN
---------------------------------------------------------------------------------
Bitmap Heap Scan on blogtable (cost=64264.25..148070.07 rows=2313266 width=15)
Recheck Cond: (num < '230'::numeric)
-> Bitmap Index Scan on idx (cost=0.00..63685.93 rows=2313266 width=0)
Index Cond: (num < '230'::numeric)
(4 rows)
Plan - 4:
postgres=# explain SELECT num FROM blogtable WHERE num < 230;
QUERY PLAN
-------------------------------------------------------------------------------------
Index Only Scan using idx on blogtable (cost=0.44..86818.59 rows=2313266 width
=11)
Index Cond: (num < '230'::numeric)
(2 rows)
Plan -1 to Plan -3 varies depending on the number of records being selected. Plan - 1 takes index scan as only one row needs to be selected, whereas plan -3 takes BMS as almost 2.3M out of 10M records are selected.
Can you please pause here and try to think why plan-4 takes Index Only Scan even though number of records selected are same as in plan -3.
I hope you found answer yourself. If not let me explain. As discussed above main villain which makes Index Scan costly is random I/O and random I/O happens in order to fetch heap data corresponding to each index key. But in-case of plan-4, if you observe we are selecting only index column and hence there is no need to fetch heap data and hence no random I/O.
I hope by now you have got some basic understanding of Bitmap Scan.
Any queries/comments/suggestions are welcome.
Hello
ReplyDeleteGood explaination of the kinds of data access
But are you sure, in the last example, that the first step is Bitmap Heap Scan and that Bitmap Index Scan is its child node. IMHO I think it's the opposite...
Best regards
Bruno
Thanks for your feedback.
DeleteIn PostgreSQL, plan execution uses top down approach. So to get a tuple in the example mentioned above first it will execute Bitmap Heap Scan node but this node does not have any data yet so it will ask its child node (in this case Bitmap Index Scan). Once Bitmap index scan return bitmap, then again Bitmap heap scan will resume processing.
So Bitmap index scan is child node (also called outer plan node) of Bitmap Heap Scan Node.
Hope this clarifies.
Hello Rajeev,
ReplyDeleteyour blog is too good.
I am stuck with one issue in PostgreSQL.
issue is related to LIKE Operator with '%hello%' leading wildcard
because of performance concern we added GIN Index on search column
problem is select query is not giving consistence result. sometime it returns 0 rows, some time 2 rows, but correct output is 3 rows.
select * from entity_xyz
where entity_name_denorm_xyz like UPPER('%ABC%') ;
entity_xyz table having around 2 million records and total 10 indexes.
As per my knowledge Issue is with Bitmap Index Scan on GIN index.
any suggestion will be appreciated.
Nice explanation!
ReplyDelete