All about Database: Native Compilation Part-2 @ PGDay Asia 2016 Conference

Finally first PGDay Asia held in Singapore has successfully finished. I am also one of the organizer of this conference. It took months of hard work by many specially Sameer Kumar to achieve this.

I had also opportunity to present my paper on Native Compilation part-2, it was subsequent to my paper presented in last PGCon 2015 (Procedure compilation is additional).

Summary of this talk (For detail talk visit Native Compilation at PGDay Asia 2016):

Why Native Compilation Required:

Seeing the current hardware trend, I/O is no more bottleneck for the executor mainly because of below reasons:

1. Increase in RAM Size

2. Prevalence of High speed SSD

At the same time, though lot of work has happened in hardware but there is no focus on improving the CPU efficiency. Because of this, current executor biggest bottleneck is CPU usage efficiency not I/O. So in order to tackle CPU bottleneck without compromising on any feature, we need mechanism to execute lesser number of instruction and still getting all functionality.

What is Native Compilation:

Native Compilation is a methodology to reduce CPU instructions by executing only instruction specific to given query/objects unlike interpreted execution. Steps are:

1. Generate C-code specific to objects/query.

2. Compile C-code to generate DLL and load with server executable.

3. Call specialized function instead of generalized function.

Native compilation can be applied to any entity which are inefficient in terms of CPU e.g.:

1. Table

2. Procedure

3. Query

Table Native Compilation (Aka Schema Binding):

Since most of the properties of a particular table remains same once it is created, so its data gets stored and accessed in the similar pattern irrespective of any data. So instead of accessing tuples for same relation in generic way, we create a specialized access function for the particular relation during its creation and the same gets used for further query containing that table. This approach eliminates the need for various comparison (e.g. data-type check, length check, number of attribute check) during the access of tuple. More the deforming of tuples contribute to total run of query, more performance will be observed. Generation of code can be done in 3-ways, details of each of them is available in presentation.

Fig-1: Table compilation

Once a create table command is issued, a C-file with all specialized access function is generated, which is in turns gets loaded as DLL. These loaded functions are used by all SQL query accessing the compiled table.

Performance Improvement:

Below is the performance improvement observed on standard TPC-H benchmark.

The system configuration is as below:

SUSE Linux Enterprise Server 11 (x86_64), 2 Cores, 10 sockets per core

TPC-H Configuration: Default

Graph-1: TPC-H Performance

Table-1: TPC-H Performance

With the above performance result, we observed that there is more than 70% reduction in CPU instruction and upto 36% performance improvement on TPC-H queries.

Procedure Compilation:

There are two parts in any of the procedure:

1. SQL Query statements

2. Non-SQL statement (e.g. loop, conditional, simple expression etc).

In this presentation, we focused only on 2nd part. As part of this whenever we create the procedure, we compile the Pl/Pgsql function into corresponding normal C-function. While compiling:

1. Transform the Pl/Pgsql procedure signature to normal C-function signature.

2. Transform all variable declaration as C-variable declaration.

3. Transform all other non-SQL statement in corresponding normal C-statement.

4. All SQL statements are transformed in the format SPI_xxx as below:

if (plan == NULL)

{

stmt = SPI_prepare(modified_query, number_of_variables, type_of_variable used);

SPI_keepplan(stmt);

plan = stmt;

}

SPI_execute_plan(plan, value, isnull, false, row_count);

5. Finally compile the generated C-function using a traditional compiler to generate DLL and link the same with SERVER executable.

6. So on subsequent execution, directly C-function will be called instead of Pl/Pgsql function.

Fig-2: Procedure Compilation

Fig-3: Compiled Procedure invocation

Fig-2 highlights at what step and how procedure will be compiled. Once the parsing of procedure is done, we will have all information about the procedure in PLpgSQL_function pointer, which we can traverse as planner does and for each statement corresponding C-code can be generated.

Fig-3 explain, how compiled function will be invoked.

How number of instruction reduced:

Consider a statment as x=2+y in Pl/Pgsql procedure. This statement will be executed as a traditional SELECT command as "SELECT 2+y;". So everytime this procedure gets executed, this query will be executed as if it was normal SQL query and hence many CPU instruction are executed.

But if we convert this to C-based statment, it execution will be directly as 2+y and hence will execute very few instruction comapred to original one.

Performance Improvement:

Below is the performance improvement observed on standard TPC-C benchmark:

The system configuration is as below:

SUSE Linux Enterprise Server 11 (x86_64), 2 Cores, 10 sockets per core

TPC-C Configuration: runMins=0, runTxnsPerTerminal=200000

Checkpoint_segment = 100

Graph-2: TPC-C Performance

Table-2: TPC-C Performance

With some basic compilation of procedure, we are able to get around 23% performance improvement.

Query Compilation:

Though this was not part of my presentation but would like to give some details to do this.

Unlike above two compilation, in order to compile query we need to use LLVM as the compilation time is very important while executing the query.

LLVM is open source compilation framework for very efficient and fast compilation. The LLVM compiler can only understand the IR (Internal representation something similar to assembly code). So before compilation using LLVM, we need to generate IR form of the code, which can be either of the two ways:

1. Generate C-code and generate IR for this using Clang (a tool built for C-code)

2. Or just use the IR-Builder interface function provided by LLVM to generate code directly in IR format.

The steps to compile query is as below:

1. Generate the Query plan as it is.

2. Choose the plan node to be compiled (if whole query to be compiled, then all nodes), generate a specific code corresponding to the node. This can be done in either of the ways explained above.

3. Compile the generated code using LLVM.

4. The code compiled using LLVM will return an object handler, which can be attached to the corresponding node.

5. So while executing that node, we can call compiled stored object if the compilation object stored for the corresponding node.

The most of the benefit is expected out of this query compilation. Once query compilation is done, even the SQL query statement of procedure also can be compiled in the same way (perhaps nothing explicit will be required for this...).

Conclusion:

Seeing the industry trend, we have implemented two way of specialization, which resulted in up to 36% and 23% of performance improvement on standard benchmark TPC-H and TPC-C respectively.

Any feedback/comments/queries are welcome.

17 comments:

JobinMarch 22, 2016 at 10:26 PM
Very interesting topic.
Oracle done something similar from 9i ownwards.. they called it NCOMP.
http://www.oracle.com/technetwork/database/features/plsql/ncomp-faq-087606.html
upto 10g they used C compliner in the system. process was same.. PL -> c -> shared library
but from 11g onward they are shipping a built in compiler which is capable of directly converting pl code to native code.
But the second part (Query compilation) is something unique. that may add great value to postgres
Dato0011March 24, 2016 at 5:18 AM
Is this already implemented in Postgre?
David FetterMarch 24, 2016 at 11:29 AM
Where is the source code and the test harness you used to test it?
Kumar Rajeev RastogiApril 8, 2016 at 2:34 PM
Good to see my post being cited by Bruce Momjian's (One of the core member of PostgreSQL) recent Post:
http://momjian.us/main/blogs/pgblog/2016.html#April_1_2016
UnknownAugust 16, 2016 at 9:05 PM
Get Blazing Fast Asia Optimized Server For Affordable Price Here - THESTACK.NET j
oncasinosite 카지노사이트October 24, 2021 at 6:14 PM
Greetings! Very helpful advice within this post! It is the little changes that will make the most significant changes. Thanks a lot for sharing! Feel free to visit my website; 온라인카지노
yadongbizNovember 7, 2021 at 3:39 PM
I am really glad to glance at this website posts which includes plenty of useful data, thanks for providing such information. 한국야동

Also feel free to visit may web page check this link 야한동영상

yahanvideonetNovember 7, 2021 at 3:40 PM
Thank you. I authentically greeting your way for writing an article. I safe as a majority loved it to my bookmark website sheet list and will checking rear quite than later. 야설

Also feel free to visit may web page check this link 한국야동

koreayadongcomNovember 7, 2021 at 3:40 PM
This is a great post I seen because of offer it. It is truly what I needed to see seek in future you will proceed after sharing such a magnificent post. 야동

Also feel free to visit may web page check this link 일본야동

japanyadongcomNovember 7, 2021 at 3:41 PM
Having read this I believed it was extremely enlightening. I appreciate you spending some time and effort to put this short article together. 국산야동

Also feel free to visit may web page check this link 야설

chinayadongnetNovember 7, 2021 at 3:41 PM
Yes i am totally agreed with this article and i just want to say that this article is very nice and very informative. 한국야동

Also feel free to visit may web page check this link 야설

All about Database

Tuesday, March 22, 2016

Native Compilation Part-2 @ PGDay Asia 2016 Conference

Why Native Compilation Required:

What is Native Compilation:

Table Native Compilation (Aka Schema Binding):

Performance Improvement:

Procedure Compilation:

How number of instruction reduced:

Performance Improvement:

Query Compilation:

Conclusion:

17 comments:

About Me

Blog Archive