Tuesday, March 22, 2016

Native Compilation Part-2 @ PGDay Asia 2016 Conference

Finally first PGDay Asia held in Singapore has successfully finished. I am also one of the organizer of this conference. It took months of hard work by many specially Sameer Kumar to achieve this.

I had also opportunity to present my paper on Native Compilation part-2, it was subsequent to my paper presented in last PGCon 2015 (Procedure compilation is additional).

Summary of this talk (For detail talk visit Native Compilation at PGDay Asia 2016):

Why Native Compilation Required:

Seeing the current hardware trend, I/O is no more bottleneck for the executor mainly because of below reasons:
1. Increase in RAM Size
2. Prevalence of High speed SSD
At the same time, though lot of work has happened in hardware but there is no focus on improving the CPU efficiency. Because of this, current executor biggest bottleneck is CPU usage efficiency not I/O. So in order to tackle CPU bottleneck without compromising on any feature, we need mechanism to execute lesser number of instruction and still getting all functionality.


What is Native Compilation:

Native Compilation is a methodology to reduce CPU instructions by executing only instruction specific to given query/objects unlike interpreted execution. Steps are:
 1. Generate C-code specific to objects/query.
 2. Compile C-code to generate DLL and load with server executable.
 3. Call specialized function instead of generalized function.

Native compilation can be applied to any entity which are inefficient in terms of CPU e.g.:
 1. Table
 2. Procedure
 3. Query

Table Native Compilation (Aka Schema Binding):

Since most of the properties of a particular table remains same once it is created, so its data gets stored and accessed in the similar pattern irrespective of any data. So instead of accessing tuples for same relation in generic way, we create a specialized access function for the particular relation during its creation and the same gets used for further query containing that table. This approach eliminates the need for various comparison (e.g. data-type check, length check, number of attribute check) during the access of tuple. More the deforming of tuples contribute to total run of query, more performance will be observed. Generation of code can be done in 3-ways, details of each of them is available in presentation.


Fig-1: Table compilation


Once a create table command is issued, a C-file with all specialized access function is generated, which is in turns gets loaded as DLL. These loaded functions are used by all SQL query accessing the compiled table.

Performance Improvement:

Below is the performance improvement observed on standard TPC-H benchmark.
The system configuration is as below:
SUSE Linux Enterprise Server 11 (x86_64), 2 Cores, 10 sockets per core
TPC-H Configuration: Default

Graph-1: TPC-H Performance

Table-1: TPC-H Performance 



With the above performance result, we observed that there is more than 70% reduction in CPU instruction and upto 36% performance improvement on TPC-H queries.

Procedure Compilation:

There are two parts in any of the procedure:
 1. SQL Query statements
 2. Non-SQL statement (e.g. loop, conditional, simple expression etc).

In this presentation, we focused only on 2nd part. As part of this whenever we create the procedure, we compile the Pl/Pgsql function into corresponding normal C-function. While compiling:
 1. Transform the Pl/Pgsql procedure signature to normal C-function signature.
 2. Transform all variable declaration as C-variable declaration.
 3. Transform all other non-SQL statement in corresponding normal C-statement.
 4. All SQL statements are transformed in the format SPI_xxx as below:
if (plan == NULL)
{
stmt = SPI_prepare(modified_query, number_of_variables, type_of_variable used);
SPI_keepplan(stmt);
plan = stmt;
}
SPI_execute_plan(plan, value,  isnull,  false, row_count);
 5. Finally compile the generated C-function using a traditional compiler to generate DLL and link  the same with SERVER executable.
 6. So on subsequent execution, directly C-function will be called instead of Pl/Pgsql function.


Fig-2: Procedure Compilation

Fig-3: Compiled Procedure invocation

Fig-2 highlights at what step and how procedure will be compiled.  Once the parsing of procedure is done, we will have all information about the procedure in PLpgSQL_function pointer, which we can traverse as planner does and for each statement corresponding C-code can be generated.
Fig-3 explain, how compiled function will be invoked.

How number of instruction reduced:

Consider a statment as x=2+y in Pl/Pgsql procedure. This statement will be executed as a traditional SELECT command as "SELECT 2+y;". So everytime this procedure gets executed, this query will be executed as if it was normal SQL query and hence many CPU instruction are executed.
But if we convert this to C-based statment, it execution will be directly as 2+y and hence will execute very few instruction comapred to original one.

Performance Improvement:

Below is the performance improvement observed on standard TPC-C benchmark:
The system configuration is as below:
SUSE Linux Enterprise Server 11 (x86_64), 2 Cores, 10 sockets per core
TPC-C Configuration: runMins=0, runTxnsPerTerminal=200000
Checkpoint_segment = 100

Graph-2: TPC-C Performance

Table-2: TPC-C Performance

With some basic compilation of procedure, we are able to get around 23% performance improvement.

Query Compilation:

Though this was not part of my presentation but would like to give some details to do this.
Unlike above two compilation, in order to compile query we need to use LLVM as the compilation time is very important while executing the query.
LLVM is open source compilation framework for very efficient and fast compilation. The LLVM compiler can only understand the IR (Internal representation something similar to assembly code). So before compilation using LLVM, we need to generate IR form of the code, which can be either of the two ways:
1. Generate C-code and generate IR for this using Clang (a tool built for C-code)
2. Or just use the IR-Builder interface function provided by LLVM to generate code directly in IR format.
The steps to compile query is as below:
1. Generate the Query plan as it is.
2. Choose the plan node to be compiled (if whole query to be compiled, then all nodes), generate a specific code corresponding to the node. This can be done in either of the ways explained above.
3. Compile the generated code using LLVM.
4. The code compiled using LLVM will return an object handler, which can be attached to the corresponding node.
5. So while executing that node, we can call compiled stored object if the compilation object stored for the corresponding node.

The most of the benefit is expected out of this query compilation. Once query compilation is done, even the SQL query statement of procedure also can be compiled in the same way (perhaps nothing explicit will be required for this...).

Conclusion:

Seeing the industry trend, we have implemented two way of specialization, which resulted in up to 36% and 23% of performance improvement on standard benchmark TPC-H and TPC-C respectively.


Any feedback/comments/queries are welcome.