If you have attended previous IIUG Informix User Conferences you may already be familiar with the Advanced DataTools Fastest DBA Contest that we have hosted over the last few years. The contest has differed each year, changing from tuning the engine for a batch SQL job to managing an OLTP benchmark load. This year we combined elements of both the batch and OLTP workloads to make the most challenging contest to date. This is also the first year that we have made the contest available outside of the User Conference, allowing participants from around the world to enter. We did indeed receive entries from all around the world, and the more challenging contest this year resulted in some of the most ingenious and varied entries we have seen.
The Contest
The contest this year had two distinct elements:
- A batch SQL script that ran in dbaccess, looping continually, performing inserts into a bills table and updates of customer records, followed by queries of the bills and customer tables to show counts and totals.
- An OLTP benchmark that simulated 1,000 users running an ordering system from 100 different warehouses.
Both the batch and OLTP benchmark were run at the same time against the same database for 10 minutes. Entries were scored as the average number of OLTP transactions per minute plus the total number of bills inserted divided by 10. The contest was run against an Apple Mac Mini with an Intel i7 2 GHz processor, 8 GB RAM and a single SSD disk.
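As a worked example (the numbers are purely hypothetical, not an actual result): an entry that averaged 5,000 OLTP transactions per minute and inserted 400,000 bills over the 10 minutes would score 5,000 + 400,000/10 = 45,000.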
The database and OLTP benchmark program were made available for download, giving contestants the opportunity to develop and test their batch SQL and to tune the database and configuration before submitting their entries.
The Changes
Running both the batch and OLTP processes at the same time complicated the tuning: a compromise was needed between tuning for scans of increasingly large tables and allowing the best throughput for the thousands of statements per second generated by the OLTP benchmark.
The winners of the contest decided to concentrate on tuning the batch processing, where it was possible to reach an average of 60,000 – 90,000 inserts/minute, versus a maximum of about 10,000 transactions/minute with the OLTP processing. While most contestants made a strong effort to tune the OLTP benchmark, the winner of the contest deliberately slowed the OLTP processing by leaving the tables with page level locking, whereas the others had set the tables to row level locking to improve the concurrency of the 1,000 simulated user connections. While this resulted in a lower OLTP score, the batch score was the highest of all the entries received.
Even with the attempt to slow down the OLTP processing, further testing of the winning entry showed that running the batch processing on its own, without the OLTP benchmark, would have been 3-4 times faster.
Contestants had an opportunity to modify three things:
- The onconfig file
- A SQL “prep” script to be run immediately before the test. This is where new indexes, alter statements, and so on could be included.
- The Batch SQL script
1. ONCONFIG file:
A PDF chart showing the changes made to the onconfig files by each of the top 4 contestants is available here: Fastest_Informix_DBA_2013_ONCONFIG.pdf
All of the winning contestants increased the checkpoint interval (CKPTINTVL) to avoid a checkpoint running during the 10-minute test and reducing performance as all the dirty pages were written to disk. Contestant #2 was the only person who experienced a checkpoint during their run, and this was triggered by a lack of logical log resources.
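As an illustration only (the values below are assumptions, not taken from any contestant's entry), pushing checkpoints out past the test window might look like this in the onconfig; note that CKPTINTVL is only honored when the RTO checkpoint policy is disabled:

# Disable RTO-driven automatic checkpoints so that CKPTINTVL applies
RTO_SERVER_RESTART 0
# 30 minutes, comfortably longer than the 10 minute test
CKPTINTVL 1800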
The test machine has 4 cores, and all contestants increased the number of CPU VPs (NUMCPUVPS) from the baseline to make use of the additional cores.
All contestants realized that increasing the size of the initial shared memory segment (SHMVIRTSIZE) was important to avoid the creation of many smaller virtual memory segments. Additional memory segments were still added for each contestant during the run, but only 3 or 4 larger segments (all contestants also increased SHMADD), instead of the many small segments that would have been seen had this change not been made.
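A sketch of the CPU and memory settings described in the last two paragraphs, for a 4-core, 8 GB machine (the actual values used by each contestant are in the PDF chart above; the numbers here are illustrative only):

# One CPU VP per core (newer versions use VPCLASS cpu,num=4 instead of NUMCPUVPS)
NUMCPUVPS 4
# 2 GB initial virtual segment, in KB
SHMVIRTSIZE 2097152
# Grow the virtual portion in 512 MB chunks rather than many small segments
SHMADD 524288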
The OLTP benchmark alone created 1,000 network connections to the database. By increasing the number of poll threads and listener threads for the network connections (NETTYPE soctcp), the engine was better able to service these connections.
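For example (illustrative values; additional listener threads would typically also require extra DBSERVERALIASES and sqlhosts entries), the NETTYPE entry might be scaled up along these lines:

# 4 poll threads, roughly 250 connections each, run on network VPs
NETTYPE soctcp,4,250,NET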
The contestant who achieved the best Batch SQL performance optimized the Decision Support parameters and was the only person to use PDQ within the batch process together with forcing hash joins between some of the tables. See the comments later in this article for more on this.
This same contestant was the only person who limited the read ahead (AUTO_READAHEAD) to 16 pages (if not specified, the default brings in up to 128 pages at a time). At first glance, it would seem that the batch process would benefit from additional read-ahead, as it performs several sequential scans each time it runs. However, it is more likely that reducing the read-ahead from the default helped the batch processing by not filling buffers with extra pages from the OLTP tables, allowing the batch tables to remain resident for longer.
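The corresponding onconfig entry takes the form mode,pages; this line mirrors the setting described above:

# Read ahead enabled, but limited to 16 pages instead of the default 128
AUTO_READAHEAD 1,16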
The contestants who achieved first and second place in the OLTP run both enabled the SQL Statement Cache. The statements executed by the OLTP benchmark are all prepared, but are likely prepared again each time they are run. Using the SQL Statement Cache to store the execution plan will have eliminated some of the overhead of executing these statements repeatedly.
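A sketch of enabling the statement cache in the onconfig (the size is illustrative; with a value of 1, sessions must still switch the cache on, whereas 2 enables it for all sessions):

# Enable the SQL statement cache
STMT_CACHE 1
# Cache size in KB
STMT_CACHE_SIZE 40960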
All contestants increased the size of the buffer pool (BUFFERPOOL). This was the single most important configuration change as both the OLTP and batch processes benefit from caching. In fact, during additional testing, as the buffer pool began to be used up and pages needed to be retrieved from disk, the batch performance decreased significantly, which is why the scans to produce the totals against the expanding bills table took much longer each time.
As well as increasing the size of the buffer pool, the winner of the batch processing lowered the LRU min and max threshold values, which would have caused dirty pages to be written to disk in between checkpoints. Although this would seem counter-intuitive, given that each contestant was trying to eliminate checkpoints during the test, we have seen many circumstances where frequent writes of dirty pages can improve the performance of this test even when there are no checkpoints.
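A sketch of a BUFFERPOOL entry combining a larger cache with aggressive LRU thresholds (the buffer count and number of LRU queues are assumptions, not the winner's actual values):

# Roughly 3 GB of 2k buffers, with dirty pages flushed between 1% and 2% dirty
BUFFERPOOL size=2k,buffers=1500000,lrus=16,lru_min_dirty=1,lru_max_dirty=2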
Three of the four winners set USELASTCOMMITTED to ALL or COMMITTED READ (both would have had the same effect, as the OLTP benchmark used the committed read isolation level). This parameter only applies to tables that use row level locking, so it would have no effect for Contestant #1, who used page level locking for the OLTP tables. When this parameter is set and a read encounters a locked row, the most recently committed version of the data is returned. If this parameter is set to NONE, then any read of a locked row will wait or fail (depending on the lock wait mode that has been set). The OLTP benchmark runs into many locked records and will wait for 10 seconds for the locks to be released. Setting this parameter to ALL or COMMITTED READ increases the concurrency of the OLTP benchmark by avoiding many of these lock waits.
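In the onconfig this is a single line (the same behavior can also be requested per session with SET ENVIRONMENT USELASTCOMMITTED); ALL is shown here simply as one of the two values the winners used:

# Return the last committed version of a locked row instead of waiting on the lock
USELASTCOMMITTED ALL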
2. Prep Script Summary:
The following is a comparison of what each of the top 4 contestants included in the “prep” file – the SQL executed immediately before the timed run was started:
- Contestant #1: Alter OLTP tables to page level locking
- Contestant #2: Alter OLTP tables to row level locking
- Contestant #3: Increase next extent size of bills, oorder, history, new_order tables
- Contestant #4: Set all OLTP and batch tables to row level locking
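As a sketch of the kind of statement these prep scripts contained (the table names come from the contest schema, but the exact statements and the extent size shown are not reproduced from any entry):

-- Switch an OLTP table to row level locking (Contestant #1 used LOCK MODE (PAGE) instead)
ALTER TABLE oorder LOCK MODE (ROW);

-- Enlarge the next extent of a fast-growing table; the size (in KB) is illustrative
ALTER TABLE bills MODIFY NEXT SIZE 1024000;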
3. Batch SQL Changes Summary:
The following is a comparison of the changes made to the batch SQL:
- Baseline: Select row from bill_product_cycle table (1 row table)
- Contestant #1: Set PDQPRIORITY 100
- Contestant #2: Select row from bill_product_cycle table (1 row table)
- Contestant #3: Select row from bill_product_cycle table (1 row table)
- Contestant #4: Set lock mode to wait 10
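The two session-level statements named in the list above look like this in the batch SQL (these are standard Informix statements; where exactly they appeared in each script is not shown here):

-- Enable parallel database query for this session (Contestant #1)
SET PDQPRIORITY 100;

-- Wait up to 10 seconds for a lock instead of failing immediately (Contestant #4)
SET LOCK MODE TO WAIT 10;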
Three of the four winners modified the insert statement to use a CASE expression to set the values of product_discount and total_bill. This eliminated the need to execute the two updates of the bills table. With each iteration of the batch SQL, the bills table grew by just over 100,000 records, so while the insert statement would run at about the same speed each time, the update statements would slow down over time.
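A minimal sketch of the idea follows; the column names product_discount, total_bill and customer_number come from the article, but the source tables, the price column, the join condition and the discount rule shown here are assumptions made purely for illustration:

-- Hypothetical: compute the discount and total at insert time instead of updating afterwards
INSERT INTO bills (customer_number, product_number, product_discount, total_bill)
  SELECT c.customer_number,
         p.product_number,
         CASE WHEN c.balance_due = 0 THEN p.price * 0.10 ELSE 0 END,
         CASE WHEN c.balance_due = 0 THEN p.price * 0.90 ELSE p.price END
    FROM bill_customer c, bill_product p
   WHERE p.product_number = c.product_number;  -- join condition is illustrative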
The contestant who did not use a CASE expression, but instead executed the updates of the bills table, created an index on the bills table (product_number, customer_number). An index on the bills table was critical to prevent a scan of this table by the two updates. Submissions that performed the updates on the bills table without first creating an index on at least customer_number would have failed to complete even a single pass of the batch SQL.
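The index itself is a one-line addition to the prep script (the index name here is arbitrary):

-- Support the two correlated updates of bills without a full table scan
CREATE INDEX ix_bills_prod_cust ON bills (product_number, customer_number);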
Two of the four winners eliminated the update of the bill_customer table by creating an insert trigger on the bills table to perform the update as each new bill record was inserted. The trigger removed the need to evaluate the filters of the separate update, since the bill record was already the current record and was cached. Of the two contestants who used a trigger, one placed a condition within the trigger SQL to only perform the update if the amount being updated was greater than zero:
create trigger upd_bc insert on bills
  referencing NEW as newbill
  for each row when (newbill.total_bill > 0)
  (
    update bill_customer
       set balance_due = balance_due + newbill.total_bill
     where bill_customer.customer_number = newbill.customer_number
  );
About 40% of the bills records had a bill amount of 0, so introducing the condition resulted in fewer updates and slightly improved performance.
The contestant with the fastest batch run was the only one of the 4 winners to set PDQ in the batch SQL. PDQ on its own did not have a significant effect on the execution times, and it can sometimes produce a different query plan that may not be optimal if there are insufficient resources. However, this entry included an insert statement that had been tuned by eliminating the subquery on the bill_product_cycle table, and it used an optimizer directive to force a HASH JOIN between the bill_product and bill_customer tables. Using hash joins together with PDQ can dramatically improve performance when joining large tables and returning many rows. With a hash join, a hash table is created from one of the tables; the same hash key is then applied to records in the other table and compared against the hash table. Using hash joins with PDQ allows much of this process to be performed in memory. PDQ will also have helped with the scans of the bills and bill_customer tables, where the summary aggregation could be performed in memory rather than in temp space.
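A sketch of the combination described above; the directive syntax is Informix's, but the query, the join column and the choice of build table are illustrative rather than the contestant's actual statement:

SET PDQPRIORITY 100;

-- Hypothetical query: force a hash join between bill_product and bill_customer
SELECT {+ USE_HASH(c) }
       p.product_number, c.customer_number, c.balance_due
  FROM bill_product p, bill_customer c
 WHERE p.product_number = c.product_number;  -- join column is illustrative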
Contestant #3 compressed the bills and bill_customer tables. Compressed tables take up less space on disk and, more importantly for this contest, less space in the buffer pool, leaving more memory free to cache pages from other tables. Without compression, the bill_customer table uses approximately 100,000 pages on disk. With compression, this same table consumes only 16,000 pages. This table is used a lot within the batch SQL, including a full scan with each run of the SQL, so a performance gain would be achieved by having this table fully cached. With the competition for the limited buffer pool from the growing bills table and the heavily used OLTP tables, a smaller footprint is an advantage. There is an overhead associated with compressing and uncompressing the pages, but this is often offset by the reduced disk I/O and improved caching.
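Compression can be applied through the SQL administration API; a sketch (the database name and owner used here are placeholders, not the contest's actual values):

-- Compress existing rows, then repack and shrink the table
EXECUTE FUNCTION sysadmin:task('table compress repack shrink', 'bills', 'dbname', 'informix');
EXECUTE FUNCTION sysadmin:task('table compress repack shrink', 'bill_customer', 'dbname', 'informix');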
Although outside of this contest, I performed some benchmarking of the batch SQL on its own, without the OLTP benchmark running, to compare the performance of the compressed and uncompressed tables. I discovered that the insert into the bills table was approximately 20% faster with compression; however, the scans of the bills and bill_customer tables (used to get the row counts and totals) were approximately 50% slower with compression. Where compression would have helped substantially is if the batch process had been allowed to run continually. After about 25 iterations, the bills table would have grown to several million records, and the buffer pool would no longer have been able to hold all of the uncompressed pages for the bills table, so pages would have had to be read in from disk. The scan of the bills table jumped from about 4 seconds to over 30 seconds, and after 4 more runs it had increased to over 45 seconds. With the compressed tables, the bills table was still able to fit comfortably into the buffer pool after 25 iterations, and although the scan time increased each time, even after 30 iterations it was still under 10 seconds.
The increased efficiency of buffer pool usage with the compressed tables is likely a major reason why this contestant achieved the highest number of OLTP transactions.