19.2.14

Quick Performance Test for Vertica on IEEE Floating-Point Compression/Decompression

I didn't think about it this way before, but today I ran some tests and finally hit the performance issue. Surprisingly, it is not about I/O, but about CPU.

I am planning to do a standard deviation calculation on some 4 million data items. Each data item has 50 columns and 4,032 samples. So basically the calculation needs to handle 200 million values (4 million items times 50 columns), each of which is computed over about 4K samples.
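To make the setup concrete, here is a minimal sketch of the kind of query I mean. The table and column names (metric_samples, item_id, col1 ... col50) are made up for illustration; the real schema differs, but the shape is the same: group by the data item and take the standard deviation of each column over its ~4,032 sample rows.

-- One stddev per (item, column); roughly 4 million items x 50 columns
-- = 200 million output values.
SELECT item_id,
       STDDEV(col1)  AS sd_col1,
       STDDEV(col2)  AS sd_col2,
       -- ... repeated for the remaining columns ...
       STDDEV(col50) AS sd_col50
FROM   metric_samples
GROUP BY item_id;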

I began my experiment with this and found out that it took 39 minutes to finish the calculation. What? This calculation is supposed to run every hour, and I will probably end up handling 100 million data items, possibly with 50K samples each.

I suspected this was because Vertica was not using enough memory to cache the intermediate results. I have more than 100 GB of RAM on each Vertica node, but it only used 12 GB for the calculation. However, after doing some simple math, I found out that 12 GB was all it needed.
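For what it's worth, my own back-of-the-envelope version of that math (this is just my estimate, not anything Vertica reports): a stddev aggregate only needs a few 8-byte accumulators per group, so even 200 million (item, column) groups stay in the single-digit-gigabyte range before overhead.

-- Rough estimate: count, sum and sum-of-squares per (item, column) group.
SELECT 4000000 * 50               -- ~200 million (item, column) groups
       * 3 * 8                    -- three 8-byte accumulators per group
       / (1024.0 * 1024 * 1024) AS estimated_gb;
-- about 4.5 GB of raw accumulators; with group keys, hash-table overhead
-- and the result set itself, ~12 GB is plenty.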

To verify this, I did the following two tests.

Test 2a: do the calculation on only one column. It took 3 minutes.
Test 2b: do the calculation on all columns, but don't distinguish the data items, which means the result is 50 double values instead of 200 million values. It took 27 minutes to finish. (Rough sketches of both queries are below.)
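Using the same made-up schema as above:

-- Test 2a: one column only, still grouped per data item.
SELECT item_id, STDDEV(col1) AS sd_col1
FROM   metric_samples
GROUP BY item_id;

-- Test 2b: all 50 columns, but no GROUP BY, so the whole table collapses
-- into a single row of 50 values.
SELECT STDDEV(col1)  AS sd_col1,
       STDDEV(col2)  AS sd_col2,
       -- ... repeated for the remaining columns ...
       STDDEV(col50) AS sd_col50
FROM   metric_samples;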

Obviously, this was not about how memory was used, but about traversing the data. When I watched the run with the Linux top command, it clearly showed a heavy CPU workload, which pushed the system load up to 30 or even 40 at times. My system has 24 CPU cores, so a load of 30 means the cores are running at full capacity.


