27.5.13

UUID.randomUUID() is slow!

When I monitored the performance of my Spring Integration application, I found out that a low of threads were blocked, according to VisualVM. My application is simple, so I replaced the whole place that I applied with synchronized keyword and tested it again. It didn't improve anything. So I generated the jstack result. Funny enough, it seemed Spring Integration itself was blocked by some kinds of lock from the Java Fundamental Library.

  java.lang.Thread.State: BLOCKED (on object monitor)
        at java.security.SecureRandom.nextBytes(SecureRandom.java:455)
        - waiting to lock <0x00002aaac2aa3580> (a java.security.SecureRandom)
        at java.util.UUID.randomUUID(UUID.java:145)
        at org.springframework.integration.MessageHeaders.<init>(MessageHeaders.java:101)
        at org.springframework.integration.message.GenericMessage.<init>(GenericMessage.java:68)
        at org.springframework.integration.support.MessageBuilder.build(MessageBuilder.java:298)
        at org.springframework.integration.transformer.HeaderEnricher.transform(HeaderEnricher.java:119)
        at org.springframework.integration.transformer.MessageTransformingHandler.handleRequestMessage(MessageTransformingHandler.java:67)
        at org.springframework.integration.handler.AbstractReplyProducingMessageHandler.handleMessageInternal(AbstractReplyProducingMessageHandler.java:134)
        at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:73)
        at org.springframework.integration.handler.MessageHandlerChain$1.send(MessageHandlerChain.java:154)
        at org.springframework.integration.core.MessagingTemplate.doSend(MessagingTemplate.java:288)
        at org.springframework.integration.core.MessagingTemplate.send(MessagingTemplate.java:149)

By using Google, I found this discussion:


Performance Test Result for the Default UUID Generator

The default UUID generator has a bad performance. But how bad it is? I tried to generate 1 billion (1024 * 1024 * 1024) UUIDs in a single thread with my Windows PC (i7 2620M), it took about 450 seconds to do so. It took about 1 millisecond to generate 2 K UUID, and 2 seconds for 2 M.

Since it uses blocked strategy, it looks bad in multithreading environment. I got the following monitoring picture when I was trying to use 10 threads to generate these 1 billion UUIDs.


It tooks 800 seconds to finish the work.

The One with Best Performance in a Single Thread but Worse Performance with Heavy Work Load in Multi-Threads

This is an implementation that I got very good performance when running in a Single thread, and maybe with a light load.

The tricky part is this:

boolean found = false;
long n = 0;
int c = 0;
  
while(!found)
{
 c = counter.get();   
 n = lastNano;
   
 if( c < MASK)
 {
  //claim the value
  //
  found = counter.compareAndSet(c, c + 1); //<1>
  continue;
 }
 else
 {
  if(counter.compareAndSet(c, 0))  //<2>
  {
   n = System.nanoTime();  
   if(n < lastNano)
    n = lastNano++; 
   else
    lastNano = n;
   
   c = 0;
   found = true;
  }
 }   
}

This is actually an IMPROVED solution, but apparently, it is not improved as much as it was expected. I believed System.nanoTime() would be slightly heavy. It took me 15 seconds to call System.nanoTime() for 1 billion times. So I thought I don't really need to call System.nanoTime() every time. For a certain amount of iterations, I just need to increase a counter with the same nanoTime, then I will get different IDs. 

However, I need to update the nano value when the counter runs out. This is a typical synchronization issue. To avoid performance issue, I tried to implement it in the none-blocking way. So the counter is designed as the synchronization guard, it claims the current stored nano value, re-check the counter to finally assure its ownership, which was implemented at <1>. If the counter reach its capacity, only one of the current thread could update the nano value, while others need to retry, which was implemented at <2>.

It doesn't look bad at all. In a single thread, it took me about 20 seconds to generate 1 billion items. However, if I starts 10 threads to heavily generate the items, a lot of retried happened at <1>, which ruined its performance. I couldn't even finished my test.

Even though I still believe this is not a very bad idea, because nobody would just starts a lot of thread and do nothing but generate UUIDs, I don't want this solution to work in my project, because System.nanoTime() is not that heavy.

A Workable Solution

This is a workable solution. In a test with Single thread, it took me 60 seconds to generate 1 billion items. Since a pure System.nanoTime() call will take 15 seconds for 1 billion times, this solution is fast. And it is 85% faster then the JDK version. When I tried it with 10 threads, it took 100 seconds to finish its work with monitoring (JVisualVM). It fully utilized my CPUs, while the default version only consumed 60% to 70% CPU utilization. No blocking issue at all.

The bad thing with this solution is, if it works in you PC, you can't barely move your mouse.

No comments: