Lately I have been busy optimizing the application I work on. The core of the application is written in C. When faced with a performance problem there are two ways to tackle it: a) improve the algorithm so that it takes less time, or b) make the algorithm multi-threaded.
In our case option a) was not possible, as the algorithm was already very well optimized, so option b) was the only choice. Making code multi-threaded has some prerequisites: the work must be divisible into pieces that can run without depending on each other, and the overhead of creating and destroying threads must not cancel out the performance gain.
In C programs, because of their procedural nature, managing threads by hand with a thread pool is difficult. This is where OpenMP saves the day. OpenMP implementations typically keep a pool of worker threads alive and reuse them between parallel regions, so the overhead of entering a parallel region is very low and you get good performance even for very short-lived multi-threaded pieces of code. For example, suppose you have a loop that runs over the elements of an array in 100 ms sequentially. With OpenMP you might bring that down to something like 40 ms. With a hand-rolled approach that creates and destroys threads around every such loop, the thread management overhead can eat up much of that gain.
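As a rough sketch of the kind of loop described above (the function name `scale_array` and its parameters are made up for illustration, not taken from our application), parallelizing a loop over an array can look like this:

```c
/* Hypothetical helper: multiplies every element of `data` by `factor`. */
void scale_array(double *data, int n, double factor)
{
    /* Each iteration touches only data[i], so the iterations are
       independent and OpenMP may divide them among its threads. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        data[i] *= factor;
    }
}
```

The loop body is unchanged from the serial version. How much time this actually saves depends on the array size, the work per element and the number of cores, so it is worth measuring rather than assuming.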
Another great thing about OpenMP is that it covers the most commonly needed data-sharing and synchronization needs through clauses such as reduction, firstprivate and lastprivate, which you simply attach to a directive.
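To make these clauses a little more concrete, here is a minimal sketch. The functions `sum_array` and `fill_with_offset` are hypothetical examples written for this post, not part of OpenMP itself.

```c
/* reduction: every thread accumulates into its own private copy of
   `total`, and OpenMP adds the copies together when the loop ends. */
double sum_array(const double *data, int n)
{
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}

/* firstprivate: each thread gets a private copy of `offset` that is
   initialized with the value it had before the loop.
   lastprivate: after the loop, `last` holds the value assigned in the
   sequentially last iteration (i == n - 1). Assumes n > 0. */
int fill_with_offset(int *out, int n, int offset)
{
    int last = -1;
    #pragma omp parallel for firstprivate(offset) lastprivate(last)
    for (int i = 0; i < n; i++) {
        out[i] = i + offset;
        last = out[i];
    }
    return last;
}
```

Without the reduction clause, all threads would update the shared `total` at the same time, which is a data race. Note that a floating-point reduction may add the elements in a different order than the serial loop, so results can differ in the last digits.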
Not to forget, to use OpenMP you often don’t need to restructure your code at all; adding a #pragma line above a loop is frequently enough. A compiler that is not told to enable OpenMP simply ignores these pragmas, so you can comfortably switch between serial and parallel code with a compiler switch.
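A small sketch of what that switch looks like in practice, assuming GCC or Clang, where the option is `-fopenmp` (other compilers use a different flag). The function name `max_threads_available` is made up for illustration; the `_OPENMP` macro is defined by the compiler only when OpenMP is enabled.

```c
/* Parallel build:  gcc -fopenmp prog.c
   Serial build:    gcc prog.c          (the OpenMP pragmas are ignored) */
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: reports how many threads a parallel region
   could use, or 1 when the program was built without OpenMP. */
int max_threads_available(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

The guard is only needed around calls into the OpenMP runtime library such as `omp_get_max_threads`. Code that uses nothing but pragmas compiles unchanged in both modes.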