OpenMP is one of the easiest ways to make existing code run across CPU cores. In the simplest cases you add a single #pragma to C code and it runs close to N times faster on N cores. This works when you're calling a function in a loop and the iterations have no side effects. Some examples I've done:
1) Ray tracing. Looping over all the pixels in an image and tracing rays to determine the color of each pixel. The algorithm and data structures are complex but don't change during rendering, so N cores is about N times as fast.
2) In SolveSpace we had a small loop that calls a tessellation function on a bunch of NURBS surfaces. The function was appending triangles to a list, so I made a thread-local list for each call and combined them afterwards to avoid writes to a shared data structure. Again N times faster with very little effort.
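To make the second case concrete, here is a rough sketch of the "private list, merge afterwards" pattern. It is not the SolveSpace code: `IntList`, `tessellate` and `tessellate_all` are hypothetical stand-ins, and it keeps one list per thread rather than one per call, which is a variation on what is described above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable list standing in for a triangle list. */
typedef struct { int *data; size_t len, cap; } IntList;

static void list_push(IntList *l, int v) {
    if (l->len == l->cap) {
        l->cap = l->cap ? l->cap * 2 : 16;
        l->data = realloc(l->data, l->cap * sizeof *l->data);
        assert(l->data != NULL);
    }
    l->data[l->len++] = v;
}

static void list_append(IntList *dst, const IntList *src) {
    for (size_t i = 0; i < src->len; i++)
        list_push(dst, src->data[i]);
}

/* Hypothetical stand-in for tessellating one surface: emits 3 values. */
static void tessellate(int surface, IntList *out) {
    for (int k = 0; k < 3; k++)
        list_push(out, surface * 3 + k);
}

IntList tessellate_all(int n_surfaces) {
    IntList all = {0};
    #pragma omp parallel
    {
        IntList local = {0};          /* private to each thread */
        #pragma omp for nowait
        for (int i = 0; i < n_surfaces; i++)
            tessellate(i, &local);    /* no writes to shared data here */
        #pragma omp critical
        list_append(&all, &local);    /* merge, one thread at a time */
        free(local.data);
    }
    return all;
}
```

The merge order depends on which thread finishes first, so the combined list holds the same elements on every run but not necessarily in the same order. The caller is expected to free `all.data`.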
The code also builds fine single-threaded, without changes, if you don't have OpenMP. Your compiler will just ignore the #pragmas.
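A minimal sketch of the single-#pragma case, assuming a per-pixel function with no side effects. `shade` and `render` are made-up names and the "shading" is a placeholder formula, not a ray tracer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-pixel function: reads only its arguments. */
static unsigned char shade(int x, int y) {
    return (unsigned char)((x * 31 + y * 17) & 0xFF);
}

void render(unsigned char *image, int width, int height) {
    assert(image != NULL && width > 0 && height > 0);
    /* Rows are independent, so they can be split across threads.
       Without OpenMP this line is ignored and the loop runs serially. */
    #pragma omp parallel for schedule(dynamic)
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            image[(size_t)y * width + x] = shade(x, y);
}
```

With GCC or Clang, building with `-fopenmp` gives the threaded version; building without it gives the same result on one core.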
You can now (already in OpenMP 5) use it to write GPU programs. Intel's oneAPI uses OpenMP 5.x to write programs for the Intel Ponte Vecchio GPUs, which are on par with the Nvidia A100.
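For a feel of what GPU offload looks like, here is a small sketch using the standard `target` construct. `saxpy` is just an illustrative function name. Whether this actually runs on a GPU depends on the compiler being built and invoked with offloading support; otherwise the loop runs on the host.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = a * x[i] + y[i], offloaded to a device if one is available. */
void saxpy(float a, const float *x, float *y, int n) {
    assert(x != NULL && y != NULL && n > 0);
    /* map(to:)     copies x to the device before the loop;
       map(tofrom:) copies y there and back again afterwards. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The `map` clauses are the main thing that differs from CPU-only OpenMP: the arrays have to be moved to and from device memory explicitly.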
I was just googling to see if there's any Emscripten/WASM implementation of OpenMP. The Emscripten GitHub issue [1] has a link to this "simpleomp" [2][3], which says:
> In ncnn project, we implement a minimal openmp runtime for webassembly target
> It only works for #pragma omp parallel for num_threads(N)
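For reference, the one form quoted above looks like this in ordinary C. This is a generic sketch with a made-up `scale` function; I have not checked it against simpleomp itself.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply every element of v by factor, using at most n_threads threads. */
void scale(float *v, int n, float factor, int n_threads) {
    assert(v != NULL && n_threads > 0);
    #pragma omp parallel for num_threads(n_threads)
    for (int i = 0; i < n; i++)
        v[i] *= factor;
}
```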
OpenMP 6.0
(openmp.org) 83 points by mshachkov 7 hours ago | 9 comments
Comments
https://www.intel.com/content/www/us/en/docs/oneapi/optimiza...
GCC also supports offloading to Nvidia and AMD GPUs:
https://gcc.gnu.org/wiki/Offloading
Here is an example of how you can use OpenMP to run a kernel on an Nvidia A100:
https://people.montefiore.uliege.be/geuzaine/INFO0939/notes/...
[1] https://github.com/emscripten-core/emscripten/issues/13892
[2] https://github.com/Tencent/ncnn/blob/master/src/simpleomp.h
[3] https://github.com/Tencent/ncnn/blob/master/src/simpleomp.cp...