OpenMP 6.0

(openmp.org)

Comments

phkahler 6 hours ago
OpenMP is one of the easiest ways to make existing code run across CPU cores. In the simplest cases you just add a single #pragma to C code and it goes roughly N times faster on N cores; that's when you're running a function in a loop with no side effects. Some examples I've done:

1) Ray tracing: looping over all the pixels in an image and using ray tracing to determine the color of each pixel. The algorithm and data structures are complex, but they don't change during rendering. N cores is about N times as fast.

2) In Solvespace we had a small loop that calls a tessellation function on a bunch of NURBS surfaces. The function was appending triangles to a list, so I made a thread-local list for each call and combined them afterwards to avoid writes to a shared data structure (sketched below). Again, roughly N times faster with very little effort.

The code also builds fine single-threaded, without changes, if you don't have OpenMP: your compiler will just ignore the #pragmas.
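
A minimal sketch of both patterns, in case it helps; the function names, the fake work inside the loops, and the worst-case per-thread allocation are illustrative, not taken from Solvespace or a real ray tracer:

  #include <omp.h>
  #include <stdlib.h>

  /* Pattern 1: independent iterations, one pragma, no writes to shared state. */
  void shade_pixels(float *image, int n_pixels) {
    #pragma omp parallel for
    for (int i = 0; i < n_pixels; i++)
      image[i] = (float)i / n_pixels;              /* stand-in for the per-pixel ray trace */
  }

  /* Pattern 2: each thread appends to its own list; the lists are merged
     sequentially afterwards, so the shared result needs no locking. */
  void collect(int n_items, int **result, int *result_count) {
    int nthreads = omp_get_max_threads();
    int **local = calloc(nthreads, sizeof *local);
    int *local_count = calloc(nthreads, sizeof *local_count);

    #pragma omp parallel
    {
      int t = omp_get_thread_num();
      local[t] = malloc(n_items * sizeof **local); /* worst-case size, for simplicity */
      #pragma omp for
      for (int i = 0; i < n_items; i++)
        if (i % 3 == 0)                            /* stand-in for "emit a triangle" */
          local[t][local_count[t]++] = i;
    }

    int pos = 0;
    for (int t = 0; t < nthreads; t++)
      pos += local_count[t];
    *result = malloc(pos * sizeof **result);
    *result_count = pos;

    pos = 0;
    for (int t = 0; t < nthreads; t++) {           /* sequential merge */
      for (int i = 0; i < local_count[t]; i++)
        (*result)[pos++] = local[t][i];
      free(local[t]);
    }
    free(local);
    free(local_count);
  }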

fxj 4 hours ago
You can now (since OpenMP 5, in fact) use it to write GPU programs. Intel's oneAPI uses OpenMP 5.x to write programs for the Intel Ponte Vecchio GPUs, which are on par with the Nvidia A100.

https://www.intel.com/content/www/us/en/docs/oneapi/optimiza...

GCC also provides offloading support for Nvidia and AMD GPUs:

https://gcc.gnu.org/wiki/Offloading

Here is an example of how you can use OpenMP to run a kernel on an Nvidia A100:

https://people.montefiore.uliege.be/geuzaine/INFO0939/notes/...

  #include <stdlib.h>
  #include <stdio.h>
  #include <omp.h>

  void saxpy(int n, float a, float *x, float *y) {
    double elapsed = -1.0 * omp_get_wtime();

    // No need to map the scalar a: scalars are firstprivate by default.
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++) {
      y[i] = a * x[i] + y[i];
    }

    elapsed += omp_get_wtime();
    printf("saxpy done in %6.3lf seconds.\n", elapsed);
  }

  int main() {
    int n = 2000000;
    float *x = (float *) malloc(n * sizeof(float));
    float *y = (float *) malloc(n * sizeof(float));
    float alpha = 2.0f;

    // Initialize x and y on the host; this loop runs across CPU threads.
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
      x[i] = 1;
      y[i] = i;
    }

    saxpy(n, alpha, x, y);

    free(x);
    free(y);

    return 0;
  }
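
(For what it's worth: offload builds need extra compiler flags, and the exact spelling depends on how your toolchain was built. With an offloading-enabled GCC it's along the lines of gcc -O2 -fopenmp -foffload=nvptx-none saxpy.c, and with Clang something like clang -O2 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda saxpy.c. If no device is available at run time, the target region just falls back to running on the host.)
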
Conscat 6 hours ago
OpenMP was pivotal at my last workplace, but because some customers required MSVC, we were stuck with little more than OpenMP 2.0 support.
dsp_person 6 hours ago
I was just googling to see if there's any Emscripten/WASM implementation of OpenMP. The Emscripten GitHub issue [1] links to this "simpleomp" [2][3], which says:

> In ncnn project, we implement a minimal openmp runtime for webassembly target

> It only works for #pragma omp parallel for num_threads(N)

[1] https://github.com/emscripten-core/emscripten/issues/13892

[2] https://github.com/Tencent/ncnn/blob/master/src/simpleomp.h

[3] https://github.com/Tencent/ncnn/blob/master/src/simpleomp.cp...
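
A minimal sketch of the one form that quote says is handled; the function and loop body here are made up, not taken from ncnn:

  void scale(float *v, int n) {
    /* simpleomp reportedly handles only this construct: a parallel for
       with an explicit num_threads clause. */
    #pragma omp parallel for num_threads(4)
    for (int i = 0; i < n; i++)
      v[i] *= 2.0f;
  }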