OpenMP 6.0

(openmp.org)

Comments

phkahler 6 hours ago
OpenMP is one of the easiest ways to make existing code run across CPU cores. In the simplest cases you just add a single #pragma to C code and it goes roughly N times faster on N cores; that's when you're running a function in a loop with no side effects. Some examples I've done:

1) Ray tracing: looping over all the pixels in an image and using ray tracing to determine the color of each pixel. The algorithm and data structures are complex, but they don't change during rendering. N cores is about N times as fast.

2) In Solvespace we had a small loop that calls a tessellation function on a bunch of NURBS surfaces. The function was appending triangles to a list, so I made a thread-local list for each call and combined them afterwards to avoid writes to a shared data structure (sketched below). Again, roughly N times faster with very little effort.

The code also builds fine single-threaded, without changes, if you don't have OpenMP: your compiler will just ignore the #pragmas.
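
A minimal sketch of both patterns, in case it helps; the function names, the fake work inside the loops, and the worst-case per-thread allocation are illustrative, not taken from Solvespace or a real ray tracer:

  #include <omp.h>
  #include <stdlib.h>

  /* Pattern 1: independent iterations, one pragma, no writes to shared state. */
  void shade_pixels(float *image, int n_pixels) {
    #pragma omp parallel for
    for (int i = 0; i < n_pixels; i++)
      image[i] = (float)i / n_pixels;              /* stand-in for the per-pixel ray trace */
  }

  /* Pattern 2: each thread appends to its own list; the lists are merged
     sequentially afterwards, so the shared result needs no locking. */
  void collect(int n_items, int **result, int *result_count) {
    int nthreads = omp_get_max_threads();
    int **local = calloc(nthreads, sizeof *local);
    int *local_count = calloc(nthreads, sizeof *local_count);

    #pragma omp parallel
    {
      int t = omp_get_thread_num();
      local[t] = malloc(n_items * sizeof **local); /* worst-case size, for simplicity */
      #pragma omp for
      for (int i = 0; i < n_items; i++)
        if (i % 3 == 0)                            /* stand-in for "emit a triangle" */
          local[t][local_count[t]++] = i;
    }

    int pos = 0;
    for (int t = 0; t < nthreads; t++)
      pos += local_count[t];
    *result = malloc(pos * sizeof **result);
    *result_count = pos;

    pos = 0;
    for (int t = 0; t < nthreads; t++) {           /* sequential merge */
      for (int i = 0; i < local_count[t]; i++)
        (*result)[pos++] = local[t][i];
      free(local[t]);
    }
    free(local);
    free(local_count);
  }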

fxj 4 hours ago
You can now (since OpenMP 5, in fact) use it to write GPU programs. Intel's oneAPI uses OpenMP 5.x to write programs for the Intel Ponte Vecchio GPUs, which are on par with the Nvidia A100.

https://www.intel.com/content/www/us/en/docs/oneapi/optimiza...

GCC also provides offloading support for Nvidia and AMD GPUs:

https://gcc.gnu.org/wiki/Offloading

Here is an example of how you can use OpenMP to run a kernel on an Nvidia A100:

https://people.montefiore.uliege.be/geuzaine/INFO0939/notes/...

  #include <stdlib.h>
  #include <stdio.h>
  #include <omp.h>

  void saxpy(int n, float a, float *x, float *y) {
    double elapsed = -1.0 * omp_get_wtime();

    // No need to map the scalar a: scalars are firstprivate by default.
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++) {
      y[i] = a * x[i] + y[i];
    }

    elapsed += omp_get_wtime();
    printf("saxpy done in %6.3lf seconds.\n", elapsed);
  }

  int main() {
    int n = 2000000;
    float *x = (float *) malloc(n * sizeof(float));
    float *y = (float *) malloc(n * sizeof(float));
    float alpha = 2.0f;

    // Initialize x and y on the host; this loop runs across CPU threads.
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
      x[i] = 1;
      y[i] = i;
    }

    saxpy(n, alpha, x, y);

    free(x);
    free(y);

    return 0;
  }
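
(For what it's worth: offload builds need extra compiler flags, and the exact spelling depends on how your toolchain was built. With an offloading-enabled GCC it's along the lines of gcc -O2 -fopenmp -foffload=nvptx-none saxpy.c, and with Clang something like clang -O2 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda saxpy.c. If no device is available at run time, the target region just falls back to running on the host.)
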
Conscat 6 hours ago
OpenMP was pivotal at my last workplace, but because some customers required MSVC, we were stuck with little more than OpenMP 2.0 support.
dsp_person 6 hours ago
I was just googling to see if there's any Emscripten/WASM implementation of OpenMP. The Emscripten GitHub issue [1] links to this "simpleomp" [2][3], which says:

> In ncnn project, we implement a minimal openmp runtime for webassembly target

> It only works for #pragma omp parallel for num_threads(N)

[1] https://github.com/emscripten-core/emscripten/issues/13892

[2] https://github.com/Tencent/ncnn/blob/master/src/simpleomp.h

[3] https://github.com/Tencent/ncnn/blob/master/src/simpleomp.cp...
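
A minimal sketch of the one form that quote says is handled; the function and loop body here are made up, not taken from ncnn:

  void scale(float *v, int n) {
    /* simpleomp reportedly handles only this construct: a parallel for
       with an explicit num_threads clause. */
    #pragma omp parallel for num_threads(4)
    for (int i = 0; i < n; i++)
      v[i] *= 2.0f;
  }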