[Openmp-commits] [PATCH] D105699: [libomptarget][devicertl] Remove branches around setting parallelLevel

Fri Jul 9 09:40:43 PDT 2021

JonChesterfield added a comment.

It cleans up the IR a lot, but the performance change is in the noise on amdgpu. I suspect our main bottlenecks are elsewhere. I'd be interested to hear how it changes performance on a recent nvptx card.

My working theory is that once all the branches introduced by the OpenMP runtime have been optimised out, the IR will look much like it does under CUDA and perform much the same, except for overheads in the host runtime.

There are two options to avoid the race. One is to take only the first transform, which doesn't currently fold the loads but might once other optimisations improve. The other is to use a relaxed atomic store, which makes the race well defined (and probably emits exactly the same ISA).
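
For illustration, a minimal host-side sketch of the relaxed atomic store option, in standard C++ rather than the devicertl's own code. The names parallel_level, set_parallel_level and simulate_lanes are hypothetical and not taken from the patch; the loop only stands in for lanes that would run concurrently on a GPU.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for one parallelLevel slot shared by a group of lanes.
static std::atomic<uint8_t> parallel_level{0};

// Branch-free setter: every caller stores unconditionally, with no
// "am I the first lane?" check around it. Because the store is atomic,
// concurrent stores from several lanes are not a data race in the
// C++ memory model, and relaxed ordering asks for no extra fencing.
void set_parallel_level(uint8_t level) {
  parallel_level.store(level, std::memory_order_relaxed);
}

// Sequential stand-in for `lanes` lanes all writing the same value.
// On a device these calls would happen concurrently.
uint8_t simulate_lanes(unsigned lanes, uint8_t level) {
  for (unsigned i = 0; i < lanes; ++i)
    set_parallel_level(level);
  return parallel_level.load(std::memory_order_relaxed);
}
```

Since every lane writes the same value, the result does not depend on which store lands last. Whether this lowers to the same ISA as a plain store on a given target is something to check in the generated code, not something the sketch demonstrates.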

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D105699