[Openmp-commits] [PATCH] D105699: [libomptarget][devicertl] Remove branches around setting parallelLevel
Jon Chesterfield via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Fri Jul 9 09:40:43 PDT 2021
JonChesterfield added a comment.
It cleans up the IR a lot but performance change is in the noise on amdgpu. I suspect our main bottlenecks are elsewhere. I'd be interested to hear how it changes a recent nvptx card.
My working theory is that once all the branches that are introduced by the openmp runtime have been optimised out, the IR will look much like it does under cuda and perform much the same, except for overheads in the host runtime.
Two options to avoid the race are to take only the first transform, which doesn't currently fold the loads but might do after some other optimisations improve, or to use a relaxed atomic store to make the race well defined (and probably emit exactly the same ISA).
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D105699/new/
https://reviews.llvm.org/D105699
More information about the Openmp-commits
mailing list