[Openmp-commits] [PATCH] D105699: [libomptarget][devicertl] Remove branches around setting parallelLevel
    Jon Chesterfield via Phabricator via Openmp-commits 
    openmp-commits at lists.llvm.org
       
    Fri Jul  9 09:40:43 PDT 2021
    
    
  
JonChesterfield added a comment.
It cleans up the IR a lot, but the performance change is in the noise on amdgpu. I suspect our main bottlenecks are elsewhere. I'd be interested to hear how it changes performance on a recent nvptx card.
My working theory is that once all the branches that are introduced by the openmp runtime have been optimised out, the IR will look much like it does under cuda and perform much the same, except for overheads in the host runtime.
Two options to avoid the race are to take only the first transform, which doesn't currently fold the loads but might do after some other optimisations improve, or to use a relaxed atomic store to make the race well defined (and probably emit exactly the same ISA).
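For illustration only, here is a host-side C++ sketch of the second option. It is not the devicertl code: `parallel_level` and `set_level_from_all_lanes` are hypothetical names, and ordinary threads stand in for the lanes of a warp. Every lane stores the same value unconditionally, and the relaxed atomic store is what makes those concurrent writes well defined.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a per-warp parallel level slot.
static std::atomic<std::uint8_t> parallel_level{0};

// Every "lane" writes the same value with no branch guarding the store.
// With a plain (non-atomic) variable these concurrent writes would be a
// data race; a relaxed atomic store makes them well defined.
std::uint8_t set_level_from_all_lanes(unsigned lanes, std::uint8_t value) {
  assert(lanes > 0);
  std::vector<std::thread> pool;
  pool.reserve(lanes);
  for (unsigned i = 0; i < lanes; ++i) {
    pool.emplace_back([value] {
      parallel_level.store(value, std::memory_order_relaxed);
    });
  }
  // Joining orders every store before the load below.
  for (auto &t : pool) {
    t.join();
  }
  return parallel_level.load(std::memory_order_relaxed);
}
```

Calling `set_level_from_all_lanes(8, 3)` returns 3 whichever lane's store lands last, since all lanes write the same value. Whether a device compiler lowers the relaxed store to the same ISA as a plain store is the open question raised above, not something this sketch demonstrates.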
Repository:
  rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105699/new/
https://reviews.llvm.org/D105699
    
    