[LLVMdev] Optimization on Atomics (and the OpenMP memory model)

Fri Apr 10 10:12:06 PDT 2015

Hi everyone,

The OpenMP standards committee has begun work to formalize the OpenMP memory model and to define its relationship to the C/C++ memory models. A questionnaire has been put together (pasted below), and I'd like everyone's help in composing detailed answers to inform the committee's decision-making process. While our OpenMP support is still in active development, many of these questions apply equally to C/C++ atomics, and a lot of work has already been done here on that front.

* Which processor architectures does your compiler target (e.g. x86, Power, ARM, ARM v8, Xeon Phi, Nvidia GPUs, etc.)?
    [I'll just answer "yes" for that one ;)]
* What is a flush lowered to in assembly for each of the supported architectures? For instance, a flush might be implemented as an MFENCE on the x86 architecture in some compilers.
* What are non-seq_cst atomic read, write, update and capture lowered to for each of your targets?
* What are seq_cst atomic read, write, update and capture lowered to for each of your targets?
* What is the taskwait construct lowered to for each of your targets?
* What are omp_set_lock and omp_unset_lock lowered to for each of your targets?
* What is a barrier lowered to for each of your targets?
* Are any optimisations allowed to reorder, change or remove code that uses any of the synchronisation constructs above, or any of the other synchronisation constructs in section 2.12 of the OpenMP 4.0 specification?
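
For the atomic and flush questions, one way to collect per-target answers is to compile a small test case and read the generated assembly. The following is only a minimal sketch: it uses C++11 atomics as a stand-in for the OpenMP constructs, the function and variable names are hypothetical, and treating a seq_cst fence as the analogue of a flush is an assumption (that mapping is part of what the committee is trying to pin down).

```cpp
#include <atomic>

// Hypothetical shared variable, used only for illustration.
std::atomic<int> flag{0};

// Non-seq_cst (here: relaxed) atomic read and write.
int read_relaxed() { return flag.load(std::memory_order_relaxed); }
void write_relaxed(int v) { flag.store(v, std::memory_order_relaxed); }

// seq_cst atomic read and write.
int read_seq_cst() { return flag.load(std::memory_order_seq_cst); }
void write_seq_cst(int v) { flag.store(v, std::memory_order_seq_cst); }

// "update": atomic read-modify-write whose result is discarded.
void update_relaxed() { flag.fetch_add(1, std::memory_order_relaxed); }

// "capture": atomic read-modify-write that returns the previous value.
int capture_seq_cst() { return flag.fetch_add(1, std::memory_order_seq_cst); }

// Full fence; ASSUMED here to be the closest C++ analogue of an OpenMP flush.
void full_fence() { std::atomic_thread_fence(std::memory_order_seq_cst); }
```

Compiling this with something like `clang++ -std=c++11 -O2 -S` for each target of interest (or adding `-emit-llvm` to see the IR before target lowering) shows what each operation becomes, and comparing -O0 against -O2 output gives a first hint about which optimisations touch these operations.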

I'll be happy to collate answers to send back to the committee; please provide as much feedback as you can.

Thanks in advance,
Hal

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory