[Openmp-commits] [openmp] r302929 - Clang-format and whitespace cleanup of source code

Jonathan Peyton via Openmp-commits openmp-commits at lists.llvm.org
Fri May 12 11:01:35 PDT 2017


Modified: openmp/trunk/runtime/src/kmp_atomic.cpp
URL: http://llvm.org/viewvc/llvm-project/openmp/trunk/runtime/src/kmp_atomic.cpp?rev=302929&r1=302928&r2=302929&view=diff
==============================================================================
--- openmp/trunk/runtime/src/kmp_atomic.cpp (original)
+++ openmp/trunk/runtime/src/kmp_atomic.cpp Fri May 12 13:01:32 2017
@@ -14,17 +14,19 @@
 
 
 #include "kmp_atomic.h"
-#include "kmp.h"                  // TRUE, asm routines prototypes
+#include "kmp.h" // TRUE, asm routines prototypes
 
 typedef unsigned char uchar;
 typedef unsigned short ushort;
 
 /*!
 @defgroup ATOMIC_OPS Atomic Operations
-These functions are used for implementing the many different varieties of atomic operations.
+These functions are used for implementing the many different varieties of atomic
+operations.
 
-The compiler is at liberty to inline atomic operations that are naturally supported
-by the target architecture. For instance on IA-32 architecture an atomic like this can be inlined
+The compiler is at liberty to inline atomic operations that are naturally
+supported by the target architecture. For instance on IA-32 architecture an
+atomic like this can be inlined
 @code
 static int s = 0;
 #pragma omp atomic
@@ -32,11 +34,12 @@ static int s = 0;
 @endcode
 using the single instruction: `lock; incl s`
 
-However the runtime does provide entrypoints for these operations to support compilers that choose
-not to inline them. (For instance, `__kmpc_atomic_fixed4_add` could be used to perform the
-increment above.)
+However the runtime does provide entrypoints for these operations to support
+compilers that choose not to inline them. (For instance,
+`__kmpc_atomic_fixed4_add` could be used to perform the increment above.)
 
-The names of the functions are encoded by using the data type name and the operation name, as in these tables.
+The names of the functions are encoded by using the data type name and the
+operation name, as in these tables.
 
 Data Type  | Data type encoding
 -----------|---------------
@@ -75,14 +78,17 @@ minimum | min
 .neqv.  | neqv
 
 <br>
-For non-commutative operations, `_rev` can also be added for the reversed operation.
-For the functions that capture the result, the suffix `_cpt` is added.
+For non-commutative operations, `_rev` can also be added for the reversed
+operation. For the functions that capture the result, the suffix `_cpt` is
+added.
 
 Update Functions
 ================
-The general form of an atomic function that just performs an update (without a `capture`)
+The general form of an atomic function that just performs an update (without a
+`capture`)
 @code
-void __kmpc_atomic_<datatype>_<operation>( ident_t *id_ref, int gtid, TYPE * lhs, TYPE rhs );
+void __kmpc_atomic_<datatype>_<operation>( ident_t *id_ref, int gtid, TYPE *
+lhs, TYPE rhs );
 @endcode
 @param ident_t  a pointer to source location
 @param gtid  the global thread id
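A quick illustration of the update form, not part of this patch: assuming the declarations from kmp_atomic.h are visible and that the compiler supplies a source-location descriptor `loc` and the caller's global thread id `gtid` (both placeholders here), the `s++` example above could be lowered to the `fixed4` (4-byte signed integer) add entrypoint instead of an inlined `lock; incl`:

@code
static int s = 0;
// loc and gtid are assumed to be provided by the compiler/runtime.
__kmpc_atomic_fixed4_add(&loc, gtid, &s, 1); // atomically performs s += 1
@endcode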
@@ -91,32 +97,36 @@ void __kmpc_atomic_<datatype>_<operation
 
 `capture` functions
 ===================
-The capture functions perform an atomic update and return a result, which is either the value
-before the capture, or that after. They take an additional argument to determine which result is returned.
+The capture functions perform an atomic update and return a result, which is
+either the value before the capture, or that after. They take an additional
+argument to determine which result is returned.
 Their general form is therefore
 @code
-TYPE __kmpc_atomic_<datatype>_<operation>_cpt( ident_t *id_ref, int gtid, TYPE * lhs, TYPE rhs, int flag );
+TYPE __kmpc_atomic_<datatype>_<operation>_cpt( ident_t *id_ref, int gtid, TYPE *
+lhs, TYPE rhs, int flag );
 @endcode
 @param ident_t  a pointer to source location
 @param gtid  the global thread id
 @param lhs   a pointer to the left operand
 @param rhs   the right operand
-@param flag  one if the result is to be captured *after* the operation, zero if captured *before*.
+@param flag  one if the result is to be captured *after* the operation, zero if
+captured *before*.
 
-The one set of exceptions to this is the `complex<float>` type where the value is not returned,
-rather an extra argument pointer is passed.
+The one set of exceptions to this is the `complex<float>` type where the value
+is not returned, rather an extra argument pointer is passed.
 
 They look like
 @code
-void __kmpc_atomic_cmplx4_<op>_cpt(  ident_t *id_ref, int gtid, kmp_cmplx32 * lhs, kmp_cmplx32 rhs, kmp_cmplx32 * out, int flag );
+void __kmpc_atomic_cmplx4_<op>_cpt(  ident_t *id_ref, int gtid, kmp_cmplx32 *
+lhs, kmp_cmplx32 rhs, kmp_cmplx32 * out, int flag );
 @endcode
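A hedged sketch of the `flag` semantics (not part of this patch; the name `__kmpc_atomic_fixed4_add_cpt` is derived from the naming rules above, and `loc`/`gtid` are the same placeholders as before):

@code
kmp_int32 v = 10;
// flag == 0: capture *before* the update; returns 10, v becomes 15
kmp_int32 before = __kmpc_atomic_fixed4_add_cpt(&loc, gtid, &v, 5, 0);
// flag == 1: capture *after* the update; returns 20, v becomes 20
kmp_int32 after  = __kmpc_atomic_fixed4_add_cpt(&loc, gtid, &v, 5, 1);
@endcode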
 
 Read and Write Operations
 =========================
-The OpenMP<sup>*</sup> standard now supports atomic operations that simply ensure that the
-value is read or written atomically, with no modification
-performed. In many cases on IA-32 architecture these operations can be inlined since
-the architecture guarantees that no tearing occurs on aligned objects
+The OpenMP<sup>*</sup> standard now supports atomic operations that simply
+ensure that the value is read or written atomically, with no modification
+performed. In many cases on IA-32 architecture these operations can be inlined
+since the architecture guarantees that no tearing occurs on aligned objects
 accessed with a single memory operation of up to 64 bits in size.
 
 The general form of the read operations is
@@ -126,7 +136,8 @@ TYPE __kmpc_atomic_<type>_rd ( ident_t *
 
 For the write operations the form is
 @code
-void __kmpc_atomic_<type>_wr ( ident_t *id_ref, int gtid, TYPE * lhs, TYPE rhs );
+void __kmpc_atomic_<type>_wr ( ident_t *id_ref, int gtid, TYPE * lhs, TYPE rhs
+);
 @endcode
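For illustration only (a sketch assuming the `fixed4` instantiations of these general forms, and assuming the read form mirrors the write form but takes only the source address):

@code
kmp_int32 shared = 0;
__kmpc_atomic_fixed4_wr(&loc, gtid, &shared, 42);                  // atomic write
kmp_int32 snapshot = __kmpc_atomic_fixed4_rd(&loc, gtid, &shared); // atomic read
@endcode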
 
 Full list of functions
@@ -135,7 +146,8 @@ This leads to the generation of 376 atom
 
 Functions for integers
 ---------------------
-There are versions here for integers of size 1,2,4 and 8 bytes both signed and unsigned (where that matters).
+There are versions here for integers of size 1,2,4 and 8 bytes both signed and
+unsigned (where that matters).
 @code
     __kmpc_atomic_fixed1_add
     __kmpc_atomic_fixed1_add_cpt
@@ -377,8 +389,8 @@ There are versions here for integers of
 
 Functions for floating point
 ----------------------------
-There are versions here for floating point numbers of size 4, 8, 10 and 16 bytes.
-(Ten byte floats are used by X87, but are now rare).
+There are versions here for floating point numbers of size 4, 8, 10 and 16
+bytes. (Ten byte floats are used by X87, but are now rare).
 @code
     __kmpc_atomic_float4_add
     __kmpc_atomic_float4_add_cpt
@@ -472,9 +484,10 @@ There are versions here for floating poi
 
 Functions for Complex types
 ---------------------------
-Functions for complex types whose component floating point variables are of size 4,8,10 or 16 bytes.
-The names here are based on the size of the component float, *not* the size of the complex type. So
-`__kmpc_atomc_cmplx8_add` is an operation on a `complex<double>` or `complex(kind=8)`, *not* `complex<float>`.
+Functions for complex types whose component floating point variables are of size
+4,8,10 or 16 bytes. The names here are based on the size of the component float,
+*not* the size of the complex type. So `__kmpc_atomic_cmplx8_add` is an operation
+on a `complex<double>` or `complex(kind=8)`, *not* `complex<float>`.
 
 @code
     __kmpc_atomic_cmplx4_add
@@ -553,104 +566,155 @@ The names here are based on the size of
  */
 
 #ifndef KMP_GOMP_COMPAT
-int __kmp_atomic_mode = 1;      // Intel perf
+int __kmp_atomic_mode = 1; // Intel perf
 #else
-int __kmp_atomic_mode = 2;      // GOMP compatibility
+int __kmp_atomic_mode = 2; // GOMP compatibility
 #endif /* KMP_GOMP_COMPAT */
 
 KMP_ALIGN(128)
 
-kmp_atomic_lock_t __kmp_atomic_lock;     /* Control access to all user coded atomics in Gnu compat mode   */
-kmp_atomic_lock_t __kmp_atomic_lock_1i;  /* Control access to all user coded atomics for 1-byte fixed data types */
-kmp_atomic_lock_t __kmp_atomic_lock_2i;  /* Control access to all user coded atomics for 2-byte fixed data types */
-kmp_atomic_lock_t __kmp_atomic_lock_4i;  /* Control access to all user coded atomics for 4-byte fixed data types */
-kmp_atomic_lock_t __kmp_atomic_lock_4r;  /* Control access to all user coded atomics for kmp_real32 data type    */
-kmp_atomic_lock_t __kmp_atomic_lock_8i;  /* Control access to all user coded atomics for 8-byte fixed data types */
-kmp_atomic_lock_t __kmp_atomic_lock_8r;  /* Control access to all user coded atomics for kmp_real64 data type    */
-kmp_atomic_lock_t __kmp_atomic_lock_8c;  /* Control access to all user coded atomics for complex byte data type  */
-kmp_atomic_lock_t __kmp_atomic_lock_10r; /* Control access to all user coded atomics for long double data type   */
-kmp_atomic_lock_t __kmp_atomic_lock_16r; /* Control access to all user coded atomics for _Quad data type         */
-kmp_atomic_lock_t __kmp_atomic_lock_16c; /* Control access to all user coded atomics for double complex data type*/
-kmp_atomic_lock_t __kmp_atomic_lock_20c; /* Control access to all user coded atomics for long double complex type*/
-kmp_atomic_lock_t __kmp_atomic_lock_32c; /* Control access to all user coded atomics for _Quad complex data type */
-
-
-/*
-  2007-03-02:
-  Without "volatile" specifier in OP_CMPXCHG and MIN_MAX_CMPXCHG we have a
-  bug on *_32 and *_32e. This is just a temporary workaround for the problem.
-  It seems the right solution is writing OP_CMPXCHG and MIN_MAX_CMPXCHG
-  routines in assembler language.
-*/
+// Control access to all user coded atomics in Gnu compat mode
+kmp_atomic_lock_t __kmp_atomic_lock;
+// Control access to all user coded atomics for 1-byte fixed data types
+kmp_atomic_lock_t __kmp_atomic_lock_1i;
+// Control access to all user coded atomics for 2-byte fixed data types
+kmp_atomic_lock_t __kmp_atomic_lock_2i;
+// Control access to all user coded atomics for 4-byte fixed data types
+kmp_atomic_lock_t __kmp_atomic_lock_4i;
+// Control access to all user coded atomics for kmp_real32 data type
+kmp_atomic_lock_t __kmp_atomic_lock_4r;
+// Control access to all user coded atomics for 8-byte fixed data types
+kmp_atomic_lock_t __kmp_atomic_lock_8i;
+// Control access to all user coded atomics for kmp_real64 data type
+kmp_atomic_lock_t __kmp_atomic_lock_8r;
+// Control access to all user coded atomics for complex byte data type
+kmp_atomic_lock_t __kmp_atomic_lock_8c;
+// Control access to all user coded atomics for long double data type
+kmp_atomic_lock_t __kmp_atomic_lock_10r;
+// Control access to all user coded atomics for _Quad data type
+kmp_atomic_lock_t __kmp_atomic_lock_16r;
+// Control access to all user coded atomics for double complex data type
+kmp_atomic_lock_t __kmp_atomic_lock_16c;
+// Control access to all user coded atomics for long double complex type
+kmp_atomic_lock_t __kmp_atomic_lock_20c;
+// Control access to all user coded atomics for _Quad complex data type
+kmp_atomic_lock_t __kmp_atomic_lock_32c;
+
+/* 2007-03-02:
+   Without "volatile" specifier in OP_CMPXCHG and MIN_MAX_CMPXCHG we have a bug
+   on *_32 and *_32e. This is just a temporary workaround for the problem. It
+   seems the right solution is writing OP_CMPXCHG and MIN_MAX_CMPXCHG routines
+   in assembler language. */
 #define KMP_ATOMIC_VOLATILE volatile
 
-#if ( KMP_ARCH_X86 ) && KMP_HAVE_QUAD
+#if (KMP_ARCH_X86) && KMP_HAVE_QUAD
 
-    static inline void operator +=( Quad_a4_t & lhs, Quad_a4_t & rhs ) { lhs.q += rhs.q; };
-    static inline void operator -=( Quad_a4_t & lhs, Quad_a4_t & rhs ) { lhs.q -= rhs.q; };
-    static inline void operator *=( Quad_a4_t & lhs, Quad_a4_t & rhs ) { lhs.q *= rhs.q; };
-    static inline void operator /=( Quad_a4_t & lhs, Quad_a4_t & rhs ) { lhs.q /= rhs.q; };
-    static inline bool operator < ( Quad_a4_t & lhs, Quad_a4_t & rhs ) { return lhs.q < rhs.q; }
-    static inline bool operator > ( Quad_a4_t & lhs, Quad_a4_t & rhs ) { return lhs.q > rhs.q; }
-
-    static inline void operator +=( Quad_a16_t & lhs, Quad_a16_t & rhs ) { lhs.q += rhs.q; };
-    static inline void operator -=( Quad_a16_t & lhs, Quad_a16_t & rhs ) { lhs.q -= rhs.q; };
-    static inline void operator *=( Quad_a16_t & lhs, Quad_a16_t & rhs ) { lhs.q *= rhs.q; };
-    static inline void operator /=( Quad_a16_t & lhs, Quad_a16_t & rhs ) { lhs.q /= rhs.q; };
-    static inline bool operator < ( Quad_a16_t & lhs, Quad_a16_t & rhs ) { return lhs.q < rhs.q; }
-    static inline bool operator > ( Quad_a16_t & lhs, Quad_a16_t & rhs ) { return lhs.q > rhs.q; }
-
-    static inline void operator +=( kmp_cmplx128_a4_t & lhs, kmp_cmplx128_a4_t & rhs ) { lhs.q += rhs.q; };
-    static inline void operator -=( kmp_cmplx128_a4_t & lhs, kmp_cmplx128_a4_t & rhs ) { lhs.q -= rhs.q; };
-    static inline void operator *=( kmp_cmplx128_a4_t & lhs, kmp_cmplx128_a4_t & rhs ) { lhs.q *= rhs.q; };
-    static inline void operator /=( kmp_cmplx128_a4_t & lhs, kmp_cmplx128_a4_t & rhs ) { lhs.q /= rhs.q; };
-
-    static inline void operator +=( kmp_cmplx128_a16_t & lhs, kmp_cmplx128_a16_t & rhs ) { lhs.q += rhs.q; };
-    static inline void operator -=( kmp_cmplx128_a16_t & lhs, kmp_cmplx128_a16_t & rhs ) { lhs.q -= rhs.q; };
-    static inline void operator *=( kmp_cmplx128_a16_t & lhs, kmp_cmplx128_a16_t & rhs ) { lhs.q *= rhs.q; };
-    static inline void operator /=( kmp_cmplx128_a16_t & lhs, kmp_cmplx128_a16_t & rhs ) { lhs.q /= rhs.q; };
+static inline void operator+=(Quad_a4_t &lhs, Quad_a4_t &rhs) {
+  lhs.q += rhs.q;
+};
+static inline void operator-=(Quad_a4_t &lhs, Quad_a4_t &rhs) {
+  lhs.q -= rhs.q;
+};
+static inline void operator*=(Quad_a4_t &lhs, Quad_a4_t &rhs) {
+  lhs.q *= rhs.q;
+};
+static inline void operator/=(Quad_a4_t &lhs, Quad_a4_t &rhs) {
+  lhs.q /= rhs.q;
+};
+static inline bool operator<(Quad_a4_t &lhs, Quad_a4_t &rhs) {
+  return lhs.q < rhs.q;
+}
+static inline bool operator>(Quad_a4_t &lhs, Quad_a4_t &rhs) {
+  return lhs.q > rhs.q;
+}
+
+static inline void operator+=(Quad_a16_t &lhs, Quad_a16_t &rhs) {
+  lhs.q += rhs.q;
+};
+static inline void operator-=(Quad_a16_t &lhs, Quad_a16_t &rhs) {
+  lhs.q -= rhs.q;
+};
+static inline void operator*=(Quad_a16_t &lhs, Quad_a16_t &rhs) {
+  lhs.q *= rhs.q;
+};
+static inline void operator/=(Quad_a16_t &lhs, Quad_a16_t &rhs) {
+  lhs.q /= rhs.q;
+};
+static inline bool operator<(Quad_a16_t &lhs, Quad_a16_t &rhs) {
+  return lhs.q < rhs.q;
+}
+static inline bool operator>(Quad_a16_t &lhs, Quad_a16_t &rhs) {
+  return lhs.q > rhs.q;
+}
+
+static inline void operator+=(kmp_cmplx128_a4_t &lhs, kmp_cmplx128_a4_t &rhs) {
+  lhs.q += rhs.q;
+};
+static inline void operator-=(kmp_cmplx128_a4_t &lhs, kmp_cmplx128_a4_t &rhs) {
+  lhs.q -= rhs.q;
+};
+static inline void operator*=(kmp_cmplx128_a4_t &lhs, kmp_cmplx128_a4_t &rhs) {
+  lhs.q *= rhs.q;
+};
+static inline void operator/=(kmp_cmplx128_a4_t &lhs, kmp_cmplx128_a4_t &rhs) {
+  lhs.q /= rhs.q;
+};
+
+static inline void operator+=(kmp_cmplx128_a16_t &lhs,
+                              kmp_cmplx128_a16_t &rhs) {
+  lhs.q += rhs.q;
+};
+static inline void operator-=(kmp_cmplx128_a16_t &lhs,
+                              kmp_cmplx128_a16_t &rhs) {
+  lhs.q -= rhs.q;
+};
+static inline void operator*=(kmp_cmplx128_a16_t &lhs,
+                              kmp_cmplx128_a16_t &rhs) {
+  lhs.q *= rhs.q;
+};
+static inline void operator/=(kmp_cmplx128_a16_t &lhs,
+                              kmp_cmplx128_a16_t &rhs) {
+  lhs.q /= rhs.q;
+};
 
 #endif
 
-/* ------------------------------------------------------------------------ */
-/* ATOMIC implementation routines                                           */
-/* one routine for each operation and operand type                          */
-/* ------------------------------------------------------------------------ */
-
+// ATOMIC implementation routines -----------------------------------------
+// One routine for each operation and operand type.
 // All routine declarations look like
 // void __kmpc_atomic_RTYPE_OP( ident_t*, int, TYPE *lhs, TYPE rhs );
-// ------------------------------------------------------------------------
 
-#define KMP_CHECK_GTID                                                    \
-    if ( gtid == KMP_GTID_UNKNOWN ) {                                     \
-        gtid = __kmp_entry_gtid();                                        \
-    } // check and get gtid when needed
+#define KMP_CHECK_GTID                                                         \
+  if (gtid == KMP_GTID_UNKNOWN) {                                              \
+    gtid = __kmp_entry_gtid();                                                 \
+  } // check and get gtid when needed
 
 // Beginning of a definition (provides name, parameters, debug trace)
-//     TYPE_ID - operands type and size (fixed*, fixed*u for signed, unsigned fixed)
+//     TYPE_ID - operands type and size (fixed*, fixed*u for signed, unsigned
+//     fixed)
 //     OP_ID   - operation identifier (add, sub, mul, ...)
 //     TYPE    - operands' type
-#define ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE, RET_TYPE) \
-RET_TYPE __kmpc_atomic_##TYPE_ID##_##OP_ID( ident_t *id_ref, int gtid, TYPE * lhs, TYPE rhs ) \
-{                                                                                         \
-    KMP_DEBUG_ASSERT( __kmp_init_serial );                                                \
-    KA_TRACE(100,("__kmpc_atomic_" #TYPE_ID "_" #OP_ID ": T#%d\n", gtid ));
+#define ATOMIC_BEGIN(TYPE_ID, OP_ID, TYPE, RET_TYPE)                           \
+  RET_TYPE __kmpc_atomic_##TYPE_ID##_##OP_ID(ident_t *id_ref, int gtid,        \
+                                             TYPE *lhs, TYPE rhs) {            \
+    KMP_DEBUG_ASSERT(__kmp_init_serial);                                       \
+    KA_TRACE(100, ("__kmpc_atomic_" #TYPE_ID "_" #OP_ID ": T#%d\n", gtid));
 
 // ------------------------------------------------------------------------
 // Lock variables used for critical sections for various size operands
-#define ATOMIC_LOCK0   __kmp_atomic_lock       // all types, for Gnu compat
-#define ATOMIC_LOCK1i  __kmp_atomic_lock_1i    // char
-#define ATOMIC_LOCK2i  __kmp_atomic_lock_2i    // short
-#define ATOMIC_LOCK4i  __kmp_atomic_lock_4i    // long int
-#define ATOMIC_LOCK4r  __kmp_atomic_lock_4r    // float
-#define ATOMIC_LOCK8i  __kmp_atomic_lock_8i    // long long int
-#define ATOMIC_LOCK8r  __kmp_atomic_lock_8r    // double
-#define ATOMIC_LOCK8c  __kmp_atomic_lock_8c    // float complex
-#define ATOMIC_LOCK10r __kmp_atomic_lock_10r   // long double
-#define ATOMIC_LOCK16r __kmp_atomic_lock_16r   // _Quad
-#define ATOMIC_LOCK16c __kmp_atomic_lock_16c   // double complex
-#define ATOMIC_LOCK20c __kmp_atomic_lock_20c   // long double complex
-#define ATOMIC_LOCK32c __kmp_atomic_lock_32c   // _Quad complex
+#define ATOMIC_LOCK0 __kmp_atomic_lock // all types, for Gnu compat
+#define ATOMIC_LOCK1i __kmp_atomic_lock_1i // char
+#define ATOMIC_LOCK2i __kmp_atomic_lock_2i // short
+#define ATOMIC_LOCK4i __kmp_atomic_lock_4i // long int
+#define ATOMIC_LOCK4r __kmp_atomic_lock_4r // float
+#define ATOMIC_LOCK8i __kmp_atomic_lock_8i // long long int
+#define ATOMIC_LOCK8r __kmp_atomic_lock_8r // double
+#define ATOMIC_LOCK8c __kmp_atomic_lock_8c // float complex
+#define ATOMIC_LOCK10r __kmp_atomic_lock_10r // long double
+#define ATOMIC_LOCK16r __kmp_atomic_lock_16r // _Quad
+#define ATOMIC_LOCK16c __kmp_atomic_lock_16c // double complex
+#define ATOMIC_LOCK20c __kmp_atomic_lock_20c // long double complex
+#define ATOMIC_LOCK32c __kmp_atomic_lock_32c // _Quad complex
 
 // ------------------------------------------------------------------------
 // Operation on *lhs, rhs bound by critical section
@@ -658,12 +722,12 @@ RET_TYPE __kmpc_atomic_##TYPE_ID##_##OP_
 //     LCK_ID - lock identifier
 // Note: don't check gtid as it should always be valid
 // 1, 2-byte - expect valid parameter, other - check before this macro
-#define OP_CRITICAL(OP,LCK_ID) \
-    __kmp_acquire_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );                    \
-                                                                          \
-    (*lhs) OP (rhs);                                                      \
-                                                                          \
-    __kmp_release_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );
+#define OP_CRITICAL(OP, LCK_ID)                                                \
+  __kmp_acquire_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);                       \
+                                                                               \
+  (*lhs) OP(rhs);                                                              \
+                                                                               \
+  __kmp_release_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);
 
 // ------------------------------------------------------------------------
 // For GNU compatibility, we may need to use a critical section,
@@ -686,23 +750,22 @@ RET_TYPE __kmpc_atomic_##TYPE_ID##_##OP_
 // If FLAG is 0, then we are relying on dead code elimination by the build
 // compiler to get rid of the useless block of code, and save a needless
 // branch at runtime.
-//
 
 #ifdef KMP_GOMP_COMPAT
-# define OP_GOMP_CRITICAL(OP,FLAG)                                        \
-    if ( (FLAG) && (__kmp_atomic_mode == 2) ) {                           \
-        KMP_CHECK_GTID;                                                   \
-        OP_CRITICAL( OP, 0 );                                             \
-        return;                                                           \
-    }
-# else
-# define OP_GOMP_CRITICAL(OP,FLAG)
+#define OP_GOMP_CRITICAL(OP, FLAG)                                             \
+  if ((FLAG) && (__kmp_atomic_mode == 2)) {                                    \
+    KMP_CHECK_GTID;                                                            \
+    OP_CRITICAL(OP, 0);                                                        \
+    return;                                                                    \
+  }
+#else
+#define OP_GOMP_CRITICAL(OP, FLAG)
 #endif /* KMP_GOMP_COMPAT */
 
 #if KMP_MIC
-# define KMP_DO_PAUSE _mm_delay_32( 1 )
+#define KMP_DO_PAUSE _mm_delay_32(1)
 #else
-# define KMP_DO_PAUSE KMP_CPU_PAUSE()
+#define KMP_DO_PAUSE KMP_CPU_PAUSE()
 #endif /* KMP_MIC */
 
 // ------------------------------------------------------------------------
@@ -710,51 +773,48 @@ RET_TYPE __kmpc_atomic_##TYPE_ID##_##OP_
 //     TYPE    - operands' type
 //     BITS    - size in bits, used to distinguish low level calls
 //     OP      - operator
-#define OP_CMPXCHG(TYPE,BITS,OP)                                          \
-    {                                                                     \
-        TYPE old_value, new_value;                                        \
-        old_value = *(TYPE volatile *)lhs;                                \
-        new_value = old_value OP rhs;                                     \
-        while ( ! KMP_COMPARE_AND_STORE_ACQ##BITS( (kmp_int##BITS *) lhs, \
-                      *VOLATILE_CAST(kmp_int##BITS *) &old_value,         \
-                      *VOLATILE_CAST(kmp_int##BITS *) &new_value ) )      \
-        {                                                                 \
-                KMP_DO_PAUSE;                                             \
-                                                                          \
-            old_value = *(TYPE volatile *)lhs;                            \
-            new_value = old_value OP rhs;                                 \
-        }                                                                 \
-    }
+#define OP_CMPXCHG(TYPE, BITS, OP)                                             \
+  {                                                                            \
+    TYPE old_value, new_value;                                                 \
+    old_value = *(TYPE volatile *)lhs;                                         \
+    new_value = old_value OP rhs;                                              \
+    while (!KMP_COMPARE_AND_STORE_ACQ##BITS(                                   \
+        (kmp_int##BITS *)lhs, *VOLATILE_CAST(kmp_int##BITS *) & old_value,     \
+        *VOLATILE_CAST(kmp_int##BITS *) & new_value)) {                        \
+      KMP_DO_PAUSE;                                                            \
+                                                                               \
+      old_value = *(TYPE volatile *)lhs;                                       \
+      new_value = old_value OP rhs;                                            \
+    }                                                                          \
+  }
 
 #if USE_CMPXCHG_FIX
 // 2007-06-25:
-// workaround for C78287 (complex(kind=4) data type)
-// lin_32, lin_32e, win_32 and win_32e are affected (I verified the asm)
-// Compiler ignores the volatile qualifier of the temp_val in the OP_CMPXCHG macro.
-// This is a problem of the compiler.
-// Related tracker is C76005, targeted to 11.0.
-// I verified the asm of the workaround.
-#define OP_CMPXCHG_WORKAROUND(TYPE,BITS,OP)                               \
-    {                                                                     \
-	struct _sss {                                                     \
-	    TYPE            cmp;                                          \
-	    kmp_int##BITS   *vvv;                                         \
-	};                                                                \
-        struct _sss old_value, new_value;                                 \
-        old_value.vvv = ( kmp_int##BITS * )&old_value.cmp;                \
-        new_value.vvv = ( kmp_int##BITS * )&new_value.cmp;                \
-        *old_value.vvv = * ( volatile kmp_int##BITS * ) lhs;              \
-        new_value.cmp = old_value.cmp OP rhs;                             \
-        while ( ! KMP_COMPARE_AND_STORE_ACQ##BITS( (kmp_int##BITS *) lhs, \
-                      *VOLATILE_CAST(kmp_int##BITS *) old_value.vvv,      \
-                      *VOLATILE_CAST(kmp_int##BITS *) new_value.vvv ) )   \
-        {                                                                 \
-            KMP_DO_PAUSE;                                                 \
-                                                                          \
-	    *old_value.vvv = * ( volatile kmp_int##BITS * ) lhs;          \
-	    new_value.cmp = old_value.cmp OP rhs;                         \
-        }                                                                 \
-    }
+// workaround for C78287 (complex(kind=4) data type). lin_32, lin_32e, win_32
+// and win_32e are affected (I verified the asm). Compiler ignores the volatile
+// qualifier of the temp_val in the OP_CMPXCHG macro. This is a problem of the
+// compiler. Related tracker is C76005, targeted to 11.0. I verified the asm of
+// the workaround.
+#define OP_CMPXCHG_WORKAROUND(TYPE, BITS, OP)                                  \
+  {                                                                            \
+    struct _sss {                                                              \
+      TYPE cmp;                                                                \
+      kmp_int##BITS *vvv;                                                      \
+    };                                                                         \
+    struct _sss old_value, new_value;                                          \
+    old_value.vvv = (kmp_int##BITS *)&old_value.cmp;                           \
+    new_value.vvv = (kmp_int##BITS *)&new_value.cmp;                           \
+    *old_value.vvv = *(volatile kmp_int##BITS *)lhs;                           \
+    new_value.cmp = old_value.cmp OP rhs;                                      \
+    while (!KMP_COMPARE_AND_STORE_ACQ##BITS(                                   \
+        (kmp_int##BITS *)lhs, *VOLATILE_CAST(kmp_int##BITS *) old_value.vvv,   \
+        *VOLATILE_CAST(kmp_int##BITS *) new_value.vvv)) {                      \
+      KMP_DO_PAUSE;                                                            \
+                                                                               \
+      *old_value.vvv = *(volatile kmp_int##BITS *)lhs;                         \
+      new_value.cmp = old_value.cmp OP rhs;                                    \
+    }                                                                          \
+  }
 // end of the first part of the workaround for C78287
 #endif // USE_CMPXCHG_FIX
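Instantiated as, for example, `OP_CMPXCHG(kmp_real32, 32, +)`, the compare-and-swap pattern above boils down to a retry loop of this shape (a sketch of the macro expansion, not new code in the patch):

@code
{
  kmp_real32 old_value, new_value;
  old_value = *(kmp_real32 volatile *)lhs;
  new_value = old_value + rhs;
  // Retry until the 32-bit compare-and-store succeeds; on failure another
  // thread changed *lhs, so reload and recompute the candidate value.
  while (!KMP_COMPARE_AND_STORE_ACQ32((kmp_int32 *)lhs,
                                      *VOLATILE_CAST(kmp_int32 *) & old_value,
                                      *VOLATILE_CAST(kmp_int32 *) & new_value)) {
    KMP_DO_PAUSE;
    old_value = *(kmp_real32 volatile *)lhs;
    new_value = old_value + rhs;
  }
}
@endcode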
 
@@ -762,84 +822,98 @@ RET_TYPE __kmpc_atomic_##TYPE_ID##_##OP_
 
 // ------------------------------------------------------------------------
 // X86 or X86_64: no alignment problems ====================================
-#define ATOMIC_FIXED_ADD(TYPE_ID,OP_ID,TYPE,BITS,OP,LCK_ID,MASK,GOMP_FLAG) \
-ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void)                                      \
-    OP_GOMP_CRITICAL(OP##=,GOMP_FLAG)                                      \
-    /* OP used as a sign for subtraction: (lhs-rhs) --> (lhs+-rhs) */      \
-    KMP_TEST_THEN_ADD##BITS( lhs, OP rhs );                                \
-}
-// -------------------------------------------------------------------------
-#define ATOMIC_CMPXCHG(TYPE_ID,OP_ID,TYPE,BITS,OP,LCK_ID,MASK,GOMP_FLAG)   \
-ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void)                                      \
-    OP_GOMP_CRITICAL(OP##=,GOMP_FLAG)                                      \
-    OP_CMPXCHG(TYPE,BITS,OP)                                               \
-}
+#define ATOMIC_FIXED_ADD(TYPE_ID, OP_ID, TYPE, BITS, OP, LCK_ID, MASK,         \
+                         GOMP_FLAG)                                            \
+  ATOMIC_BEGIN(TYPE_ID, OP_ID, TYPE, void)                                     \
+  OP_GOMP_CRITICAL(OP## =, GOMP_FLAG)                                          \
+  /* OP used as a sign for subtraction: (lhs-rhs) --> (lhs+-rhs) */            \
+  KMP_TEST_THEN_ADD##BITS(lhs, OP rhs);                                        \
+  }
+// -------------------------------------------------------------------------
+#define ATOMIC_CMPXCHG(TYPE_ID, OP_ID, TYPE, BITS, OP, LCK_ID, MASK,           \
+                       GOMP_FLAG)                                              \
+  ATOMIC_BEGIN(TYPE_ID, OP_ID, TYPE, void)                                     \
+  OP_GOMP_CRITICAL(OP## =, GOMP_FLAG)                                          \
+  OP_CMPXCHG(TYPE, BITS, OP)                                                   \
+  }
 #if USE_CMPXCHG_FIX
 // -------------------------------------------------------------------------
 // workaround for C78287 (complex(kind=4) data type)
-#define ATOMIC_CMPXCHG_WORKAROUND(TYPE_ID,OP_ID,TYPE,BITS,OP,LCK_ID,MASK,GOMP_FLAG)   \
-ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void)                                                 \
-    OP_GOMP_CRITICAL(OP##=,GOMP_FLAG)                                                 \
-    OP_CMPXCHG_WORKAROUND(TYPE,BITS,OP)                                               \
-}
+#define ATOMIC_CMPXCHG_WORKAROUND(TYPE_ID, OP_ID, TYPE, BITS, OP, LCK_ID,      \
+                                  MASK, GOMP_FLAG)                             \
+  ATOMIC_BEGIN(TYPE_ID, OP_ID, TYPE, void)                                     \
+  OP_GOMP_CRITICAL(OP## =, GOMP_FLAG)                                          \
+  OP_CMPXCHG_WORKAROUND(TYPE, BITS, OP)                                        \
+  }
 // end of the second part of the workaround for C78287
 #endif
 
 #else
 // -------------------------------------------------------------------------
 // Code for other architectures that don't handle unaligned accesses.
-#define ATOMIC_FIXED_ADD(TYPE_ID,OP_ID,TYPE,BITS,OP,LCK_ID,MASK,GOMP_FLAG) \
-ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void)                                      \
-    OP_GOMP_CRITICAL(OP##=,GOMP_FLAG)                                      \
-    if ( ! ( (kmp_uintptr_t) lhs & 0x##MASK) ) {                           \
-        /* OP used as a sign for subtraction: (lhs-rhs) --> (lhs+-rhs) */  \
-        KMP_TEST_THEN_ADD##BITS( lhs, OP rhs );                            \
-    } else {                                                               \
-        KMP_CHECK_GTID;                                                    \
-        OP_CRITICAL(OP##=,LCK_ID)  /* unaligned address - use critical */  \
-    }                                                                      \
-}
-// -------------------------------------------------------------------------
-#define ATOMIC_CMPXCHG(TYPE_ID,OP_ID,TYPE,BITS,OP,LCK_ID,MASK,GOMP_FLAG)   \
-ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void)                                      \
-    OP_GOMP_CRITICAL(OP##=,GOMP_FLAG)                                      \
-    if ( ! ( (kmp_uintptr_t) lhs & 0x##MASK) ) {                           \
-        OP_CMPXCHG(TYPE,BITS,OP)     /* aligned address */                 \
-    } else {                                                               \
-        KMP_CHECK_GTID;                                                    \
-        OP_CRITICAL(OP##=,LCK_ID)  /* unaligned address - use critical */  \
-    }                                                                      \
-}
+#define ATOMIC_FIXED_ADD(TYPE_ID, OP_ID, TYPE, BITS, OP, LCK_ID, MASK,         \
+                         GOMP_FLAG)                                            \
+  ATOMIC_BEGIN(TYPE_ID, OP_ID, TYPE, void)                                     \
+  OP_GOMP_CRITICAL(OP## =, GOMP_FLAG)                                          \
+  if (!((kmp_uintptr_t)lhs & 0x##MASK)) {                                      \
+    /* OP used as a sign for subtraction: (lhs-rhs) --> (lhs+-rhs) */          \
+    KMP_TEST_THEN_ADD##BITS(lhs, OP rhs);                                      \
+  } else {                                                                     \
+    KMP_CHECK_GTID;                                                            \
+    OP_CRITICAL(OP## =, LCK_ID) /* unaligned address - use critical */         \
+  }                                                                            \
+  }
+// -------------------------------------------------------------------------
+#define ATOMIC_CMPXCHG(TYPE_ID, OP_ID, TYPE, BITS, OP, LCK_ID, MASK,           \
+                       GOMP_FLAG)                                              \
+  ATOMIC_BEGIN(TYPE_ID, OP_ID, TYPE, void)                                     \
+  OP_GOMP_CRITICAL(OP## =, GOMP_FLAG)                                          \
+  if (!((kmp_uintptr_t)lhs & 0x##MASK)) {                                      \
+    OP_CMPXCHG(TYPE, BITS, OP) /* aligned address */                           \
+  } else {                                                                     \
+    KMP_CHECK_GTID;                                                            \
+    OP_CRITICAL(OP## =, LCK_ID) /* unaligned address - use critical */         \
+  }                                                                            \
+  }
 #if USE_CMPXCHG_FIX
 // -------------------------------------------------------------------------
 // workaround for C78287 (complex(kind=4) data type)
-#define ATOMIC_CMPXCHG_WORKAROUND(TYPE_ID,OP_ID,TYPE,BITS,OP,LCK_ID,MASK,GOMP_FLAG)   \
-ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void)                                                 \
-    OP_GOMP_CRITICAL(OP##=,GOMP_FLAG)                                                 \
-    if ( ! ( (kmp_uintptr_t) lhs & 0x##MASK) ) {                                      \
-        OP_CMPXCHG(TYPE,BITS,OP)             /* aligned address */                    \
-    } else {                                                                          \
-        KMP_CHECK_GTID;                                                               \
-        OP_CRITICAL(OP##=,LCK_ID)  /* unaligned address - use critical */             \
-    }                                                                                 \
-}
+#define ATOMIC_CMPXCHG_WORKAROUND(TYPE_ID, OP_ID, TYPE, BITS, OP, LCK_ID,      \
+                                  MASK, GOMP_FLAG)                             \
+  ATOMIC_BEGIN(TYPE_ID, OP_ID, TYPE, void)                                     \
+  OP_GOMP_CRITICAL(OP## =, GOMP_FLAG)                                          \
+  if (!((kmp_uintptr_t)lhs & 0x##MASK)) {                                      \
+    OP_CMPXCHG(TYPE, BITS, OP) /* aligned address */                           \
+  } else {                                                                     \
+    KMP_CHECK_GTID;                                                            \
+    OP_CRITICAL(OP## =, LCK_ID) /* unaligned address - use critical */         \
+  }                                                                            \
+  }
 // end of the second part of the workaround for C78287
 #endif // USE_CMPXCHG_FIX
 #endif /* KMP_ARCH_X86 || KMP_ARCH_X86_64 */
 
 // Routines for ATOMIC 4-byte operands addition and subtraction
-ATOMIC_FIXED_ADD( fixed4, add, kmp_int32,  32, +, 4i, 3, 0            )  // __kmpc_atomic_fixed4_add
-ATOMIC_FIXED_ADD( fixed4, sub, kmp_int32,  32, -, 4i, 3, 0            )  // __kmpc_atomic_fixed4_sub
-
-ATOMIC_CMPXCHG( float4,  add, kmp_real32, 32, +,  4r, 3, KMP_ARCH_X86 )  // __kmpc_atomic_float4_add
-ATOMIC_CMPXCHG( float4,  sub, kmp_real32, 32, -,  4r, 3, KMP_ARCH_X86 )  // __kmpc_atomic_float4_sub
+ATOMIC_FIXED_ADD(fixed4, add, kmp_int32, 32, +, 4i, 3,
+                 0) // __kmpc_atomic_fixed4_add
+ATOMIC_FIXED_ADD(fixed4, sub, kmp_int32, 32, -, 4i, 3,
+                 0) // __kmpc_atomic_fixed4_sub
+
+ATOMIC_CMPXCHG(float4, add, kmp_real32, 32, +, 4r, 3,
+               KMP_ARCH_X86) // __kmpc_atomic_float4_add
+ATOMIC_CMPXCHG(float4, sub, kmp_real32, 32, -, 4r, 3,
+               KMP_ARCH_X86) // __kmpc_atomic_float4_sub
 
 // Routines for ATOMIC 8-byte operands addition and subtraction
-ATOMIC_FIXED_ADD( fixed8, add, kmp_int64,  64, +, 8i, 7, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_add
-ATOMIC_FIXED_ADD( fixed8, sub, kmp_int64,  64, -, 8i, 7, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_sub
-
-ATOMIC_CMPXCHG( float8,  add, kmp_real64, 64, +,  8r, 7, KMP_ARCH_X86 )  // __kmpc_atomic_float8_add
-ATOMIC_CMPXCHG( float8,  sub, kmp_real64, 64, -,  8r, 7, KMP_ARCH_X86 )  // __kmpc_atomic_float8_sub
+ATOMIC_FIXED_ADD(fixed8, add, kmp_int64, 64, +, 8i, 7,
+                 KMP_ARCH_X86) // __kmpc_atomic_fixed8_add
+ATOMIC_FIXED_ADD(fixed8, sub, kmp_int64, 64, -, 8i, 7,
+                 KMP_ARCH_X86) // __kmpc_atomic_fixed8_sub
+
+ATOMIC_CMPXCHG(float8, add, kmp_real64, 64, +, 8r, 7,
+               KMP_ARCH_X86) // __kmpc_atomic_float8_add
+ATOMIC_CMPXCHG(float8, sub, kmp_real64, 64, -, 8r, 7,
+               KMP_ARCH_X86) // __kmpc_atomic_float8_sub
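Putting the pieces together, `ATOMIC_FIXED_ADD(fixed4, add, kmp_int32, 32, +, 4i, 3, 0)` on X86/X86_64 expands to approximately the routine below (a sketch; the GOMP branch contributes nothing because the GOMP_FLAG argument is 0):

@code
void __kmpc_atomic_fixed4_add(ident_t *id_ref, int gtid, kmp_int32 *lhs,
                              kmp_int32 rhs) {
  KMP_DEBUG_ASSERT(__kmp_init_serial);
  KA_TRACE(100, ("__kmpc_atomic_fixed4_add: T#%d\n", gtid));
  // OP_GOMP_CRITICAL(+=, 0) is compiled away here (flag is 0)
  /* OP used as a sign for subtraction: (lhs-rhs) --> (lhs+-rhs) */
  KMP_TEST_THEN_ADD32(lhs, +rhs);
}
@endcode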
 
 // ------------------------------------------------------------------------
 // Entries definition for integer operands
@@ -856,316 +930,420 @@ ATOMIC_CMPXCHG( float8,  sub, kmp_real64
 // Routines for ATOMIC integer operands, other operators
 // ------------------------------------------------------------------------
 //              TYPE_ID,OP_ID, TYPE,          OP, LCK_ID, GOMP_FLAG
-ATOMIC_CMPXCHG( fixed1,  add, kmp_int8,    8, +,  1i, 0, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_add
-ATOMIC_CMPXCHG( fixed1, andb, kmp_int8,    8, &,  1i, 0, 0            )  // __kmpc_atomic_fixed1_andb
-ATOMIC_CMPXCHG( fixed1,  div, kmp_int8,    8, /,  1i, 0, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_div
-ATOMIC_CMPXCHG( fixed1u, div, kmp_uint8,   8, /,  1i, 0, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1u_div
-ATOMIC_CMPXCHG( fixed1,  mul, kmp_int8,    8, *,  1i, 0, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_mul
-ATOMIC_CMPXCHG( fixed1,  orb, kmp_int8,    8, |,  1i, 0, 0            )  // __kmpc_atomic_fixed1_orb
-ATOMIC_CMPXCHG( fixed1,  shl, kmp_int8,    8, <<, 1i, 0, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_shl
-ATOMIC_CMPXCHG( fixed1,  shr, kmp_int8,    8, >>, 1i, 0, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_shr
-ATOMIC_CMPXCHG( fixed1u, shr, kmp_uint8,   8, >>, 1i, 0, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1u_shr
-ATOMIC_CMPXCHG( fixed1,  sub, kmp_int8,    8, -,  1i, 0, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_sub
-ATOMIC_CMPXCHG( fixed1,  xor, kmp_int8,    8, ^,  1i, 0, 0            )  // __kmpc_atomic_fixed1_xor
-ATOMIC_CMPXCHG( fixed2,  add, kmp_int16,  16, +,  2i, 1, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_add
-ATOMIC_CMPXCHG( fixed2, andb, kmp_int16,  16, &,  2i, 1, 0            )  // __kmpc_atomic_fixed2_andb
-ATOMIC_CMPXCHG( fixed2,  div, kmp_int16,  16, /,  2i, 1, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_div
-ATOMIC_CMPXCHG( fixed2u, div, kmp_uint16, 16, /,  2i, 1, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2u_div
-ATOMIC_CMPXCHG( fixed2,  mul, kmp_int16,  16, *,  2i, 1, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_mul
-ATOMIC_CMPXCHG( fixed2,  orb, kmp_int16,  16, |,  2i, 1, 0            )  // __kmpc_atomic_fixed2_orb
-ATOMIC_CMPXCHG( fixed2,  shl, kmp_int16,  16, <<, 2i, 1, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_shl
-ATOMIC_CMPXCHG( fixed2,  shr, kmp_int16,  16, >>, 2i, 1, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_shr
-ATOMIC_CMPXCHG( fixed2u, shr, kmp_uint16, 16, >>, 2i, 1, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2u_shr
-ATOMIC_CMPXCHG( fixed2,  sub, kmp_int16,  16, -,  2i, 1, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_sub
-ATOMIC_CMPXCHG( fixed2,  xor, kmp_int16,  16, ^,  2i, 1, 0            )  // __kmpc_atomic_fixed2_xor
-ATOMIC_CMPXCHG( fixed4, andb, kmp_int32,  32, &,  4i, 3, 0            )  // __kmpc_atomic_fixed4_andb
-ATOMIC_CMPXCHG( fixed4,  div, kmp_int32,  32, /,  4i, 3, KMP_ARCH_X86 )  // __kmpc_atomic_fixed4_div
-ATOMIC_CMPXCHG( fixed4u, div, kmp_uint32, 32, /,  4i, 3, KMP_ARCH_X86 )  // __kmpc_atomic_fixed4u_div
-ATOMIC_CMPXCHG( fixed4,  mul, kmp_int32,  32, *,  4i, 3, KMP_ARCH_X86 )  // __kmpc_atomic_fixed4_mul
-ATOMIC_CMPXCHG( fixed4,  orb, kmp_int32,  32, |,  4i, 3, 0            )  // __kmpc_atomic_fixed4_orb
-ATOMIC_CMPXCHG( fixed4,  shl, kmp_int32,  32, <<, 4i, 3, KMP_ARCH_X86 )  // __kmpc_atomic_fixed4_shl
-ATOMIC_CMPXCHG( fixed4,  shr, kmp_int32,  32, >>, 4i, 3, KMP_ARCH_X86 )  // __kmpc_atomic_fixed4_shr
-ATOMIC_CMPXCHG( fixed4u, shr, kmp_uint32, 32, >>, 4i, 3, KMP_ARCH_X86 )  // __kmpc_atomic_fixed4u_shr
-ATOMIC_CMPXCHG( fixed4,  xor, kmp_int32,  32, ^,  4i, 3, 0            )  // __kmpc_atomic_fixed4_xor
-ATOMIC_CMPXCHG( fixed8, andb, kmp_int64,  64, &,  8i, 7, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_andb
-ATOMIC_CMPXCHG( fixed8,  div, kmp_int64,  64, /,  8i, 7, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_div
-ATOMIC_CMPXCHG( fixed8u, div, kmp_uint64, 64, /,  8i, 7, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8u_div
-ATOMIC_CMPXCHG( fixed8,  mul, kmp_int64,  64, *,  8i, 7, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_mul
-ATOMIC_CMPXCHG( fixed8,  orb, kmp_int64,  64, |,  8i, 7, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_orb
-ATOMIC_CMPXCHG( fixed8,  shl, kmp_int64,  64, <<, 8i, 7, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_shl
-ATOMIC_CMPXCHG( fixed8,  shr, kmp_int64,  64, >>, 8i, 7, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_shr
-ATOMIC_CMPXCHG( fixed8u, shr, kmp_uint64, 64, >>, 8i, 7, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8u_shr
-ATOMIC_CMPXCHG( fixed8,  xor, kmp_int64,  64, ^,  8i, 7, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_xor
-ATOMIC_CMPXCHG( float4,  div, kmp_real32, 32, /,  4r, 3, KMP_ARCH_X86 )  // __kmpc_atomic_float4_div
-ATOMIC_CMPXCHG( float4,  mul, kmp_real32, 32, *,  4r, 3, KMP_ARCH_X86 )  // __kmpc_atomic_float4_mul
-ATOMIC_CMPXCHG( float8,  div, kmp_real64, 64, /,  8r, 7, KMP_ARCH_X86 )  // __kmpc_atomic_float8_div
-ATOMIC_CMPXCHG( float8,  mul, kmp_real64, 64, *,  8r, 7, KMP_ARCH_X86 )  // __kmpc_atomic_float8_mul
+ATOMIC_CMPXCHG(fixed1, add, kmp_int8, 8, +, 1i, 0,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed1_add
+ATOMIC_CMPXCHG(fixed1, andb, kmp_int8, 8, &, 1i, 0,
+               0) // __kmpc_atomic_fixed1_andb
+ATOMIC_CMPXCHG(fixed1, div, kmp_int8, 8, /, 1i, 0,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed1_div
+ATOMIC_CMPXCHG(fixed1u, div, kmp_uint8, 8, /, 1i, 0,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed1u_div
+ATOMIC_CMPXCHG(fixed1, mul, kmp_int8, 8, *, 1i, 0,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed1_mul
+ATOMIC_CMPXCHG(fixed1, orb, kmp_int8, 8, |, 1i, 0,
+               0) // __kmpc_atomic_fixed1_orb
+ATOMIC_CMPXCHG(fixed1, shl, kmp_int8, 8, <<, 1i, 0,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed1_shl
+ATOMIC_CMPXCHG(fixed1, shr, kmp_int8, 8, >>, 1i, 0,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed1_shr
+ATOMIC_CMPXCHG(fixed1u, shr, kmp_uint8, 8, >>, 1i, 0,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed1u_shr
+ATOMIC_CMPXCHG(fixed1, sub, kmp_int8, 8, -, 1i, 0,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed1_sub
+ATOMIC_CMPXCHG(fixed1, xor, kmp_int8, 8, ^, 1i, 0,
+               0) // __kmpc_atomic_fixed1_xor
+ATOMIC_CMPXCHG(fixed2, add, kmp_int16, 16, +, 2i, 1,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed2_add
+ATOMIC_CMPXCHG(fixed2, andb, kmp_int16, 16, &, 2i, 1,
+               0) // __kmpc_atomic_fixed2_andb
+ATOMIC_CMPXCHG(fixed2, div, kmp_int16, 16, /, 2i, 1,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed2_div
+ATOMIC_CMPXCHG(fixed2u, div, kmp_uint16, 16, /, 2i, 1,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed2u_div
+ATOMIC_CMPXCHG(fixed2, mul, kmp_int16, 16, *, 2i, 1,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed2_mul
+ATOMIC_CMPXCHG(fixed2, orb, kmp_int16, 16, |, 2i, 1,
+               0) // __kmpc_atomic_fixed2_orb
+ATOMIC_CMPXCHG(fixed2, shl, kmp_int16, 16, <<, 2i, 1,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed2_shl
+ATOMIC_CMPXCHG(fixed2, shr, kmp_int16, 16, >>, 2i, 1,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed2_shr
+ATOMIC_CMPXCHG(fixed2u, shr, kmp_uint16, 16, >>, 2i, 1,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed2u_shr
+ATOMIC_CMPXCHG(fixed2, sub, kmp_int16, 16, -, 2i, 1,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed2_sub
+ATOMIC_CMPXCHG(fixed2, xor, kmp_int16, 16, ^, 2i, 1,
+               0) // __kmpc_atomic_fixed2_xor
+ATOMIC_CMPXCHG(fixed4, andb, kmp_int32, 32, &, 4i, 3,
+               0) // __kmpc_atomic_fixed4_andb
+ATOMIC_CMPXCHG(fixed4, div, kmp_int32, 32, /, 4i, 3,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed4_div
+ATOMIC_CMPXCHG(fixed4u, div, kmp_uint32, 32, /, 4i, 3,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed4u_div
+ATOMIC_CMPXCHG(fixed4, mul, kmp_int32, 32, *, 4i, 3,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed4_mul
+ATOMIC_CMPXCHG(fixed4, orb, kmp_int32, 32, |, 4i, 3,
+               0) // __kmpc_atomic_fixed4_orb
+ATOMIC_CMPXCHG(fixed4, shl, kmp_int32, 32, <<, 4i, 3,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed4_shl
+ATOMIC_CMPXCHG(fixed4, shr, kmp_int32, 32, >>, 4i, 3,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed4_shr
+ATOMIC_CMPXCHG(fixed4u, shr, kmp_uint32, 32, >>, 4i, 3,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed4u_shr
+ATOMIC_CMPXCHG(fixed4, xor, kmp_int32, 32, ^, 4i, 3,
+               0) // __kmpc_atomic_fixed4_xor
+ATOMIC_CMPXCHG(fixed8, andb, kmp_int64, 64, &, 8i, 7,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed8_andb
+ATOMIC_CMPXCHG(fixed8, div, kmp_int64, 64, /, 8i, 7,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed8_div
+ATOMIC_CMPXCHG(fixed8u, div, kmp_uint64, 64, /, 8i, 7,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed8u_div
+ATOMIC_CMPXCHG(fixed8, mul, kmp_int64, 64, *, 8i, 7,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed8_mul
+ATOMIC_CMPXCHG(fixed8, orb, kmp_int64, 64, |, 8i, 7,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed8_orb
+ATOMIC_CMPXCHG(fixed8, shl, kmp_int64, 64, <<, 8i, 7,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed8_shl
+ATOMIC_CMPXCHG(fixed8, shr, kmp_int64, 64, >>, 8i, 7,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed8_shr
+ATOMIC_CMPXCHG(fixed8u, shr, kmp_uint64, 64, >>, 8i, 7,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed8u_shr
+ATOMIC_CMPXCHG(fixed8, xor, kmp_int64, 64, ^, 8i, 7,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed8_xor
+ATOMIC_CMPXCHG(float4, div, kmp_real32, 32, /, 4r, 3,
+               KMP_ARCH_X86) // __kmpc_atomic_float4_div
+ATOMIC_CMPXCHG(float4, mul, kmp_real32, 32, *, 4r, 3,
+               KMP_ARCH_X86) // __kmpc_atomic_float4_mul
+ATOMIC_CMPXCHG(float8, div, kmp_real64, 64, /, 8r, 7,
+               KMP_ARCH_X86) // __kmpc_atomic_float8_div
+ATOMIC_CMPXCHG(float8, mul, kmp_real64, 64, *, 8r, 7,
+               KMP_ARCH_X86) // __kmpc_atomic_float8_mul
 //              TYPE_ID,OP_ID, TYPE,          OP, LCK_ID, GOMP_FLAG
 
-
 /* ------------------------------------------------------------------------ */
 /* Routines for C/C++ Reduction operators && and ||                         */
-/* ------------------------------------------------------------------------ */
 
 // ------------------------------------------------------------------------
 // Need separate macros for &&, || because there is no combined assignment
 //   TODO: eliminate ATOMIC_CRIT_{L,EQV} macros as not used
-#define ATOMIC_CRIT_L(TYPE_ID,OP_ID,TYPE,OP,LCK_ID,GOMP_FLAG)             \
-ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void)                                     \
-    OP_GOMP_CRITICAL( = *lhs OP, GOMP_FLAG )                              \
-    OP_CRITICAL( = *lhs OP, LCK_ID )                                      \
-}
+#define ATOMIC_CRIT_L(TYPE_ID, OP_ID, TYPE, OP, LCK_ID, GOMP_FLAG)             \
+  ATOMIC_BEGIN(TYPE_ID, OP_ID, TYPE, void)                                     \
+  OP_GOMP_CRITICAL(= *lhs OP, GOMP_FLAG)                                       \
+  OP_CRITICAL(= *lhs OP, LCK_ID)                                               \
+  }
 
 #if KMP_ARCH_X86 || KMP_ARCH_X86_64
 
 // ------------------------------------------------------------------------
 // X86 or X86_64: no alignment problems ===================================
-#define ATOMIC_CMPX_L(TYPE_ID,OP_ID,TYPE,BITS,OP,LCK_ID,MASK,GOMP_FLAG)   \
-ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void)                                     \
-    OP_GOMP_CRITICAL( = *lhs OP, GOMP_FLAG )                              \
-    OP_CMPXCHG(TYPE,BITS,OP)                                              \
-}
+#define ATOMIC_CMPX_L(TYPE_ID, OP_ID, TYPE, BITS, OP, LCK_ID, MASK, GOMP_FLAG) \
+  ATOMIC_BEGIN(TYPE_ID, OP_ID, TYPE, void)                                     \
+  OP_GOMP_CRITICAL(= *lhs OP, GOMP_FLAG)                                       \
+  OP_CMPXCHG(TYPE, BITS, OP)                                                   \
+  }
 
 #else
 // ------------------------------------------------------------------------
 // Code for other architectures that don't handle unaligned accesses.
-#define ATOMIC_CMPX_L(TYPE_ID,OP_ID,TYPE,BITS,OP,LCK_ID,MASK,GOMP_FLAG)   \
-ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void)                                     \
-    OP_GOMP_CRITICAL(= *lhs OP,GOMP_FLAG)                                 \
-    if ( ! ( (kmp_uintptr_t) lhs & 0x##MASK) ) {                          \
-        OP_CMPXCHG(TYPE,BITS,OP)       /* aligned address */              \
-    } else {                                                              \
-        KMP_CHECK_GTID;                                                   \
-        OP_CRITICAL(= *lhs OP,LCK_ID)  /* unaligned - use critical */     \
-    }                                                                     \
-}
+#define ATOMIC_CMPX_L(TYPE_ID, OP_ID, TYPE, BITS, OP, LCK_ID, MASK, GOMP_FLAG) \
+  ATOMIC_BEGIN(TYPE_ID, OP_ID, TYPE, void)                                     \
+  OP_GOMP_CRITICAL(= *lhs OP, GOMP_FLAG)                                       \
+  if (!((kmp_uintptr_t)lhs & 0x##MASK)) {                                      \
+    OP_CMPXCHG(TYPE, BITS, OP) /* aligned address */                           \
+  } else {                                                                     \
+    KMP_CHECK_GTID;                                                            \
+    OP_CRITICAL(= *lhs OP, LCK_ID) /* unaligned - use critical */              \
+  }                                                                            \
+  }
 #endif /* KMP_ARCH_X86 || KMP_ARCH_X86_64 */
 
-ATOMIC_CMPX_L( fixed1, andl, char,       8, &&, 1i, 0, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_andl
-ATOMIC_CMPX_L( fixed1,  orl, char,       8, ||, 1i, 0, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_orl
-ATOMIC_CMPX_L( fixed2, andl, short,     16, &&, 2i, 1, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_andl
-ATOMIC_CMPX_L( fixed2,  orl, short,     16, ||, 2i, 1, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_orl
-ATOMIC_CMPX_L( fixed4, andl, kmp_int32, 32, &&, 4i, 3, 0 )             // __kmpc_atomic_fixed4_andl
-ATOMIC_CMPX_L( fixed4,  orl, kmp_int32, 32, ||, 4i, 3, 0 )             // __kmpc_atomic_fixed4_orl
-ATOMIC_CMPX_L( fixed8, andl, kmp_int64, 64, &&, 8i, 7, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_andl
-ATOMIC_CMPX_L( fixed8,  orl, kmp_int64, 64, ||, 8i, 7, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_orl
-
+ATOMIC_CMPX_L(fixed1, andl, char, 8, &&, 1i, 0,
+              KMP_ARCH_X86) // __kmpc_atomic_fixed1_andl
+ATOMIC_CMPX_L(fixed1, orl, char, 8, ||, 1i, 0,
+              KMP_ARCH_X86) // __kmpc_atomic_fixed1_orl
+ATOMIC_CMPX_L(fixed2, andl, short, 16, &&, 2i, 1,
+              KMP_ARCH_X86) // __kmpc_atomic_fixed2_andl
+ATOMIC_CMPX_L(fixed2, orl, short, 16, ||, 2i, 1,
+              KMP_ARCH_X86) // __kmpc_atomic_fixed2_orl
+ATOMIC_CMPX_L(fixed4, andl, kmp_int32, 32, &&, 4i, 3,
+              0) // __kmpc_atomic_fixed4_andl
+ATOMIC_CMPX_L(fixed4, orl, kmp_int32, 32, ||, 4i, 3,
+              0) // __kmpc_atomic_fixed4_orl
+ATOMIC_CMPX_L(fixed8, andl, kmp_int64, 64, &&, 8i, 7,
+              KMP_ARCH_X86) // __kmpc_atomic_fixed8_andl
+ATOMIC_CMPX_L(fixed8, orl, kmp_int64, 64, ||, 8i, 7,
+              KMP_ARCH_X86) // __kmpc_atomic_fixed8_orl
 
 /* ------------------------------------------------------------------------- */
 /* Routines for Fortran operators that matched no one in C:                  */
 /* MAX, MIN, .EQV., .NEQV.                                                   */
 /* Operators .AND., .OR. are covered by __kmpc_atomic_*_{andl,orl}           */
 /* Intrinsics IAND, IOR, IEOR are covered by __kmpc_atomic_*_{andb,orb,xor}  */
-/* ------------------------------------------------------------------------- */
 
 // -------------------------------------------------------------------------
 // MIN and MAX need separate macros
 // OP - operator to check if we need any actions?
-#define MIN_MAX_CRITSECT(OP,LCK_ID)                                        \
-    __kmp_acquire_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );                     \
-                                                                           \
-    if ( *lhs OP rhs ) {                 /* still need actions? */         \
-        *lhs = rhs;                                                        \
-    }                                                                      \
-    __kmp_release_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );
+#define MIN_MAX_CRITSECT(OP, LCK_ID)                                           \
+  __kmp_acquire_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);                       \
+                                                                               \
+  if (*lhs OP rhs) { /* still need actions? */                                 \
+    *lhs = rhs;                                                                \
+  }                                                                            \
+  __kmp_release_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);
 
 // -------------------------------------------------------------------------
 #ifdef KMP_GOMP_COMPAT
-#define GOMP_MIN_MAX_CRITSECT(OP,FLAG)                                     \
-    if (( FLAG ) && ( __kmp_atomic_mode == 2 )) {                          \
-        KMP_CHECK_GTID;                                                    \
-        MIN_MAX_CRITSECT( OP, 0 );                                         \
-        return;                                                            \
-    }
-#else
-#define GOMP_MIN_MAX_CRITSECT(OP,FLAG)
+#define GOMP_MIN_MAX_CRITSECT(OP, FLAG)                                        \
+  if ((FLAG) && (__kmp_atomic_mode == 2)) {                                    \
+    KMP_CHECK_GTID;                                                            \
+    MIN_MAX_CRITSECT(OP, 0);                                                   \
+    return;                                                                    \
+  }
+#else
+#define GOMP_MIN_MAX_CRITSECT(OP, FLAG)
 #endif /* KMP_GOMP_COMPAT */
 
 // -------------------------------------------------------------------------
-#define MIN_MAX_CMPXCHG(TYPE,BITS,OP)                                      \
-    {                                                                      \
-        TYPE KMP_ATOMIC_VOLATILE temp_val;                                 \
-        TYPE old_value;                                                    \
-        temp_val = *lhs;                                                   \
-        old_value = temp_val;                                              \
-        while ( old_value OP rhs &&          /* still need actions? */     \
-            ! KMP_COMPARE_AND_STORE_ACQ##BITS( (kmp_int##BITS *) lhs,      \
-                      *VOLATILE_CAST(kmp_int##BITS *) &old_value,          \
-                      *VOLATILE_CAST(kmp_int##BITS *) &rhs ) )             \
-        {                                                                  \
-            KMP_CPU_PAUSE();                                               \
-            temp_val = *lhs;                                               \
-            old_value = temp_val;                                          \
-        }                                                                  \
-    }
+#define MIN_MAX_CMPXCHG(TYPE, BITS, OP)                                        \
+  {                                                                            \
+    TYPE KMP_ATOMIC_VOLATILE temp_val;                                         \
+    TYPE old_value;                                                            \
+    temp_val = *lhs;                                                           \
+    old_value = temp_val;                                                      \
+    while (old_value OP rhs && /* still need actions? */                       \
+           !KMP_COMPARE_AND_STORE_ACQ##BITS(                                   \
+               (kmp_int##BITS *)lhs,                                           \
+               *VOLATILE_CAST(kmp_int##BITS *) & old_value,                    \
+               *VOLATILE_CAST(kmp_int##BITS *) & rhs)) {                       \
+      KMP_CPU_PAUSE();                                                         \
+      temp_val = *lhs;                                                         \
+      old_value = temp_val;                                                    \
+    }                                                                          \
+  }
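For readers unfamiliar with the pattern, the loop above is the standard compare-and-swap retry idiom. A minimal standalone sketch of an atomic min on a 32-bit integer, written with C11 <stdatomic.h> rather than the runtime's KMP_COMPARE_AND_STORE_ACQ* wrappers (the name atomic_min_i32 and everything else below are my own, illustration only):
@code
#include <stdatomic.h>
#include <stdint.h>

// Atomically apply *lhs = min(*lhs, rhs), mirroring the retry structure of
// MIN_MAX_CMPXCHG: keep trying while an update is still needed and another
// thread won the race. A failed CAS refreshes old_value automatically.
static void atomic_min_i32(_Atomic int32_t *lhs, int32_t rhs) {
  int32_t old_value = atomic_load(lhs);
  while (old_value > rhs &&
         !atomic_compare_exchange_weak(lhs, &old_value, rhs)) {
    // old_value now holds the freshly observed *lhs; recheck the condition.
  }
}

int main(void) {
  _Atomic int32_t x = 10;
  atomic_min_i32(&x, 3);
  return atomic_load(&x) == 3 ? 0 : 1;
}
@endcode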
 
 // -------------------------------------------------------------------------
 // 1-byte, 2-byte operands - use critical section
-#define MIN_MAX_CRITICAL(TYPE_ID,OP_ID,TYPE,OP,LCK_ID,GOMP_FLAG)           \
-ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void)                                      \
-    if ( *lhs OP rhs ) {     /* need actions? */                           \
-        GOMP_MIN_MAX_CRITSECT(OP,GOMP_FLAG)                                \
-        MIN_MAX_CRITSECT(OP,LCK_ID)                                        \
-    }                                                                      \
-}
+#define MIN_MAX_CRITICAL(TYPE_ID, OP_ID, TYPE, OP, LCK_ID, GOMP_FLAG)          \
+  ATOMIC_BEGIN(TYPE_ID, OP_ID, TYPE, void)                                     \
+  if (*lhs OP rhs) { /* need actions? */                                       \
+    GOMP_MIN_MAX_CRITSECT(OP, GOMP_FLAG)                                       \
+    MIN_MAX_CRITSECT(OP, LCK_ID)                                               \
+  }                                                                            \
+  }
 
 #if KMP_ARCH_X86 || KMP_ARCH_X86_64
 
 // -------------------------------------------------------------------------
 // X86 or X86_64: no alignment problems ====================================
-#define MIN_MAX_COMPXCHG(TYPE_ID,OP_ID,TYPE,BITS,OP,LCK_ID,MASK,GOMP_FLAG) \
-ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void)                                      \
-    if ( *lhs OP rhs ) {                                                   \
-        GOMP_MIN_MAX_CRITSECT(OP,GOMP_FLAG)                                \
-        MIN_MAX_CMPXCHG(TYPE,BITS,OP)                                      \
-    }                                                                      \
-}
+#define MIN_MAX_COMPXCHG(TYPE_ID, OP_ID, TYPE, BITS, OP, LCK_ID, MASK,         \
+                         GOMP_FLAG)                                            \
+  ATOMIC_BEGIN(TYPE_ID, OP_ID, TYPE, void)                                     \
+  if (*lhs OP rhs) {                                                           \
+    GOMP_MIN_MAX_CRITSECT(OP, GOMP_FLAG)                                       \
+    MIN_MAX_CMPXCHG(TYPE, BITS, OP)                                            \
+  }                                                                            \
+  }
 
 #else
 // -------------------------------------------------------------------------
 // Code for other architectures that don't handle unaligned accesses.
-#define MIN_MAX_COMPXCHG(TYPE_ID,OP_ID,TYPE,BITS,OP,LCK_ID,MASK,GOMP_FLAG) \
-ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void)                                      \
-    if ( *lhs OP rhs ) {                                                   \
-        GOMP_MIN_MAX_CRITSECT(OP,GOMP_FLAG)                                \
-        if ( ! ( (kmp_uintptr_t) lhs & 0x##MASK) ) {                       \
-            MIN_MAX_CMPXCHG(TYPE,BITS,OP) /* aligned address */            \
-        } else {                                                           \
-            KMP_CHECK_GTID;                                                \
-            MIN_MAX_CRITSECT(OP,LCK_ID)   /* unaligned address */          \
-        }                                                                  \
-    }                                                                      \
-}
+#define MIN_MAX_COMPXCHG(TYPE_ID, OP_ID, TYPE, BITS, OP, LCK_ID, MASK,         \
+                         GOMP_FLAG)                                            \
+  ATOMIC_BEGIN(TYPE_ID, OP_ID, TYPE, void)                                     \
+  if (*lhs OP rhs) {                                                           \
+    GOMP_MIN_MAX_CRITSECT(OP, GOMP_FLAG)                                       \
+    if (!((kmp_uintptr_t)lhs & 0x##MASK)) {                                    \
+      MIN_MAX_CMPXCHG(TYPE, BITS, OP) /* aligned address */                    \
+    } else {                                                                   \
+      KMP_CHECK_GTID;                                                          \
+      MIN_MAX_CRITSECT(OP, LCK_ID) /* unaligned address */                     \
+    }                                                                          \
+  }                                                                            \
+  }
 #endif /* KMP_ARCH_X86 || KMP_ARCH_X86_64 */
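The MASK argument (0, 1, 3, 7 in the instantiations below, with 0x prepended by the macro) is the byte-alignment mask that the non-x86 path tests via `!((kmp_uintptr_t)lhs & 0x##MASK)`. A small standalone illustration of that test, with names of my own choosing:
@code
#include <stdint.h>
#include <stdio.h>

// Same shape as the macro's check: an address is suitably aligned for an
// N-byte operand exactly when the low bits selected by the mask (N-1) are zero.
static int is_aligned(const void *p, uintptr_t mask) {
  return ((uintptr_t)p & mask) == 0;
}

int main(void) {
  _Alignas(8) char buf[16];
  printf("%d\n", is_aligned(buf, 0x7));     // 1: buf is 8-byte aligned by _Alignas
  printf("%d\n", is_aligned(buf + 1, 0x3)); // 0: an odd offset cannot be 4-byte aligned
  return 0;
}
@endcode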
 
-MIN_MAX_COMPXCHG( fixed1,  max, char,        8, <, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1_max
-MIN_MAX_COMPXCHG( fixed1,  min, char,        8, >, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1_min
-MIN_MAX_COMPXCHG( fixed2,  max, short,      16, <, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2_max
-MIN_MAX_COMPXCHG( fixed2,  min, short,      16, >, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2_min
-MIN_MAX_COMPXCHG( fixed4,  max, kmp_int32,  32, <, 4i, 3, 0 )            // __kmpc_atomic_fixed4_max
-MIN_MAX_COMPXCHG( fixed4,  min, kmp_int32,  32, >, 4i, 3, 0 )            // __kmpc_atomic_fixed4_min
-MIN_MAX_COMPXCHG( fixed8,  max, kmp_int64,  64, <, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_max
-MIN_MAX_COMPXCHG( fixed8,  min, kmp_int64,  64, >, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_min
-MIN_MAX_COMPXCHG( float4,  max, kmp_real32, 32, <, 4r, 3, KMP_ARCH_X86 ) // __kmpc_atomic_float4_max
-MIN_MAX_COMPXCHG( float4,  min, kmp_real32, 32, >, 4r, 3, KMP_ARCH_X86 ) // __kmpc_atomic_float4_min
-MIN_MAX_COMPXCHG( float8,  max, kmp_real64, 64, <, 8r, 7, KMP_ARCH_X86 ) // __kmpc_atomic_float8_max
-MIN_MAX_COMPXCHG( float8,  min, kmp_real64, 64, >, 8r, 7, KMP_ARCH_X86 ) // __kmpc_atomic_float8_min
+MIN_MAX_COMPXCHG(fixed1, max, char, 8, <, 1i, 0,
+                 KMP_ARCH_X86) // __kmpc_atomic_fixed1_max
+MIN_MAX_COMPXCHG(fixed1, min, char, 8, >, 1i, 0,
+                 KMP_ARCH_X86) // __kmpc_atomic_fixed1_min
+MIN_MAX_COMPXCHG(fixed2, max, short, 16, <, 2i, 1,
+                 KMP_ARCH_X86) // __kmpc_atomic_fixed2_max
+MIN_MAX_COMPXCHG(fixed2, min, short, 16, >, 2i, 1,
+                 KMP_ARCH_X86) // __kmpc_atomic_fixed2_min
+MIN_MAX_COMPXCHG(fixed4, max, kmp_int32, 32, <, 4i, 3,
+                 0) // __kmpc_atomic_fixed4_max
+MIN_MAX_COMPXCHG(fixed4, min, kmp_int32, 32, >, 4i, 3,
+                 0) // __kmpc_atomic_fixed4_min
+MIN_MAX_COMPXCHG(fixed8, max, kmp_int64, 64, <, 8i, 7,
+                 KMP_ARCH_X86) // __kmpc_atomic_fixed8_max
+MIN_MAX_COMPXCHG(fixed8, min, kmp_int64, 64, >, 8i, 7,
+                 KMP_ARCH_X86) // __kmpc_atomic_fixed8_min
+MIN_MAX_COMPXCHG(float4, max, kmp_real32, 32, <, 4r, 3,
+                 KMP_ARCH_X86) // __kmpc_atomic_float4_max
+MIN_MAX_COMPXCHG(float4, min, kmp_real32, 32, >, 4r, 3,
+                 KMP_ARCH_X86) // __kmpc_atomic_float4_min
+MIN_MAX_COMPXCHG(float8, max, kmp_real64, 64, <, 8r, 7,
+                 KMP_ARCH_X86) // __kmpc_atomic_float8_max
+MIN_MAX_COMPXCHG(float8, min, kmp_real64, 64, >, 8r, 7,
+                 KMP_ARCH_X86) // __kmpc_atomic_float8_min
 #if KMP_HAVE_QUAD
-MIN_MAX_CRITICAL( float16, max,     QUAD_LEGACY,      <, 16r,   1 )            // __kmpc_atomic_float16_max
-MIN_MAX_CRITICAL( float16, min,     QUAD_LEGACY,      >, 16r,   1 )            // __kmpc_atomic_float16_min
-#if ( KMP_ARCH_X86 )
-    MIN_MAX_CRITICAL( float16, max_a16, Quad_a16_t,     <, 16r,   1 )            // __kmpc_atomic_float16_max_a16
-    MIN_MAX_CRITICAL( float16, min_a16, Quad_a16_t,     >, 16r,   1 )            // __kmpc_atomic_float16_min_a16
+MIN_MAX_CRITICAL(float16, max, QUAD_LEGACY, <, 16r,
+                 1) // __kmpc_atomic_float16_max
+MIN_MAX_CRITICAL(float16, min, QUAD_LEGACY, >, 16r,
+                 1) // __kmpc_atomic_float16_min
+#if (KMP_ARCH_X86)
+MIN_MAX_CRITICAL(float16, max_a16, Quad_a16_t, <, 16r,
+                 1) // __kmpc_atomic_float16_max_a16
+MIN_MAX_CRITICAL(float16, min_a16, Quad_a16_t, >, 16r,
+                 1) // __kmpc_atomic_float16_min_a16
 #endif
 #endif
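A hand-written sketch of calling one of the entry points instantiated above. This is an assumption-laden illustration, not runtime code: ident_t is left opaque (its real definition lives in kmp.h), the kmp_real64 operand type is written as plain double, and record_minimum is a name of my own.
@code
typedef struct ident ident_t; // opaque stand-in for the real type in kmp.h

extern void __kmpc_atomic_float8_min(ident_t *id_ref, int gtid, double *lhs,
                                     double rhs);

void record_minimum(ident_t *loc, int gtid, double *shared_min, double value) {
  // Atomically performs *shared_min = min(*shared_min, value).
  __kmpc_atomic_float8_min(loc, gtid, shared_min, value);
}
@endcode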
 // ------------------------------------------------------------------------
 // Need separate macros for .EQV. because it requires the complement (~)
 // OP ignored for critical sections, ^=~ used instead
-#define ATOMIC_CRIT_EQV(TYPE_ID,OP_ID,TYPE,OP,LCK_ID,GOMP_FLAG)           \
-ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void)                                     \
-    OP_GOMP_CRITICAL(^=~,GOMP_FLAG)  /* send assignment */                \
-    OP_CRITICAL(^=~,LCK_ID)    /* send assignment and complement */       \
-}
+#define ATOMIC_CRIT_EQV(TYPE_ID, OP_ID, TYPE, OP, LCK_ID, GOMP_FLAG)           \
+  ATOMIC_BEGIN(TYPE_ID, OP_ID, TYPE, void)                                     \
+  OP_GOMP_CRITICAL(^= ~, GOMP_FLAG) /* send assignment */                      \
+  OP_CRITICAL(^= ~, LCK_ID) /* send assignment and complement */               \
+  }
 
 // ------------------------------------------------------------------------
 #if KMP_ARCH_X86 || KMP_ARCH_X86_64
 // ------------------------------------------------------------------------
 // X86 or X86_64: no alignment problems ===================================
-#define ATOMIC_CMPX_EQV(TYPE_ID,OP_ID,TYPE,BITS,OP,LCK_ID,MASK,GOMP_FLAG) \
-ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void)                                     \
-    OP_GOMP_CRITICAL(^=~,GOMP_FLAG)  /* send assignment */                \
-    OP_CMPXCHG(TYPE,BITS,OP)                                              \
-}
+#define ATOMIC_CMPX_EQV(TYPE_ID, OP_ID, TYPE, BITS, OP, LCK_ID, MASK,          \
+                        GOMP_FLAG)                                             \
+  ATOMIC_BEGIN(TYPE_ID, OP_ID, TYPE, void)                                     \
+  OP_GOMP_CRITICAL(^= ~, GOMP_FLAG) /* send assignment */                      \
+  OP_CMPXCHG(TYPE, BITS, OP)                                                   \
+  }
 // ------------------------------------------------------------------------
 #else
 // ------------------------------------------------------------------------
 // Code for other architectures that don't handle unaligned accesses.
-#define ATOMIC_CMPX_EQV(TYPE_ID,OP_ID,TYPE,BITS,OP,LCK_ID,MASK,GOMP_FLAG) \
-ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void)                                     \
-    OP_GOMP_CRITICAL(^=~,GOMP_FLAG)                                       \
-    if ( ! ( (kmp_uintptr_t) lhs & 0x##MASK) ) {                          \
-        OP_CMPXCHG(TYPE,BITS,OP)   /* aligned address */                  \
-    } else {                                                              \
-        KMP_CHECK_GTID;                                                   \
-        OP_CRITICAL(^=~,LCK_ID)    /* unaligned address - use critical */ \
-    }                                                                     \
-}
+#define ATOMIC_CMPX_EQV(TYPE_ID, OP_ID, TYPE, BITS, OP, LCK_ID, MASK,          \
+                        GOMP_FLAG)                                             \
+  ATOMIC_BEGIN(TYPE_ID, OP_ID, TYPE, void)                                     \
+  OP_GOMP_CRITICAL(^= ~, GOMP_FLAG)                                            \
+  if (!((kmp_uintptr_t)lhs & 0x##MASK)) {                                      \
+    OP_CMPXCHG(TYPE, BITS, OP) /* aligned address */                           \
+  } else {                                                                     \
+    KMP_CHECK_GTID;                                                            \
+    OP_CRITICAL(^= ~, LCK_ID) /* unaligned address - use critical */           \
+  }                                                                            \
+  }
 #endif /* KMP_ARCH_X86 || KMP_ARCH_X86_64 */
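The `^= ~` sent into the helpers by these macros rests on the bitwise identity x .EQV. y == ~(x ^ y) == x ^ ~y. A tiny self-check of that identity (my own illustration, not runtime code):
@code
#include <assert.h>
#include <stdint.h>

int main(void) {
  for (int x = 0; x < 4; ++x)
    for (int y = 0; y < 4; ++y) {
      uint8_t a = (uint8_t)x, b = (uint8_t)y;
      // Per-bit equivalence: ~(a ^ b) equals a ^ ~b, which is why "^= ~"
      // can stand in for .EQV. in the critical and cmpxchg paths.
      assert((uint8_t)~(a ^ b) == (uint8_t)(a ^ (uint8_t)~b));
    }
  return 0;
}
@endcode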
 
-ATOMIC_CMPXCHG(  fixed1, neqv, kmp_int8,   8,   ^, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1_neqv
-ATOMIC_CMPXCHG(  fixed2, neqv, kmp_int16, 16,   ^, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2_neqv
-ATOMIC_CMPXCHG(  fixed4, neqv, kmp_int32, 32,   ^, 4i, 3, KMP_ARCH_X86 ) // __kmpc_atomic_fixed4_neqv
-ATOMIC_CMPXCHG(  fixed8, neqv, kmp_int64, 64,   ^, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_neqv
-ATOMIC_CMPX_EQV( fixed1, eqv,  kmp_int8,   8,  ^~, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1_eqv
-ATOMIC_CMPX_EQV( fixed2, eqv,  kmp_int16, 16,  ^~, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2_eqv
-ATOMIC_CMPX_EQV( fixed4, eqv,  kmp_int32, 32,  ^~, 4i, 3, KMP_ARCH_X86 ) // __kmpc_atomic_fixed4_eqv
-ATOMIC_CMPX_EQV( fixed8, eqv,  kmp_int64, 64,  ^~, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_eqv
-
+ATOMIC_CMPXCHG(fixed1, neqv, kmp_int8, 8, ^, 1i, 0,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed1_neqv
+ATOMIC_CMPXCHG(fixed2, neqv, kmp_int16, 16, ^, 2i, 1,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed2_neqv
+ATOMIC_CMPXCHG(fixed4, neqv, kmp_int32, 32, ^, 4i, 3,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed4_neqv
+ATOMIC_CMPXCHG(fixed8, neqv, kmp_int64, 64, ^, 8i, 7,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed8_neqv
+ATOMIC_CMPX_EQV(fixed1, eqv, kmp_int8, 8, ^~, 1i, 0,
+                KMP_ARCH_X86) // __kmpc_atomic_fixed1_eqv
+ATOMIC_CMPX_EQV(fixed2, eqv, kmp_int16, 16, ^~, 2i, 1,
+                KMP_ARCH_X86) // __kmpc_atomic_fixed2_eqv
+ATOMIC_CMPX_EQV(fixed4, eqv, kmp_int32, 32, ^~, 4i, 3,
+                KMP_ARCH_X86) // __kmpc_atomic_fixed4_eqv
+ATOMIC_CMPX_EQV(fixed8, eqv, kmp_int64, 64, ^~, 8i, 7,
+                KMP_ARCH_X86) // __kmpc_atomic_fixed8_eqv
 
 // ------------------------------------------------------------------------
-// Routines for Extended types: long double, _Quad, complex flavours (use critical section)
+// Routines for Extended types: long double, _Quad, complex flavours (use
+// critical section)
 //     TYPE_ID, OP_ID, TYPE - detailed above
 //     OP      - operator
 //     LCK_ID  - lock identifier, used to possibly distinguish lock variable
-#define ATOMIC_CRITICAL(TYPE_ID,OP_ID,TYPE,OP,LCK_ID,GOMP_FLAG)           \
-ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void)                                     \
-    OP_GOMP_CRITICAL(OP##=,GOMP_FLAG)  /* send assignment */              \
-    OP_CRITICAL(OP##=,LCK_ID)          /* send assignment */              \
-}
+#define ATOMIC_CRITICAL(TYPE_ID, OP_ID, TYPE, OP, LCK_ID, GOMP_FLAG)           \
+  ATOMIC_BEGIN(TYPE_ID, OP_ID, TYPE, void)                                     \
+  OP_GOMP_CRITICAL(OP## =, GOMP_FLAG) /* send assignment */                    \
+  OP_CRITICAL(OP## =, LCK_ID) /* send assignment */                            \
+  }
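The reformatting writes `OP## =` with a space, which does not change the expansion: `##` pastes the OP token with the following `=` token regardless of intervening whitespace. A minimal preprocessor check (the APPLY macro below is mine, for illustration only):
@code
#include <stdio.h>

// The space after ## is insignificant: the operator pastes OP with the
// following '=' token, so APPLY(+, x, 5) expands to ((x) += (5)).
#define APPLY(OP, lhs, rhs) ((lhs) OP## = (rhs))

int main(void) {
  int x = 10;
  APPLY(+, x, 5);
  printf("%d\n", x); // prints 15
  return 0;
}
@endcode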
 
 /* ------------------------------------------------------------------------- */
 // routines for long double type
-ATOMIC_CRITICAL( float10, add, long double,     +, 10r,   1 )            // __kmpc_atomic_float10_add
-ATOMIC_CRITICAL( float10, sub, long double,     -, 10r,   1 )            // __kmpc_atomic_float10_sub
-ATOMIC_CRITICAL( float10, mul, long double,     *, 10r,   1 )            // __kmpc_atomic_float10_mul
-ATOMIC_CRITICAL( float10, div, long double,     /, 10r,   1 )            // __kmpc_atomic_float10_div
+ATOMIC_CRITICAL(float10, add, long double, +, 10r,
+                1) // __kmpc_atomic_float10_add
+ATOMIC_CRITICAL(float10, sub, long double, -, 10r,
+                1) // __kmpc_atomic_float10_sub
+ATOMIC_CRITICAL(float10, mul, long double, *, 10r,
+                1) // __kmpc_atomic_float10_mul
+ATOMIC_CRITICAL(float10, div, long double, /, 10r,
+                1) // __kmpc_atomic_float10_div
 #if KMP_HAVE_QUAD
 // routines for _Quad type
-ATOMIC_CRITICAL( float16, add, QUAD_LEGACY,     +, 16r,   1 )            // __kmpc_atomic_float16_add
-ATOMIC_CRITICAL( float16, sub, QUAD_LEGACY,     -, 16r,   1 )            // __kmpc_atomic_float16_sub
-ATOMIC_CRITICAL( float16, mul, QUAD_LEGACY,     *, 16r,   1 )            // __kmpc_atomic_float16_mul
-ATOMIC_CRITICAL( float16, div, QUAD_LEGACY,     /, 16r,   1 )            // __kmpc_atomic_float16_div
-#if ( KMP_ARCH_X86 )
-    ATOMIC_CRITICAL( float16, add_a16, Quad_a16_t, +, 16r, 1 )           // __kmpc_atomic_float16_add_a16
-    ATOMIC_CRITICAL( float16, sub_a16, Quad_a16_t, -, 16r, 1 )           // __kmpc_atomic_float16_sub_a16
-    ATOMIC_CRITICAL( float16, mul_a16, Quad_a16_t, *, 16r, 1 )           // __kmpc_atomic_float16_mul_a16
-    ATOMIC_CRITICAL( float16, div_a16, Quad_a16_t, /, 16r, 1 )           // __kmpc_atomic_float16_div_a16
+ATOMIC_CRITICAL(float16, add, QUAD_LEGACY, +, 16r,
+                1) // __kmpc_atomic_float16_add
+ATOMIC_CRITICAL(float16, sub, QUAD_LEGACY, -, 16r,
+                1) // __kmpc_atomic_float16_sub
+ATOMIC_CRITICAL(float16, mul, QUAD_LEGACY, *, 16r,
+                1) // __kmpc_atomic_float16_mul
+ATOMIC_CRITICAL(float16, div, QUAD_LEGACY, /, 16r,
+                1) // __kmpc_atomic_float16_div
+#if (KMP_ARCH_X86)
+ATOMIC_CRITICAL(float16, add_a16, Quad_a16_t, +, 16r,
+                1) // __kmpc_atomic_float16_add_a16
+ATOMIC_CRITICAL(float16, sub_a16, Quad_a16_t, -, 16r,
+                1) // __kmpc_atomic_float16_sub_a16
+ATOMIC_CRITICAL(float16, mul_a16, Quad_a16_t, *, 16r,
+                1) // __kmpc_atomic_float16_mul_a16
+ATOMIC_CRITICAL(float16, div_a16, Quad_a16_t, /, 16r,
+                1) // __kmpc_atomic_float16_div_a16
 #endif
 #endif
 // routines for complex types
 
 #if USE_CMPXCHG_FIX
 // workaround for C78287 (complex(kind=4) data type)
-ATOMIC_CMPXCHG_WORKAROUND( cmplx4, add, kmp_cmplx32, 64, +, 8c, 7, 1 )   // __kmpc_atomic_cmplx4_add
-ATOMIC_CMPXCHG_WORKAROUND( cmplx4, sub, kmp_cmplx32, 64, -, 8c, 7, 1 )   // __kmpc_atomic_cmplx4_sub
-ATOMIC_CMPXCHG_WORKAROUND( cmplx4, mul, kmp_cmplx32, 64, *, 8c, 7, 1 )   // __kmpc_atomic_cmplx4_mul
-ATOMIC_CMPXCHG_WORKAROUND( cmplx4, div, kmp_cmplx32, 64, /, 8c, 7, 1 )   // __kmpc_atomic_cmplx4_div
+ATOMIC_CMPXCHG_WORKAROUND(cmplx4, add, kmp_cmplx32, 64, +, 8c, 7,
+                          1) // __kmpc_atomic_cmplx4_add
+ATOMIC_CMPXCHG_WORKAROUND(cmplx4, sub, kmp_cmplx32, 64, -, 8c, 7,
+                          1) // __kmpc_atomic_cmplx4_sub
+ATOMIC_CMPXCHG_WORKAROUND(cmplx4, mul, kmp_cmplx32, 64, *, 8c, 7,
+                          1) // __kmpc_atomic_cmplx4_mul
+ATOMIC_CMPXCHG_WORKAROUND(cmplx4, div, kmp_cmplx32, 64, /, 8c, 7,
+                          1) // __kmpc_atomic_cmplx4_div
 // end of the workaround for C78287
 #else
-ATOMIC_CRITICAL( cmplx4,  add, kmp_cmplx32,     +,  8c,   1 )            // __kmpc_atomic_cmplx4_add
-ATOMIC_CRITICAL( cmplx4,  sub, kmp_cmplx32,     -,  8c,   1 )            // __kmpc_atomic_cmplx4_sub
-ATOMIC_CRITICAL( cmplx4,  mul, kmp_cmplx32,     *,  8c,   1 )            // __kmpc_atomic_cmplx4_mul
-ATOMIC_CRITICAL( cmplx4,  div, kmp_cmplx32,     /,  8c,   1 )            // __kmpc_atomic_cmplx4_div
+ATOMIC_CRITICAL(cmplx4, add, kmp_cmplx32, +, 8c, 1) // __kmpc_atomic_cmplx4_add
+ATOMIC_CRITICAL(cmplx4, sub, kmp_cmplx32, -, 8c, 1) // __kmpc_atomic_cmplx4_sub
+ATOMIC_CRITICAL(cmplx4, mul, kmp_cmplx32, *, 8c, 1) // __kmpc_atomic_cmplx4_mul
+ATOMIC_CRITICAL(cmplx4, div, kmp_cmplx32, /, 8c, 1) // __kmpc_atomic_cmplx4_div
 #endif // USE_CMPXCHG_FIX
 
-ATOMIC_CRITICAL( cmplx8,  add, kmp_cmplx64,     +, 16c,   1 )            // __kmpc_atomic_cmplx8_add
-ATOMIC_CRITICAL( cmplx8,  sub, kmp_cmplx64,     -, 16c,   1 )            // __kmpc_atomic_cmplx8_sub
-ATOMIC_CRITICAL( cmplx8,  mul, kmp_cmplx64,     *, 16c,   1 )            // __kmpc_atomic_cmplx8_mul
-ATOMIC_CRITICAL( cmplx8,  div, kmp_cmplx64,     /, 16c,   1 )            // __kmpc_atomic_cmplx8_div
-ATOMIC_CRITICAL( cmplx10, add, kmp_cmplx80,     +, 20c,   1 )            // __kmpc_atomic_cmplx10_add
-ATOMIC_CRITICAL( cmplx10, sub, kmp_cmplx80,     -, 20c,   1 )            // __kmpc_atomic_cmplx10_sub
-ATOMIC_CRITICAL( cmplx10, mul, kmp_cmplx80,     *, 20c,   1 )            // __kmpc_atomic_cmplx10_mul
-ATOMIC_CRITICAL( cmplx10, div, kmp_cmplx80,     /, 20c,   1 )            // __kmpc_atomic_cmplx10_div
+ATOMIC_CRITICAL(cmplx8, add, kmp_cmplx64, +, 16c, 1) // __kmpc_atomic_cmplx8_add
+ATOMIC_CRITICAL(cmplx8, sub, kmp_cmplx64, -, 16c, 1) // __kmpc_atomic_cmplx8_sub
+ATOMIC_CRITICAL(cmplx8, mul, kmp_cmplx64, *, 16c, 1) // __kmpc_atomic_cmplx8_mul
+ATOMIC_CRITICAL(cmplx8, div, kmp_cmplx64, /, 16c, 1) // __kmpc_atomic_cmplx8_div
+ATOMIC_CRITICAL(cmplx10, add, kmp_cmplx80, +, 20c,
+                1) // __kmpc_atomic_cmplx10_add
+ATOMIC_CRITICAL(cmplx10, sub, kmp_cmplx80, -, 20c,
+                1) // __kmpc_atomic_cmplx10_sub
+ATOMIC_CRITICAL(cmplx10, mul, kmp_cmplx80, *, 20c,
+                1) // __kmpc_atomic_cmplx10_mul
+ATOMIC_CRITICAL(cmplx10, div, kmp_cmplx80, /, 20c,
+                1) // __kmpc_atomic_cmplx10_div
 #if KMP_HAVE_QUAD
-ATOMIC_CRITICAL( cmplx16, add, CPLX128_LEG,     +, 32c,   1 )            // __kmpc_atomic_cmplx16_add
-ATOMIC_CRITICAL( cmplx16, sub, CPLX128_LEG,     -, 32c,   1 )            // __kmpc_atomic_cmplx16_sub
-ATOMIC_CRITICAL( cmplx16, mul, CPLX128_LEG,     *, 32c,   1 )            // __kmpc_atomic_cmplx16_mul
-ATOMIC_CRITICAL( cmplx16, div, CPLX128_LEG,     /, 32c,   1 )            // __kmpc_atomic_cmplx16_div
-#if ( KMP_ARCH_X86 )
-    ATOMIC_CRITICAL( cmplx16, add_a16, kmp_cmplx128_a16_t, +, 32c, 1 )   // __kmpc_atomic_cmplx16_add_a16
-    ATOMIC_CRITICAL( cmplx16, sub_a16, kmp_cmplx128_a16_t, -, 32c, 1 )   // __kmpc_atomic_cmplx16_sub_a16
-    ATOMIC_CRITICAL( cmplx16, mul_a16, kmp_cmplx128_a16_t, *, 32c, 1 )   // __kmpc_atomic_cmplx16_mul_a16
-    ATOMIC_CRITICAL( cmplx16, div_a16, kmp_cmplx128_a16_t, /, 32c, 1 )   // __kmpc_atomic_cmplx16_div_a16
+ATOMIC_CRITICAL(cmplx16, add, CPLX128_LEG, +, 32c,
+                1) // __kmpc_atomic_cmplx16_add
+ATOMIC_CRITICAL(cmplx16, sub, CPLX128_LEG, -, 32c,
+                1) // __kmpc_atomic_cmplx16_sub
+ATOMIC_CRITICAL(cmplx16, mul, CPLX128_LEG, *, 32c,
+                1) // __kmpc_atomic_cmplx16_mul
+ATOMIC_CRITICAL(cmplx16, div, CPLX128_LEG, /, 32c,
+                1) // __kmpc_atomic_cmplx16_div
+#if (KMP_ARCH_X86)
+ATOMIC_CRITICAL(cmplx16, add_a16, kmp_cmplx128_a16_t, +, 32c,
+                1) // __kmpc_atomic_cmplx16_add_a16
+ATOMIC_CRITICAL(cmplx16, sub_a16, kmp_cmplx128_a16_t, -, 32c,
+                1) // __kmpc_atomic_cmplx16_sub_a16
+ATOMIC_CRITICAL(cmplx16, mul_a16, kmp_cmplx128_a16_t, *, 32c,
+                1) // __kmpc_atomic_cmplx16_mul_a16
+ATOMIC_CRITICAL(cmplx16, div_a16, kmp_cmplx128_a16_t, /, 32c,
+                1) // __kmpc_atomic_cmplx16_div_a16
 #endif
 #endif
 
@@ -1181,34 +1359,34 @@ ATOMIC_CRITICAL( cmplx16, div, CPLX128_L
 //     LCK_ID - lock identifier
 // Note: don't check gtid as it should always be valid
 // 1, 2-byte - expect valid parameter, other - check before this macro
-#define OP_CRITICAL_REV(OP,LCK_ID) \
-    __kmp_acquire_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );             \
-                                                                          \
-    (*lhs) = (rhs) OP (*lhs);                                             \
-                                                                          \
-    __kmp_release_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );
-
-#ifdef KMP_GOMP_COMPAT
-#define OP_GOMP_CRITICAL_REV(OP,FLAG)                                     \
-    if ( (FLAG) && (__kmp_atomic_mode == 2) ) {                           \
-        KMP_CHECK_GTID;                                                   \
-        OP_CRITICAL_REV( OP, 0 );                                         \
-        return;                                                           \
-    }
+#define OP_CRITICAL_REV(OP, LCK_ID)                                            \
+  __kmp_acquire_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);                       \
+                                                                               \
+  (*lhs) = (rhs)OP(*lhs);                                                      \
+                                                                               \
+  __kmp_release_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);
+
+#ifdef KMP_GOMP_COMPAT
+#define OP_GOMP_CRITICAL_REV(OP, FLAG)                                         \
+  if ((FLAG) && (__kmp_atomic_mode == 2)) {                                    \
+    KMP_CHECK_GTID;                                                            \
+    OP_CRITICAL_REV(OP, 0);                                                    \
+    return;                                                                    \
+  }
 #else
-#define OP_GOMP_CRITICAL_REV(OP,FLAG)
+#define OP_GOMP_CRITICAL_REV(OP, FLAG)
 #endif /* KMP_GOMP_COMPAT */
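As the critical-section body above shows, the `_rev` entry points compute `x = expr OP x` rather than `x = x OP expr`, which only matters for non-commutative operators. A plain, non-atomic illustration of the difference (nothing here is runtime code):
@code
#include <stdio.h>

int main(void) {
  double x = 8.0, expr = 2.0;

  double forward = x / expr;  // ordinary update: x = x / expr
  double reversed = expr / x; // the _rev flavour: x = expr / x

  printf("forward  = %g\n", forward);  // 4
  printf("reversed = %g\n", reversed); // 0.25
  return 0;
}
@endcode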
 
-
 // Beginning of a definition (provides name, parameters, debug trace)
-//     TYPE_ID - operands type and size (fixed*, fixed*u for signed, unsigned fixed)
+//     TYPE_ID - operands' type and size (fixed*, fixed*u for signed, unsigned
+//     fixed)
 //     OP_ID   - operation identifier (add, sub, mul, ...)
 //     TYPE    - operands' type
-#define ATOMIC_BEGIN_REV(TYPE_ID,OP_ID,TYPE, RET_TYPE) \
-RET_TYPE __kmpc_atomic_##TYPE_ID##_##OP_ID##_rev( ident_t *id_ref, int gtid, TYPE * lhs, TYPE rhs ) \
-{                                                                                         \
-    KMP_DEBUG_ASSERT( __kmp_init_serial );                                                \
-    KA_TRACE(100,("__kmpc_atomic_" #TYPE_ID "_" #OP_ID "_rev: T#%d\n", gtid ));
+#define ATOMIC_BEGIN_REV(TYPE_ID, OP_ID, TYPE, RET_TYPE)                       \
+  RET_TYPE __kmpc_atomic_##TYPE_ID##_##OP_ID##_rev(ident_t *id_ref, int gtid,  \
+                                                   TYPE *lhs, TYPE rhs) {      \
+    KMP_DEBUG_ASSERT(__kmp_init_serial);                                       \
+    KA_TRACE(100, ("__kmpc_atomic_" #TYPE_ID "_" #OP_ID "_rev: T#%d\n", gtid));
 
 // ------------------------------------------------------------------------
 // Operation on *lhs, rhs using "compare_and_store" routine
@@ -1217,31 +1395,30 @@ RET_TYPE __kmpc_atomic_##TYPE_ID##_##OP_
 //     OP      - operator
 // Note: temp_val introduced in order to force the compiler to read
 //       *lhs only once (w/o it the compiler reads *lhs twice)
-#define OP_CMPXCHG_REV(TYPE,BITS,OP)                                      \
-    {                                                                     \
-        TYPE KMP_ATOMIC_VOLATILE temp_val;                                \
-        TYPE old_value, new_value;                                        \
-        temp_val = *lhs;                                                  \
-        old_value = temp_val;                                             \
-        new_value = rhs OP old_value;                                     \
-        while ( ! KMP_COMPARE_AND_STORE_ACQ##BITS( (kmp_int##BITS *) lhs, \
-                      *VOLATILE_CAST(kmp_int##BITS *) &old_value,         \
-                      *VOLATILE_CAST(kmp_int##BITS *) &new_value ) )      \
-        {                                                                 \
-            KMP_DO_PAUSE;                                                 \
-                                                                          \
-            temp_val = *lhs;                                              \
-            old_value = temp_val;                                         \
-            new_value = rhs OP old_value;                                 \
-        }                                                                 \
-    }
-
-// -------------------------------------------------------------------------
-#define ATOMIC_CMPXCHG_REV(TYPE_ID,OP_ID,TYPE,BITS,OP,LCK_ID,GOMP_FLAG)   \
-ATOMIC_BEGIN_REV(TYPE_ID,OP_ID,TYPE,void)                                 \
-    OP_GOMP_CRITICAL_REV(OP,GOMP_FLAG)                                    \
-    OP_CMPXCHG_REV(TYPE,BITS,OP)                                          \
-}
+#define OP_CMPXCHG_REV(TYPE, BITS, OP)                                         \
+  {                                                                            \
+    TYPE KMP_ATOMIC_VOLATILE temp_val;                                         \
+    TYPE old_value, new_value;                                                 \
+    temp_val = *lhs;                                                           \
+    old_value = temp_val;                                                      \
+    new_value = rhs OP old_value;                                              \
+    while (!KMP_COMPARE_AND_STORE_ACQ##BITS(                                   \
+        (kmp_int##BITS *)lhs, *VOLATILE_CAST(kmp_int##BITS *) & old_value,     \
+        *VOLATILE_CAST(kmp_int##BITS *) & new_value)) {                        \
+      KMP_DO_PAUSE;                                                            \
+                                                                               \
+      temp_val = *lhs;                                                         \
+      old_value = temp_val;                                                    \
+      new_value = rhs OP old_value;                                            \
+    }                                                                          \
+  }
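The temp_val note above is about taking a single snapshot of the shared location before using it. A deliberately simplified illustration (my own, using volatile so that the one-load versus two-load difference is guaranteed by the language rather than left to the optimizer):
@code
#include <stdint.h>

static int32_t use_snapshot(volatile int32_t *src) {
  int32_t temp_val = *src;    // exactly one load of the shared location
  return temp_val * temp_val; // both uses see the same value
}

static int32_t use_twice(volatile int32_t *src) {
  return *src * *src; // two loads; the values may differ under concurrency
}

int main(void) {
  volatile int32_t x = 3;
  return use_snapshot(&x) - use_twice(&x); // 0 here; can differ under races
}
@endcode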
+
+// -------------------------------------------------------------------------
+#define ATOMIC_CMPXCHG_REV(TYPE_ID, OP_ID, TYPE, BITS, OP, LCK_ID, GOMP_FLAG)  \
+  ATOMIC_BEGIN_REV(TYPE_ID, OP_ID, TYPE, void)                                 \
+  OP_GOMP_CRITICAL_REV(OP, GOMP_FLAG)                                          \
+  OP_CMPXCHG_REV(TYPE, BITS, OP)                                               \
+  }
 
 // ------------------------------------------------------------------------
 // Entries definition for integer operands
@@ -1257,88 +1434,131 @@ ATOMIC_BEGIN_REV(TYPE_ID,OP_ID,TYPE,void
 // Routines for ATOMIC integer operands, other operators
 // ------------------------------------------------------------------------
 //                  TYPE_ID,OP_ID, TYPE,    BITS, OP, LCK_ID, GOMP_FLAG
-ATOMIC_CMPXCHG_REV( fixed1,  div, kmp_int8,    8, /,  1i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_div_rev
-ATOMIC_CMPXCHG_REV( fixed1u, div, kmp_uint8,   8, /,  1i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1u_div_rev
-ATOMIC_CMPXCHG_REV( fixed1,  shl, kmp_int8,    8, <<, 1i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_shl_rev
-ATOMIC_CMPXCHG_REV( fixed1,  shr, kmp_int8,    8, >>, 1i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_shr_rev
-ATOMIC_CMPXCHG_REV( fixed1u, shr, kmp_uint8,   8, >>, 1i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1u_shr_rev
-ATOMIC_CMPXCHG_REV( fixed1,  sub, kmp_int8,    8, -,  1i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_sub_rev
-
-ATOMIC_CMPXCHG_REV( fixed2,  div, kmp_int16,  16, /,  2i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_div_rev
-ATOMIC_CMPXCHG_REV( fixed2u, div, kmp_uint16, 16, /,  2i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2u_div_rev
-ATOMIC_CMPXCHG_REV( fixed2,  shl, kmp_int16,  16, <<, 2i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_shl_rev
-ATOMIC_CMPXCHG_REV( fixed2,  shr, kmp_int16,  16, >>, 2i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_shr_rev
-ATOMIC_CMPXCHG_REV( fixed2u, shr, kmp_uint16, 16, >>, 2i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2u_shr_rev
-ATOMIC_CMPXCHG_REV( fixed2,  sub, kmp_int16,  16, -,  2i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_sub_rev
-
-ATOMIC_CMPXCHG_REV( fixed4,  div, kmp_int32,  32, /,  4i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed4_div_rev
-ATOMIC_CMPXCHG_REV( fixed4u, div, kmp_uint32, 32, /,  4i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed4u_div_rev
-ATOMIC_CMPXCHG_REV( fixed4,  shl, kmp_int32,  32, <<, 4i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed4_shl_rev
-ATOMIC_CMPXCHG_REV( fixed4,  shr, kmp_int32,  32, >>, 4i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed4_shr_rev
-ATOMIC_CMPXCHG_REV( fixed4u, shr, kmp_uint32, 32, >>, 4i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed4u_shr_rev
-ATOMIC_CMPXCHG_REV( fixed4,  sub, kmp_int32,  32, -,  4i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed4_sub_rev
-
-ATOMIC_CMPXCHG_REV( fixed8,  div, kmp_int64,  64, /,  8i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_div_rev
-ATOMIC_CMPXCHG_REV( fixed8u, div, kmp_uint64, 64, /,  8i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8u_div_rev
-ATOMIC_CMPXCHG_REV( fixed8,  shl, kmp_int64,  64, <<, 8i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_shl_rev
-ATOMIC_CMPXCHG_REV( fixed8,  shr, kmp_int64,  64, >>, 8i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_shr_rev
-ATOMIC_CMPXCHG_REV( fixed8u, shr, kmp_uint64, 64, >>, 8i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8u_shr_rev
-ATOMIC_CMPXCHG_REV( fixed8,  sub, kmp_int64,  64, -,  8i, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_sub_rev
-
-ATOMIC_CMPXCHG_REV( float4,  div, kmp_real32, 32, /,  4r, KMP_ARCH_X86 )  // __kmpc_atomic_float4_div_rev
-ATOMIC_CMPXCHG_REV( float4,  sub, kmp_real32, 32, -,  4r, KMP_ARCH_X86 )  // __kmpc_atomic_float4_sub_rev
-
-ATOMIC_CMPXCHG_REV( float8,  div, kmp_real64, 64, /,  8r, KMP_ARCH_X86 )  // __kmpc_atomic_float8_div_rev
-ATOMIC_CMPXCHG_REV( float8,  sub, kmp_real64, 64, -,  8r, KMP_ARCH_X86 )  // __kmpc_atomic_float8_sub_rev
+ATOMIC_CMPXCHG_REV(fixed1, div, kmp_int8, 8, /, 1i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1_div_rev
+ATOMIC_CMPXCHG_REV(fixed1u, div, kmp_uint8, 8, /, 1i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1u_div_rev
+ATOMIC_CMPXCHG_REV(fixed1, shl, kmp_int8, 8, <<, 1i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1_shl_rev
+ATOMIC_CMPXCHG_REV(fixed1, shr, kmp_int8, 8, >>, 1i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1_shr_rev
+ATOMIC_CMPXCHG_REV(fixed1u, shr, kmp_uint8, 8, >>, 1i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1u_shr_rev
+ATOMIC_CMPXCHG_REV(fixed1, sub, kmp_int8, 8, -, 1i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1_sub_rev
+
+ATOMIC_CMPXCHG_REV(fixed2, div, kmp_int16, 16, /, 2i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2_div_rev
+ATOMIC_CMPXCHG_REV(fixed2u, div, kmp_uint16, 16, /, 2i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2u_div_rev
+ATOMIC_CMPXCHG_REV(fixed2, shl, kmp_int16, 16, <<, 2i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2_shl_rev
+ATOMIC_CMPXCHG_REV(fixed2, shr, kmp_int16, 16, >>, 2i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2_shr_rev
+ATOMIC_CMPXCHG_REV(fixed2u, shr, kmp_uint16, 16, >>, 2i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2u_shr_rev
+ATOMIC_CMPXCHG_REV(fixed2, sub, kmp_int16, 16, -, 2i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2_sub_rev
+
+ATOMIC_CMPXCHG_REV(fixed4, div, kmp_int32, 32, /, 4i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed4_div_rev
+ATOMIC_CMPXCHG_REV(fixed4u, div, kmp_uint32, 32, /, 4i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed4u_div_rev
+ATOMIC_CMPXCHG_REV(fixed4, shl, kmp_int32, 32, <<, 4i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed4_shl_rev
+ATOMIC_CMPXCHG_REV(fixed4, shr, kmp_int32, 32, >>, 4i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed4_shr_rev
+ATOMIC_CMPXCHG_REV(fixed4u, shr, kmp_uint32, 32, >>, 4i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed4u_shr_rev
+ATOMIC_CMPXCHG_REV(fixed4, sub, kmp_int32, 32, -, 4i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed4_sub_rev
+
+ATOMIC_CMPXCHG_REV(fixed8, div, kmp_int64, 64, /, 8i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8_div_rev
+ATOMIC_CMPXCHG_REV(fixed8u, div, kmp_uint64, 64, /, 8i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8u_div_rev
+ATOMIC_CMPXCHG_REV(fixed8, shl, kmp_int64, 64, <<, 8i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8_shl_rev
+ATOMIC_CMPXCHG_REV(fixed8, shr, kmp_int64, 64, >>, 8i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8_shr_rev
+ATOMIC_CMPXCHG_REV(fixed8u, shr, kmp_uint64, 64, >>, 8i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8u_shr_rev
+ATOMIC_CMPXCHG_REV(fixed8, sub, kmp_int64, 64, -, 8i,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8_sub_rev
+
+ATOMIC_CMPXCHG_REV(float4, div, kmp_real32, 32, /, 4r,
+                   KMP_ARCH_X86) // __kmpc_atomic_float4_div_rev
+ATOMIC_CMPXCHG_REV(float4, sub, kmp_real32, 32, -, 4r,
+                   KMP_ARCH_X86) // __kmpc_atomic_float4_sub_rev
+
+ATOMIC_CMPXCHG_REV(float8, div, kmp_real64, 64, /, 8r,
+                   KMP_ARCH_X86) // __kmpc_atomic_float8_div_rev
+ATOMIC_CMPXCHG_REV(float8, sub, kmp_real64, 64, -, 8r,
+                   KMP_ARCH_X86) // __kmpc_atomic_float8_sub_rev
 //                  TYPE_ID,OP_ID, TYPE,     BITS,OP,LCK_ID, GOMP_FLAG
 
 // ------------------------------------------------------------------------
-// Routines for Extended types: long double, _Quad, complex flavours (use critical section)
+// Routines for Extended types: long double, _Quad, complex flavours (use
+// critical section)
 //     TYPE_ID, OP_ID, TYPE - detailed above
 //     OP      - operator
 //     LCK_ID  - lock identifier, used to possibly distinguish lock variable
-#define ATOMIC_CRITICAL_REV(TYPE_ID,OP_ID,TYPE,OP,LCK_ID,GOMP_FLAG)           \
-ATOMIC_BEGIN_REV(TYPE_ID,OP_ID,TYPE,void)                                     \
-    OP_GOMP_CRITICAL_REV(OP,GOMP_FLAG)                                        \
-    OP_CRITICAL_REV(OP,LCK_ID)                                                \
-}
+#define ATOMIC_CRITICAL_REV(TYPE_ID, OP_ID, TYPE, OP, LCK_ID, GOMP_FLAG)       \
+  ATOMIC_BEGIN_REV(TYPE_ID, OP_ID, TYPE, void)                                 \
+  OP_GOMP_CRITICAL_REV(OP, GOMP_FLAG)                                          \
+  OP_CRITICAL_REV(OP, LCK_ID)                                                  \
+  }
 
 /* ------------------------------------------------------------------------- */
 // routines for long double type
-ATOMIC_CRITICAL_REV( float10, sub, long double,     -, 10r,   1 )            // __kmpc_atomic_float10_sub_rev
-ATOMIC_CRITICAL_REV( float10, div, long double,     /, 10r,   1 )            // __kmpc_atomic_float10_div_rev
+ATOMIC_CRITICAL_REV(float10, sub, long double, -, 10r,
+                    1) // __kmpc_atomic_float10_sub_rev
+ATOMIC_CRITICAL_REV(float10, div, long double, /, 10r,
+                    1) // __kmpc_atomic_float10_div_rev
 #if KMP_HAVE_QUAD
 // routines for _Quad type
-ATOMIC_CRITICAL_REV( float16, sub, QUAD_LEGACY,     -, 16r,   1 )            // __kmpc_atomic_float16_sub_rev
-ATOMIC_CRITICAL_REV( float16, div, QUAD_LEGACY,     /, 16r,   1 )            // __kmpc_atomic_float16_div_rev
-#if ( KMP_ARCH_X86 )
-    ATOMIC_CRITICAL_REV( float16, sub_a16, Quad_a16_t, -, 16r, 1 )           // __kmpc_atomic_float16_sub_a16_rev
-    ATOMIC_CRITICAL_REV( float16, div_a16, Quad_a16_t, /, 16r, 1 )           // __kmpc_atomic_float16_div_a16_rev
+ATOMIC_CRITICAL_REV(float16, sub, QUAD_LEGACY, -, 16r,
+                    1) // __kmpc_atomic_float16_sub_rev
+ATOMIC_CRITICAL_REV(float16, div, QUAD_LEGACY, /, 16r,
+                    1) // __kmpc_atomic_float16_div_rev
+#if (KMP_ARCH_X86)
+ATOMIC_CRITICAL_REV(float16, sub_a16, Quad_a16_t, -, 16r,
+                    1) // __kmpc_atomic_float16_sub_a16_rev
+ATOMIC_CRITICAL_REV(float16, div_a16, Quad_a16_t, /, 16r,
+                    1) // __kmpc_atomic_float16_div_a16_rev
 #endif
 #endif
 
 // routines for complex types
-ATOMIC_CRITICAL_REV( cmplx4,  sub, kmp_cmplx32,     -, 8c,    1 )            // __kmpc_atomic_cmplx4_sub_rev
-ATOMIC_CRITICAL_REV( cmplx4,  div, kmp_cmplx32,     /, 8c,    1 )            // __kmpc_atomic_cmplx4_div_rev
-ATOMIC_CRITICAL_REV( cmplx8,  sub, kmp_cmplx64,     -, 16c,   1 )            // __kmpc_atomic_cmplx8_sub_rev
-ATOMIC_CRITICAL_REV( cmplx8,  div, kmp_cmplx64,     /, 16c,   1 )            // __kmpc_atomic_cmplx8_div_rev
-ATOMIC_CRITICAL_REV( cmplx10, sub, kmp_cmplx80,     -, 20c,   1 )            // __kmpc_atomic_cmplx10_sub_rev
-ATOMIC_CRITICAL_REV( cmplx10, div, kmp_cmplx80,     /, 20c,   1 )            // __kmpc_atomic_cmplx10_div_rev
+ATOMIC_CRITICAL_REV(cmplx4, sub, kmp_cmplx32, -, 8c,
+                    1) // __kmpc_atomic_cmplx4_sub_rev
+ATOMIC_CRITICAL_REV(cmplx4, div, kmp_cmplx32, /, 8c,
+                    1) // __kmpc_atomic_cmplx4_div_rev
+ATOMIC_CRITICAL_REV(cmplx8, sub, kmp_cmplx64, -, 16c,
+                    1) // __kmpc_atomic_cmplx8_sub_rev
+ATOMIC_CRITICAL_REV(cmplx8, div, kmp_cmplx64, /, 16c,
+                    1) // __kmpc_atomic_cmplx8_div_rev
+ATOMIC_CRITICAL_REV(cmplx10, sub, kmp_cmplx80, -, 20c,
+                    1) // __kmpc_atomic_cmplx10_sub_rev
+ATOMIC_CRITICAL_REV(cmplx10, div, kmp_cmplx80, /, 20c,
+                    1) // __kmpc_atomic_cmplx10_div_rev
 #if KMP_HAVE_QUAD
-ATOMIC_CRITICAL_REV( cmplx16, sub, CPLX128_LEG,     -, 32c,   1 )            // __kmpc_atomic_cmplx16_sub_rev
-ATOMIC_CRITICAL_REV( cmplx16, div, CPLX128_LEG,     /, 32c,   1 )            // __kmpc_atomic_cmplx16_div_rev
-#if ( KMP_ARCH_X86 )
-    ATOMIC_CRITICAL_REV( cmplx16, sub_a16, kmp_cmplx128_a16_t, -, 32c, 1 )   // __kmpc_atomic_cmplx16_sub_a16_rev
-    ATOMIC_CRITICAL_REV( cmplx16, div_a16, kmp_cmplx128_a16_t, /, 32c, 1 )   // __kmpc_atomic_cmplx16_div_a16_rev
+ATOMIC_CRITICAL_REV(cmplx16, sub, CPLX128_LEG, -, 32c,
+                    1) // __kmpc_atomic_cmplx16_sub_rev
+ATOMIC_CRITICAL_REV(cmplx16, div, CPLX128_LEG, /, 32c,
+                    1) // __kmpc_atomic_cmplx16_div_rev
+#if (KMP_ARCH_X86)
+ATOMIC_CRITICAL_REV(cmplx16, sub_a16, kmp_cmplx128_a16_t, -, 32c,
+                    1) // __kmpc_atomic_cmplx16_sub_a16_rev
+ATOMIC_CRITICAL_REV(cmplx16, div_a16, kmp_cmplx128_a16_t, /, 32c,
+                    1) // __kmpc_atomic_cmplx16_div_a16_rev
 #endif
 #endif
 
-
-#endif //KMP_ARCH_X86 || KMP_ARCH_X86_64
+#endif // KMP_ARCH_X86 || KMP_ARCH_X86_64
 // End of OpenMP 4.0: x = expr binop x for non-commutative operations.
 
-#endif //OMP_40_ENABLED
-
+#endif // OMP_40_ENABLED
 
 /* ------------------------------------------------------------------------ */
 /* Routines for mixed types of LHS and RHS, when RHS is "larger"            */
@@ -1351,156 +1571,242 @@ ATOMIC_CRITICAL_REV( cmplx16, div, CPLX1
 /* Performance penalty expected because of SW emulation use                 */
 /* ------------------------------------------------------------------------ */
 
-#define ATOMIC_BEGIN_MIX(TYPE_ID,TYPE,OP_ID,RTYPE_ID,RTYPE)                                             \
-void __kmpc_atomic_##TYPE_ID##_##OP_ID##_##RTYPE_ID( ident_t *id_ref, int gtid, TYPE * lhs, RTYPE rhs ) \
-{                                                                                                       \
-    KMP_DEBUG_ASSERT( __kmp_init_serial );                                                              \
-    KA_TRACE(100,("__kmpc_atomic_" #TYPE_ID "_" #OP_ID "_" #RTYPE_ID ": T#%d\n", gtid ));
+#define ATOMIC_BEGIN_MIX(TYPE_ID, TYPE, OP_ID, RTYPE_ID, RTYPE)                \
+  void __kmpc_atomic_##TYPE_ID##_##OP_ID##_##RTYPE_ID(                         \
+      ident_t *id_ref, int gtid, TYPE *lhs, RTYPE rhs) {                       \
+    KMP_DEBUG_ASSERT(__kmp_init_serial);                                       \
+    KA_TRACE(100,                                                              \
+             ("__kmpc_atomic_" #TYPE_ID "_" #OP_ID "_" #RTYPE_ID ": T#%d\n",   \
+              gtid));
 
 // -------------------------------------------------------------------------
-#define ATOMIC_CRITICAL_FP(TYPE_ID,TYPE,OP_ID,OP,RTYPE_ID,RTYPE,LCK_ID,GOMP_FLAG)         \
-ATOMIC_BEGIN_MIX(TYPE_ID,TYPE,OP_ID,RTYPE_ID,RTYPE)                                       \
-    OP_GOMP_CRITICAL(OP##=,GOMP_FLAG)  /* send assignment */                              \
-    OP_CRITICAL(OP##=,LCK_ID)  /* send assignment */                                      \
-}
+#define ATOMIC_CRITICAL_FP(TYPE_ID, TYPE, OP_ID, OP, RTYPE_ID, RTYPE, LCK_ID,  \
+                           GOMP_FLAG)                                          \
+  ATOMIC_BEGIN_MIX(TYPE_ID, TYPE, OP_ID, RTYPE_ID, RTYPE)                      \
+  OP_GOMP_CRITICAL(OP## =, GOMP_FLAG) /* send assignment */                    \
+  OP_CRITICAL(OP## =, LCK_ID) /* send assignment */                            \
+  }
 
 // -------------------------------------------------------------------------
 #if KMP_ARCH_X86 || KMP_ARCH_X86_64
 // -------------------------------------------------------------------------
 // X86 or X86_64: no alignment problems ====================================
-#define ATOMIC_CMPXCHG_MIX(TYPE_ID,TYPE,OP_ID,BITS,OP,RTYPE_ID,RTYPE,LCK_ID,MASK,GOMP_FLAG) \
-ATOMIC_BEGIN_MIX(TYPE_ID,TYPE,OP_ID,RTYPE_ID,RTYPE)                                         \
-    OP_GOMP_CRITICAL(OP##=,GOMP_FLAG)                                                       \
-    OP_CMPXCHG(TYPE,BITS,OP)                                                                \
-}
+#define ATOMIC_CMPXCHG_MIX(TYPE_ID, TYPE, OP_ID, BITS, OP, RTYPE_ID, RTYPE,    \
+                           LCK_ID, MASK, GOMP_FLAG)                            \
+  ATOMIC_BEGIN_MIX(TYPE_ID, TYPE, OP_ID, RTYPE_ID, RTYPE)                      \
+  OP_GOMP_CRITICAL(OP## =, GOMP_FLAG)                                          \
+  OP_CMPXCHG(TYPE, BITS, OP)                                                   \
+  }
 // -------------------------------------------------------------------------
 #else
 // ------------------------------------------------------------------------
 // Code for other architectures that don't handle unaligned accesses.
-#define ATOMIC_CMPXCHG_MIX(TYPE_ID,TYPE,OP_ID,BITS,OP,RTYPE_ID,RTYPE,LCK_ID,MASK,GOMP_FLAG) \
-ATOMIC_BEGIN_MIX(TYPE_ID,TYPE,OP_ID,RTYPE_ID,RTYPE)                                         \
-    OP_GOMP_CRITICAL(OP##=,GOMP_FLAG)                                                       \
-    if ( ! ( (kmp_uintptr_t) lhs & 0x##MASK) ) {                                            \
-        OP_CMPXCHG(TYPE,BITS,OP)     /* aligned address */                                  \
-    } else {                                                                                \
-        KMP_CHECK_GTID;                                                                     \
-        OP_CRITICAL(OP##=,LCK_ID)  /* unaligned address - use critical */                   \
-    }                                                                                       \
-}
+#define ATOMIC_CMPXCHG_MIX(TYPE_ID, TYPE, OP_ID, BITS, OP, RTYPE_ID, RTYPE,    \
+                           LCK_ID, MASK, GOMP_FLAG)                            \
+  ATOMIC_BEGIN_MIX(TYPE_ID, TYPE, OP_ID, RTYPE_ID, RTYPE)                      \
+  OP_GOMP_CRITICAL(OP## =, GOMP_FLAG)                                          \
+  if (!((kmp_uintptr_t)lhs & 0x##MASK)) {                                      \
+    OP_CMPXCHG(TYPE, BITS, OP) /* aligned address */                           \
+  } else {                                                                     \
+    KMP_CHECK_GTID;                                                            \
+    OP_CRITICAL(OP## =, LCK_ID) /* unaligned address - use critical */         \
+  }                                                                            \
+  }
 #endif /* KMP_ARCH_X86 || KMP_ARCH_X86_64 */
 
 // -------------------------------------------------------------------------
 #if KMP_ARCH_X86 || KMP_ARCH_X86_64
 // -------------------------------------------------------------------------
-#define ATOMIC_CMPXCHG_REV_MIX(TYPE_ID,TYPE,OP_ID,BITS,OP,RTYPE_ID,RTYPE,LCK_ID,MASK,GOMP_FLAG) \
-ATOMIC_BEGIN_MIX(TYPE_ID,TYPE,OP_ID,RTYPE_ID,RTYPE)                                         \
-    OP_GOMP_CRITICAL_REV(OP,GOMP_FLAG)                                                       \
-    OP_CMPXCHG_REV(TYPE,BITS,OP)                                                                \
-}
-#define ATOMIC_CRITICAL_REV_FP(TYPE_ID,TYPE,OP_ID,OP,RTYPE_ID,RTYPE,LCK_ID,GOMP_FLAG)           \
-ATOMIC_BEGIN_MIX(TYPE_ID,TYPE,OP_ID,RTYPE_ID,RTYPE)                                         \
-    OP_GOMP_CRITICAL_REV(OP,GOMP_FLAG)                                        \
-    OP_CRITICAL_REV(OP,LCK_ID)                                                \
-}
+#define ATOMIC_CMPXCHG_REV_MIX(TYPE_ID, TYPE, OP_ID, BITS, OP, RTYPE_ID,       \
+                               RTYPE, LCK_ID, MASK, GOMP_FLAG)                 \
+  ATOMIC_BEGIN_MIX(TYPE_ID, TYPE, OP_ID, RTYPE_ID, RTYPE)                      \
+  OP_GOMP_CRITICAL_REV(OP, GOMP_FLAG)                                          \
+  OP_CMPXCHG_REV(TYPE, BITS, OP)                                               \
+  }
+#define ATOMIC_CRITICAL_REV_FP(TYPE_ID, TYPE, OP_ID, OP, RTYPE_ID, RTYPE,      \
+                               LCK_ID, GOMP_FLAG)                              \
+  ATOMIC_BEGIN_MIX(TYPE_ID, TYPE, OP_ID, RTYPE_ID, RTYPE)                      \
+  OP_GOMP_CRITICAL_REV(OP, GOMP_FLAG)                                          \
+  OP_CRITICAL_REV(OP, LCK_ID)                                                  \
+  }
 #endif /* KMP_ARCH_X86 || KMP_ARCH_X86_64 */
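Before the instantiations that follow, a non-atomic sketch of what a mixed-type update such as __kmpc_atomic_fixed2_mul_float8 does with its operands: the usual C arithmetic conversions widen the 16-bit lhs to the wider RHS type for the multiply, and the result is converted back on the store. The helper name below is mine, illustration only:
@code
#include <stdio.h>

// Widen, operate, narrow: the same conversions the mixed-type entry points
// rely on, shown without any atomicity.
static void mul_short_by_double(short *lhs, double rhs) {
  *lhs = (short)(*lhs * rhs);
}

int main(void) {
  short x = 7;
  mul_short_by_double(&x, 2.5);
  printf("%d\n", x); // 17 (7 * 2.5 = 17.5, truncated when stored back)
  return 0;
}
@endcode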
 
 // RHS=float8
-ATOMIC_CMPXCHG_MIX( fixed1, char,       mul,  8, *, float8, kmp_real64, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1_mul_float8
-ATOMIC_CMPXCHG_MIX( fixed1, char,       div,  8, /, float8, kmp_real64, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1_div_float8
-ATOMIC_CMPXCHG_MIX( fixed2, short,      mul, 16, *, float8, kmp_real64, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2_mul_float8
-ATOMIC_CMPXCHG_MIX( fixed2, short,      div, 16, /, float8, kmp_real64, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2_div_float8
-ATOMIC_CMPXCHG_MIX( fixed4, kmp_int32,  mul, 32, *, float8, kmp_real64, 4i, 3, 0 )            // __kmpc_atomic_fixed4_mul_float8
-ATOMIC_CMPXCHG_MIX( fixed4, kmp_int32,  div, 32, /, float8, kmp_real64, 4i, 3, 0 )            // __kmpc_atomic_fixed4_div_float8
-ATOMIC_CMPXCHG_MIX( fixed8, kmp_int64,  mul, 64, *, float8, kmp_real64, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_mul_float8
-ATOMIC_CMPXCHG_MIX( fixed8, kmp_int64,  div, 64, /, float8, kmp_real64, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_div_float8
-ATOMIC_CMPXCHG_MIX( float4, kmp_real32, add, 32, +, float8, kmp_real64, 4r, 3, KMP_ARCH_X86 ) // __kmpc_atomic_float4_add_float8
-ATOMIC_CMPXCHG_MIX( float4, kmp_real32, sub, 32, -, float8, kmp_real64, 4r, 3, KMP_ARCH_X86 ) // __kmpc_atomic_float4_sub_float8
-ATOMIC_CMPXCHG_MIX( float4, kmp_real32, mul, 32, *, float8, kmp_real64, 4r, 3, KMP_ARCH_X86 ) // __kmpc_atomic_float4_mul_float8
-ATOMIC_CMPXCHG_MIX( float4, kmp_real32, div, 32, /, float8, kmp_real64, 4r, 3, KMP_ARCH_X86 ) // __kmpc_atomic_float4_div_float8
+ATOMIC_CMPXCHG_MIX(fixed1, char, mul, 8, *, float8, kmp_real64, 1i, 0,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1_mul_float8
+ATOMIC_CMPXCHG_MIX(fixed1, char, div, 8, /, float8, kmp_real64, 1i, 0,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1_div_float8
+ATOMIC_CMPXCHG_MIX(fixed2, short, mul, 16, *, float8, kmp_real64, 2i, 1,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2_mul_float8
+ATOMIC_CMPXCHG_MIX(fixed2, short, div, 16, /, float8, kmp_real64, 2i, 1,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2_div_float8
+ATOMIC_CMPXCHG_MIX(fixed4, kmp_int32, mul, 32, *, float8, kmp_real64, 4i, 3,
+                   0) // __kmpc_atomic_fixed4_mul_float8
+ATOMIC_CMPXCHG_MIX(fixed4, kmp_int32, div, 32, /, float8, kmp_real64, 4i, 3,
+                   0) // __kmpc_atomic_fixed4_div_float8
+ATOMIC_CMPXCHG_MIX(fixed8, kmp_int64, mul, 64, *, float8, kmp_real64, 8i, 7,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8_mul_float8
+ATOMIC_CMPXCHG_MIX(fixed8, kmp_int64, div, 64, /, float8, kmp_real64, 8i, 7,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8_div_float8
+ATOMIC_CMPXCHG_MIX(float4, kmp_real32, add, 32, +, float8, kmp_real64, 4r, 3,
+                   KMP_ARCH_X86) // __kmpc_atomic_float4_add_float8
+ATOMIC_CMPXCHG_MIX(float4, kmp_real32, sub, 32, -, float8, kmp_real64, 4r, 3,
+                   KMP_ARCH_X86) // __kmpc_atomic_float4_sub_float8
+ATOMIC_CMPXCHG_MIX(float4, kmp_real32, mul, 32, *, float8, kmp_real64, 4r, 3,
+                   KMP_ARCH_X86) // __kmpc_atomic_float4_mul_float8
+ATOMIC_CMPXCHG_MIX(float4, kmp_real32, div, 32, /, float8, kmp_real64, 4r, 3,
+                   KMP_ARCH_X86) // __kmpc_atomic_float4_div_float8
 
-// RHS=float16 (deprecated, to be removed when we are sure the compiler does not use them)
+// RHS=float16 (deprecated, to be removed when we are sure the compiler does not
+// use them)
 #if KMP_HAVE_QUAD
-ATOMIC_CMPXCHG_MIX( fixed1,  char,       add,  8, +, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1_add_fp
-ATOMIC_CMPXCHG_MIX( fixed1u, uchar,      add,  8, +, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1u_add_fp
-ATOMIC_CMPXCHG_MIX( fixed1,  char,       sub,  8, -, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1_sub_fp
-ATOMIC_CMPXCHG_MIX( fixed1u, uchar,      sub,  8, -, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1u_sub_fp
-ATOMIC_CMPXCHG_MIX( fixed1,  char,       mul,  8, *, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1_mul_fp
-ATOMIC_CMPXCHG_MIX( fixed1u, uchar,      mul,  8, *, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1u_mul_fp
-ATOMIC_CMPXCHG_MIX( fixed1,  char,       div,  8, /, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1_div_fp
-ATOMIC_CMPXCHG_MIX( fixed1u, uchar,      div,  8, /, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1u_div_fp
-
-ATOMIC_CMPXCHG_MIX( fixed2,  short,      add, 16, +, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2_add_fp
-ATOMIC_CMPXCHG_MIX( fixed2u, ushort,     add, 16, +, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2u_add_fp
-ATOMIC_CMPXCHG_MIX( fixed2,  short,      sub, 16, -, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2_sub_fp
-ATOMIC_CMPXCHG_MIX( fixed2u, ushort,     sub, 16, -, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2u_sub_fp
-ATOMIC_CMPXCHG_MIX( fixed2,  short,      mul, 16, *, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2_mul_fp
-ATOMIC_CMPXCHG_MIX( fixed2u, ushort,     mul, 16, *, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2u_mul_fp
-ATOMIC_CMPXCHG_MIX( fixed2,  short,      div, 16, /, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2_div_fp
-ATOMIC_CMPXCHG_MIX( fixed2u, ushort,     div, 16, /, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2u_div_fp
-
-ATOMIC_CMPXCHG_MIX( fixed4,  kmp_int32,  add, 32, +, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4_add_fp
-ATOMIC_CMPXCHG_MIX( fixed4u, kmp_uint32, add, 32, +, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4u_add_fp
-ATOMIC_CMPXCHG_MIX( fixed4,  kmp_int32,  sub, 32, -, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4_sub_fp
-ATOMIC_CMPXCHG_MIX( fixed4u, kmp_uint32, sub, 32, -, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4u_sub_fp
-ATOMIC_CMPXCHG_MIX( fixed4,  kmp_int32,  mul, 32, *, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4_mul_fp
-ATOMIC_CMPXCHG_MIX( fixed4u, kmp_uint32, mul, 32, *, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4u_mul_fp
-ATOMIC_CMPXCHG_MIX( fixed4,  kmp_int32,  div, 32, /, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4_div_fp
-ATOMIC_CMPXCHG_MIX( fixed4u, kmp_uint32, div, 32, /, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4u_div_fp
-
-ATOMIC_CMPXCHG_MIX( fixed8,  kmp_int64,  add, 64, +, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_add_fp
-ATOMIC_CMPXCHG_MIX( fixed8u, kmp_uint64, add, 64, +, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8u_add_fp
-ATOMIC_CMPXCHG_MIX( fixed8,  kmp_int64,  sub, 64, -, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_sub_fp
-ATOMIC_CMPXCHG_MIX( fixed8u, kmp_uint64, sub, 64, -, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8u_sub_fp
-ATOMIC_CMPXCHG_MIX( fixed8,  kmp_int64,  mul, 64, *, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_mul_fp
-ATOMIC_CMPXCHG_MIX( fixed8u, kmp_uint64, mul, 64, *, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8u_mul_fp
-ATOMIC_CMPXCHG_MIX( fixed8,  kmp_int64,  div, 64, /, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_div_fp
-ATOMIC_CMPXCHG_MIX( fixed8u, kmp_uint64, div, 64, /, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8u_div_fp
-
-ATOMIC_CMPXCHG_MIX( float4,  kmp_real32, add, 32, +, fp, _Quad, 4r, 3, KMP_ARCH_X86 ) // __kmpc_atomic_float4_add_fp
-ATOMIC_CMPXCHG_MIX( float4,  kmp_real32, sub, 32, -, fp, _Quad, 4r, 3, KMP_ARCH_X86 ) // __kmpc_atomic_float4_sub_fp
-ATOMIC_CMPXCHG_MIX( float4,  kmp_real32, mul, 32, *, fp, _Quad, 4r, 3, KMP_ARCH_X86 ) // __kmpc_atomic_float4_mul_fp
-ATOMIC_CMPXCHG_MIX( float4,  kmp_real32, div, 32, /, fp, _Quad, 4r, 3, KMP_ARCH_X86 ) // __kmpc_atomic_float4_div_fp
-
-ATOMIC_CMPXCHG_MIX( float8,  kmp_real64, add, 64, +, fp, _Quad, 8r, 7, KMP_ARCH_X86 ) // __kmpc_atomic_float8_add_fp
-ATOMIC_CMPXCHG_MIX( float8,  kmp_real64, sub, 64, -, fp, _Quad, 8r, 7, KMP_ARCH_X86 ) // __kmpc_atomic_float8_sub_fp
-ATOMIC_CMPXCHG_MIX( float8,  kmp_real64, mul, 64, *, fp, _Quad, 8r, 7, KMP_ARCH_X86 ) // __kmpc_atomic_float8_mul_fp
-ATOMIC_CMPXCHG_MIX( float8,  kmp_real64, div, 64, /, fp, _Quad, 8r, 7, KMP_ARCH_X86 ) // __kmpc_atomic_float8_div_fp
-
-ATOMIC_CRITICAL_FP( float10, long double,    add, +, fp, _Quad, 10r,   1 )            // __kmpc_atomic_float10_add_fp
-ATOMIC_CRITICAL_FP( float10, long double,    sub, -, fp, _Quad, 10r,   1 )            // __kmpc_atomic_float10_sub_fp
-ATOMIC_CRITICAL_FP( float10, long double,    mul, *, fp, _Quad, 10r,   1 )            // __kmpc_atomic_float10_mul_fp
-ATOMIC_CRITICAL_FP( float10, long double,    div, /, fp, _Quad, 10r,   1 )            // __kmpc_atomic_float10_div_fp
+ATOMIC_CMPXCHG_MIX(fixed1, char, add, 8, +, fp, _Quad, 1i, 0,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1_add_fp
+ATOMIC_CMPXCHG_MIX(fixed1u, uchar, add, 8, +, fp, _Quad, 1i, 0,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1u_add_fp
+ATOMIC_CMPXCHG_MIX(fixed1, char, sub, 8, -, fp, _Quad, 1i, 0,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1_sub_fp
+ATOMIC_CMPXCHG_MIX(fixed1u, uchar, sub, 8, -, fp, _Quad, 1i, 0,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1u_sub_fp
+ATOMIC_CMPXCHG_MIX(fixed1, char, mul, 8, *, fp, _Quad, 1i, 0,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1_mul_fp
+ATOMIC_CMPXCHG_MIX(fixed1u, uchar, mul, 8, *, fp, _Quad, 1i, 0,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1u_mul_fp
+ATOMIC_CMPXCHG_MIX(fixed1, char, div, 8, /, fp, _Quad, 1i, 0,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1_div_fp
+ATOMIC_CMPXCHG_MIX(fixed1u, uchar, div, 8, /, fp, _Quad, 1i, 0,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1u_div_fp
+
+ATOMIC_CMPXCHG_MIX(fixed2, short, add, 16, +, fp, _Quad, 2i, 1,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2_add_fp
+ATOMIC_CMPXCHG_MIX(fixed2u, ushort, add, 16, +, fp, _Quad, 2i, 1,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2u_add_fp
+ATOMIC_CMPXCHG_MIX(fixed2, short, sub, 16, -, fp, _Quad, 2i, 1,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2_sub_fp
+ATOMIC_CMPXCHG_MIX(fixed2u, ushort, sub, 16, -, fp, _Quad, 2i, 1,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2u_sub_fp
+ATOMIC_CMPXCHG_MIX(fixed2, short, mul, 16, *, fp, _Quad, 2i, 1,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2_mul_fp
+ATOMIC_CMPXCHG_MIX(fixed2u, ushort, mul, 16, *, fp, _Quad, 2i, 1,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2u_mul_fp
+ATOMIC_CMPXCHG_MIX(fixed2, short, div, 16, /, fp, _Quad, 2i, 1,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2_div_fp
+ATOMIC_CMPXCHG_MIX(fixed2u, ushort, div, 16, /, fp, _Quad, 2i, 1,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2u_div_fp
+
+ATOMIC_CMPXCHG_MIX(fixed4, kmp_int32, add, 32, +, fp, _Quad, 4i, 3,
+                   0) // __kmpc_atomic_fixed4_add_fp
+ATOMIC_CMPXCHG_MIX(fixed4u, kmp_uint32, add, 32, +, fp, _Quad, 4i, 3,
+                   0) // __kmpc_atomic_fixed4u_add_fp
+ATOMIC_CMPXCHG_MIX(fixed4, kmp_int32, sub, 32, -, fp, _Quad, 4i, 3,
+                   0) // __kmpc_atomic_fixed4_sub_fp
+ATOMIC_CMPXCHG_MIX(fixed4u, kmp_uint32, sub, 32, -, fp, _Quad, 4i, 3,
+                   0) // __kmpc_atomic_fixed4u_sub_fp
+ATOMIC_CMPXCHG_MIX(fixed4, kmp_int32, mul, 32, *, fp, _Quad, 4i, 3,
+                   0) // __kmpc_atomic_fixed4_mul_fp
+ATOMIC_CMPXCHG_MIX(fixed4u, kmp_uint32, mul, 32, *, fp, _Quad, 4i, 3,
+                   0) // __kmpc_atomic_fixed4u_mul_fp
+ATOMIC_CMPXCHG_MIX(fixed4, kmp_int32, div, 32, /, fp, _Quad, 4i, 3,
+                   0) // __kmpc_atomic_fixed4_div_fp
+ATOMIC_CMPXCHG_MIX(fixed4u, kmp_uint32, div, 32, /, fp, _Quad, 4i, 3,
+                   0) // __kmpc_atomic_fixed4u_div_fp
+
+ATOMIC_CMPXCHG_MIX(fixed8, kmp_int64, add, 64, +, fp, _Quad, 8i, 7,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8_add_fp
+ATOMIC_CMPXCHG_MIX(fixed8u, kmp_uint64, add, 64, +, fp, _Quad, 8i, 7,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8u_add_fp
+ATOMIC_CMPXCHG_MIX(fixed8, kmp_int64, sub, 64, -, fp, _Quad, 8i, 7,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8_sub_fp
+ATOMIC_CMPXCHG_MIX(fixed8u, kmp_uint64, sub, 64, -, fp, _Quad, 8i, 7,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8u_sub_fp
+ATOMIC_CMPXCHG_MIX(fixed8, kmp_int64, mul, 64, *, fp, _Quad, 8i, 7,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8_mul_fp
+ATOMIC_CMPXCHG_MIX(fixed8u, kmp_uint64, mul, 64, *, fp, _Quad, 8i, 7,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8u_mul_fp
+ATOMIC_CMPXCHG_MIX(fixed8, kmp_int64, div, 64, /, fp, _Quad, 8i, 7,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8_div_fp
+ATOMIC_CMPXCHG_MIX(fixed8u, kmp_uint64, div, 64, /, fp, _Quad, 8i, 7,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8u_div_fp
+
+ATOMIC_CMPXCHG_MIX(float4, kmp_real32, add, 32, +, fp, _Quad, 4r, 3,
+                   KMP_ARCH_X86) // __kmpc_atomic_float4_add_fp
+ATOMIC_CMPXCHG_MIX(float4, kmp_real32, sub, 32, -, fp, _Quad, 4r, 3,
+                   KMP_ARCH_X86) // __kmpc_atomic_float4_sub_fp
+ATOMIC_CMPXCHG_MIX(float4, kmp_real32, mul, 32, *, fp, _Quad, 4r, 3,
+                   KMP_ARCH_X86) // __kmpc_atomic_float4_mul_fp
+ATOMIC_CMPXCHG_MIX(float4, kmp_real32, div, 32, /, fp, _Quad, 4r, 3,
+                   KMP_ARCH_X86) // __kmpc_atomic_float4_div_fp
+
+ATOMIC_CMPXCHG_MIX(float8, kmp_real64, add, 64, +, fp, _Quad, 8r, 7,
+                   KMP_ARCH_X86) // __kmpc_atomic_float8_add_fp
+ATOMIC_CMPXCHG_MIX(float8, kmp_real64, sub, 64, -, fp, _Quad, 8r, 7,
+                   KMP_ARCH_X86) // __kmpc_atomic_float8_sub_fp
+ATOMIC_CMPXCHG_MIX(float8, kmp_real64, mul, 64, *, fp, _Quad, 8r, 7,
+                   KMP_ARCH_X86) // __kmpc_atomic_float8_mul_fp
+ATOMIC_CMPXCHG_MIX(float8, kmp_real64, div, 64, /, fp, _Quad, 8r, 7,
+                   KMP_ARCH_X86) // __kmpc_atomic_float8_div_fp
+
+ATOMIC_CRITICAL_FP(float10, long double, add, +, fp, _Quad, 10r,
+                   1) // __kmpc_atomic_float10_add_fp
+ATOMIC_CRITICAL_FP(float10, long double, sub, -, fp, _Quad, 10r,
+                   1) // __kmpc_atomic_float10_sub_fp
+ATOMIC_CRITICAL_FP(float10, long double, mul, *, fp, _Quad, 10r,
+                   1) // __kmpc_atomic_float10_mul_fp
+ATOMIC_CRITICAL_FP(float10, long double, div, /, fp, _Quad, 10r,
+                   1) // __kmpc_atomic_float10_div_fp
 
 #if KMP_ARCH_X86 || KMP_ARCH_X86_64
 // Reverse operations
-ATOMIC_CMPXCHG_REV_MIX( fixed1,  char,       sub_rev,  8, -, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1_sub_rev_fp
-ATOMIC_CMPXCHG_REV_MIX( fixed1u, uchar,      sub_rev,  8, -, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1u_sub_rev_fp
-ATOMIC_CMPXCHG_REV_MIX( fixed1,  char,       div_rev,  8, /, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1_div_rev_fp
-ATOMIC_CMPXCHG_REV_MIX( fixed1u, uchar,      div_rev,  8, /, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1u_div_rev_fp
-
-ATOMIC_CMPXCHG_REV_MIX( fixed2,  short,      sub_rev, 16, -, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2_sub_rev_fp
-ATOMIC_CMPXCHG_REV_MIX( fixed2u, ushort,     sub_rev, 16, -, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2u_sub_rev_fp
-ATOMIC_CMPXCHG_REV_MIX( fixed2,  short,      div_rev, 16, /, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2_div_rev_fp
-ATOMIC_CMPXCHG_REV_MIX( fixed2u, ushort,     div_rev, 16, /, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2u_div_rev_fp
-
-ATOMIC_CMPXCHG_REV_MIX( fixed4,  kmp_int32,  sub_rev, 32, -, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4_sub_rev_fp
-ATOMIC_CMPXCHG_REV_MIX( fixed4u, kmp_uint32, sub_rev, 32, -, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4u_sub_rev_fp
-ATOMIC_CMPXCHG_REV_MIX( fixed4,  kmp_int32,  div_rev, 32, /, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4_div_rev_fp
-ATOMIC_CMPXCHG_REV_MIX( fixed4u, kmp_uint32, div_rev, 32, /, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4u_div_rev_fp
-
-ATOMIC_CMPXCHG_REV_MIX( fixed8,  kmp_int64,  sub_rev, 64, -, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_sub_rev_fp
-ATOMIC_CMPXCHG_REV_MIX( fixed8u, kmp_uint64, sub_rev, 64, -, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8u_sub_rev_fp
-ATOMIC_CMPXCHG_REV_MIX( fixed8,  kmp_int64,  div_rev, 64, /, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_div_rev_fp
-ATOMIC_CMPXCHG_REV_MIX( fixed8u, kmp_uint64, div_rev, 64, /, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8u_div_rev_fp
-
-ATOMIC_CMPXCHG_REV_MIX( float4,  kmp_real32, sub_rev, 32, -, fp, _Quad, 4r, 3, KMP_ARCH_X86 ) // __kmpc_atomic_float4_sub_rev_fp
-ATOMIC_CMPXCHG_REV_MIX( float4,  kmp_real32, div_rev, 32, /, fp, _Quad, 4r, 3, KMP_ARCH_X86 ) // __kmpc_atomic_float4_div_rev_fp
-
-ATOMIC_CMPXCHG_REV_MIX( float8,  kmp_real64, sub_rev, 64, -, fp, _Quad, 8r, 7, KMP_ARCH_X86 ) // __kmpc_atomic_float8_sub_rev_fp
-ATOMIC_CMPXCHG_REV_MIX( float8,  kmp_real64, div_rev, 64, /, fp, _Quad, 8r, 7, KMP_ARCH_X86 ) // __kmpc_atomic_float8_div_rev_fp
-
-ATOMIC_CRITICAL_REV_FP( float10, long double,    sub_rev, -, fp, _Quad, 10r,   1 )            // __kmpc_atomic_float10_sub_rev_fp
-ATOMIC_CRITICAL_REV_FP( float10, long double,    div_rev, /, fp, _Quad, 10r,   1 )            // __kmpc_atomic_float10_div_rev_fp
+ATOMIC_CMPXCHG_REV_MIX(fixed1, char, sub_rev, 8, -, fp, _Quad, 1i, 0,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed1_sub_rev_fp
+ATOMIC_CMPXCHG_REV_MIX(fixed1u, uchar, sub_rev, 8, -, fp, _Quad, 1i, 0,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed1u_sub_rev_fp
+ATOMIC_CMPXCHG_REV_MIX(fixed1, char, div_rev, 8, /, fp, _Quad, 1i, 0,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed1_div_rev_fp
+ATOMIC_CMPXCHG_REV_MIX(fixed1u, uchar, div_rev, 8, /, fp, _Quad, 1i, 0,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed1u_div_rev_fp
+
+ATOMIC_CMPXCHG_REV_MIX(fixed2, short, sub_rev, 16, -, fp, _Quad, 2i, 1,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed2_sub_rev_fp
+ATOMIC_CMPXCHG_REV_MIX(fixed2u, ushort, sub_rev, 16, -, fp, _Quad, 2i, 1,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed2u_sub_rev_fp
+ATOMIC_CMPXCHG_REV_MIX(fixed2, short, div_rev, 16, /, fp, _Quad, 2i, 1,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed2_div_rev_fp
+ATOMIC_CMPXCHG_REV_MIX(fixed2u, ushort, div_rev, 16, /, fp, _Quad, 2i, 1,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed2u_div_rev_fp
+
+ATOMIC_CMPXCHG_REV_MIX(fixed4, kmp_int32, sub_rev, 32, -, fp, _Quad, 4i, 3,
+                       0) // __kmpc_atomic_fixed4_sub_rev_fp
+ATOMIC_CMPXCHG_REV_MIX(fixed4u, kmp_uint32, sub_rev, 32, -, fp, _Quad, 4i, 3,
+                       0) // __kmpc_atomic_fixed4u_sub_rev_fp
+ATOMIC_CMPXCHG_REV_MIX(fixed4, kmp_int32, div_rev, 32, /, fp, _Quad, 4i, 3,
+                       0) // __kmpc_atomic_fixed4_div_rev_fp
+ATOMIC_CMPXCHG_REV_MIX(fixed4u, kmp_uint32, div_rev, 32, /, fp, _Quad, 4i, 3,
+                       0) // __kmpc_atomic_fixed4u_div_rev_fp
+
+ATOMIC_CMPXCHG_REV_MIX(fixed8, kmp_int64, sub_rev, 64, -, fp, _Quad, 8i, 7,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed8_sub_rev_fp
+ATOMIC_CMPXCHG_REV_MIX(fixed8u, kmp_uint64, sub_rev, 64, -, fp, _Quad, 8i, 7,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed8u_sub_rev_fp
+ATOMIC_CMPXCHG_REV_MIX(fixed8, kmp_int64, div_rev, 64, /, fp, _Quad, 8i, 7,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed8_div_rev_fp
+ATOMIC_CMPXCHG_REV_MIX(fixed8u, kmp_uint64, div_rev, 64, /, fp, _Quad, 8i, 7,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed8u_div_rev_fp
+
+ATOMIC_CMPXCHG_REV_MIX(float4, kmp_real32, sub_rev, 32, -, fp, _Quad, 4r, 3,
+                       KMP_ARCH_X86) // __kmpc_atomic_float4_sub_rev_fp
+ATOMIC_CMPXCHG_REV_MIX(float4, kmp_real32, div_rev, 32, /, fp, _Quad, 4r, 3,
+                       KMP_ARCH_X86) // __kmpc_atomic_float4_div_rev_fp
+
+ATOMIC_CMPXCHG_REV_MIX(float8, kmp_real64, sub_rev, 64, -, fp, _Quad, 8r, 7,
+                       KMP_ARCH_X86) // __kmpc_atomic_float8_sub_rev_fp
+ATOMIC_CMPXCHG_REV_MIX(float8, kmp_real64, div_rev, 64, /, fp, _Quad, 8r, 7,
+                       KMP_ARCH_X86) // __kmpc_atomic_float8_div_rev_fp
+
+ATOMIC_CRITICAL_REV_FP(float10, long double, sub_rev, -, fp, _Quad, 10r,
+                       1) // __kmpc_atomic_float10_sub_rev_fp
+ATOMIC_CRITICAL_REV_FP(float10, long double, div_rev, /, fp, _Quad, 10r,
+                       1) // __kmpc_atomic_float10_div_rev_fp
 #endif /* KMP_ARCH_X86 || KMP_ARCH_X86_64 */
 
 #endif
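
For orientation, a mixed-operand entrypoint such as `__kmpc_atomic_float8_add_fp` above keeps the type of the left-hand side and takes the wider type on the right. A minimal sketch of the prototype it is expected to expose, assuming ATOMIC_BEGIN_MIX follows the same parameter layout as the plain update functions (an assumption, not verified against the header):
@code
// Sketch only: parameter layout assumed from the generic update form.
// lhs keeps its own type (kmp_real64); rhs carries the wider _Quad operand.
void __kmpc_atomic_float8_add_fp(ident_t *id_ref, int gtid,
                                 kmp_real64 *lhs, _Quad rhs);
@endcode
A compiler that chooses not to inline the operation could then lower an atomic `double += quad` update into a call to this entrypoint.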
@@ -1510,57 +1816,63 @@ ATOMIC_CRITICAL_REV_FP( float10, long do
 // X86 or X86_64: no alignment problems ====================================
 #if USE_CMPXCHG_FIX
 // workaround for C78287 (complex(kind=4) data type)
-#define ATOMIC_CMPXCHG_CMPLX(TYPE_ID,TYPE,OP_ID,BITS,OP,RTYPE_ID,RTYPE,LCK_ID,MASK,GOMP_FLAG) \
-ATOMIC_BEGIN_MIX(TYPE_ID,TYPE,OP_ID,RTYPE_ID,RTYPE)                                           \
-    OP_GOMP_CRITICAL(OP##=,GOMP_FLAG)                                                         \
-    OP_CMPXCHG_WORKAROUND(TYPE,BITS,OP)                                                       \
-}
+#define ATOMIC_CMPXCHG_CMPLX(TYPE_ID, TYPE, OP_ID, BITS, OP, RTYPE_ID, RTYPE,  \
+                             LCK_ID, MASK, GOMP_FLAG)                          \
+  ATOMIC_BEGIN_MIX(TYPE_ID, TYPE, OP_ID, RTYPE_ID, RTYPE)                      \
+  OP_GOMP_CRITICAL(OP## =, GOMP_FLAG)                                          \
+  OP_CMPXCHG_WORKAROUND(TYPE, BITS, OP)                                        \
+  }
 // end of the second part of the workaround for C78287
 #else
-#define ATOMIC_CMPXCHG_CMPLX(TYPE_ID,TYPE,OP_ID,BITS,OP,RTYPE_ID,RTYPE,LCK_ID,MASK,GOMP_FLAG) \
-ATOMIC_BEGIN_MIX(TYPE_ID,TYPE,OP_ID,RTYPE_ID,RTYPE)                                           \
-    OP_GOMP_CRITICAL(OP##=,GOMP_FLAG)                                                         \
-    OP_CMPXCHG(TYPE,BITS,OP)                                                                  \
-}
+#define ATOMIC_CMPXCHG_CMPLX(TYPE_ID, TYPE, OP_ID, BITS, OP, RTYPE_ID, RTYPE,  \
+                             LCK_ID, MASK, GOMP_FLAG)                          \
+  ATOMIC_BEGIN_MIX(TYPE_ID, TYPE, OP_ID, RTYPE_ID, RTYPE)                      \
+  OP_GOMP_CRITICAL(OP## =, GOMP_FLAG)                                          \
+  OP_CMPXCHG(TYPE, BITS, OP)                                                   \
+  }
 #endif // USE_CMPXCHG_FIX
 #else
 // ------------------------------------------------------------------------
 // Code for other architectures that don't handle unaligned accesses.
-#define ATOMIC_CMPXCHG_CMPLX(TYPE_ID,TYPE,OP_ID,BITS,OP,RTYPE_ID,RTYPE,LCK_ID,MASK,GOMP_FLAG) \
-ATOMIC_BEGIN_MIX(TYPE_ID,TYPE,OP_ID,RTYPE_ID,RTYPE)                                           \
-    OP_GOMP_CRITICAL(OP##=,GOMP_FLAG)                                                         \
-    if ( ! ( (kmp_uintptr_t) lhs & 0x##MASK) ) {                                              \
-        OP_CMPXCHG(TYPE,BITS,OP)     /* aligned address */                                    \
-    } else {                                                                                  \
-        KMP_CHECK_GTID;                                                                       \
-        OP_CRITICAL(OP##=,LCK_ID)  /* unaligned address - use critical */                     \
-    }                                                                                         \
-}
+#define ATOMIC_CMPXCHG_CMPLX(TYPE_ID, TYPE, OP_ID, BITS, OP, RTYPE_ID, RTYPE,  \
+                             LCK_ID, MASK, GOMP_FLAG)                          \
+  ATOMIC_BEGIN_MIX(TYPE_ID, TYPE, OP_ID, RTYPE_ID, RTYPE)                      \
+  OP_GOMP_CRITICAL(OP## =, GOMP_FLAG)                                          \
+  if (!((kmp_uintptr_t)lhs & 0x##MASK)) {                                      \
+    OP_CMPXCHG(TYPE, BITS, OP) /* aligned address */                           \
+  } else {                                                                     \
+    KMP_CHECK_GTID;                                                            \
+    OP_CRITICAL(OP## =, LCK_ID) /* unaligned address - use critical */         \
+  }                                                                            \
+  }
 #endif /* KMP_ARCH_X86 || KMP_ARCH_X86_64 */
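
On the non-x86 path above, the fallback to a critical section is taken only when `lhs` is misaligned; the MASK argument is the operand size minus one. A standalone illustration of that test (plain C, independent of the KMP_* macros; the helper name is made up for this sketch):
@code
#include <stdint.h>

/* Nonzero when p is aligned well enough for an 8-byte compare-exchange.
   0x7 plays the role of the 0x##MASK test in the macro above (size - 1). */
static int is_aligned_for_cmpxchg8(const void *p) {
  return ((uintptr_t)p & (uintptr_t)0x7) == 0;
}
@endcode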
 
-ATOMIC_CMPXCHG_CMPLX( cmplx4, kmp_cmplx32, add, 64, +, cmplx8,  kmp_cmplx64,  8c, 7, KMP_ARCH_X86 ) // __kmpc_atomic_cmplx4_add_cmplx8
-ATOMIC_CMPXCHG_CMPLX( cmplx4, kmp_cmplx32, sub, 64, -, cmplx8,  kmp_cmplx64,  8c, 7, KMP_ARCH_X86 ) // __kmpc_atomic_cmplx4_sub_cmplx8
-ATOMIC_CMPXCHG_CMPLX( cmplx4, kmp_cmplx32, mul, 64, *, cmplx8,  kmp_cmplx64,  8c, 7, KMP_ARCH_X86 ) // __kmpc_atomic_cmplx4_mul_cmplx8
-ATOMIC_CMPXCHG_CMPLX( cmplx4, kmp_cmplx32, div, 64, /, cmplx8,  kmp_cmplx64,  8c, 7, KMP_ARCH_X86 ) // __kmpc_atomic_cmplx4_div_cmplx8
+ATOMIC_CMPXCHG_CMPLX(cmplx4, kmp_cmplx32, add, 64, +, cmplx8, kmp_cmplx64, 8c,
+                     7, KMP_ARCH_X86) // __kmpc_atomic_cmplx4_add_cmplx8
+ATOMIC_CMPXCHG_CMPLX(cmplx4, kmp_cmplx32, sub, 64, -, cmplx8, kmp_cmplx64, 8c,
+                     7, KMP_ARCH_X86) // __kmpc_atomic_cmplx4_sub_cmplx8
+ATOMIC_CMPXCHG_CMPLX(cmplx4, kmp_cmplx32, mul, 64, *, cmplx8, kmp_cmplx64, 8c,
+                     7, KMP_ARCH_X86) // __kmpc_atomic_cmplx4_mul_cmplx8
+ATOMIC_CMPXCHG_CMPLX(cmplx4, kmp_cmplx32, div, 64, /, cmplx8, kmp_cmplx64, 8c,
+                     7, KMP_ARCH_X86) // __kmpc_atomic_cmplx4_div_cmplx8
 
 // READ, WRITE, CAPTURE are supported only on IA-32 architecture and Intel(R) 64
 #if KMP_ARCH_X86 || KMP_ARCH_X86_64
 
-//////////////////////////////////////////////////////////////////////////////////////////////////////
 // ------------------------------------------------------------------------
 // Atomic READ routines
-// ------------------------------------------------------------------------
 
 // ------------------------------------------------------------------------
 // Beginning of a definition (provides name, parameters, debug trace)
-//     TYPE_ID - operands type and size (fixed*, fixed*u for signed, unsigned fixed)
+//     TYPE_ID - operands type and size (fixed*, fixed*u for signed, unsigned
+//     fixed)
 //     OP_ID   - operation identifier (add, sub, mul, ...)
 //     TYPE    - operands' type
-#define ATOMIC_BEGIN_READ(TYPE_ID,OP_ID,TYPE, RET_TYPE) \
-RET_TYPE __kmpc_atomic_##TYPE_ID##_##OP_ID( ident_t *id_ref, int gtid, TYPE * loc ) \
-{                                                                                   \
-    KMP_DEBUG_ASSERT( __kmp_init_serial );                                          \
-    KA_TRACE(100,("__kmpc_atomic_" #TYPE_ID "_" #OP_ID ": T#%d\n", gtid ));
+#define ATOMIC_BEGIN_READ(TYPE_ID, OP_ID, TYPE, RET_TYPE)                      \
+  RET_TYPE __kmpc_atomic_##TYPE_ID##_##OP_ID(ident_t *id_ref, int gtid,        \
+                                             TYPE *loc) {                      \
+    KMP_DEBUG_ASSERT(__kmp_init_serial);                                       \
+    KA_TRACE(100, ("__kmpc_atomic_" #TYPE_ID "_" #OP_ID ": T#%d\n", gtid));
 
 // ------------------------------------------------------------------------
 // Operation on *lhs, rhs using "compare_and_store_ret" routine
@@ -1571,23 +1883,23 @@ RET_TYPE __kmpc_atomic_##TYPE_ID##_##OP_
 //       *lhs only once (w/o it the compiler reads *lhs twice)
 // TODO: check if it is still necessary
 // Return the old value regardless of the result of the "compare & swap" operation
-
-#define OP_CMPXCHG_READ(TYPE,BITS,OP)                                     \
-    {                                                                     \
-        TYPE KMP_ATOMIC_VOLATILE temp_val;                                \
-        union f_i_union {                                                 \
-            TYPE f_val;                                                   \
-            kmp_int##BITS i_val;                                          \
-        };                                                                \
-        union f_i_union old_value;                                        \
-        temp_val = *loc;                                                  \
-        old_value.f_val = temp_val;                                       \
-        old_value.i_val = KMP_COMPARE_AND_STORE_RET##BITS( (kmp_int##BITS *) loc, \
-                      *VOLATILE_CAST(kmp_int##BITS *) &old_value.i_val,   \
-                      *VOLATILE_CAST(kmp_int##BITS *) &old_value.i_val ); \
-        new_value = old_value.f_val;                                      \
-        return new_value;                                                 \
-    }
+#define OP_CMPXCHG_READ(TYPE, BITS, OP)                                        \
+  {                                                                            \
+    TYPE KMP_ATOMIC_VOLATILE temp_val;                                         \
+    union f_i_union {                                                          \
+      TYPE f_val;                                                              \
+      kmp_int##BITS i_val;                                                     \
+    };                                                                         \
+    union f_i_union old_value;                                                 \
+    temp_val = *loc;                                                           \
+    old_value.f_val = temp_val;                                                \
+    old_value.i_val = KMP_COMPARE_AND_STORE_RET##BITS(                         \
+        (kmp_int##BITS *)loc,                                                  \
+        *VOLATILE_CAST(kmp_int##BITS *) & old_value.i_val,                     \
+        *VOLATILE_CAST(kmp_int##BITS *) & old_value.i_val);                    \
+    new_value = old_value.f_val;                                               \
+    return new_value;                                                          \
+  }
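
The routine above reads a floating-point location atomically by issuing a compare-and-store-ret whose "expected" and "new" values are identical: memory is never changed, but the current contents come back in one atomic step. A self-contained sketch of the same idea, substituting GCC's __sync builtin for the KMP_COMPARE_AND_STORE_RET wrapper (an illustrative assumption, not the runtime's actual primitive):
@code
#include <stdint.h>

/* Atomic read of a double through its 64-bit integer image.  Comparing and
   "exchanging" with the same value leaves memory untouched while returning
   the current contents atomically. */
static double atomic_read_f64(volatile double *loc) {
  union { double f_val; uint64_t i_val; } old_value;
  old_value.f_val = *loc; /* initial guess, only used as the CAS operand */
  old_value.i_val = __sync_val_compare_and_swap(
      (volatile uint64_t *)loc, old_value.i_val, old_value.i_val);
  return old_value.f_val;
}
@endcode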
 
 // -------------------------------------------------------------------------
 // Operation on *lhs, rhs bound by critical section
@@ -1595,140 +1907,152 @@ RET_TYPE __kmpc_atomic_##TYPE_ID##_##OP_
 //     LCK_ID - lock identifier
 // Note: don't check gtid as it should always be valid
 // 1, 2-byte - expect valid parameter, other - check before this macro
-#define OP_CRITICAL_READ(OP,LCK_ID)                                       \
-    __kmp_acquire_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );                    \
-                                                                          \
-    new_value = (*loc);                                                   \
-                                                                          \
-    __kmp_release_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );
+#define OP_CRITICAL_READ(OP, LCK_ID)                                           \
+  __kmp_acquire_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);                       \
+                                                                               \
+  new_value = (*loc);                                                          \
+                                                                               \
+  __kmp_release_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);
 
 // -------------------------------------------------------------------------
 #ifdef KMP_GOMP_COMPAT
-#define OP_GOMP_CRITICAL_READ(OP,FLAG)                                    \
-    if ( (FLAG) && (__kmp_atomic_mode == 2) ) {                           \
-        KMP_CHECK_GTID;                                                   \
-        OP_CRITICAL_READ( OP, 0 );                                        \
-        return new_value;                                                 \
-    }
-#else
-#define OP_GOMP_CRITICAL_READ(OP,FLAG)
+#define OP_GOMP_CRITICAL_READ(OP, FLAG)                                        \
+  if ((FLAG) && (__kmp_atomic_mode == 2)) {                                    \
+    KMP_CHECK_GTID;                                                            \
+    OP_CRITICAL_READ(OP, 0);                                                   \
+    return new_value;                                                          \
+  }
+#else
+#define OP_GOMP_CRITICAL_READ(OP, FLAG)
 #endif /* KMP_GOMP_COMPAT */
 
 // -------------------------------------------------------------------------
-#define ATOMIC_FIXED_READ(TYPE_ID,OP_ID,TYPE,BITS,OP,GOMP_FLAG)           \
-ATOMIC_BEGIN_READ(TYPE_ID,OP_ID,TYPE,TYPE)                                \
-    TYPE new_value;                                                       \
-    OP_GOMP_CRITICAL_READ(OP##=,GOMP_FLAG)                                \
-    new_value = KMP_TEST_THEN_ADD##BITS( loc, OP 0 );                     \
-    return new_value;                                                     \
-}
-// -------------------------------------------------------------------------
-#define ATOMIC_CMPXCHG_READ(TYPE_ID,OP_ID,TYPE,BITS,OP,GOMP_FLAG)         \
-ATOMIC_BEGIN_READ(TYPE_ID,OP_ID,TYPE,TYPE)                                \
-    TYPE new_value;                                                       \
-    OP_GOMP_CRITICAL_READ(OP##=,GOMP_FLAG)                                \
-    OP_CMPXCHG_READ(TYPE,BITS,OP)                                         \
-}
+#define ATOMIC_FIXED_READ(TYPE_ID, OP_ID, TYPE, BITS, OP, GOMP_FLAG)           \
+  ATOMIC_BEGIN_READ(TYPE_ID, OP_ID, TYPE, TYPE)                                \
+  TYPE new_value;                                                              \
+  OP_GOMP_CRITICAL_READ(OP## =, GOMP_FLAG)                                     \
+  new_value = KMP_TEST_THEN_ADD##BITS(loc, OP 0);                              \
+  return new_value;                                                            \
+  }
+// -------------------------------------------------------------------------
+#define ATOMIC_CMPXCHG_READ(TYPE_ID, OP_ID, TYPE, BITS, OP, GOMP_FLAG)         \
+  ATOMIC_BEGIN_READ(TYPE_ID, OP_ID, TYPE, TYPE)                                \
+  TYPE new_value;                                                              \
+  OP_GOMP_CRITICAL_READ(OP## =, GOMP_FLAG)                                     \
+  OP_CMPXCHG_READ(TYPE, BITS, OP)                                              \
+  }
 // ------------------------------------------------------------------------
-// Routines for Extended types: long double, _Quad, complex flavours (use critical section)
+// Routines for Extended types: long double, _Quad, complex flavours (use
+// critical section)
 //     TYPE_ID, OP_ID, TYPE - detailed above
 //     OP      - operator
 //     LCK_ID  - lock identifier, used to possibly distinguish lock variable
-#define ATOMIC_CRITICAL_READ(TYPE_ID,OP_ID,TYPE,OP,LCK_ID,GOMP_FLAG)      \
-ATOMIC_BEGIN_READ(TYPE_ID,OP_ID,TYPE,TYPE)                                \
-    TYPE new_value;                                                       \
-    OP_GOMP_CRITICAL_READ(OP##=,GOMP_FLAG)  /* send assignment */         \
-    OP_CRITICAL_READ(OP,LCK_ID)          /* send assignment */            \
-    return new_value;                                                     \
-}
+#define ATOMIC_CRITICAL_READ(TYPE_ID, OP_ID, TYPE, OP, LCK_ID, GOMP_FLAG)      \
+  ATOMIC_BEGIN_READ(TYPE_ID, OP_ID, TYPE, TYPE)                                \
+  TYPE new_value;                                                              \
+  OP_GOMP_CRITICAL_READ(OP## =, GOMP_FLAG) /* send assignment */               \
+  OP_CRITICAL_READ(OP, LCK_ID) /* send assignment */                           \
+  return new_value;                                                            \
+  }
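
Putting the pieces together, ATOMIC_FIXED_READ obtains an atomic integer read by doing a fetch-and-add of zero, which returns the previous (i.e. current) contents. Roughly, ATOMIC_FIXED_READ(fixed4, rd, kmp_int32, 32, +, 0) is expected to expand to something like the following (a hand-assembled sketch from ATOMIC_BEGIN_READ and the macro body, not a verbatim preprocessor expansion; the dead GOMP branch for GOMP_FLAG == 0 is omitted):
@code
kmp_int32 __kmpc_atomic_fixed4_rd(ident_t *id_ref, int gtid, kmp_int32 *loc) {
  KMP_DEBUG_ASSERT(__kmp_init_serial);
  KA_TRACE(100, ("__kmpc_atomic_fixed4_rd: T#%d\n", gtid));
  kmp_int32 new_value;
  /* fetch-and-add of 0: the location is unchanged, the old value is returned */
  new_value = KMP_TEST_THEN_ADD32(loc, +0);
  return new_value;
}
@endcode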
 
 // ------------------------------------------------------------------------
-// Fix for cmplx4 read (CQ220361) on Windows* OS. Regular routine with return value doesn't work.
+// Fix for cmplx4 read (CQ220361) on Windows* OS. Regular routine with return
+// value doesn't work.
 // Let's return the read value through the additional parameter.
+#if (KMP_OS_WINDOWS)
 
-#if ( KMP_OS_WINDOWS )
-
-#define OP_CRITICAL_READ_WRK(OP,LCK_ID)                                   \
-    __kmp_acquire_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );                    \
-                                                                          \
-    (*out) = (*loc);                                                      \
-                                                                          \
-    __kmp_release_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );
+#define OP_CRITICAL_READ_WRK(OP, LCK_ID)                                       \
+  __kmp_acquire_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);                       \
+                                                                               \
+  (*out) = (*loc);                                                             \
+                                                                               \
+  __kmp_release_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);
 // ------------------------------------------------------------------------
 #ifdef KMP_GOMP_COMPAT
-#define OP_GOMP_CRITICAL_READ_WRK(OP,FLAG)                                \
-    if ( (FLAG) && (__kmp_atomic_mode == 2) ) {                           \
-        KMP_CHECK_GTID;                                                   \
-        OP_CRITICAL_READ_WRK( OP, 0 );                                    \
-    }
+#define OP_GOMP_CRITICAL_READ_WRK(OP, FLAG)                                    \
+  if ((FLAG) && (__kmp_atomic_mode == 2)) {                                    \
+    KMP_CHECK_GTID;                                                            \
+    OP_CRITICAL_READ_WRK(OP, 0);                                               \
+  }
 #else
-#define OP_GOMP_CRITICAL_READ_WRK(OP,FLAG)
+#define OP_GOMP_CRITICAL_READ_WRK(OP, FLAG)
 #endif /* KMP_GOMP_COMPAT */
 // ------------------------------------------------------------------------
-#define ATOMIC_BEGIN_READ_WRK(TYPE_ID,OP_ID,TYPE) \
-void __kmpc_atomic_##TYPE_ID##_##OP_ID( TYPE * out, ident_t *id_ref, int gtid, TYPE * loc ) \
-{                                                                                   \
-    KMP_DEBUG_ASSERT( __kmp_init_serial );                                          \
-    KA_TRACE(100,("__kmpc_atomic_" #TYPE_ID "_" #OP_ID ": T#%d\n", gtid ));
+#define ATOMIC_BEGIN_READ_WRK(TYPE_ID, OP_ID, TYPE)                            \
+  void __kmpc_atomic_##TYPE_ID##_##OP_ID(TYPE *out, ident_t *id_ref, int gtid, \
+                                         TYPE *loc) {                          \
+    KMP_DEBUG_ASSERT(__kmp_init_serial);                                       \
+    KA_TRACE(100, ("__kmpc_atomic_" #TYPE_ID "_" #OP_ID ": T#%d\n", gtid));
 
 // ------------------------------------------------------------------------
-#define ATOMIC_CRITICAL_READ_WRK(TYPE_ID,OP_ID,TYPE,OP,LCK_ID,GOMP_FLAG)      \
-ATOMIC_BEGIN_READ_WRK(TYPE_ID,OP_ID,TYPE)                                     \
-    OP_GOMP_CRITICAL_READ_WRK(OP##=,GOMP_FLAG)  /* send assignment */         \
-    OP_CRITICAL_READ_WRK(OP,LCK_ID)          /* send assignment */            \
-}
+#define ATOMIC_CRITICAL_READ_WRK(TYPE_ID, OP_ID, TYPE, OP, LCK_ID, GOMP_FLAG)  \
+  ATOMIC_BEGIN_READ_WRK(TYPE_ID, OP_ID, TYPE)                                  \
+  OP_GOMP_CRITICAL_READ_WRK(OP## =, GOMP_FLAG) /* send assignment */           \
+  OP_CRITICAL_READ_WRK(OP, LCK_ID) /* send assignment */                       \
+  }
 
 #endif // KMP_OS_WINDOWS
 
 // ------------------------------------------------------------------------
 //                  TYPE_ID, OP_ID, TYPE, BITS, OP, GOMP_FLAG
-ATOMIC_FIXED_READ( fixed4, rd, kmp_int32,  32, +, 0            )      // __kmpc_atomic_fixed4_rd
-ATOMIC_FIXED_READ( fixed8, rd, kmp_int64,  64, +, KMP_ARCH_X86 )      // __kmpc_atomic_fixed8_rd
-ATOMIC_CMPXCHG_READ( float4, rd, kmp_real32, 32, +, KMP_ARCH_X86 )    // __kmpc_atomic_float4_rd
-ATOMIC_CMPXCHG_READ( float8, rd, kmp_real64, 64, +, KMP_ARCH_X86 )    // __kmpc_atomic_float8_rd
+ATOMIC_FIXED_READ(fixed4, rd, kmp_int32, 32, +, 0) // __kmpc_atomic_fixed4_rd
+ATOMIC_FIXED_READ(fixed8, rd, kmp_int64, 64, +,
+                  KMP_ARCH_X86) // __kmpc_atomic_fixed8_rd
+ATOMIC_CMPXCHG_READ(float4, rd, kmp_real32, 32, +,
+                    KMP_ARCH_X86) // __kmpc_atomic_float4_rd
+ATOMIC_CMPXCHG_READ(float8, rd, kmp_real64, 64, +,
+                    KMP_ARCH_X86) // __kmpc_atomic_float8_rd
 
 // !!! TODO: Remove lock operations for "char" since it can't be non-atomic
-ATOMIC_CMPXCHG_READ( fixed1,  rd, kmp_int8,    8, +,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_rd
-ATOMIC_CMPXCHG_READ( fixed2,  rd, kmp_int16,  16, +,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_rd
+ATOMIC_CMPXCHG_READ(fixed1, rd, kmp_int8, 8, +,
+                    KMP_ARCH_X86) // __kmpc_atomic_fixed1_rd
+ATOMIC_CMPXCHG_READ(fixed2, rd, kmp_int16, 16, +,
+                    KMP_ARCH_X86) // __kmpc_atomic_fixed2_rd
 
-ATOMIC_CRITICAL_READ( float10, rd, long double, +, 10r,   1 )         // __kmpc_atomic_float10_rd
+ATOMIC_CRITICAL_READ(float10, rd, long double, +, 10r,
+                     1) // __kmpc_atomic_float10_rd
 #if KMP_HAVE_QUAD
-ATOMIC_CRITICAL_READ( float16, rd, QUAD_LEGACY, +, 16r,   1 )         // __kmpc_atomic_float16_rd
+ATOMIC_CRITICAL_READ(float16, rd, QUAD_LEGACY, +, 16r,
+                     1) // __kmpc_atomic_float16_rd
 #endif // KMP_HAVE_QUAD
 
 // Fix for CQ220361 on Windows* OS
-#if ( KMP_OS_WINDOWS )
-    ATOMIC_CRITICAL_READ_WRK( cmplx4,  rd, kmp_cmplx32, +,  8c, 1 )   // __kmpc_atomic_cmplx4_rd
-#else
-    ATOMIC_CRITICAL_READ( cmplx4,  rd, kmp_cmplx32, +,  8c, 1 )       // __kmpc_atomic_cmplx4_rd
-#endif
-ATOMIC_CRITICAL_READ( cmplx8,  rd, kmp_cmplx64, +, 16c, 1 )           // __kmpc_atomic_cmplx8_rd
-ATOMIC_CRITICAL_READ( cmplx10, rd, kmp_cmplx80, +, 20c, 1 )           // __kmpc_atomic_cmplx10_rd
+#if (KMP_OS_WINDOWS)
+ATOMIC_CRITICAL_READ_WRK(cmplx4, rd, kmp_cmplx32, +, 8c,
+                         1) // __kmpc_atomic_cmplx4_rd
+#else
+ATOMIC_CRITICAL_READ(cmplx4, rd, kmp_cmplx32, +, 8c,
+                     1) // __kmpc_atomic_cmplx4_rd
+#endif
+ATOMIC_CRITICAL_READ(cmplx8, rd, kmp_cmplx64, +, 16c,
+                     1) // __kmpc_atomic_cmplx8_rd
+ATOMIC_CRITICAL_READ(cmplx10, rd, kmp_cmplx80, +, 20c,
+                     1) // __kmpc_atomic_cmplx10_rd
 #if KMP_HAVE_QUAD
-ATOMIC_CRITICAL_READ( cmplx16, rd, CPLX128_LEG, +, 32c, 1 )           // __kmpc_atomic_cmplx16_rd
-#if ( KMP_ARCH_X86 )
-    ATOMIC_CRITICAL_READ( float16, a16_rd, Quad_a16_t, +, 16r, 1 )         // __kmpc_atomic_float16_a16_rd
-    ATOMIC_CRITICAL_READ( cmplx16, a16_rd, kmp_cmplx128_a16_t, +, 32c, 1 ) // __kmpc_atomic_cmplx16_a16_rd
+ATOMIC_CRITICAL_READ(cmplx16, rd, CPLX128_LEG, +, 32c,
+                     1) // __kmpc_atomic_cmplx16_rd
+#if (KMP_ARCH_X86)
+ATOMIC_CRITICAL_READ(float16, a16_rd, Quad_a16_t, +, 16r,
+                     1) // __kmpc_atomic_float16_a16_rd
+ATOMIC_CRITICAL_READ(cmplx16, a16_rd, kmp_cmplx128_a16_t, +, 32c,
+                     1) // __kmpc_atomic_cmplx16_a16_rd
 #endif
 #endif
 
-
 // ------------------------------------------------------------------------
 // Atomic WRITE routines
-// ------------------------------------------------------------------------
-
-#define ATOMIC_XCHG_WR(TYPE_ID,OP_ID,TYPE,BITS,OP,GOMP_FLAG)              \
-ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void)                                     \
-    OP_GOMP_CRITICAL(OP,GOMP_FLAG)                                        \
-    KMP_XCHG_FIXED##BITS( lhs, rhs );                                     \
-}
-// ------------------------------------------------------------------------
-#define ATOMIC_XCHG_FLOAT_WR(TYPE_ID,OP_ID,TYPE,BITS,OP,GOMP_FLAG)        \
-ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void)                                     \
-    OP_GOMP_CRITICAL(OP,GOMP_FLAG)                                        \
-    KMP_XCHG_REAL##BITS( lhs, rhs );                                      \
-}
 
+#define ATOMIC_XCHG_WR(TYPE_ID, OP_ID, TYPE, BITS, OP, GOMP_FLAG)              \
+  ATOMIC_BEGIN(TYPE_ID, OP_ID, TYPE, void)                                     \
+  OP_GOMP_CRITICAL(OP, GOMP_FLAG)                                              \
+  KMP_XCHG_FIXED##BITS(lhs, rhs);                                              \
+  }
+// ------------------------------------------------------------------------
+#define ATOMIC_XCHG_FLOAT_WR(TYPE_ID, OP_ID, TYPE, BITS, OP, GOMP_FLAG)        \
+  ATOMIC_BEGIN(TYPE_ID, OP_ID, TYPE, void)                                     \
+  OP_GOMP_CRITICAL(OP, GOMP_FLAG)                                              \
+  KMP_XCHG_REAL##BITS(lhs, rhs);                                               \
+  }
 
 // ------------------------------------------------------------------------
 // Operation on *lhs, rhs using "compare_and_store" routine
@@ -1737,89 +2061,103 @@ ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void)
 //     OP      - operator
 // Note: temp_val introduced in order to force the compiler to read
 //       *lhs only once (w/o it the compiler reads *lhs twice)
-#define OP_CMPXCHG_WR(TYPE,BITS,OP)                                       \
-    {                                                                     \
-        TYPE KMP_ATOMIC_VOLATILE temp_val;                                \
-        TYPE old_value, new_value;                                        \
-        temp_val = *lhs;                                                  \
-        old_value = temp_val;                                             \
-        new_value = rhs;                                                  \
-        while ( ! KMP_COMPARE_AND_STORE_ACQ##BITS( (kmp_int##BITS *) lhs, \
-                      *VOLATILE_CAST(kmp_int##BITS *) &old_value,         \
-                      *VOLATILE_CAST(kmp_int##BITS *) &new_value ) )      \
-        {                                                                 \
-            KMP_CPU_PAUSE();                                              \
-                                                                          \
-            temp_val = *lhs;                                              \
-            old_value = temp_val;                                         \
-            new_value = rhs;                                              \
-        }                                                                 \
-    }
-
-// -------------------------------------------------------------------------
-#define ATOMIC_CMPXCHG_WR(TYPE_ID,OP_ID,TYPE,BITS,OP,GOMP_FLAG)           \
-ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void)                                     \
-    OP_GOMP_CRITICAL(OP,GOMP_FLAG)                                        \
-    OP_CMPXCHG_WR(TYPE,BITS,OP)                                           \
-}
+#define OP_CMPXCHG_WR(TYPE, BITS, OP)                                          \
+  {                                                                            \
+    TYPE KMP_ATOMIC_VOLATILE temp_val;                                         \
+    TYPE old_value, new_value;                                                 \
+    temp_val = *lhs;                                                           \
+    old_value = temp_val;                                                      \
+    new_value = rhs;                                                           \
+    while (!KMP_COMPARE_AND_STORE_ACQ##BITS(                                   \
+        (kmp_int##BITS *)lhs, *VOLATILE_CAST(kmp_int##BITS *) & old_value,     \
+        *VOLATILE_CAST(kmp_int##BITS *) & new_value)) {                        \
+      KMP_CPU_PAUSE();                                                         \
+                                                                               \
+      temp_val = *lhs;                                                         \
+      old_value = temp_val;                                                    \
+      new_value = rhs;                                                         \
+    }                                                                          \
+  }
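
OP_CMPXCHG_WR implements a plain atomic store as a compare-and-swap retry loop, for the cases further down where a native 64-bit exchange cannot be used. The same loop, sketched standalone with GCC's __sync builtin in place of KMP_COMPARE_AND_STORE_ACQ (illustration only; the pause hint is left as a comment):
@code
#include <stdint.h>

/* Store rhs into *lhs atomically by retrying a compare-and-swap until no
   other thread has modified the location in between. */
static void atomic_store_u64(volatile uint64_t *lhs, uint64_t rhs) {
  uint64_t old_value = *lhs;
  while (!__sync_bool_compare_and_swap(lhs, old_value, rhs)) {
    /* lost the race: back off (KMP_CPU_PAUSE in the runtime), re-read, retry */
    old_value = *lhs;
  }
}
@endcode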
+
+// -------------------------------------------------------------------------
+#define ATOMIC_CMPXCHG_WR(TYPE_ID, OP_ID, TYPE, BITS, OP, GOMP_FLAG)           \
+  ATOMIC_BEGIN(TYPE_ID, OP_ID, TYPE, void)                                     \
+  OP_GOMP_CRITICAL(OP, GOMP_FLAG)                                              \
+  OP_CMPXCHG_WR(TYPE, BITS, OP)                                                \
+  }
 
 // ------------------------------------------------------------------------
-// Routines for Extended types: long double, _Quad, complex flavours (use critical section)
+// Routines for Extended types: long double, _Quad, complex flavours (use
+// critical section)
 //     TYPE_ID, OP_ID, TYPE - detailed above
 //     OP      - operator
 //     LCK_ID  - lock identifier, used to possibly distinguish lock variable
-#define ATOMIC_CRITICAL_WR(TYPE_ID,OP_ID,TYPE,OP,LCK_ID,GOMP_FLAG)        \
-ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void)                                     \
-    OP_GOMP_CRITICAL(OP,GOMP_FLAG)       /* send assignment */            \
-    OP_CRITICAL(OP,LCK_ID)               /* send assignment */            \
-}
-// -------------------------------------------------------------------------
-
-ATOMIC_XCHG_WR( fixed1,  wr, kmp_int8,    8, =,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_wr
-ATOMIC_XCHG_WR( fixed2,  wr, kmp_int16,  16, =,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_wr
-ATOMIC_XCHG_WR( fixed4,  wr, kmp_int32,  32, =,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed4_wr
-#if ( KMP_ARCH_X86 )
-    ATOMIC_CMPXCHG_WR( fixed8,  wr, kmp_int64,  64, =,  KMP_ARCH_X86 )      // __kmpc_atomic_fixed8_wr
-#else
-    ATOMIC_XCHG_WR( fixed8,  wr, kmp_int64,  64, =,  KMP_ARCH_X86 )         // __kmpc_atomic_fixed8_wr
-#endif
-
-ATOMIC_XCHG_FLOAT_WR( float4, wr, kmp_real32, 32, =, KMP_ARCH_X86 )         // __kmpc_atomic_float4_wr
-#if ( KMP_ARCH_X86 )
-    ATOMIC_CMPXCHG_WR( float8,  wr, kmp_real64,  64, =,  KMP_ARCH_X86 )     // __kmpc_atomic_float8_wr
+#define ATOMIC_CRITICAL_WR(TYPE_ID, OP_ID, TYPE, OP, LCK_ID, GOMP_FLAG)        \
+  ATOMIC_BEGIN(TYPE_ID, OP_ID, TYPE, void)                                     \
+  OP_GOMP_CRITICAL(OP, GOMP_FLAG) /* send assignment */                        \
+  OP_CRITICAL(OP, LCK_ID) /* send assignment */                                \
+  }
+// -------------------------------------------------------------------------
+
+ATOMIC_XCHG_WR(fixed1, wr, kmp_int8, 8, =,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed1_wr
+ATOMIC_XCHG_WR(fixed2, wr, kmp_int16, 16, =,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed2_wr
+ATOMIC_XCHG_WR(fixed4, wr, kmp_int32, 32, =,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed4_wr
+#if (KMP_ARCH_X86)
+ATOMIC_CMPXCHG_WR(fixed8, wr, kmp_int64, 64, =,
+                  KMP_ARCH_X86) // __kmpc_atomic_fixed8_wr
+#else
+ATOMIC_XCHG_WR(fixed8, wr, kmp_int64, 64, =,
+               KMP_ARCH_X86) // __kmpc_atomic_fixed8_wr
+#endif
+
+ATOMIC_XCHG_FLOAT_WR(float4, wr, kmp_real32, 32, =,
+                     KMP_ARCH_X86) // __kmpc_atomic_float4_wr
+#if (KMP_ARCH_X86)
+ATOMIC_CMPXCHG_WR(float8, wr, kmp_real64, 64, =,
+                  KMP_ARCH_X86) // __kmpc_atomic_float8_wr
 #else
-    ATOMIC_XCHG_FLOAT_WR( float8,  wr, kmp_real64,  64, =,  KMP_ARCH_X86 )  // __kmpc_atomic_float8_wr
+ATOMIC_XCHG_FLOAT_WR(float8, wr, kmp_real64, 64, =,
+                     KMP_ARCH_X86) // __kmpc_atomic_float8_wr
 #endif
 
-ATOMIC_CRITICAL_WR( float10, wr, long double, =, 10r,   1 )         // __kmpc_atomic_float10_wr
+ATOMIC_CRITICAL_WR(float10, wr, long double, =, 10r,
+                   1) // __kmpc_atomic_float10_wr
 #if KMP_HAVE_QUAD
-ATOMIC_CRITICAL_WR( float16, wr, QUAD_LEGACY, =, 16r,   1 )         // __kmpc_atomic_float16_wr
+ATOMIC_CRITICAL_WR(float16, wr, QUAD_LEGACY, =, 16r,
+                   1) // __kmpc_atomic_float16_wr
 #endif
-ATOMIC_CRITICAL_WR( cmplx4,  wr, kmp_cmplx32, =,  8c,   1 )         // __kmpc_atomic_cmplx4_wr
-ATOMIC_CRITICAL_WR( cmplx8,  wr, kmp_cmplx64, =, 16c,   1 )         // __kmpc_atomic_cmplx8_wr
-ATOMIC_CRITICAL_WR( cmplx10, wr, kmp_cmplx80, =, 20c,   1 )         // __kmpc_atomic_cmplx10_wr
+ATOMIC_CRITICAL_WR(cmplx4, wr, kmp_cmplx32, =, 8c, 1) // __kmpc_atomic_cmplx4_wr
+ATOMIC_CRITICAL_WR(cmplx8, wr, kmp_cmplx64, =, 16c,
+                   1) // __kmpc_atomic_cmplx8_wr
+ATOMIC_CRITICAL_WR(cmplx10, wr, kmp_cmplx80, =, 20c,
+                   1) // __kmpc_atomic_cmplx10_wr
 #if KMP_HAVE_QUAD
-ATOMIC_CRITICAL_WR( cmplx16, wr, CPLX128_LEG, =, 32c,   1 )         // __kmpc_atomic_cmplx16_wr
-#if ( KMP_ARCH_X86 )
-    ATOMIC_CRITICAL_WR( float16, a16_wr, Quad_a16_t,         =, 16r, 1 ) // __kmpc_atomic_float16_a16_wr
-    ATOMIC_CRITICAL_WR( cmplx16, a16_wr, kmp_cmplx128_a16_t, =, 32c, 1 ) // __kmpc_atomic_cmplx16_a16_wr
+ATOMIC_CRITICAL_WR(cmplx16, wr, CPLX128_LEG, =, 32c,
+                   1) // __kmpc_atomic_cmplx16_wr
+#if (KMP_ARCH_X86)
+ATOMIC_CRITICAL_WR(float16, a16_wr, Quad_a16_t, =, 16r,
+                   1) // __kmpc_atomic_float16_a16_wr
+ATOMIC_CRITICAL_WR(cmplx16, a16_wr, kmp_cmplx128_a16_t, =, 32c,
+                   1) // __kmpc_atomic_cmplx16_a16_wr
 #endif
 #endif
 
-
 // ------------------------------------------------------------------------
 // Atomic CAPTURE routines
-// ------------------------------------------------------------------------
 
 // Beginning of a definition (provides name, parameters, debug trace)
-//     TYPE_ID - operands type and size (fixed*, fixed*u for signed, unsigned fixed)
+//     TYPE_ID - operands type and size (fixed*, fixed*u for signed, unsigned
+//     fixed)
 //     OP_ID   - operation identifier (add, sub, mul, ...)
 //     TYPE    - operands' type
-#define ATOMIC_BEGIN_CPT(TYPE_ID,OP_ID,TYPE,RET_TYPE)                                    \
-RET_TYPE __kmpc_atomic_##TYPE_ID##_##OP_ID( ident_t *id_ref, int gtid, TYPE * lhs, TYPE rhs, int flag ) \
-{                                                                                         \
-    KMP_DEBUG_ASSERT( __kmp_init_serial );                                                \
-    KA_TRACE(100,("__kmpc_atomic_" #TYPE_ID "_" #OP_ID ": T#%d\n", gtid ));
+#define ATOMIC_BEGIN_CPT(TYPE_ID, OP_ID, TYPE, RET_TYPE)                       \
+  RET_TYPE __kmpc_atomic_##TYPE_ID##_##OP_ID(ident_t *id_ref, int gtid,        \
+                                             TYPE *lhs, TYPE rhs, int flag) {  \
+    KMP_DEBUG_ASSERT(__kmp_init_serial);                                       \
+    KA_TRACE(100, ("__kmpc_atomic_" #TYPE_ID "_" #OP_ID ": T#%d\n", gtid));
 
 // -------------------------------------------------------------------------
 // Operation on *lhs, rhs bound by critical section
@@ -1827,29 +2165,29 @@ RET_TYPE __kmpc_atomic_##TYPE_ID##_##OP_
 //     LCK_ID - lock identifier
 // Note: don't check gtid as it should always be valid
 // 1, 2-byte - expect valid parameter, other - check before this macro
-#define OP_CRITICAL_CPT(OP,LCK_ID)                                        \
-    __kmp_acquire_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );             \
-                                                                          \
-    if( flag ) {                                                          \
-        (*lhs) OP rhs;                                                    \
-        new_value = (*lhs);                                               \
-    } else {                                                              \
-        new_value = (*lhs);                                               \
-        (*lhs) OP rhs;                                                    \
-    }                                                                     \
-                                                                          \
-    __kmp_release_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );             \
-    return new_value;
+#define OP_CRITICAL_CPT(OP, LCK_ID)                                            \
+  __kmp_acquire_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);                       \
+                                                                               \
+  if (flag) {                                                                  \
+    (*lhs) OP rhs;                                                             \
+    new_value = (*lhs);                                                        \
+  } else {                                                                     \
+    new_value = (*lhs);                                                        \
+    (*lhs) OP rhs;                                                             \
+  }                                                                            \
+                                                                               \
+  __kmp_release_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);                       \
+  return new_value;
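
The flag argument above selects whether the captured value is taken after or before the update, i.e. the difference between v = (x += e) and v = x; x += e. Stripped of the locking, OP_CRITICAL_CPT boils down to this plain-C sketch:
@code
/* Capture semantics selected by 'flag' (synchronization omitted):
 *   flag != 0 : capture AFTER the update   -> v = (x += e);
 *   flag == 0 : capture BEFORE the update  -> v = x; x += e;  */
static int add_cpt_semantics(int *lhs, int rhs, int flag) {
  int new_value;
  if (flag) {
    *lhs += rhs;
    new_value = *lhs; /* value after the update */
  } else {
    new_value = *lhs; /* value before the update */
    *lhs += rhs;
  }
  return new_value;
}
@endcode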
 
 // ------------------------------------------------------------------------
 #ifdef KMP_GOMP_COMPAT
-#define OP_GOMP_CRITICAL_CPT(OP,FLAG)                                     \
-    if ( (FLAG) && (__kmp_atomic_mode == 2) ) {                           \
-        KMP_CHECK_GTID;                                                   \
-        OP_CRITICAL_CPT( OP##=, 0 );                                      \
-    }
+#define OP_GOMP_CRITICAL_CPT(OP, FLAG)                                         \
+  if ((FLAG) && (__kmp_atomic_mode == 2)) {                                    \
+    KMP_CHECK_GTID;                                                            \
+    OP_CRITICAL_CPT(OP## =, 0);                                                \
+  }
 #else
-#define OP_GOMP_CRITICAL_CPT(OP,FLAG)
+#define OP_GOMP_CRITICAL_CPT(OP, FLAG)
 #endif /* KMP_GOMP_COMPAT */
 
 // ------------------------------------------------------------------------
@@ -1859,60 +2197,67 @@ RET_TYPE __kmpc_atomic_##TYPE_ID##_##OP_
 //     OP      - operator
 // Note: temp_val introduced in order to force the compiler to read
 //       *lhs only once (w/o it the compiler reads *lhs twice)
-#define OP_CMPXCHG_CPT(TYPE,BITS,OP)                                      \
-    {                                                                     \
-        TYPE KMP_ATOMIC_VOLATILE temp_val;                                \
-        TYPE old_value, new_value;                                        \
-        temp_val = *lhs;                                                  \
-        old_value = temp_val;                                             \
-        new_value = old_value OP rhs;                                     \
-        while ( ! KMP_COMPARE_AND_STORE_ACQ##BITS( (kmp_int##BITS *) lhs, \
-                      *VOLATILE_CAST(kmp_int##BITS *) &old_value,         \
-                      *VOLATILE_CAST(kmp_int##BITS *) &new_value ) )      \
-        {                                                                 \
-            KMP_CPU_PAUSE();                                              \
-                                                                          \
-            temp_val = *lhs;                                              \
-            old_value = temp_val;                                         \
-            new_value = old_value OP rhs;                                 \
-        }                                                                 \
-        if( flag ) {                                                      \
-            return new_value;                                             \
-        } else                                                            \
-            return old_value;                                             \
-    }
-
-// -------------------------------------------------------------------------
-#define ATOMIC_CMPXCHG_CPT(TYPE_ID,OP_ID,TYPE,BITS,OP,GOMP_FLAG)           \
-ATOMIC_BEGIN_CPT(TYPE_ID,OP_ID,TYPE,TYPE)                                  \
-    TYPE new_value;                                                        \
-    OP_GOMP_CRITICAL_CPT(OP,GOMP_FLAG)                                     \
-    OP_CMPXCHG_CPT(TYPE,BITS,OP)                                           \
-}
-
-// -------------------------------------------------------------------------
-#define ATOMIC_FIXED_ADD_CPT(TYPE_ID,OP_ID,TYPE,BITS,OP,GOMP_FLAG)         \
-ATOMIC_BEGIN_CPT(TYPE_ID,OP_ID,TYPE,TYPE)                                  \
-    TYPE old_value, new_value;                                             \
-    OP_GOMP_CRITICAL_CPT(OP,GOMP_FLAG)                                     \
-    /* OP used as a sign for subtraction: (lhs-rhs) --> (lhs+-rhs) */      \
-    old_value = KMP_TEST_THEN_ADD##BITS( lhs, OP rhs );                    \
-    if( flag ) {                                                           \
-        return old_value OP rhs;                                           \
-    } else                                                                 \
-        return old_value;                                                  \
-}
-// -------------------------------------------------------------------------
-
-ATOMIC_FIXED_ADD_CPT( fixed4, add_cpt, kmp_int32,  32, +, 0            )  // __kmpc_atomic_fixed4_add_cpt
-ATOMIC_FIXED_ADD_CPT( fixed4, sub_cpt, kmp_int32,  32, -, 0            )  // __kmpc_atomic_fixed4_sub_cpt
-ATOMIC_FIXED_ADD_CPT( fixed8, add_cpt, kmp_int64,  64, +, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_add_cpt
-ATOMIC_FIXED_ADD_CPT( fixed8, sub_cpt, kmp_int64,  64, -, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_sub_cpt
-
-ATOMIC_CMPXCHG_CPT( float4, add_cpt, kmp_real32, 32, +, KMP_ARCH_X86 )  // __kmpc_atomic_float4_add_cpt
-ATOMIC_CMPXCHG_CPT( float4, sub_cpt, kmp_real32, 32, -, KMP_ARCH_X86 )  // __kmpc_atomic_float4_sub_cpt
-ATOMIC_CMPXCHG_CPT( float8, add_cpt, kmp_real64, 64, +, KMP_ARCH_X86 )  // __kmpc_atomic_float8_add_cpt
-ATOMIC_CMPXCHG_CPT( float8, sub_cpt, kmp_real64, 64, -, KMP_ARCH_X86 )  // __kmpc_atomic_float8_sub_cpt
+#define OP_CMPXCHG_CPT(TYPE, BITS, OP)                                         \
+  {                                                                            \
+    TYPE KMP_ATOMIC_VOLATILE temp_val;                                         \
+    TYPE old_value, new_value;                                                 \
+    temp_val = *lhs;                                                           \
+    old_value = temp_val;                                                      \
+    new_value = old_value OP rhs;                                              \
+    while (!KMP_COMPARE_AND_STORE_ACQ##BITS(                                   \
+        (kmp_int##BITS *)lhs, *VOLATILE_CAST(kmp_int##BITS *) & old_value,     \
+        *VOLATILE_CAST(kmp_int##BITS *) & new_value)) {                        \
+      KMP_CPU_PAUSE();                                                         \
+                                                                               \
+      temp_val = *lhs;                                                         \
+      old_value = temp_val;                                                    \
+      new_value = old_value OP rhs;                                            \
+    }                                                                          \
+    if (flag) {                                                                \
+      return new_value;                                                        \
+    } else                                                                     \
+      return old_value;                                                        \
+  }
+
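
As a rough plain-C sketch of the compare-and-swap capture loop that OP_CMPXCHG_CPT spells out above (not the literal expansion): the GCC/Clang __atomic_compare_exchange builtin stands in for KMP_COMPARE_AND_STORE_ACQ32, the function name is invented, and the id_ref/gtid parameters of the real entry points are omitted.

@code
/* Sketch only, not the literal macro expansion; id_ref/gtid omitted. */
#include <stdint.h>

static float float4_add_cpt_sketch(float *lhs, float rhs, int flag) {
  float old_value, new_value;
  do {
    old_value = *lhs;            /* snapshot the target */
    new_value = old_value + rhs; /* apply OP speculatively */
    /* retry while the 32-bit pattern at *lhs no longer matches old_value;
       the runtime also executes KMP_CPU_PAUSE() between attempts */
  } while (!__atomic_compare_exchange((uint32_t *)lhs, (uint32_t *)&old_value,
                                      (uint32_t *)&new_value, 0 /* strong */,
                                      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));
  return flag ? new_value : old_value; /* flag selects the after/before value */
}
@endcode
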
+// -------------------------------------------------------------------------
+#define ATOMIC_CMPXCHG_CPT(TYPE_ID, OP_ID, TYPE, BITS, OP, GOMP_FLAG)          \
+  ATOMIC_BEGIN_CPT(TYPE_ID, OP_ID, TYPE, TYPE)                                 \
+  TYPE new_value;                                                              \
+  OP_GOMP_CRITICAL_CPT(OP, GOMP_FLAG)                                          \
+  OP_CMPXCHG_CPT(TYPE, BITS, OP)                                               \
+  }
+
+// -------------------------------------------------------------------------
+#define ATOMIC_FIXED_ADD_CPT(TYPE_ID, OP_ID, TYPE, BITS, OP, GOMP_FLAG)        \
+  ATOMIC_BEGIN_CPT(TYPE_ID, OP_ID, TYPE, TYPE)                                 \
+  TYPE old_value, new_value;                                                   \
+  OP_GOMP_CRITICAL_CPT(OP, GOMP_FLAG)                                          \
+  /* OP used as a sign for subtraction: (lhs-rhs) --> (lhs+-rhs) */            \
+  old_value = KMP_TEST_THEN_ADD##BITS(lhs, OP rhs);                            \
+  if (flag) {                                                                  \
+    return old_value OP rhs;                                                   \
+  } else                                                                       \
+    return old_value;                                                          \
+  }
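
The ATOMIC_FIXED_ADD_CPT variant above needs no retry loop, because a hardware fetch-and-add already returns the previous value; subtraction reuses the same primitive by negating rhs, as the comment notes. A hedged sketch with __atomic_fetch_add standing in for KMP_TEST_THEN_ADD32 and an invented function name:

@code
#include <stdint.h>

static int32_t fixed4_add_cpt_sketch(int32_t *lhs, int32_t rhs, int flag) {
  /* fetch-and-add returns the value held *before* the update */
  int32_t old_value = __atomic_fetch_add(lhs, rhs, __ATOMIC_ACQ_REL);
  /* flag != 0 requests the post-update value, otherwise the prior one */
  return flag ? old_value + rhs : old_value;
}
@endcode
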
+// -------------------------------------------------------------------------
+
+ATOMIC_FIXED_ADD_CPT(fixed4, add_cpt, kmp_int32, 32, +,
+                     0) // __kmpc_atomic_fixed4_add_cpt
+ATOMIC_FIXED_ADD_CPT(fixed4, sub_cpt, kmp_int32, 32, -,
+                     0) // __kmpc_atomic_fixed4_sub_cpt
+ATOMIC_FIXED_ADD_CPT(fixed8, add_cpt, kmp_int64, 64, +,
+                     KMP_ARCH_X86) // __kmpc_atomic_fixed8_add_cpt
+ATOMIC_FIXED_ADD_CPT(fixed8, sub_cpt, kmp_int64, 64, -,
+                     KMP_ARCH_X86) // __kmpc_atomic_fixed8_sub_cpt
+
+ATOMIC_CMPXCHG_CPT(float4, add_cpt, kmp_real32, 32, +,
+                   KMP_ARCH_X86) // __kmpc_atomic_float4_add_cpt
+ATOMIC_CMPXCHG_CPT(float4, sub_cpt, kmp_real32, 32, -,
+                   KMP_ARCH_X86) // __kmpc_atomic_float4_sub_cpt
+ATOMIC_CMPXCHG_CPT(float8, add_cpt, kmp_real64, 64, +,
+                   KMP_ARCH_X86) // __kmpc_atomic_float8_add_cpt
+ATOMIC_CMPXCHG_CPT(float8, sub_cpt, kmp_real64, 64, -,
+                   KMP_ARCH_X86) // __kmpc_atomic_float8_sub_cpt
 
 // ------------------------------------------------------------------------
 // Entries definition for integer operands
@@ -1926,141 +2271,229 @@ ATOMIC_CMPXCHG_CPT( float8, sub_cpt, kmp
 // Routines for ATOMIC integer operands, other operators
 // ------------------------------------------------------------------------
 //              TYPE_ID,OP_ID, TYPE,          OP,  GOMP_FLAG
-ATOMIC_CMPXCHG_CPT( fixed1,  add_cpt, kmp_int8,    8, +,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_add_cpt
-ATOMIC_CMPXCHG_CPT( fixed1, andb_cpt, kmp_int8,    8, &,  0            )  // __kmpc_atomic_fixed1_andb_cpt
-ATOMIC_CMPXCHG_CPT( fixed1,  div_cpt, kmp_int8,    8, /,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_div_cpt
-ATOMIC_CMPXCHG_CPT( fixed1u, div_cpt, kmp_uint8,   8, /,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed1u_div_cpt
-ATOMIC_CMPXCHG_CPT( fixed1,  mul_cpt, kmp_int8,    8, *,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_mul_cpt
-ATOMIC_CMPXCHG_CPT( fixed1,  orb_cpt, kmp_int8,    8, |,  0            )  // __kmpc_atomic_fixed1_orb_cpt
-ATOMIC_CMPXCHG_CPT( fixed1,  shl_cpt, kmp_int8,    8, <<, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_shl_cpt
-ATOMIC_CMPXCHG_CPT( fixed1,  shr_cpt, kmp_int8,    8, >>, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_shr_cpt
-ATOMIC_CMPXCHG_CPT( fixed1u, shr_cpt, kmp_uint8,   8, >>, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1u_shr_cpt
-ATOMIC_CMPXCHG_CPT( fixed1,  sub_cpt, kmp_int8,    8, -,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_sub_cpt
-ATOMIC_CMPXCHG_CPT( fixed1,  xor_cpt, kmp_int8,    8, ^,  0            )  // __kmpc_atomic_fixed1_xor_cpt
-ATOMIC_CMPXCHG_CPT( fixed2,  add_cpt, kmp_int16,  16, +,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_add_cpt
-ATOMIC_CMPXCHG_CPT( fixed2, andb_cpt, kmp_int16,  16, &,  0            )  // __kmpc_atomic_fixed2_andb_cpt
-ATOMIC_CMPXCHG_CPT( fixed2,  div_cpt, kmp_int16,  16, /,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_div_cpt
-ATOMIC_CMPXCHG_CPT( fixed2u, div_cpt, kmp_uint16, 16, /,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed2u_div_cpt
-ATOMIC_CMPXCHG_CPT( fixed2,  mul_cpt, kmp_int16,  16, *,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_mul_cpt
-ATOMIC_CMPXCHG_CPT( fixed2,  orb_cpt, kmp_int16,  16, |,  0            )  // __kmpc_atomic_fixed2_orb_cpt
-ATOMIC_CMPXCHG_CPT( fixed2,  shl_cpt, kmp_int16,  16, <<, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_shl_cpt
-ATOMIC_CMPXCHG_CPT( fixed2,  shr_cpt, kmp_int16,  16, >>, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_shr_cpt
-ATOMIC_CMPXCHG_CPT( fixed2u, shr_cpt, kmp_uint16, 16, >>, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2u_shr_cpt
-ATOMIC_CMPXCHG_CPT( fixed2,  sub_cpt, kmp_int16,  16, -,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_sub_cpt
-ATOMIC_CMPXCHG_CPT( fixed2,  xor_cpt, kmp_int16,  16, ^,  0            )  // __kmpc_atomic_fixed2_xor_cpt
-ATOMIC_CMPXCHG_CPT( fixed4, andb_cpt, kmp_int32,  32, &,  0            )  // __kmpc_atomic_fixed4_andb_cpt
-ATOMIC_CMPXCHG_CPT( fixed4,  div_cpt, kmp_int32,  32, /,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed4_div_cpt
-ATOMIC_CMPXCHG_CPT( fixed4u, div_cpt, kmp_uint32, 32, /,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed4u_div_cpt
-ATOMIC_CMPXCHG_CPT( fixed4,  mul_cpt, kmp_int32,  32, *,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed4_mul_cpt
-ATOMIC_CMPXCHG_CPT( fixed4,  orb_cpt, kmp_int32,  32, |,  0            )  // __kmpc_atomic_fixed4_orb_cpt
-ATOMIC_CMPXCHG_CPT( fixed4,  shl_cpt, kmp_int32,  32, <<, KMP_ARCH_X86 )  // __kmpc_atomic_fixed4_shl_cpt
-ATOMIC_CMPXCHG_CPT( fixed4,  shr_cpt, kmp_int32,  32, >>, KMP_ARCH_X86 )  // __kmpc_atomic_fixed4_shr_cpt
-ATOMIC_CMPXCHG_CPT( fixed4u, shr_cpt, kmp_uint32, 32, >>, KMP_ARCH_X86 )  // __kmpc_atomic_fixed4u_shr_cpt
-ATOMIC_CMPXCHG_CPT( fixed4,  xor_cpt, kmp_int32,  32, ^,  0            )  // __kmpc_atomic_fixed4_xor_cpt
-ATOMIC_CMPXCHG_CPT( fixed8, andb_cpt, kmp_int64,  64, &,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_andb_cpt
-ATOMIC_CMPXCHG_CPT( fixed8,  div_cpt, kmp_int64,  64, /,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_div_cpt
-ATOMIC_CMPXCHG_CPT( fixed8u, div_cpt, kmp_uint64, 64, /,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed8u_div_cpt
-ATOMIC_CMPXCHG_CPT( fixed8,  mul_cpt, kmp_int64,  64, *,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_mul_cpt
-ATOMIC_CMPXCHG_CPT( fixed8,  orb_cpt, kmp_int64,  64, |,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_orb_cpt
-ATOMIC_CMPXCHG_CPT( fixed8,  shl_cpt, kmp_int64,  64, <<, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_shl_cpt
-ATOMIC_CMPXCHG_CPT( fixed8,  shr_cpt, kmp_int64,  64, >>, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_shr_cpt
-ATOMIC_CMPXCHG_CPT( fixed8u, shr_cpt, kmp_uint64, 64, >>, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8u_shr_cpt
-ATOMIC_CMPXCHG_CPT( fixed8,  xor_cpt, kmp_int64,  64, ^,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_xor_cpt
-ATOMIC_CMPXCHG_CPT( float4,  div_cpt, kmp_real32, 32, /,  KMP_ARCH_X86 )  // __kmpc_atomic_float4_div_cpt
-ATOMIC_CMPXCHG_CPT( float4,  mul_cpt, kmp_real32, 32, *,  KMP_ARCH_X86 )  // __kmpc_atomic_float4_mul_cpt
-ATOMIC_CMPXCHG_CPT( float8,  div_cpt, kmp_real64, 64, /,  KMP_ARCH_X86 )  // __kmpc_atomic_float8_div_cpt
-ATOMIC_CMPXCHG_CPT( float8,  mul_cpt, kmp_real64, 64, *,  KMP_ARCH_X86 )  // __kmpc_atomic_float8_mul_cpt
+ATOMIC_CMPXCHG_CPT(fixed1, add_cpt, kmp_int8, 8, +,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1_add_cpt
+ATOMIC_CMPXCHG_CPT(fixed1, andb_cpt, kmp_int8, 8, &,
+                   0) // __kmpc_atomic_fixed1_andb_cpt
+ATOMIC_CMPXCHG_CPT(fixed1, div_cpt, kmp_int8, 8, /,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1_div_cpt
+ATOMIC_CMPXCHG_CPT(fixed1u, div_cpt, kmp_uint8, 8, /,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1u_div_cpt
+ATOMIC_CMPXCHG_CPT(fixed1, mul_cpt, kmp_int8, 8, *,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1_mul_cpt
+ATOMIC_CMPXCHG_CPT(fixed1, orb_cpt, kmp_int8, 8, |,
+                   0) // __kmpc_atomic_fixed1_orb_cpt
+ATOMIC_CMPXCHG_CPT(fixed1, shl_cpt, kmp_int8, 8, <<,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1_shl_cpt
+ATOMIC_CMPXCHG_CPT(fixed1, shr_cpt, kmp_int8, 8, >>,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1_shr_cpt
+ATOMIC_CMPXCHG_CPT(fixed1u, shr_cpt, kmp_uint8, 8, >>,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1u_shr_cpt
+ATOMIC_CMPXCHG_CPT(fixed1, sub_cpt, kmp_int8, 8, -,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1_sub_cpt
+ATOMIC_CMPXCHG_CPT(fixed1, xor_cpt, kmp_int8, 8, ^,
+                   0) // __kmpc_atomic_fixed1_xor_cpt
+ATOMIC_CMPXCHG_CPT(fixed2, add_cpt, kmp_int16, 16, +,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2_add_cpt
+ATOMIC_CMPXCHG_CPT(fixed2, andb_cpt, kmp_int16, 16, &,
+                   0) // __kmpc_atomic_fixed2_andb_cpt
+ATOMIC_CMPXCHG_CPT(fixed2, div_cpt, kmp_int16, 16, /,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2_div_cpt
+ATOMIC_CMPXCHG_CPT(fixed2u, div_cpt, kmp_uint16, 16, /,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2u_div_cpt
+ATOMIC_CMPXCHG_CPT(fixed2, mul_cpt, kmp_int16, 16, *,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2_mul_cpt
+ATOMIC_CMPXCHG_CPT(fixed2, orb_cpt, kmp_int16, 16, |,
+                   0) // __kmpc_atomic_fixed2_orb_cpt
+ATOMIC_CMPXCHG_CPT(fixed2, shl_cpt, kmp_int16, 16, <<,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2_shl_cpt
+ATOMIC_CMPXCHG_CPT(fixed2, shr_cpt, kmp_int16, 16, >>,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2_shr_cpt
+ATOMIC_CMPXCHG_CPT(fixed2u, shr_cpt, kmp_uint16, 16, >>,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2u_shr_cpt
+ATOMIC_CMPXCHG_CPT(fixed2, sub_cpt, kmp_int16, 16, -,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2_sub_cpt
+ATOMIC_CMPXCHG_CPT(fixed2, xor_cpt, kmp_int16, 16, ^,
+                   0) // __kmpc_atomic_fixed2_xor_cpt
+ATOMIC_CMPXCHG_CPT(fixed4, andb_cpt, kmp_int32, 32, &,
+                   0) // __kmpc_atomic_fixed4_andb_cpt
+ATOMIC_CMPXCHG_CPT(fixed4, div_cpt, kmp_int32, 32, /,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed4_div_cpt
+ATOMIC_CMPXCHG_CPT(fixed4u, div_cpt, kmp_uint32, 32, /,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed4u_div_cpt
+ATOMIC_CMPXCHG_CPT(fixed4, mul_cpt, kmp_int32, 32, *,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed4_mul_cpt
+ATOMIC_CMPXCHG_CPT(fixed4, orb_cpt, kmp_int32, 32, |,
+                   0) // __kmpc_atomic_fixed4_orb_cpt
+ATOMIC_CMPXCHG_CPT(fixed4, shl_cpt, kmp_int32, 32, <<,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed4_shl_cpt
+ATOMIC_CMPXCHG_CPT(fixed4, shr_cpt, kmp_int32, 32, >>,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed4_shr_cpt
+ATOMIC_CMPXCHG_CPT(fixed4u, shr_cpt, kmp_uint32, 32, >>,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed4u_shr_cpt
+ATOMIC_CMPXCHG_CPT(fixed4, xor_cpt, kmp_int32, 32, ^,
+                   0) // __kmpc_atomic_fixed4_xor_cpt
+ATOMIC_CMPXCHG_CPT(fixed8, andb_cpt, kmp_int64, 64, &,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8_andb_cpt
+ATOMIC_CMPXCHG_CPT(fixed8, div_cpt, kmp_int64, 64, /,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8_div_cpt
+ATOMIC_CMPXCHG_CPT(fixed8u, div_cpt, kmp_uint64, 64, /,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8u_div_cpt
+ATOMIC_CMPXCHG_CPT(fixed8, mul_cpt, kmp_int64, 64, *,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8_mul_cpt
+ATOMIC_CMPXCHG_CPT(fixed8, orb_cpt, kmp_int64, 64, |,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8_orb_cpt
+ATOMIC_CMPXCHG_CPT(fixed8, shl_cpt, kmp_int64, 64, <<,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8_shl_cpt
+ATOMIC_CMPXCHG_CPT(fixed8, shr_cpt, kmp_int64, 64, >>,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8_shr_cpt
+ATOMIC_CMPXCHG_CPT(fixed8u, shr_cpt, kmp_uint64, 64, >>,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8u_shr_cpt
+ATOMIC_CMPXCHG_CPT(fixed8, xor_cpt, kmp_int64, 64, ^,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8_xor_cpt
+ATOMIC_CMPXCHG_CPT(float4, div_cpt, kmp_real32, 32, /,
+                   KMP_ARCH_X86) // __kmpc_atomic_float4_div_cpt
+ATOMIC_CMPXCHG_CPT(float4, mul_cpt, kmp_real32, 32, *,
+                   KMP_ARCH_X86) // __kmpc_atomic_float4_mul_cpt
+ATOMIC_CMPXCHG_CPT(float8, div_cpt, kmp_real64, 64, /,
+                   KMP_ARCH_X86) // __kmpc_atomic_float8_div_cpt
+ATOMIC_CMPXCHG_CPT(float8, mul_cpt, kmp_real64, 64, *,
+                   KMP_ARCH_X86) // __kmpc_atomic_float8_mul_cpt
 //              TYPE_ID,OP_ID, TYPE,          OP,  GOMP_FLAG
 
-//////////////////////////////////
-
 // CAPTURE routines for mixed types RHS=float16
 #if KMP_HAVE_QUAD
 
 // Beginning of a definition (provides name, parameters, debug trace)
-//     TYPE_ID - operands type and size (fixed*, fixed*u for signed, unsigned fixed)
+//     TYPE_ID - operands type and size (fixed*, fixed*u for signed, unsigned
+//     fixed)
 //     OP_ID   - operation identifier (add, sub, mul, ...)
 //     TYPE    - operands' type
-#define ATOMIC_BEGIN_CPT_MIX(TYPE_ID,OP_ID,TYPE,RTYPE_ID,RTYPE)       \
-TYPE __kmpc_atomic_##TYPE_ID##_##OP_ID##_##RTYPE_ID( ident_t *id_ref, int gtid, TYPE * lhs, RTYPE rhs, int flag ) \
-{                                                                                         \
-    KMP_DEBUG_ASSERT( __kmp_init_serial );                                                \
-    KA_TRACE(100,("__kmpc_atomic_" #TYPE_ID "_" #OP_ID "_" #RTYPE_ID ": T#%d\n", gtid ));
+#define ATOMIC_BEGIN_CPT_MIX(TYPE_ID, OP_ID, TYPE, RTYPE_ID, RTYPE)            \
+  TYPE __kmpc_atomic_##TYPE_ID##_##OP_ID##_##RTYPE_ID(                         \
+      ident_t *id_ref, int gtid, TYPE *lhs, RTYPE rhs, int flag) {             \
+    KMP_DEBUG_ASSERT(__kmp_init_serial);                                       \
+    KA_TRACE(100,                                                              \
+             ("__kmpc_atomic_" #TYPE_ID "_" #OP_ID "_" #RTYPE_ID ": T#%d\n",   \
+              gtid));
+
+// -------------------------------------------------------------------------
+#define ATOMIC_CMPXCHG_CPT_MIX(TYPE_ID, TYPE, OP_ID, BITS, OP, RTYPE_ID,       \
+                               RTYPE, LCK_ID, MASK, GOMP_FLAG)                 \
+  ATOMIC_BEGIN_CPT_MIX(TYPE_ID, OP_ID, TYPE, RTYPE_ID, RTYPE)                  \
+  TYPE new_value;                                                              \
+  OP_GOMP_CRITICAL_CPT(OP, GOMP_FLAG)                                          \
+  OP_CMPXCHG_CPT(TYPE, BITS, OP)                                               \
+  }
+
+// -------------------------------------------------------------------------
+#define ATOMIC_CRITICAL_CPT_MIX(TYPE_ID, TYPE, OP_ID, OP, RTYPE_ID, RTYPE,     \
+                                LCK_ID, GOMP_FLAG)                             \
+  ATOMIC_BEGIN_CPT_MIX(TYPE_ID, OP_ID, TYPE, RTYPE_ID, RTYPE)                  \
+  TYPE new_value;                                                              \
+  OP_GOMP_CRITICAL_CPT(OP, GOMP_FLAG) /* send assignment */                    \
+  OP_CRITICAL_CPT(OP## =, LCK_ID) /* send assignment */                        \
+  }
+
+ATOMIC_CMPXCHG_CPT_MIX(fixed1, char, add_cpt, 8, +, fp, _Quad, 1i, 0,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed1_add_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed1u, uchar, add_cpt, 8, +, fp, _Quad, 1i, 0,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed1u_add_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed1, char, sub_cpt, 8, -, fp, _Quad, 1i, 0,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed1_sub_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed1u, uchar, sub_cpt, 8, -, fp, _Quad, 1i, 0,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed1u_sub_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed1, char, mul_cpt, 8, *, fp, _Quad, 1i, 0,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed1_mul_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed1u, uchar, mul_cpt, 8, *, fp, _Quad, 1i, 0,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed1u_mul_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed1, char, div_cpt, 8, /, fp, _Quad, 1i, 0,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed1_div_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed1u, uchar, div_cpt, 8, /, fp, _Quad, 1i, 0,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed1u_div_cpt_fp
+
+ATOMIC_CMPXCHG_CPT_MIX(fixed2, short, add_cpt, 16, +, fp, _Quad, 2i, 1,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed2_add_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed2u, ushort, add_cpt, 16, +, fp, _Quad, 2i, 1,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed2u_add_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed2, short, sub_cpt, 16, -, fp, _Quad, 2i, 1,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed2_sub_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed2u, ushort, sub_cpt, 16, -, fp, _Quad, 2i, 1,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed2u_sub_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed2, short, mul_cpt, 16, *, fp, _Quad, 2i, 1,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed2_mul_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed2u, ushort, mul_cpt, 16, *, fp, _Quad, 2i, 1,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed2u_mul_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed2, short, div_cpt, 16, /, fp, _Quad, 2i, 1,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed2_div_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed2u, ushort, div_cpt, 16, /, fp, _Quad, 2i, 1,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed2u_div_cpt_fp
+
+ATOMIC_CMPXCHG_CPT_MIX(fixed4, kmp_int32, add_cpt, 32, +, fp, _Quad, 4i, 3,
+                       0) // __kmpc_atomic_fixed4_add_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed4u, kmp_uint32, add_cpt, 32, +, fp, _Quad, 4i, 3,
+                       0) // __kmpc_atomic_fixed4u_add_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed4, kmp_int32, sub_cpt, 32, -, fp, _Quad, 4i, 3,
+                       0) // __kmpc_atomic_fixed4_sub_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed4u, kmp_uint32, sub_cpt, 32, -, fp, _Quad, 4i, 3,
+                       0) // __kmpc_atomic_fixed4u_sub_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed4, kmp_int32, mul_cpt, 32, *, fp, _Quad, 4i, 3,
+                       0) // __kmpc_atomic_fixed4_mul_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed4u, kmp_uint32, mul_cpt, 32, *, fp, _Quad, 4i, 3,
+                       0) // __kmpc_atomic_fixed4u_mul_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed4, kmp_int32, div_cpt, 32, /, fp, _Quad, 4i, 3,
+                       0) // __kmpc_atomic_fixed4_div_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed4u, kmp_uint32, div_cpt, 32, /, fp, _Quad, 4i, 3,
+                       0) // __kmpc_atomic_fixed4u_div_cpt_fp
+
+ATOMIC_CMPXCHG_CPT_MIX(fixed8, kmp_int64, add_cpt, 64, +, fp, _Quad, 8i, 7,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed8_add_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed8u, kmp_uint64, add_cpt, 64, +, fp, _Quad, 8i, 7,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed8u_add_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed8, kmp_int64, sub_cpt, 64, -, fp, _Quad, 8i, 7,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed8_sub_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed8u, kmp_uint64, sub_cpt, 64, -, fp, _Quad, 8i, 7,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed8u_sub_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed8, kmp_int64, mul_cpt, 64, *, fp, _Quad, 8i, 7,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed8_mul_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed8u, kmp_uint64, mul_cpt, 64, *, fp, _Quad, 8i, 7,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed8u_mul_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed8, kmp_int64, div_cpt, 64, /, fp, _Quad, 8i, 7,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed8_div_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(fixed8u, kmp_uint64, div_cpt, 64, /, fp, _Quad, 8i, 7,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed8u_div_cpt_fp
+
+ATOMIC_CMPXCHG_CPT_MIX(float4, kmp_real32, add_cpt, 32, +, fp, _Quad, 4r, 3,
+                       KMP_ARCH_X86) // __kmpc_atomic_float4_add_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(float4, kmp_real32, sub_cpt, 32, -, fp, _Quad, 4r, 3,
+                       KMP_ARCH_X86) // __kmpc_atomic_float4_sub_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(float4, kmp_real32, mul_cpt, 32, *, fp, _Quad, 4r, 3,
+                       KMP_ARCH_X86) // __kmpc_atomic_float4_mul_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(float4, kmp_real32, div_cpt, 32, /, fp, _Quad, 4r, 3,
+                       KMP_ARCH_X86) // __kmpc_atomic_float4_div_cpt_fp
+
+ATOMIC_CMPXCHG_CPT_MIX(float8, kmp_real64, add_cpt, 64, +, fp, _Quad, 8r, 7,
+                       KMP_ARCH_X86) // __kmpc_atomic_float8_add_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(float8, kmp_real64, sub_cpt, 64, -, fp, _Quad, 8r, 7,
+                       KMP_ARCH_X86) // __kmpc_atomic_float8_sub_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(float8, kmp_real64, mul_cpt, 64, *, fp, _Quad, 8r, 7,
+                       KMP_ARCH_X86) // __kmpc_atomic_float8_mul_cpt_fp
+ATOMIC_CMPXCHG_CPT_MIX(float8, kmp_real64, div_cpt, 64, /, fp, _Quad, 8r, 7,
+                       KMP_ARCH_X86) // __kmpc_atomic_float8_div_cpt_fp
+
+ATOMIC_CRITICAL_CPT_MIX(float10, long double, add_cpt, +, fp, _Quad, 10r,
+                        1) // __kmpc_atomic_float10_add_cpt_fp
+ATOMIC_CRITICAL_CPT_MIX(float10, long double, sub_cpt, -, fp, _Quad, 10r,
+                        1) // __kmpc_atomic_float10_sub_cpt_fp
+ATOMIC_CRITICAL_CPT_MIX(float10, long double, mul_cpt, *, fp, _Quad, 10r,
+                        1) // __kmpc_atomic_float10_mul_cpt_fp
+ATOMIC_CRITICAL_CPT_MIX(float10, long double, div_cpt, /, fp, _Quad, 10r,
+                        1) // __kmpc_atomic_float10_div_cpt_fp
 
-// -------------------------------------------------------------------------
-#define ATOMIC_CMPXCHG_CPT_MIX(TYPE_ID,TYPE,OP_ID,BITS,OP,RTYPE_ID,RTYPE,LCK_ID,MASK,GOMP_FLAG)       \
-ATOMIC_BEGIN_CPT_MIX(TYPE_ID,OP_ID,TYPE,RTYPE_ID,RTYPE)                    \
-    TYPE new_value;                                                        \
-    OP_GOMP_CRITICAL_CPT(OP,GOMP_FLAG)                                     \
-    OP_CMPXCHG_CPT(TYPE,BITS,OP)                                           \
-}
-
-// -------------------------------------------------------------------------
-#define ATOMIC_CRITICAL_CPT_MIX(TYPE_ID,TYPE,OP_ID,OP,RTYPE_ID,RTYPE,LCK_ID,GOMP_FLAG)         \
-ATOMIC_BEGIN_CPT_MIX(TYPE_ID,OP_ID,TYPE,RTYPE_ID,RTYPE)                    \
-    TYPE new_value;                                                        \
-    OP_GOMP_CRITICAL_CPT(OP,GOMP_FLAG)  /* send assignment */                              \
-    OP_CRITICAL_CPT(OP##=,LCK_ID)  /* send assignment */                                      \
-}
-
-ATOMIC_CMPXCHG_CPT_MIX( fixed1,  char,       add_cpt,  8, +, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1_add_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed1u, uchar,      add_cpt,  8, +, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1u_add_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed1,  char,       sub_cpt,  8, -, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1_sub_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed1u, uchar,      sub_cpt,  8, -, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1u_sub_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed1,  char,       mul_cpt,  8, *, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1_mul_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed1u, uchar,      mul_cpt,  8, *, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1u_mul_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed1,  char,       div_cpt,  8, /, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1_div_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed1u, uchar,      div_cpt,  8, /, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1u_div_cpt_fp
-
-ATOMIC_CMPXCHG_CPT_MIX( fixed2,  short,      add_cpt, 16, +, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2_add_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed2u, ushort,     add_cpt, 16, +, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2u_add_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed2,  short,      sub_cpt, 16, -, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2_sub_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed2u, ushort,     sub_cpt, 16, -, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2u_sub_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed2,  short,      mul_cpt, 16, *, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2_mul_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed2u, ushort,     mul_cpt, 16, *, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2u_mul_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed2,  short,      div_cpt, 16, /, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2_div_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed2u, ushort,     div_cpt, 16, /, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2u_div_cpt_fp
-
-ATOMIC_CMPXCHG_CPT_MIX( fixed4,  kmp_int32,  add_cpt, 32, +, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4_add_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed4u, kmp_uint32, add_cpt, 32, +, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4u_add_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed4,  kmp_int32,  sub_cpt, 32, -, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4_sub_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed4u, kmp_uint32, sub_cpt, 32, -, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4u_sub_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed4,  kmp_int32,  mul_cpt, 32, *, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4_mul_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed4u, kmp_uint32, mul_cpt, 32, *, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4u_mul_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed4,  kmp_int32,  div_cpt, 32, /, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4_div_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed4u, kmp_uint32, div_cpt, 32, /, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4u_div_cpt_fp
-
-ATOMIC_CMPXCHG_CPT_MIX( fixed8,  kmp_int64,  add_cpt, 64, +, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_add_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed8u, kmp_uint64, add_cpt, 64, +, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8u_add_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed8,  kmp_int64,  sub_cpt, 64, -, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_sub_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed8u, kmp_uint64, sub_cpt, 64, -, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8u_sub_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed8,  kmp_int64,  mul_cpt, 64, *, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_mul_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed8u, kmp_uint64, mul_cpt, 64, *, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8u_mul_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed8,  kmp_int64,  div_cpt, 64, /, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_div_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( fixed8u, kmp_uint64, div_cpt, 64, /, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8u_div_cpt_fp
-
-ATOMIC_CMPXCHG_CPT_MIX( float4,  kmp_real32, add_cpt, 32, +, fp, _Quad, 4r, 3, KMP_ARCH_X86 ) // __kmpc_atomic_float4_add_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( float4,  kmp_real32, sub_cpt, 32, -, fp, _Quad, 4r, 3, KMP_ARCH_X86 ) // __kmpc_atomic_float4_sub_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( float4,  kmp_real32, mul_cpt, 32, *, fp, _Quad, 4r, 3, KMP_ARCH_X86 ) // __kmpc_atomic_float4_mul_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( float4,  kmp_real32, div_cpt, 32, /, fp, _Quad, 4r, 3, KMP_ARCH_X86 ) // __kmpc_atomic_float4_div_cpt_fp
-
-ATOMIC_CMPXCHG_CPT_MIX( float8,  kmp_real64, add_cpt, 64, +, fp, _Quad, 8r, 7, KMP_ARCH_X86 ) // __kmpc_atomic_float8_add_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( float8,  kmp_real64, sub_cpt, 64, -, fp, _Quad, 8r, 7, KMP_ARCH_X86 ) // __kmpc_atomic_float8_sub_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( float8,  kmp_real64, mul_cpt, 64, *, fp, _Quad, 8r, 7, KMP_ARCH_X86 ) // __kmpc_atomic_float8_mul_cpt_fp
-ATOMIC_CMPXCHG_CPT_MIX( float8,  kmp_real64, div_cpt, 64, /, fp, _Quad, 8r, 7, KMP_ARCH_X86 ) // __kmpc_atomic_float8_div_cpt_fp
-
-ATOMIC_CRITICAL_CPT_MIX( float10, long double, add_cpt, +, fp, _Quad, 10r,   1 )            // __kmpc_atomic_float10_add_cpt_fp
-ATOMIC_CRITICAL_CPT_MIX( float10, long double, sub_cpt, -, fp, _Quad, 10r,   1 )            // __kmpc_atomic_float10_sub_cpt_fp
-ATOMIC_CRITICAL_CPT_MIX( float10, long double, mul_cpt, *, fp, _Quad, 10r,   1 )            // __kmpc_atomic_float10_mul_cpt_fp
-ATOMIC_CRITICAL_CPT_MIX( float10, long double, div_cpt, /, fp, _Quad, 10r,   1 )            // __kmpc_atomic_float10_div_cpt_fp
-
-#endif //KMP_HAVE_QUAD
-
-///////////////////////////////////
+#endif // KMP_HAVE_QUAD
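
For orientation, one plausible source-level construct that a compiler might lower to the mixed-type _cpt_fp entry points listed above (illustrative only; the exact lowering is compiler-specific, and _Quad requires KMP_HAVE_QUAD):

@code
/* Illustrative only; requires a compiler with _Quad support (KMP_HAVE_QUAD). */
int capture_before_add(int *counter, _Quad step) {
  int captured;
#pragma omp atomic capture
  { captured = *counter;  /* v = x: capture the value before the update */
    *counter += step; }   /* x += expr: then update atomically */
  /* plausibly lowered to __kmpc_atomic_fixed4_add_cpt_fp with flag == 0,
     since the value before the update is the one captured here */
  return captured;
}
@endcode
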
 
 // ------------------------------------------------------------------------
 // Routines for C/C++ Reduction operators && and ||
-// ------------------------------------------------------------------------
 
 // -------------------------------------------------------------------------
 // Operation on *lhs, rhs bound by critical section
@@ -2068,285 +2501,347 @@ ATOMIC_CRITICAL_CPT_MIX( float10, long d
 //     LCK_ID - lock identifier
 // Note: don't check gtid as it should always be valid
 // 1, 2-byte - expect valid parameter, other - check before this macro
-#define OP_CRITICAL_L_CPT(OP,LCK_ID)                                      \
-    __kmp_acquire_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );                    \
-                                                                          \
-    if( flag ) {                                                          \
-        new_value OP rhs;                                                 \
-    } else                                                                \
-        new_value = (*lhs);                                               \
-                                                                          \
-    __kmp_release_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );
+#define OP_CRITICAL_L_CPT(OP, LCK_ID)                                          \
+  __kmp_acquire_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);                       \
+                                                                               \
+  if (flag) {                                                                  \
+    new_value OP rhs;                                                          \
+  } else                                                                       \
+    new_value = (*lhs);                                                        \
+                                                                               \
+  __kmp_release_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);
 
 // ------------------------------------------------------------------------
 #ifdef KMP_GOMP_COMPAT
-#define OP_GOMP_CRITICAL_L_CPT(OP,FLAG)                                   \
-    if ( (FLAG) && (__kmp_atomic_mode == 2) ) {                           \
-        KMP_CHECK_GTID;                                                   \
-        OP_CRITICAL_L_CPT( OP, 0 );                                       \
-        return new_value;                                                 \
-    }
+#define OP_GOMP_CRITICAL_L_CPT(OP, FLAG)                                       \
+  if ((FLAG) && (__kmp_atomic_mode == 2)) {                                    \
+    KMP_CHECK_GTID;                                                            \
+    OP_CRITICAL_L_CPT(OP, 0);                                                  \
+    return new_value;                                                          \
+  }
 #else
-#define OP_GOMP_CRITICAL_L_CPT(OP,FLAG)
+#define OP_GOMP_CRITICAL_L_CPT(OP, FLAG)
 #endif /* KMP_GOMP_COMPAT */
 
 // ------------------------------------------------------------------------
 // Need separate macros for &&, || because there is no combined assignment
-#define ATOMIC_CMPX_L_CPT(TYPE_ID,OP_ID,TYPE,BITS,OP,GOMP_FLAG)           \
-ATOMIC_BEGIN_CPT(TYPE_ID,OP_ID,TYPE,TYPE)                                 \
-    TYPE new_value;                                                       \
-    OP_GOMP_CRITICAL_L_CPT( = *lhs OP, GOMP_FLAG )                        \
-    OP_CMPXCHG_CPT(TYPE,BITS,OP)                                          \
-}
-
-ATOMIC_CMPX_L_CPT( fixed1, andl_cpt, char,       8, &&, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_andl_cpt
-ATOMIC_CMPX_L_CPT( fixed1,  orl_cpt, char,       8, ||, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_orl_cpt
-ATOMIC_CMPX_L_CPT( fixed2, andl_cpt, short,     16, &&, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_andl_cpt
-ATOMIC_CMPX_L_CPT( fixed2,  orl_cpt, short,     16, ||, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_orl_cpt
-ATOMIC_CMPX_L_CPT( fixed4, andl_cpt, kmp_int32, 32, &&, 0 )             // __kmpc_atomic_fixed4_andl_cpt
-ATOMIC_CMPX_L_CPT( fixed4,  orl_cpt, kmp_int32, 32, ||, 0 )             // __kmpc_atomic_fixed4_orl_cpt
-ATOMIC_CMPX_L_CPT( fixed8, andl_cpt, kmp_int64, 64, &&, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_andl_cpt
-ATOMIC_CMPX_L_CPT( fixed8,  orl_cpt, kmp_int64, 64, ||, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_orl_cpt
-
+#define ATOMIC_CMPX_L_CPT(TYPE_ID, OP_ID, TYPE, BITS, OP, GOMP_FLAG)           \
+  ATOMIC_BEGIN_CPT(TYPE_ID, OP_ID, TYPE, TYPE)                                 \
+  TYPE new_value;                                                              \
+  OP_GOMP_CRITICAL_L_CPT(= *lhs OP, GOMP_FLAG)                                 \
+  OP_CMPXCHG_CPT(TYPE, BITS, OP)                                               \
+  }
+
+ATOMIC_CMPX_L_CPT(fixed1, andl_cpt, char, 8, &&,
+                  KMP_ARCH_X86) // __kmpc_atomic_fixed1_andl_cpt
+ATOMIC_CMPX_L_CPT(fixed1, orl_cpt, char, 8, ||,
+                  KMP_ARCH_X86) // __kmpc_atomic_fixed1_orl_cpt
+ATOMIC_CMPX_L_CPT(fixed2, andl_cpt, short, 16, &&,
+                  KMP_ARCH_X86) // __kmpc_atomic_fixed2_andl_cpt
+ATOMIC_CMPX_L_CPT(fixed2, orl_cpt, short, 16, ||,
+                  KMP_ARCH_X86) // __kmpc_atomic_fixed2_orl_cpt
+ATOMIC_CMPX_L_CPT(fixed4, andl_cpt, kmp_int32, 32, &&,
+                  0) // __kmpc_atomic_fixed4_andl_cpt
+ATOMIC_CMPX_L_CPT(fixed4, orl_cpt, kmp_int32, 32, ||,
+                  0) // __kmpc_atomic_fixed4_orl_cpt
+ATOMIC_CMPX_L_CPT(fixed8, andl_cpt, kmp_int64, 64, &&,
+                  KMP_ARCH_X86) // __kmpc_atomic_fixed8_andl_cpt
+ATOMIC_CMPX_L_CPT(fixed8, orl_cpt, kmp_int64, 64, ||,
+                  KMP_ARCH_X86) // __kmpc_atomic_fixed8_orl_cpt
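
Since C has no &&= or ||= compound assignment, the macro above forms the new value explicitly before the compare-and-swap. A minimal sketch of the fixed4 logical-AND capture, assuming __atomic_compare_exchange_n as a stand-in for KMP_COMPARE_AND_STORE_ACQ32 and an invented function name:

@code
#include <stdint.h>

static int32_t fixed4_andl_cpt_sketch(int32_t *lhs, int32_t rhs, int flag) {
  int32_t old_value = *lhs, new_value;
  do {
    new_value = old_value && rhs; /* logical AND computed up front */
    /* a failed exchange refreshes old_value, so new_value is recomputed */
  } while (!__atomic_compare_exchange_n(lhs, &old_value, new_value,
                                        0 /* strong */, __ATOMIC_ACQUIRE,
                                        __ATOMIC_RELAXED));
  return flag ? new_value : old_value;
}
@endcode
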
 
 // -------------------------------------------------------------------------
 // Routines for Fortran operators that matched no one in C:
 // MAX, MIN, .EQV., .NEQV.
 // Operators .AND., .OR. are covered by __kmpc_atomic_*_{andl,orl}_cpt
 // Intrinsics IAND, IOR, IEOR are covered by __kmpc_atomic_*_{andb,orb,xor}_cpt
-// -------------------------------------------------------------------------
 
 // -------------------------------------------------------------------------
 // MIN and MAX need separate macros
 // OP - operator used to check whether any action is still needed
-#define MIN_MAX_CRITSECT_CPT(OP,LCK_ID)                                    \
-    __kmp_acquire_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );                     \
-                                                                           \
-    if ( *lhs OP rhs ) {                 /* still need actions? */         \
-        old_value = *lhs;                                                  \
-        *lhs = rhs;                                                        \
-        if ( flag )                                                        \
-            new_value = rhs;                                               \
-        else                                                               \
-            new_value = old_value;                                         \
-    }                                                                      \
-    __kmp_release_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );                     \
-    return new_value;                                                      \
+#define MIN_MAX_CRITSECT_CPT(OP, LCK_ID)                                       \
+  __kmp_acquire_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);                       \
+                                                                               \
+  if (*lhs OP rhs) { /* still need actions? */                                 \
+    old_value = *lhs;                                                          \
+    *lhs = rhs;                                                                \
+    if (flag)                                                                  \
+      new_value = rhs;                                                         \
+    else                                                                       \
+      new_value = old_value;                                                   \
+  }                                                                            \
+  __kmp_release_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);                       \
+  return new_value;
 
 // -------------------------------------------------------------------------
 #ifdef KMP_GOMP_COMPAT
-#define GOMP_MIN_MAX_CRITSECT_CPT(OP,FLAG)                                 \
-    if (( FLAG ) && ( __kmp_atomic_mode == 2 )) {                          \
-        KMP_CHECK_GTID;                                                    \
-        MIN_MAX_CRITSECT_CPT( OP, 0 );                                     \
-    }
-#else
-#define GOMP_MIN_MAX_CRITSECT_CPT(OP,FLAG)
+#define GOMP_MIN_MAX_CRITSECT_CPT(OP, FLAG)                                    \
+  if ((FLAG) && (__kmp_atomic_mode == 2)) {                                    \
+    KMP_CHECK_GTID;                                                            \
+    MIN_MAX_CRITSECT_CPT(OP, 0);                                               \
+  }
+#else
+#define GOMP_MIN_MAX_CRITSECT_CPT(OP, FLAG)
 #endif /* KMP_GOMP_COMPAT */
 
 // -------------------------------------------------------------------------
-#define MIN_MAX_CMPXCHG_CPT(TYPE,BITS,OP)                                  \
-    {                                                                      \
-        TYPE KMP_ATOMIC_VOLATILE temp_val;                                 \
-        /*TYPE old_value; */                                               \
-        temp_val = *lhs;                                                   \
-        old_value = temp_val;                                              \
-        while ( old_value OP rhs &&          /* still need actions? */     \
-            ! KMP_COMPARE_AND_STORE_ACQ##BITS( (kmp_int##BITS *) lhs,      \
-                      *VOLATILE_CAST(kmp_int##BITS *) &old_value,          \
-                      *VOLATILE_CAST(kmp_int##BITS *) &rhs ) )             \
-        {                                                                  \
-            KMP_CPU_PAUSE();                                               \
-            temp_val = *lhs;                                               \
-            old_value = temp_val;                                          \
-        }                                                                  \
-        if( flag )                                                         \
-            return rhs;                                                    \
-        else                                                               \
-            return old_value;                                              \
-    }
+#define MIN_MAX_CMPXCHG_CPT(TYPE, BITS, OP)                                    \
+  {                                                                            \
+    TYPE KMP_ATOMIC_VOLATILE temp_val;                                         \
+    /*TYPE old_value; */                                                       \
+    temp_val = *lhs;                                                           \
+    old_value = temp_val;                                                      \
+    while (old_value OP rhs && /* still need actions? */                       \
+           !KMP_COMPARE_AND_STORE_ACQ##BITS(                                   \
+               (kmp_int##BITS *)lhs,                                           \
+               *VOLATILE_CAST(kmp_int##BITS *) & old_value,                    \
+               *VOLATILE_CAST(kmp_int##BITS *) & rhs)) {                       \
+      KMP_CPU_PAUSE();                                                         \
+      temp_val = *lhs;                                                         \
+      old_value = temp_val;                                                    \
+    }                                                                          \
+    if (flag)                                                                  \
+      return rhs;                                                              \
+    else                                                                       \
+      return old_value;                                                        \
+  }
 
 // -------------------------------------------------------------------------
 // 1-byte, 2-byte operands - use critical section
-#define MIN_MAX_CRITICAL_CPT(TYPE_ID,OP_ID,TYPE,OP,LCK_ID,GOMP_FLAG)       \
-ATOMIC_BEGIN_CPT(TYPE_ID,OP_ID,TYPE,TYPE)                                  \
-    TYPE new_value, old_value;                                             \
-    if ( *lhs OP rhs ) {     /* need actions? */                           \
-        GOMP_MIN_MAX_CRITSECT_CPT(OP,GOMP_FLAG)                            \
-        MIN_MAX_CRITSECT_CPT(OP,LCK_ID)                                    \
-    }                                                                      \
-    return *lhs;                                                           \
-}
-
-#define MIN_MAX_COMPXCHG_CPT(TYPE_ID,OP_ID,TYPE,BITS,OP,GOMP_FLAG)         \
-ATOMIC_BEGIN_CPT(TYPE_ID,OP_ID,TYPE,TYPE)                                  \
-    TYPE new_value, old_value;                                             \
-    if ( *lhs OP rhs ) {                                                   \
-        GOMP_MIN_MAX_CRITSECT_CPT(OP,GOMP_FLAG)                            \
-        MIN_MAX_CMPXCHG_CPT(TYPE,BITS,OP)                                  \
-    }                                                                      \
-    return *lhs;                                                           \
-}
-
-
-MIN_MAX_COMPXCHG_CPT( fixed1,  max_cpt, char,        8, <, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1_max_cpt
-MIN_MAX_COMPXCHG_CPT( fixed1,  min_cpt, char,        8, >, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1_min_cpt
-MIN_MAX_COMPXCHG_CPT( fixed2,  max_cpt, short,      16, <, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2_max_cpt
-MIN_MAX_COMPXCHG_CPT( fixed2,  min_cpt, short,      16, >, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2_min_cpt
-MIN_MAX_COMPXCHG_CPT( fixed4,  max_cpt, kmp_int32,  32, <, 0 )            // __kmpc_atomic_fixed4_max_cpt
-MIN_MAX_COMPXCHG_CPT( fixed4,  min_cpt, kmp_int32,  32, >, 0 )            // __kmpc_atomic_fixed4_min_cpt
-MIN_MAX_COMPXCHG_CPT( fixed8,  max_cpt, kmp_int64,  64, <, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_max_cpt
-MIN_MAX_COMPXCHG_CPT( fixed8,  min_cpt, kmp_int64,  64, >, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_min_cpt
-MIN_MAX_COMPXCHG_CPT( float4,  max_cpt, kmp_real32, 32, <, KMP_ARCH_X86 ) // __kmpc_atomic_float4_max_cpt
-MIN_MAX_COMPXCHG_CPT( float4,  min_cpt, kmp_real32, 32, >, KMP_ARCH_X86 ) // __kmpc_atomic_float4_min_cpt
-MIN_MAX_COMPXCHG_CPT( float8,  max_cpt, kmp_real64, 64, <, KMP_ARCH_X86 ) // __kmpc_atomic_float8_max_cpt
-MIN_MAX_COMPXCHG_CPT( float8,  min_cpt, kmp_real64, 64, >, KMP_ARCH_X86 ) // __kmpc_atomic_float8_min_cpt
+#define MIN_MAX_CRITICAL_CPT(TYPE_ID, OP_ID, TYPE, OP, LCK_ID, GOMP_FLAG)      \
+  ATOMIC_BEGIN_CPT(TYPE_ID, OP_ID, TYPE, TYPE)                                 \
+  TYPE new_value, old_value;                                                   \
+  if (*lhs OP rhs) { /* need actions? */                                       \
+    GOMP_MIN_MAX_CRITSECT_CPT(OP, GOMP_FLAG)                                   \
+    MIN_MAX_CRITSECT_CPT(OP, LCK_ID)                                           \
+  }                                                                            \
+  return *lhs;                                                                 \
+  }
+
+#define MIN_MAX_COMPXCHG_CPT(TYPE_ID, OP_ID, TYPE, BITS, OP, GOMP_FLAG)        \
+  ATOMIC_BEGIN_CPT(TYPE_ID, OP_ID, TYPE, TYPE)                                 \
+  TYPE new_value, old_value;                                                   \
+  if (*lhs OP rhs) {                                                           \
+    GOMP_MIN_MAX_CRITSECT_CPT(OP, GOMP_FLAG)                                   \
+    MIN_MAX_CMPXCHG_CPT(TYPE, BITS, OP)                                        \
+  }                                                                            \
+  return *lhs;                                                                 \
+  }
+
+MIN_MAX_COMPXCHG_CPT(fixed1, max_cpt, char, 8, <,
+                     KMP_ARCH_X86) // __kmpc_atomic_fixed1_max_cpt
+MIN_MAX_COMPXCHG_CPT(fixed1, min_cpt, char, 8, >,
+                     KMP_ARCH_X86) // __kmpc_atomic_fixed1_min_cpt
+MIN_MAX_COMPXCHG_CPT(fixed2, max_cpt, short, 16, <,
+                     KMP_ARCH_X86) // __kmpc_atomic_fixed2_max_cpt
+MIN_MAX_COMPXCHG_CPT(fixed2, min_cpt, short, 16, >,
+                     KMP_ARCH_X86) // __kmpc_atomic_fixed2_min_cpt
+MIN_MAX_COMPXCHG_CPT(fixed4, max_cpt, kmp_int32, 32, <,
+                     0) // __kmpc_atomic_fixed4_max_cpt
+MIN_MAX_COMPXCHG_CPT(fixed4, min_cpt, kmp_int32, 32, >,
+                     0) // __kmpc_atomic_fixed4_min_cpt
+MIN_MAX_COMPXCHG_CPT(fixed8, max_cpt, kmp_int64, 64, <,
+                     KMP_ARCH_X86) // __kmpc_atomic_fixed8_max_cpt
+MIN_MAX_COMPXCHG_CPT(fixed8, min_cpt, kmp_int64, 64, >,
+                     KMP_ARCH_X86) // __kmpc_atomic_fixed8_min_cpt
+MIN_MAX_COMPXCHG_CPT(float4, max_cpt, kmp_real32, 32, <,
+                     KMP_ARCH_X86) // __kmpc_atomic_float4_max_cpt
+MIN_MAX_COMPXCHG_CPT(float4, min_cpt, kmp_real32, 32, >,
+                     KMP_ARCH_X86) // __kmpc_atomic_float4_min_cpt
+MIN_MAX_COMPXCHG_CPT(float8, max_cpt, kmp_real64, 64, <,
+                     KMP_ARCH_X86) // __kmpc_atomic_float8_max_cpt
+MIN_MAX_COMPXCHG_CPT(float8, min_cpt, kmp_real64, 64, >,
+                     KMP_ARCH_X86) // __kmpc_atomic_float8_min_cpt
 #if KMP_HAVE_QUAD
-MIN_MAX_CRITICAL_CPT( float16, max_cpt, QUAD_LEGACY,    <, 16r,   1 )     // __kmpc_atomic_float16_max_cpt
-MIN_MAX_CRITICAL_CPT( float16, min_cpt, QUAD_LEGACY,    >, 16r,   1 )     // __kmpc_atomic_float16_min_cpt
-#if ( KMP_ARCH_X86 )
-    MIN_MAX_CRITICAL_CPT( float16, max_a16_cpt, Quad_a16_t, <, 16r,  1 )  // __kmpc_atomic_float16_max_a16_cpt
-    MIN_MAX_CRITICAL_CPT( float16, min_a16_cpt, Quad_a16_t, >, 16r,  1 )  // __kmpc_atomic_float16_mix_a16_cpt
+MIN_MAX_CRITICAL_CPT(float16, max_cpt, QUAD_LEGACY, <, 16r,
+                     1) // __kmpc_atomic_float16_max_cpt
+MIN_MAX_CRITICAL_CPT(float16, min_cpt, QUAD_LEGACY, >, 16r,
+                     1) // __kmpc_atomic_float16_min_cpt
+#if (KMP_ARCH_X86)
+MIN_MAX_CRITICAL_CPT(float16, max_a16_cpt, Quad_a16_t, <, 16r,
+                     1) // __kmpc_atomic_float16_max_a16_cpt
+MIN_MAX_CRITICAL_CPT(float16, min_a16_cpt, Quad_a16_t, >, 16r,
+                     1) // __kmpc_atomic_float16_min_a16_cpt
 #endif
 #endif
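
The min/max capture differs from the other cmpxchg captures in that the store is attempted only while the incoming value still wins the comparison (the "still need actions?" checks above). A hedged sketch of the fixed4 max case under the same stand-in assumptions, with an invented function name:

@code
#include <stdint.h>

static int32_t fixed4_max_cpt_sketch(int32_t *lhs, int32_t rhs, int flag) {
  int32_t old_value = *lhs;
  /* for max_cpt, OP is '<': keep trying only while the old value loses */
  while (old_value < rhs &&
         !__atomic_compare_exchange_n(lhs, &old_value, rhs, 0 /* strong */,
                                      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
    /* a failed exchange refreshed old_value; the condition is re-checked */
  }
  /* rhs is the "after" value only if it actually replaced the old value */
  return (flag && old_value < rhs) ? rhs : old_value;
}
@endcode
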
 
 // ------------------------------------------------------------------------
 #ifdef KMP_GOMP_COMPAT
-#define OP_GOMP_CRITICAL_EQV_CPT(OP,FLAG)                                 \
-    if ( (FLAG) && (__kmp_atomic_mode == 2) ) {                           \
-        KMP_CHECK_GTID;                                                   \
-        OP_CRITICAL_CPT( OP, 0 );                                         \
-    }
-#else
-#define OP_GOMP_CRITICAL_EQV_CPT(OP,FLAG)
-#endif /* KMP_GOMP_COMPAT */
-// ------------------------------------------------------------------------
-#define ATOMIC_CMPX_EQV_CPT(TYPE_ID,OP_ID,TYPE,BITS,OP,GOMP_FLAG)         \
-ATOMIC_BEGIN_CPT(TYPE_ID,OP_ID,TYPE,TYPE)                                 \
-    TYPE new_value;                                                       \
-    OP_GOMP_CRITICAL_EQV_CPT(^=~,GOMP_FLAG)  /* send assignment */        \
-    OP_CMPXCHG_CPT(TYPE,BITS,OP)                                          \
-}
+#define OP_GOMP_CRITICAL_EQV_CPT(OP, FLAG)                                     \
+  if ((FLAG) && (__kmp_atomic_mode == 2)) {                                    \
+    KMP_CHECK_GTID;                                                            \
+    OP_CRITICAL_CPT(OP, 0);                                                    \
+  }
+#else
+#define OP_GOMP_CRITICAL_EQV_CPT(OP, FLAG)
+#endif /* KMP_GOMP_COMPAT */
+// ------------------------------------------------------------------------
+#define ATOMIC_CMPX_EQV_CPT(TYPE_ID, OP_ID, TYPE, BITS, OP, GOMP_FLAG)         \
+  ATOMIC_BEGIN_CPT(TYPE_ID, OP_ID, TYPE, TYPE)                                 \
+  TYPE new_value;                                                              \
+  OP_GOMP_CRITICAL_EQV_CPT(^= ~, GOMP_FLAG) /* send assignment */              \
+  OP_CMPXCHG_CPT(TYPE, BITS, OP)                                               \
+  }
 
 // ------------------------------------------------------------------------
 
-ATOMIC_CMPXCHG_CPT(  fixed1, neqv_cpt, kmp_int8,   8,   ^, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1_neqv_cpt
-ATOMIC_CMPXCHG_CPT(  fixed2, neqv_cpt, kmp_int16, 16,   ^, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2_neqv_cpt
-ATOMIC_CMPXCHG_CPT(  fixed4, neqv_cpt, kmp_int32, 32,   ^, KMP_ARCH_X86 ) // __kmpc_atomic_fixed4_neqv_cpt
-ATOMIC_CMPXCHG_CPT(  fixed8, neqv_cpt, kmp_int64, 64,   ^, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_neqv_cpt
-ATOMIC_CMPX_EQV_CPT( fixed1, eqv_cpt,  kmp_int8,   8,  ^~, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1_eqv_cpt
-ATOMIC_CMPX_EQV_CPT( fixed2, eqv_cpt,  kmp_int16, 16,  ^~, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2_eqv_cpt
-ATOMIC_CMPX_EQV_CPT( fixed4, eqv_cpt,  kmp_int32, 32,  ^~, KMP_ARCH_X86 ) // __kmpc_atomic_fixed4_eqv_cpt
-ATOMIC_CMPX_EQV_CPT( fixed8, eqv_cpt,  kmp_int64, 64,  ^~, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_eqv_cpt
+ATOMIC_CMPXCHG_CPT(fixed1, neqv_cpt, kmp_int8, 8, ^,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed1_neqv_cpt
+ATOMIC_CMPXCHG_CPT(fixed2, neqv_cpt, kmp_int16, 16, ^,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed2_neqv_cpt
+ATOMIC_CMPXCHG_CPT(fixed4, neqv_cpt, kmp_int32, 32, ^,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed4_neqv_cpt
+ATOMIC_CMPXCHG_CPT(fixed8, neqv_cpt, kmp_int64, 64, ^,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8_neqv_cpt
+ATOMIC_CMPX_EQV_CPT(fixed1, eqv_cpt, kmp_int8, 8, ^~,
+                    KMP_ARCH_X86) // __kmpc_atomic_fixed1_eqv_cpt
+ATOMIC_CMPX_EQV_CPT(fixed2, eqv_cpt, kmp_int16, 16, ^~,
+                    KMP_ARCH_X86) // __kmpc_atomic_fixed2_eqv_cpt
+ATOMIC_CMPX_EQV_CPT(fixed4, eqv_cpt, kmp_int32, 32, ^~,
+                    KMP_ARCH_X86) // __kmpc_atomic_fixed4_eqv_cpt
+ATOMIC_CMPX_EQV_CPT(fixed8, eqv_cpt, kmp_int64, 64, ^~,
+                    KMP_ARCH_X86) // __kmpc_atomic_fixed8_eqv_cpt
 
 // ------------------------------------------------------------------------
-// Routines for Extended types: long double, _Quad, complex flavours (use critical section)
+// Routines for Extended types: long double, _Quad, complex flavours (use
+// critical section)
 //     TYPE_ID, OP_ID, TYPE - detailed above
 //     OP      - operator
 //     LCK_ID  - lock identifier, used to possibly distinguish lock variable
-#define ATOMIC_CRITICAL_CPT(TYPE_ID,OP_ID,TYPE,OP,LCK_ID,GOMP_FLAG) \
-ATOMIC_BEGIN_CPT(TYPE_ID,OP_ID,TYPE,TYPE)                           \
-    TYPE new_value;                                                 \
-    OP_GOMP_CRITICAL_CPT(OP,GOMP_FLAG)  /* send assignment */       \
-    OP_CRITICAL_CPT(OP##=,LCK_ID)          /* send assignment */    \
-}
+#define ATOMIC_CRITICAL_CPT(TYPE_ID, OP_ID, TYPE, OP, LCK_ID, GOMP_FLAG)       \
+  ATOMIC_BEGIN_CPT(TYPE_ID, OP_ID, TYPE, TYPE)                                 \
+  TYPE new_value;                                                              \
+  OP_GOMP_CRITICAL_CPT(OP, GOMP_FLAG) /* send assignment */                    \
+  OP_CRITICAL_CPT(OP## =, LCK_ID) /* send assignment */                        \
+  }
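
For types that cannot go through a native compare-and-swap, ATOMIC_CRITICAL_CPT above falls back to a lock-protected update. A hedged sketch of that shape, with a pthread mutex standing in for the __kmp_acquire_atomic_lock/__kmp_release_atomic_lock pair and an invented function name:

@code
#include <pthread.h>

static pthread_mutex_t lock_10r = PTHREAD_MUTEX_INITIALIZER; /* stand-in lock */

static long double float10_add_cpt_sketch(long double *lhs, long double rhs,
                                          int flag) {
  long double captured;
  pthread_mutex_lock(&lock_10r);
  if (flag) {          /* capture the value after the update */
    *lhs += rhs;
    captured = *lhs;
  } else {             /* capture the value before the update */
    captured = *lhs;
    *lhs += rhs;
  }
  pthread_mutex_unlock(&lock_10r);
  return captured;
}
@endcode
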
 
 // ------------------------------------------------------------------------
-
 // Workaround for cmplx4. Regular routines with return value don't work
 // on Win_32e. Let's return captured values through the additional parameter.
-#define OP_CRITICAL_CPT_WRK(OP,LCK_ID)                                    \
-    __kmp_acquire_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );             \
-                                                                          \
-    if( flag ) {                                                          \
-        (*lhs) OP rhs;                                                    \
-        (*out) = (*lhs);                                                  \
-    } else {                                                              \
-        (*out) = (*lhs);                                                  \
-        (*lhs) OP rhs;                                                    \
-    }                                                                     \
-                                                                          \
-    __kmp_release_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );             \
-    return;
+#define OP_CRITICAL_CPT_WRK(OP, LCK_ID)                                        \
+  __kmp_acquire_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);                       \
+                                                                               \
+  if (flag) {                                                                  \
+    (*lhs) OP rhs;                                                             \
+    (*out) = (*lhs);                                                           \
+  } else {                                                                     \
+    (*out) = (*lhs);                                                           \
+    (*lhs) OP rhs;                                                             \
+  }                                                                            \
+                                                                               \
+  __kmp_release_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);                       \
+  return;
 // ------------------------------------------------------------------------
 
 #ifdef KMP_GOMP_COMPAT
-#define OP_GOMP_CRITICAL_CPT_WRK(OP,FLAG)                                 \
-    if ( (FLAG) && (__kmp_atomic_mode == 2) ) {                           \
-        KMP_CHECK_GTID;                                                   \
-        OP_CRITICAL_CPT_WRK( OP##=, 0 );                                  \
-    }
-#else
-#define OP_GOMP_CRITICAL_CPT_WRK(OP,FLAG)
+#define OP_GOMP_CRITICAL_CPT_WRK(OP, FLAG)                                     \
+  if ((FLAG) && (__kmp_atomic_mode == 2)) {                                    \
+    KMP_CHECK_GTID;                                                            \
+    OP_CRITICAL_CPT_WRK(OP## =, 0);                                            \
+  }
+#else
+#define OP_GOMP_CRITICAL_CPT_WRK(OP, FLAG)
 #endif /* KMP_GOMP_COMPAT */
 // ------------------------------------------------------------------------
 
-#define ATOMIC_BEGIN_WRK(TYPE_ID,OP_ID,TYPE)                              \
-void __kmpc_atomic_##TYPE_ID##_##OP_ID( ident_t *id_ref, int gtid, TYPE * lhs, TYPE rhs, TYPE * out, int flag ) \
-{                                                                         \
-    KMP_DEBUG_ASSERT( __kmp_init_serial );                                \
-    KA_TRACE(100,("__kmpc_atomic_" #TYPE_ID "_" #OP_ID ": T#%d\n", gtid ));
-// ------------------------------------------------------------------------
-
-#define ATOMIC_CRITICAL_CPT_WRK(TYPE_ID,OP_ID,TYPE,OP,LCK_ID,GOMP_FLAG)   \
-ATOMIC_BEGIN_WRK(TYPE_ID,OP_ID,TYPE)                                      \
-    OP_GOMP_CRITICAL_CPT_WRK(OP,GOMP_FLAG)                                \
-    OP_CRITICAL_CPT_WRK(OP##=,LCK_ID)                                     \
-}
+#define ATOMIC_BEGIN_WRK(TYPE_ID, OP_ID, TYPE)                                 \
+  void __kmpc_atomic_##TYPE_ID##_##OP_ID(ident_t *id_ref, int gtid, TYPE *lhs, \
+                                         TYPE rhs, TYPE *out, int flag) {      \
+    KMP_DEBUG_ASSERT(__kmp_init_serial);                                       \
+    KA_TRACE(100, ("__kmpc_atomic_" #TYPE_ID "_" #OP_ID ": T#%d\n", gtid));
+// ------------------------------------------------------------------------
+
+#define ATOMIC_CRITICAL_CPT_WRK(TYPE_ID, OP_ID, TYPE, OP, LCK_ID, GOMP_FLAG)   \
+  ATOMIC_BEGIN_WRK(TYPE_ID, OP_ID, TYPE)                                       \
+  OP_GOMP_CRITICAL_CPT_WRK(OP, GOMP_FLAG)                                      \
+  OP_CRITICAL_CPT_WRK(OP## =, LCK_ID)                                          \
+  }
 // The end of workaround for cmplx4
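
To make the flag semantics of the workaround concrete, here is a minimal, self-contained sketch of the capture ordering that OP_CRITICAL_CPT_WRK encodes, using float _Complex as a stand-in for kmp_cmplx32 and omitting the lock and GOMP-compat path. It illustrates the pattern only; it is not the runtime code.

#include <complex.h>

/* flag != 0: capture the value after the update; flag == 0: capture it before.
   Toy model of OP_CRITICAL_CPT_WRK (no locking, stand-in type). */
static void capture_add_sketch(float _Complex *lhs, float _Complex rhs,
                               float _Complex *out, int flag) {
  if (flag) {
    *lhs += rhs; /* update first ... */
    *out = *lhs; /* ... then capture */
  } else {
    *out = *lhs; /* capture first ... */
    *lhs += rhs; /* ... then update   */
  }
}
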
 
 /* ------------------------------------------------------------------------- */
 // routines for long double type
-ATOMIC_CRITICAL_CPT( float10, add_cpt, long double,     +, 10r,   1 )            // __kmpc_atomic_float10_add_cpt
-ATOMIC_CRITICAL_CPT( float10, sub_cpt, long double,     -, 10r,   1 )            // __kmpc_atomic_float10_sub_cpt
-ATOMIC_CRITICAL_CPT( float10, mul_cpt, long double,     *, 10r,   1 )            // __kmpc_atomic_float10_mul_cpt
-ATOMIC_CRITICAL_CPT( float10, div_cpt, long double,     /, 10r,   1 )            // __kmpc_atomic_float10_div_cpt
+ATOMIC_CRITICAL_CPT(float10, add_cpt, long double, +, 10r,
+                    1) // __kmpc_atomic_float10_add_cpt
+ATOMIC_CRITICAL_CPT(float10, sub_cpt, long double, -, 10r,
+                    1) // __kmpc_atomic_float10_sub_cpt
+ATOMIC_CRITICAL_CPT(float10, mul_cpt, long double, *, 10r,
+                    1) // __kmpc_atomic_float10_mul_cpt
+ATOMIC_CRITICAL_CPT(float10, div_cpt, long double, /, 10r,
+                    1) // __kmpc_atomic_float10_div_cpt
 #if KMP_HAVE_QUAD
 // routines for _Quad type
-ATOMIC_CRITICAL_CPT( float16, add_cpt, QUAD_LEGACY,     +, 16r,   1 )            // __kmpc_atomic_float16_add_cpt
-ATOMIC_CRITICAL_CPT( float16, sub_cpt, QUAD_LEGACY,     -, 16r,   1 )            // __kmpc_atomic_float16_sub_cpt
-ATOMIC_CRITICAL_CPT( float16, mul_cpt, QUAD_LEGACY,     *, 16r,   1 )            // __kmpc_atomic_float16_mul_cpt
-ATOMIC_CRITICAL_CPT( float16, div_cpt, QUAD_LEGACY,     /, 16r,   1 )            // __kmpc_atomic_float16_div_cpt
-#if ( KMP_ARCH_X86 )
-    ATOMIC_CRITICAL_CPT( float16, add_a16_cpt, Quad_a16_t, +, 16r,  1 )          // __kmpc_atomic_float16_add_a16_cpt
-    ATOMIC_CRITICAL_CPT( float16, sub_a16_cpt, Quad_a16_t, -, 16r,  1 )          // __kmpc_atomic_float16_sub_a16_cpt
-    ATOMIC_CRITICAL_CPT( float16, mul_a16_cpt, Quad_a16_t, *, 16r,  1 )          // __kmpc_atomic_float16_mul_a16_cpt
-    ATOMIC_CRITICAL_CPT( float16, div_a16_cpt, Quad_a16_t, /, 16r,  1 )          // __kmpc_atomic_float16_div_a16_cpt
+ATOMIC_CRITICAL_CPT(float16, add_cpt, QUAD_LEGACY, +, 16r,
+                    1) // __kmpc_atomic_float16_add_cpt
+ATOMIC_CRITICAL_CPT(float16, sub_cpt, QUAD_LEGACY, -, 16r,
+                    1) // __kmpc_atomic_float16_sub_cpt
+ATOMIC_CRITICAL_CPT(float16, mul_cpt, QUAD_LEGACY, *, 16r,
+                    1) // __kmpc_atomic_float16_mul_cpt
+ATOMIC_CRITICAL_CPT(float16, div_cpt, QUAD_LEGACY, /, 16r,
+                    1) // __kmpc_atomic_float16_div_cpt
+#if (KMP_ARCH_X86)
+ATOMIC_CRITICAL_CPT(float16, add_a16_cpt, Quad_a16_t, +, 16r,
+                    1) // __kmpc_atomic_float16_add_a16_cpt
+ATOMIC_CRITICAL_CPT(float16, sub_a16_cpt, Quad_a16_t, -, 16r,
+                    1) // __kmpc_atomic_float16_sub_a16_cpt
+ATOMIC_CRITICAL_CPT(float16, mul_a16_cpt, Quad_a16_t, *, 16r,
+                    1) // __kmpc_atomic_float16_mul_a16_cpt
+ATOMIC_CRITICAL_CPT(float16, div_a16_cpt, Quad_a16_t, /, 16r,
+                    1) // __kmpc_atomic_float16_div_a16_cpt
 #endif
 #endif
 
 // routines for complex types
 
 // cmplx4 routines to return void
-ATOMIC_CRITICAL_CPT_WRK( cmplx4,  add_cpt, kmp_cmplx32, +, 8c,    1 )            // __kmpc_atomic_cmplx4_add_cpt
-ATOMIC_CRITICAL_CPT_WRK( cmplx4,  sub_cpt, kmp_cmplx32, -, 8c,    1 )            // __kmpc_atomic_cmplx4_sub_cpt
-ATOMIC_CRITICAL_CPT_WRK( cmplx4,  mul_cpt, kmp_cmplx32, *, 8c,    1 )            // __kmpc_atomic_cmplx4_mul_cpt
-ATOMIC_CRITICAL_CPT_WRK( cmplx4,  div_cpt, kmp_cmplx32, /, 8c,    1 )            // __kmpc_atomic_cmplx4_div_cpt
-
-ATOMIC_CRITICAL_CPT( cmplx8,  add_cpt, kmp_cmplx64, +, 16c,   1 )            // __kmpc_atomic_cmplx8_add_cpt
-ATOMIC_CRITICAL_CPT( cmplx8,  sub_cpt, kmp_cmplx64, -, 16c,   1 )            // __kmpc_atomic_cmplx8_sub_cpt
-ATOMIC_CRITICAL_CPT( cmplx8,  mul_cpt, kmp_cmplx64, *, 16c,   1 )            // __kmpc_atomic_cmplx8_mul_cpt
-ATOMIC_CRITICAL_CPT( cmplx8,  div_cpt, kmp_cmplx64, /, 16c,   1 )            // __kmpc_atomic_cmplx8_div_cpt
-ATOMIC_CRITICAL_CPT( cmplx10, add_cpt, kmp_cmplx80, +, 20c,   1 )            // __kmpc_atomic_cmplx10_add_cpt
-ATOMIC_CRITICAL_CPT( cmplx10, sub_cpt, kmp_cmplx80, -, 20c,   1 )            // __kmpc_atomic_cmplx10_sub_cpt
-ATOMIC_CRITICAL_CPT( cmplx10, mul_cpt, kmp_cmplx80, *, 20c,   1 )            // __kmpc_atomic_cmplx10_mul_cpt
-ATOMIC_CRITICAL_CPT( cmplx10, div_cpt, kmp_cmplx80, /, 20c,   1 )            // __kmpc_atomic_cmplx10_div_cpt
+ATOMIC_CRITICAL_CPT_WRK(cmplx4, add_cpt, kmp_cmplx32, +, 8c,
+                        1) // __kmpc_atomic_cmplx4_add_cpt
+ATOMIC_CRITICAL_CPT_WRK(cmplx4, sub_cpt, kmp_cmplx32, -, 8c,
+                        1) // __kmpc_atomic_cmplx4_sub_cpt
+ATOMIC_CRITICAL_CPT_WRK(cmplx4, mul_cpt, kmp_cmplx32, *, 8c,
+                        1) // __kmpc_atomic_cmplx4_mul_cpt
+ATOMIC_CRITICAL_CPT_WRK(cmplx4, div_cpt, kmp_cmplx32, /, 8c,
+                        1) // __kmpc_atomic_cmplx4_div_cpt
+
+ATOMIC_CRITICAL_CPT(cmplx8, add_cpt, kmp_cmplx64, +, 16c,
+                    1) // __kmpc_atomic_cmplx8_add_cpt
+ATOMIC_CRITICAL_CPT(cmplx8, sub_cpt, kmp_cmplx64, -, 16c,
+                    1) // __kmpc_atomic_cmplx8_sub_cpt
+ATOMIC_CRITICAL_CPT(cmplx8, mul_cpt, kmp_cmplx64, *, 16c,
+                    1) // __kmpc_atomic_cmplx8_mul_cpt
+ATOMIC_CRITICAL_CPT(cmplx8, div_cpt, kmp_cmplx64, /, 16c,
+                    1) // __kmpc_atomic_cmplx8_div_cpt
+ATOMIC_CRITICAL_CPT(cmplx10, add_cpt, kmp_cmplx80, +, 20c,
+                    1) // __kmpc_atomic_cmplx10_add_cpt
+ATOMIC_CRITICAL_CPT(cmplx10, sub_cpt, kmp_cmplx80, -, 20c,
+                    1) // __kmpc_atomic_cmplx10_sub_cpt
+ATOMIC_CRITICAL_CPT(cmplx10, mul_cpt, kmp_cmplx80, *, 20c,
+                    1) // __kmpc_atomic_cmplx10_mul_cpt
+ATOMIC_CRITICAL_CPT(cmplx10, div_cpt, kmp_cmplx80, /, 20c,
+                    1) // __kmpc_atomic_cmplx10_div_cpt
 #if KMP_HAVE_QUAD
-ATOMIC_CRITICAL_CPT( cmplx16, add_cpt, CPLX128_LEG, +, 32c,   1 )            // __kmpc_atomic_cmplx16_add_cpt
-ATOMIC_CRITICAL_CPT( cmplx16, sub_cpt, CPLX128_LEG, -, 32c,   1 )            // __kmpc_atomic_cmplx16_sub_cpt
-ATOMIC_CRITICAL_CPT( cmplx16, mul_cpt, CPLX128_LEG, *, 32c,   1 )            // __kmpc_atomic_cmplx16_mul_cpt
-ATOMIC_CRITICAL_CPT( cmplx16, div_cpt, CPLX128_LEG, /, 32c,   1 )            // __kmpc_atomic_cmplx16_div_cpt
-#if ( KMP_ARCH_X86 )
-    ATOMIC_CRITICAL_CPT( cmplx16, add_a16_cpt, kmp_cmplx128_a16_t, +, 32c,   1 )   // __kmpc_atomic_cmplx16_add_a16_cpt
-    ATOMIC_CRITICAL_CPT( cmplx16, sub_a16_cpt, kmp_cmplx128_a16_t, -, 32c,   1 )   // __kmpc_atomic_cmplx16_sub_a16_cpt
-    ATOMIC_CRITICAL_CPT( cmplx16, mul_a16_cpt, kmp_cmplx128_a16_t, *, 32c,   1 )   // __kmpc_atomic_cmplx16_mul_a16_cpt
-    ATOMIC_CRITICAL_CPT( cmplx16, div_a16_cpt, kmp_cmplx128_a16_t, /, 32c,   1 )   // __kmpc_atomic_cmplx16_div_a16_cpt
+ATOMIC_CRITICAL_CPT(cmplx16, add_cpt, CPLX128_LEG, +, 32c,
+                    1) // __kmpc_atomic_cmplx16_add_cpt
+ATOMIC_CRITICAL_CPT(cmplx16, sub_cpt, CPLX128_LEG, -, 32c,
+                    1) // __kmpc_atomic_cmplx16_sub_cpt
+ATOMIC_CRITICAL_CPT(cmplx16, mul_cpt, CPLX128_LEG, *, 32c,
+                    1) // __kmpc_atomic_cmplx16_mul_cpt
+ATOMIC_CRITICAL_CPT(cmplx16, div_cpt, CPLX128_LEG, /, 32c,
+                    1) // __kmpc_atomic_cmplx16_div_cpt
+#if (KMP_ARCH_X86)
+ATOMIC_CRITICAL_CPT(cmplx16, add_a16_cpt, kmp_cmplx128_a16_t, +, 32c,
+                    1) // __kmpc_atomic_cmplx16_add_a16_cpt
+ATOMIC_CRITICAL_CPT(cmplx16, sub_a16_cpt, kmp_cmplx128_a16_t, -, 32c,
+                    1) // __kmpc_atomic_cmplx16_sub_a16_cpt
+ATOMIC_CRITICAL_CPT(cmplx16, mul_a16_cpt, kmp_cmplx128_a16_t, *, 32c,
+                    1) // __kmpc_atomic_cmplx16_mul_a16_cpt
+ATOMIC_CRITICAL_CPT(cmplx16, div_a16_cpt, kmp_cmplx128_a16_t, /, 32c,
+                    1) // __kmpc_atomic_cmplx16_div_a16_cpt
 #endif
 #endif
 
 #if OMP_40_ENABLED
 
-// OpenMP 4.0: v = x = expr binop x; { v = x; x = expr binop x; } { x = expr binop x; v = x; }  for non-commutative operations.
+// OpenMP 4.0: v = x = expr binop x; { v = x; x = expr binop x; } { x = expr
+// binop x; v = x; }  for non-commutative operations.
 // Supported only on IA-32 architecture and Intel(R) 64
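
For reference, the reversed-capture forms quoted above correspond to OpenMP source like the sketch below. A compiler that chooses not to inline the construct could lower it to one of the _cpt_rev entry points instantiated later in this file (for example __kmpc_atomic_fixed4_sub_cpt_rev); the exact lowering is the compiler's choice.

int capture_sub_rev_example(void) {
  int x = 10, v;
  const int expr = 3;
  /* reversed capture: x is the right-hand operand of the non-commutative '-' */
#pragma omp atomic capture
  v = x = expr - x;
  return v; /* x becomes 3 - 10 == -7 and v captures the updated value */
}
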
 
 // -------------------------------------------------------------------------
@@ -2355,29 +2850,29 @@ ATOMIC_CRITICAL_CPT( cmplx16, div_cpt, C
 //     LCK_ID - lock identifier
 // Note: don't check gtid as it should always be valid
 // 1, 2-byte - expect valid parameter, other - check before this macro
-#define OP_CRITICAL_CPT_REV(OP,LCK_ID)                                    \
-    __kmp_acquire_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );             \
-                                                                          \
-    if( flag ) {                                                          \
-        /*temp_val = (*lhs);*/\
-        (*lhs) = (rhs) OP (*lhs);                                         \
-        new_value = (*lhs);                                               \
-    } else {                                                              \
-        new_value = (*lhs);\
-        (*lhs) = (rhs) OP (*lhs);                                         \
-    }                                                                     \
-    __kmp_release_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );             \
-    return new_value;
+#define OP_CRITICAL_CPT_REV(OP, LCK_ID)                                        \
+  __kmp_acquire_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);                       \
+                                                                               \
+  if (flag) {                                                                  \
+    /*temp_val = (*lhs);*/                                                     \
+    (*lhs) = (rhs)OP(*lhs);                                                    \
+    new_value = (*lhs);                                                        \
+  } else {                                                                     \
+    new_value = (*lhs);                                                        \
+    (*lhs) = (rhs)OP(*lhs);                                                    \
+  }                                                                            \
+  __kmp_release_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);                       \
+  return new_value;
 
 // ------------------------------------------------------------------------
 #ifdef KMP_GOMP_COMPAT
-#define OP_GOMP_CRITICAL_CPT_REV(OP,FLAG)                                 \
-    if ( (FLAG) && (__kmp_atomic_mode == 2) ) {                           \
-        KMP_CHECK_GTID;                                                   \
-        OP_CRITICAL_CPT_REV( OP, 0 );                                     \
-    }
+#define OP_GOMP_CRITICAL_CPT_REV(OP, FLAG)                                     \
+  if ((FLAG) && (__kmp_atomic_mode == 2)) {                                    \
+    KMP_CHECK_GTID;                                                            \
+    OP_CRITICAL_CPT_REV(OP, 0);                                                \
+  }
 #else
-#define OP_GOMP_CRITICAL_CPT_REV(OP,FLAG)
+#define OP_GOMP_CRITICAL_CPT_REV(OP, FLAG)
 #endif /* KMP_GOMP_COMPAT */
 
 // ------------------------------------------------------------------------
@@ -2387,154 +2882,194 @@ ATOMIC_CRITICAL_CPT( cmplx16, div_cpt, C
 //     OP      - operator
 // Note: temp_val introduced in order to force the compiler to read
 //       *lhs only once (w/o it the compiler reads *lhs twice)
-#define OP_CMPXCHG_CPT_REV(TYPE,BITS,OP)                                  \
-    {                                                                     \
-        TYPE KMP_ATOMIC_VOLATILE temp_val;                                \
-        TYPE old_value, new_value;                                        \
-        temp_val = *lhs;                                                  \
-        old_value = temp_val;                                             \
-        new_value = rhs OP old_value;                                     \
-        while ( ! KMP_COMPARE_AND_STORE_ACQ##BITS( (kmp_int##BITS *) lhs, \
-                      *VOLATILE_CAST(kmp_int##BITS *) &old_value,         \
-                      *VOLATILE_CAST(kmp_int##BITS *) &new_value ) )      \
-        {                                                                 \
-            KMP_CPU_PAUSE();                                              \
-                                                                          \
-            temp_val = *lhs;                                              \
-            old_value = temp_val;                                         \
-            new_value = rhs OP old_value;                                 \
-        }                                                                 \
-        if( flag ) {                                                      \
-            return new_value;                                             \
-        } else                                                            \
-            return old_value;                                             \
-    }
-
-// -------------------------------------------------------------------------
-#define ATOMIC_CMPXCHG_CPT_REV(TYPE_ID,OP_ID,TYPE,BITS,OP,GOMP_FLAG)       \
-ATOMIC_BEGIN_CPT(TYPE_ID,OP_ID,TYPE,TYPE)                                  \
-    TYPE new_value;                                                        \
-        TYPE KMP_ATOMIC_VOLATILE temp_val;                                \
-    OP_GOMP_CRITICAL_CPT_REV(OP,GOMP_FLAG)                                 \
-    OP_CMPXCHG_CPT_REV(TYPE,BITS,OP)                                       \
-}
-
-
-ATOMIC_CMPXCHG_CPT_REV( fixed1,  div_cpt_rev, kmp_int8,    8, /,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_div_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed1u, div_cpt_rev, kmp_uint8,   8, /,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed1u_div_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed1,  shl_cpt_rev, kmp_int8,    8, <<, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_shl_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed1,  shr_cpt_rev, kmp_int8,    8, >>, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_shr_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed1u, shr_cpt_rev, kmp_uint8,   8, >>, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1u_shr_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed1,  sub_cpt_rev, kmp_int8,    8, -,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_sub_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed2,  div_cpt_rev, kmp_int16,  16, /,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_div_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed2u, div_cpt_rev, kmp_uint16, 16, /,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed2u_div_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed2,  shl_cpt_rev, kmp_int16,  16, <<, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_shl_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed2,  shr_cpt_rev, kmp_int16,  16, >>, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_shr_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed2u, shr_cpt_rev, kmp_uint16, 16, >>, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2u_shr_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed2,  sub_cpt_rev, kmp_int16,  16, -,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_sub_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed4,  div_cpt_rev, kmp_int32,  32, /,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed4_div_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed4u, div_cpt_rev, kmp_uint32, 32, /,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed4u_div_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed4,  shl_cpt_rev, kmp_int32,  32, <<, KMP_ARCH_X86 )  // __kmpc_atomic_fixed4_shl_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed4,  shr_cpt_rev, kmp_int32,  32, >>, KMP_ARCH_X86 )  // __kmpc_atomic_fixed4_shr_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed4u, shr_cpt_rev, kmp_uint32, 32, >>, KMP_ARCH_X86 )  // __kmpc_atomic_fixed4u_shr_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed4,  sub_cpt_rev, kmp_int32,  32, -,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed4_sub_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed8,  div_cpt_rev, kmp_int64,  64, /,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_div_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed8u, div_cpt_rev, kmp_uint64, 64, /,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed8u_div_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed8,  shl_cpt_rev, kmp_int64,  64, <<, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_shl_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed8,  shr_cpt_rev, kmp_int64,  64, >>, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_shr_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed8u, shr_cpt_rev, kmp_uint64, 64, >>, KMP_ARCH_X86 )  // __kmpc_atomic_fixed8u_shr_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( fixed8,  sub_cpt_rev, kmp_int64,  64, -,  KMP_ARCH_X86 )  // __kmpc_atomic_fixed8_sub_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( float4,  div_cpt_rev, kmp_real32, 32, /,  KMP_ARCH_X86 )  // __kmpc_atomic_float4_div_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( float4,  sub_cpt_rev, kmp_real32, 32, -,  KMP_ARCH_X86 )  // __kmpc_atomic_float4_sub_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( float8,  div_cpt_rev, kmp_real64, 64, /,  KMP_ARCH_X86 )  // __kmpc_atomic_float8_div_cpt_rev
-ATOMIC_CMPXCHG_CPT_REV( float8,  sub_cpt_rev, kmp_real64, 64, -,  KMP_ARCH_X86 )  // __kmpc_atomic_float8_sub_cpt_rev
+#define OP_CMPXCHG_CPT_REV(TYPE, BITS, OP)                                     \
+  {                                                                            \
+    TYPE KMP_ATOMIC_VOLATILE temp_val;                                         \
+    TYPE old_value, new_value;                                                 \
+    temp_val = *lhs;                                                           \
+    old_value = temp_val;                                                      \
+    new_value = rhs OP old_value;                                              \
+    while (!KMP_COMPARE_AND_STORE_ACQ##BITS(                                   \
+        (kmp_int##BITS *)lhs, *VOLATILE_CAST(kmp_int##BITS *) & old_value,     \
+        *VOLATILE_CAST(kmp_int##BITS *) & new_value)) {                        \
+      KMP_CPU_PAUSE();                                                         \
+                                                                               \
+      temp_val = *lhs;                                                         \
+      old_value = temp_val;                                                    \
+      new_value = rhs OP old_value;                                            \
+    }                                                                          \
+    if (flag) {                                                                \
+      return new_value;                                                        \
+    } else                                                                     \
+      return old_value;                                                        \
+  }
+
+// -------------------------------------------------------------------------
+#define ATOMIC_CMPXCHG_CPT_REV(TYPE_ID, OP_ID, TYPE, BITS, OP, GOMP_FLAG)      \
+  ATOMIC_BEGIN_CPT(TYPE_ID, OP_ID, TYPE, TYPE)                                 \
+  TYPE new_value;                                                              \
+  TYPE KMP_ATOMIC_VOLATILE temp_val;                                           \
+  OP_GOMP_CRITICAL_CPT_REV(OP, GOMP_FLAG)                                      \
+  OP_CMPXCHG_CPT_REV(TYPE, BITS, OP)                                           \
+  }
+
+ATOMIC_CMPXCHG_CPT_REV(fixed1, div_cpt_rev, kmp_int8, 8, /,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed1_div_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed1u, div_cpt_rev, kmp_uint8, 8, /,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed1u_div_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed1, shl_cpt_rev, kmp_int8, 8, <<,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed1_shl_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed1, shr_cpt_rev, kmp_int8, 8, >>,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed1_shr_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed1u, shr_cpt_rev, kmp_uint8, 8, >>,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed1u_shr_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed1, sub_cpt_rev, kmp_int8, 8, -,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed1_sub_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed2, div_cpt_rev, kmp_int16, 16, /,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed2_div_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed2u, div_cpt_rev, kmp_uint16, 16, /,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed2u_div_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed2, shl_cpt_rev, kmp_int16, 16, <<,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed2_shl_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed2, shr_cpt_rev, kmp_int16, 16, >>,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed2_shr_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed2u, shr_cpt_rev, kmp_uint16, 16, >>,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed2u_shr_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed2, sub_cpt_rev, kmp_int16, 16, -,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed2_sub_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed4, div_cpt_rev, kmp_int32, 32, /,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed4_div_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed4u, div_cpt_rev, kmp_uint32, 32, /,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed4u_div_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed4, shl_cpt_rev, kmp_int32, 32, <<,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed4_shl_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed4, shr_cpt_rev, kmp_int32, 32, >>,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed4_shr_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed4u, shr_cpt_rev, kmp_uint32, 32, >>,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed4u_shr_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed4, sub_cpt_rev, kmp_int32, 32, -,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed4_sub_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed8, div_cpt_rev, kmp_int64, 64, /,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed8_div_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed8u, div_cpt_rev, kmp_uint64, 64, /,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed8u_div_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed8, shl_cpt_rev, kmp_int64, 64, <<,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed8_shl_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed8, shr_cpt_rev, kmp_int64, 64, >>,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed8_shr_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed8u, shr_cpt_rev, kmp_uint64, 64, >>,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed8u_shr_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(fixed8, sub_cpt_rev, kmp_int64, 64, -,
+                       KMP_ARCH_X86) // __kmpc_atomic_fixed8_sub_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(float4, div_cpt_rev, kmp_real32, 32, /,
+                       KMP_ARCH_X86) // __kmpc_atomic_float4_div_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(float4, sub_cpt_rev, kmp_real32, 32, -,
+                       KMP_ARCH_X86) // __kmpc_atomic_float4_sub_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(float8, div_cpt_rev, kmp_real64, 64, /,
+                       KMP_ARCH_X86) // __kmpc_atomic_float8_div_cpt_rev
+ATOMIC_CMPXCHG_CPT_REV(float8, sub_cpt_rev, kmp_real64, 64, -,
+                       KMP_ARCH_X86) // __kmpc_atomic_float8_sub_cpt_rev
 //              TYPE_ID,OP_ID, TYPE,          OP,  GOMP_FLAG
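
The temp_val note above concerns the runtime's own KMP_COMPARE_AND_STORE_ACQ* path. As a rough, self-contained analogue of the same retry loop, here is a C11-atomics sketch of the fixed4 sub_cpt_rev case; the C11 primitives stand in for the runtime's macros, so this is an illustration, not the implementation.

#include <stdatomic.h>
#include <stdint.h>

/* Reversed operand order (rhs OP old) and the flag choice between returning
   the new or the old value mirror OP_CMPXCHG_CPT_REV above. */
static int32_t fixed4_sub_cpt_rev_sketch(_Atomic int32_t *lhs, int32_t rhs,
                                         int flag) {
  int32_t old_value = atomic_load(lhs); /* single read per attempt */
  int32_t new_value = rhs - old_value;
  while (!atomic_compare_exchange_weak(lhs, &old_value, new_value)) {
    /* old_value was refreshed by the failed CAS; recompute and retry */
    new_value = rhs - old_value;
  }
  return flag ? new_value : old_value;
}
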
 
-
 // ------------------------------------------------------------------------
-// Routines for Extended types: long double, _Quad, complex flavours (use critical section)
+// Routines for Extended types: long double, _Quad, complex flavours (use
+// critical section)
 //     TYPE_ID, OP_ID, TYPE - detailed above
 //     OP      - operator
 //     LCK_ID  - lock identifier, used to possibly distinguish lock variable
-#define ATOMIC_CRITICAL_CPT_REV(TYPE_ID,OP_ID,TYPE,OP,LCK_ID,GOMP_FLAG) \
-ATOMIC_BEGIN_CPT(TYPE_ID,OP_ID,TYPE,TYPE)                               \
-    TYPE new_value;                                                     \
-        TYPE KMP_ATOMIC_VOLATILE temp_val;                                \
-    /*printf("__kmp_atomic_mode = %d\n", __kmp_atomic_mode);*/\
-    OP_GOMP_CRITICAL_CPT_REV(OP,GOMP_FLAG)                              \
-    OP_CRITICAL_CPT_REV(OP,LCK_ID)                                      \
-}
-
+#define ATOMIC_CRITICAL_CPT_REV(TYPE_ID, OP_ID, TYPE, OP, LCK_ID, GOMP_FLAG)   \
+  ATOMIC_BEGIN_CPT(TYPE_ID, OP_ID, TYPE, TYPE)                                 \
+  TYPE new_value;                                                              \
+  TYPE KMP_ATOMIC_VOLATILE temp_val;                                           \
+  /*printf("__kmp_atomic_mode = %d\n", __kmp_atomic_mode);*/                   \
+  OP_GOMP_CRITICAL_CPT_REV(OP, GOMP_FLAG)                                      \
+  OP_CRITICAL_CPT_REV(OP, LCK_ID)                                              \
+  }
 
 /* ------------------------------------------------------------------------- */
 // routines for long double type
-ATOMIC_CRITICAL_CPT_REV( float10, sub_cpt_rev, long double,     -, 10r,   1 )            // __kmpc_atomic_float10_sub_cpt_rev
-ATOMIC_CRITICAL_CPT_REV( float10, div_cpt_rev, long double,     /, 10r,   1 )            // __kmpc_atomic_float10_div_cpt_rev
+ATOMIC_CRITICAL_CPT_REV(float10, sub_cpt_rev, long double, -, 10r,
+                        1) // __kmpc_atomic_float10_sub_cpt_rev
+ATOMIC_CRITICAL_CPT_REV(float10, div_cpt_rev, long double, /, 10r,
+                        1) // __kmpc_atomic_float10_div_cpt_rev
 #if KMP_HAVE_QUAD
 // routines for _Quad type
-ATOMIC_CRITICAL_CPT_REV( float16, sub_cpt_rev, QUAD_LEGACY,     -, 16r,   1 )            // __kmpc_atomic_float16_sub_cpt_rev
-ATOMIC_CRITICAL_CPT_REV( float16, div_cpt_rev, QUAD_LEGACY,     /, 16r,   1 )            // __kmpc_atomic_float16_div_cpt_rev
-#if ( KMP_ARCH_X86 )
-    ATOMIC_CRITICAL_CPT_REV( float16, sub_a16_cpt_rev, Quad_a16_t, -, 16r,  1 )          // __kmpc_atomic_float16_sub_a16_cpt_rev
-    ATOMIC_CRITICAL_CPT_REV( float16, div_a16_cpt_rev, Quad_a16_t, /, 16r,  1 )          // __kmpc_atomic_float16_div_a16_cpt_rev
+ATOMIC_CRITICAL_CPT_REV(float16, sub_cpt_rev, QUAD_LEGACY, -, 16r,
+                        1) // __kmpc_atomic_float16_sub_cpt_rev
+ATOMIC_CRITICAL_CPT_REV(float16, div_cpt_rev, QUAD_LEGACY, /, 16r,
+                        1) // __kmpc_atomic_float16_div_cpt_rev
+#if (KMP_ARCH_X86)
+ATOMIC_CRITICAL_CPT_REV(float16, sub_a16_cpt_rev, Quad_a16_t, -, 16r,
+                        1) // __kmpc_atomic_float16_sub_a16_cpt_rev
+ATOMIC_CRITICAL_CPT_REV(float16, div_a16_cpt_rev, Quad_a16_t, /, 16r,
+                        1) // __kmpc_atomic_float16_div_a16_cpt_rev
 #endif
 #endif
 
 // routines for complex types
 
 // ------------------------------------------------------------------------
-
 // Workaround for cmplx4. Regular routines with return value don't work
 // on Win_32e. Let's return captured values through the additional parameter.
-#define OP_CRITICAL_CPT_REV_WRK(OP,LCK_ID)                                \
-    __kmp_acquire_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );             \
-                                                                          \
-    if( flag ) {                                                          \
-        (*lhs) = (rhs) OP (*lhs);                                         \
-        (*out) = (*lhs);                                                  \
-    } else {                                                              \
-        (*out) = (*lhs);                                                  \
-        (*lhs) = (rhs) OP (*lhs);                                         \
-    }                                                                     \
-                                                                          \
-    __kmp_release_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );             \
-    return;
+#define OP_CRITICAL_CPT_REV_WRK(OP, LCK_ID)                                    \
+  __kmp_acquire_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);                       \
+                                                                               \
+  if (flag) {                                                                  \
+    (*lhs) = (rhs)OP(*lhs);                                                    \
+    (*out) = (*lhs);                                                           \
+  } else {                                                                     \
+    (*out) = (*lhs);                                                           \
+    (*lhs) = (rhs)OP(*lhs);                                                    \
+  }                                                                            \
+                                                                               \
+  __kmp_release_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);                       \
+  return;
 // ------------------------------------------------------------------------
 
 #ifdef KMP_GOMP_COMPAT
-#define OP_GOMP_CRITICAL_CPT_REV_WRK(OP,FLAG)                             \
-    if ( (FLAG) && (__kmp_atomic_mode == 2) ) {                           \
-        KMP_CHECK_GTID;                                                   \
-        OP_CRITICAL_CPT_REV_WRK( OP, 0 );                                 \
-    }
-#else
-#define OP_GOMP_CRITICAL_CPT_REV_WRK(OP,FLAG)
+#define OP_GOMP_CRITICAL_CPT_REV_WRK(OP, FLAG)                                 \
+  if ((FLAG) && (__kmp_atomic_mode == 2)) {                                    \
+    KMP_CHECK_GTID;                                                            \
+    OP_CRITICAL_CPT_REV_WRK(OP, 0);                                            \
+  }
+#else
+#define OP_GOMP_CRITICAL_CPT_REV_WRK(OP, FLAG)
 #endif /* KMP_GOMP_COMPAT */
 // ------------------------------------------------------------------------
 
-#define ATOMIC_CRITICAL_CPT_REV_WRK(TYPE_ID,OP_ID,TYPE,OP,LCK_ID,GOMP_FLAG)   \
-ATOMIC_BEGIN_WRK(TYPE_ID,OP_ID,TYPE)                                          \
-    OP_GOMP_CRITICAL_CPT_REV_WRK(OP,GOMP_FLAG)                                \
-    OP_CRITICAL_CPT_REV_WRK(OP,LCK_ID)                                        \
-}
+#define ATOMIC_CRITICAL_CPT_REV_WRK(TYPE_ID, OP_ID, TYPE, OP, LCK_ID,          \
+                                    GOMP_FLAG)                                 \
+  ATOMIC_BEGIN_WRK(TYPE_ID, OP_ID, TYPE)                                       \
+  OP_GOMP_CRITICAL_CPT_REV_WRK(OP, GOMP_FLAG)                                  \
+  OP_CRITICAL_CPT_REV_WRK(OP, LCK_ID)                                          \
+  }
 // The end of workaround for cmplx4
 
-
 // !!! TODO: check if we need to return void for cmplx4 routines
 // cmplx4 routines to return void
-ATOMIC_CRITICAL_CPT_REV_WRK( cmplx4,  sub_cpt_rev, kmp_cmplx32, -, 8c,    1 )            // __kmpc_atomic_cmplx4_sub_cpt_rev
-ATOMIC_CRITICAL_CPT_REV_WRK( cmplx4,  div_cpt_rev, kmp_cmplx32, /, 8c,    1 )            // __kmpc_atomic_cmplx4_div_cpt_rev
-
-ATOMIC_CRITICAL_CPT_REV( cmplx8,  sub_cpt_rev, kmp_cmplx64, -, 16c,   1 )            // __kmpc_atomic_cmplx8_sub_cpt_rev
-ATOMIC_CRITICAL_CPT_REV( cmplx8,  div_cpt_rev, kmp_cmplx64, /, 16c,   1 )            // __kmpc_atomic_cmplx8_div_cpt_rev
-ATOMIC_CRITICAL_CPT_REV( cmplx10, sub_cpt_rev, kmp_cmplx80, -, 20c,   1 )            // __kmpc_atomic_cmplx10_sub_cpt_rev
-ATOMIC_CRITICAL_CPT_REV( cmplx10, div_cpt_rev, kmp_cmplx80, /, 20c,   1 )            // __kmpc_atomic_cmplx10_div_cpt_rev
+ATOMIC_CRITICAL_CPT_REV_WRK(cmplx4, sub_cpt_rev, kmp_cmplx32, -, 8c,
+                            1) // __kmpc_atomic_cmplx4_sub_cpt_rev
+ATOMIC_CRITICAL_CPT_REV_WRK(cmplx4, div_cpt_rev, kmp_cmplx32, /, 8c,
+                            1) // __kmpc_atomic_cmplx4_div_cpt_rev
+
+ATOMIC_CRITICAL_CPT_REV(cmplx8, sub_cpt_rev, kmp_cmplx64, -, 16c,
+                        1) // __kmpc_atomic_cmplx8_sub_cpt_rev
+ATOMIC_CRITICAL_CPT_REV(cmplx8, div_cpt_rev, kmp_cmplx64, /, 16c,
+                        1) // __kmpc_atomic_cmplx8_div_cpt_rev
+ATOMIC_CRITICAL_CPT_REV(cmplx10, sub_cpt_rev, kmp_cmplx80, -, 20c,
+                        1) // __kmpc_atomic_cmplx10_sub_cpt_rev
+ATOMIC_CRITICAL_CPT_REV(cmplx10, div_cpt_rev, kmp_cmplx80, /, 20c,
+                        1) // __kmpc_atomic_cmplx10_div_cpt_rev
 #if KMP_HAVE_QUAD
-ATOMIC_CRITICAL_CPT_REV( cmplx16, sub_cpt_rev, CPLX128_LEG, -, 32c,   1 )            // __kmpc_atomic_cmplx16_sub_cpt_rev
-ATOMIC_CRITICAL_CPT_REV( cmplx16, div_cpt_rev, CPLX128_LEG, /, 32c,   1 )            // __kmpc_atomic_cmplx16_div_cpt_rev
-#if ( KMP_ARCH_X86 )
-    ATOMIC_CRITICAL_CPT_REV( cmplx16, sub_a16_cpt_rev, kmp_cmplx128_a16_t, -, 32c,   1 )   // __kmpc_atomic_cmplx16_sub_a16_cpt_rev
-    ATOMIC_CRITICAL_CPT_REV( cmplx16, div_a16_cpt_rev, kmp_cmplx128_a16_t, /, 32c,   1 )   // __kmpc_atomic_cmplx16_div_a16_cpt_rev
+ATOMIC_CRITICAL_CPT_REV(cmplx16, sub_cpt_rev, CPLX128_LEG, -, 32c,
+                        1) // __kmpc_atomic_cmplx16_sub_cpt_rev
+ATOMIC_CRITICAL_CPT_REV(cmplx16, div_cpt_rev, CPLX128_LEG, /, 32c,
+                        1) // __kmpc_atomic_cmplx16_div_cpt_rev
+#if (KMP_ARCH_X86)
+ATOMIC_CRITICAL_CPT_REV(cmplx16, sub_a16_cpt_rev, kmp_cmplx128_a16_t, -, 32c,
+                        1) // __kmpc_atomic_cmplx16_sub_a16_cpt_rev
+ATOMIC_CRITICAL_CPT_REV(cmplx16, div_a16_cpt_rev, kmp_cmplx128_a16_t, /, 32c,
+                        1) // __kmpc_atomic_cmplx16_div_a16_cpt_rev
 #endif
 #endif
 
@@ -2542,577 +3077,556 @@ ATOMIC_CRITICAL_CPT_REV( cmplx16, div_cp
 #if KMP_HAVE_QUAD
 
 // Beginning of a definition (provides name, parameters, debug trace)
-//     TYPE_ID - operands type and size (fixed*, fixed*u for signed, unsigned fixed)
+//     TYPE_ID - operands type and size (fixed*, fixed*u for signed, unsigned
+//     fixed)
 //     OP_ID   - operation identifier (add, sub, mul, ...)
 //     TYPE    - operands' type
 // -------------------------------------------------------------------------
-#define ATOMIC_CMPXCHG_CPT_REV_MIX(TYPE_ID,TYPE,OP_ID,BITS,OP,RTYPE_ID,RTYPE,LCK_ID,MASK,GOMP_FLAG)       \
-ATOMIC_BEGIN_CPT_MIX(TYPE_ID,OP_ID,TYPE,RTYPE_ID,RTYPE)                    \
-    TYPE new_value;                                                        \
-    OP_GOMP_CRITICAL_CPT_REV(OP,GOMP_FLAG)                                 \
-    OP_CMPXCHG_CPT_REV(TYPE,BITS,OP)                                       \
-}
-
-// -------------------------------------------------------------------------
-#define ATOMIC_CRITICAL_CPT_REV_MIX(TYPE_ID,TYPE,OP_ID,OP,RTYPE_ID,RTYPE,LCK_ID,GOMP_FLAG)         \
-ATOMIC_BEGIN_CPT_MIX(TYPE_ID,OP_ID,TYPE,RTYPE_ID,RTYPE)                    \
-    TYPE new_value;                                                        \
-    OP_GOMP_CRITICAL_CPT_REV(OP,GOMP_FLAG)  /* send assignment */                              \
-    OP_CRITICAL_CPT_REV(OP,LCK_ID)  /* send assignment */                                      \
-}
-
-ATOMIC_CMPXCHG_CPT_REV_MIX( fixed1,  char,       sub_cpt_rev,  8, -, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1_sub_cpt_rev_fp
-ATOMIC_CMPXCHG_CPT_REV_MIX( fixed1u, uchar,      sub_cpt_rev,  8, -, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1u_sub_cpt_rev_fp
-ATOMIC_CMPXCHG_CPT_REV_MIX( fixed1,  char,       div_cpt_rev,  8, /, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1_div_cpt_rev_fp
-ATOMIC_CMPXCHG_CPT_REV_MIX( fixed1u, uchar,      div_cpt_rev,  8, /, fp, _Quad, 1i, 0, KMP_ARCH_X86 ) // __kmpc_atomic_fixed1u_div_cpt_rev_fp
-
-ATOMIC_CMPXCHG_CPT_REV_MIX( fixed2,  short,      sub_cpt_rev, 16, -, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2_sub_cpt_rev_fp
-ATOMIC_CMPXCHG_CPT_REV_MIX( fixed2u, ushort,     sub_cpt_rev, 16, -, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2u_sub_cpt_rev_fp
-ATOMIC_CMPXCHG_CPT_REV_MIX( fixed2,  short,      div_cpt_rev, 16, /, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2_div_cpt_rev_fp
-ATOMIC_CMPXCHG_CPT_REV_MIX( fixed2u, ushort,     div_cpt_rev, 16, /, fp, _Quad, 2i, 1, KMP_ARCH_X86 ) // __kmpc_atomic_fixed2u_div_cpt_rev_fp
-
-ATOMIC_CMPXCHG_CPT_REV_MIX( fixed4,  kmp_int32,  sub_cpt_rev, 32, -, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4_sub_cpt_rev_fp
-ATOMIC_CMPXCHG_CPT_REV_MIX( fixed4u, kmp_uint32, sub_cpt_rev, 32, -, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4u_sub_cpt_rev_fp
-ATOMIC_CMPXCHG_CPT_REV_MIX( fixed4,  kmp_int32,  div_cpt_rev, 32, /, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4_div_cpt_rev_fp
-ATOMIC_CMPXCHG_CPT_REV_MIX( fixed4u, kmp_uint32, div_cpt_rev, 32, /, fp, _Quad, 4i, 3, 0 )            // __kmpc_atomic_fixed4u_div_cpt_rev_fp
-
-ATOMIC_CMPXCHG_CPT_REV_MIX( fixed8,  kmp_int64,  sub_cpt_rev, 64, -, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_sub_cpt_rev_fp
-ATOMIC_CMPXCHG_CPT_REV_MIX( fixed8u, kmp_uint64, sub_cpt_rev, 64, -, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8u_sub_cpt_rev_fp
-ATOMIC_CMPXCHG_CPT_REV_MIX( fixed8,  kmp_int64,  div_cpt_rev, 64, /, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_div_cpt_rev_fp
-ATOMIC_CMPXCHG_CPT_REV_MIX( fixed8u, kmp_uint64, div_cpt_rev, 64, /, fp, _Quad, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8u_div_cpt_rev_fp
-
-ATOMIC_CMPXCHG_CPT_REV_MIX( float4,  kmp_real32, sub_cpt_rev, 32, -, fp, _Quad, 4r, 3, KMP_ARCH_X86 ) // __kmpc_atomic_float4_sub_cpt_rev_fp
-ATOMIC_CMPXCHG_CPT_REV_MIX( float4,  kmp_real32, div_cpt_rev, 32, /, fp, _Quad, 4r, 3, KMP_ARCH_X86 ) // __kmpc_atomic_float4_div_cpt_rev_fp
-
-ATOMIC_CMPXCHG_CPT_REV_MIX( float8,  kmp_real64, sub_cpt_rev, 64, -, fp, _Quad, 8r, 7, KMP_ARCH_X86 ) // __kmpc_atomic_float8_sub_cpt_rev_fp
-ATOMIC_CMPXCHG_CPT_REV_MIX( float8,  kmp_real64, div_cpt_rev, 64, /, fp, _Quad, 8r, 7, KMP_ARCH_X86 ) // __kmpc_atomic_float8_div_cpt_rev_fp
-
-ATOMIC_CRITICAL_CPT_REV_MIX( float10, long double, sub_cpt_rev, -, fp, _Quad, 10r,   1 )            // __kmpc_atomic_float10_sub_cpt_rev_fp
-ATOMIC_CRITICAL_CPT_REV_MIX( float10, long double, div_cpt_rev, /, fp, _Quad, 10r,   1 )            // __kmpc_atomic_float10_div_cpt_rev_fp
-
-#endif //KMP_HAVE_QUAD
+#define ATOMIC_CMPXCHG_CPT_REV_MIX(TYPE_ID, TYPE, OP_ID, BITS, OP, RTYPE_ID,   \
+                                   RTYPE, LCK_ID, MASK, GOMP_FLAG)             \
+  ATOMIC_BEGIN_CPT_MIX(TYPE_ID, OP_ID, TYPE, RTYPE_ID, RTYPE)                  \
+  TYPE new_value;                                                              \
+  OP_GOMP_CRITICAL_CPT_REV(OP, GOMP_FLAG)                                      \
+  OP_CMPXCHG_CPT_REV(TYPE, BITS, OP)                                           \
+  }
+
+// -------------------------------------------------------------------------
+#define ATOMIC_CRITICAL_CPT_REV_MIX(TYPE_ID, TYPE, OP_ID, OP, RTYPE_ID, RTYPE, \
+                                    LCK_ID, GOMP_FLAG)                         \
+  ATOMIC_BEGIN_CPT_MIX(TYPE_ID, OP_ID, TYPE, RTYPE_ID, RTYPE)                  \
+  TYPE new_value;                                                              \
+  OP_GOMP_CRITICAL_CPT_REV(OP, GOMP_FLAG) /* send assignment */                \
+  OP_CRITICAL_CPT_REV(OP, LCK_ID) /* send assignment */                        \
+  }
+
+ATOMIC_CMPXCHG_CPT_REV_MIX(fixed1, char, sub_cpt_rev, 8, -, fp, _Quad, 1i, 0,
+                           KMP_ARCH_X86) // __kmpc_atomic_fixed1_sub_cpt_rev_fp
+ATOMIC_CMPXCHG_CPT_REV_MIX(fixed1u, uchar, sub_cpt_rev, 8, -, fp, _Quad, 1i, 0,
+                           KMP_ARCH_X86) // __kmpc_atomic_fixed1u_sub_cpt_rev_fp
+ATOMIC_CMPXCHG_CPT_REV_MIX(fixed1, char, div_cpt_rev, 8, /, fp, _Quad, 1i, 0,
+                           KMP_ARCH_X86) // __kmpc_atomic_fixed1_div_cpt_rev_fp
+ATOMIC_CMPXCHG_CPT_REV_MIX(fixed1u, uchar, div_cpt_rev, 8, /, fp, _Quad, 1i, 0,
+                           KMP_ARCH_X86) // __kmpc_atomic_fixed1u_div_cpt_rev_fp
+
+ATOMIC_CMPXCHG_CPT_REV_MIX(fixed2, short, sub_cpt_rev, 16, -, fp, _Quad, 2i, 1,
+                           KMP_ARCH_X86) // __kmpc_atomic_fixed2_sub_cpt_rev_fp
+ATOMIC_CMPXCHG_CPT_REV_MIX(fixed2u, ushort, sub_cpt_rev, 16, -, fp, _Quad, 2i,
+                           1,
+                           KMP_ARCH_X86) // __kmpc_atomic_fixed2u_sub_cpt_rev_fp
+ATOMIC_CMPXCHG_CPT_REV_MIX(fixed2, short, div_cpt_rev, 16, /, fp, _Quad, 2i, 1,
+                           KMP_ARCH_X86) // __kmpc_atomic_fixed2_div_cpt_rev_fp
+ATOMIC_CMPXCHG_CPT_REV_MIX(fixed2u, ushort, div_cpt_rev, 16, /, fp, _Quad, 2i,
+                           1,
+                           KMP_ARCH_X86) // __kmpc_atomic_fixed2u_div_cpt_rev_fp
+
+ATOMIC_CMPXCHG_CPT_REV_MIX(fixed4, kmp_int32, sub_cpt_rev, 32, -, fp, _Quad, 4i,
+                           3, 0) // __kmpc_atomic_fixed4_sub_cpt_rev_fp
+ATOMIC_CMPXCHG_CPT_REV_MIX(fixed4u, kmp_uint32, sub_cpt_rev, 32, -, fp, _Quad,
+                           4i, 3, 0) // __kmpc_atomic_fixed4u_sub_cpt_rev_fp
+ATOMIC_CMPXCHG_CPT_REV_MIX(fixed4, kmp_int32, div_cpt_rev, 32, /, fp, _Quad, 4i,
+                           3, 0) // __kmpc_atomic_fixed4_div_cpt_rev_fp
+ATOMIC_CMPXCHG_CPT_REV_MIX(fixed4u, kmp_uint32, div_cpt_rev, 32, /, fp, _Quad,
+                           4i, 3, 0) // __kmpc_atomic_fixed4u_div_cpt_rev_fp
+
+ATOMIC_CMPXCHG_CPT_REV_MIX(fixed8, kmp_int64, sub_cpt_rev, 64, -, fp, _Quad, 8i,
+                           7,
+                           KMP_ARCH_X86) // __kmpc_atomic_fixed8_sub_cpt_rev_fp
+ATOMIC_CMPXCHG_CPT_REV_MIX(fixed8u, kmp_uint64, sub_cpt_rev, 64, -, fp, _Quad,
+                           8i, 7,
+                           KMP_ARCH_X86) // __kmpc_atomic_fixed8u_sub_cpt_rev_fp
+ATOMIC_CMPXCHG_CPT_REV_MIX(fixed8, kmp_int64, div_cpt_rev, 64, /, fp, _Quad, 8i,
+                           7,
+                           KMP_ARCH_X86) // __kmpc_atomic_fixed8_div_cpt_rev_fp
+ATOMIC_CMPXCHG_CPT_REV_MIX(fixed8u, kmp_uint64, div_cpt_rev, 64, /, fp, _Quad,
+                           8i, 7,
+                           KMP_ARCH_X86) // __kmpc_atomic_fixed8u_div_cpt_rev_fp
+
+ATOMIC_CMPXCHG_CPT_REV_MIX(float4, kmp_real32, sub_cpt_rev, 32, -, fp, _Quad,
+                           4r, 3,
+                           KMP_ARCH_X86) // __kmpc_atomic_float4_sub_cpt_rev_fp
+ATOMIC_CMPXCHG_CPT_REV_MIX(float4, kmp_real32, div_cpt_rev, 32, /, fp, _Quad,
+                           4r, 3,
+                           KMP_ARCH_X86) // __kmpc_atomic_float4_div_cpt_rev_fp
+
+ATOMIC_CMPXCHG_CPT_REV_MIX(float8, kmp_real64, sub_cpt_rev, 64, -, fp, _Quad,
+                           8r, 7,
+                           KMP_ARCH_X86) // __kmpc_atomic_float8_sub_cpt_rev_fp
+ATOMIC_CMPXCHG_CPT_REV_MIX(float8, kmp_real64, div_cpt_rev, 64, /, fp, _Quad,
+                           8r, 7,
+                           KMP_ARCH_X86) // __kmpc_atomic_float8_div_cpt_rev_fp
+
+ATOMIC_CRITICAL_CPT_REV_MIX(float10, long double, sub_cpt_rev, -, fp, _Quad,
+                            10r, 1) // __kmpc_atomic_float10_sub_cpt_rev_fp
+ATOMIC_CRITICAL_CPT_REV_MIX(float10, long double, div_cpt_rev, /, fp, _Quad,
+                            10r, 1) // __kmpc_atomic_float10_div_cpt_rev_fp
 
+#endif // KMP_HAVE_QUAD
 
 //   OpenMP 4.0 Capture-write (swap): {v = x; x = expr;}
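
Illustrative OpenMP source for this capture-write form is below; a compiler that does not inline the swap could lower it to, for example, __kmpc_atomic_fixed4_swp as instantiated further down (again, the lowering is the compiler's choice).

int swap_example(void) {
  int x = 5, v;
  /* capture-write (swap): capture the old value of x, then overwrite x */
#pragma omp atomic capture
  { v = x; x = 42; }
  return v; /* 5 */
}
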
 
-#define ATOMIC_BEGIN_SWP(TYPE_ID,TYPE)                                                    \
-TYPE __kmpc_atomic_##TYPE_ID##_swp( ident_t *id_ref, int gtid, TYPE * lhs, TYPE rhs )     \
-{                                                                                         \
-    KMP_DEBUG_ASSERT( __kmp_init_serial );                                                \
-    KA_TRACE(100,("__kmpc_atomic_" #TYPE_ID "_swp: T#%d\n", gtid ));
-
-#define CRITICAL_SWP(LCK_ID)                                              \
-    __kmp_acquire_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );             \
-                                                                          \
-    old_value = (*lhs);                                                   \
-    (*lhs) = rhs;                                                         \
-                                                                          \
-    __kmp_release_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );             \
-    return old_value;
+#define ATOMIC_BEGIN_SWP(TYPE_ID, TYPE)                                        \
+  TYPE __kmpc_atomic_##TYPE_ID##_swp(ident_t *id_ref, int gtid, TYPE *lhs,     \
+                                     TYPE rhs) {                               \
+    KMP_DEBUG_ASSERT(__kmp_init_serial);                                       \
+    KA_TRACE(100, ("__kmpc_atomic_" #TYPE_ID "_swp: T#%d\n", gtid));
+
+#define CRITICAL_SWP(LCK_ID)                                                   \
+  __kmp_acquire_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);                       \
+                                                                               \
+  old_value = (*lhs);                                                          \
+  (*lhs) = rhs;                                                                \
+                                                                               \
+  __kmp_release_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);                       \
+  return old_value;
 
 // ------------------------------------------------------------------------
 #ifdef KMP_GOMP_COMPAT
-#define GOMP_CRITICAL_SWP(FLAG)                                           \
-    if ( (FLAG) && (__kmp_atomic_mode == 2) ) {                           \
-        KMP_CHECK_GTID;                                                   \
-        CRITICAL_SWP( 0 );                                                \
-    }
+#define GOMP_CRITICAL_SWP(FLAG)                                                \
+  if ((FLAG) && (__kmp_atomic_mode == 2)) {                                    \
+    KMP_CHECK_GTID;                                                            \
+    CRITICAL_SWP(0);                                                           \
+  }
 #else
 #define GOMP_CRITICAL_SWP(FLAG)
 #endif /* KMP_GOMP_COMPAT */
 
+#define ATOMIC_XCHG_SWP(TYPE_ID, TYPE, BITS, GOMP_FLAG)                        \
+  ATOMIC_BEGIN_SWP(TYPE_ID, TYPE)                                              \
+  TYPE old_value;                                                              \
+  GOMP_CRITICAL_SWP(GOMP_FLAG)                                                 \
+  old_value = KMP_XCHG_FIXED##BITS(lhs, rhs);                                  \
+  return old_value;                                                            \
+  }
+// ------------------------------------------------------------------------
+#define ATOMIC_XCHG_FLOAT_SWP(TYPE_ID, TYPE, BITS, GOMP_FLAG)                  \
+  ATOMIC_BEGIN_SWP(TYPE_ID, TYPE)                                              \
+  TYPE old_value;                                                              \
+  GOMP_CRITICAL_SWP(GOMP_FLAG)                                                 \
+  old_value = KMP_XCHG_REAL##BITS(lhs, rhs);                                   \
+  return old_value;                                                            \
+  }
+
+// ------------------------------------------------------------------------
+#define CMPXCHG_SWP(TYPE, BITS)                                                \
+  {                                                                            \
+    TYPE KMP_ATOMIC_VOLATILE temp_val;                                         \
+    TYPE old_value, new_value;                                                 \
+    temp_val = *lhs;                                                           \
+    old_value = temp_val;                                                      \
+    new_value = rhs;                                                           \
+    while (!KMP_COMPARE_AND_STORE_ACQ##BITS(                                   \
+        (kmp_int##BITS *)lhs, *VOLATILE_CAST(kmp_int##BITS *) & old_value,     \
+        *VOLATILE_CAST(kmp_int##BITS *) & new_value)) {                        \
+      KMP_CPU_PAUSE();                                                         \
+                                                                               \
+      temp_val = *lhs;                                                         \
+      old_value = temp_val;                                                    \
+      new_value = rhs;                                                         \
+    }                                                                          \
+    return old_value;                                                          \
+  }
+
+// -------------------------------------------------------------------------
+#define ATOMIC_CMPXCHG_SWP(TYPE_ID, TYPE, BITS, GOMP_FLAG)                     \
+  ATOMIC_BEGIN_SWP(TYPE_ID, TYPE)                                              \
+  TYPE old_value;                                                              \
+  GOMP_CRITICAL_SWP(GOMP_FLAG)                                                 \
+  CMPXCHG_SWP(TYPE, BITS)                                                      \
+  }
+
+ATOMIC_XCHG_SWP(fixed1, kmp_int8, 8, KMP_ARCH_X86) // __kmpc_atomic_fixed1_swp
+ATOMIC_XCHG_SWP(fixed2, kmp_int16, 16, KMP_ARCH_X86) // __kmpc_atomic_fixed2_swp
+ATOMIC_XCHG_SWP(fixed4, kmp_int32, 32, KMP_ARCH_X86) // __kmpc_atomic_fixed4_swp
+
+ATOMIC_XCHG_FLOAT_SWP(float4, kmp_real32, 32,
+                      KMP_ARCH_X86) // __kmpc_atomic_float4_swp
+
+#if (KMP_ARCH_X86)
+ATOMIC_CMPXCHG_SWP(fixed8, kmp_int64, 64,
+                   KMP_ARCH_X86) // __kmpc_atomic_fixed8_swp
+ATOMIC_CMPXCHG_SWP(float8, kmp_real64, 64,
+                   KMP_ARCH_X86) // __kmpc_atomic_float8_swp
+#else
+ATOMIC_XCHG_SWP(fixed8, kmp_int64, 64, KMP_ARCH_X86) // __kmpc_atomic_fixed8_swp
+ATOMIC_XCHG_FLOAT_SWP(float8, kmp_real64, 64,
+                      KMP_ARCH_X86) // __kmpc_atomic_float8_swp
+#endif
+
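
The arch split above reflects that IA-32 has no single 8-byte exchange instruction, so the 64-bit swaps fall back to the CMPXCHG_SWP retry loop there, while other targets keep the plain exchange. A C11 sketch of the exchange-based path, with atomic_exchange standing in for KMP_XCHG_FIXED64 (illustration only):

#include <stdatomic.h>
#include <stdint.h>

/* One atomic exchange returns the old value; no retry loop is needed. */
static int64_t fixed8_swp_sketch(_Atomic int64_t *lhs, int64_t rhs) {
  return atomic_exchange(lhs, rhs);
}
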
+// ------------------------------------------------------------------------
+// Routines for Extended types: long double, _Quad, complex flavours (use
+// critical section)
+#define ATOMIC_CRITICAL_SWP(TYPE_ID, TYPE, LCK_ID, GOMP_FLAG)                  \
+  ATOMIC_BEGIN_SWP(TYPE_ID, TYPE)                                              \
+  TYPE old_value;                                                              \
+  GOMP_CRITICAL_SWP(GOMP_FLAG)                                                 \
+  CRITICAL_SWP(LCK_ID)                                                         \
+  }
 
-#define ATOMIC_XCHG_SWP(TYPE_ID,TYPE,BITS,GOMP_FLAG)                      \
-ATOMIC_BEGIN_SWP(TYPE_ID,TYPE)                                            \
-    TYPE old_value;                                                       \
-    GOMP_CRITICAL_SWP(GOMP_FLAG)                                          \
-    old_value = KMP_XCHG_FIXED##BITS( lhs, rhs );                         \
-    return old_value;                                                     \
-}
 // ------------------------------------------------------------------------
-#define ATOMIC_XCHG_FLOAT_SWP(TYPE_ID,TYPE,BITS,GOMP_FLAG)                \
-ATOMIC_BEGIN_SWP(TYPE_ID,TYPE)                                            \
-    TYPE old_value;                                                       \
-    GOMP_CRITICAL_SWP(GOMP_FLAG)                                          \
-    old_value = KMP_XCHG_REAL##BITS( lhs, rhs );                          \
-    return old_value;                                                     \
-}
-
-// ------------------------------------------------------------------------
-#define CMPXCHG_SWP(TYPE,BITS)                                            \
-    {                                                                     \
-        TYPE KMP_ATOMIC_VOLATILE temp_val;                                \
-        TYPE old_value, new_value;                                        \
-        temp_val = *lhs;                                                  \
-        old_value = temp_val;                                             \
-        new_value = rhs;                                                  \
-        while ( ! KMP_COMPARE_AND_STORE_ACQ##BITS( (kmp_int##BITS *) lhs, \
-                      *VOLATILE_CAST(kmp_int##BITS *) &old_value,         \
-                      *VOLATILE_CAST(kmp_int##BITS *) &new_value ) )      \
-        {                                                                 \
-            KMP_CPU_PAUSE();                                              \
-                                                                          \
-            temp_val = *lhs;                                              \
-            old_value = temp_val;                                         \
-            new_value = rhs;                                              \
-        }                                                                 \
-        return old_value;                                                 \
-    }
-
-// -------------------------------------------------------------------------
-#define ATOMIC_CMPXCHG_SWP(TYPE_ID,TYPE,BITS,GOMP_FLAG)                   \
-ATOMIC_BEGIN_SWP(TYPE_ID,TYPE)                                            \
-    TYPE old_value;                                                       \
-    GOMP_CRITICAL_SWP(GOMP_FLAG)                                          \
-    CMPXCHG_SWP(TYPE,BITS)                                                \
-}
-
-ATOMIC_XCHG_SWP( fixed1, kmp_int8,    8, KMP_ARCH_X86 )  // __kmpc_atomic_fixed1_swp
-ATOMIC_XCHG_SWP( fixed2, kmp_int16,  16, KMP_ARCH_X86 )  // __kmpc_atomic_fixed2_swp
-ATOMIC_XCHG_SWP( fixed4, kmp_int32,  32, KMP_ARCH_X86 )  // __kmpc_atomic_fixed4_swp
-
-ATOMIC_XCHG_FLOAT_SWP( float4, kmp_real32, 32, KMP_ARCH_X86 )      // __kmpc_atomic_float4_swp
-
-#if ( KMP_ARCH_X86 )
-    ATOMIC_CMPXCHG_SWP( fixed8, kmp_int64, 64, KMP_ARCH_X86 )      // __kmpc_atomic_fixed8_swp
-    ATOMIC_CMPXCHG_SWP( float8, kmp_real64, 64, KMP_ARCH_X86 )     // __kmpc_atomic_float8_swp
-#else
-    ATOMIC_XCHG_SWP(       fixed8, kmp_int64, 64, KMP_ARCH_X86 )   // __kmpc_atomic_fixed8_swp
-    ATOMIC_XCHG_FLOAT_SWP( float8, kmp_real64, 64, KMP_ARCH_X86 )  // __kmpc_atomic_float8_swp
-#endif
-
-// ------------------------------------------------------------------------
-// Routines for Extended types: long double, _Quad, complex flavours (use critical section)
-#define ATOMIC_CRITICAL_SWP(TYPE_ID,TYPE,LCK_ID,GOMP_FLAG)              \
-ATOMIC_BEGIN_SWP(TYPE_ID,TYPE)                                          \
-    TYPE old_value;                                                     \
-    GOMP_CRITICAL_SWP(GOMP_FLAG)                                        \
-    CRITICAL_SWP(LCK_ID)                                                \
-}
-
-// ------------------------------------------------------------------------
-
 // !!! TODO: check if we need to return void for cmplx4 routines
 // Workaround for cmplx4. Regular routines with return value don't work
 // on Win_32e. Let's return captured values through the additional parameter.
 
-#define ATOMIC_BEGIN_SWP_WRK(TYPE_ID,TYPE)                                                \
-void __kmpc_atomic_##TYPE_ID##_swp( ident_t *id_ref, int gtid, TYPE * lhs, TYPE rhs, TYPE * out )     \
-{                                                                                         \
-    KMP_DEBUG_ASSERT( __kmp_init_serial );                                                \
-    KA_TRACE(100,("__kmpc_atomic_" #TYPE_ID "_swp: T#%d\n", gtid ));
-
-
-#define CRITICAL_SWP_WRK(LCK_ID)                                          \
-    __kmp_acquire_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );             \
-                                                                          \
-    tmp = (*lhs);                                                         \
-    (*lhs) = (rhs);                                                       \
-    (*out) = tmp;                                                         \
-    __kmp_release_atomic_lock( & ATOMIC_LOCK##LCK_ID, gtid );             \
-    return;
-
+#define ATOMIC_BEGIN_SWP_WRK(TYPE_ID, TYPE)                                    \
+  void __kmpc_atomic_##TYPE_ID##_swp(ident_t *id_ref, int gtid, TYPE *lhs,     \
+                                     TYPE rhs, TYPE *out) {                    \
+    KMP_DEBUG_ASSERT(__kmp_init_serial);                                       \
+    KA_TRACE(100, ("__kmpc_atomic_" #TYPE_ID "_swp: T#%d\n", gtid));
+
+#define CRITICAL_SWP_WRK(LCK_ID)                                               \
+  __kmp_acquire_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);                       \
+                                                                               \
+  tmp = (*lhs);                                                                \
+  (*lhs) = (rhs);                                                              \
+  (*out) = tmp;                                                                \
+  __kmp_release_atomic_lock(&ATOMIC_LOCK##LCK_ID, gtid);                       \
+  return;
 // ------------------------------------------------------------------------
 
 #ifdef KMP_GOMP_COMPAT
-#define GOMP_CRITICAL_SWP_WRK(FLAG)                                       \
-    if ( (FLAG) && (__kmp_atomic_mode == 2) ) {                           \
-        KMP_CHECK_GTID;                                                   \
-        CRITICAL_SWP_WRK( 0 );                                            \
-    }
+#define GOMP_CRITICAL_SWP_WRK(FLAG)                                            \
+  if ((FLAG) && (__kmp_atomic_mode == 2)) {                                    \
+    KMP_CHECK_GTID;                                                            \
+    CRITICAL_SWP_WRK(0);                                                       \
+  }
 #else
 #define GOMP_CRITICAL_SWP_WRK(FLAG)
 #endif /* KMP_GOMP_COMPAT */
 // ------------------------------------------------------------------------
 
-#define ATOMIC_CRITICAL_SWP_WRK(TYPE_ID, TYPE,LCK_ID,GOMP_FLAG)           \
-ATOMIC_BEGIN_SWP_WRK(TYPE_ID,TYPE)                                        \
-    TYPE tmp;                                                             \
-    GOMP_CRITICAL_SWP_WRK(GOMP_FLAG)                                      \
-    CRITICAL_SWP_WRK(LCK_ID)                                              \
-}
+#define ATOMIC_CRITICAL_SWP_WRK(TYPE_ID, TYPE, LCK_ID, GOMP_FLAG)              \
+  ATOMIC_BEGIN_SWP_WRK(TYPE_ID, TYPE)                                          \
+  TYPE tmp;                                                                    \
+  GOMP_CRITICAL_SWP_WRK(GOMP_FLAG)                                             \
+  CRITICAL_SWP_WRK(LCK_ID)                                                     \
+  }
 // The end of workaround for cmplx4
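The cmplx4 workaround above swaps under a lock and hands the captured value back through an extra out parameter instead of a return value. A minimal portable sketch of that shape, using std::mutex and a plain two-float struct as stand-ins for the runtime's 8c atomic lock and kmp_cmplx32 (the names below are illustrative, not the runtime's):
@code
#include <mutex>

struct cmplx32_like { float re, im; }; // stand-in for kmp_cmplx32

static std::mutex swap_lock; // stand-in for the 8c atomic lock

void swap_and_capture(cmplx32_like *lhs, cmplx32_like rhs, cmplx32_like *out) {
  std::lock_guard<std::mutex> guard(swap_lock);
  cmplx32_like tmp = *lhs; // capture the value before the swap
  *lhs = rhs;              // perform the swap
  *out = tmp;              // hand the old value back via the out parameter
}
@endcode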
 
-
-ATOMIC_CRITICAL_SWP( float10, long double, 10r,   1 )              // __kmpc_atomic_float10_swp
+ATOMIC_CRITICAL_SWP(float10, long double, 10r, 1) // __kmpc_atomic_float10_swp
 #if KMP_HAVE_QUAD
-ATOMIC_CRITICAL_SWP( float16, QUAD_LEGACY, 16r,   1 )              // __kmpc_atomic_float16_swp
+ATOMIC_CRITICAL_SWP(float16, QUAD_LEGACY, 16r, 1) // __kmpc_atomic_float16_swp
 #endif
 // cmplx4 routine to return void
-ATOMIC_CRITICAL_SWP_WRK( cmplx4, kmp_cmplx32,  8c,   1 )           // __kmpc_atomic_cmplx4_swp
-
-//ATOMIC_CRITICAL_SWP( cmplx4, kmp_cmplx32,  8c,   1 )           // __kmpc_atomic_cmplx4_swp
+ATOMIC_CRITICAL_SWP_WRK(cmplx4, kmp_cmplx32, 8c, 1) // __kmpc_atomic_cmplx4_swp
 
+// ATOMIC_CRITICAL_SWP( cmplx4, kmp_cmplx32,  8c,   1 )           //
+// __kmpc_atomic_cmplx4_swp
 
-ATOMIC_CRITICAL_SWP( cmplx8,  kmp_cmplx64, 16c,   1 )              // __kmpc_atomic_cmplx8_swp
-ATOMIC_CRITICAL_SWP( cmplx10, kmp_cmplx80, 20c,   1 )              // __kmpc_atomic_cmplx10_swp
+ATOMIC_CRITICAL_SWP(cmplx8, kmp_cmplx64, 16c, 1) // __kmpc_atomic_cmplx8_swp
+ATOMIC_CRITICAL_SWP(cmplx10, kmp_cmplx80, 20c, 1) // __kmpc_atomic_cmplx10_swp
 #if KMP_HAVE_QUAD
-ATOMIC_CRITICAL_SWP( cmplx16, CPLX128_LEG, 32c,   1 )              // __kmpc_atomic_cmplx16_swp
-#if ( KMP_ARCH_X86 )
-    ATOMIC_CRITICAL_SWP( float16_a16, Quad_a16_t,         16r, 1 )  // __kmpc_atomic_float16_a16_swp
-    ATOMIC_CRITICAL_SWP( cmplx16_a16, kmp_cmplx128_a16_t, 32c, 1 )  // __kmpc_atomic_cmplx16_a16_swp
+ATOMIC_CRITICAL_SWP(cmplx16, CPLX128_LEG, 32c, 1) // __kmpc_atomic_cmplx16_swp
+#if (KMP_ARCH_X86)
+ATOMIC_CRITICAL_SWP(float16_a16, Quad_a16_t, 16r,
+                    1) // __kmpc_atomic_float16_a16_swp
+ATOMIC_CRITICAL_SWP(cmplx16_a16, kmp_cmplx128_a16_t, 32c,
+                    1) // __kmpc_atomic_cmplx16_a16_swp
 #endif
 #endif
 
-
 // End of OpenMP 4.0 Capture
 
-#endif //OMP_40_ENABLED
-
-#endif //KMP_ARCH_X86 || KMP_ARCH_X86_64
+#endif // OMP_40_ENABLED
 
+#endif // KMP_ARCH_X86 || KMP_ARCH_X86_64
 
 #undef OP_CRITICAL
 
 /* ------------------------------------------------------------------------ */
 /* Generic atomic routines                                                  */
-/* ------------------------------------------------------------------------ */
 
-void
-__kmpc_atomic_1( ident_t *id_ref, int gtid, void* lhs, void* rhs, void (*f)( void *, void *, void * ) )
-{
-    KMP_DEBUG_ASSERT( __kmp_init_serial );
+void __kmpc_atomic_1(ident_t *id_ref, int gtid, void *lhs, void *rhs,
+                     void (*f)(void *, void *, void *)) {
+  KMP_DEBUG_ASSERT(__kmp_init_serial);
 
-    if (
+  if (
 #if KMP_ARCH_X86 && defined(KMP_GOMP_COMPAT)
-        FALSE                                   /* must use lock */
+      FALSE /* must use lock */
 #else
-        TRUE
+      TRUE
 #endif
-	)
-    {
-	kmp_int8 old_value, new_value;
-
-	old_value = *(kmp_int8 *) lhs;
-	(*f)( &new_value, &old_value, rhs );
+      ) {
+    kmp_int8 old_value, new_value;
 
-	/* TODO: Should this be acquire or release? */
-	while ( !  KMP_COMPARE_AND_STORE_ACQ8 ( (kmp_int8 *) lhs,
-		    		*(kmp_int8 *) &old_value, *(kmp_int8 *) &new_value ) )
-	{
-	    KMP_CPU_PAUSE();
+    old_value = *(kmp_int8 *)lhs;
+    (*f)(&new_value, &old_value, rhs);
 
-	    old_value = *(kmp_int8 *) lhs;
-	    (*f)( &new_value, &old_value, rhs );
-	}
+    /* TODO: Should this be acquire or release? */
+    while (!KMP_COMPARE_AND_STORE_ACQ8((kmp_int8 *)lhs, *(kmp_int8 *)&old_value,
+                                       *(kmp_int8 *)&new_value)) {
+      KMP_CPU_PAUSE();
 
-	return;
+      old_value = *(kmp_int8 *)lhs;
+      (*f)(&new_value, &old_value, rhs);
     }
-    else {
-        //
-        // All 1-byte data is of integer data type.
-        //
+
+    return;
+  } else {
+// All 1-byte data is of integer data type.
 
 #ifdef KMP_GOMP_COMPAT
-        if ( __kmp_atomic_mode == 2 ) {
-            __kmp_acquire_atomic_lock( & __kmp_atomic_lock, gtid );
-        }
-        else
+    if (__kmp_atomic_mode == 2) {
+      __kmp_acquire_atomic_lock(&__kmp_atomic_lock, gtid);
+    } else
 #endif /* KMP_GOMP_COMPAT */
-	__kmp_acquire_atomic_lock( & __kmp_atomic_lock_1i, gtid );
+      __kmp_acquire_atomic_lock(&__kmp_atomic_lock_1i, gtid);
 
-	(*f)( lhs, lhs, rhs );
+    (*f)(lhs, lhs, rhs);
 
 #ifdef KMP_GOMP_COMPAT
-        if ( __kmp_atomic_mode == 2 ) {
-            __kmp_release_atomic_lock( & __kmp_atomic_lock, gtid );
-        }
-        else
+    if (__kmp_atomic_mode == 2) {
+      __kmp_release_atomic_lock(&__kmp_atomic_lock, gtid);
+    } else
 #endif /* KMP_GOMP_COMPAT */
-	__kmp_release_atomic_lock( & __kmp_atomic_lock_1i, gtid );
-    }
+      __kmp_release_atomic_lock(&__kmp_atomic_lock_1i, gtid);
+  }
 }
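The CAS path of __kmpc_atomic_1 above snapshots the location, lets the callback compute the new value from that snapshot, and retries the compare-and-store until no other thread has intervened. A portable sketch of the same retry loop, assuming std::atomic in place of KMP_COMPARE_AND_STORE_ACQ8 and a typed callback (illustration only, not the runtime's entry point):
@code
#include <atomic>
#include <cstdint>

// Callback shape: f(result, old_snapshot, rhs), mirroring (*f)(&new, &old, rhs).
typedef void (*op1_t)(int8_t *result, const int8_t *old_val, const int8_t *rhs);

void atomic_apply_1(std::atomic<int8_t> *lhs, int8_t rhs, op1_t f) {
  int8_t old_value = lhs->load();
  int8_t new_value;
  f(&new_value, &old_value, &rhs);
  // On failure, compare_exchange_weak refreshes old_value with the current
  // contents of *lhs, so recompute new_value and try again.
  while (!lhs->compare_exchange_weak(old_value, new_value)) {
    f(&new_value, &old_value, &rhs);
  }
}

static void add_op(int8_t *result, const int8_t *old_val, const int8_t *rhs) {
  *result = (int8_t)(*old_val + *rhs); // e.g. the body of an atomic +=
}
// Usage: std::atomic<int8_t> s{0}; atomic_apply_1(&s, 5, add_op);
@endcode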
 
-void
-__kmpc_atomic_2( ident_t *id_ref, int gtid, void* lhs, void* rhs, void (*f)( void *, void *, void * ) )
-{
-    if (
+void __kmpc_atomic_2(ident_t *id_ref, int gtid, void *lhs, void *rhs,
+                     void (*f)(void *, void *, void *)) {
+  if (
 #if KMP_ARCH_X86 && defined(KMP_GOMP_COMPAT)
-        FALSE                                   /* must use lock */
+      FALSE /* must use lock */
 #elif KMP_ARCH_X86 || KMP_ARCH_X86_64
-	TRUE					/* no alignment problems */
+      TRUE /* no alignment problems */
 #else
-	! ( (kmp_uintptr_t) lhs & 0x1)		/* make sure address is 2-byte aligned */
+      !((kmp_uintptr_t)lhs & 0x1) /* make sure address is 2-byte aligned */
 #endif
-	)
-    {
-	kmp_int16 old_value, new_value;
-
-	old_value = *(kmp_int16 *) lhs;
-	(*f)( &new_value, &old_value, rhs );
+      ) {
+    kmp_int16 old_value, new_value;
 
-	/* TODO: Should this be acquire or release? */
-	while ( !  KMP_COMPARE_AND_STORE_ACQ16 ( (kmp_int16 *) lhs,
-		    		*(kmp_int16 *) &old_value, *(kmp_int16 *) &new_value ) )
-	{
-	    KMP_CPU_PAUSE();
+    old_value = *(kmp_int16 *)lhs;
+    (*f)(&new_value, &old_value, rhs);
 
-	    old_value = *(kmp_int16 *) lhs;
-	    (*f)( &new_value, &old_value, rhs );
-	}
+    /* TODO: Should this be acquire or release? */
+    while (!KMP_COMPARE_AND_STORE_ACQ16(
+        (kmp_int16 *)lhs, *(kmp_int16 *)&old_value, *(kmp_int16 *)&new_value)) {
+      KMP_CPU_PAUSE();
 
-	return;
+      old_value = *(kmp_int16 *)lhs;
+      (*f)(&new_value, &old_value, rhs);
     }
-    else {
-        //
-        // All 2-byte data is of integer data type.
-        //
+
+    return;
+  } else {
+// All 2-byte data is of integer data type.
 
 #ifdef KMP_GOMP_COMPAT
-        if ( __kmp_atomic_mode == 2 ) {
-            __kmp_acquire_atomic_lock( & __kmp_atomic_lock, gtid );
-        }
-        else
+    if (__kmp_atomic_mode == 2) {
+      __kmp_acquire_atomic_lock(&__kmp_atomic_lock, gtid);
+    } else
 #endif /* KMP_GOMP_COMPAT */
-	__kmp_acquire_atomic_lock( & __kmp_atomic_lock_2i, gtid );
+      __kmp_acquire_atomic_lock(&__kmp_atomic_lock_2i, gtid);
 
-	(*f)( lhs, lhs, rhs );
+    (*f)(lhs, lhs, rhs);
 
 #ifdef KMP_GOMP_COMPAT
-        if ( __kmp_atomic_mode == 2 ) {
-            __kmp_release_atomic_lock( & __kmp_atomic_lock, gtid );
-        }
-        else
+    if (__kmp_atomic_mode == 2) {
+      __kmp_release_atomic_lock(&__kmp_atomic_lock, gtid);
+    } else
 #endif /* KMP_GOMP_COMPAT */
-	__kmp_release_atomic_lock( & __kmp_atomic_lock_2i, gtid );
-    }
+      __kmp_release_atomic_lock(&__kmp_atomic_lock_2i, gtid);
+  }
 }
 
-void
-__kmpc_atomic_4( ident_t *id_ref, int gtid, void* lhs, void* rhs, void (*f)( void *, void *, void * ) )
-{
-    KMP_DEBUG_ASSERT( __kmp_init_serial );
-
-    if (
-        //
-        // FIXME: On IA-32 architecture, gcc uses cmpxchg only for 4-byte ints.
-        // Gomp compatibility is broken if this routine is called for floats.
-        //
+void __kmpc_atomic_4(ident_t *id_ref, int gtid, void *lhs, void *rhs,
+                     void (*f)(void *, void *, void *)) {
+  KMP_DEBUG_ASSERT(__kmp_init_serial);
+
+  if (
+// FIXME: On IA-32 architecture, gcc uses cmpxchg only for 4-byte ints.
+// Gomp compatibility is broken if this routine is called for floats.
 #if KMP_ARCH_X86 || KMP_ARCH_X86_64
-	TRUE					/* no alignment problems */
+      TRUE /* no alignment problems */
 #else
-	! ( (kmp_uintptr_t) lhs & 0x3)		/* make sure address is 4-byte aligned */
+      !((kmp_uintptr_t)lhs & 0x3) /* make sure address is 4-byte aligned */
 #endif
-	)
-    {
-	kmp_int32 old_value, new_value;
-
-	old_value = *(kmp_int32 *) lhs;
-	(*f)( &new_value, &old_value, rhs );
+      ) {
+    kmp_int32 old_value, new_value;
 
-	/* TODO: Should this be acquire or release? */
-	while ( !  KMP_COMPARE_AND_STORE_ACQ32 ( (kmp_int32 *) lhs,
-		    		*(kmp_int32 *) &old_value, *(kmp_int32 *) &new_value ) )
-	{
-	    KMP_CPU_PAUSE();
+    old_value = *(kmp_int32 *)lhs;
+    (*f)(&new_value, &old_value, rhs);
 
-	    old_value = *(kmp_int32 *) lhs;
-	    (*f)( &new_value, &old_value, rhs );
-	}
+    /* TODO: Should this be acquire or release? */
+    while (!KMP_COMPARE_AND_STORE_ACQ32(
+        (kmp_int32 *)lhs, *(kmp_int32 *)&old_value, *(kmp_int32 *)&new_value)) {
+      KMP_CPU_PAUSE();
 
-	return;
+      old_value = *(kmp_int32 *)lhs;
+      (*f)(&new_value, &old_value, rhs);
     }
-    else {
-        //
-        // Use __kmp_atomic_lock_4i for all 4-byte data,
-        // even if it isn't of integer data type.
-        //
+
+    return;
+  } else {
+// Use __kmp_atomic_lock_4i for all 4-byte data,
+// even if it isn't of integer data type.
 
 #ifdef KMP_GOMP_COMPAT
-        if ( __kmp_atomic_mode == 2 ) {
-            __kmp_acquire_atomic_lock( & __kmp_atomic_lock, gtid );
-        }
-        else
+    if (__kmp_atomic_mode == 2) {
+      __kmp_acquire_atomic_lock(&__kmp_atomic_lock, gtid);
+    } else
 #endif /* KMP_GOMP_COMPAT */
-	__kmp_acquire_atomic_lock( & __kmp_atomic_lock_4i, gtid );
+      __kmp_acquire_atomic_lock(&__kmp_atomic_lock_4i, gtid);
 
-	(*f)( lhs, lhs, rhs );
+    (*f)(lhs, lhs, rhs);
 
 #ifdef KMP_GOMP_COMPAT
-        if ( __kmp_atomic_mode == 2 ) {
-            __kmp_release_atomic_lock( & __kmp_atomic_lock, gtid );
-        }
-        else
+    if (__kmp_atomic_mode == 2) {
+      __kmp_release_atomic_lock(&__kmp_atomic_lock, gtid);
+    } else
 #endif /* KMP_GOMP_COMPAT */
-	__kmp_release_atomic_lock( & __kmp_atomic_lock_4i, gtid );
-    }
+      __kmp_release_atomic_lock(&__kmp_atomic_lock_4i, gtid);
+  }
 }
 
-void
-__kmpc_atomic_8( ident_t *id_ref, int gtid, void* lhs, void* rhs, void (*f)( void *, void *, void * ) )
-{
-    KMP_DEBUG_ASSERT( __kmp_init_serial );
-    if (
+void __kmpc_atomic_8(ident_t *id_ref, int gtid, void *lhs, void *rhs,
+                     void (*f)(void *, void *, void *)) {
+  KMP_DEBUG_ASSERT(__kmp_init_serial);
+  if (
 
 #if KMP_ARCH_X86 && defined(KMP_GOMP_COMPAT)
-        FALSE                                   /* must use lock */
+      FALSE /* must use lock */
 #elif KMP_ARCH_X86 || KMP_ARCH_X86_64
-	TRUE					/* no alignment problems */
+      TRUE /* no alignment problems */
 #else
-	! ( (kmp_uintptr_t) lhs & 0x7)		/* make sure address is 8-byte aligned */
+      !((kmp_uintptr_t)lhs & 0x7) /* make sure address is 8-byte aligned */
 #endif
-	)
-    {
-	kmp_int64 old_value, new_value;
+      ) {
+    kmp_int64 old_value, new_value;
 
-	old_value = *(kmp_int64 *) lhs;
-	(*f)( &new_value, &old_value, rhs );
-	/* TODO: Should this be acquire or release? */
-	while ( !  KMP_COMPARE_AND_STORE_ACQ64 ( (kmp_int64 *) lhs,
-					       *(kmp_int64 *) &old_value,
-					       *(kmp_int64 *) &new_value ) )
-	{
-	    KMP_CPU_PAUSE();
+    old_value = *(kmp_int64 *)lhs;
+    (*f)(&new_value, &old_value, rhs);
+    /* TODO: Should this be acquire or release? */
+    while (!KMP_COMPARE_AND_STORE_ACQ64(
+        (kmp_int64 *)lhs, *(kmp_int64 *)&old_value, *(kmp_int64 *)&new_value)) {
+      KMP_CPU_PAUSE();
 
-	    old_value = *(kmp_int64 *) lhs;
-	    (*f)( &new_value, &old_value, rhs );
-	}
+      old_value = *(kmp_int64 *)lhs;
+      (*f)(&new_value, &old_value, rhs);
+    }
 
-	return;
-    } else {
-        //
-        // Use __kmp_atomic_lock_8i for all 8-byte data,
-        // even if it isn't of integer data type.
-        //
+    return;
+  } else {
+// Use __kmp_atomic_lock_8i for all 8-byte data,
+// even if it isn't of integer data type.
 
 #ifdef KMP_GOMP_COMPAT
-        if ( __kmp_atomic_mode == 2 ) {
-            __kmp_acquire_atomic_lock( & __kmp_atomic_lock, gtid );
-        }
-        else
+    if (__kmp_atomic_mode == 2) {
+      __kmp_acquire_atomic_lock(&__kmp_atomic_lock, gtid);
+    } else
 #endif /* KMP_GOMP_COMPAT */
-	__kmp_acquire_atomic_lock( & __kmp_atomic_lock_8i, gtid );
+      __kmp_acquire_atomic_lock(&__kmp_atomic_lock_8i, gtid);
 
-	(*f)( lhs, lhs, rhs );
+    (*f)(lhs, lhs, rhs);
 
 #ifdef KMP_GOMP_COMPAT
-        if ( __kmp_atomic_mode == 2 ) {
-            __kmp_release_atomic_lock( & __kmp_atomic_lock, gtid );
-        }
-        else
+    if (__kmp_atomic_mode == 2) {
+      __kmp_release_atomic_lock(&__kmp_atomic_lock, gtid);
+    } else
 #endif /* KMP_GOMP_COMPAT */
-	__kmp_release_atomic_lock( & __kmp_atomic_lock_8i, gtid );
-    }
+      __kmp_release_atomic_lock(&__kmp_atomic_lock_8i, gtid);
+  }
 }
 
-void
-__kmpc_atomic_10( ident_t *id_ref, int gtid, void* lhs, void* rhs, void (*f)( void *, void *, void * ) )
-{
-    KMP_DEBUG_ASSERT( __kmp_init_serial );
+void __kmpc_atomic_10(ident_t *id_ref, int gtid, void *lhs, void *rhs,
+                      void (*f)(void *, void *, void *)) {
+  KMP_DEBUG_ASSERT(__kmp_init_serial);
 
 #ifdef KMP_GOMP_COMPAT
-    if ( __kmp_atomic_mode == 2 ) {
-        __kmp_acquire_atomic_lock( & __kmp_atomic_lock, gtid );
-    }
-    else
+  if (__kmp_atomic_mode == 2) {
+    __kmp_acquire_atomic_lock(&__kmp_atomic_lock, gtid);
+  } else
 #endif /* KMP_GOMP_COMPAT */
-    __kmp_acquire_atomic_lock( & __kmp_atomic_lock_10r, gtid );
+    __kmp_acquire_atomic_lock(&__kmp_atomic_lock_10r, gtid);
 
-    (*f)( lhs, lhs, rhs );
+  (*f)(lhs, lhs, rhs);
 
 #ifdef KMP_GOMP_COMPAT
-    if ( __kmp_atomic_mode == 2 ) {
-        __kmp_release_atomic_lock( & __kmp_atomic_lock, gtid );
-    }
-    else
+  if (__kmp_atomic_mode == 2) {
+    __kmp_release_atomic_lock(&__kmp_atomic_lock, gtid);
+  } else
 #endif /* KMP_GOMP_COMPAT */
-    __kmp_release_atomic_lock( & __kmp_atomic_lock_10r, gtid );
+    __kmp_release_atomic_lock(&__kmp_atomic_lock_10r, gtid);
 }
 
-void
-__kmpc_atomic_16( ident_t *id_ref, int gtid, void* lhs, void* rhs, void (*f)( void *, void *, void * ) )
-{
-    KMP_DEBUG_ASSERT( __kmp_init_serial );
+void __kmpc_atomic_16(ident_t *id_ref, int gtid, void *lhs, void *rhs,
+                      void (*f)(void *, void *, void *)) {
+  KMP_DEBUG_ASSERT(__kmp_init_serial);
 
 #ifdef KMP_GOMP_COMPAT
-    if ( __kmp_atomic_mode == 2 ) {
-        __kmp_acquire_atomic_lock( & __kmp_atomic_lock, gtid );
-    }
-    else
+  if (__kmp_atomic_mode == 2) {
+    __kmp_acquire_atomic_lock(&__kmp_atomic_lock, gtid);
+  } else
 #endif /* KMP_GOMP_COMPAT */
-    __kmp_acquire_atomic_lock( & __kmp_atomic_lock_16c, gtid );
+    __kmp_acquire_atomic_lock(&__kmp_atomic_lock_16c, gtid);
 
-    (*f)( lhs, lhs, rhs );
+  (*f)(lhs, lhs, rhs);
 
 #ifdef KMP_GOMP_COMPAT
-    if ( __kmp_atomic_mode == 2 ) {
-        __kmp_release_atomic_lock( & __kmp_atomic_lock, gtid );
-    }
-    else
+  if (__kmp_atomic_mode == 2) {
+    __kmp_release_atomic_lock(&__kmp_atomic_lock, gtid);
+  } else
 #endif /* KMP_GOMP_COMPAT */
-    __kmp_release_atomic_lock( & __kmp_atomic_lock_16c, gtid );
+    __kmp_release_atomic_lock(&__kmp_atomic_lock_16c, gtid);
 }
 
-void
-__kmpc_atomic_20( ident_t *id_ref, int gtid, void* lhs, void* rhs, void (*f)( void *, void *, void * ) )
-{
-    KMP_DEBUG_ASSERT( __kmp_init_serial );
+void __kmpc_atomic_20(ident_t *id_ref, int gtid, void *lhs, void *rhs,
+                      void (*f)(void *, void *, void *)) {
+  KMP_DEBUG_ASSERT(__kmp_init_serial);
 
 #ifdef KMP_GOMP_COMPAT
-    if ( __kmp_atomic_mode == 2 ) {
-        __kmp_acquire_atomic_lock( & __kmp_atomic_lock, gtid );
-    }
-    else
+  if (__kmp_atomic_mode == 2) {
+    __kmp_acquire_atomic_lock(&__kmp_atomic_lock, gtid);
+  } else
 #endif /* KMP_GOMP_COMPAT */
-    __kmp_acquire_atomic_lock( & __kmp_atomic_lock_20c, gtid );
+    __kmp_acquire_atomic_lock(&__kmp_atomic_lock_20c, gtid);
 
-    (*f)( lhs, lhs, rhs );
+  (*f)(lhs, lhs, rhs);
 
 #ifdef KMP_GOMP_COMPAT
-    if ( __kmp_atomic_mode == 2 ) {
-        __kmp_release_atomic_lock( & __kmp_atomic_lock, gtid );
-    }
-    else
+  if (__kmp_atomic_mode == 2) {
+    __kmp_release_atomic_lock(&__kmp_atomic_lock, gtid);
+  } else
 #endif /* KMP_GOMP_COMPAT */
-    __kmp_release_atomic_lock( & __kmp_atomic_lock_20c, gtid );
+    __kmp_release_atomic_lock(&__kmp_atomic_lock_20c, gtid);
 }
 
-void
-__kmpc_atomic_32( ident_t *id_ref, int gtid, void* lhs, void* rhs, void (*f)( void *, void *, void * ) )
-{
-    KMP_DEBUG_ASSERT( __kmp_init_serial );
+void __kmpc_atomic_32(ident_t *id_ref, int gtid, void *lhs, void *rhs,
+                      void (*f)(void *, void *, void *)) {
+  KMP_DEBUG_ASSERT(__kmp_init_serial);
 
 #ifdef KMP_GOMP_COMPAT
-    if ( __kmp_atomic_mode == 2 ) {
-        __kmp_acquire_atomic_lock( & __kmp_atomic_lock, gtid );
-    }
-    else
+  if (__kmp_atomic_mode == 2) {
+    __kmp_acquire_atomic_lock(&__kmp_atomic_lock, gtid);
+  } else
 #endif /* KMP_GOMP_COMPAT */
-    __kmp_acquire_atomic_lock( & __kmp_atomic_lock_32c, gtid );
+    __kmp_acquire_atomic_lock(&__kmp_atomic_lock_32c, gtid);
 
-    (*f)( lhs, lhs, rhs );
+  (*f)(lhs, lhs, rhs);
 
 #ifdef KMP_GOMP_COMPAT
-    if ( __kmp_atomic_mode == 2 ) {
-        __kmp_release_atomic_lock( & __kmp_atomic_lock, gtid );
-    }
-    else
+  if (__kmp_atomic_mode == 2) {
+    __kmp_release_atomic_lock(&__kmp_atomic_lock, gtid);
+  } else
 #endif /* KMP_GOMP_COMPAT */
-    __kmp_release_atomic_lock( & __kmp_atomic_lock_32c, gtid );
+    __kmp_release_atomic_lock(&__kmp_atomic_lock_32c, gtid);
 }
 
-// AC: same two routines as GOMP_atomic_start/end, but will be called by our compiler
-//     duplicated in order to not use 3-party names in pure Intel code
+// AC: same two routines as GOMP_atomic_start/end, but will be called by our
+// compiler; duplicated in order to not use 3-party names in pure Intel code
 // TODO: consider adding GTID parameter after consultation with Ernesto/Xinmin.
-void
-__kmpc_atomic_start(void)
-{
-    int gtid = __kmp_entry_gtid();
-    KA_TRACE(20, ("__kmpc_atomic_start: T#%d\n", gtid));
-    __kmp_acquire_atomic_lock(&__kmp_atomic_lock, gtid);
+void __kmpc_atomic_start(void) {
+  int gtid = __kmp_entry_gtid();
+  KA_TRACE(20, ("__kmpc_atomic_start: T#%d\n", gtid));
+  __kmp_acquire_atomic_lock(&__kmp_atomic_lock, gtid);
 }
 
-
-void
-__kmpc_atomic_end(void)
-{
-    int gtid = __kmp_get_gtid();
-    KA_TRACE(20, ("__kmpc_atomic_end: T#%d\n", gtid));
-    __kmp_release_atomic_lock(&__kmp_atomic_lock, gtid);
+void __kmpc_atomic_end(void) {
+  int gtid = __kmp_get_gtid();
+  KA_TRACE(20, ("__kmpc_atomic_end: T#%d\n", gtid));
+  __kmp_release_atomic_lock(&__kmp_atomic_lock, gtid);
 }
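As the comment above notes, these two entry points mirror GOMP_atomic_start/end: they simply take and release the single global atomic lock. Purely as an illustration of how a compiler might bracket an atomic construct it cannot inline (this snippet needs libomp to link, and the long double accumulator is a made-up example, not part of the runtime):
@code
extern "C" void __kmpc_atomic_start(void);
extern "C" void __kmpc_atomic_end(void);

static long double shared_sum = 0.0L; // hypothetical shared accumulator

void add_under_global_atomic_lock(long double v) {
  __kmpc_atomic_start(); // acquires __kmp_atomic_lock
  shared_sum += v;       // the body of the atomic construct
  __kmpc_atomic_end();   // releases __kmp_atomic_lock
}
@endcode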
 
-/* ------------------------------------------------------------------------ */
-/* ------------------------------------------------------------------------ */
 /*!
 @}
 */

Modified: openmp/trunk/runtime/src/kmp_atomic.h
URL: http://llvm.org/viewvc/llvm-project/openmp/trunk/runtime/src/kmp_atomic.h?rev=302929&r1=302928&r2=302929&view=diff
==============================================================================
--- openmp/trunk/runtime/src/kmp_atomic.h (original)
+++ openmp/trunk/runtime/src/kmp_atomic.h Fri May 12 13:01:32 2017
@@ -16,8 +16,8 @@
 #ifndef KMP_ATOMIC_H
 #define KMP_ATOMIC_H
 
-#include "kmp_os.h"
 #include "kmp_lock.h"
+#include "kmp_os.h"
 
 #if OMPT_SUPPORT
 #include "ompt-specific.h"
@@ -32,188 +32,181 @@
 //                  to use typedef'ed types on win.
 // Condition for WIN64 was modified in anticipation of 10.1 build compiler.
 
-#if defined( __cplusplus ) && ( KMP_OS_WINDOWS )
-    // create shortcuts for c99 complex types
+#if defined(__cplusplus) && (KMP_OS_WINDOWS)
+// create shortcuts for c99 complex types
+
+// Visual Studio cannot have function parameters that have the
+// align __declspec attribute, so we must remove it. (Compiler Error C2719)
+#if KMP_COMPILER_MSVC
+#undef KMP_DO_ALIGN
+#define KMP_DO_ALIGN(alignment) /* Nothing */
+#endif
+
+#if (_MSC_VER < 1600) && defined(_DEBUG)
+// Workaround for the problem of _DebugHeapTag unresolved external.
+// This problem prevented to use our static debug library for C tests
+// compiled with /MDd option (the library itself built with /MTd),
+#undef _DEBUG
+#define _DEBUG_TEMPORARILY_UNSET_
+#endif
+
+#include <complex>
+
+template <typename type_lhs, typename type_rhs>
+std::complex<type_lhs> __kmp_lhs_div_rhs(const std::complex<type_lhs> &lhs,
+                                         const std::complex<type_rhs> &rhs) {
+  type_lhs a = lhs.real();
+  type_lhs b = lhs.imag();
+  type_rhs c = rhs.real();
+  type_rhs d = rhs.imag();
+  type_rhs den = c * c + d * d;
+  type_rhs r = (a * c + b * d);
+  type_rhs i = (b * c - a * d);
+  std::complex<type_lhs> ret(r / den, i / den);
+  return ret;
+}
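The helper above implements complex division by the textbook formula (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i)/(c^2+d^2). A small standalone check of that formula against std::complex's own operator/ (self-contained; it does not use the runtime's wrapper types):
@code
#include <cassert>
#include <cmath>
#include <complex>

template <typename T>
std::complex<T> lhs_div_rhs(const std::complex<T> &lhs,
                            const std::complex<T> &rhs) {
  T a = lhs.real(), b = lhs.imag();
  T c = rhs.real(), d = rhs.imag();
  T den = c * c + d * d; // |rhs|^2
  return std::complex<T>((a * c + b * d) / den,  // real part
                         (b * c - a * d) / den); // imaginary part
}

int main() {
  std::complex<double> x(3.0, -4.0), y(1.0, 2.0);
  assert(std::abs(lhs_div_rhs(x, y) - x / y) < 1e-12);
  return 0;
}
@endcode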
+
+// complex8
+struct __kmp_cmplx64_t : std::complex<double> {
+
+  __kmp_cmplx64_t() : std::complex<double>() {}
+
+  __kmp_cmplx64_t(const std::complex<double> &cd) : std::complex<double>(cd) {}
+
+  void operator/=(const __kmp_cmplx64_t &rhs) {
+    std::complex<double> lhs = *this;
+    *this = __kmp_lhs_div_rhs(lhs, rhs);
+  }
+
+  __kmp_cmplx64_t operator/(const __kmp_cmplx64_t &rhs) {
+    std::complex<double> lhs = *this;
+    return __kmp_lhs_div_rhs(lhs, rhs);
+  }
+};
+typedef struct __kmp_cmplx64_t kmp_cmplx64;
+
+// complex4
+struct __kmp_cmplx32_t : std::complex<float> {
+
+  __kmp_cmplx32_t() : std::complex<float>() {}
+
+  __kmp_cmplx32_t(const std::complex<float> &cf) : std::complex<float>(cf) {}
+
+  __kmp_cmplx32_t operator+(const __kmp_cmplx32_t &b) {
+    std::complex<float> lhs = *this;
+    std::complex<float> rhs = b;
+    return (lhs + rhs);
+  }
+  __kmp_cmplx32_t operator-(const __kmp_cmplx32_t &b) {
+    std::complex<float> lhs = *this;
+    std::complex<float> rhs = b;
+    return (lhs - rhs);
+  }
+  __kmp_cmplx32_t operator*(const __kmp_cmplx32_t &b) {
+    std::complex<float> lhs = *this;
+    std::complex<float> rhs = b;
+    return (lhs * rhs);
+  }
+
+  __kmp_cmplx32_t operator+(const kmp_cmplx64 &b) {
+    kmp_cmplx64 t = kmp_cmplx64(*this) + b;
+    std::complex<double> d(t);
+    std::complex<float> f(d);
+    __kmp_cmplx32_t r(f);
+    return r;
+  }
+  __kmp_cmplx32_t operator-(const kmp_cmplx64 &b) {
+    kmp_cmplx64 t = kmp_cmplx64(*this) - b;
+    std::complex<double> d(t);
+    std::complex<float> f(d);
+    __kmp_cmplx32_t r(f);
+    return r;
+  }
+  __kmp_cmplx32_t operator*(const kmp_cmplx64 &b) {
+    kmp_cmplx64 t = kmp_cmplx64(*this) * b;
+    std::complex<double> d(t);
+    std::complex<float> f(d);
+    __kmp_cmplx32_t r(f);
+    return r;
+  }
+
+  void operator/=(const __kmp_cmplx32_t &rhs) {
+    std::complex<float> lhs = *this;
+    *this = __kmp_lhs_div_rhs(lhs, rhs);
+  }
+
+  __kmp_cmplx32_t operator/(const __kmp_cmplx32_t &rhs) {
+    std::complex<float> lhs = *this;
+    return __kmp_lhs_div_rhs(lhs, rhs);
+  }
+
+  void operator/=(const kmp_cmplx64 &rhs) {
+    std::complex<float> lhs = *this;
+    *this = __kmp_lhs_div_rhs(lhs, rhs);
+  }
+
+  __kmp_cmplx32_t operator/(const kmp_cmplx64 &rhs) {
+    std::complex<float> lhs = *this;
+    return __kmp_lhs_div_rhs(lhs, rhs);
+  }
+};
+typedef struct __kmp_cmplx32_t kmp_cmplx32;
+
+// complex10
+struct KMP_DO_ALIGN(16) __kmp_cmplx80_t : std::complex<long double> {
+
+  __kmp_cmplx80_t() : std::complex<long double>() {}
+
+  __kmp_cmplx80_t(const std::complex<long double> &cld)
+      : std::complex<long double>(cld) {}
+
+  void operator/=(const __kmp_cmplx80_t &rhs) {
+    std::complex<long double> lhs = *this;
+    *this = __kmp_lhs_div_rhs(lhs, rhs);
+  }
+
+  __kmp_cmplx80_t operator/(const __kmp_cmplx80_t &rhs) {
+    std::complex<long double> lhs = *this;
+    return __kmp_lhs_div_rhs(lhs, rhs);
+  }
+};
+typedef KMP_DO_ALIGN(16) struct __kmp_cmplx80_t kmp_cmplx80;
+
+// complex16
+#if KMP_HAVE_QUAD
+struct __kmp_cmplx128_t : std::complex<_Quad> {
+
+  __kmp_cmplx128_t() : std::complex<_Quad>() {}
+
+  __kmp_cmplx128_t(const std::complex<_Quad> &cq) : std::complex<_Quad>(cq) {}
+
+  void operator/=(const __kmp_cmplx128_t &rhs) {
+    std::complex<_Quad> lhs = *this;
+    *this = __kmp_lhs_div_rhs(lhs, rhs);
+  }
 
-    // Visual Studio cannot have function parameters that have the
-    // align __declspec attribute, so we must remove it. (Compiler Error C2719)
-    #if KMP_COMPILER_MSVC
-    # undef KMP_DO_ALIGN
-    # define KMP_DO_ALIGN(alignment) /* Nothing */
-    #endif
-
-    #if (_MSC_VER < 1600) && defined(_DEBUG)
-        // Workaround for the problem of _DebugHeapTag unresolved external.
-        // This problem prevented to use our static debug library for C tests
-        // compiled with /MDd option (the library itself built with /MTd),
-        #undef _DEBUG
-        #define _DEBUG_TEMPORARILY_UNSET_
-    #endif
-
-    #include <complex>
-
-    template< typename type_lhs, typename type_rhs >
-    std::complex< type_lhs > __kmp_lhs_div_rhs(
-                const std::complex< type_lhs >& lhs,
-                const std::complex< type_rhs >& rhs ) {
-    type_lhs a = lhs.real();
-    type_lhs b = lhs.imag();
-    type_rhs c = rhs.real();
-    type_rhs d = rhs.imag();
-    type_rhs den = c*c + d*d;
-    type_rhs r = ( a*c + b*d );
-    type_rhs i = ( b*c - a*d );
-    std::complex< type_lhs > ret( r/den, i/den );
-    return ret;
-    }
-
-    // complex8
-    struct __kmp_cmplx64_t : std::complex< double > {
-
-    __kmp_cmplx64_t() : std::complex< double > () {}
-
-    __kmp_cmplx64_t( const std::complex< double >& cd )
-                : std::complex< double > ( cd ) {}
-
-    void operator /= ( const __kmp_cmplx64_t& rhs ) {
-        std::complex< double > lhs = *this;
-        *this = __kmp_lhs_div_rhs( lhs, rhs );
-    }
-
-    __kmp_cmplx64_t operator / ( const __kmp_cmplx64_t& rhs ) {
-        std::complex< double > lhs = *this;
-        return __kmp_lhs_div_rhs( lhs, rhs );
-    }
-
-    };
-    typedef struct __kmp_cmplx64_t kmp_cmplx64;
-
-    // complex4
-    struct __kmp_cmplx32_t : std::complex< float > {
-
-    __kmp_cmplx32_t() : std::complex< float > () {}
-
-    __kmp_cmplx32_t( const std::complex<float>& cf )
-                : std::complex< float > ( cf ) {}
-
-    __kmp_cmplx32_t operator + ( const __kmp_cmplx32_t& b ) {
-        std::complex< float > lhs = *this;
-        std::complex< float > rhs = b;
-        return ( lhs + rhs );
-    }
-    __kmp_cmplx32_t operator - ( const __kmp_cmplx32_t& b ) {
-        std::complex< float > lhs = *this;
-        std::complex< float > rhs = b;
-        return ( lhs - rhs );
-    }
-    __kmp_cmplx32_t operator * ( const __kmp_cmplx32_t& b ) {
-        std::complex< float > lhs = *this;
-        std::complex< float > rhs = b;
-        return ( lhs * rhs );
-    }
-
-    __kmp_cmplx32_t operator + ( const kmp_cmplx64& b ) {
-        kmp_cmplx64 t = kmp_cmplx64( *this ) + b;
-        std::complex< double > d( t );
-        std::complex< float > f( d );
-        __kmp_cmplx32_t r( f );
-        return r;
-    }
-    __kmp_cmplx32_t operator - ( const kmp_cmplx64& b ) {
-        kmp_cmplx64 t = kmp_cmplx64( *this ) - b;
-        std::complex< double > d( t );
-        std::complex< float > f( d );
-        __kmp_cmplx32_t r( f );
-        return r;
-    }
-    __kmp_cmplx32_t operator * ( const kmp_cmplx64& b ) {
-        kmp_cmplx64 t = kmp_cmplx64( *this ) * b;
-        std::complex< double > d( t );
-        std::complex< float > f( d );
-        __kmp_cmplx32_t r( f );
-        return r;
-    }
-
-    void operator /= ( const __kmp_cmplx32_t& rhs ) {
-        std::complex< float > lhs = *this;
-        *this = __kmp_lhs_div_rhs( lhs, rhs );
-    }
-
-    __kmp_cmplx32_t operator / ( const __kmp_cmplx32_t& rhs ) {
-        std::complex< float > lhs = *this;
-        return __kmp_lhs_div_rhs( lhs, rhs );
-    }
-
-    void operator /= ( const kmp_cmplx64& rhs ) {
-        std::complex< float > lhs = *this;
-        *this = __kmp_lhs_div_rhs( lhs, rhs );
-    }
-
-    __kmp_cmplx32_t operator / ( const kmp_cmplx64& rhs ) {
-        std::complex< float > lhs = *this;
-        return __kmp_lhs_div_rhs( lhs, rhs );
-    }
-    };
-    typedef struct __kmp_cmplx32_t kmp_cmplx32;
-
-    // complex10
-    struct KMP_DO_ALIGN( 16 )  __kmp_cmplx80_t : std::complex< long double > {
-
-            __kmp_cmplx80_t() : std::complex< long double > () {}
-
-            __kmp_cmplx80_t( const std::complex< long double >& cld )
-                : std::complex< long double > ( cld ) {}
-
-        void operator /= ( const __kmp_cmplx80_t& rhs ) {
-        std::complex< long double > lhs = *this;
-        *this = __kmp_lhs_div_rhs( lhs, rhs );
-        }
-
-        __kmp_cmplx80_t operator / ( const __kmp_cmplx80_t& rhs ) {
-        std::complex< long double > lhs = *this;
-        return __kmp_lhs_div_rhs( lhs, rhs );
-        }
-
-    };
-    typedef KMP_DO_ALIGN( 16 )  struct __kmp_cmplx80_t kmp_cmplx80;
-
-    // complex16
-    #if KMP_HAVE_QUAD
-    struct __kmp_cmplx128_t : std::complex< _Quad > {
-
-            __kmp_cmplx128_t() : std::complex< _Quad > () {}
-
-            __kmp_cmplx128_t( const std::complex< _Quad >& cq )
-                : std::complex< _Quad > ( cq ) {}
-
-        void operator /= ( const __kmp_cmplx128_t& rhs ) {
-        std::complex< _Quad > lhs = *this;
-        *this = __kmp_lhs_div_rhs( lhs, rhs );
-        }
-
-        __kmp_cmplx128_t operator / ( const __kmp_cmplx128_t& rhs ) {
-        std::complex< _Quad > lhs = *this;
-        return __kmp_lhs_div_rhs( lhs, rhs );
-        }
-
-    };
-    typedef struct __kmp_cmplx128_t kmp_cmplx128;
-    #endif /* KMP_HAVE_QUAD */
-
-    #ifdef _DEBUG_TEMPORARILY_UNSET_
-        #undef _DEBUG_TEMPORARILY_UNSET_
-        // Set it back now
-        #define _DEBUG 1
-    #endif
+  __kmp_cmplx128_t operator/(const __kmp_cmplx128_t &rhs) {
+    std::complex<_Quad> lhs = *this;
+    return __kmp_lhs_div_rhs(lhs, rhs);
+  }
+};
+typedef struct __kmp_cmplx128_t kmp_cmplx128;
+#endif /* KMP_HAVE_QUAD */
+
+#ifdef _DEBUG_TEMPORARILY_UNSET_
+#undef _DEBUG_TEMPORARILY_UNSET_
+// Set it back now
+#define _DEBUG 1
+#endif
 
 #else
-    // create shortcuts for c99 complex types
-    typedef float _Complex       kmp_cmplx32;
-    typedef double _Complex      kmp_cmplx64;
-    typedef long double _Complex kmp_cmplx80;
-    #if KMP_HAVE_QUAD
-    typedef _Quad _Complex       kmp_cmplx128;
-    #endif
+// create shortcuts for c99 complex types
+typedef float _Complex kmp_cmplx32;
+typedef double _Complex kmp_cmplx64;
+typedef long double _Complex kmp_cmplx80;
+#if KMP_HAVE_QUAD
+typedef _Quad _Complex kmp_cmplx128;
+#endif
 #endif
 
 // Compiler 12.0 changed alignment of 16 and 32-byte arguments (like _Quad
@@ -222,377 +215,477 @@
 // introduce the new alignment in 12.0. See CQ88405.
 #if KMP_ARCH_X86 && KMP_HAVE_QUAD
 
-    // 4-byte aligned structures for backward compatibility.
+// 4-byte aligned structures for backward compatibility.
 
-    #pragma pack( push, 4 )
+#pragma pack(push, 4)
 
+struct KMP_DO_ALIGN(4) Quad_a4_t {
+  _Quad q;
 
-    struct KMP_DO_ALIGN( 4 ) Quad_a4_t {
-        _Quad q;
+  Quad_a4_t() : q() {}
+  Quad_a4_t(const _Quad &cq) : q(cq) {}
 
-        Quad_a4_t(  ) : q(  ) {}
-        Quad_a4_t( const _Quad & cq ) : q ( cq ) {}
-
-        Quad_a4_t operator + ( const Quad_a4_t& b ) {
-        _Quad lhs = (*this).q;
-        _Quad rhs = b.q;
-        return (Quad_a4_t)( lhs + rhs );
-    }
-
-    Quad_a4_t operator - ( const Quad_a4_t& b ) {
-        _Quad lhs = (*this).q;
-        _Quad rhs = b.q;
-        return (Quad_a4_t)( lhs - rhs );
-    }
-    Quad_a4_t operator * ( const Quad_a4_t& b ) {
-        _Quad lhs = (*this).q;
-        _Quad rhs = b.q;
-        return (Quad_a4_t)( lhs * rhs );
-    }
-
-    Quad_a4_t operator / ( const Quad_a4_t& b ) {
-        _Quad lhs = (*this).q;
-            _Quad rhs = b.q;
-        return (Quad_a4_t)( lhs / rhs );
-    }
-
-    };
-
-    struct KMP_DO_ALIGN( 4 ) kmp_cmplx128_a4_t {
-        kmp_cmplx128 q;
-
-    kmp_cmplx128_a4_t() : q () {}
-
-    kmp_cmplx128_a4_t( const kmp_cmplx128 & c128 ) : q ( c128 ) {}
-
-        kmp_cmplx128_a4_t operator + ( const kmp_cmplx128_a4_t& b ) {
-        kmp_cmplx128 lhs = (*this).q;
-        kmp_cmplx128 rhs = b.q;
-        return (kmp_cmplx128_a4_t)( lhs + rhs );
-    }
-        kmp_cmplx128_a4_t operator - ( const kmp_cmplx128_a4_t& b ) {
-        kmp_cmplx128 lhs = (*this).q;
-        kmp_cmplx128 rhs = b.q;
-        return (kmp_cmplx128_a4_t)( lhs - rhs );
-    }
-    kmp_cmplx128_a4_t operator * ( const kmp_cmplx128_a4_t& b ) {
-        kmp_cmplx128 lhs = (*this).q;
-        kmp_cmplx128 rhs = b.q;
-        return (kmp_cmplx128_a4_t)( lhs * rhs );
-    }
-
-    kmp_cmplx128_a4_t operator / ( const kmp_cmplx128_a4_t& b ) {
-        kmp_cmplx128 lhs = (*this).q;
-        kmp_cmplx128 rhs = b.q;
-        return (kmp_cmplx128_a4_t)( lhs / rhs );
-    }
-
-    };
-
-    #pragma pack( pop )
-
-    // New 16-byte aligned structures for 12.0 compiler.
-    struct KMP_DO_ALIGN( 16 ) Quad_a16_t {
-        _Quad q;
-
-        Quad_a16_t(  ) : q(  ) {}
-        Quad_a16_t( const _Quad & cq ) : q ( cq ) {}
-
-        Quad_a16_t operator + ( const Quad_a16_t& b ) {
-        _Quad lhs = (*this).q;
-        _Quad rhs = b.q;
-        return (Quad_a16_t)( lhs + rhs );
-    }
-
-    Quad_a16_t operator - ( const Quad_a16_t& b ) {
-        _Quad lhs = (*this).q;
-        _Quad rhs = b.q;
-        return (Quad_a16_t)( lhs - rhs );
-    }
-    Quad_a16_t operator * ( const Quad_a16_t& b ) {
-        _Quad lhs = (*this).q;
-        _Quad rhs = b.q;
-        return (Quad_a16_t)( lhs * rhs );
-    }
-
-    Quad_a16_t operator / ( const Quad_a16_t& b ) {
-        _Quad lhs = (*this).q;
-            _Quad rhs = b.q;
-        return (Quad_a16_t)( lhs / rhs );
-    }
-    };
-
-    struct KMP_DO_ALIGN( 16 ) kmp_cmplx128_a16_t {
-        kmp_cmplx128 q;
-
-    kmp_cmplx128_a16_t() : q () {}
-
-    kmp_cmplx128_a16_t( const kmp_cmplx128 & c128 ) : q ( c128 ) {}
-
-       kmp_cmplx128_a16_t operator + ( const kmp_cmplx128_a16_t& b ) {
-        kmp_cmplx128 lhs = (*this).q;
-        kmp_cmplx128 rhs = b.q;
-        return (kmp_cmplx128_a16_t)( lhs + rhs );
-    }
-       kmp_cmplx128_a16_t operator - ( const kmp_cmplx128_a16_t& b ) {
-        kmp_cmplx128 lhs = (*this).q;
-        kmp_cmplx128 rhs = b.q;
-        return (kmp_cmplx128_a16_t)( lhs - rhs );
-    }
-    kmp_cmplx128_a16_t operator * ( const kmp_cmplx128_a16_t& b ) {
-        kmp_cmplx128 lhs = (*this).q;
-        kmp_cmplx128 rhs = b.q;
-        return (kmp_cmplx128_a16_t)( lhs * rhs );
-    }
-
-    kmp_cmplx128_a16_t operator / ( const kmp_cmplx128_a16_t& b ) {
-        kmp_cmplx128 lhs = (*this).q;
-        kmp_cmplx128 rhs = b.q;
-        return (kmp_cmplx128_a16_t)( lhs / rhs );
-    }
-    };
-
-#endif
-
-#if ( KMP_ARCH_X86 )
-    #define QUAD_LEGACY Quad_a4_t
-    #define CPLX128_LEG kmp_cmplx128_a4_t
+  Quad_a4_t operator+(const Quad_a4_t &b) {
+    _Quad lhs = (*this).q;
+    _Quad rhs = b.q;
+    return (Quad_a4_t)(lhs + rhs);
+  }
+
+  Quad_a4_t operator-(const Quad_a4_t &b) {
+    _Quad lhs = (*this).q;
+    _Quad rhs = b.q;
+    return (Quad_a4_t)(lhs - rhs);
+  }
+  Quad_a4_t operator*(const Quad_a4_t &b) {
+    _Quad lhs = (*this).q;
+    _Quad rhs = b.q;
+    return (Quad_a4_t)(lhs * rhs);
+  }
+
+  Quad_a4_t operator/(const Quad_a4_t &b) {
+    _Quad lhs = (*this).q;
+    _Quad rhs = b.q;
+    return (Quad_a4_t)(lhs / rhs);
+  }
+};
+
+struct KMP_DO_ALIGN(4) kmp_cmplx128_a4_t {
+  kmp_cmplx128 q;
+
+  kmp_cmplx128_a4_t() : q() {}
+
+  kmp_cmplx128_a4_t(const kmp_cmplx128 &c128) : q(c128) {}
+
+  kmp_cmplx128_a4_t operator+(const kmp_cmplx128_a4_t &b) {
+    kmp_cmplx128 lhs = (*this).q;
+    kmp_cmplx128 rhs = b.q;
+    return (kmp_cmplx128_a4_t)(lhs + rhs);
+  }
+  kmp_cmplx128_a4_t operator-(const kmp_cmplx128_a4_t &b) {
+    kmp_cmplx128 lhs = (*this).q;
+    kmp_cmplx128 rhs = b.q;
+    return (kmp_cmplx128_a4_t)(lhs - rhs);
+  }
+  kmp_cmplx128_a4_t operator*(const kmp_cmplx128_a4_t &b) {
+    kmp_cmplx128 lhs = (*this).q;
+    kmp_cmplx128 rhs = b.q;
+    return (kmp_cmplx128_a4_t)(lhs * rhs);
+  }
+
+  kmp_cmplx128_a4_t operator/(const kmp_cmplx128_a4_t &b) {
+    kmp_cmplx128 lhs = (*this).q;
+    kmp_cmplx128 rhs = b.q;
+    return (kmp_cmplx128_a4_t)(lhs / rhs);
+  }
+};
+
+#pragma pack(pop)
+
+// New 16-byte aligned structures for 12.0 compiler.
+struct KMP_DO_ALIGN(16) Quad_a16_t {
+  _Quad q;
+
+  Quad_a16_t() : q() {}
+  Quad_a16_t(const _Quad &cq) : q(cq) {}
+
+  Quad_a16_t operator+(const Quad_a16_t &b) {
+    _Quad lhs = (*this).q;
+    _Quad rhs = b.q;
+    return (Quad_a16_t)(lhs + rhs);
+  }
+
+  Quad_a16_t operator-(const Quad_a16_t &b) {
+    _Quad lhs = (*this).q;
+    _Quad rhs = b.q;
+    return (Quad_a16_t)(lhs - rhs);
+  }
+  Quad_a16_t operator*(const Quad_a16_t &b) {
+    _Quad lhs = (*this).q;
+    _Quad rhs = b.q;
+    return (Quad_a16_t)(lhs * rhs);
+  }
+
+  Quad_a16_t operator/(const Quad_a16_t &b) {
+    _Quad lhs = (*this).q;
+    _Quad rhs = b.q;
+    return (Quad_a16_t)(lhs / rhs);
+  }
+};
+
+struct KMP_DO_ALIGN(16) kmp_cmplx128_a16_t {
+  kmp_cmplx128 q;
+
+  kmp_cmplx128_a16_t() : q() {}
+
+  kmp_cmplx128_a16_t(const kmp_cmplx128 &c128) : q(c128) {}
+
+  kmp_cmplx128_a16_t operator+(const kmp_cmplx128_a16_t &b) {
+    kmp_cmplx128 lhs = (*this).q;
+    kmp_cmplx128 rhs = b.q;
+    return (kmp_cmplx128_a16_t)(lhs + rhs);
+  }
+  kmp_cmplx128_a16_t operator-(const kmp_cmplx128_a16_t &b) {
+    kmp_cmplx128 lhs = (*this).q;
+    kmp_cmplx128 rhs = b.q;
+    return (kmp_cmplx128_a16_t)(lhs - rhs);
+  }
+  kmp_cmplx128_a16_t operator*(const kmp_cmplx128_a16_t &b) {
+    kmp_cmplx128 lhs = (*this).q;
+    kmp_cmplx128 rhs = b.q;
+    return (kmp_cmplx128_a16_t)(lhs * rhs);
+  }
+
+  kmp_cmplx128_a16_t operator/(const kmp_cmplx128_a16_t &b) {
+    kmp_cmplx128 lhs = (*this).q;
+    kmp_cmplx128 rhs = b.q;
+    return (kmp_cmplx128_a16_t)(lhs / rhs);
+  }
+};
+
+#endif
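The Quad_a4_t / kmp_cmplx128_a4_t and *_a16_t wrappers above exist only to pin a specific alignment onto the payload for ABI-compatibility reasons. A portable sketch of the same idea with alignas, using double as a stand-in for _Quad (an Intel compiler extension); the type below is hypothetical:
@code
struct alignas(16) quad_a16_like {
  double q; // stand-in for the 128-bit _Quad payload

  quad_a16_like() : q() {}
  quad_a16_like(const double &v) : q(v) {}

  quad_a16_like operator+(const quad_a16_like &b) const {
    return quad_a16_like(q + b.q);
  }
};

static_assert(alignof(quad_a16_like) == 16,
              "the wrapper fixes the alignment independently of the payload");
@endcode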
+
+#if (KMP_ARCH_X86)
+#define QUAD_LEGACY Quad_a4_t
+#define CPLX128_LEG kmp_cmplx128_a4_t
 #else
-    #define QUAD_LEGACY _Quad
-    #define CPLX128_LEG kmp_cmplx128
+#define QUAD_LEGACY _Quad
+#define CPLX128_LEG kmp_cmplx128
 #endif
 
 #ifdef __cplusplus
-    extern "C" {
+extern "C" {
 #endif
 
 extern int __kmp_atomic_mode;
 
-//
 // Atomic locks can easily become contended, so we use queuing locks for them.
-//
-
 typedef kmp_queuing_lock_t kmp_atomic_lock_t;
 
-static inline void
-__kmp_acquire_atomic_lock( kmp_atomic_lock_t *lck, kmp_int32 gtid )
-{
+static inline void __kmp_acquire_atomic_lock(kmp_atomic_lock_t *lck,
+                                             kmp_int32 gtid) {
 #if OMPT_SUPPORT && OMPT_TRACE
-    if (ompt_enabled &&
-        ompt_callbacks.ompt_callback(ompt_event_wait_atomic)) {
-        ompt_callbacks.ompt_callback(ompt_event_wait_atomic)(
-            (ompt_wait_id_t) lck);
-    }
+  if (ompt_enabled && ompt_callbacks.ompt_callback(ompt_event_wait_atomic)) {
+    ompt_callbacks.ompt_callback(ompt_event_wait_atomic)((ompt_wait_id_t)lck);
+  }
 #endif
 
-    __kmp_acquire_queuing_lock( lck, gtid );
+  __kmp_acquire_queuing_lock(lck, gtid);
 
 #if OMPT_SUPPORT && OMPT_TRACE
-    if (ompt_enabled &&
-        ompt_callbacks.ompt_callback(ompt_event_acquired_atomic)) {
-        ompt_callbacks.ompt_callback(ompt_event_acquired_atomic)(
-            (ompt_wait_id_t) lck);
-    }
+  if (ompt_enabled &&
+      ompt_callbacks.ompt_callback(ompt_event_acquired_atomic)) {
+    ompt_callbacks.ompt_callback(ompt_event_acquired_atomic)(
+        (ompt_wait_id_t)lck);
+  }
 #endif
 }
 
-static inline int
-__kmp_test_atomic_lock( kmp_atomic_lock_t *lck, kmp_int32 gtid )
-{
-    return __kmp_test_queuing_lock( lck, gtid );
+static inline int __kmp_test_atomic_lock(kmp_atomic_lock_t *lck,
+                                         kmp_int32 gtid) {
+  return __kmp_test_queuing_lock(lck, gtid);
 }
 
-static inline void
-__kmp_release_atomic_lock( kmp_atomic_lock_t *lck, kmp_int32 gtid )
-{
-    __kmp_release_queuing_lock( lck, gtid );
+static inline void __kmp_release_atomic_lock(kmp_atomic_lock_t *lck,
+                                             kmp_int32 gtid) {
+  __kmp_release_queuing_lock(lck, gtid);
 #if OMPT_SUPPORT && OMPT_BLAME
-    if (ompt_enabled &&
-        ompt_callbacks.ompt_callback(ompt_event_release_atomic)) {
-        ompt_callbacks.ompt_callback(ompt_event_release_atomic)(
-            (ompt_wait_id_t) lck);
+  if (ompt_enabled && ompt_callbacks.ompt_callback(ompt_event_release_atomic)) {
+    ompt_callbacks.ompt_callback(ompt_event_release_atomic)(
+        (ompt_wait_id_t)lck);
   }
 #endif
 }
 
-static inline void
-__kmp_init_atomic_lock( kmp_atomic_lock_t *lck )
-{
-    __kmp_init_queuing_lock( lck );
+static inline void __kmp_init_atomic_lock(kmp_atomic_lock_t *lck) {
+  __kmp_init_queuing_lock(lck);
 }
 
-static inline void
-__kmp_destroy_atomic_lock( kmp_atomic_lock_t *lck )
-{
-    __kmp_destroy_queuing_lock( lck );
+static inline void __kmp_destroy_atomic_lock(kmp_atomic_lock_t *lck) {
+  __kmp_destroy_queuing_lock(lck);
 }
 
 // Global Locks
+extern kmp_atomic_lock_t __kmp_atomic_lock; /* Control access to all user coded
+                                               atomics in Gnu compat mode   */
+extern kmp_atomic_lock_t __kmp_atomic_lock_1i; /* Control access to all user
+                                                  coded atomics for 1-byte fixed
+                                                  data types */
+extern kmp_atomic_lock_t __kmp_atomic_lock_2i; /* Control access to all user
+                                                  coded atomics for 2-byte fixed
+                                                  data types */
+extern kmp_atomic_lock_t __kmp_atomic_lock_4i; /* Control access to all user
+                                                  coded atomics for 4-byte fixed
+                                                  data types */
+extern kmp_atomic_lock_t __kmp_atomic_lock_4r; /* Control access to all user
+                                                  coded atomics for kmp_real32
+                                                  data type    */
+extern kmp_atomic_lock_t __kmp_atomic_lock_8i; /* Control access to all user
+                                                  coded atomics for 8-byte fixed
+                                                  data types */
+extern kmp_atomic_lock_t __kmp_atomic_lock_8r; /* Control access to all user
+                                                  coded atomics for kmp_real64
+                                                  data type    */
+extern kmp_atomic_lock_t
+    __kmp_atomic_lock_8c; /* Control access to all user coded atomics for
+                             complex byte data type  */
+extern kmp_atomic_lock_t
+    __kmp_atomic_lock_10r; /* Control access to all user coded atomics for long
+                              double data type   */
+extern kmp_atomic_lock_t __kmp_atomic_lock_16r; /* Control access to all user
+                                                   coded atomics for _Quad data
+                                                   type         */
+extern kmp_atomic_lock_t __kmp_atomic_lock_16c; /* Control access to all user
+                                                   coded atomics for double
+                                                   complex data type*/
+extern kmp_atomic_lock_t
+    __kmp_atomic_lock_20c; /* Control access to all user coded atomics for long
+                              double complex type*/
+extern kmp_atomic_lock_t __kmp_atomic_lock_32c; /* Control access to all user
+                                                   coded atomics for _Quad
+                                                   complex data type */
 
-extern kmp_atomic_lock_t __kmp_atomic_lock;    /* Control access to all user coded atomics in Gnu compat mode   */
-extern kmp_atomic_lock_t __kmp_atomic_lock_1i;  /* Control access to all user coded atomics for 1-byte fixed data types */
-extern kmp_atomic_lock_t __kmp_atomic_lock_2i;  /* Control access to all user coded atomics for 2-byte fixed data types */
-extern kmp_atomic_lock_t __kmp_atomic_lock_4i;  /* Control access to all user coded atomics for 4-byte fixed data types */
-extern kmp_atomic_lock_t __kmp_atomic_lock_4r;  /* Control access to all user coded atomics for kmp_real32 data type    */
-extern kmp_atomic_lock_t __kmp_atomic_lock_8i;  /* Control access to all user coded atomics for 8-byte fixed data types */
-extern kmp_atomic_lock_t __kmp_atomic_lock_8r;  /* Control access to all user coded atomics for kmp_real64 data type    */
-extern kmp_atomic_lock_t __kmp_atomic_lock_8c;  /* Control access to all user coded atomics for complex byte data type  */
-extern kmp_atomic_lock_t __kmp_atomic_lock_10r; /* Control access to all user coded atomics for long double data type   */
-extern kmp_atomic_lock_t __kmp_atomic_lock_16r; /* Control access to all user coded atomics for _Quad data type         */
-extern kmp_atomic_lock_t __kmp_atomic_lock_16c; /* Control access to all user coded atomics for double complex data type*/
-extern kmp_atomic_lock_t __kmp_atomic_lock_20c; /* Control access to all user coded atomics for long double complex type*/
-extern kmp_atomic_lock_t __kmp_atomic_lock_32c; /* Control access to all user coded atomics for _Quad complex data type */
-
-//
 //  Below routines for atomic UPDATE are listed
-//
 
 // 1-byte
-void __kmpc_atomic_fixed1_add(  ident_t *id_ref, int gtid, char * lhs, char rhs );
-void __kmpc_atomic_fixed1_andb( ident_t *id_ref, int gtid, char * lhs, char rhs );
-void __kmpc_atomic_fixed1_div(  ident_t *id_ref, int gtid, char * lhs, char rhs );
-void __kmpc_atomic_fixed1u_div( ident_t *id_ref, int gtid, unsigned char * lhs, unsigned char rhs );
-void __kmpc_atomic_fixed1_mul(  ident_t *id_ref, int gtid, char * lhs, char rhs );
-void __kmpc_atomic_fixed1_orb(  ident_t *id_ref, int gtid, char * lhs, char rhs );
-void __kmpc_atomic_fixed1_shl(  ident_t *id_ref, int gtid, char * lhs, char rhs );
-void __kmpc_atomic_fixed1_shr(  ident_t *id_ref, int gtid, char * lhs, char rhs );
-void __kmpc_atomic_fixed1u_shr( ident_t *id_ref, int gtid, unsigned char * lhs, unsigned char rhs );
-void __kmpc_atomic_fixed1_sub(  ident_t *id_ref, int gtid, char * lhs, char rhs );
-void __kmpc_atomic_fixed1_xor(  ident_t *id_ref, int gtid, char * lhs, char rhs );
+void __kmpc_atomic_fixed1_add(ident_t *id_ref, int gtid, char *lhs, char rhs);
+void __kmpc_atomic_fixed1_andb(ident_t *id_ref, int gtid, char *lhs, char rhs);
+void __kmpc_atomic_fixed1_div(ident_t *id_ref, int gtid, char *lhs, char rhs);
+void __kmpc_atomic_fixed1u_div(ident_t *id_ref, int gtid, unsigned char *lhs,
+                               unsigned char rhs);
+void __kmpc_atomic_fixed1_mul(ident_t *id_ref, int gtid, char *lhs, char rhs);
+void __kmpc_atomic_fixed1_orb(ident_t *id_ref, int gtid, char *lhs, char rhs);
+void __kmpc_atomic_fixed1_shl(ident_t *id_ref, int gtid, char *lhs, char rhs);
+void __kmpc_atomic_fixed1_shr(ident_t *id_ref, int gtid, char *lhs, char rhs);
+void __kmpc_atomic_fixed1u_shr(ident_t *id_ref, int gtid, unsigned char *lhs,
+                               unsigned char rhs);
+void __kmpc_atomic_fixed1_sub(ident_t *id_ref, int gtid, char *lhs, char rhs);
+void __kmpc_atomic_fixed1_xor(ident_t *id_ref, int gtid, char *lhs, char rhs);
 // 2-byte
-void __kmpc_atomic_fixed2_add(  ident_t *id_ref, int gtid, short * lhs, short rhs );
-void __kmpc_atomic_fixed2_andb( ident_t *id_ref, int gtid, short * lhs, short rhs );
-void __kmpc_atomic_fixed2_div(  ident_t *id_ref, int gtid, short * lhs, short rhs );
-void __kmpc_atomic_fixed2u_div( ident_t *id_ref, int gtid, unsigned short * lhs, unsigned short rhs );
-void __kmpc_atomic_fixed2_mul(  ident_t *id_ref, int gtid, short * lhs, short rhs );
-void __kmpc_atomic_fixed2_orb(  ident_t *id_ref, int gtid, short * lhs, short rhs );
-void __kmpc_atomic_fixed2_shl(  ident_t *id_ref, int gtid, short * lhs, short rhs );
-void __kmpc_atomic_fixed2_shr(  ident_t *id_ref, int gtid, short * lhs, short rhs );
-void __kmpc_atomic_fixed2u_shr( ident_t *id_ref, int gtid, unsigned short * lhs, unsigned short rhs );
-void __kmpc_atomic_fixed2_sub(  ident_t *id_ref, int gtid, short * lhs, short rhs );
-void __kmpc_atomic_fixed2_xor(  ident_t *id_ref, int gtid, short * lhs, short rhs );
+void __kmpc_atomic_fixed2_add(ident_t *id_ref, int gtid, short *lhs, short rhs);
+void __kmpc_atomic_fixed2_andb(ident_t *id_ref, int gtid, short *lhs,
+                               short rhs);
+void __kmpc_atomic_fixed2_div(ident_t *id_ref, int gtid, short *lhs, short rhs);
+void __kmpc_atomic_fixed2u_div(ident_t *id_ref, int gtid, unsigned short *lhs,
+                               unsigned short rhs);
+void __kmpc_atomic_fixed2_mul(ident_t *id_ref, int gtid, short *lhs, short rhs);
+void __kmpc_atomic_fixed2_orb(ident_t *id_ref, int gtid, short *lhs, short rhs);
+void __kmpc_atomic_fixed2_shl(ident_t *id_ref, int gtid, short *lhs, short rhs);
+void __kmpc_atomic_fixed2_shr(ident_t *id_ref, int gtid, short *lhs, short rhs);
+void __kmpc_atomic_fixed2u_shr(ident_t *id_ref, int gtid, unsigned short *lhs,
+                               unsigned short rhs);
+void __kmpc_atomic_fixed2_sub(ident_t *id_ref, int gtid, short *lhs, short rhs);
+void __kmpc_atomic_fixed2_xor(ident_t *id_ref, int gtid, short *lhs, short rhs);
 // 4-byte add / sub fixed
-void __kmpc_atomic_fixed4_add(  ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs );
-void __kmpc_atomic_fixed4_sub(  ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs );
+void __kmpc_atomic_fixed4_add(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                              kmp_int32 rhs);
+void __kmpc_atomic_fixed4_sub(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                              kmp_int32 rhs);
 // 4-byte add / sub float
-void __kmpc_atomic_float4_add(  ident_t *id_ref, int gtid, kmp_real32 * lhs, kmp_real32 rhs );
-void __kmpc_atomic_float4_sub(  ident_t *id_ref, int gtid, kmp_real32 * lhs, kmp_real32 rhs );
+void __kmpc_atomic_float4_add(ident_t *id_ref, int gtid, kmp_real32 *lhs,
+                              kmp_real32 rhs);
+void __kmpc_atomic_float4_sub(ident_t *id_ref, int gtid, kmp_real32 *lhs,
+                              kmp_real32 rhs);
 // 8-byte add / sub fixed
-void __kmpc_atomic_fixed8_add(  ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs );
-void __kmpc_atomic_fixed8_sub(  ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs );
+void __kmpc_atomic_fixed8_add(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                              kmp_int64 rhs);
+void __kmpc_atomic_fixed8_sub(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                              kmp_int64 rhs);
 // 8-byte add / sub float
-void __kmpc_atomic_float8_add(  ident_t *id_ref, int gtid, kmp_real64 * lhs, kmp_real64 rhs );
-void __kmpc_atomic_float8_sub(  ident_t *id_ref, int gtid, kmp_real64 * lhs, kmp_real64 rhs );
+void __kmpc_atomic_float8_add(ident_t *id_ref, int gtid, kmp_real64 *lhs,
+                              kmp_real64 rhs);
+void __kmpc_atomic_float8_sub(ident_t *id_ref, int gtid, kmp_real64 *lhs,
+                              kmp_real64 rhs);
 // 4-byte fixed
-void __kmpc_atomic_fixed4_andb( ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs );
-void __kmpc_atomic_fixed4_div(  ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs );
-void __kmpc_atomic_fixed4u_div( ident_t *id_ref, int gtid, kmp_uint32 * lhs, kmp_uint32 rhs );
-void __kmpc_atomic_fixed4_mul(  ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs );
-void __kmpc_atomic_fixed4_orb(  ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs );
-void __kmpc_atomic_fixed4_shl(  ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs );
-void __kmpc_atomic_fixed4_shr(  ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs );
-void __kmpc_atomic_fixed4u_shr( ident_t *id_ref, int gtid, kmp_uint32 * lhs, kmp_uint32 rhs );
-void __kmpc_atomic_fixed4_xor(  ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs );
+void __kmpc_atomic_fixed4_andb(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                               kmp_int32 rhs);
+void __kmpc_atomic_fixed4_div(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                              kmp_int32 rhs);
+void __kmpc_atomic_fixed4u_div(ident_t *id_ref, int gtid, kmp_uint32 *lhs,
+                               kmp_uint32 rhs);
+void __kmpc_atomic_fixed4_mul(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                              kmp_int32 rhs);
+void __kmpc_atomic_fixed4_orb(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                              kmp_int32 rhs);
+void __kmpc_atomic_fixed4_shl(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                              kmp_int32 rhs);
+void __kmpc_atomic_fixed4_shr(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                              kmp_int32 rhs);
+void __kmpc_atomic_fixed4u_shr(ident_t *id_ref, int gtid, kmp_uint32 *lhs,
+                               kmp_uint32 rhs);
+void __kmpc_atomic_fixed4_xor(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                              kmp_int32 rhs);
 // 8-byte fixed
-void __kmpc_atomic_fixed8_andb( ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs );
-void __kmpc_atomic_fixed8_div(  ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs );
-void __kmpc_atomic_fixed8u_div( ident_t *id_ref, int gtid, kmp_uint64 * lhs, kmp_uint64 rhs );
-void __kmpc_atomic_fixed8_mul(  ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs );
-void __kmpc_atomic_fixed8_orb(  ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs );
-void __kmpc_atomic_fixed8_shl(  ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs );
-void __kmpc_atomic_fixed8_shr(  ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs );
-void __kmpc_atomic_fixed8u_shr( ident_t *id_ref, int gtid, kmp_uint64 * lhs, kmp_uint64 rhs );
-void __kmpc_atomic_fixed8_xor(  ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs );
+void __kmpc_atomic_fixed8_andb(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                               kmp_int64 rhs);
+void __kmpc_atomic_fixed8_div(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                              kmp_int64 rhs);
+void __kmpc_atomic_fixed8u_div(ident_t *id_ref, int gtid, kmp_uint64 *lhs,
+                               kmp_uint64 rhs);
+void __kmpc_atomic_fixed8_mul(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                              kmp_int64 rhs);
+void __kmpc_atomic_fixed8_orb(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                              kmp_int64 rhs);
+void __kmpc_atomic_fixed8_shl(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                              kmp_int64 rhs);
+void __kmpc_atomic_fixed8_shr(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                              kmp_int64 rhs);
+void __kmpc_atomic_fixed8u_shr(ident_t *id_ref, int gtid, kmp_uint64 *lhs,
+                               kmp_uint64 rhs);
+void __kmpc_atomic_fixed8_xor(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                              kmp_int64 rhs);
 // 4-byte float
-void __kmpc_atomic_float4_div(  ident_t *id_ref, int gtid, kmp_real32 * lhs, kmp_real32 rhs );
-void __kmpc_atomic_float4_mul(  ident_t *id_ref, int gtid, kmp_real32 * lhs, kmp_real32 rhs );
+void __kmpc_atomic_float4_div(ident_t *id_ref, int gtid, kmp_real32 *lhs,
+                              kmp_real32 rhs);
+void __kmpc_atomic_float4_mul(ident_t *id_ref, int gtid, kmp_real32 *lhs,
+                              kmp_real32 rhs);
 // 8-byte float
-void __kmpc_atomic_float8_div(  ident_t *id_ref, int gtid, kmp_real64 * lhs, kmp_real64 rhs );
-void __kmpc_atomic_float8_mul(  ident_t *id_ref, int gtid, kmp_real64 * lhs, kmp_real64 rhs );
+void __kmpc_atomic_float8_div(ident_t *id_ref, int gtid, kmp_real64 *lhs,
+                              kmp_real64 rhs);
+void __kmpc_atomic_float8_mul(ident_t *id_ref, int gtid, kmp_real64 *lhs,
+                              kmp_real64 rhs);
 // 1-, 2-, 4-, 8-byte logical (&&, ||)
-void __kmpc_atomic_fixed1_andl( ident_t *id_ref, int gtid, char * lhs, char rhs );
-void __kmpc_atomic_fixed1_orl(  ident_t *id_ref, int gtid, char * lhs, char rhs );
-void __kmpc_atomic_fixed2_andl( ident_t *id_ref, int gtid, short * lhs, short rhs );
-void __kmpc_atomic_fixed2_orl(  ident_t *id_ref, int gtid, short * lhs, short rhs );
-void __kmpc_atomic_fixed4_andl( ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs );
-void __kmpc_atomic_fixed4_orl(  ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs );
-void __kmpc_atomic_fixed8_andl( ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs );
-void __kmpc_atomic_fixed8_orl(  ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs );
+void __kmpc_atomic_fixed1_andl(ident_t *id_ref, int gtid, char *lhs, char rhs);
+void __kmpc_atomic_fixed1_orl(ident_t *id_ref, int gtid, char *lhs, char rhs);
+void __kmpc_atomic_fixed2_andl(ident_t *id_ref, int gtid, short *lhs,
+                               short rhs);
+void __kmpc_atomic_fixed2_orl(ident_t *id_ref, int gtid, short *lhs, short rhs);
+void __kmpc_atomic_fixed4_andl(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                               kmp_int32 rhs);
+void __kmpc_atomic_fixed4_orl(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                              kmp_int32 rhs);
+void __kmpc_atomic_fixed8_andl(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                               kmp_int64 rhs);
+void __kmpc_atomic_fixed8_orl(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                              kmp_int64 rhs);
 // MIN / MAX
-void __kmpc_atomic_fixed1_max(  ident_t *id_ref, int gtid, char * lhs, char rhs );
-void __kmpc_atomic_fixed1_min(  ident_t *id_ref, int gtid, char * lhs, char rhs );
-void __kmpc_atomic_fixed2_max(  ident_t *id_ref, int gtid, short * lhs, short rhs );
-void __kmpc_atomic_fixed2_min(  ident_t *id_ref, int gtid, short * lhs, short rhs );
-void __kmpc_atomic_fixed4_max(  ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs );
-void __kmpc_atomic_fixed4_min(  ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs );
-void __kmpc_atomic_fixed8_max(  ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs );
-void __kmpc_atomic_fixed8_min(  ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs );
-void __kmpc_atomic_float4_max(  ident_t *id_ref, int gtid, kmp_real32 * lhs, kmp_real32 rhs );
-void __kmpc_atomic_float4_min(  ident_t *id_ref, int gtid, kmp_real32 * lhs, kmp_real32 rhs );
-void __kmpc_atomic_float8_max(  ident_t *id_ref, int gtid, kmp_real64 * lhs, kmp_real64 rhs );
-void __kmpc_atomic_float8_min(  ident_t *id_ref, int gtid, kmp_real64 * lhs, kmp_real64 rhs );
+void __kmpc_atomic_fixed1_max(ident_t *id_ref, int gtid, char *lhs, char rhs);
+void __kmpc_atomic_fixed1_min(ident_t *id_ref, int gtid, char *lhs, char rhs);
+void __kmpc_atomic_fixed2_max(ident_t *id_ref, int gtid, short *lhs, short rhs);
+void __kmpc_atomic_fixed2_min(ident_t *id_ref, int gtid, short *lhs, short rhs);
+void __kmpc_atomic_fixed4_max(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                              kmp_int32 rhs);
+void __kmpc_atomic_fixed4_min(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                              kmp_int32 rhs);
+void __kmpc_atomic_fixed8_max(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                              kmp_int64 rhs);
+void __kmpc_atomic_fixed8_min(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                              kmp_int64 rhs);
+void __kmpc_atomic_float4_max(ident_t *id_ref, int gtid, kmp_real32 *lhs,
+                              kmp_real32 rhs);
+void __kmpc_atomic_float4_min(ident_t *id_ref, int gtid, kmp_real32 *lhs,
+                              kmp_real32 rhs);
+void __kmpc_atomic_float8_max(ident_t *id_ref, int gtid, kmp_real64 *lhs,
+                              kmp_real64 rhs);
+void __kmpc_atomic_float8_min(ident_t *id_ref, int gtid, kmp_real64 *lhs,
+                              kmp_real64 rhs);
 #if KMP_HAVE_QUAD
-void __kmpc_atomic_float16_max( ident_t *id_ref, int gtid, QUAD_LEGACY * lhs, QUAD_LEGACY rhs );
-void __kmpc_atomic_float16_min( ident_t *id_ref, int gtid, QUAD_LEGACY * lhs, QUAD_LEGACY rhs );
-#if ( KMP_ARCH_X86 )
-    // Routines with 16-byte arguments aligned to 16-byte boundary; IA-32 architecture only
-    void __kmpc_atomic_float16_max_a16( ident_t * id_ref, int gtid, Quad_a16_t * lhs, Quad_a16_t rhs );
-    void __kmpc_atomic_float16_min_a16( ident_t * id_ref, int gtid, Quad_a16_t * lhs, Quad_a16_t rhs );
+void __kmpc_atomic_float16_max(ident_t *id_ref, int gtid, QUAD_LEGACY *lhs,
+                               QUAD_LEGACY rhs);
+void __kmpc_atomic_float16_min(ident_t *id_ref, int gtid, QUAD_LEGACY *lhs,
+                               QUAD_LEGACY rhs);
+#if (KMP_ARCH_X86)
+// Routines with 16-byte arguments aligned to 16-byte boundary; IA-32
+// architecture only
+void __kmpc_atomic_float16_max_a16(ident_t *id_ref, int gtid, Quad_a16_t *lhs,
+                                   Quad_a16_t rhs);
+void __kmpc_atomic_float16_min_a16(ident_t *id_ref, int gtid, Quad_a16_t *lhs,
+                                   Quad_a16_t rhs);
 #endif
 #endif
 // .NEQV. (same as xor)
-void __kmpc_atomic_fixed1_neqv( ident_t *id_ref, int gtid, char * lhs, char rhs );
-void __kmpc_atomic_fixed2_neqv( ident_t *id_ref, int gtid, short * lhs, short rhs );
-void __kmpc_atomic_fixed4_neqv( ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs );
-void __kmpc_atomic_fixed8_neqv( ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs );
+void __kmpc_atomic_fixed1_neqv(ident_t *id_ref, int gtid, char *lhs, char rhs);
+void __kmpc_atomic_fixed2_neqv(ident_t *id_ref, int gtid, short *lhs,
+                               short rhs);
+void __kmpc_atomic_fixed4_neqv(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                               kmp_int32 rhs);
+void __kmpc_atomic_fixed8_neqv(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                               kmp_int64 rhs);
 // .EQV. (same as ~xor)
-void __kmpc_atomic_fixed1_eqv(  ident_t *id_ref, int gtid, char * lhs, char rhs );
-void __kmpc_atomic_fixed2_eqv(  ident_t *id_ref, int gtid, short * lhs, short rhs );
-void __kmpc_atomic_fixed4_eqv(  ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs );
-void __kmpc_atomic_fixed8_eqv(  ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs );
+void __kmpc_atomic_fixed1_eqv(ident_t *id_ref, int gtid, char *lhs, char rhs);
+void __kmpc_atomic_fixed2_eqv(ident_t *id_ref, int gtid, short *lhs, short rhs);
+void __kmpc_atomic_fixed4_eqv(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                              kmp_int32 rhs);
+void __kmpc_atomic_fixed8_eqv(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                              kmp_int64 rhs);
 // long double type
-void __kmpc_atomic_float10_add( ident_t *id_ref, int gtid, long double * lhs, long double rhs );
-void __kmpc_atomic_float10_sub( ident_t *id_ref, int gtid, long double * lhs, long double rhs );
-void __kmpc_atomic_float10_mul( ident_t *id_ref, int gtid, long double * lhs, long double rhs );
-void __kmpc_atomic_float10_div( ident_t *id_ref, int gtid, long double * lhs, long double rhs );
+void __kmpc_atomic_float10_add(ident_t *id_ref, int gtid, long double *lhs,
+                               long double rhs);
+void __kmpc_atomic_float10_sub(ident_t *id_ref, int gtid, long double *lhs,
+                               long double rhs);
+void __kmpc_atomic_float10_mul(ident_t *id_ref, int gtid, long double *lhs,
+                               long double rhs);
+void __kmpc_atomic_float10_div(ident_t *id_ref, int gtid, long double *lhs,
+                               long double rhs);
 // _Quad type
 #if KMP_HAVE_QUAD
-void __kmpc_atomic_float16_add( ident_t *id_ref, int gtid, QUAD_LEGACY * lhs, QUAD_LEGACY rhs );
-void __kmpc_atomic_float16_sub( ident_t *id_ref, int gtid, QUAD_LEGACY * lhs, QUAD_LEGACY rhs );
-void __kmpc_atomic_float16_mul( ident_t *id_ref, int gtid, QUAD_LEGACY * lhs, QUAD_LEGACY rhs );
-void __kmpc_atomic_float16_div( ident_t *id_ref, int gtid, QUAD_LEGACY * lhs, QUAD_LEGACY rhs );
-#if ( KMP_ARCH_X86 )
-    // Routines with 16-byte arguments aligned to 16-byte boundary
-    void __kmpc_atomic_float16_add_a16( ident_t * id_ref, int gtid, Quad_a16_t * lhs, Quad_a16_t rhs );
-    void __kmpc_atomic_float16_sub_a16( ident_t * id_ref, int gtid, Quad_a16_t * lhs, Quad_a16_t rhs );
-    void __kmpc_atomic_float16_mul_a16( ident_t * id_ref, int gtid, Quad_a16_t * lhs, Quad_a16_t rhs );
-    void __kmpc_atomic_float16_div_a16( ident_t * id_ref, int gtid, Quad_a16_t * lhs, Quad_a16_t rhs );
+void __kmpc_atomic_float16_add(ident_t *id_ref, int gtid, QUAD_LEGACY *lhs,
+                               QUAD_LEGACY rhs);
+void __kmpc_atomic_float16_sub(ident_t *id_ref, int gtid, QUAD_LEGACY *lhs,
+                               QUAD_LEGACY rhs);
+void __kmpc_atomic_float16_mul(ident_t *id_ref, int gtid, QUAD_LEGACY *lhs,
+                               QUAD_LEGACY rhs);
+void __kmpc_atomic_float16_div(ident_t *id_ref, int gtid, QUAD_LEGACY *lhs,
+                               QUAD_LEGACY rhs);
+#if (KMP_ARCH_X86)
+// Routines with 16-byte arguments aligned to 16-byte boundary
+void __kmpc_atomic_float16_add_a16(ident_t *id_ref, int gtid, Quad_a16_t *lhs,
+                                   Quad_a16_t rhs);
+void __kmpc_atomic_float16_sub_a16(ident_t *id_ref, int gtid, Quad_a16_t *lhs,
+                                   Quad_a16_t rhs);
+void __kmpc_atomic_float16_mul_a16(ident_t *id_ref, int gtid, Quad_a16_t *lhs,
+                                   Quad_a16_t rhs);
+void __kmpc_atomic_float16_div_a16(ident_t *id_ref, int gtid, Quad_a16_t *lhs,
+                                   Quad_a16_t rhs);
 #endif
 #endif
 // routines for complex types
-void __kmpc_atomic_cmplx4_add(  ident_t *id_ref, int gtid, kmp_cmplx32 * lhs, kmp_cmplx32 rhs );
-void __kmpc_atomic_cmplx4_sub(  ident_t *id_ref, int gtid, kmp_cmplx32 * lhs, kmp_cmplx32 rhs );
-void __kmpc_atomic_cmplx4_mul(  ident_t *id_ref, int gtid, kmp_cmplx32 * lhs, kmp_cmplx32 rhs );
-void __kmpc_atomic_cmplx4_div(  ident_t *id_ref, int gtid, kmp_cmplx32 * lhs, kmp_cmplx32 rhs );
-void __kmpc_atomic_cmplx8_add(  ident_t *id_ref, int gtid, kmp_cmplx64 * lhs, kmp_cmplx64 rhs );
-void __kmpc_atomic_cmplx8_sub(  ident_t *id_ref, int gtid, kmp_cmplx64 * lhs, kmp_cmplx64 rhs );
-void __kmpc_atomic_cmplx8_mul(  ident_t *id_ref, int gtid, kmp_cmplx64 * lhs, kmp_cmplx64 rhs );
-void __kmpc_atomic_cmplx8_div(  ident_t *id_ref, int gtid, kmp_cmplx64 * lhs, kmp_cmplx64 rhs );
-void __kmpc_atomic_cmplx10_add( ident_t *id_ref, int gtid, kmp_cmplx80 * lhs, kmp_cmplx80 rhs );
-void __kmpc_atomic_cmplx10_sub( ident_t *id_ref, int gtid, kmp_cmplx80 * lhs, kmp_cmplx80 rhs );
-void __kmpc_atomic_cmplx10_mul( ident_t *id_ref, int gtid, kmp_cmplx80 * lhs, kmp_cmplx80 rhs );
-void __kmpc_atomic_cmplx10_div( ident_t *id_ref, int gtid, kmp_cmplx80 * lhs, kmp_cmplx80 rhs );
+void __kmpc_atomic_cmplx4_add(ident_t *id_ref, int gtid, kmp_cmplx32 *lhs,
+                              kmp_cmplx32 rhs);
+void __kmpc_atomic_cmplx4_sub(ident_t *id_ref, int gtid, kmp_cmplx32 *lhs,
+                              kmp_cmplx32 rhs);
+void __kmpc_atomic_cmplx4_mul(ident_t *id_ref, int gtid, kmp_cmplx32 *lhs,
+                              kmp_cmplx32 rhs);
+void __kmpc_atomic_cmplx4_div(ident_t *id_ref, int gtid, kmp_cmplx32 *lhs,
+                              kmp_cmplx32 rhs);
+void __kmpc_atomic_cmplx8_add(ident_t *id_ref, int gtid, kmp_cmplx64 *lhs,
+                              kmp_cmplx64 rhs);
+void __kmpc_atomic_cmplx8_sub(ident_t *id_ref, int gtid, kmp_cmplx64 *lhs,
+                              kmp_cmplx64 rhs);
+void __kmpc_atomic_cmplx8_mul(ident_t *id_ref, int gtid, kmp_cmplx64 *lhs,
+                              kmp_cmplx64 rhs);
+void __kmpc_atomic_cmplx8_div(ident_t *id_ref, int gtid, kmp_cmplx64 *lhs,
+                              kmp_cmplx64 rhs);
+void __kmpc_atomic_cmplx10_add(ident_t *id_ref, int gtid, kmp_cmplx80 *lhs,
+                               kmp_cmplx80 rhs);
+void __kmpc_atomic_cmplx10_sub(ident_t *id_ref, int gtid, kmp_cmplx80 *lhs,
+                               kmp_cmplx80 rhs);
+void __kmpc_atomic_cmplx10_mul(ident_t *id_ref, int gtid, kmp_cmplx80 *lhs,
+                               kmp_cmplx80 rhs);
+void __kmpc_atomic_cmplx10_div(ident_t *id_ref, int gtid, kmp_cmplx80 *lhs,
+                               kmp_cmplx80 rhs);
 #if KMP_HAVE_QUAD
-void __kmpc_atomic_cmplx16_add( ident_t *id_ref, int gtid, CPLX128_LEG * lhs, CPLX128_LEG rhs );
-void __kmpc_atomic_cmplx16_sub( ident_t *id_ref, int gtid, CPLX128_LEG * lhs, CPLX128_LEG rhs );
-void __kmpc_atomic_cmplx16_mul( ident_t *id_ref, int gtid, CPLX128_LEG * lhs, CPLX128_LEG rhs );
-void __kmpc_atomic_cmplx16_div( ident_t *id_ref, int gtid, CPLX128_LEG * lhs, CPLX128_LEG rhs );
-#if ( KMP_ARCH_X86 )
-    // Routines with 16-byte arguments aligned to 16-byte boundary
-    void __kmpc_atomic_cmplx16_add_a16( ident_t * id_ref, int gtid, kmp_cmplx128_a16_t * lhs, kmp_cmplx128_a16_t rhs );
-    void __kmpc_atomic_cmplx16_sub_a16( ident_t * id_ref, int gtid, kmp_cmplx128_a16_t * lhs, kmp_cmplx128_a16_t rhs );
-    void __kmpc_atomic_cmplx16_mul_a16( ident_t * id_ref, int gtid, kmp_cmplx128_a16_t * lhs, kmp_cmplx128_a16_t rhs );
-    void __kmpc_atomic_cmplx16_div_a16( ident_t * id_ref, int gtid, kmp_cmplx128_a16_t * lhs, kmp_cmplx128_a16_t rhs );
+void __kmpc_atomic_cmplx16_add(ident_t *id_ref, int gtid, CPLX128_LEG *lhs,
+                               CPLX128_LEG rhs);
+void __kmpc_atomic_cmplx16_sub(ident_t *id_ref, int gtid, CPLX128_LEG *lhs,
+                               CPLX128_LEG rhs);
+void __kmpc_atomic_cmplx16_mul(ident_t *id_ref, int gtid, CPLX128_LEG *lhs,
+                               CPLX128_LEG rhs);
+void __kmpc_atomic_cmplx16_div(ident_t *id_ref, int gtid, CPLX128_LEG *lhs,
+                               CPLX128_LEG rhs);
+#if (KMP_ARCH_X86)
+// Routines with 16-byte arguments aligned to 16-byte boundary
+void __kmpc_atomic_cmplx16_add_a16(ident_t *id_ref, int gtid,
+                                   kmp_cmplx128_a16_t *lhs,
+                                   kmp_cmplx128_a16_t rhs);
+void __kmpc_atomic_cmplx16_sub_a16(ident_t *id_ref, int gtid,
+                                   kmp_cmplx128_a16_t *lhs,
+                                   kmp_cmplx128_a16_t rhs);
+void __kmpc_atomic_cmplx16_mul_a16(ident_t *id_ref, int gtid,
+                                   kmp_cmplx128_a16_t *lhs,
+                                   kmp_cmplx128_a16_t rhs);
+void __kmpc_atomic_cmplx16_div_a16(ident_t *id_ref, int gtid,
+                                   kmp_cmplx128_a16_t *lhs,
+                                   kmp_cmplx128_a16_t rhs);
 #endif
 #endif
 
@@ -602,381 +695,710 @@ void __kmpc_atomic_cmplx16_div( ident_t
 // Supported only on IA-32 architecture and Intel(R) 64
 #if KMP_ARCH_X86 || KMP_ARCH_X86_64
 
-void __kmpc_atomic_fixed1_sub_rev(  ident_t *id_ref, int gtid, char * lhs, char rhs );
-void __kmpc_atomic_fixed1_div_rev(  ident_t *id_ref, int gtid, char * lhs, char rhs );
-void __kmpc_atomic_fixed1u_div_rev( ident_t *id_ref, int gtid, unsigned char * lhs, unsigned char rhs );
-void __kmpc_atomic_fixed1_shl_rev(  ident_t *id_ref, int gtid, char * lhs, char rhs );
-void __kmpc_atomic_fixed1_shr_rev(  ident_t *id_ref, int gtid, char * lhs, char rhs );
-void __kmpc_atomic_fixed1u_shr_rev( ident_t *id_ref, int gtid, unsigned char * lhs, unsigned char rhs );
-void __kmpc_atomic_fixed2_sub_rev(  ident_t *id_ref, int gtid, short * lhs, short rhs );
-void __kmpc_atomic_fixed2_div_rev(  ident_t *id_ref, int gtid, short * lhs, short rhs );
-void __kmpc_atomic_fixed2u_div_rev( ident_t *id_ref, int gtid, unsigned short * lhs, unsigned short rhs );
-void __kmpc_atomic_fixed2_shl_rev(  ident_t *id_ref, int gtid, short * lhs, short rhs );
-void __kmpc_atomic_fixed2_shr_rev(  ident_t *id_ref, int gtid, short * lhs, short rhs );
-void __kmpc_atomic_fixed2u_shr_rev( ident_t *id_ref, int gtid, unsigned short * lhs, unsigned short rhs );
-void __kmpc_atomic_fixed4_sub_rev(  ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs );
-void __kmpc_atomic_fixed4_div_rev(  ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs );
-void __kmpc_atomic_fixed4u_div_rev( ident_t *id_ref, int gtid, kmp_uint32 * lhs, kmp_uint32 rhs );
-void __kmpc_atomic_fixed4_shl_rev(  ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs );
-void __kmpc_atomic_fixed4_shr_rev(  ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs );
-void __kmpc_atomic_fixed4u_shr_rev( ident_t *id_ref, int gtid, kmp_uint32 * lhs, kmp_uint32 rhs );
-void __kmpc_atomic_fixed8_sub_rev(  ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs );
-void __kmpc_atomic_fixed8_div_rev(  ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs );
-void __kmpc_atomic_fixed8u_div_rev( ident_t *id_ref, int gtid, kmp_uint64 * lhs, kmp_uint64 rhs );
-void __kmpc_atomic_fixed8_shl_rev(  ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs );
-void __kmpc_atomic_fixed8_shr_rev(  ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs );
-void __kmpc_atomic_fixed8u_shr_rev( ident_t *id_ref, int gtid, kmp_uint64 * lhs, kmp_uint64 rhs );
-void __kmpc_atomic_float4_sub_rev(  ident_t *id_ref, int gtid, float * lhs, float rhs );
-void __kmpc_atomic_float4_div_rev(  ident_t *id_ref, int gtid, float * lhs, float rhs );
-void __kmpc_atomic_float8_sub_rev(  ident_t *id_ref, int gtid, double * lhs, double rhs );
-void __kmpc_atomic_float8_div_rev(  ident_t *id_ref, int gtid, double * lhs, double rhs );
-void __kmpc_atomic_float10_sub_rev( ident_t *id_ref, int gtid, long double * lhs, long double rhs );
-void __kmpc_atomic_float10_div_rev( ident_t *id_ref, int gtid, long double * lhs, long double rhs );
+void __kmpc_atomic_fixed1_sub_rev(ident_t *id_ref, int gtid, char *lhs,
+                                  char rhs);
+void __kmpc_atomic_fixed1_div_rev(ident_t *id_ref, int gtid, char *lhs,
+                                  char rhs);
+void __kmpc_atomic_fixed1u_div_rev(ident_t *id_ref, int gtid,
+                                   unsigned char *lhs, unsigned char rhs);
+void __kmpc_atomic_fixed1_shl_rev(ident_t *id_ref, int gtid, char *lhs,
+                                  char rhs);
+void __kmpc_atomic_fixed1_shr_rev(ident_t *id_ref, int gtid, char *lhs,
+                                  char rhs);
+void __kmpc_atomic_fixed1u_shr_rev(ident_t *id_ref, int gtid,
+                                   unsigned char *lhs, unsigned char rhs);
+void __kmpc_atomic_fixed2_sub_rev(ident_t *id_ref, int gtid, short *lhs,
+                                  short rhs);
+void __kmpc_atomic_fixed2_div_rev(ident_t *id_ref, int gtid, short *lhs,
+                                  short rhs);
+void __kmpc_atomic_fixed2u_div_rev(ident_t *id_ref, int gtid,
+                                   unsigned short *lhs, unsigned short rhs);
+void __kmpc_atomic_fixed2_shl_rev(ident_t *id_ref, int gtid, short *lhs,
+                                  short rhs);
+void __kmpc_atomic_fixed2_shr_rev(ident_t *id_ref, int gtid, short *lhs,
+                                  short rhs);
+void __kmpc_atomic_fixed2u_shr_rev(ident_t *id_ref, int gtid,
+                                   unsigned short *lhs, unsigned short rhs);
+void __kmpc_atomic_fixed4_sub_rev(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                                  kmp_int32 rhs);
+void __kmpc_atomic_fixed4_div_rev(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                                  kmp_int32 rhs);
+void __kmpc_atomic_fixed4u_div_rev(ident_t *id_ref, int gtid, kmp_uint32 *lhs,
+                                   kmp_uint32 rhs);
+void __kmpc_atomic_fixed4_shl_rev(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                                  kmp_int32 rhs);
+void __kmpc_atomic_fixed4_shr_rev(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                                  kmp_int32 rhs);
+void __kmpc_atomic_fixed4u_shr_rev(ident_t *id_ref, int gtid, kmp_uint32 *lhs,
+                                   kmp_uint32 rhs);
+void __kmpc_atomic_fixed8_sub_rev(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                                  kmp_int64 rhs);
+void __kmpc_atomic_fixed8_div_rev(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                                  kmp_int64 rhs);
+void __kmpc_atomic_fixed8u_div_rev(ident_t *id_ref, int gtid, kmp_uint64 *lhs,
+                                   kmp_uint64 rhs);
+void __kmpc_atomic_fixed8_shl_rev(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                                  kmp_int64 rhs);
+void __kmpc_atomic_fixed8_shr_rev(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                                  kmp_int64 rhs);
+void __kmpc_atomic_fixed8u_shr_rev(ident_t *id_ref, int gtid, kmp_uint64 *lhs,
+                                   kmp_uint64 rhs);
+void __kmpc_atomic_float4_sub_rev(ident_t *id_ref, int gtid, float *lhs,
+                                  float rhs);
+void __kmpc_atomic_float4_div_rev(ident_t *id_ref, int gtid, float *lhs,
+                                  float rhs);
+void __kmpc_atomic_float8_sub_rev(ident_t *id_ref, int gtid, double *lhs,
+                                  double rhs);
+void __kmpc_atomic_float8_div_rev(ident_t *id_ref, int gtid, double *lhs,
+                                  double rhs);
+void __kmpc_atomic_float10_sub_rev(ident_t *id_ref, int gtid, long double *lhs,
+                                   long double rhs);
+void __kmpc_atomic_float10_div_rev(ident_t *id_ref, int gtid, long double *lhs,
+                                   long double rhs);
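The `_rev` entry points above cover the reversed forms of the non-commutative operators, where the shared location is the right operand of the operator rather than the left. As a rough illustration only (the actual lowering is a compiler decision, not documented here), a reversed subtraction at the source level looks like:
@code
int x = 8;
#pragma omp atomic
x = 2 - x; // reversed subtract; a compiler might emit __kmpc_atomic_fixed4_sub_rev
@endcode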
 #if KMP_HAVE_QUAD
-void __kmpc_atomic_float16_sub_rev( ident_t *id_ref, int gtid, QUAD_LEGACY * lhs, QUAD_LEGACY rhs );
-void __kmpc_atomic_float16_div_rev( ident_t *id_ref, int gtid, QUAD_LEGACY * lhs, QUAD_LEGACY rhs );
-#endif
-void __kmpc_atomic_cmplx4_sub_rev(  ident_t *id_ref, int gtid, kmp_cmplx32 * lhs, kmp_cmplx32 rhs );
-void __kmpc_atomic_cmplx4_div_rev(  ident_t *id_ref, int gtid, kmp_cmplx32 * lhs, kmp_cmplx32 rhs );
-void __kmpc_atomic_cmplx8_sub_rev(  ident_t *id_ref, int gtid, kmp_cmplx64 * lhs, kmp_cmplx64 rhs );
-void __kmpc_atomic_cmplx8_div_rev(  ident_t *id_ref, int gtid, kmp_cmplx64 * lhs, kmp_cmplx64 rhs );
-void __kmpc_atomic_cmplx10_sub_rev( ident_t *id_ref, int gtid, kmp_cmplx80 * lhs, kmp_cmplx80 rhs );
-void __kmpc_atomic_cmplx10_div_rev( ident_t *id_ref, int gtid, kmp_cmplx80 * lhs, kmp_cmplx80 rhs );
+void __kmpc_atomic_float16_sub_rev(ident_t *id_ref, int gtid, QUAD_LEGACY *lhs,
+                                   QUAD_LEGACY rhs);
+void __kmpc_atomic_float16_div_rev(ident_t *id_ref, int gtid, QUAD_LEGACY *lhs,
+                                   QUAD_LEGACY rhs);
+#endif
+void __kmpc_atomic_cmplx4_sub_rev(ident_t *id_ref, int gtid, kmp_cmplx32 *lhs,
+                                  kmp_cmplx32 rhs);
+void __kmpc_atomic_cmplx4_div_rev(ident_t *id_ref, int gtid, kmp_cmplx32 *lhs,
+                                  kmp_cmplx32 rhs);
+void __kmpc_atomic_cmplx8_sub_rev(ident_t *id_ref, int gtid, kmp_cmplx64 *lhs,
+                                  kmp_cmplx64 rhs);
+void __kmpc_atomic_cmplx8_div_rev(ident_t *id_ref, int gtid, kmp_cmplx64 *lhs,
+                                  kmp_cmplx64 rhs);
+void __kmpc_atomic_cmplx10_sub_rev(ident_t *id_ref, int gtid, kmp_cmplx80 *lhs,
+                                   kmp_cmplx80 rhs);
+void __kmpc_atomic_cmplx10_div_rev(ident_t *id_ref, int gtid, kmp_cmplx80 *lhs,
+                                   kmp_cmplx80 rhs);
 #if KMP_HAVE_QUAD
-void __kmpc_atomic_cmplx16_sub_rev( ident_t *id_ref, int gtid, CPLX128_LEG * lhs, CPLX128_LEG rhs );
-void __kmpc_atomic_cmplx16_div_rev( ident_t *id_ref, int gtid, CPLX128_LEG * lhs, CPLX128_LEG rhs );
-#if ( KMP_ARCH_X86 )
-    // Routines with 16-byte arguments aligned to 16-byte boundary
-    void __kmpc_atomic_float16_sub_a16_rev( ident_t * id_ref, int gtid, Quad_a16_t * lhs, Quad_a16_t rhs );
-    void __kmpc_atomic_float16_div_a16_rev( ident_t * id_ref, int gtid, Quad_a16_t * lhs, Quad_a16_t rhs );
-    void __kmpc_atomic_cmplx16_sub_a16_rev( ident_t * id_ref, int gtid, kmp_cmplx128_a16_t * lhs, kmp_cmplx128_a16_t rhs );
-    void __kmpc_atomic_cmplx16_div_a16_rev( ident_t * id_ref, int gtid, kmp_cmplx128_a16_t * lhs, kmp_cmplx128_a16_t rhs );
+void __kmpc_atomic_cmplx16_sub_rev(ident_t *id_ref, int gtid, CPLX128_LEG *lhs,
+                                   CPLX128_LEG rhs);
+void __kmpc_atomic_cmplx16_div_rev(ident_t *id_ref, int gtid, CPLX128_LEG *lhs,
+                                   CPLX128_LEG rhs);
+#if (KMP_ARCH_X86)
+// Routines with 16-byte arguments aligned to 16-byte boundary
+void __kmpc_atomic_float16_sub_a16_rev(ident_t *id_ref, int gtid,
+                                       Quad_a16_t *lhs, Quad_a16_t rhs);
+void __kmpc_atomic_float16_div_a16_rev(ident_t *id_ref, int gtid,
+                                       Quad_a16_t *lhs, Quad_a16_t rhs);
+void __kmpc_atomic_cmplx16_sub_a16_rev(ident_t *id_ref, int gtid,
+                                       kmp_cmplx128_a16_t *lhs,
+                                       kmp_cmplx128_a16_t rhs);
+void __kmpc_atomic_cmplx16_div_a16_rev(ident_t *id_ref, int gtid,
+                                       kmp_cmplx128_a16_t *lhs,
+                                       kmp_cmplx128_a16_t rhs);
 #endif
 #endif // KMP_HAVE_QUAD
 
-#endif //KMP_ARCH_X86 || KMP_ARCH_X86_64
+#endif // KMP_ARCH_X86 || KMP_ARCH_X86_64
 
-#endif //OMP_40_ENABLED
+#endif // OMP_40_ENABLED
 
 // routines for mixed types
 
 // RHS=float8
-void __kmpc_atomic_fixed1_mul_float8( ident_t *id_ref, int gtid, char * lhs, kmp_real64 rhs );
-void __kmpc_atomic_fixed1_div_float8( ident_t *id_ref, int gtid, char * lhs, kmp_real64 rhs );
-void __kmpc_atomic_fixed2_mul_float8( ident_t *id_ref, int gtid, short * lhs, kmp_real64 rhs );
-void __kmpc_atomic_fixed2_div_float8( ident_t *id_ref, int gtid, short * lhs, kmp_real64 rhs );
-void __kmpc_atomic_fixed4_mul_float8( ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_real64 rhs );
-void __kmpc_atomic_fixed4_div_float8( ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_real64 rhs );
-void __kmpc_atomic_fixed8_mul_float8( ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_real64 rhs );
-void __kmpc_atomic_fixed8_div_float8( ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_real64 rhs );
-void __kmpc_atomic_float4_add_float8( ident_t *id_ref, int gtid, kmp_real32 * lhs, kmp_real64 rhs );
-void __kmpc_atomic_float4_sub_float8( ident_t *id_ref, int gtid, kmp_real32 * lhs, kmp_real64 rhs );
-void __kmpc_atomic_float4_mul_float8( ident_t *id_ref, int gtid, kmp_real32 * lhs, kmp_real64 rhs );
-void __kmpc_atomic_float4_div_float8( ident_t *id_ref, int gtid, kmp_real32 * lhs, kmp_real64 rhs );
+void __kmpc_atomic_fixed1_mul_float8(ident_t *id_ref, int gtid, char *lhs,
+                                     kmp_real64 rhs);
+void __kmpc_atomic_fixed1_div_float8(ident_t *id_ref, int gtid, char *lhs,
+                                     kmp_real64 rhs);
+void __kmpc_atomic_fixed2_mul_float8(ident_t *id_ref, int gtid, short *lhs,
+                                     kmp_real64 rhs);
+void __kmpc_atomic_fixed2_div_float8(ident_t *id_ref, int gtid, short *lhs,
+                                     kmp_real64 rhs);
+void __kmpc_atomic_fixed4_mul_float8(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                                     kmp_real64 rhs);
+void __kmpc_atomic_fixed4_div_float8(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                                     kmp_real64 rhs);
+void __kmpc_atomic_fixed8_mul_float8(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                                     kmp_real64 rhs);
+void __kmpc_atomic_fixed8_div_float8(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                                     kmp_real64 rhs);
+void __kmpc_atomic_float4_add_float8(ident_t *id_ref, int gtid, kmp_real32 *lhs,
+                                     kmp_real64 rhs);
+void __kmpc_atomic_float4_sub_float8(ident_t *id_ref, int gtid, kmp_real32 *lhs,
+                                     kmp_real64 rhs);
+void __kmpc_atomic_float4_mul_float8(ident_t *id_ref, int gtid, kmp_real32 *lhs,
+                                     kmp_real64 rhs);
+void __kmpc_atomic_float4_div_float8(ident_t *id_ref, int gtid, kmp_real32 *lhs,
+                                     kmp_real64 rhs);
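The mixed-type routines above handle updates where the location and the right-hand side have different types, so the conversion happens inside the runtime. A minimal sketch of source code that could map to one of them (which entry point is chosen, if any, is up to the compiler):
@code
float x = 1.0f; // 4-byte float location
double d = 0.5; // 8-byte float operand
#pragma omp atomic
x += d;         // could be lowered to __kmpc_atomic_float4_add_float8
@endcode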
 
-// RHS=float16 (deprecated, to be removed when we are sure the compiler does not use them)
+// RHS=float16 (deprecated, to be removed when we are sure the compiler does not
+// use them)
 #if KMP_HAVE_QUAD
-void __kmpc_atomic_fixed1_add_fp(  ident_t *id_ref, int gtid, char * lhs, _Quad rhs );
-void __kmpc_atomic_fixed1u_add_fp( ident_t *id_ref, int gtid, unsigned char * lhs, _Quad rhs );
-void __kmpc_atomic_fixed1_sub_fp(  ident_t *id_ref, int gtid, char * lhs, _Quad rhs );
-void __kmpc_atomic_fixed1u_sub_fp( ident_t *id_ref, int gtid, unsigned char * lhs, _Quad rhs );
-void __kmpc_atomic_fixed1_mul_fp(  ident_t *id_ref, int gtid, char * lhs, _Quad rhs );
-void __kmpc_atomic_fixed1u_mul_fp( ident_t *id_ref, int gtid, unsigned char * lhs, _Quad rhs );
-void __kmpc_atomic_fixed1_div_fp(  ident_t *id_ref, int gtid, char * lhs, _Quad rhs );
-void __kmpc_atomic_fixed1u_div_fp( ident_t *id_ref, int gtid, unsigned char * lhs, _Quad rhs );
-
-void __kmpc_atomic_fixed2_add_fp(  ident_t *id_ref, int gtid, short * lhs, _Quad rhs );
-void __kmpc_atomic_fixed2u_add_fp( ident_t *id_ref, int gtid, unsigned short * lhs, _Quad rhs );
-void __kmpc_atomic_fixed2_sub_fp(  ident_t *id_ref, int gtid, short * lhs, _Quad rhs );
-void __kmpc_atomic_fixed2u_sub_fp( ident_t *id_ref, int gtid, unsigned short * lhs, _Quad rhs );
-void __kmpc_atomic_fixed2_mul_fp(  ident_t *id_ref, int gtid, short * lhs, _Quad rhs );
-void __kmpc_atomic_fixed2u_mul_fp( ident_t *id_ref, int gtid, unsigned short * lhs, _Quad rhs );
-void __kmpc_atomic_fixed2_div_fp(  ident_t *id_ref, int gtid, short * lhs, _Quad rhs );
-void __kmpc_atomic_fixed2u_div_fp( ident_t *id_ref, int gtid, unsigned short * lhs, _Quad rhs );
-
-void __kmpc_atomic_fixed4_add_fp(  ident_t *id_ref, int gtid, kmp_int32 * lhs, _Quad rhs );
-void __kmpc_atomic_fixed4u_add_fp( ident_t *id_ref, int gtid, kmp_uint32 * lhs, _Quad rhs );
-void __kmpc_atomic_fixed4_sub_fp(  ident_t *id_ref, int gtid, kmp_int32 * lhs, _Quad rhs );
-void __kmpc_atomic_fixed4u_sub_fp( ident_t *id_ref, int gtid, kmp_uint32 * lhs, _Quad rhs );
-void __kmpc_atomic_fixed4_mul_fp(  ident_t *id_ref, int gtid, kmp_int32 * lhs, _Quad rhs );
-void __kmpc_atomic_fixed4u_mul_fp( ident_t *id_ref, int gtid, kmp_uint32 * lhs, _Quad rhs );
-void __kmpc_atomic_fixed4_div_fp(  ident_t *id_ref, int gtid, kmp_int32 * lhs, _Quad rhs );
-void __kmpc_atomic_fixed4u_div_fp( ident_t *id_ref, int gtid, kmp_uint32 * lhs, _Quad rhs );
-
-void __kmpc_atomic_fixed8_add_fp(  ident_t *id_ref, int gtid, kmp_int64 * lhs, _Quad rhs );
-void __kmpc_atomic_fixed8u_add_fp( ident_t *id_ref, int gtid, kmp_uint64 * lhs, _Quad rhs );
-void __kmpc_atomic_fixed8_sub_fp(  ident_t *id_ref, int gtid, kmp_int64 * lhs, _Quad rhs );
-void __kmpc_atomic_fixed8u_sub_fp( ident_t *id_ref, int gtid, kmp_uint64 * lhs, _Quad rhs );
-void __kmpc_atomic_fixed8_mul_fp(  ident_t *id_ref, int gtid, kmp_int64 * lhs, _Quad rhs );
-void __kmpc_atomic_fixed8u_mul_fp( ident_t *id_ref, int gtid, kmp_uint64 * lhs, _Quad rhs );
-void __kmpc_atomic_fixed8_div_fp(  ident_t *id_ref, int gtid, kmp_int64 * lhs, _Quad rhs );
-void __kmpc_atomic_fixed8u_div_fp( ident_t *id_ref, int gtid, kmp_uint64 * lhs, _Quad rhs );
-
-void __kmpc_atomic_float4_add_fp(  ident_t *id_ref, int gtid, kmp_real32 * lhs, _Quad rhs );
-void __kmpc_atomic_float4_sub_fp(  ident_t *id_ref, int gtid, kmp_real32 * lhs, _Quad rhs );
-void __kmpc_atomic_float4_mul_fp(  ident_t *id_ref, int gtid, kmp_real32 * lhs, _Quad rhs );
-void __kmpc_atomic_float4_div_fp(  ident_t *id_ref, int gtid, kmp_real32 * lhs, _Quad rhs );
-
-void __kmpc_atomic_float8_add_fp(  ident_t *id_ref, int gtid, kmp_real64 * lhs, _Quad rhs );
-void __kmpc_atomic_float8_sub_fp(  ident_t *id_ref, int gtid, kmp_real64 * lhs, _Quad rhs );
-void __kmpc_atomic_float8_mul_fp(  ident_t *id_ref, int gtid, kmp_real64 * lhs, _Quad rhs );
-void __kmpc_atomic_float8_div_fp(  ident_t *id_ref, int gtid, kmp_real64 * lhs, _Quad rhs );
-
-void __kmpc_atomic_float10_add_fp( ident_t *id_ref, int gtid, long double * lhs, _Quad rhs );
-void __kmpc_atomic_float10_sub_fp( ident_t *id_ref, int gtid, long double * lhs, _Quad rhs );
-void __kmpc_atomic_float10_mul_fp( ident_t *id_ref, int gtid, long double * lhs, _Quad rhs );
-void __kmpc_atomic_float10_div_fp( ident_t *id_ref, int gtid, long double * lhs, _Quad rhs );
+void __kmpc_atomic_fixed1_add_fp(ident_t *id_ref, int gtid, char *lhs,
+                                 _Quad rhs);
+void __kmpc_atomic_fixed1u_add_fp(ident_t *id_ref, int gtid, unsigned char *lhs,
+                                  _Quad rhs);
+void __kmpc_atomic_fixed1_sub_fp(ident_t *id_ref, int gtid, char *lhs,
+                                 _Quad rhs);
+void __kmpc_atomic_fixed1u_sub_fp(ident_t *id_ref, int gtid, unsigned char *lhs,
+                                  _Quad rhs);
+void __kmpc_atomic_fixed1_mul_fp(ident_t *id_ref, int gtid, char *lhs,
+                                 _Quad rhs);
+void __kmpc_atomic_fixed1u_mul_fp(ident_t *id_ref, int gtid, unsigned char *lhs,
+                                  _Quad rhs);
+void __kmpc_atomic_fixed1_div_fp(ident_t *id_ref, int gtid, char *lhs,
+                                 _Quad rhs);
+void __kmpc_atomic_fixed1u_div_fp(ident_t *id_ref, int gtid, unsigned char *lhs,
+                                  _Quad rhs);
+
+void __kmpc_atomic_fixed2_add_fp(ident_t *id_ref, int gtid, short *lhs,
+                                 _Quad rhs);
+void __kmpc_atomic_fixed2u_add_fp(ident_t *id_ref, int gtid,
+                                  unsigned short *lhs, _Quad rhs);
+void __kmpc_atomic_fixed2_sub_fp(ident_t *id_ref, int gtid, short *lhs,
+                                 _Quad rhs);
+void __kmpc_atomic_fixed2u_sub_fp(ident_t *id_ref, int gtid,
+                                  unsigned short *lhs, _Quad rhs);
+void __kmpc_atomic_fixed2_mul_fp(ident_t *id_ref, int gtid, short *lhs,
+                                 _Quad rhs);
+void __kmpc_atomic_fixed2u_mul_fp(ident_t *id_ref, int gtid,
+                                  unsigned short *lhs, _Quad rhs);
+void __kmpc_atomic_fixed2_div_fp(ident_t *id_ref, int gtid, short *lhs,
+                                 _Quad rhs);
+void __kmpc_atomic_fixed2u_div_fp(ident_t *id_ref, int gtid,
+                                  unsigned short *lhs, _Quad rhs);
+
+void __kmpc_atomic_fixed4_add_fp(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                                 _Quad rhs);
+void __kmpc_atomic_fixed4u_add_fp(ident_t *id_ref, int gtid, kmp_uint32 *lhs,
+                                  _Quad rhs);
+void __kmpc_atomic_fixed4_sub_fp(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                                 _Quad rhs);
+void __kmpc_atomic_fixed4u_sub_fp(ident_t *id_ref, int gtid, kmp_uint32 *lhs,
+                                  _Quad rhs);
+void __kmpc_atomic_fixed4_mul_fp(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                                 _Quad rhs);
+void __kmpc_atomic_fixed4u_mul_fp(ident_t *id_ref, int gtid, kmp_uint32 *lhs,
+                                  _Quad rhs);
+void __kmpc_atomic_fixed4_div_fp(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                                 _Quad rhs);
+void __kmpc_atomic_fixed4u_div_fp(ident_t *id_ref, int gtid, kmp_uint32 *lhs,
+                                  _Quad rhs);
+
+void __kmpc_atomic_fixed8_add_fp(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                                 _Quad rhs);
+void __kmpc_atomic_fixed8u_add_fp(ident_t *id_ref, int gtid, kmp_uint64 *lhs,
+                                  _Quad rhs);
+void __kmpc_atomic_fixed8_sub_fp(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                                 _Quad rhs);
+void __kmpc_atomic_fixed8u_sub_fp(ident_t *id_ref, int gtid, kmp_uint64 *lhs,
+                                  _Quad rhs);
+void __kmpc_atomic_fixed8_mul_fp(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                                 _Quad rhs);
+void __kmpc_atomic_fixed8u_mul_fp(ident_t *id_ref, int gtid, kmp_uint64 *lhs,
+                                  _Quad rhs);
+void __kmpc_atomic_fixed8_div_fp(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                                 _Quad rhs);
+void __kmpc_atomic_fixed8u_div_fp(ident_t *id_ref, int gtid, kmp_uint64 *lhs,
+                                  _Quad rhs);
+
+void __kmpc_atomic_float4_add_fp(ident_t *id_ref, int gtid, kmp_real32 *lhs,
+                                 _Quad rhs);
+void __kmpc_atomic_float4_sub_fp(ident_t *id_ref, int gtid, kmp_real32 *lhs,
+                                 _Quad rhs);
+void __kmpc_atomic_float4_mul_fp(ident_t *id_ref, int gtid, kmp_real32 *lhs,
+                                 _Quad rhs);
+void __kmpc_atomic_float4_div_fp(ident_t *id_ref, int gtid, kmp_real32 *lhs,
+                                 _Quad rhs);
+
+void __kmpc_atomic_float8_add_fp(ident_t *id_ref, int gtid, kmp_real64 *lhs,
+                                 _Quad rhs);
+void __kmpc_atomic_float8_sub_fp(ident_t *id_ref, int gtid, kmp_real64 *lhs,
+                                 _Quad rhs);
+void __kmpc_atomic_float8_mul_fp(ident_t *id_ref, int gtid, kmp_real64 *lhs,
+                                 _Quad rhs);
+void __kmpc_atomic_float8_div_fp(ident_t *id_ref, int gtid, kmp_real64 *lhs,
+                                 _Quad rhs);
+
+void __kmpc_atomic_float10_add_fp(ident_t *id_ref, int gtid, long double *lhs,
+                                  _Quad rhs);
+void __kmpc_atomic_float10_sub_fp(ident_t *id_ref, int gtid, long double *lhs,
+                                  _Quad rhs);
+void __kmpc_atomic_float10_mul_fp(ident_t *id_ref, int gtid, long double *lhs,
+                                  _Quad rhs);
+void __kmpc_atomic_float10_div_fp(ident_t *id_ref, int gtid, long double *lhs,
+                                  _Quad rhs);
 
 // Reverse operations
-void __kmpc_atomic_fixed1_sub_rev_fp(  ident_t *id_ref, int gtid, char * lhs, _Quad rhs );
-void __kmpc_atomic_fixed1u_sub_rev_fp( ident_t *id_ref, int gtid, unsigned char * lhs, _Quad rhs );
-void __kmpc_atomic_fixed1_div_rev_fp(  ident_t *id_ref, int gtid, char * lhs, _Quad rhs );
-void __kmpc_atomic_fixed1u_div_rev_fp( ident_t *id_ref, int gtid, unsigned char * lhs, _Quad rhs );
-void __kmpc_atomic_fixed2_sub_rev_fp(  ident_t *id_ref, int gtid, short * lhs, _Quad rhs );
-void __kmpc_atomic_fixed2u_sub_rev_fp( ident_t *id_ref, int gtid, unsigned short * lhs, _Quad rhs );
-void __kmpc_atomic_fixed2_div_rev_fp(  ident_t *id_ref, int gtid, short * lhs, _Quad rhs );
-void __kmpc_atomic_fixed2u_div_rev_fp( ident_t *id_ref, int gtid, unsigned short * lhs, _Quad rhs );
-void __kmpc_atomic_fixed4_sub_rev_fp(  ident_t *id_ref, int gtid, kmp_int32 * lhs, _Quad rhs );
-void __kmpc_atomic_fixed4u_sub_rev_fp( ident_t *id_ref, int gtid, kmp_uint32 * lhs, _Quad rhs );
-void __kmpc_atomic_fixed4_div_rev_fp(  ident_t *id_ref, int gtid, kmp_int32 * lhs, _Quad rhs );
-void __kmpc_atomic_fixed4u_div_rev_fp( ident_t *id_ref, int gtid, kmp_uint32 * lhs, _Quad rhs );
-void __kmpc_atomic_fixed8_sub_rev_fp(  ident_t *id_ref, int gtid, kmp_int64 * lhs, _Quad rhs );
-void __kmpc_atomic_fixed8u_sub_rev_fp( ident_t *id_ref, int gtid, kmp_uint64 * lhs, _Quad rhs );
-void __kmpc_atomic_fixed8_div_rev_fp(  ident_t *id_ref, int gtid, kmp_int64 * lhs, _Quad rhs );
-void __kmpc_atomic_fixed8u_div_rev_fp( ident_t *id_ref, int gtid, kmp_uint64 * lhs, _Quad rhs );
-void __kmpc_atomic_float4_sub_rev_fp(  ident_t *id_ref, int gtid, float * lhs, _Quad rhs );
-void __kmpc_atomic_float4_div_rev_fp(  ident_t *id_ref, int gtid, float * lhs, _Quad rhs );
-void __kmpc_atomic_float8_sub_rev_fp(  ident_t *id_ref, int gtid, double * lhs, _Quad rhs );
-void __kmpc_atomic_float8_div_rev_fp(  ident_t *id_ref, int gtid, double * lhs, _Quad rhs );
-void __kmpc_atomic_float10_sub_rev_fp( ident_t *id_ref, int gtid, long double * lhs, _Quad rhs );
-void __kmpc_atomic_float10_div_rev_fp( ident_t *id_ref, int gtid, long double * lhs, _Quad rhs );
+void __kmpc_atomic_fixed1_sub_rev_fp(ident_t *id_ref, int gtid, char *lhs,
+                                     _Quad rhs);
+void __kmpc_atomic_fixed1u_sub_rev_fp(ident_t *id_ref, int gtid,
+                                      unsigned char *lhs, _Quad rhs);
+void __kmpc_atomic_fixed1_div_rev_fp(ident_t *id_ref, int gtid, char *lhs,
+                                     _Quad rhs);
+void __kmpc_atomic_fixed1u_div_rev_fp(ident_t *id_ref, int gtid,
+                                      unsigned char *lhs, _Quad rhs);
+void __kmpc_atomic_fixed2_sub_rev_fp(ident_t *id_ref, int gtid, short *lhs,
+                                     _Quad rhs);
+void __kmpc_atomic_fixed2u_sub_rev_fp(ident_t *id_ref, int gtid,
+                                      unsigned short *lhs, _Quad rhs);
+void __kmpc_atomic_fixed2_div_rev_fp(ident_t *id_ref, int gtid, short *lhs,
+                                     _Quad rhs);
+void __kmpc_atomic_fixed2u_div_rev_fp(ident_t *id_ref, int gtid,
+                                      unsigned short *lhs, _Quad rhs);
+void __kmpc_atomic_fixed4_sub_rev_fp(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                                     _Quad rhs);
+void __kmpc_atomic_fixed4u_sub_rev_fp(ident_t *id_ref, int gtid,
+                                      kmp_uint32 *lhs, _Quad rhs);
+void __kmpc_atomic_fixed4_div_rev_fp(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                                     _Quad rhs);
+void __kmpc_atomic_fixed4u_div_rev_fp(ident_t *id_ref, int gtid,
+                                      kmp_uint32 *lhs, _Quad rhs);
+void __kmpc_atomic_fixed8_sub_rev_fp(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                                     _Quad rhs);
+void __kmpc_atomic_fixed8u_sub_rev_fp(ident_t *id_ref, int gtid,
+                                      kmp_uint64 *lhs, _Quad rhs);
+void __kmpc_atomic_fixed8_div_rev_fp(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                                     _Quad rhs);
+void __kmpc_atomic_fixed8u_div_rev_fp(ident_t *id_ref, int gtid,
+                                      kmp_uint64 *lhs, _Quad rhs);
+void __kmpc_atomic_float4_sub_rev_fp(ident_t *id_ref, int gtid, float *lhs,
+                                     _Quad rhs);
+void __kmpc_atomic_float4_div_rev_fp(ident_t *id_ref, int gtid, float *lhs,
+                                     _Quad rhs);
+void __kmpc_atomic_float8_sub_rev_fp(ident_t *id_ref, int gtid, double *lhs,
+                                     _Quad rhs);
+void __kmpc_atomic_float8_div_rev_fp(ident_t *id_ref, int gtid, double *lhs,
+                                     _Quad rhs);
+void __kmpc_atomic_float10_sub_rev_fp(ident_t *id_ref, int gtid,
+                                      long double *lhs, _Quad rhs);
+void __kmpc_atomic_float10_div_rev_fp(ident_t *id_ref, int gtid,
+                                      long double *lhs, _Quad rhs);
 
 #endif // KMP_HAVE_QUAD
 
 // RHS=cmplx8
-void __kmpc_atomic_cmplx4_add_cmplx8( ident_t *id_ref, int gtid, kmp_cmplx32 * lhs, kmp_cmplx64 rhs );
-void __kmpc_atomic_cmplx4_sub_cmplx8( ident_t *id_ref, int gtid, kmp_cmplx32 * lhs, kmp_cmplx64 rhs );
-void __kmpc_atomic_cmplx4_mul_cmplx8( ident_t *id_ref, int gtid, kmp_cmplx32 * lhs, kmp_cmplx64 rhs );
-void __kmpc_atomic_cmplx4_div_cmplx8( ident_t *id_ref, int gtid, kmp_cmplx32 * lhs, kmp_cmplx64 rhs );
+void __kmpc_atomic_cmplx4_add_cmplx8(ident_t *id_ref, int gtid,
+                                     kmp_cmplx32 *lhs, kmp_cmplx64 rhs);
+void __kmpc_atomic_cmplx4_sub_cmplx8(ident_t *id_ref, int gtid,
+                                     kmp_cmplx32 *lhs, kmp_cmplx64 rhs);
+void __kmpc_atomic_cmplx4_mul_cmplx8(ident_t *id_ref, int gtid,
+                                     kmp_cmplx32 *lhs, kmp_cmplx64 rhs);
+void __kmpc_atomic_cmplx4_div_cmplx8(ident_t *id_ref, int gtid,
+                                     kmp_cmplx32 *lhs, kmp_cmplx64 rhs);
 
 // generic atomic routines
-void __kmpc_atomic_1(  ident_t *id_ref, int gtid, void* lhs, void* rhs, void (*f)( void *, void *, void * ) );
-void __kmpc_atomic_2(  ident_t *id_ref, int gtid, void* lhs, void* rhs, void (*f)( void *, void *, void * ) );
-void __kmpc_atomic_4(  ident_t *id_ref, int gtid, void* lhs, void* rhs, void (*f)( void *, void *, void * ) );
-void __kmpc_atomic_8(  ident_t *id_ref, int gtid, void* lhs, void* rhs, void (*f)( void *, void *, void * ) );
-void __kmpc_atomic_10( ident_t *id_ref, int gtid, void* lhs, void* rhs, void (*f)( void *, void *, void * ) );
-void __kmpc_atomic_16( ident_t *id_ref, int gtid, void* lhs, void* rhs, void (*f)( void *, void *, void * ) );
-void __kmpc_atomic_20( ident_t *id_ref, int gtid, void* lhs, void* rhs, void (*f)( void *, void *, void * ) );
-void __kmpc_atomic_32( ident_t *id_ref, int gtid, void* lhs, void* rhs, void (*f)( void *, void *, void * ) );
+void __kmpc_atomic_1(ident_t *id_ref, int gtid, void *lhs, void *rhs,
+                     void (*f)(void *, void *, void *));
+void __kmpc_atomic_2(ident_t *id_ref, int gtid, void *lhs, void *rhs,
+                     void (*f)(void *, void *, void *));
+void __kmpc_atomic_4(ident_t *id_ref, int gtid, void *lhs, void *rhs,
+                     void (*f)(void *, void *, void *));
+void __kmpc_atomic_8(ident_t *id_ref, int gtid, void *lhs, void *rhs,
+                     void (*f)(void *, void *, void *));
+void __kmpc_atomic_10(ident_t *id_ref, int gtid, void *lhs, void *rhs,
+                      void (*f)(void *, void *, void *));
+void __kmpc_atomic_16(ident_t *id_ref, int gtid, void *lhs, void *rhs,
+                      void (*f)(void *, void *, void *));
+void __kmpc_atomic_20(ident_t *id_ref, int gtid, void *lhs, void *rhs,
+                      void (*f)(void *, void *, void *));
+void __kmpc_atomic_32(ident_t *id_ref, int gtid, void *lhs, void *rhs,
+                      void (*f)(void *, void *, void *));
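The generic entry points take the operand size from their name and apply a caller-supplied combiner while the runtime provides the mutual exclusion. A hedged sketch of how a compiler-generated call for a 4-byte operation might look; the combiner name and the assumption that it computes `*result = *lhs + *rhs` are illustrative, not something guaranteed by the declarations above:
@code
// Hypothetical combiner generated by the compiler (not part of the runtime);
// assumed convention: write the combined value of *lhs and *rhs into *result.
static void add_int32(void *result, void *lhs, void *rhs) {
  *(kmp_int32 *)result = *(kmp_int32 *)lhs + *(kmp_int32 *)rhs;
}
// A compiler could then emit, with id_ref/gtid taken from its own context:
//   __kmpc_atomic_4(id_ref, gtid, &shared_int, &increment, add_int32);
@endcode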
 
 // READ, WRITE, CAPTURE are supported only on IA-32 architecture and Intel(R) 64
 #if KMP_ARCH_X86 || KMP_ARCH_X86_64
 
-//
 //  Below routines for atomic READ are listed
-//
-
-char         __kmpc_atomic_fixed1_rd(  ident_t *id_ref, int gtid, char        * loc );
-short        __kmpc_atomic_fixed2_rd(  ident_t *id_ref, int gtid, short       * loc );
-kmp_int32    __kmpc_atomic_fixed4_rd(  ident_t *id_ref, int gtid, kmp_int32   * loc );
-kmp_int64    __kmpc_atomic_fixed8_rd(  ident_t *id_ref, int gtid, kmp_int64   * loc );
-kmp_real32   __kmpc_atomic_float4_rd(  ident_t *id_ref, int gtid, kmp_real32  * loc );
-kmp_real64   __kmpc_atomic_float8_rd(  ident_t *id_ref, int gtid, kmp_real64  * loc );
-long double  __kmpc_atomic_float10_rd( ident_t *id_ref, int gtid, long double * loc );
+char __kmpc_atomic_fixed1_rd(ident_t *id_ref, int gtid, char *loc);
+short __kmpc_atomic_fixed2_rd(ident_t *id_ref, int gtid, short *loc);
+kmp_int32 __kmpc_atomic_fixed4_rd(ident_t *id_ref, int gtid, kmp_int32 *loc);
+kmp_int64 __kmpc_atomic_fixed8_rd(ident_t *id_ref, int gtid, kmp_int64 *loc);
+kmp_real32 __kmpc_atomic_float4_rd(ident_t *id_ref, int gtid, kmp_real32 *loc);
+kmp_real64 __kmpc_atomic_float8_rd(ident_t *id_ref, int gtid, kmp_real64 *loc);
+long double __kmpc_atomic_float10_rd(ident_t *id_ref, int gtid,
+                                     long double *loc);
 #if KMP_HAVE_QUAD
-QUAD_LEGACY  __kmpc_atomic_float16_rd( ident_t *id_ref, int gtid, QUAD_LEGACY * loc );
+QUAD_LEGACY __kmpc_atomic_float16_rd(ident_t *id_ref, int gtid,
+                                     QUAD_LEGACY *loc);
 #endif
-// Fix for CQ220361: cmplx4 READ will return void on Windows* OS; read value will be
-// returned through an additional parameter
-#if ( KMP_OS_WINDOWS )
-    void  __kmpc_atomic_cmplx4_rd(  kmp_cmplx32 * out, ident_t *id_ref, int gtid, kmp_cmplx32 * loc );
+// Fix for CQ220361: cmplx4 READ will return void on Windows* OS; read value
+// will be returned through an additional parameter
+#if (KMP_OS_WINDOWS)
+void __kmpc_atomic_cmplx4_rd(kmp_cmplx32 *out, ident_t *id_ref, int gtid,
+                             kmp_cmplx32 *loc);
 #else
-    kmp_cmplx32  __kmpc_atomic_cmplx4_rd(  ident_t *id_ref, int gtid, kmp_cmplx32 * loc );
+kmp_cmplx32 __kmpc_atomic_cmplx4_rd(ident_t *id_ref, int gtid,
+                                    kmp_cmplx32 *loc);
 #endif
-kmp_cmplx64  __kmpc_atomic_cmplx8_rd(  ident_t *id_ref, int gtid, kmp_cmplx64 * loc );
-kmp_cmplx80  __kmpc_atomic_cmplx10_rd( ident_t *id_ref, int gtid, kmp_cmplx80 * loc );
+kmp_cmplx64 __kmpc_atomic_cmplx8_rd(ident_t *id_ref, int gtid,
+                                    kmp_cmplx64 *loc);
+kmp_cmplx80 __kmpc_atomic_cmplx10_rd(ident_t *id_ref, int gtid,
+                                     kmp_cmplx80 *loc);
 #if KMP_HAVE_QUAD
-CPLX128_LEG  __kmpc_atomic_cmplx16_rd( ident_t *id_ref, int gtid, CPLX128_LEG * loc );
-#if ( KMP_ARCH_X86 )
-    // Routines with 16-byte arguments aligned to 16-byte boundary
-    Quad_a16_t         __kmpc_atomic_float16_a16_rd( ident_t * id_ref, int gtid, Quad_a16_t         * loc );
-    kmp_cmplx128_a16_t __kmpc_atomic_cmplx16_a16_rd( ident_t * id_ref, int gtid, kmp_cmplx128_a16_t * loc );
+CPLX128_LEG __kmpc_atomic_cmplx16_rd(ident_t *id_ref, int gtid,
+                                     CPLX128_LEG *loc);
+#if (KMP_ARCH_X86)
+// Routines with 16-byte arguments aligned to 16-byte boundary
+Quad_a16_t __kmpc_atomic_float16_a16_rd(ident_t *id_ref, int gtid,
+                                        Quad_a16_t *loc);
+kmp_cmplx128_a16_t __kmpc_atomic_cmplx16_a16_rd(ident_t *id_ref, int gtid,
+                                                kmp_cmplx128_a16_t *loc);
 #endif
 #endif
 
-
-//
 //  Below routines for atomic WRITE are listed
-//
-
-void __kmpc_atomic_fixed1_wr(  ident_t *id_ref, int gtid, char        * lhs, char        rhs );
-void __kmpc_atomic_fixed2_wr(  ident_t *id_ref, int gtid, short       * lhs, short       rhs );
-void __kmpc_atomic_fixed4_wr(  ident_t *id_ref, int gtid, kmp_int32   * lhs, kmp_int32   rhs );
-void __kmpc_atomic_fixed8_wr(  ident_t *id_ref, int gtid, kmp_int64   * lhs, kmp_int64   rhs );
-void __kmpc_atomic_float4_wr(  ident_t *id_ref, int gtid, kmp_real32  * lhs, kmp_real32  rhs );
-void __kmpc_atomic_float8_wr(  ident_t *id_ref, int gtid, kmp_real64  * lhs, kmp_real64  rhs );
-void __kmpc_atomic_float10_wr( ident_t *id_ref, int gtid, long double * lhs, long double rhs );
+void __kmpc_atomic_fixed1_wr(ident_t *id_ref, int gtid, char *lhs, char rhs);
+void __kmpc_atomic_fixed2_wr(ident_t *id_ref, int gtid, short *lhs, short rhs);
+void __kmpc_atomic_fixed4_wr(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                             kmp_int32 rhs);
+void __kmpc_atomic_fixed8_wr(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                             kmp_int64 rhs);
+void __kmpc_atomic_float4_wr(ident_t *id_ref, int gtid, kmp_real32 *lhs,
+                             kmp_real32 rhs);
+void __kmpc_atomic_float8_wr(ident_t *id_ref, int gtid, kmp_real64 *lhs,
+                             kmp_real64 rhs);
+void __kmpc_atomic_float10_wr(ident_t *id_ref, int gtid, long double *lhs,
+                              long double rhs);
 #if KMP_HAVE_QUAD
-void __kmpc_atomic_float16_wr( ident_t *id_ref, int gtid, QUAD_LEGACY * lhs, QUAD_LEGACY rhs );
+void __kmpc_atomic_float16_wr(ident_t *id_ref, int gtid, QUAD_LEGACY *lhs,
+                              QUAD_LEGACY rhs);
 #endif
-void __kmpc_atomic_cmplx4_wr(  ident_t *id_ref, int gtid, kmp_cmplx32 * lhs, kmp_cmplx32 rhs );
-void __kmpc_atomic_cmplx8_wr(  ident_t *id_ref, int gtid, kmp_cmplx64 * lhs, kmp_cmplx64 rhs );
-void __kmpc_atomic_cmplx10_wr( ident_t *id_ref, int gtid, kmp_cmplx80 * lhs, kmp_cmplx80 rhs );
+void __kmpc_atomic_cmplx4_wr(ident_t *id_ref, int gtid, kmp_cmplx32 *lhs,
+                             kmp_cmplx32 rhs);
+void __kmpc_atomic_cmplx8_wr(ident_t *id_ref, int gtid, kmp_cmplx64 *lhs,
+                             kmp_cmplx64 rhs);
+void __kmpc_atomic_cmplx10_wr(ident_t *id_ref, int gtid, kmp_cmplx80 *lhs,
+                              kmp_cmplx80 rhs);
 #if KMP_HAVE_QUAD
-void __kmpc_atomic_cmplx16_wr( ident_t *id_ref, int gtid, CPLX128_LEG * lhs, CPLX128_LEG rhs );
-#if ( KMP_ARCH_X86 )
-    // Routines with 16-byte arguments aligned to 16-byte boundary
-    void __kmpc_atomic_float16_a16_wr( ident_t * id_ref, int gtid, Quad_a16_t         * lhs, Quad_a16_t         rhs );
-    void __kmpc_atomic_cmplx16_a16_wr( ident_t * id_ref, int gtid, kmp_cmplx128_a16_t * lhs, kmp_cmplx128_a16_t rhs );
+void __kmpc_atomic_cmplx16_wr(ident_t *id_ref, int gtid, CPLX128_LEG *lhs,
+                              CPLX128_LEG rhs);
+#if (KMP_ARCH_X86)
+// Routines with 16-byte arguments aligned to 16-byte boundary
+void __kmpc_atomic_float16_a16_wr(ident_t *id_ref, int gtid, Quad_a16_t *lhs,
+                                  Quad_a16_t rhs);
+void __kmpc_atomic_cmplx16_a16_wr(ident_t *id_ref, int gtid,
+                                  kmp_cmplx128_a16_t *lhs,
+                                  kmp_cmplx128_a16_t rhs);
 #endif
 #endif
 
-//
 //  Below routines for atomic CAPTURE are listed
-//
 
 // 1-byte
-char __kmpc_atomic_fixed1_add_cpt(  ident_t *id_ref, int gtid, char * lhs, char rhs, int flag);
-char __kmpc_atomic_fixed1_andb_cpt( ident_t *id_ref, int gtid, char * lhs, char rhs, int flag);
-char __kmpc_atomic_fixed1_div_cpt(  ident_t *id_ref, int gtid, char * lhs, char rhs, int flag);
-unsigned char __kmpc_atomic_fixed1u_div_cpt( ident_t *id_ref, int gtid, unsigned char * lhs, unsigned char rhs, int flag);
-char __kmpc_atomic_fixed1_mul_cpt(  ident_t *id_ref, int gtid, char * lhs, char rhs, int flag);
-char __kmpc_atomic_fixed1_orb_cpt(  ident_t *id_ref, int gtid, char * lhs, char rhs, int flag);
-char __kmpc_atomic_fixed1_shl_cpt(  ident_t *id_ref, int gtid, char * lhs, char rhs, int flag);
-char __kmpc_atomic_fixed1_shr_cpt(  ident_t *id_ref, int gtid, char * lhs, char rhs, int flag);
-unsigned char __kmpc_atomic_fixed1u_shr_cpt( ident_t *id_ref, int gtid, unsigned char * lhs, unsigned char rhs, int flag);
-char __kmpc_atomic_fixed1_sub_cpt(  ident_t *id_ref, int gtid, char * lhs, char rhs, int flag);
-char __kmpc_atomic_fixed1_xor_cpt(  ident_t *id_ref, int gtid, char * lhs, char rhs, int flag);
+char __kmpc_atomic_fixed1_add_cpt(ident_t *id_ref, int gtid, char *lhs,
+                                  char rhs, int flag);
+char __kmpc_atomic_fixed1_andb_cpt(ident_t *id_ref, int gtid, char *lhs,
+                                   char rhs, int flag);
+char __kmpc_atomic_fixed1_div_cpt(ident_t *id_ref, int gtid, char *lhs,
+                                  char rhs, int flag);
+unsigned char __kmpc_atomic_fixed1u_div_cpt(ident_t *id_ref, int gtid,
+                                            unsigned char *lhs,
+                                            unsigned char rhs, int flag);
+char __kmpc_atomic_fixed1_mul_cpt(ident_t *id_ref, int gtid, char *lhs,
+                                  char rhs, int flag);
+char __kmpc_atomic_fixed1_orb_cpt(ident_t *id_ref, int gtid, char *lhs,
+                                  char rhs, int flag);
+char __kmpc_atomic_fixed1_shl_cpt(ident_t *id_ref, int gtid, char *lhs,
+                                  char rhs, int flag);
+char __kmpc_atomic_fixed1_shr_cpt(ident_t *id_ref, int gtid, char *lhs,
+                                  char rhs, int flag);
+unsigned char __kmpc_atomic_fixed1u_shr_cpt(ident_t *id_ref, int gtid,
+                                            unsigned char *lhs,
+                                            unsigned char rhs, int flag);
+char __kmpc_atomic_fixed1_sub_cpt(ident_t *id_ref, int gtid, char *lhs,
+                                  char rhs, int flag);
+char __kmpc_atomic_fixed1_xor_cpt(ident_t *id_ref, int gtid, char *lhs,
+                                  char rhs, int flag);
 // 2-byte
-short __kmpc_atomic_fixed2_add_cpt(  ident_t *id_ref, int gtid, short * lhs, short rhs, int flag);
-short __kmpc_atomic_fixed2_andb_cpt( ident_t *id_ref, int gtid, short * lhs, short rhs, int flag);
-short __kmpc_atomic_fixed2_div_cpt(  ident_t *id_ref, int gtid, short * lhs, short rhs, int flag);
-unsigned short __kmpc_atomic_fixed2u_div_cpt( ident_t *id_ref, int gtid, unsigned short * lhs, unsigned short rhs, int flag);
-short __kmpc_atomic_fixed2_mul_cpt(  ident_t *id_ref, int gtid, short * lhs, short rhs, int flag);
-short __kmpc_atomic_fixed2_orb_cpt(  ident_t *id_ref, int gtid, short * lhs, short rhs, int flag);
-short __kmpc_atomic_fixed2_shl_cpt(  ident_t *id_ref, int gtid, short * lhs, short rhs, int flag);
-short __kmpc_atomic_fixed2_shr_cpt(  ident_t *id_ref, int gtid, short * lhs, short rhs, int flag);
-unsigned short __kmpc_atomic_fixed2u_shr_cpt( ident_t *id_ref, int gtid, unsigned short * lhs, unsigned short rhs, int flag);
-short __kmpc_atomic_fixed2_sub_cpt(  ident_t *id_ref, int gtid, short * lhs, short rhs, int flag);
-short __kmpc_atomic_fixed2_xor_cpt(  ident_t *id_ref, int gtid, short * lhs, short rhs, int flag);
+short __kmpc_atomic_fixed2_add_cpt(ident_t *id_ref, int gtid, short *lhs,
+                                   short rhs, int flag);
+short __kmpc_atomic_fixed2_andb_cpt(ident_t *id_ref, int gtid, short *lhs,
+                                    short rhs, int flag);
+short __kmpc_atomic_fixed2_div_cpt(ident_t *id_ref, int gtid, short *lhs,
+                                   short rhs, int flag);
+unsigned short __kmpc_atomic_fixed2u_div_cpt(ident_t *id_ref, int gtid,
+                                             unsigned short *lhs,
+                                             unsigned short rhs, int flag);
+short __kmpc_atomic_fixed2_mul_cpt(ident_t *id_ref, int gtid, short *lhs,
+                                   short rhs, int flag);
+short __kmpc_atomic_fixed2_orb_cpt(ident_t *id_ref, int gtid, short *lhs,
+                                   short rhs, int flag);
+short __kmpc_atomic_fixed2_shl_cpt(ident_t *id_ref, int gtid, short *lhs,
+                                   short rhs, int flag);
+short __kmpc_atomic_fixed2_shr_cpt(ident_t *id_ref, int gtid, short *lhs,
+                                   short rhs, int flag);
+unsigned short __kmpc_atomic_fixed2u_shr_cpt(ident_t *id_ref, int gtid,
+                                             unsigned short *lhs,
+                                             unsigned short rhs, int flag);
+short __kmpc_atomic_fixed2_sub_cpt(ident_t *id_ref, int gtid, short *lhs,
+                                   short rhs, int flag);
+short __kmpc_atomic_fixed2_xor_cpt(ident_t *id_ref, int gtid, short *lhs,
+                                   short rhs, int flag);
 // 4-byte add / sub fixed
-kmp_int32  __kmpc_atomic_fixed4_add_cpt(  ident_t *id_ref, int gtid, kmp_int32  * lhs, kmp_int32 rhs, int flag);
-kmp_int32  __kmpc_atomic_fixed4_sub_cpt(  ident_t *id_ref, int gtid, kmp_int32  * lhs, kmp_int32 rhs, int flag);
+kmp_int32 __kmpc_atomic_fixed4_add_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int32 *lhs, kmp_int32 rhs, int flag);
+kmp_int32 __kmpc_atomic_fixed4_sub_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int32 *lhs, kmp_int32 rhs, int flag);
 // 4-byte add / sub float
-kmp_real32 __kmpc_atomic_float4_add_cpt(  ident_t *id_ref, int gtid, kmp_real32 * lhs, kmp_real32 rhs, int flag);
-kmp_real32 __kmpc_atomic_float4_sub_cpt(  ident_t *id_ref, int gtid, kmp_real32 * lhs, kmp_real32 rhs, int flag);
+kmp_real32 __kmpc_atomic_float4_add_cpt(ident_t *id_ref, int gtid,
+                                        kmp_real32 *lhs, kmp_real32 rhs,
+                                        int flag);
+kmp_real32 __kmpc_atomic_float4_sub_cpt(ident_t *id_ref, int gtid,
+                                        kmp_real32 *lhs, kmp_real32 rhs,
+                                        int flag);
 // 8-byte add / sub fixed
-kmp_int64  __kmpc_atomic_fixed8_add_cpt(  ident_t *id_ref, int gtid, kmp_int64  * lhs, kmp_int64 rhs, int flag);
-kmp_int64  __kmpc_atomic_fixed8_sub_cpt(  ident_t *id_ref, int gtid, kmp_int64  * lhs, kmp_int64 rhs, int flag);
+kmp_int64 __kmpc_atomic_fixed8_add_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int64 *lhs, kmp_int64 rhs, int flag);
+kmp_int64 __kmpc_atomic_fixed8_sub_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int64 *lhs, kmp_int64 rhs, int flag);
 // 8-byte add / sub float
-kmp_real64 __kmpc_atomic_float8_add_cpt(  ident_t *id_ref, int gtid, kmp_real64 * lhs, kmp_real64 rhs, int flag);
-kmp_real64 __kmpc_atomic_float8_sub_cpt(  ident_t *id_ref, int gtid, kmp_real64 * lhs, kmp_real64 rhs, int flag);
+kmp_real64 __kmpc_atomic_float8_add_cpt(ident_t *id_ref, int gtid,
+                                        kmp_real64 *lhs, kmp_real64 rhs,
+                                        int flag);
+kmp_real64 __kmpc_atomic_float8_sub_cpt(ident_t *id_ref, int gtid,
+                                        kmp_real64 *lhs, kmp_real64 rhs,
+                                        int flag);
 // 4-byte fixed
-kmp_int32  __kmpc_atomic_fixed4_andb_cpt( ident_t *id_ref, int gtid, kmp_int32  * lhs, kmp_int32  rhs, int flag);
-kmp_int32  __kmpc_atomic_fixed4_div_cpt(  ident_t *id_ref, int gtid, kmp_int32  * lhs, kmp_int32  rhs, int flag);
-kmp_uint32 __kmpc_atomic_fixed4u_div_cpt( ident_t *id_ref, int gtid, kmp_uint32 * lhs, kmp_uint32 rhs, int flag);
-kmp_int32  __kmpc_atomic_fixed4_mul_cpt(  ident_t *id_ref, int gtid, kmp_int32  * lhs, kmp_int32  rhs, int flag);
-kmp_int32  __kmpc_atomic_fixed4_orb_cpt(  ident_t *id_ref, int gtid, kmp_int32  * lhs, kmp_int32  rhs, int flag);
-kmp_int32  __kmpc_atomic_fixed4_shl_cpt(  ident_t *id_ref, int gtid, kmp_int32  * lhs, kmp_int32  rhs, int flag);
-kmp_int32  __kmpc_atomic_fixed4_shr_cpt(  ident_t *id_ref, int gtid, kmp_int32  * lhs, kmp_int32  rhs, int flag);
-kmp_uint32 __kmpc_atomic_fixed4u_shr_cpt( ident_t *id_ref, int gtid, kmp_uint32 * lhs, kmp_uint32 rhs, int flag);
-kmp_int32  __kmpc_atomic_fixed4_xor_cpt(  ident_t *id_ref, int gtid, kmp_int32  * lhs, kmp_int32  rhs, int flag);
+kmp_int32 __kmpc_atomic_fixed4_andb_cpt(ident_t *id_ref, int gtid,
+                                        kmp_int32 *lhs, kmp_int32 rhs,
+                                        int flag);
+kmp_int32 __kmpc_atomic_fixed4_div_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int32 *lhs, kmp_int32 rhs, int flag);
+kmp_uint32 __kmpc_atomic_fixed4u_div_cpt(ident_t *id_ref, int gtid,
+                                         kmp_uint32 *lhs, kmp_uint32 rhs,
+                                         int flag);
+kmp_int32 __kmpc_atomic_fixed4_mul_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int32 *lhs, kmp_int32 rhs, int flag);
+kmp_int32 __kmpc_atomic_fixed4_orb_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int32 *lhs, kmp_int32 rhs, int flag);
+kmp_int32 __kmpc_atomic_fixed4_shl_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int32 *lhs, kmp_int32 rhs, int flag);
+kmp_int32 __kmpc_atomic_fixed4_shr_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int32 *lhs, kmp_int32 rhs, int flag);
+kmp_uint32 __kmpc_atomic_fixed4u_shr_cpt(ident_t *id_ref, int gtid,
+                                         kmp_uint32 *lhs, kmp_uint32 rhs,
+                                         int flag);
+kmp_int32 __kmpc_atomic_fixed4_xor_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int32 *lhs, kmp_int32 rhs, int flag);
 // 8-byte fixed
-kmp_int64  __kmpc_atomic_fixed8_andb_cpt( ident_t *id_ref, int gtid, kmp_int64  * lhs, kmp_int64  rhs, int flag);
-kmp_int64  __kmpc_atomic_fixed8_div_cpt(  ident_t *id_ref, int gtid, kmp_int64  * lhs, kmp_int64  rhs, int flag);
-kmp_uint64 __kmpc_atomic_fixed8u_div_cpt( ident_t *id_ref, int gtid, kmp_uint64 * lhs, kmp_uint64 rhs, int flag);
-kmp_int64  __kmpc_atomic_fixed8_mul_cpt(  ident_t *id_ref, int gtid, kmp_int64  * lhs, kmp_int64  rhs, int flag);
-kmp_int64  __kmpc_atomic_fixed8_orb_cpt(  ident_t *id_ref, int gtid, kmp_int64  * lhs, kmp_int64  rhs, int flag);
-kmp_int64  __kmpc_atomic_fixed8_shl_cpt(  ident_t *id_ref, int gtid, kmp_int64  * lhs, kmp_int64  rhs, int flag);
-kmp_int64  __kmpc_atomic_fixed8_shr_cpt(  ident_t *id_ref, int gtid, kmp_int64  * lhs, kmp_int64  rhs, int flag);
-kmp_uint64 __kmpc_atomic_fixed8u_shr_cpt( ident_t *id_ref, int gtid, kmp_uint64 * lhs, kmp_uint64 rhs, int flag);
-kmp_int64  __kmpc_atomic_fixed8_xor_cpt(  ident_t *id_ref, int gtid, kmp_int64  * lhs, kmp_int64  rhs, int flag);
+kmp_int64 __kmpc_atomic_fixed8_andb_cpt(ident_t *id_ref, int gtid,
+                                        kmp_int64 *lhs, kmp_int64 rhs,
+                                        int flag);
+kmp_int64 __kmpc_atomic_fixed8_div_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int64 *lhs, kmp_int64 rhs, int flag);
+kmp_uint64 __kmpc_atomic_fixed8u_div_cpt(ident_t *id_ref, int gtid,
+                                         kmp_uint64 *lhs, kmp_uint64 rhs,
+                                         int flag);
+kmp_int64 __kmpc_atomic_fixed8_mul_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int64 *lhs, kmp_int64 rhs, int flag);
+kmp_int64 __kmpc_atomic_fixed8_orb_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int64 *lhs, kmp_int64 rhs, int flag);
+kmp_int64 __kmpc_atomic_fixed8_shl_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int64 *lhs, kmp_int64 rhs, int flag);
+kmp_int64 __kmpc_atomic_fixed8_shr_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int64 *lhs, kmp_int64 rhs, int flag);
+kmp_uint64 __kmpc_atomic_fixed8u_shr_cpt(ident_t *id_ref, int gtid,
+                                         kmp_uint64 *lhs, kmp_uint64 rhs,
+                                         int flag);
+kmp_int64 __kmpc_atomic_fixed8_xor_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int64 *lhs, kmp_int64 rhs, int flag);
 // 4-byte float
-kmp_real32 __kmpc_atomic_float4_div_cpt(  ident_t *id_ref, int gtid, kmp_real32 * lhs, kmp_real32 rhs, int flag);
-kmp_real32 __kmpc_atomic_float4_mul_cpt(  ident_t *id_ref, int gtid, kmp_real32 * lhs, kmp_real32 rhs, int flag);
+kmp_real32 __kmpc_atomic_float4_div_cpt(ident_t *id_ref, int gtid,
+                                        kmp_real32 *lhs, kmp_real32 rhs,
+                                        int flag);
+kmp_real32 __kmpc_atomic_float4_mul_cpt(ident_t *id_ref, int gtid,
+                                        kmp_real32 *lhs, kmp_real32 rhs,
+                                        int flag);
 // 8-byte float
-kmp_real64 __kmpc_atomic_float8_div_cpt(  ident_t *id_ref, int gtid, kmp_real64 * lhs, kmp_real64 rhs, int flag);
-kmp_real64 __kmpc_atomic_float8_mul_cpt(  ident_t *id_ref, int gtid, kmp_real64 * lhs, kmp_real64 rhs, int flag);
+kmp_real64 __kmpc_atomic_float8_div_cpt(ident_t *id_ref, int gtid,
+                                        kmp_real64 *lhs, kmp_real64 rhs,
+                                        int flag);
+kmp_real64 __kmpc_atomic_float8_mul_cpt(ident_t *id_ref, int gtid,
+                                        kmp_real64 *lhs, kmp_real64 rhs,
+                                        int flag);
 // 1-, 2-, 4-, 8-byte logical (&&, ||)
-char      __kmpc_atomic_fixed1_andl_cpt( ident_t *id_ref, int gtid, char      * lhs, char      rhs, int flag);
-char      __kmpc_atomic_fixed1_orl_cpt(  ident_t *id_ref, int gtid, char      * lhs, char      rhs, int flag);
-short     __kmpc_atomic_fixed2_andl_cpt( ident_t *id_ref, int gtid, short     * lhs, short     rhs, int flag);
-short     __kmpc_atomic_fixed2_orl_cpt(  ident_t *id_ref, int gtid, short     * lhs, short     rhs, int flag);
-kmp_int32 __kmpc_atomic_fixed4_andl_cpt( ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs, int flag);
-kmp_int32 __kmpc_atomic_fixed4_orl_cpt(  ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs, int flag);
-kmp_int64 __kmpc_atomic_fixed8_andl_cpt( ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs, int flag);
-kmp_int64 __kmpc_atomic_fixed8_orl_cpt(  ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs, int flag);
+char __kmpc_atomic_fixed1_andl_cpt(ident_t *id_ref, int gtid, char *lhs,
+                                   char rhs, int flag);
+char __kmpc_atomic_fixed1_orl_cpt(ident_t *id_ref, int gtid, char *lhs,
+                                  char rhs, int flag);
+short __kmpc_atomic_fixed2_andl_cpt(ident_t *id_ref, int gtid, short *lhs,
+                                    short rhs, int flag);
+short __kmpc_atomic_fixed2_orl_cpt(ident_t *id_ref, int gtid, short *lhs,
+                                   short rhs, int flag);
+kmp_int32 __kmpc_atomic_fixed4_andl_cpt(ident_t *id_ref, int gtid,
+                                        kmp_int32 *lhs, kmp_int32 rhs,
+                                        int flag);
+kmp_int32 __kmpc_atomic_fixed4_orl_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int32 *lhs, kmp_int32 rhs, int flag);
+kmp_int64 __kmpc_atomic_fixed8_andl_cpt(ident_t *id_ref, int gtid,
+                                        kmp_int64 *lhs, kmp_int64 rhs,
+                                        int flag);
+kmp_int64 __kmpc_atomic_fixed8_orl_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int64 *lhs, kmp_int64 rhs, int flag);
 // MIN / MAX
-char        __kmpc_atomic_fixed1_max_cpt(  ident_t *id_ref, int gtid, char      * lhs, char      rhs, int flag);
-char        __kmpc_atomic_fixed1_min_cpt(  ident_t *id_ref, int gtid, char      * lhs, char      rhs, int flag);
-short       __kmpc_atomic_fixed2_max_cpt(  ident_t *id_ref, int gtid, short     * lhs, short     rhs, int flag);
-short       __kmpc_atomic_fixed2_min_cpt(  ident_t *id_ref, int gtid, short     * lhs, short     rhs, int flag);
-kmp_int32   __kmpc_atomic_fixed4_max_cpt(  ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs, int flag);
-kmp_int32   __kmpc_atomic_fixed4_min_cpt(  ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs, int flag);
-kmp_int64   __kmpc_atomic_fixed8_max_cpt(  ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs, int flag);
-kmp_int64   __kmpc_atomic_fixed8_min_cpt(  ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs, int flag);
-kmp_real32  __kmpc_atomic_float4_max_cpt(  ident_t *id_ref, int gtid, kmp_real32 * lhs, kmp_real32 rhs, int flag);
-kmp_real32  __kmpc_atomic_float4_min_cpt(  ident_t *id_ref, int gtid, kmp_real32 * lhs, kmp_real32 rhs, int flag);
-kmp_real64  __kmpc_atomic_float8_max_cpt(  ident_t *id_ref, int gtid, kmp_real64 * lhs, kmp_real64 rhs, int flag);
-kmp_real64  __kmpc_atomic_float8_min_cpt(  ident_t *id_ref, int gtid, kmp_real64 * lhs, kmp_real64 rhs, int flag);
+char __kmpc_atomic_fixed1_max_cpt(ident_t *id_ref, int gtid, char *lhs,
+                                  char rhs, int flag);
+char __kmpc_atomic_fixed1_min_cpt(ident_t *id_ref, int gtid, char *lhs,
+                                  char rhs, int flag);
+short __kmpc_atomic_fixed2_max_cpt(ident_t *id_ref, int gtid, short *lhs,
+                                   short rhs, int flag);
+short __kmpc_atomic_fixed2_min_cpt(ident_t *id_ref, int gtid, short *lhs,
+                                   short rhs, int flag);
+kmp_int32 __kmpc_atomic_fixed4_max_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int32 *lhs, kmp_int32 rhs, int flag);
+kmp_int32 __kmpc_atomic_fixed4_min_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int32 *lhs, kmp_int32 rhs, int flag);
+kmp_int64 __kmpc_atomic_fixed8_max_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int64 *lhs, kmp_int64 rhs, int flag);
+kmp_int64 __kmpc_atomic_fixed8_min_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int64 *lhs, kmp_int64 rhs, int flag);
+kmp_real32 __kmpc_atomic_float4_max_cpt(ident_t *id_ref, int gtid,
+                                        kmp_real32 *lhs, kmp_real32 rhs,
+                                        int flag);
+kmp_real32 __kmpc_atomic_float4_min_cpt(ident_t *id_ref, int gtid,
+                                        kmp_real32 *lhs, kmp_real32 rhs,
+                                        int flag);
+kmp_real64 __kmpc_atomic_float8_max_cpt(ident_t *id_ref, int gtid,
+                                        kmp_real64 *lhs, kmp_real64 rhs,
+                                        int flag);
+kmp_real64 __kmpc_atomic_float8_min_cpt(ident_t *id_ref, int gtid,
+                                        kmp_real64 *lhs, kmp_real64 rhs,
+                                        int flag);
 #if KMP_HAVE_QUAD
-QUAD_LEGACY __kmpc_atomic_float16_max_cpt( ident_t *id_ref, int gtid, QUAD_LEGACY * lhs, QUAD_LEGACY rhs, int flag);
-QUAD_LEGACY __kmpc_atomic_float16_min_cpt( ident_t *id_ref, int gtid, QUAD_LEGACY * lhs, QUAD_LEGACY rhs, int flag);
+QUAD_LEGACY __kmpc_atomic_float16_max_cpt(ident_t *id_ref, int gtid,
+                                          QUAD_LEGACY *lhs, QUAD_LEGACY rhs,
+                                          int flag);
+QUAD_LEGACY __kmpc_atomic_float16_min_cpt(ident_t *id_ref, int gtid,
+                                          QUAD_LEGACY *lhs, QUAD_LEGACY rhs,
+                                          int flag);
 #endif
 // .NEQV. (same as xor)
-char      __kmpc_atomic_fixed1_neqv_cpt( ident_t *id_ref, int gtid, char      * lhs, char      rhs, int flag);
-short     __kmpc_atomic_fixed2_neqv_cpt( ident_t *id_ref, int gtid, short     * lhs, short     rhs, int flag);
-kmp_int32 __kmpc_atomic_fixed4_neqv_cpt( ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs, int flag);
-kmp_int64 __kmpc_atomic_fixed8_neqv_cpt( ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs, int flag);
+char __kmpc_atomic_fixed1_neqv_cpt(ident_t *id_ref, int gtid, char *lhs,
+                                   char rhs, int flag);
+short __kmpc_atomic_fixed2_neqv_cpt(ident_t *id_ref, int gtid, short *lhs,
+                                    short rhs, int flag);
+kmp_int32 __kmpc_atomic_fixed4_neqv_cpt(ident_t *id_ref, int gtid,
+                                        kmp_int32 *lhs, kmp_int32 rhs,
+                                        int flag);
+kmp_int64 __kmpc_atomic_fixed8_neqv_cpt(ident_t *id_ref, int gtid,
+                                        kmp_int64 *lhs, kmp_int64 rhs,
+                                        int flag);
 // .EQV. (same as ~xor)
-char      __kmpc_atomic_fixed1_eqv_cpt(  ident_t *id_ref, int gtid, char      * lhs, char      rhs, int flag);
-short     __kmpc_atomic_fixed2_eqv_cpt(  ident_t *id_ref, int gtid, short     * lhs, short     rhs, int flag);
-kmp_int32 __kmpc_atomic_fixed4_eqv_cpt(  ident_t *id_ref, int gtid, kmp_int32 * lhs, kmp_int32 rhs, int flag);
-kmp_int64 __kmpc_atomic_fixed8_eqv_cpt(  ident_t *id_ref, int gtid, kmp_int64 * lhs, kmp_int64 rhs, int flag);
+char __kmpc_atomic_fixed1_eqv_cpt(ident_t *id_ref, int gtid, char *lhs,
+                                  char rhs, int flag);
+short __kmpc_atomic_fixed2_eqv_cpt(ident_t *id_ref, int gtid, short *lhs,
+                                   short rhs, int flag);
+kmp_int32 __kmpc_atomic_fixed4_eqv_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int32 *lhs, kmp_int32 rhs, int flag);
+kmp_int64 __kmpc_atomic_fixed8_eqv_cpt(ident_t *id_ref, int gtid,
+                                       kmp_int64 *lhs, kmp_int64 rhs, int flag);
 // long double type
-long double __kmpc_atomic_float10_add_cpt( ident_t *id_ref, int gtid, long double * lhs, long double rhs, int flag);
-long double __kmpc_atomic_float10_sub_cpt( ident_t *id_ref, int gtid, long double * lhs, long double rhs, int flag);
-long double __kmpc_atomic_float10_mul_cpt( ident_t *id_ref, int gtid, long double * lhs, long double rhs, int flag);
-long double __kmpc_atomic_float10_div_cpt( ident_t *id_ref, int gtid, long double * lhs, long double rhs, int flag);
+long double __kmpc_atomic_float10_add_cpt(ident_t *id_ref, int gtid,
+                                          long double *lhs, long double rhs,
+                                          int flag);
+long double __kmpc_atomic_float10_sub_cpt(ident_t *id_ref, int gtid,
+                                          long double *lhs, long double rhs,
+                                          int flag);
+long double __kmpc_atomic_float10_mul_cpt(ident_t *id_ref, int gtid,
+                                          long double *lhs, long double rhs,
+                                          int flag);
+long double __kmpc_atomic_float10_div_cpt(ident_t *id_ref, int gtid,
+                                          long double *lhs, long double rhs,
+                                          int flag);
 #if KMP_HAVE_QUAD
 // _Quad type
-QUAD_LEGACY __kmpc_atomic_float16_add_cpt( ident_t *id_ref, int gtid, QUAD_LEGACY * lhs, QUAD_LEGACY rhs, int flag);
-QUAD_LEGACY __kmpc_atomic_float16_sub_cpt( ident_t *id_ref, int gtid, QUAD_LEGACY * lhs, QUAD_LEGACY rhs, int flag);
-QUAD_LEGACY __kmpc_atomic_float16_mul_cpt( ident_t *id_ref, int gtid, QUAD_LEGACY * lhs, QUAD_LEGACY rhs, int flag);
-QUAD_LEGACY __kmpc_atomic_float16_div_cpt( ident_t *id_ref, int gtid, QUAD_LEGACY * lhs, QUAD_LEGACY rhs, int flag);
+QUAD_LEGACY __kmpc_atomic_float16_add_cpt(ident_t *id_ref, int gtid,
+                                          QUAD_LEGACY *lhs, QUAD_LEGACY rhs,
+                                          int flag);
+QUAD_LEGACY __kmpc_atomic_float16_sub_cpt(ident_t *id_ref, int gtid,
+                                          QUAD_LEGACY *lhs, QUAD_LEGACY rhs,
+                                          int flag);
+QUAD_LEGACY __kmpc_atomic_float16_mul_cpt(ident_t *id_ref, int gtid,
+                                          QUAD_LEGACY *lhs, QUAD_LEGACY rhs,
+                                          int flag);
+QUAD_LEGACY __kmpc_atomic_float16_div_cpt(ident_t *id_ref, int gtid,
+                                          QUAD_LEGACY *lhs, QUAD_LEGACY rhs,
+                                          int flag);
 #endif
 // routines for complex types
-// Workaround for cmplx4 routines - return void; captured value is returned via the argument
-void __kmpc_atomic_cmplx4_add_cpt(  ident_t *id_ref, int gtid, kmp_cmplx32 * lhs, kmp_cmplx32 rhs, kmp_cmplx32 * out, int flag);
-void __kmpc_atomic_cmplx4_sub_cpt(  ident_t *id_ref, int gtid, kmp_cmplx32 * lhs, kmp_cmplx32 rhs, kmp_cmplx32 * out, int flag);
-void __kmpc_atomic_cmplx4_mul_cpt(  ident_t *id_ref, int gtid, kmp_cmplx32 * lhs, kmp_cmplx32 rhs, kmp_cmplx32 * out, int flag);
-void __kmpc_atomic_cmplx4_div_cpt(  ident_t *id_ref, int gtid, kmp_cmplx32 * lhs, kmp_cmplx32 rhs, kmp_cmplx32 * out, int flag);
-
-kmp_cmplx64 __kmpc_atomic_cmplx8_add_cpt(  ident_t *id_ref, int gtid, kmp_cmplx64 * lhs, kmp_cmplx64 rhs, int flag);
-kmp_cmplx64 __kmpc_atomic_cmplx8_sub_cpt(  ident_t *id_ref, int gtid, kmp_cmplx64 * lhs, kmp_cmplx64 rhs, int flag);
-kmp_cmplx64 __kmpc_atomic_cmplx8_mul_cpt(  ident_t *id_ref, int gtid, kmp_cmplx64 * lhs, kmp_cmplx64 rhs, int flag);
-kmp_cmplx64 __kmpc_atomic_cmplx8_div_cpt(  ident_t *id_ref, int gtid, kmp_cmplx64 * lhs, kmp_cmplx64 rhs, int flag);
-kmp_cmplx80 __kmpc_atomic_cmplx10_add_cpt( ident_t *id_ref, int gtid, kmp_cmplx80 * lhs, kmp_cmplx80 rhs, int flag);
-kmp_cmplx80 __kmpc_atomic_cmplx10_sub_cpt( ident_t *id_ref, int gtid, kmp_cmplx80 * lhs, kmp_cmplx80 rhs, int flag);
-kmp_cmplx80 __kmpc_atomic_cmplx10_mul_cpt( ident_t *id_ref, int gtid, kmp_cmplx80 * lhs, kmp_cmplx80 rhs, int flag);
-kmp_cmplx80 __kmpc_atomic_cmplx10_div_cpt( ident_t *id_ref, int gtid, kmp_cmplx80 * lhs, kmp_cmplx80 rhs, int flag);
+// Workaround for cmplx4 routines - return void; captured value is returned via
+// the argument
+void __kmpc_atomic_cmplx4_add_cpt(ident_t *id_ref, int gtid, kmp_cmplx32 *lhs,
+                                  kmp_cmplx32 rhs, kmp_cmplx32 *out, int flag);
+void __kmpc_atomic_cmplx4_sub_cpt(ident_t *id_ref, int gtid, kmp_cmplx32 *lhs,
+                                  kmp_cmplx32 rhs, kmp_cmplx32 *out, int flag);
+void __kmpc_atomic_cmplx4_mul_cpt(ident_t *id_ref, int gtid, kmp_cmplx32 *lhs,
+                                  kmp_cmplx32 rhs, kmp_cmplx32 *out, int flag);
+void __kmpc_atomic_cmplx4_div_cpt(ident_t *id_ref, int gtid, kmp_cmplx32 *lhs,
+                                  kmp_cmplx32 rhs, kmp_cmplx32 *out, int flag);
+
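A hedged illustration of the cmplx4 workaround noted above (not part of this change; `loc`, `gtid` and the variable names are placeholders): because these routines return void, the captured value comes back through the extra `out` pointer, and the `flag` argument selects whether the value before or after the update is captured.
@code
kmp_cmplx32 x, rhs, v;
// #pragma omp atomic capture
// { x = x + rhs; v = x; }
// could be lowered by a compiler that chooses not to inline it into:
__kmpc_atomic_cmplx4_add_cpt(&loc, gtid, &x, rhs, &v,
                             1 /* assumed: capture the post-update value */);
@endcode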
+kmp_cmplx64 __kmpc_atomic_cmplx8_add_cpt(ident_t *id_ref, int gtid,
+                                         kmp_cmplx64 *lhs, kmp_cmplx64 rhs,
+                                         int flag);
+kmp_cmplx64 __kmpc_atomic_cmplx8_sub_cpt(ident_t *id_ref, int gtid,
+                                         kmp_cmplx64 *lhs, kmp_cmplx64 rhs,
+                                         int flag);
+kmp_cmplx64 __kmpc_atomic_cmplx8_mul_cpt(ident_t *id_ref, int gtid,
+                                         kmp_cmplx64 *lhs, kmp_cmplx64 rhs,
+                                         int flag);
+kmp_cmplx64 __kmpc_atomic_cmplx8_div_cpt(ident_t *id_ref, int gtid,
+                                         kmp_cmplx64 *lhs, kmp_cmplx64 rhs,
+                                         int flag);
+kmp_cmplx80 __kmpc_atomic_cmplx10_add_cpt(ident_t *id_ref, int gtid,
+                                          kmp_cmplx80 *lhs, kmp_cmplx80 rhs,
+                                          int flag);
+kmp_cmplx80 __kmpc_atomic_cmplx10_sub_cpt(ident_t *id_ref, int gtid,
+                                          kmp_cmplx80 *lhs, kmp_cmplx80 rhs,
+                                          int flag);
+kmp_cmplx80 __kmpc_atomic_cmplx10_mul_cpt(ident_t *id_ref, int gtid,
+                                          kmp_cmplx80 *lhs, kmp_cmplx80 rhs,
+                                          int flag);
+kmp_cmplx80 __kmpc_atomic_cmplx10_div_cpt(ident_t *id_ref, int gtid,
+                                          kmp_cmplx80 *lhs, kmp_cmplx80 rhs,
+                                          int flag);
 #if KMP_HAVE_QUAD
-CPLX128_LEG __kmpc_atomic_cmplx16_add_cpt( ident_t *id_ref, int gtid, CPLX128_LEG * lhs, CPLX128_LEG rhs, int flag);
-CPLX128_LEG __kmpc_atomic_cmplx16_sub_cpt( ident_t *id_ref, int gtid, CPLX128_LEG * lhs, CPLX128_LEG rhs, int flag);
-CPLX128_LEG __kmpc_atomic_cmplx16_mul_cpt( ident_t *id_ref, int gtid, CPLX128_LEG * lhs, CPLX128_LEG rhs, int flag);
-CPLX128_LEG __kmpc_atomic_cmplx16_div_cpt( ident_t *id_ref, int gtid, CPLX128_LEG * lhs, CPLX128_LEG rhs, int flag);
-#if ( KMP_ARCH_X86 )
-    // Routines with 16-byte arguments aligned to 16-byte boundary
-    Quad_a16_t __kmpc_atomic_float16_add_a16_cpt( ident_t * id_ref, int gtid, Quad_a16_t * lhs, Quad_a16_t rhs, int flag);
-    Quad_a16_t __kmpc_atomic_float16_sub_a16_cpt( ident_t * id_ref, int gtid, Quad_a16_t * lhs, Quad_a16_t rhs, int flag);
-    Quad_a16_t __kmpc_atomic_float16_mul_a16_cpt( ident_t * id_ref, int gtid, Quad_a16_t * lhs, Quad_a16_t rhs, int flag);
-    Quad_a16_t __kmpc_atomic_float16_div_a16_cpt( ident_t * id_ref, int gtid, Quad_a16_t * lhs, Quad_a16_t rhs, int flag);
-    Quad_a16_t __kmpc_atomic_float16_max_a16_cpt( ident_t * id_ref, int gtid, Quad_a16_t * lhs, Quad_a16_t rhs, int flag);
-    Quad_a16_t __kmpc_atomic_float16_min_a16_cpt( ident_t * id_ref, int gtid, Quad_a16_t * lhs, Quad_a16_t rhs, int flag);
-    kmp_cmplx128_a16_t __kmpc_atomic_cmplx16_add_a16_cpt( ident_t * id_ref, int gtid, kmp_cmplx128_a16_t * lhs, kmp_cmplx128_a16_t rhs, int flag);
-    kmp_cmplx128_a16_t __kmpc_atomic_cmplx16_sub_a16_cpt( ident_t * id_ref, int gtid, kmp_cmplx128_a16_t * lhs, kmp_cmplx128_a16_t rhs, int flag);
-    kmp_cmplx128_a16_t __kmpc_atomic_cmplx16_mul_a16_cpt( ident_t * id_ref, int gtid, kmp_cmplx128_a16_t * lhs, kmp_cmplx128_a16_t rhs, int flag);
-    kmp_cmplx128_a16_t __kmpc_atomic_cmplx16_div_a16_cpt( ident_t * id_ref, int gtid, kmp_cmplx128_a16_t * lhs, kmp_cmplx128_a16_t rhs, int flag);
+CPLX128_LEG __kmpc_atomic_cmplx16_add_cpt(ident_t *id_ref, int gtid,
+                                          CPLX128_LEG *lhs, CPLX128_LEG rhs,
+                                          int flag);
+CPLX128_LEG __kmpc_atomic_cmplx16_sub_cpt(ident_t *id_ref, int gtid,
+                                          CPLX128_LEG *lhs, CPLX128_LEG rhs,
+                                          int flag);
+CPLX128_LEG __kmpc_atomic_cmplx16_mul_cpt(ident_t *id_ref, int gtid,
+                                          CPLX128_LEG *lhs, CPLX128_LEG rhs,
+                                          int flag);
+CPLX128_LEG __kmpc_atomic_cmplx16_div_cpt(ident_t *id_ref, int gtid,
+                                          CPLX128_LEG *lhs, CPLX128_LEG rhs,
+                                          int flag);
+#if (KMP_ARCH_X86)
+// Routines with 16-byte arguments aligned to 16-byte boundary
+Quad_a16_t __kmpc_atomic_float16_add_a16_cpt(ident_t *id_ref, int gtid,
+                                             Quad_a16_t *lhs, Quad_a16_t rhs,
+                                             int flag);
+Quad_a16_t __kmpc_atomic_float16_sub_a16_cpt(ident_t *id_ref, int gtid,
+                                             Quad_a16_t *lhs, Quad_a16_t rhs,
+                                             int flag);
+Quad_a16_t __kmpc_atomic_float16_mul_a16_cpt(ident_t *id_ref, int gtid,
+                                             Quad_a16_t *lhs, Quad_a16_t rhs,
+                                             int flag);
+Quad_a16_t __kmpc_atomic_float16_div_a16_cpt(ident_t *id_ref, int gtid,
+                                             Quad_a16_t *lhs, Quad_a16_t rhs,
+                                             int flag);
+Quad_a16_t __kmpc_atomic_float16_max_a16_cpt(ident_t *id_ref, int gtid,
+                                             Quad_a16_t *lhs, Quad_a16_t rhs,
+                                             int flag);
+Quad_a16_t __kmpc_atomic_float16_min_a16_cpt(ident_t *id_ref, int gtid,
+                                             Quad_a16_t *lhs, Quad_a16_t rhs,
+                                             int flag);
+kmp_cmplx128_a16_t __kmpc_atomic_cmplx16_add_a16_cpt(ident_t *id_ref, int gtid,
+                                                     kmp_cmplx128_a16_t *lhs,
+                                                     kmp_cmplx128_a16_t rhs,
+                                                     int flag);
+kmp_cmplx128_a16_t __kmpc_atomic_cmplx16_sub_a16_cpt(ident_t *id_ref, int gtid,
+                                                     kmp_cmplx128_a16_t *lhs,
+                                                     kmp_cmplx128_a16_t rhs,
+                                                     int flag);
+kmp_cmplx128_a16_t __kmpc_atomic_cmplx16_mul_a16_cpt(ident_t *id_ref, int gtid,
+                                                     kmp_cmplx128_a16_t *lhs,
+                                                     kmp_cmplx128_a16_t rhs,
+                                                     int flag);
+kmp_cmplx128_a16_t __kmpc_atomic_cmplx16_div_a16_cpt(ident_t *id_ref, int gtid,
+                                                     kmp_cmplx128_a16_t *lhs,
+                                                     kmp_cmplx128_a16_t rhs,
+                                                     int flag);
 #endif
 #endif
 
@@ -985,175 +1407,369 @@ void __kmpc_atomic_end(void);
 
 #if OMP_40_ENABLED
 
-// OpenMP 4.0: v = x = expr binop x; { v = x; x = expr binop x; } { x = expr binop x; v = x; }  for non-commutative operations.
+// OpenMP 4.0: v = x = expr binop x; { v = x; x = expr binop x; } { x = expr
+// binop x; v = x; }  for non-commutative operations.
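As a hedged sketch of where a compiler might use one of the `_cpt_rev` entry points below (variable names are illustrative; inlining remains an option), the reversed form places the shared location on the right of the operator:
@code
double v, x = 8.0;
// #pragma omp atomic capture
// { x = 2.0 - x; v = x; }
// could be lowered into:
v = __kmpc_atomic_float8_sub_cpt_rev(&loc, gtid, &x, 2.0,
                                     1 /* assumed: capture the post-update value */);
@endcode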
 
-char	       	__kmpc_atomic_fixed1_sub_cpt_rev(  ident_t *id_ref, int gtid, char * lhs, char rhs, int flag );
-char		__kmpc_atomic_fixed1_div_cpt_rev(  ident_t *id_ref, int gtid, char * lhs, char rhs, int flag );
-unsigned char 	__kmpc_atomic_fixed1u_div_cpt_rev( ident_t *id_ref, int gtid, unsigned char * lhs, unsigned char rhs, int flag );
-char 		__kmpc_atomic_fixed1_shl_cpt_rev(  ident_t *id_ref, int gtid, char * lhs, char rhs , int flag);
-char		__kmpc_atomic_fixed1_shr_cpt_rev(  ident_t *id_ref, int gtid, char * lhs, char rhs, int flag );
-unsigned char 	__kmpc_atomic_fixed1u_shr_cpt_rev( ident_t *id_ref, int gtid, unsigned char * lhs, unsigned char rhs, int flag );
-short 		__kmpc_atomic_fixed2_sub_cpt_rev(  ident_t *id_ref, int gtid, short * lhs, short rhs, int flag );
-short 		__kmpc_atomic_fixed2_div_cpt_rev(  ident_t *id_ref, int gtid, short * lhs, short rhs, int flag );
-unsigned short 	__kmpc_atomic_fixed2u_div_cpt_rev( ident_t *id_ref, int gtid, unsigned short * lhs, unsigned short rhs, int flag );
-short 		__kmpc_atomic_fixed2_shl_cpt_rev(  ident_t *id_ref, int gtid, short * lhs, short rhs, int flag );
-short 		__kmpc_atomic_fixed2_shr_cpt_rev(  ident_t *id_ref, int gtid, short * lhs, short rhs, int flag );
-unsigned short 	__kmpc_atomic_fixed2u_shr_cpt_rev( ident_t *id_ref, int gtid, unsigned short * lhs, unsigned short rhs, int flag );
-kmp_int32 	__kmpc_atomic_fixed4_sub_cpt_rev(  ident_t *id_ref, int gtid, kmp_int32  * lhs, kmp_int32  rhs, int flag );
-kmp_int32 	__kmpc_atomic_fixed4_div_cpt_rev(  ident_t *id_ref, int gtid, kmp_int32  * lhs, kmp_int32  rhs, int flag );
-kmp_uint32 	__kmpc_atomic_fixed4u_div_cpt_rev( ident_t *id_ref, int gtid, kmp_uint32 * lhs, kmp_uint32 rhs, int flag );
-kmp_int32 	__kmpc_atomic_fixed4_shl_cpt_rev(  ident_t *id_ref, int gtid, kmp_int32  * lhs, kmp_int32  rhs, int flag );
-kmp_int32 	__kmpc_atomic_fixed4_shr_cpt_rev(  ident_t *id_ref, int gtid, kmp_int32  * lhs, kmp_int32  rhs, int flag );
-kmp_uint32 	__kmpc_atomic_fixed4u_shr_cpt_rev( ident_t *id_ref, int gtid, kmp_uint32 * lhs, kmp_uint32 rhs, int flag );
-kmp_int64 	__kmpc_atomic_fixed8_sub_cpt_rev(  ident_t *id_ref, int gtid, kmp_int64  * lhs, kmp_int64  rhs, int flag );
-kmp_int64 	__kmpc_atomic_fixed8_div_cpt_rev(  ident_t *id_ref, int gtid, kmp_int64  * lhs, kmp_int64  rhs, int flag );
-kmp_uint64      __kmpc_atomic_fixed8u_div_cpt_rev( ident_t *id_ref, int gtid, kmp_uint64 * lhs, kmp_uint64 rhs, int flag );
-kmp_int64 	__kmpc_atomic_fixed8_shl_cpt_rev(  ident_t *id_ref, int gtid, kmp_int64  * lhs, kmp_int64  rhs, int flag );
-kmp_int64 	__kmpc_atomic_fixed8_shr_cpt_rev(  ident_t *id_ref, int gtid, kmp_int64  * lhs, kmp_int64  rhs, int flag );
-kmp_uint64      __kmpc_atomic_fixed8u_shr_cpt_rev( ident_t *id_ref, int gtid, kmp_uint64 * lhs, kmp_uint64 rhs, int flag );
-float 		__kmpc_atomic_float4_sub_cpt_rev(  ident_t *id_ref, int gtid, float * lhs, float rhs, int flag );
-float 		__kmpc_atomic_float4_div_cpt_rev(  ident_t *id_ref, int gtid, float * lhs, float rhs, int flag );
-double 		__kmpc_atomic_float8_sub_cpt_rev(  ident_t *id_ref, int gtid, double * lhs, double rhs, int flag );
-double 		__kmpc_atomic_float8_div_cpt_rev(  ident_t *id_ref, int gtid, double * lhs, double rhs, int flag );
-long double 	__kmpc_atomic_float10_sub_cpt_rev( ident_t *id_ref, int gtid, long double * lhs, long double rhs, int flag );
-long double 	__kmpc_atomic_float10_div_cpt_rev( ident_t *id_ref, int gtid, long double * lhs, long double rhs, int flag );
+char __kmpc_atomic_fixed1_sub_cpt_rev(ident_t *id_ref, int gtid, char *lhs,
+                                      char rhs, int flag);
+char __kmpc_atomic_fixed1_div_cpt_rev(ident_t *id_ref, int gtid, char *lhs,
+                                      char rhs, int flag);
+unsigned char __kmpc_atomic_fixed1u_div_cpt_rev(ident_t *id_ref, int gtid,
+                                                unsigned char *lhs,
+                                                unsigned char rhs, int flag);
+char __kmpc_atomic_fixed1_shl_cpt_rev(ident_t *id_ref, int gtid, char *lhs,
+                                      char rhs, int flag);
+char __kmpc_atomic_fixed1_shr_cpt_rev(ident_t *id_ref, int gtid, char *lhs,
+                                      char rhs, int flag);
+unsigned char __kmpc_atomic_fixed1u_shr_cpt_rev(ident_t *id_ref, int gtid,
+                                                unsigned char *lhs,
+                                                unsigned char rhs, int flag);
+short __kmpc_atomic_fixed2_sub_cpt_rev(ident_t *id_ref, int gtid, short *lhs,
+                                       short rhs, int flag);
+short __kmpc_atomic_fixed2_div_cpt_rev(ident_t *id_ref, int gtid, short *lhs,
+                                       short rhs, int flag);
+unsigned short __kmpc_atomic_fixed2u_div_cpt_rev(ident_t *id_ref, int gtid,
+                                                 unsigned short *lhs,
+                                                 unsigned short rhs, int flag);
+short __kmpc_atomic_fixed2_shl_cpt_rev(ident_t *id_ref, int gtid, short *lhs,
+                                       short rhs, int flag);
+short __kmpc_atomic_fixed2_shr_cpt_rev(ident_t *id_ref, int gtid, short *lhs,
+                                       short rhs, int flag);
+unsigned short __kmpc_atomic_fixed2u_shr_cpt_rev(ident_t *id_ref, int gtid,
+                                                 unsigned short *lhs,
+                                                 unsigned short rhs, int flag);
+kmp_int32 __kmpc_atomic_fixed4_sub_cpt_rev(ident_t *id_ref, int gtid,
+                                           kmp_int32 *lhs, kmp_int32 rhs,
+                                           int flag);
+kmp_int32 __kmpc_atomic_fixed4_div_cpt_rev(ident_t *id_ref, int gtid,
+                                           kmp_int32 *lhs, kmp_int32 rhs,
+                                           int flag);
+kmp_uint32 __kmpc_atomic_fixed4u_div_cpt_rev(ident_t *id_ref, int gtid,
+                                             kmp_uint32 *lhs, kmp_uint32 rhs,
+                                             int flag);
+kmp_int32 __kmpc_atomic_fixed4_shl_cpt_rev(ident_t *id_ref, int gtid,
+                                           kmp_int32 *lhs, kmp_int32 rhs,
+                                           int flag);
+kmp_int32 __kmpc_atomic_fixed4_shr_cpt_rev(ident_t *id_ref, int gtid,
+                                           kmp_int32 *lhs, kmp_int32 rhs,
+                                           int flag);
+kmp_uint32 __kmpc_atomic_fixed4u_shr_cpt_rev(ident_t *id_ref, int gtid,
+                                             kmp_uint32 *lhs, kmp_uint32 rhs,
+                                             int flag);
+kmp_int64 __kmpc_atomic_fixed8_sub_cpt_rev(ident_t *id_ref, int gtid,
+                                           kmp_int64 *lhs, kmp_int64 rhs,
+                                           int flag);
+kmp_int64 __kmpc_atomic_fixed8_div_cpt_rev(ident_t *id_ref, int gtid,
+                                           kmp_int64 *lhs, kmp_int64 rhs,
+                                           int flag);
+kmp_uint64 __kmpc_atomic_fixed8u_div_cpt_rev(ident_t *id_ref, int gtid,
+                                             kmp_uint64 *lhs, kmp_uint64 rhs,
+                                             int flag);
+kmp_int64 __kmpc_atomic_fixed8_shl_cpt_rev(ident_t *id_ref, int gtid,
+                                           kmp_int64 *lhs, kmp_int64 rhs,
+                                           int flag);
+kmp_int64 __kmpc_atomic_fixed8_shr_cpt_rev(ident_t *id_ref, int gtid,
+                                           kmp_int64 *lhs, kmp_int64 rhs,
+                                           int flag);
+kmp_uint64 __kmpc_atomic_fixed8u_shr_cpt_rev(ident_t *id_ref, int gtid,
+                                             kmp_uint64 *lhs, kmp_uint64 rhs,
+                                             int flag);
+float __kmpc_atomic_float4_sub_cpt_rev(ident_t *id_ref, int gtid, float *lhs,
+                                       float rhs, int flag);
+float __kmpc_atomic_float4_div_cpt_rev(ident_t *id_ref, int gtid, float *lhs,
+                                       float rhs, int flag);
+double __kmpc_atomic_float8_sub_cpt_rev(ident_t *id_ref, int gtid, double *lhs,
+                                        double rhs, int flag);
+double __kmpc_atomic_float8_div_cpt_rev(ident_t *id_ref, int gtid, double *lhs,
+                                        double rhs, int flag);
+long double __kmpc_atomic_float10_sub_cpt_rev(ident_t *id_ref, int gtid,
+                                              long double *lhs, long double rhs,
+                                              int flag);
+long double __kmpc_atomic_float10_div_cpt_rev(ident_t *id_ref, int gtid,
+                                              long double *lhs, long double rhs,
+                                              int flag);
 #if KMP_HAVE_QUAD
-QUAD_LEGACY	__kmpc_atomic_float16_sub_cpt_rev( ident_t *id_ref, int gtid, QUAD_LEGACY * lhs, QUAD_LEGACY rhs, int flag );
-QUAD_LEGACY	__kmpc_atomic_float16_div_cpt_rev( ident_t *id_ref, int gtid, QUAD_LEGACY * lhs, QUAD_LEGACY rhs, int flag );
-#endif
-// Workaround for cmplx4 routines - return void; captured value is returned via the argument
-void     	__kmpc_atomic_cmplx4_sub_cpt_rev(  ident_t *id_ref, int gtid, kmp_cmplx32 * lhs, kmp_cmplx32 rhs, kmp_cmplx32 * out, int flag );
-void 	        __kmpc_atomic_cmplx4_div_cpt_rev(  ident_t *id_ref, int gtid, kmp_cmplx32 * lhs, kmp_cmplx32 rhs, kmp_cmplx32 * out, int flag );
-kmp_cmplx64 	__kmpc_atomic_cmplx8_sub_cpt_rev(  ident_t *id_ref, int gtid, kmp_cmplx64 * lhs, kmp_cmplx64 rhs, int flag );
-kmp_cmplx64 	__kmpc_atomic_cmplx8_div_cpt_rev(  ident_t *id_ref, int gtid, kmp_cmplx64 * lhs, kmp_cmplx64 rhs, int flag );
-kmp_cmplx80 	__kmpc_atomic_cmplx10_sub_cpt_rev( ident_t *id_ref, int gtid, kmp_cmplx80 * lhs, kmp_cmplx80 rhs, int flag );
-kmp_cmplx80 	__kmpc_atomic_cmplx10_div_cpt_rev( ident_t *id_ref, int gtid, kmp_cmplx80 * lhs, kmp_cmplx80 rhs, int flag );
+QUAD_LEGACY __kmpc_atomic_float16_sub_cpt_rev(ident_t *id_ref, int gtid,
+                                              QUAD_LEGACY *lhs, QUAD_LEGACY rhs,
+                                              int flag);
+QUAD_LEGACY __kmpc_atomic_float16_div_cpt_rev(ident_t *id_ref, int gtid,
+                                              QUAD_LEGACY *lhs, QUAD_LEGACY rhs,
+                                              int flag);
+#endif
+// Workaround for cmplx4 routines - return void; captured value is returned via
+// the argument
+void __kmpc_atomic_cmplx4_sub_cpt_rev(ident_t *id_ref, int gtid,
+                                      kmp_cmplx32 *lhs, kmp_cmplx32 rhs,
+                                      kmp_cmplx32 *out, int flag);
+void __kmpc_atomic_cmplx4_div_cpt_rev(ident_t *id_ref, int gtid,
+                                      kmp_cmplx32 *lhs, kmp_cmplx32 rhs,
+                                      kmp_cmplx32 *out, int flag);
+kmp_cmplx64 __kmpc_atomic_cmplx8_sub_cpt_rev(ident_t *id_ref, int gtid,
+                                             kmp_cmplx64 *lhs, kmp_cmplx64 rhs,
+                                             int flag);
+kmp_cmplx64 __kmpc_atomic_cmplx8_div_cpt_rev(ident_t *id_ref, int gtid,
+                                             kmp_cmplx64 *lhs, kmp_cmplx64 rhs,
+                                             int flag);
+kmp_cmplx80 __kmpc_atomic_cmplx10_sub_cpt_rev(ident_t *id_ref, int gtid,
+                                              kmp_cmplx80 *lhs, kmp_cmplx80 rhs,
+                                              int flag);
+kmp_cmplx80 __kmpc_atomic_cmplx10_div_cpt_rev(ident_t *id_ref, int gtid,
+                                              kmp_cmplx80 *lhs, kmp_cmplx80 rhs,
+                                              int flag);
 #if KMP_HAVE_QUAD
-CPLX128_LEG  	__kmpc_atomic_cmplx16_sub_cpt_rev( ident_t *id_ref, int gtid, CPLX128_LEG * lhs, CPLX128_LEG rhs, int flag );
-CPLX128_LEG  	__kmpc_atomic_cmplx16_div_cpt_rev( ident_t *id_ref, int gtid, CPLX128_LEG * lhs, CPLX128_LEG rhs, int flag );
-#if ( KMP_ARCH_X86 )
-    Quad_a16_t 		__kmpc_atomic_float16_sub_a16_cpt_rev( ident_t * id_ref, int gtid, Quad_a16_t * lhs, Quad_a16_t rhs, int flag );
-    Quad_a16_t		__kmpc_atomic_float16_div_a16_cpt_rev( ident_t * id_ref, int gtid, Quad_a16_t * lhs, Quad_a16_t rhs, int flag );
-    kmp_cmplx128_a16_t 	__kmpc_atomic_cmplx16_sub_a16_cpt_rev( ident_t * id_ref, int gtid, kmp_cmplx128_a16_t * lhs, kmp_cmplx128_a16_t rhs, int flag );
-    kmp_cmplx128_a16_t 	__kmpc_atomic_cmplx16_div_a16_cpt_rev( ident_t * id_ref, int gtid, kmp_cmplx128_a16_t * lhs, kmp_cmplx128_a16_t rhs, int flag );
+CPLX128_LEG __kmpc_atomic_cmplx16_sub_cpt_rev(ident_t *id_ref, int gtid,
+                                              CPLX128_LEG *lhs, CPLX128_LEG rhs,
+                                              int flag);
+CPLX128_LEG __kmpc_atomic_cmplx16_div_cpt_rev(ident_t *id_ref, int gtid,
+                                              CPLX128_LEG *lhs, CPLX128_LEG rhs,
+                                              int flag);
+#if (KMP_ARCH_X86)
+Quad_a16_t __kmpc_atomic_float16_sub_a16_cpt_rev(ident_t *id_ref, int gtid,
+                                                 Quad_a16_t *lhs,
+                                                 Quad_a16_t rhs, int flag);
+Quad_a16_t __kmpc_atomic_float16_div_a16_cpt_rev(ident_t *id_ref, int gtid,
+                                                 Quad_a16_t *lhs,
+                                                 Quad_a16_t rhs, int flag);
+kmp_cmplx128_a16_t
+__kmpc_atomic_cmplx16_sub_a16_cpt_rev(ident_t *id_ref, int gtid,
+                                      kmp_cmplx128_a16_t *lhs,
+                                      kmp_cmplx128_a16_t rhs, int flag);
+kmp_cmplx128_a16_t
+__kmpc_atomic_cmplx16_div_a16_cpt_rev(ident_t *id_ref, int gtid,
+                                      kmp_cmplx128_a16_t *lhs,
+                                      kmp_cmplx128_a16_t rhs, int flag);
 #endif
 #endif
 
 //   OpenMP 4.0 Capture-write (swap): {v = x; x = expr;}
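A hedged example of the capture-write (swap) form served by the `_swp` entry points below (again, `loc`, `gtid` and the variables are placeholders):
@code
kmp_int32 v, x = 0;
// #pragma omp atomic capture
// { v = x; x = 42; }
// could be lowered into:
v = __kmpc_atomic_fixed4_swp(&loc, gtid, &x, 42);
@endcode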
-char 		__kmpc_atomic_fixed1_swp(  ident_t *id_ref, int gtid, char        * lhs, char        rhs );
-short           __kmpc_atomic_fixed2_swp(  ident_t *id_ref, int gtid, short       * lhs, short       rhs );
-kmp_int32       __kmpc_atomic_fixed4_swp(  ident_t *id_ref, int gtid, kmp_int32   * lhs, kmp_int32   rhs );
-kmp_int64 	__kmpc_atomic_fixed8_swp(  ident_t *id_ref, int gtid, kmp_int64   * lhs, kmp_int64   rhs );
-float 		__kmpc_atomic_float4_swp(  ident_t *id_ref, int gtid, float       * lhs, float  rhs );
-double		__kmpc_atomic_float8_swp(  ident_t *id_ref, int gtid, double      * lhs, double  rhs );
-long double	__kmpc_atomic_float10_swp( ident_t *id_ref, int gtid, long double * lhs, long double rhs );
+char __kmpc_atomic_fixed1_swp(ident_t *id_ref, int gtid, char *lhs, char rhs);
+short __kmpc_atomic_fixed2_swp(ident_t *id_ref, int gtid, short *lhs,
+                               short rhs);
+kmp_int32 __kmpc_atomic_fixed4_swp(ident_t *id_ref, int gtid, kmp_int32 *lhs,
+                                   kmp_int32 rhs);
+kmp_int64 __kmpc_atomic_fixed8_swp(ident_t *id_ref, int gtid, kmp_int64 *lhs,
+                                   kmp_int64 rhs);
+float __kmpc_atomic_float4_swp(ident_t *id_ref, int gtid, float *lhs,
+                               float rhs);
+double __kmpc_atomic_float8_swp(ident_t *id_ref, int gtid, double *lhs,
+                                double rhs);
+long double __kmpc_atomic_float10_swp(ident_t *id_ref, int gtid,
+                                      long double *lhs, long double rhs);
 #if KMP_HAVE_QUAD
-QUAD_LEGACY    	__kmpc_atomic_float16_swp( ident_t *id_ref, int gtid, QUAD_LEGACY * lhs, QUAD_LEGACY rhs );
+QUAD_LEGACY __kmpc_atomic_float16_swp(ident_t *id_ref, int gtid,
+                                      QUAD_LEGACY *lhs, QUAD_LEGACY rhs);
 #endif
 // !!! TODO: check if we need a workaround here
-void        	__kmpc_atomic_cmplx4_swp(  ident_t *id_ref, int gtid, kmp_cmplx32 * lhs, kmp_cmplx32 rhs, kmp_cmplx32 * out );
-//kmp_cmplx32   	__kmpc_atomic_cmplx4_swp(  ident_t *id_ref, int gtid, kmp_cmplx32 * lhs, kmp_cmplx32 rhs );
-
-kmp_cmplx64 	__kmpc_atomic_cmplx8_swp(  ident_t *id_ref, int gtid, kmp_cmplx64 * lhs, kmp_cmplx64 rhs );
-kmp_cmplx80	__kmpc_atomic_cmplx10_swp( ident_t *id_ref, int gtid, kmp_cmplx80 * lhs, kmp_cmplx80 rhs );
+void __kmpc_atomic_cmplx4_swp(ident_t *id_ref, int gtid, kmp_cmplx32 *lhs,
+                              kmp_cmplx32 rhs, kmp_cmplx32 *out);
+// kmp_cmplx32   	__kmpc_atomic_cmplx4_swp(  ident_t *id_ref, int gtid,
+// kmp_cmplx32 * lhs, kmp_cmplx32 rhs );
+
+kmp_cmplx64 __kmpc_atomic_cmplx8_swp(ident_t *id_ref, int gtid,
+                                     kmp_cmplx64 *lhs, kmp_cmplx64 rhs);
+kmp_cmplx80 __kmpc_atomic_cmplx10_swp(ident_t *id_ref, int gtid,
+                                      kmp_cmplx80 *lhs, kmp_cmplx80 rhs);
 #if KMP_HAVE_QUAD
-CPLX128_LEG 	__kmpc_atomic_cmplx16_swp( ident_t *id_ref, int gtid, CPLX128_LEG * lhs, CPLX128_LEG rhs );
-#if ( KMP_ARCH_X86 )
-    Quad_a16_t		__kmpc_atomic_float16_a16_swp( ident_t *id_ref, int gtid, Quad_a16_t * lhs, Quad_a16_t rhs );
-    kmp_cmplx128_a16_t  __kmpc_atomic_cmplx16_a16_swp( ident_t *id_ref, int gtid, kmp_cmplx128_a16_t * lhs, kmp_cmplx128_a16_t rhs );
+CPLX128_LEG __kmpc_atomic_cmplx16_swp(ident_t *id_ref, int gtid,
+                                      CPLX128_LEG *lhs, CPLX128_LEG rhs);
+#if (KMP_ARCH_X86)
+Quad_a16_t __kmpc_atomic_float16_a16_swp(ident_t *id_ref, int gtid,
+                                         Quad_a16_t *lhs, Quad_a16_t rhs);
+kmp_cmplx128_a16_t __kmpc_atomic_cmplx16_a16_swp(ident_t *id_ref, int gtid,
+                                                 kmp_cmplx128_a16_t *lhs,
+                                                 kmp_cmplx128_a16_t rhs);
 #endif
 #endif
 
 // Capture routines for mixed types (RHS=float16)
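For these mixed-type `_fp` captures the right-hand side is a `_Quad` while the updated location keeps its own type; a hedged sketch (placeholder names, and assuming a nonzero `flag` selects the post-update value):
@code
kmp_int32 v, x = 0;
_Quad q = 1.5;
// #pragma omp atomic capture
// { x = x + q; v = x; }
// could be lowered into:
v = __kmpc_atomic_fixed4_add_cpt_fp(&loc, gtid, &x, q, 1);
@endcode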
 #if KMP_HAVE_QUAD
 
-char __kmpc_atomic_fixed1_add_cpt_fp(  ident_t *id_ref, int gtid, char * lhs, _Quad rhs, int flag );
-char __kmpc_atomic_fixed1_sub_cpt_fp(  ident_t *id_ref, int gtid, char * lhs, _Quad rhs, int flag );
-char __kmpc_atomic_fixed1_mul_cpt_fp(  ident_t *id_ref, int gtid, char * lhs, _Quad rhs, int flag );
-char __kmpc_atomic_fixed1_div_cpt_fp(  ident_t *id_ref, int gtid, char * lhs, _Quad rhs, int flag );
-unsigned char  __kmpc_atomic_fixed1u_add_cpt_fp( ident_t *id_ref, int gtid, unsigned char * lhs, _Quad rhs, int flag );
-unsigned char __kmpc_atomic_fixed1u_sub_cpt_fp( ident_t *id_ref, int gtid, unsigned char * lhs, _Quad rhs, int flag );
-unsigned char __kmpc_atomic_fixed1u_mul_cpt_fp( ident_t *id_ref, int gtid, unsigned char * lhs, _Quad rhs, int flag );
-unsigned char __kmpc_atomic_fixed1u_div_cpt_fp( ident_t *id_ref, int gtid, unsigned char * lhs, _Quad rhs, int flag );
-
-short __kmpc_atomic_fixed2_add_cpt_fp(  ident_t *id_ref, int gtid, short * lhs, _Quad rhs, int flag );
-short __kmpc_atomic_fixed2_sub_cpt_fp(  ident_t *id_ref, int gtid, short * lhs, _Quad rhs, int flag );
-short __kmpc_atomic_fixed2_mul_cpt_fp(  ident_t *id_ref, int gtid, short * lhs, _Quad rhs, int flag );
-short __kmpc_atomic_fixed2_div_cpt_fp(  ident_t *id_ref, int gtid, short * lhs, _Quad rhs, int flag );
-unsigned short __kmpc_atomic_fixed2u_add_cpt_fp( ident_t *id_ref, int gtid, unsigned short * lhs, _Quad rhs, int flag );
-unsigned short __kmpc_atomic_fixed2u_sub_cpt_fp( ident_t *id_ref, int gtid, unsigned short * lhs, _Quad rhs, int flag );
-unsigned short __kmpc_atomic_fixed2u_mul_cpt_fp( ident_t *id_ref, int gtid, unsigned short * lhs, _Quad rhs, int flag );
-unsigned short __kmpc_atomic_fixed2u_div_cpt_fp( ident_t *id_ref, int gtid, unsigned short * lhs, _Quad rhs, int flag );
-
-kmp_int32 __kmpc_atomic_fixed4_add_cpt_fp(  ident_t *id_ref, int gtid, kmp_int32 * lhs, _Quad rhs, int flag );
-kmp_int32 __kmpc_atomic_fixed4_sub_cpt_fp(  ident_t *id_ref, int gtid, kmp_int32 * lhs, _Quad rhs, int flag );
-kmp_int32 __kmpc_atomic_fixed4_mul_cpt_fp(  ident_t *id_ref, int gtid, kmp_int32 * lhs, _Quad rhs, int flag );
-kmp_int32 __kmpc_atomic_fixed4_div_cpt_fp(  ident_t *id_ref, int gtid, kmp_int32 * lhs, _Quad rhs, int flag );
-kmp_uint32 __kmpc_atomic_fixed4u_add_cpt_fp( ident_t *id_ref, int gtid, kmp_uint32 * lhs, _Quad rhs, int flag );
-kmp_uint32 __kmpc_atomic_fixed4u_sub_cpt_fp( ident_t *id_ref, int gtid, kmp_uint32 * lhs, _Quad rhs, int flag );
-kmp_uint32 __kmpc_atomic_fixed4u_mul_cpt_fp( ident_t *id_ref, int gtid, kmp_uint32 * lhs, _Quad rhs, int flag );
-kmp_uint32 __kmpc_atomic_fixed4u_div_cpt_fp( ident_t *id_ref, int gtid, kmp_uint32 * lhs, _Quad rhs, int flag );
-
-kmp_int64 __kmpc_atomic_fixed8_add_cpt_fp(  ident_t *id_ref, int gtid, kmp_int64 * lhs, _Quad rhs, int flag );
-kmp_int64 __kmpc_atomic_fixed8_sub_cpt_fp(  ident_t *id_ref, int gtid, kmp_int64 * lhs, _Quad rhs, int flag );
-kmp_int64 __kmpc_atomic_fixed8_mul_cpt_fp(  ident_t *id_ref, int gtid, kmp_int64 * lhs, _Quad rhs, int flag );
-kmp_int64 __kmpc_atomic_fixed8_div_cpt_fp(  ident_t *id_ref, int gtid, kmp_int64 * lhs, _Quad rhs, int flag );
-kmp_uint64 __kmpc_atomic_fixed8u_add_cpt_fp( ident_t *id_ref, int gtid, kmp_uint64 * lhs, _Quad rhs, int flag );
-kmp_uint64 __kmpc_atomic_fixed8u_sub_cpt_fp( ident_t *id_ref, int gtid, kmp_uint64 * lhs, _Quad rhs, int flag );
-kmp_uint64 __kmpc_atomic_fixed8u_mul_cpt_fp( ident_t *id_ref, int gtid, kmp_uint64 * lhs, _Quad rhs, int flag );
-kmp_uint64 __kmpc_atomic_fixed8u_div_cpt_fp( ident_t *id_ref, int gtid, kmp_uint64 * lhs, _Quad rhs, int flag );
-
-float __kmpc_atomic_float4_add_cpt_fp(  ident_t *id_ref, int gtid, kmp_real32 * lhs, _Quad rhs, int flag );
-float __kmpc_atomic_float4_sub_cpt_fp(  ident_t *id_ref, int gtid, kmp_real32 * lhs, _Quad rhs, int flag );
-float __kmpc_atomic_float4_mul_cpt_fp(  ident_t *id_ref, int gtid, kmp_real32 * lhs, _Quad rhs, int flag );
-float __kmpc_atomic_float4_div_cpt_fp(  ident_t *id_ref, int gtid, kmp_real32 * lhs, _Quad rhs, int flag );
-
-double __kmpc_atomic_float8_add_cpt_fp(  ident_t *id_ref, int gtid, kmp_real64 * lhs, _Quad rhs, int flag );
-double __kmpc_atomic_float8_sub_cpt_fp(  ident_t *id_ref, int gtid, kmp_real64 * lhs, _Quad rhs, int flag );
-double __kmpc_atomic_float8_mul_cpt_fp(  ident_t *id_ref, int gtid, kmp_real64 * lhs, _Quad rhs, int flag );
-double __kmpc_atomic_float8_div_cpt_fp(  ident_t *id_ref, int gtid, kmp_real64 * lhs, _Quad rhs, int flag );
-
-long double __kmpc_atomic_float10_add_cpt_fp( ident_t *id_ref, int gtid, long double * lhs, _Quad rhs, int flag );
-long double __kmpc_atomic_float10_sub_cpt_fp( ident_t *id_ref, int gtid, long double * lhs, _Quad rhs, int flag );
-long double __kmpc_atomic_float10_mul_cpt_fp( ident_t *id_ref, int gtid, long double * lhs, _Quad rhs, int flag );
-long double __kmpc_atomic_float10_div_cpt_fp( ident_t *id_ref, int gtid, long double * lhs, _Quad rhs, int flag );
-
-char            __kmpc_atomic_fixed1_sub_cpt_rev_fp(  ident_t *id_ref, int gtid, char * lhs, _Quad rhs, int flag );
-unsigned char   __kmpc_atomic_fixed1u_sub_cpt_rev_fp( ident_t *id_ref, int gtid, unsigned char * lhs, _Quad rhs, int flag );
-char            __kmpc_atomic_fixed1_div_cpt_rev_fp(  ident_t *id_ref, int gtid, char * lhs, _Quad rhs, int flag );
-unsigned char   __kmpc_atomic_fixed1u_div_cpt_rev_fp( ident_t *id_ref, int gtid, unsigned char * lhs, _Quad rhs, int flag );
-short           __kmpc_atomic_fixed2_sub_cpt_rev_fp(  ident_t *id_ref, int gtid, short * lhs, _Quad rhs, int flag );
-unsigned short  __kmpc_atomic_fixed2u_sub_cpt_rev_fp( ident_t *id_ref, int gtid, unsigned short * lhs, _Quad rhs, int flag );
-short           __kmpc_atomic_fixed2_div_cpt_rev_fp(  ident_t *id_ref, int gtid, short * lhs, _Quad rhs, int flag );
-unsigned short  __kmpc_atomic_fixed2u_div_cpt_rev_fp( ident_t *id_ref, int gtid, unsigned short * lhs, _Quad rhs, int flag );
-kmp_int32       __kmpc_atomic_fixed4_sub_cpt_rev_fp(  ident_t *id_ref, int gtid, kmp_int32  * lhs, _Quad  rhs, int flag );
-kmp_uint32      __kmpc_atomic_fixed4u_sub_cpt_rev_fp( ident_t *id_ref, int gtid, kmp_uint32 * lhs, _Quad rhs, int flag );
-kmp_int32       __kmpc_atomic_fixed4_div_cpt_rev_fp(  ident_t *id_ref, int gtid, kmp_int32  * lhs, _Quad  rhs, int flag );
-kmp_uint32      __kmpc_atomic_fixed4u_div_cpt_rev_fp( ident_t *id_ref, int gtid, kmp_uint32 * lhs, _Quad rhs, int flag );
-kmp_int64       __kmpc_atomic_fixed8_sub_cpt_rev_fp(  ident_t *id_ref, int gtid, kmp_int64  * lhs, _Quad  rhs, int flag );
-kmp_uint64      __kmpc_atomic_fixed8u_sub_cpt_rev_fp( ident_t *id_ref, int gtid, kmp_uint64 * lhs, _Quad rhs, int flag );
-kmp_int64       __kmpc_atomic_fixed8_div_cpt_rev_fp(  ident_t *id_ref, int gtid, kmp_int64  * lhs, _Quad  rhs, int flag );
-kmp_uint64      __kmpc_atomic_fixed8u_div_cpt_rev_fp( ident_t *id_ref, int gtid, kmp_uint64 * lhs, _Quad rhs, int flag );
-float           __kmpc_atomic_float4_sub_cpt_rev_fp(  ident_t *id_ref, int gtid, float * lhs, _Quad rhs, int flag );
-float           __kmpc_atomic_float4_div_cpt_rev_fp(  ident_t *id_ref, int gtid, float * lhs, _Quad rhs, int flag );
-double          __kmpc_atomic_float8_sub_cpt_rev_fp(  ident_t *id_ref, int gtid, double * lhs, _Quad rhs, int flag );
-double          __kmpc_atomic_float8_div_cpt_rev_fp(  ident_t *id_ref, int gtid, double * lhs, _Quad rhs, int flag );
-long double     __kmpc_atomic_float10_sub_cpt_rev_fp( ident_t *id_ref, int gtid, long double * lhs, _Quad rhs, int flag );
-long double     __kmpc_atomic_float10_div_cpt_rev_fp( ident_t *id_ref, int gtid, long double * lhs, _Quad rhs, int flag );
+char __kmpc_atomic_fixed1_add_cpt_fp(ident_t *id_ref, int gtid, char *lhs,
+                                     _Quad rhs, int flag);
+char __kmpc_atomic_fixed1_sub_cpt_fp(ident_t *id_ref, int gtid, char *lhs,
+                                     _Quad rhs, int flag);
+char __kmpc_atomic_fixed1_mul_cpt_fp(ident_t *id_ref, int gtid, char *lhs,
+                                     _Quad rhs, int flag);
+char __kmpc_atomic_fixed1_div_cpt_fp(ident_t *id_ref, int gtid, char *lhs,
+                                     _Quad rhs, int flag);
+unsigned char __kmpc_atomic_fixed1u_add_cpt_fp(ident_t *id_ref, int gtid,
+                                               unsigned char *lhs, _Quad rhs,
+                                               int flag);
+unsigned char __kmpc_atomic_fixed1u_sub_cpt_fp(ident_t *id_ref, int gtid,
+                                               unsigned char *lhs, _Quad rhs,
+                                               int flag);
+unsigned char __kmpc_atomic_fixed1u_mul_cpt_fp(ident_t *id_ref, int gtid,
+                                               unsigned char *lhs, _Quad rhs,
+                                               int flag);
+unsigned char __kmpc_atomic_fixed1u_div_cpt_fp(ident_t *id_ref, int gtid,
+                                               unsigned char *lhs, _Quad rhs,
+                                               int flag);
+
+short __kmpc_atomic_fixed2_add_cpt_fp(ident_t *id_ref, int gtid, short *lhs,
+                                      _Quad rhs, int flag);
+short __kmpc_atomic_fixed2_sub_cpt_fp(ident_t *id_ref, int gtid, short *lhs,
+                                      _Quad rhs, int flag);
+short __kmpc_atomic_fixed2_mul_cpt_fp(ident_t *id_ref, int gtid, short *lhs,
+                                      _Quad rhs, int flag);
+short __kmpc_atomic_fixed2_div_cpt_fp(ident_t *id_ref, int gtid, short *lhs,
+                                      _Quad rhs, int flag);
+unsigned short __kmpc_atomic_fixed2u_add_cpt_fp(ident_t *id_ref, int gtid,
+                                                unsigned short *lhs, _Quad rhs,
+                                                int flag);
+unsigned short __kmpc_atomic_fixed2u_sub_cpt_fp(ident_t *id_ref, int gtid,
+                                                unsigned short *lhs, _Quad rhs,
+                                                int flag);
+unsigned short __kmpc_atomic_fixed2u_mul_cpt_fp(ident_t *id_ref, int gtid,
+                                                unsigned short *lhs, _Quad rhs,
+                                                int flag);
+unsigned short __kmpc_atomic_fixed2u_div_cpt_fp(ident_t *id_ref, int gtid,
+                                                unsigned short *lhs, _Quad rhs,
+                                                int flag);
+
+kmp_int32 __kmpc_atomic_fixed4_add_cpt_fp(ident_t *id_ref, int gtid,
+                                          kmp_int32 *lhs, _Quad rhs, int flag);
+kmp_int32 __kmpc_atomic_fixed4_sub_cpt_fp(ident_t *id_ref, int gtid,
+                                          kmp_int32 *lhs, _Quad rhs, int flag);
+kmp_int32 __kmpc_atomic_fixed4_mul_cpt_fp(ident_t *id_ref, int gtid,
+                                          kmp_int32 *lhs, _Quad rhs, int flag);
+kmp_int32 __kmpc_atomic_fixed4_div_cpt_fp(ident_t *id_ref, int gtid,
+                                          kmp_int32 *lhs, _Quad rhs, int flag);
+kmp_uint32 __kmpc_atomic_fixed4u_add_cpt_fp(ident_t *id_ref, int gtid,
+                                            kmp_uint32 *lhs, _Quad rhs,
+                                            int flag);
+kmp_uint32 __kmpc_atomic_fixed4u_sub_cpt_fp(ident_t *id_ref, int gtid,
+                                            kmp_uint32 *lhs, _Quad rhs,
+                                            int flag);
+kmp_uint32 __kmpc_atomic_fixed4u_mul_cpt_fp(ident_t *id_ref, int gtid,
+                                            kmp_uint32 *lhs, _Quad rhs,
+                                            int flag);
+kmp_uint32 __kmpc_atomic_fixed4u_div_cpt_fp(ident_t *id_ref, int gtid,
+                                            kmp_uint32 *lhs, _Quad rhs,
+                                            int flag);
+
+kmp_int64 __kmpc_atomic_fixed8_add_cpt_fp(ident_t *id_ref, int gtid,
+                                          kmp_int64 *lhs, _Quad rhs, int flag);
+kmp_int64 __kmpc_atomic_fixed8_sub_cpt_fp(ident_t *id_ref, int gtid,
+                                          kmp_int64 *lhs, _Quad rhs, int flag);
+kmp_int64 __kmpc_atomic_fixed8_mul_cpt_fp(ident_t *id_ref, int gtid,
+                                          kmp_int64 *lhs, _Quad rhs, int flag);
+kmp_int64 __kmpc_atomic_fixed8_div_cpt_fp(ident_t *id_ref, int gtid,
+                                          kmp_int64 *lhs, _Quad rhs, int flag);
+kmp_uint64 __kmpc_atomic_fixed8u_add_cpt_fp(ident_t *id_ref, int gtid,
+                                            kmp_uint64 *lhs, _Quad rhs,
+                                            int flag);
+kmp_uint64 __kmpc_atomic_fixed8u_sub_cpt_fp(ident_t *id_ref, int gtid,
+                                            kmp_uint64 *lhs, _Quad rhs,
+                                            int flag);
+kmp_uint64 __kmpc_atomic_fixed8u_mul_cpt_fp(ident_t *id_ref, int gtid,
+                                            kmp_uint64 *lhs, _Quad rhs,
+                                            int flag);
+kmp_uint64 __kmpc_atomic_fixed8u_div_cpt_fp(ident_t *id_ref, int gtid,
+                                            kmp_uint64 *lhs, _Quad rhs,
+                                            int flag);
+
+float __kmpc_atomic_float4_add_cpt_fp(ident_t *id_ref, int gtid,
+                                      kmp_real32 *lhs, _Quad rhs, int flag);
+float __kmpc_atomic_float4_sub_cpt_fp(ident_t *id_ref, int gtid,
+                                      kmp_real32 *lhs, _Quad rhs, int flag);
+float __kmpc_atomic_float4_mul_cpt_fp(ident_t *id_ref, int gtid,
+                                      kmp_real32 *lhs, _Quad rhs, int flag);
+float __kmpc_atomic_float4_div_cpt_fp(ident_t *id_ref, int gtid,
+                                      kmp_real32 *lhs, _Quad rhs, int flag);
+
+double __kmpc_atomic_float8_add_cpt_fp(ident_t *id_ref, int gtid,
+                                       kmp_real64 *lhs, _Quad rhs, int flag);
+double __kmpc_atomic_float8_sub_cpt_fp(ident_t *id_ref, int gtid,
+                                       kmp_real64 *lhs, _Quad rhs, int flag);
+double __kmpc_atomic_float8_mul_cpt_fp(ident_t *id_ref, int gtid,
+                                       kmp_real64 *lhs, _Quad rhs, int flag);
+double __kmpc_atomic_float8_div_cpt_fp(ident_t *id_ref, int gtid,
+                                       kmp_real64 *lhs, _Quad rhs, int flag);
+
+long double __kmpc_atomic_float10_add_cpt_fp(ident_t *id_ref, int gtid,
+                                             long double *lhs, _Quad rhs,
+                                             int flag);
+long double __kmpc_atomic_float10_sub_cpt_fp(ident_t *id_ref, int gtid,
+                                             long double *lhs, _Quad rhs,
+                                             int flag);
+long double __kmpc_atomic_float10_mul_cpt_fp(ident_t *id_ref, int gtid,
+                                             long double *lhs, _Quad rhs,
+                                             int flag);
+long double __kmpc_atomic_float10_div_cpt_fp(ident_t *id_ref, int gtid,
+                                             long double *lhs, _Quad rhs,
+                                             int flag);
+
+char __kmpc_atomic_fixed1_sub_cpt_rev_fp(ident_t *id_ref, int gtid, char *lhs,
+                                         _Quad rhs, int flag);
+unsigned char __kmpc_atomic_fixed1u_sub_cpt_rev_fp(ident_t *id_ref, int gtid,
+                                                   unsigned char *lhs,
+                                                   _Quad rhs, int flag);
+char __kmpc_atomic_fixed1_div_cpt_rev_fp(ident_t *id_ref, int gtid, char *lhs,
+                                         _Quad rhs, int flag);
+unsigned char __kmpc_atomic_fixed1u_div_cpt_rev_fp(ident_t *id_ref, int gtid,
+                                                   unsigned char *lhs,
+                                                   _Quad rhs, int flag);
+short __kmpc_atomic_fixed2_sub_cpt_rev_fp(ident_t *id_ref, int gtid, short *lhs,
+                                          _Quad rhs, int flag);
+unsigned short __kmpc_atomic_fixed2u_sub_cpt_rev_fp(ident_t *id_ref, int gtid,
+                                                    unsigned short *lhs,
+                                                    _Quad rhs, int flag);
+short __kmpc_atomic_fixed2_div_cpt_rev_fp(ident_t *id_ref, int gtid, short *lhs,
+                                          _Quad rhs, int flag);
+unsigned short __kmpc_atomic_fixed2u_div_cpt_rev_fp(ident_t *id_ref, int gtid,
+                                                    unsigned short *lhs,
+                                                    _Quad rhs, int flag);
+kmp_int32 __kmpc_atomic_fixed4_sub_cpt_rev_fp(ident_t *id_ref, int gtid,
+                                              kmp_int32 *lhs, _Quad rhs,
+                                              int flag);
+kmp_uint32 __kmpc_atomic_fixed4u_sub_cpt_rev_fp(ident_t *id_ref, int gtid,
+                                                kmp_uint32 *lhs, _Quad rhs,
+                                                int flag);
+kmp_int32 __kmpc_atomic_fixed4_div_cpt_rev_fp(ident_t *id_ref, int gtid,
+                                              kmp_int32 *lhs, _Quad rhs,
+                                              int flag);
+kmp_uint32 __kmpc_atomic_fixed4u_div_cpt_rev_fp(ident_t *id_ref, int gtid,
+                                                kmp_uint32 *lhs, _Quad rhs,
+                                                int flag);
+kmp_int64 __kmpc_atomic_fixed8_sub_cpt_rev_fp(ident_t *id_ref, int gtid,
+                                              kmp_int64 *lhs, _Quad rhs,
+                                              int flag);
+kmp_uint64 __kmpc_atomic_fixed8u_sub_cpt_rev_fp(ident_t *id_ref, int gtid,
+                                                kmp_uint64 *lhs, _Quad rhs,
+                                                int flag);
+kmp_int64 __kmpc_atomic_fixed8_div_cpt_rev_fp(ident_t *id_ref, int gtid,
+                                              kmp_int64 *lhs, _Quad rhs,
+                                              int flag);
+kmp_uint64 __kmpc_atomic_fixed8u_div_cpt_rev_fp(ident_t *id_ref, int gtid,
+                                                kmp_uint64 *lhs, _Quad rhs,
+                                                int flag);
+float __kmpc_atomic_float4_sub_cpt_rev_fp(ident_t *id_ref, int gtid, float *lhs,
+                                          _Quad rhs, int flag);
+float __kmpc_atomic_float4_div_cpt_rev_fp(ident_t *id_ref, int gtid, float *lhs,
+                                          _Quad rhs, int flag);
+double __kmpc_atomic_float8_sub_cpt_rev_fp(ident_t *id_ref, int gtid,
+                                           double *lhs, _Quad rhs, int flag);
+double __kmpc_atomic_float8_div_cpt_rev_fp(ident_t *id_ref, int gtid,
+                                           double *lhs, _Quad rhs, int flag);
+long double __kmpc_atomic_float10_sub_cpt_rev_fp(ident_t *id_ref, int gtid,
+                                                 long double *lhs, _Quad rhs,
+                                                 int flag);
+long double __kmpc_atomic_float10_div_cpt_rev_fp(ident_t *id_ref, int gtid,
+                                                 long double *lhs, _Quad rhs,
+                                                 int flag);
 
 #endif // KMP_HAVE_QUAD
 
 // End of OpenMP 4.0 capture
 
-#endif //OMP_40_ENABLED
+#endif // OMP_40_ENABLED
 
-#endif //KMP_ARCH_X86 || KMP_ARCH_X86_64
+#endif // KMP_ARCH_X86 || KMP_ARCH_X86_64
 
 /* ------------------------------------------------------------------------ */
-/* ------------------------------------------------------------------------ */
 
 #ifdef __cplusplus
-    } // extern "C"
+} // extern "C"
 #endif
 
 #endif /* KMP_ATOMIC_H */
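
A note on the reformatted _cpt_fp prototypes above: each of these entry points
atomically applies its operation to *lhs with a _Quad right-hand side and
returns either the value captured before the update or the value after it,
selected by the trailing flag argument. What follows is only a minimal
standalone sketch of that contract, not the runtime implementation: it uses a
mutex in place of the runtime's lock machinery, long double in place of _Quad,
a made-up function name, and the assumed convention that a nonzero flag
returns the updated value.

#include <cstdint>
#include <mutex>

static std::mutex g_lock; // stand-in for the runtime's per-operation locking

// Hypothetical sketch of the __kmpc_atomic_fixed4_add_cpt_fp contract:
// atomically perform *lhs = *lhs + rhs and return the old or new value.
std::int32_t atomic_fixed4_add_cpt_fp_sketch(std::int32_t *lhs,
                                             long double rhs, int flag) {
  std::lock_guard<std::mutex> guard(g_lock);
  std::int32_t old_val = *lhs;
  std::int32_t new_val = (std::int32_t)(old_val + rhs); // mixed-precision add
  *lhs = new_val;
  return flag ? new_val : old_val; // assumed: nonzero flag selects the new value
}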

Modified: openmp/trunk/runtime/src/kmp_barrier.cpp
URL: http://llvm.org/viewvc/llvm-project/openmp/trunk/runtime/src/kmp_barrier.cpp?rev=302929&r1=302928&r2=302929&view=diff
==============================================================================
--- openmp/trunk/runtime/src/kmp_barrier.cpp (original)
+++ openmp/trunk/runtime/src/kmp_barrier.cpp Fri May 12 13:01:32 2017
@@ -15,9 +15,9 @@
 
 #include "kmp.h"
 #include "kmp_wait_release.h"
-#include "kmp_stats.h"
 #include "kmp_itt.h"
 #include "kmp_os.h"
+#include "kmp_stats.h"
 
 
 #if KMP_MIC
@@ -29,15 +29,15 @@
 
 #if KMP_MIC && USE_NGO_STORES
 // ICV copying
-#define ngo_load(src)            __m512d Vt = _mm512_load_pd((void *)(src))
+#define ngo_load(src) __m512d Vt = _mm512_load_pd((void *)(src))
 #define ngo_store_icvs(dst, src) _mm512_storenrngo_pd((void *)(dst), Vt)
-#define ngo_store_go(dst, src)   _mm512_storenrngo_pd((void *)(dst), Vt)
-#define ngo_sync()               __asm__ volatile ("lock; addl $0,0(%%rsp)" ::: "memory")
+#define ngo_store_go(dst, src) _mm512_storenrngo_pd((void *)(dst), Vt)
+#define ngo_sync() __asm__ volatile("lock; addl $0,0(%%rsp)" ::: "memory")
 #else
-#define ngo_load(src)            ((void)0)
+#define ngo_load(src) ((void)0)
 #define ngo_store_icvs(dst, src) copy_icvs((dst), (src))
-#define ngo_store_go(dst, src)   KMP_MEMCPY((dst), (src), CACHE_LINE)
-#define ngo_sync()               ((void)0)
+#define ngo_store_go(dst, src) KMP_MEMCPY((dst), (src), CACHE_LINE)
+#define ngo_sync() ((void)0)
 #endif /* KMP_MIC && USE_NGO_STORES */
 
 void __kmp_print_structure(void); // Forward declaration
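
For readers of the ngo_* changes above: with KMP_MIC and USE_NGO_STORES the
macros broadcast a cache line of ICVs with 512-bit non-globally-ordered stores
and then fence, while the portable fallback is an ordinary copy (copy_icvs /
KMP_MEMCPY) with no-op load and sync. The sketch below only shows how the
fallback path is meant to be used in a broadcast loop, with a hypothetical
icvs_t standing in for the runtime's internal-control structure.

struct icvs_t { int nthreads, dynamic, max_active_levels; }; // hypothetical stand-in

#define ngo_load(src) ((void)0)                    // fallback: no vector load
#define ngo_store_icvs(dst, src) (*(dst) = *(src)) // fallback: plain struct copy
#define ngo_sync() ((void)0)                       // fallback: nothing to fence

// Master copies its ICVs (slot 0) to every worker slot, mirroring the
// KMP_BARRIER_ICV_PUSH loops later in this file.
void broadcast_icvs(icvs_t *team_icvs, int nproc) {
  ngo_load(&team_icvs[0]);
  for (int i = 1; i < nproc; ++i)
    ngo_store_icvs(&team_icvs[i], &team_icvs[0]);
  ngo_sync();
}
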
@@ -45,1785 +45,1966 @@ void __kmp_print_structure(void); // For
 // ---------------------------- Barrier Algorithms ----------------------------
 
 // Linear Barrier
-static void
-__kmp_linear_barrier_gather(enum barrier_type bt, kmp_info_t *this_thr, int gtid, int tid,
-                            void (*reduce)(void *, void *)
-                            USE_ITT_BUILD_ARG(void * itt_sync_obj) )
-{
-    KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(KMP_linear_gather);
-    register kmp_team_t *team = this_thr->th.th_team;
-    register kmp_bstate_t *thr_bar = & this_thr->th.th_bar[bt].bb;
-    register kmp_info_t **other_threads = team->t.t_threads;
-
-    KA_TRACE(20, ("__kmp_linear_barrier_gather: T#%d(%d:%d) enter for barrier type %d\n",
-                  gtid, team->t.t_id, tid, bt));
-    KMP_DEBUG_ASSERT(this_thr == other_threads[this_thr->th.th_info.ds.ds_tid]);
+static void __kmp_linear_barrier_gather(
+    enum barrier_type bt, kmp_info_t *this_thr, int gtid, int tid,
+    void (*reduce)(void *, void *) USE_ITT_BUILD_ARG(void *itt_sync_obj)) {
+  KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(KMP_linear_gather);
+  register kmp_team_t *team = this_thr->th.th_team;
+  register kmp_bstate_t *thr_bar = &this_thr->th.th_bar[bt].bb;
+  register kmp_info_t **other_threads = team->t.t_threads;
+
+  KA_TRACE(
+      20,
+      ("__kmp_linear_barrier_gather: T#%d(%d:%d) enter for barrier type %d\n",
+       gtid, team->t.t_id, tid, bt));
+  KMP_DEBUG_ASSERT(this_thr == other_threads[this_thr->th.th_info.ds.ds_tid]);
 
 #if USE_ITT_BUILD && USE_ITT_NOTIFY
-    // Barrier imbalance - save arrive time to the thread
-    if(__kmp_forkjoin_frames_mode == 3 || __kmp_forkjoin_frames_mode == 2) {
-        this_thr->th.th_bar_arrive_time = this_thr->th.th_bar_min_time = __itt_get_timestamp();
-    }
-#endif
-    // We now perform a linear reduction to signal that all of the threads have arrived.
-    if (!KMP_MASTER_TID(tid)) {
-        KA_TRACE(20, ("__kmp_linear_barrier_gather: T#%d(%d:%d) releasing T#%d(%d:%d)"
-                      "arrived(%p): %llu => %llu\n", gtid, team->t.t_id, tid,
-                      __kmp_gtid_from_tid(0, team), team->t.t_id, 0, &thr_bar->b_arrived,
-                      thr_bar->b_arrived, thr_bar->b_arrived + KMP_BARRIER_STATE_BUMP));
-        // Mark arrival to master thread
-        /* After performing this write, a worker thread may not assume that the team is valid
-           any more - it could be deallocated by the master thread at any time. */
-        ANNOTATE_BARRIER_BEGIN(this_thr);
-        kmp_flag_64 flag(&thr_bar->b_arrived, other_threads[0]);
-        flag.release();
-    } else {
-        register kmp_balign_team_t *team_bar = &team->t.t_bar[bt];
-        register int nproc = this_thr->th.th_team_nproc;
-        register int i;
-        // Don't have to worry about sleep bit here or atomic since team setting
-        register kmp_uint64 new_state = team_bar->b_arrived + KMP_BARRIER_STATE_BUMP;
+  // Barrier imbalance - save arrive time to the thread
+  if (__kmp_forkjoin_frames_mode == 3 || __kmp_forkjoin_frames_mode == 2) {
+    this_thr->th.th_bar_arrive_time = this_thr->th.th_bar_min_time =
+        __itt_get_timestamp();
+  }
+#endif
+  // We now perform a linear reduction to signal that all of the threads have
+  // arrived.
+  if (!KMP_MASTER_TID(tid)) {
+    KA_TRACE(20,
+             ("__kmp_linear_barrier_gather: T#%d(%d:%d) releasing T#%d(%d:%d)"
+              "arrived(%p): %llu => %llu\n",
+              gtid, team->t.t_id, tid, __kmp_gtid_from_tid(0, team),
+              team->t.t_id, 0, &thr_bar->b_arrived, thr_bar->b_arrived,
+              thr_bar->b_arrived + KMP_BARRIER_STATE_BUMP));
+    // Mark arrival to master thread
+    /* After performing this write, a worker thread may not assume that the team
+       is valid any more - it could be deallocated by the master thread at any
+       time. */
+    ANNOTATE_BARRIER_BEGIN(this_thr);
+    kmp_flag_64 flag(&thr_bar->b_arrived, other_threads[0]);
+    flag.release();
+  } else {
+    register kmp_balign_team_t *team_bar = &team->t.t_bar[bt];
+    register int nproc = this_thr->th.th_team_nproc;
+    register int i;
+    // Don't have to worry about sleep bit here or atomic since team setting
+    register kmp_uint64 new_state =
+        team_bar->b_arrived + KMP_BARRIER_STATE_BUMP;
 
-        // Collect all the worker team member threads.
-        for (i=1; i<nproc; ++i) {
+    // Collect all the worker team member threads.
+    for (i = 1; i < nproc; ++i) {
 #if KMP_CACHE_MANAGE
-            // Prefetch next thread's arrived count
-            if (i+1 < nproc)
-                KMP_CACHE_PREFETCH(&other_threads[i+1]->th.th_bar[bt].bb.b_arrived);
+      // Prefetch next thread's arrived count
+      if (i + 1 < nproc)
+        KMP_CACHE_PREFETCH(&other_threads[i + 1]->th.th_bar[bt].bb.b_arrived);
 #endif /* KMP_CACHE_MANAGE */
-            KA_TRACE(20, ("__kmp_linear_barrier_gather: T#%d(%d:%d) wait T#%d(%d:%d) "
-                          "arrived(%p) == %llu\n", gtid, team->t.t_id, tid,
-                            __kmp_gtid_from_tid(i, team), team->t.t_id, i,
-                            &other_threads[i]->th.th_bar[bt].bb.b_arrived, new_state));
-
-            // Wait for worker thread to arrive
-            kmp_flag_64 flag(&other_threads[i]->th.th_bar[bt].bb.b_arrived, new_state);
-            flag.wait(this_thr, FALSE
-                      USE_ITT_BUILD_ARG(itt_sync_obj) );
-            ANNOTATE_BARRIER_END(other_threads[i]);
+      KA_TRACE(20, ("__kmp_linear_barrier_gather: T#%d(%d:%d) wait T#%d(%d:%d) "
+                    "arrived(%p) == %llu\n",
+                    gtid, team->t.t_id, tid, __kmp_gtid_from_tid(i, team),
+                    team->t.t_id, i,
+                    &other_threads[i]->th.th_bar[bt].bb.b_arrived, new_state));
+
+      // Wait for worker thread to arrive
+      kmp_flag_64 flag(&other_threads[i]->th.th_bar[bt].bb.b_arrived,
+                       new_state);
+      flag.wait(this_thr, FALSE USE_ITT_BUILD_ARG(itt_sync_obj));
+      ANNOTATE_BARRIER_END(other_threads[i]);
 #if USE_ITT_BUILD && USE_ITT_NOTIFY
-            // Barrier imbalance - write min of the thread time and the other thread time to the thread.
-            if (__kmp_forkjoin_frames_mode == 2) {
-                this_thr->th.th_bar_min_time = KMP_MIN(this_thr->th.th_bar_min_time,
-                                                          other_threads[i]->th.th_bar_min_time);
-            }
+      // Barrier imbalance - write min of the thread time and the other thread
+      // time to the thread.
+      if (__kmp_forkjoin_frames_mode == 2) {
+        this_thr->th.th_bar_min_time = KMP_MIN(
+            this_thr->th.th_bar_min_time, other_threads[i]->th.th_bar_min_time);
+      }
 #endif
-            if (reduce) {
-                KA_TRACE(100, ("__kmp_linear_barrier_gather: T#%d(%d:%d) += T#%d(%d:%d)\n", gtid,
-                               team->t.t_id, tid, __kmp_gtid_from_tid(i, team), team->t.t_id, i));
-                ANNOTATE_REDUCE_AFTER(reduce);
-                (*reduce)(this_thr->th.th_local.reduce_data,
-                          other_threads[i]->th.th_local.reduce_data);
-                ANNOTATE_REDUCE_BEFORE(reduce);
-                ANNOTATE_REDUCE_BEFORE(&team->t.t_bar);
-            }
-        }
-        // Don't have to worry about sleep bit here or atomic since team setting
-        team_bar->b_arrived = new_state;
-        KA_TRACE(20, ("__kmp_linear_barrier_gather: T#%d(%d:%d) set team %d arrived(%p) = %llu\n",
-                      gtid, team->t.t_id, tid, team->t.t_id, &team_bar->b_arrived, new_state));
+      if (reduce) {
+        KA_TRACE(100,
+                 ("__kmp_linear_barrier_gather: T#%d(%d:%d) += T#%d(%d:%d)\n",
+                  gtid, team->t.t_id, tid, __kmp_gtid_from_tid(i, team),
+                  team->t.t_id, i));
+        ANNOTATE_REDUCE_AFTER(reduce);
+        (*reduce)(this_thr->th.th_local.reduce_data,
+                  other_threads[i]->th.th_local.reduce_data);
+        ANNOTATE_REDUCE_BEFORE(reduce);
+        ANNOTATE_REDUCE_BEFORE(&team->t.t_bar);
+      }
     }
-    KA_TRACE(20, ("__kmp_linear_barrier_gather: T#%d(%d:%d) exit for barrier type %d\n",
-                  gtid, team->t.t_id, tid, bt));
+    // Don't have to worry about sleep bit here or atomic since team setting
+    team_bar->b_arrived = new_state;
+    KA_TRACE(20, ("__kmp_linear_barrier_gather: T#%d(%d:%d) set team %d "
+                  "arrived(%p) = %llu\n",
+                  gtid, team->t.t_id, tid, team->t.t_id, &team_bar->b_arrived,
+                  new_state));
+  }
+  KA_TRACE(
+      20,
+      ("__kmp_linear_barrier_gather: T#%d(%d:%d) exit for barrier type %d\n",
+       gtid, team->t.t_id, tid, bt));
 }
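
The reformatted function above implements the linear gather: every worker
bumps its own b_arrived flag to signal the master, and the master waits for
each worker in turn, folding in reduction data when a reduce callback is
supplied. A rough standalone sketch of that pattern (not the runtime code,
which uses kmp_flag_64 with sleep/yield logic and per-barrier-type state):

#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

struct thr_state_t {
  std::atomic<std::uint64_t> b_arrived{0};
  long reduce_data = 0; // stand-in for th.th_local.reduce_data
};

void linear_gather_sketch(std::vector<thr_state_t> &threads, int tid,
                          std::uint64_t new_epoch,
                          void (*reduce)(long *, long *)) {
  if (tid != 0) {
    // Worker: signal arrival; after this store the team may be torn down.
    threads[tid].b_arrived.store(new_epoch, std::memory_order_release);
  } else {
    // Master: wait for each worker, reducing as we go.
    for (std::size_t i = 1; i < threads.size(); ++i) {
      while (threads[i].b_arrived.load(std::memory_order_acquire) < new_epoch)
        ; // spin (kmp_flag_64::wait also yields and can sleep)
      if (reduce)
        reduce(&threads[0].reduce_data, &threads[i].reduce_data);
    }
  }
}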
 
-static void
-__kmp_linear_barrier_release(enum barrier_type bt, kmp_info_t *this_thr, int gtid, int tid,
-                             int propagate_icvs
-                             USE_ITT_BUILD_ARG(void *itt_sync_obj) )
-{
-    KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(KMP_linear_release);
-    register kmp_bstate_t *thr_bar = &this_thr->th.th_bar[bt].bb;
-    register kmp_team_t *team;
+static void __kmp_linear_barrier_release(
+    enum barrier_type bt, kmp_info_t *this_thr, int gtid, int tid,
+    int propagate_icvs USE_ITT_BUILD_ARG(void *itt_sync_obj)) {
+  KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(KMP_linear_release);
+  register kmp_bstate_t *thr_bar = &this_thr->th.th_bar[bt].bb;
+  register kmp_team_t *team;
 
-    if (KMP_MASTER_TID(tid)) {
-        register unsigned int i;
-        register kmp_uint32 nproc = this_thr->th.th_team_nproc;
-        register kmp_info_t **other_threads;
-
-        team = __kmp_threads[gtid]->th.th_team;
-        KMP_DEBUG_ASSERT(team != NULL);
-        other_threads = team->t.t_threads;
+  if (KMP_MASTER_TID(tid)) {
+    register unsigned int i;
+    register kmp_uint32 nproc = this_thr->th.th_team_nproc;
+    register kmp_info_t **other_threads;
 
-        KA_TRACE(20, ("__kmp_linear_barrier_release: T#%d(%d:%d) master enter for barrier type %d\n",
-                      gtid, team->t.t_id, tid, bt));
+    team = __kmp_threads[gtid]->th.th_team;
+    KMP_DEBUG_ASSERT(team != NULL);
+    other_threads = team->t.t_threads;
+
+    KA_TRACE(20, ("__kmp_linear_barrier_release: T#%d(%d:%d) master enter for "
+                  "barrier type %d\n",
+                  gtid, team->t.t_id, tid, bt));
 
-        if (nproc > 1) {
+    if (nproc > 1) {
 #if KMP_BARRIER_ICV_PUSH
-            {
-                KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(USER_icv_copy);
-                if (propagate_icvs) {
-                    ngo_load(&team->t.t_implicit_task_taskdata[0].td_icvs);
-                    for (i=1; i<nproc; ++i) {
-                        __kmp_init_implicit_task(team->t.t_ident, team->t.t_threads[i], team, i, FALSE);
-                        ngo_store_icvs(&team->t.t_implicit_task_taskdata[i].td_icvs,
-                                       &team->t.t_implicit_task_taskdata[0].td_icvs);
-                    }
-                    ngo_sync();
-                }
-            }
+      {
+        KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(USER_icv_copy);
+        if (propagate_icvs) {
+          ngo_load(&team->t.t_implicit_task_taskdata[0].td_icvs);
+          for (i = 1; i < nproc; ++i) {
+            __kmp_init_implicit_task(team->t.t_ident, team->t.t_threads[i],
+                                     team, i, FALSE);
+            ngo_store_icvs(&team->t.t_implicit_task_taskdata[i].td_icvs,
+                           &team->t.t_implicit_task_taskdata[0].td_icvs);
+          }
+          ngo_sync();
+        }
+      }
 #endif // KMP_BARRIER_ICV_PUSH
 
-            // Now, release all of the worker threads
-            for (i=1; i<nproc; ++i) {
+      // Now, release all of the worker threads
+      for (i = 1; i < nproc; ++i) {
 #if KMP_CACHE_MANAGE
-                // Prefetch next thread's go flag
-                if (i+1 < nproc)
-                    KMP_CACHE_PREFETCH(&other_threads[i+1]->th.th_bar[bt].bb.b_go);
+        // Prefetch next thread's go flag
+        if (i + 1 < nproc)
+          KMP_CACHE_PREFETCH(&other_threads[i + 1]->th.th_bar[bt].bb.b_go);
 #endif /* KMP_CACHE_MANAGE */
-                KA_TRACE(20, ("__kmp_linear_barrier_release: T#%d(%d:%d) releasing T#%d(%d:%d) "
-                              "go(%p): %u => %u\n", gtid, team->t.t_id, tid,
-                              other_threads[i]->th.th_info.ds.ds_gtid, team->t.t_id, i,
-                              &other_threads[i]->th.th_bar[bt].bb.b_go,
-                              other_threads[i]->th.th_bar[bt].bb.b_go,
-                              other_threads[i]->th.th_bar[bt].bb.b_go + KMP_BARRIER_STATE_BUMP));
-                ANNOTATE_BARRIER_BEGIN(other_threads[i]);
-                kmp_flag_64 flag(&other_threads[i]->th.th_bar[bt].bb.b_go, other_threads[i]);
-                flag.release();
-            }
-        }
-    } else { // Wait for the MASTER thread to release us
-        KA_TRACE(20, ("__kmp_linear_barrier_release: T#%d wait go(%p) == %u\n",
-                      gtid, &thr_bar->b_go, KMP_BARRIER_STATE_BUMP));
-        kmp_flag_64 flag(&thr_bar->b_go, KMP_BARRIER_STATE_BUMP);
-        flag.wait(this_thr, TRUE
-                  USE_ITT_BUILD_ARG(itt_sync_obj) );
-        ANNOTATE_BARRIER_END(this_thr);
+        KA_TRACE(
+            20,
+            ("__kmp_linear_barrier_release: T#%d(%d:%d) releasing T#%d(%d:%d) "
+             "go(%p): %u => %u\n",
+             gtid, team->t.t_id, tid, other_threads[i]->th.th_info.ds.ds_gtid,
+             team->t.t_id, i, &other_threads[i]->th.th_bar[bt].bb.b_go,
+             other_threads[i]->th.th_bar[bt].bb.b_go,
+             other_threads[i]->th.th_bar[bt].bb.b_go + KMP_BARRIER_STATE_BUMP));
+        ANNOTATE_BARRIER_BEGIN(other_threads[i]);
+        kmp_flag_64 flag(&other_threads[i]->th.th_bar[bt].bb.b_go,
+                         other_threads[i]);
+        flag.release();
+      }
+    }
+  } else { // Wait for the MASTER thread to release us
+    KA_TRACE(20, ("__kmp_linear_barrier_release: T#%d wait go(%p) == %u\n",
+                  gtid, &thr_bar->b_go, KMP_BARRIER_STATE_BUMP));
+    kmp_flag_64 flag(&thr_bar->b_go, KMP_BARRIER_STATE_BUMP);
+    flag.wait(this_thr, TRUE USE_ITT_BUILD_ARG(itt_sync_obj));
+    ANNOTATE_BARRIER_END(this_thr);
 #if USE_ITT_BUILD && USE_ITT_NOTIFY
-        if ((__itt_sync_create_ptr && itt_sync_obj == NULL) || KMP_ITT_DEBUG) {
-            // In a fork barrier; cannot get the object reliably (or ITTNOTIFY is disabled)
-            itt_sync_obj = __kmp_itt_barrier_object(gtid, bs_forkjoin_barrier, 0, -1);
-            // Cancel wait on previous parallel region...
-            __kmp_itt_task_starting(itt_sync_obj);
-
-            if (bt == bs_forkjoin_barrier && TCR_4(__kmp_global.g.g_done))
-                return;
-
-            itt_sync_obj = __kmp_itt_barrier_object(gtid, bs_forkjoin_barrier);
-            if (itt_sync_obj != NULL)
-                // Call prepare as early as possible for "new" barrier
-                __kmp_itt_task_finished(itt_sync_obj);
-        } else
+    if ((__itt_sync_create_ptr && itt_sync_obj == NULL) || KMP_ITT_DEBUG) {
+      // In a fork barrier; cannot get the object reliably (or ITTNOTIFY is
+      // disabled)
+      itt_sync_obj = __kmp_itt_barrier_object(gtid, bs_forkjoin_barrier, 0, -1);
+      // Cancel wait on previous parallel region...
+      __kmp_itt_task_starting(itt_sync_obj);
+
+      if (bt == bs_forkjoin_barrier && TCR_4(__kmp_global.g.g_done))
+        return;
+
+      itt_sync_obj = __kmp_itt_barrier_object(gtid, bs_forkjoin_barrier);
+      if (itt_sync_obj != NULL)
+        // Call prepare as early as possible for "new" barrier
+        __kmp_itt_task_finished(itt_sync_obj);
+    } else
 #endif /* USE_ITT_BUILD && USE_ITT_NOTIFY */
         // Early exit for reaping threads releasing forkjoin barrier
-        if ( bt == bs_forkjoin_barrier && TCR_4(__kmp_global.g.g_done) )
-            return;
-        // The worker thread may now assume that the team is valid.
+        if (bt == bs_forkjoin_barrier && TCR_4(__kmp_global.g.g_done))
+      return;
+// The worker thread may now assume that the team is valid.
 #ifdef KMP_DEBUG
-        tid = __kmp_tid_from_gtid(gtid);
-        team = __kmp_threads[gtid]->th.th_team;
+    tid = __kmp_tid_from_gtid(gtid);
+    team = __kmp_threads[gtid]->th.th_team;
 #endif
-        KMP_DEBUG_ASSERT(team != NULL);
-        TCW_4(thr_bar->b_go, KMP_INIT_BARRIER_STATE);
-        KA_TRACE(20, ("__kmp_linear_barrier_release: T#%d(%d:%d) set go(%p) = %u\n",
-                      gtid, team->t.t_id, tid, &thr_bar->b_go, KMP_INIT_BARRIER_STATE));
-        KMP_MB();  // Flush all pending memory write invalidates.
-    }
-    KA_TRACE(20, ("__kmp_linear_barrier_release: T#%d(%d:%d) exit for barrier type %d\n",
-                  gtid, team->t.t_id, tid, bt));
+    KMP_DEBUG_ASSERT(team != NULL);
+    TCW_4(thr_bar->b_go, KMP_INIT_BARRIER_STATE);
+    KA_TRACE(20,
+             ("__kmp_linear_barrier_release: T#%d(%d:%d) set go(%p) = %u\n",
+              gtid, team->t.t_id, tid, &thr_bar->b_go, KMP_INIT_BARRIER_STATE));
+    KMP_MB(); // Flush all pending memory write invalidates.
+  }
+  KA_TRACE(
+      20,
+      ("__kmp_linear_barrier_release: T#%d(%d:%d) exit for barrier type %d\n",
+       gtid, team->t.t_id, tid, bt));
 }
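
The release half mirrors the gather: the master bumps each worker's b_go flag,
and a worker spins on its own flag and then resets it to the initial barrier
state for the next epoch. Again only a rough standalone sketch of the pattern,
not the runtime code:

#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

struct go_state_t {
  std::atomic<std::uint32_t> b_go{0}; // 0 plays the role of KMP_INIT_BARRIER_STATE
};

void linear_release_sketch(std::vector<go_state_t> &threads, int tid,
                           std::uint32_t bump) {
  if (tid == 0) {
    for (std::size_t i = 1; i < threads.size(); ++i)
      threads[i].b_go.fetch_add(bump, std::memory_order_release); // release worker i
  } else {
    while (threads[tid].b_go.load(std::memory_order_acquire) == 0)
      ; // spin until released (the runtime's wait also yields and can sleep)
    threads[tid].b_go.store(0, std::memory_order_relaxed); // reset for next epoch
  }
}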
 
 // Tree barrier
 static void
-__kmp_tree_barrier_gather(enum barrier_type bt, kmp_info_t *this_thr, int gtid, int tid,
-                          void (*reduce)(void *, void *)
-                          USE_ITT_BUILD_ARG(void *itt_sync_obj) )
-{
-    KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(KMP_tree_gather);
-    register kmp_team_t *team = this_thr->th.th_team;
-    register kmp_bstate_t *thr_bar = &this_thr->th.th_bar[bt].bb;
-    register kmp_info_t **other_threads = team->t.t_threads;
-    register kmp_uint32 nproc = this_thr->th.th_team_nproc;
-    register kmp_uint32 branch_bits = __kmp_barrier_gather_branch_bits[bt];
-    register kmp_uint32 branch_factor = 1 << branch_bits;
-    register kmp_uint32 child;
-    register kmp_uint32 child_tid;
-    register kmp_uint64 new_state;
-
-    KA_TRACE(20, ("__kmp_tree_barrier_gather: T#%d(%d:%d) enter for barrier type %d\n",
-                  gtid, team->t.t_id, tid, bt));
-    KMP_DEBUG_ASSERT(this_thr == other_threads[this_thr->th.th_info.ds.ds_tid]);
+__kmp_tree_barrier_gather(enum barrier_type bt, kmp_info_t *this_thr, int gtid,
+                          int tid, void (*reduce)(void *, void *)
+                                       USE_ITT_BUILD_ARG(void *itt_sync_obj)) {
+  KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(KMP_tree_gather);
+  register kmp_team_t *team = this_thr->th.th_team;
+  register kmp_bstate_t *thr_bar = &this_thr->th.th_bar[bt].bb;
+  register kmp_info_t **other_threads = team->t.t_threads;
+  register kmp_uint32 nproc = this_thr->th.th_team_nproc;
+  register kmp_uint32 branch_bits = __kmp_barrier_gather_branch_bits[bt];
+  register kmp_uint32 branch_factor = 1 << branch_bits;
+  register kmp_uint32 child;
+  register kmp_uint32 child_tid;
+  register kmp_uint64 new_state;
+
+  KA_TRACE(
+      20, ("__kmp_tree_barrier_gather: T#%d(%d:%d) enter for barrier type %d\n",
+           gtid, team->t.t_id, tid, bt));
+  KMP_DEBUG_ASSERT(this_thr == other_threads[this_thr->th.th_info.ds.ds_tid]);
 
 #if USE_ITT_BUILD && USE_ITT_NOTIFY
-    // Barrier imbalance - save arrive time to the thread
-    if(__kmp_forkjoin_frames_mode == 3 || __kmp_forkjoin_frames_mode == 2) {
-        this_thr->th.th_bar_arrive_time = this_thr->th.th_bar_min_time = __itt_get_timestamp();
-    }
-#endif
-    // Perform tree gather to wait until all threads have arrived; reduce any required data as we go
-    child_tid = (tid << branch_bits) + 1;
-    if (child_tid < nproc) {
-        // Parent threads wait for all their children to arrive
-        new_state = team->t.t_bar[bt].b_arrived + KMP_BARRIER_STATE_BUMP;
-        child = 1;
-        do {
-            register kmp_info_t *child_thr = other_threads[child_tid];
-            register kmp_bstate_t *child_bar = &child_thr->th.th_bar[bt].bb;
+  // Barrier imbalance - save arrive time to the thread
+  if (__kmp_forkjoin_frames_mode == 3 || __kmp_forkjoin_frames_mode == 2) {
+    this_thr->th.th_bar_arrive_time = this_thr->th.th_bar_min_time =
+        __itt_get_timestamp();
+  }
+#endif
+  // Perform tree gather to wait until all threads have arrived; reduce any
+  // required data as we go
+  child_tid = (tid << branch_bits) + 1;
+  if (child_tid < nproc) {
+    // Parent threads wait for all their children to arrive
+    new_state = team->t.t_bar[bt].b_arrived + KMP_BARRIER_STATE_BUMP;
+    child = 1;
+    do {
+      register kmp_info_t *child_thr = other_threads[child_tid];
+      register kmp_bstate_t *child_bar = &child_thr->th.th_bar[bt].bb;
 #if KMP_CACHE_MANAGE
-            // Prefetch next thread's arrived count
-            if (child+1 <= branch_factor && child_tid+1 < nproc)
-                KMP_CACHE_PREFETCH(&other_threads[child_tid+1]->th.th_bar[bt].bb.b_arrived);
+      // Prefetch next thread's arrived count
+      if (child + 1 <= branch_factor && child_tid + 1 < nproc)
+        KMP_CACHE_PREFETCH(
+            &other_threads[child_tid + 1]->th.th_bar[bt].bb.b_arrived);
 #endif /* KMP_CACHE_MANAGE */
-            KA_TRACE(20, ("__kmp_tree_barrier_gather: T#%d(%d:%d) wait T#%d(%d:%u) "
-                          "arrived(%p) == %llu\n", gtid, team->t.t_id, tid,
-                            __kmp_gtid_from_tid(child_tid, team), team->t.t_id, child_tid,
-                            &child_bar->b_arrived, new_state));
-            // Wait for child to arrive
-            kmp_flag_64 flag(&child_bar->b_arrived, new_state);
-            flag.wait(this_thr, FALSE
-                      USE_ITT_BUILD_ARG(itt_sync_obj) );
-            ANNOTATE_BARRIER_END(child_thr);
+      KA_TRACE(20,
+               ("__kmp_tree_barrier_gather: T#%d(%d:%d) wait T#%d(%d:%u) "
+                "arrived(%p) == %llu\n",
+                gtid, team->t.t_id, tid, __kmp_gtid_from_tid(child_tid, team),
+                team->t.t_id, child_tid, &child_bar->b_arrived, new_state));
+      // Wait for child to arrive
+      kmp_flag_64 flag(&child_bar->b_arrived, new_state);
+      flag.wait(this_thr, FALSE USE_ITT_BUILD_ARG(itt_sync_obj));
+      ANNOTATE_BARRIER_END(child_thr);
 #if USE_ITT_BUILD && USE_ITT_NOTIFY
-            // Barrier imbalance - write min of the thread time and a child time to the thread.
-            if (__kmp_forkjoin_frames_mode == 2) {
-                this_thr->th.th_bar_min_time = KMP_MIN(this_thr->th.th_bar_min_time,
-                                                          child_thr->th.th_bar_min_time);
-            }
+      // Barrier imbalance - write min of the thread time and a child time to
+      // the thread.
+      if (__kmp_forkjoin_frames_mode == 2) {
+        this_thr->th.th_bar_min_time = KMP_MIN(this_thr->th.th_bar_min_time,
+                                               child_thr->th.th_bar_min_time);
+      }
 #endif
-            if (reduce) {
-                KA_TRACE(100, ("__kmp_tree_barrier_gather: T#%d(%d:%d) += T#%d(%d:%u)\n",
-                               gtid, team->t.t_id, tid, __kmp_gtid_from_tid(child_tid, team),
-                               team->t.t_id, child_tid));
-                ANNOTATE_REDUCE_AFTER(reduce);
-                (*reduce)(this_thr->th.th_local.reduce_data, child_thr->th.th_local.reduce_data);
-                ANNOTATE_REDUCE_BEFORE(reduce);
-                ANNOTATE_REDUCE_BEFORE(&team->t.t_bar);
-            }
-            child++;
-            child_tid++;
-        }
-        while (child <= branch_factor && child_tid < nproc);
-    }
-
-    if (!KMP_MASTER_TID(tid)) { // Worker threads
-        register kmp_int32 parent_tid = (tid - 1) >> branch_bits;
-
-        KA_TRACE(20, ("__kmp_tree_barrier_gather: T#%d(%d:%d) releasing T#%d(%d:%d) "
-                      "arrived(%p): %llu => %llu\n", gtid, team->t.t_id, tid,
-                      __kmp_gtid_from_tid(parent_tid, team), team->t.t_id, parent_tid,
-                      &thr_bar->b_arrived, thr_bar->b_arrived,
-                      thr_bar->b_arrived + KMP_BARRIER_STATE_BUMP));
-
-        // Mark arrival to parent thread
-        /* After performing this write, a worker thread may not assume that the team is valid
-           any more - it could be deallocated by the master thread at any time.  */
-        ANNOTATE_BARRIER_BEGIN(this_thr);
-        kmp_flag_64 flag(&thr_bar->b_arrived, other_threads[parent_tid]);
-        flag.release();
-    } else {
-        // Need to update the team arrived pointer if we are the master thread
-        if (nproc > 1) // New value was already computed above
-            team->t.t_bar[bt].b_arrived = new_state;
-        else
-            team->t.t_bar[bt].b_arrived += KMP_BARRIER_STATE_BUMP;
-        KA_TRACE(20, ("__kmp_tree_barrier_gather: T#%d(%d:%d) set team %d arrived(%p) = %llu\n",
-                      gtid, team->t.t_id, tid, team->t.t_id,
-                      &team->t.t_bar[bt].b_arrived, team->t.t_bar[bt].b_arrived));
-    }
-    KA_TRACE(20, ("__kmp_tree_barrier_gather: T#%d(%d:%d) exit for barrier type %d\n",
-                  gtid, team->t.t_id, tid, bt));
+      if (reduce) {
+        KA_TRACE(100,
+                 ("__kmp_tree_barrier_gather: T#%d(%d:%d) += T#%d(%d:%u)\n",
+                  gtid, team->t.t_id, tid, __kmp_gtid_from_tid(child_tid, team),
+                  team->t.t_id, child_tid));
+        ANNOTATE_REDUCE_AFTER(reduce);
+        (*reduce)(this_thr->th.th_local.reduce_data,
+                  child_thr->th.th_local.reduce_data);
+        ANNOTATE_REDUCE_BEFORE(reduce);
+        ANNOTATE_REDUCE_BEFORE(&team->t.t_bar);
+      }
+      child++;
+      child_tid++;
+    } while (child <= branch_factor && child_tid < nproc);
+  }
+
+  if (!KMP_MASTER_TID(tid)) { // Worker threads
+    register kmp_int32 parent_tid = (tid - 1) >> branch_bits;
+
+    KA_TRACE(20,
+             ("__kmp_tree_barrier_gather: T#%d(%d:%d) releasing T#%d(%d:%d) "
+              "arrived(%p): %llu => %llu\n",
+              gtid, team->t.t_id, tid, __kmp_gtid_from_tid(parent_tid, team),
+              team->t.t_id, parent_tid, &thr_bar->b_arrived, thr_bar->b_arrived,
+              thr_bar->b_arrived + KMP_BARRIER_STATE_BUMP));
+
+    // Mark arrival to parent thread
+    /* After performing this write, a worker thread may not assume that the team
+       is valid any more - it could be deallocated by the master thread at any
+       time.  */
+    ANNOTATE_BARRIER_BEGIN(this_thr);
+    kmp_flag_64 flag(&thr_bar->b_arrived, other_threads[parent_tid]);
+    flag.release();
+  } else {
+    // Need to update the team arrived pointer if we are the master thread
+    if (nproc > 1) // New value was already computed above
+      team->t.t_bar[bt].b_arrived = new_state;
+    else
+      team->t.t_bar[bt].b_arrived += KMP_BARRIER_STATE_BUMP;
+    KA_TRACE(20, ("__kmp_tree_barrier_gather: T#%d(%d:%d) set team %d "
+                  "arrived(%p) = %llu\n",
+                  gtid, team->t.t_id, tid, team->t.t_id,
+                  &team->t.t_bar[bt].b_arrived, team->t.t_bar[bt].b_arrived));
+  }
+  KA_TRACE(20,
+           ("__kmp_tree_barrier_gather: T#%d(%d:%d) exit for barrier type %d\n",
+            gtid, team->t.t_id, tid, bt));
 }
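
In the tree gather above, the thread numbering encodes the tree shape: a
thread's children start at (tid << branch_bits) + 1 and its parent is
(tid - 1) >> branch_bits. A tiny self-contained illustration of those
relationships for an 8-thread team with branch_bits = 2 (not runtime code):

#include <cstdio>

int main() {
  const unsigned branch_bits = 2;
  const unsigned branch_factor = 1u << branch_bits;
  const unsigned nproc = 8;
  for (unsigned tid = 0; tid < nproc; ++tid) {
    if (tid != 0)
      std::printf("T#%u reports to parent T#%u\n", tid, (tid - 1) >> branch_bits);
    unsigned child_tid = (tid << branch_bits) + 1;
    for (unsigned child = 1; child <= branch_factor && child_tid < nproc;
         ++child, ++child_tid)
      std::printf("T#%u waits for child T#%u\n", tid, child_tid);
  }
  return 0;
}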
 
-static void
-__kmp_tree_barrier_release(enum barrier_type bt, kmp_info_t *this_thr, int gtid, int tid,
-                           int propagate_icvs
-                           USE_ITT_BUILD_ARG(void *itt_sync_obj) )
-{
-    KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(KMP_tree_release);
-    register kmp_team_t *team;
-    register kmp_bstate_t *thr_bar = &this_thr->th.th_bar[bt].bb;
-    register kmp_uint32 nproc;
-    register kmp_uint32 branch_bits = __kmp_barrier_release_branch_bits[bt];
-    register kmp_uint32 branch_factor = 1 << branch_bits;
-    register kmp_uint32 child;
-    register kmp_uint32 child_tid;
-
-    // Perform a tree release for all of the threads that have been gathered
-    if (!KMP_MASTER_TID(tid)) { // Handle fork barrier workers who aren't part of a team yet
-        KA_TRACE(20, ("__kmp_tree_barrier_release: T#%d wait go(%p) == %u\n",
-                      gtid, &thr_bar->b_go, KMP_BARRIER_STATE_BUMP));
-        // Wait for parent thread to release us
-        kmp_flag_64 flag(&thr_bar->b_go, KMP_BARRIER_STATE_BUMP);
-        flag.wait(this_thr, TRUE
-                  USE_ITT_BUILD_ARG(itt_sync_obj) );
-        ANNOTATE_BARRIER_END(this_thr);
+static void __kmp_tree_barrier_release(
+    enum barrier_type bt, kmp_info_t *this_thr, int gtid, int tid,
+    int propagate_icvs USE_ITT_BUILD_ARG(void *itt_sync_obj)) {
+  KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(KMP_tree_release);
+  register kmp_team_t *team;
+  register kmp_bstate_t *thr_bar = &this_thr->th.th_bar[bt].bb;
+  register kmp_uint32 nproc;
+  register kmp_uint32 branch_bits = __kmp_barrier_release_branch_bits[bt];
+  register kmp_uint32 branch_factor = 1 << branch_bits;
+  register kmp_uint32 child;
+  register kmp_uint32 child_tid;
+
+  // Perform a tree release for all of the threads that have been gathered
+  if (!KMP_MASTER_TID(
+          tid)) { // Handle fork barrier workers who aren't part of a team yet
+    KA_TRACE(20, ("__kmp_tree_barrier_release: T#%d wait go(%p) == %u\n", gtid,
+                  &thr_bar->b_go, KMP_BARRIER_STATE_BUMP));
+    // Wait for parent thread to release us
+    kmp_flag_64 flag(&thr_bar->b_go, KMP_BARRIER_STATE_BUMP);
+    flag.wait(this_thr, TRUE USE_ITT_BUILD_ARG(itt_sync_obj));
+    ANNOTATE_BARRIER_END(this_thr);
 #if USE_ITT_BUILD && USE_ITT_NOTIFY
-        if ((__itt_sync_create_ptr && itt_sync_obj == NULL) || KMP_ITT_DEBUG) {
-            // In fork barrier where we could not get the object reliably (or ITTNOTIFY is disabled)
-            itt_sync_obj = __kmp_itt_barrier_object(gtid, bs_forkjoin_barrier, 0, -1);
-            // Cancel wait on previous parallel region...
-            __kmp_itt_task_starting(itt_sync_obj);
-
-            if (bt == bs_forkjoin_barrier && TCR_4(__kmp_global.g.g_done))
-                return;
-
-            itt_sync_obj = __kmp_itt_barrier_object(gtid, bs_forkjoin_barrier);
-            if (itt_sync_obj != NULL)
-                // Call prepare as early as possible for "new" barrier
-                __kmp_itt_task_finished(itt_sync_obj);
-        } else
+    if ((__itt_sync_create_ptr && itt_sync_obj == NULL) || KMP_ITT_DEBUG) {
+      // In fork barrier where we could not get the object reliably (or
+      // ITTNOTIFY is disabled)
+      itt_sync_obj = __kmp_itt_barrier_object(gtid, bs_forkjoin_barrier, 0, -1);
+      // Cancel wait on previous parallel region...
+      __kmp_itt_task_starting(itt_sync_obj);
+
+      if (bt == bs_forkjoin_barrier && TCR_4(__kmp_global.g.g_done))
+        return;
+
+      itt_sync_obj = __kmp_itt_barrier_object(gtid, bs_forkjoin_barrier);
+      if (itt_sync_obj != NULL)
+        // Call prepare as early as possible for "new" barrier
+        __kmp_itt_task_finished(itt_sync_obj);
+    } else
 #endif /* USE_ITT_BUILD && USE_ITT_NOTIFY */
         // Early exit for reaping threads releasing forkjoin barrier
         if (bt == bs_forkjoin_barrier && TCR_4(__kmp_global.g.g_done))
-            return;
+      return;
 
-        // The worker thread may now assume that the team is valid.
-        team = __kmp_threads[gtid]->th.th_team;
-        KMP_DEBUG_ASSERT(team != NULL);
-        tid = __kmp_tid_from_gtid(gtid);
-
-        TCW_4(thr_bar->b_go, KMP_INIT_BARRIER_STATE);
-        KA_TRACE(20, ("__kmp_tree_barrier_release: T#%d(%d:%d) set go(%p) = %u\n",
-                      gtid, team->t.t_id, tid, &thr_bar->b_go, KMP_INIT_BARRIER_STATE));
-        KMP_MB();  // Flush all pending memory write invalidates.
-    } else {
-        team = __kmp_threads[gtid]->th.th_team;
-        KMP_DEBUG_ASSERT(team != NULL);
-        KA_TRACE(20, ("__kmp_tree_barrier_release: T#%d(%d:%d) master enter for barrier type %d\n",
-                      gtid, team->t.t_id, tid, bt));
-    }
-    nproc = this_thr->th.th_team_nproc;
-    child_tid = (tid << branch_bits) + 1;
-
-    if (child_tid < nproc) {
-        register kmp_info_t **other_threads = team->t.t_threads;
-        child = 1;
-        // Parent threads release all their children
-        do {
-            register kmp_info_t *child_thr = other_threads[child_tid];
-            register kmp_bstate_t *child_bar = &child_thr->th.th_bar[bt].bb;
+    // The worker thread may now assume that the team is valid.
+    team = __kmp_threads[gtid]->th.th_team;
+    KMP_DEBUG_ASSERT(team != NULL);
+    tid = __kmp_tid_from_gtid(gtid);
+
+    TCW_4(thr_bar->b_go, KMP_INIT_BARRIER_STATE);
+    KA_TRACE(20,
+             ("__kmp_tree_barrier_release: T#%d(%d:%d) set go(%p) = %u\n", gtid,
+              team->t.t_id, tid, &thr_bar->b_go, KMP_INIT_BARRIER_STATE));
+    KMP_MB(); // Flush all pending memory write invalidates.
+  } else {
+    team = __kmp_threads[gtid]->th.th_team;
+    KMP_DEBUG_ASSERT(team != NULL);
+    KA_TRACE(20, ("__kmp_tree_barrier_release: T#%d(%d:%d) master enter for "
+                  "barrier type %d\n",
+                  gtid, team->t.t_id, tid, bt));
+  }
+  nproc = this_thr->th.th_team_nproc;
+  child_tid = (tid << branch_bits) + 1;
+
+  if (child_tid < nproc) {
+    register kmp_info_t **other_threads = team->t.t_threads;
+    child = 1;
+    // Parent threads release all their children
+    do {
+      register kmp_info_t *child_thr = other_threads[child_tid];
+      register kmp_bstate_t *child_bar = &child_thr->th.th_bar[bt].bb;
 #if KMP_CACHE_MANAGE
-            // Prefetch next thread's go count
-            if (child+1 <= branch_factor && child_tid+1 < nproc)
-                KMP_CACHE_PREFETCH(&other_threads[child_tid+1]->th.th_bar[bt].bb.b_go);
+      // Prefetch next thread's go count
+      if (child + 1 <= branch_factor && child_tid + 1 < nproc)
+        KMP_CACHE_PREFETCH(
+            &other_threads[child_tid + 1]->th.th_bar[bt].bb.b_go);
 #endif /* KMP_CACHE_MANAGE */
 
 #if KMP_BARRIER_ICV_PUSH
-            {
-                KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(USER_icv_copy);
-                if (propagate_icvs) {
-                    __kmp_init_implicit_task(team->t.t_ident, team->t.t_threads[child_tid],
-                                             team, child_tid, FALSE);
-                    copy_icvs(&team->t.t_implicit_task_taskdata[child_tid].td_icvs,
-                              &team->t.t_implicit_task_taskdata[0].td_icvs);
-                }
-            }
-#endif // KMP_BARRIER_ICV_PUSH
-            KA_TRACE(20, ("__kmp_tree_barrier_release: T#%d(%d:%d) releasing T#%d(%d:%u)"
-                          "go(%p): %u => %u\n", gtid, team->t.t_id, tid,
-                          __kmp_gtid_from_tid(child_tid, team), team->t.t_id,
-                          child_tid, &child_bar->b_go, child_bar->b_go,
-                          child_bar->b_go + KMP_BARRIER_STATE_BUMP));
-            // Release child from barrier
-            ANNOTATE_BARRIER_BEGIN(child_thr);
-            kmp_flag_64 flag(&child_bar->b_go, child_thr);
-            flag.release();
-            child++;
-            child_tid++;
+      {
+        KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(USER_icv_copy);
+        if (propagate_icvs) {
+          __kmp_init_implicit_task(team->t.t_ident,
+                                   team->t.t_threads[child_tid], team,
+                                   child_tid, FALSE);
+          copy_icvs(&team->t.t_implicit_task_taskdata[child_tid].td_icvs,
+                    &team->t.t_implicit_task_taskdata[0].td_icvs);
         }
-        while (child <= branch_factor && child_tid < nproc);
-    }
-    KA_TRACE(20, ("__kmp_tree_barrier_release: T#%d(%d:%d) exit for barrier type %d\n",
-                  gtid, team->t.t_id, tid, bt));
+      }
+#endif // KMP_BARRIER_ICV_PUSH
+      KA_TRACE(20,
+               ("__kmp_tree_barrier_release: T#%d(%d:%d) releasing T#%d(%d:%u)"
+                "go(%p): %u => %u\n",
+                gtid, team->t.t_id, tid, __kmp_gtid_from_tid(child_tid, team),
+                team->t.t_id, child_tid, &child_bar->b_go, child_bar->b_go,
+                child_bar->b_go + KMP_BARRIER_STATE_BUMP));
+      // Release child from barrier
+      ANNOTATE_BARRIER_BEGIN(child_thr);
+      kmp_flag_64 flag(&child_bar->b_go, child_thr);
+      flag.release();
+      child++;
+      child_tid++;
+    } while (child <= branch_factor && child_tid < nproc);
+  }
+  KA_TRACE(
+      20, ("__kmp_tree_barrier_release: T#%d(%d:%d) exit for barrier type %d\n",
+           gtid, team->t.t_id, tid, bt));
 }
 
-
 // Hyper Barrier
 static void
-__kmp_hyper_barrier_gather(enum barrier_type bt, kmp_info_t *this_thr, int gtid, int tid,
-                           void (*reduce)(void *, void *)
-                           USE_ITT_BUILD_ARG(void *itt_sync_obj) )
-{
-    KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(KMP_hyper_gather);
-    register kmp_team_t *team = this_thr->th.th_team;
-    register kmp_bstate_t *thr_bar = &this_thr->th.th_bar[bt].bb;
-    register kmp_info_t **other_threads = team->t.t_threads;
-    register kmp_uint64 new_state = KMP_BARRIER_UNUSED_STATE;
-    register kmp_uint32 num_threads = this_thr->th.th_team_nproc;
-    register kmp_uint32 branch_bits = __kmp_barrier_gather_branch_bits[bt];
-    register kmp_uint32 branch_factor = 1 << branch_bits;
-    register kmp_uint32 offset;
-    register kmp_uint32 level;
-
-    KA_TRACE(20, ("__kmp_hyper_barrier_gather: T#%d(%d:%d) enter for barrier type %d\n",
-                  gtid, team->t.t_id, tid, bt));
-
-    KMP_DEBUG_ASSERT(this_thr == other_threads[this_thr->th.th_info.ds.ds_tid]);
+__kmp_hyper_barrier_gather(enum barrier_type bt, kmp_info_t *this_thr, int gtid,
+                           int tid, void (*reduce)(void *, void *)
+                                        USE_ITT_BUILD_ARG(void *itt_sync_obj)) {
+  KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(KMP_hyper_gather);
+  register kmp_team_t *team = this_thr->th.th_team;
+  register kmp_bstate_t *thr_bar = &this_thr->th.th_bar[bt].bb;
+  register kmp_info_t **other_threads = team->t.t_threads;
+  register kmp_uint64 new_state = KMP_BARRIER_UNUSED_STATE;
+  register kmp_uint32 num_threads = this_thr->th.th_team_nproc;
+  register kmp_uint32 branch_bits = __kmp_barrier_gather_branch_bits[bt];
+  register kmp_uint32 branch_factor = 1 << branch_bits;
+  register kmp_uint32 offset;
+  register kmp_uint32 level;
+
+  KA_TRACE(
+      20,
+      ("__kmp_hyper_barrier_gather: T#%d(%d:%d) enter for barrier type %d\n",
+       gtid, team->t.t_id, tid, bt));
+  KMP_DEBUG_ASSERT(this_thr == other_threads[this_thr->th.th_info.ds.ds_tid]);
 
 #if USE_ITT_BUILD && USE_ITT_NOTIFY
-    // Barrier imbalance - save arrive time to the thread
-    if(__kmp_forkjoin_frames_mode == 3 || __kmp_forkjoin_frames_mode == 2) {
-        this_thr->th.th_bar_arrive_time = this_thr->th.th_bar_min_time = __itt_get_timestamp();
-    }
-#endif
-    /* Perform a hypercube-embedded tree gather to wait until all of the threads have
-       arrived, and reduce any required data as we go.  */
-    kmp_flag_64 p_flag(&thr_bar->b_arrived);
-    for (level=0, offset=1; offset<num_threads; level+=branch_bits, offset<<=branch_bits)
-    {
-        register kmp_uint32 child;
-        register kmp_uint32 child_tid;
-
-        if (((tid >> level) & (branch_factor - 1)) != 0) {
-            register kmp_int32 parent_tid = tid & ~((1 << (level + branch_bits)) -1);
+  // Barrier imbalance - save arrive time to the thread
+  if (__kmp_forkjoin_frames_mode == 3 || __kmp_forkjoin_frames_mode == 2) {
+    this_thr->th.th_bar_arrive_time = this_thr->th.th_bar_min_time =
+        __itt_get_timestamp();
+  }
+#endif
+  /* Perform a hypercube-embedded tree gather to wait until all of the threads
+     have arrived, and reduce any required data as we go.  */
+  kmp_flag_64 p_flag(&thr_bar->b_arrived);
+  for (level = 0, offset = 1; offset < num_threads;
+       level += branch_bits, offset <<= branch_bits) {
+    register kmp_uint32 child;
+    register kmp_uint32 child_tid;
 
-            KA_TRACE(20, ("__kmp_hyper_barrier_gather: T#%d(%d:%d) releasing T#%d(%d:%d) "
-                          "arrived(%p): %llu => %llu\n", gtid, team->t.t_id, tid,
-                          __kmp_gtid_from_tid(parent_tid, team), team->t.t_id, parent_tid,
-                          &thr_bar->b_arrived, thr_bar->b_arrived,
-                          thr_bar->b_arrived + KMP_BARRIER_STATE_BUMP));
-            // Mark arrival to parent thread
-            /* After performing this write (in the last iteration of the enclosing for loop),
-               a worker thread may not assume that the team is valid any more - it could be
-               deallocated by the master thread at any time.  */
-            ANNOTATE_BARRIER_BEGIN(this_thr);
-            p_flag.set_waiter(other_threads[parent_tid]);
-            p_flag.release();
-            break;
-        }
+    if (((tid >> level) & (branch_factor - 1)) != 0) {
+      register kmp_int32 parent_tid = tid & ~((1 << (level + branch_bits)) - 1);
 
-        // Parent threads wait for children to arrive
-        if (new_state == KMP_BARRIER_UNUSED_STATE)
-            new_state = team->t.t_bar[bt].b_arrived + KMP_BARRIER_STATE_BUMP;
-        for (child=1, child_tid=tid+(1 << level); child<branch_factor && child_tid<num_threads;
-             child++, child_tid+=(1 << level))
-        {
-            register kmp_info_t *child_thr = other_threads[child_tid];
-            register kmp_bstate_t *child_bar = &child_thr->th.th_bar[bt].bb;
+      KA_TRACE(20,
+               ("__kmp_hyper_barrier_gather: T#%d(%d:%d) releasing T#%d(%d:%d) "
+                "arrived(%p): %llu => %llu\n",
+                gtid, team->t.t_id, tid, __kmp_gtid_from_tid(parent_tid, team),
+                team->t.t_id, parent_tid, &thr_bar->b_arrived,
+                thr_bar->b_arrived,
+                thr_bar->b_arrived + KMP_BARRIER_STATE_BUMP));
+      // Mark arrival to parent thread
+      /* After performing this write (in the last iteration of the enclosing for
+         loop), a worker thread may not assume that the team is valid any more
+         - it could be deallocated by the master thread at any time.  */
+      ANNOTATE_BARRIER_BEGIN(this_thr);
+      p_flag.set_waiter(other_threads[parent_tid]);
+      p_flag.release();
+      break;
+    }
+
+    // Parent threads wait for children to arrive
+    if (new_state == KMP_BARRIER_UNUSED_STATE)
+      new_state = team->t.t_bar[bt].b_arrived + KMP_BARRIER_STATE_BUMP;
+    for (child = 1, child_tid = tid + (1 << level);
+         child < branch_factor && child_tid < num_threads;
+         child++, child_tid += (1 << level)) {
+      register kmp_info_t *child_thr = other_threads[child_tid];
+      register kmp_bstate_t *child_bar = &child_thr->th.th_bar[bt].bb;
 #if KMP_CACHE_MANAGE
-            register kmp_uint32 next_child_tid = child_tid + (1 << level);
-            // Prefetch next thread's arrived count
-            if (child+1 < branch_factor && next_child_tid < num_threads)
-                KMP_CACHE_PREFETCH(&other_threads[next_child_tid]->th.th_bar[bt].bb.b_arrived);
+      register kmp_uint32 next_child_tid = child_tid + (1 << level);
+      // Prefetch next thread's arrived count
+      if (child + 1 < branch_factor && next_child_tid < num_threads)
+        KMP_CACHE_PREFETCH(
+            &other_threads[next_child_tid]->th.th_bar[bt].bb.b_arrived);
 #endif /* KMP_CACHE_MANAGE */
-            KA_TRACE(20, ("__kmp_hyper_barrier_gather: T#%d(%d:%d) wait T#%d(%d:%u) "
-                          "arrived(%p) == %llu\n", gtid, team->t.t_id, tid,
-                          __kmp_gtid_from_tid(child_tid, team), team->t.t_id, child_tid,
-                          &child_bar->b_arrived, new_state));
-            // Wait for child to arrive
-            kmp_flag_64 c_flag(&child_bar->b_arrived, new_state);
-            c_flag.wait(this_thr, FALSE
-                        USE_ITT_BUILD_ARG(itt_sync_obj) );
-            ANNOTATE_BARRIER_END(child_thr);
+      KA_TRACE(20,
+               ("__kmp_hyper_barrier_gather: T#%d(%d:%d) wait T#%d(%d:%u) "
+                "arrived(%p) == %llu\n",
+                gtid, team->t.t_id, tid, __kmp_gtid_from_tid(child_tid, team),
+                team->t.t_id, child_tid, &child_bar->b_arrived, new_state));
+      // Wait for child to arrive
+      kmp_flag_64 c_flag(&child_bar->b_arrived, new_state);
+      c_flag.wait(this_thr, FALSE USE_ITT_BUILD_ARG(itt_sync_obj));
+      ANNOTATE_BARRIER_END(child_thr);
 #if USE_ITT_BUILD && USE_ITT_NOTIFY
-            // Barrier imbalance - write min of the thread time and a child time to the thread.
-            if (__kmp_forkjoin_frames_mode == 2) {
-                this_thr->th.th_bar_min_time = KMP_MIN(this_thr->th.th_bar_min_time,
-                                                          child_thr->th.th_bar_min_time);
-            }
+      // Barrier imbalance - write min of the thread time and a child time to
+      // the thread.
+      if (__kmp_forkjoin_frames_mode == 2) {
+        this_thr->th.th_bar_min_time = KMP_MIN(this_thr->th.th_bar_min_time,
+                                               child_thr->th.th_bar_min_time);
+      }
 #endif
-            if (reduce) {
-                KA_TRACE(100, ("__kmp_hyper_barrier_gather: T#%d(%d:%d) += T#%d(%d:%u)\n",
-                               gtid, team->t.t_id, tid, __kmp_gtid_from_tid(child_tid, team),
-                               team->t.t_id, child_tid));
-                ANNOTATE_REDUCE_AFTER(reduce);
-                (*reduce)(this_thr->th.th_local.reduce_data, child_thr->th.th_local.reduce_data);
-                ANNOTATE_REDUCE_BEFORE(reduce);
-                ANNOTATE_REDUCE_BEFORE(&team->t.t_bar);
-            }
-        }
+      if (reduce) {
+        KA_TRACE(100,
+                 ("__kmp_hyper_barrier_gather: T#%d(%d:%d) += T#%d(%d:%u)\n",
+                  gtid, team->t.t_id, tid, __kmp_gtid_from_tid(child_tid, team),
+                  team->t.t_id, child_tid));
+        ANNOTATE_REDUCE_AFTER(reduce);
+        (*reduce)(this_thr->th.th_local.reduce_data,
+                  child_thr->th.th_local.reduce_data);
+        ANNOTATE_REDUCE_BEFORE(reduce);
+        ANNOTATE_REDUCE_BEFORE(&team->t.t_bar);
+      }
     }
+  }
 
-    if (KMP_MASTER_TID(tid)) {
-        // Need to update the team arrived pointer if we are the master thread
-        if (new_state == KMP_BARRIER_UNUSED_STATE)
-            team->t.t_bar[bt].b_arrived += KMP_BARRIER_STATE_BUMP;
-        else
-            team->t.t_bar[bt].b_arrived = new_state;
-        KA_TRACE(20, ("__kmp_hyper_barrier_gather: T#%d(%d:%d) set team %d arrived(%p) = %llu\n",
-                      gtid, team->t.t_id, tid, team->t.t_id,
-                      &team->t.t_bar[bt].b_arrived, team->t.t_bar[bt].b_arrived));
-    }
-    KA_TRACE(20, ("__kmp_hyper_barrier_gather: T#%d(%d:%d) exit for barrier type %d\n",
-                  gtid, team->t.t_id, tid, bt));
+  if (KMP_MASTER_TID(tid)) {
+    // Need to update the team arrived pointer if we are the master thread
+    if (new_state == KMP_BARRIER_UNUSED_STATE)
+      team->t.t_bar[bt].b_arrived += KMP_BARRIER_STATE_BUMP;
+    else
+      team->t.t_bar[bt].b_arrived = new_state;
+    KA_TRACE(20, ("__kmp_hyper_barrier_gather: T#%d(%d:%d) set team %d "
+                  "arrived(%p) = %llu\n",
+                  gtid, team->t.t_id, tid, team->t.t_id,
+                  &team->t.t_bar[bt].b_arrived, team->t.t_bar[bt].b_arrived));
+  }
+  KA_TRACE(
+      20, ("__kmp_hyper_barrier_gather: T#%d(%d:%d) exit for barrier type %d\n",
+           gtid, team->t.t_id, tid, bt));
 }
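
For readers following the reformatted gather loop above, here is a minimal
standalone sketch (not part of this patch) of the hypercube-embedded scheduling
it implements: at each level a thread either signals its parent (when its low
bits at that level are non-zero) or waits for up to branch_factor-1 children
spaced 1<<level apart. The function name and the printf reporting are
illustrative only; tid, num_threads and branch_bits mirror the runtime's
variables.
@code
#include <cstdio>

// Print which children a thread waits for and which parent it signals,
// following the same loop structure as __kmp_hyper_barrier_gather.
static void hyper_gather_schedule(unsigned tid, unsigned num_threads,
                                  unsigned branch_bits) {
  unsigned branch_factor = 1u << branch_bits;
  for (unsigned level = 0, offset = 1; offset < num_threads;
       level += branch_bits, offset <<= branch_bits) {
    if (((tid >> level) & (branch_factor - 1)) != 0) {
      // Not a subtree root at this level: signal the parent and stop.
      unsigned parent_tid = tid & ~((1u << (level + branch_bits)) - 1);
      std::printf("T#%u signals parent T#%u at level %u\n", tid, parent_tid,
                  level);
      return;
    }
    // Subtree root: wait for up to branch_factor-1 children, 1<<level apart.
    for (unsigned child = 1, child_tid = tid + (1u << level);
         child < branch_factor && child_tid < num_threads;
         child++, child_tid += (1u << level))
      std::printf("T#%u waits for child T#%u at level %u\n", tid, child_tid,
                  level);
  }
  std::printf("T#%u is the root of the gather tree\n", tid);
}
@endcode
With num_threads = 8 and branch_bits = 2, T#0 waits for T#1..T#3 and then T#4,
T#4 waits for T#5..T#7 before signalling T#0, and every other thread signals
its subtree root immediately.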
 
 // The reverse versions seem to beat the forward versions overall
 #define KMP_REVERSE_HYPER_BAR
-static void
-__kmp_hyper_barrier_release(enum barrier_type bt, kmp_info_t *this_thr, int gtid, int tid,
-                            int propagate_icvs
-                            USE_ITT_BUILD_ARG(void *itt_sync_obj) )
-{
-    KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(KMP_hyper_release);
-    register kmp_team_t    *team;
-    register kmp_bstate_t  *thr_bar       = & this_thr -> th.th_bar[ bt ].bb;
-    register kmp_info_t   **other_threads;
-    register kmp_uint32     num_threads;
-    register kmp_uint32     branch_bits   = __kmp_barrier_release_branch_bits[ bt ];
-    register kmp_uint32     branch_factor = 1 << branch_bits;
-    register kmp_uint32     child;
-    register kmp_uint32     child_tid;
-    register kmp_uint32     offset;
-    register kmp_uint32     level;
-
-    /* Perform a hypercube-embedded tree release for all of the threads that have been gathered.
-       If KMP_REVERSE_HYPER_BAR is defined (default) the threads are released in the reverse
-       order of the corresponding gather, otherwise threads are released in the same order. */
-    if (KMP_MASTER_TID(tid)) { // master
-        team = __kmp_threads[gtid]->th.th_team;
-        KMP_DEBUG_ASSERT(team != NULL);
-        KA_TRACE(20, ("__kmp_hyper_barrier_release: T#%d(%d:%d) master enter for barrier type %d\n",
-                      gtid, team->t.t_id, tid, bt));
+static void __kmp_hyper_barrier_release(
+    enum barrier_type bt, kmp_info_t *this_thr, int gtid, int tid,
+    int propagate_icvs USE_ITT_BUILD_ARG(void *itt_sync_obj)) {
+  KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(KMP_hyper_release);
+  register kmp_team_t *team;
+  register kmp_bstate_t *thr_bar = &this_thr->th.th_bar[bt].bb;
+  register kmp_info_t **other_threads;
+  register kmp_uint32 num_threads;
+  register kmp_uint32 branch_bits = __kmp_barrier_release_branch_bits[bt];
+  register kmp_uint32 branch_factor = 1 << branch_bits;
+  register kmp_uint32 child;
+  register kmp_uint32 child_tid;
+  register kmp_uint32 offset;
+  register kmp_uint32 level;
+
+  /* Perform a hypercube-embedded tree release for all of the threads that have
+     been gathered. If KMP_REVERSE_HYPER_BAR is defined (default) the threads
+     are released in the reverse order of the corresponding gather, otherwise
+     threads are released in the same order. */
+  if (KMP_MASTER_TID(tid)) { // master
+    team = __kmp_threads[gtid]->th.th_team;
+    KMP_DEBUG_ASSERT(team != NULL);
+    KA_TRACE(20, ("__kmp_hyper_barrier_release: T#%d(%d:%d) master enter for "
+                  "barrier type %d\n",
+                  gtid, team->t.t_id, tid, bt));
 #if KMP_BARRIER_ICV_PUSH
-        if (propagate_icvs) { // master already has ICVs in final destination; copy
-            copy_icvs(&thr_bar->th_fixed_icvs, &team->t.t_implicit_task_taskdata[tid].td_icvs);
-        }
-#endif
+    if (propagate_icvs) { // master already has ICVs in final destination; copy
+      copy_icvs(&thr_bar->th_fixed_icvs,
+                &team->t.t_implicit_task_taskdata[tid].td_icvs);
     }
-    else  { // Handle fork barrier workers who aren't part of a team yet
-        KA_TRACE(20, ("__kmp_hyper_barrier_release: T#%d wait go(%p) == %u\n",
-                      gtid, &thr_bar->b_go, KMP_BARRIER_STATE_BUMP));
-        // Wait for parent thread to release us
-        kmp_flag_64 flag(&thr_bar->b_go, KMP_BARRIER_STATE_BUMP);
-        flag.wait(this_thr, TRUE
-                  USE_ITT_BUILD_ARG(itt_sync_obj) );
-        ANNOTATE_BARRIER_END(this_thr);
+#endif
+  } else { // Handle fork barrier workers who aren't part of a team yet
+    KA_TRACE(20, ("__kmp_hyper_barrier_release: T#%d wait go(%p) == %u\n", gtid,
+                  &thr_bar->b_go, KMP_BARRIER_STATE_BUMP));
+    // Wait for parent thread to release us
+    kmp_flag_64 flag(&thr_bar->b_go, KMP_BARRIER_STATE_BUMP);
+    flag.wait(this_thr, TRUE USE_ITT_BUILD_ARG(itt_sync_obj));
+    ANNOTATE_BARRIER_END(this_thr);
 #if USE_ITT_BUILD && USE_ITT_NOTIFY
-        if ((__itt_sync_create_ptr && itt_sync_obj == NULL) || KMP_ITT_DEBUG) {
-            // In fork barrier where we could not get the object reliably
-            itt_sync_obj = __kmp_itt_barrier_object(gtid, bs_forkjoin_barrier, 0, -1);
-            // Cancel wait on previous parallel region...
-            __kmp_itt_task_starting(itt_sync_obj);
-
-            if (bt == bs_forkjoin_barrier && TCR_4(__kmp_global.g.g_done))
-                return;
-
-            itt_sync_obj = __kmp_itt_barrier_object(gtid, bs_forkjoin_barrier);
-            if (itt_sync_obj != NULL)
-                // Call prepare as early as possible for "new" barrier
-                __kmp_itt_task_finished(itt_sync_obj);
-        } else
+    if ((__itt_sync_create_ptr && itt_sync_obj == NULL) || KMP_ITT_DEBUG) {
+      // In fork barrier where we could not get the object reliably
+      itt_sync_obj = __kmp_itt_barrier_object(gtid, bs_forkjoin_barrier, 0, -1);
+      // Cancel wait on previous parallel region...
+      __kmp_itt_task_starting(itt_sync_obj);
+
+      if (bt == bs_forkjoin_barrier && TCR_4(__kmp_global.g.g_done))
+        return;
+
+      itt_sync_obj = __kmp_itt_barrier_object(gtid, bs_forkjoin_barrier);
+      if (itt_sync_obj != NULL)
+        // Call prepare as early as possible for "new" barrier
+        __kmp_itt_task_finished(itt_sync_obj);
+    } else
 #endif /* USE_ITT_BUILD && USE_ITT_NOTIFY */
         // Early exit for reaping threads releasing forkjoin barrier
         if (bt == bs_forkjoin_barrier && TCR_4(__kmp_global.g.g_done))
-            return;
+      return;
 
-        // The worker thread may now assume that the team is valid.
-        team = __kmp_threads[gtid]->th.th_team;
-        KMP_DEBUG_ASSERT(team != NULL);
-        tid = __kmp_tid_from_gtid(gtid);
-
-        TCW_4(thr_bar->b_go, KMP_INIT_BARRIER_STATE);
-        KA_TRACE(20, ("__kmp_hyper_barrier_release: T#%d(%d:%d) set go(%p) = %u\n",
-                      gtid, team->t.t_id, tid, &thr_bar->b_go, KMP_INIT_BARRIER_STATE));
-        KMP_MB();  // Flush all pending memory write invalidates.
-    }
-    num_threads = this_thr->th.th_team_nproc;
-    other_threads = team->t.t_threads;
+    // The worker thread may now assume that the team is valid.
+    team = __kmp_threads[gtid]->th.th_team;
+    KMP_DEBUG_ASSERT(team != NULL);
+    tid = __kmp_tid_from_gtid(gtid);
+
+    TCW_4(thr_bar->b_go, KMP_INIT_BARRIER_STATE);
+    KA_TRACE(20,
+             ("__kmp_hyper_barrier_release: T#%d(%d:%d) set go(%p) = %u\n",
+              gtid, team->t.t_id, tid, &thr_bar->b_go, KMP_INIT_BARRIER_STATE));
+    KMP_MB(); // Flush all pending memory write invalidates.
+  }
+  num_threads = this_thr->th.th_team_nproc;
+  other_threads = team->t.t_threads;
 
 #ifdef KMP_REVERSE_HYPER_BAR
-    // Count up to correct level for parent
-    for (level=0, offset=1; offset<num_threads && (((tid>>level) & (branch_factor-1)) == 0);
-         level+=branch_bits, offset<<=branch_bits);
-
-    // Now go down from there
-    for (level-=branch_bits, offset>>=branch_bits; offset != 0;
-         level-=branch_bits, offset>>=branch_bits)
+  // Count up to correct level for parent
+  for (level = 0, offset = 1;
+       offset < num_threads && (((tid >> level) & (branch_factor - 1)) == 0);
+       level += branch_bits, offset <<= branch_bits)
+    ;
+
+  // Now go down from there
+  for (level -= branch_bits, offset >>= branch_bits; offset != 0;
+       level -= branch_bits, offset >>= branch_bits)
 #else
-    // Go down the tree, level by level
-    for (level=0, offset=1; offset<num_threads; level+=branch_bits, offset<<=branch_bits)
+  // Go down the tree, level by level
+  for (level = 0, offset = 1; offset < num_threads;
+       level += branch_bits, offset <<= branch_bits)
 #endif // KMP_REVERSE_HYPER_BAR
-    {
+  {
 #ifdef KMP_REVERSE_HYPER_BAR
-        /* Now go in reverse order through the children, highest to lowest.
-           Initial setting of child is conservative here. */
-        child = num_threads >> ((level==0)?level:level-1);
-        for (child=(child<branch_factor-1) ? child : branch_factor-1, child_tid=tid+(child<<level);
-             child>=1; child--, child_tid-=(1<<level))
+    /* Now go in reverse order through the children, highest to lowest.
+       Initial setting of child is conservative here. */
+    child = num_threads >> ((level == 0) ? level : level - 1);
+    for (child = (child < branch_factor - 1) ? child : branch_factor - 1,
+        child_tid = tid + (child << level);
+         child >= 1; child--, child_tid -= (1 << level))
 #else
-        if (((tid >> level) & (branch_factor - 1)) != 0)
-            // No need to go lower than this, since this is the level parent would be notified
-            break;
-        // Iterate through children on this level of the tree
-        for (child=1, child_tid=tid+(1<<level); child<branch_factor && child_tid<num_threads;
-             child++, child_tid+=(1<<level))
+    if (((tid >> level) & (branch_factor - 1)) != 0)
+      // No need to go lower than this, since this is the level at which the
+      // parent would be notified
+      break;
+    // Iterate through children on this level of the tree
+    for (child = 1, child_tid = tid + (1 << level);
+         child < branch_factor && child_tid < num_threads;
+         child++, child_tid += (1 << level))
 #endif // KMP_REVERSE_HYPER_BAR
-        {
-            if (child_tid >= num_threads) continue;  // Child doesn't exist so keep going
-            else {
-                register kmp_info_t *child_thr = other_threads[child_tid];
-                register kmp_bstate_t *child_bar = &child_thr->th.th_bar[bt].bb;
+    {
+      if (child_tid >= num_threads)
+        continue; // Child doesn't exist so keep going
+      else {
+        register kmp_info_t *child_thr = other_threads[child_tid];
+        register kmp_bstate_t *child_bar = &child_thr->th.th_bar[bt].bb;
 #if KMP_CACHE_MANAGE
-                register kmp_uint32 next_child_tid = child_tid - (1 << level);
-                // Prefetch next thread's go count
-# ifdef KMP_REVERSE_HYPER_BAR
-                if (child-1 >= 1 && next_child_tid < num_threads)
-# else
-                if (child+1 < branch_factor && next_child_tid < num_threads)
-# endif // KMP_REVERSE_HYPER_BAR
-                    KMP_CACHE_PREFETCH(&other_threads[next_child_tid]->th.th_bar[bt].bb.b_go);
+        register kmp_uint32 next_child_tid = child_tid - (1 << level);
+// Prefetch next thread's go count
+#ifdef KMP_REVERSE_HYPER_BAR
+        if (child - 1 >= 1 && next_child_tid < num_threads)
+#else
+        if (child + 1 < branch_factor && next_child_tid < num_threads)
+#endif // KMP_REVERSE_HYPER_BAR
+          KMP_CACHE_PREFETCH(
+              &other_threads[next_child_tid]->th.th_bar[bt].bb.b_go);
 #endif /* KMP_CACHE_MANAGE */
 
 #if KMP_BARRIER_ICV_PUSH
-                if (propagate_icvs) // push my fixed ICVs to my child
-                    copy_icvs(&child_bar->th_fixed_icvs, &thr_bar->th_fixed_icvs);
+        if (propagate_icvs) // push my fixed ICVs to my child
+          copy_icvs(&child_bar->th_fixed_icvs, &thr_bar->th_fixed_icvs);
 #endif // KMP_BARRIER_ICV_PUSH
 
-                KA_TRACE(20, ("__kmp_hyper_barrier_release: T#%d(%d:%d) releasing T#%d(%d:%u)"
-                              "go(%p): %u => %u\n", gtid, team->t.t_id, tid,
-                              __kmp_gtid_from_tid(child_tid, team), team->t.t_id,
-                              child_tid, &child_bar->b_go, child_bar->b_go,
-                              child_bar->b_go + KMP_BARRIER_STATE_BUMP));
-                // Release child from barrier
-                ANNOTATE_BARRIER_BEGIN(child_thr);
-                kmp_flag_64 flag(&child_bar->b_go, child_thr);
-                flag.release();
-            }
-        }
+        KA_TRACE(
+            20,
+            ("__kmp_hyper_barrier_release: T#%d(%d:%d) releasing T#%d(%d:%u)"
+             "go(%p): %u => %u\n",
+             gtid, team->t.t_id, tid, __kmp_gtid_from_tid(child_tid, team),
+             team->t.t_id, child_tid, &child_bar->b_go, child_bar->b_go,
+             child_bar->b_go + KMP_BARRIER_STATE_BUMP));
+        // Release child from barrier
+        ANNOTATE_BARRIER_BEGIN(child_thr);
+        kmp_flag_64 flag(&child_bar->b_go, child_thr);
+        flag.release();
+      }
     }
+  }
 #if KMP_BARRIER_ICV_PUSH
-    if (propagate_icvs && !KMP_MASTER_TID(tid)) { // copy ICVs locally to final dest
-        __kmp_init_implicit_task(team->t.t_ident, team->t.t_threads[tid], team, tid, FALSE);
-        copy_icvs(&team->t.t_implicit_task_taskdata[tid].td_icvs, &thr_bar->th_fixed_icvs);
-    }
-#endif
-    KA_TRACE(20, ("__kmp_hyper_barrier_release: T#%d(%d:%d) exit for barrier type %d\n",
-                  gtid, team->t.t_id, tid, bt));
+  if (propagate_icvs &&
+      !KMP_MASTER_TID(tid)) { // copy ICVs locally to final dest
+    __kmp_init_implicit_task(team->t.t_ident, team->t.t_threads[tid], team, tid,
+                             FALSE);
+    copy_icvs(&team->t.t_implicit_task_taskdata[tid].td_icvs,
+              &thr_bar->th_fixed_icvs);
+  }
+#endif
+  KA_TRACE(
+      20,
+      ("__kmp_hyper_barrier_release: T#%d(%d:%d) exit for barrier type %d\n",
+       gtid, team->t.t_id, tid, bt));
 }
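
And the mirror image for the release path with KMP_REVERSE_HYPER_BAR defined:
a thread first counts up to the level at which it stops being a subtree root,
then walks back down, releasing children from the highest tid to the lowest.
A minimal standalone sketch (again not part of this patch, with an
illustrative printf standing in for the b_go release):
@code
#include <cstdio>

// Print the order in which a thread releases its children, following the
// KMP_REVERSE_HYPER_BAR branch of __kmp_hyper_barrier_release.
static void reverse_hyper_release_order(unsigned tid, unsigned num_threads,
                                        unsigned branch_bits) {
  unsigned branch_factor = 1u << branch_bits;
  unsigned level, offset;
  // Count up to correct level for parent (same loop as in the runtime).
  for (level = 0, offset = 1;
       offset < num_threads && (((tid >> level) & (branch_factor - 1)) == 0);
       level += branch_bits, offset <<= branch_bits)
    ;
  // Now go down from there, highest-numbered child first at each level.
  // (offset becomes 0 here for threads with no children, so the body is
  // skipped for them, just as in the runtime code.)
  for (level -= branch_bits, offset >>= branch_bits; offset != 0;
       level -= branch_bits, offset >>= branch_bits) {
    unsigned child = num_threads >> ((level == 0) ? level : level - 1);
    child = (child < branch_factor - 1) ? child : branch_factor - 1;
    for (unsigned child_tid = tid + (child << level); child >= 1;
         child--, child_tid -= (1u << level)) {
      if (child_tid < num_threads)
        std::printf("T#%u releases T#%u at level %u\n", tid, child_tid, level);
    }
  }
}
@endcode
For num_threads = 8 and branch_bits = 2 the master releases T#4, then T#3,
T#2, T#1, the exact reverse of the gather order above.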
 
 // Hierarchical Barrier
 
 // Initialize thread barrier data
-/* Initializes/re-initializes the hierarchical barrier data stored on a thread.  Performs the
-   minimum amount of initialization required based on how the team has changed.  Returns true if
-   leaf children will require both on-core and traditional wake-up mechanisms.  For example, if the
-   team size increases, threads already in the team will respond to on-core wakeup on their parent
-   thread, but threads newly added to the team will only be listening on the their local b_go. */
-static bool
-__kmp_init_hierarchical_barrier_thread(enum barrier_type bt, kmp_bstate_t *thr_bar, kmp_uint32 nproc,
-                                       int gtid, int tid, kmp_team_t *team)
-{
-    // Checks to determine if (re-)initialization is needed
-    bool uninitialized = thr_bar->team == NULL;
-    bool team_changed = team != thr_bar->team;
-    bool team_sz_changed = nproc != thr_bar->nproc;
-    bool tid_changed = tid != thr_bar->old_tid;
-    bool retval = false;
-
-    if (uninitialized || team_sz_changed) {
-        __kmp_get_hierarchy(nproc, thr_bar);
-    }
-
-    if (uninitialized || team_sz_changed || tid_changed) {
-        thr_bar->my_level = thr_bar->depth-1; // default for master
-        thr_bar->parent_tid = -1; // default for master
-        if (!KMP_MASTER_TID(tid)) { // if not master, find parent thread in hierarchy
-            kmp_uint32 d=0;
-            while (d<thr_bar->depth) { // find parent based on level of thread in hierarchy, and note level
-                kmp_uint32 rem;
-                if (d == thr_bar->depth-2) { // reached level right below the master
-                    thr_bar->parent_tid = 0;
-                    thr_bar->my_level = d;
-                    break;
-                }
-                else if ((rem = tid%thr_bar->skip_per_level[d+1]) != 0) { // TODO: can we make this op faster?
-                    // thread is not a subtree root at next level, so this is max
-                    thr_bar->parent_tid = tid - rem;
-                    thr_bar->my_level = d;
-                    break;
-                }
-                ++d;
-            }
+/* Initializes/re-initializes the hierarchical barrier data stored on a thread.
+   Performs the minimum amount of initialization required based on how the team
+   has changed. Returns true if leaf children will require both on-core and
+   traditional wake-up mechanisms. For example, if the team size increases,
+   threads already in the team will respond to on-core wakeup on their parent
+   thread, but threads newly added to the team will only be listening on
+   their local b_go. */
+static bool __kmp_init_hierarchical_barrier_thread(enum barrier_type bt,
+                                                   kmp_bstate_t *thr_bar,
+                                                   kmp_uint32 nproc, int gtid,
+                                                   int tid, kmp_team_t *team) {
+  // Checks to determine if (re-)initialization is needed
+  bool uninitialized = thr_bar->team == NULL;
+  bool team_changed = team != thr_bar->team;
+  bool team_sz_changed = nproc != thr_bar->nproc;
+  bool tid_changed = tid != thr_bar->old_tid;
+  bool retval = false;
+
+  if (uninitialized || team_sz_changed) {
+    __kmp_get_hierarchy(nproc, thr_bar);
+  }
+
+  if (uninitialized || team_sz_changed || tid_changed) {
+    thr_bar->my_level = thr_bar->depth - 1; // default for master
+    thr_bar->parent_tid = -1; // default for master
+    if (!KMP_MASTER_TID(
+            tid)) { // if not master, find parent thread in hierarchy
+      kmp_uint32 d = 0;
+      while (d < thr_bar->depth) { // find parent based on level of thread in
+        // hierarchy, and note level
+        kmp_uint32 rem;
+        if (d == thr_bar->depth - 2) { // reached level right below the master
+          thr_bar->parent_tid = 0;
+          thr_bar->my_level = d;
+          break;
+        } else if ((rem = tid % thr_bar->skip_per_level[d + 1]) !=
+                   0) { // TODO: can we make this op faster?
+          // thread is not a subtree root at next level, so this is max
+          thr_bar->parent_tid = tid - rem;
+          thr_bar->my_level = d;
+          break;
         }
-        thr_bar->offset = 7-(tid-thr_bar->parent_tid-1);
-        thr_bar->old_tid = tid;
-        thr_bar->wait_flag = KMP_BARRIER_NOT_WAITING;
-        thr_bar->team = team;
-        thr_bar->parent_bar = &team->t.t_threads[thr_bar->parent_tid]->th.th_bar[bt].bb;
-    }
-    if (uninitialized || team_changed || tid_changed) {
-        thr_bar->team = team;
-        thr_bar->parent_bar = &team->t.t_threads[thr_bar->parent_tid]->th.th_bar[bt].bb;
-        retval = true;
-    }
-    if (uninitialized || team_sz_changed || tid_changed) {
-        thr_bar->nproc = nproc;
-        thr_bar->leaf_kids = thr_bar->base_leaf_kids;
-        if (thr_bar->my_level == 0) thr_bar->leaf_kids=0;
-        if (thr_bar->leaf_kids && (kmp_uint32)tid+thr_bar->leaf_kids+1 > nproc)
-            thr_bar->leaf_kids = nproc - tid - 1;
-        thr_bar->leaf_state = 0;
-        for (int i=0; i<thr_bar->leaf_kids; ++i) ((char *)&(thr_bar->leaf_state))[7-i] = 1;
+        ++d;
+      }
     }
-    return retval;
+    thr_bar->offset = 7 - (tid - thr_bar->parent_tid - 1);
+    thr_bar->old_tid = tid;
+    thr_bar->wait_flag = KMP_BARRIER_NOT_WAITING;
+    thr_bar->team = team;
+    thr_bar->parent_bar =
+        &team->t.t_threads[thr_bar->parent_tid]->th.th_bar[bt].bb;
+  }
+  if (uninitialized || team_changed || tid_changed) {
+    thr_bar->team = team;
+    thr_bar->parent_bar =
+        &team->t.t_threads[thr_bar->parent_tid]->th.th_bar[bt].bb;
+    retval = true;
+  }
+  if (uninitialized || team_sz_changed || tid_changed) {
+    thr_bar->nproc = nproc;
+    thr_bar->leaf_kids = thr_bar->base_leaf_kids;
+    if (thr_bar->my_level == 0)
+      thr_bar->leaf_kids = 0;
+    if (thr_bar->leaf_kids && (kmp_uint32)tid + thr_bar->leaf_kids + 1 > nproc)
+      thr_bar->leaf_kids = nproc - tid - 1;
+    thr_bar->leaf_state = 0;
+    for (int i = 0; i < thr_bar->leaf_kids; ++i)
+      ((char *)&(thr_bar->leaf_state))[7 - i] = 1;
+  }
+  return retval;
 }
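
The leaf_state set up at the end of this function is what lets up to eight
leaf children check in on a single 64-bit b_arrived word, one byte per child
(byte 7 - i for leaf i). A minimal standalone sketch of just that packing,
not part of this patch and with a plain std::atomic standing in for the
runtime's kmp_flag_64/kmp_flag_oncore machinery:
@code
#include <atomic>
#include <cstdint>
#include <cstdio>

int main() {
  int leaf_kids = 3;

  // Build leaf_state the same way the init code does: one byte per leaf kid,
  // stored at byte offset 7 - i of the 64-bit word.
  std::uint64_t leaf_state = 0;
  for (int i = 0; i < leaf_kids; ++i)
    ((char *)&leaf_state)[7 - i] = 1;

  // Parent's arrived word (the real runtime waits on it via kmp_flag_64).
  std::atomic<std::uint64_t> b_arrived{0};

  // Each leaf checks in by setting only its own byte of the shared word.
  for (int i = 0; i < leaf_kids; ++i) {
    std::uint64_t my_byte = 0;
    ((char *)&my_byte)[7 - i] = 1;
    b_arrived.fetch_or(my_byte);
  }

  // Parent: every leaf has arrived once all leaf_state bytes are set.
  if ((b_arrived.load() & leaf_state) == leaf_state)
    std::printf("all %d leaf kids arrived\n", leaf_kids);

  // Clear the leaf bytes for the next barrier, like the
  // KMP_TEST_THEN_AND64(..., ~leaf_state) in the gather code below.
  b_arrived.fetch_and(~leaf_state);
  return 0;
}
@endcode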
 
-static void
-__kmp_hierarchical_barrier_gather(enum barrier_type bt, kmp_info_t *this_thr,
-                                  int gtid, int tid, void (*reduce) (void *, void *)
-                                  USE_ITT_BUILD_ARG(void * itt_sync_obj) )
-{
-    KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(KMP_hier_gather);
-    register kmp_team_t *team = this_thr->th.th_team;
-    register kmp_bstate_t *thr_bar = & this_thr->th.th_bar[bt].bb;
-    register kmp_uint32 nproc = this_thr->th.th_team_nproc;
-    register kmp_info_t **other_threads = team->t.t_threads;
-    register kmp_uint64 new_state;
+static void __kmp_hierarchical_barrier_gather(
+    enum barrier_type bt, kmp_info_t *this_thr, int gtid, int tid,
+    void (*reduce)(void *, void *) USE_ITT_BUILD_ARG(void *itt_sync_obj)) {
+  KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(KMP_hier_gather);
+  register kmp_team_t *team = this_thr->th.th_team;
+  register kmp_bstate_t *thr_bar = &this_thr->th.th_bar[bt].bb;
+  register kmp_uint32 nproc = this_thr->th.th_team_nproc;
+  register kmp_info_t **other_threads = team->t.t_threads;
+  register kmp_uint64 new_state;
 
-    int level = team->t.t_level;
+  int level = team->t.t_level;
 #if OMP_40_ENABLED
-    if (other_threads[0]->th.th_teams_microtask)    // are we inside the teams construct?
-        if (this_thr->th.th_teams_size.nteams > 1)
-            ++level; // level was not increased in teams construct for team_of_masters
-#endif
-    if (level == 1) thr_bar->use_oncore_barrier = 1;
-    else thr_bar->use_oncore_barrier = 0; // Do not use oncore barrier when nested
-
-    KA_TRACE(20, ("__kmp_hierarchical_barrier_gather: T#%d(%d:%d) enter for barrier type %d\n",
-                  gtid, team->t.t_id, tid, bt));
-    KMP_DEBUG_ASSERT(this_thr == other_threads[this_thr->th.th_info.ds.ds_tid]);
+  if (other_threads[0]
+          ->th.th_teams_microtask) // are we inside the teams construct?
+    if (this_thr->th.th_teams_size.nteams > 1)
+      ++level; // level was not increased in teams construct for team_of_masters
+#endif
+  if (level == 1)
+    thr_bar->use_oncore_barrier = 1;
+  else
+    thr_bar->use_oncore_barrier = 0; // Do not use oncore barrier when nested
+
+  KA_TRACE(20, ("__kmp_hierarchical_barrier_gather: T#%d(%d:%d) enter for "
+                "barrier type %d\n",
+                gtid, team->t.t_id, tid, bt));
+  KMP_DEBUG_ASSERT(this_thr == other_threads[this_thr->th.th_info.ds.ds_tid]);
 
 #if USE_ITT_BUILD && USE_ITT_NOTIFY
-    // Barrier imbalance - save arrive time to the thread
-    if(__kmp_forkjoin_frames_mode == 3 || __kmp_forkjoin_frames_mode == 2) {
-        this_thr->th.th_bar_arrive_time = __itt_get_timestamp();
-    }
-#endif
-
-    (void)__kmp_init_hierarchical_barrier_thread(bt, thr_bar, nproc, gtid, tid, team);
-
-    if (thr_bar->my_level) { // not a leaf (my_level==0 means leaf)
-        register kmp_int32 child_tid;
-        new_state = (kmp_uint64)team->t.t_bar[bt].b_arrived + KMP_BARRIER_STATE_BUMP;
-        if (__kmp_dflt_blocktime == KMP_MAX_BLOCKTIME && thr_bar->use_oncore_barrier) {
-            if (thr_bar->leaf_kids) { // First, wait for leaf children to check-in on my b_arrived flag
-                kmp_uint64 leaf_state = KMP_MASTER_TID(tid) ? thr_bar->b_arrived | thr_bar->leaf_state : team->t.t_bar[bt].b_arrived | thr_bar->leaf_state;
-                KA_TRACE(20, ("__kmp_hierarchical_barrier_gather: T#%d(%d:%d) waiting for leaf kids\n",
-                              gtid, team->t.t_id, tid));
-                kmp_flag_64 flag(&thr_bar->b_arrived, leaf_state);
-                flag.wait(this_thr, FALSE
-                          USE_ITT_BUILD_ARG(itt_sync_obj) );
-                if (reduce) {
-                    ANNOTATE_REDUCE_AFTER(reduce);
-                    for (child_tid=tid+1; child_tid<=tid+thr_bar->leaf_kids; ++child_tid) {
-                        KA_TRACE(100, ("__kmp_hierarchical_barrier_gather: T#%d(%d:%d) += T#%d(%d:%d)\n",
-                                       gtid, team->t.t_id, tid, __kmp_gtid_from_tid(child_tid, team),
-                                       team->t.t_id, child_tid));
-                        ANNOTATE_BARRIER_END(other_threads[child_tid]);
-                        (*reduce)(this_thr->th.th_local.reduce_data, other_threads[child_tid]->th.th_local.reduce_data);
-                    }
-                    ANNOTATE_REDUCE_BEFORE(reduce);
-                    ANNOTATE_REDUCE_BEFORE(&team->t.t_bar);
-                }
-                (void) KMP_TEST_THEN_AND64((volatile kmp_int64 *)&thr_bar->b_arrived, ~(thr_bar->leaf_state)); // clear leaf_state bits
-            }
-            // Next, wait for higher level children on each child's b_arrived flag
-            for (kmp_uint32 d=1; d<thr_bar->my_level; ++d) { // gather lowest level threads first, but skip 0
-                kmp_uint32 last = tid+thr_bar->skip_per_level[d+1], skip = thr_bar->skip_per_level[d];
-                if (last > nproc) last = nproc;
-                for (child_tid=tid+skip; child_tid<(int)last; child_tid+=skip) {
-                    register kmp_info_t *child_thr = other_threads[child_tid];
-                    register kmp_bstate_t *child_bar = &child_thr->th.th_bar[bt].bb;
-                    KA_TRACE(20, ("__kmp_hierarchical_barrier_gather: T#%d(%d:%d) wait T#%d(%d:%d) "
-                                  "arrived(%p) == %llu\n",
-                                  gtid, team->t.t_id, tid, __kmp_gtid_from_tid(child_tid, team),
-                                  team->t.t_id, child_tid, &child_bar->b_arrived, new_state));
-                    kmp_flag_64 flag(&child_bar->b_arrived, new_state);
-                    flag.wait(this_thr, FALSE
-                              USE_ITT_BUILD_ARG(itt_sync_obj) );
-                    ANNOTATE_BARRIER_END(child_thr);
-                    if (reduce) {
-                        KA_TRACE(100, ("__kmp_hierarchical_barrier_gather: T#%d(%d:%d) += T#%d(%d:%d)\n",
-                                       gtid, team->t.t_id, tid, __kmp_gtid_from_tid(child_tid, team),
-                                       team->t.t_id, child_tid));
-                        ANNOTATE_REDUCE_AFTER(reduce);
-                        (*reduce)(this_thr->th.th_local.reduce_data, child_thr->th.th_local.reduce_data);
-                        ANNOTATE_REDUCE_BEFORE(reduce);
-                        ANNOTATE_REDUCE_BEFORE(&team->t.t_bar);
-                    }
-                }
-            }
+  // Barrier imbalance - save arrive time to the thread
+  if (__kmp_forkjoin_frames_mode == 3 || __kmp_forkjoin_frames_mode == 2) {
+    this_thr->th.th_bar_arrive_time = __itt_get_timestamp();
+  }
+#endif
+
+  (void)__kmp_init_hierarchical_barrier_thread(bt, thr_bar, nproc, gtid, tid,
+                                               team);
+
+  if (thr_bar->my_level) { // not a leaf (my_level==0 means leaf)
+    register kmp_int32 child_tid;
+    new_state =
+        (kmp_uint64)team->t.t_bar[bt].b_arrived + KMP_BARRIER_STATE_BUMP;
+    if (__kmp_dflt_blocktime == KMP_MAX_BLOCKTIME &&
+        thr_bar->use_oncore_barrier) {
+      if (thr_bar->leaf_kids) { // First, wait for leaf children to check in on
+        // my b_arrived flag
+        kmp_uint64 leaf_state =
+            KMP_MASTER_TID(tid)
+                ? thr_bar->b_arrived | thr_bar->leaf_state
+                : team->t.t_bar[bt].b_arrived | thr_bar->leaf_state;
+        KA_TRACE(20, ("__kmp_hierarchical_barrier_gather: T#%d(%d:%d) waiting "
+                      "for leaf kids\n",
+                      gtid, team->t.t_id, tid));
+        kmp_flag_64 flag(&thr_bar->b_arrived, leaf_state);
+        flag.wait(this_thr, FALSE USE_ITT_BUILD_ARG(itt_sync_obj));
+        if (reduce) {
+          ANNOTATE_REDUCE_AFTER(reduce);
+          for (child_tid = tid + 1; child_tid <= tid + thr_bar->leaf_kids;
+               ++child_tid) {
+            KA_TRACE(100, ("__kmp_hierarchical_barrier_gather: T#%d(%d:%d) += "
+                           "T#%d(%d:%d)\n",
+                           gtid, team->t.t_id, tid,
+                           __kmp_gtid_from_tid(child_tid, team), team->t.t_id,
+                           child_tid));
+            ANNOTATE_BARRIER_END(other_threads[child_tid]);
+            (*reduce)(this_thr->th.th_local.reduce_data,
+                      other_threads[child_tid]->th.th_local.reduce_data);
+          }
+          ANNOTATE_REDUCE_BEFORE(reduce);
+          ANNOTATE_REDUCE_BEFORE(&team->t.t_bar);
+        }
+        (void)KMP_TEST_THEN_AND64(
+            (volatile kmp_int64 *)&thr_bar->b_arrived,
+            ~(thr_bar->leaf_state)); // clear leaf_state bits
+      }
+      // Next, wait for higher level children on each child's b_arrived flag
+      for (kmp_uint32 d = 1; d < thr_bar->my_level;
+           ++d) { // gather lowest level threads first, but skip 0
+        kmp_uint32 last = tid + thr_bar->skip_per_level[d + 1],
+                   skip = thr_bar->skip_per_level[d];
+        if (last > nproc)
+          last = nproc;
+        for (child_tid = tid + skip; child_tid < (int)last; child_tid += skip) {
+          register kmp_info_t *child_thr = other_threads[child_tid];
+          register kmp_bstate_t *child_bar = &child_thr->th.th_bar[bt].bb;
+          KA_TRACE(20, ("__kmp_hierarchical_barrier_gather: T#%d(%d:%d) wait "
+                        "T#%d(%d:%d) "
+                        "arrived(%p) == %llu\n",
+                        gtid, team->t.t_id, tid,
+                        __kmp_gtid_from_tid(child_tid, team), team->t.t_id,
+                        child_tid, &child_bar->b_arrived, new_state));
+          kmp_flag_64 flag(&child_bar->b_arrived, new_state);
+          flag.wait(this_thr, FALSE USE_ITT_BUILD_ARG(itt_sync_obj));
+          ANNOTATE_BARRIER_END(child_thr);
+          if (reduce) {
+            KA_TRACE(100, ("__kmp_hierarchical_barrier_gather: T#%d(%d:%d) += "
+                           "T#%d(%d:%d)\n",
+                           gtid, team->t.t_id, tid,
+                           __kmp_gtid_from_tid(child_tid, team), team->t.t_id,
+                           child_tid));
+            ANNOTATE_REDUCE_AFTER(reduce);
+            (*reduce)(this_thr->th.th_local.reduce_data,
+                      child_thr->th.th_local.reduce_data);
+            ANNOTATE_REDUCE_BEFORE(reduce);
+            ANNOTATE_REDUCE_BEFORE(&team->t.t_bar);
+          }
         }
-        else { // Blocktime is not infinite
-            for (kmp_uint32 d=0; d<thr_bar->my_level; ++d) { // Gather lowest level threads first
-                kmp_uint32 last = tid+thr_bar->skip_per_level[d+1], skip = thr_bar->skip_per_level[d];
-                if (last > nproc) last = nproc;
-                for (child_tid=tid+skip; child_tid<(int)last; child_tid+=skip) {
-                    register kmp_info_t *child_thr = other_threads[child_tid];
-                    register kmp_bstate_t *child_bar = &child_thr->th.th_bar[bt].bb;
-                    KA_TRACE(20, ("__kmp_hierarchical_barrier_gather: T#%d(%d:%d) wait T#%d(%d:%d) "
-                                  "arrived(%p) == %llu\n",
-                                  gtid, team->t.t_id, tid, __kmp_gtid_from_tid(child_tid, team),
-                                  team->t.t_id, child_tid, &child_bar->b_arrived, new_state));
-                    kmp_flag_64 flag(&child_bar->b_arrived, new_state);
-                    flag.wait(this_thr, FALSE
-                              USE_ITT_BUILD_ARG(itt_sync_obj) );
-                    ANNOTATE_BARRIER_END(child_thr);
-                    if (reduce) {
-                        KA_TRACE(100, ("__kmp_hierarchical_barrier_gather: T#%d(%d:%d) += T#%d(%d:%d)\n",
-                                       gtid, team->t.t_id, tid, __kmp_gtid_from_tid(child_tid, team),
-                                       team->t.t_id, child_tid));
-                        ANNOTATE_REDUCE_AFTER(reduce);
-                        (*reduce)(this_thr->th.th_local.reduce_data, child_thr->th.th_local.reduce_data);
-                        ANNOTATE_REDUCE_BEFORE(reduce);
-                        ANNOTATE_REDUCE_BEFORE(&team->t.t_bar);
-                    }
-                }
-            }
+      }
+    } else { // Blocktime is not infinite
+      for (kmp_uint32 d = 0; d < thr_bar->my_level;
+           ++d) { // Gather lowest level threads first
+        kmp_uint32 last = tid + thr_bar->skip_per_level[d + 1],
+                   skip = thr_bar->skip_per_level[d];
+        if (last > nproc)
+          last = nproc;
+        for (child_tid = tid + skip; child_tid < (int)last; child_tid += skip) {
+          register kmp_info_t *child_thr = other_threads[child_tid];
+          register kmp_bstate_t *child_bar = &child_thr->th.th_bar[bt].bb;
+          KA_TRACE(20, ("__kmp_hierarchical_barrier_gather: T#%d(%d:%d) wait "
+                        "T#%d(%d:%d) "
+                        "arrived(%p) == %llu\n",
+                        gtid, team->t.t_id, tid,
+                        __kmp_gtid_from_tid(child_tid, team), team->t.t_id,
+                        child_tid, &child_bar->b_arrived, new_state));
+          kmp_flag_64 flag(&child_bar->b_arrived, new_state);
+          flag.wait(this_thr, FALSE USE_ITT_BUILD_ARG(itt_sync_obj));
+          ANNOTATE_BARRIER_END(child_thr);
+          if (reduce) {
+            KA_TRACE(100, ("__kmp_hierarchical_barrier_gather: T#%d(%d:%d) += "
+                           "T#%d(%d:%d)\n",
+                           gtid, team->t.t_id, tid,
+                           __kmp_gtid_from_tid(child_tid, team), team->t.t_id,
+                           child_tid));
+            ANNOTATE_REDUCE_AFTER(reduce);
+            (*reduce)(this_thr->th.th_local.reduce_data,
+                      child_thr->th.th_local.reduce_data);
+            ANNOTATE_REDUCE_BEFORE(reduce);
+            ANNOTATE_REDUCE_BEFORE(&team->t.t_bar);
+          }
         }
+      }
     }
-    // All subordinates are gathered; now release parent if not master thread
+  }
+  // All subordinates are gathered; now release parent if not master thread
 
-    if (!KMP_MASTER_TID(tid)) { // worker threads release parent in hierarchy
-        KA_TRACE(20, ("__kmp_hierarchical_barrier_gather: T#%d(%d:%d) releasing T#%d(%d:%d) "
-                      "arrived(%p): %llu => %llu\n", gtid, team->t.t_id, tid,
-                      __kmp_gtid_from_tid(thr_bar->parent_tid, team), team->t.t_id, thr_bar->parent_tid,
-                      &thr_bar->b_arrived, thr_bar->b_arrived, thr_bar->b_arrived+KMP_BARRIER_STATE_BUMP));
-        /* Mark arrival to parent: After performing this write, a worker thread may not assume that
-           the team is valid any more - it could be deallocated by the master thread at any time. */
-        if (thr_bar->my_level || __kmp_dflt_blocktime != KMP_MAX_BLOCKTIME
-            || !thr_bar->use_oncore_barrier) { // Parent is waiting on my b_arrived flag; release it
-            ANNOTATE_BARRIER_BEGIN(this_thr);
-            kmp_flag_64 flag(&thr_bar->b_arrived, other_threads[thr_bar->parent_tid]);
-            flag.release();
-        }
-        else { // Leaf does special release on the "offset" bits of parent's b_arrived flag
-            thr_bar->b_arrived = team->t.t_bar[bt].b_arrived + KMP_BARRIER_STATE_BUMP;
-            kmp_flag_oncore flag(&thr_bar->parent_bar->b_arrived, thr_bar->offset);
-            flag.set_waiter(other_threads[thr_bar->parent_tid]);
-            flag.release();
-        }
-    } else { // Master thread needs to update the team's b_arrived value
-        team->t.t_bar[bt].b_arrived = new_state;
-        KA_TRACE(20, ("__kmp_hierarchical_barrier_gather: T#%d(%d:%d) set team %d arrived(%p) = %llu\n",
-                      gtid, team->t.t_id, tid, team->t.t_id, &team->t.t_bar[bt].b_arrived, team->t.t_bar[bt].b_arrived));
-    }
-    // Is the team access below unsafe or just technically invalid?
-    KA_TRACE(20, ("__kmp_hierarchical_barrier_gather: T#%d(%d:%d) exit for barrier type %d\n",
-                  gtid, team->t.t_id, tid, bt));
+  if (!KMP_MASTER_TID(tid)) { // worker threads release parent in hierarchy
+    KA_TRACE(
+        20,
+        ("__kmp_hierarchical_barrier_gather: T#%d(%d:%d) releasing T#%d(%d:%d) "
+         "arrived(%p): %llu => %llu\n",
+         gtid, team->t.t_id, tid,
+         __kmp_gtid_from_tid(thr_bar->parent_tid, team), team->t.t_id,
+         thr_bar->parent_tid, &thr_bar->b_arrived, thr_bar->b_arrived,
+         thr_bar->b_arrived + KMP_BARRIER_STATE_BUMP));
+    /* Mark arrival to parent: After performing this write, a worker thread may
+       not assume that the team is valid any more - it could be deallocated by
+       the master thread at any time. */
+    if (thr_bar->my_level || __kmp_dflt_blocktime != KMP_MAX_BLOCKTIME ||
+        !thr_bar->use_oncore_barrier) { // Parent is waiting on my b_arrived
+      // flag; release it
+      ANNOTATE_BARRIER_BEGIN(this_thr);
+      kmp_flag_64 flag(&thr_bar->b_arrived, other_threads[thr_bar->parent_tid]);
+      flag.release();
+    } else { // Leaf does special release on the "offset" bits of parent's
+      // b_arrived flag
+      thr_bar->b_arrived = team->t.t_bar[bt].b_arrived + KMP_BARRIER_STATE_BUMP;
+      kmp_flag_oncore flag(&thr_bar->parent_bar->b_arrived, thr_bar->offset);
+      flag.set_waiter(other_threads[thr_bar->parent_tid]);
+      flag.release();
+    }
+  } else { // Master thread needs to update the team's b_arrived value
+    team->t.t_bar[bt].b_arrived = new_state;
+    KA_TRACE(20, ("__kmp_hierarchical_barrier_gather: T#%d(%d:%d) set team %d "
+                  "arrived(%p) = %llu\n",
+                  gtid, team->t.t_id, tid, team->t.t_id,
+                  &team->t.t_bar[bt].b_arrived, team->t.t_bar[bt].b_arrived));
+  }
+  // Is the team access below unsafe or just technically invalid?
+  KA_TRACE(20, ("__kmp_hierarchical_barrier_gather: T#%d(%d:%d) exit for "
+                "barrier type %d\n",
+                gtid, team->t.t_id, tid, bt));
 }
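
A note on the skip_per_level indexing used by the non-leaf loops above: at
depth d a parent waits on children spaced skip_per_level[d] apart, up to
tid + skip_per_level[d+1] (clipped to nproc). The sketch below is standalone
and not part of this patch; it mirrors the "blocktime is not infinite" branch
(with the oncore barrier, depth 0 leaves check in via leaf_state instead),
and the {1, 4, 16, 64} table is a made-up 4-ary hierarchy, since the real
values come from __kmp_get_hierarchy and the machine topology.
@code
#include <cstdio>

// Print which children a non-leaf thread waits on in the hierarchical
// gather, given its level and a skip_per_level table.
static void hier_gather_children(unsigned tid, unsigned my_level,
                                 unsigned nproc,
                                 const unsigned *skip_per_level) {
  for (unsigned d = 0; d < my_level; ++d) { // lowest levels first
    unsigned skip = skip_per_level[d];
    unsigned last = tid + skip_per_level[d + 1];
    if (last > nproc)
      last = nproc;
    for (unsigned child_tid = tid + skip; child_tid < last; child_tid += skip)
      std::printf("T#%u waits for T#%u at depth %u\n", tid, child_tid, d);
  }
}

int main() {
  const unsigned skip_per_level[] = {1, 4, 16, 64};
  // In a 64-thread team T#16 has my_level == 2 with this table, so it waits
  // on T#17..T#19 at depth 0 and on T#20, T#24, T#28 at depth 1.
  hier_gather_children(/*tid=*/16, /*my_level=*/2, /*nproc=*/64,
                       skip_per_level);
  return 0;
}
@endcode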
 
-static void
-__kmp_hierarchical_barrier_release(enum barrier_type bt, kmp_info_t *this_thr, int gtid, int tid,
-                                   int propagate_icvs
-                                   USE_ITT_BUILD_ARG(void * itt_sync_obj) )
-{
-    KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(KMP_hier_release);
-    register kmp_team_t *team;
-    register kmp_bstate_t *thr_bar = &this_thr->th.th_bar[bt].bb;
-    register kmp_uint32 nproc;
-    bool team_change = false; // indicates on-core barrier shouldn't be used
+static void __kmp_hierarchical_barrier_release(
+    enum barrier_type bt, kmp_info_t *this_thr, int gtid, int tid,
+    int propagate_icvs USE_ITT_BUILD_ARG(void *itt_sync_obj)) {
+  KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(KMP_hier_release);
+  register kmp_team_t *team;
+  register kmp_bstate_t *thr_bar = &this_thr->th.th_bar[bt].bb;
+  register kmp_uint32 nproc;
+  bool team_change = false; // indicates on-core barrier shouldn't be used
 
-    if (KMP_MASTER_TID(tid)) {
-        team = __kmp_threads[gtid]->th.th_team;
-        KMP_DEBUG_ASSERT(team != NULL);
-        KA_TRACE(20, ("__kmp_hierarchical_barrier_release: T#%d(%d:%d) master entered barrier type %d\n",
-                      gtid, team->t.t_id, tid, bt));
-    }
-    else { // Worker threads
-        // Wait for parent thread to release me
-        if (!thr_bar->use_oncore_barrier || __kmp_dflt_blocktime != KMP_MAX_BLOCKTIME
-            || thr_bar->my_level != 0 || thr_bar->team == NULL) {
-            // Use traditional method of waiting on my own b_go flag
-            thr_bar->wait_flag = KMP_BARRIER_OWN_FLAG;
-            kmp_flag_64 flag(&thr_bar->b_go, KMP_BARRIER_STATE_BUMP);
-            flag.wait(this_thr, TRUE
-                      USE_ITT_BUILD_ARG(itt_sync_obj) );
-            ANNOTATE_BARRIER_END(this_thr);
-            TCW_8(thr_bar->b_go, KMP_INIT_BARRIER_STATE); // Reset my b_go flag for next time
-        }
-        else { // Thread barrier data is initialized, this is a leaf, blocktime is infinite, not nested
-            // Wait on my "offset" bits on parent's b_go flag
-            thr_bar->wait_flag = KMP_BARRIER_PARENT_FLAG;
-            kmp_flag_oncore flag(&thr_bar->parent_bar->b_go, KMP_BARRIER_STATE_BUMP, thr_bar->offset,
-                                 bt, this_thr
-                                 USE_ITT_BUILD_ARG(itt_sync_obj) );
-            flag.wait(this_thr, TRUE);
-            if (thr_bar->wait_flag == KMP_BARRIER_SWITCHING) { // Thread was switched to own b_go
-                TCW_8(thr_bar->b_go, KMP_INIT_BARRIER_STATE); // Reset my b_go flag for next time
-            }
-            else { // Reset my bits on parent's b_go flag
-                ((char*)&(thr_bar->parent_bar->b_go))[thr_bar->offset] = 0;
-            }
-        }
-        thr_bar->wait_flag = KMP_BARRIER_NOT_WAITING;
-        // Early exit for reaping threads releasing forkjoin barrier
-        if (bt == bs_forkjoin_barrier && TCR_4(__kmp_global.g.g_done))
-            return;
-        // The worker thread may now assume that the team is valid.
-        team = __kmp_threads[gtid]->th.th_team;
-        KMP_DEBUG_ASSERT(team != NULL);
-        tid = __kmp_tid_from_gtid(gtid);
-
-        KA_TRACE(20, ("__kmp_hierarchical_barrier_release: T#%d(%d:%d) set go(%p) = %u\n",
-                      gtid, team->t.t_id, tid, &thr_bar->b_go, KMP_INIT_BARRIER_STATE));
-        KMP_MB();  // Flush all pending memory write invalidates.
+  if (KMP_MASTER_TID(tid)) {
+    team = __kmp_threads[gtid]->th.th_team;
+    KMP_DEBUG_ASSERT(team != NULL);
+    KA_TRACE(20, ("__kmp_hierarchical_barrier_release: T#%d(%d:%d) master "
+                  "entered barrier type %d\n",
+                  gtid, team->t.t_id, tid, bt));
+  } else { // Worker threads
+    // Wait for parent thread to release me
+    if (!thr_bar->use_oncore_barrier ||
+        __kmp_dflt_blocktime != KMP_MAX_BLOCKTIME || thr_bar->my_level != 0 ||
+        thr_bar->team == NULL) {
+      // Use traditional method of waiting on my own b_go flag
+      thr_bar->wait_flag = KMP_BARRIER_OWN_FLAG;
+      kmp_flag_64 flag(&thr_bar->b_go, KMP_BARRIER_STATE_BUMP);
+      flag.wait(this_thr, TRUE USE_ITT_BUILD_ARG(itt_sync_obj));
+      ANNOTATE_BARRIER_END(this_thr);
+      TCW_8(thr_bar->b_go,
+            KMP_INIT_BARRIER_STATE); // Reset my b_go flag for next time
+    } else { // Thread barrier data is initialized, this is a leaf, blocktime is
+      // infinite, not nested
+      // Wait on my "offset" bits on parent's b_go flag
+      thr_bar->wait_flag = KMP_BARRIER_PARENT_FLAG;
+      kmp_flag_oncore flag(&thr_bar->parent_bar->b_go, KMP_BARRIER_STATE_BUMP,
+                           thr_bar->offset, bt,
+                           this_thr USE_ITT_BUILD_ARG(itt_sync_obj));
+      flag.wait(this_thr, TRUE);
+      if (thr_bar->wait_flag ==
+          KMP_BARRIER_SWITCHING) { // Thread was switched to own b_go
+        TCW_8(thr_bar->b_go,
+              KMP_INIT_BARRIER_STATE); // Reset my b_go flag for next time
+      } else { // Reset my bits on parent's b_go flag
+        ((char *)&(thr_bar->parent_bar->b_go))[thr_bar->offset] = 0;
+      }
     }
+    thr_bar->wait_flag = KMP_BARRIER_NOT_WAITING;
+    // Early exit for reaping threads releasing forkjoin barrier
+    if (bt == bs_forkjoin_barrier && TCR_4(__kmp_global.g.g_done))
+      return;
+    // The worker thread may now assume that the team is valid.
+    team = __kmp_threads[gtid]->th.th_team;
+    KMP_DEBUG_ASSERT(team != NULL);
+    tid = __kmp_tid_from_gtid(gtid);
 
-    nproc = this_thr->th.th_team_nproc;
-    int level = team->t.t_level;
+    KA_TRACE(
+        20,
+        ("__kmp_hierarchical_barrier_release: T#%d(%d:%d) set go(%p) = %u\n",
+         gtid, team->t.t_id, tid, &thr_bar->b_go, KMP_INIT_BARRIER_STATE));
+    KMP_MB(); // Flush all pending memory write invalidates.
+  }
+
+  nproc = this_thr->th.th_team_nproc;
+  int level = team->t.t_level;
 #if OMP_40_ENABLED
-    if (team->t.t_threads[0]->th.th_teams_microtask ) {    // are we inside the teams construct?
-        if (team->t.t_pkfn != (microtask_t)__kmp_teams_master && this_thr->th.th_teams_level == level)
-            ++level; // level was not increased in teams construct for team_of_workers
-        if( this_thr->th.th_teams_size.nteams > 1 )
-            ++level; // level was not increased in teams construct for team_of_masters
-    }
-#endif
-    if (level == 1) thr_bar->use_oncore_barrier = 1;
-    else thr_bar->use_oncore_barrier = 0; // Do not use oncore barrier when nested
-
-    // If the team size has increased, we still communicate with old leaves via oncore barrier.
-    unsigned short int old_leaf_kids = thr_bar->leaf_kids;
-    kmp_uint64 old_leaf_state = thr_bar->leaf_state;
-    team_change = __kmp_init_hierarchical_barrier_thread(bt, thr_bar, nproc, gtid, tid, team);
-    // But if the entire team changes, we won't use oncore barrier at all
-    if (team_change) old_leaf_kids = 0;
+  if (team->t.t_threads[0]
+          ->th.th_teams_microtask) { // are we inside the teams construct?
+    if (team->t.t_pkfn != (microtask_t)__kmp_teams_master &&
+        this_thr->th.th_teams_level == level)
+      ++level; // level was not increased in teams construct for team_of_workers
+    if (this_thr->th.th_teams_size.nteams > 1)
+      ++level; // level was not increased in teams construct for team_of_masters
+  }
+#endif
+  if (level == 1)
+    thr_bar->use_oncore_barrier = 1;
+  else
+    thr_bar->use_oncore_barrier = 0; // Do not use oncore barrier when nested
+
+  // If the team size has increased, we still communicate with old leaves via
+  // oncore barrier.
+  unsigned short int old_leaf_kids = thr_bar->leaf_kids;
+  kmp_uint64 old_leaf_state = thr_bar->leaf_state;
+  team_change = __kmp_init_hierarchical_barrier_thread(bt, thr_bar, nproc, gtid,
+                                                       tid, team);
+  // But if the entire team changes, we won't use oncore barrier at all
+  if (team_change)
+    old_leaf_kids = 0;
 
 #if KMP_BARRIER_ICV_PUSH
-    if (propagate_icvs) {
-        __kmp_init_implicit_task(team->t.t_ident, team->t.t_threads[tid], team, tid, FALSE);
-        if (KMP_MASTER_TID(tid)) { // master already has copy in final destination; copy
-            copy_icvs(&thr_bar->th_fixed_icvs, &team->t.t_implicit_task_taskdata[tid].td_icvs);
-        }
-        else if (__kmp_dflt_blocktime == KMP_MAX_BLOCKTIME && thr_bar->use_oncore_barrier) { // optimization for inf blocktime
-            if (!thr_bar->my_level) // I'm a leaf in the hierarchy (my_level==0)
-                // leaves (on-core children) pull parent's fixed ICVs directly to local ICV store
-                copy_icvs(&team->t.t_implicit_task_taskdata[tid].td_icvs,
-                          &thr_bar->parent_bar->th_fixed_icvs);
-            // non-leaves will get ICVs piggybacked with b_go via NGO store
-        }
-        else { // blocktime is not infinite; pull ICVs from parent's fixed ICVs
-            if (thr_bar->my_level) // not a leaf; copy ICVs to my fixed ICVs child can access
-                copy_icvs(&thr_bar->th_fixed_icvs, &thr_bar->parent_bar->th_fixed_icvs);
-            else // leaves copy parent's fixed ICVs directly to local ICV store
-                copy_icvs(&team->t.t_implicit_task_taskdata[tid].td_icvs,
-                          &thr_bar->parent_bar->th_fixed_icvs);
-        }
+  if (propagate_icvs) {
+    __kmp_init_implicit_task(team->t.t_ident, team->t.t_threads[tid], team, tid,
+                             FALSE);
+    if (KMP_MASTER_TID(
+            tid)) { // master already has copy in final destination; copy
+      copy_icvs(&thr_bar->th_fixed_icvs,
+                &team->t.t_implicit_task_taskdata[tid].td_icvs);
+    } else if (__kmp_dflt_blocktime == KMP_MAX_BLOCKTIME &&
+               thr_bar->use_oncore_barrier) { // optimization for inf blocktime
+      if (!thr_bar->my_level) // I'm a leaf in the hierarchy (my_level==0)
+        // leaves (on-core children) pull parent's fixed ICVs directly to local
+        // ICV store
+        copy_icvs(&team->t.t_implicit_task_taskdata[tid].td_icvs,
+                  &thr_bar->parent_bar->th_fixed_icvs);
+      // non-leaves will get ICVs piggybacked with b_go via NGO store
+    } else { // blocktime is not infinite; pull ICVs from parent's fixed ICVs
+      if (thr_bar->my_level) // not a leaf; copy ICVs to my fixed ICVs child can
+        // access
+        copy_icvs(&thr_bar->th_fixed_icvs, &thr_bar->parent_bar->th_fixed_icvs);
+      else // leaves copy parent's fixed ICVs directly to local ICV store
+        copy_icvs(&team->t.t_implicit_task_taskdata[tid].td_icvs,
+                  &thr_bar->parent_bar->th_fixed_icvs);
     }
+  }
 #endif // KMP_BARRIER_ICV_PUSH
 
-    // Now, release my children
-    if (thr_bar->my_level) { // not a leaf
-        register kmp_int32 child_tid;
-        kmp_uint32 last;
-        if (__kmp_dflt_blocktime == KMP_MAX_BLOCKTIME && thr_bar->use_oncore_barrier) {
-            if (KMP_MASTER_TID(tid)) { // do a flat release
-                // Set local b_go to bump children via NGO store of the cache line containing IVCs and b_go.
-                thr_bar->b_go = KMP_BARRIER_STATE_BUMP;
-                // Use ngo stores if available; b_go piggybacks in the last 8 bytes of the cache line
-                ngo_load(&thr_bar->th_fixed_icvs);
-                // This loops over all the threads skipping only the leaf nodes in the hierarchy
-                for (child_tid=thr_bar->skip_per_level[1]; child_tid<(int)nproc; child_tid+=thr_bar->skip_per_level[1]) {
-                    register kmp_bstate_t *child_bar = &team->t.t_threads[child_tid]->th.th_bar[bt].bb;
-                    KA_TRACE(20, ("__kmp_hierarchical_barrier_release: T#%d(%d:%d) releasing T#%d(%d:%d)"
-                                  " go(%p): %u => %u\n",
-                                  gtid, team->t.t_id, tid, __kmp_gtid_from_tid(child_tid, team),
-                                  team->t.t_id, child_tid, &child_bar->b_go, child_bar->b_go,
-                                  child_bar->b_go + KMP_BARRIER_STATE_BUMP));
-                    // Use ngo store (if available) to both store ICVs and release child via child's b_go
-                    ngo_store_go(&child_bar->th_fixed_icvs, &thr_bar->th_fixed_icvs);
-                }
-                ngo_sync();
-            }
-            TCW_8(thr_bar->b_go, KMP_INIT_BARRIER_STATE); // Reset my b_go flag for next time
-            // Now, release leaf children
-            if (thr_bar->leaf_kids) { // if there are any
-                // We test team_change on the off-chance that the level 1 team changed.
-                if (team_change || old_leaf_kids < thr_bar->leaf_kids) { // some old leaf_kids, some new
-                    if (old_leaf_kids) { // release old leaf kids
-                        thr_bar->b_go |= old_leaf_state;
-                    }
-                    // Release new leaf kids
-                    last = tid+thr_bar->skip_per_level[1];
-                    if (last > nproc) last = nproc;
-                    for (child_tid=tid+1+old_leaf_kids; child_tid<(int)last; ++child_tid) { // skip_per_level[0]=1
-                        register kmp_info_t   *child_thr = team->t.t_threads[child_tid];
-                        register kmp_bstate_t *child_bar = &child_thr->th.th_bar[bt].bb;
-                        KA_TRACE(20, ("__kmp_hierarchical_barrier_release: T#%d(%d:%d) releasing"
-                                      " T#%d(%d:%d) go(%p): %u => %u\n",
-                                      gtid, team->t.t_id, tid, __kmp_gtid_from_tid(child_tid, team),
-                                      team->t.t_id, child_tid, &child_bar->b_go, child_bar->b_go,
-                                      child_bar->b_go + KMP_BARRIER_STATE_BUMP));
-                        // Release child using child's b_go flag
-                        ANNOTATE_BARRIER_BEGIN(child_thr);
-                        kmp_flag_64 flag(&child_bar->b_go, child_thr);
-                        flag.release();
-                    }
-                }
-                else { // Release all children at once with leaf_state bits on my own b_go flag
-                    thr_bar->b_go |= thr_bar->leaf_state;
-                }
-            }
+  // Now, release my children
+  if (thr_bar->my_level) { // not a leaf
+    register kmp_int32 child_tid;
+    kmp_uint32 last;
+    if (__kmp_dflt_blocktime == KMP_MAX_BLOCKTIME &&
+        thr_bar->use_oncore_barrier) {
+      if (KMP_MASTER_TID(tid)) { // do a flat release
+        // Set local b_go to bump children via NGO store of the cache line
+        // containing ICVs and b_go.
+        thr_bar->b_go = KMP_BARRIER_STATE_BUMP;
+        // Use ngo stores if available; b_go piggybacks in the last 8 bytes of
+        // the cache line
+        ngo_load(&thr_bar->th_fixed_icvs);
+        // This loops over all the threads skipping only the leaf nodes in the
+        // hierarchy
+        for (child_tid = thr_bar->skip_per_level[1]; child_tid < (int)nproc;
+             child_tid += thr_bar->skip_per_level[1]) {
+          register kmp_bstate_t *child_bar =
+              &team->t.t_threads[child_tid]->th.th_bar[bt].bb;
+          KA_TRACE(20, ("__kmp_hierarchical_barrier_release: T#%d(%d:%d) "
+                        "releasing T#%d(%d:%d)"
+                        " go(%p): %u => %u\n",
+                        gtid, team->t.t_id, tid,
+                        __kmp_gtid_from_tid(child_tid, team), team->t.t_id,
+                        child_tid, &child_bar->b_go, child_bar->b_go,
+                        child_bar->b_go + KMP_BARRIER_STATE_BUMP));
+          // Use ngo store (if available) to both store ICVs and release child
+          // via child's b_go
+          ngo_store_go(&child_bar->th_fixed_icvs, &thr_bar->th_fixed_icvs);
         }
-        else { // Blocktime is not infinite; do a simple hierarchical release
-            for (int d=thr_bar->my_level-1; d>=0; --d) { // Release highest level threads first
-                last = tid+thr_bar->skip_per_level[d+1];
-                kmp_uint32 skip = thr_bar->skip_per_level[d];
-                if (last > nproc) last = nproc;
-                for (child_tid=tid+skip; child_tid<(int)last; child_tid+=skip) {
-                    register kmp_info_t   *child_thr = team->t.t_threads[child_tid];
-                    register kmp_bstate_t *child_bar = &child_thr->th.th_bar[bt].bb;
-                    KA_TRACE(20, ("__kmp_hierarchical_barrier_release: T#%d(%d:%d) releasing T#%d(%d:%d)"
-                                  " go(%p): %u => %u\n",
-                                  gtid, team->t.t_id, tid, __kmp_gtid_from_tid(child_tid, team),
-                                  team->t.t_id, child_tid, &child_bar->b_go, child_bar->b_go,
-                                  child_bar->b_go + KMP_BARRIER_STATE_BUMP));
-                    // Release child using child's b_go flag
-                    ANNOTATE_BARRIER_BEGIN(child_thr);
-                    kmp_flag_64 flag(&child_bar->b_go, child_thr);
-                    flag.release();
-                }
-            }
+        ngo_sync();
+      }
+      TCW_8(thr_bar->b_go,
+            KMP_INIT_BARRIER_STATE); // Reset my b_go flag for next time
+      // Now, release leaf children
+      if (thr_bar->leaf_kids) { // if there are any
+        // We test team_change on the off-chance that the level 1 team changed.
+        if (team_change ||
+            old_leaf_kids < thr_bar->leaf_kids) { // some old, some new
+          if (old_leaf_kids) { // release old leaf kids
+            thr_bar->b_go |= old_leaf_state;
+          }
+          // Release new leaf kids
+          last = tid + thr_bar->skip_per_level[1];
+          if (last > nproc)
+            last = nproc;
+          for (child_tid = tid + 1 + old_leaf_kids; child_tid < (int)last;
+               ++child_tid) { // skip_per_level[0]=1
+            register kmp_info_t *child_thr = team->t.t_threads[child_tid];
+            register kmp_bstate_t *child_bar = &child_thr->th.th_bar[bt].bb;
+            KA_TRACE(
+                20,
+                ("__kmp_hierarchical_barrier_release: T#%d(%d:%d) releasing"
+                 " T#%d(%d:%d) go(%p): %u => %u\n",
+                 gtid, team->t.t_id, tid, __kmp_gtid_from_tid(child_tid, team),
+                 team->t.t_id, child_tid, &child_bar->b_go, child_bar->b_go,
+                 child_bar->b_go + KMP_BARRIER_STATE_BUMP));
+            // Release child using child's b_go flag
+            ANNOTATE_BARRIER_BEGIN(child_thr);
+            kmp_flag_64 flag(&child_bar->b_go, child_thr);
+            flag.release();
+          }
+        } else { // Release all children at once with leaf_state bits on my own
+          // b_go flag
+          thr_bar->b_go |= thr_bar->leaf_state;
         }
+      }
+    } else { // Blocktime is not infinite; do a simple hierarchical release
+      for (int d = thr_bar->my_level - 1; d >= 0;
+           --d) { // Release highest level threads first
+        last = tid + thr_bar->skip_per_level[d + 1];
+        kmp_uint32 skip = thr_bar->skip_per_level[d];
+        if (last > nproc)
+          last = nproc;
+        for (child_tid = tid + skip; child_tid < (int)last; child_tid += skip) {
+          register kmp_info_t *child_thr = team->t.t_threads[child_tid];
+          register kmp_bstate_t *child_bar = &child_thr->th.th_bar[bt].bb;
+          KA_TRACE(20, ("__kmp_hierarchical_barrier_release: T#%d(%d:%d) "
+                        "releasing T#%d(%d:%d) go(%p): %u => %u\n",
+                        gtid, team->t.t_id, tid,
+                        __kmp_gtid_from_tid(child_tid, team), team->t.t_id,
+                        child_tid, &child_bar->b_go, child_bar->b_go,
+                        child_bar->b_go + KMP_BARRIER_STATE_BUMP));
+          // Release child using child's b_go flag
+          ANNOTATE_BARRIER_BEGIN(child_thr);
+          kmp_flag_64 flag(&child_bar->b_go, child_thr);
+          flag.release();
+        }
+      }
+    }
 #if KMP_BARRIER_ICV_PUSH
-        if (propagate_icvs && !KMP_MASTER_TID(tid)) // non-leaves copy ICVs from fixed ICVs to local dest
-            copy_icvs(&team->t.t_implicit_task_taskdata[tid].td_icvs, &thr_bar->th_fixed_icvs);
+    if (propagate_icvs && !KMP_MASTER_TID(tid))
+      // non-leaves copy ICVs from fixed ICVs to local dest
+      copy_icvs(&team->t.t_implicit_task_taskdata[tid].td_icvs,
+                &thr_bar->th_fixed_icvs);
 #endif // KMP_BARRIER_ICV_PUSH
-    }
-    KA_TRACE(20, ("__kmp_hierarchical_barrier_release: T#%d(%d:%d) exit for barrier type %d\n",
-                  gtid, team->t.t_id, tid, bt));
+  }
+  KA_TRACE(20, ("__kmp_hierarchical_barrier_release: T#%d(%d:%d) exit for "
+                "barrier type %d\n",
+                gtid, team->t.t_id, tid, bt));
 }
 
-// ---------------------------- End of Barrier Algorithms ----------------------------
+
+// End of Barrier Algorithms
 
 // Internal function to do a barrier.
 /* If is_split is true, do a split barrier, otherwise, do a plain barrier
-   If reduce is non-NULL, do a split reduction barrier, otherwise, do a split barrier
+   If reduce is non-NULL, do a split reduction barrier, otherwise, do a split
+   barrier
    Returns 0 if master thread, 1 if worker thread.  */
-int
-__kmp_barrier(enum barrier_type bt, int gtid, int is_split, size_t reduce_size,
-              void *reduce_data, void (*reduce)(void *, void *))
-{
-    KMP_TIME_PARTITIONED_BLOCK(OMP_plain_barrier);
-    KMP_SET_THREAD_STATE_BLOCK(PLAIN_BARRIER);
-    register int tid = __kmp_tid_from_gtid(gtid);
-    register kmp_info_t *this_thr = __kmp_threads[gtid];
-    register kmp_team_t *team = this_thr->th.th_team;
-    register int status = 0;
-    ident_t *loc = __kmp_threads[gtid]->th.th_ident;
+int __kmp_barrier(enum barrier_type bt, int gtid, int is_split,
+                  size_t reduce_size, void *reduce_data,
+                  void (*reduce)(void *, void *)) {
+  KMP_TIME_PARTITIONED_BLOCK(OMP_plain_barrier);
+  KMP_SET_THREAD_STATE_BLOCK(PLAIN_BARRIER);
+  register int tid = __kmp_tid_from_gtid(gtid);
+  register kmp_info_t *this_thr = __kmp_threads[gtid];
+  register kmp_team_t *team = this_thr->th.th_team;
+  register int status = 0;
+  ident_t *loc = __kmp_threads[gtid]->th.th_ident;
 #if OMPT_SUPPORT
-    ompt_task_id_t my_task_id;
-    ompt_parallel_id_t my_parallel_id;
+  ompt_task_id_t my_task_id;
+  ompt_parallel_id_t my_parallel_id;
 #endif
 
-    KA_TRACE(15, ("__kmp_barrier: T#%d(%d:%d) has arrived\n",
-                  gtid, __kmp_team_from_gtid(gtid)->t.t_id, __kmp_tid_from_gtid(gtid)));
+  KA_TRACE(15, ("__kmp_barrier: T#%d(%d:%d) has arrived\n", gtid,
+                __kmp_team_from_gtid(gtid)->t.t_id, __kmp_tid_from_gtid(gtid)));
 
-    ANNOTATE_BARRIER_BEGIN(&team->t.t_bar);
+  ANNOTATE_BARRIER_BEGIN(&team->t.t_bar);
 #if OMPT_SUPPORT
-    if (ompt_enabled) {
+  if (ompt_enabled) {
 #if OMPT_BLAME
-        my_task_id = team->t.t_implicit_task_taskdata[tid].ompt_task_info.task_id;
-        my_parallel_id = team->t.ompt_team_info.parallel_id;
+    my_task_id = team->t.t_implicit_task_taskdata[tid].ompt_task_info.task_id;
+    my_parallel_id = team->t.ompt_team_info.parallel_id;
 
 #if OMPT_TRACE
-        if (this_thr->th.ompt_thread_info.state == ompt_state_wait_single) {
-            if (ompt_callbacks.ompt_callback(ompt_event_single_others_end)) {
-                ompt_callbacks.ompt_callback(ompt_event_single_others_end)(
-                    my_parallel_id, my_task_id);
-            }
-        }
-#endif
-        if (ompt_callbacks.ompt_callback(ompt_event_barrier_begin)) {
-            ompt_callbacks.ompt_callback(ompt_event_barrier_begin)(
-                my_parallel_id, my_task_id);
-        }
+    if (this_thr->th.ompt_thread_info.state == ompt_state_wait_single) {
+      if (ompt_callbacks.ompt_callback(ompt_event_single_others_end)) {
+        ompt_callbacks.ompt_callback(ompt_event_single_others_end)(
+            my_parallel_id, my_task_id);
+      }
+    }
 #endif
-        // It is OK to report the barrier state after the barrier begin callback.
-        // According to the OMPT specification, a compliant implementation may
-        // even delay reporting this state until the barrier begins to wait.
-        this_thr->th.ompt_thread_info.state = ompt_state_wait_barrier;
+    if (ompt_callbacks.ompt_callback(ompt_event_barrier_begin)) {
+      ompt_callbacks.ompt_callback(ompt_event_barrier_begin)(my_parallel_id,
+                                                             my_task_id);
     }
 #endif
+    // It is OK to report the barrier state after the barrier begin callback.
+    // According to the OMPT specification, a compliant implementation may
+    // even delay reporting this state until the barrier begins to wait.
+    this_thr->th.ompt_thread_info.state = ompt_state_wait_barrier;
+  }
+#endif
 
-    if (! team->t.t_serialized) {
+  if (!team->t.t_serialized) {
 #if USE_ITT_BUILD
-        // This value will be used in itt notify events below.
-        void *itt_sync_obj = NULL;
-# if USE_ITT_NOTIFY
-        if (__itt_sync_create_ptr || KMP_ITT_DEBUG)
-            itt_sync_obj = __kmp_itt_barrier_object(gtid, bt, 1);
-# endif
+    // This value will be used in itt notify events below.
+    void *itt_sync_obj = NULL;
+#if USE_ITT_NOTIFY
+    if (__itt_sync_create_ptr || KMP_ITT_DEBUG)
+      itt_sync_obj = __kmp_itt_barrier_object(gtid, bt, 1);
+#endif
 #endif /* USE_ITT_BUILD */
-        if (__kmp_tasking_mode == tskm_extra_barrier) {
-            __kmp_tasking_barrier(team, this_thr, gtid);
-            KA_TRACE(15, ("__kmp_barrier: T#%d(%d:%d) past tasking barrier\n",
-                          gtid, __kmp_team_from_gtid(gtid)->t.t_id, __kmp_tid_from_gtid(gtid)));
-        }
+    if (__kmp_tasking_mode == tskm_extra_barrier) {
+      __kmp_tasking_barrier(team, this_thr, gtid);
+      KA_TRACE(15,
+               ("__kmp_barrier: T#%d(%d:%d) past tasking barrier\n", gtid,
+                __kmp_team_from_gtid(gtid)->t.t_id, __kmp_tid_from_gtid(gtid)));
+    }
 
-        /* Copy the blocktime info to the thread, where __kmp_wait_template() can access it when
-           the team struct is not guaranteed to exist. */
-        // See note about the corresponding code in __kmp_join_barrier() being performance-critical.
-        if (__kmp_dflt_blocktime != KMP_MAX_BLOCKTIME) {
+    /* Copy the blocktime info to the thread, where __kmp_wait_template() can
+       access it when the team struct is not guaranteed to exist. */
+    // See note about the corresponding code in __kmp_join_barrier() being
+    // performance-critical.
+    if (__kmp_dflt_blocktime != KMP_MAX_BLOCKTIME) {
 #if KMP_USE_MONITOR
-            this_thr->th.th_team_bt_intervals = team->t.t_implicit_task_taskdata[tid].td_icvs.bt_intervals;
-            this_thr->th.th_team_bt_set = team->t.t_implicit_task_taskdata[tid].td_icvs.bt_set;
+      this_thr->th.th_team_bt_intervals =
+          team->t.t_implicit_task_taskdata[tid].td_icvs.bt_intervals;
+      this_thr->th.th_team_bt_set =
+          team->t.t_implicit_task_taskdata[tid].td_icvs.bt_set;
 #else
-            this_thr->th.th_team_bt_intervals = KMP_BLOCKTIME_INTERVAL();
+      this_thr->th.th_team_bt_intervals = KMP_BLOCKTIME_INTERVAL();
 #endif
-        }
+    }
 
 #if USE_ITT_BUILD
-        if (__itt_sync_create_ptr || KMP_ITT_DEBUG)
-            __kmp_itt_barrier_starting(gtid, itt_sync_obj);
+    if (__itt_sync_create_ptr || KMP_ITT_DEBUG)
+      __kmp_itt_barrier_starting(gtid, itt_sync_obj);
 #endif /* USE_ITT_BUILD */
 #if USE_DEBUGGER
-        // Let the debugger know: the thread arrived to the barrier and waiting.
-        if (KMP_MASTER_TID(tid)) { // Master counter is stored in team structure.
-            team->t.t_bar[bt].b_master_arrived += 1;
-        } else {
-            this_thr->th.th_bar[bt].bb.b_worker_arrived += 1;
-        } // if
+    // Let the debugger know: the thread has arrived at the barrier and is
+    // waiting.
+    if (KMP_MASTER_TID(tid)) { // Master counter is stored in team structure.
+      team->t.t_bar[bt].b_master_arrived += 1;
+    } else {
+      this_thr->th.th_bar[bt].bb.b_worker_arrived += 1;
+    } // if
 #endif /* USE_DEBUGGER */
-        if (reduce != NULL) {
-            //KMP_DEBUG_ASSERT( is_split == TRUE );  // #C69956
-            this_thr->th.th_local.reduce_data = reduce_data;
-        }
+    if (reduce != NULL) {
+      // KMP_DEBUG_ASSERT( is_split == TRUE );  // #C69956
+      this_thr->th.th_local.reduce_data = reduce_data;
+    }
 
-        if (KMP_MASTER_TID(tid) && __kmp_tasking_mode != tskm_immediate_exec)
-            __kmp_task_team_setup(this_thr, team, 0); // use 0 to only setup the current team if nthreads > 1
+    if (KMP_MASTER_TID(tid) && __kmp_tasking_mode != tskm_immediate_exec)
+      __kmp_task_team_setup(
+          this_thr, team,
+          0); // use 0 to only setup the current team if nthreads > 1
 
-        switch (__kmp_barrier_gather_pattern[bt]) {
-        case bp_hyper_bar: {
-            KMP_ASSERT(__kmp_barrier_gather_branch_bits[bt]); // don't set branch bits to 0; use linear
-            __kmp_hyper_barrier_gather(bt, this_thr, gtid, tid, reduce
-                                       USE_ITT_BUILD_ARG(itt_sync_obj) );
-            break;
-        }
-        case bp_hierarchical_bar: {
-            __kmp_hierarchical_barrier_gather(bt, this_thr, gtid, tid, reduce
-                                              USE_ITT_BUILD_ARG(itt_sync_obj));
-            break;
-        }
-        case bp_tree_bar: {
-            KMP_ASSERT(__kmp_barrier_gather_branch_bits[bt]); // don't set branch bits to 0; use linear
-            __kmp_tree_barrier_gather(bt, this_thr, gtid, tid, reduce
-                                      USE_ITT_BUILD_ARG(itt_sync_obj) );
-            break;
-        }
-        default: {
-            __kmp_linear_barrier_gather(bt, this_thr, gtid, tid, reduce
-                                        USE_ITT_BUILD_ARG(itt_sync_obj) );
-        }
-        }
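+    // Gather phase: dispatch on the gather pattern configured for this barrier
+    // type; the linear gather is the default fallback.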
+    switch (__kmp_barrier_gather_pattern[bt]) {
+    case bp_hyper_bar: {
+      KMP_ASSERT(__kmp_barrier_gather_branch_bits[bt]); // don't set branch bits
+      // to 0; use linear
+      __kmp_hyper_barrier_gather(bt, this_thr, gtid, tid,
+                                 reduce USE_ITT_BUILD_ARG(itt_sync_obj));
+      break;
+    }
+    case bp_hierarchical_bar: {
+      __kmp_hierarchical_barrier_gather(bt, this_thr, gtid, tid,
+                                        reduce USE_ITT_BUILD_ARG(itt_sync_obj));
+      break;
+    }
+    case bp_tree_bar: {
+      KMP_ASSERT(__kmp_barrier_gather_branch_bits[bt]); // don't set branch bits
+      // to 0; use linear
+      __kmp_tree_barrier_gather(bt, this_thr, gtid, tid,
+                                reduce USE_ITT_BUILD_ARG(itt_sync_obj));
+      break;
+    }
+    default: {
+      __kmp_linear_barrier_gather(bt, this_thr, gtid, tid,
+                                  reduce USE_ITT_BUILD_ARG(itt_sync_obj));
+    }
+    }
 
-        KMP_MB();
+    KMP_MB();
 
-        if (KMP_MASTER_TID(tid)) {
-            status = 0;
-            if (__kmp_tasking_mode != tskm_immediate_exec) {
-                __kmp_task_team_wait(this_thr, team
-                                     USE_ITT_BUILD_ARG(itt_sync_obj) );
-            }
+    if (KMP_MASTER_TID(tid)) {
+      status = 0;
+      if (__kmp_tasking_mode != tskm_immediate_exec) {
+        __kmp_task_team_wait(this_thr, team USE_ITT_BUILD_ARG(itt_sync_obj));
+      }
 #if USE_DEBUGGER
-            // Let the debugger know: All threads are arrived and starting leaving the barrier.
-            team->t.t_bar[bt].b_team_arrived += 1;
+      // Let the debugger know: All threads have arrived and are starting to
+      // leave the barrier.
+      team->t.t_bar[bt].b_team_arrived += 1;
 #endif
 
 #if OMP_40_ENABLED
       // Reset cancellation flag for worksharing constructs
-      if(team->t.t_cancel_request == cancel_loop ||
-         team->t.t_cancel_request == cancel_sections ) {
+      if (team->t.t_cancel_request == cancel_loop ||
+          team->t.t_cancel_request == cancel_sections) {
         team->t.t_cancel_request = cancel_noreq;
       }
 #endif
 #if USE_ITT_BUILD
-            /* TODO: In case of split reduction barrier, master thread may send acquired event early,
-               before the final summation into the shared variable is done (final summation can be a
-               long operation for array reductions).  */
-            if (__itt_sync_create_ptr || KMP_ITT_DEBUG)
-                __kmp_itt_barrier_middle(gtid, itt_sync_obj);
+      /* TODO: In case of split reduction barrier, master thread may send
+         acquired event early, before the final summation into the shared
+         variable is done (final summation can be a long operation for array
+         reductions).  */
+      if (__itt_sync_create_ptr || KMP_ITT_DEBUG)
+        __kmp_itt_barrier_middle(gtid, itt_sync_obj);
 #endif /* USE_ITT_BUILD */
 #if USE_ITT_BUILD && USE_ITT_NOTIFY
-            // Barrier - report frame end (only if active_level == 1)
-            if ((__itt_frame_submit_v3_ptr || KMP_ITT_DEBUG) && __kmp_forkjoin_frames_mode &&
+      // Barrier - report frame end (only if active_level == 1)
+      if ((__itt_frame_submit_v3_ptr || KMP_ITT_DEBUG) &&
+          __kmp_forkjoin_frames_mode &&
 #if OMP_40_ENABLED
-                this_thr->th.th_teams_microtask == NULL &&
+          this_thr->th.th_teams_microtask == NULL &&
 #endif
-                team->t.t_active_level == 1)
-            {
-                kmp_uint64 cur_time = __itt_get_timestamp();
-                kmp_info_t **other_threads = team->t.t_threads;
-                int nproc = this_thr->th.th_team_nproc;
-                int i;
-                switch(__kmp_forkjoin_frames_mode) {
-                case 1:
-                    __kmp_itt_frame_submit(gtid, this_thr->th.th_frame_time, cur_time, 0, loc, nproc);
-                    this_thr->th.th_frame_time = cur_time;
-                    break;
-                case 2: // AC 2015-01-19: currently does not work for hierarchical (to be fixed)
-                    __kmp_itt_frame_submit(gtid, this_thr->th.th_bar_min_time, cur_time, 1, loc, nproc);
-                    break;
-                case 3:
-                    if( __itt_metadata_add_ptr ) {
-                        // Initialize with master's wait time
-                        kmp_uint64 delta = cur_time - this_thr->th.th_bar_arrive_time;
-                        // Set arrive time to zero to be able to check it in __kmp_invoke_task(); the same is done inside the loop below
-                        this_thr->th.th_bar_arrive_time = 0;
-                        for (i=1; i<nproc; ++i) {
-                            delta += ( cur_time - other_threads[i]->th.th_bar_arrive_time );
-                            other_threads[i]->th.th_bar_arrive_time = 0;
-                        }
-                        __kmp_itt_metadata_imbalance(gtid, this_thr->th.th_frame_time, cur_time, delta, (kmp_uint64)( reduce != NULL));
-                    }
-                    __kmp_itt_frame_submit(gtid, this_thr->th.th_frame_time, cur_time, 0, loc, nproc);
-                    this_thr->th.th_frame_time = cur_time;
-                    break;
-                }
-            }
+          team->t.t_active_level == 1) {
+        kmp_uint64 cur_time = __itt_get_timestamp();
+        kmp_info_t **other_threads = team->t.t_threads;
+        int nproc = this_thr->th.th_team_nproc;
+        int i;
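+        // __kmp_forkjoin_frames_mode: 1 submits a frame from th_frame_time to
+        // cur_time, 2 submits one from th_bar_min_time, 3 also reports
+        // per-thread imbalance metadata before submitting the frame.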
+        switch (__kmp_forkjoin_frames_mode) {
+        case 1:
+          __kmp_itt_frame_submit(gtid, this_thr->th.th_frame_time, cur_time, 0,
+                                 loc, nproc);
+          this_thr->th.th_frame_time = cur_time;
+          break;
+        case 2: // AC 2015-01-19: currently does not work for hierarchical (to
+          // be fixed)
+          __kmp_itt_frame_submit(gtid, this_thr->th.th_bar_min_time, cur_time,
+                                 1, loc, nproc);
+          break;
+        case 3:
+          if (__itt_metadata_add_ptr) {
+            // Initialize with master's wait time
+            kmp_uint64 delta = cur_time - this_thr->th.th_bar_arrive_time;
+            // Set arrive time to zero to be able to check it in
+            // __kmp_invoke_task(); the same is done inside the loop below
+            this_thr->th.th_bar_arrive_time = 0;
+            for (i = 1; i < nproc; ++i) {
+              delta += (cur_time - other_threads[i]->th.th_bar_arrive_time);
+              other_threads[i]->th.th_bar_arrive_time = 0;
+            }
+            __kmp_itt_metadata_imbalance(gtid, this_thr->th.th_frame_time,
+                                         cur_time, delta,
+                                         (kmp_uint64)(reduce != NULL));
+          }
+          __kmp_itt_frame_submit(gtid, this_thr->th.th_frame_time, cur_time, 0,
+                                 loc, nproc);
+          this_thr->th.th_frame_time = cur_time;
+          break;
+        }
+      }
 #endif /* USE_ITT_BUILD */
-        } else {
-            status = 1;
+    } else {
+      status = 1;
 #if USE_ITT_BUILD
-            if (__itt_sync_create_ptr || KMP_ITT_DEBUG)
-                __kmp_itt_barrier_middle(gtid, itt_sync_obj);
+      if (__itt_sync_create_ptr || KMP_ITT_DEBUG)
+        __kmp_itt_barrier_middle(gtid, itt_sync_obj);
 #endif /* USE_ITT_BUILD */
-        }
-        if (status == 1 || ! is_split) {
-            switch (__kmp_barrier_release_pattern[bt]) {
-            case bp_hyper_bar: {
-                KMP_ASSERT(__kmp_barrier_release_branch_bits[bt]);
-                __kmp_hyper_barrier_release(bt, this_thr, gtid, tid, FALSE
-                                            USE_ITT_BUILD_ARG(itt_sync_obj) );
-                break;
-            }
-            case bp_hierarchical_bar: {
-                __kmp_hierarchical_barrier_release(bt, this_thr, gtid, tid, FALSE
-                                                   USE_ITT_BUILD_ARG(itt_sync_obj) );
-                break;
-            }
-            case bp_tree_bar: {
-                KMP_ASSERT(__kmp_barrier_release_branch_bits[bt]);
-                __kmp_tree_barrier_release(bt, this_thr, gtid, tid, FALSE
-                                           USE_ITT_BUILD_ARG(itt_sync_obj) );
-                break;
-            }
-            default: {
-                __kmp_linear_barrier_release(bt, this_thr, gtid, tid, FALSE
-                                             USE_ITT_BUILD_ARG(itt_sync_obj) );
-            }
-            }
-            if (__kmp_tasking_mode != tskm_immediate_exec) {
-                __kmp_task_team_sync(this_thr, team);
-            }
-        }
+    }
+    if (status == 1 || !is_split) {
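+      // Release phase: workers always run it; the master runs it only for
+      // non-split barriers (split barriers are completed later via
+      // __kmp_end_split_barrier).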
+      switch (__kmp_barrier_release_pattern[bt]) {
+      case bp_hyper_bar: {
+        KMP_ASSERT(__kmp_barrier_release_branch_bits[bt]);
+        __kmp_hyper_barrier_release(bt, this_thr, gtid, tid,
+                                    FALSE USE_ITT_BUILD_ARG(itt_sync_obj));
+        break;
+      }
+      case bp_hierarchical_bar: {
+        __kmp_hierarchical_barrier_release(
+            bt, this_thr, gtid, tid, FALSE USE_ITT_BUILD_ARG(itt_sync_obj));
+        break;
+      }
+      case bp_tree_bar: {
+        KMP_ASSERT(__kmp_barrier_release_branch_bits[bt]);
+        __kmp_tree_barrier_release(bt, this_thr, gtid, tid,
+                                   FALSE USE_ITT_BUILD_ARG(itt_sync_obj));
+        break;
+      }
+      default: {
+        __kmp_linear_barrier_release(bt, this_thr, gtid, tid,
+                                     FALSE USE_ITT_BUILD_ARG(itt_sync_obj));
+      }
+      }
+      if (__kmp_tasking_mode != tskm_immediate_exec) {
+        __kmp_task_team_sync(this_thr, team);
+      }
+    }
 
 #if USE_ITT_BUILD
-        /* GEH: TODO: Move this under if-condition above and also include in
-           __kmp_end_split_barrier(). This will more accurately represent the actual release time
-           of the threads for split barriers.  */
-        if (__itt_sync_create_ptr || KMP_ITT_DEBUG)
-            __kmp_itt_barrier_finished(gtid, itt_sync_obj);
+    /* GEH: TODO: Move this under if-condition above and also include in
+       __kmp_end_split_barrier(). This will more accurately represent the actual
+       release time of the threads for split barriers.  */
+    if (__itt_sync_create_ptr || KMP_ITT_DEBUG)
+      __kmp_itt_barrier_finished(gtid, itt_sync_obj);
 #endif /* USE_ITT_BUILD */
-    } else { // Team is serialized.
-        status = 0;
-        if (__kmp_tasking_mode != tskm_immediate_exec) {
+  } else { // Team is serialized.
+    status = 0;
+    if (__kmp_tasking_mode != tskm_immediate_exec) {
 #if OMP_45_ENABLED
-            if ( this_thr->th.th_task_team != NULL ) {
-                void *itt_sync_obj = NULL;
+      if (this_thr->th.th_task_team != NULL) {
+        void *itt_sync_obj = NULL;
 #if USE_ITT_NOTIFY
-                if (__itt_sync_create_ptr || KMP_ITT_DEBUG) {
-                    itt_sync_obj = __kmp_itt_barrier_object(gtid, bt, 1);
-                    __kmp_itt_barrier_starting(gtid, itt_sync_obj);
-                }
+        if (__itt_sync_create_ptr || KMP_ITT_DEBUG) {
+          itt_sync_obj = __kmp_itt_barrier_object(gtid, bt, 1);
+          __kmp_itt_barrier_starting(gtid, itt_sync_obj);
+        }
 #endif
 
-                KMP_DEBUG_ASSERT(this_thr->th.th_task_team->tt.tt_found_proxy_tasks == TRUE);
-                __kmp_task_team_wait(this_thr, team
-                                               USE_ITT_BUILD_ARG(itt_sync_obj));
-                __kmp_task_team_setup(this_thr, team, 0);
+        KMP_DEBUG_ASSERT(this_thr->th.th_task_team->tt.tt_found_proxy_tasks ==
+                         TRUE);
+        __kmp_task_team_wait(this_thr, team USE_ITT_BUILD_ARG(itt_sync_obj));
+        __kmp_task_team_setup(this_thr, team, 0);
 
 #if USE_ITT_BUILD
-                if (__itt_sync_create_ptr || KMP_ITT_DEBUG)
-                    __kmp_itt_barrier_finished(gtid, itt_sync_obj);
+        if (__itt_sync_create_ptr || KMP_ITT_DEBUG)
+          __kmp_itt_barrier_finished(gtid, itt_sync_obj);
 #endif /* USE_ITT_BUILD */
-            }
+      }
 #else
-            // The task team should be NULL for serialized code (tasks will be executed immediately)
-            KMP_DEBUG_ASSERT(team->t.t_task_team[this_thr->th.th_task_state] == NULL);
-            KMP_DEBUG_ASSERT(this_thr->th.th_task_team == NULL);
+      // The task team should be NULL for serialized code (tasks will be
+      // executed immediately)
+      KMP_DEBUG_ASSERT(team->t.t_task_team[this_thr->th.th_task_state] == NULL);
+      KMP_DEBUG_ASSERT(this_thr->th.th_task_team == NULL);
 #endif
-        }
     }
-    KA_TRACE(15, ("__kmp_barrier: T#%d(%d:%d) is leaving with return value %d\n",
-                  gtid, __kmp_team_from_gtid(gtid)->t.t_id, __kmp_tid_from_gtid(gtid), status));
+  }
+  KA_TRACE(15, ("__kmp_barrier: T#%d(%d:%d) is leaving with return value %d\n",
+                gtid, __kmp_team_from_gtid(gtid)->t.t_id,
+                __kmp_tid_from_gtid(gtid), status));
 
 #if OMPT_SUPPORT
-    if (ompt_enabled) {
+  if (ompt_enabled) {
 #if OMPT_BLAME
-        if (ompt_callbacks.ompt_callback(ompt_event_barrier_end)) {
-            ompt_callbacks.ompt_callback(ompt_event_barrier_end)(
-                my_parallel_id, my_task_id);
-        }
-#endif
-        this_thr->th.ompt_thread_info.state = ompt_state_work_parallel;
+    if (ompt_callbacks.ompt_callback(ompt_event_barrier_end)) {
+      ompt_callbacks.ompt_callback(ompt_event_barrier_end)(my_parallel_id,
+                                                           my_task_id);
     }
 #endif
-    ANNOTATE_BARRIER_END(&team->t.t_bar);
+    this_thr->th.ompt_thread_info.state = ompt_state_work_parallel;
+  }
+#endif
+  ANNOTATE_BARRIER_END(&team->t.t_bar);
 
-    return status;
+  return status;
 }
 
-
-void
-__kmp_end_split_barrier(enum barrier_type bt, int gtid)
-{
-    KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(KMP_end_split_barrier);
-    KMP_SET_THREAD_STATE_BLOCK(PLAIN_BARRIER);
-    int tid = __kmp_tid_from_gtid(gtid);
-    kmp_info_t *this_thr = __kmp_threads[gtid];
-    kmp_team_t *team = this_thr->th.th_team;
-
-    ANNOTATE_BARRIER_BEGIN(&team->t.t_bar);
-    if (!team->t.t_serialized) {
-        if (KMP_MASTER_GTID(gtid)) {
-            switch (__kmp_barrier_release_pattern[bt]) {
-            case bp_hyper_bar: {
-                KMP_ASSERT(__kmp_barrier_release_branch_bits[bt]);
-                __kmp_hyper_barrier_release(bt, this_thr, gtid, tid, FALSE
-                                            USE_ITT_BUILD_ARG(NULL) );
-                break;
-            }
-            case bp_hierarchical_bar: {
-                __kmp_hierarchical_barrier_release(bt, this_thr, gtid, tid, FALSE
-                                                   USE_ITT_BUILD_ARG(NULL));
-                break;
-            }
-            case bp_tree_bar: {
-                KMP_ASSERT(__kmp_barrier_release_branch_bits[bt]);
-                __kmp_tree_barrier_release(bt, this_thr, gtid, tid, FALSE
-                                           USE_ITT_BUILD_ARG(NULL) );
-                break;
-            }
-            default: {
-                __kmp_linear_barrier_release(bt, this_thr, gtid, tid, FALSE
-                                             USE_ITT_BUILD_ARG(NULL) );
-            }
-            }
-            if (__kmp_tasking_mode != tskm_immediate_exec) {
-                __kmp_task_team_sync(this_thr, team);
-            } // if
-        }
+void __kmp_end_split_barrier(enum barrier_type bt, int gtid) {
+  KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(KMP_end_split_barrier);
+  KMP_SET_THREAD_STATE_BLOCK(PLAIN_BARRIER);
+  int tid = __kmp_tid_from_gtid(gtid);
+  kmp_info_t *this_thr = __kmp_threads[gtid];
+  kmp_team_t *team = this_thr->th.th_team;
+
+  ANNOTATE_BARRIER_BEGIN(&team->t.t_bar);
+  if (!team->t.t_serialized) {
+    if (KMP_MASTER_GTID(gtid)) {
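+      // The master finishes the split barrier by running the release phase
+      // that was skipped in __kmp_barrier() when is_split was true.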
+      switch (__kmp_barrier_release_pattern[bt]) {
+      case bp_hyper_bar: {
+        KMP_ASSERT(__kmp_barrier_release_branch_bits[bt]);
+        __kmp_hyper_barrier_release(bt, this_thr, gtid, tid,
+                                    FALSE USE_ITT_BUILD_ARG(NULL));
+        break;
+      }
+      case bp_hierarchical_bar: {
+        __kmp_hierarchical_barrier_release(bt, this_thr, gtid, tid,
+                                           FALSE USE_ITT_BUILD_ARG(NULL));
+        break;
+      }
+      case bp_tree_bar: {
+        KMP_ASSERT(__kmp_barrier_release_branch_bits[bt]);
+        __kmp_tree_barrier_release(bt, this_thr, gtid, tid,
+                                   FALSE USE_ITT_BUILD_ARG(NULL));
+        break;
+      }
+      default: {
+        __kmp_linear_barrier_release(bt, this_thr, gtid, tid,
+                                     FALSE USE_ITT_BUILD_ARG(NULL));
+      }
+      }
+      if (__kmp_tasking_mode != tskm_immediate_exec) {
+        __kmp_task_team_sync(this_thr, team);
+      } // if
     }
-    ANNOTATE_BARRIER_END(&team->t.t_bar);
+  }
+  ANNOTATE_BARRIER_END(&team->t.t_bar);
 }
 
-
-void
-__kmp_join_barrier(int gtid)
-{
-    KMP_TIME_PARTITIONED_BLOCK(OMP_join_barrier);
-    KMP_SET_THREAD_STATE_BLOCK(FORK_JOIN_BARRIER);
-    register kmp_info_t *this_thr = __kmp_threads[gtid];
-    register kmp_team_t *team;
-    register kmp_uint nproc;
-    kmp_info_t *master_thread;
-    int tid;
+void __kmp_join_barrier(int gtid) {
+  KMP_TIME_PARTITIONED_BLOCK(OMP_join_barrier);
+  KMP_SET_THREAD_STATE_BLOCK(FORK_JOIN_BARRIER);
+  register kmp_info_t *this_thr = __kmp_threads[gtid];
+  register kmp_team_t *team;
+  register kmp_uint nproc;
+  kmp_info_t *master_thread;
+  int tid;
 #ifdef KMP_DEBUG
-    int team_id;
+  int team_id;
 #endif /* KMP_DEBUG */
 #if USE_ITT_BUILD
-    void *itt_sync_obj = NULL;
-# if USE_ITT_NOTIFY
-    if (__itt_sync_create_ptr || KMP_ITT_DEBUG) // Don't call routine without need
-        // Get object created at fork_barrier
-        itt_sync_obj = __kmp_itt_barrier_object(gtid, bs_forkjoin_barrier);
-# endif
+  void *itt_sync_obj = NULL;
+#if USE_ITT_NOTIFY
+  if (__itt_sync_create_ptr || KMP_ITT_DEBUG) // Only call routine when needed
+    // Get object created at fork_barrier
+    itt_sync_obj = __kmp_itt_barrier_object(gtid, bs_forkjoin_barrier);
+#endif
 #endif /* USE_ITT_BUILD */
-    KMP_MB();
+  KMP_MB();
 
-    // Get current info
-    team = this_thr->th.th_team;
-    nproc = this_thr->th.th_team_nproc;
-    KMP_DEBUG_ASSERT((int)nproc == team->t.t_nproc);
-    tid = __kmp_tid_from_gtid(gtid);
+  // Get current info
+  team = this_thr->th.th_team;
+  nproc = this_thr->th.th_team_nproc;
+  KMP_DEBUG_ASSERT((int)nproc == team->t.t_nproc);
+  tid = __kmp_tid_from_gtid(gtid);
 #ifdef KMP_DEBUG
-    team_id = team->t.t_id;
+  team_id = team->t.t_id;
 #endif /* KMP_DEBUG */
-    master_thread = this_thr->th.th_team_master;
+  master_thread = this_thr->th.th_team_master;
 #ifdef KMP_DEBUG
-    if (master_thread != team->t.t_threads[0]) {
-        __kmp_print_structure();
-    }
+  if (master_thread != team->t.t_threads[0]) {
+    __kmp_print_structure();
+  }
 #endif /* KMP_DEBUG */
-    KMP_DEBUG_ASSERT(master_thread == team->t.t_threads[0]);
-    KMP_MB();
+  KMP_DEBUG_ASSERT(master_thread == team->t.t_threads[0]);
+  KMP_MB();
 
-    // Verify state
-    KMP_DEBUG_ASSERT(__kmp_threads && __kmp_threads[gtid]);
-    KMP_DEBUG_ASSERT(TCR_PTR(this_thr->th.th_team));
-    KMP_DEBUG_ASSERT(TCR_PTR(this_thr->th.th_root));
-    KMP_DEBUG_ASSERT(this_thr == team->t.t_threads[tid]);
-    KA_TRACE(10, ("__kmp_join_barrier: T#%d(%d:%d) arrived at join barrier\n", gtid, team_id, tid));
+  // Verify state
+  KMP_DEBUG_ASSERT(__kmp_threads && __kmp_threads[gtid]);
+  KMP_DEBUG_ASSERT(TCR_PTR(this_thr->th.th_team));
+  KMP_DEBUG_ASSERT(TCR_PTR(this_thr->th.th_root));
+  KMP_DEBUG_ASSERT(this_thr == team->t.t_threads[tid]);
+  KA_TRACE(10, ("__kmp_join_barrier: T#%d(%d:%d) arrived at join barrier\n",
+                gtid, team_id, tid));
 
-    ANNOTATE_BARRIER_BEGIN(&team->t.t_bar);
+  ANNOTATE_BARRIER_BEGIN(&team->t.t_bar);
 #if OMPT_SUPPORT
 #if OMPT_TRACE
-    if (ompt_enabled &&
-        ompt_callbacks.ompt_callback(ompt_event_barrier_begin)) {
-        ompt_callbacks.ompt_callback(ompt_event_barrier_begin)(
-            team->t.ompt_team_info.parallel_id,
-            team->t.t_implicit_task_taskdata[tid].ompt_task_info.task_id);
-    }
-#endif
-    this_thr->th.ompt_thread_info.state = ompt_state_wait_barrier;
+  if (ompt_enabled && ompt_callbacks.ompt_callback(ompt_event_barrier_begin)) {
+    ompt_callbacks.ompt_callback(ompt_event_barrier_begin)(
+        team->t.ompt_team_info.parallel_id,
+        team->t.t_implicit_task_taskdata[tid].ompt_task_info.task_id);
+  }
+#endif
+  this_thr->th.ompt_thread_info.state = ompt_state_wait_barrier;
 #endif
 
-    if (__kmp_tasking_mode == tskm_extra_barrier) {
-        __kmp_tasking_barrier(team, this_thr, gtid);
-        KA_TRACE(10, ("__kmp_join_barrier: T#%d(%d:%d) past taking barrier\n", gtid, team_id, tid));
-    }
-# ifdef KMP_DEBUG
-    if (__kmp_tasking_mode != tskm_immediate_exec) {
-        KA_TRACE(20, ( "__kmp_join_barrier: T#%d, old team = %d, old task_team = %p, th_task_team = %p\n",
-                       __kmp_gtid_from_thread(this_thr), team_id, team->t.t_task_team[this_thr->th.th_task_state],
-                       this_thr->th.th_task_team));
-        KMP_DEBUG_ASSERT(this_thr->th.th_task_team == team->t.t_task_team[this_thr->th.th_task_state]);
-    }
-# endif /* KMP_DEBUG */
-
-    /* Copy the blocktime info to the thread, where __kmp_wait_template() can access it when the
-       team struct is not guaranteed to exist. Doing these loads causes a cache miss slows
-       down EPCC parallel by 2x. As a workaround, we do not perform the copy if blocktime=infinite,
-       since the values are not used by __kmp_wait_template() in that case. */
-    if (__kmp_dflt_blocktime != KMP_MAX_BLOCKTIME) {
+  if (__kmp_tasking_mode == tskm_extra_barrier) {
+    __kmp_tasking_barrier(team, this_thr, gtid);
+    KA_TRACE(10, ("__kmp_join_barrier: T#%d(%d:%d) past tasking barrier\n",
+                  gtid, team_id, tid));
+  }
+#ifdef KMP_DEBUG
+  if (__kmp_tasking_mode != tskm_immediate_exec) {
+    KA_TRACE(20, ("__kmp_join_barrier: T#%d, old team = %d, old task_team = "
+                  "%p, th_task_team = %p\n",
+                  __kmp_gtid_from_thread(this_thr), team_id,
+                  team->t.t_task_team[this_thr->th.th_task_state],
+                  this_thr->th.th_task_team));
+    KMP_DEBUG_ASSERT(this_thr->th.th_task_team ==
+                     team->t.t_task_team[this_thr->th.th_task_state]);
+  }
+#endif /* KMP_DEBUG */
+
+  /* Copy the blocktime info to the thread, where __kmp_wait_template() can
+     access it when the team struct is not guaranteed to exist. Doing these
+     loads causes a cache miss that slows down EPCC parallel by 2x. As a
+     workaround, we do not perform the copy if blocktime=infinite, since the
+     values are not used by __kmp_wait_template() in that case. */
+  if (__kmp_dflt_blocktime != KMP_MAX_BLOCKTIME) {
 #if KMP_USE_MONITOR
-        this_thr->th.th_team_bt_intervals = team->t.t_implicit_task_taskdata[tid].td_icvs.bt_intervals;
-        this_thr->th.th_team_bt_set = team->t.t_implicit_task_taskdata[tid].td_icvs.bt_set;
+    this_thr->th.th_team_bt_intervals =
+        team->t.t_implicit_task_taskdata[tid].td_icvs.bt_intervals;
+    this_thr->th.th_team_bt_set =
+        team->t.t_implicit_task_taskdata[tid].td_icvs.bt_set;
 #else
-        this_thr->th.th_team_bt_intervals = KMP_BLOCKTIME_INTERVAL();
+    this_thr->th.th_team_bt_intervals = KMP_BLOCKTIME_INTERVAL();
 #endif
-    }
+  }
 
 #if USE_ITT_BUILD
-    if (__itt_sync_create_ptr || KMP_ITT_DEBUG)
-        __kmp_itt_barrier_starting(gtid, itt_sync_obj);
+  if (__itt_sync_create_ptr || KMP_ITT_DEBUG)
+    __kmp_itt_barrier_starting(gtid, itt_sync_obj);
 #endif /* USE_ITT_BUILD */
 
-    switch (__kmp_barrier_gather_pattern[bs_forkjoin_barrier]) {
-    case bp_hyper_bar: {
-        KMP_ASSERT(__kmp_barrier_gather_branch_bits[bs_forkjoin_barrier]);
-        __kmp_hyper_barrier_gather(bs_forkjoin_barrier, this_thr, gtid, tid, NULL
-                                   USE_ITT_BUILD_ARG(itt_sync_obj) );
-        break;
-    }
-    case bp_hierarchical_bar: {
-        __kmp_hierarchical_barrier_gather(bs_forkjoin_barrier, this_thr, gtid, tid, NULL
-                                          USE_ITT_BUILD_ARG(itt_sync_obj) );
-        break;
-    }
-    case bp_tree_bar: {
-        KMP_ASSERT(__kmp_barrier_gather_branch_bits[bs_forkjoin_barrier]);
-        __kmp_tree_barrier_gather(bs_forkjoin_barrier, this_thr, gtid, tid, NULL
-                                  USE_ITT_BUILD_ARG(itt_sync_obj) );
-        break;
-    }
-    default: {
-        __kmp_linear_barrier_gather(bs_forkjoin_barrier, this_thr, gtid, tid, NULL
-                                    USE_ITT_BUILD_ARG(itt_sync_obj) );
-    }
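+  // Gather phase of the join barrier; NULL is passed in place of a reduction
+  // callback since no reduction is performed here.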
+  switch (__kmp_barrier_gather_pattern[bs_forkjoin_barrier]) {
+  case bp_hyper_bar: {
+    KMP_ASSERT(__kmp_barrier_gather_branch_bits[bs_forkjoin_barrier]);
+    __kmp_hyper_barrier_gather(bs_forkjoin_barrier, this_thr, gtid, tid,
+                               NULL USE_ITT_BUILD_ARG(itt_sync_obj));
+    break;
+  }
+  case bp_hierarchical_bar: {
+    __kmp_hierarchical_barrier_gather(bs_forkjoin_barrier, this_thr, gtid, tid,
+                                      NULL USE_ITT_BUILD_ARG(itt_sync_obj));
+    break;
+  }
+  case bp_tree_bar: {
+    KMP_ASSERT(__kmp_barrier_gather_branch_bits[bs_forkjoin_barrier]);
+    __kmp_tree_barrier_gather(bs_forkjoin_barrier, this_thr, gtid, tid,
+                              NULL USE_ITT_BUILD_ARG(itt_sync_obj));
+    break;
+  }
+  default: {
+    __kmp_linear_barrier_gather(bs_forkjoin_barrier, this_thr, gtid, tid,
+                                NULL USE_ITT_BUILD_ARG(itt_sync_obj));
+  }
+  }
+
+  /* From this point on, the team data structure may be deallocated at any time
+     by the master thread - it is unsafe to reference it in any of the worker
+     threads. Any per-team data items that need to be referenced before the
+     end of the barrier should be moved to the kmp_task_team_t structs.  */
+  if (KMP_MASTER_TID(tid)) {
+    if (__kmp_tasking_mode != tskm_immediate_exec) {
+      __kmp_task_team_wait(this_thr, team USE_ITT_BUILD_ARG(itt_sync_obj));
     }
-
-    /* From this point on, the team data structure may be deallocated at any time by the
-       master thread - it is unsafe to reference it in any of the worker threads. Any per-team
-       data items that need to be referenced before the end of the barrier should be moved to
-       the kmp_task_team_t structs.  */
-    if (KMP_MASTER_TID(tid)) {
-        if (__kmp_tasking_mode != tskm_immediate_exec) {
-            __kmp_task_team_wait(this_thr, team
-                                 USE_ITT_BUILD_ARG(itt_sync_obj) );
-        }
 #if KMP_STATS_ENABLED
-        // Have master thread flag the workers to indicate they are now waiting for
-        // next parallel region, Also wake them up so they switch their timers to idle.
-        for (int i=0; i<team->t.t_nproc; ++i) {
-            kmp_info_t* team_thread = team->t.t_threads[i];
-            if (team_thread == this_thr)
-                continue;
-            team_thread->th.th_stats->setIdleFlag();
-            if (__kmp_dflt_blocktime != KMP_MAX_BLOCKTIME && team_thread->th.th_sleep_loc != NULL)
-                __kmp_null_resume_wrapper(__kmp_gtid_from_thread(team_thread), team_thread->th.th_sleep_loc);
-        }
+    // Have master thread flag the workers to indicate they are now waiting for
+    // the next parallel region. Also wake them up so they switch their timers
+    // to idle.
+    for (int i = 0; i < team->t.t_nproc; ++i) {
+      kmp_info_t *team_thread = team->t.t_threads[i];
+      if (team_thread == this_thr)
+        continue;
+      team_thread->th.th_stats->setIdleFlag();
+      if (__kmp_dflt_blocktime != KMP_MAX_BLOCKTIME &&
+          team_thread->th.th_sleep_loc != NULL)
+        __kmp_null_resume_wrapper(__kmp_gtid_from_thread(team_thread),
+                                  team_thread->th.th_sleep_loc);
+    }
 #endif
 #if USE_ITT_BUILD
-        if (__itt_sync_create_ptr || KMP_ITT_DEBUG)
-            __kmp_itt_barrier_middle(gtid, itt_sync_obj);
+    if (__itt_sync_create_ptr || KMP_ITT_DEBUG)
+      __kmp_itt_barrier_middle(gtid, itt_sync_obj);
 #endif /* USE_ITT_BUILD */
 
-# if USE_ITT_BUILD && USE_ITT_NOTIFY
-        // Join barrier - report frame end
-        if ((__itt_frame_submit_v3_ptr || KMP_ITT_DEBUG) && __kmp_forkjoin_frames_mode &&
+#if USE_ITT_BUILD && USE_ITT_NOTIFY
+    // Join barrier - report frame end
+    if ((__itt_frame_submit_v3_ptr || KMP_ITT_DEBUG) &&
+        __kmp_forkjoin_frames_mode &&
 #if OMP_40_ENABLED
-            this_thr->th.th_teams_microtask == NULL &&
+        this_thr->th.th_teams_microtask == NULL &&
 #endif
-            team->t.t_active_level == 1)
-        {
-            kmp_uint64 cur_time = __itt_get_timestamp();
-            ident_t * loc = team->t.t_ident;
-            kmp_info_t **other_threads = team->t.t_threads;
-            int nproc = this_thr->th.th_team_nproc;
-            int i;
-            switch(__kmp_forkjoin_frames_mode) {
-            case 1:
-                __kmp_itt_frame_submit(gtid, this_thr->th.th_frame_time, cur_time, 0, loc, nproc);
-                break;
-            case 2:
-                __kmp_itt_frame_submit(gtid, this_thr->th.th_bar_min_time, cur_time, 1, loc, nproc);
-                break;
-            case 3:
-                if( __itt_metadata_add_ptr ) {
-                    // Initialize with master's wait time
-                    kmp_uint64 delta = cur_time - this_thr->th.th_bar_arrive_time;
-                    // Set arrive time to zero to be able to check it in __kmp_invoke_task(); the same is done inside the loop below
-                    this_thr->th.th_bar_arrive_time = 0;
-                    for (i=1; i<nproc; ++i) {
-                        delta += ( cur_time - other_threads[i]->th.th_bar_arrive_time );
-                        other_threads[i]->th.th_bar_arrive_time = 0;
-                    }
-                    __kmp_itt_metadata_imbalance(gtid, this_thr->th.th_frame_time, cur_time, delta, 0);
-                }
-                __kmp_itt_frame_submit(gtid, this_thr->th.th_frame_time, cur_time, 0, loc, nproc);
-                this_thr->th.th_frame_time = cur_time;
-                break;
-            }
-        }
-# endif /* USE_ITT_BUILD */
+        team->t.t_active_level == 1) {
+      kmp_uint64 cur_time = __itt_get_timestamp();
+      ident_t *loc = team->t.t_ident;
+      kmp_info_t **other_threads = team->t.t_threads;
+      int nproc = this_thr->th.th_team_nproc;
+      int i;
+      switch (__kmp_forkjoin_frames_mode) {
+      case 1:
+        __kmp_itt_frame_submit(gtid, this_thr->th.th_frame_time, cur_time, 0,
+                               loc, nproc);
+        break;
+      case 2:
+        __kmp_itt_frame_submit(gtid, this_thr->th.th_bar_min_time, cur_time, 1,
+                               loc, nproc);
+        break;
+      case 3:
+        if (__itt_metadata_add_ptr) {
+          // Initialize with master's wait time
+          kmp_uint64 delta = cur_time - this_thr->th.th_bar_arrive_time;
+          // Set arrive time to zero to be able to check it in
+          // __kmp_invoke_task(); the same is done inside the loop below
+          this_thr->th.th_bar_arrive_time = 0;
+          for (i = 1; i < nproc; ++i) {
+            delta += (cur_time - other_threads[i]->th.th_bar_arrive_time);
+            other_threads[i]->th.th_bar_arrive_time = 0;
+          }
+          __kmp_itt_metadata_imbalance(gtid, this_thr->th.th_frame_time,
+                                       cur_time, delta, 0);
+        }
+        __kmp_itt_frame_submit(gtid, this_thr->th.th_frame_time, cur_time, 0,
+                               loc, nproc);
+        this_thr->th.th_frame_time = cur_time;
+        break;
+      }
     }
+#endif /* USE_ITT_BUILD */
+  }
 #if USE_ITT_BUILD
-    else {
-        if (__itt_sync_create_ptr || KMP_ITT_DEBUG)
-            __kmp_itt_barrier_middle(gtid, itt_sync_obj);
-    }
+  else {
+    if (__itt_sync_create_ptr || KMP_ITT_DEBUG)
+      __kmp_itt_barrier_middle(gtid, itt_sync_obj);
+  }
 #endif /* USE_ITT_BUILD */
 
 #if KMP_DEBUG
-    if (KMP_MASTER_TID(tid)) {
-        KA_TRACE(15, ("__kmp_join_barrier: T#%d(%d:%d) says all %d team threads arrived\n",
-                      gtid, team_id, tid, nproc));
-    }
+  if (KMP_MASTER_TID(tid)) {
+    KA_TRACE(
+        15,
+        ("__kmp_join_barrier: T#%d(%d:%d) says all %d team threads arrived\n",
+         gtid, team_id, tid, nproc));
+  }
 #endif /* KMP_DEBUG */
 
-    // TODO now, mark worker threads as done so they may be disbanded
-    KMP_MB(); // Flush all pending memory write invalidates.
-    KA_TRACE(10, ("__kmp_join_barrier: T#%d(%d:%d) leaving\n", gtid, team_id, tid));
+  // TODO now, mark worker threads as done so they may be disbanded
+  KMP_MB(); // Flush all pending memory write invalidates.
+  KA_TRACE(10,
+           ("__kmp_join_barrier: T#%d(%d:%d) leaving\n", gtid, team_id, tid));
 
 #if OMPT_SUPPORT
-    if (ompt_enabled) {
+  if (ompt_enabled) {
 #if OMPT_BLAME
-        if (ompt_callbacks.ompt_callback(ompt_event_barrier_end)) {
-            ompt_callbacks.ompt_callback(ompt_event_barrier_end)(
-                team->t.ompt_team_info.parallel_id,
-                team->t.t_implicit_task_taskdata[tid].ompt_task_info.task_id);
-        }
+    if (ompt_callbacks.ompt_callback(ompt_event_barrier_end)) {
+      ompt_callbacks.ompt_callback(ompt_event_barrier_end)(
+          team->t.ompt_team_info.parallel_id,
+          team->t.t_implicit_task_taskdata[tid].ompt_task_info.task_id);
+    }
 #endif
 
-        // return to default state
-        this_thr->th.ompt_thread_info.state = ompt_state_overhead;
-    }
+    // return to default state
+    this_thr->th.ompt_thread_info.state = ompt_state_overhead;
+  }
 #endif
-    ANNOTATE_BARRIER_END(&team->t.t_bar);
+  ANNOTATE_BARRIER_END(&team->t.t_bar);
 }
 
-
-// TODO release worker threads' fork barriers as we are ready instead of all at once
-void
-__kmp_fork_barrier(int gtid, int tid)
-{
-    KMP_TIME_PARTITIONED_BLOCK(OMP_fork_barrier);
-    KMP_SET_THREAD_STATE_BLOCK(FORK_JOIN_BARRIER);
-    kmp_info_t *this_thr = __kmp_threads[gtid];
-    kmp_team_t *team = (tid == 0) ? this_thr->th.th_team : NULL;
+// TODO release worker threads' fork barriers as we are ready instead of all at
+// once
+void __kmp_fork_barrier(int gtid, int tid) {
+  KMP_TIME_PARTITIONED_BLOCK(OMP_fork_barrier);
+  KMP_SET_THREAD_STATE_BLOCK(FORK_JOIN_BARRIER);
+  kmp_info_t *this_thr = __kmp_threads[gtid];
+  kmp_team_t *team = (tid == 0) ? this_thr->th.th_team : NULL;
 #if USE_ITT_BUILD
-    void * itt_sync_obj = NULL;
+  void *itt_sync_obj = NULL;
 #endif /* USE_ITT_BUILD */
-    if (team)
-      ANNOTATE_BARRIER_END(&team->t.t_bar);
+  if (team)
+    ANNOTATE_BARRIER_END(&team->t.t_bar);
 
-    KA_TRACE(10, ("__kmp_fork_barrier: T#%d(%d:%d) has arrived\n",
-                  gtid, (team != NULL) ? team->t.t_id : -1, tid));
+  KA_TRACE(10, ("__kmp_fork_barrier: T#%d(%d:%d) has arrived\n", gtid,
+                (team != NULL) ? team->t.t_id : -1, tid));
 
-    // th_team pointer only valid for master thread here
-    if (KMP_MASTER_TID(tid)) {
+  // th_team pointer only valid for master thread here
+  if (KMP_MASTER_TID(tid)) {
 #if USE_ITT_BUILD && USE_ITT_NOTIFY
-        if (__itt_sync_create_ptr || KMP_ITT_DEBUG) {
-            // Create itt barrier object
-            itt_sync_obj  = __kmp_itt_barrier_object(gtid, bs_forkjoin_barrier, 1);
-            __kmp_itt_barrier_middle(gtid, itt_sync_obj);  // Call acquired/releasing
-        }
+    if (__itt_sync_create_ptr || KMP_ITT_DEBUG) {
+      // Create itt barrier object
+      itt_sync_obj = __kmp_itt_barrier_object(gtid, bs_forkjoin_barrier, 1);
+      __kmp_itt_barrier_middle(gtid, itt_sync_obj); // Call acquired/releasing
+    }
 #endif /* USE_ITT_BUILD && USE_ITT_NOTIFY */
 
 #ifdef KMP_DEBUG
-        register kmp_info_t **other_threads = team->t.t_threads;
-        register int i;
+    register kmp_info_t **other_threads = team->t.t_threads;
+    register int i;
 
-        // Verify state
-        KMP_MB();
+    // Verify state
+    KMP_MB();
 
-        for(i=1; i<team->t.t_nproc; ++i) {
-            KA_TRACE(500, ("__kmp_fork_barrier: T#%d(%d:0) checking T#%d(%d:%d) fork go == %u.\n",
-                           gtid, team->t.t_id, other_threads[i]->th.th_info.ds.ds_gtid,
-                           team->t.t_id, other_threads[i]->th.th_info.ds.ds_tid,
-                           other_threads[i]->th.th_bar[bs_forkjoin_barrier].bb.b_go));
-            KMP_DEBUG_ASSERT((TCR_4(other_threads[i]->th.th_bar[bs_forkjoin_barrier].bb.b_go)
-                              & ~(KMP_BARRIER_SLEEP_STATE))
-                             == KMP_INIT_BARRIER_STATE);
-            KMP_DEBUG_ASSERT(other_threads[i]->th.th_team == team);
-        }
+    for (i = 1; i < team->t.t_nproc; ++i) {
+      KA_TRACE(500,
+               ("__kmp_fork_barrier: T#%d(%d:0) checking T#%d(%d:%d) fork go "
+                "== %u.\n",
+                gtid, team->t.t_id, other_threads[i]->th.th_info.ds.ds_gtid,
+                team->t.t_id, other_threads[i]->th.th_info.ds.ds_tid,
+                other_threads[i]->th.th_bar[bs_forkjoin_barrier].bb.b_go));
+      KMP_DEBUG_ASSERT(
+          (TCR_4(other_threads[i]->th.th_bar[bs_forkjoin_barrier].bb.b_go) &
+           ~(KMP_BARRIER_SLEEP_STATE)) == KMP_INIT_BARRIER_STATE);
+      KMP_DEBUG_ASSERT(other_threads[i]->th.th_team == team);
+    }
 #endif
 
-        if (__kmp_tasking_mode != tskm_immediate_exec) {
-            __kmp_task_team_setup(this_thr, team, 0);  // 0 indicates setup current task team if nthreads > 1
-        }
+    if (__kmp_tasking_mode != tskm_immediate_exec) {
+      __kmp_task_team_setup(
+          this_thr, team,
+          0); // 0 indicates setup current task team if nthreads > 1
+    }
 
-        /* The master thread may have changed its blocktime between the join barrier and the
-           fork barrier. Copy the blocktime info to the thread, where __kmp_wait_template() can
-           access it when the team struct is not guaranteed to exist. */
-        // See note about the corresponding code in __kmp_join_barrier() being performance-critical
-        if (__kmp_dflt_blocktime != KMP_MAX_BLOCKTIME) {
+    /* The master thread may have changed its blocktime between the join barrier
+       and the fork barrier. Copy the blocktime info to the thread, where
+       __kmp_wait_template() can access it when the team struct is not
+       guaranteed to exist. */
+    // See note about the corresponding code in __kmp_join_barrier() being
+    // performance-critical
+    if (__kmp_dflt_blocktime != KMP_MAX_BLOCKTIME) {
 #if KMP_USE_MONITOR
-            this_thr->th.th_team_bt_intervals = team->t.t_implicit_task_taskdata[tid].td_icvs.bt_intervals;
-            this_thr->th.th_team_bt_set = team->t.t_implicit_task_taskdata[tid].td_icvs.bt_set;
+      this_thr->th.th_team_bt_intervals =
+          team->t.t_implicit_task_taskdata[tid].td_icvs.bt_intervals;
+      this_thr->th.th_team_bt_set =
+          team->t.t_implicit_task_taskdata[tid].td_icvs.bt_set;
 #else
-            this_thr->th.th_team_bt_intervals = KMP_BLOCKTIME_INTERVAL();
+      this_thr->th.th_team_bt_intervals = KMP_BLOCKTIME_INTERVAL();
 #endif
-        }
-    } // master
-
-    switch (__kmp_barrier_release_pattern[bs_forkjoin_barrier]) {
-    case bp_hyper_bar: {
-        KMP_ASSERT(__kmp_barrier_release_branch_bits[bs_forkjoin_barrier]);
-        __kmp_hyper_barrier_release(bs_forkjoin_barrier, this_thr, gtid, tid, TRUE
-                                    USE_ITT_BUILD_ARG(itt_sync_obj) );
-        break;
-    }
-    case bp_hierarchical_bar: {
-        __kmp_hierarchical_barrier_release(bs_forkjoin_barrier, this_thr, gtid, tid, TRUE
-                                           USE_ITT_BUILD_ARG(itt_sync_obj) );
-        break;
-    }
-    case bp_tree_bar: {
-        KMP_ASSERT(__kmp_barrier_release_branch_bits[bs_forkjoin_barrier]);
-        __kmp_tree_barrier_release(bs_forkjoin_barrier, this_thr, gtid, tid, TRUE
-                                   USE_ITT_BUILD_ARG(itt_sync_obj) );
-        break;
-    }
-    default: {
-        __kmp_linear_barrier_release(bs_forkjoin_barrier, this_thr, gtid, tid, TRUE
-                                     USE_ITT_BUILD_ARG(itt_sync_obj) );
-    }
     }
+  } // master
 
-    // Early exit for reaping threads releasing forkjoin barrier
-    if (TCR_4(__kmp_global.g.g_done)) {
-        this_thr->th.th_task_team = NULL;
+  switch (__kmp_barrier_release_pattern[bs_forkjoin_barrier]) {
+  case bp_hyper_bar: {
+    KMP_ASSERT(__kmp_barrier_release_branch_bits[bs_forkjoin_barrier]);
+    __kmp_hyper_barrier_release(bs_forkjoin_barrier, this_thr, gtid, tid,
+                                TRUE USE_ITT_BUILD_ARG(itt_sync_obj));
+    break;
+  }
+  case bp_hierarchical_bar: {
+    __kmp_hierarchical_barrier_release(bs_forkjoin_barrier, this_thr, gtid, tid,
+                                       TRUE USE_ITT_BUILD_ARG(itt_sync_obj));
+    break;
+  }
+  case bp_tree_bar: {
+    KMP_ASSERT(__kmp_barrier_release_branch_bits[bs_forkjoin_barrier]);
+    __kmp_tree_barrier_release(bs_forkjoin_barrier, this_thr, gtid, tid,
+                               TRUE USE_ITT_BUILD_ARG(itt_sync_obj));
+    break;
+  }
+  default: {
+    __kmp_linear_barrier_release(bs_forkjoin_barrier, this_thr, gtid, tid,
+                                 TRUE USE_ITT_BUILD_ARG(itt_sync_obj));
+  }
+  }
+
+  // Early exit for reaping threads releasing forkjoin barrier
+  if (TCR_4(__kmp_global.g.g_done)) {
+    this_thr->th.th_task_team = NULL;
 
 #if USE_ITT_BUILD && USE_ITT_NOTIFY
-        if (__itt_sync_create_ptr || KMP_ITT_DEBUG) {
-            if (!KMP_MASTER_TID(tid)) {
-                itt_sync_obj = __kmp_itt_barrier_object(gtid, bs_forkjoin_barrier);
-                if (itt_sync_obj)
-                    __kmp_itt_barrier_finished(gtid, itt_sync_obj);
-            }
-        }
-#endif /* USE_ITT_BUILD && USE_ITT_NOTIFY */
-        KA_TRACE(10, ("__kmp_fork_barrier: T#%d is leaving early\n", gtid));
-        return;
+    if (__itt_sync_create_ptr || KMP_ITT_DEBUG) {
+      if (!KMP_MASTER_TID(tid)) {
+        itt_sync_obj = __kmp_itt_barrier_object(gtid, bs_forkjoin_barrier);
+        if (itt_sync_obj)
+          __kmp_itt_barrier_finished(gtid, itt_sync_obj);
+      }
     }
-
-    /* We can now assume that a valid team structure has been allocated by the master and
-       propagated to all worker threads. The current thread, however, may not be part of the
-       team, so we can't blindly assume that the team pointer is non-null.  */
-    team = (kmp_team_t *)TCR_PTR(this_thr->th.th_team);
-    KMP_DEBUG_ASSERT(team != NULL);
-    tid = __kmp_tid_from_gtid(gtid);
-
+#endif /* USE_ITT_BUILD && USE_ITT_NOTIFY */
+    KA_TRACE(10, ("__kmp_fork_barrier: T#%d is leaving early\n", gtid));
+    return;
+  }
+
+  /* We can now assume that a valid team structure has been allocated by the
+     master and propagated to all worker threads. The current thread, however,
+     may not be part of the team, so we can't blindly assume that the team
+     pointer is non-null.  */
+  team = (kmp_team_t *)TCR_PTR(this_thr->th.th_team);
+  KMP_DEBUG_ASSERT(team != NULL);
+  tid = __kmp_tid_from_gtid(gtid);
 
 #if KMP_BARRIER_ICV_PULL
-    /* Master thread's copy of the ICVs was set up on the implicit taskdata in
-       __kmp_reinitialize_team. __kmp_fork_call() assumes the master thread's implicit task has
-       this data before this function is called. We cannot modify __kmp_fork_call() to look at
-       the fixed ICVs in the master's thread struct, because it is not always the case that the
-       threads arrays have been allocated when __kmp_fork_call() is executed. */
-    {
-        KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(USER_icv_copy);
-        if (!KMP_MASTER_TID(tid)) {  // master thread already has ICVs
-            // Copy the initial ICVs from the master's thread struct to the implicit task for this tid.
-            KA_TRACE(10, ("__kmp_fork_barrier: T#%d(%d) is PULLing ICVs\n", gtid, tid));
-            __kmp_init_implicit_task(team->t.t_ident, team->t.t_threads[tid], team, tid, FALSE);
-            copy_icvs(&team->t.t_implicit_task_taskdata[tid].td_icvs,
-                      &team->t.t_threads[0]->th.th_bar[bs_forkjoin_barrier].bb.th_fixed_icvs);
-        }
+  /* Master thread's copy of the ICVs was set up on the implicit taskdata in
+     __kmp_reinitialize_team. __kmp_fork_call() assumes the master thread's
+     implicit task has this data before this function is called. We cannot
+     modify __kmp_fork_call() to look at the fixed ICVs in the master's thread
+     struct, because it is not always the case that the threads arrays have
+     been allocated when __kmp_fork_call() is executed. */
+  {
+    KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(USER_icv_copy);
+    if (!KMP_MASTER_TID(tid)) { // master thread already has ICVs
+      // Copy the initial ICVs from the master's thread struct to the implicit
+      // task for this tid.
+      KA_TRACE(10,
+               ("__kmp_fork_barrier: T#%d(%d) is PULLing ICVs\n", gtid, tid));
+      __kmp_init_implicit_task(team->t.t_ident, team->t.t_threads[tid], team,
+                               tid, FALSE);
+      copy_icvs(&team->t.t_implicit_task_taskdata[tid].td_icvs,
+                &team->t.t_threads[0]
+                     ->th.th_bar[bs_forkjoin_barrier]
+                     .bb.th_fixed_icvs);
     }
+  }
 #endif // KMP_BARRIER_ICV_PULL
 
-    if (__kmp_tasking_mode != tskm_immediate_exec) {
-        __kmp_task_team_sync(this_thr, team);
-    }
+  if (__kmp_tasking_mode != tskm_immediate_exec) {
+    __kmp_task_team_sync(this_thr, team);
+  }
 
 #if OMP_40_ENABLED && KMP_AFFINITY_SUPPORTED
-    kmp_proc_bind_t proc_bind = team->t.t_proc_bind;
-    if (proc_bind == proc_bind_intel) {
+  kmp_proc_bind_t proc_bind = team->t.t_proc_bind;
+  if (proc_bind == proc_bind_intel) {
 #endif
 #if KMP_AFFINITY_SUPPORTED
-        // Call dynamic affinity settings
-        if(__kmp_affinity_type == affinity_balanced && team->t.t_size_changed) {
-            __kmp_balanced_affinity(tid, team->t.t_nproc);
-        }
+    // Call dynamic affinity settings
+    if (__kmp_affinity_type == affinity_balanced && team->t.t_size_changed) {
+      __kmp_balanced_affinity(tid, team->t.t_nproc);
+    }
 #endif // KMP_AFFINITY_SUPPORTED
 #if OMP_40_ENABLED && KMP_AFFINITY_SUPPORTED
+  } else if (proc_bind != proc_bind_false) {
+    if (this_thr->th.th_new_place == this_thr->th.th_current_place) {
+      KA_TRACE(100, ("__kmp_fork_barrier: T#%d already in correct place %d\n",
+                     __kmp_gtid_from_thread(this_thr),
+                     this_thr->th.th_current_place));
+    } else {
+      __kmp_affinity_set_place(gtid);
     }
-    else if (proc_bind != proc_bind_false) {
-        if (this_thr->th.th_new_place == this_thr->th.th_current_place) {
-            KA_TRACE(100, ("__kmp_fork_barrier: T#%d already in correct place %d\n",
-                           __kmp_gtid_from_thread(this_thr), this_thr->th.th_current_place));
-        }
-        else {
-            __kmp_affinity_set_place(gtid);
-        }
-    }
+  }
 #endif
 
 #if USE_ITT_BUILD && USE_ITT_NOTIFY
-    if (__itt_sync_create_ptr || KMP_ITT_DEBUG) {
-        if (!KMP_MASTER_TID(tid)) {
-            // Get correct barrier object
-            itt_sync_obj = __kmp_itt_barrier_object(gtid, bs_forkjoin_barrier);
-            __kmp_itt_barrier_finished(gtid, itt_sync_obj);  // Workers call acquired
-        } // (prepare called inside barrier_release)
-    }
+  if (__itt_sync_create_ptr || KMP_ITT_DEBUG) {
+    if (!KMP_MASTER_TID(tid)) {
+      // Get correct barrier object
+      itt_sync_obj = __kmp_itt_barrier_object(gtid, bs_forkjoin_barrier);
+      __kmp_itt_barrier_finished(gtid, itt_sync_obj); // Workers call acquired
+    } // (prepare called inside barrier_release)
+  }
 #endif /* USE_ITT_BUILD && USE_ITT_NOTIFY */
-    ANNOTATE_BARRIER_END(&team->t.t_bar);
-    KA_TRACE(10, ("__kmp_fork_barrier: T#%d(%d:%d) is leaving\n", gtid, team->t.t_id, tid));
+  ANNOTATE_BARRIER_END(&team->t.t_bar);
+  KA_TRACE(10, ("__kmp_fork_barrier: T#%d(%d:%d) is leaving\n", gtid,
+                team->t.t_id, tid));
 }
 
-
-void
-__kmp_setup_icv_copy(kmp_team_t *team, int new_nproc, kmp_internal_control_t *new_icvs, ident_t *loc )
-{
-    KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(KMP_setup_icv_copy);
-
-    KMP_DEBUG_ASSERT(team && new_nproc && new_icvs);
-    KMP_DEBUG_ASSERT((!TCR_4(__kmp_init_parallel)) || new_icvs->nproc);
-
-    /* Master thread's copy of the ICVs was set up on the implicit taskdata in
-       __kmp_reinitialize_team. __kmp_fork_call() assumes the master thread's implicit task has
-       this data before this function is called. */
+void __kmp_setup_icv_copy(kmp_team_t *team, int new_nproc,
+                          kmp_internal_control_t *new_icvs, ident_t *loc) {
+  KMP_TIME_DEVELOPER_PARTITIONED_BLOCK(KMP_setup_icv_copy);
+
+  KMP_DEBUG_ASSERT(team && new_nproc && new_icvs);
+  KMP_DEBUG_ASSERT((!TCR_4(__kmp_init_parallel)) || new_icvs->nproc);
+
+/* Master thread's copy of the ICVs was set up on the implicit taskdata in
+   __kmp_reinitialize_team. __kmp_fork_call() assumes the master thread's
+   implicit task has this data before this function is called. */
 #if KMP_BARRIER_ICV_PULL
-    /* Copy ICVs to master's thread structure into th_fixed_icvs (which remains untouched), where
-       all of the worker threads can access them and make their own copies after the barrier. */
-    KMP_DEBUG_ASSERT(team->t.t_threads[0]);  // The threads arrays should be allocated at this point
-    copy_icvs(&team->t.t_threads[0]->th.th_bar[bs_forkjoin_barrier].bb.th_fixed_icvs, new_icvs);
-    KF_TRACE(10, ("__kmp_setup_icv_copy: PULL: T#%d this_thread=%p team=%p\n",
-                  0, team->t.t_threads[0], team));
+  /* Copy ICVs to master's thread structure into th_fixed_icvs (which remains
+     untouched), where all of the worker threads can access them and make their
+     own copies after the barrier. */
+  KMP_DEBUG_ASSERT(team->t.t_threads[0]); // The threads arrays should be
+  // allocated at this point
+  copy_icvs(
+      &team->t.t_threads[0]->th.th_bar[bs_forkjoin_barrier].bb.th_fixed_icvs,
+      new_icvs);
+  KF_TRACE(10, ("__kmp_setup_icv_copy: PULL: T#%d this_thread=%p team=%p\n", 0,
+                team->t.t_threads[0], team));
 #elif KMP_BARRIER_ICV_PUSH
-    // The ICVs will be propagated in the fork barrier, so nothing needs to be done here.
-    KF_TRACE(10, ("__kmp_setup_icv_copy: PUSH: T#%d this_thread=%p team=%p\n",
-                  0, team->t.t_threads[0], team));
+  // The ICVs will be propagated in the fork barrier, so nothing needs to be
+  // done here.
+  KF_TRACE(10, ("__kmp_setup_icv_copy: PUSH: T#%d this_thread=%p team=%p\n", 0,
+                team->t.t_threads[0], team));
 #else
-    // Copy the ICVs to each of the non-master threads.  This takes O(nthreads) time.
-    ngo_load(new_icvs);
-    KMP_DEBUG_ASSERT(team->t.t_threads[0]);  // The threads arrays should be allocated at this point
-    for (int f=1; f<new_nproc; ++f) { // Skip the master thread
-        // TODO: GEH - pass in better source location info since usually NULL here
-        KF_TRACE(10, ("__kmp_setup_icv_copy: LINEAR: T#%d this_thread=%p team=%p\n",
-                      f, team->t.t_threads[f], team));
-        __kmp_init_implicit_task(loc, team->t.t_threads[f], team, f, FALSE);
-        ngo_store_icvs(&team->t.t_implicit_task_taskdata[f].td_icvs, new_icvs);
-        KF_TRACE(10, ("__kmp_setup_icv_copy: LINEAR: T#%d this_thread=%p team=%p\n",
-                      f, team->t.t_threads[f], team));
-    }
-    ngo_sync();
+  // Copy the ICVs to each of the non-master threads.  This takes O(nthreads)
+  // time.
+  ngo_load(new_icvs);
+  KMP_DEBUG_ASSERT(team->t.t_threads[0]); // The threads arrays should be
+  // allocated at this point
+  for (int f = 1; f < new_nproc; ++f) { // Skip the master thread
+    // TODO: GEH - pass in better source location info since usually NULL here
+    KF_TRACE(10, ("__kmp_setup_icv_copy: LINEAR: T#%d this_thread=%p team=%p\n",
+                  f, team->t.t_threads[f], team));
+    __kmp_init_implicit_task(loc, team->t.t_threads[f], team, f, FALSE);
+    ngo_store_icvs(&team->t.t_implicit_task_taskdata[f].td_icvs, new_icvs);
+    KF_TRACE(10, ("__kmp_setup_icv_copy: LINEAR: T#%d this_thread=%p team=%p\n",
+                  f, team->t.t_threads[f], team));
+  }
+  ngo_sync();
 #endif // KMP_BARRIER_ICV_PULL
 }

Modified: openmp/trunk/runtime/src/kmp_cancel.cpp
URL: http://llvm.org/viewvc/llvm-project/openmp/trunk/runtime/src/kmp_cancel.cpp?rev=302929&r1=302928&r2=302929&view=diff
==============================================================================
--- openmp/trunk/runtime/src/kmp_cancel.cpp (original)
+++ openmp/trunk/runtime/src/kmp_cancel.cpp Fri May 12 13:01:32 2017
@@ -22,76 +22,80 @@
 @param gtid Global thread ID of encountering thread
 @param cncl_kind Cancellation kind (parallel, for, sections, taskgroup)
 
-@return returns true if the cancellation request has been activated and the execution thread
-needs to proceed to the end of the canceled region.
+@return returns true if the cancellation request has been activated and the
+execution thread needs to proceed to the end of the canceled region.
 
 Request cancellation of the binding OpenMP region.
 */
-kmp_int32 __kmpc_cancel(ident_t* loc_ref, kmp_int32 gtid, kmp_int32 cncl_kind) {
-    kmp_info_t *this_thr = __kmp_threads [ gtid ];
+kmp_int32 __kmpc_cancel(ident_t *loc_ref, kmp_int32 gtid, kmp_int32 cncl_kind) {
+  kmp_info_t *this_thr = __kmp_threads[gtid];
 
-    KC_TRACE( 10, ("__kmpc_cancel: T#%d request %d OMP_CANCELLATION=%d\n", gtid, cncl_kind, __kmp_omp_cancellation) );
+  KC_TRACE(10, ("__kmpc_cancel: T#%d request %d OMP_CANCELLATION=%d\n", gtid,
+                cncl_kind, __kmp_omp_cancellation));
 
-    KMP_DEBUG_ASSERT(cncl_kind != cancel_noreq);
-    KMP_DEBUG_ASSERT(cncl_kind == cancel_parallel || cncl_kind == cancel_loop ||
-                     cncl_kind == cancel_sections || cncl_kind == cancel_taskgroup);
-    KMP_DEBUG_ASSERT(__kmp_get_gtid() == gtid);
-
-    if (__kmp_omp_cancellation) {
-        switch (cncl_kind) {
-        case cancel_parallel:
-        case cancel_loop:
-        case cancel_sections:
-            // cancellation requests for parallel and worksharing constructs
-            // are handled through the team structure
-            {
-                kmp_team_t *this_team = this_thr->th.th_team;
-                KMP_DEBUG_ASSERT(this_team);
-                kmp_int32 old = KMP_COMPARE_AND_STORE_RET32(&(this_team->t.t_cancel_request), cancel_noreq, cncl_kind);
-                if (old == cancel_noreq || old == cncl_kind) {
-                    //printf("__kmpc_cancel: this_team->t.t_cancel_request=%d @ %p\n",
-                    //       this_team->t.t_cancel_request, &(this_team->t.t_cancel_request));
-                    // we do not have a cancellation request in this team or we do have one
-                    // that matches the current request -> cancel
-                    return 1 /* true */;
-                }
-                break;
-            }
-        case cancel_taskgroup:
-            // cancellation requests for a task group
-            // are handled through the taskgroup structure
-            {
-                kmp_taskdata_t*  task;
-                kmp_taskgroup_t* taskgroup;
-
-                task = this_thr->th.th_current_task;
-                KMP_DEBUG_ASSERT( task );
-
-                taskgroup = task->td_taskgroup;
-                if (taskgroup) {
-                    kmp_int32 old = KMP_COMPARE_AND_STORE_RET32(&(taskgroup->cancel_request), cancel_noreq, cncl_kind);
-                    if (old == cancel_noreq || old == cncl_kind) {
-                        // we do not have a cancellation request in this taskgroup or we do have one
-                        // that matches the current request -> cancel
-                        return 1 /* true */;
-                    }
-                }
-                else {
-                    // TODO: what needs to happen here?
-                    // the specification disallows cancellation w/o taskgroups
-                    // so we might do anything here, let's abort for now
-                    KMP_ASSERT( 0 /* false */);
-                }
-            }
-            break;
-        default:
-            KMP_ASSERT (0 /* false */);
+  KMP_DEBUG_ASSERT(cncl_kind != cancel_noreq);
+  KMP_DEBUG_ASSERT(cncl_kind == cancel_parallel || cncl_kind == cancel_loop ||
+                   cncl_kind == cancel_sections ||
+                   cncl_kind == cancel_taskgroup);
+  KMP_DEBUG_ASSERT(__kmp_get_gtid() == gtid);
+
+  if (__kmp_omp_cancellation) {
+    switch (cncl_kind) {
+    case cancel_parallel:
+    case cancel_loop:
+    case cancel_sections:
+      // cancellation requests for parallel and worksharing constructs
+      // are handled through the team structure
+      {
+        kmp_team_t *this_team = this_thr->th.th_team;
+        KMP_DEBUG_ASSERT(this_team);
+        kmp_int32 old = KMP_COMPARE_AND_STORE_RET32(
+            &(this_team->t.t_cancel_request), cancel_noreq, cncl_kind);
+        if (old == cancel_noreq || old == cncl_kind) {
+          // printf("__kmpc_cancel: this_team->t.t_cancel_request=%d @ %p\n",
+          //       this_team->t.t_cancel_request,
+          //       &(this_team->t.t_cancel_request));
+          // we do not have a cancellation request in this team or we do have
+          // one that matches the current request -> cancel
+          return 1 /* true */;
         }
+        break;
+      }
+    case cancel_taskgroup:
+      // cancellation requests for a task group
+      // are handled through the taskgroup structure
+      {
+        kmp_taskdata_t *task;
+        kmp_taskgroup_t *taskgroup;
+
+        task = this_thr->th.th_current_task;
+        KMP_DEBUG_ASSERT(task);
+
+        taskgroup = task->td_taskgroup;
+        if (taskgroup) {
+          kmp_int32 old = KMP_COMPARE_AND_STORE_RET32(
+              &(taskgroup->cancel_request), cancel_noreq, cncl_kind);
+          if (old == cancel_noreq || old == cncl_kind) {
+            // we do not have a cancellation request in this taskgroup or we do
+            // have one that matches the current request -> cancel
+            return 1 /* true */;
+          }
+        } else {
+          // TODO: what needs to happen here?
+          // the specification disallows cancellation w/o taskgroups
+          // so we might do anything here, let's abort for now
+          KMP_ASSERT(0 /* false */);
+        }
+      }
+      break;
+    default:
+      KMP_ASSERT(0 /* false */);
     }
+  }
 
-    // ICV OMP_CANCELLATION=false, so we ignored this cancel request
-    KMP_DEBUG_ASSERT(!__kmp_omp_cancellation);
-    return 0 /* false */;
+  // ICV OMP_CANCELLATION=false, so we ignored this cancel request
+  KMP_DEBUG_ASSERT(!__kmp_omp_cancellation);
+  return 0 /* false */;
 }
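For reference, the entry point above is what a compiler typically targets when lowering the OpenMP `cancel` construct. The following is a minimal user-level sketch that exercises this path; it assumes cancellation is enabled with OMP_CANCELLATION=true, and the pragma-to-entry-point mapping noted in the comments is typical codegen rather than anything mandated by this patch.

@code
#include <omp.h>
#include <stdio.h>

int main(void) {
  // Run with OMP_CANCELLATION=true; otherwise __kmpc_cancel ignores the
  // request and returns 0 (see the KMP_DEBUG_ASSERT at the end above).
#pragma omp parallel
  {
    if (omp_get_thread_num() == 0) {
      // Typically lowered to __kmpc_cancel(loc, gtid, cancel_parallel);
      // the encountering thread then branches to the end of the region.
#pragma omp cancel parallel
    }
    // Typically lowered to __kmpc_cancellationpoint(loc, gtid, cancel_parallel)
#pragma omp cancellation point parallel
    // Only threads that have not yet observed the request reach this line.
    printf("T#%d passed the cancellation point\n", omp_get_thread_num());
  }
  return 0;
}
@endcode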
 
 /*!
@@ -100,77 +104,77 @@ kmp_int32 __kmpc_cancel(ident_t* loc_ref
 @param gtid Global thread ID of encountering thread
 @param cncl_kind Cancellation kind (parallel, for, sections, taskgroup)
 
-@return returns true if a matching cancellation request has been flagged in the RTL and the
-encountering thread has to cancel..
+@return returns true if a matching cancellation request has been flagged in the
+RTL and the encountering thread has to cancel.
 
 Cancellation point for the encountering thread.
 */
-kmp_int32 __kmpc_cancellationpoint(ident_t* loc_ref, kmp_int32 gtid, kmp_int32 cncl_kind) {
-    kmp_info_t *this_thr = __kmp_threads [ gtid ];
-
-    KC_TRACE( 10, ("__kmpc_cancellationpoint: T#%d request %d OMP_CANCELLATION=%d\n", gtid, cncl_kind, __kmp_omp_cancellation) );
-
-    KMP_DEBUG_ASSERT(cncl_kind != cancel_noreq);
-    KMP_DEBUG_ASSERT(cncl_kind == cancel_parallel || cncl_kind == cancel_loop ||
-                     cncl_kind == cancel_sections || cncl_kind == cancel_taskgroup);
-    KMP_DEBUG_ASSERT(__kmp_get_gtid() == gtid);
-
-    if (__kmp_omp_cancellation) {
-        switch (cncl_kind) {
-        case cancel_parallel:
-        case cancel_loop:
-        case cancel_sections:
-            // cancellation requests for parallel and worksharing constructs
-            // are handled through the team structure
-            {
-                kmp_team_t *this_team = this_thr->th.th_team;
-                KMP_DEBUG_ASSERT(this_team);
-                if (this_team->t.t_cancel_request) {
-                    if (cncl_kind == this_team->t.t_cancel_request) {
-                        // the request in the team structure matches the type of
-                        // cancellation point so we can cancel
-                        return 1 /* true */;
-                    }
-                    KMP_ASSERT( 0 /* false */);
-                }
-                else {
-                    // we do not have a cancellation request pending, so we just
-                    // ignore this cancellation point
-                    return 0;
-                }
-                break;
-            }
-        case cancel_taskgroup:
-            // cancellation requests for a task group
-            // are handled through the taskgroup structure
-            {
-                kmp_taskdata_t*  task;
-                kmp_taskgroup_t* taskgroup;
-
-                task = this_thr->th.th_current_task;
-                KMP_DEBUG_ASSERT( task );
-
-                taskgroup = task->td_taskgroup;
-                if (taskgroup) {
-                    // return the current status of cancellation for the
-                    // taskgroup
-                    return !!taskgroup->cancel_request;
-                }
-                else {
-                    // if a cancellation point is encountered by a task
-                    // that does not belong to a taskgroup, it is OK
-                    // to ignore it
-                    return 0 /* false */;
-                }
-            }
-        default:
-            KMP_ASSERT (0 /* false */);
+kmp_int32 __kmpc_cancellationpoint(ident_t *loc_ref, kmp_int32 gtid,
+                                   kmp_int32 cncl_kind) {
+  kmp_info_t *this_thr = __kmp_threads[gtid];
+
+  KC_TRACE(10,
+           ("__kmpc_cancellationpoint: T#%d request %d OMP_CANCELLATION=%d\n",
+            gtid, cncl_kind, __kmp_omp_cancellation));
+
+  KMP_DEBUG_ASSERT(cncl_kind != cancel_noreq);
+  KMP_DEBUG_ASSERT(cncl_kind == cancel_parallel || cncl_kind == cancel_loop ||
+                   cncl_kind == cancel_sections ||
+                   cncl_kind == cancel_taskgroup);
+  KMP_DEBUG_ASSERT(__kmp_get_gtid() == gtid);
+
+  if (__kmp_omp_cancellation) {
+    switch (cncl_kind) {
+    case cancel_parallel:
+    case cancel_loop:
+    case cancel_sections:
+      // cancellation requests for parallel and worksharing constructs
+      // are handled through the team structure
+      {
+        kmp_team_t *this_team = this_thr->th.th_team;
+        KMP_DEBUG_ASSERT(this_team);
+        if (this_team->t.t_cancel_request) {
+          if (cncl_kind == this_team->t.t_cancel_request) {
+            // the request in the team structure matches the type of
+            // cancellation point so we can cancel
+            return 1 /* true */;
+          }
+          KMP_ASSERT(0 /* false */);
+        } else {
+          // we do not have a cancellation request pending, so we just
+          // ignore this cancellation point
+          return 0;
+        }
+        break;
+      }
+    case cancel_taskgroup:
+      // cancellation requests for a task group
+      // are handled through the taskgroup structure
+      {
+        kmp_taskdata_t *task;
+        kmp_taskgroup_t *taskgroup;
+
+        task = this_thr->th.th_current_task;
+        KMP_DEBUG_ASSERT(task);
+
+        taskgroup = task->td_taskgroup;
+        if (taskgroup) {
+          // return the current status of cancellation for the taskgroup
+          return !!taskgroup->cancel_request;
+        } else {
+          // if a cancellation point is encountered by a task that does not
+          // belong to a taskgroup, it is OK to ignore it
+          return 0 /* false */;
         }
+      }
+    default:
+      KMP_ASSERT(0 /* false */);
     }
+  }
 
-    // ICV OMP_CANCELLATION=false, so we ignore the cancellation point
-    KMP_DEBUG_ASSERT(!__kmp_omp_cancellation);
-    return 0 /* false */;
+  // ICV OMP_CANCELLATION=false, so we ignore the cancellation point
+  KMP_DEBUG_ASSERT(!__kmp_omp_cancellation);
+  return 0 /* false */;
 }
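The cancel_taskgroup branch above is driven by the taskgroup structure rather than the team; below is a hedged user-level sketch of that path (again assuming OMP_CANCELLATION=true, with the lowering described in the comments being typical rather than guaranteed).

@code
#include <omp.h>
#include <stdio.h>

int main(void) {
#pragma omp parallel
#pragma omp single
  {
#pragma omp taskgroup
    {
      for (int i = 0; i < 8; ++i) {
#pragma omp task firstprivate(i)
        {
          if (i == 0) {
            // Sets taskgroup->cancel_request via the cancel_taskgroup path
#pragma omp cancel taskgroup
          }
          // Returns !!taskgroup->cancel_request, as in the code above
#pragma omp cancellation point taskgroup
          printf("task %d completed\n", i);
        }
      }
    }
  }
  return 0;
}
@endcode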
 
 /*!
@@ -178,63 +182,61 @@ kmp_int32 __kmpc_cancellationpoint(ident
 @param loc_ref location of the original task directive
 @param gtid Global thread ID of encountering thread
 
-@return returns true if a matching cancellation request has been flagged in the RTL and the
-encountering thread has to cancel..
+@return returns true if a matching cancellation request has been flagged in the
+RTL and the encountering thread has to cancel.
 
 Barrier with cancellation point to send threads from the barrier to the
 end of the parallel region.  Needs a special code pattern as documented
 in the design document for the cancellation feature.
 */
-kmp_int32
-__kmpc_cancel_barrier(ident_t *loc, kmp_int32 gtid) {
-    int ret = 0 /* false */;
-    kmp_info_t *this_thr = __kmp_threads [ gtid ];
-    kmp_team_t *this_team = this_thr->th.th_team;
-
-    KMP_DEBUG_ASSERT(__kmp_get_gtid() == gtid);
-
-    // call into the standard barrier
-    __kmpc_barrier(loc, gtid);
-
-    // if cancellation is active, check cancellation flag
-    if (__kmp_omp_cancellation) {
-        // depending on which construct to cancel, check the flag and
-        // reset the flag
-        switch (this_team->t.t_cancel_request) {
-        case cancel_parallel:
-            ret = 1;
-            // ensure that threads have checked the flag, when
-            // leaving the above barrier
-            __kmpc_barrier(loc, gtid);
-            this_team->t.t_cancel_request = cancel_noreq;
-            // the next barrier is the fork/join barrier, which
-            // synchronizes the threads leaving here
-            break;
-        case cancel_loop:
-        case cancel_sections:
-            ret = 1;
-            // ensure that threads have checked the flag, when
-            // leaving the above barrier
-            __kmpc_barrier(loc, gtid);
-            this_team->t.t_cancel_request = cancel_noreq;
-            // synchronize the threads again to make sure we
-            // do not have any run-away threads that cause a race
-            // on the cancellation flag
-            __kmpc_barrier(loc, gtid);
-            break;
-        case cancel_taskgroup:
-            // this case should not occur
-            KMP_ASSERT (0 /* false */ );
-            break;
-        case cancel_noreq:
-            // do nothing
-            break;
-        default:
-            KMP_ASSERT ( 0 /* false */);
-        }
+kmp_int32 __kmpc_cancel_barrier(ident_t *loc, kmp_int32 gtid) {
+  int ret = 0 /* false */;
+  kmp_info_t *this_thr = __kmp_threads[gtid];
+  kmp_team_t *this_team = this_thr->th.th_team;
+
+  KMP_DEBUG_ASSERT(__kmp_get_gtid() == gtid);
+
+  // call into the standard barrier
+  __kmpc_barrier(loc, gtid);
+
+  // if cancellation is active, check cancellation flag
+  if (__kmp_omp_cancellation) {
+    // depending on which construct to cancel, check the flag and
+    // reset the flag
+    switch (this_team->t.t_cancel_request) {
+    case cancel_parallel:
+      ret = 1;
+      // ensure that threads have checked the flag, when
+      // leaving the above barrier
+      __kmpc_barrier(loc, gtid);
+      this_team->t.t_cancel_request = cancel_noreq;
+      // the next barrier is the fork/join barrier, which
+      // synchronizes the threads leaving here
+      break;
+    case cancel_loop:
+    case cancel_sections:
+      ret = 1;
+      // ensure that threads have checked the flag, when
+      // leaving the above barrier
+      __kmpc_barrier(loc, gtid);
+      this_team->t.t_cancel_request = cancel_noreq;
+      // synchronize the threads again to make sure we do not have any run-away
+      // threads that cause a race on the cancellation flag
+      __kmpc_barrier(loc, gtid);
+      break;
+    case cancel_taskgroup:
+      // this case should not occur
+      KMP_ASSERT(0 /* false */);
+      break;
+    case cancel_noreq:
+      // do nothing
+      break;
+    default:
+      KMP_ASSERT(0 /* false */);
     }
+  }
 
-    return ret;
+  return ret;
 }
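The special code pattern mentioned above appears, for example, at the implicit barrier of a cancelled worksharing loop. A minimal sketch follows; treating the implicit barrier as a cancellation barrier (__kmpc_cancel_barrier) is the usual codegen assumption, not something this patch defines.

@code
#include <omp.h>
#include <stdio.h>

int main(void) {
  int found = -1;
#pragma omp parallel shared(found)
  {
#pragma omp for
    for (int i = 0; i < 1000; ++i) {
      if (i == 123) {
        found = i;
        // Request cancellation of the enclosing worksharing construct
#pragma omp cancel for
      }
#pragma omp cancellation point for
    }
    // The implicit barrier that ends the loop is emitted as a cancellation
    // barrier so all threads agree on whether the construct was cancelled
    // before the t_cancel_request flag is reset.
  }
  printf("found = %d\n", found);
  return 0;
}
@endcode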
 
 /*!
@@ -242,8 +244,8 @@ __kmpc_cancel_barrier(ident_t *loc, kmp_
 @param loc_ref location of the original task directive
 @param gtid Global thread ID of encountering thread
 
-@return returns true if a matching cancellation request has been flagged in the RTL and the
-encountering thread has to cancel..
+@return returns true if a matching cancellation request has been flagged in the
+RTL and the encountering thread has to cancel.
 
 Query function to query the current status of cancellation requests.
 Can be used to implement the following pattern:
@@ -254,29 +256,27 @@ if (kmp_get_cancellation_status(kmp_canc
 }
 */
 int __kmp_get_cancellation_status(int cancel_kind) {
-    if (__kmp_omp_cancellation) {
-        kmp_info_t *this_thr = __kmp_entry_thread();
+  if (__kmp_omp_cancellation) {
+    kmp_info_t *this_thr = __kmp_entry_thread();
 
-        switch (cancel_kind) {
-        case cancel_parallel:
-        case cancel_loop:
-        case cancel_sections:
-            {
-                kmp_team_t *this_team = this_thr->th.th_team;
-                return this_team->t.t_cancel_request == cancel_kind;
-            }
-        case cancel_taskgroup:
-            {
-                kmp_taskdata_t*  task;
-                kmp_taskgroup_t* taskgroup;
-                task = this_thr->th.th_current_task;
-                taskgroup = task->td_taskgroup;
-                return taskgroup && taskgroup->cancel_request;
-            }
-        }
+    switch (cancel_kind) {
+    case cancel_parallel:
+    case cancel_loop:
+    case cancel_sections: {
+      kmp_team_t *this_team = this_thr->th.th_team;
+      return this_team->t.t_cancel_request == cancel_kind;
+    }
+    case cancel_taskgroup: {
+      kmp_taskdata_t *task;
+      kmp_taskgroup_t *taskgroup;
+      task = this_thr->th.th_current_task;
+      taskgroup = task->td_taskgroup;
+      return taskgroup && taskgroup->cancel_request;
+    }
     }
+  }
 
-    return 0 /* false */;
+  return 0 /* false */;
 }
 
 #endif

Modified: openmp/trunk/runtime/src/kmp_csupport.cpp
URL: http://llvm.org/viewvc/llvm-project/openmp/trunk/runtime/src/kmp_csupport.cpp?rev=302929&r1=302928&r2=302929&view=diff
==============================================================================
--- openmp/trunk/runtime/src/kmp_csupport.cpp (original)
+++ openmp/trunk/runtime/src/kmp_csupport.cpp Fri May 12 13:01:32 2017
@@ -13,12 +13,12 @@
 //===----------------------------------------------------------------------===//
 
 
-#include "omp.h"        /* extern "C" declarations of user-visible routines */
+#include "omp.h" /* extern "C" declarations of user-visible routines */
 #include "kmp.h"
+#include "kmp_error.h"
 #include "kmp_i18n.h"
 #include "kmp_itt.h"
 #include "kmp_lock.h"
-#include "kmp_error.h"
 #include "kmp_stats.h"
 
 #if OMPT_SUPPORT
@@ -28,11 +28,8 @@
 
 #define MAX_MESSAGE 512
 
-/* ------------------------------------------------------------------------ */
-/* ------------------------------------------------------------------------ */
-
-/*  flags will be used in future, e.g., to implement */
-/*  openmp_strict library restrictions               */
+// flags will be used in future, e.g. to implement openmp_strict library
+// restrictions
 
 /*!
  * @ingroup STARTUP_SHUTDOWN
@@ -41,44 +38,41 @@
  *
  * Initialize the runtime library. This call is optional; if it is not made then
  * it will be implicitly called by attempts to use other library functions.
- *
  */
-void
-__kmpc_begin(ident_t *loc, kmp_int32 flags)
-{
-    // By default __kmpc_begin() is no-op.
-    char *env;
-    if ((env = getenv( "KMP_INITIAL_THREAD_BIND" )) != NULL &&
-        __kmp_str_match_true( env )) {
-        __kmp_middle_initialize();
-        KC_TRACE(10, ("__kmpc_begin: middle initialization called\n" ));
-    } else if (__kmp_ignore_mppbeg() == FALSE) {
-        // By default __kmp_ignore_mppbeg() returns TRUE.
-        __kmp_internal_begin();
-        KC_TRACE( 10, ("__kmpc_begin: called\n" ) );
-    }
+void __kmpc_begin(ident_t *loc, kmp_int32 flags) {
+  // By default __kmpc_begin() is no-op.
+  char *env;
+  if ((env = getenv("KMP_INITIAL_THREAD_BIND")) != NULL &&
+      __kmp_str_match_true(env)) {
+    __kmp_middle_initialize();
+    KC_TRACE(10, ("__kmpc_begin: middle initialization called\n"));
+  } else if (__kmp_ignore_mppbeg() == FALSE) {
+    // By default __kmp_ignore_mppbeg() returns TRUE.
+    __kmp_internal_begin();
+    KC_TRACE(10, ("__kmpc_begin: called\n"));
+  }
 }
 
 /*!
  * @ingroup STARTUP_SHUTDOWN
  * @param loc source location information
  *
- * Shutdown the runtime library. This is also optional, and even if called will not
- * do anything unless the `KMP_IGNORE_MPPEND` environment variable is set to zero.
-  */
-void
-__kmpc_end(ident_t *loc)
-{
-    // By default, __kmp_ignore_mppend() returns TRUE which makes __kmpc_end() call no-op.
-    // However, this can be overridden with KMP_IGNORE_MPPEND environment variable.
-    // If KMP_IGNORE_MPPEND is 0, __kmp_ignore_mppend() returns FALSE and __kmpc_end()
-    // will unregister this root (it can cause library shut down).
-    if (__kmp_ignore_mppend() == FALSE) {
-        KC_TRACE( 10, ("__kmpc_end: called\n" ) );
-        KA_TRACE( 30, ("__kmpc_end\n" ));
+ * Shutdown the runtime library. This is also optional, and even if called will
+ * not do anything unless the `KMP_IGNORE_MPPEND` environment variable is set to
+ * zero.
+ */
+void __kmpc_end(ident_t *loc) {
+  // By default, __kmp_ignore_mppend() returns TRUE which makes __kmpc_end()
+  // call no-op. However, this can be overridden with KMP_IGNORE_MPPEND
+  // environment variable. If KMP_IGNORE_MPPEND is 0, __kmp_ignore_mppend()
+  // returns FALSE and __kmpc_end() will unregister this root (it can cause
+  // library shut down).
+  if (__kmp_ignore_mppend() == FALSE) {
+    KC_TRACE(10, ("__kmpc_end: called\n"));
+    KA_TRACE(30, ("__kmpc_end\n"));
 
-        __kmp_internal_end_thread( -1 );
-    }
+    __kmp_internal_end_thread(-1);
+  }
 }
 
 /*!
@@ -89,8 +83,8 @@ __kmpc_end(ident_t *loc)
 This function can be called in any context.
 
 If the runtime has only been entered at the outermost level from a
-single (necessarily non-OpenMP<sup>*</sup>) thread, then the thread number is that
-which would be returned by omp_get_thread_num() in the outermost
+single (necessarily non-OpenMP<sup>*</sup>) thread, then the thread number is
+that which would be returned by omp_get_thread_num() in the outermost
 active parallel construct. (Or zero if there is no active parallel
 construct, since the master thread is necessarily thread zero).
 
@@ -98,16 +92,13 @@ If multiple non-OpenMP threads all enter
 will be a unique thread identifier among all the threads created by
 the OpenMP runtime (but the value cannot be defined in terms of
 OpenMP thread ids returned by omp_get_thread_num()).
-
 */
-kmp_int32
-__kmpc_global_thread_num(ident_t *loc)
-{
-    kmp_int32 gtid = __kmp_entry_gtid();
+kmp_int32 __kmpc_global_thread_num(ident_t *loc) {
+  kmp_int32 gtid = __kmp_entry_gtid();
 
-    KC_TRACE( 10, ("__kmpc_global_thread_num: T#%d\n", gtid ) );
+  KC_TRACE(10, ("__kmpc_global_thread_num: T#%d\n", gtid));
 
-    return gtid;
+  return gtid;
 }
 
 /*!
@@ -116,32 +107,30 @@ __kmpc_global_thread_num(ident_t *loc)
 @return The number of threads under control of the OpenMP<sup>*</sup> runtime
 
 This function can be called in any context.
-It returns the total number of threads under the control of the OpenMP runtime. That is
-not a number that can be determined by any OpenMP standard calls, since the library may be
-called from more than one non-OpenMP thread, and this reflects the total over all such calls.
-Similarly the runtime maintains underlying threads even when they are not active (since the cost
-of creating and destroying OS threads is high), this call counts all such threads even if they are not
-waiting for work.
-*/
-kmp_int32
-__kmpc_global_num_threads(ident_t *loc)
-{
-    KC_TRACE(10,("__kmpc_global_num_threads: num_threads = %d\n", __kmp_all_nth));
+It returns the total number of threads under the control of the OpenMP runtime.
+That is not a number that can be determined by any OpenMP standard calls, since
+the library may be called from more than one non-OpenMP thread, and this
+reflects the total over all such calls. Similarly the runtime maintains
+underlying threads even when they are not active (since the cost of creating
+and destroying OS threads is high); this call counts all such threads even if
+they are not waiting for work.
+*/
+kmp_int32 __kmpc_global_num_threads(ident_t *loc) {
+  KC_TRACE(10,
+           ("__kmpc_global_num_threads: num_threads = %d\n", __kmp_all_nth));
 
-    return TCR_4(__kmp_all_nth);
+  return TCR_4(__kmp_all_nth);
 }
 
 /*!
 @ingroup THREAD_STATES
 @param loc Source location information.
-@return The thread number of the calling thread in the innermost active parallel construct.
-
+@return The thread number of the calling thread in the innermost active parallel
+construct.
 */
-kmp_int32
-__kmpc_bound_thread_num(ident_t *loc)
-{
-    KC_TRACE( 10, ("__kmpc_bound_thread_num: called\n" ) );
-    return __kmp_tid_from_gtid( __kmp_entry_gtid() );
+kmp_int32 __kmpc_bound_thread_num(ident_t *loc) {
+  KC_TRACE(10, ("__kmpc_bound_thread_num: called\n"));
+  return __kmp_tid_from_gtid(__kmp_entry_gtid());
 }
 
 /*!
@@ -149,12 +138,10 @@ __kmpc_bound_thread_num(ident_t *loc)
 @param loc Source location information.
 @return The number of threads in the innermost active parallel construct.
 */
-kmp_int32
-__kmpc_bound_num_threads(ident_t *loc)
-{
-    KC_TRACE( 10, ("__kmpc_bound_num_threads: called\n" ) );
+kmp_int32 __kmpc_bound_num_threads(ident_t *loc) {
+  KC_TRACE(10, ("__kmpc_bound_num_threads: called\n"));
 
-    return __kmp_entry_thread() -> th.th_team -> t.t_nproc;
+  return __kmp_entry_thread()->th.th_team->t.t_nproc;
 }
 
 /*!
@@ -163,74 +150,70 @@ __kmpc_bound_num_threads(ident_t *loc)
  *
  * This function need not be called. It always returns TRUE.
  */
-kmp_int32
-__kmpc_ok_to_fork(ident_t *loc)
-{
+kmp_int32 __kmpc_ok_to_fork(ident_t *loc) {
 #ifndef KMP_DEBUG
 
-    return TRUE;
+  return TRUE;
 
 #else
 
-    const char *semi2;
-    const char *semi3;
-    int line_no;
-
-    if (__kmp_par_range == 0) {
-        return TRUE;
-    }
-    semi2 = loc->psource;
-    if (semi2 == NULL) {
-        return TRUE;
-    }
-    semi2 = strchr(semi2, ';');
-    if (semi2 == NULL) {
-        return TRUE;
-    }
-    semi2 = strchr(semi2 + 1, ';');
-    if (semi2 == NULL) {
-        return TRUE;
-    }
-    if (__kmp_par_range_filename[0]) {
-        const char *name = semi2 - 1;
-        while ((name > loc->psource) && (*name != '/') && (*name != ';')) {
-            name--;
-        }
-        if ((*name == '/') || (*name == ';')) {
-            name++;
-        }
-        if (strncmp(__kmp_par_range_filename, name, semi2 - name)) {
-            return __kmp_par_range < 0;
-        }
-    }
-    semi3 = strchr(semi2 + 1, ';');
-    if (__kmp_par_range_routine[0]) {
-        if ((semi3 != NULL) && (semi3 > semi2)
-          && (strncmp(__kmp_par_range_routine, semi2 + 1, semi3 - semi2 - 1))) {
-            return __kmp_par_range < 0;
-        }
-    }
-    if (KMP_SSCANF(semi3 + 1, "%d", &line_no) == 1) {
-        if ((line_no >= __kmp_par_range_lb) && (line_no <= __kmp_par_range_ub)) {
-            return __kmp_par_range > 0;
-        }
-        return __kmp_par_range < 0;
-    }
+  const char *semi2;
+  const char *semi3;
+  int line_no;
+
+  if (__kmp_par_range == 0) {
+    return TRUE;
+  }
+  semi2 = loc->psource;
+  if (semi2 == NULL) {
+    return TRUE;
+  }
+  semi2 = strchr(semi2, ';');
+  if (semi2 == NULL) {
     return TRUE;
+  }
+  semi2 = strchr(semi2 + 1, ';');
+  if (semi2 == NULL) {
+    return TRUE;
+  }
+  if (__kmp_par_range_filename[0]) {
+    const char *name = semi2 - 1;
+    while ((name > loc->psource) && (*name != '/') && (*name != ';')) {
+      name--;
+    }
+    if ((*name == '/') || (*name == ';')) {
+      name++;
+    }
+    if (strncmp(__kmp_par_range_filename, name, semi2 - name)) {
+      return __kmp_par_range < 0;
+    }
+  }
+  semi3 = strchr(semi2 + 1, ';');
+  if (__kmp_par_range_routine[0]) {
+    if ((semi3 != NULL) && (semi3 > semi2) &&
+        (strncmp(__kmp_par_range_routine, semi2 + 1, semi3 - semi2 - 1))) {
+      return __kmp_par_range < 0;
+    }
+  }
+  if (KMP_SSCANF(semi3 + 1, "%d", &line_no) == 1) {
+    if ((line_no >= __kmp_par_range_lb) && (line_no <= __kmp_par_range_ub)) {
+      return __kmp_par_range > 0;
+    }
+    return __kmp_par_range < 0;
+  }
+  return TRUE;
 
 #endif /* KMP_DEBUG */
-
 }
 
 /*!
 @ingroup THREAD_STATES
 @param loc Source location information.
-@return 1 if this thread is executing inside an active parallel region, zero if not.
+@return 1 if this thread is executing inside an active parallel region, zero if
+not.
 */
-kmp_int32
-__kmpc_in_parallel( ident_t *loc )
-{
-    return __kmp_entry_thread() -> th.th_root -> r.r_active;
+kmp_int32 __kmpc_in_parallel(ident_t *loc) {
+  return __kmp_entry_thread()->th.th_root->r.r_active;
 }
 
 /*!
@@ -242,115 +225,103 @@ __kmpc_in_parallel( ident_t *loc )
 Set the number of threads to be used by the next fork spawned by this thread.
 This call is only required if the parallel construct has a `num_threads` clause.
 */
-void
-__kmpc_push_num_threads(ident_t *loc, kmp_int32 global_tid, kmp_int32 num_threads )
-{
-    KA_TRACE( 20, ("__kmpc_push_num_threads: enter T#%d num_threads=%d\n",
-      global_tid, num_threads ) );
+void __kmpc_push_num_threads(ident_t *loc, kmp_int32 global_tid,
+                             kmp_int32 num_threads) {
+  KA_TRACE(20, ("__kmpc_push_num_threads: enter T#%d num_threads=%d\n",
+                global_tid, num_threads));
 
-    __kmp_push_num_threads( loc, global_tid, num_threads );
+  __kmp_push_num_threads(loc, global_tid, num_threads);
 }
 
-void
-__kmpc_pop_num_threads(ident_t *loc, kmp_int32 global_tid )
-{
-    KA_TRACE( 20, ("__kmpc_pop_num_threads: enter\n" ) );
+void __kmpc_pop_num_threads(ident_t *loc, kmp_int32 global_tid) {
+  KA_TRACE(20, ("__kmpc_pop_num_threads: enter\n"));
 
-    /* the num_threads are automatically popped */
+  /* the num_threads are automatically popped */
 }
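As a concrete reference for the num_threads case mentioned above, here is a region with the clause, with comments sketching the calls a compiler would typically emit around it; the exact sequence is compiler-specific and is shown only as an assumption.

@code
#include <omp.h>
#include <stdio.h>

void work(void) {
  // Typical codegen for the region below, per encountering thread:
  //   __kmpc_push_num_threads(&loc, __kmpc_global_thread_num(&loc), 4);
  //   __kmpc_fork_call(&loc, 0, outlined_body);  // outlined_body: the
  //                                              // compiler-generated microtask
  // The pushed value applies only to the next fork and is then popped
  // automatically, which is why __kmpc_pop_num_threads is a no-op.
#pragma omp parallel num_threads(4)
  printf("hello from T#%d of %d\n", omp_get_thread_num(),
         omp_get_num_threads());
}

int main(void) {
  work();
  return 0;
}
@endcode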
 
-
 #if OMP_40_ENABLED
 
-void
-__kmpc_push_proc_bind(ident_t *loc, kmp_int32 global_tid, kmp_int32 proc_bind )
-{
-    KA_TRACE( 20, ("__kmpc_push_proc_bind: enter T#%d proc_bind=%d\n",
-      global_tid, proc_bind ) );
+void __kmpc_push_proc_bind(ident_t *loc, kmp_int32 global_tid,
+                           kmp_int32 proc_bind) {
+  KA_TRACE(20, ("__kmpc_push_proc_bind: enter T#%d proc_bind=%d\n", global_tid,
+                proc_bind));
 
-    __kmp_push_proc_bind( loc, global_tid, (kmp_proc_bind_t)proc_bind );
+  __kmp_push_proc_bind(loc, global_tid, (kmp_proc_bind_t)proc_bind);
 }
 
 #endif /* OMP_40_ENABLED */
 
-
 /*!
 @ingroup PARALLEL
 @param loc  source location information
 @param argc  total number of arguments in the ellipsis
-@param microtask  pointer to callback routine consisting of outlined parallel construct
+@param microtask  pointer to callback routine consisting of outlined parallel
+construct
 @param ...  pointers to shared variables that aren't global
 
 Do the actual fork and call the microtask in the relevant number of threads.
 */
-void
-__kmpc_fork_call(ident_t *loc, kmp_int32 argc, kmpc_micro microtask, ...)
-{
-  int         gtid = __kmp_entry_gtid();
+void __kmpc_fork_call(ident_t *loc, kmp_int32 argc, kmpc_micro microtask, ...) {
+  int gtid = __kmp_entry_gtid();
 
 #if (KMP_STATS_ENABLED)
   int inParallel = __kmpc_in_parallel(loc);
-  if (inParallel)
-  {
-      KMP_COUNT_BLOCK(OMP_NESTED_PARALLEL);
-  }
-  else
-  {
-      KMP_COUNT_BLOCK(OMP_PARALLEL);
+  if (inParallel) {
+    KMP_COUNT_BLOCK(OMP_NESTED_PARALLEL);
+  } else {
+    KMP_COUNT_BLOCK(OMP_PARALLEL);
   }
 #endif
 
   // maybe to save thr_state is enough here
   {
-    va_list     ap;
-    va_start(   ap, microtask );
+    va_list ap;
+    va_start(ap, microtask);
 
 #if OMPT_SUPPORT
-    ompt_frame_t* ompt_frame;
+    ompt_frame_t *ompt_frame;
     if (ompt_enabled) {
-       kmp_info_t *master_th = __kmp_threads[ gtid ];
-       kmp_team_t *parent_team = master_th->th.th_team;
-       ompt_lw_taskteam_t *lwt = parent_team->t.ompt_serialized_team_info;
-       if (lwt)
-         ompt_frame = &(lwt->ompt_task_info.frame);
-       else
-       {
-         int tid = __kmp_tid_from_gtid( gtid );
-         ompt_frame = &(parent_team->t.t_implicit_task_taskdata[tid].
-         ompt_task_info.frame);
-       }
-       ompt_frame->reenter_runtime_frame = __builtin_frame_address(1);
+      kmp_info_t *master_th = __kmp_threads[gtid];
+      kmp_team_t *parent_team = master_th->th.th_team;
+      ompt_lw_taskteam_t *lwt = parent_team->t.ompt_serialized_team_info;
+      if (lwt)
+        ompt_frame = &(lwt->ompt_task_info.frame);
+      else {
+        int tid = __kmp_tid_from_gtid(gtid);
+        ompt_frame = &(
+            parent_team->t.t_implicit_task_taskdata[tid].ompt_task_info.frame);
+      }
+      ompt_frame->reenter_runtime_frame = __builtin_frame_address(1);
     }
 #endif
 
 #if INCLUDE_SSC_MARKS
     SSC_MARK_FORKING();
 #endif
-    __kmp_fork_call( loc, gtid, fork_context_intel,
-            argc,
+    __kmp_fork_call(loc, gtid, fork_context_intel, argc,
 #if OMPT_SUPPORT
-            VOLATILE_CAST(void *) microtask,      // "unwrapped" task
+                    VOLATILE_CAST(void *) microtask, // "unwrapped" task
 #endif
-            VOLATILE_CAST(microtask_t) microtask, // "wrapped" task
-            VOLATILE_CAST(launch_t)    __kmp_invoke_task_func,
+                    VOLATILE_CAST(microtask_t) microtask, // "wrapped" task
+                    VOLATILE_CAST(launch_t) __kmp_invoke_task_func,
 /* TODO: revert workaround for Intel(R) 64 tracker #96 */
 #if (KMP_ARCH_X86_64 || KMP_ARCH_ARM || KMP_ARCH_AARCH64) && KMP_OS_LINUX
-            &ap
+                    &ap
 #else
-            ap
+                    ap
 #endif
-            );
+                    );
 #if INCLUDE_SSC_MARKS
     SSC_MARK_JOINING();
 #endif
-    __kmp_join_call( loc, gtid
+    __kmp_join_call(loc, gtid
 #if OMPT_SUPPORT
-        , fork_context_intel
+                    ,
+                    fork_context_intel
 #endif
-    );
-
-    va_end( ap );
+                    );
 
+    va_end(ap);
   }
 }
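Since __kmpc_fork_call is the central entry point in this file, here is a short user-level sketch with comments describing the usual lowering; the outlined-function shape named in the comment is an assumption about typical compiler output, not something defined by this patch.

@code
#include <omp.h>
#include <stdio.h>

int main(void) {
  int x = 42;
  // A compiler usually outlines the structured block into a microtask
  // (roughly: void body(kmp_int32 *gtid, kmp_int32 *btid, int *x)) and
  // replaces the region with a call such as
  //   __kmpc_fork_call(&loc, /*argc=*/1, (kmpc_micro)body, &x);
  // which forks the team, runs the microtask on every thread, and joins.
#pragma omp parallel shared(x)
  printf("T#%d sees x = %d\n", omp_get_thread_num(), x);
  return 0;
}
@endcode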
 
@@ -366,93 +337,90 @@ Set the number of teams to be used by th
 This call is only required if the teams construct has a `num_teams` clause
 or a `thread_limit` clause (or both).
 */
-void
-__kmpc_push_num_teams(ident_t *loc, kmp_int32 global_tid, kmp_int32 num_teams, kmp_int32 num_threads )
-{
-    KA_TRACE( 20, ("__kmpc_push_num_teams: enter T#%d num_teams=%d num_threads=%d\n",
-      global_tid, num_teams, num_threads ) );
+void __kmpc_push_num_teams(ident_t *loc, kmp_int32 global_tid,
+                           kmp_int32 num_teams, kmp_int32 num_threads) {
+  KA_TRACE(20,
+           ("__kmpc_push_num_teams: enter T#%d num_teams=%d num_threads=%d\n",
+            global_tid, num_teams, num_threads));
 
-    __kmp_push_num_teams( loc, global_tid, num_teams, num_threads );
+  __kmp_push_num_teams(loc, global_tid, num_teams, num_threads);
 }
 
 /*!
 @ingroup PARALLEL
 @param loc  source location information
 @param argc  total number of arguments in the ellipsis
-@param microtask  pointer to callback routine consisting of outlined teams construct
+@param microtask  pointer to callback routine consisting of outlined teams
+construct
 @param ...  pointers to shared variables that aren't global
 
 Do the actual fork and call the microtask in the relevant number of threads.
 */
-void
-__kmpc_fork_teams(ident_t *loc, kmp_int32 argc, kmpc_micro microtask, ...)
-{
-    int         gtid = __kmp_entry_gtid();
-    kmp_info_t *this_thr = __kmp_threads[ gtid ];
-    va_list     ap;
-    va_start(   ap, microtask );
-
-    KMP_COUNT_BLOCK(OMP_TEAMS);
-
-    // remember teams entry point and nesting level
-    this_thr->th.th_teams_microtask = microtask;
-    this_thr->th.th_teams_level = this_thr->th.th_team->t.t_level; // AC: can be >0 on host
+void __kmpc_fork_teams(ident_t *loc, kmp_int32 argc, kmpc_micro microtask,
+                       ...) {
+  int gtid = __kmp_entry_gtid();
+  kmp_info_t *this_thr = __kmp_threads[gtid];
+  va_list ap;
+  va_start(ap, microtask);
+
+  KMP_COUNT_BLOCK(OMP_TEAMS);
+
+  // remember teams entry point and nesting level
+  this_thr->th.th_teams_microtask = microtask;
+  this_thr->th.th_teams_level =
+      this_thr->th.th_team->t.t_level; // AC: can be >0 on host
 
 #if OMPT_SUPPORT
-    kmp_team_t *parent_team = this_thr->th.th_team;
-    int tid = __kmp_tid_from_gtid( gtid );
-    if (ompt_enabled) {
-        parent_team->t.t_implicit_task_taskdata[tid].
-           ompt_task_info.frame.reenter_runtime_frame = __builtin_frame_address(1);
-    }
+  kmp_team_t *parent_team = this_thr->th.th_team;
+  int tid = __kmp_tid_from_gtid(gtid);
+  if (ompt_enabled) {
+    parent_team->t.t_implicit_task_taskdata[tid]
+        .ompt_task_info.frame.reenter_runtime_frame =
+        __builtin_frame_address(1);
+  }
 #endif
 
-    // check if __kmpc_push_num_teams called, set default number of teams otherwise
-    if ( this_thr->th.th_teams_size.nteams == 0 ) {
-        __kmp_push_num_teams( loc, gtid, 0, 0 );
-    }
-    KMP_DEBUG_ASSERT(this_thr->th.th_set_nproc >= 1);
-    KMP_DEBUG_ASSERT(this_thr->th.th_teams_size.nteams >= 1);
-    KMP_DEBUG_ASSERT(this_thr->th.th_teams_size.nth >= 1);
+  // check if __kmpc_push_num_teams called, set default number of teams
+  // otherwise
+  if (this_thr->th.th_teams_size.nteams == 0) {
+    __kmp_push_num_teams(loc, gtid, 0, 0);
+  }
+  KMP_DEBUG_ASSERT(this_thr->th.th_set_nproc >= 1);
+  KMP_DEBUG_ASSERT(this_thr->th.th_teams_size.nteams >= 1);
+  KMP_DEBUG_ASSERT(this_thr->th.th_teams_size.nth >= 1);
 
-    __kmp_fork_call( loc, gtid, fork_context_intel,
-            argc,
+  __kmp_fork_call(loc, gtid, fork_context_intel, argc,
 #if OMPT_SUPPORT
-            VOLATILE_CAST(void *) microtask,               // "unwrapped" task
+                  VOLATILE_CAST(void *) microtask, // "unwrapped" task
 #endif
-            VOLATILE_CAST(microtask_t) __kmp_teams_master, // "wrapped" task
-            VOLATILE_CAST(launch_t)    __kmp_invoke_teams_master,
+                  VOLATILE_CAST(microtask_t)
+                      __kmp_teams_master, // "wrapped" task
+                  VOLATILE_CAST(launch_t) __kmp_invoke_teams_master,
 #if (KMP_ARCH_X86_64 || KMP_ARCH_ARM || KMP_ARCH_AARCH64) && KMP_OS_LINUX
-            &ap
+                  &ap
 #else
-            ap
+                  ap
 #endif
-            );
-    __kmp_join_call( loc, gtid
+                  );
+  __kmp_join_call(loc, gtid
 #if OMPT_SUPPORT
-        , fork_context_intel
+                  ,
+                  fork_context_intel
 #endif
-    );
+                  );
 
-    this_thr->th.th_teams_microtask = NULL;
-    this_thr->th.th_teams_level = 0;
-    *(kmp_int64*)(&this_thr->th.th_teams_size) = 0L;
-    va_end( ap );
+  this_thr->th.th_teams_microtask = NULL;
+  this_thr->th.th_teams_level = 0;
+  *(kmp_int64 *)(&this_thr->th.th_teams_size) = 0L;
+  va_end(ap);
 }
 #endif /* OMP_40_ENABLED */
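For the teams entry points above, a sketch of a construct whose num_teams/thread_limit clauses are what require __kmpc_push_num_teams before the fork; the mapping in the comments assumes the teams region executes on the host and is compiler-specific.

@code
#include <omp.h>
#include <stdio.h>

int main(void) {
  // When this region runs on the host, typical codegen is roughly
  //   __kmpc_push_num_teams(&loc, gtid, 2, 4);
  //   __kmpc_fork_teams(&loc, ...);  // body wrapped via __kmp_teams_master
#pragma omp target teams num_teams(2) thread_limit(4)
  {
#pragma omp parallel
    printf("team %d of %d, thread %d\n", omp_get_team_num(),
           omp_get_num_teams(), omp_get_thread_num());
  }
  return 0;
}
@endcode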
 
-
-//
 // I don't think this function should ever have been exported.
 // The __kmpc_ prefix was misapplied.  I'm fairly certain that no generated
 // openmp code ever called it, but it's been exported from the RTL for so
 // long that I'm afraid to remove the definition.
-//
-int
-__kmpc_invoke_task_func( int gtid )
-{
-    return __kmp_invoke_task_func( gtid );
-}
+int __kmpc_invoke_task_func(int gtid) { return __kmp_invoke_task_func(gtid); }
 
 /*!
 @ingroup PARALLEL
@@ -466,13 +434,11 @@ conditional parallel region, like this,
 @endcode
 when the condition is false.
 */
-void
-__kmpc_serialized_parallel(ident_t *loc, kmp_int32 global_tid)
-{
-    // The implementation is now in kmp_runtime.cpp so that it can share static
-    // functions with kmp_fork_call since the tasks to be done are similar in
-    // each case.
-    __kmp_serialized_parallel(loc, global_tid);
+void __kmpc_serialized_parallel(ident_t *loc, kmp_int32 global_tid) {
+  // The implementation is now in kmp_runtime.cpp so that it can share static
+  // functions with kmp_fork_call since the tasks to be done are similar in
+  // each case.
+  __kmp_serialized_parallel(loc, global_tid);
 }
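A user-level case that takes the serialized path above is a parallel region with a false if clause; a brief sketch follows (bracketing the body with __kmpc_serialized_parallel/__kmpc_end_serialized_parallel is the typical lowering, not a requirement stated here).

@code
#include <omp.h>
#include <stdio.h>

static void maybe_parallel(int n) {
  // For n <= 1000 the region runs serialized: typical codegen calls
  // __kmpc_serialized_parallel(&loc, gtid), runs the body on the
  // encountering thread, then __kmpc_end_serialized_parallel(&loc, gtid).
#pragma omp parallel if(n > 1000)
  printf("n=%d: %d thread(s)\n", n, omp_get_num_threads());
}

int main(void) {
  maybe_parallel(10);    // serialized, prints once with 1 thread
  maybe_parallel(10000); // forked team, prints once per thread
  return 0;
}
@endcode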
 
 /*!
@@ -482,108 +448,114 @@ __kmpc_serialized_parallel(ident_t *loc,
 
 Leave a serialized parallel construct.
 */
-void
-__kmpc_end_serialized_parallel(ident_t *loc, kmp_int32 global_tid)
-{
-    kmp_internal_control_t *top;
-    kmp_info_t *this_thr;
-    kmp_team_t *serial_team;
-
-    KC_TRACE( 10, ("__kmpc_end_serialized_parallel: called by T#%d\n", global_tid ) );
-
-    /* skip all this code for autopar serialized loops since it results in
-       unacceptable overhead */
-    if( loc != NULL && (loc->flags & KMP_IDENT_AUTOPAR ) )
-        return;
+void __kmpc_end_serialized_parallel(ident_t *loc, kmp_int32 global_tid) {
+  kmp_internal_control_t *top;
+  kmp_info_t *this_thr;
+  kmp_team_t *serial_team;
+
+  KC_TRACE(10,
+           ("__kmpc_end_serialized_parallel: called by T#%d\n", global_tid));
+
+  /* skip all this code for autopar serialized loops since it results in
+     unacceptable overhead */
+  if (loc != NULL && (loc->flags & KMP_IDENT_AUTOPAR))
+    return;
 
-    // Not autopar code
-    if( ! TCR_4( __kmp_init_parallel ) )
-        __kmp_parallel_initialize();
-
-    this_thr    = __kmp_threads[ global_tid ];
-    serial_team = this_thr->th.th_serial_team;
-
-   #if OMP_45_ENABLED
-   kmp_task_team_t *   task_team = this_thr->th.th_task_team;
-
-   // we need to wait for the proxy tasks before finishing the thread
-   if ( task_team != NULL && task_team->tt.tt_found_proxy_tasks )
-        __kmp_task_team_wait(this_thr, serial_team USE_ITT_BUILD_ARG(NULL) ); // is an ITT object needed here?
-   #endif
+  // Not autopar code
+  if (!TCR_4(__kmp_init_parallel))
+    __kmp_parallel_initialize();
 
-    KMP_MB();
-    KMP_DEBUG_ASSERT( serial_team );
-    KMP_ASSERT(       serial_team -> t.t_serialized );
-    KMP_DEBUG_ASSERT( this_thr -> th.th_team == serial_team );
-    KMP_DEBUG_ASSERT( serial_team != this_thr->th.th_root->r.r_root_team );
-    KMP_DEBUG_ASSERT( serial_team -> t.t_threads );
-    KMP_DEBUG_ASSERT( serial_team -> t.t_threads[0] == this_thr );
-
-    /* If necessary, pop the internal control stack values and replace the team values */
-    top = serial_team -> t.t_control_stack_top;
-    if ( top && top -> serial_nesting_level == serial_team -> t.t_serialized ) {
-        copy_icvs( &serial_team -> t.t_threads[0] -> th.th_current_task -> td_icvs, top );
-        serial_team -> t.t_control_stack_top = top -> next;
-        __kmp_free(top);
-    }
+  this_thr = __kmp_threads[global_tid];
+  serial_team = this_thr->th.th_serial_team;
+
+#if OMP_45_ENABLED
+  kmp_task_team_t *task_team = this_thr->th.th_task_team;
 
-    //if( serial_team -> t.t_serialized > 1 )
-    serial_team -> t.t_level--;
+  // we need to wait for the proxy tasks before finishing the thread
+  if (task_team != NULL && task_team->tt.tt_found_proxy_tasks)
+    __kmp_task_team_wait(this_thr, serial_team USE_ITT_BUILD_ARG(NULL));
+#endif
+
+  KMP_MB();
+  KMP_DEBUG_ASSERT(serial_team);
+  KMP_ASSERT(serial_team->t.t_serialized);
+  KMP_DEBUG_ASSERT(this_thr->th.th_team == serial_team);
+  KMP_DEBUG_ASSERT(serial_team != this_thr->th.th_root->r.r_root_team);
+  KMP_DEBUG_ASSERT(serial_team->t.t_threads);
+  KMP_DEBUG_ASSERT(serial_team->t.t_threads[0] == this_thr);
+
+  /* If necessary, pop the internal control stack values and replace the team
+   * values */
+  top = serial_team->t.t_control_stack_top;
+  if (top && top->serial_nesting_level == serial_team->t.t_serialized) {
+    copy_icvs(&serial_team->t.t_threads[0]->th.th_current_task->td_icvs, top);
+    serial_team->t.t_control_stack_top = top->next;
+    __kmp_free(top);
+  }
 
-    /* pop dispatch buffers stack */
-    KMP_DEBUG_ASSERT(serial_team->t.t_dispatch->th_disp_buffer);
-    {
-        dispatch_private_info_t * disp_buffer = serial_team->t.t_dispatch->th_disp_buffer;
-        serial_team->t.t_dispatch->th_disp_buffer =
-            serial_team->t.t_dispatch->th_disp_buffer->next;
-        __kmp_free( disp_buffer );
-    }
+  // if( serial_team -> t.t_serialized > 1 )
+  serial_team->t.t_level--;
 
-    -- serial_team -> t.t_serialized;
-    if ( serial_team -> t.t_serialized == 0 ) {
+  /* pop dispatch buffers stack */
+  KMP_DEBUG_ASSERT(serial_team->t.t_dispatch->th_disp_buffer);
+  {
+    dispatch_private_info_t *disp_buffer =
+        serial_team->t.t_dispatch->th_disp_buffer;
+    serial_team->t.t_dispatch->th_disp_buffer =
+        serial_team->t.t_dispatch->th_disp_buffer->next;
+    __kmp_free(disp_buffer);
+  }
 
-        /* return to the parallel section */
+  --serial_team->t.t_serialized;
+  if (serial_team->t.t_serialized == 0) {
+
+/* return to the parallel section */
 
 #if KMP_ARCH_X86 || KMP_ARCH_X86_64
-        if ( __kmp_inherit_fp_control && serial_team->t.t_fp_control_saved ) {
-            __kmp_clear_x87_fpu_status_word();
-            __kmp_load_x87_fpu_control_word( &serial_team->t.t_x87_fpu_control_word );
-            __kmp_load_mxcsr( &serial_team->t.t_mxcsr );
-        }
+    if (__kmp_inherit_fp_control && serial_team->t.t_fp_control_saved) {
+      __kmp_clear_x87_fpu_status_word();
+      __kmp_load_x87_fpu_control_word(&serial_team->t.t_x87_fpu_control_word);
+      __kmp_load_mxcsr(&serial_team->t.t_mxcsr);
+    }
 #endif /* KMP_ARCH_X86 || KMP_ARCH_X86_64 */
 
-        this_thr -> th.th_team           = serial_team -> t.t_parent;
-        this_thr -> th.th_info.ds.ds_tid = serial_team -> t.t_master_tid;
+    this_thr->th.th_team = serial_team->t.t_parent;
+    this_thr->th.th_info.ds.ds_tid = serial_team->t.t_master_tid;
 
-        /* restore values cached in the thread */
-        this_thr -> th.th_team_nproc     = serial_team -> t.t_parent -> t.t_nproc;          /*  JPH */
-        this_thr -> th.th_team_master    = serial_team -> t.t_parent -> t.t_threads[0];     /* JPH */
-        this_thr -> th.th_team_serialized = this_thr -> th.th_team -> t.t_serialized;
-
-        /* TODO the below shouldn't need to be adjusted for serialized teams */
-        this_thr -> th.th_dispatch       = & this_thr -> th.th_team ->
-            t.t_dispatch[ serial_team -> t.t_master_tid ];
-
-        __kmp_pop_current_task_from_thread( this_thr );
-
-        KMP_ASSERT( this_thr -> th.th_current_task -> td_flags.executing == 0 );
-        this_thr -> th.th_current_task -> td_flags.executing = 1;
-
-        if ( __kmp_tasking_mode != tskm_immediate_exec ) {
-            // Copy the task team from the new child / old parent team to the thread.
-            this_thr->th.th_task_team = this_thr->th.th_team->t.t_task_team[this_thr->th.th_task_state];
-            KA_TRACE( 20, ( "__kmpc_end_serialized_parallel: T#%d restoring task_team %p / team %p\n",
-                            global_tid, this_thr -> th.th_task_team, this_thr -> th.th_team ) );
-        }
-    } else {
-        if ( __kmp_tasking_mode != tskm_immediate_exec ) {
-            KA_TRACE( 20, ( "__kmpc_end_serialized_parallel: T#%d decreasing nesting depth of serial team %p to %d\n",
-                            global_tid, serial_team, serial_team -> t.t_serialized ) );
-        }
+    /* restore values cached in the thread */
+    this_thr->th.th_team_nproc = serial_team->t.t_parent->t.t_nproc; /*  JPH */
+    this_thr->th.th_team_master =
+        serial_team->t.t_parent->t.t_threads[0]; /* JPH */
+    this_thr->th.th_team_serialized = this_thr->th.th_team->t.t_serialized;
+
+    /* TODO the below shouldn't need to be adjusted for serialized teams */
+    this_thr->th.th_dispatch =
+        &this_thr->th.th_team->t.t_dispatch[serial_team->t.t_master_tid];
+
+    __kmp_pop_current_task_from_thread(this_thr);
+
+    KMP_ASSERT(this_thr->th.th_current_task->td_flags.executing == 0);
+    this_thr->th.th_current_task->td_flags.executing = 1;
+
+    if (__kmp_tasking_mode != tskm_immediate_exec) {
+      // Copy the task team from the new child / old parent team to the thread.
+      this_thr->th.th_task_team =
+          this_thr->th.th_team->t.t_task_team[this_thr->th.th_task_state];
+      KA_TRACE(20,
+               ("__kmpc_end_serialized_parallel: T#%d restoring task_team %p / "
+                "team %p\n",
+                global_tid, this_thr->th.th_task_team, this_thr->th.th_team));
+    }
+  } else {
+    if (__kmp_tasking_mode != tskm_immediate_exec) {
+      KA_TRACE(20, ("__kmpc_end_serialized_parallel: T#%d decreasing nesting "
+                    "depth of serial team %p to %d\n",
+                    global_tid, serial_team, serial_team->t.t_serialized));
     }
+  }
 
-    if ( __kmp_env_consistency_check )
-        __kmp_pop_parallel( global_tid, NULL );
+  if (__kmp_env_consistency_check)
+    __kmp_pop_parallel(global_tid, NULL);
 }
 
 /*!
@@ -594,67 +566,62 @@ Execute <tt>flush</tt>. This is implemen
 depending on the memory ordering convention obeyed by the compiler
 even that may not be necessary).
 */
-void
-__kmpc_flush(ident_t *loc)
-{
-    KC_TRACE( 10, ("__kmpc_flush: called\n" ) );
-
-    /* need explicit __mf() here since use volatile instead in library */
-    KMP_MB();       /* Flush all pending memory write invalidates.  */
-
-    #if ( KMP_ARCH_X86 || KMP_ARCH_X86_64 )
-        #if KMP_MIC
-            // fence-style instructions do not exist, but lock; xaddl $0,(%rsp) can be used.
-            // We shouldn't need it, though, since the ABI rules require that
-            // * If the compiler generates NGO stores it also generates the fence
-            // * If users hand-code NGO stores they should insert the fence
-            // therefore no incomplete unordered stores should be visible.
-        #else
-            // C74404
-            // This is to address non-temporal store instructions (sfence needed).
-            // The clflush instruction is addressed either (mfence needed).
-            // Probably the non-temporal load monvtdqa instruction should also be addressed.
-            // mfence is a SSE2 instruction. Do not execute it if CPU is not SSE2.
-            if ( ! __kmp_cpuinfo.initialized ) {
-                __kmp_query_cpuid( & __kmp_cpuinfo );
-            }; // if
-            if ( ! __kmp_cpuinfo.sse2 ) {
-                // CPU cannot execute SSE2 instructions.
-            } else {
-                #if KMP_COMPILER_ICC
-                _mm_mfence();
-                #elif KMP_COMPILER_MSVC
-                MemoryBarrier();
-                #else
-                __sync_synchronize();
-                #endif // KMP_COMPILER_ICC
-            }; // if
-        #endif // KMP_MIC
-    #elif (KMP_ARCH_ARM || KMP_ARCH_AARCH64 || KMP_ARCH_MIPS || KMP_ARCH_MIPS64)
-        // Nothing to see here move along
-    #elif KMP_ARCH_PPC64
-        // Nothing needed here (we have a real MB above).
-        #if KMP_OS_CNK
-        // The flushing thread needs to yield here; this prevents a
-       // busy-waiting thread from saturating the pipeline. flush is
-          // often used in loops like this:
-           // while (!flag) {
-           //   #pragma omp flush(flag)
-           // }
-       // and adding the yield here is good for at least a 10x speedup
-          // when running >2 threads per core (on the NAS LU benchmark).
-            __kmp_yield(TRUE);
-        #endif
-    #else
-        #error Unknown or unsupported architecture
-    #endif
+void __kmpc_flush(ident_t *loc) {
+  KC_TRACE(10, ("__kmpc_flush: called\n"));
 
-}
+  /* need explicit __mf() here since use volatile instead in library */
+  KMP_MB(); /* Flush all pending memory write invalidates.  */
 
-/* -------------------------------------------------------------------------- */
+#if (KMP_ARCH_X86 || KMP_ARCH_X86_64)
+#if KMP_MIC
+// fence-style instructions do not exist, but lock; xaddl $0,(%rsp) can be used.
+// We shouldn't need it, though, since the ABI rules require that
+// * If the compiler generates NGO stores it also generates the fence
+// * If users hand-code NGO stores they should insert the fence
+// therefore no incomplete unordered stores should be visible.
+#else
+  // C74404
+  // This is to address non-temporal store instructions (sfence needed).
+  // The clflush instruction is also addressed (mfence needed).
+  // Probably the non-temporal load movntdqa instruction should also be
+  // addressed.
+  // mfence is an SSE2 instruction. Do not execute it if the CPU does not
+  // support SSE2.
+  if (!__kmp_cpuinfo.initialized) {
+    __kmp_query_cpuid(&__kmp_cpuinfo);
+  }; // if
+  if (!__kmp_cpuinfo.sse2) {
+    // CPU cannot execute SSE2 instructions.
+  } else {
+#if KMP_COMPILER_ICC
+    _mm_mfence();
+#elif KMP_COMPILER_MSVC
+    MemoryBarrier();
+#else
+    __sync_synchronize();
+#endif // KMP_COMPILER_ICC
+  }; // if
+#endif // KMP_MIC
+#elif (KMP_ARCH_ARM || KMP_ARCH_AARCH64 || KMP_ARCH_MIPS || KMP_ARCH_MIPS64)
+// Nothing to see here; move along.
+#elif KMP_ARCH_PPC64
+// Nothing needed here (we have a real MB above).
+#if KMP_OS_CNK
+  // The flushing thread needs to yield here; this prevents a
+  // busy-waiting thread from saturating the pipeline. flush is
+  // often used in loops like this:
+  // while (!flag) {
+  //   #pragma omp flush(flag)
+  // }
+  // and adding the yield here is good for at least a 10x speedup
+  // when running >2 threads per core (on the NAS LU benchmark).
+  __kmp_yield(TRUE);
+#endif
+#else
+#error Unknown or unsupported architecture
+#endif
+}
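
The CNK comment above already names the canonical spin-wait use of flush;
spelled out as a complete (illustrative) producer/consumer sketch:

@code
#include <stdio.h>

int flag = 0;    // written by the producer, polled by the consumer
int payload = 0; // must become visible before flag does

int main(void) {
  #pragma omp parallel sections num_threads(2)
  {
    #pragma omp section
    { // producer
      payload = 42;
      #pragma omp flush        // publish payload before raising the flag
      flag = 1;
      #pragma omp flush(flag)
    }
    #pragma omp section
    { // consumer: spin until the flag is flushed in
      while (!flag) {
        #pragma omp flush(flag)
      }
      #pragma omp flush        // payload is now visible as well
      printf("payload = %d\n", payload);
    }
  }
  return 0;
}
@endcode
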
 
 /* -------------------------------------------------------------------------- */
-
 /*!
 @ingroup SYNCHRONIZATION
 @param loc source location information
@@ -662,44 +629,42 @@ __kmpc_flush(ident_t *loc)
 
 Execute a barrier.
 */
-void
-__kmpc_barrier(ident_t *loc, kmp_int32 global_tid)
-{
-    KMP_COUNT_BLOCK(OMP_BARRIER);
-    KC_TRACE( 10, ("__kmpc_barrier: called T#%d\n", global_tid ) );
-
-    if (! TCR_4(__kmp_init_parallel))
-        __kmp_parallel_initialize();
-
-    if ( __kmp_env_consistency_check ) {
-        if ( loc == 0 ) {
-            KMP_WARNING( ConstructIdentInvalid ); // ??? What does it mean for the user?
-        }; // if
+void __kmpc_barrier(ident_t *loc, kmp_int32 global_tid) {
+  KMP_COUNT_BLOCK(OMP_BARRIER);
+  KC_TRACE(10, ("__kmpc_barrier: called T#%d\n", global_tid));
+
+  if (!TCR_4(__kmp_init_parallel))
+    __kmp_parallel_initialize();
+
+  if (__kmp_env_consistency_check) {
+    if (loc == 0) {
+      KMP_WARNING(ConstructIdentInvalid); // ??? What does it mean for the user?
+    }; // if
 
-        __kmp_check_barrier( global_tid, ct_barrier, loc );
-    }
+    __kmp_check_barrier(global_tid, ct_barrier, loc);
+  }
 
 #if OMPT_SUPPORT && OMPT_TRACE
-    ompt_frame_t * ompt_frame;
-    if (ompt_enabled ) {
-        ompt_frame = __ompt_get_task_frame_internal(0);
-        if ( ompt_frame->reenter_runtime_frame == NULL )
-            ompt_frame->reenter_runtime_frame = __builtin_frame_address(1);
-    }
-#endif
-    __kmp_threads[ global_tid ]->th.th_ident = loc;
-    // TODO: explicit barrier_wait_id:
-    //   this function is called when 'barrier' directive is present or
-    //   implicit barrier at the end of a worksharing construct.
-    // 1) better to add a per-thread barrier counter to a thread data structure
-    // 2) set to 0 when a new team is created
-    // 4) no sync is required
+  ompt_frame_t *ompt_frame;
+  if (ompt_enabled) {
+    ompt_frame = __ompt_get_task_frame_internal(0);
+    if (ompt_frame->reenter_runtime_frame == NULL)
+      ompt_frame->reenter_runtime_frame = __builtin_frame_address(1);
+  }
+#endif
+  __kmp_threads[global_tid]->th.th_ident = loc;
+  // TODO: explicit barrier_wait_id:
+  //   this function is called when 'barrier' directive is present or
+  //   implicit barrier at the end of a worksharing construct.
+  // 1) better to add a per-thread barrier counter to a thread data structure
+  // 2) set to 0 when a new team is created
+  // 4) no sync is required
 
-    __kmp_barrier( bs_plain_barrier, global_tid, FALSE, 0, NULL, NULL );
+  __kmp_barrier(bs_plain_barrier, global_tid, FALSE, 0, NULL, NULL);
 #if OMPT_SUPPORT && OMPT_TRACE
-    if (ompt_enabled ) {
-        ompt_frame->reenter_runtime_frame = NULL;
-    }
+  if (ompt_enabled) {
+    ompt_frame->reenter_runtime_frame = NULL;
+  }
 #endif
 }
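
For reference, the source form that typically lowers to this entry point
(a minimal sketch):

@code
#include <omp.h>
#include <stdio.h>

int main(void) {
  #pragma omp parallel
  {
    printf("before barrier: T#%d\n", omp_get_thread_num());
    #pragma omp barrier   // every thread of the team waits here
    printf("after barrier:  T#%d\n", omp_get_thread_num());
  }
  return 0;
}
@endcode
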
 
@@ -710,52 +675,49 @@ __kmpc_barrier(ident_t *loc, kmp_int32 g
 @param global_tid  global thread number.
 @return 1 if this thread should execute the <tt>master</tt> block, 0 otherwise.
 */
-kmp_int32
-__kmpc_master(ident_t *loc, kmp_int32 global_tid)
-{
-    int status = 0;
-
-    KC_TRACE( 10, ("__kmpc_master: called T#%d\n", global_tid ) );
-
-    if( ! TCR_4( __kmp_init_parallel ) )
-        __kmp_parallel_initialize();
-
-    if( KMP_MASTER_GTID( global_tid )) {
-        KMP_COUNT_BLOCK(OMP_MASTER);
-        KMP_PUSH_PARTITIONED_TIMER(OMP_master);
-        status = 1;
-    }
+kmp_int32 __kmpc_master(ident_t *loc, kmp_int32 global_tid) {
+  int status = 0;
 
-#if OMPT_SUPPORT && OMPT_TRACE
-    if (status) {
-        if (ompt_enabled &&
-            ompt_callbacks.ompt_callback(ompt_event_master_begin)) {
-            kmp_info_t  *this_thr        = __kmp_threads[ global_tid ];
-            kmp_team_t  *team            = this_thr -> th.th_team;
+  KC_TRACE(10, ("__kmpc_master: called T#%d\n", global_tid));
+
+  if (!TCR_4(__kmp_init_parallel))
+    __kmp_parallel_initialize();
+
+  if (KMP_MASTER_GTID(global_tid)) {
+    KMP_COUNT_BLOCK(OMP_MASTER);
+    KMP_PUSH_PARTITIONED_TIMER(OMP_master);
+    status = 1;
+  }
 
-            int  tid = __kmp_tid_from_gtid( global_tid );
-            ompt_callbacks.ompt_callback(ompt_event_master_begin)(
-                team->t.ompt_team_info.parallel_id,
-                team->t.t_implicit_task_taskdata[tid].ompt_task_info.task_id);
-        }
+#if OMPT_SUPPORT && OMPT_TRACE
+  if (status) {
+    if (ompt_enabled && ompt_callbacks.ompt_callback(ompt_event_master_begin)) {
+      kmp_info_t *this_thr = __kmp_threads[global_tid];
+      kmp_team_t *team = this_thr->th.th_team;
+
+      int tid = __kmp_tid_from_gtid(global_tid);
+      ompt_callbacks.ompt_callback(ompt_event_master_begin)(
+          team->t.ompt_team_info.parallel_id,
+          team->t.t_implicit_task_taskdata[tid].ompt_task_info.task_id);
     }
+  }
 #endif
 
-    if ( __kmp_env_consistency_check ) {
+  if (__kmp_env_consistency_check) {
 #if KMP_USE_DYNAMIC_LOCK
-        if (status)
-            __kmp_push_sync( global_tid, ct_master, loc, NULL, 0 );
-        else
-            __kmp_check_sync( global_tid, ct_master, loc, NULL, 0 );
+    if (status)
+      __kmp_push_sync(global_tid, ct_master, loc, NULL, 0);
+    else
+      __kmp_check_sync(global_tid, ct_master, loc, NULL, 0);
 #else
-        if (status)
-            __kmp_push_sync( global_tid, ct_master, loc, NULL );
-        else
-            __kmp_check_sync( global_tid, ct_master, loc, NULL );
+    if (status)
+      __kmp_push_sync(global_tid, ct_master, loc, NULL);
+    else
+      __kmp_check_sync(global_tid, ct_master, loc, NULL);
 #endif
-    }
+  }
 
-    return status;
+  return status;
 }
 
 /*!
@@ -763,36 +725,33 @@ __kmpc_master(ident_t *loc, kmp_int32 gl
 @param loc  source location information.
 @param global_tid  global thread number.
 
-Mark the end of a <tt>master</tt> region. This should only be called by the thread
-that executes the <tt>master</tt> region.
+Mark the end of a <tt>master</tt> region. This should only be called by the
+thread that executes the <tt>master</tt> region.
 */
-void
-__kmpc_end_master(ident_t *loc, kmp_int32 global_tid)
-{
-    KC_TRACE( 10, ("__kmpc_end_master: called T#%d\n", global_tid ) );
+void __kmpc_end_master(ident_t *loc, kmp_int32 global_tid) {
+  KC_TRACE(10, ("__kmpc_end_master: called T#%d\n", global_tid));
 
-    KMP_DEBUG_ASSERT( KMP_MASTER_GTID( global_tid ));
-    KMP_POP_PARTITIONED_TIMER();
+  KMP_DEBUG_ASSERT(KMP_MASTER_GTID(global_tid));
+  KMP_POP_PARTITIONED_TIMER();
 
 #if OMPT_SUPPORT && OMPT_TRACE
-    kmp_info_t  *this_thr        = __kmp_threads[ global_tid ];
-    kmp_team_t  *team            = this_thr -> th.th_team;
-    if (ompt_enabled &&
-        ompt_callbacks.ompt_callback(ompt_event_master_end)) {
-        int  tid = __kmp_tid_from_gtid( global_tid );
-        ompt_callbacks.ompt_callback(ompt_event_master_end)(
-            team->t.ompt_team_info.parallel_id,
-            team->t.t_implicit_task_taskdata[tid].ompt_task_info.task_id);
-    }
+  kmp_info_t *this_thr = __kmp_threads[global_tid];
+  kmp_team_t *team = this_thr->th.th_team;
+  if (ompt_enabled && ompt_callbacks.ompt_callback(ompt_event_master_end)) {
+    int tid = __kmp_tid_from_gtid(global_tid);
+    ompt_callbacks.ompt_callback(ompt_event_master_end)(
+        team->t.ompt_team_info.parallel_id,
+        team->t.t_implicit_task_taskdata[tid].ompt_task_info.task_id);
+  }
 #endif
 
-    if ( __kmp_env_consistency_check ) {
-        if( global_tid < 0 )
-            KMP_WARNING( ThreadIdentInvalid );
+  if (__kmp_env_consistency_check) {
+    if (global_tid < 0)
+      KMP_WARNING(ThreadIdentInvalid);
 
-        if( KMP_MASTER_GTID( global_tid ))
-            __kmp_pop_sync( global_tid, ct_master, loc );
-    }
+    if (KMP_MASTER_GTID(global_tid))
+      __kmp_pop_sync(global_tid, ct_master, loc);
+  }
 }
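
A minimal sketch of the construct these two entry points bracket:

@code
#include <omp.h>
#include <stdio.h>

int main(void) {
  #pragma omp parallel
  {
    // Only the thread whose master test returns 1 executes the block; the
    // others skip it, and there is no implied barrier at the end.
    #pragma omp master
    { printf("master is T#%d\n", omp_get_thread_num()); }
  }
  return 0;
}
@endcode
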
 
 /*!
@@ -802,60 +761,58 @@ __kmpc_end_master(ident_t *loc, kmp_int3
 
 Start execution of an <tt>ordered</tt> construct.
 */
-void
-__kmpc_ordered( ident_t * loc, kmp_int32 gtid )
-{
-    int cid = 0;
-    kmp_info_t *th;
-    KMP_DEBUG_ASSERT( __kmp_init_serial );
+void __kmpc_ordered(ident_t *loc, kmp_int32 gtid) {
+  int cid = 0;
+  kmp_info_t *th;
+  KMP_DEBUG_ASSERT(__kmp_init_serial);
 
-    KC_TRACE( 10, ("__kmpc_ordered: called T#%d\n", gtid ));
+  KC_TRACE(10, ("__kmpc_ordered: called T#%d\n", gtid));
 
-    if (! TCR_4(__kmp_init_parallel))
-        __kmp_parallel_initialize();
+  if (!TCR_4(__kmp_init_parallel))
+    __kmp_parallel_initialize();
 
 #if USE_ITT_BUILD
-    __kmp_itt_ordered_prep( gtid );
-    // TODO: ordered_wait_id
+  __kmp_itt_ordered_prep(gtid);
+// TODO: ordered_wait_id
 #endif /* USE_ITT_BUILD */
 
-    th = __kmp_threads[ gtid ];
+  th = __kmp_threads[gtid];
 
 #if OMPT_SUPPORT && OMPT_TRACE
-    if (ompt_enabled) {
-        /* OMPT state update */
-        th->th.ompt_thread_info.wait_id = (uint64_t) loc;
-        th->th.ompt_thread_info.state = ompt_state_wait_ordered;
-
-        /* OMPT event callback */
-        if (ompt_callbacks.ompt_callback(ompt_event_wait_ordered)) {
-            ompt_callbacks.ompt_callback(ompt_event_wait_ordered)(
-                th->th.ompt_thread_info.wait_id);
-        }
+  if (ompt_enabled) {
+    /* OMPT state update */
+    th->th.ompt_thread_info.wait_id = (uint64_t)loc;
+    th->th.ompt_thread_info.state = ompt_state_wait_ordered;
+
+    /* OMPT event callback */
+    if (ompt_callbacks.ompt_callback(ompt_event_wait_ordered)) {
+      ompt_callbacks.ompt_callback(ompt_event_wait_ordered)(
+          th->th.ompt_thread_info.wait_id);
     }
+  }
 #endif
 
-    if ( th -> th.th_dispatch -> th_deo_fcn != 0 )
-        (*th->th.th_dispatch->th_deo_fcn)( & gtid, & cid, loc );
-    else
-        __kmp_parallel_deo( & gtid, & cid, loc );
+  if (th->th.th_dispatch->th_deo_fcn != 0)
+    (*th->th.th_dispatch->th_deo_fcn)(&gtid, &cid, loc);
+  else
+    __kmp_parallel_deo(&gtid, &cid, loc);
 
 #if OMPT_SUPPORT && OMPT_TRACE
-    if (ompt_enabled) {
-        /* OMPT state update */
-        th->th.ompt_thread_info.state = ompt_state_work_parallel;
-        th->th.ompt_thread_info.wait_id = 0;
-
-        /* OMPT event callback */
-        if (ompt_callbacks.ompt_callback(ompt_event_acquired_ordered)) {
-            ompt_callbacks.ompt_callback(ompt_event_acquired_ordered)(
-                th->th.ompt_thread_info.wait_id);
-        }
+  if (ompt_enabled) {
+    /* OMPT state update */
+    th->th.ompt_thread_info.state = ompt_state_work_parallel;
+    th->th.ompt_thread_info.wait_id = 0;
+
+    /* OMPT event callback */
+    if (ompt_callbacks.ompt_callback(ompt_event_acquired_ordered)) {
+      ompt_callbacks.ompt_callback(ompt_event_acquired_ordered)(
+          th->th.ompt_thread_info.wait_id);
     }
+  }
 #endif
 
 #if USE_ITT_BUILD
-    __kmp_itt_ordered_start( gtid );
+  __kmp_itt_ordered_start(gtid);
 #endif /* USE_ITT_BUILD */
 }
 
@@ -866,216 +823,231 @@ __kmpc_ordered( ident_t * loc, kmp_int32
 
 End execution of an <tt>ordered</tt> construct.
 */
-void
-__kmpc_end_ordered( ident_t * loc, kmp_int32 gtid )
-{
-    int cid = 0;
-    kmp_info_t *th;
+void __kmpc_end_ordered(ident_t *loc, kmp_int32 gtid) {
+  int cid = 0;
+  kmp_info_t *th;
 
-    KC_TRACE( 10, ("__kmpc_end_ordered: called T#%d\n", gtid ) );
+  KC_TRACE(10, ("__kmpc_end_ordered: called T#%d\n", gtid));
 
 #if USE_ITT_BUILD
-    __kmp_itt_ordered_end( gtid );
-    // TODO: ordered_wait_id
+  __kmp_itt_ordered_end(gtid);
+// TODO: ordered_wait_id
 #endif /* USE_ITT_BUILD */
 
-    th = __kmp_threads[ gtid ];
+  th = __kmp_threads[gtid];
 
-    if ( th -> th.th_dispatch -> th_dxo_fcn != 0 )
-        (*th->th.th_dispatch->th_dxo_fcn)( & gtid, & cid, loc );
-    else
-        __kmp_parallel_dxo( & gtid, & cid, loc );
+  if (th->th.th_dispatch->th_dxo_fcn != 0)
+    (*th->th.th_dispatch->th_dxo_fcn)(&gtid, &cid, loc);
+  else
+    __kmp_parallel_dxo(&gtid, &cid, loc);
 
 #if OMPT_SUPPORT && OMPT_BLAME
-    if (ompt_enabled &&
-        ompt_callbacks.ompt_callback(ompt_event_release_ordered)) {
-        ompt_callbacks.ompt_callback(ompt_event_release_ordered)(
-            th->th.ompt_thread_info.wait_id);
-    }
+  if (ompt_enabled &&
+      ompt_callbacks.ompt_callback(ompt_event_release_ordered)) {
+    ompt_callbacks.ompt_callback(ompt_event_release_ordered)(
+        th->th.ompt_thread_info.wait_id);
+  }
 #endif
 }
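
A minimal sketch of the source form bracketed by the ordered enter/exit entry
points:

@code
#include <stdio.h>

int main(void) {
  // The ordered clause on the loop enables the ordered block inside it; the
  // block bodies execute in iteration order.
  #pragma omp parallel for ordered schedule(static, 1)
  for (int i = 0; i < 8; i++) {
    #pragma omp ordered
    { printf("iteration %d\n", i); } // printed as 0, 1, 2, ...
  }
  return 0;
}
@endcode
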
 
 #if KMP_USE_DYNAMIC_LOCK
 
 static __forceinline void
-__kmp_init_indirect_csptr(kmp_critical_name * crit, ident_t const * loc, kmp_int32 gtid, kmp_indirect_locktag_t tag)
-{
-    // Pointer to the allocated indirect lock is written to crit, while indexing is ignored.
-    void *idx;
-    kmp_indirect_lock_t **lck;
-    lck = (kmp_indirect_lock_t **)crit;
-    kmp_indirect_lock_t *ilk = __kmp_allocate_indirect_lock(&idx, gtid, tag);
-    KMP_I_LOCK_FUNC(ilk, init)(ilk->lock);
-    KMP_SET_I_LOCK_LOCATION(ilk, loc);
-    KMP_SET_I_LOCK_FLAGS(ilk, kmp_lf_critical_section);
-    KA_TRACE(20, ("__kmp_init_indirect_csptr: initialized indirect lock #%d\n", tag));
-#if USE_ITT_BUILD
-    __kmp_itt_critical_creating(ilk->lock, loc);
-#endif
-    int status = KMP_COMPARE_AND_STORE_PTR(lck, 0, ilk);
-    if (status == 0) {
-#if USE_ITT_BUILD
-        __kmp_itt_critical_destroyed(ilk->lock);
-#endif
-        // We don't really need to destroy the unclaimed lock here since it will be cleaned up at program exit.
-        //KMP_D_LOCK_FUNC(&idx, destroy)((kmp_dyna_lock_t *)&idx);
-    }
-    KMP_DEBUG_ASSERT(*lck != NULL);
+__kmp_init_indirect_csptr(kmp_critical_name *crit, ident_t const *loc,
+                          kmp_int32 gtid, kmp_indirect_locktag_t tag) {
+  // Pointer to the allocated indirect lock is written to crit, while indexing
+  // is ignored.
+  void *idx;
+  kmp_indirect_lock_t **lck;
+  lck = (kmp_indirect_lock_t **)crit;
+  kmp_indirect_lock_t *ilk = __kmp_allocate_indirect_lock(&idx, gtid, tag);
+  KMP_I_LOCK_FUNC(ilk, init)(ilk->lock);
+  KMP_SET_I_LOCK_LOCATION(ilk, loc);
+  KMP_SET_I_LOCK_FLAGS(ilk, kmp_lf_critical_section);
+  KA_TRACE(20,
+           ("__kmp_init_indirect_csptr: initialized indirect lock #%d\n", tag));
+#if USE_ITT_BUILD
+  __kmp_itt_critical_creating(ilk->lock, loc);
+#endif
+  int status = KMP_COMPARE_AND_STORE_PTR(lck, 0, ilk);
+  if (status == 0) {
+#if USE_ITT_BUILD
+    __kmp_itt_critical_destroyed(ilk->lock);
+#endif
+    // We don't really need to destroy the unclaimed lock here since it will be
+    // cleaned up at program exit.
+    // KMP_D_LOCK_FUNC(&idx, destroy)((kmp_dyna_lock_t *)&idx);
+  }
+  KMP_DEBUG_ASSERT(*lck != NULL);
 }
 
 // Fast-path acquire tas lock
-#define KMP_ACQUIRE_TAS_LOCK(lock, gtid) {                                                                       \
-    kmp_tas_lock_t *l = (kmp_tas_lock_t *)lock;                                                                  \
-    if (l->lk.poll != KMP_LOCK_FREE(tas) ||                                                                      \
-            ! KMP_COMPARE_AND_STORE_ACQ32(&(l->lk.poll), KMP_LOCK_FREE(tas), KMP_LOCK_BUSY(gtid+1, tas))) {      \
-        kmp_uint32 spins;                                                                                        \
-        KMP_FSYNC_PREPARE(l);                                                                                    \
-        KMP_INIT_YIELD(spins);                                                                                   \
-        if (TCR_4(__kmp_nth) > (__kmp_avail_proc ? __kmp_avail_proc : __kmp_xproc)) {                            \
-            KMP_YIELD(TRUE);                                                                                     \
-        } else {                                                                                                 \
-            KMP_YIELD_SPIN(spins);                                                                               \
-        }                                                                                                        \
-        kmp_backoff_t backoff = __kmp_spin_backoff_params;                                                       \
-        while (l->lk.poll != KMP_LOCK_FREE(tas) ||                                                               \
-               ! KMP_COMPARE_AND_STORE_ACQ32(&(l->lk.poll), KMP_LOCK_FREE(tas), KMP_LOCK_BUSY(gtid+1, tas))) {   \
-            __kmp_spin_backoff(&backoff);                                                                        \
-            if (TCR_4(__kmp_nth) > (__kmp_avail_proc ? __kmp_avail_proc : __kmp_xproc)) {                        \
-                KMP_YIELD(TRUE);                                                                                 \
-            } else {                                                                                             \
-                KMP_YIELD_SPIN(spins);                                                                           \
-            }                                                                                                    \
-        }                                                                                                        \
-    }                                                                                                            \
-    KMP_FSYNC_ACQUIRED(l);                                                                                       \
-}
+#define KMP_ACQUIRE_TAS_LOCK(lock, gtid)                                       \
+  {                                                                            \
+    kmp_tas_lock_t *l = (kmp_tas_lock_t *)lock;                                \
+    if (l->lk.poll != KMP_LOCK_FREE(tas) ||                                    \
+        !KMP_COMPARE_AND_STORE_ACQ32(&(l->lk.poll), KMP_LOCK_FREE(tas),        \
+                                     KMP_LOCK_BUSY(gtid + 1, tas))) {          \
+      kmp_uint32 spins;                                                        \
+      KMP_FSYNC_PREPARE(l);                                                    \
+      KMP_INIT_YIELD(spins);                                                   \
+      if (TCR_4(__kmp_nth) >                                                   \
+          (__kmp_avail_proc ? __kmp_avail_proc : __kmp_xproc)) {               \
+        KMP_YIELD(TRUE);                                                       \
+      } else {                                                                 \
+        KMP_YIELD_SPIN(spins);                                                 \
+      }                                                                        \
+      kmp_backoff_t backoff = __kmp_spin_backoff_params;                       \
+      while (l->lk.poll != KMP_LOCK_FREE(tas) ||                               \
+             !KMP_COMPARE_AND_STORE_ACQ32(&(l->lk.poll), KMP_LOCK_FREE(tas),   \
+                                          KMP_LOCK_BUSY(gtid + 1, tas))) {     \
+        __kmp_spin_backoff(&backoff);                                          \
+        if (TCR_4(__kmp_nth) >                                                 \
+            (__kmp_avail_proc ? __kmp_avail_proc : __kmp_xproc)) {             \
+          KMP_YIELD(TRUE);                                                     \
+        } else {                                                               \
+          KMP_YIELD_SPIN(spins);                                               \
+        }                                                                      \
+      }                                                                        \
+    }                                                                          \
+    KMP_FSYNC_ACQUIRED(l);                                                     \
+  }
 
 // Fast-path test tas lock
-#define KMP_TEST_TAS_LOCK(lock, gtid, rc) {                                                            \
-    kmp_tas_lock_t *l = (kmp_tas_lock_t *)lock;                                                        \
-    rc = l->lk.poll == KMP_LOCK_FREE(tas) &&                                                           \
-         KMP_COMPARE_AND_STORE_ACQ32(&(l->lk.poll), KMP_LOCK_FREE(tas), KMP_LOCK_BUSY(gtid+1, tas));   \
-}
+#define KMP_TEST_TAS_LOCK(lock, gtid, rc)                                      \
+  {                                                                            \
+    kmp_tas_lock_t *l = (kmp_tas_lock_t *)lock;                                \
+    rc = l->lk.poll == KMP_LOCK_FREE(tas) &&                                   \
+         KMP_COMPARE_AND_STORE_ACQ32(&(l->lk.poll), KMP_LOCK_FREE(tas),        \
+                                     KMP_LOCK_BUSY(gtid + 1, tas));            \
+  }
 
 // Fast-path release tas lock
-#define KMP_RELEASE_TAS_LOCK(lock, gtid) {                          \
-    TCW_4(((kmp_tas_lock_t *)lock)->lk.poll, KMP_LOCK_FREE(tas));   \
-    KMP_MB();                                                       \
-}
+#define KMP_RELEASE_TAS_LOCK(lock, gtid)                                       \
+  {                                                                            \
+    TCW_4(((kmp_tas_lock_t *)lock)->lk.poll, KMP_LOCK_FREE(tas));              \
+    KMP_MB();                                                                  \
+  }
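
For readers new to these macros, the same test-and-set technique as a
stand-alone C11 sketch (the toy_* names are invented; the real macros
additionally record ITT sync events and choose between KMP_YIELD and
KMP_YIELD_SPIN based on oversubscription):

@code
#include <sched.h>
#include <stdatomic.h>

// 0 means free; a non-zero value identifies the owner (gtid + 1), mirroring
// KMP_LOCK_FREE(tas) and KMP_LOCK_BUSY(gtid + 1, tas) above.
typedef struct { atomic_int poll; } toy_tas_lock_t;

static void toy_tas_acquire(toy_tas_lock_t *l, int gtid) {
  int free_val = 0;
  while (atomic_load_explicit(&l->poll, memory_order_relaxed) != 0 ||
         !atomic_compare_exchange_strong(&l->poll, &free_val, gtid + 1)) {
    sched_yield(); // stand-in for the KMP_YIELD / backoff logic above
    free_val = 0;  // a failed CAS overwrote the expected value
  }
}

static void toy_tas_release(toy_tas_lock_t *l) {
  atomic_store(&l->poll, 0); // corresponds to the TCW_4 + KMP_MB() pair above
}
@endcode
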
 
 #if KMP_USE_FUTEX
 
-# include <unistd.h>
-# include <sys/syscall.h>
-# ifndef FUTEX_WAIT
-#  define FUTEX_WAIT 0
-# endif
-# ifndef FUTEX_WAKE
-#  define FUTEX_WAKE 1
-# endif
+#include <sys/syscall.h>
+#include <unistd.h>
+#ifndef FUTEX_WAIT
+#define FUTEX_WAIT 0
+#endif
+#ifndef FUTEX_WAKE
+#define FUTEX_WAKE 1
+#endif
 
 // Fast-path acquire futex lock
-#define KMP_ACQUIRE_FUTEX_LOCK(lock, gtid) {                                                                        \
-    kmp_futex_lock_t *ftx = (kmp_futex_lock_t *)lock;                                                               \
-    kmp_int32 gtid_code = (gtid+1) << 1;                                                                            \
-    KMP_MB();                                                                                                       \
-    KMP_FSYNC_PREPARE(ftx);                                                                                         \
-    kmp_int32 poll_val;                                                                                             \
-    while ((poll_val = KMP_COMPARE_AND_STORE_RET32(&(ftx->lk.poll), KMP_LOCK_FREE(futex),                           \
-                                                   KMP_LOCK_BUSY(gtid_code, futex))) != KMP_LOCK_FREE(futex)) {     \
-        kmp_int32 cond = KMP_LOCK_STRIP(poll_val) & 1;                                                              \
-        if (!cond) {                                                                                                \
-            if (!KMP_COMPARE_AND_STORE_RET32(&(ftx->lk.poll), poll_val, poll_val | KMP_LOCK_BUSY(1, futex))) {      \
-                continue;                                                                                           \
-            }                                                                                                       \
-            poll_val |= KMP_LOCK_BUSY(1, futex);                                                                    \
-        }                                                                                                           \
-        kmp_int32 rc;                                                                                               \
-        if ((rc = syscall(__NR_futex, &(ftx->lk.poll), FUTEX_WAIT, poll_val, NULL, NULL, 0)) != 0) {                \
-            continue;                                                                                               \
-        }                                                                                                           \
-        gtid_code |= 1;                                                                                             \
-    }                                                                                                               \
-    KMP_FSYNC_ACQUIRED(ftx);                                                                                        \
-}
+#define KMP_ACQUIRE_FUTEX_LOCK(lock, gtid)                                     \
+  {                                                                            \
+    kmp_futex_lock_t *ftx = (kmp_futex_lock_t *)lock;                          \
+    kmp_int32 gtid_code = (gtid + 1) << 1;                                     \
+    KMP_MB();                                                                  \
+    KMP_FSYNC_PREPARE(ftx);                                                    \
+    kmp_int32 poll_val;                                                        \
+    while ((poll_val = KMP_COMPARE_AND_STORE_RET32(                            \
+                &(ftx->lk.poll), KMP_LOCK_FREE(futex),                         \
+                KMP_LOCK_BUSY(gtid_code, futex))) != KMP_LOCK_FREE(futex)) {   \
+      kmp_int32 cond = KMP_LOCK_STRIP(poll_val) & 1;                           \
+      if (!cond) {                                                             \
+        if (!KMP_COMPARE_AND_STORE_RET32(&(ftx->lk.poll), poll_val,            \
+                                         poll_val |                            \
+                                             KMP_LOCK_BUSY(1, futex))) {       \
+          continue;                                                            \
+        }                                                                      \
+        poll_val |= KMP_LOCK_BUSY(1, futex);                                   \
+      }                                                                        \
+      kmp_int32 rc;                                                            \
+      if ((rc = syscall(__NR_futex, &(ftx->lk.poll), FUTEX_WAIT, poll_val,     \
+                        NULL, NULL, 0)) != 0) {                                \
+        continue;                                                              \
+      }                                                                        \
+      gtid_code |= 1;                                                          \
+    }                                                                          \
+    KMP_FSYNC_ACQUIRED(ftx);                                                   \
+  }
 
 // Fast-path test futex lock
-#define KMP_TEST_FUTEX_LOCK(lock, gtid, rc) {                                                                       \
-    kmp_futex_lock_t *ftx = (kmp_futex_lock_t *)lock;                                                               \
-    if (KMP_COMPARE_AND_STORE_ACQ32(&(ftx->lk.poll), KMP_LOCK_FREE(futex), KMP_LOCK_BUSY(gtid+1 << 1, futex))) {    \
-        KMP_FSYNC_ACQUIRED(ftx);                                                                                    \
-        rc = TRUE;                                                                                                  \
-    } else {                                                                                                        \
-        rc = FALSE;                                                                                                 \
-    }                                                                                                               \
-}
+#define KMP_TEST_FUTEX_LOCK(lock, gtid, rc)                                    \
+  {                                                                            \
+    kmp_futex_lock_t *ftx = (kmp_futex_lock_t *)lock;                          \
+    if (KMP_COMPARE_AND_STORE_ACQ32(&(ftx->lk.poll), KMP_LOCK_FREE(futex),     \
+                                    KMP_LOCK_BUSY(gtid + 1 << 1, futex))) {    \
+      KMP_FSYNC_ACQUIRED(ftx);                                                 \
+      rc = TRUE;                                                               \
+    } else {                                                                   \
+      rc = FALSE;                                                              \
+    }                                                                          \
+  }
 
 // Fast-path release futex lock
-#define KMP_RELEASE_FUTEX_LOCK(lock, gtid) {                                                        \
-    kmp_futex_lock_t *ftx = (kmp_futex_lock_t *)lock;                                               \
-    KMP_MB();                                                                                       \
-    KMP_FSYNC_RELEASING(ftx);                                                                       \
-    kmp_int32 poll_val = KMP_XCHG_FIXED32(&(ftx->lk.poll), KMP_LOCK_FREE(futex));                   \
-    if (KMP_LOCK_STRIP(poll_val) & 1) {                                                             \
-        syscall(__NR_futex, &(ftx->lk.poll), FUTEX_WAKE, KMP_LOCK_BUSY(1, futex), NULL, NULL, 0);   \
-    }                                                                                               \
-    KMP_MB();                                                                                       \
-    KMP_YIELD(TCR_4(__kmp_nth) > (__kmp_avail_proc ? __kmp_avail_proc : __kmp_xproc));              \
-}
+#define KMP_RELEASE_FUTEX_LOCK(lock, gtid)                                     \
+  {                                                                            \
+    kmp_futex_lock_t *ftx = (kmp_futex_lock_t *)lock;                          \
+    KMP_MB();                                                                  \
+    KMP_FSYNC_RELEASING(ftx);                                                  \
+    kmp_int32 poll_val =                                                       \
+        KMP_XCHG_FIXED32(&(ftx->lk.poll), KMP_LOCK_FREE(futex));               \
+    if (KMP_LOCK_STRIP(poll_val) & 1) {                                        \
+      syscall(__NR_futex, &(ftx->lk.poll), FUTEX_WAKE,                         \
+              KMP_LOCK_BUSY(1, futex), NULL, NULL, 0);                         \
+    }                                                                          \
+    KMP_MB();                                                                  \
+    KMP_YIELD(TCR_4(__kmp_nth) >                                               \
+              (__kmp_avail_proc ? __kmp_avail_proc : __kmp_xproc));            \
+  }
 
 #endif // KMP_USE_FUTEX
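
A reduced, Linux-only sketch of the wait/wake protocol those macros implement
(essentially the classic three-state futex mutex; the toy_* names are
invented, and the real macros additionally encode the owner's gtid and apply
yield heuristics):

@code
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

// 0 = free, 1 = held with no waiters, 2 = held and waiters may be sleeping.
static void toy_futex_lock(atomic_int *w) {
  int c = 0;
  if (atomic_compare_exchange_strong(w, &c, 1))
    return; // uncontended fast path: no syscall at all
  if (c != 2)
    c = atomic_exchange(w, 2); // mark contended; acquires if it became free
  while (c != 0) {
    // Sleep only while the word still reads 2; otherwise retry immediately.
    syscall(__NR_futex, w, FUTEX_WAIT, 2, NULL, NULL, 0);
    c = atomic_exchange(w, 2);
  }
}

static void toy_futex_unlock(atomic_int *w) {
  if (atomic_exchange(w, 0) == 2) // a waiter may be sleeping: wake one
    syscall(__NR_futex, w, FUTEX_WAKE, 1, NULL, NULL, 0);
}
@endcode
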
 
 #else // KMP_USE_DYNAMIC_LOCK
 
-static kmp_user_lock_p
-__kmp_get_critical_section_ptr( kmp_critical_name * crit, ident_t const * loc, kmp_int32 gtid )
-{
-    kmp_user_lock_p *lck_pp = (kmp_user_lock_p *)crit;
-
-    //
-    // Because of the double-check, the following load
-    // doesn't need to be volatile.
-    //
-    kmp_user_lock_p lck = (kmp_user_lock_p)TCR_PTR( *lck_pp );
-
-    if ( lck == NULL ) {
-        void * idx;
-
-        // Allocate & initialize the lock.
-        // Remember allocated locks in table in order to free them in __kmp_cleanup()
-        lck = __kmp_user_lock_allocate( &idx, gtid, kmp_lf_critical_section );
-        __kmp_init_user_lock_with_checks( lck );
-        __kmp_set_user_lock_location( lck, loc );
-#if USE_ITT_BUILD
-        __kmp_itt_critical_creating( lck );
-            // __kmp_itt_critical_creating() should be called *before* the first usage of underlying
-            // lock. It is the only place where we can guarantee it. There are chances the lock will
-            // destroyed with no usage, but it is not a problem, because this is not real event seen
-            // by user but rather setting name for object (lock). See more details in kmp_itt.h.
-#endif /* USE_ITT_BUILD */
-
-        //
-        // Use a cmpxchg instruction to slam the start of the critical
-        // section with the lock pointer.  If another thread beat us
-        // to it, deallocate the lock, and use the lock that the other
-        // thread allocated.
-        //
-        int status = KMP_COMPARE_AND_STORE_PTR( lck_pp, 0, lck );
-
-        if ( status == 0 ) {
-            // Deallocate the lock and reload the value.
-#if USE_ITT_BUILD
-            __kmp_itt_critical_destroyed( lck );
-                // Let ITT know the lock is destroyed and the same memory location may be reused for
-                // another purpose.
-#endif /* USE_ITT_BUILD */
-            __kmp_destroy_user_lock_with_checks( lck );
-            __kmp_user_lock_free( &idx, gtid, lck );
-            lck = (kmp_user_lock_p)TCR_PTR( *lck_pp );
-            KMP_DEBUG_ASSERT( lck != NULL );
-        }
+static kmp_user_lock_p __kmp_get_critical_section_ptr(kmp_critical_name *crit,
+                                                      ident_t const *loc,
+                                                      kmp_int32 gtid) {
+  kmp_user_lock_p *lck_pp = (kmp_user_lock_p *)crit;
+
+  // Because of the double-check, the following load doesn't need to be volatile
+  kmp_user_lock_p lck = (kmp_user_lock_p)TCR_PTR(*lck_pp);
+
+  if (lck == NULL) {
+    void *idx;
+
+    // Allocate & initialize the lock.
+    // Remember alloc'ed locks in table in order to free them in __kmp_cleanup()
+    lck = __kmp_user_lock_allocate(&idx, gtid, kmp_lf_critical_section);
+    __kmp_init_user_lock_with_checks(lck);
+    __kmp_set_user_lock_location(lck, loc);
+#if USE_ITT_BUILD
+    __kmp_itt_critical_creating(lck);
+// __kmp_itt_critical_creating() should be called *before* the first usage of
+// the underlying lock. It is the only place where we can guarantee it. There
+// are chances the lock will be destroyed with no usage, but that is not a
+// problem, because this is not a real event seen by the user but rather a
+// way of setting a name for the object (lock). See more details in kmp_itt.h.
+#endif /* USE_ITT_BUILD */
+
+    // Use a cmpxchg instruction to slam the start of the critical section with
+    // the lock pointer.  If another thread beat us to it, deallocate the lock,
+    // and use the lock that the other thread allocated.
+    int status = KMP_COMPARE_AND_STORE_PTR(lck_pp, 0, lck);
+
+    if (status == 0) {
+// Deallocate the lock and reload the value.
+#if USE_ITT_BUILD
+      __kmp_itt_critical_destroyed(lck);
+// Let ITT know the lock is destroyed and the same memory location may be reused
+// for another purpose.
+#endif /* USE_ITT_BUILD */
+      __kmp_destroy_user_lock_with_checks(lck);
+      __kmp_user_lock_free(&idx, gtid, lck);
+      lck = (kmp_user_lock_p)TCR_PTR(*lck_pp);
+      KMP_DEBUG_ASSERT(lck != NULL);
     }
-    return lck;
+  }
+  return lck;
 }
 
 #endif // KMP_USE_DYNAMIC_LOCK
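
The function above and __kmp_init_indirect_csptr earlier follow the same
publish-by-cmpxchg idiom; here it is reduced to a generic sketch with invented
names:

@code
#include <stdatomic.h>
#include <stdlib.h>

typedef struct { int dummy; } toy_lock_t; // stands in for the real lock type

// Lazily create the object stored in *slot: allocate speculatively, try to
// publish the pointer with one compare-and-swap, and if another thread won
// the race, free ours and adopt the pointer it published.
static toy_lock_t *toy_get_or_create(_Atomic(toy_lock_t *) *slot) {
  toy_lock_t *lck = atomic_load(slot);
  if (lck == NULL) {
    toy_lock_t *mine = calloc(1, sizeof(*mine));
    toy_lock_t *expected = NULL;
    if (atomic_compare_exchange_strong(slot, &expected, mine)) {
      lck = mine;     // we published our copy
    } else {
      free(mine);     // lost the race: discard our copy...
      lck = expected; // ...and reuse the winner's pointer
    }
  }
  return lck;
}
@endcode
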
@@ -1084,183 +1056,186 @@ __kmp_get_critical_section_ptr( kmp_crit
 @ingroup WORK_SHARING
 @param loc  source location information.
 @param global_tid  global thread number.
- at param crit identity of the critical section. This could be a pointer to a lock associated with the critical section, or
-some other suitably unique value.
+ at param crit identity of the critical section. This could be a pointer to a lock
+associated with the critical section, or some other suitably unique value.
 
 Enter code protected by a `critical` construct.
 This function blocks until the executing thread can enter the critical section.
 */
-void
-__kmpc_critical( ident_t * loc, kmp_int32 global_tid, kmp_critical_name * crit )
-{
+void __kmpc_critical(ident_t *loc, kmp_int32 global_tid,
+                     kmp_critical_name *crit) {
 #if KMP_USE_DYNAMIC_LOCK
-    __kmpc_critical_with_hint(loc, global_tid, crit, omp_lock_hint_none);
+  __kmpc_critical_with_hint(loc, global_tid, crit, omp_lock_hint_none);
 #else
-    KMP_COUNT_BLOCK(OMP_CRITICAL);
-    KMP_TIME_PARTITIONED_BLOCK(OMP_critical_wait);        /* Time spent waiting to enter the critical section */
-    kmp_user_lock_p lck;
+  KMP_COUNT_BLOCK(OMP_CRITICAL);
+  KMP_TIME_PARTITIONED_BLOCK(
+      OMP_critical_wait); /* Time spent waiting to enter the critical section */
+  kmp_user_lock_p lck;
 
-    KC_TRACE( 10, ("__kmpc_critical: called T#%d\n", global_tid ) );
+  KC_TRACE(10, ("__kmpc_critical: called T#%d\n", global_tid));
 
-    //TODO: add THR_OVHD_STATE
+  // TODO: add THR_OVHD_STATE
 
-    KMP_CHECK_USER_LOCK_INIT();
+  KMP_CHECK_USER_LOCK_INIT();
 
-    if ( ( __kmp_user_lock_kind == lk_tas )
-      && ( sizeof( lck->tas.lk.poll ) <= OMP_CRITICAL_SIZE ) ) {
-        lck = (kmp_user_lock_p)crit;
-    }
+  if ((__kmp_user_lock_kind == lk_tas) &&
+      (sizeof(lck->tas.lk.poll) <= OMP_CRITICAL_SIZE)) {
+    lck = (kmp_user_lock_p)crit;
+  }
 #if KMP_USE_FUTEX
-    else if ( ( __kmp_user_lock_kind == lk_futex )
-      && ( sizeof( lck->futex.lk.poll ) <= OMP_CRITICAL_SIZE ) ) {
-        lck = (kmp_user_lock_p)crit;
-    }
+  else if ((__kmp_user_lock_kind == lk_futex) &&
+           (sizeof(lck->futex.lk.poll) <= OMP_CRITICAL_SIZE)) {
+    lck = (kmp_user_lock_p)crit;
+  }
 #endif
-    else { // ticket, queuing or drdpa
-        lck = __kmp_get_critical_section_ptr( crit, loc, global_tid );
-    }
+  else { // ticket, queuing or drdpa
+    lck = __kmp_get_critical_section_ptr(crit, loc, global_tid);
+  }
 
-    if ( __kmp_env_consistency_check )
-        __kmp_push_sync( global_tid, ct_critical, loc, lck );
+  if (__kmp_env_consistency_check)
+    __kmp_push_sync(global_tid, ct_critical, loc, lck);
 
-    /* since the critical directive binds to all threads, not just
-     * the current team we have to check this even if we are in a
-     * serialized team */
-    /* also, even if we are the uber thread, we still have to conduct the lock,
-     * as we have to contend with sibling threads */
+// Since the critical directive binds to all threads, not just the current
+// team, we have to check this even if we are in a serialized team.
+// Also, even if we are the uber thread, we still have to acquire the lock,
+// as we have to contend with sibling threads.
 
 #if USE_ITT_BUILD
-    __kmp_itt_critical_acquiring( lck );
+  __kmp_itt_critical_acquiring(lck);
 #endif /* USE_ITT_BUILD */
-    // Value of 'crit' should be good for using as a critical_id of the critical section directive.
-    __kmp_acquire_user_lock_with_checks( lck, global_tid );
+  // The value of 'crit' should be usable as the critical_id of the critical
+  // section directive.
+  __kmp_acquire_user_lock_with_checks(lck, global_tid);
 
 #if USE_ITT_BUILD
-    __kmp_itt_critical_acquired( lck );
+  __kmp_itt_critical_acquired(lck);
 #endif /* USE_ITT_BUILD */
 
-    KMP_START_EXPLICIT_TIMER(OMP_critical);
-    KA_TRACE( 15, ("__kmpc_critical: done T#%d\n", global_tid ));
+  KMP_START_EXPLICIT_TIMER(OMP_critical);
+  KA_TRACE(15, ("__kmpc_critical: done T#%d\n", global_tid));
 #endif // KMP_USE_DYNAMIC_LOCK
 }
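
The source-level construct this entry point serves, as a minimal sketch (the
name in parentheses is what distinguishes one kmp_critical_name lock word from
another; unnamed regions all share one):

@code
#include <omp.h>
#include <stdio.h>

int counter = 0;

int main(void) {
  #pragma omp parallel
  {
    // Entry to and exit from this block are bracketed by the critical
    // enter/exit entry points.
    #pragma omp critical(counter_update)
    { counter++; }
  }
  printf("counter = %d\n", counter);
  return 0;
}
@endcode
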
 
 #if KMP_USE_DYNAMIC_LOCK
 
 // Converts the given hint to an internal lock implementation
-static __forceinline kmp_dyna_lockseq_t
-__kmp_map_hint_to_lock(uintptr_t hint)
-{
+static __forceinline kmp_dyna_lockseq_t __kmp_map_hint_to_lock(uintptr_t hint) {
 #if KMP_USE_TSX
-# define KMP_TSX_LOCK(seq) lockseq_##seq
+#define KMP_TSX_LOCK(seq) lockseq_##seq
 #else
-# define KMP_TSX_LOCK(seq) __kmp_user_lock_seq
+#define KMP_TSX_LOCK(seq) __kmp_user_lock_seq
 #endif
 
 #if KMP_ARCH_X86 || KMP_ARCH_X86_64
-# define KMP_CPUINFO_RTM (__kmp_cpuinfo.rtm)
+#define KMP_CPUINFO_RTM (__kmp_cpuinfo.rtm)
 #else
-# define KMP_CPUINFO_RTM 0
+#define KMP_CPUINFO_RTM 0
 #endif
 
-    // Hints that do not require further logic
-    if (hint & kmp_lock_hint_hle)
-        return KMP_TSX_LOCK(hle);
-    if (hint & kmp_lock_hint_rtm)
-        return KMP_CPUINFO_RTM ? KMP_TSX_LOCK(rtm): __kmp_user_lock_seq;
-    if (hint & kmp_lock_hint_adaptive)
-        return KMP_CPUINFO_RTM ? KMP_TSX_LOCK(adaptive): __kmp_user_lock_seq;
-
-    // Rule out conflicting hints first by returning the default lock
-    if ((hint & omp_lock_hint_contended) && (hint & omp_lock_hint_uncontended))
-        return __kmp_user_lock_seq;
-    if ((hint & omp_lock_hint_speculative) && (hint & omp_lock_hint_nonspeculative))
-        return __kmp_user_lock_seq;
-
-    // Do not even consider speculation when it appears to be contended
-    if (hint & omp_lock_hint_contended)
-        return lockseq_queuing;
-
-    // Uncontended lock without speculation
-    if ((hint & omp_lock_hint_uncontended) && !(hint & omp_lock_hint_speculative))
-        return lockseq_tas;
-
-    // HLE lock for speculation
-    if (hint & omp_lock_hint_speculative)
-        return KMP_TSX_LOCK(hle);
+  // Hints that do not require further logic
+  if (hint & kmp_lock_hint_hle)
+    return KMP_TSX_LOCK(hle);
+  if (hint & kmp_lock_hint_rtm)
+    return KMP_CPUINFO_RTM ? KMP_TSX_LOCK(rtm) : __kmp_user_lock_seq;
+  if (hint & kmp_lock_hint_adaptive)
+    return KMP_CPUINFO_RTM ? KMP_TSX_LOCK(adaptive) : __kmp_user_lock_seq;
 
+  // Rule out conflicting hints first by returning the default lock
+  if ((hint & omp_lock_hint_contended) && (hint & omp_lock_hint_uncontended))
+    return __kmp_user_lock_seq;
+  if ((hint & omp_lock_hint_speculative) &&
+      (hint & omp_lock_hint_nonspeculative))
     return __kmp_user_lock_seq;
+
+  // Do not even consider speculation when it appears to be contended
+  if (hint & omp_lock_hint_contended)
+    return lockseq_queuing;
+
+  // Uncontended lock without speculation
+  if ((hint & omp_lock_hint_uncontended) && !(hint & omp_lock_hint_speculative))
+    return lockseq_tas;
+
+  // HLE lock for speculation
+  if (hint & omp_lock_hint_speculative)
+    return KMP_TSX_LOCK(hle);
+
+  return __kmp_user_lock_seq;
 }
 
 /*!
 @ingroup WORK_SHARING
 @param loc  source location information.
 @param global_tid  global thread number.
- at param crit identity of the critical section. This could be a pointer to a lock associated with the critical section,
-or some other suitably unique value.
+ at param crit identity of the critical section. This could be a pointer to a lock
+associated with the critical section, or some other suitably unique value.
 @param hint the lock hint.
 
-Enter code protected by a `critical` construct with a hint. The hint value is used to suggest a lock implementation.
-This function blocks until the executing thread can enter the critical section unless the hint suggests use of
+Enter code protected by a `critical` construct with a hint. The hint value is
+used to suggest a lock implementation. This function blocks until the executing
+thread can enter the critical section unless the hint suggests use of
 speculative execution and the hardware supports it.
 */
-void
-__kmpc_critical_with_hint( ident_t * loc, kmp_int32 global_tid, kmp_critical_name * crit, uintptr_t hint )
-{
-    KMP_COUNT_BLOCK(OMP_CRITICAL);
-    kmp_user_lock_p lck;
-
-    KC_TRACE( 10, ("__kmpc_critical: called T#%d\n", global_tid ) );
-
-    kmp_dyna_lock_t *lk = (kmp_dyna_lock_t *)crit;
-    // Check if it is initialized.
-    if (*lk == 0) {
-        kmp_dyna_lockseq_t lckseq = __kmp_map_hint_to_lock(hint);
-        if (KMP_IS_D_LOCK(lckseq)) {
-            KMP_COMPARE_AND_STORE_ACQ32((volatile kmp_int32 *)crit, 0, KMP_GET_D_TAG(lckseq));
-        } else {
-            __kmp_init_indirect_csptr(crit, loc, global_tid, KMP_GET_I_TAG(lckseq));
-        }
-    }
-    // Branch for accessing the actual lock object and set operation. This branching is inevitable since
-    // this lock initialization does not follow the normal dispatch path (lock table is not used).
-    if (KMP_EXTRACT_D_TAG(lk) != 0) {
-        lck = (kmp_user_lock_p)lk;
-        if (__kmp_env_consistency_check) {
-            __kmp_push_sync(global_tid, ct_critical, loc, lck, __kmp_map_hint_to_lock(hint));
-        }
-# if USE_ITT_BUILD
-        __kmp_itt_critical_acquiring(lck);
-# endif
-# if KMP_USE_INLINED_TAS
-        if (__kmp_user_lock_seq == lockseq_tas && !__kmp_env_consistency_check) {
-            KMP_ACQUIRE_TAS_LOCK(lck, global_tid);
-        } else
-# elif KMP_USE_INLINED_FUTEX
-        if (__kmp_user_lock_seq == lockseq_futex && !__kmp_env_consistency_check) {
-            KMP_ACQUIRE_FUTEX_LOCK(lck, global_tid);
-        } else
-# endif
-        {
-            KMP_D_LOCK_FUNC(lk, set)(lk, global_tid);
-        }
+void __kmpc_critical_with_hint(ident_t *loc, kmp_int32 global_tid,
+                               kmp_critical_name *crit, uintptr_t hint) {
+  KMP_COUNT_BLOCK(OMP_CRITICAL);
+  kmp_user_lock_p lck;
+
+  KC_TRACE(10, ("__kmpc_critical: called T#%d\n", global_tid));
+
+  kmp_dyna_lock_t *lk = (kmp_dyna_lock_t *)crit;
+  // Check if it is initialized.
+  if (*lk == 0) {
+    kmp_dyna_lockseq_t lckseq = __kmp_map_hint_to_lock(hint);
+    if (KMP_IS_D_LOCK(lckseq)) {
+      KMP_COMPARE_AND_STORE_ACQ32((volatile kmp_int32 *)crit, 0,
+                                  KMP_GET_D_TAG(lckseq));
     } else {
-        kmp_indirect_lock_t *ilk = *((kmp_indirect_lock_t **)lk);
-        lck = ilk->lock;
-        if (__kmp_env_consistency_check) {
-            __kmp_push_sync(global_tid, ct_critical, loc, lck, __kmp_map_hint_to_lock(hint));
-        }
-# if USE_ITT_BUILD
-        __kmp_itt_critical_acquiring(lck);
-# endif
-        KMP_I_LOCK_FUNC(ilk, set)(lck, global_tid);
+      __kmp_init_indirect_csptr(crit, loc, global_tid, KMP_GET_I_TAG(lckseq));
+    }
+  }
+  // Branch for accessing the actual lock object and set operation. This
+  // branching is inevitable since this lock initialization does not follow the
+  // normal dispatch path (lock table is not used).
+  if (KMP_EXTRACT_D_TAG(lk) != 0) {
+    lck = (kmp_user_lock_p)lk;
+    if (__kmp_env_consistency_check) {
+      __kmp_push_sync(global_tid, ct_critical, loc, lck,
+                      __kmp_map_hint_to_lock(hint));
+    }
+#if USE_ITT_BUILD
+    __kmp_itt_critical_acquiring(lck);
+#endif
+#if KMP_USE_INLINED_TAS
+    if (__kmp_user_lock_seq == lockseq_tas && !__kmp_env_consistency_check) {
+      KMP_ACQUIRE_TAS_LOCK(lck, global_tid);
+    } else
+#elif KMP_USE_INLINED_FUTEX
+    if (__kmp_user_lock_seq == lockseq_futex && !__kmp_env_consistency_check) {
+      KMP_ACQUIRE_FUTEX_LOCK(lck, global_tid);
+    } else
+#endif
+    {
+      KMP_D_LOCK_FUNC(lk, set)(lk, global_tid);
     }
+  } else {
+    kmp_indirect_lock_t *ilk = *((kmp_indirect_lock_t **)lk);
+    lck = ilk->lock;
+    if (__kmp_env_consistency_check) {
+      __kmp_push_sync(global_tid, ct_critical, loc, lck,
+                      __kmp_map_hint_to_lock(hint));
+    }
+#if USE_ITT_BUILD
+    __kmp_itt_critical_acquiring(lck);
+#endif
+    KMP_I_LOCK_FUNC(ilk, set)(lck, global_tid);
+  }
 
 #if USE_ITT_BUILD
-    __kmp_itt_critical_acquired( lck );
+  __kmp_itt_critical_acquired(lck);
 #endif /* USE_ITT_BUILD */
 
-    KMP_PUSH_PARTITIONED_TIMER(OMP_critical);
-    KA_TRACE( 15, ("__kmpc_critical: done T#%d\n", global_tid ));
+  KMP_PUSH_PARTITIONED_TIMER(OMP_critical);
+  KA_TRACE(15, ("__kmpc_critical: done T#%d\n", global_tid));
 } // __kmpc_critical_with_hint
 
 #endif // KMP_USE_DYNAMIC_LOCK
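
For context, a minimal sketch of how a compiler might lower a hinted `critical` construct onto these entry points. The `loc`/`gtid` plumbing, the `crit_mylock` lock word, and the body are illustrative assumptions, not what any particular compiler emits; only the two `__kmpc_*` calls and the `omp_lock_hint_*` value come from this file.

@code
// Hypothetical lowering of:
//   #pragma omp critical (mylock) hint(omp_lock_hint_speculative)
//   { (*counter)++; }
static kmp_critical_name crit_mylock; // zero-initialized lock word (assumed layout)

static void lowered_hinted_critical(ident_t *loc, kmp_int32 gtid, int *counter) {
  // The hint only suggests a lock implementation (see __kmp_map_hint_to_lock above).
  __kmpc_critical_with_hint(loc, gtid, &crit_mylock, omp_lock_hint_speculative);
  (*counter)++;                                 // protected region
  __kmpc_end_critical(loc, gtid, &crit_mylock); // release, regardless of the hint used
}
@endcode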
@@ -1269,91 +1244,91 @@ __kmpc_critical_with_hint( ident_t * loc
 @ingroup WORK_SHARING
 @param loc  source location information.
 @param global_tid  global thread number.
- at param crit identity of the critical section. This could be a pointer to a lock associated with the critical section, or
-some other suitably unique value.
+ at param crit identity of the critical section. This could be a pointer to a lock
+associated with the critical section, or some other suitably unique value.
 
 Leave a critical section, releasing any lock that was held during its execution.
 */
-void
-__kmpc_end_critical(ident_t *loc, kmp_int32 global_tid, kmp_critical_name *crit)
-{
-    kmp_user_lock_p lck;
+void __kmpc_end_critical(ident_t *loc, kmp_int32 global_tid,
+                         kmp_critical_name *crit) {
+  kmp_user_lock_p lck;
 
-    KC_TRACE( 10, ("__kmpc_end_critical: called T#%d\n", global_tid ));
+  KC_TRACE(10, ("__kmpc_end_critical: called T#%d\n", global_tid));
 
 #if KMP_USE_DYNAMIC_LOCK
-    if (KMP_IS_D_LOCK(__kmp_user_lock_seq)) {
-        lck = (kmp_user_lock_p)crit;
-        KMP_ASSERT(lck != NULL);
-        if (__kmp_env_consistency_check) {
-            __kmp_pop_sync(global_tid, ct_critical, loc);
-        }
-# if USE_ITT_BUILD
-        __kmp_itt_critical_releasing( lck );
-# endif
-# if KMP_USE_INLINED_TAS
-        if (__kmp_user_lock_seq == lockseq_tas && !__kmp_env_consistency_check) {
-            KMP_RELEASE_TAS_LOCK(lck, global_tid);
-        } else
-# elif KMP_USE_INLINED_FUTEX
-        if (__kmp_user_lock_seq == lockseq_futex && !__kmp_env_consistency_check) {
-            KMP_RELEASE_FUTEX_LOCK(lck, global_tid);
-        } else
-# endif
-        {
-            KMP_D_LOCK_FUNC(lck, unset)((kmp_dyna_lock_t *)lck, global_tid);
-        }
-    } else {
-        kmp_indirect_lock_t *ilk = (kmp_indirect_lock_t *)TCR_PTR(*((kmp_indirect_lock_t **)crit));
-        KMP_ASSERT(ilk != NULL);
-        lck = ilk->lock;
-        if (__kmp_env_consistency_check) {
-            __kmp_pop_sync(global_tid, ct_critical, loc);
-        }
-# if USE_ITT_BUILD
-        __kmp_itt_critical_releasing( lck );
-# endif
-        KMP_I_LOCK_FUNC(ilk, unset)(lck, global_tid);
+  if (KMP_IS_D_LOCK(__kmp_user_lock_seq)) {
+    lck = (kmp_user_lock_p)crit;
+    KMP_ASSERT(lck != NULL);
+    if (__kmp_env_consistency_check) {
+      __kmp_pop_sync(global_tid, ct_critical, loc);
+    }
+#if USE_ITT_BUILD
+    __kmp_itt_critical_releasing(lck);
+#endif
+#if KMP_USE_INLINED_TAS
+    if (__kmp_user_lock_seq == lockseq_tas && !__kmp_env_consistency_check) {
+      KMP_RELEASE_TAS_LOCK(lck, global_tid);
+    } else
+#elif KMP_USE_INLINED_FUTEX
+    if (__kmp_user_lock_seq == lockseq_futex && !__kmp_env_consistency_check) {
+      KMP_RELEASE_FUTEX_LOCK(lck, global_tid);
+    } else
+#endif
+    {
+      KMP_D_LOCK_FUNC(lck, unset)((kmp_dyna_lock_t *)lck, global_tid);
     }
+  } else {
+    kmp_indirect_lock_t *ilk =
+        (kmp_indirect_lock_t *)TCR_PTR(*((kmp_indirect_lock_t **)crit));
+    KMP_ASSERT(ilk != NULL);
+    lck = ilk->lock;
+    if (__kmp_env_consistency_check) {
+      __kmp_pop_sync(global_tid, ct_critical, loc);
+    }
+#if USE_ITT_BUILD
+    __kmp_itt_critical_releasing(lck);
+#endif
+    KMP_I_LOCK_FUNC(ilk, unset)(lck, global_tid);
+  }
 
 #else // KMP_USE_DYNAMIC_LOCK
 
-    if ( ( __kmp_user_lock_kind == lk_tas )
-      && ( sizeof( lck->tas.lk.poll ) <= OMP_CRITICAL_SIZE ) ) {
-        lck = (kmp_user_lock_p)crit;
-    }
+  if ((__kmp_user_lock_kind == lk_tas) &&
+      (sizeof(lck->tas.lk.poll) <= OMP_CRITICAL_SIZE)) {
+    lck = (kmp_user_lock_p)crit;
+  }
 #if KMP_USE_FUTEX
-    else if ( ( __kmp_user_lock_kind == lk_futex )
-      && ( sizeof( lck->futex.lk.poll ) <= OMP_CRITICAL_SIZE ) ) {
-        lck = (kmp_user_lock_p)crit;
-    }
+  else if ((__kmp_user_lock_kind == lk_futex) &&
+           (sizeof(lck->futex.lk.poll) <= OMP_CRITICAL_SIZE)) {
+    lck = (kmp_user_lock_p)crit;
+  }
 #endif
-    else { // ticket, queuing or drdpa
-        lck = (kmp_user_lock_p) TCR_PTR(*((kmp_user_lock_p *)crit));
-    }
+  else { // ticket, queuing or drdpa
+    lck = (kmp_user_lock_p)TCR_PTR(*((kmp_user_lock_p *)crit));
+  }
 
-    KMP_ASSERT(lck != NULL);
+  KMP_ASSERT(lck != NULL);
 
-    if ( __kmp_env_consistency_check )
-        __kmp_pop_sync( global_tid, ct_critical, loc );
+  if (__kmp_env_consistency_check)
+    __kmp_pop_sync(global_tid, ct_critical, loc);
 
 #if USE_ITT_BUILD
-    __kmp_itt_critical_releasing( lck );
+  __kmp_itt_critical_releasing(lck);
 #endif /* USE_ITT_BUILD */
-    // Value of 'crit' should be good for using as a critical_id of the critical section directive.
-    __kmp_release_user_lock_with_checks( lck, global_tid );
+  // Value of 'crit' should be good for using as a critical_id of the critical
+  // section directive.
+  __kmp_release_user_lock_with_checks(lck, global_tid);
 
 #if OMPT_SUPPORT && OMPT_BLAME
-    if (ompt_enabled &&
-        ompt_callbacks.ompt_callback(ompt_event_release_critical)) {
-        ompt_callbacks.ompt_callback(ompt_event_release_critical)(
-            (uint64_t) lck);
-    }
+  if (ompt_enabled &&
+      ompt_callbacks.ompt_callback(ompt_event_release_critical)) {
+    ompt_callbacks.ompt_callback(ompt_event_release_critical)((uint64_t)lck);
+  }
 #endif
 
 #endif // KMP_USE_DYNAMIC_LOCK
-    KMP_POP_PARTITIONED_TIMER();
-    KA_TRACE( 15, ("__kmpc_end_critical: done T#%d\n", global_tid ));
+  KMP_POP_PARTITIONED_TIMER();
+  KA_TRACE(15, ("__kmpc_end_critical: done T#%d\n", global_tid));
 }
 
 /*!
@@ -1362,27 +1337,26 @@ __kmpc_end_critical(ident_t *loc, kmp_in
 @param global_tid thread id.
 @return one if the thread should execute the master block, zero otherwise
 
-Start execution of a combined barrier and master. The barrier is executed inside this function.
+Start execution of a combined barrier and master. The barrier is executed inside
+this function.
 */
-kmp_int32
-__kmpc_barrier_master(ident_t *loc, kmp_int32 global_tid)
-{
-    int status;
+kmp_int32 __kmpc_barrier_master(ident_t *loc, kmp_int32 global_tid) {
+  int status;
 
-    KC_TRACE( 10, ("__kmpc_barrier_master: called T#%d\n", global_tid ) );
+  KC_TRACE(10, ("__kmpc_barrier_master: called T#%d\n", global_tid));
 
-    if (! TCR_4(__kmp_init_parallel))
-        __kmp_parallel_initialize();
+  if (!TCR_4(__kmp_init_parallel))
+    __kmp_parallel_initialize();
 
-    if ( __kmp_env_consistency_check )
-        __kmp_check_barrier( global_tid, ct_barrier, loc );
+  if (__kmp_env_consistency_check)
+    __kmp_check_barrier(global_tid, ct_barrier, loc);
 
 #if USE_ITT_NOTIFY
-    __kmp_threads[global_tid]->th.th_ident = loc;
+  __kmp_threads[global_tid]->th.th_ident = loc;
 #endif
-    status = __kmp_barrier( bs_plain_barrier, global_tid, TRUE, 0, NULL, NULL );
+  status = __kmp_barrier(bs_plain_barrier, global_tid, TRUE, 0, NULL, NULL);
 
-    return (status != 0) ? 0 : 1;
+  return (status != 0) ? 0 : 1;
 }
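
A plausible calling pattern for the combined barrier/master pair, following the description above (the helper name is made up; only the two `__kmpc_*` routines are real):

@code
void do_master_only_work(void); // hypothetical master-only work

// Every thread calls __kmpc_barrier_master(); only the selected thread gets a
// non-zero return, runs the master-only block, and then releases the others.
static void combined_barrier_master(ident_t *loc, kmp_int32 gtid) {
  if (__kmpc_barrier_master(loc, gtid)) {
    do_master_only_work();
    __kmpc_end_barrier_master(loc, gtid); // releases the threads still waiting
  }
}
@endcode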
 
 /*!
@@ -1394,12 +1368,10 @@ Complete the execution of a combined bar
 only be called at the completion of the <tt>master</tt> code. Other threads will
 still be waiting at the barrier and this call releases them.
 */
-void
-__kmpc_end_barrier_master(ident_t *loc, kmp_int32 global_tid)
-{
-    KC_TRACE( 10, ("__kmpc_end_barrier_master: called T#%d\n", global_tid ));
+void __kmpc_end_barrier_master(ident_t *loc, kmp_int32 global_tid) {
+  KC_TRACE(10, ("__kmpc_end_barrier_master: called T#%d\n", global_tid));
 
-    __kmp_end_split_barrier ( bs_plain_barrier, global_tid );
+  __kmp_end_split_barrier(bs_plain_barrier, global_tid);
 }
 
 /*!
@@ -1412,46 +1384,44 @@ Start execution of a combined barrier an
 The barrier is executed inside this function.
 There is no equivalent "end" function, since the barrier and the selection of
 the executing thread both complete inside this call; with the nowait semantics
 no threads are left waiting to be released.
 */
-kmp_int32
-__kmpc_barrier_master_nowait( ident_t * loc, kmp_int32 global_tid )
-{
-    kmp_int32 ret;
-
-    KC_TRACE( 10, ("__kmpc_barrier_master_nowait: called T#%d\n", global_tid ));
-
-    if (! TCR_4(__kmp_init_parallel))
-        __kmp_parallel_initialize();
-
-    if ( __kmp_env_consistency_check ) {
-        if ( loc == 0 ) {
-            KMP_WARNING( ConstructIdentInvalid ); // ??? What does it mean for the user?
-        }
-        __kmp_check_barrier( global_tid, ct_barrier, loc );
+kmp_int32 __kmpc_barrier_master_nowait(ident_t *loc, kmp_int32 global_tid) {
+  kmp_int32 ret;
+
+  KC_TRACE(10, ("__kmpc_barrier_master_nowait: called T#%d\n", global_tid));
+
+  if (!TCR_4(__kmp_init_parallel))
+    __kmp_parallel_initialize();
+
+  if (__kmp_env_consistency_check) {
+    if (loc == 0) {
+      KMP_WARNING(ConstructIdentInvalid); // ??? What does it mean for the user?
     }
+    __kmp_check_barrier(global_tid, ct_barrier, loc);
+  }
 
 #if USE_ITT_NOTIFY
-    __kmp_threads[global_tid]->th.th_ident = loc;
+  __kmp_threads[global_tid]->th.th_ident = loc;
 #endif
-    __kmp_barrier( bs_plain_barrier, global_tid, FALSE, 0, NULL, NULL );
+  __kmp_barrier(bs_plain_barrier, global_tid, FALSE, 0, NULL, NULL);
 
-    ret = __kmpc_master (loc, global_tid);
+  ret = __kmpc_master(loc, global_tid);
 
-    if ( __kmp_env_consistency_check ) {
-        /*  there's no __kmpc_end_master called; so the (stats) */
-        /*  actions of __kmpc_end_master are done here          */
+  if (__kmp_env_consistency_check) {
+    /*  there's no __kmpc_end_master called; so the (stats) */
+    /*  actions of __kmpc_end_master are done here          */
 
-        if ( global_tid < 0 ) {
-            KMP_WARNING( ThreadIdentInvalid );
-        }
-        if (ret) {
-            /* only one thread should do the pop since only */
-            /* one did the push (see __kmpc_master())       */
+    if (global_tid < 0) {
+      KMP_WARNING(ThreadIdentInvalid);
+    }
+    if (ret) {
+      /* only one thread should do the pop since only */
+      /* one did the push (see __kmpc_master())       */
 
-            __kmp_pop_sync( global_tid, ct_master, loc );
-        }
+      __kmp_pop_sync(global_tid, ct_master, loc);
     }
+  }
 
-    return (ret);
+  return (ret);
 }
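
And the nowait variant, which as noted above needs no matching "end" call (again, only the `__kmpc_*` routine is real):

@code
void do_master_only_work(void); // hypothetical, as above

// No "end" call: the barrier has already completed when this returns, and the
// non-selected threads simply continue past the construct.
static void combined_barrier_master_nowait(ident_t *loc, kmp_int32 gtid) {
  if (__kmpc_barrier_master_nowait(loc, gtid))
    do_master_only_work();
}
@endcode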
 
 /* The BARRIER for a SINGLE process section is always explicit   */
@@ -1462,46 +1432,44 @@ __kmpc_barrier_master_nowait( ident_t *
 @return One if this thread should execute the single construct, zero otherwise.
 
 Test whether to execute a <tt>single</tt> construct.
-There are no implicit barriers in the two "single" calls, rather the compiler should
-introduce an explicit barrier if it is required.
+There are no implicit barriers in the two "single" calls; rather, the compiler
+should introduce an explicit barrier if it is required.
 */
 
-kmp_int32
-__kmpc_single(ident_t *loc, kmp_int32 global_tid)
-{
-    kmp_int32 rc = __kmp_enter_single( global_tid, loc, TRUE );
+kmp_int32 __kmpc_single(ident_t *loc, kmp_int32 global_tid) {
+  kmp_int32 rc = __kmp_enter_single(global_tid, loc, TRUE);
 
-    if (rc) {
-        // We are going to execute the single statement, so we should count it.
-        KMP_COUNT_BLOCK(OMP_SINGLE);
-        KMP_PUSH_PARTITIONED_TIMER(OMP_single);
-    }
+  if (rc) {
+    // We are going to execute the single statement, so we should count it.
+    KMP_COUNT_BLOCK(OMP_SINGLE);
+    KMP_PUSH_PARTITIONED_TIMER(OMP_single);
+  }
 
 #if OMPT_SUPPORT && OMPT_TRACE
-    kmp_info_t *this_thr        = __kmp_threads[ global_tid ];
-    kmp_team_t *team            = this_thr -> th.th_team;
-    int tid = __kmp_tid_from_gtid( global_tid );
+  kmp_info_t *this_thr = __kmp_threads[global_tid];
+  kmp_team_t *team = this_thr->th.th_team;
+  int tid = __kmp_tid_from_gtid(global_tid);
 
-    if (ompt_enabled) {
-        if (rc) {
-            if (ompt_callbacks.ompt_callback(ompt_event_single_in_block_begin)) {
-                ompt_callbacks.ompt_callback(ompt_event_single_in_block_begin)(
-                    team->t.ompt_team_info.parallel_id,
-                    team->t.t_implicit_task_taskdata[tid].ompt_task_info.task_id,
-                    team->t.ompt_team_info.microtask);
-            }
-        } else {
-            if (ompt_callbacks.ompt_callback(ompt_event_single_others_begin)) {
-                ompt_callbacks.ompt_callback(ompt_event_single_others_begin)(
-                    team->t.ompt_team_info.parallel_id,
-                    team->t.t_implicit_task_taskdata[tid].ompt_task_info.task_id);
-            }
-            this_thr->th.ompt_thread_info.state = ompt_state_wait_single;
-        }
+  if (ompt_enabled) {
+    if (rc) {
+      if (ompt_callbacks.ompt_callback(ompt_event_single_in_block_begin)) {
+        ompt_callbacks.ompt_callback(ompt_event_single_in_block_begin)(
+            team->t.ompt_team_info.parallel_id,
+            team->t.t_implicit_task_taskdata[tid].ompt_task_info.task_id,
+            team->t.ompt_team_info.microtask);
+      }
+    } else {
+      if (ompt_callbacks.ompt_callback(ompt_event_single_others_begin)) {
+        ompt_callbacks.ompt_callback(ompt_event_single_others_begin)(
+            team->t.ompt_team_info.parallel_id,
+            team->t.t_implicit_task_taskdata[tid].ompt_task_info.task_id);
+      }
+      this_thr->th.ompt_thread_info.state = ompt_state_wait_single;
     }
+  }
 #endif
 
-    return rc;
+  return rc;
 }
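
A sketch of how a `single` construct might be lowered, reflecting the note above that these calls carry no implicit barrier (the body and the decision to emit a trailing barrier are illustrative):

@code
void init_shared_data(void); // hypothetical single-thread work

static void lowered_single(ident_t *loc, kmp_int32 gtid) {
  if (__kmpc_single(loc, gtid)) {
    init_shared_data();
    __kmpc_end_single(loc, gtid); // called only by the executing thread
  }
  // Emitted separately because __kmpc_single()/__kmpc_end_single() do not
  // include a barrier; omitted if the construct carried a 'nowait'.
  __kmpc_barrier(loc, gtid);
}
@endcode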
 
 /*!
@@ -1513,23 +1481,21 @@ Mark the end of a <tt>single</tt> constr
 only be called by the thread that executed the block of code protected
 by the `single` construct.
 */
-void
-__kmpc_end_single(ident_t *loc, kmp_int32 global_tid)
-{
-    __kmp_exit_single( global_tid );
-    KMP_POP_PARTITIONED_TIMER();
+void __kmpc_end_single(ident_t *loc, kmp_int32 global_tid) {
+  __kmp_exit_single(global_tid);
+  KMP_POP_PARTITIONED_TIMER();
 
 #if OMPT_SUPPORT && OMPT_TRACE
-    kmp_info_t *this_thr        = __kmp_threads[ global_tid ];
-    kmp_team_t *team            = this_thr -> th.th_team;
-    int tid = __kmp_tid_from_gtid( global_tid );
-
-    if (ompt_enabled &&
-        ompt_callbacks.ompt_callback(ompt_event_single_in_block_end)) {
-        ompt_callbacks.ompt_callback(ompt_event_single_in_block_end)(
-            team->t.ompt_team_info.parallel_id,
-            team->t.t_implicit_task_taskdata[tid].ompt_task_info.task_id);
-    }
+  kmp_info_t *this_thr = __kmp_threads[global_tid];
+  kmp_team_t *team = this_thr->th.th_team;
+  int tid = __kmp_tid_from_gtid(global_tid);
+
+  if (ompt_enabled &&
+      ompt_callbacks.ompt_callback(ompt_event_single_in_block_end)) {
+    ompt_callbacks.ompt_callback(ompt_event_single_in_block_end)(
+        team->t.ompt_team_info.parallel_id,
+        team->t.t_implicit_task_taskdata[tid].ompt_task_info.task_id);
+  }
 #endif
 }
 
@@ -1540,182 +1506,144 @@ __kmpc_end_single(ident_t *loc, kmp_int3
 
 Mark the end of a statically scheduled loop.
 */
-void
-__kmpc_for_static_fini( ident_t *loc, kmp_int32 global_tid )
-{
-    KE_TRACE( 10, ("__kmpc_for_static_fini called T#%d\n", global_tid));
+void __kmpc_for_static_fini(ident_t *loc, kmp_int32 global_tid) {
+  KE_TRACE(10, ("__kmpc_for_static_fini called T#%d\n", global_tid));
 
 #if OMPT_SUPPORT && OMPT_TRACE
-    if (ompt_enabled &&
-        ompt_callbacks.ompt_callback(ompt_event_loop_end)) {
-        ompt_team_info_t *team_info = __ompt_get_teaminfo(0, NULL);
-        ompt_task_info_t *task_info = __ompt_get_taskinfo(0);
-        ompt_callbacks.ompt_callback(ompt_event_loop_end)(
-            team_info->parallel_id, task_info->task_id);
-    }
+  if (ompt_enabled && ompt_callbacks.ompt_callback(ompt_event_loop_end)) {
+    ompt_team_info_t *team_info = __ompt_get_teaminfo(0, NULL);
+    ompt_task_info_t *task_info = __ompt_get_taskinfo(0);
+    ompt_callbacks.ompt_callback(ompt_event_loop_end)(team_info->parallel_id,
+                                                      task_info->task_id);
+  }
 #endif
 
-    if ( __kmp_env_consistency_check )
-     __kmp_pop_workshare( global_tid, ct_pdo, loc );
+  if (__kmp_env_consistency_check)
+    __kmp_pop_workshare(global_tid, ct_pdo, loc);
 }
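
For completeness, a rough sketch of how this pairs with a static-loop init entry point; `__kmpc_for_static_init_4` and its argument order follow the runtime's static scheduling interface, but treat the exact values here as illustrative only:

@code
void do_iteration(kmp_int32 i); // hypothetical loop body

static void lowered_static_for(ident_t *loc, kmp_int32 gtid) {
  kmp_int32 lower = 0, upper = 99, stride = 1, last = 0;
  __kmpc_for_static_init_4(loc, gtid, kmp_sch_static, &last, &lower, &upper,
                           &stride, /*incr=*/1, /*chunk=*/1);
  for (kmp_int32 i = lower; i <= upper; ++i)
    do_iteration(i);
  __kmpc_for_static_fini(loc, gtid); // mark the end of the worksharing loop
}
@endcode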
 
-/*
- * User routines which take C-style arguments (call by value)
- * different from the Fortran equivalent routines
- */
+// User routines which take C-style arguments (call by value)
+// different from the Fortran equivalent routines
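
A couple of representative calls, just to show the call-by-value convention (a sketch; the argument values are arbitrary):

@code
// Representative call-by-value use of these entry points.
static void configure_runtime(void) {
  ompc_set_num_threads(4);                  // cf. omp_set_num_threads
  ompc_set_schedule(omp_sched_dynamic, 16); // schedule kind plus chunk modifier
  ompc_set_max_active_levels(2);            // limit nested parallelism depth
}
@endcode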
 
-void
-ompc_set_num_threads( int arg )
-{
-// !!!!! TODO: check the per-task binding
-    __kmp_set_num_threads( arg, __kmp_entry_gtid() );
+void ompc_set_num_threads(int arg) {
+  // !!!!! TODO: check the per-task binding
+  __kmp_set_num_threads(arg, __kmp_entry_gtid());
 }
 
-void
-ompc_set_dynamic( int flag )
-{
-    kmp_info_t *thread;
+void ompc_set_dynamic(int flag) {
+  kmp_info_t *thread;
 
-    /* For the thread-private implementation of the internal controls */
-    thread = __kmp_entry_thread();
+  /* For the thread-private implementation of the internal controls */
+  thread = __kmp_entry_thread();
 
-    __kmp_save_internal_controls( thread );
+  __kmp_save_internal_controls(thread);
 
-    set__dynamic( thread, flag ? TRUE : FALSE );
+  set__dynamic(thread, flag ? TRUE : FALSE);
 }
 
-void
-ompc_set_nested( int flag )
-{
-    kmp_info_t *thread;
+void ompc_set_nested(int flag) {
+  kmp_info_t *thread;
 
-    /* For the thread-private internal controls implementation */
-    thread = __kmp_entry_thread();
+  /* For the thread-private internal controls implementation */
+  thread = __kmp_entry_thread();
 
-    __kmp_save_internal_controls( thread );
+  __kmp_save_internal_controls(thread);
 
-    set__nested( thread, flag ? TRUE : FALSE );
+  set__nested(thread, flag ? TRUE : FALSE);
 }
 
-void
-ompc_set_max_active_levels( int max_active_levels )
-{
-    /* TO DO */
-    /* we want per-task implementation of this internal control */
+void ompc_set_max_active_levels(int max_active_levels) {
+  /* TO DO */
+  /* we want per-task implementation of this internal control */
 
-    /* For the per-thread internal controls implementation */
-    __kmp_set_max_active_levels( __kmp_entry_gtid(), max_active_levels );
+  /* For the per-thread internal controls implementation */
+  __kmp_set_max_active_levels(__kmp_entry_gtid(), max_active_levels);
 }
 
-void
-ompc_set_schedule( omp_sched_t kind, int modifier )
-{
-// !!!!! TODO: check the per-task binding
-    __kmp_set_schedule( __kmp_entry_gtid(), ( kmp_sched_t ) kind, modifier );
+void ompc_set_schedule(omp_sched_t kind, int modifier) {
+  // !!!!! TODO: check the per-task binding
+  __kmp_set_schedule(__kmp_entry_gtid(), (kmp_sched_t)kind, modifier);
 }
 
-int
-ompc_get_ancestor_thread_num( int level )
-{
-    return __kmp_get_ancestor_thread_num( __kmp_entry_gtid(), level );
+int ompc_get_ancestor_thread_num(int level) {
+  return __kmp_get_ancestor_thread_num(__kmp_entry_gtid(), level);
 }
 
-int
-ompc_get_team_size( int level )
-{
-    return __kmp_get_team_size( __kmp_entry_gtid(), level );
+int ompc_get_team_size(int level) {
+  return __kmp_get_team_size(__kmp_entry_gtid(), level);
 }
 
-void
-kmpc_set_stacksize( int arg )
-{
-    // __kmp_aux_set_stacksize initializes the library if needed
-    __kmp_aux_set_stacksize( arg );
+void kmpc_set_stacksize(int arg) {
+  // __kmp_aux_set_stacksize initializes the library if needed
+  __kmp_aux_set_stacksize(arg);
 }
 
-void
-kmpc_set_stacksize_s( size_t arg )
-{
-    // __kmp_aux_set_stacksize initializes the library if needed
-    __kmp_aux_set_stacksize( arg );
+void kmpc_set_stacksize_s(size_t arg) {
+  // __kmp_aux_set_stacksize initializes the library if needed
+  __kmp_aux_set_stacksize(arg);
 }
 
-void
-kmpc_set_blocktime( int arg )
-{
-    int gtid, tid;
-    kmp_info_t *thread;
+void kmpc_set_blocktime(int arg) {
+  int gtid, tid;
+  kmp_info_t *thread;
 
-    gtid = __kmp_entry_gtid();
-    tid = __kmp_tid_from_gtid(gtid);
-    thread = __kmp_thread_from_gtid(gtid);
+  gtid = __kmp_entry_gtid();
+  tid = __kmp_tid_from_gtid(gtid);
+  thread = __kmp_thread_from_gtid(gtid);
 
-    __kmp_aux_set_blocktime( arg, thread, tid );
+  __kmp_aux_set_blocktime(arg, thread, tid);
 }
 
-void
-kmpc_set_library( int arg )
-{
-    // __kmp_user_set_library initializes the library if needed
-    __kmp_user_set_library( (enum library_type)arg );
+void kmpc_set_library(int arg) {
+  // __kmp_user_set_library initializes the library if needed
+  __kmp_user_set_library((enum library_type)arg);
 }
 
-void
-kmpc_set_defaults( char const * str )
-{
-    // __kmp_aux_set_defaults initializes the library if needed
-    __kmp_aux_set_defaults( str, KMP_STRLEN( str ) );
+void kmpc_set_defaults(char const *str) {
+  // __kmp_aux_set_defaults initializes the library if needed
+  __kmp_aux_set_defaults(str, KMP_STRLEN(str));
 }
 
-void
-kmpc_set_disp_num_buffers( int arg )
-{
-    // ignore after initialization because some teams have already
-    // allocated dispatch buffers
-    if( __kmp_init_serial == 0 && arg > 0 )
-        __kmp_dispatch_num_buffers = arg;
+void kmpc_set_disp_num_buffers(int arg) {
+  // ignore after initialization because some teams have already
+  // allocated dispatch buffers
+  if (__kmp_init_serial == 0 && arg > 0)
+    __kmp_dispatch_num_buffers = arg;
 }
 
-int
-kmpc_set_affinity_mask_proc( int proc, void **mask )
-{
+int kmpc_set_affinity_mask_proc(int proc, void **mask) {
 #if defined(KMP_STUB) || !KMP_AFFINITY_SUPPORTED
-    return -1;
+  return -1;
 #else
-    if ( ! TCR_4(__kmp_init_middle) ) {
-        __kmp_middle_initialize();
-    }
-    return __kmp_aux_set_affinity_mask_proc( proc, mask );
+  if (!TCR_4(__kmp_init_middle)) {
+    __kmp_middle_initialize();
+  }
+  return __kmp_aux_set_affinity_mask_proc(proc, mask);
 #endif
 }
 
-int
-kmpc_unset_affinity_mask_proc( int proc, void **mask )
-{
+int kmpc_unset_affinity_mask_proc(int proc, void **mask) {
 #if defined(KMP_STUB) || !KMP_AFFINITY_SUPPORTED
-    return -1;
+  return -1;
 #else
-    if ( ! TCR_4(__kmp_init_middle) ) {
-        __kmp_middle_initialize();
-    }
-    return __kmp_aux_unset_affinity_mask_proc( proc, mask );
+  if (!TCR_4(__kmp_init_middle)) {
+    __kmp_middle_initialize();
+  }
+  return __kmp_aux_unset_affinity_mask_proc(proc, mask);
 #endif
 }
 
-int
-kmpc_get_affinity_mask_proc( int proc, void **mask )
-{
+int kmpc_get_affinity_mask_proc(int proc, void **mask) {
 #if defined(KMP_STUB) || !KMP_AFFINITY_SUPPORTED
-    return -1;
+  return -1;
 #else
-    if ( ! TCR_4(__kmp_init_middle) ) {
-        __kmp_middle_initialize();
-    }
-    return __kmp_aux_get_affinity_mask_proc( proc, mask );
+  if (!TCR_4(__kmp_init_middle)) {
+    __kmp_middle_initialize();
+  }
+  return __kmp_aux_get_affinity_mask_proc(proc, mask);
 #endif
 }
 
-
 /* -------------------------------------------------------------------------- */
 /*!
 @ingroup THREADPRIVATE
@@ -1726,29 +1654,33 @@ kmpc_get_affinity_mask_proc( int proc, v
 @param cpy_func  helper function to call for copying data
 @param didit     flag variable: 1=single thread; 0=not single thread
 
-__kmpc_copyprivate implements the interface for the private data broadcast needed for
-the copyprivate clause associated with a single region in an OpenMP<sup>*</sup> program (both C and Fortran).
+__kmpc_copyprivate implements the interface for the private data broadcast
+needed for the copyprivate clause associated with a single region in an
+OpenMP<sup>*</sup> program (both C and Fortran).
 All threads participating in the parallel region call this routine.
-One of the threads (called the single thread) should have the <tt>didit</tt> variable set to 1
-and all other threads should have that variable set to 0.
+One of the threads (called the single thread) should have the <tt>didit</tt>
+variable set to 1 and all other threads should have that variable set to 0.
 All threads pass a pointer to a data buffer (cpy_data) that they have built.
 
-The OpenMP specification forbids the use of nowait on the single region when a copyprivate
-clause is present. However, @ref __kmpc_copyprivate implements a barrier internally to avoid
-race conditions, so the code generation for the single region should avoid generating a barrier
-after the call to @ref __kmpc_copyprivate.
+The OpenMP specification forbids the use of nowait on the single region when a
+copyprivate clause is present. However, @ref __kmpc_copyprivate implements a
+barrier internally to avoid race conditions, so the code generation for the
+single region should avoid generating a barrier after the call to @ref
+__kmpc_copyprivate.
 
 The <tt>gtid</tt> parameter is the global thread id for the current thread.
 The <tt>loc</tt> parameter is a pointer to source location information.
 
-Internal implementation: The single thread will first copy its descriptor address (cpy_data)
-to a team-private location, then the other threads will each call the function pointed to by
-the parameter cpy_func, which carries out the copy by copying the data using the cpy_data buffer.
-
-The cpy_func routine used for the copy and the contents of the data area defined by cpy_data
-and cpy_size may be built in any fashion that will allow the copy to be done. For instance,
-the cpy_data buffer can hold the actual data to be copied or it may hold a list of pointers
-to the data. The cpy_func routine must interpret the cpy_data buffer appropriately.
+Internal implementation: The single thread will first copy its descriptor
+address (cpy_data) to a team-private location, then the other threads will each
+call the function pointed to by the parameter cpy_func, which carries out the
+copy by copying the data using the cpy_data buffer.
+
+The cpy_func routine used for the copy and the contents of the data area defined
+by cpy_data and cpy_size may be built in any fashion that will allow the copy
+to be done. For instance, the cpy_data buffer can hold the actual data to be
+copied or it may hold a list of pointers to the data. The cpy_func routine must
+interpret the cpy_data buffer appropriately.
 
 The interface to cpy_func is as follows:
 @code
@@ -1757,891 +1689,886 @@ void cpy_func( void *destination, void *
 where void *destination is the cpy_data pointer for the thread being copied to
 and void *source is the cpy_data pointer for the thread being copied from.
 */
-void
-__kmpc_copyprivate( ident_t *loc, kmp_int32 gtid, size_t cpy_size, void *cpy_data, void(*cpy_func)(void*,void*), kmp_int32 didit )
-{
-    void **data_ptr;
+void __kmpc_copyprivate(ident_t *loc, kmp_int32 gtid, size_t cpy_size,
+                        void *cpy_data, void (*cpy_func)(void *, void *),
+                        kmp_int32 didit) {
+  void **data_ptr;
 
-    KC_TRACE( 10, ("__kmpc_copyprivate: called T#%d\n", gtid ));
+  KC_TRACE(10, ("__kmpc_copyprivate: called T#%d\n", gtid));
 
-    KMP_MB();
+  KMP_MB();
 
-    data_ptr = & __kmp_team_from_gtid( gtid )->t.t_copypriv_data;
+  data_ptr = &__kmp_team_from_gtid(gtid)->t.t_copypriv_data;
 
-    if ( __kmp_env_consistency_check ) {
-        if ( loc == 0 ) {
-            KMP_WARNING( ConstructIdentInvalid );
-        }
+  if (__kmp_env_consistency_check) {
+    if (loc == 0) {
+      KMP_WARNING(ConstructIdentInvalid);
     }
+  }
 
-    /* ToDo: Optimize the following two barriers into some kind of split barrier */
+  // ToDo: Optimize the following two barriers into some kind of split barrier
 
-    if (didit) *data_ptr = cpy_data;
+  if (didit)
+    *data_ptr = cpy_data;
 
-    /* This barrier is not a barrier region boundary */
+/* This barrier is not a barrier region boundary */
 #if USE_ITT_NOTIFY
-    __kmp_threads[gtid]->th.th_ident = loc;
+  __kmp_threads[gtid]->th.th_ident = loc;
 #endif
-    __kmp_barrier( bs_plain_barrier, gtid, FALSE , 0, NULL, NULL );
+  __kmp_barrier(bs_plain_barrier, gtid, FALSE, 0, NULL, NULL);
 
-    if (! didit) (*cpy_func)( cpy_data, *data_ptr );
+  if (!didit)
+    (*cpy_func)(cpy_data, *data_ptr);
 
-    /* Consider next barrier the user-visible barrier for barrier region boundaries */
-    /* Nesting checks are already handled by the single construct checks */
+// Consider next barrier a user-visible barrier for barrier region boundaries
+// Nesting checks are already handled by the single construct checks
 
 #if USE_ITT_NOTIFY
-    __kmp_threads[gtid]->th.th_ident = loc; // TODO: check if it is needed (e.g. tasks can overwrite the location)
+  __kmp_threads[gtid]->th.th_ident = loc; // TODO: check if it is needed (e.g.
+// tasks can overwrite the location)
 #endif
-    __kmp_barrier( bs_plain_barrier, gtid, FALSE , 0, NULL, NULL );
+  __kmp_barrier(bs_plain_barrier, gtid, FALSE, 0, NULL, NULL);
 }
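
A minimal sketch of the pieces a compiler supplies around __kmpc_copyprivate: a per-thread descriptor holding pointers to the copyprivate'd variables, plus a cpy_func matching the interface shown above. The struct layout and helper names are invented for illustration.

@code
// Hypothetical descriptor: one pointer per variable named in copyprivate().
typedef struct {
  int *a;
  double *b;
} cpypriv_desc_t;

// Matches "void cpy_func(void *destination, void *source)": copy the values
// from the single thread's variables (source) into this thread's (destination).
static void cpypriv_copy(void *destination, void *source) {
  cpypriv_desc_t *dst = (cpypriv_desc_t *)destination;
  cpypriv_desc_t *src = (cpypriv_desc_t *)source;
  *dst->a = *src->a;
  *dst->b = *src->b;
}

// Every thread builds its descriptor and calls the broadcast; didit is 1 only
// on the thread that executed the single region.
static void broadcast_private(ident_t *loc, kmp_int32 gtid, int *a, double *b,
                              kmp_int32 didit) {
  cpypriv_desc_t desc = {a, b};
  __kmpc_copyprivate(loc, gtid, sizeof(desc), &desc, cpypriv_copy, didit);
}
@endcode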
 
 /* -------------------------------------------------------------------------- */
 
-#define INIT_LOCK                 __kmp_init_user_lock_with_checks
-#define INIT_NESTED_LOCK          __kmp_init_nested_user_lock_with_checks
-#define ACQUIRE_LOCK              __kmp_acquire_user_lock_with_checks
-#define ACQUIRE_LOCK_TIMED        __kmp_acquire_user_lock_with_checks_timed
-#define ACQUIRE_NESTED_LOCK       __kmp_acquire_nested_user_lock_with_checks
-#define ACQUIRE_NESTED_LOCK_TIMED __kmp_acquire_nested_user_lock_with_checks_timed
-#define RELEASE_LOCK              __kmp_release_user_lock_with_checks
-#define RELEASE_NESTED_LOCK       __kmp_release_nested_user_lock_with_checks
-#define TEST_LOCK                 __kmp_test_user_lock_with_checks
-#define TEST_NESTED_LOCK          __kmp_test_nested_user_lock_with_checks
-#define DESTROY_LOCK              __kmp_destroy_user_lock_with_checks
-#define DESTROY_NESTED_LOCK       __kmp_destroy_nested_user_lock_with_checks
-
-
-/*
- * TODO: Make check abort messages use location info & pass it
- * into with_checks routines
- */
+#define INIT_LOCK __kmp_init_user_lock_with_checks
+#define INIT_NESTED_LOCK __kmp_init_nested_user_lock_with_checks
+#define ACQUIRE_LOCK __kmp_acquire_user_lock_with_checks
+#define ACQUIRE_LOCK_TIMED __kmp_acquire_user_lock_with_checks_timed
+#define ACQUIRE_NESTED_LOCK __kmp_acquire_nested_user_lock_with_checks
+#define ACQUIRE_NESTED_LOCK_TIMED                                              \
+  __kmp_acquire_nested_user_lock_with_checks_timed
+#define RELEASE_LOCK __kmp_release_user_lock_with_checks
+#define RELEASE_NESTED_LOCK __kmp_release_nested_user_lock_with_checks
+#define TEST_LOCK __kmp_test_user_lock_with_checks
+#define TEST_NESTED_LOCK __kmp_test_nested_user_lock_with_checks
+#define DESTROY_LOCK __kmp_destroy_user_lock_with_checks
+#define DESTROY_NESTED_LOCK __kmp_destroy_nested_user_lock_with_checks
+
+// TODO: Make check abort messages use location info & pass it into
+// with_checks routines
 
 #if KMP_USE_DYNAMIC_LOCK
 
 // internal lock initializer
-static __forceinline void
-__kmp_init_lock_with_hint(ident_t *loc, void **lock, kmp_dyna_lockseq_t seq)
-{
-    if (KMP_IS_D_LOCK(seq)) {
-        KMP_INIT_D_LOCK(lock, seq);
+static __forceinline void __kmp_init_lock_with_hint(ident_t *loc, void **lock,
+                                                    kmp_dyna_lockseq_t seq) {
+  if (KMP_IS_D_LOCK(seq)) {
+    KMP_INIT_D_LOCK(lock, seq);
 #if USE_ITT_BUILD
-        __kmp_itt_lock_creating((kmp_user_lock_p)lock, NULL);
+    __kmp_itt_lock_creating((kmp_user_lock_p)lock, NULL);
 #endif
-    } else {
-        KMP_INIT_I_LOCK(lock, seq);
+  } else {
+    KMP_INIT_I_LOCK(lock, seq);
 #if USE_ITT_BUILD
-        kmp_indirect_lock_t *ilk = KMP_LOOKUP_I_LOCK(lock);
-        __kmp_itt_lock_creating(ilk->lock, loc);
+    kmp_indirect_lock_t *ilk = KMP_LOOKUP_I_LOCK(lock);
+    __kmp_itt_lock_creating(ilk->lock, loc);
 #endif
-    }
+  }
 }
 
 // internal nest lock initializer
 static __forceinline void
-__kmp_init_nest_lock_with_hint(ident_t *loc, void **lock, kmp_dyna_lockseq_t seq)
-{
+__kmp_init_nest_lock_with_hint(ident_t *loc, void **lock,
+                               kmp_dyna_lockseq_t seq) {
 #if KMP_USE_TSX
-    // Don't have nested lock implementation for speculative locks
-    if (seq == lockseq_hle || seq == lockseq_rtm || seq == lockseq_adaptive)
-        seq = __kmp_user_lock_seq;
-#endif
-    switch (seq) {
-        case lockseq_tas:
-            seq = lockseq_nested_tas;
-            break;
+  // Don't have nested lock implementation for speculative locks
+  if (seq == lockseq_hle || seq == lockseq_rtm || seq == lockseq_adaptive)
+    seq = __kmp_user_lock_seq;
+#endif
+  switch (seq) {
+  case lockseq_tas:
+    seq = lockseq_nested_tas;
+    break;
 #if KMP_USE_FUTEX
-        case lockseq_futex:
-            seq = lockseq_nested_futex;
-            break;
-#endif
-        case lockseq_ticket:
-            seq = lockseq_nested_ticket;
-            break;
-        case lockseq_queuing:
-            seq = lockseq_nested_queuing;
-            break;
-        case lockseq_drdpa:
-            seq = lockseq_nested_drdpa;
-            break;
-        default:
-            seq = lockseq_nested_queuing;
-    }
-    KMP_INIT_I_LOCK(lock, seq);
+  case lockseq_futex:
+    seq = lockseq_nested_futex;
+    break;
+#endif
+  case lockseq_ticket:
+    seq = lockseq_nested_ticket;
+    break;
+  case lockseq_queuing:
+    seq = lockseq_nested_queuing;
+    break;
+  case lockseq_drdpa:
+    seq = lockseq_nested_drdpa;
+    break;
+  default:
+    seq = lockseq_nested_queuing;
+  }
+  KMP_INIT_I_LOCK(lock, seq);
 #if USE_ITT_BUILD
-    kmp_indirect_lock_t *ilk = KMP_LOOKUP_I_LOCK(lock);
-    __kmp_itt_lock_creating(ilk->lock, loc);
+  kmp_indirect_lock_t *ilk = KMP_LOOKUP_I_LOCK(lock);
+  __kmp_itt_lock_creating(ilk->lock, loc);
 #endif
 }
 
 /* initialize the lock with a hint */
-void
-__kmpc_init_lock_with_hint(ident_t *loc, kmp_int32 gtid, void **user_lock, uintptr_t hint)
-{
-    KMP_DEBUG_ASSERT(__kmp_init_serial);
-    if (__kmp_env_consistency_check && user_lock == NULL) {
-        KMP_FATAL(LockIsUninitialized, "omp_init_lock_with_hint");
-    }
+void __kmpc_init_lock_with_hint(ident_t *loc, kmp_int32 gtid, void **user_lock,
+                                uintptr_t hint) {
+  KMP_DEBUG_ASSERT(__kmp_init_serial);
+  if (__kmp_env_consistency_check && user_lock == NULL) {
+    KMP_FATAL(LockIsUninitialized, "omp_init_lock_with_hint");
+  }
 
-    __kmp_init_lock_with_hint(loc, user_lock, __kmp_map_hint_to_lock(hint));
+  __kmp_init_lock_with_hint(loc, user_lock, __kmp_map_hint_to_lock(hint));
 }
 
 /* initialize the lock with a hint */
-void
-__kmpc_init_nest_lock_with_hint(ident_t *loc, kmp_int32 gtid, void **user_lock, uintptr_t hint)
-{
-    KMP_DEBUG_ASSERT(__kmp_init_serial);
-    if (__kmp_env_consistency_check && user_lock == NULL) {
-        KMP_FATAL(LockIsUninitialized, "omp_init_nest_lock_with_hint");
-    }
+void __kmpc_init_nest_lock_with_hint(ident_t *loc, kmp_int32 gtid,
+                                     void **user_lock, uintptr_t hint) {
+  KMP_DEBUG_ASSERT(__kmp_init_serial);
+  if (__kmp_env_consistency_check && user_lock == NULL) {
+    KMP_FATAL(LockIsUninitialized, "omp_init_nest_lock_with_hint");
+  }
 
-    __kmp_init_nest_lock_with_hint(loc, user_lock, __kmp_map_hint_to_lock(hint));
+  __kmp_init_nest_lock_with_hint(loc, user_lock, __kmp_map_hint_to_lock(hint));
 }
 
 #endif // KMP_USE_DYNAMIC_LOCK
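
These entry points serve the same purpose as OpenMP 4.5's `omp_init_lock_with_hint` / `omp_init_nest_lock_with_hint` at the user level; a quick sketch using only the standard API (how a compiler or the library maps it onto the `__kmpc_*` routines is a runtime detail not claimed here):

@code
#include <omp.h>

static omp_lock_t lk;

static void hinted_lock_example(void) {
  // Ask for a speculative (TSX-backed) lock where the hardware supports it;
  // the runtime falls back to an ordinary lock otherwise.
  omp_init_lock_with_hint(&lk, omp_lock_hint_speculative);
  omp_set_lock(&lk);
  /* ... short critical work ... */
  omp_unset_lock(&lk);
  omp_destroy_lock(&lk);
}
@endcode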
 
 /* initialize the lock */
-void
-__kmpc_init_lock( ident_t * loc, kmp_int32 gtid,  void ** user_lock ) {
+void __kmpc_init_lock(ident_t *loc, kmp_int32 gtid, void **user_lock) {
 #if KMP_USE_DYNAMIC_LOCK
-    KMP_DEBUG_ASSERT(__kmp_init_serial);
-    if (__kmp_env_consistency_check && user_lock == NULL) {
-        KMP_FATAL(LockIsUninitialized, "omp_init_lock");
-    }
-    __kmp_init_lock_with_hint(loc, user_lock, __kmp_user_lock_seq);
 
-#else // KMP_USE_DYNAMIC_LOCK
+  KMP_DEBUG_ASSERT(__kmp_init_serial);
+  if (__kmp_env_consistency_check && user_lock == NULL) {
+    KMP_FATAL(LockIsUninitialized, "omp_init_lock");
+  }
+  __kmp_init_lock_with_hint(loc, user_lock, __kmp_user_lock_seq);
 
-    static char const * const func = "omp_init_lock";
-    kmp_user_lock_p lck;
-    KMP_DEBUG_ASSERT( __kmp_init_serial );
+#else // KMP_USE_DYNAMIC_LOCK
 
-    if ( __kmp_env_consistency_check ) {
-        if ( user_lock == NULL ) {
-            KMP_FATAL( LockIsUninitialized, func );
-        }
+  static char const *const func = "omp_init_lock";
+  kmp_user_lock_p lck;
+  KMP_DEBUG_ASSERT(__kmp_init_serial);
+
+  if (__kmp_env_consistency_check) {
+    if (user_lock == NULL) {
+      KMP_FATAL(LockIsUninitialized, func);
     }
+  }
 
-    KMP_CHECK_USER_LOCK_INIT();
+  KMP_CHECK_USER_LOCK_INIT();
 
-    if ( ( __kmp_user_lock_kind == lk_tas )
-      && ( sizeof( lck->tas.lk.poll ) <= OMP_LOCK_T_SIZE ) ) {
-        lck = (kmp_user_lock_p)user_lock;
-    }
+  if ((__kmp_user_lock_kind == lk_tas) &&
+      (sizeof(lck->tas.lk.poll) <= OMP_LOCK_T_SIZE)) {
+    lck = (kmp_user_lock_p)user_lock;
+  }
 #if KMP_USE_FUTEX
-    else if ( ( __kmp_user_lock_kind == lk_futex )
-      && ( sizeof( lck->futex.lk.poll ) <= OMP_LOCK_T_SIZE ) ) {
-        lck = (kmp_user_lock_p)user_lock;
-    }
+  else if ((__kmp_user_lock_kind == lk_futex) &&
+           (sizeof(lck->futex.lk.poll) <= OMP_LOCK_T_SIZE)) {
+    lck = (kmp_user_lock_p)user_lock;
+  }
 #endif
-    else {
-        lck = __kmp_user_lock_allocate( user_lock, gtid, 0 );
-    }
-    INIT_LOCK( lck );
-    __kmp_set_user_lock_location( lck, loc );
+  else {
+    lck = __kmp_user_lock_allocate(user_lock, gtid, 0);
+  }
+  INIT_LOCK(lck);
+  __kmp_set_user_lock_location(lck, loc);
 
 #if OMPT_SUPPORT && OMPT_TRACE
-    if (ompt_enabled &&
-        ompt_callbacks.ompt_callback(ompt_event_init_lock)) {
-        ompt_callbacks.ompt_callback(ompt_event_init_lock)((uint64_t) lck);
-    }
+  if (ompt_enabled && ompt_callbacks.ompt_callback(ompt_event_init_lock)) {
+    ompt_callbacks.ompt_callback(ompt_event_init_lock)((uint64_t)lck);
+  }
 #endif
 
 #if USE_ITT_BUILD
-    __kmp_itt_lock_creating( lck );
+  __kmp_itt_lock_creating(lck);
 #endif /* USE_ITT_BUILD */
 
 #endif // KMP_USE_DYNAMIC_LOCK
 } // __kmpc_init_lock
 
 /* initialize the lock */
-void
-__kmpc_init_nest_lock( ident_t * loc, kmp_int32 gtid, void ** user_lock ) {
+void __kmpc_init_nest_lock(ident_t *loc, kmp_int32 gtid, void **user_lock) {
 #if KMP_USE_DYNAMIC_LOCK
 
-    KMP_DEBUG_ASSERT(__kmp_init_serial);
-    if (__kmp_env_consistency_check && user_lock == NULL) {
-        KMP_FATAL(LockIsUninitialized, "omp_init_nest_lock");
-    }
-    __kmp_init_nest_lock_with_hint(loc, user_lock, __kmp_user_lock_seq);
+  KMP_DEBUG_ASSERT(__kmp_init_serial);
+  if (__kmp_env_consistency_check && user_lock == NULL) {
+    KMP_FATAL(LockIsUninitialized, "omp_init_nest_lock");
+  }
+  __kmp_init_nest_lock_with_hint(loc, user_lock, __kmp_user_lock_seq);
 
 #else // KMP_USE_DYNAMIC_LOCK
 
-    static char const * const func = "omp_init_nest_lock";
-    kmp_user_lock_p lck;
-    KMP_DEBUG_ASSERT( __kmp_init_serial );
-
-    if ( __kmp_env_consistency_check ) {
-        if ( user_lock == NULL ) {
-            KMP_FATAL( LockIsUninitialized, func );
-        }
+  static char const *const func = "omp_init_nest_lock";
+  kmp_user_lock_p lck;
+  KMP_DEBUG_ASSERT(__kmp_init_serial);
+
+  if (__kmp_env_consistency_check) {
+    if (user_lock == NULL) {
+      KMP_FATAL(LockIsUninitialized, func);
     }
+  }
 
-    KMP_CHECK_USER_LOCK_INIT();
+  KMP_CHECK_USER_LOCK_INIT();
 
-    if ( ( __kmp_user_lock_kind == lk_tas ) && ( sizeof( lck->tas.lk.poll )
-      + sizeof( lck->tas.lk.depth_locked ) <= OMP_NEST_LOCK_T_SIZE ) ) {
-        lck = (kmp_user_lock_p)user_lock;
-    }
+  if ((__kmp_user_lock_kind == lk_tas) &&
+      (sizeof(lck->tas.lk.poll) + sizeof(lck->tas.lk.depth_locked) <=
+       OMP_NEST_LOCK_T_SIZE)) {
+    lck = (kmp_user_lock_p)user_lock;
+  }
 #if KMP_USE_FUTEX
-    else if ( ( __kmp_user_lock_kind == lk_futex )
-     && ( sizeof( lck->futex.lk.poll ) + sizeof( lck->futex.lk.depth_locked )
-     <= OMP_NEST_LOCK_T_SIZE ) ) {
-        lck = (kmp_user_lock_p)user_lock;
-    }
+  else if ((__kmp_user_lock_kind == lk_futex) &&
+           (sizeof(lck->futex.lk.poll) + sizeof(lck->futex.lk.depth_locked) <=
+            OMP_NEST_LOCK_T_SIZE)) {
+    lck = (kmp_user_lock_p)user_lock;
+  }
 #endif
-    else {
-        lck = __kmp_user_lock_allocate( user_lock, gtid, 0 );
-    }
+  else {
+    lck = __kmp_user_lock_allocate(user_lock, gtid, 0);
+  }
 
-    INIT_NESTED_LOCK( lck );
-    __kmp_set_user_lock_location( lck, loc );
+  INIT_NESTED_LOCK(lck);
+  __kmp_set_user_lock_location(lck, loc);
 
 #if OMPT_SUPPORT && OMPT_TRACE
-    if (ompt_enabled &&
-        ompt_callbacks.ompt_callback(ompt_event_init_nest_lock)) {
-        ompt_callbacks.ompt_callback(ompt_event_init_nest_lock)((uint64_t) lck);
-    }
+  if (ompt_enabled && ompt_callbacks.ompt_callback(ompt_event_init_nest_lock)) {
+    ompt_callbacks.ompt_callback(ompt_event_init_nest_lock)((uint64_t)lck);
+  }
 #endif
 
 #if USE_ITT_BUILD
-    __kmp_itt_lock_creating( lck );
+  __kmp_itt_lock_creating(lck);
 #endif /* USE_ITT_BUILD */
 
 #endif // KMP_USE_DYNAMIC_LOCK
 } // __kmpc_init_nest_lock
 
-void
-__kmpc_destroy_lock( ident_t * loc, kmp_int32 gtid, void ** user_lock ) {
+void __kmpc_destroy_lock(ident_t *loc, kmp_int32 gtid, void **user_lock) {
 #if KMP_USE_DYNAMIC_LOCK
 
-# if USE_ITT_BUILD
-    kmp_user_lock_p lck;
-    if (KMP_EXTRACT_D_TAG(user_lock) == 0) {
-        lck = ((kmp_indirect_lock_t *)KMP_LOOKUP_I_LOCK(user_lock))->lock;
-    } else {
-        lck = (kmp_user_lock_p)user_lock;
-    }
-    __kmp_itt_lock_destroyed(lck);
-# endif
-    KMP_D_LOCK_FUNC(user_lock, destroy)((kmp_dyna_lock_t *)user_lock);
+#if USE_ITT_BUILD
+  kmp_user_lock_p lck;
+  if (KMP_EXTRACT_D_TAG(user_lock) == 0) {
+    lck = ((kmp_indirect_lock_t *)KMP_LOOKUP_I_LOCK(user_lock))->lock;
+  } else {
+    lck = (kmp_user_lock_p)user_lock;
+  }
+  __kmp_itt_lock_destroyed(lck);
+#endif
+  KMP_D_LOCK_FUNC(user_lock, destroy)((kmp_dyna_lock_t *)user_lock);
 #else
-    kmp_user_lock_p lck;
+  kmp_user_lock_p lck;
 
-    if ( ( __kmp_user_lock_kind == lk_tas )
-      && ( sizeof( lck->tas.lk.poll ) <= OMP_LOCK_T_SIZE ) ) {
-        lck = (kmp_user_lock_p)user_lock;
-    }
+  if ((__kmp_user_lock_kind == lk_tas) &&
+      (sizeof(lck->tas.lk.poll) <= OMP_LOCK_T_SIZE)) {
+    lck = (kmp_user_lock_p)user_lock;
+  }
 #if KMP_USE_FUTEX
-    else if ( ( __kmp_user_lock_kind == lk_futex )
-      && ( sizeof( lck->futex.lk.poll ) <= OMP_LOCK_T_SIZE ) ) {
-        lck = (kmp_user_lock_p)user_lock;
-    }
+  else if ((__kmp_user_lock_kind == lk_futex) &&
+           (sizeof(lck->futex.lk.poll) <= OMP_LOCK_T_SIZE)) {
+    lck = (kmp_user_lock_p)user_lock;
+  }
 #endif
-    else {
-        lck = __kmp_lookup_user_lock( user_lock, "omp_destroy_lock" );
-    }
+  else {
+    lck = __kmp_lookup_user_lock(user_lock, "omp_destroy_lock");
+  }
 
 #if OMPT_SUPPORT && OMPT_TRACE
-    if (ompt_enabled &&
-        ompt_callbacks.ompt_callback(ompt_event_destroy_lock)) {
-        ompt_callbacks.ompt_callback(ompt_event_destroy_lock)((uint64_t) lck);
-    }
+  if (ompt_enabled && ompt_callbacks.ompt_callback(ompt_event_destroy_lock)) {
+    ompt_callbacks.ompt_callback(ompt_event_destroy_lock)((uint64_t)lck);
+  }
 #endif
 
 #if USE_ITT_BUILD
-    __kmp_itt_lock_destroyed( lck );
+  __kmp_itt_lock_destroyed(lck);
 #endif /* USE_ITT_BUILD */
-    DESTROY_LOCK( lck );
+  DESTROY_LOCK(lck);
 
-    if ( ( __kmp_user_lock_kind == lk_tas )
-      && ( sizeof( lck->tas.lk.poll ) <= OMP_LOCK_T_SIZE ) ) {
-        ;
-    }
+  if ((__kmp_user_lock_kind == lk_tas) &&
+      (sizeof(lck->tas.lk.poll) <= OMP_LOCK_T_SIZE)) {
+    ;
+  }
 #if KMP_USE_FUTEX
-    else if ( ( __kmp_user_lock_kind == lk_futex )
-      && ( sizeof( lck->futex.lk.poll ) <= OMP_LOCK_T_SIZE ) ) {
-        ;
-    }
+  else if ((__kmp_user_lock_kind == lk_futex) &&
+           (sizeof(lck->futex.lk.poll) <= OMP_LOCK_T_SIZE)) {
+    ;
+  }
 #endif
-    else {
-        __kmp_user_lock_free( user_lock, gtid, lck );
-    }
+  else {
+    __kmp_user_lock_free(user_lock, gtid, lck);
+  }
 #endif // KMP_USE_DYNAMIC_LOCK
 } // __kmpc_destroy_lock
 
 /* destroy the lock */
-void
-__kmpc_destroy_nest_lock( ident_t * loc, kmp_int32 gtid, void ** user_lock ) {
+void __kmpc_destroy_nest_lock(ident_t *loc, kmp_int32 gtid, void **user_lock) {
 #if KMP_USE_DYNAMIC_LOCK
 
-# if USE_ITT_BUILD
-    kmp_indirect_lock_t *ilk = KMP_LOOKUP_I_LOCK(user_lock);
-    __kmp_itt_lock_destroyed(ilk->lock);
-# endif
-    KMP_D_LOCK_FUNC(user_lock, destroy)((kmp_dyna_lock_t *)user_lock);
+#if USE_ITT_BUILD
+  kmp_indirect_lock_t *ilk = KMP_LOOKUP_I_LOCK(user_lock);
+  __kmp_itt_lock_destroyed(ilk->lock);
+#endif
+  KMP_D_LOCK_FUNC(user_lock, destroy)((kmp_dyna_lock_t *)user_lock);
 
 #else // KMP_USE_DYNAMIC_LOCK
 
-    kmp_user_lock_p lck;
+  kmp_user_lock_p lck;
 
-    if ( ( __kmp_user_lock_kind == lk_tas ) && ( sizeof( lck->tas.lk.poll )
-      + sizeof( lck->tas.lk.depth_locked ) <= OMP_NEST_LOCK_T_SIZE ) ) {
-        lck = (kmp_user_lock_p)user_lock;
-    }
+  if ((__kmp_user_lock_kind == lk_tas) &&
+      (sizeof(lck->tas.lk.poll) + sizeof(lck->tas.lk.depth_locked) <=
+       OMP_NEST_LOCK_T_SIZE)) {
+    lck = (kmp_user_lock_p)user_lock;
+  }
 #if KMP_USE_FUTEX
-    else if ( ( __kmp_user_lock_kind == lk_futex )
-     && ( sizeof( lck->futex.lk.poll ) + sizeof( lck->futex.lk.depth_locked )
-     <= OMP_NEST_LOCK_T_SIZE ) ) {
-        lck = (kmp_user_lock_p)user_lock;
-    }
+  else if ((__kmp_user_lock_kind == lk_futex) &&
+           (sizeof(lck->futex.lk.poll) + sizeof(lck->futex.lk.depth_locked) <=
+            OMP_NEST_LOCK_T_SIZE)) {
+    lck = (kmp_user_lock_p)user_lock;
+  }
 #endif
-    else {
-        lck = __kmp_lookup_user_lock( user_lock, "omp_destroy_nest_lock" );
-    }
+  else {
+    lck = __kmp_lookup_user_lock(user_lock, "omp_destroy_nest_lock");
+  }
 
 #if OMPT_SUPPORT && OMPT_TRACE
-    if (ompt_enabled &&
-        ompt_callbacks.ompt_callback(ompt_event_destroy_nest_lock)) {
-        ompt_callbacks.ompt_callback(ompt_event_destroy_nest_lock)((uint64_t) lck);
-    }
+  if (ompt_enabled &&
+      ompt_callbacks.ompt_callback(ompt_event_destroy_nest_lock)) {
+    ompt_callbacks.ompt_callback(ompt_event_destroy_nest_lock)((uint64_t)lck);
+  }
 #endif
 
 #if USE_ITT_BUILD
-    __kmp_itt_lock_destroyed( lck );
+  __kmp_itt_lock_destroyed(lck);
 #endif /* USE_ITT_BUILD */
 
-    DESTROY_NESTED_LOCK( lck );
+  DESTROY_NESTED_LOCK(lck);
 
-    if ( ( __kmp_user_lock_kind == lk_tas ) && ( sizeof( lck->tas.lk.poll )
-     + sizeof( lck->tas.lk.depth_locked ) <= OMP_NEST_LOCK_T_SIZE ) ) {
-        ;
-    }
+  if ((__kmp_user_lock_kind == lk_tas) &&
+      (sizeof(lck->tas.lk.poll) + sizeof(lck->tas.lk.depth_locked) <=
+       OMP_NEST_LOCK_T_SIZE)) {
+    ;
+  }
 #if KMP_USE_FUTEX
-    else if ( ( __kmp_user_lock_kind == lk_futex )
-     && ( sizeof( lck->futex.lk.poll ) + sizeof( lck->futex.lk.depth_locked )
-     <= OMP_NEST_LOCK_T_SIZE ) ) {
-        ;
-    }
+  else if ((__kmp_user_lock_kind == lk_futex) &&
+           (sizeof(lck->futex.lk.poll) + sizeof(lck->futex.lk.depth_locked) <=
+            OMP_NEST_LOCK_T_SIZE)) {
+    ;
+  }
 #endif
-    else {
-        __kmp_user_lock_free( user_lock, gtid, lck );
-    }
+  else {
+    __kmp_user_lock_free(user_lock, gtid, lck);
+  }
 #endif // KMP_USE_DYNAMIC_LOCK
 } // __kmpc_destroy_nest_lock
 
-void
-__kmpc_set_lock( ident_t * loc, kmp_int32 gtid, void ** user_lock ) {
-    KMP_COUNT_BLOCK(OMP_set_lock);
+void __kmpc_set_lock(ident_t *loc, kmp_int32 gtid, void **user_lock) {
+  KMP_COUNT_BLOCK(OMP_set_lock);
 #if KMP_USE_DYNAMIC_LOCK
-    int tag = KMP_EXTRACT_D_TAG(user_lock);
-# if USE_ITT_BUILD
-   __kmp_itt_lock_acquiring((kmp_user_lock_p)user_lock); // itt function will get to the right lock object.
-# endif
-# if KMP_USE_INLINED_TAS
-    if (tag == locktag_tas && !__kmp_env_consistency_check) {
-        KMP_ACQUIRE_TAS_LOCK(user_lock, gtid);
-    } else
-# elif KMP_USE_INLINED_FUTEX
-    if (tag == locktag_futex && !__kmp_env_consistency_check) {
-        KMP_ACQUIRE_FUTEX_LOCK(user_lock, gtid);
-    } else
-# endif
-    {
-        __kmp_direct_set[tag]((kmp_dyna_lock_t *)user_lock, gtid);
-    }
-# if USE_ITT_BUILD
-    __kmp_itt_lock_acquired((kmp_user_lock_p)user_lock);
-# endif
+  int tag = KMP_EXTRACT_D_TAG(user_lock);
+#if USE_ITT_BUILD
+  __kmp_itt_lock_acquiring(
+      (kmp_user_lock_p)
+          user_lock); // itt function will get to the right lock object.
+#endif
+#if KMP_USE_INLINED_TAS
+  if (tag == locktag_tas && !__kmp_env_consistency_check) {
+    KMP_ACQUIRE_TAS_LOCK(user_lock, gtid);
+  } else
+#elif KMP_USE_INLINED_FUTEX
+  if (tag == locktag_futex && !__kmp_env_consistency_check) {
+    KMP_ACQUIRE_FUTEX_LOCK(user_lock, gtid);
+  } else
+#endif
+  {
+    __kmp_direct_set[tag]((kmp_dyna_lock_t *)user_lock, gtid);
+  }
+#if USE_ITT_BUILD
+  __kmp_itt_lock_acquired((kmp_user_lock_p)user_lock);
+#endif
 
 #else // KMP_USE_DYNAMIC_LOCK
 
-    kmp_user_lock_p lck;
+  kmp_user_lock_p lck;
 
-    if ( ( __kmp_user_lock_kind == lk_tas )
-      && ( sizeof( lck->tas.lk.poll ) <= OMP_LOCK_T_SIZE ) ) {
-        lck = (kmp_user_lock_p)user_lock;
-    }
+  if ((__kmp_user_lock_kind == lk_tas) &&
+      (sizeof(lck->tas.lk.poll) <= OMP_LOCK_T_SIZE)) {
+    lck = (kmp_user_lock_p)user_lock;
+  }
 #if KMP_USE_FUTEX
-    else if ( ( __kmp_user_lock_kind == lk_futex )
-      && ( sizeof( lck->futex.lk.poll ) <= OMP_LOCK_T_SIZE ) ) {
-        lck = (kmp_user_lock_p)user_lock;
-    }
+  else if ((__kmp_user_lock_kind == lk_futex) &&
+           (sizeof(lck->futex.lk.poll) <= OMP_LOCK_T_SIZE)) {
+    lck = (kmp_user_lock_p)user_lock;
+  }
 #endif
-    else {
-        lck = __kmp_lookup_user_lock( user_lock, "omp_set_lock" );
-    }
+  else {
+    lck = __kmp_lookup_user_lock(user_lock, "omp_set_lock");
+  }
 
 #if USE_ITT_BUILD
-    __kmp_itt_lock_acquiring( lck );
+  __kmp_itt_lock_acquiring(lck);
 #endif /* USE_ITT_BUILD */
 
-    ACQUIRE_LOCK( lck, gtid );
+  ACQUIRE_LOCK(lck, gtid);
 
 #if USE_ITT_BUILD
-    __kmp_itt_lock_acquired( lck );
+  __kmp_itt_lock_acquired(lck);
 #endif /* USE_ITT_BUILD */
 
 #if OMPT_SUPPORT && OMPT_TRACE
-    if (ompt_enabled &&
-        ompt_callbacks.ompt_callback(ompt_event_acquired_lock)) {
-        ompt_callbacks.ompt_callback(ompt_event_acquired_lock)((uint64_t) lck);
-    }
+  if (ompt_enabled && ompt_callbacks.ompt_callback(ompt_event_acquired_lock)) {
+    ompt_callbacks.ompt_callback(ompt_event_acquired_lock)((uint64_t)lck);
+  }
 #endif
 
 #endif // KMP_USE_DYNAMIC_LOCK
 }
 
-void
-__kmpc_set_nest_lock( ident_t * loc, kmp_int32 gtid, void ** user_lock ) {
+void __kmpc_set_nest_lock(ident_t *loc, kmp_int32 gtid, void **user_lock) {
 #if KMP_USE_DYNAMIC_LOCK
 
-# if USE_ITT_BUILD
-    __kmp_itt_lock_acquiring((kmp_user_lock_p)user_lock);
-# endif
-    KMP_D_LOCK_FUNC(user_lock, set)((kmp_dyna_lock_t *)user_lock, gtid);
-# if USE_ITT_BUILD
-    __kmp_itt_lock_acquired((kmp_user_lock_p)user_lock);
+#if USE_ITT_BUILD
+  __kmp_itt_lock_acquiring((kmp_user_lock_p)user_lock);
+#endif
+  KMP_D_LOCK_FUNC(user_lock, set)((kmp_dyna_lock_t *)user_lock, gtid);
+#if USE_ITT_BUILD
+  __kmp_itt_lock_acquired((kmp_user_lock_p)user_lock);
 #endif
 
 #if OMPT_SUPPORT && OMPT_TRACE
-    if (ompt_enabled) {
-        // missing support here: need to know whether acquired first or not
-    }
+  if (ompt_enabled) {
+    // missing support here: need to know whether acquired first or not
+  }
 #endif
 
 #else // KMP_USE_DYNAMIC_LOCK
-    int acquire_status;
-    kmp_user_lock_p lck;
+  int acquire_status;
+  kmp_user_lock_p lck;
 
-    if ( ( __kmp_user_lock_kind == lk_tas ) && ( sizeof( lck->tas.lk.poll )
-      + sizeof( lck->tas.lk.depth_locked ) <= OMP_NEST_LOCK_T_SIZE ) ) {
-        lck = (kmp_user_lock_p)user_lock;
-    }
+  if ((__kmp_user_lock_kind == lk_tas) &&
+      (sizeof(lck->tas.lk.poll) + sizeof(lck->tas.lk.depth_locked) <=
+       OMP_NEST_LOCK_T_SIZE)) {
+    lck = (kmp_user_lock_p)user_lock;
+  }
 #if KMP_USE_FUTEX
-    else if ( ( __kmp_user_lock_kind == lk_futex )
-     && ( sizeof( lck->futex.lk.poll ) + sizeof( lck->futex.lk.depth_locked )
-     <= OMP_NEST_LOCK_T_SIZE ) ) {
-        lck = (kmp_user_lock_p)user_lock;
-    }
+  else if ((__kmp_user_lock_kind == lk_futex) &&
+           (sizeof(lck->futex.lk.poll) + sizeof(lck->futex.lk.depth_locked) <=
+            OMP_NEST_LOCK_T_SIZE)) {
+    lck = (kmp_user_lock_p)user_lock;
+  }
 #endif
-    else {
-        lck = __kmp_lookup_user_lock( user_lock, "omp_set_nest_lock" );
-    }
+  else {
+    lck = __kmp_lookup_user_lock(user_lock, "omp_set_nest_lock");
+  }
 
 #if USE_ITT_BUILD
-    __kmp_itt_lock_acquiring( lck );
+  __kmp_itt_lock_acquiring(lck);
 #endif /* USE_ITT_BUILD */
 
-    ACQUIRE_NESTED_LOCK( lck, gtid, &acquire_status );
+  ACQUIRE_NESTED_LOCK(lck, gtid, &acquire_status);
 
 #if USE_ITT_BUILD
-    __kmp_itt_lock_acquired( lck );
+  __kmp_itt_lock_acquired(lck);
 #endif /* USE_ITT_BUILD */
 
 #if OMPT_SUPPORT && OMPT_TRACE
-    if (ompt_enabled) {
-        if (acquire_status == KMP_LOCK_ACQUIRED_FIRST) {
-           if(ompt_callbacks.ompt_callback(ompt_event_acquired_nest_lock_first))
-              ompt_callbacks.ompt_callback(ompt_event_acquired_nest_lock_first)((uint64_t) lck);
-        } else {
-           if(ompt_callbacks.ompt_callback(ompt_event_acquired_nest_lock_next))
-              ompt_callbacks.ompt_callback(ompt_event_acquired_nest_lock_next)((uint64_t) lck);
-        }
+  if (ompt_enabled) {
+    if (acquire_status == KMP_LOCK_ACQUIRED_FIRST) {
+      if (ompt_callbacks.ompt_callback(ompt_event_acquired_nest_lock_first))
+        ompt_callbacks.ompt_callback(ompt_event_acquired_nest_lock_first)(
+            (uint64_t)lck);
+    } else {
+      if (ompt_callbacks.ompt_callback(ompt_event_acquired_nest_lock_next))
+        ompt_callbacks.ompt_callback(ompt_event_acquired_nest_lock_next)(
+            (uint64_t)lck);
     }
+  }
 #endif
 
 #endif // KMP_USE_DYNAMIC_LOCK
 }
 
-void
-__kmpc_unset_lock( ident_t *loc, kmp_int32 gtid, void **user_lock )
-{
+void __kmpc_unset_lock(ident_t *loc, kmp_int32 gtid, void **user_lock) {
 #if KMP_USE_DYNAMIC_LOCK
 
-    int tag = KMP_EXTRACT_D_TAG(user_lock);
-# if USE_ITT_BUILD
-    __kmp_itt_lock_releasing((kmp_user_lock_p)user_lock);
-# endif
-# if KMP_USE_INLINED_TAS
-    if (tag == locktag_tas && !__kmp_env_consistency_check) {
-        KMP_RELEASE_TAS_LOCK(user_lock, gtid);
-    } else
-# elif KMP_USE_INLINED_FUTEX
-    if (tag == locktag_futex && !__kmp_env_consistency_check) {
-        KMP_RELEASE_FUTEX_LOCK(user_lock, gtid);
-    } else
-# endif
-    {
-        __kmp_direct_unset[tag]((kmp_dyna_lock_t *)user_lock, gtid);
-    }
+  int tag = KMP_EXTRACT_D_TAG(user_lock);
+#if USE_ITT_BUILD
+  __kmp_itt_lock_releasing((kmp_user_lock_p)user_lock);
+#endif
+#if KMP_USE_INLINED_TAS
+  if (tag == locktag_tas && !__kmp_env_consistency_check) {
+    KMP_RELEASE_TAS_LOCK(user_lock, gtid);
+  } else
+#elif KMP_USE_INLINED_FUTEX
+  if (tag == locktag_futex && !__kmp_env_consistency_check) {
+    KMP_RELEASE_FUTEX_LOCK(user_lock, gtid);
+  } else
+#endif
+  {
+    __kmp_direct_unset[tag]((kmp_dyna_lock_t *)user_lock, gtid);
+  }
 
 #else // KMP_USE_DYNAMIC_LOCK
 
-    kmp_user_lock_p lck;
+  kmp_user_lock_p lck;
 
-    /* Can't use serial interval since not block structured */
-    /* release the lock */
+  /* Can't use serial interval since not block structured */
+  /* release the lock */
 
-    if ( ( __kmp_user_lock_kind == lk_tas )
-      && ( sizeof( lck->tas.lk.poll ) <= OMP_LOCK_T_SIZE ) ) {
-#if KMP_OS_LINUX && (KMP_ARCH_X86 || KMP_ARCH_X86_64 || KMP_ARCH_ARM || KMP_ARCH_AARCH64)
-        // "fast" path implemented to fix customer performance issue
+  if ((__kmp_user_lock_kind == lk_tas) &&
+      (sizeof(lck->tas.lk.poll) <= OMP_LOCK_T_SIZE)) {
+#if KMP_OS_LINUX &&                                                            \
+    (KMP_ARCH_X86 || KMP_ARCH_X86_64 || KMP_ARCH_ARM || KMP_ARCH_AARCH64)
+// "fast" path implemented to fix customer performance issue
 #if USE_ITT_BUILD
-        __kmp_itt_lock_releasing( (kmp_user_lock_p)user_lock );
+    __kmp_itt_lock_releasing((kmp_user_lock_p)user_lock);
 #endif /* USE_ITT_BUILD */
-        TCW_4(((kmp_user_lock_p)user_lock)->tas.lk.poll, 0);
-        KMP_MB();
-        return;
+    TCW_4(((kmp_user_lock_p)user_lock)->tas.lk.poll, 0);
+    KMP_MB();
+    return;
 #else
-        lck = (kmp_user_lock_p)user_lock;
+    lck = (kmp_user_lock_p)user_lock;
 #endif
-    }
+  }
 #if KMP_USE_FUTEX
-    else if ( ( __kmp_user_lock_kind == lk_futex )
-      && ( sizeof( lck->futex.lk.poll ) <= OMP_LOCK_T_SIZE ) ) {
-        lck = (kmp_user_lock_p)user_lock;
-    }
+  else if ((__kmp_user_lock_kind == lk_futex) &&
+           (sizeof(lck->futex.lk.poll) <= OMP_LOCK_T_SIZE)) {
+    lck = (kmp_user_lock_p)user_lock;
+  }
 #endif
-    else {
-        lck = __kmp_lookup_user_lock( user_lock, "omp_unset_lock" );
-    }
+  else {
+    lck = __kmp_lookup_user_lock(user_lock, "omp_unset_lock");
+  }
 
 #if USE_ITT_BUILD
-    __kmp_itt_lock_releasing( lck );
+  __kmp_itt_lock_releasing(lck);
 #endif /* USE_ITT_BUILD */
 
-    RELEASE_LOCK( lck, gtid );
+  RELEASE_LOCK(lck, gtid);
 
 #if OMPT_SUPPORT && OMPT_BLAME
-    if (ompt_enabled &&
-        ompt_callbacks.ompt_callback(ompt_event_release_lock)) {
-        ompt_callbacks.ompt_callback(ompt_event_release_lock)((uint64_t) lck);
-    }
+  if (ompt_enabled && ompt_callbacks.ompt_callback(ompt_event_release_lock)) {
+    ompt_callbacks.ompt_callback(ompt_event_release_lock)((uint64_t)lck);
+  }
 #endif
 
 #endif // KMP_USE_DYNAMIC_LOCK
 }
 
 /* release the lock */
-void
-__kmpc_unset_nest_lock( ident_t *loc, kmp_int32 gtid, void **user_lock )
-{
+void __kmpc_unset_nest_lock(ident_t *loc, kmp_int32 gtid, void **user_lock) {
 #if KMP_USE_DYNAMIC_LOCK
 
-# if USE_ITT_BUILD
-    __kmp_itt_lock_releasing((kmp_user_lock_p)user_lock);
-# endif
-    KMP_D_LOCK_FUNC(user_lock, unset)((kmp_dyna_lock_t *)user_lock, gtid);
+#if USE_ITT_BUILD
+  __kmp_itt_lock_releasing((kmp_user_lock_p)user_lock);
+#endif
+  KMP_D_LOCK_FUNC(user_lock, unset)((kmp_dyna_lock_t *)user_lock, gtid);
 
 #else // KMP_USE_DYNAMIC_LOCK
 
-    kmp_user_lock_p lck;
+  kmp_user_lock_p lck;
 
-    /* Can't use serial interval since not block structured */
+  /* Can't use serial interval since not block structured */
 
-    if ( ( __kmp_user_lock_kind == lk_tas ) && ( sizeof( lck->tas.lk.poll )
-      + sizeof( lck->tas.lk.depth_locked ) <= OMP_NEST_LOCK_T_SIZE ) ) {
-#if KMP_OS_LINUX && (KMP_ARCH_X86 || KMP_ARCH_X86_64 || KMP_ARCH_ARM || KMP_ARCH_AARCH64)
-        // "fast" path implemented to fix customer performance issue
-        kmp_tas_lock_t *tl = (kmp_tas_lock_t*)user_lock;
-#if USE_ITT_BUILD
-        __kmp_itt_lock_releasing( (kmp_user_lock_p)user_lock );
-#endif /* USE_ITT_BUILD */
-        if ( --(tl->lk.depth_locked) == 0 ) {
-            TCW_4(tl->lk.poll, 0);
-        }
-        KMP_MB();
-        return;
+  if ((__kmp_user_lock_kind == lk_tas) &&
+      (sizeof(lck->tas.lk.poll) + sizeof(lck->tas.lk.depth_locked) <=
+       OMP_NEST_LOCK_T_SIZE)) {
+#if KMP_OS_LINUX &&                                                            \
+    (KMP_ARCH_X86 || KMP_ARCH_X86_64 || KMP_ARCH_ARM || KMP_ARCH_AARCH64)
+    // "fast" path implemented to fix customer performance issue
+    kmp_tas_lock_t *tl = (kmp_tas_lock_t *)user_lock;
+#if USE_ITT_BUILD
+    __kmp_itt_lock_releasing((kmp_user_lock_p)user_lock);
+#endif /* USE_ITT_BUILD */
+    if (--(tl->lk.depth_locked) == 0) {
+      TCW_4(tl->lk.poll, 0);
+    }
+    KMP_MB();
+    return;
 #else
-        lck = (kmp_user_lock_p)user_lock;
+    lck = (kmp_user_lock_p)user_lock;
 #endif
-    }
+  }
 #if KMP_USE_FUTEX
-    else if ( ( __kmp_user_lock_kind == lk_futex )
-     && ( sizeof( lck->futex.lk.poll ) + sizeof( lck->futex.lk.depth_locked )
-     <= OMP_NEST_LOCK_T_SIZE ) ) {
-        lck = (kmp_user_lock_p)user_lock;
-    }
+  else if ((__kmp_user_lock_kind == lk_futex) &&
+           (sizeof(lck->futex.lk.poll) + sizeof(lck->futex.lk.depth_locked) <=
+            OMP_NEST_LOCK_T_SIZE)) {
+    lck = (kmp_user_lock_p)user_lock;
+  }
 #endif
-    else {
-        lck = __kmp_lookup_user_lock( user_lock, "omp_unset_nest_lock" );
-    }
+  else {
+    lck = __kmp_lookup_user_lock(user_lock, "omp_unset_nest_lock");
+  }
 
 #if USE_ITT_BUILD
-    __kmp_itt_lock_releasing( lck );
+  __kmp_itt_lock_releasing(lck);
 #endif /* USE_ITT_BUILD */
 
-    int release_status;
-    release_status = RELEASE_NESTED_LOCK( lck, gtid );
+  int release_status;
+  release_status = RELEASE_NESTED_LOCK(lck, gtid);
 #if OMPT_SUPPORT && OMPT_BLAME
-    if (ompt_enabled) {
-        if (release_status == KMP_LOCK_RELEASED) {
-            if (ompt_callbacks.ompt_callback(ompt_event_release_nest_lock_last)) {
-                ompt_callbacks.ompt_callback(ompt_event_release_nest_lock_last)(
-                    (uint64_t) lck);
-            }
-        } else if (ompt_callbacks.ompt_callback(ompt_event_release_nest_lock_prev)) {
-            ompt_callbacks.ompt_callback(ompt_event_release_nest_lock_prev)(
-                (uint64_t) lck);
-        }
+  if (ompt_enabled) {
+    if (release_status == KMP_LOCK_RELEASED) {
+      if (ompt_callbacks.ompt_callback(ompt_event_release_nest_lock_last)) {
+        ompt_callbacks.ompt_callback(ompt_event_release_nest_lock_last)(
+            (uint64_t)lck);
+      }
+    } else if (ompt_callbacks.ompt_callback(
+                   ompt_event_release_nest_lock_prev)) {
+      ompt_callbacks.ompt_callback(ompt_event_release_nest_lock_prev)(
+          (uint64_t)lck);
     }
+  }
 #endif
 
 #endif // KMP_USE_DYNAMIC_LOCK
 }
 
 /* try to acquire the lock */
-int
-__kmpc_test_lock( ident_t *loc, kmp_int32 gtid, void **user_lock )
-{
-    KMP_COUNT_BLOCK(OMP_test_lock);
+int __kmpc_test_lock(ident_t *loc, kmp_int32 gtid, void **user_lock) {
+  KMP_COUNT_BLOCK(OMP_test_lock);
 
 #if KMP_USE_DYNAMIC_LOCK
-    int rc;
-    int tag = KMP_EXTRACT_D_TAG(user_lock);
-# if USE_ITT_BUILD
-    __kmp_itt_lock_acquiring((kmp_user_lock_p)user_lock);
-# endif
-# if KMP_USE_INLINED_TAS
-    if (tag == locktag_tas && !__kmp_env_consistency_check) {
-        KMP_TEST_TAS_LOCK(user_lock, gtid, rc);
-    } else
-# elif KMP_USE_INLINED_FUTEX
-    if (tag == locktag_futex && !__kmp_env_consistency_check) {
-        KMP_TEST_FUTEX_LOCK(user_lock, gtid, rc);
-    } else
-# endif
-    {
-        rc = __kmp_direct_test[tag]((kmp_dyna_lock_t *)user_lock, gtid);
-    }
-    if (rc) {
-# if USE_ITT_BUILD
-        __kmp_itt_lock_acquired((kmp_user_lock_p)user_lock);
-# endif
-        return FTN_TRUE;
-    } else {
-# if USE_ITT_BUILD
-        __kmp_itt_lock_cancelled((kmp_user_lock_p)user_lock);
-# endif
-        return FTN_FALSE;
-    }
+  int rc;
+  int tag = KMP_EXTRACT_D_TAG(user_lock);
+#if USE_ITT_BUILD
+  __kmp_itt_lock_acquiring((kmp_user_lock_p)user_lock);
+#endif
+#if KMP_USE_INLINED_TAS
+  if (tag == locktag_tas && !__kmp_env_consistency_check) {
+    KMP_TEST_TAS_LOCK(user_lock, gtid, rc);
+  } else
+#elif KMP_USE_INLINED_FUTEX
+  if (tag == locktag_futex && !__kmp_env_consistency_check) {
+    KMP_TEST_FUTEX_LOCK(user_lock, gtid, rc);
+  } else
+#endif
+  {
+    rc = __kmp_direct_test[tag]((kmp_dyna_lock_t *)user_lock, gtid);
+  }
+  if (rc) {
+#if USE_ITT_BUILD
+    __kmp_itt_lock_acquired((kmp_user_lock_p)user_lock);
+#endif
+    return FTN_TRUE;
+  } else {
+#if USE_ITT_BUILD
+    __kmp_itt_lock_cancelled((kmp_user_lock_p)user_lock);
+#endif
+    return FTN_FALSE;
+  }
 
 #else // KMP_USE_DYNAMIC_LOCK
 
-    kmp_user_lock_p lck;
-    int          rc;
+  kmp_user_lock_p lck;
+  int rc;
 
-    if ( ( __kmp_user_lock_kind == lk_tas )
-      && ( sizeof( lck->tas.lk.poll ) <= OMP_LOCK_T_SIZE ) ) {
-        lck = (kmp_user_lock_p)user_lock;
-    }
+  if ((__kmp_user_lock_kind == lk_tas) &&
+      (sizeof(lck->tas.lk.poll) <= OMP_LOCK_T_SIZE)) {
+    lck = (kmp_user_lock_p)user_lock;
+  }
 #if KMP_USE_FUTEX
-    else if ( ( __kmp_user_lock_kind == lk_futex )
-      && ( sizeof( lck->futex.lk.poll ) <= OMP_LOCK_T_SIZE ) ) {
-        lck = (kmp_user_lock_p)user_lock;
-    }
+  else if ((__kmp_user_lock_kind == lk_futex) &&
+           (sizeof(lck->futex.lk.poll) <= OMP_LOCK_T_SIZE)) {
+    lck = (kmp_user_lock_p)user_lock;
+  }
 #endif
-    else {
-        lck = __kmp_lookup_user_lock( user_lock, "omp_test_lock" );
-    }
+  else {
+    lck = __kmp_lookup_user_lock(user_lock, "omp_test_lock");
+  }
 
 #if USE_ITT_BUILD
-    __kmp_itt_lock_acquiring( lck );
+  __kmp_itt_lock_acquiring(lck);
 #endif /* USE_ITT_BUILD */
 
-    rc = TEST_LOCK( lck, gtid );
+  rc = TEST_LOCK(lck, gtid);
 #if USE_ITT_BUILD
-    if ( rc ) {
-        __kmp_itt_lock_acquired( lck );
-    } else {
-        __kmp_itt_lock_cancelled( lck );
-    }
+  if (rc) {
+    __kmp_itt_lock_acquired(lck);
+  } else {
+    __kmp_itt_lock_cancelled(lck);
+  }
 #endif /* USE_ITT_BUILD */
-    return ( rc ? FTN_TRUE : FTN_FALSE );
+  return (rc ? FTN_TRUE : FTN_FALSE);
 
-    /* Can't use serial interval since not block structured */
+/* Can't use serial interval since not block structured */
 
 #endif // KMP_USE_DYNAMIC_LOCK
 }
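
For orientation, the lookup string "omp_test_lock" above shows which user-level API this entry point backs: it returns FTN_TRUE only when the lock was actually acquired, FTN_FALSE otherwise, and never blocks. A minimal sketch of the user-level pattern served by the test/unset entry points in this file (standard OpenMP lock API; the thread count and messages are illustrative, and it is the compiler and Fortran wrappers that actually dispatch into __kmpc_test_lock):

@code
#include <omp.h>
#include <stdio.h>

int main(void) {
  omp_lock_t lock;
  omp_init_lock(&lock);
#pragma omp parallel num_threads(4)
  {
    /* omp_test_lock() never blocks; it simply reports whether this thread
       managed to acquire the lock. */
    if (omp_test_lock(&lock)) {
      printf("thread %d acquired the lock\n", omp_get_thread_num());
      omp_unset_lock(&lock); /* backed by the unset entry point above */
    } else {
      printf("thread %d found the lock busy\n", omp_get_thread_num());
    }
  }
  omp_destroy_lock(&lock);
  return 0;
}
@endcode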
 
 /* try to acquire the lock */
-int
-__kmpc_test_nest_lock( ident_t *loc, kmp_int32 gtid, void **user_lock )
-{
+int __kmpc_test_nest_lock(ident_t *loc, kmp_int32 gtid, void **user_lock) {
 #if KMP_USE_DYNAMIC_LOCK
-    int rc;
-# if USE_ITT_BUILD
-    __kmp_itt_lock_acquiring((kmp_user_lock_p)user_lock);
-# endif
-    rc = KMP_D_LOCK_FUNC(user_lock, test)((kmp_dyna_lock_t *)user_lock, gtid);
-# if USE_ITT_BUILD
-    if (rc) {
-        __kmp_itt_lock_acquired((kmp_user_lock_p)user_lock);
-    } else {
-        __kmp_itt_lock_cancelled((kmp_user_lock_p)user_lock);
-    }
-# endif
-    return rc;
+  int rc;
+#if USE_ITT_BUILD
+  __kmp_itt_lock_acquiring((kmp_user_lock_p)user_lock);
+#endif
+  rc = KMP_D_LOCK_FUNC(user_lock, test)((kmp_dyna_lock_t *)user_lock, gtid);
+#if USE_ITT_BUILD
+  if (rc) {
+    __kmp_itt_lock_acquired((kmp_user_lock_p)user_lock);
+  } else {
+    __kmp_itt_lock_cancelled((kmp_user_lock_p)user_lock);
+  }
+#endif
+  return rc;
 
 #else // KMP_USE_DYNAMIC_LOCK
 
-    kmp_user_lock_p lck;
-    int          rc;
+  kmp_user_lock_p lck;
+  int rc;
 
-    if ( ( __kmp_user_lock_kind == lk_tas ) && ( sizeof( lck->tas.lk.poll )
-      + sizeof( lck->tas.lk.depth_locked ) <= OMP_NEST_LOCK_T_SIZE ) ) {
-        lck = (kmp_user_lock_p)user_lock;
-    }
+  if ((__kmp_user_lock_kind == lk_tas) &&
+      (sizeof(lck->tas.lk.poll) + sizeof(lck->tas.lk.depth_locked) <=
+       OMP_NEST_LOCK_T_SIZE)) {
+    lck = (kmp_user_lock_p)user_lock;
+  }
 #if KMP_USE_FUTEX
-    else if ( ( __kmp_user_lock_kind == lk_futex )
-     && ( sizeof( lck->futex.lk.poll ) + sizeof( lck->futex.lk.depth_locked )
-     <= OMP_NEST_LOCK_T_SIZE ) ) {
-        lck = (kmp_user_lock_p)user_lock;
-    }
+  else if ((__kmp_user_lock_kind == lk_futex) &&
+           (sizeof(lck->futex.lk.poll) + sizeof(lck->futex.lk.depth_locked) <=
+            OMP_NEST_LOCK_T_SIZE)) {
+    lck = (kmp_user_lock_p)user_lock;
+  }
 #endif
-    else {
-        lck = __kmp_lookup_user_lock( user_lock, "omp_test_nest_lock" );
-    }
+  else {
+    lck = __kmp_lookup_user_lock(user_lock, "omp_test_nest_lock");
+  }
 
 #if USE_ITT_BUILD
-    __kmp_itt_lock_acquiring( lck );
+  __kmp_itt_lock_acquiring(lck);
 #endif /* USE_ITT_BUILD */
 
-    rc = TEST_NESTED_LOCK( lck, gtid );
+  rc = TEST_NESTED_LOCK(lck, gtid);
 #if USE_ITT_BUILD
-    if ( rc ) {
-        __kmp_itt_lock_acquired( lck );
-    } else {
-        __kmp_itt_lock_cancelled( lck );
-    }
+  if (rc) {
+    __kmp_itt_lock_acquired(lck);
+  } else {
+    __kmp_itt_lock_cancelled(lck);
+  }
 #endif /* USE_ITT_BUILD */
-    return rc;
+  return rc;
 
-    /* Can't use serial interval since not block structured */
+/* Can't use serial interval since not block structured */
 
 #endif // KMP_USE_DYNAMIC_LOCK
 }
 
+// Interface to fast scalable reduce methods routines
 
-/*--------------------------------------------------------------------------------------------------------------------*/
-
-/*
- * Interface to fast scalable reduce methods routines
- */
-
-// keep the selected method in a thread local structure for cross-function usage: will be used in __kmpc_end_reduce* functions;
-// another solution: to re-determine the method one more time in __kmpc_end_reduce* functions (new prototype required then)
+// keep the selected method in a thread local structure for cross-function
+// usage: will be used in __kmpc_end_reduce* functions;
+// another solution: to re-determine the method one more time in
+// __kmpc_end_reduce* functions (new prototype required then)
 // AT: which solution is better?
-#define __KMP_SET_REDUCTION_METHOD(gtid,rmethod) \
-                   ( ( __kmp_threads[ ( gtid ) ] -> th.th_local.packed_reduction_method ) = ( rmethod ) )
+#define __KMP_SET_REDUCTION_METHOD(gtid, rmethod)                              \
+  ((__kmp_threads[(gtid)]->th.th_local.packed_reduction_method) = (rmethod))
 
-#define __KMP_GET_REDUCTION_METHOD(gtid) \
-                   ( __kmp_threads[ ( gtid ) ] -> th.th_local.packed_reduction_method )
-
-// description of the packed_reduction_method variable: look at the macros in kmp.h
+#define __KMP_GET_REDUCTION_METHOD(gtid)                                       \
+  (__kmp_threads[(gtid)]->th.th_local.packed_reduction_method)
 
+// description of the packed_reduction_method variable: look at the macros in
+// kmp.h
 
 // used in a critical section reduce block
 static __forceinline void
-__kmp_enter_critical_section_reduce_block( ident_t * loc, kmp_int32 global_tid, kmp_critical_name * crit ) {
+__kmp_enter_critical_section_reduce_block(ident_t *loc, kmp_int32 global_tid,
+                                          kmp_critical_name *crit) {
 
-    // this lock was visible to a customer and to the threading profile tool as a serial overhead span
-    //            (although it's used for an internal purpose only)
-    //            why was it visible in previous implementation?
-    //            should we keep it visible in new reduce block?
-    kmp_user_lock_p lck;
+  // this lock was visible to a customer and to the threading profile tool as a
+  // serial overhead span (although it's used for an internal purpose only)
+  //            why was it visible in previous implementation?
+  //            should we keep it visible in new reduce block?
+  kmp_user_lock_p lck;
 
 #if KMP_USE_DYNAMIC_LOCK
 
-    kmp_dyna_lock_t *lk = (kmp_dyna_lock_t *)crit;
-    // Check if it is initialized.
-    if (*lk == 0) {
-        if (KMP_IS_D_LOCK(__kmp_user_lock_seq)) {
-            KMP_COMPARE_AND_STORE_ACQ32((volatile kmp_int32 *)crit, 0, KMP_GET_D_TAG(__kmp_user_lock_seq));
-        } else {
-            __kmp_init_indirect_csptr(crit, loc, global_tid, KMP_GET_I_TAG(__kmp_user_lock_seq));
-        }
-    }
-    // Branch for accessing the actual lock object and set operation. This branching is inevitable since
-    // this lock initialization does not follow the normal dispatch path (lock table is not used).
-    if (KMP_EXTRACT_D_TAG(lk) != 0) {
-        lck = (kmp_user_lock_p)lk;
-        KMP_DEBUG_ASSERT(lck != NULL);
-        if (__kmp_env_consistency_check) {
-            __kmp_push_sync(global_tid, ct_critical, loc, lck, __kmp_user_lock_seq);
-        }
-        KMP_D_LOCK_FUNC(lk, set)(lk, global_tid);
+  kmp_dyna_lock_t *lk = (kmp_dyna_lock_t *)crit;
+  // Check if it is initialized.
+  if (*lk == 0) {
+    if (KMP_IS_D_LOCK(__kmp_user_lock_seq)) {
+      KMP_COMPARE_AND_STORE_ACQ32((volatile kmp_int32 *)crit, 0,
+                                  KMP_GET_D_TAG(__kmp_user_lock_seq));
     } else {
-        kmp_indirect_lock_t *ilk = *((kmp_indirect_lock_t **)lk);
-        lck = ilk->lock;
-        KMP_DEBUG_ASSERT(lck != NULL);
-        if (__kmp_env_consistency_check) {
-            __kmp_push_sync(global_tid, ct_critical, loc, lck, __kmp_user_lock_seq);
-        }
-        KMP_I_LOCK_FUNC(ilk, set)(lck, global_tid);
+      __kmp_init_indirect_csptr(crit, loc, global_tid,
+                                KMP_GET_I_TAG(__kmp_user_lock_seq));
+    }
+  }
+  // Branch for accessing the actual lock object and set operation. This
+  // branching is inevitable since this lock initialization does not follow the
+  // normal dispatch path (lock table is not used).
+  if (KMP_EXTRACT_D_TAG(lk) != 0) {
+    lck = (kmp_user_lock_p)lk;
+    KMP_DEBUG_ASSERT(lck != NULL);
+    if (__kmp_env_consistency_check) {
+      __kmp_push_sync(global_tid, ct_critical, loc, lck, __kmp_user_lock_seq);
+    }
+    KMP_D_LOCK_FUNC(lk, set)(lk, global_tid);
+  } else {
+    kmp_indirect_lock_t *ilk = *((kmp_indirect_lock_t **)lk);
+    lck = ilk->lock;
+    KMP_DEBUG_ASSERT(lck != NULL);
+    if (__kmp_env_consistency_check) {
+      __kmp_push_sync(global_tid, ct_critical, loc, lck, __kmp_user_lock_seq);
     }
+    KMP_I_LOCK_FUNC(ilk, set)(lck, global_tid);
+  }
 
 #else // KMP_USE_DYNAMIC_LOCK
 
-    // We know that the fast reduction code is only emitted by Intel compilers
-    // with 32 byte critical sections. If there isn't enough space, then we
-    // have to use a pointer.
-    if ( __kmp_base_user_lock_size <= INTEL_CRITICAL_SIZE ) {
-        lck = (kmp_user_lock_p)crit;
-    }
-    else {
-        lck = __kmp_get_critical_section_ptr( crit, loc, global_tid );
-    }
-    KMP_DEBUG_ASSERT( lck != NULL );
+  // We know that the fast reduction code is only emitted by Intel compilers
+  // with 32 byte critical sections. If there isn't enough space, then we
+  // have to use a pointer.
+  if (__kmp_base_user_lock_size <= INTEL_CRITICAL_SIZE) {
+    lck = (kmp_user_lock_p)crit;
+  } else {
+    lck = __kmp_get_critical_section_ptr(crit, loc, global_tid);
+  }
+  KMP_DEBUG_ASSERT(lck != NULL);
 
-    if ( __kmp_env_consistency_check )
-        __kmp_push_sync( global_tid, ct_critical, loc, lck );
+  if (__kmp_env_consistency_check)
+    __kmp_push_sync(global_tid, ct_critical, loc, lck);
 
-    __kmp_acquire_user_lock_with_checks( lck, global_tid );
+  __kmp_acquire_user_lock_with_checks(lck, global_tid);
 
 #endif // KMP_USE_DYNAMIC_LOCK
 }
 
 // used in a critical section reduce block
 static __forceinline void
-__kmp_end_critical_section_reduce_block( ident_t * loc, kmp_int32 global_tid, kmp_critical_name * crit ) {
+__kmp_end_critical_section_reduce_block(ident_t *loc, kmp_int32 global_tid,
+                                        kmp_critical_name *crit) {
 
-    kmp_user_lock_p lck;
+  kmp_user_lock_p lck;
 
 #if KMP_USE_DYNAMIC_LOCK
 
-    if (KMP_IS_D_LOCK(__kmp_user_lock_seq)) {
-        lck = (kmp_user_lock_p)crit;
-        if (__kmp_env_consistency_check)
-            __kmp_pop_sync(global_tid, ct_critical, loc);
-        KMP_D_LOCK_FUNC(lck, unset)((kmp_dyna_lock_t *)lck, global_tid);
-    } else {
-        kmp_indirect_lock_t *ilk = (kmp_indirect_lock_t *)TCR_PTR(*((kmp_indirect_lock_t **)crit));
-        if (__kmp_env_consistency_check)
-            __kmp_pop_sync(global_tid, ct_critical, loc);
-        KMP_I_LOCK_FUNC(ilk, unset)(ilk->lock, global_tid);
-    }
+  if (KMP_IS_D_LOCK(__kmp_user_lock_seq)) {
+    lck = (kmp_user_lock_p)crit;
+    if (__kmp_env_consistency_check)
+      __kmp_pop_sync(global_tid, ct_critical, loc);
+    KMP_D_LOCK_FUNC(lck, unset)((kmp_dyna_lock_t *)lck, global_tid);
+  } else {
+    kmp_indirect_lock_t *ilk =
+        (kmp_indirect_lock_t *)TCR_PTR(*((kmp_indirect_lock_t **)crit));
+    if (__kmp_env_consistency_check)
+      __kmp_pop_sync(global_tid, ct_critical, loc);
+    KMP_I_LOCK_FUNC(ilk, unset)(ilk->lock, global_tid);
+  }
 
 #else // KMP_USE_DYNAMIC_LOCK
 
-    // We know that the fast reduction code is only emitted by Intel compilers with 32 byte critical
-    // sections. If there isn't enough space, then we have to use a pointer.
-    if ( __kmp_base_user_lock_size > 32 ) {
-        lck = *( (kmp_user_lock_p *) crit );
-        KMP_ASSERT( lck != NULL );
-    } else {
-        lck = (kmp_user_lock_p) crit;
-    }
+  // We know that the fast reduction code is only emitted by Intel compilers
+  // with 32 byte critical sections. If there isn't enough space, then we have
+  // to use a pointer.
+  if (__kmp_base_user_lock_size > 32) {
+    lck = *((kmp_user_lock_p *)crit);
+    KMP_ASSERT(lck != NULL);
+  } else {
+    lck = (kmp_user_lock_p)crit;
+  }
 
-    if ( __kmp_env_consistency_check )
-        __kmp_pop_sync( global_tid, ct_critical, loc );
+  if (__kmp_env_consistency_check)
+    __kmp_pop_sync(global_tid, ct_critical, loc);
 
-    __kmp_release_user_lock_with_checks( lck, global_tid );
+  __kmp_release_user_lock_with_checks(lck, global_tid);
 
 #endif // KMP_USE_DYNAMIC_LOCK
 } // __kmp_end_critical_section_reduce_block
 
-
 /* 2.a.i. Reduce Block without a terminating barrier */
 /*!
 @ingroup SYNCHRONIZATION
@@ -2650,141 +2577,165 @@ __kmp_end_critical_section_reduce_block(
 @param num_vars number of items (variables) to be reduced
 @param reduce_size size of data in bytes to be reduced
 @param reduce_data pointer to data to be reduced
-@param reduce_func callback function providing reduction operation on two operands and returning result of reduction in lhs_data
+@param reduce_func callback function providing reduction operation on two
+operands and returning result of reduction in lhs_data
 @param lck pointer to the unique lock data structure
-@result 1 for the master thread, 0 for all other team threads, 2 for all team threads if atomic reduction needed
+@result 1 for the master thread, 0 for all other team threads, 2 for all team
+threads if atomic reduction needed
 
 The nowait version is used for a reduce clause with the nowait argument.
 */
 kmp_int32
-__kmpc_reduce_nowait(
-    ident_t *loc, kmp_int32 global_tid,
-    kmp_int32 num_vars, size_t reduce_size, void *reduce_data, void (*reduce_func)(void *lhs_data, void *rhs_data),
-    kmp_critical_name *lck ) {
-
-    KMP_COUNT_BLOCK(REDUCE_nowait);
-    int retval = 0;
-    PACKED_REDUCTION_METHOD_T packed_reduction_method;
+__kmpc_reduce_nowait(ident_t *loc, kmp_int32 global_tid, kmp_int32 num_vars,
+                     size_t reduce_size, void *reduce_data,
+                     void (*reduce_func)(void *lhs_data, void *rhs_data),
+                     kmp_critical_name *lck) {
+
+  KMP_COUNT_BLOCK(REDUCE_nowait);
+  int retval = 0;
+  PACKED_REDUCTION_METHOD_T packed_reduction_method;
 #if OMP_40_ENABLED
-    kmp_team_t *team;
-    kmp_info_t *th;
-    int teams_swapped = 0, task_state;
-#endif
-    KA_TRACE( 10, ( "__kmpc_reduce_nowait() enter: called T#%d\n", global_tid ) );
-
-    // why do we need this initialization here at all?
-    // Reduction clause can not be used as a stand-alone directive.
+  kmp_team_t *team;
+  kmp_info_t *th;
+  int teams_swapped = 0, task_state;
+#endif
+  KA_TRACE(10, ("__kmpc_reduce_nowait() enter: called T#%d\n", global_tid));
+
+  // why do we need this initialization here at all?
+  // Reduction clause can not be used as a stand-alone directive.
+
+  // do not call __kmp_serial_initialize(), it will be called by
+  // __kmp_parallel_initialize() if needed
+  // possible detection of false-positive race by the threadchecker ???
+  if (!TCR_4(__kmp_init_parallel))
+    __kmp_parallel_initialize();
 
-    // do not call __kmp_serial_initialize(), it will be called by __kmp_parallel_initialize() if needed
-    // possible detection of false-positive race by the threadchecker ???
-    if( ! TCR_4( __kmp_init_parallel ) )
-        __kmp_parallel_initialize();
-
-    // check correctness of reduce block nesting
+// check correctness of reduce block nesting
 #if KMP_USE_DYNAMIC_LOCK
-    if ( __kmp_env_consistency_check )
-        __kmp_push_sync( global_tid, ct_reduce, loc, NULL, 0 );
+  if (__kmp_env_consistency_check)
+    __kmp_push_sync(global_tid, ct_reduce, loc, NULL, 0);
 #else
-    if ( __kmp_env_consistency_check )
-        __kmp_push_sync( global_tid, ct_reduce, loc, NULL );
+  if (__kmp_env_consistency_check)
+    __kmp_push_sync(global_tid, ct_reduce, loc, NULL);
 #endif
 
 #if OMP_40_ENABLED
-    th = __kmp_thread_from_gtid(global_tid);
-    if( th->th.th_teams_microtask ) {   // AC: check if we are inside the teams construct?
-        team = th->th.th_team;
-        if( team->t.t_level == th->th.th_teams_level ) {
-            // this is reduction at teams construct
-            KMP_DEBUG_ASSERT(!th->th.th_info.ds.ds_tid);  // AC: check that tid == 0
-            // Let's swap teams temporarily for the reduction barrier
-            teams_swapped = 1;
-            th->th.th_info.ds.ds_tid = team->t.t_master_tid;
-            th->th.th_team = team->t.t_parent;
-            th->th.th_team_nproc = th->th.th_team->t.t_nproc;
-            th->th.th_task_team = th->th.th_team->t.t_task_team[0];
-            task_state = th->th.th_task_state;
-            th->th.th_task_state = 0;
-        }
+  th = __kmp_thread_from_gtid(global_tid);
+  if (th->th.th_teams_microtask) { // AC: check if we are inside the teams
+    // construct?
+    team = th->th.th_team;
+    if (team->t.t_level == th->th.th_teams_level) {
+      // this is reduction at teams construct
+      KMP_DEBUG_ASSERT(!th->th.th_info.ds.ds_tid); // AC: check that tid == 0
+      // Let's swap teams temporarily for the reduction barrier
+      teams_swapped = 1;
+      th->th.th_info.ds.ds_tid = team->t.t_master_tid;
+      th->th.th_team = team->t.t_parent;
+      th->th.th_team_nproc = th->th.th_team->t.t_nproc;
+      th->th.th_task_team = th->th.th_team->t.t_task_team[0];
+      task_state = th->th.th_task_state;
+      th->th.th_task_state = 0;
     }
+  }
 #endif // OMP_40_ENABLED
 
-    // packed_reduction_method value will be reused by __kmp_end_reduce* function, the value should be kept in a variable
-    // the variable should be either a construct-specific or thread-specific property, not a team specific property
-    //     (a thread can reach the next reduce block on the next construct, reduce method may differ on the next construct)
-    // an ident_t "loc" parameter could be used as a construct-specific property (what if loc == 0?)
-    //     (if both construct-specific and team-specific variables were shared, then unness extra syncs should be needed)
-    // a thread-specific variable is better regarding two issues above (next construct and extra syncs)
-    // a thread-specific "th_local.reduction_method" variable is used currently
-    // each thread executes 'determine' and 'set' lines (no need to execute by one thread, to avoid unness extra syncs)
-
-    packed_reduction_method = __kmp_determine_reduction_method( loc, global_tid, num_vars, reduce_size, reduce_data, reduce_func, lck );
-    __KMP_SET_REDUCTION_METHOD( global_tid, packed_reduction_method );
-
-    if( packed_reduction_method == critical_reduce_block ) {
-
-        __kmp_enter_critical_section_reduce_block( loc, global_tid, lck );
-        retval = 1;
-
-    } else if( packed_reduction_method == empty_reduce_block ) {
-
-        // usage: if team size == 1, no synchronization is required ( Intel platforms only )
-        retval = 1;
-
-    } else if( packed_reduction_method == atomic_reduce_block ) {
-
-        retval = 2;
-
-        // all threads should do this pop here (because __kmpc_end_reduce_nowait() won't be called by the code gen)
-        //     (it's not quite good, because the checking block has been closed by this 'pop',
-        //      but atomic operation has not been executed yet, will be executed slightly later, literally on next instruction)
-        if ( __kmp_env_consistency_check )
-            __kmp_pop_sync( global_tid, ct_reduce, loc );
-
-    } else if( TEST_REDUCTION_METHOD( packed_reduction_method, tree_reduce_block ) ) {
-
-        //AT: performance issue: a real barrier here
-        //AT:     (if master goes slow, other threads are blocked here waiting for the master to come and release them)
-        //AT:     (it's not what a customer might expect specifying NOWAIT clause)
-        //AT:     (specifying NOWAIT won't result in improvement of performance, it'll be confusing to a customer)
-        //AT: another implementation of *barrier_gather*nowait() (or some other design) might go faster
-        //        and be more in line with sense of NOWAIT
-        //AT: TO DO: do epcc test and compare times
-
-        // this barrier should be invisible to a customer and to the threading profile tool
-        //              (it's neither a terminating barrier nor customer's code, it's used for an internal purpose)
+  // packed_reduction_method value will be reused by __kmp_end_reduce* function,
+  // the value should be kept in a variable
+  // the variable should be either a construct-specific or thread-specific
+  // property, not a team specific property
+  //     (a thread can reach the next reduce block on the next construct, reduce
+  //     method may differ on the next construct)
+  // an ident_t "loc" parameter could be used as a construct-specific property
+  // (what if loc == 0?)
+  //     (if both construct-specific and team-specific variables were shared,
+  //     then unnecessary extra syncs would be needed)
+  // a thread-specific variable is better regarding two issues above (next
+  // construct and extra syncs)
+  // a thread-specific "th_local.reduction_method" variable is used currently
+  // each thread executes 'determine' and 'set' lines (no need to execute by one
+  // thread, to avoid unnecessary extra syncs)
+
+  packed_reduction_method = __kmp_determine_reduction_method(
+      loc, global_tid, num_vars, reduce_size, reduce_data, reduce_func, lck);
+  __KMP_SET_REDUCTION_METHOD(global_tid, packed_reduction_method);
+
+  if (packed_reduction_method == critical_reduce_block) {
+
+    __kmp_enter_critical_section_reduce_block(loc, global_tid, lck);
+    retval = 1;
+
+  } else if (packed_reduction_method == empty_reduce_block) {
+
+    // usage: if team size == 1, no synchronization is required ( Intel
+    // platforms only )
+    retval = 1;
+
+  } else if (packed_reduction_method == atomic_reduce_block) {
+
+    retval = 2;
+
+    // all threads should do this pop here (because __kmpc_end_reduce_nowait()
+    // won't be called by the code gen)
+    //     (it's not quite good, because the checking block has been closed by
+    //     this 'pop',
+    //      but atomic operation has not been executed yet, will be executed
+    //      slightly later, literally on next instruction)
+    if (__kmp_env_consistency_check)
+      __kmp_pop_sync(global_tid, ct_reduce, loc);
+
+  } else if (TEST_REDUCTION_METHOD(packed_reduction_method,
+                                   tree_reduce_block)) {
+
+// AT: performance issue: a real barrier here
+// AT:     (if master goes slow, other threads are blocked here waiting for the
+// master to come and release them)
+// AT:     (it's not what a customer might expect specifying NOWAIT clause)
+// AT:     (specifying NOWAIT won't result in improvement of performance, it'll
+// be confusing to a customer)
+// AT: another implementation of *barrier_gather*nowait() (or some other design)
+// might go faster and be more in line with the sense of NOWAIT
+// AT: TO DO: do epcc test and compare times
+
+// this barrier should be invisible to a customer and to the threading profile
+// tool (it's neither a terminating barrier nor customer's code, it's
+// used for an internal purpose)
 #if USE_ITT_NOTIFY
-        __kmp_threads[global_tid]->th.th_ident = loc;
+    __kmp_threads[global_tid]->th.th_ident = loc;
 #endif
-        retval = __kmp_barrier( UNPACK_REDUCTION_BARRIER( packed_reduction_method ), global_tid, FALSE, reduce_size, reduce_data, reduce_func );
-        retval = ( retval != 0 ) ? ( 0 ) : ( 1 );
-
-        // all other workers except master should do this pop here
-        //     ( none of other workers will get to __kmpc_end_reduce_nowait() )
-        if ( __kmp_env_consistency_check ) {
-            if( retval == 0 ) {
-                __kmp_pop_sync( global_tid, ct_reduce, loc );
-            }
-        }
-
-    } else {
+    retval =
+        __kmp_barrier(UNPACK_REDUCTION_BARRIER(packed_reduction_method),
+                      global_tid, FALSE, reduce_size, reduce_data, reduce_func);
+    retval = (retval != 0) ? (0) : (1);
+
+    // all other workers except master should do this pop here
+    //     ( none of other workers will get to __kmpc_end_reduce_nowait() )
+    if (__kmp_env_consistency_check) {
+      if (retval == 0) {
+        __kmp_pop_sync(global_tid, ct_reduce, loc);
+      }
+    }
 
-        // should never reach this block
-        KMP_ASSERT( 0 ); // "unexpected method"
+  } else {
 
-    }
+    // should never reach this block
+    KMP_ASSERT(0); // "unexpected method"
+  }
 #if OMP_40_ENABLED
-    if( teams_swapped ) {
-        // Restore thread structure
-        th->th.th_info.ds.ds_tid = 0;
-        th->th.th_team = team;
-        th->th.th_team_nproc = team->t.t_nproc;
-        th->th.th_task_team = team->t.t_task_team[task_state];
-        th->th.th_task_state = task_state;
-    }
+  if (teams_swapped) {
+    // Restore thread structure
+    th->th.th_info.ds.ds_tid = 0;
+    th->th.th_team = team;
+    th->th.th_team_nproc = team->t.t_nproc;
+    th->th.th_task_team = team->t.t_task_team[task_state];
+    th->th.th_task_state = task_state;
+  }
 #endif
-    KA_TRACE( 10, ( "__kmpc_reduce_nowait() exit: called T#%d: method %08x, returns %08x\n", global_tid, packed_reduction_method, retval ) );
+  KA_TRACE(
+      10,
+      ("__kmpc_reduce_nowait() exit: called T#%d: method %08x, returns %08x\n",
+       global_tid, packed_reduction_method, retval));
 
-    return retval;
+  return retval;
 }
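
To make the calling convention documented above concrete (return value 1: this thread combines its private copy and calls __kmpc_end_reduce_nowait; 2: the atomic method was chosen and each thread updates the shared variable atomically; 0: a tree reduction already consumed this thread's contribution), here is a hedged sketch of what a compiler might emit for a `reduction(+ : sum) nowait` clause. The helper names (my_reduce_func, reduce_nowait_epilogue, my_sum) are illustrative only, and the atomic case uses a plain GCC builtin where real code generation would typically emit an __kmpc_atomic_* call:

@code
#include "kmp.h" // runtime-internal header declaring the __kmpc_* entry points

/* Combine two partial results; matches the reduce_func signature. */
static void my_reduce_func(void *lhs_data, void *rhs_data) {
  *(int *)lhs_data += *(int *)rhs_data;
}

/* Stand-in for the epilogue a compiler might emit after the loop body:
 * my_sum is this thread's partial result, crit is the compiler-provided
 * kmp_critical_name scratch location. */
static void reduce_nowait_epilogue(ident_t *loc, kmp_int32 gtid, int *sum,
                                   int my_sum, kmp_critical_name *crit) {
  switch (__kmpc_reduce_nowait(loc, gtid, /*num_vars=*/1, sizeof(int), &my_sum,
                               my_reduce_func, crit)) {
  case 1: /* fold the private copy into the shared variable, then end */
    *sum += my_sum;
    __kmpc_end_reduce_nowait(loc, gtid, crit);
    break;
  case 2: /* atomic method: every thread updates the shared variable */
    __sync_fetch_and_add(sum, my_sum);
    break;
  default: /* 0: nothing left to do for this thread */
    break;
  }
}
@endcode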
 
 /*!
@@ -2795,47 +2746,49 @@ __kmpc_reduce_nowait(
 
 Finish the execution of a reduce nowait.
 */
-void
-__kmpc_end_reduce_nowait( ident_t *loc, kmp_int32 global_tid, kmp_critical_name *lck ) {
+void __kmpc_end_reduce_nowait(ident_t *loc, kmp_int32 global_tid,
+                              kmp_critical_name *lck) {
 
-    PACKED_REDUCTION_METHOD_T packed_reduction_method;
+  PACKED_REDUCTION_METHOD_T packed_reduction_method;
 
-    KA_TRACE( 10, ( "__kmpc_end_reduce_nowait() enter: called T#%d\n", global_tid ) );
+  KA_TRACE(10, ("__kmpc_end_reduce_nowait() enter: called T#%d\n", global_tid));
 
-    packed_reduction_method = __KMP_GET_REDUCTION_METHOD( global_tid );
+  packed_reduction_method = __KMP_GET_REDUCTION_METHOD(global_tid);
 
-    if( packed_reduction_method == critical_reduce_block ) {
+  if (packed_reduction_method == critical_reduce_block) {
 
-        __kmp_end_critical_section_reduce_block( loc, global_tid, lck );
+    __kmp_end_critical_section_reduce_block(loc, global_tid, lck);
 
-    } else if( packed_reduction_method == empty_reduce_block ) {
+  } else if (packed_reduction_method == empty_reduce_block) {
 
-        // usage: if team size == 1, no synchronization is required ( on Intel platforms only )
+    // usage: if team size == 1, no synchronization is required ( on Intel
+    // platforms only )
 
-    } else if( packed_reduction_method == atomic_reduce_block ) {
+  } else if (packed_reduction_method == atomic_reduce_block) {
 
-        // neither master nor other workers should get here
-        //     (code gen does not generate this call in case 2: atomic reduce block)
-        // actually it's better to remove this elseif at all;
-        // after removal this value will checked by the 'else' and will assert
+    // neither master nor other workers should get here
+    //     (code gen does not generate this call in case 2: atomic reduce block)
+    // actually it would be better to remove this elseif altogether;
+    // after removal this value will be checked by the 'else' and will assert
 
-    } else if( TEST_REDUCTION_METHOD( packed_reduction_method, tree_reduce_block ) ) {
+  } else if (TEST_REDUCTION_METHOD(packed_reduction_method,
+                                   tree_reduce_block)) {
 
-        // only master gets here
+    // only master gets here
 
-    } else {
-
-        // should never reach this block
-        KMP_ASSERT( 0 ); // "unexpected method"
+  } else {
 
-    }
+    // should never reach this block
+    KMP_ASSERT(0); // "unexpected method"
+  }
 
-    if ( __kmp_env_consistency_check )
-        __kmp_pop_sync( global_tid, ct_reduce, loc );
+  if (__kmp_env_consistency_check)
+    __kmp_pop_sync(global_tid, ct_reduce, loc);
 
-    KA_TRACE( 10, ( "__kmpc_end_reduce_nowait() exit: called T#%d: method %08x\n", global_tid, packed_reduction_method ) );
+  KA_TRACE(10, ("__kmpc_end_reduce_nowait() exit: called T#%d: method %08x\n",
+                global_tid, packed_reduction_method));
 
-    return;
+  return;
 }
 
 /* 2.a.ii. Reduce Block with a terminating barrier */
@@ -2847,88 +2800,95 @@ __kmpc_end_reduce_nowait( ident_t *loc,
 @param num_vars number of items (variables) to be reduced
 @param reduce_size size of data in bytes to be reduced
 @param reduce_data pointer to data to be reduced
-@param reduce_func callback function providing reduction operation on two operands and returning result of reduction in lhs_data
+@param reduce_func callback function providing reduction operation on two
+operands and returning result of reduction in lhs_data
 @param lck pointer to the unique lock data structure
-@result 1 for the master thread, 0 for all other team threads, 2 for all team threads if atomic reduction needed
+@result 1 for the master thread, 0 for all other team threads, 2 for all team
+threads if atomic reduction needed
 
 A blocking reduce that includes an implicit barrier.
 */
-kmp_int32
-__kmpc_reduce(
-    ident_t *loc, kmp_int32 global_tid,
-    kmp_int32 num_vars, size_t reduce_size, void *reduce_data,
-    void (*reduce_func)(void *lhs_data, void *rhs_data),
-    kmp_critical_name *lck )
-{
-    KMP_COUNT_BLOCK(REDUCE_wait);
-    int retval = 0;
-    PACKED_REDUCTION_METHOD_T packed_reduction_method;
-
-    KA_TRACE( 10, ( "__kmpc_reduce() enter: called T#%d\n", global_tid ) );
-
-    // why do we need this initialization here at all?
-    // Reduction clause can not be a stand-alone directive.
-
-    // do not call __kmp_serial_initialize(), it will be called by __kmp_parallel_initialize() if needed
-    // possible detection of false-positive race by the threadchecker ???
-    if( ! TCR_4( __kmp_init_parallel ) )
-        __kmp_parallel_initialize();
+kmp_int32 __kmpc_reduce(ident_t *loc, kmp_int32 global_tid, kmp_int32 num_vars,
+                        size_t reduce_size, void *reduce_data,
+                        void (*reduce_func)(void *lhs_data, void *rhs_data),
+                        kmp_critical_name *lck) {
+  KMP_COUNT_BLOCK(REDUCE_wait);
+  int retval = 0;
+  PACKED_REDUCTION_METHOD_T packed_reduction_method;
+
+  KA_TRACE(10, ("__kmpc_reduce() enter: called T#%d\n", global_tid));
+
+  // why do we need this initialization here at all?
+  // Reduction clause can not be a stand-alone directive.
+
+  // do not call __kmp_serial_initialize(), it will be called by
+  // __kmp_parallel_initialize() if needed
+  // possible detection of false-positive race by the threadchecker ???
+  if (!TCR_4(__kmp_init_parallel))
+    __kmp_parallel_initialize();
 
-    // check correctness of reduce block nesting
+// check correctness of reduce block nesting
 #if KMP_USE_DYNAMIC_LOCK
-    if ( __kmp_env_consistency_check )
-        __kmp_push_sync( global_tid, ct_reduce, loc, NULL, 0 );
+  if (__kmp_env_consistency_check)
+    __kmp_push_sync(global_tid, ct_reduce, loc, NULL, 0);
 #else
-    if ( __kmp_env_consistency_check )
-        __kmp_push_sync( global_tid, ct_reduce, loc, NULL );
+  if (__kmp_env_consistency_check)
+    __kmp_push_sync(global_tid, ct_reduce, loc, NULL);
 #endif
 
-    packed_reduction_method = __kmp_determine_reduction_method( loc, global_tid, num_vars, reduce_size, reduce_data, reduce_func, lck );
-    __KMP_SET_REDUCTION_METHOD( global_tid, packed_reduction_method );
+  packed_reduction_method = __kmp_determine_reduction_method(
+      loc, global_tid, num_vars, reduce_size, reduce_data, reduce_func, lck);
+  __KMP_SET_REDUCTION_METHOD(global_tid, packed_reduction_method);
 
-    if( packed_reduction_method == critical_reduce_block ) {
+  if (packed_reduction_method == critical_reduce_block) {
 
-        __kmp_enter_critical_section_reduce_block( loc, global_tid, lck );
-        retval = 1;
+    __kmp_enter_critical_section_reduce_block(loc, global_tid, lck);
+    retval = 1;
 
-    } else if( packed_reduction_method == empty_reduce_block ) {
+  } else if (packed_reduction_method == empty_reduce_block) {
 
-        // usage: if team size == 1, no synchronization is required ( Intel platforms only )
-        retval = 1;
+    // usage: if team size == 1, no synchronization is required ( Intel
+    // platforms only )
+    retval = 1;
 
-    } else if( packed_reduction_method == atomic_reduce_block ) {
+  } else if (packed_reduction_method == atomic_reduce_block) {
 
-        retval = 2;
+    retval = 2;
 
-    } else if( TEST_REDUCTION_METHOD( packed_reduction_method, tree_reduce_block ) ) {
+  } else if (TEST_REDUCTION_METHOD(packed_reduction_method,
+                                   tree_reduce_block)) {
 
-        //case tree_reduce_block:
-        // this barrier should be visible to a customer and to the threading profile tool
-        //              (it's a terminating barrier on constructs if NOWAIT not specified)
+// case tree_reduce_block:
+// this barrier should be visible to a customer and to the threading profile
+// tool (it's a terminating barrier on constructs if NOWAIT not specified)
 #if USE_ITT_NOTIFY
-        __kmp_threads[global_tid]->th.th_ident = loc; // needed for correct notification of frames
+    __kmp_threads[global_tid]->th.th_ident =
+        loc; // needed for correct notification of frames
 #endif
-        retval = __kmp_barrier( UNPACK_REDUCTION_BARRIER( packed_reduction_method ), global_tid, TRUE, reduce_size, reduce_data, reduce_func );
-        retval = ( retval != 0 ) ? ( 0 ) : ( 1 );
-
-        // all other workers except master should do this pop here
-        //     ( none of other workers except master will enter __kmpc_end_reduce() )
-        if ( __kmp_env_consistency_check ) {
-            if( retval == 0 ) { // 0: all other workers; 1: master
-                __kmp_pop_sync( global_tid, ct_reduce, loc );
-            }
-        }
-
-    } else {
+    retval =
+        __kmp_barrier(UNPACK_REDUCTION_BARRIER(packed_reduction_method),
+                      global_tid, TRUE, reduce_size, reduce_data, reduce_func);
+    retval = (retval != 0) ? (0) : (1);
+
+    // all other workers except master should do this pop here
+    // ( none of other workers except master will enter __kmpc_end_reduce() )
+    if (__kmp_env_consistency_check) {
+      if (retval == 0) { // 0: all other workers; 1: master
+        __kmp_pop_sync(global_tid, ct_reduce, loc);
+      }
+    }
 
-        // should never reach this block
-        KMP_ASSERT( 0 ); // "unexpected method"
+  } else {
 
-    }
+    // should never reach this block
+    KMP_ASSERT(0); // "unexpected method"
+  }
 
-    KA_TRACE( 10, ( "__kmpc_reduce() exit: called T#%d: method %08x, returns %08x\n", global_tid, packed_reduction_method, retval ) );
+  KA_TRACE(10,
+           ("__kmpc_reduce() exit: called T#%d: method %08x, returns %08x\n",
+            global_tid, packed_reduction_method, retval));
 
-    return retval;
+  return retval;
 }
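
The blocking variant just above follows the same shape; the difference is that the terminating barrier comes from __kmpc_end_reduce. A hedged continuation of the nowait sketch (my_reduce_func and the other illustrative names carry over from it; that the atomic path also calls __kmpc_end_reduce is inferred from the atomic_reduce_block branch in its body rather than stated explicitly):

@code
static void reduce_epilogue(ident_t *loc, kmp_int32 gtid, int *sum, int my_sum,
                            kmp_critical_name *crit) {
  switch (__kmpc_reduce(loc, gtid, 1, sizeof(int), &my_sum, my_reduce_func,
                        crit)) {
  case 1: /* combine, then let __kmpc_end_reduce finish the construct */
    *sum += my_sum;
    __kmpc_end_reduce(loc, gtid, crit);
    break;
  case 2: /* atomic update; __kmpc_end_reduce supplies the closing barrier */
    __sync_fetch_and_add(sum, my_sum);
    __kmpc_end_reduce(loc, gtid, crit);
    break;
  default: /* 0: this worker is released when the master ends the reduce */
    break;
  }
}
@endcode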
 
 /*!
@@ -2938,103 +2898,103 @@ __kmpc_reduce(
 @param lck pointer to the unique lock data structure
 
 Finish the execution of a blocking reduce.
-The <tt>lck</tt> pointer must be the same as that used in the corresponding start function.
+The <tt>lck</tt> pointer must be the same as that used in the corresponding
+start function.
 */
-void
-__kmpc_end_reduce( ident_t *loc, kmp_int32 global_tid, kmp_critical_name *lck ) {
+void __kmpc_end_reduce(ident_t *loc, kmp_int32 global_tid,
+                       kmp_critical_name *lck) {
 
-    PACKED_REDUCTION_METHOD_T packed_reduction_method;
+  PACKED_REDUCTION_METHOD_T packed_reduction_method;
 
-    KA_TRACE( 10, ( "__kmpc_end_reduce() enter: called T#%d\n", global_tid ) );
+  KA_TRACE(10, ("__kmpc_end_reduce() enter: called T#%d\n", global_tid));
 
-    packed_reduction_method = __KMP_GET_REDUCTION_METHOD( global_tid );
+  packed_reduction_method = __KMP_GET_REDUCTION_METHOD(global_tid);
 
-    // this barrier should be visible to a customer and to the threading profile tool
-    //              (it's a terminating barrier on constructs if NOWAIT not specified)
+  // this barrier should be visible to a customer and to the threading profile
+  // tool (it's a terminating barrier on constructs if NOWAIT not specified)
 
-    if( packed_reduction_method == critical_reduce_block ) {
+  if (packed_reduction_method == critical_reduce_block) {
 
-        __kmp_end_critical_section_reduce_block( loc, global_tid, lck );
+    __kmp_end_critical_section_reduce_block(loc, global_tid, lck);
 
-        // TODO: implicit barrier: should be exposed
+// TODO: implicit barrier: should be exposed
 #if USE_ITT_NOTIFY
-        __kmp_threads[global_tid]->th.th_ident = loc;
+    __kmp_threads[global_tid]->th.th_ident = loc;
 #endif
-        __kmp_barrier( bs_plain_barrier, global_tid, FALSE, 0, NULL, NULL );
+    __kmp_barrier(bs_plain_barrier, global_tid, FALSE, 0, NULL, NULL);
 
-    } else if( packed_reduction_method == empty_reduce_block ) {
+  } else if (packed_reduction_method == empty_reduce_block) {
 
-        // usage: if team size == 1, no synchronization is required ( Intel platforms only )
+// usage: if team size==1, no synchronization is required (Intel platforms only)
 
-        // TODO: implicit barrier: should be exposed
+// TODO: implicit barrier: should be exposed
 #if USE_ITT_NOTIFY
-        __kmp_threads[global_tid]->th.th_ident = loc;
+    __kmp_threads[global_tid]->th.th_ident = loc;
 #endif
-        __kmp_barrier( bs_plain_barrier, global_tid, FALSE, 0, NULL, NULL );
+    __kmp_barrier(bs_plain_barrier, global_tid, FALSE, 0, NULL, NULL);
 
-    } else if( packed_reduction_method == atomic_reduce_block ) {
+  } else if (packed_reduction_method == atomic_reduce_block) {
 
-        // TODO: implicit barrier: should be exposed
+// TODO: implicit barrier: should be exposed
 #if USE_ITT_NOTIFY
-        __kmp_threads[global_tid]->th.th_ident = loc;
+    __kmp_threads[global_tid]->th.th_ident = loc;
 #endif
-        __kmp_barrier( bs_plain_barrier, global_tid, FALSE, 0, NULL, NULL );
-
-    } else if( TEST_REDUCTION_METHOD( packed_reduction_method, tree_reduce_block ) ) {
+    __kmp_barrier(bs_plain_barrier, global_tid, FALSE, 0, NULL, NULL);
 
-        // only master executes here (master releases all other workers)
-        __kmp_end_split_barrier( UNPACK_REDUCTION_BARRIER( packed_reduction_method ), global_tid );
+  } else if (TEST_REDUCTION_METHOD(packed_reduction_method,
+                                   tree_reduce_block)) {
 
-    } else {
+    // only master executes here (master releases all other workers)
+    __kmp_end_split_barrier(UNPACK_REDUCTION_BARRIER(packed_reduction_method),
+                            global_tid);
 
-        // should never reach this block
-        KMP_ASSERT( 0 ); // "unexpected method"
+  } else {
 
-    }
+    // should never reach this block
+    KMP_ASSERT(0); // "unexpected method"
+  }
 
-    if ( __kmp_env_consistency_check )
-        __kmp_pop_sync( global_tid, ct_reduce, loc );
+  if (__kmp_env_consistency_check)
+    __kmp_pop_sync(global_tid, ct_reduce, loc);
 
-    KA_TRACE( 10, ( "__kmpc_end_reduce() exit: called T#%d: method %08x\n", global_tid, packed_reduction_method ) );
+  KA_TRACE(10, ("__kmpc_end_reduce() exit: called T#%d: method %08x\n",
+                global_tid, packed_reduction_method));
 
-    return;
+  return;
 }
 
 #undef __KMP_GET_REDUCTION_METHOD
 #undef __KMP_SET_REDUCTION_METHOD
 
-/*-- end of interface to fast scalable reduce routines ---------------------------------------------------------------*/
+/* end of interface to fast scalable reduce routines */
 
-kmp_uint64
-__kmpc_get_taskid() {
+kmp_uint64 __kmpc_get_taskid() {
 
-    kmp_int32    gtid;
-    kmp_info_t * thread;
+  kmp_int32 gtid;
+  kmp_info_t *thread;
 
-    gtid = __kmp_get_gtid();
-    if ( gtid < 0 ) {
-        return 0;
-    }; // if
-    thread = __kmp_thread_from_gtid( gtid );
-    return thread->th.th_current_task->td_task_id;
+  gtid = __kmp_get_gtid();
+  if (gtid < 0) {
+    return 0;
+  }; // if
+  thread = __kmp_thread_from_gtid(gtid);
+  return thread->th.th_current_task->td_task_id;
 
 } // __kmpc_get_taskid
 
+kmp_uint64 __kmpc_get_parent_taskid() {
 
-kmp_uint64
-__kmpc_get_parent_taskid() {
-
-    kmp_int32        gtid;
-    kmp_info_t *     thread;
-    kmp_taskdata_t * parent_task;
-
-    gtid = __kmp_get_gtid();
-    if ( gtid < 0 ) {
-        return 0;
-    }; // if
-    thread      = __kmp_thread_from_gtid( gtid );
-    parent_task = thread->th.th_current_task->td_parent;
-    return ( parent_task == NULL ? 0 : parent_task->td_task_id );
+  kmp_int32 gtid;
+  kmp_info_t *thread;
+  kmp_taskdata_t *parent_task;
+
+  gtid = __kmp_get_gtid();
+  if (gtid < 0) {
+    return 0;
+  }; // if
+  thread = __kmp_thread_from_gtid(gtid);
+  parent_task = thread->th.th_current_task->td_parent;
+  return (parent_task == NULL ? 0 : parent_task->td_task_id);
 
 } // __kmpc_get_parent_taskid
 
@@ -3050,282 +3010,292 @@ Initialize doacross loop information.
 Expect the compiler to send us inclusive bounds,
 e.g. for(i=2;i<9;i+=2) lo=2, up=8, st=2.
 */
-void
-__kmpc_doacross_init(ident_t *loc, int gtid, int num_dims, struct kmp_dim * dims)
-{
-    int j, idx;
-    kmp_int64 last, trace_count;
-    kmp_info_t *th = __kmp_threads[gtid];
-    kmp_team_t *team = th->th.th_team;
-    kmp_uint32 *flags;
-    kmp_disp_t *pr_buf = th->th.th_dispatch;
-    dispatch_shared_info_t *sh_buf;
-
-    KA_TRACE(20,("__kmpc_doacross_init() enter: called T#%d, num dims %d, active %d\n",
-                 gtid, num_dims, !team->t.t_serialized));
-    KMP_DEBUG_ASSERT(dims != NULL);
-    KMP_DEBUG_ASSERT(num_dims > 0);
-
-    if( team->t.t_serialized ) {
-        KA_TRACE(20,("__kmpc_doacross_init() exit: serialized team\n"));
-        return; // no dependencies if team is serialized
-    }
-    KMP_DEBUG_ASSERT(team->t.t_nproc > 1);
-    idx = pr_buf->th_doacross_buf_idx++; // Increment index of shared buffer for the next loop
-    sh_buf = &team->t.t_disp_buffer[idx % __kmp_dispatch_num_buffers];
-
-    // Save bounds info into allocated private buffer
-    KMP_DEBUG_ASSERT(pr_buf->th_doacross_info == NULL);
-    pr_buf->th_doacross_info =
-        (kmp_int64*)__kmp_thread_malloc(th, sizeof(kmp_int64)*(4 * num_dims + 1));
-    KMP_DEBUG_ASSERT(pr_buf->th_doacross_info != NULL);
-    pr_buf->th_doacross_info[0] = (kmp_int64)num_dims; // first element is number of dimensions
-    // Save also address of num_done in order to access it later without knowing the buffer index
-    pr_buf->th_doacross_info[1] = (kmp_int64)&sh_buf->doacross_num_done;
-    pr_buf->th_doacross_info[2] = dims[0].lo;
-    pr_buf->th_doacross_info[3] = dims[0].up;
-    pr_buf->th_doacross_info[4] = dims[0].st;
-    last = 5;
-    for( j = 1; j < num_dims; ++j ) {
-        kmp_int64 range_length; // To keep ranges of all dimensions but the first dims[0]
-        if( dims[j].st == 1 ) { // most common case
-            // AC: should we care of ranges bigger than LLONG_MAX? (not for now)
-            range_length = dims[j].up - dims[j].lo + 1;
-        } else {
-            if( dims[j].st > 0 ) {
-                KMP_DEBUG_ASSERT(dims[j].up > dims[j].lo);
-                range_length = (kmp_uint64)(dims[j].up - dims[j].lo) / dims[j].st + 1;
-            } else {            // negative increment
-                KMP_DEBUG_ASSERT(dims[j].lo > dims[j].up);
-                range_length = (kmp_uint64)(dims[j].lo - dims[j].up) / (-dims[j].st) + 1;
-            }
-        }
-        pr_buf->th_doacross_info[last++] = range_length;
-        pr_buf->th_doacross_info[last++] = dims[j].lo;
-        pr_buf->th_doacross_info[last++] = dims[j].up;
-        pr_buf->th_doacross_info[last++] = dims[j].st;
-    }
-
-    // Compute total trip count.
-    // Start with range of dims[0] which we don't need to keep in the buffer.
-    if( dims[0].st == 1 ) { // most common case
-        trace_count = dims[0].up - dims[0].lo + 1;
-    } else if( dims[0].st > 0 ) {
-        KMP_DEBUG_ASSERT(dims[0].up > dims[0].lo);
-        trace_count = (kmp_uint64)(dims[0].up - dims[0].lo) / dims[0].st + 1;
-    } else {   // negative increment
-        KMP_DEBUG_ASSERT(dims[0].lo > dims[0].up);
-        trace_count = (kmp_uint64)(dims[0].lo - dims[0].up) / (-dims[0].st) + 1;
-    }
-    for( j = 1; j < num_dims; ++j ) {
-        trace_count *= pr_buf->th_doacross_info[4 * j + 1]; // use kept ranges
-    }
-    KMP_DEBUG_ASSERT(trace_count > 0);
-
-    // Check if shared buffer is not occupied by other loop (idx - __kmp_dispatch_num_buffers)
-    if( idx != sh_buf->doacross_buf_idx ) {
-        // Shared buffer is occupied, wait for it to be free
-        __kmp_wait_yield_4( (kmp_uint32*)&sh_buf->doacross_buf_idx, idx, __kmp_eq_4, NULL );
-    }
-    // Check if we are the first thread. After the CAS the first thread gets 0,
-    // others get 1 if initialization is in progress, allocated pointer otherwise.
-    flags = (kmp_uint32*)KMP_COMPARE_AND_STORE_RET64(
-        (kmp_int64*)&sh_buf->doacross_flags,NULL,(kmp_int64)1);
-    if( flags == NULL ) {
-        // we are the first thread, allocate the array of flags
-        kmp_int64 size = trace_count / 8 + 8; // in bytes, use single bit per iteration
-        sh_buf->doacross_flags = (kmp_uint32*)__kmp_thread_calloc(th, size, 1);
-    } else if( (kmp_int64)flags == 1 ) {
-        // initialization is still in progress, need to wait
-        while( (volatile kmp_int64)sh_buf->doacross_flags == 1 ) {
-            KMP_YIELD(TRUE);
-        }
-    }
-    KMP_DEBUG_ASSERT((kmp_int64)sh_buf->doacross_flags > 1); // check value of pointer
-    pr_buf->th_doacross_flags = sh_buf->doacross_flags;      // save private copy in order to not
-                                                             // touch shared buffer on each iteration
-    KA_TRACE(20,("__kmpc_doacross_init() exit: T#%d\n", gtid));
-}
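
For orientation, the doacross entry points being reformatted here implement OpenMP 4.5 cross-iteration dependences: the compiler passes the inclusive bounds described in the comment above through the kmp_dim descriptors, and (broadly) lowers `ordered depend(sink: ...)` to a wait and `ordered depend(source)` to a post. A hedged user-level sketch of the loop shape these calls serve (standard OpenMP; the exact lowering is the compiler's concern):

@code
#include <omp.h>
#include <stdio.h>

#define N 16

int main(void) {
  int a[N] = {0};
  a[0] = 1;
  /* Each iteration depends on the previous one; with ordered(1) the runtime
     tracks a one-dimensional iteration space via the doacross calls. */
#pragma omp parallel for ordered(1)
  for (int i = 1; i < N; i++) {
#pragma omp ordered depend(sink : i - 1) /* wait for iteration i-1 */
    a[i] = a[i - 1] + 1;
#pragma omp ordered depend(source) /* mark iteration i as finished */
  }
  printf("a[N-1] = %d\n", a[N - 1]); /* prints 16 */
  return 0;
}
@endcode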
-
-void
-__kmpc_doacross_wait(ident_t *loc, int gtid, long long *vec)
-{
-    kmp_int32 shft, num_dims, i;
-    kmp_uint32 flag;
-    kmp_int64 iter_number; // iteration number of "collapsed" loop nest
-    kmp_info_t *th = __kmp_threads[gtid];
-    kmp_team_t *team = th->th.th_team;
-    kmp_disp_t *pr_buf;
-    kmp_int64 lo, up, st;
-
-    KA_TRACE(20,("__kmpc_doacross_wait() enter: called T#%d\n", gtid));
-    if( team->t.t_serialized ) {
-        KA_TRACE(20,("__kmpc_doacross_wait() exit: serialized team\n"));
-        return; // no dependencies if team is serialized
-    }
-
-    // calculate sequential iteration number and check out-of-bounds condition
-    pr_buf = th->th.th_dispatch;
-    KMP_DEBUG_ASSERT(pr_buf->th_doacross_info != NULL);
-    num_dims = pr_buf->th_doacross_info[0];
-    lo = pr_buf->th_doacross_info[2];
-    up = pr_buf->th_doacross_info[3];
-    st = pr_buf->th_doacross_info[4];
-    if( st == 1 ) { // most common case
-        if( vec[0] < lo || vec[0] > up ) {
-            KA_TRACE(20,(
-                "__kmpc_doacross_wait() exit: T#%d iter %lld is out of bounds [%lld,%lld]\n",
-                gtid, vec[0], lo, up));
-            return;
-        }
-        iter_number = vec[0] - lo;
-    } else if( st > 0 ) {
-        if( vec[0] < lo || vec[0] > up ) {
-            KA_TRACE(20,(
-                "__kmpc_doacross_wait() exit: T#%d iter %lld is out of bounds [%lld,%lld]\n",
-                gtid, vec[0], lo, up));
-            return;
-        }
-        iter_number = (kmp_uint64)(vec[0] - lo) / st;
-    } else {        // negative increment
-        if( vec[0] > lo || vec[0] < up ) {
-            KA_TRACE(20,(
-                "__kmpc_doacross_wait() exit: T#%d iter %lld is out of bounds [%lld,%lld]\n",
-                gtid, vec[0], lo, up));
-            return;
-        }
-        iter_number = (kmp_uint64)(lo - vec[0]) / (-st);
-    }
-    for( i = 1; i < num_dims; ++i ) {
-        kmp_int64 iter, ln;
-        kmp_int32 j = i * 4;
-        ln = pr_buf->th_doacross_info[j + 1];
-        lo = pr_buf->th_doacross_info[j + 2];
-        up = pr_buf->th_doacross_info[j + 3];
-        st = pr_buf->th_doacross_info[j + 4];
-        if( st == 1 ) {
-            if( vec[i] < lo || vec[i] > up ) {
-                KA_TRACE(20,(
-                    "__kmpc_doacross_wait() exit: T#%d iter %lld is out of bounds [%lld,%lld]\n",
-                    gtid, vec[i], lo, up));
-                return;
-            }
-            iter = vec[i] - lo;
-        } else if( st > 0 ) {
-            if( vec[i] < lo || vec[i] > up ) {
-                KA_TRACE(20,(
-                    "__kmpc_doacross_wait() exit: T#%d iter %lld is out of bounds [%lld,%lld]\n",
-                    gtid, vec[i], lo, up));
-                return;
-            }
-            iter = (kmp_uint64)(vec[i] - lo) / st;
-        } else {   // st < 0
-            if( vec[i] > lo || vec[i] < up ) {
-                KA_TRACE(20,(
-                    "__kmpc_doacross_wait() exit: T#%d iter %lld is out of bounds [%lld,%lld]\n",
-                    gtid, vec[i], lo, up));
-                return;
-            }
-            iter = (kmp_uint64)(lo - vec[i]) / (-st);
-        }
-        iter_number = iter + ln * iter_number;
-    }
-    shft = iter_number % 32; // use 32-bit granularity
-    iter_number >>= 5;       // divided by 32
-    flag = 1 << shft;
-    while( (flag & pr_buf->th_doacross_flags[iter_number]) == 0 ) {
-        KMP_YIELD(TRUE);
-    }
-    KA_TRACE(20,("__kmpc_doacross_wait() exit: T#%d wait for iter %lld completed\n",
-                 gtid, (iter_number<<5)+shft));
-}
-
-void
-__kmpc_doacross_post(ident_t *loc, int gtid, long long *vec)
-{
-    kmp_int32 shft, num_dims, i;
-    kmp_uint32 flag;
-    kmp_int64 iter_number; // iteration number of "collapsed" loop nest
-    kmp_info_t *th = __kmp_threads[gtid];
-    kmp_team_t *team = th->th.th_team;
-    kmp_disp_t *pr_buf;
-    kmp_int64 lo, st;
-
-    KA_TRACE(20,("__kmpc_doacross_post() enter: called T#%d\n", gtid));
-    if( team->t.t_serialized ) {
-        KA_TRACE(20,("__kmpc_doacross_post() exit: serialized team\n"));
-        return; // no dependencies if team is serialized
-    }
-
-    // calculate sequential iteration number (same as in "wait" but no out-of-bounds checks)
-    pr_buf = th->th.th_dispatch;
-    KMP_DEBUG_ASSERT(pr_buf->th_doacross_info != NULL);
-    num_dims = pr_buf->th_doacross_info[0];
-    lo = pr_buf->th_doacross_info[2];
-    st = pr_buf->th_doacross_info[4];
-    if( st == 1 ) { // most common case
-        iter_number = vec[0] - lo;
-    } else if( st > 0 ) {
-        iter_number = (kmp_uint64)(vec[0] - lo) / st;
-    } else {        // negative increment
-        iter_number = (kmp_uint64)(lo - vec[0]) / (-st);
-    }
-    for( i = 1; i < num_dims; ++i ) {
-        kmp_int64 iter, ln;
-        kmp_int32 j = i * 4;
-        ln = pr_buf->th_doacross_info[j + 1];
-        lo = pr_buf->th_doacross_info[j + 2];
-        st = pr_buf->th_doacross_info[j + 4];
-        if( st == 1 ) {
-            iter = vec[i] - lo;
-        } else if( st > 0 ) {
-            iter = (kmp_uint64)(vec[i] - lo) / st;
-        } else {   // st < 0
-            iter = (kmp_uint64)(lo - vec[i]) / (-st);
-        }
-        iter_number = iter + ln * iter_number;
-    }
-    shft = iter_number % 32; // use 32-bit granularity
-    iter_number >>= 5;       // divided by 32
-    flag = 1 << shft;
-    if( (flag & pr_buf->th_doacross_flags[iter_number]) == 0 )
-        KMP_TEST_THEN_OR32( (kmp_int32*)&pr_buf->th_doacross_flags[iter_number], (kmp_int32)flag );
-    KA_TRACE(20,("__kmpc_doacross_post() exit: T#%d iter %lld posted\n",
-                 gtid, (iter_number<<5)+shft));
-}
-
-void
-__kmpc_doacross_fini(ident_t *loc, int gtid)
-{
-    kmp_int64 num_done;
-    kmp_info_t *th = __kmp_threads[gtid];
-    kmp_team_t *team = th->th.th_team;
-    kmp_disp_t *pr_buf = th->th.th_dispatch;
-
-    KA_TRACE(20,("__kmpc_doacross_fini() enter: called T#%d\n", gtid));
-    if( team->t.t_serialized ) {
-        KA_TRACE(20,("__kmpc_doacross_fini() exit: serialized team %p\n", team));
-        return; // nothing to do
-    }
-    num_done = KMP_TEST_THEN_INC64((kmp_int64*)pr_buf->th_doacross_info[1]) + 1;
-    if( num_done == th->th.th_team_nproc ) {
-        // we are the last thread, need to free shared resources
-        int idx = pr_buf->th_doacross_buf_idx - 1;
-        dispatch_shared_info_t *sh_buf = &team->t.t_disp_buffer[idx % __kmp_dispatch_num_buffers];
-        KMP_DEBUG_ASSERT(pr_buf->th_doacross_info[1] == (kmp_int64)&sh_buf->doacross_num_done);
-        KMP_DEBUG_ASSERT(num_done == (kmp_int64)sh_buf->doacross_num_done);
-        KMP_DEBUG_ASSERT(idx == sh_buf->doacross_buf_idx);
-        __kmp_thread_free(th, (void*)sh_buf->doacross_flags);
-        sh_buf->doacross_flags = NULL;
-        sh_buf->doacross_num_done = 0;
-        sh_buf->doacross_buf_idx += __kmp_dispatch_num_buffers; // free buffer for future re-use
-    }
-    // free private resources (need to keep buffer index forever)
-    __kmp_thread_free(th, (void*)pr_buf->th_doacross_info);
-    pr_buf->th_doacross_info = NULL;
-    KA_TRACE(20,("__kmpc_doacross_fini() exit: T#%d\n", gtid));
+void __kmpc_doacross_init(ident_t *loc, int gtid, int num_dims,
+                          struct kmp_dim *dims) {
+  int j, idx;
+  kmp_int64 last, trace_count;
+  kmp_info_t *th = __kmp_threads[gtid];
+  kmp_team_t *team = th->th.th_team;
+  kmp_uint32 *flags;
+  kmp_disp_t *pr_buf = th->th.th_dispatch;
+  dispatch_shared_info_t *sh_buf;
+
+  KA_TRACE(
+      20,
+      ("__kmpc_doacross_init() enter: called T#%d, num dims %d, active %d\n",
+       gtid, num_dims, !team->t.t_serialized));
+  KMP_DEBUG_ASSERT(dims != NULL);
+  KMP_DEBUG_ASSERT(num_dims > 0);
+
+  if (team->t.t_serialized) {
+    KA_TRACE(20, ("__kmpc_doacross_init() exit: serialized team\n"));
+    return; // no dependencies if team is serialized
+  }
+  KMP_DEBUG_ASSERT(team->t.t_nproc > 1);
+  // Increment index of shared buffer for the next loop
+  idx = pr_buf->th_doacross_buf_idx++;
+  sh_buf = &team->t.t_disp_buffer[idx % __kmp_dispatch_num_buffers];
+
+  // Save bounds info into allocated private buffer
+  KMP_DEBUG_ASSERT(pr_buf->th_doacross_info == NULL);
+  pr_buf->th_doacross_info = (kmp_int64 *)__kmp_thread_malloc(
+      th, sizeof(kmp_int64) * (4 * num_dims + 1));
+  KMP_DEBUG_ASSERT(pr_buf->th_doacross_info != NULL);
+  pr_buf->th_doacross_info[0] =
+      (kmp_int64)num_dims; // first element is number of dimensions
+  // Save also address of num_done in order to access it later without knowing
+  // the buffer index
+  pr_buf->th_doacross_info[1] = (kmp_int64)&sh_buf->doacross_num_done;
+  pr_buf->th_doacross_info[2] = dims[0].lo;
+  pr_buf->th_doacross_info[3] = dims[0].up;
+  pr_buf->th_doacross_info[4] = dims[0].st;
+  last = 5;
+  for (j = 1; j < num_dims; ++j) {
+    // To keep ranges of all dimensions but the first dims[0]
+    kmp_int64 range_length;
+    if (dims[j].st == 1) { // most common case
+      // AC: should we care about ranges bigger than LLONG_MAX? (not for now)
+      range_length = dims[j].up - dims[j].lo + 1;
+    } else {
+      if (dims[j].st > 0) {
+        KMP_DEBUG_ASSERT(dims[j].up > dims[j].lo);
+        range_length = (kmp_uint64)(dims[j].up - dims[j].lo) / dims[j].st + 1;
+      } else { // negative increment
+        KMP_DEBUG_ASSERT(dims[j].lo > dims[j].up);
+        range_length =
+            (kmp_uint64)(dims[j].lo - dims[j].up) / (-dims[j].st) + 1;
+      }
+    }
+    pr_buf->th_doacross_info[last++] = range_length;
+    pr_buf->th_doacross_info[last++] = dims[j].lo;
+    pr_buf->th_doacross_info[last++] = dims[j].up;
+    pr_buf->th_doacross_info[last++] = dims[j].st;
+  }
+
+  // Compute total trip count.
+  // Start with range of dims[0] which we don't need to keep in the buffer.
+  if (dims[0].st == 1) { // most common case
+    trace_count = dims[0].up - dims[0].lo + 1;
+  } else if (dims[0].st > 0) {
+    KMP_DEBUG_ASSERT(dims[0].up > dims[0].lo);
+    trace_count = (kmp_uint64)(dims[0].up - dims[0].lo) / dims[0].st + 1;
+  } else { // negative increment
+    KMP_DEBUG_ASSERT(dims[0].lo > dims[0].up);
+    trace_count = (kmp_uint64)(dims[0].lo - dims[0].up) / (-dims[0].st) + 1;
+  }
+  for (j = 1; j < num_dims; ++j) {
+    trace_count *= pr_buf->th_doacross_info[4 * j + 1]; // use kept ranges
+  }
+  KMP_DEBUG_ASSERT(trace_count > 0);
+
+  // Check that the shared buffer is not still occupied by another loop (i.e.
+  // that the loop which used index idx - __kmp_dispatch_num_buffers is done)
+  if (idx != sh_buf->doacross_buf_idx) {
+    // Shared buffer is occupied, wait for it to be free
+    __kmp_wait_yield_4((kmp_uint32 *)&sh_buf->doacross_buf_idx, idx, __kmp_eq_4,
+                       NULL);
+  }
+  // Check if we are the first thread. After the CAS the first thread gets 0,
+  // others get 1 if initialization is in progress, allocated pointer otherwise.
+  flags = (kmp_uint32 *)KMP_COMPARE_AND_STORE_RET64(
+      (kmp_int64 *)&sh_buf->doacross_flags, NULL, (kmp_int64)1);
+  if (flags == NULL) {
+    // we are the first thread, allocate the array of flags
+    kmp_int64 size =
+        trace_count / 8 + 8; // in bytes, use single bit per iteration
+    sh_buf->doacross_flags = (kmp_uint32 *)__kmp_thread_calloc(th, size, 1);
+  } else if ((kmp_int64)flags == 1) {
+    // initialization is still in progress, need to wait
+    while ((volatile kmp_int64)sh_buf->doacross_flags == 1) {
+      KMP_YIELD(TRUE);
+    }
+  }
+  // check value of pointer
+  KMP_DEBUG_ASSERT((kmp_int64)sh_buf->doacross_flags > 1);
+  // save private copy in order to not touch shared buffer on each iteration
+  pr_buf->th_doacross_flags = sh_buf->doacross_flags;
+  KA_TRACE(20, ("__kmpc_doacross_init() exit: T#%d\n", gtid));
+}
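
For reference, the reformatted __kmpc_doacross_init packs the loop-nest
description into th_doacross_info as: element 0 = number of dimensions,
element 1 = address of the shared doacross_num_done counter, elements 2-4 =
lo/up/st of dimension 0, then four elements (range length, lo, up, st) per
remaining dimension. A minimal standalone sketch of the per-dimension
range-length arithmetic it performs (hypothetical helper name, not a runtime
entry point):

    #include <assert.h>
    #include <stdint.h>

    // Number of iterations in [lo, up] with stride st, mirroring the
    // range_length / trace_count computation in __kmpc_doacross_init.
    static int64_t doacross_range_length(int64_t lo, int64_t up, int64_t st) {
      if (st == 1) // most common case
        return up - lo + 1;
      if (st > 0) {
        assert(up > lo);
        return (int64_t)((uint64_t)(up - lo) / (uint64_t)st) + 1;
      }
      assert(lo > up); // negative increment
      return (int64_t)((uint64_t)(lo - up) / (uint64_t)(-st)) + 1;
    }

    int main(void) {
      assert(doacross_range_length(0, 9, 1) == 10);
      assert(doacross_range_length(0, 9, 3) == 4);  // 0, 3, 6, 9
      assert(doacross_range_length(9, 0, -3) == 4); // 9, 6, 3, 0
      return 0;
    }

The total trip count is the product of these per-dimension lengths, which
bounds the size of the shared flag array allocated above (one bit per
iteration, trace_count / 8 + 8 bytes).
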
+
+void __kmpc_doacross_wait(ident_t *loc, int gtid, long long *vec) {
+  kmp_int32 shft, num_dims, i;
+  kmp_uint32 flag;
+  kmp_int64 iter_number; // iteration number of "collapsed" loop nest
+  kmp_info_t *th = __kmp_threads[gtid];
+  kmp_team_t *team = th->th.th_team;
+  kmp_disp_t *pr_buf;
+  kmp_int64 lo, up, st;
+
+  KA_TRACE(20, ("__kmpc_doacross_wait() enter: called T#%d\n", gtid));
+  if (team->t.t_serialized) {
+    KA_TRACE(20, ("__kmpc_doacross_wait() exit: serialized team\n"));
+    return; // no dependencies if team is serialized
+  }
+
+  // calculate sequential iteration number and check out-of-bounds condition
+  pr_buf = th->th.th_dispatch;
+  KMP_DEBUG_ASSERT(pr_buf->th_doacross_info != NULL);
+  num_dims = pr_buf->th_doacross_info[0];
+  lo = pr_buf->th_doacross_info[2];
+  up = pr_buf->th_doacross_info[3];
+  st = pr_buf->th_doacross_info[4];
+  if (st == 1) { // most common case
+    if (vec[0] < lo || vec[0] > up) {
+      KA_TRACE(20, ("__kmpc_doacross_wait() exit: T#%d iter %lld is out of "
+                    "bounds [%lld,%lld]\n",
+                    gtid, vec[0], lo, up));
+      return;
+    }
+    iter_number = vec[0] - lo;
+  } else if (st > 0) {
+    if (vec[0] < lo || vec[0] > up) {
+      KA_TRACE(20, ("__kmpc_doacross_wait() exit: T#%d iter %lld is out of "
+                    "bounds [%lld,%lld]\n",
+                    gtid, vec[0], lo, up));
+      return;
+    }
+    iter_number = (kmp_uint64)(vec[0] - lo) / st;
+  } else { // negative increment
+    if (vec[0] > lo || vec[0] < up) {
+      KA_TRACE(20, ("__kmpc_doacross_wait() exit: T#%d iter %lld is out of "
+                    "bounds [%lld,%lld]\n",
+                    gtid, vec[0], lo, up));
+      return;
+    }
+    iter_number = (kmp_uint64)(lo - vec[0]) / (-st);
+  }
+  for (i = 1; i < num_dims; ++i) {
+    kmp_int64 iter, ln;
+    kmp_int32 j = i * 4;
+    ln = pr_buf->th_doacross_info[j + 1];
+    lo = pr_buf->th_doacross_info[j + 2];
+    up = pr_buf->th_doacross_info[j + 3];
+    st = pr_buf->th_doacross_info[j + 4];
+    if (st == 1) {
+      if (vec[i] < lo || vec[i] > up) {
+        KA_TRACE(20, ("__kmpc_doacross_wait() exit: T#%d iter %lld is out of "
+                      "bounds [%lld,%lld]\n",
+                      gtid, vec[i], lo, up));
+        return;
+      }
+      iter = vec[i] - lo;
+    } else if (st > 0) {
+      if (vec[i] < lo || vec[i] > up) {
+        KA_TRACE(20, ("__kmpc_doacross_wait() exit: T#%d iter %lld is out of "
+                      "bounds [%lld,%lld]\n",
+                      gtid, vec[i], lo, up));
+        return;
+      }
+      iter = (kmp_uint64)(vec[i] - lo) / st;
+    } else { // st < 0
+      if (vec[i] > lo || vec[i] < up) {
+        KA_TRACE(20, ("__kmpc_doacross_wait() exit: T#%d iter %lld is out of "
+                      "bounds [%lld,%lld]\n",
+                      gtid, vec[i], lo, up));
+        return;
+      }
+      iter = (kmp_uint64)(lo - vec[i]) / (-st);
+    }
+    iter_number = iter + ln * iter_number;
+  }
+  shft = iter_number % 32; // use 32-bit granularity
+  iter_number >>= 5; // divided by 32
+  flag = 1 << shft;
+  while ((flag & pr_buf->th_doacross_flags[iter_number]) == 0) {
+    KMP_YIELD(TRUE);
+  }
+  KA_TRACE(20,
+           ("__kmpc_doacross_wait() exit: T#%d wait for iter %lld completed\n",
+            gtid, (iter_number << 5) + shft));
+}
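
__kmpc_doacross_wait first folds the incoming iteration vector into a single
linear iteration number, Horner-style (iter_number = iter_i +
range_length_i * iter_number for each subsequent dimension), then spins until
the corresponding bit of the shared flag array is set. The flags are grouped
into 32-bit words, so the word index is iter_number >> 5 and the bit position
is iter_number % 32. A small self-contained sketch of that index arithmetic
(toy names, not runtime code):

    #include <assert.h>
    #include <stdint.h>

    // Word index and bit mask of a linearized iteration, as used by
    // __kmpc_doacross_wait / __kmpc_doacross_post (32-bit granularity).
    static void doacross_flag_index(int64_t iter_number, int64_t *word,
                                    uint32_t *mask) {
      int32_t shft = (int32_t)(iter_number % 32);
      *word = iter_number >> 5; // divided by 32
      *mask = 1u << shft;
    }

    int main(void) {
      int64_t word;
      uint32_t mask;
      doacross_flag_index(37, &word, &mask);
      assert(word == 1 && mask == (1u << 5)); // iteration 37 -> word 1, bit 5
      return 0;
    }
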
+
+void __kmpc_doacross_post(ident_t *loc, int gtid, long long *vec) {
+  kmp_int32 shft, num_dims, i;
+  kmp_uint32 flag;
+  kmp_int64 iter_number; // iteration number of "collapsed" loop nest
+  kmp_info_t *th = __kmp_threads[gtid];
+  kmp_team_t *team = th->th.th_team;
+  kmp_disp_t *pr_buf;
+  kmp_int64 lo, st;
+
+  KA_TRACE(20, ("__kmpc_doacross_post() enter: called T#%d\n", gtid));
+  if (team->t.t_serialized) {
+    KA_TRACE(20, ("__kmpc_doacross_post() exit: serialized team\n"));
+    return; // no dependencies if team is serialized
+  }
+
+  // calculate sequential iteration number (same as in "wait" but no
+  // out-of-bounds checks)
+  pr_buf = th->th.th_dispatch;
+  KMP_DEBUG_ASSERT(pr_buf->th_doacross_info != NULL);
+  num_dims = pr_buf->th_doacross_info[0];
+  lo = pr_buf->th_doacross_info[2];
+  st = pr_buf->th_doacross_info[4];
+  if (st == 1) { // most common case
+    iter_number = vec[0] - lo;
+  } else if (st > 0) {
+    iter_number = (kmp_uint64)(vec[0] - lo) / st;
+  } else { // negative increment
+    iter_number = (kmp_uint64)(lo - vec[0]) / (-st);
+  }
+  for (i = 1; i < num_dims; ++i) {
+    kmp_int64 iter, ln;
+    kmp_int32 j = i * 4;
+    ln = pr_buf->th_doacross_info[j + 1];
+    lo = pr_buf->th_doacross_info[j + 2];
+    st = pr_buf->th_doacross_info[j + 4];
+    if (st == 1) {
+      iter = vec[i] - lo;
+    } else if (st > 0) {
+      iter = (kmp_uint64)(vec[i] - lo) / st;
+    } else { // st < 0
+      iter = (kmp_uint64)(lo - vec[i]) / (-st);
+    }
+    iter_number = iter + ln * iter_number;
+  }
+  shft = iter_number % 32; // use 32-bit granularity
+  iter_number >>= 5; // divided by 32
+  flag = 1 << shft;
+  if ((flag & pr_buf->th_doacross_flags[iter_number]) == 0)
+    KMP_TEST_THEN_OR32((kmp_int32 *)&pr_buf->th_doacross_flags[iter_number],
+                       (kmp_int32)flag);
+  KA_TRACE(20, ("__kmpc_doacross_post() exit: T#%d iter %lld posted\n", gtid,
+                (iter_number << 5) + shft));
+}
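
__kmpc_doacross_post computes the same word/bit position but sets the bit with
an atomic fetch-OR (KMP_TEST_THEN_OR32), so concurrent posts landing in the
same 32-bit word cannot lose updates; the plain read beforehand merely skips
the atomic when the bit is already visible. A rough stand-in that uses a
GCC/Clang builtin in place of the runtime macro (an illustrative assumption,
not the macro's actual definition):

    #include <assert.h>
    #include <stdint.h>

    // Set bit `mask` of flags[word]; __sync_fetch_and_or stands in for
    // KMP_TEST_THEN_OR32 here, purely for illustration.
    static void post_iteration(uint32_t *flags, int64_t word, uint32_t mask) {
      if ((flags[word] & mask) == 0)
        __sync_fetch_and_or(&flags[word], mask);
    }

    int main(void) {
      uint32_t flags[2] = {0, 0};
      post_iteration(flags, 1, 1u << 5); // linearized iteration 37
      assert(flags[1] == (1u << 5));
      return 0;
    }
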
+
+void __kmpc_doacross_fini(ident_t *loc, int gtid) {
+  kmp_int64 num_done;
+  kmp_info_t *th = __kmp_threads[gtid];
+  kmp_team_t *team = th->th.th_team;
+  kmp_disp_t *pr_buf = th->th.th_dispatch;
+
+  KA_TRACE(20, ("__kmpc_doacross_fini() enter: called T#%d\n", gtid));
+  if (team->t.t_serialized) {
+    KA_TRACE(20, ("__kmpc_doacross_fini() exit: serialized team %p\n", team));
+    return; // nothing to do
+  }
+  num_done = KMP_TEST_THEN_INC64((kmp_int64 *)pr_buf->th_doacross_info[1]) + 1;
+  if (num_done == th->th.th_team_nproc) {
+    // we are the last thread, need to free shared resources
+    int idx = pr_buf->th_doacross_buf_idx - 1;
+    dispatch_shared_info_t *sh_buf =
+        &team->t.t_disp_buffer[idx % __kmp_dispatch_num_buffers];
+    KMP_DEBUG_ASSERT(pr_buf->th_doacross_info[1] ==
+                     (kmp_int64)&sh_buf->doacross_num_done);
+    KMP_DEBUG_ASSERT(num_done == (kmp_int64)sh_buf->doacross_num_done);
+    KMP_DEBUG_ASSERT(idx == sh_buf->doacross_buf_idx);
+    __kmp_thread_free(th, (void *)sh_buf->doacross_flags);
+    sh_buf->doacross_flags = NULL;
+    sh_buf->doacross_num_done = 0;
+    sh_buf->doacross_buf_idx +=
+        __kmp_dispatch_num_buffers; // free buffer for future re-use
+  }
+  // free private resources (need to keep buffer index forever)
+  __kmp_thread_free(th, (void *)pr_buf->th_doacross_info);
+  pr_buf->th_doacross_info = NULL;
+  KA_TRACE(20, ("__kmpc_doacross_fini() exit: T#%d\n", gtid));
 }
 #endif
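
In __kmpc_doacross_fini the last thread to finish (counted through the shared
doacross_num_done counter whose address was stashed in th_doacross_info[1])
frees the flag array and advances the shared buffer index by
__kmp_dispatch_num_buffers. A later loop that draws index idx maps to the same
physical slot via idx % __kmp_dispatch_num_buffers and recognises the slot as
free exactly when idx == sh_buf->doacross_buf_idx, which is the condition
__kmpc_doacross_init waits on. A toy model of that recycling (buffer count
picked arbitrarily for the example):

    #include <assert.h>

    enum { NUM_BUFFERS = 7 }; // stand-in for __kmp_dispatch_num_buffers

    int main(void) {
      int shared_idx = 0; // doacross_buf_idx of physical slot 0
      int loop_idx = 0;   // index drawn by the loop currently using slot 0

      // While the loop owns the slot, the indices agree.
      assert(loop_idx == shared_idx);

      // fini (last thread): hand the slot to the loop NUM_BUFFERS loops later.
      shared_idx += NUM_BUFFERS;

      // That later loop hits the same physical slot ...
      int later_idx = loop_idx + NUM_BUFFERS;
      assert(later_idx % NUM_BUFFERS == loop_idx % NUM_BUFFERS);
      // ... and sees its own index in the slot, i.e. no need to wait.
      assert(later_idx == shared_idx);
      return 0;
    }
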
 
 // end of file //
-

Modified: openmp/trunk/runtime/src/kmp_debug.cpp
URL: http://llvm.org/viewvc/llvm-project/openmp/trunk/runtime/src/kmp_debug.cpp?rev=302929&r1=302928&r2=302929&view=diff
==============================================================================
--- openmp/trunk/runtime/src/kmp_debug.cpp (original)
+++ openmp/trunk/runtime/src/kmp_debug.cpp Fri May 12 13:01:32 2017
@@ -19,124 +19,116 @@
 #include "kmp_io.h"
 
 #ifdef KMP_DEBUG
-void
-__kmp_debug_printf_stdout( char const * format, ... )
-{
-    va_list ap;
-    va_start( ap, format );
+void __kmp_debug_printf_stdout(char const *format, ...) {
+  va_list ap;
+  va_start(ap, format);
 
-    __kmp_vprintf( kmp_out, format, ap );
+  __kmp_vprintf(kmp_out, format, ap);
 
-    va_end(ap);
+  va_end(ap);
 }
 #endif
 
-void
-__kmp_debug_printf( char const * format, ... )
-{
-    va_list ap;
-    va_start( ap, format );
+void __kmp_debug_printf(char const *format, ...) {
+  va_list ap;
+  va_start(ap, format);
 
-    __kmp_vprintf( kmp_err, format, ap );
+  __kmp_vprintf(kmp_err, format, ap);
 
-    va_end( ap );
+  va_end(ap);
 }
 
 #ifdef KMP_USE_ASSERT
-    int
-    __kmp_debug_assert(
-        char const *  msg,
-        char const *  file,
-        int           line
-    ) {
-
-        if ( file == NULL ) {
-            file = KMP_I18N_STR( UnknownFile );
-        } else {
-            // Remove directories from path, leave only file name. File name is enough, there is no need
-            // in bothering developers and customers with full paths.
-            char const * slash = strrchr( file, '/' );
-            if ( slash != NULL ) {
-                file = slash + 1;
-            }; // if
-        }; // if
-
-        #ifdef KMP_DEBUG
-            __kmp_acquire_bootstrap_lock( & __kmp_stdio_lock );
-            __kmp_debug_printf( "Assertion failure at %s(%d): %s.\n", file, line, msg );
-            __kmp_release_bootstrap_lock( & __kmp_stdio_lock );
-            #ifdef USE_ASSERT_BREAK
-                #if KMP_OS_WINDOWS
-                    DebugBreak();
-                #endif
-            #endif // USE_ASSERT_BREAK
-            #ifdef USE_ASSERT_STALL
-                /*    __kmp_infinite_loop(); */
-                for(;;);
-            #endif // USE_ASSERT_STALL
-            #ifdef USE_ASSERT_SEG
-                {
-                    int volatile * ZERO = (int*) 0;
-                    ++ (*ZERO);
-                }
-            #endif // USE_ASSERT_SEG
-        #endif
-
-        __kmp_msg(
-            kmp_ms_fatal,
-            KMP_MSG( AssertionFailure, file, line ),
-            KMP_HNT( SubmitBugReport ),
-            __kmp_msg_null
-        );
+int __kmp_debug_assert(char const *msg, char const *file, int line) {
 
-        return 0;
+  if (file == NULL) {
+    file = KMP_I18N_STR(UnknownFile);
+  } else {
+    // Remove directories from path, leave only file name. File name is enough,
+    // there is no need in bothering developers and customers with full paths.
+    char const *slash = strrchr(file, '/');
+    if (slash != NULL) {
+      file = slash + 1;
+    }; // if
+  }; // if
 
-    } // __kmp_debug_assert
+#ifdef KMP_DEBUG
+  __kmp_acquire_bootstrap_lock(&__kmp_stdio_lock);
+  __kmp_debug_printf("Assertion failure at %s(%d): %s.\n", file, line, msg);
+  __kmp_release_bootstrap_lock(&__kmp_stdio_lock);
+#ifdef USE_ASSERT_BREAK
+#if KMP_OS_WINDOWS
+  DebugBreak();
+#endif
+#endif // USE_ASSERT_BREAK
+#ifdef USE_ASSERT_STALL
+  /*    __kmp_infinite_loop(); */
+  for (;;)
+    ;
+#endif // USE_ASSERT_STALL
+#ifdef USE_ASSERT_SEG
+  {
+    int volatile *ZERO = (int *)0;
+    ++(*ZERO);
+  }
+#endif // USE_ASSERT_SEG
+#endif
+
+  __kmp_msg(kmp_ms_fatal, KMP_MSG(AssertionFailure, file, line),
+            KMP_HNT(SubmitBugReport), __kmp_msg_null);
+
+  return 0;
+
+} // __kmp_debug_assert
 
 #endif // KMP_USE_ASSERT
 
 /* Dump debugging buffer to stderr */
-void
-__kmp_dump_debug_buffer( void )
-{
-    if ( __kmp_debug_buffer != NULL ) {
-        int i;
-        int dc = __kmp_debug_count;
-        char *db = & __kmp_debug_buffer[ (dc % __kmp_debug_buf_lines) * __kmp_debug_buf_chars ];
-        char *db_end = & __kmp_debug_buffer[ __kmp_debug_buf_lines * __kmp_debug_buf_chars ];
-        char *db2;
-
-        __kmp_acquire_bootstrap_lock( & __kmp_stdio_lock );
-        __kmp_printf_no_lock( "\nStart dump of debugging buffer (entry=%d):\n",
-                 dc % __kmp_debug_buf_lines );
-
-        for ( i = 0; i < __kmp_debug_buf_lines; i++ ) {
-
-            if ( *db != '\0' ) {
-                /* Fix up where no carriage return before string termination char */
-                for ( db2 = db + 1; db2 < db + __kmp_debug_buf_chars - 1; db2 ++) {
-                    if ( *db2 == '\0' ) {
-                        if ( *(db2-1) != '\n' ) { *db2 = '\n'; *(db2+1) = '\0'; }
-                        break;
-                    }
-                }
-                /* Handle case at end by shortening the printed message by one char if necessary */
-                if ( db2 == db + __kmp_debug_buf_chars - 1 &&
-                     *db2 == '\0' && *(db2-1) != '\n' ) {
-                    *(db2-1) = '\n';
-                }
-
-                __kmp_printf_no_lock( "%4d: %.*s", i, __kmp_debug_buf_chars, db );
-                *db = '\0'; /* only let it print once! */
+void __kmp_dump_debug_buffer(void) {
+  if (__kmp_debug_buffer != NULL) {
+    int i;
+    int dc = __kmp_debug_count;
+    char *db = &__kmp_debug_buffer[(dc % __kmp_debug_buf_lines) *
+                                   __kmp_debug_buf_chars];
+    char *db_end =
+        &__kmp_debug_buffer[__kmp_debug_buf_lines * __kmp_debug_buf_chars];
+    char *db2;
+
+    __kmp_acquire_bootstrap_lock(&__kmp_stdio_lock);
+    __kmp_printf_no_lock("\nStart dump of debugging buffer (entry=%d):\n",
+                         dc % __kmp_debug_buf_lines);
+
+    for (i = 0; i < __kmp_debug_buf_lines; i++) {
+
+      if (*db != '\0') {
+        /* Fix up where no carriage return before string termination char */
+        for (db2 = db + 1; db2 < db + __kmp_debug_buf_chars - 1; db2++) {
+          if (*db2 == '\0') {
+            if (*(db2 - 1) != '\n') {
+              *db2 = '\n';
+              *(db2 + 1) = '\0';
             }
-
-            db += __kmp_debug_buf_chars;
-            if ( db >= db_end )
-                db = __kmp_debug_buffer;
+            break;
+          }
+        }
+        /* Handle case at end by shortening the printed message by one char if
+         * necessary */
+        if (db2 == db + __kmp_debug_buf_chars - 1 && *db2 == '\0' &&
+            *(db2 - 1) != '\n') {
+          *(db2 - 1) = '\n';
         }
 
-        __kmp_printf_no_lock( "End dump of debugging buffer (entry=%d).\n\n",
-                 ( dc+i-1 ) % __kmp_debug_buf_lines );
-        __kmp_release_bootstrap_lock( & __kmp_stdio_lock );
+        __kmp_printf_no_lock("%4d: %.*s", i, __kmp_debug_buf_chars, db);
+        *db = '\0'; /* only let it print once! */
+      }
+
+      db += __kmp_debug_buf_chars;
+      if (db >= db_end)
+        db = __kmp_debug_buffer;
     }
+
+    __kmp_printf_no_lock("End dump of debugging buffer (entry=%d).\n\n",
+                         (dc + i - 1) % __kmp_debug_buf_lines);
+    __kmp_release_bootstrap_lock(&__kmp_stdio_lock);
+  }
 }
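
__kmp_dump_debug_buffer above walks a ring of __kmp_debug_buf_lines fixed-size
entries of __kmp_debug_buf_chars bytes each, starting at the oldest entry
(__kmp_debug_count % __kmp_debug_buf_lines) and wrapping at the end of the
buffer. A hypothetical mirror of that indexing, with made-up sizes just for
the example:

    #include <assert.h>

    // Byte offset of the i-th entry printed by a ring-buffer dump that starts
    // at entry (count % lines) and wraps, one fixed-size line per entry.
    static int ring_offset(int count, int i, int lines, int chars) {
      return ((count + i) % lines) * chars;
    }

    int main(void) {
      // e.g. 512 lines of 128 chars, 1000 messages logged so far
      assert(ring_offset(1000, 0, 512, 128) == (1000 % 512) * 128);
      assert(ring_offset(1000, 24, 512, 128) == 0); // wrapped back to start
      return 0;
    }
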

Modified: openmp/trunk/runtime/src/kmp_debug.h
URL: http://llvm.org/viewvc/llvm-project/openmp/trunk/runtime/src/kmp_debug.h?rev=302929&r1=302928&r2=302929&view=diff
==============================================================================
--- openmp/trunk/runtime/src/kmp_debug.h (original)
+++ openmp/trunk/runtime/src/kmp_debug.h Fri May 12 13:01:32 2017
@@ -19,94 +19,155 @@
 #include <stdarg.h>
 
 #ifdef __cplusplus
-    extern "C" {
+extern "C" {
 #endif // __cplusplus
 
-// -------------------------------------------------------------------------------------------------
+// -----------------------------------------------------------------------------
 // Build-time assertion.
-// -------------------------------------------------------------------------------------------------
 
 // New C++11 style build assert
-#define KMP_BUILD_ASSERT( expr )            static_assert(expr, "Build condition error")
+#define KMP_BUILD_ASSERT(expr) static_assert(expr, "Build condition error")
 
-// -------------------------------------------------------------------------------------------------
+// -----------------------------------------------------------------------------
 // Run-time assertions.
-// -------------------------------------------------------------------------------------------------
 
-extern void __kmp_dump_debug_buffer( void );
+extern void __kmp_dump_debug_buffer(void);
 
 #ifdef KMP_USE_ASSERT
-    extern int __kmp_debug_assert( char const * expr, char const * file, int line );
-    #ifdef KMP_DEBUG
-        #define KMP_ASSERT( cond )             ( (cond) ? 0 : __kmp_debug_assert( #cond, __FILE__, __LINE__ ) )
-        #define KMP_ASSERT2( cond, msg )       ( (cond) ? 0 : __kmp_debug_assert( (msg), __FILE__, __LINE__ ) )
-        #define KMP_DEBUG_ASSERT( cond )       KMP_ASSERT( cond )
-        #define KMP_DEBUG_ASSERT2( cond, msg ) KMP_ASSERT2( cond, msg )
-    #else
-        // Do not expose condition in release build. Use "assertion failure".
-        #define KMP_ASSERT( cond )             ( (cond) ? 0 : __kmp_debug_assert( "assertion failure", __FILE__, __LINE__ ) )
-        #define KMP_ASSERT2( cond, msg )       KMP_ASSERT( cond )
-        #define KMP_DEBUG_ASSERT( cond )       0
-        #define KMP_DEBUG_ASSERT2( cond, msg ) 0
-    #endif // KMP_DEBUG
+extern int __kmp_debug_assert(char const *expr, char const *file, int line);
+#ifdef KMP_DEBUG
+#define KMP_ASSERT(cond)                                                       \
+  ((cond) ? 0 : __kmp_debug_assert(#cond, __FILE__, __LINE__))
+#define KMP_ASSERT2(cond, msg)                                                 \
+  ((cond) ? 0 : __kmp_debug_assert((msg), __FILE__, __LINE__))
+#define KMP_DEBUG_ASSERT(cond) KMP_ASSERT(cond)
+#define KMP_DEBUG_ASSERT2(cond, msg) KMP_ASSERT2(cond, msg)
+#else
+// Do not expose condition in release build. Use "assertion failure".
+#define KMP_ASSERT(cond)                                                       \
+  ((cond) ? 0 : __kmp_debug_assert("assertion failure", __FILE__, __LINE__))
+#define KMP_ASSERT2(cond, msg) KMP_ASSERT(cond)
+#define KMP_DEBUG_ASSERT(cond) 0
+#define KMP_DEBUG_ASSERT2(cond, msg) 0
+#endif // KMP_DEBUG
 #else
-    #define KMP_ASSERT( cond )             0
-    #define KMP_ASSERT2( cond, msg )       0
-    #define KMP_DEBUG_ASSERT( cond )       0
-    #define KMP_DEBUG_ASSERT2( cond, msg ) 0
+#define KMP_ASSERT(cond) 0
+#define KMP_ASSERT2(cond, msg) 0
+#define KMP_DEBUG_ASSERT(cond) 0
+#define KMP_DEBUG_ASSERT2(cond, msg) 0
 #endif // KMP_USE_ASSERT
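
The reformatted assertion macros keep their previous semantics: with
KMP_USE_ASSERT and KMP_DEBUG, KMP_ASSERT(cond) expands to
((cond) ? 0 : __kmp_debug_assert(#cond, __FILE__, __LINE__)), so the condition
text, file and line reach the failure handler; in a release build the
condition is replaced by the literal "assertion failure", and without
KMP_USE_ASSERT all four macros become 0. A self-contained sketch of the same
pattern with a toy handler (not the runtime's):

    #include <stdio.h>

    // Toy stand-in for __kmp_debug_assert, showing what the macro passes.
    static int toy_debug_assert(const char *expr, const char *file, int line) {
      fprintf(stderr, "Assertion failure at %s(%d): %s.\n", file, line, expr);
      return 0;
    }

    #define TOY_ASSERT(cond) \
      ((cond) ? 0 : toy_debug_assert(#cond, __FILE__, __LINE__))

    int main(void) {
      int nproc = 1;
      (void)TOY_ASSERT(nproc > 0); // passes, evaluates to 0
      (void)TOY_ASSERT(nproc > 4); // prints: Assertion failure at ...: nproc > 4.
      return 0;
    }
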
 
 #ifdef KMP_DEBUG
-    extern void __kmp_debug_printf_stdout( char const * format, ... );
+extern void __kmp_debug_printf_stdout(char const *format, ...);
 #endif
-extern void __kmp_debug_printf( char const * format, ... );
+extern void __kmp_debug_printf(char const *format, ...);
 
 #ifdef KMP_DEBUG
 
-    extern int kmp_a_debug;
-    extern int kmp_b_debug;
-    extern int kmp_c_debug;
-    extern int kmp_d_debug;
-    extern int kmp_e_debug;
-    extern int kmp_f_debug;
-    extern int kmp_diag;
-
-    #define KA_TRACE(d,x)     if (kmp_a_debug >= d) { __kmp_debug_printf x ; }
-    #define KB_TRACE(d,x)     if (kmp_b_debug >= d) { __kmp_debug_printf x ; }
-    #define KC_TRACE(d,x)     if (kmp_c_debug >= d) { __kmp_debug_printf x ; }
-    #define KD_TRACE(d,x)     if (kmp_d_debug >= d) { __kmp_debug_printf x ; }
-    #define KE_TRACE(d,x)     if (kmp_e_debug >= d) { __kmp_debug_printf x ; }
-    #define KF_TRACE(d,x)     if (kmp_f_debug >= d) { __kmp_debug_printf x ; }
-    #define K_DIAG(d,x)       {if (kmp_diag == d) { __kmp_debug_printf_stdout x ; } }
-
-    #define KA_DUMP(d,x)     if (kmp_a_debug >= d) { int ks; __kmp_disable(&ks); (x) ; __kmp_enable(ks); }
-    #define KB_DUMP(d,x)     if (kmp_b_debug >= d) { int ks; __kmp_disable(&ks); (x) ; __kmp_enable(ks); }
-    #define KC_DUMP(d,x)     if (kmp_c_debug >= d) { int ks; __kmp_disable(&ks); (x) ; __kmp_enable(ks); }
-    #define KD_DUMP(d,x)     if (kmp_d_debug >= d) { int ks; __kmp_disable(&ks); (x) ; __kmp_enable(ks); }
-    #define KE_DUMP(d,x)     if (kmp_e_debug >= d) { int ks; __kmp_disable(&ks); (x) ; __kmp_enable(ks); }
-    #define KF_DUMP(d,x)     if (kmp_f_debug >= d) { int ks; __kmp_disable(&ks); (x) ; __kmp_enable(ks); }
+extern int kmp_a_debug;
+extern int kmp_b_debug;
+extern int kmp_c_debug;
+extern int kmp_d_debug;
+extern int kmp_e_debug;
+extern int kmp_f_debug;
+extern int kmp_diag;
+
+#define KA_TRACE(d, x)                                                         \
+  if (kmp_a_debug >= d) {                                                      \
+    __kmp_debug_printf x;                                                      \
+  }
+#define KB_TRACE(d, x)                                                         \
+  if (kmp_b_debug >= d) {                                                      \
+    __kmp_debug_printf x;                                                      \
+  }
+#define KC_TRACE(d, x)                                                         \
+  if (kmp_c_debug >= d) {                                                      \
+    __kmp_debug_printf x;                                                      \
+  }
+#define KD_TRACE(d, x)                                                         \
+  if (kmp_d_debug >= d) {                                                      \
+    __kmp_debug_printf x;                                                      \
+  }
+#define KE_TRACE(d, x)                                                         \
+  if (kmp_e_debug >= d) {                                                      \
+    __kmp_debug_printf x;                                                      \
+  }
+#define KF_TRACE(d, x)                                                         \
+  if (kmp_f_debug >= d) {                                                      \
+    __kmp_debug_printf x;                                                      \
+  }
+#define K_DIAG(d, x)                                                           \
+  {                                                                            \
+    if (kmp_diag == d) {                                                       \
+      __kmp_debug_printf_stdout x;                                             \
+    }                                                                          \
+  }
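
The trace macros take the printf-style argument list as a single parenthesised
macro argument, which is why calls look like KA_TRACE(20, ("...%d\n", gtid));
the expansion simply splices the parenthesised list after __kmp_debug_printf.
A toy version showing the same splice (toy names; the real macros also gate on
the kmp_*_debug level variables declared above):

    #include <stdio.h>

    static int toy_a_debug = 20; // stand-in for kmp_a_debug

    #define TOY_KA_TRACE(d, x) \
      if (toy_a_debug >= d) {  \
        printf x;              \
      }

    int main(void) {
      int gtid = 0;
      TOY_KA_TRACE(20, ("__kmpc_doacross_init() exit: T#%d\n", gtid)); // shown
      TOY_KA_TRACE(30, ("suppressed: level 30 above toy_a_debug\n"));
      return 0;
    }
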
+
+#define KA_DUMP(d, x)                                                          \
+  if (kmp_a_debug >= d) {                                                      \
+    int ks;                                                                    \
+    __kmp_disable(&ks);                                                        \
+    (x);                                                                       \
+    __kmp_enable(ks);                                                          \
+  }
+#define KB_DUMP(d, x)                                                          \
+  if (kmp_b_debug >= d) {                                                      \
+    int ks;                                                                    \
+    __kmp_disable(&ks);                                                        \
+    (x);                                                                       \
+    __kmp_enable(ks);                                                          \
+  }
+#define KC_DUMP(d, x)                                                          \
+  if (kmp_c_debug >= d) {                                                      \
+    int ks;                                                                    \
+    __kmp_disable(&ks);                                                        \
+    (x);                                                                       \
+    __kmp_enable(ks);                                                          \
+  }
+#define KD_DUMP(d, x)                                                          \
+  if (kmp_d_debug >= d) {                                                      \
+    int ks;                                                                    \
+    __kmp_disable(&ks);                                                        \
+    (x);                                                                       \
+    __kmp_enable(ks);                                                          \
+  }
+#define KE_DUMP(d, x)                                                          \
+  if (kmp_e_debug >= d) {                                                      \
+    int ks;                                                                    \
+    __kmp_disable(&ks);                                                        \
+    (x);                                                                       \
+    __kmp_enable(ks);                                                          \
+  }
+#define KF_DUMP(d, x)                                                          \
+  if (kmp_f_debug >= d) {                                                      \
+    int ks;                                                                    \
+    __kmp_disable(&ks);                                                        \
+    (x);                                                                       \
+    __kmp_enable(ks);                                                          \
+  }
 
 #else
 
-    #define KA_TRACE(d,x)     /* nothing to do */
-    #define KB_TRACE(d,x)     /* nothing to do */
-    #define KC_TRACE(d,x)     /* nothing to do */
-    #define KD_TRACE(d,x)     /* nothing to do */
-    #define KE_TRACE(d,x)     /* nothing to do */
-    #define KF_TRACE(d,x)     /* nothing to do */
-    #define K_DIAG(d,x)       {}/* nothing to do */
-
-    #define KA_DUMP(d,x)     /* nothing to do */
-    #define KB_DUMP(d,x)     /* nothing to do */
-    #define KC_DUMP(d,x)     /* nothing to do */
-    #define KD_DUMP(d,x)     /* nothing to do */
-    #define KE_DUMP(d,x)     /* nothing to do */
-    #define KF_DUMP(d,x)     /* nothing to do */
+#define KA_TRACE(d, x) /* nothing to do */
+#define KB_TRACE(d, x) /* nothing to do */
+#define KC_TRACE(d, x) /* nothing to do */
+#define KD_TRACE(d, x) /* nothing to do */
+#define KE_TRACE(d, x) /* nothing to do */
+#define KF_TRACE(d, x) /* nothing to do */
+#define K_DIAG(d, x)                                                           \
+  {} /* nothing to do */
+
+#define KA_DUMP(d, x) /* nothing to do */
+#define KB_DUMP(d, x) /* nothing to do */
+#define KC_DUMP(d, x) /* nothing to do */
+#define KD_DUMP(d, x) /* nothing to do */
+#define KE_DUMP(d, x) /* nothing to do */
+#define KF_DUMP(d, x) /* nothing to do */
 
 #endif // KMP_DEBUG
 
 #ifdef __cplusplus
-    } // extern "C"
+} // extern "C"
 #endif // __cplusplus
 
 #endif /* KMP_DEBUG_H */

Modified: openmp/trunk/runtime/src/kmp_debugger.cpp
URL: http://llvm.org/viewvc/llvm-project/openmp/trunk/runtime/src/kmp_debugger.cpp?rev=302929&r1=302928&r2=302929&view=diff
==============================================================================
--- openmp/trunk/runtime/src/kmp_debugger.cpp (original)
+++ openmp/trunk/runtime/src/kmp_debugger.cpp Fri May 12 13:01:32 2017
@@ -1,6 +1,6 @@
 #if USE_DEBUGGER
 /*
- * kmp_debugger.c -- debugger support.
+ * kmp_debugger.cpp -- debugger support.
  */
 
 
@@ -19,47 +19,36 @@
 #include "kmp_omp.h"
 #include "kmp_str.h"
 
-/*
-    NOTE: All variable names are known to the debugger, do not change!
-*/
+// NOTE: All variable names are known to the debugger, do not change!
 
 #ifdef __cplusplus
-    extern "C" {
-        extern kmp_omp_struct_info_t __kmp_omp_debug_struct_info;
-    } // extern "C"
+extern "C" {
+extern kmp_omp_struct_info_t __kmp_omp_debug_struct_info;
+} // extern "C"
 #endif // __cplusplus
 
-int __kmp_debugging          = FALSE;    // Boolean whether currently debugging OpenMP RTL.
+int __kmp_debugging = FALSE; // Boolean whether currently debugging OpenMP RTL.
+
+#define offset_and_size_of(structure, field)                                   \
+  { offsetof(structure, field), sizeof(((structure *)NULL)->field) }
 
-#define offset_and_size_of( structure, field )     \
-    {                                              \
-        offsetof( structure,           field ),    \
-        sizeof( ( (structure *) NULL)->field )     \
-    }
-
-#define offset_and_size_not_available \
-    { -1, -1 }
-
-#define addr_and_size_of( var )                    \
-    {                                              \
-        (kmp_uint64)( & var ),                     \
-        sizeof( var )                              \
-    }
+#define offset_and_size_not_available                                          \
+  { -1, -1 }
+
+#define addr_and_size_of(var)                                                  \
+  { (kmp_uint64)(&var), sizeof(var) }
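
offset_and_size_of(structure, field) expands to the initializer
{ offsetof(structure, field), sizeof(((structure *)NULL)->field) }, and
addr_and_size_of(var) to { (kmp_uint64)(&var), sizeof(var) }; the large
__kmp_omp_debug_struct_info table below is built from these pairs so an
attached debugger can locate fields of the runtime's internal structures
without sharing the type definitions. A minimal sketch of the same idiom on a
made-up struct:

    #include <assert.h>
    #include <stddef.h>

    typedef struct {
      long offset;
      long size;
    } field_info_t;

    // Same idiom as offset_and_size_of(), on a toy structure.
    #define FIELD_INFO(structure, field) \
      { offsetof(structure, field), sizeof(((structure *)NULL)->field) }

    typedef struct {
      int tid;
      void *team;
    } toy_thread_t;

    static const field_info_t toy_info[] = {
        FIELD_INFO(toy_thread_t, tid),
        FIELD_INFO(toy_thread_t, team),
    };

    int main(void) {
      // A debugger can read the `team` field of any toy_thread_t at
      // base address + toy_info[1].offset, toy_info[1].size bytes wide.
      assert(toy_info[1].offset == (long)offsetof(toy_thread_t, team));
      assert(toy_info[1].size == (long)sizeof(void *));
      return 0;
    }
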
 
 #define nthr_buffer_size 1024
-static kmp_int32
-kmp_omp_nthr_info_buffer[ nthr_buffer_size ] =
-    { nthr_buffer_size * sizeof( kmp_int32 ) };
+static kmp_int32 kmp_omp_nthr_info_buffer[nthr_buffer_size] = {
+    nthr_buffer_size * sizeof(kmp_int32)};
 
 /* TODO: Check punctuation for various platforms here */
-static char func_microtask[]    = "__kmp_invoke_microtask";
-static char func_fork[]         = "__kmpc_fork_call";
-static char func_fork_teams[]   = "__kmpc_fork_teams";
-
+static char func_microtask[] = "__kmp_invoke_microtask";
+static char func_fork[] = "__kmpc_fork_call";
+static char func_fork_teams[] = "__kmpc_fork_teams";
 
 // Various info about runtime structures: addresses, field offsets, sizes, etc.
-kmp_omp_struct_info_t
-__kmp_omp_debug_struct_info = {
+kmp_omp_struct_info_t __kmp_omp_debug_struct_info = {
 
     /* Change this only if you make a fundamental data structure change here */
     KMP_OMP_VERSION,
@@ -67,166 +56,167 @@ __kmp_omp_debug_struct_info = {
     /* sanity check.  Only should be checked if versions are identical
      * This is also used for backward compatibility to get the runtime
      * structure size if it the runtime is older than the interface */
-    sizeof( kmp_omp_struct_info_t ),
+    sizeof(kmp_omp_struct_info_t),
 
     /* OpenMP RTL version info. */
-    addr_and_size_of( __kmp_version_major ),
-    addr_and_size_of( __kmp_version_minor ),
-    addr_and_size_of( __kmp_version_build ),
-    addr_and_size_of( __kmp_openmp_version ),
-    { (kmp_uint64)( __kmp_copyright ) + KMP_VERSION_MAGIC_LEN, 0 },        // Skip magic prefix.
+    addr_and_size_of(__kmp_version_major),
+    addr_and_size_of(__kmp_version_minor),
+    addr_and_size_of(__kmp_version_build),
+    addr_and_size_of(__kmp_openmp_version),
+    {(kmp_uint64)(__kmp_copyright) + KMP_VERSION_MAGIC_LEN,
+     0}, // Skip magic prefix.
 
     /* Various globals. */
-    addr_and_size_of( __kmp_threads ),
-    addr_and_size_of( __kmp_root ),
-    addr_and_size_of( __kmp_threads_capacity ),
-    addr_and_size_of( __kmp_monitor ),
-#if ! KMP_USE_DYNAMIC_LOCK
-    addr_and_size_of( __kmp_user_lock_table ),
+    addr_and_size_of(__kmp_threads),
+    addr_and_size_of(__kmp_root),
+    addr_and_size_of(__kmp_threads_capacity),
+    addr_and_size_of(__kmp_monitor),
+#if !KMP_USE_DYNAMIC_LOCK
+    addr_and_size_of(__kmp_user_lock_table),
 #endif
-    addr_and_size_of( func_microtask ),
-    addr_and_size_of( func_fork ),
-    addr_and_size_of( func_fork_teams ),
-    addr_and_size_of( __kmp_team_counter ),
-    addr_and_size_of( __kmp_task_counter ),
-    addr_and_size_of( kmp_omp_nthr_info_buffer ),
-    sizeof( void * ),
+    addr_and_size_of(func_microtask),
+    addr_and_size_of(func_fork),
+    addr_and_size_of(func_fork_teams),
+    addr_and_size_of(__kmp_team_counter),
+    addr_and_size_of(__kmp_task_counter),
+    addr_and_size_of(kmp_omp_nthr_info_buffer),
+    sizeof(void *),
     OMP_LOCK_T_SIZE < sizeof(void *),
     bs_last_barrier,
     INITIAL_TASK_DEQUE_SIZE,
 
     // thread structure information
-    sizeof( kmp_base_info_t ),
-    offset_and_size_of( kmp_base_info_t, th_info ),
-    offset_and_size_of( kmp_base_info_t, th_team ),
-    offset_and_size_of( kmp_base_info_t, th_root ),
-    offset_and_size_of( kmp_base_info_t, th_serial_team ),
-    offset_and_size_of( kmp_base_info_t, th_ident ),
-    offset_and_size_of( kmp_base_info_t, th_spin_here    ),
-    offset_and_size_of( kmp_base_info_t, th_next_waiting ),
-    offset_and_size_of( kmp_base_info_t, th_task_team    ),
-    offset_and_size_of( kmp_base_info_t, th_current_task ),
-    offset_and_size_of( kmp_base_info_t, th_task_state   ),
-    offset_and_size_of( kmp_base_info_t,   th_bar ),
-    offset_and_size_of( kmp_bstate_t,      b_worker_arrived ),
+    sizeof(kmp_base_info_t),
+    offset_and_size_of(kmp_base_info_t, th_info),
+    offset_and_size_of(kmp_base_info_t, th_team),
+    offset_and_size_of(kmp_base_info_t, th_root),
+    offset_and_size_of(kmp_base_info_t, th_serial_team),
+    offset_and_size_of(kmp_base_info_t, th_ident),
+    offset_and_size_of(kmp_base_info_t, th_spin_here),
+    offset_and_size_of(kmp_base_info_t, th_next_waiting),
+    offset_and_size_of(kmp_base_info_t, th_task_team),
+    offset_and_size_of(kmp_base_info_t, th_current_task),
+    offset_and_size_of(kmp_base_info_t, th_task_state),
+    offset_and_size_of(kmp_base_info_t, th_bar),
+    offset_and_size_of(kmp_bstate_t, b_worker_arrived),
 
 #if OMP_40_ENABLED
     // teams information
-    offset_and_size_of( kmp_base_info_t, th_teams_microtask),
-    offset_and_size_of( kmp_base_info_t, th_teams_level),
-    offset_and_size_of( kmp_teams_size_t, nteams ),
-    offset_and_size_of( kmp_teams_size_t, nth ),
+    offset_and_size_of(kmp_base_info_t, th_teams_microtask),
+    offset_and_size_of(kmp_base_info_t, th_teams_level),
+    offset_and_size_of(kmp_teams_size_t, nteams),
+    offset_and_size_of(kmp_teams_size_t, nth),
 #endif
 
     // kmp_desc structure (for info field above)
-    sizeof( kmp_desc_base_t ),
-    offset_and_size_of( kmp_desc_base_t, ds_tid    ),
-    offset_and_size_of( kmp_desc_base_t, ds_gtid   ),
-    // On Windows* OS, ds_thread contains a thread /handle/, which is not usable, while thread /id/
-    // is in ds_thread_id.
-    #if KMP_OS_WINDOWS
-    offset_and_size_of( kmp_desc_base_t, ds_thread_id),
-    #else
-    offset_and_size_of( kmp_desc_base_t, ds_thread),
-    #endif
+    sizeof(kmp_desc_base_t),
+    offset_and_size_of(kmp_desc_base_t, ds_tid),
+    offset_and_size_of(kmp_desc_base_t, ds_gtid),
+// On Windows* OS, ds_thread contains a thread /handle/, which is not usable,
+// while thread /id/ is in ds_thread_id.
+#if KMP_OS_WINDOWS
+    offset_and_size_of(kmp_desc_base_t, ds_thread_id),
+#else
+    offset_and_size_of(kmp_desc_base_t, ds_thread),
+#endif
 
     // team structure information
-    sizeof( kmp_base_team_t ),
-    offset_and_size_of( kmp_base_team_t,   t_master_tid ),
-    offset_and_size_of( kmp_base_team_t,   t_ident      ),
-    offset_and_size_of( kmp_base_team_t,   t_parent     ),
-    offset_and_size_of( kmp_base_team_t,   t_nproc      ),
-    offset_and_size_of( kmp_base_team_t,   t_threads    ),
-    offset_and_size_of( kmp_base_team_t,   t_serialized ),
-    offset_and_size_of( kmp_base_team_t,   t_id         ),
-    offset_and_size_of( kmp_base_team_t,   t_pkfn       ),
-    offset_and_size_of( kmp_base_team_t,   t_task_team ),
-    offset_and_size_of( kmp_base_team_t,   t_implicit_task_taskdata ),
+    sizeof(kmp_base_team_t),
+    offset_and_size_of(kmp_base_team_t, t_master_tid),
+    offset_and_size_of(kmp_base_team_t, t_ident),
+    offset_and_size_of(kmp_base_team_t, t_parent),
+    offset_and_size_of(kmp_base_team_t, t_nproc),
+    offset_and_size_of(kmp_base_team_t, t_threads),
+    offset_and_size_of(kmp_base_team_t, t_serialized),
+    offset_and_size_of(kmp_base_team_t, t_id),
+    offset_and_size_of(kmp_base_team_t, t_pkfn),
+    offset_and_size_of(kmp_base_team_t, t_task_team),
+    offset_and_size_of(kmp_base_team_t, t_implicit_task_taskdata),
 #if OMP_40_ENABLED
-    offset_and_size_of( kmp_base_team_t,   t_cancel_request ),
+    offset_and_size_of(kmp_base_team_t, t_cancel_request),
 #endif
-    offset_and_size_of( kmp_base_team_t,   t_bar ),
-    offset_and_size_of( kmp_balign_team_t, b_master_arrived ),
-    offset_and_size_of( kmp_balign_team_t, b_team_arrived ),
+    offset_and_size_of(kmp_base_team_t, t_bar),
+    offset_and_size_of(kmp_balign_team_t, b_master_arrived),
+    offset_and_size_of(kmp_balign_team_t, b_team_arrived),
 
     // root structure information
-    sizeof( kmp_base_root_t ),
-    offset_and_size_of( kmp_base_root_t, r_root_team   ),
-    offset_and_size_of( kmp_base_root_t, r_hot_team    ),
-    offset_and_size_of( kmp_base_root_t, r_uber_thread ),
+    sizeof(kmp_base_root_t),
+    offset_and_size_of(kmp_base_root_t, r_root_team),
+    offset_and_size_of(kmp_base_root_t, r_hot_team),
+    offset_and_size_of(kmp_base_root_t, r_uber_thread),
     offset_and_size_not_available,
 
     // ident structure information
-    sizeof( ident_t ),
-    offset_and_size_of( ident_t, psource ),
-    offset_and_size_of( ident_t, flags   ),
+    sizeof(ident_t),
+    offset_and_size_of(ident_t, psource),
+    offset_and_size_of(ident_t, flags),
 
     // lock structure information
-    sizeof( kmp_base_queuing_lock_t ),
-    offset_and_size_of( kmp_base_queuing_lock_t, initialized  ),
-    offset_and_size_of( kmp_base_queuing_lock_t, location ),
-    offset_and_size_of( kmp_base_queuing_lock_t, tail_id  ),
-    offset_and_size_of( kmp_base_queuing_lock_t, head_id  ),
-    offset_and_size_of( kmp_base_queuing_lock_t, next_ticket  ),
-    offset_and_size_of( kmp_base_queuing_lock_t, now_serving  ),
-    offset_and_size_of( kmp_base_queuing_lock_t, owner_id     ),
-    offset_and_size_of( kmp_base_queuing_lock_t, depth_locked ),
-    offset_and_size_of( kmp_base_queuing_lock_t, flags ),
+    sizeof(kmp_base_queuing_lock_t),
+    offset_and_size_of(kmp_base_queuing_lock_t, initialized),
+    offset_and_size_of(kmp_base_queuing_lock_t, location),
+    offset_and_size_of(kmp_base_queuing_lock_t, tail_id),
+    offset_and_size_of(kmp_base_queuing_lock_t, head_id),
+    offset_and_size_of(kmp_base_queuing_lock_t, next_ticket),
+    offset_and_size_of(kmp_base_queuing_lock_t, now_serving),
+    offset_and_size_of(kmp_base_queuing_lock_t, owner_id),
+    offset_and_size_of(kmp_base_queuing_lock_t, depth_locked),
+    offset_and_size_of(kmp_base_queuing_lock_t, flags),
 
-#if ! KMP_USE_DYNAMIC_LOCK
+#if !KMP_USE_DYNAMIC_LOCK
     /* Lock table. */
-    sizeof( kmp_lock_table_t ),
-    offset_and_size_of( kmp_lock_table_t, used       ),
-    offset_and_size_of( kmp_lock_table_t, allocated  ),
-    offset_and_size_of( kmp_lock_table_t, table      ),
+    sizeof(kmp_lock_table_t),
+    offset_and_size_of(kmp_lock_table_t, used),
+    offset_and_size_of(kmp_lock_table_t, allocated),
+    offset_and_size_of(kmp_lock_table_t, table),
 #endif
 
     // Task team structure information.
-    sizeof( kmp_base_task_team_t ),
-    offset_and_size_of( kmp_base_task_team_t, tt_threads_data       ),
-    offset_and_size_of( kmp_base_task_team_t, tt_found_tasks        ),
-    offset_and_size_of( kmp_base_task_team_t, tt_nproc              ),
-    offset_and_size_of( kmp_base_task_team_t, tt_unfinished_threads ),
-    offset_and_size_of( kmp_base_task_team_t, tt_active             ),
+    sizeof(kmp_base_task_team_t),
+    offset_and_size_of(kmp_base_task_team_t, tt_threads_data),
+    offset_and_size_of(kmp_base_task_team_t, tt_found_tasks),
+    offset_and_size_of(kmp_base_task_team_t, tt_nproc),
+    offset_and_size_of(kmp_base_task_team_t, tt_unfinished_threads),
+    offset_and_size_of(kmp_base_task_team_t, tt_active),
 
     // task_data_t.
-    sizeof( kmp_taskdata_t ),
-    offset_and_size_of( kmp_taskdata_t, td_task_id                ),
-    offset_and_size_of( kmp_taskdata_t, td_flags                  ),
-    offset_and_size_of( kmp_taskdata_t, td_team                   ),
-    offset_and_size_of( kmp_taskdata_t, td_parent                 ),
-    offset_and_size_of( kmp_taskdata_t, td_level                  ),
-    offset_and_size_of( kmp_taskdata_t, td_ident                  ),
-    offset_and_size_of( kmp_taskdata_t, td_allocated_child_tasks  ),
-    offset_and_size_of( kmp_taskdata_t, td_incomplete_child_tasks ),
-
-    offset_and_size_of( kmp_taskdata_t, td_taskwait_ident   ),
-    offset_and_size_of( kmp_taskdata_t, td_taskwait_counter ),
-    offset_and_size_of( kmp_taskdata_t, td_taskwait_thread  ),
+    sizeof(kmp_taskdata_t),
+    offset_and_size_of(kmp_taskdata_t, td_task_id),
+    offset_and_size_of(kmp_taskdata_t, td_flags),
+    offset_and_size_of(kmp_taskdata_t, td_team),
+    offset_and_size_of(kmp_taskdata_t, td_parent),
+    offset_and_size_of(kmp_taskdata_t, td_level),
+    offset_and_size_of(kmp_taskdata_t, td_ident),
+    offset_and_size_of(kmp_taskdata_t, td_allocated_child_tasks),
+    offset_and_size_of(kmp_taskdata_t, td_incomplete_child_tasks),
+
+    offset_and_size_of(kmp_taskdata_t, td_taskwait_ident),
+    offset_and_size_of(kmp_taskdata_t, td_taskwait_counter),
+    offset_and_size_of(kmp_taskdata_t, td_taskwait_thread),
 
 #if OMP_40_ENABLED
-    offset_and_size_of( kmp_taskdata_t, td_taskgroup        ),
-    offset_and_size_of( kmp_taskgroup_t, count              ),
-    offset_and_size_of( kmp_taskgroup_t, cancel_request     ),
-
-    offset_and_size_of( kmp_taskdata_t, td_depnode          ),
-    offset_and_size_of( kmp_depnode_list_t, node            ),
-    offset_and_size_of( kmp_depnode_list_t, next            ),
-    offset_and_size_of( kmp_base_depnode_t, successors      ),
-    offset_and_size_of( kmp_base_depnode_t, task            ),
-    offset_and_size_of( kmp_base_depnode_t, npredecessors   ),
-    offset_and_size_of( kmp_base_depnode_t, nrefs           ),
+    offset_and_size_of(kmp_taskdata_t, td_taskgroup),
+    offset_and_size_of(kmp_taskgroup_t, count),
+    offset_and_size_of(kmp_taskgroup_t, cancel_request),
+
+    offset_and_size_of(kmp_taskdata_t, td_depnode),
+    offset_and_size_of(kmp_depnode_list_t, node),
+    offset_and_size_of(kmp_depnode_list_t, next),
+    offset_and_size_of(kmp_base_depnode_t, successors),
+    offset_and_size_of(kmp_base_depnode_t, task),
+    offset_and_size_of(kmp_base_depnode_t, npredecessors),
+    offset_and_size_of(kmp_base_depnode_t, nrefs),
 #endif
-    offset_and_size_of( kmp_task_t, routine                 ),
+    offset_and_size_of(kmp_task_t, routine),
 
     // thread_data_t.
-    sizeof( kmp_thread_data_t ),
-    offset_and_size_of( kmp_base_thread_data_t, td_deque             ),
-    offset_and_size_of( kmp_base_thread_data_t, td_deque_size        ),
-    offset_and_size_of( kmp_base_thread_data_t, td_deque_head        ),
-    offset_and_size_of( kmp_base_thread_data_t, td_deque_tail        ),
-    offset_and_size_of( kmp_base_thread_data_t, td_deque_ntasks      ),
-    offset_and_size_of( kmp_base_thread_data_t, td_deque_last_stolen ),
+    sizeof(kmp_thread_data_t),
+    offset_and_size_of(kmp_base_thread_data_t, td_deque),
+    offset_and_size_of(kmp_base_thread_data_t, td_deque_size),
+    offset_and_size_of(kmp_base_thread_data_t, td_deque_head),
+    offset_and_size_of(kmp_base_thread_data_t, td_deque_tail),
+    offset_and_size_of(kmp_base_thread_data_t, td_deque_ntasks),
+    offset_and_size_of(kmp_base_thread_data_t, td_deque_last_stolen),
 
     // The last field.
     KMP_OMP_VERSION,
@@ -236,80 +226,66 @@ __kmp_omp_debug_struct_info = {
 #undef offset_and_size_of
 #undef addr_and_size_of
 
-/*
-  Intel compiler on IA-32 architecture issues a warning "conversion
+/* Intel compiler on IA-32 architecture issues a warning "conversion
   from "unsigned long long" to "char *" may lose significant bits"
   when 64-bit value is assigned to 32-bit pointer. Use this function
-  to suppress the warning.
-*/
-static inline
-void *
-__kmp_convert_to_ptr(
-    kmp_uint64    addr
-) {
-    #if KMP_COMPILER_ICC
-        #pragma warning( push )
-        #pragma warning( disable:  810 ) // conversion from "unsigned long long" to "char *" may lose significant bits
-        #pragma warning( disable: 1195 ) // conversion from integer to smaller pointer
-    #endif // KMP_COMPILER_ICC
-    return (void *) addr;
-    #if KMP_COMPILER_ICC
-        #pragma warning( pop )
-    #endif // KMP_COMPILER_ICC
+  to suppress the warning. */
+static inline void *__kmp_convert_to_ptr(kmp_uint64 addr) {
+#if KMP_COMPILER_ICC
+#pragma warning(push)
+// conversion from "unsigned long long" to "char *" may lose significant bits
+#pragma warning(disable : 810)
+#pragma warning(disable : 1195) // conversion from integer to smaller pointer
+#endif // KMP_COMPILER_ICC
+  return (void *)addr;
+#if KMP_COMPILER_ICC
+#pragma warning(pop)
+#endif // KMP_COMPILER_ICC
 } // __kmp_convert_to_ptr
 
+static int kmp_location_match(kmp_str_loc_t *loc, kmp_omp_nthr_item_t *item) {
 
-static int
-kmp_location_match(
-    kmp_str_loc_t *        loc,
-    kmp_omp_nthr_item_t *  item
-) {
-
-    int file_match = 0;
-    int func_match = 0;
-    int line_match = 0;
-
-    char * file = (char *) __kmp_convert_to_ptr( item->file );
-    char * func = (char *) __kmp_convert_to_ptr( item->func );
-    file_match = __kmp_str_fname_match( & loc->fname, file );
-    func_match =
-        item->func == 0  // If item->func is NULL, it allows any func name.
-        ||
-        strcmp( func, "*" ) == 0
-        ||
-        ( loc->func != NULL && strcmp( loc->func, func ) == 0 );
-    line_match =
-        item->begin <= loc->line
-        &&
-        ( item->end <= 0 || loc->line <= item->end ); // if item->end <= 0, it means "end of file".
+  int file_match = 0;
+  int func_match = 0;
+  int line_match = 0;
+
+  char *file = (char *)__kmp_convert_to_ptr(item->file);
+  char *func = (char *)__kmp_convert_to_ptr(item->func);
+  file_match = __kmp_str_fname_match(&loc->fname, file);
+  func_match =
+      item->func == 0 // If item->func is NULL, it allows any func name.
+      || strcmp(func, "*") == 0 ||
+      (loc->func != NULL && strcmp(loc->func, func) == 0);
+  line_match =
+      item->begin <= loc->line &&
+      (item->end <= 0 ||
+       loc->line <= item->end); // if item->end <= 0, it means "end of file".
 
-    return ( file_match && func_match && line_match );
+  return (file_match && func_match && line_match);
 
 } // kmp_location_match
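
kmp_location_match compares a parsed source location against one
debugger-supplied record: the file must match by name, the function matches if
the record's func is NULL or "*", and the line must lie in [begin, end], where
end <= 0 means "to the end of the file". __kmp_omp_num_threads below walks all
records and keeps the last matching num_threads. A tiny sketch of just the
line-range rule (hypothetical helper):

    #include <assert.h>

    // Mirrors the line test in kmp_location_match: end <= 0 is open-ended.
    static int line_in_range(int line, int begin, int end) {
      return begin <= line && (end <= 0 || line <= end);
    }

    int main(void) {
      assert(line_in_range(42, 10, 50));
      assert(line_in_range(42, 10, 0)); // open-ended: to end of file
      assert(!line_in_range(9, 10, 50));
      return 0;
    }
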
 
+int __kmp_omp_num_threads(ident_t const *ident) {
+
+  int num_threads = 0;
 
-int
-__kmp_omp_num_threads(
-    ident_t const * ident
-) {
-
-    int num_threads = 0;
-
-    kmp_omp_nthr_info_t * info =
-        (kmp_omp_nthr_info_t *) __kmp_convert_to_ptr(  __kmp_omp_debug_struct_info.nthr_info.addr );
-    if ( info->num > 0 && info->array != 0 ) {
-        kmp_omp_nthr_item_t * items = (kmp_omp_nthr_item_t *) __kmp_convert_to_ptr( info->array );
-        kmp_str_loc_t         loc   = __kmp_str_loc_init( ident->psource, 1 );
-        int i;
-        for ( i = 0; i < info->num; ++ i ) {
-            if ( kmp_location_match( & loc, & items[ i ] ) ) {
-                num_threads = items[ i ].num_threads;
-            }; // if
-        }; // for
-        __kmp_str_loc_free( & loc );
-    }; // if
+  kmp_omp_nthr_info_t *info = (kmp_omp_nthr_info_t *)__kmp_convert_to_ptr(
+      __kmp_omp_debug_struct_info.nthr_info.addr);
+  if (info->num > 0 && info->array != 0) {
+    kmp_omp_nthr_item_t *items =
+        (kmp_omp_nthr_item_t *)__kmp_convert_to_ptr(info->array);
+    kmp_str_loc_t loc = __kmp_str_loc_init(ident->psource, 1);
+    int i;
+    for (i = 0; i < info->num; ++i) {
+      if (kmp_location_match(&loc, &items[i])) {
+        num_threads = items[i].num_threads;
+      }; // if
+    }; // for
+    __kmp_str_loc_free(&loc);
+  }; // if
 
-    return num_threads;;
+  return num_threads;
 
 } // __kmp_omp_num_threads
 #endif /* USE_DEBUGGER */
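
__kmp_omp_num_threads() returns 0 when no debugger-registered entry covers the
region, so any positive return value can be treated as an override. The call
site is not part of this hunk; a hypothetical consumer, assuming a build with
USE_DEBUGGER enabled and with choose_num_threads as an illustrative name,
might look like:

#include "kmp.h"
#include "kmp_debugger.h"

// Illustrative only: let the debugger's per-location table override the
// thread count requested by the program for this parallel region.
static int choose_num_threads(ident_t const *loc, int requested) {
  if (__kmp_debugging) {
    int dbg = __kmp_omp_num_threads(loc);
    if (dbg > 0)
      return dbg; // debugger pinned this source location to a thread count
  }
  return requested;
}
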

Modified: openmp/trunk/runtime/src/kmp_debugger.h
URL: http://llvm.org/viewvc/llvm-project/openmp/trunk/runtime/src/kmp_debugger.h?rev=302929&r1=302928&r2=302929&view=diff
==============================================================================
--- openmp/trunk/runtime/src/kmp_debugger.h (original)
+++ openmp/trunk/runtime/src/kmp_debugger.h Fri May 12 13:01:32 2017
@@ -18,34 +18,34 @@
 #define KMP_DEBUGGER_H
 
 #ifdef __cplusplus
-    extern "C" {
+extern "C" {
 #endif // __cplusplus
 
-/* * This external variable can be set by any debugger to flag to the runtime that we
-   are currently executing inside a debugger.  This will allow the debugger to override
-   the number of threads spawned in a parallel region by using __kmp_omp_num_threads() (below).
-   * When __kmp_debugging is TRUE, each team and each task gets a unique integer identifier
-   that can be used by debugger to conveniently identify teams and tasks.
-   * The debugger has access to __kmp_omp_debug_struct_info which contains information
-   about the OpenMP library's important internal structures.  This access will allow the debugger
-   to read detailed information from the typical OpenMP constructs (teams, threads, tasking, etc. )
-   during a debugging session and offer detailed and useful information which the user can probe
-   about the OpenMP portion of their code.
-   */
-extern int __kmp_debugging;             /* Boolean whether currently debugging OpenMP RTL */
+/* This external variable can be set by any debugger to flag to the runtime
+   that we are currently executing inside a debugger.  This allows the
+   debugger to override the number of threads spawned in a parallel region by
+   using __kmp_omp_num_threads() (below).
+   * When __kmp_debugging is TRUE, each team and each task gets a unique
+   integer identifier that the debugger can use to conveniently identify teams
+   and tasks.
+   * The debugger has access to __kmp_omp_debug_struct_info, which contains
+   information about the OpenMP library's important internal structures.  This
+   access allows the debugger to read detailed information about the typical
+   OpenMP constructs (teams, threads, tasking, etc.) during a debugging
+   session and to offer detailed and useful information that the user can
+   probe about the OpenMP portion of their code. */
+extern int __kmp_debugging; /* Boolean whether currently debugging OpenMP RTL */
 // Return number of threads specified by the debugger for given parallel region.
-/* The ident field, which represents a source file location, is used to check if the
-   debugger has changed the number of threads for the parallel region at source file
-   location ident.  This way, specific parallel regions' number of threads can be changed
-   at the debugger's request.
- */
-int __kmp_omp_num_threads( ident_t const * ident );
+/* The ident field, which represents a source file location, is used to check
+   whether the debugger has changed the number of threads for the parallel
+   region at source file location ident.  This way, the number of threads for
+   specific parallel regions can be changed at the debugger's request. */
+int __kmp_omp_num_threads(ident_t const *ident);
 
 #ifdef __cplusplus
-    } // extern "C"
+} // extern "C"
 #endif // __cplusplus
 
-
 #endif // KMP_DEBUGGER_H
 
 #endif // USE_DEBUGGER
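
At the user level, the effect of this interface is that the thread count of a
particular parallel region can be pinned from the debugger, per source
location, without modifying the program. An illustrative target program
(plain OpenMP, nothing specific to this patch):

#include <omp.h>
#include <stdio.h>

int main(void) {
  // With __kmp_debugging set and a matching entry registered for this source
  // location, the runtime spawns the debugger-chosen number of threads here
  // instead of the default.
#pragma omp parallel
  {
    printf("hello from thread %d of %d\n", omp_get_thread_num(),
           omp_get_num_threads());
  }
  return 0;
}
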



