[Openmp-dev] About discussion of vectorization pass and openmp `simd` and `ordered simd` directives

Tue Sep 14 19:39:35 PDT 2021

Hi Johannes,

Sorry for the misunderstanding since I did not fully understand `declare simd` directive. I think you are right.

I am working on codegen of `threadprivate` directive in LLVM Flang, after which I should be able to have time to investigate how to implement the code generation for `ordered simd` in clang frontend and vectorization or other passes. I will ask you after I finished codegen of `threadprivate` directive in LLVM Flang. If this issue is still not implemented, I can take up it. Or if this issue has high priority for you or Alexey, you or Alexey can take up it if you have time to do it.

All the best,
Peixin

-----Original Message-----
From: Johannes Doerfert [mailto:jdoerfert at anl.gov] 
Sent: Wednesday, September 15, 2021 1:53 AM
To: Alexey Bataev <a.bataev at hotmail.com>
Cc: qiaopeixin <qiaopeixin at huawei.com>; openmp-dev <openmp-dev at lists.llvm.org>
Subject: Re: [Openmp-dev] About discussion of vectorization pass and openmp `simd` and `ordered simd` directives

On 9/14/21 12:35 PM, Alexey Bataev wrote:
>
> Best regards,
> Alexey Bataev
>
>> 14 сент. 2021 г., в 13:25, Johannes Doerfert via Openmp-dev <openmp-dev at lists.llvm.org> написал(а):
>>
>> 
>>> On 9/13/21 1:24 AM, qiaopeixin wrote:
>>> Thanks for the high-level code. I think this method is not a good method by considering the following things:
>>>
>>> 1. Extra `getelementptr` and `load` instructions are generated because of the vector version of outlined region. If we can serialize the ordered region, much less instructions are needed.
>> I don't follow. My code did not serialize the outlined region, for that you need to do the actual work.
>> My code showed that the scheme works in general.
>>
>>
>>> 2. What if the vectorization of the outlined region cannot be done such as the following example?
>>> void func(float *a, float *b, float *c, float *d, int N, int j) {
>>>         #pragma omp simd
>>>         for (int i = 1; i < N; i++) {
>>>           d[i] = c[i] + 1.0;
>>>           #pragma omp ordered simd
>>>           if (i > j)
>>>             a[i] = a[i-1] + 1.0;
>>>         }
>>> }
>> The outlined region should not be vectorized, that is the entire point, no?
>> Also, FWIW, the code above is as vectorizable as the original version you shared.
>>
>>
>>> 3. What if the outlined region is one user-defined function?
>>> void func(float *a, float *b, float *c, float *d, int N, int j) {
>>>         #pragma omp simd
>>>         for (int i = 1; i < N; i++) {
>>>           d[i] = c[i] + 1.0;
>>>           #pragma omp ordered simd
>>>           a[i] = user_defined_func(a[i-1]);
>>>         }
>>> }
>> I don't think you understood what I am trying to tell you.
>>
>> The outlined function would be the same as we have now.
>> The sequentialized "vector version" of it would look like this:
>> ```
>> void seq_outlined(float *a, int i) {
>>    a[i+0] = user_defined_func(a[i-1+0]);
>>    a[i+1] = user_defined_func(a[i-1+1]);
>>    a[i+2] = user_defined_func(a[i-1+2]);
>>    a[i+3] = user_defined_func(a[i-1+3]); ```
>>
> Johannes, I agree with your solution in general. Just the frontend does not have info about the vectorization factor. We can emit a scalar version of the function and decorate it with declare simd attributes. We need a pass in LLVM to generate vector variants of such functions.

You're right.

We should use the `declare simd` scheme with an annotation that says it should actually be sequentialized instead. The generator in the vectorizer can then do the right thing.

~ Johannes

>
>> I think you should take a look at how vector versions of functions work in IR and the vectorizer.
>> It seems there is a conceptual mismatch between what I'm trying to say and what you try to argue.
>>
>> ~ Johannes
>>
>>
>>> All the best,
>>> Peixin
>>>
>>> -----Original Message-----
>>> From: Johannes Doerfert [mailto:jdoerfert at anl.gov]
>>> Sent: Monday, September 13, 2021 11:53 AM
>>> To: Alexey Bataev <a.bataev at hotmail.com>
>>> Cc: qiaopeixin <qiaopeixin at huawei.com>; openmp-dev 
>>> <openmp-dev at lists.llvm.org>
>>> Subject: Re: [Openmp-dev] About discussion of vectorization pass and 
>>> openmp `simd` and `ordered simd` directives
>>>
>>>
>>>> On 9/12/21 10:02 PM, Alexey Bataev wrote:
>>>> I think we can emit only declare simd attrs for the ordered region function, inner loop (in the function) should be emitted by the declare simd vectorization pass.
>>> If we get the declare simd vectorization pass to create a sequential version, sure.
>>> I was thinking something along the lines of:
>>>
>>> We create a sequentialized version of the ordered outlined function which computes multiple iterations.
>>> We present that one to the vectorizer as the "vectorized" version of the outlined function. That way the vectorizer can vectorize the loop, pick the "vectorized" version of the outlined region as replacement for the original version, and we end up with vectorized code but a sequentialized ordered region.
>>>
>>> So, we would need to create the outlined function, incl. body, as we do now but without the access group metadata. In addition, the outlined "sequentialized vector version" has to be created by the frontend (or someone else) and connected to the outlined function as if there was a vector version (e.g., a vector library was connected).
>>>
>>> So basically everything that is marked as "create" here:
>>> https://godbolt.org/z/qxKo6bjb9
>>>
>>> ~ Johannes.
>>>
>>>
>>>> Best regards,
>>>> Alexey Bataev
>>>>
>>>>> 12 сент. 2021 г., в 22:37, Johannes Doerfert via Openmp-dev <openmp-dev at lists.llvm.org> написал(а):
>>>>>
>>>>> Peixin,
>>>>>
>>>>>> On 9/12/21 8:47 PM, qiaopeixin wrote:
>>>>>> Hi Johannes,
>>>>>>
>>>>>> I don't think this is the right solution. The outlined function will affect the vectorization of simd region and make the simd region not vectorized. I tried to add `noinline` to the `ordered simd` outlined function before, and it breaks the vectorization of the whole simd region.
>>>>>>
>>>>>> I agree with you that one possible method is to replicate the ordered code multiple times, but I think this method should be done in vectorization pass.
>>>>>>
>>>>>> In addition, one fast way to correct current code generation is to remove `alwaysinline` attribute and add `noinline` attribute. I prefer to dig more on vectorization pass to check how does it stop vectorization of the whole for loop when there is one outlined function inside.
>>>>> Do not add noinline, we went over this.
>>>>>
>>>>> Here is what I suggested in my last email as high-level C code.
>>>>> It works perfectly fine and is properly vectorized incl. the scalarized region:
>>>>>     https://godbolt.org/z/afar1e3Wf
>>>>>
>>>>> ~ Johannes
>>>>>
>>>>>
>>>>>> All the best,
>>>>>> Peixin
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Johannes Doerfert [mailto:jdoerfert at anl.gov]
>>>>>> Sent: Sunday, September 12, 2021 1:08 AM
>>>>>> To: qiaopeixin <qiaopeixin at huawei.com>
>>>>>> Cc: openmp-dev <openmp-dev at lists.llvm.org>; Alexey Bataev 
>>>>>> <a.bataev at hotmail.com>
>>>>>> Subject: Re: About discussion of vectorization pass and openmp 
>>>>>> `simd` and `ordered simd` directives
>>>>>>
>>>>>> Also, I thought about vectorizing this again.
>>>>>>
>>>>>> I think what you can do to make it vectorize the code is to provide a scalarized function that replicates the ordered code N times and which is then given to the vectorizer as the "vector version" of the outlined function. That way you might need to add `noinline` to the `ordered simd` outlined function but it should allow us to vectorize the code properly.
>>>>>>
>>>>>> Let me know if you are interested in that.
>>>>>>
>>>>>> ~ Johannes
>>>>>>
>>>>>>
>>>>>>> On 9/10/21 7:57 PM, qiaopeixin wrote:
>>>>>>> Hi Johannes,
>>>>>>>
>>>>>>> I agree with your. I can take a try to remove the metadata for the outlined function to correct the current wrong code generation. If Alexey or you wants to do it, please let me know.
>>>>>>>
>>>>>>> All the best,
>>>>>>> Peixin
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Johannes Doerfert [mailto:jdoerfert at anl.gov]
>>>>>>> Sent: Saturday, September 11, 2021 3:08 AM
>>>>>>> To: qiaopeixin <qiaopeixin at huawei.com>
>>>>>>> Cc: openmp-dev <openmp-dev at lists.llvm.org>; Alexey Bataev 
>>>>>>> <a.bataev at hotmail.com>
>>>>>>> Subject: Re: About discussion of vectorization pass and openmp 
>>>>>>> `simd` and `ordered simd` directives
>>>>>>>
>>>>>>>
>>>>>>> On 9/8/21 8:10 PM, qiaopeixin wrote:
>>>>>>>> Hi Johannes,
>>>>>>>>
>>>>>>>> First, I would like to thank you and Alexey for the explanations.
>>>>>>>>
>>>>>>>>> As described, to make it at least sound we should not emit the access.group metadata for the call to the outlined function.
>>>>>>>> I would like make sure that I understand what code you want to generate. Not emitting the `access.group` metadata would cause all the code inside simd region not vectorized. Again, for the following code:
>>>>>>>> ```
>>>>>>>> void func(float *a, float *b, float *c, float *d, int N) {
>>>>>>>>       #pragma omp simd
>>>>>>>>       for (int i = 1; i < N; i++) {
>>>>>>>>         d[i] = c[i] + 1.0;
>>>>>>>>         #pragma omp ordered simd
>>>>>>>>         a[i] = a[i-1] + 1.0;
>>>>>>>>       }
>>>>>>>> }
>>>>>>>> ```
>>>>>>>> I tried to generate the IR from clang frontend and delete the metadata by hand. It turns out to break the vectorization of simd region. What I did is as follows:
>>>>>>>> $ clang++ -fopenmp -O3 test.cpp -c -emit-llvm -S -Xclang 
>>>>>>>> -disable-llvm-passes (Change "call void 
>>>>>>>> @__captured_stmt(float** %a.addr, i32* %i3), !llvm.access.group 
>>>>>>>> !15" into "call void
>>>>>>>> @__captured_stmt(float** %a.addr, i32* %i3)" by hand.) $ 
>>>>>>>> clang++ -fopenmp -O3 test.ll -c -emit-llvm -S
>>>>>>>> warning: <unknown>:0:0: loop not vectorized: the optimizer was 
>>>>>>>> unable to perform the requested transformation; the 
>>>>>>>> transformation might be disabled or specified as part of an 
>>>>>>>> unsupported transformation ordering 
>>>>>>>> [-Wpass-failed=transform-warning]
>>>>>>>> warning: <unknown>:0:0: loop not vectorized: the optimizer was 
>>>>>>>> unable to perform the requested transformation; the 
>>>>>>>> transformation might be disabled or specified as part of an 
>>>>>>>> unsupported transformation ordering 
>>>>>>>> [-Wpass-failed=transform-warning]
>>>>>>>> 2 warnings generated.
>>>>>>>>
>>>>>>>> Is this IR what you want to generate for now? I think the expected IR should vectorize the statement of ` d[i] = c[i] + 1.0` and serialize the statement of ` a[i] = a[i-1] + 1.0`. Do you want to delay this generation until vectorization pass supports patial vectorization in one for loop?
>>>>>>> Yes, that is the IR we want to generate until the loop vectorizer can be told to serialize some code.
>>>>>>> At least as far as I know it cannot right now.
>>>>>>>
>>>>>>> We could also investigate other solutions, e.g., unroll and do straight-line code vectorization based on access group metadata. Or split the loop. But everything will require more work.
>>>>>>>
>>>>>>> ~ Johannes
>>>>>>>
>>>>>>>
>>>>>>>> All the best,
>>>>>>>> Peixin
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Johannes Doerfert [mailto:jdoerfert at anl.gov]
>>>>>>>> Sent: Thursday, September 9, 2021 12:00 AM
>>>>>>>> To: Alexey Bataev <a.bataev at hotmail.com>
>>>>>>>> Cc: qiaopeixin <qiaopeixin at huawei.com>; openmp-dev 
>>>>>>>> <openmp-dev at lists.llvm.org>
>>>>>>>> Subject: Re: About discussion of vectorization pass and openmp 
>>>>>>>> `simd` and `ordered simd` directives
>>>>>>>>
>>>>>>>> Peixin,
>>>>>>>>
>>>>>>>> are you interested in trying to fix this?
>>>>>>>>
>>>>>>>> As described, to make it at least sound we should not emit the access.group metadata for the call to the outlined function.
>>>>>>>> That will not necessarily resolve your "problem", e.g., that the vectorizer uses memory checks etc., but it will at least stop us from generating wrong code.
>>>>>>>>
>>>>>>>> ~ Johannes
>>>>>>>>
>>>>>>>>
>>>>>>>> On 9/8/21 10:57 AM, Alexey Bataev wrote:
>>>>>>>>> Best regards,
>>>>>>>>> Alexey Bataev
>>>>>>>>>
>>>>>>>>>> 8 сент. 2021 г., в 09:41, Johannes Doerfert <jdoerfert at anl.gov> написал(а):
>>>>>>>>>>
>>>>>>>>>> Hi Peixin,
>>>>>>>>>>
>>>>>>>>>> I think you are right that the code we generate is not correct.
>>>>>>>>>> The problem is not that a[i] is vectorized, the problem is 
>>>>>>>>>> that we might vectorize it without a memory check (with O2 
>>>>>>>>>> instead of O3, see https://godbolt.org/z/9dnqMKexT).
>>>>>>>>>>
>>>>>>>>>> @Alexey, what was the intention of the outlined ordered region?
>>>>>>>>> No special intention, that was just a straightforward way implementing this feature.
>>>>>>>>>
>>>>>>>>>> I'm not really sure how to handle this best but the 
>>>>>>>>>> access.group on the call to the outlined region seems to be 
>>>>>>>>>> wrong as it implies vectorization is sound while it isn't. WDYT?
>>>>>>>>>>
>>>>>>>>> I agree, looks like the codegen for the vectorizable loops was changed but we did not adjust it for the ordered simd region. Probably, need to fix loop vectorization metadata emission for ordered simd regions.
>>>>>>>>>
>>>>>>>>>> ~ Johannes
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On 9/8/21 4:41 AM, qiaopeixin wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I would like to discuss the behaviors of openmp `simd` and `ordered simd` directives. I think current Clang may not give expected results as OpenMP 5.0 standard defines.
>>>>>>>>>>>
>>>>>>>>>>> Let's start one c++ example:
>>>>>>>>>>> ```
>>>>>>>>>>> void func(float *a, float *b, float *c, float *d, int N) {
>>>>>>>>>>>        #pragma omp simd
>>>>>>>>>>>        for (int i = 0; i < N; i++) {
>>>>>>>>>>>          d[i] = c[i] + 1.0;
>>>>>>>>>>>          #pragma omp ordered simd
>>>>>>>>>>>          a[i] = b[i] + 1.0;
>>>>>>>>>>>        }
>>>>>>>>>>> }
>>>>>>>>>>> ```
>>>>>>>>>>> What is expected according to OpenMP 5.0 standard is like the following:
>>>>>>>>>>> ```
>>>>>>>>>>> void func(float *a, float *b, float *c, float *d, int N) {
>>>>>>>>>>>        for (int i = 0; i < N; i += 4) {
>>>>>>>>>>>          #pragma omp simd
>>>>>>>>>>>          for (int j = i; j < 4; j++)
>>>>>>>>>>>            d[i] = c[i] + 1.0; // vectorized
>>>>>>>>>>>
>>>>>>>>>>>          for (int j = i; j < 4; j++)
>>>>>>>>>>>            a[i] = b[i] + 1.0; // not vectorized
>>>>>>>>>>>        }
>>>>>>>>>>> }
>>>>>>>>>>> ```
>>>>>>>>>>> It seems that current Clang and LLVM do not support it.
>>>>>>>>>>>
>>>>>>>>>>> Without openmp enabled, clang vectorizes the loop with memcheck as follows:
>>>>>>>>>>> ```
>>>>>>>>>>> $ clang++ -O3 test.cpp -c -emit-llvm -S && cat test.ll
>>>>>>>>>>>        %scevgep = getelementptr float, float* %d, i64 %wide.trip.count
>>>>>>>>>>>        %scevgep22 = getelementptr float, float* %a, i64 %wide.trip.count
>>>>>>>>>>>        %scevgep25 = getelementptr float, float* %c, i64 %wide.trip.count
>>>>>>>>>>>        %scevgep28 = getelementptr float, float* %b, i64 %wide.trip.count
>>>>>>>>>>>        %bound030 = icmp ugt float* %scevgep25, %d
>>>>>>>>>>>        %bound131 = icmp ugt float* %scevgep, %c
>>>>>>>>>>>        %found.conflict32 = and i1 %bound030, %bound131
>>>>>>>>>>>        ... fadd <4 x float> ...
>>>>>>>>>>> ```
>>>>>>>>>>>
>>>>>>>>>>> With openmp-simd enabled, clang vectorizes the loop without memcheck. This means that only `simd` directive is enabled, while `ordered simd` directive is disabled. The results are expected.
>>>>>>>>>>> ```
>>>>>>>>>>> clang++ -fopenmp-simd -O3 test.cpp -c -emit-llvm -S && cat 
>>>>>>>>>>> clang++ test.ll
>>>>>>>>>>> ```
>>>>>>>>>>>
>>>>>>>>>>> With openmp enabled, both `simd` and `ordered simd` directives are enabled. Clang frontend generates the outlined function `captured_stmt(float** %a.addr, i32* %i3, float** %b.addr)` with `AlwaysInline` attribute when optimization level is more than 0. The generated IR is to vectorize the loop with memcheck as follows:
>>>>>>>>>>> ```
>>>>>>>>>>> $ clang++ -fopenmp -O3 test.cpp -c -emit-llvm -S && cat test.ll
>>>>>>>>>>>        %scevgep = getelementptr float, float* %d, i64 %wide.trip.count
>>>>>>>>>>>        %scevgep29 = getelementptr float, float* %a, i64 %wide.trip.count
>>>>>>>>>>>        %scevgep32 = getelementptr float, float* %c, i64 %wide.trip.count
>>>>>>>>>>>        %scevgep35 = getelementptr float, float* %b, i64 %wide.trip.count
>>>>>>>>>>>        %bound037 = icmp ugt float* %scevgep32, %d
>>>>>>>>>>>        %bound138 = icmp ugt float* %scevgep, %c
>>>>>>>>>>>        %found.conflict39 = and i1 %bound037, %bound138
>>>>>>>>>>>        ... fadd <4 x float> ...
>>>>>>>>>>> ```
>>>>>>>>>>> But the expected IR should be like the following:
>>>>>>>>>>> ```
>>>>>>>>>>>        %scevgep29 = getelementptr float, float* %a, i64 %wide.trip.count
>>>>>>>>>>>        %scevgep35 = getelementptr float, float* %b, i64 %wide.trip.count
>>>>>>>>>>>        %found.conflict...
>>>>>>>>>>>        ... fadd <4 x float> ...
>>>>>>>>>>> ```
>>>>>>>>>>> I have two questions here:
>>>>>>>>>>> 1. Does the outlined function `captured_stmt(float** %a.addr, i32* %i3, float** %b.addr)` with `AlwaysInline` attribute cause the memcheck? And how?
>>>>>>>>>>> 2. If my understanding is correct according to the above analysis, should the codegen of `ordered simd` directive be fixed to support the expected behaviors? And should `memcheck` function (emitMemRuntimeChecks) also support partial check instead of the whole region inside the loop?
>>>>>>>>>>>
>>>>>>>>>>> Also, for the following test case, both of vectorization of `d[i] = c[i] + 1.0;` and `a[i] = a[i-1] + 1.0;` are disabled.
>>>>>>>>>>> ```
>>>>>>>>>>> void func(float *a, float *b, float *c, float *d, int N) {
>>>>>>>>>>>        #pragma omp simd
>>>>>>>>>>>        for (int i = 1; i < N; i++) {
>>>>>>>>>>>          d[i] = c[i] + 1.0;
>>>>>>>>>>>          #pragma omp ordered simd
>>>>>>>>>>>          a[i] = a[i-1] + 1.0;
>>>>>>>>>>>        }
>>>>>>>>>>> }
>>>>>>>>>>> ```
>>>>>>>>>>> What is expected is to vectorize the statement `d[i] = c[i] + 1.0;`.
>>>>>>>>>>> I also test icc and gcc and here are the results:
>>>>>>>>>>> ```
>>>>>>>>>>> $ icc -v
>>>>>>>>>>> icc version 2021.1
>>>>>>>>>>> $ icc -qopenmp test.cpp -O3 -qopt-report 
>>>>>>>>>>> -qopt-report-phase=vec -S && cat test.optrpt LOOP BEGIN at test.cpp(3,3)
>>>>>>>>>>>         remark #15531: Block of statements was serialized due to user request   [ test.cpp(5,5) ]
>>>>>>>>>>>         remark #15301: SIMD LOOP WAS VECTORIZED LOOP END $ 
>>>>>>>>>>> g++ -v gcc version 9.3.0 (GCC) $ g++ test.cpp -fopenmp 
>>>>>>>>>>> -fdump-tree-all -fdump-rtl-all -O3 -ftree-vectorize -S && cat test.s
>>>>>>>>>>> fadd    s0, s0, s1 // not vectorized
>>>>>>>>>>> ...
>>>>>>>>>>> fadd    s0, s0, s1 // not vectorized
>>>>>>>>>>> // There is `GOMP_SIMD_ORDERED_START` and `GOMP_SIMD_ORDERED_END` before and after the statement of `a[i] = a[i-1] + 1.0` in ifcvt pass, after which they are used in vect pass to break the vectorization.
>>>>>>>>>>> ```
>>>>>>>>>>>
>>>>>>>>>>> For the following test case:
>>>>>>>>>>> ```
>>>>>>>>>>> void func(float *b, float *c, float *d, int N) {
>>>>>>>>>>>        float a[N];
>>>>>>>>>>>        for (int i = 0; i < N; i++)
>>>>>>>>>>>          a[i] = 0;
>>>>>>>>>>>        #pragma omp simd
>>>>>>>>>>>        for (int i = 1; i < N; i++) {
>>>>>>>>>>>          d[i] = c[i] + 1.0;
>>>>>>>>>>>          #pragma omp ordered simd
>>>>>>>>>>>          a[i] = a[i-1] + 1.0;
>>>>>>>>>>>        }
>>>>>>>>>>> }
>>>>>>>>>>> ```
>>>>>>>>>>> The IR generated is as follows:
>>>>>>>>>>> ```
>>>>>>>>>>> $ clang++ -fopenmp -O3 test.cpp -c -emit-llvm -S
>>>>>>>>>>>        %scevgep = getelementptr float, float* %d, i64 1
>>>>>>>>>>>        %3 = add nuw nsw i64 %wide.trip.count, 1
>>>>>>>>>>>        %scevgep41 = getelementptr float, float* %d, i64 %3
>>>>>>>>>>>        %scevgep43 = getelementptr float, float* %c, i64 1
>>>>>>>>>>>        %scevgep45 = getelementptr float, float* %c, i64 %3
>>>>>>>>>>>        %bound0 = icmp ult float* %scevgep, %scevgep45
>>>>>>>>>>>        %bound1 = icmp ult float* %scevgep43, %scevgep41
>>>>>>>>>>>        %found.conflict = and i1 %bound0, %bound1
>>>>>>>>>>>        %induction = fadd <4 x float> %.splat, <float 
>>>>>>>>>>> 0.000000e+00, float 1.000000e+00, float 2.000000e+00, float 
>>>>>>>>>>> 3.000000e+00> ``` The result for the statement of `d[i] = c[i] + 1.0` and `a[i] = a[i-1] + 1.0` are both unexpected. It is safe to vectorize the statement of `a[i] = a[i-1] + 1.0` although it violates the definition of ordered construct in OpenMP 5.0 standard. But the memcheck of variables `d` and `c` should not be correct as the `simd` directive is there.
>>>>>>>>>>>
>>>>>>>>>>> All the best,
>>>>>>>>>>> Peixin
>>>>>>>>>>>
>>>>> _______________________________________________
>>>>> Openmp-dev mailing list
>>>>> Openmp-dev at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
>> _______________________________________________
>> Openmp-dev mailing list
>> Openmp-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev