[llvm-dev] [RFC] IR-level Region Annotations
Tian, Xinmin via llvm-dev
llvm-dev at lists.llvm.org
Fri Jan 20 10:45:15 PST 2017
In that case, "val" is shared per the OpenMP language rules, so no privatization is needed. %val lives on the master thread's stack; to share %val among all threads, &val is passed to the outlined function.
void main() {
int val;
#pragma omp parallel num_threads(4)
{
// Really bad naming, won't pass code review. :)
compute_something_into_val(&val, omp_get_thread_num());
}
}
The IR would be:
{ void main() {
i32* val = alloca i32
tok = llvm.experimental.intrinsic_a()[ "DIR.PARALLEL"(), "QUAL.SHARED"(i32* %val), "QUAL.NUM_THREADS"(i32 4)]
%1 = omp_get_thread_num();
compute_something_into_val(%val, %1);
llvm.experimental.intrinsic_b(tok)["DIR.END.PARALLEL"()];
}
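To make the shared case concrete, here is a minimal C sketch of what outlining might produce. It is not the actual lowering: outlined_region, seen, and all_threads_saw_same_address are hypothetical names, the body of compute_something_into_val is a stand-in, and a plain loop replaces the runtime's fork of 4 threads so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_THREADS 4

/* Hypothetical bookkeeping: the address each "thread" was handed. */
static int *seen[NUM_THREADS];

/* Stand-in body for compute_something_into_val from the example above. */
static void compute_something_into_val(int *ptr, int idx) {
    assert(idx >= 0 && idx < NUM_THREADS);
    seen[idx] = ptr; /* remember which address this thread saw */
}

/* Outlined body of the parallel region (hypothetical name).
 * Because val is shared, it arrives by address, not as a copy. */
static void outlined_region(int *val, int tid) {
    compute_something_into_val(val, tid);
}

/* Sequential stand-in for the runtime fork; a real OpenMP runtime
 * would run the outlined function concurrently on 4 threads.
 * Returns 1 if every thread received the master's &val. */
int all_threads_saw_same_address(void) {
    int val = 0; /* lives on the master's stack */
    for (int tid = 0; tid < NUM_THREADS; ++tid)
        outlined_region(&val, tid);
    for (int tid = 0; tid < NUM_THREADS; ++tid)
        if (seen[tid] != &val)
            return 0;
    return 1;
}
```

Every call receives the same pointer, which is the property "QUAL.SHARED" is meant to preserve.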
Xinmin
-----Original Message-----
From: Sanjoy Das [mailto:sanjoy at playingwithpointers.com]
Sent: Thursday, January 19, 2017 11:40 PM
To: Tian, Xinmin <xinmin.tian at intel.com>
Cc: Adve, Vikram Sadanand <vadve at illinois.edu>; llvm-dev-request at lists.llvm.org
Subject: Re: [llvm-dev] [RFC] IR-level Region Annotations
On Thu, Jan 19, 2017 at 11:27 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> Hi Xinmin,
>
> On Thu, Jan 19, 2017 at 11:20 PM, Tian, Xinmin <xinmin.tian at intel.com> wrote:
>> Sanjoy, the IR would be like something below. It is ok to hoist alloca instruction outside the region. There are some small changes in optimizer to understand region-annotation intrinsic.
>>
>> { void main() {
>> i32* val = alloca i32
>> tok = llvm.experimental.intrinsic_a()[ "DIR.PARALLEL"(),
>> "QUAL.PRIVATE"(i32* val), "QUAL.NUM_THREADS"(i32 4)]
>>
>> int i = omp_get_thread_num();
>> compute_something_into_val(val, i);
>> a[i] = val;
>>
>> llvm.experimental.intrinsic_b(tok)["DIR.END.PARALLEL"()];
>> }
>>
>> With above representation, we can do privatization and outlining as
>> below
>>
>> { void main() {
>> i32* val = alloca i32
>> i32* i = alloca i32
>> tok = llvm.experimental.intrinsic_a()[ "DIR.PARALLEL"(),
>> "QUAL.PRIVATE"(i32* %val, i32 %i), "QUAL.NUM_THREADS"(i32 4)]
>>
>> %ii = omp_get_thread_num();
>> compute_something_into_val(%val, %i);
>> a[i] = %val;
>>
>> llvm.experimental.intrinsic_b(tok)["DIR.END.PARALLEL"()];
>> }
>>
>> 1. Create i32* %priv_val = alloca i32 (and likewise %priv_i = ...) in the region, and replace all uses of %val with %priv_val in the region.
>> 2. perform outlining.
>
> But then what if compute_something_into_val is
>
> void compute_something_into_val(i32* ptr, i32 idx) {
> static i32* cookie = null;
> lock_mutex();
> if (cookie == null)
> cookie = ptr
> else
> assert(cookie == ptr);
> unlock_mutex();
> // don't write to ptr, so there is no race
> }
>
> In other words, how do you differentiate between the hoisted-alloca
> situation arising due to a hoist vs. arising because that's what the
> programmer intended (and you're required to pass in the same address
> to each call into compute_something_into_val)?
Just to be 100% clear, the source program in the latter case would have to be:
void main() {
int val;
#pragma omp parallel num_threads(4)
{
// Really bad naming, won't pass code review. :)
compute_something_into_val(&val, omp_get_thread_num());
}
}
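The difference is observable, as this rough C sketch tries to show. It assumes a sequential loop in place of real threads (so the mutex is dropped), counts mismatches instead of asserting on them, and uses the hypothetical names mismatches_when_shared and mismatches_when_privatized, which are not part of the proposal.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_THREADS 4

static int *cookie = NULL;  /* first address ever passed in */
static int mismatches = 0;  /* calls whose pointer differed from cookie */

/* Same idea as the cookie check above, but counting instead of
 * asserting, and without locking because nothing runs concurrently. */
static void compute_something_into_val(int *ptr, int idx) {
    (void)idx;
    assert(ptr != NULL);
    if (cookie == NULL)
        cookie = ptr;
    else if (cookie != ptr)
        ++mismatches;
}

/* Shared: every "thread" is handed the one &val. */
int mismatches_when_shared(void) {
    int val = 0;
    cookie = NULL;
    mismatches = 0;
    for (int tid = 0; tid < NUM_THREADS; ++tid)
        compute_something_into_val(&val, tid);
    return mismatches;
}

/* Privatized: every "thread" is handed its own copy. */
int mismatches_when_privatized(void) {
    int priv_val[NUM_THREADS] = {0};
    cookie = NULL;
    mismatches = 0;
    for (int tid = 0; tid < NUM_THREADS; ++tid)
        compute_something_into_val(&priv_val[tid], tid);
    return mismatches;
}
```

With a shared val the count stays at zero. With per-thread copies, every call after the first sees a different address, so privatizing a variable the programmer meant to share would change program behavior.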
-- Sanjoy