[cfe-dev] Comparison of 2 schemes to implement OpenMP 5.0 declare mapper codegen
Alexey Bataev via cfe-dev
cfe-dev at lists.llvm.org
Fri Jun 28 07:00:35 PDT 2019
Hi Lingda, thanks for your comments.
We can allocate the buffer either on the stack or by calling the
OpenMP allocate function.
With this solution, we allocate memory only once (no need to resize the
buffer after push_backs), and we do not need to call a runtime function to
put the map data into the buffer; compiler-generated code can do it.
But anyway, I agree, it would be good to hear some other opinions.
--------------
Best regards,
Alexey Bataev
From: "Li, Lingda" <lli at bnl.gov>
To: Alexey Bataev <Alexey.Bataev at ibm.com>
Cc: Alexandre Eichenberger <alexe at us.ibm.com>, "Chapman, Barbara
(Contact)" <barbara.chapman at stonybrook.edu>, Kevin K O'Brien
<caomhin at us.ibm.com>, "Carlo Bertolli" <cbertol at us.ibm.com>,
Deepak Eachempati <deachempat at cray.com>, "Denny, Joel E."
<dennyje at ornl.gov>, David Oehmke <doehmke at cray.com>, "Ettore
Tiotto" <etiotto at ca.ibm.com>, "fraggamuffin at gmail.com"
<fraggamuffin at gmail.com>, "Rokos, Georgios"
<georgios.rokos at intel.com>, Gheorghe-Teod Bercea
<Gheorghe-Teod.Bercea at ibm.com>, "gregory.rodgers at amd.com"
<gregory.rodgers at amd.com>, Hal Finkel <hfinkel at anl.gov>,
"Sharif, Hashim" <hsharif3 at illinois.edu>, "Cownie, James H"
<james.h.cownie at intel.com>, "Sjodin, Jan" <Jan.Sjodin at amd.com>,
"jbeyer at nvidia.com" <jbeyer at nvidia.com>, "Doerfert, Johannes
Rudolf" <jdoerfert at anl.gov>, "Jones, Jeff C"
<jeff.c.jones at intel.com>, "josem at udel.edu" <josem at udel.edu>,
"Robichaux, Joseph" <joseph.robichaux at intel.com>, Jeff Heath
<jrheath at ca.ibm.com>, "khaldi.dounia at gmail.com"
<khaldi.dounia at gmail.com>, Kelvin Li <kli at ca.ibm.com>,
"Bobrovsky, Konstantin S" <konstantin.s.bobrovsky at intel.com>,
"Kotsifakou, Maria" <kotsifa2 at illinois.edu>, "lopezmg at ornl.org"
<lopezmg at ornl.org>, "Lopez, Matthew Graham" <lopezmg at ornl.gov>,
"Menard, Lorri" <lorri.menard at intel.com>, Martin Kong
<martin.richard.kong at gmail.com>, Sarah McNamara
<mcnamara at ca.ibm.com>, "Rice, Michael P"
<michael.p.rice at intel.com>, "Matt Martineau"
<m.martineau at bristol.ac.uk>, "oscar at ornl.gov" <oscar at ornl.gov>,
Jeeva Paudel <pjeeva01 at ca.ibm.com>, "Rao, Premanand M"
<premanand.m.rao at intel.com>, "Krishnaiyer, Rakesh"
<rakesh.krishnaiyer at intel.com>, "Narayanaswamy, Ravi"
<ravi.narayanaswamy at intel.com>, "Monteleone, Robert"
<robert.monteleone at intel.com>, "Lieberman, Ron"
<Ron.Lieberman at amd.com>, Samuel Antao <Samuel.Antao at ibm.com>,
Jeffrey Sandoval <sandoval at cray.com>, Sunita Chandrasekaran
<schandra at udel.edu>, "sergey.y.ostanevich at gmail.com"
<sergey.y.ostanevich at gmail.com>, Sergio Pino Gallardo
<sergiop at udel.edu>, "Dmitriev, Serguei N"
<serguei.n.dmitriev at intel.com>, "Chan, SiuChi"
<siuchi.chan at amd.com>, Sunil Shrestha <sshrestha at cray.com>,
"Wilmarth, Terry L" <terry.l.wilmarth at intel.com>, Tianyi Zhang
<tzhan18 at lsu.edu>, "vadve at illinois.edu" <vadve at illinois.edu>,
Wang Chen <wdchen at ca.ibm.com>, Wael Yehia <wyehia at ca.ibm.com>,
"Tian, Xinmin" <xinmin.tian at intel.com>, "Li, Lingda (Contact)"
<lildmh at gmail.com>
Date: 06/28/2019 09:49 AM
Subject: [EXTERNAL] Re: Re: Re: RE: Comparison of 2 schemes to
implement OpenMP 5.0 declare mapper codegen
I don't think we can have the buffer allocated within the mapper function.
It has to be done in the runtime, because of nested mappers.
First, all mapper functions are generated in the same way. We cannot make
the outermost mapper function allocate memory while the inner ones do not
and instead use what was allocated by the outermost mapper function.
I suppose we still need to allocate memory in the runtime, so the runtime
can pass the pointer and size to the mapper function, and the outer mapper
function can then pass them into inner ones.
Again, this is just like the current implementation, except that we don't
use vector::push_back(); instead we use something like a manual
implementation of vector::push_back() (because we need to use the pointer
and the current index).
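To make the nested case concrete, here is a minimal sketch of what such
mapper functions could look like. Everything in it is an assumption made for
illustration: MapEntry, Inner, Outer, emit, map_inner and map_outer are
hypothetical names, and the entry layout is not the real runtime layout. The
point is only that every mapper has the same shape: none of them allocates,
each writes at the current index of a buffer whose pointer and size were
handed down by its caller.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record for one mapped component (not the real runtime layout).
struct MapEntry {
  void *begin;       // start address of the mapped region
  std::size_t size;  // size of the region in bytes
};

struct Inner { int n; double *data; };
struct Outer { Inner a; Inner b; };

// Manual equivalent of push_back: write at buf[*idx] and advance the index.
inline void emit(MapEntry *buf, std::size_t cap, std::size_t *idx,
                 void *begin, std::size_t size) {
  assert(*idx < cap);  // the buffer was sized by whoever allocated it
  buf[*idx] = MapEntry{begin, size};
  ++*idx;
}

// Inner mapper: fills two entries into the buffer it is given.
inline void map_inner(Inner &v, MapEntry *buf, std::size_t cap,
                      std::size_t *idx) {
  assert(v.n >= 0);
  emit(buf, cap, idx, &v.n, sizeof v.n);
  emit(buf, cap, idx, v.data, static_cast<std::size_t>(v.n) * sizeof(double));
}

// Outer mapper: same signature, forwards the pointer, size and index.
inline void map_outer(Outer &v, MapEntry *buf, std::size_t cap,
                      std::size_t *idx) {
  map_inner(v.a, buf, cap, idx);
  map_inner(v.b, buf, cap, idx);
}
```

Whoever owns the buffer (the runtime in this message, compiler-generated code
in Alexey's proposal) calls map_outer once with idx starting at 0 and reads
back idx as the number of entries written.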
I believe the key question here is whether it is true that (the overhead of
push_back() > the overhead of precalculating the total number + the memory
allocation overhead + direct memory writes). This will decide whether this
change is necessary. Any opinions?
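One part of the push_back() cost that can be observed without timing anything
is how often the vector has to regrow. The sketch below is only an
illustration under my own assumptions (MapEntry and count_regrowths are
hypothetical names); it counts capacity changes while appending n entries,
with and without reserving the space up front. It says nothing about the cost
of the extra size-precalculation call, which is the other side of the
inequality.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record for one mapped component.
struct MapEntry {
  void *begin;
  std::size_t size;
};

// Counts how often the vector's capacity changes while n entries are appended.
// With preallocate == true the space is reserved once before the loop.
inline std::size_t count_regrowths(std::size_t n, bool preallocate) {
  std::vector<MapEntry> v;
  if (preallocate) v.reserve(n);
  std::size_t regrowths = 0;
  std::size_t cap = v.capacity();
  for (std::size_t i = 0; i < n; ++i) {
    v.push_back(MapEntry{nullptr, i});
    if (v.capacity() != cap) {  // the vector had to reallocate
      ++regrowths;
      cap = v.capacity();
    }
  }
  assert(v.size() == n);
  return regrowths;
}
```

The exact number of regrowths without reserve() depends on the standard
library's growth policy, so the sketch deliberately does not state one.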
Thanks,
Lingda Li
From: Alexey Bataev <Alexey.Bataev at ibm.com>
Sent: Thursday, June 27, 2019 5:05 PM
To: Li, Lingda
Subject: Re: Re: Re: RE: Comparison of 2 schemes to implement OpenMP 5.0
declare mapper codegen
Yes, we need 2 functions, but the first one can be optimized very
effectively. After optimization and inlining it will end up as just
return s1+s2+s3... I think in most cases those sizes will be constant,
since the mapper maps a constant number of elements, and thus this
expression will be optimized to just a constant value.
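As a small illustration of the folding argument, assuming hypothetical types
Vec3 and Particle and hypothetical per-mapper count functions: the generated
size functions would be ordinary functions that the optimizer inlines, but
marking them constexpr here makes the folding visible, because the
static_assert only compiles if the sum is a compile-time constant.

```cpp
#include <cstddef>

struct Vec3 { double x, y, z; };
struct Particle { Vec3 pos; Vec3 vel; int id; };

// Hypothetical size-precalculation functions, one per mapper.
// A Vec3 mapper that maps x, y and z contributes three components.
constexpr std::size_t count_vec3() { return 3; }

// Nested mappers: the outer count is just the sum s1 + s2 + s3.
constexpr std::size_t count_particle() {
  return count_vec3() + count_vec3() + 1;  // pos + vel + id
}

// When an array of instances is mapped, the count scales with its length.
constexpr std::size_t count_particles(std::size_t n) {
  return n * count_particle();
}

// Compiles only because the nested calls fold to the constant 7.
static_assert(count_particle() == 7, "count folds to a constant");
```

When the number of instances is only known at run time, count_particles is
no longer a constant, but it is still a single multiplication rather than a
walk over every component.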
You don't need to pass these functions to the runtime. We can call them
directly from the compiler-generated code.
1st call: get number of elements.
2nd: allocate the buffer
3rd call: call mapper with this preallocated buffer that fills this buffer
without any calls of the runtime functions.
4th call: call the runtime to pass the buffer to the runtime.
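The four steps above can be sketched as follows. This is not the actual
codegen or runtime interface: MapEntry, FakeRuntime, mapper_S_count,
mapper_S_fill and map_with_prealloc are hypothetical names, and a
std::vector stands in for the stack or OpenMP allocation purely to keep the
example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record for one mapped component.
struct MapEntry {
  void *begin;
  std::size_t size;
};

struct S { int x; int *p; };

// Stand-in for the runtime: receives the whole buffer in a single call.
struct FakeRuntime {
  std::vector<MapEntry> received;
  int calls = 0;
  void submit(const MapEntry *buf, std::size_t n) {
    received.assign(buf, buf + n);
    ++calls;
  }
};

// Size function for the mapper of S: it maps s.x and s.p[0:s.x].
inline std::size_t mapper_S_count(const S &) { return 2; }

// Fill function: writes directly into the buffer, no runtime calls.
inline std::size_t mapper_S_fill(S &s, MapEntry *buf, std::size_t cap) {
  assert(cap >= 2 && s.x >= 0);
  buf[0] = MapEntry{&s.x, sizeof s.x};
  buf[1] = MapEntry{s.p, static_cast<std::size_t>(s.x) * sizeof(int)};
  return 2;
}

inline void map_with_prealloc(S &s, FakeRuntime &rt) {
  const std::size_t n = mapper_S_count(s);  // 1st call: number of elements
  std::vector<MapEntry> buf(n);             // 2nd: allocate the buffer once
  const std::size_t written =
      mapper_S_fill(s, buf.data(), n);      // 3rd call: fill the buffer
  assert(written == n);
  rt.submit(buf.data(), written);           // 4th call: hand over to runtime
}
```

The runtime is entered exactly once per mapped object, regardless of how
many components the mapper produces.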
Best regards,
Alexey Bataev
On June 27, 2019, at 16:53, Li, Lingda <lli at bnl.gov> wrote:
First, if we precalculate the size, we need to generate 2 functions
for each mapper rather than the 1 we have now: one for filling in the
mapping information, as we have today, and the other for size
calculation (this will not return constant values, because the size
depends on how many instances we are mapping). Both of these functions
will need to be passed to the runtime. The runtime will need to
precalculate the number of components first, then allocate memory,
then call the mapper function to fill it up.
Compared with scheme 1, the differences are:
1) An extra call to calculate the total number, which scheme 1 does
not need;
2) A preallocated buffer, whose pointer and current index must be
passed to the mapper function, which then uses them to fill in the
components, while scheme 1 uses push_back() to do the same thing.
Is there really a benefit in doing this? push_back() should be efficient
enough compared with writing directly to memory.
If people here think that the overhead of push_back() > the overhead
of precalculating the total number + the memory allocation overhead +
direct memory writes, then we can consider this scheme.
Thanks,
Lingda Li
From: Alexey Bataev <Alexey.Bataev at ibm.com>
Sent: Thursday, June 27, 2019 4:26 PM
To: Li, Lingda
Subject: Re: Re: RE: Comparison of 2 schemes to implement OpenMP 5.0
declare mapper codegen
If the functions (the ones intended for size precalculation) are
inlined, they can be optimized out very effectively, since in most
cases they will return constant values.
If we can do this, we won't need vectors and push_backs; we can use
preallocated memory and an internal counter.
--------------
Best regards,
Alexey Bataev
From: "Li, Lingda" <lli at bnl.gov>
To: Alexey Bataev <Alexey.Bataev at ibm.com>, Deepak Eachempati <
deachempat at cray.com>
Date: 06/27/2019 04:13 PM
Subject: [EXTERNAL] Re: RE: Comparison of 2 schemes to implement
OpenMP 5.0 declare mapper codegen
Hi Alexey,
I think that's why we chose to use variable-size storage like
std::vector to store the mapping information in the first place,
right? It'll be costly to precalculate the total number of
components, especially in the presence of nested mappers. Besides, a
runtime function call is just a std::vector::push_back(), so I think
it's okay to have multiple function calls.
Thanks,
Lingda Li
From: Alexey Bataev <Alexey.Bataev at ibm.com>
Sent: Thursday, June 27, 2019 3:52 PM
To: Deepak Eachempati
Subject: Re: RE: Comparison of 2 schemes to implement OpenMP 5.0
declare mapper codegen
Lingda, in scheme 1, can we precalculate the total number of
components, allocate memory for this precalculated number of
elements, then fill it with the mappers, and only after that call the
runtime function (only once!) to transfer the mappings to the
runtime?
Best regards,
Alexey Bataev
On June 27, 2019, at 15:44, Deepak Eachempati <deachempat at cray.com>
wrote:
Got it. Thanks.
-- Deepak
From: Li, Lingda [mailto:lli at bnl.gov]
Sent: Thursday, June 27, 2019 2:41 PM
Subject: Re: Comparison of 2 schemes to implement OpenMP
5.0 declare mapper codegen
In the current scheme, all mappings within a mapper
function are done atomically by one thread. In the mapper
function of the example in the original email, <push>
just pushes the mapping information into an internal
data structure. Once all the mapping information is
available, the runtime does the real mapping in one go.
For your example, the behavior is the same as the code
below:
...
#pragma omp parallel num_threads(2)
{
  if (omp_get_thread_num() == 0) {
    #pragma omp target map(s.x, s.p[0:s.x])
    {
      for (int i = 0; i < s.x; i++) s.p[i] = i;
    }
  } else {
    #pragma omp target map(other_data)
    {
      // work on other_data
    }
  }
}
...
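The push-then-map behavior described in this message can be sketched like
this. It is an illustration under stated assumptions, not the runtime's code:
MapEntry, Components, push_component, mapper_S and FakeDevice are hypothetical
names. The mapper only records components in a list owned by the calling
thread's construct, and the device-side mapping sees them as one batch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record for one mapped component.
struct MapEntry {
  void *begin;
  std::size_t size;
};

struct S { int x; int *p; };

// Hypothetical per-construct list of components collected by the mapper.
using Components = std::vector<MapEntry>;

// Stand-in for <push>: records the component, performs no mapping yet.
inline void push_component(Components &c, void *begin, std::size_t size) {
  c.push_back(MapEntry{begin, size});
}

// Mapper for: declare mapper(S s) map(s.x) map(s.p[0:s.x])
inline void mapper_S(Components &c, S &s) {
  assert(s.x >= 0);
  push_component(c, &s.x, sizeof s.x);
  push_component(c, s.p, static_cast<std::size_t>(s.x) * sizeof(int));
}

// Stand-in for the device mapping step: it receives a complete batch.
struct FakeDevice {
  std::vector<MapEntry> mapped;
  int batches = 0;
  void map_all(const Components &c) {
    mapped.insert(mapped.end(), c.begin(), c.end());
    ++batches;
  }
};
```

Because s.x and s.p reach map_all together, a map of other_data issued by
another thread arrives as a separate batch rather than between the two.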
From: Deepak Eachempati <deachempat at cray.com>
Sent: Thursday, June 27, 2019 3:34 PM
Subject: RE: Comparison of 2 schemes to implement OpenMP
5.0 declare mapper codegen
I was referring to something like this, where another
thread is not trying to map the same data:
#pragma omp declare mapper(S s) map(s.x) map(s.p[0:s.x])
S s;
...
#pragma omp parallel num_threads(2)
{
  if (omp_get_thread_num() == 0) {
    #pragma omp target map(s)
    {
      for (int i = 0; i < s.x; i++) s.p[i] = i;
    }
  } else {
    #pragma omp target map(other_data)
    {
      // work on other_data
    }
  }
}
...
Since I believe you are mapping s.x and s.p as separate
map operations, it is possible that another thread could
map ‘other_data’ in between those two maps. If this
happens, will your implementation still ensure that s.x
and s.p are positioned at the right offsets with respect
to the same base address (&s)?
-- Deepak
From: Li, Lingda [mailto:lli at bnl.gov]
Sent: Thursday, June 27, 2019 2:26 PM
Subject: Re: Comparison of 2 schemes to implement OpenMP
5.0 declare mapper codegen
When 2 threads try to concurrently map the same data, it
behaves the same as when 2 threads concurrently map the
same data using map clauses; mappers don't introduce
extra considerations here. For instance, both threads may use
#pragma omp target enter data concurrently.
When 2 threads concurrently map the same data, my
understanding based on the current code is that it will
create 2 copies of the same data, and either copy is
correct to use. There may be a problem when both copies
are mapped back if this is not synchronized correctly, but
that is a programming issue, not the responsibility of OpenMP.
Thanks,
Lingda Li
From: Deepak Eachempati <deachempat at cray.com>
Sent: Thursday, June 27, 2019 3:17 PM
Subject: RE: Comparison of 2 schemes to implement OpenMP
5.0 declare mapper codegen
Thanks.
Is it possible for another thread to be concurrently
mapping something else while the maps from the mapper
function are taking place? If so, how do you guarantee
that the allocation for each component will get you the
right addresses in device memory? Sorry if this was
covered before and I missed it.
-- Deepak
From: Li, Lingda [mailto:lli at bnl.gov]
Sent: Thursday, June 27, 2019 2:08 PM
Subject: Re: Comparison of 2 schemes to implement OpenMP
5.0 declare mapper codegen
Hi Deepak,
Yes, it handles this case. The first part of the mapper
function (initially allocating space for the whole array)
is just an optimization, not required for correctness, as
you suggested in an earlier discussion.