[Openmp-dev] [cfe-dev] Comparison of 2 schemes to implement OpenMP 5.0 declare mapper codegen

Jonas Hahnfeld via Openmp-dev openmp-dev at lists.llvm.org
Sat Jun 29 05:39:58 PDT 2019


Hi Lingda,

may I ask that discussions about important decisions related to
Clang's OpenMP support be started on the public mailing list instead of
in private conversations? That would help to get feedback from people
who are not part of the select circle participating in the "OpenMP / HPC in
Clang / LLVM Multi-company Telecom".

Thanks,
Jonas

On 2019-06-28 15:59, Lingda Li via cfe-dev wrote:
> On Fri, Jun 28, 2019 at 9:49 AM Li, Lingda <lli at bnl.gov> wrote:
> 
>> I don't think we can have the buffer allocated within the mapper
>> function. It has to be done in the runtime, because of nested
>> mappers.
>> First, all mapper functions are generated in the same way. We cannot
>> make the outermost mapper function allocate memory while the
>> inner ones don't and have to use what the outermost mapper
>> function allocated.
>> I suppose we still need to allocate memory in the runtime, so the
>> runtime can pass the pointer and size to the mapper function, and
>> the outer mapper function can then pass them into inner ones.
>> Again, this is just like the current implementation, except that we
>> don't use vector::push_back(); instead we use something like a
>> manual implementation of vector::push_back() (because we need to use
>> the pointer and the current index).
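
The "manual push_back" described above can be sketched roughly as follows. This is only an illustration under assumed names: `MapEntry`, `MapBuffer`, `push_entry`, `inner_mapper`, and `outer_mapper` are hypothetical and are not part of Clang or libomptarget. The caller (standing in for the runtime) owns the storage and hands every mapper the same buffer and current index, so the outer and inner mappers have the same shape and neither allocates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record for one mapped component.
struct MapEntry {
  void *base;
  void *begin;
  std::size_t size;
  std::int64_t type;
};

// Hypothetical view of storage owned by the caller (e.g. the runtime).
struct MapBuffer {
  MapEntry *data;
  std::size_t capacity;
  std::size_t count; // current index: the next free slot
};

// Manual equivalent of vector::push_back() that never reallocates.
inline void push_entry(MapBuffer &buf, void *base, void *begin,
                       std::size_t size, std::int64_t type) {
  assert(buf.count < buf.capacity && "buffer was sized too small");
  buf.data[buf.count++] = MapEntry{base, begin, size, type};
}

struct Inner { int a; };
struct Outer { int x; Inner in; };

// Inner mapper: writes into whatever buffer it is handed.
inline void inner_mapper(MapBuffer &buf, Inner *i, std::int64_t type) {
  push_entry(buf, i, &i->a, sizeof(i->a), type);
}

// Outer mapper: same shape as the inner one; it forwards the buffer.
inline void outer_mapper(MapBuffer &buf, Outer *o, std::int64_t type) {
  push_entry(buf, o, &o->x, sizeof(o->x), type);
  inner_mapper(buf, &o->in, type);
}
```

A caller would create the storage once, wrap it in a `MapBuffer` with `count` set to 0, and pass it to the outermost mapper.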
>> 
>> I believe the key question here is whether it is true that (the
>> overhead of push_back() > the overhead of precalculating the total
>> number + the memory allocation overhead + direct memory writes).
>> This will decide whether this change is necessary. Any opinions?
>> 
>> Thanks,
>> Lingda Li
>> 
>> -------------------------
>> 
>> FROM: Alexey Bataev <Alexey.Bataev at ibm.com>
>> SENT: Thursday, June 27, 2019 5:05 PM
>> TO: Li, Lingda
>> CC: Alexandre Eichenberger; Chapman, Barbara (Contact); Kevin K
>> O'Brien; Carlo Bertolli; Deepak Eachempati; Denny, Joel E.; David
>> Oehmke; Ettore Tiotto; fraggamuffin at gmail.com; Rokos, Georgios;
>> Gheorghe-Teod Bercea; gregory.rodgers at amd.com; Hal Finkel; Sharif,
>> Hashim; Cownie, James H; Sjodin, Jan; jbeyer at nvidia.com; Doerfert,
>> Johannes Rudolf; Jones, Jeff C; josem at udel.edu; Robichaux, Joseph;
>> Jeff Heath; khaldi.dounia at gmail.com; Kelvin Li; Bobrovsky,
>> Konstantin S; Kotsifakou, Maria; lopezmg at ornl.org; Lopez, Matthew
>> Graham; Menard, Lorri; Martin Kong; Sarah McNamara; Rice, Michael P;
>> Matt Martineau; oscar at ornl.gov; Jeeva Paudel; Rao, Premanand M;
>> Krishnaiyer, Rakesh; Narayanaswamy, Ravi; Monteleone, Robert;
>> Lieberman, Ron; Samuel Antao; Jeffrey Sandoval; Sunita
>> Chandrasekaran; sergey.y.ostanevich at gmail.com; Sergio Pino Gallardo;
>> Dmitriev, Serguei N; Chan, SiuChi; Sunil Shrestha; Wilmarth, Terry
>> L; Tianyi Zhang; vadve at illinois.edu; Wang Chen; Wael Yehia; Tian,
>> Xinmin
>> SUBJECT: Re: Re: Re: RE: Comparison of 2 schemes to implement OpenMP
>> 5.0 declare mapper codegen
>> 
>> Yes, we need 2 functions, but the first one can be optimized very
>> effectively. After optimization and inlining it will end up
>> as just return s1+s2+s3... I think that in most cases those sizes will
>> be constant, since the mapper maps a constant number of elements, and
>> thus this expression will be optimized to just a constant value.
>> You don't need to pass these functions to the runtime. We can call them
>> directly from the compiler.
>> 1st call: get the number of elements.
>> 2nd: allocate the buffer.
>> 3rd call: call the mapper with this preallocated buffer; it fills the
>> buffer without any calls to runtime functions.
>> 4th call: call the runtime to pass the buffer to the runtime.
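
The four steps above might look roughly like the sketch below. All names here (`count_S`, `fill_S`, `runtime_register`, `map_S`, `MapEntry`) are assumptions made for illustration, not existing Clang or libomptarget interfaces, and the "runtime" is a stand-in that only sums the sizes it is handed. The mapper modeled is `map(s.x) map(s.p[0:s.x])`, so each instance contributes two components.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

struct MapEntry {
  void *begin;
  std::size_t size;
  std::int64_t type;
};

struct S {
  int x;
  int *p;
};

// Size function: two components (s.x and s.p[0:s.x]) per instance of S.
// When n is a constant, an optimizer can fold the result to a constant.
constexpr std::size_t count_S(std::size_t n) { return 2 * n; }

// Fill function: writes into preallocated storage, with no runtime calls.
inline std::size_t fill_S(MapEntry *out, S *s, std::size_t n,
                          std::int64_t type) {
  std::size_t idx = 0;
  for (std::size_t i = 0; i < n; ++i) {
    out[idx++] = MapEntry{&s[i].x, sizeof(int), type};
    out[idx++] = MapEntry{s[i].p,
                          static_cast<std::size_t>(s[i].x) * sizeof(int), type};
  }
  return idx;
}

// Stand-in for the single runtime entry point; it only sums the sizes.
inline std::size_t runtime_register(const MapEntry *entries, std::size_t n) {
  std::size_t bytes = 0;
  for (std::size_t i = 0; i < n; ++i) bytes += entries[i].size;
  return bytes;
}

inline std::size_t map_S(S *s, std::size_t n, std::int64_t type) {
  const std::size_t total = count_S(n);                      // 1st: count
  std::unique_ptr<MapEntry[]> buf(new MapEntry[total]);      // 2nd: allocate
  const std::size_t written = fill_S(buf.get(), s, n, type); // 3rd: fill
  assert(written == total);
  return runtime_register(buf.get(), written);               // 4th: one call
}
```

Note that `count_S` depends on the number of instances `n`, so it only folds to a constant when `n` is known at compile time.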
>> 
>> Best regards,
>> Alexey Bataev
>> 
>> On June 27, 2019, at 16:53, Li, Lingda <lli at bnl.gov> wrote:
>> 
>>> If we precalculate the size, first, it means we need to generate
>>> 2 functions for each mapper, rather than the 1 we have now: one for
>>> filling in the mapping information, as we have, and the other for
>>> size calculation (this will not return constant values, because the
>>> size depends on how many instances we are mapping). Both of these
>>> functions will need to be passed to the runtime. The runtime will
>>> need to precalculate the number of components first, then allocate
>>> memory, then call the mapper function to fill it up.
>>> 
>>> Compared with scheme 1, the differences are:
>>> 1) An extra call to calculate the total number, which scheme 1
>>> does not need;
>>> 2) A preallocated buffer, whose pointer and current number of
>>> entries should be passed to the mapper function, which then
>>> uses them to fill in components, while scheme 1 uses push_back() to
>>> do the same thing.
>>> 
>>> Is there really a benefit to doing this? push_back() should be
>>> efficient enough compared with directly writing to memory.
>>> 
>>> If people here think that the overhead of push_back() > the
>>> overhead of precalculating the total number + the memory
>>> allocation overhead + direct memory writes, then we can consider
>>> this scheme.
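
One rough way to reason about this comparison is to count how often a `std::vector` has to grow while entries are appended, with and without knowing the total in advance. The helper below, `count_reallocations`, is a hypothetical name introduced only for this sketch. It is not a benchmark and says nothing about the cost of the extra size-calculation call.

```cpp
#include <cstddef>
#include <vector>

// Count how many times a vector changes capacity while n items are appended.
// reserve_first models the "precalculate the total number" variant.
inline std::size_t count_reallocations(std::size_t n, bool reserve_first) {
  std::vector<int> v;
  if (reserve_first) v.reserve(n);
  std::size_t reallocations = 0;
  std::size_t cap = v.capacity();
  for (std::size_t i = 0; i < n; ++i) {
    v.push_back(static_cast<int>(i));
    if (v.capacity() != cap) {
      ++reallocations;
      cap = v.capacity();
    }
  }
  return reallocations;
}
```

With `reserve_first` set, no growth happens during the appends. Without it, the exact number of growth steps depends on the standard library's growth policy, but `push_back()` remains amortized constant time either way.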
>>> 
>>> Thanks,
>>> Lingda Li
>>> 
>>> -------------------------
>>> 
>>> FROM: Alexey Bataev <Alexey.Bataev at ibm.com>
>>> SENT: Thursday, June 27, 2019 4:26 PM
>>> TO: Li, Lingda
>>> SUBJECT: Re: Re: RE: Comparison of 2 schemes to implement OpenMP
>>> 5.0 declare mapper codegen
>>> 
>>> If the functions (the ones intended for size precalculation) are
>>> inlined, they can be optimized out very effectively, since
>>> in most cases they will return constant values.
>>> If we could do this, we won't need vectors and push_backs; we can
>>> use preallocated memory and an internal counter.
>>> --------------
>>> Best regards,
>>> Alexey Bataev
>>> 
>>> From: "Li, Lingda" <lli at bnl.gov>
>>> To: Alexey Bataev <Alexey.Bataev at ibm.com>, Deepak Eachempati
>>> <deachempat at cray.com>
>>> Date: 06/27/2019 04:13 PM
>>> Subject: [EXTERNAL] Re: RE: Comparison of 2 schemes to implement
>>> OpenMP 5.0 declare mapper codegen
>>> 
>>> -------------------------
>>> 
>>> Hi Alexey,
>>> 
>>> I think that's why we chose to use variable-size storage like
>>> std::vector to store the mapping information in the first place,
>>> right? It'll be costly to precalculate the total number of
>>> components, especially in the presence of nested mappers. Besides,
>>> a runtime function call is just a std::vector::push_back(), so I think
>>> it's okay to have multiple function calls.
>>> 
>>> Thanks,
>>> Lingda Li
>>> 
>>> -------------------------
>>> 
>>> FROM: Alexey Bataev <Alexey.Bataev at ibm.com>
>>> Sent: Thursday, June 27, 2019 3:52 PM
>>> To: Deepak Eachempati
>>> Subject: Re: RE: Comparison of 2 schemes to implement OpenMP 5.0
>>> declare mapper codegen
>>> 
>>> Lingda, can we in scheme 1 precalculate the total number of
>>> components, allocate memory for this precalculated number of
>>> elements, then fill it with the mappers, and only after that call the
>>> runtime function (only once!) to transfer the mappings to the
>>> runtime?
>>> 
>>> Best regards,
>>> Alexey Bataev
>>> 
>>> On June 27, 2019, at 15:44, Deepak Eachempati
>>> <deachempat at cray.com> wrote:
>>> 
>>> Got it. Thanks.
>>> 
>>> -- Deepak
>>> 
>>> FROM: Li, Lingda [mailto:lli at bnl.gov]
>>> Sent: Thursday, June 27, 2019 2:41 PM
>>> Subject: Re: Comparison of 2 schemes to implement OpenMP 5.0
>>> declare mapper codegen
>>> 
>>> In the current scheme, all mappings within a mapper function are
>>> done atomically by one thread. In the mapper function of the
>>> example in the original email, <push> will just push the mapping
>>> information into an internal data structure. Once all mapping
>>> information is available, the runtime will do the real mapping
>>> all together. For your example, the behavior is the same as the code
>>> below:
>>> 
>>> ...
>>> #pragma omp parallel num_threads(2)
>>> {
>>>   if (omp_get_thread_num() == 0) {
>>>     #pragma omp target map(s.x, s.p[0:s.x])
>>>     {
>>>       for (int i = 0; i < s.x; i++) s.p[i] = i;
>>>     }
>>>   } else {
>>>     #pragma omp target map(other_data)
>>>     {
>>>       // work on other_data
>>>     }
>>>   }
>>> }
>>> ...
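
The "collect first, then map everything together" behavior described above could be modeled as in the sketch below. This is an assumption-laden illustration: `Runtime`, `commit`, `collect`, and this `MapEntry` are hypothetical names, and using a single mutex around the whole batch is just one possible way to keep another thread's mappings from landing between the components of one mapper.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

struct MapEntry {
  int owner;        // which thread or construct produced the entry
  std::size_t size; // bytes to map
};

// Hypothetical runtime state: a log of committed mappings and its lock.
struct Runtime {
  std::mutex lock;
  std::vector<MapEntry> committed;

  // Apply a whole batch under one lock, so entries from other threads
  // cannot be interleaved with it.
  void commit(const std::vector<MapEntry> &batch) {
    std::lock_guard<std::mutex> guard(lock);
    committed.insert(committed.end(), batch.begin(), batch.end());
  }
};

// Mapper stand-in: only records entries in a private list (the <push> step).
inline std::vector<MapEntry> collect(int owner, std::size_t n) {
  std::vector<MapEntry> batch;
  for (std::size_t i = 0; i < n; ++i)
    batch.push_back(MapEntry{owner, sizeof(int)});
  return batch;
}
```

Each thread would call `collect` for its own target construct and then `commit` once, so the entries of one construct stay contiguous in `committed`.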
>>> 
>>> -------------------------
>>> FROM: Deepak Eachempati <deachempat at cray.com>
>>> Sent: Thursday, June 27, 2019 3:34 PM
>>> Subject: RE: Comparison of 2 schemes to implement OpenMP 5.0
>>> declare mapper codegen
>>> 
>>> I was referring to something like this, where another thread is
>>> not trying to map the same data:
>>> 
>>> #pragma omp declare mapper(S s) map(s.x) map(s.p[0:s.x])
>>> S s;
>>> ...
>>> #pragma omp parallel num_threads(2)
>>> {
>>>   if (omp_get_thread_num() == 0) {
>>>     #pragma omp target map(s)
>>>     {
>>>       for (int i = 0; i < s.x; i++) s.p[i] = i;
>>>     }
>>>   } else {
>>>     #pragma omp target map(other_data)
>>>     {
>>>       // work on other_data
>>>     }
>>>   }
>>> }
>>> ...
>>> 
>>> Since I believe you are mapping s.x and s.p as separate map
>>> operations, it is possible that another thread could map
>>> ‘other_data’ in between those two maps. If this happens, will
>>> your implementation still ensure that s.x and s.p are positioned
>>> at the right offsets with respect to the same base address (&s)?
>>> 
>>> -- Deepak
>>> 
>>> FROM: Li, Lingda [mailto:lli at bnl.gov]
>>> Sent: Thursday, June 27, 2019 2:26 PM
>>> Subject: Re: Comparison of 2 schemes to implement OpenMP 5.0
>>> declare mapper codegen
>>> 
>>> When 2 threads try to concurrently map the same data, it behaves
>>> the same as when 2 threads concurrently map the same data using
>>> map clauses, and mappers don't introduce extra considerations
>>> here. For instance, both threads use #pragma omp target enter data
>>> concurrently.
>>> 
>>> When 2 threads concurrently map the same data, my understanding
>>> based on the current code is that it will create 2 copies of the same
>>> data, and either copy is correct to use. There may be a problem when
>>> both copies are mapped back if not synchronized correctly, but
>>> this is a programming issue, not the responsibility of OpenMP.
>>> 
>>> Thanks,
>>> Lingda Li
>>> 
>>> -------------------------
>>> FROM: Deepak Eachempati <deachempat at cray.com>
>>> Sent: Thursday, June 27, 2019 3:17 PM
>>> Subject: RE: Comparison of 2 schemes to implement OpenMP 5.0
>>> declare mapper codegen
>>> 
>>> Thanks.
>>> 
>>> Is it possible for another thread to be concurrently mapping
>>> something else while the maps from the mapper function are taking
>>> place? If so, how do you guarantee that the allocation for each
>>> component will get you the right addresses in device memory? Sorry
>>> if this was covered before and I missed it.
>>> 
>>> -- Deepak
>>> 
>>> FROM: Li, Lingda [mailto:lli at bnl.gov]
>>> Sent: Thursday, June 27, 2019 2:08 PM
>>> Subject: Re: Comparison of 2 schemes to implement OpenMP 5.0
>>> declare mapper codegen
>>> 
>>> Hi Deepak,
>>> 
>>> Yes, it handles this case. The first part of the mapper function
>>> (initially allocating space for the whole array) is just an
>>> optimization, not required for correctness, as you suggested in
>>> an earlier discussion.
>>> 
>>> In your example, s.x and s.p will be allocated separately (not in
>>> a single allocation). But Clang guarantees that their addresses
>>> will be correct because s.x and s.p share the same base address,
>>> which is &s.
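
One way to read the shared-base-address argument is that each member keeps its host offset from `&s` when it is placed on the device. The sketch below only illustrates that arithmetic. `device_address` is a hypothetical helper, the `other` field stands in for the elided "other stuff", and nothing here claims to reflect how Clang or libomptarget actually compute device addresses.

```cpp
#include <cstddef>
#include <cstdint>

struct S {
  int x;
  double other; // stands in for "other stuff"
  int *p;
};

// Hypothetical translation: a member keeps its host offset from the base.
inline std::uintptr_t device_address(std::uintptr_t device_base,
                                     const void *host_base,
                                     const void *host_begin) {
  const std::ptrdiff_t offset = static_cast<const char *>(host_begin) -
                                static_cast<const char *>(host_base);
  return device_base + static_cast<std::uintptr_t>(offset);
}
```

Under this reading, `s.x` and `s.p` land at `offsetof(S, x)` and `offsetof(S, p)` from the device copy of `s`, regardless of what other data is mapped in between.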
>>> 
>>> Thanks,
>>> Lingda Li
>>> 
>>> -------------------------
>>> FROM: Deepak Eachempati <deachempat at cray.com>
>>> Sent: Thursday, June 27, 2019 2:49 PM
>>> Subject: RE: Comparison of 2 schemes to implement OpenMP 5.0
>>> declare mapper codegen
>>> 
>>> For Scheme 1, it looks like you are doing separate maps for each
>>> component when size == 1. It seems like the first and last if
>>> statements should have “size >= 1” rather than “size > 1”.
>>> 
>>> If the mapper is declared like this:
>>> 
>>> struct S {
>>>   int x;
>>>   ... // other stuff
>>>   int *p;
>>> };
>>> 
>>> #pragma omp declare mapper(S s) map(s.x) map(s.p[0:s.x])
>>> 
>>> And you have:
>>> 
>>> S s;
>>> ...
>>> #pragma omp target map(s)
>>> {
>>>   for (int i = 0; i < s.x; i++) s.p[i] = i;
>>> }
>>> 
>>> Since the target construct is just mapping a single structure of
>>> type S, there should be one map that takes care of mapping storage
>>> for s.x and s.p with a single allocation, and a separate map for
>>> the array section s.p[0:s.x], and finally the pointer attachment
>>> of s.p to s.p[0:s.x]. Does Scheme 1 handle this?
>>> 
>>> -- Deepak
>>> 
>>> FROM: Li, Lingda [mailto:lli at bnl.gov]
>>> Sent: Thursday, June 27, 2019 1:07 PM
>>> To: Narayanaswamy, Ravi <ravi.narayanaswamy at intel.com>; 'Alexandre
>>> Eichenberger' <alexe at us.ibm.com>; 'Alexey Bataev'
>>> <Alexey.Bataev at ibm.com>; Chapman, Barbara (Contact)
>>> <barbara.chapman at stonybrook.edu>; Bobrovsky, Konstantin S
>>> <konstantin.s.bobrovsky at intel.com>; 'Carlo Bertolli'
>>> <cbertol at us.ibm.com>; 'Chan, SiuChi' <siuchi.chan at amd.com>;
>>> Cownie, James H <james.h.cownie at intel.com>; David Oehmke
>>> <doehmke at cray.com>; Deepak Eachempati <deachempat at cray.com>;
>>> 'Denny, Joel E.' <dennyje at ornl.gov>; Dmitriev, Serguei N
>>> <serguei.n.dmitriev at intel.com>; Doerfert, Johannes Rudolf
>>> <jdoerfert at anl.gov>; 'estotzer at ti.com' <estotzer at ti.com>; 'Ettore
>>> Tiotto' <etiotto at ca.ibm.com>; 'fraggamuffin at gmail.com'
>>> <fraggamuffin at gmail.com>; 'Gheorghe-Teod Bercea'
>>> <Gheorghe-Teod.Bercea at ibm.com>; Hal Finkel <hfinkel at anl.gov>;
>>> 'jbeyer at nvidia.com' <jbeyer at nvidia.com>; 'Jeeva Paudel'
>>> <pjeeva01 at ca.ibm.com>; 'Jeff Heath' <jrheath at ca.ibm.com>; Jeffrey
>>> Sandoval <sandoval at cray.com>; Jones, Jeff C
>>> <jeff.c.jones at intel.com>; 'josem at udel.edu' <josem at udel.edu>;
>>> 'Kelvin Li' <kli at ca.ibm.com>; 'Kevin K O'Brien'
>>> <caomhin at us.ibm.com>; 'khaldi.dounia at gmail.com'
>>> <khaldi.dounia at gmail.com>; 'Kotsifakou, Maria'
>>> <kotsifa2 at illinois.edu>; Krishnaiyer, Rakesh
>>> <rakesh.krishnaiyer at intel.com>; Lieberman, Ron
>>> <Ron.Lieberman at amd.com>; Li, Lingda <lli at bnl.gov>; 'Lopez, Matthew
>>> Graham' <lopezmg at ornl.gov>; 'lopezmg at ornl.org' <lopezmg at ornl.org>;
>>> 'Martin Kong' <martin.richard.kong at gmail.com>; 'Matt Martineau'
>>> <m.martineau at bristol.ac.uk>; Menard, Lorri
>>> <lorri.menard at intel.com>; Monteleone, Robert
>>> <robert.monteleone at intel.com>; oscar at ornl.gov; Rao, Premanand M
>>> <premanand.m.rao at intel.com>; Rice, Michael P
>>> <michael.p.rice at intel.com>; Robichaux, Joseph
>>> <joseph.robichaux at intel.com>; gregory.rodgers at amd.com; Rokos,
>>> Georgios <georgios.rokos at intel.com>; 'samuel.antao at ibm.com'
>>> <samuel.antao at ibm.com>; 'Sarah McNamara' <mcnamara at ca.ibm.com>;
>>> 'sergey.y.ostanevich at gmail.com' <sergey.y.ostanevich at gmail.com>;
>>> 'Sergio Pino Gallardo' <sergiop at udel.edu>; 'Sharif, Hashim'
>>> <hsharif3 at illinois.edu>; Sjodin, Jan <Jan.Sjodin at amd.com>; Sunil
>>> Shrestha <sshrestha at cray.com>; 'Sunita Chandrasekaran'
>>> <schandra at udel.edu>; Tian, Xinmin <xinmin.tian at intel.com>; Tianyi
>>> Zhang <tzhan18 at lsu.edu>; 'vadve at illinois.edu'
>>> <vadve at illinois.edu>; 'Wael Yehia' <wyehia at ca.ibm.com>; 'Wang
>>> Chen' <wdchen at ca.ibm.com>; Wilmarth, Terry L
>>> <terry.l.wilmarth at intel.com>
>>> Subject: Comparison of 2 schemes to implement OpenMP 5.0 declare
>>> mapper codegen
>>> 
>>> Hi,
>>> 
>>> Alexey and I would like to draw your attention to an ongoing
>>> discussion of 2 schemes to implement the declare mapper in OpenMP
>>> 5.0. The detailed discussion can be found at
>>> https://reviews.llvm.org/D59474 [1]
>>> 
>>> Scheme 1 (the one has been implemented by me in
>>> https://reviews.llvm.org/D59474 [1]):
>>> The detailed design can be found at
>>> 
>> 
> https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx
>>> [2]
>>> For each mapper function, the compiler generates a function like
>>> this:
>>> 
>>> ```
>>> void <type>.mapper(void *base, void *begin, size_t size, int64_t type) {
>>>   // Allocate space for an array section first.
>>>   if (size > 1 && !maptype.IsDelete)
>>>     <push>(base, begin, size*sizeof(Ty), clearToFrom(type));
>>> 
>>>   // Map members.
>>>   for (unsigned i = 0; i < size; i++) {
>>>     // For each component specified by this mapper:
>>>     for (auto c : components) {
>>>       ...; // code to generate c.arg_base, c.arg_begin, c.arg_size, c.arg_type
>>>       if (c.hasMapper())
>>>         (*c.Mapper())(c.arg_base, c.arg_begin, c.arg_size, c.arg_type);
>>>       else
>>>         <push>(c.arg_base, c.arg_begin, c.arg_size, c.arg_type);
>>>     }
>>>   }
>>>   // Delete the array section.
>>>   if (size > 1 && maptype.IsDelete)
>>>     <push>(base, begin, size*sizeof(Ty), clearToFrom(type));
>>> }
>>> ```
>>> This function is passed to the OpenMP runtime, and the runtime
>>> will call this function to finish the data mapping.
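
The control flow of scheme 1 can be sketched in compilable C++. This is only an illustration under assumptions: `MapEntry`, `MapList`, the flag values `kTo`/`kFrom`/`kDelete`, and the `Inner`/`Outer` types are hypothetical stand-ins, `<push>` is modeled as `push_back` on a vector, and none of this is the code from D59474 or the libomptarget interface. It shows a generated mapper pushing the array section first, then calling the nested mapper directly for a component that has one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one <push> call; not libomptarget's real layout.
struct MapEntry {
  void *base;
  void *begin;
  std::size_t bytes;
  std::int64_t type;
};
using MapList = std::vector<MapEntry>;

// Illustrative flag values only (assumption, not the real map-type encoding).
constexpr std::int64_t kTo = 0x1, kFrom = 0x2, kDelete = 0x8;
inline std::int64_t clearToFrom(std::int64_t t) { return t & ~(kTo | kFrom); }

struct Inner { double *data; std::size_t n; };
struct Outer { Inner in; int tag; };

// Stands in for a generated "Inner.mapper": maps data[0:n] of each element.
void innerMapper(MapList &out, void *base, void *begin, std::size_t count,
                 std::int64_t type) {
  assert(begin != nullptr);
  auto *elems = static_cast<Inner *>(begin);
  if (count > 1 && !(type & kDelete))  // allocate the array section first
    out.push_back({base, begin, count * sizeof(Inner), clearToFrom(type)});
  for (std::size_t i = 0; i < count; ++i)
    out.push_back({&elems[i].data, elems[i].data,
                   elems[i].n * sizeof(double), type});
  if (count > 1 && (type & kDelete))  // delete the array section last
    out.push_back({base, begin, count * sizeof(Inner), clearToFrom(type)});
}

// Stands in for a generated "Outer.mapper".
void outerMapper(MapList &out, void *base, void *begin, std::size_t count,
                 std::int64_t type) {
  assert(begin != nullptr);
  auto *elems = static_cast<Outer *>(begin);
  if (count > 1 && !(type & kDelete))
    out.push_back({base, begin, count * sizeof(Outer), clearToFrom(type)});
  for (std::size_t i = 0; i < count; ++i) {
    // Component with its own mapper: a direct call the compiler may inline.
    innerMapper(out, &elems[i], &elems[i].in, 1, type);
    // Plain component: pushed directly.
    out.push_back({&elems[i], &elems[i].tag, sizeof(int), type});
  }
  if (count > 1 && (type & kDelete))
    out.push_back({base, begin, count * sizeof(Outer), clearToFrom(type)});
}
```

Mapping a two-element `Outer` array with this sketch yields five entries: one for the array section and two per element. With the delete flag set, the array-section entry moves to the end of the list.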
>>> 
>>> Scheme 2 (which Alexey proposes):
>>> Alexey proposes to move parts of the mapper function above into
>>> the OpenMP runtime, so the compiler generates the code below:
>>> ```
>>> void <type>.mapper(void *base, void *begin, size_t size, int64_t type) {
>>>   ...; // code to generate arg_base, arg_begin, arg_size, arg_type, arg_mapper.
>>>   auto sub_components[] = {...}; // fill in generated begin, base, ...
>>>   __tgt_mapper(base, begin, size, type, sub_components);
>>> }
>>> ```
>>> 
>>> `__tgt_mapper` is a runtime function as below:
>>> ```
>>> void __tgt_mapper(void *base, void *begin, size_t size, int64_t type,
>>>                   auto components[]) {
>>>   // Allocate space for an array section first.
>>>   if (size > 1 && !maptype.IsDelete)
>>>     <push>(base, begin, size*sizeof(Ty), clearToFrom(type));
>>> 
>>>   // Map members.
>>>   for (unsigned i = 0; i < size; i++) {
>>>     // For each component specified by this mapper:
>>>     for (auto c : components) {
>>>       if (c.hasMapper())
>>>         (*c.Mapper())(c.arg_base, c.arg_begin, c.arg_size, c.arg_type);
>>>       else
>>>         <push>(c.arg_base, c.arg_begin, c.arg_size, c.arg_type);
>>>     }
>>>   }
>>>   // Delete the array section.
>>>   if (size > 1 && maptype.IsDelete)
>>>     <push>(base, begin, size*sizeof(Ty), clearToFrom(type));
>>> }
>>> ```
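
For comparison, scheme 2 can be sketched the same way. Again this is a rough illustration, not the proposed runtime code: `Component`, `MapperFn`, `tgtMapper` (standing in for `__tgt_mapper`), and the `elemBytes` parameter are assumptions added so the example compiles, since a generic runtime function cannot evaluate `sizeof(Ty)` itself. The sketch also assumes the generated mapper emits one descriptor per component per array element, so the runtime walks the descriptor list once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one <push> call; not libomptarget's real layout.
struct MapEntry {
  void *base;
  void *begin;
  std::size_t bytes;
  std::int64_t type;
};
using MapList = std::vector<MapEntry>;

// Illustrative flag values only (assumption, not the real map-type encoding).
constexpr std::int64_t kTo = 0x1, kFrom = 0x2, kDelete = 0x8;
inline std::int64_t clearToFrom(std::int64_t t) { return t & ~(kTo | kFrom); }

using MapperFn = void (*)(MapList &, void *, void *, std::size_t, std::int64_t);

// Hypothetical descriptor the compiler would emit for each component.
struct Component {
  void *base;
  void *begin;
  std::size_t size;  // bytes when mapper is null, element count otherwise
  std::int64_t type;
  MapperFn mapper;   // nullptr when the component has no mapper
};

// Stand-in for __tgt_mapper: one generic routine shared by all mappers.
void tgtMapper(MapList &out, void *base, void *begin, std::size_t count,
               std::size_t elemBytes, std::int64_t type,
               const std::vector<Component> &components) {
  if (count > 1 && !(type & kDelete))  // allocate the array section first
    out.push_back({base, begin, count * elemBytes, clearToFrom(type)});
  for (const Component &c : components) {
    if (c.mapper)
      c.mapper(out, c.base, c.begin, c.size, c.type);  // indirect call
    else
      out.push_back({c.base, c.begin, c.size, c.type});
  }
  if (count > 1 && (type & kDelete))  // delete the array section last
    out.push_back({base, begin, count * elemBytes, clearToFrom(type)});
}

struct Pair { int x; int y; };

// Stand-in for a generated "Pair.mapper": it only builds descriptors.
void pairMapper(MapList &out, void *base, void *begin, std::size_t count,
                std::int64_t type) {
  auto *elems = static_cast<Pair *>(begin);
  std::vector<Component> sub;  // heap storage that grows with the array
  sub.reserve(2 * count);
  for (std::size_t i = 0; i < count; ++i) {
    sub.push_back({&elems[i], &elems[i].x, sizeof(int), type, nullptr});
    sub.push_back({&elems[i], &elems[i].y, sizeof(int), type, nullptr});
  }
  tgtMapper(out, base, begin, count, sizeof(Pair), type, sub);
}
```

In this form nested mappers are reached through a function pointer, and the descriptor vector is proportional to the number of mapped elements.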
>>> 
>>> Comparison:
>>> Why choose 1 over 2:
>>> 1. In scheme 2, the compiler needs to generate all map types and
>>> pass them to __tgt_mapper through sub_components. But in this
>>> case, the compiler won't be able to generate the correct MEMBER_OF
>>> field in the map type. As a result, the runtime has to fix it using
>>> the mechanism we already have here: __tgt_mapper_num_components.
>>> This not only increases complexity, it also means the runtime
>>> has to modify the map type again, which hurts data locality.
>>> In the current scheme, by contrast, the map type is generated by
>>> the compiler once, so data locality is very good.
>>> 2. In scheme 2, sub_components includes all components that should
>>> be mapped. If we are mapping an array, this means we need to map
>>> many components, so sub_components has to be allocated on the
>>> heap. This adds memory management burden and is not an efficient
>>> way to use memory.
>>> 3. In scheme 1, we are able to inline nested mapper functions. As
>>> a result, the compiler can further optimize the mapper function,
>>> e.g., eliminate redundant computation and unroll loops, and thus
>>> achieve potentially better performance. We cannot achieve these
>>> optimizations in scheme 2.
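
The MEMBER_OF fix-up from point 1 can be illustrated with a small sketch. It assumes the layout used by libomptarget, where MEMBER_OF occupies the top 16 bits of the 64-bit map type and stores the parent's 1-based position in the argument list (0 meaning no parent). The helper names and the `rebaseMemberOf` function are hypothetical. The idea is that a mapper numbers parents relative to its own components, so each value has to be shifted by the number of entries already in the list before it is correct.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed layout: MEMBER_OF is the top 16 bits of the 64-bit map type and
// holds the parent's 1-based position in the argument list (0 = no parent).
constexpr std::uint64_t kMemberOfMask = 0xffff000000000000ULL;
constexpr unsigned kMemberOfShift = 48;

// Extract the MEMBER_OF field.
constexpr std::uint64_t memberOf(std::uint64_t type) {
  return (type & kMemberOfMask) >> kMemberOfShift;
}

// Return `type` with its MEMBER_OF field replaced by `pos`.
constexpr std::uint64_t withMemberOf(std::uint64_t type, std::uint64_t pos) {
  return (type & ~kMemberOfMask) | (pos << kMemberOfShift);
}

// Hypothetical helper: shift every mapper-local MEMBER_OF past the entries
// that were already pushed before this mapper ran.
void rebaseMemberOf(std::vector<std::uint64_t> &types,
                    std::uint64_t alreadyPushed) {
  for (std::uint64_t &t : types) {
    const std::uint64_t local = memberOf(t);
    if (local == 0)
      continue;  // entry has no parent, nothing to fix
    assert(local + alreadyPushed <= 0xffff && "MEMBER_OF field overflow");
    t = withMemberOf(t, local + alreadyPushed);
  }
}
```

Whichever side performs this shift has to read and rewrite every map type once more, which is the extra pass the locality argument above refers to.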
>>> 
>>> Why choose 2 over 1:
>>> 1. Less code in the mapper function codegen (I doubt this because
>>> the codegen function of scheme 1 uses fewer than 200 lines of code).
>>> Alexey may have other reasons.
>>> 
>>> We would appreciate it if you could share your thoughts.
>>> 
>>> Thanks,
>>> Lingda Li
>>> 
>>> -------------------------
>>> FROM: Narayanaswamy, Ravi <ravi.narayanaswamy at intel.com>
>>> Sent: Wednesday, June 19, 2019 3:09 PM
>>> To: 'Alexandre Eichenberger'; 'Alexey Bataev';
>>> 'barbara.chapman at stonybrook.edu'; Bobrovsky, Konstantin S; 'Carlo
>>> Bertolli'; 'Chan, SiuChi'; Cownie, James H; David Oehmke; Deepak
>>> Eachempati; 'Denny, Joel E.'; Dmitriev, Serguei N; Doerfert,
>>> Johannes Rudolf ; 'estotzer at ti.com'; 'Ettore Tiotto';
>>> 'fraggamuffin at gmail.com'; 'Gheorghe-Teod Bercea';
>>> 'hfinkel at anl.gov'; 'jbeyer at nvidia.com'; 'Jeeva Paudel'; 'Jeff
>>> Heath'; Jeffrey Sandoval; Jones, Jeff C; 'josem at udel.edu'; 'Kelvin
>>> Li'; 'Kevin K O'Brien'; 'khaldi.dounia at gmail.com'; 'Kotsifakou,
>>> Maria'; Krishnaiyer, Rakesh; Lieberman, Ron ; 'lli at bnl.gov';
>>> 'Lopez, Matthew Graham'; 'lopezmg at ornl.org'; 'Martin Kong'; 'Matt
>>> Martineau'; Menard, Lorri; Monteleone, Robert; Narayanaswamy,
>>> Ravi; 'Oscar R. Hernandez'; Rao, Premanand M; Rice, Michael P;
>>> Robichaux, Joseph; Rodgers, Gregory; Rokos, Georgios;
>>> 'samuel.antao at ibm.com'; 'Sarah McNamara';
>>> 'sergey.y.ostanevich at gmail.com'; 'Sergio Pino Gallardo'; 'Sharif,
>>> Hashim'; Sjodin, Jan ; Sunil Shrestha (sshrestha at cray.com);
>>> 'Sunita Chandrasekaran'; Tian, Xinmin; Tianyi Zhang;
>>> 'vadve at illinois.edu'; 'Wael Yehia'; 'Wang Chen'; Wilmarth, Terry L
>>> Subject: OpenMP / HPC in Clang / LLVM Multi-company Telecom
>>> Meeting Minutes June 19th 2019
>>> 
>>> NEXT MEETING : JULY 10TH (MOVED FROM JULY 3RD)
>>> 
>>> OPENS :
>>> - DOCUMENTATION
>>> - Greg : Can we have documents for libopenmp and libomptarget?
>>> - Alexey suggested having 3 documents: libopenmp, libomptarget and
>>> device plugin.
>>> - Hal will convert the existing libomptarget document. Once done,
>>> others can update the document to capture the existing
>>> implementation. Future additions to libomptarget will also require
>>> updates to the document.
>>> - Next, the libopenmp document will be created if it does not exist,
>>> or updated if one exists.
>>> 
>>> LTO FOR FAT BINARY LINKING
>>> - Serguei (Intel) has an implementation which enables LTO and
>>> does away with linker scripts.
>>> Everybody agreed this is a good idea, especially since some linkers
>>> don’t have support for linker scripts.
>>> AMD is interested in enabling LTO and would like to see
>>> the code.
>>> Serguei to post the code to get feedback from all.
>>> - Hal to present his proposal to support static fat archives using
>>> LTO in the next meeting.
>>> 
>>> OPENMP 5.0 FEATURES
>>> - No update on setting up the public website. Johannes was out
>>> attending ISC.
>>> - New features added since last release (courtesy of Kelvin)
>>> - allocate clause/allocate directive - parsing+sema, codegen
>>> - mutexinoutset dependence-type for task
>>> - user-defined mapper (declare mapper) - parsing+sema.
>>> - omp_get_device_num() API routine
>>> 
>>> DEVELOPMENT ACTIVITY
>>> - ASYNC API
>>> Support in Clang and libopenmp, including lit tests, has been
>>> checked in by Doru.
>>> 
>>> - MAPPER SUPPORT
>>> Initial support for mapper has been posted for review by Lingda.
>>> Once approved, the rest of the support will be done.
>>> Lingda : Should the old API, which is being replaced by a similar
>>> new API with an extra mapper argument, be obsoleted?
>>> The suggestion was for Clang to stop generating calls to the old API
>>> but to keep it in libomptarget for backward compatibility. In the
>>> future it can be obsoleted.
>>> 
>>> - REQUIRES DIRECTIVE
>>> Support for the requires directive has been checked in by Doru.
>>> There was one issue with checking for the requires directive and
>>> confirming whether the Declare type is TO or LINK.
>>> Doru removed the check and added a note that this code needs to be
>>> modified if things change in the future.
>>> 
>>> ROLL CALL :
>>> 
>>> COMPANY         ATTENDEES                 19-JUN
>>> AMD             Greg Rodgers              x
>>>                 Ashwin Aji
>>>                 Jan Sjodin                x
>>>                 Ron Lieberman             x
>>>                 sameer Sahasrabuddhe
>>>                 Andrey Kasaurov
>>> ANL             Hal Finkel                x
>>>                 Johannes Doerfert
>>> IBM             Alexandre Eichenberger
>>>                 Carlo Bertolli
>>>                 Kelvin Li
>>>                 Doru                      x
>>>                 Alexey Bataev             x
>>> INTEL           Andrey Churbanov
>>>                 Ravi Narayanaswamy        x
>>>                 Serguei Dmitriev          x
>>>                 Rajiv Deodhar
>>>                 Lorri Menard
>>>                 Terry Wilmarth
>>>                 Rao, Prem
>>>                 Hansang Bae
>>>                 George Rokos              x
>>> CRAY            Deepak Eachempati         x
>>> MICRON          John Leidel
>>> NVIDIA          James Beyer               x
>>> ORNL            Graham Lopez
>>>                 Joel Denny
>>>                 Geoffroy Vallee
>>>                 Oscar Hernandez
>>> SBU/BNL         Lingda Li                 x
>>>                 Jose Monlsave
>>>                 Martin Kong
>>> TI              Eric Stotzer
>>> U OF BRISTOL    Mat Martineau
>>> U OF DELAWARE   Sunita Chandrasekaran
>>> U OF ILLINOIS   Hashim Sharif
>>> RICE            John Mellor-Crummey
>>> LSU             Tianyi Zhang
>>> 
> 
> 
> Links:
> ------
> [1]
> https://reviews.llvm.org/D59474
> [2]
> https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx
> [3]
> https://urldefense.proofpoint.com/v2/url?u=https-3A__meet.intel.com_ravi.narayanaswamy_DK7943NR&d=DwMFaQ&c=aTOVZmpUfPKZuaG9NO7J7Mh6imZbfhL47t9CpZ-pCOw&r=RLUU7gQynM_GwGu2QR7zHw&m=0c8CuLZZzM3R7PecCmFPYLuPYEOtCJHYTIGjSgIPaWU&s=K4msFCmDvK4n0MdVQd7UTXRRvRkaNwLzMaP8fnX0iOg&e=
> [4]
> https://urldefense.proofpoint.com/v2/url?u=https-3A__meet.intel.com_ravi.narayanaswamy_DK7943NR-3Fsl-3D1&d=DwMFaQ&c=aTOVZmpUfPKZuaG9NO7J7Mh6imZbfhL47t9CpZ-pCOw&r=RLUU7gQynM_GwGu2QR7zHw&m=0c8CuLZZzM3R7PecCmFPYLuPYEOtCJHYTIGjSgIPaWU&s=krI3wEp2z8GhcZt6feFq3WgaBjcEoTDRk-GvI1BIdO8&e=
> [5]
> tel:+1(916)356-2663%20(or%20your%20local%20bridge%20access%20#)%20Choose%20bridge%205.
> [6]
> https://urldefense.proofpoint.com/v2/url?u=https-3A__dial.intel.com&d=DwMFaQ&c=aTOVZmpUfPKZuaG9NO7J7Mh6imZbfhL47t9CpZ-pCOw&r=RLUU7gQynM_GwGu2QR7zHw&m=0c8CuLZZzM3R7PecCmFPYLuPYEOtCJHYTIGjSgIPaWU&s=g2dQtoTqaRXyBMaIUpfyoPFDRTtrQbgbWbb9b90tgBg&e=
> [7]
> https://urldefense.proofpoint.com/v2/url?u=https-3A__o15.officeredir.microsoft.com_r_rlidLync15-3Fclid-3D1033-26p1-3D5-26p2-3D2009&d=DwMFaQ&c=aTOVZmpUfPKZuaG9NO7J7Mh6imZbfhL47t9CpZ-pCOw&r=RLUU7gQynM_GwGu2QR7zHw&m=0c8CuLZZzM3R7PecCmFPYLuPYEOtCJHYTIGjSgIPaWU&s=6OCBXxzOIJfra2Pewq_p-l2pY3MyKnuG-TLr7M1xq-s&e=
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

