<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p><br>
</p>
<div class="moz-cite-prefix">On 12/06/2017 10:23 PM, Jeff Hammond
wrote:<br>
</div>
<blockquote
cite="mid:CAGKz=uKdW7SDREwSLP=ynCUndT+J1FmuwGbPmP5XbAUYSq=NGA@mail.gmail.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<div><br>
<div class="gmail_quote">
<div dir="auto">On Wed, Dec 6, 2017 at 4:23 PM Hal Finkel <<a
moz-do-not-send="true" href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<p><br>
</p>
<div class="m_2065056468622040889moz-cite-prefix">On
12/04/2017 10:48 PM, Serge Preis via cfe-dev wrote:<br>
</div>
<blockquote type="cite">
<div>I agree that guarantees provided by ICC may be
stronger than with other compilers, so yes, under
OpenMP terms vectorization is permitted but cannot be
assumed. However, OpenMP clearly defines the semantics of
variables used within an OpenMP region: some are
shared (scalar), some private (vector), and some are
inductions. This goes far beyond typical compiler-specific
pragmas about dependencies and cost modelling
and makes vectorization a much simpler task with more
predictable and robust results if properly implemented
(admittedly, even the ICC implementation is far from
perfect). I hope Intel's efforts to standardize
something like this in core C++ will eventually come to
fruition. Until then I, as a regular application
developer, would appreciate an OpenMP-simd based execution
policy (hoping for good support for OpenMP SIMD in
clang), but it shouldn't necessarily be part of libc++.
Since 'unordered' execution policy is currently not
part of C++ standard </div>
</blockquote>
<br>
</div>
<div bgcolor="#FFFFFF" text="#000000">
std::execution::par_unseq is part of C++17, and that
essentially maps to '#pragma omp parallel for simd'.</div>
<div bgcolor="#FFFFFF" text="#000000"><br>
</div>
</blockquote>
<div dir="auto"><br>
</div>
<div dir="auto">Do you expect par/par_unseq to nest?</div>
</div>
</div>
</blockquote>
<br>
Yes.<br>
<br>
<blockquote
cite="mid:CAGKz=uKdW7SDREwSLP=ynCUndT+J1FmuwGbPmP5XbAUYSq=NGA@mail.gmail.com"
type="cite">
<div>
<div class="gmail_quote">
<div dir="auto"> Nesting omp-parallel is generally regarded as
a Bad Idea.</div>
</div>
</div>
</blockquote>
<br>
Agreed. I suspect we'll want the mapping to be more like '#pragma
omp taskloop simd'.<br>
<br>
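As a rough sketch of what such a mapping could look like (the function
names below are made up for illustration and are not taken from the
proposed implementation), the loop body is turned into tasks that the
already-active team executes, so a nested call does not open a new
parallel region:<br>
<br>
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: y[i] += a * x[i].
// 'taskloop simd' splits the iterations into tasks and permits each chunk
// to be vectorized. Called outside a parallel region it simply runs on the
// calling thread; without -fopenmp the pragma is ignored entirely.
void scale_add(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    const double* xp = x.data();
    double* yp = y.data();
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(y.size());
    #pragma omp taskloop simd
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        yp[i] += a * xp[i];
    }
}

// Hypothetical caller: one thread of the team creates the tasks and the
// whole team executes them.
void scale_add_in_team(double a, const std::vector<double>& x, std::vector<double>& y) {
    #pragma omp parallel
    #pragma omp single
    scale_add(a, x, y);
}
```
<br>
If scale_add is itself called from inside another task or parallel
region, the new tasks are scheduled onto the existing team instead of
spawning a nested one, which is the property a nested '#pragma omp
parallel for simd' lacks.<br>
<br>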
-Hal<br>
<br>
<blockquote
cite="mid:CAGKz=uKdW7SDREwSLP=ynCUndT+J1FmuwGbPmP5XbAUYSq=NGA@mail.gmail.com"
type="cite">
<div>
<div class="gmail_quote">
<div dir="auto"><br>
</div>
<div dir="auto">Jeff</div>
<div dir="auto"><br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><br>
<blockquote type="cite">
<div>I don't much care how it is implemented in
libc++, if it is. I would just like to ask the Intel guys
and the community here to make the implementation extensible,
in the sense that a custom OpenMP-SIMD-based execution
policy, along with algorithm implementations (as
specializations for the policy), can be used with the
libc++ library. I would additionally like to ask the
Intel guys to provide a complete and compatible
extension on GitHub for developers like me to use.</div>
</blockquote>
<br>
</div>
<div bgcolor="#FFFFFF" text="#000000"> In the end, I think
we want the following:<br>
<br>
1. A design for libc++ that allows the thread-level
parallelism to be implemented in terms of different
underlying providers (e.g., OpenMP, GCD, Work Queues on
Windows, whatever else).<br>
2. To follow the same philosophy with respect to
standards as we do everywhere else: Use standards where
possible with compiler/system-specific extensions as
necessary.<br>
<br>
-Hal</div>
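<div>A minimal sketch of point 1, assuming a provider is modelled as a
type with a static parallel_for. All names here (serial_backend,
openmp_backend, for_each_index) are hypothetical and do not come from
libc++ or from the Intel code; a GCD or Windows provider would plug in
the same way.</div>
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical provider interface: parallel_for(first, last, body) must
// call body(i) exactly once for every i in [first, last).
struct serial_backend {
    template <class Body>
    static void parallel_for(std::size_t first, std::size_t last, Body body) {
        assert(first <= last);
        for (std::size_t i = first; i < last; ++i) body(i);
    }
};

// Same interface on top of OpenMP. Without -fopenmp the pragma is ignored
// and this degrades to the serial loop.
struct openmp_backend {
    template <class Body>
    static void parallel_for(std::size_t first, std::size_t last, Body body) {
        assert(first <= last);
        #pragma omp parallel for
        for (std::size_t i = first; i < last; ++i) body(i);
    }
};

// The algorithm is written once against the interface; the provider is
// chosen by a template parameter.
template <class Backend, class T, class F>
void for_each_index(std::vector<T>& v, F f) {
    T* data = v.data();
    Backend::parallel_for(0, v.size(), [=](std::size_t i) { f(data[i]); });
}
```
<div>Used as for_each_index&lt;openmp_backend&gt;(v, f), the algorithm code
itself never mentions OpenMP, so swapping the provider does not touch
it.</div>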
<div bgcolor="#FFFFFF" text="#000000"><br>
<br>
<blockquote type="cite">
<div> </div>
<div>Regards,</div>
<div>Serge.</div>
<div> </div>
<div> </div>
<div> </div>
<div>04.12.2017, 12:07, "Jeff Hammond" <a
moz-do-not-send="true"
class="m_2065056468622040889moz-txt-link-rfc2396E"
href="mailto:jeff.science@gmail.com" target="_blank">&lt;jeff.science@gmail.com&gt;</a>:</div>
<blockquote type="cite">
<div>
<div>ICC implements a very aggressive interpretation
of the OpenMP standard, and this interpretation is
not shared by everyone in the OpenMP community.
ICC is correct but other implementations may be
far less aggressive, so _Pragma("omp simd")
doesn't guarantee vectorization unless the
compiler documentation says that is how it is
implemented. All the standard says
is that vectorization is _permitted_.</div>
<div> </div>
<div>Given that the practical meaning of
_Pragma("omp simd") isn't guaranteed to be
consistent across different implementations, I
don't really know how to compare it to
compiler-specific pragmas unless we define
everything explicitly.</div>
<div> </div>
<div>In any case, my fundamental point remains: do
not use OpenMP pragmas here, but instead use
whatever the appropriate compiler-specific pragma
is, or create a new one that meets the need.</div>
<div> </div>
<div>Best,</div>
<div> </div>
<div>Jeff</div>
<div title="Page 81">
<div>
<div> </div>
</div>
</div>
<div>
<div>On Sun, Dec 3, 2017 at 8:09 PM, Serge Preis <span><<a
moz-do-not-send="true"
href="mailto:spreis@yandex-team.ru"
target="_blank">spreis@yandex-team.ru</a>></span>
wrote:
<blockquote style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>Hello,</div>
<div> </div>
<div>_Pragma("omp simd") is semantically quite
different from _Pragma("clang loop
vectorize(assume_safety)"), _Pragma("GCC
ivdep") and _Pragma("vector always"), so I
am not sure all of the latter will work as expected
in all cases. They definitely won't provide
any vectorization guarantees, which somewhat
defeats the purpose of using the corresponding
execution policy.</div>
<div> </div>
<div>I support the idea of keeping OpenMP
orthogonal, and having -fopenmp enabled by
default is definitely not an option. The Intel
compiler has a separate -qopenmp-simd option
which doesn't affect performance outside
explicitly marked loops, but even this is
not enabled by default. I would say that
multiple implementations of the unordered
policy might coexist: initially the OpenMP SIMD
based implementation may be the more powerful one,
while one based on other pragmas is the
default but hints at the existence of a
faster option. Later on someone may be brave
enough to add a SIMD template library and
implement the default unordered policy using it
(such an implementation is possible even now
using vector types, but it would be extremely
complex if it attempts to target all the base data
types, vector widths and target SIMD
architectures clang supports. Even with the
library this may be quite tedious).</div>
<div> </div>
<div>Without any standard way of expressing
SIMD parallelism in pure C++, any implementer
of a SIMD execution policy has to rely on the means
available for the platform/compiler, so it is
not totally unnatural to ask the user to enable
OpenMP SIMD for efficient support of the
corresponding execution policy.</div>
<div> </div>
<div>Regards,</div>
<div>Serge Preis</div>
<div> </div>
<div>(Who was once part of the Intel Compiler
Vectorizer team and drove OpenMP SIMD
efforts within icc and beyond, if anyone is
keeping track of conflicts-of-interest)</div>
<div> </div>
<div> </div>
<div>04.12.2017, 08:46, "Jeff Hammond via
cfe-dev" <<a moz-do-not-send="true"
href="mailto:cfe-dev@lists.llvm.org"
target="_blank">cfe-dev@lists.llvm.org</a>>:</div>
<blockquote type="cite">
<div>
<div>
<div>It would be nice to keep PSTL and
OpenMP orthogonal, even if
_Pragma("omp simd") does not require
runtime support. It should be trivial
to use _Pragma("clang loop
vectorize(assume_safety)") instead, by
wrapping all of the different compiler
vectorization pragmas in preprocessor
logic. I similarly recommend
_Pragma("GCC ivdep") for GCC and
_Pragma("vector always") for ICC.
This requires O(n_compilers)
effort instead of O(1), but
orthogonality is worth it.
<div> </div>
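<div>A minimal sketch of that preprocessor wrapping, assuming the three
pragmas named above. The macro name MY_PRAGMA_VECTORIZE and the
add_arrays function are hypothetical and not taken from the Parallel STL
sources.</div>
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical macro. ICC and Clang also define __GNUC__, so they are
// tested before the GCC branch. Unknown compilers get an empty macro.
#if defined(__INTEL_COMPILER)
#  define MY_PRAGMA_VECTORIZE _Pragma("vector always")
#elif defined(__clang__)
#  define MY_PRAGMA_VECTORIZE _Pragma("clang loop vectorize(assume_safety)")
#elif defined(__GNUC__)
#  define MY_PRAGMA_VECTORIZE _Pragma("GCC ivdep")
#else
#  define MY_PRAGMA_VECTORIZE
#endif

// out[i] = a[i] + b[i]. The pragma is only a hint or a promise to the
// compiler; the caller must ensure that out does not overlap a or b.
void add_arrays(const std::vector<float>& a, const std::vector<float>& b,
                std::vector<float>& out) {
    assert(a.size() == b.size() && a.size() == out.size());
    const float* ap = a.data();
    const float* bp = b.data();
    float* op = out.data();
    const std::size_t n = out.size();
    MY_PRAGMA_VECTORIZE
    for (std::size_t i = 0; i < n; ++i) {
        op[i] = ap[i] + bp[i];
    }
}
```
<div>None of these branches needs -fopenmp or an OpenMP runtime, and
none of them guarantees that the loop is actually vectorized.</div>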
<div>While OpenMP is
vendor/compiler-agnostic, users
should not be required to use
-fopenmp or similar to enable
vectorization from PSTL, nor should
the compiler enable any OpenMP
pragma by default. I know of cases
where merely using the -fopenmp flag
alters code generation in a
performance-visible manner, and
enabling the OpenMP "simd" pragma by
default may surprise some users,
particularly if no other OpenMP
pragmas are enabled by default.
<div><br>
Best,</div>
<div> </div>
<div>Jeff</div>
<div>(who works for Intel but not on
any software products and has been
a heavy user of Intel PSTL since
it was released, if anyone is
keeping track of
conflicts-of-interest)<br>
<br>
On Wed, Nov 29, 2017 at 4:21 AM,
Kukanov, Alexey via cfe-dev <<a
moz-do-not-send="true"
href="mailto:cfe-dev@lists.llvm.org"
target="_blank">cfe-dev@lists.llvm.org</a>>
wrote:<br>
><br>
> Hello all,<br>
><br>
> At Intel, we have developed
an implementation of C++17
execution policies<br>
> for algorithms (often
referred to as Parallel STL). We
hope to contribute it<br>
> to libc++/LLVM, so would like
to ask the community for comments
on this.<br>
><br>
> The code is already published
at GitHub (<a
moz-do-not-send="true"
href="https://github.com/intel/parallelstl"
target="_blank">https://github.com/intel/parallelstl</a>).<br>
> It supports the C++17
standard execution policies (seq,
par, par_unseq) as well as<br>
> the experimental unsequenced
policy (unseq) for SIMD execution.
At the moment,<br>
> about half of the C++17
standard algorithms that must
support execution policies<br>
> are implemented; a few more
will be ready soon, and the work
continues.<br>
> The tests that we use are
also available at GitHub; needless
to say we will<br>
> contribute those as well.<br>
><br>
> The implementation is not
specific to Intel’s hardware. For
thread-level parallelism<br>
> it uses TBB* (<a
moz-do-not-send="true"
href="https://www.threadingbuildingblocks.org/"
target="_blank">https://www.threadingbuildingblocks.org/</a>)
but abstracts it with<br>
> an internal API which can be
implemented on top of other
threading/parallel solutions –<br>
> so it is for the community to
decide which ones to use. For SIMD
parallelism<br>
> (unseq, par_unseq) we use
#pragma omp simd directives; it is
vendor-neutral and<br>
> does not require any OpenMP
runtime support.<br>
><br>
> The current implementation
meets the spirit but not always
the letter of<br>
> the standard, because it has
to be separate from but also
coexist with<br>
> implementations of standard
C++ libraries. While preparing the
contribution,<br>
> we will address
inconsistencies, adjust the code
to meet community standards,<br>
> and better integrate it into
the standard library code.<br>
><br>
> We are also proposing that
our implementation is included
into libstdc++/GCC.<br>
> Compatibility between the
implementations seems useful as it
can potentially<br>
> reduce the amount of work for
everyone. We hope to keep the code
mostly identical,<br>
> and would like to know if you
think it’s too optimistic to
expect.<br>
><br>
> Obviously we plan to use
appropriate open source licenses
to meet the different<br>
> projects’ requirements.<br>
><br>
> We expect to keep developing
the code and will take the
responsibility for<br>
> maintaining it (with
community contributions, of
course). If there are other<br>
> community efforts to
implement parallel algorithms, we
are willing to collaborate.<br>
><br>
> We look forward to your
feedback, both for the overall
idea and – if supported –<br>
> for the next steps we should
take.<br>
><br>
> Regards,<br>
> - Alexey Kukanov<br>
><br>
> * Note that TBB itself is
highly portable (and ported by
community to Power and ARM<br>
> architectures) and
permissively licensed, so could be
the base for the threading<br>
> infrastructure. But the
Parallel STL implementation itself
does not require TBB.<br>
><br>
>
_______________________________________________<br>
> cfe-dev mailing list<br>
> <a moz-do-not-send="true"
href="mailto:cfe-dev@lists.llvm.org"
target="_blank">cfe-dev@lists.llvm.org</a><br>
> <a moz-do-not-send="true"
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev"
target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>
<br>
<br>
<br>
<br>
--<br>
Jeff Hammond<br>
<a moz-do-not-send="true"
href="mailto:jeff.science@gmail.com"
target="_blank">jeff.science@gmail.com</a><br>
<a moz-do-not-send="true"
href="http://jeffhammond.github.io/"
target="_blank">http://jeffhammond.github.io/</a>
<div> </div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</blockquote>
</div>
<div> </div>
--
<div>Jeff Hammond<br>
<a moz-do-not-send="true"
href="mailto:jeff.science@gmail.com"
target="_blank">jeff.science@gmail.com</a><br>
<a moz-do-not-send="true"
href="http://jeffhammond.github.io/"
target="_blank">http://jeffhammond.github.io/</a></div>
</div>
</div>
</blockquote>
<br>
</blockquote>
<br>
</div>
<div bgcolor="#FFFFFF" text="#000000">
<pre class="m_2065056468622040889moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</div>
</blockquote>
</div>
</div>
<div dir="ltr">-- <br>
</div>
<div class="gmail_signature" data-smartmail="gmail_signature">Jeff
Hammond<br>
<a moz-do-not-send="true" href="mailto:jeff.science@gmail.com"
target="_blank">jeff.science@gmail.com</a><br>
<a moz-do-not-send="true" href="http://jeffhammond.github.io/"
target="_blank">http://jeffhammond.github.io/</a></div>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</body>
</html>