<div><br><div class="gmail_quote"><div dir="auto">On Wed, Dec 6, 2017 at 4:23 PM Hal Finkel <<a href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
<p><br>
</p>
<div class="m_2065056468622040889moz-cite-prefix">On 12/04/2017 10:48 PM, Serge Preis via
cfe-dev wrote:<br>
</div>
<blockquote type="cite">
<div>I agree that the guarantees provided by ICC may be stronger than
with other compilers, so yes, in OpenMP terms vectorization
is permitted and cannot be assumed. However, OpenMP clearly
defines the semantics of variables used within an OpenMP region: some
being shared (scalar), some private (vector), and some being
inductions. This goes far beyond typical compiler-specific
pragmas about dependencies and cost modelling, and it makes
vectorization a much simpler task with more predictable and robust
results if properly implemented (admittedly, even the ICC
implementation is far from perfect). I hope Intel's efforts to
standardize something like this in core C++ will eventually come
to fruition. Until then, I as a regular application developer
would appreciate an OpenMP-SIMD-based execution policy (hoping for
good support for OpenMP SIMD in clang), but it shouldn't
necessarily be part of libc++. Since the 'unordered' execution policy
is currently not part of the C++ standard </div>
</blockquote>
<br></div><div bgcolor="#FFFFFF" text="#000000">
std::execution::par_unseq is part of C++17, and that essentially
maps to '#pragma omp parallel for simd'.</div><div bgcolor="#FFFFFF" text="#000000"><br>
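<div>For example, a call such as the one below could plausibly be lowered to that combined construct (a rough sketch only; nothing requires a library to implement it this way):</div>
<pre>
#include <algorithm>
#include <execution>
#include <vector>

void scale(std::vector<float> &v, float a) {
  // C++17: iterations may run in parallel and be vectorized, in any order.
  std::for_each(std::execution::par_unseq, v.begin(), v.end(),
                [a](float &x) { x *= a; });

  // Roughly the same permission expressed directly in OpenMP:
  // #pragma omp parallel for simd
  // for (size_t i = 0; i < v.size(); ++i)
  //   v[i] *= a;
}
</pre>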
</div></blockquote><div dir="auto"><br></div><div dir="auto">Do you expect par/par_unseq to nest? Nesting omp-parallel is generally regarded as a Bad Idea.</div><div dir="auto"><br></div><div dir="auto">Jeff</div><div dir="auto"><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><br>
<blockquote type="cite">
<div>I don't care much about how it will be implemented in libc++, if
it is. I would just like to ask the Intel guys and the community here to
make the implementation extensible, in the sense that a custom
OpenMP-SIMD-based execution policy, along with algorithm
implementations (as specializations for the policy), can be used
with the libc++ library. And I would additionally like to ask the
Intel guys to provide a complete and compatible extension on
GitHub for developers like me to use.</div>
</blockquote>
<br></div><div bgcolor="#FFFFFF" text="#000000">
In the end, I think we want the following:<br>
<br>
1. A design for libc++ that allows thread-level parallelism to
be implemented in terms of different underlying providers (e.g.,
OpenMP, GCD, Work Queues on Windows, whatever else); see the sketch below.<br>
2. To follow the same philosophy with respect to standards as we do
everywhere else: Use standards where possible with
compiler/system-specific extensions as necessary.<br>
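<div>As a sketch of how small that provider interface could be (all names here are hypothetical, just to illustrate the shape of such a layer, with OpenMP and a plain serial fallback as the two example providers):</div>
<pre>
#include <cstddef>

// Hypothetical internal backend interface; each provider (OpenMP, GCD,
// a Windows thread pool, TBB, plain serial) would supply its own definition.
namespace __pstl_backend {

template <class Body>
void parallel_for(std::ptrdiff_t first, std::ptrdiff_t last, Body body) {
#if defined(_OPENMP)
  #pragma omp parallel for
  for (std::ptrdiff_t i = first; i < last; ++i)
    body(i);
#else
  // Serial fallback keeps the algorithms usable with no runtime at all.
  for (std::ptrdiff_t i = first; i < last; ++i)
    body(i);
#endif
}

} // namespace __pstl_backend
</pre>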
<br>
-Hal</div><div bgcolor="#FFFFFF" text="#000000"><br>
<br>
<blockquote type="cite">
<div> </div>
<div>Regards,</div>
<div>Serge.</div>
<div> </div>
<div> </div>
<div> </div>
<div>04.12.2017, 12:07, "Jeff Hammond"
<a class="m_2065056468622040889moz-txt-link-rfc2396E" href="mailto:jeff.science@gmail.com" target="_blank"><jeff.science@gmail.com></a>:</div>
<blockquote type="cite">
<div>
<div>ICC implements a very aggressive interpretation of the
OpenMP standard, and this interpretation is not shared by
everyone in the OpenMP community. ICC is correct but other
implementations may be far less aggressive, so _Pragma("omp
simd") doesn't guarentee vectorization unless the compiler
documentation says that is how it is implemented. All the
standard says that it means is that vectorization is
_permitted_.</div>
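<div>For instance, the directive below only states that executing the iterations in SIMD lanes is legal; whether the compiler actually emits vector code is a quality-of-implementation question (toy example):</div>
<pre>
#include <cstddef>

void axpy(float a, const float *x, float *y, std::size_t n) {
  // Asserts that concurrent SIMD execution of the iterations is safe;
  // it does not oblige any particular compiler to vectorize the loop.
  #pragma omp simd
  for (std::size_t i = 0; i < n; ++i)
    y[i] += a * x[i];
}
</pre>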
<div> </div>
<div>Given that the practical meaning of _Pragma("omp simd")
isn't guaranteed to be consistent across different
implementations, I don't really know how to compare it to
compiler-specific pragmas unless we define everything
explicitly.</div>
<div> </div>
<div>In any case, my fundamental point remains: do not use
OpenMP pragmas here, but instead use whatever the
appropriate compiler-specific pragma is, or create a new one
that meets the need.</div>
<div> </div>
<div>Best,</div>
<div> </div>
<div>Jeff</div>
<div title="Page 81">
<div>
<div> </div>
</div>
</div>
<div>
<div>On Sun, Dec 3, 2017 at 8:09 PM, Serge Preis <span><<a href="mailto:spreis@yandex-team.ru" target="_blank">spreis@yandex-team.ru</a>></span>
wrote:
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>Hello,</div>
<div> </div>
<div>_Pragma("omp simd") is semantically quite different
from _Pragma("clang loop vectorize(assume_safety)"),
_Pragma("GCC ivdep") and _Pragma("vector always"), so
I am not sure all of the latter will work as expected in all
cases. They definitely won't provide any vectorization
guarantees, which slightly defeats the purpose of using
the corresponding execution policy.</div>
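<div>As a small illustration of that semantic difference (a sketch; exact behaviour is of course compiler-dependent): the OpenMP directive lets the programmer state what each variable is, while the other pragmas essentially just waive the compiler's dependence analysis:</div>
<pre>
#include <cstddef>

float dot(const float *x, const float *y, std::size_t n) {
  float sum = 0.0f;
  // OpenMP SIMD: 'sum' is declared as a reduction, so the transformation
  // is well-defined even though every iteration updates the same scalar.
  #pragma omp simd reduction(+:sum)
  for (std::size_t i = 0; i < n; ++i)
    sum += x[i] * y[i];
  return sum;
}

void shift(float *a, std::size_t n, std::size_t k) {
  // Hint-style pragma: only asserts the absence of loop-carried
  // dependences; it says nothing about privatization, reductions or
  // linear variables, and the compiler may still decline to vectorize.
  #pragma clang loop vectorize(assume_safety)
  for (std::size_t i = 0; i + k < n; ++i)
    a[i] = a[i + k];
}
</pre>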
<div> </div>
<div>I support the idea of keeping OpenMP orthogonal, and
definitely having -fopenmp enabled by default is not
an option. The Intel compiler has a separate -qopenmp-simd
option which doesn't affect performance outside
explicitly marked loops, but even this is not enabled
by default. I would say that there might exist
multiple implementations of the unordered policy:
initially an OpenMP SIMD based implementation may be
the more powerful one, with one based on other pragmas being the
default but hinting at the existence of a faster option.
Later on, one may be brave enough to add some SIMD
template library and implement the default unordered
policy using it (such an implementation is possible even
now using vector types, but it would be extremely
complex if it attempted to target all base data types,
vector widths and target SIMD architectures clang
supports. Even with the library this may be quite
tedious).</div>
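<div>To make it concrete what 'vector types' would mean here, a minimal sketch using the Clang/GCC vector_size extension (a real library would have to repeat this for many element types, widths and targets, which is exactly the tedium mentioned above):</div>
<pre>
#include <cstddef>
#include <cstring>

// Clang/GCC vector extension: eight packed floats.
typedef float float8 __attribute__((vector_size(32)));

void add(float *a, const float *b, std::size_t n) {
  std::size_t i = 0;
  // Main SIMD loop over full 8-wide chunks.
  for (; i + 8 <= n; i += 8) {
    float8 va, vb;
    std::memcpy(&va, a + i, sizeof va);   // unaligned load
    std::memcpy(&vb, b + i, sizeof vb);
    va += vb;                             // element-wise add
    std::memcpy(a + i, &va, sizeof va);   // store
  }
  // Scalar remainder.
  for (; i < n; ++i)
    a[i] += b[i];
}
</pre>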
<div> </div>
<div>Without any standard way of expressing SIMD
parallelism in pure C++, any implementer of a SIMD
execution policy has to rely on the means available for
the platform/compiler, so it is not totally unnatural to
ask the user to enable OpenMP SIMD for efficient support
of the corresponding execution policy.</div>
<div> </div>
<div>Regards,</div>
<div>Serge Preis</div>
<div> </div>
<div>(Who once was part of the Intel Compiler Vectorizer
team and drove OpenMP SIMD efforts within icc and
beyond, if anyone is keeping track of
conflicts-of-interest)</div>
<div> </div>
<div> </div>
<div>04.12.2017, 08:46, "Jeff Hammond via cfe-dev" <<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>>:</div>
<blockquote type="cite">
<div>
<div>
<div>It would be nice to keep PSTL and OpenMP
orthogonal, even if _Pragma("omp simd") does not
require runtime support. It should be trivial
to use _Pragma("clang loop
vectorize(assume_safety)") instead, by wrapping
all of the different compiler vectorization
pragmas in preprocessor logic. I similarly
recommend _Pragma("GCC ivdep") for GCC and
_Pragma("vector always") for ICC. While this
requires O(n_compilers) effort instead of O(1),
orthogonality is worth it.
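<div>For example, such a wrapper might look like this (the macro name and the exact choice of pragmas are illustrative only, not taken from any existing library):</div>
<pre>
#include <cstddef>

// Hypothetical portability macro selecting a compiler-specific
// "vectorize this loop" hint; expands to nothing on unknown compilers.
#if defined(__INTEL_COMPILER)
  #define PSTL_VECTORIZE _Pragma("vector always")
#elif defined(__clang__)
  #define PSTL_VECTORIZE _Pragma("clang loop vectorize(assume_safety)")
#elif defined(__GNUC__)
  #define PSTL_VECTORIZE _Pragma("GCC ivdep")
#else
  #define PSTL_VECTORIZE
#endif

void scale(float *a, std::size_t n, float s) {
  PSTL_VECTORIZE
  for (std::size_t i = 0; i < n; ++i)
    a[i] *= s;
}
</pre>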
<div> </div>
<div>While OpenMP is vendor/compiler-agnostic,
users should not be required to use -fopenmp
or similar to enable vectorization from PSTL,
nor should the compiler enable any OpenMP
pragma by default. I know of cases where
merely using the -fopenmp flag alters code
generation in a performance-visible manner,
and enabling the OpenMP "simd" pragma by
default may surprise some users, particularly
if no other OpenMP pragmas are enabled by
default.
<div><br>
Best,</div>
<div> </div>
<div>Jeff</div>
<div>(who works for Intel but not on any
software products and has been a heavy user
of Intel PSTL since it was released, if
anyone is keeping track of
conflicts-of-interest)<br>
<br>
On Wed, Nov 29, 2017 at 4:21 AM, Kukanov,
Alexey via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>>
wrote:<br>
><br>
> Hello all,<br>
><br>
> At Intel, we have developed an
implementation of C++17 execution policies<br>
> for algorithms (often referred to as
Parallel STL). We hope to contribute it<br>
> to libc++/LLVM, so would like to ask
the community for comments on this.<br>
><br>
> The code is already published at GitHub
(<a href="https://github.com/intel/parallelstl" target="_blank">https://github.com/intel/parallelstl</a>).<br>
> It supports the C++17 standard
execution policies (seq, par, par_unseq) as
well as<br>
> the experimental unsequenced policy
(unseq) for SIMD execution. At the moment,<br>
> about half of the C++17 standard
algorithms that must support execution
policies<br>
> are implemented; a few more will be
ready soon, and the work continues.<br>
> The tests that we use are also
available at GitHub; needless to say we will<br>
> contribute those as well.<br>
><br>
> The implementation is not specific to
Intel’s hardware. For thread-level
parallelism<br>
> it uses TBB* (<a href="https://www.threadingbuildingblocks.org/" target="_blank">https://www.threadingbuildingblocks.org/</a>)
but abstracts it with<br>
> an internal API which can be
implemented on top of other
threading/parallel solutions –<br>
> so it is for the community to decide
which ones to use. For SIMD parallelism<br>
> (unseq, par_unseq) we use #pragma omp
simd directives; it is vendor-neutral and<br>
> does not require any OpenMP runtime
support.<br>
><br>
> The current implementation meets the
spirit but not always the letter of<br>
> the standard, because it has to be
separate from but also coexist with<br>
> implementations of standard C++
libraries. While preparing the contribution,<br>
> we will address inconsistencies, adjust
the code to meet community standards,<br>
> and better integrate it into the
standard library code.<br>
><br>
> We are also proposing that our
implementation be included in
libstdc++/GCC.<br>
> Compatibility between the
implementations seems useful as it can
potentially<br>
> reduce the amount of work for everyone.
We hope to keep the code mostly identical,<br>
> and would like to know if you think
it’s too optimistic to expect.<br>
><br>
> Obviously we plan to use appropriate
open source licenses to meet the different<br>
> projects’ requirements.<br>
><br>
> We expect to keep developing the code
and will take the responsibility for<br>
> maintaining it (with community
contributions, of course). If there are
other<br>
> community efforts to implement parallel
algorithms, we are willing to collaborate.<br>
><br>
> We look forward to your feedback, both
for the overall idea and – if supported –<br>
> for the next steps we should take.<br>
><br>
> Regards,<br>
> - Alexey Kukanov<br>
><br>
> * Note that TBB itself is highly
portable (and ported by the community to Power
and ARM<br>
> architectures) and permissively
licensed, so could be the base for the
threading<br>
> infrastructure. But the Parallel STL
implementation itself does not require TBB.<br>
><br>
>
_______________________________________________<br>
> cfe-dev mailing list<br>
> <a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a><br>
> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>
<br>
<br>
<br>
<br>
--<br>
Jeff Hammond<br>
<a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br>
<a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a>
<div> </div>
</div>
</div>
</div>
</div>
</div>
<p><span>_______________________________________________<br>
cfe-dev mailing list<br>
<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a></span></p>
</blockquote>
</blockquote>
</div>
<div> </div>
--
<div>Jeff Hammond<br>
<a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br>
<a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a></div>
</div>
</div>
</blockquote>
<br>
<fieldset class="m_2065056468622040889mimeAttachmentHeader"></fieldset>
<br>
<pre>_______________________________________________
cfe-dev mailing list
<a class="m_2065056468622040889moz-txt-link-abbreviated" href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>
<a class="m_2065056468622040889moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a>
</pre>
</blockquote>
<br>
</div><div bgcolor="#FFFFFF" text="#000000"><pre class="m_2065056468622040889moz-signature" cols="72">--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
</div></blockquote></div></div><div dir="ltr">-- <br></div><div class="gmail_signature" data-smartmail="gmail_signature">Jeff Hammond<br><a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br><a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a></div>