[cfe-dev] Adding "simd" pragma to Clang

Fri Feb 14 07:42:44 PST 2014

----- Original Message -----
> From: "Andrey Bokhanko" <andreybokhanko at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "cfe-dev" <cfe-dev at cs.uiuc.edu>, "Renato Golin" <renato.golin at linaro.org>, "Alexey Bataev" <a.bataev at gmx.com>,
> "Douglas Gregor" <dgregor at apple.com>, "Chris Lattner" <clattner at apple.com>, "Michael Wong" <fraggamuffin at gmail.com>,
> "Arnold Schwaighofer" <aschwaighofer at apple.com>, "Nadav Rotem" <nrotem at apple.com>
> Sent: Friday, February 14, 2014 3:22:15 AM
> Subject: Re: Adding "simd" pragma to Clang
> 
> 
> 
> 
> On Thu, Feb 13, 2014 at 11:24 PM, Hal Finkel < hfinkel at anl.gov >
> wrote:
> 
> 
> 
> 
> 
> Are the semantics of your ivdep the same as the simd pragma?
> Generally speaking, I'm supportive. As I recall, the last time we
> discussed this, there were real questions by some about what ivdep
> meant.
> 
> -Hal
> 
> 
> Current ivdep implementation sets llvm.mem.parallel_loop_access for
> each memory instruction in the loop. This can be used by both
> vectorizer and other optimizations as well.
> 
> 
> simd implementation [will] set vectorizer-specific metadata (force
> vectorization, vector width, etc) in addition to
> parallel_loop_access.
> 

Okay, so it sounds like, by default, the answer is that there is no difference. The simd pragma, however, has other options (like specifying the width) that can also be used. This sounds good, but let me be more specific about the concern that had been raised:

The Intel documentation (http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/cref_cls/common/cppref_pragma_ivdep.htm) states that, with ivdep, "Note: The proven dependencies that prevent vectorization are not ignored, only assumed dependencies are ignored." And also, one of the examples points out, "The following loop requires the parallel option in addition to the ivdep pragma to indicate there is no loop-carried dependencies." So, from this, there are two questions:

 1. Does our "llvm.mem.parallel_loop_access" metadata represent the implied semantics, which seem to cover "vector dependencies" but not loop-carried dependencies?

 2. What if there are dependencies that the Intel compiler "proves" but we only "assume"? In this case, we might vectorize (or, more problematic, use the metadata for other purposes, instruction scheduling for instance) in cases where the Intel compiler will ignore the directive because of some dependence it proves.

Personally, I'm less concerned about (2), because it seems silly, at best, to rely on the compiler to ignore your directives by realizing you must have made a mistake. (1) may be more of an issue.
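
To make the "assumed" vs. "proven" distinction concrete, here is a minimal sketch in C. It is an illustration only and is not taken from the Intel documentation. The function names (scale_assumed, prefix_sum_proven) and the IVDEP macro are hypothetical; the macro expands to the ivdep pragma only when __INTEL_COMPILER is defined, so other compilers see plain loops.

```c
#include <stddef.h>

/* Hypothetical helper: emit "#pragma ivdep" only for the Intel compiler,
   so other compilers do not see an unknown pragma. */
#if defined(__INTEL_COMPILER)
#define IVDEP _Pragma("ivdep")
#else
#define IVDEP
#endif

/* Assumed dependence: without more information the compiler cannot tell
   whether dst and src overlap, so it must conservatively assume they might.
   The directive is the programmer's promise that they do not. */
void scale_assumed(float *dst, const float *src, size_t n) {
  IVDEP
  for (size_t i = 0; i < n; ++i)
    dst[i] = 2.0f * src[i];
}

/* Proven dependence: iteration i reads the value that iteration i-1 wrote
   to the same array. Per the Intel note quoted above, a dependence the
   compiler can prove is not ignored, even with the directive present. */
void prefix_sum_proven(float *a, size_t n) {
  IVDEP
  for (size_t i = 1; i < n; ++i)
    a[i] = a[i] + a[i - 1];
}
```

Question (2) is about loops like the second one: if every memory access in it carried parallel_loop_access metadata, nothing in the metadata itself would distinguish it from the first.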

Thoughts?

 -Hal

> 
> Andrey
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory