[LLVMdev] Vectorization of loops with conditional dereferencing

Hal Finkel hfinkel at anl.gov
Fri Nov 1 12:41:05 PDT 2013


----- Original Message -----
> Hi Hal,
> 
> Yes, I agree that this is a problem that prevents vectorization in
> many loops.  Another problem that we have is that sunk loads don’t
> preserve their control dependence properties.  For example in the
> code below, if we sink the load into the branch then we can't
> vectorize the loop.
> 
> x = A[i]
> if (cond) {
> 	sum += x;
> }
> 
> I agree with you that checking the first and last element for each 4k
> page has many disadvantages and I would like to explore other
> options.

Yes -- So far, I've failed to think of a better way that will work automatically (meaning without adding pragmas, etc.) :(

> Yesterday Pekka proposed marking some operations as
> 'no-trap'.  Maybe we should consider marking some memory regions as
> ‘mapped'.  Every time we malloc memory or pass a pointer to
> statically allocated memory we could propagate the information about
> the availability and size of the memory.  What do you think ?

I think that this is a good idea (and I suspect that code already exists to do some of this; poolalloc perhaps?), although I fear that it generally depends on LTO to work for general codes.

In the short term, I think a pragma, and a compiler option that changes the default, may be the best solution.

Realistically, I know of 0 real applications that use loop-dependent conditions to prevent dereferencing invalid pointers. As you say, it will be interesting to turn off the safety check, and see what in the test suite is affected.

Thanks again,
Hal

> 
> Thanks,
> Nadav
> 
> 
> 
> On Nov 1, 2013, at 6:40 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> > Nadav, Arnold, et al.,
> > 
> > I have a number of loops that I would like us to be able to
> > autovectorize (common, for example, in n-body inter-particle force
> > kernels), and the problem is that they look like this:
> > 
> > for (int i = 0; i < N; ++i) {
> >  if (r[i] > 0)
> >    v += m[i]*...;
> > }
> > 
> > where, as written, m[i] is not accessed unless the condition is
> > true. The general problem (as is noted by the loop vectorizer), is
> > that we don't know that dereferencing m[i] is safe for values
> > where the guarding condition is not true, and so if-converting the
> > block directly is not allowed.
> > 
> > One possibility, if we disallow for m[i] to have been munmaped or
> > page protected in the middle, is to first collect the first and
> > last index for which the condition is true, and then run the
> > vector loop only on that portion. For the loop above, I mean
> > something like:
> > 
> > FirstI = -1, LastI = -1, I = 0;
> > for (; I < N; ++I)
> >  if (r[i] > 0) {
> >    FirstI = I;
> >    break;
> >  }
> > 
> > for (; I < N; ++I) {
> >  if (r[i] > 0)
> >    LastI = I;
> > }
> > 
> > [In this case, since the guard dominates all side-effect, we don't
> > need to do anything outside that range, otherwise, we'd need to
> > run the scalar loop until FirstI, and then again from LastI until
> > N]
> > 
> > for (I = FirstI; I <= LastI; ...) {
> >  // Run the scalar/vector loop sequence as before.
> > }
> > 
> > On the other hand, if we can't make this range assumption
> > generally, then I'd assume that we could make it per-page (where a
> > page is 4kB or similar). Then the scheme becomes more complicated,
> > because we need to break the ranges on page boundaries (maybe
> > something like this):
> > 
> > Done = false;
> > FirstI = -1, LastI = -1;
> > while (!Done) {
> >  for (I = FirstI+1; I < N; ++I)
> >    if (r[i] > 0) {
> >      FirstI = I;
> >      break;
> >    }
> > 
> >  for (; I < N && !page_bound(&m[i]) && ...; ++I) {
> >    if (r[i] > 0)
> >      LastI = I;
> >  }
> > 
> >  Done = I == N;
> > 
> >  for (I = FirstI; I <= LastI; ...) {
> >    // Run the scalar/vector loop sequence as before.
> >  }
> > }
> > 
> > I hope that we don't need to do something like this, but I've been
> > disappointed by C/POSIX semantics before. On the other hand, we
> > might want to break these into groups anyway, and allocate a local
> > stack buffer to hold the mask computed from the condition, so that
> > we don't have to reevaluate the condition in the loop.
> > 
> > Also, I'm certainly open to other ways to do this. Are there any
> > other ways? Thoughts?
> > 
> > Thanks again,
> > Hal
> > 
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory




More information about the llvm-dev mailing list