[PATCH] D36244: [LoopVectorize] Fix assertion failure in Fcmp vectorization

Tue Aug 8 07:46:24 PDT 2017

anna added inline comments.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4867
+        IRBuilder<>::FastMathFlagGuard FMFG(Builder);
+        Builder.setFastMathFlags(Cmp->getFastMathFlags());
         C = Builder.CreateFCmp(Cmp->getPredicate(), A, B);
----------------
Ayal wrote:
> anna wrote:
> > Ayal wrote:
> > > An alternative may be to do
> > > 
> > > ```
> > >   if (isa<FCmpInst>(C))
> > >     cast<FCmpInst>(C)->copyFastMathFlags(Cmp);
> > > ```
> > > or
> > > 
> > > ```
> > >   if (auto *FCmp = dyn_cast<FCmpInst>(C))
> > >     FCmp->copyFastMathFlags(Cmp);
> > > ```
> > > 
> > > 
> > > Wonder if the last, optional argument to CreateFCmp() may be used instead?
> > > 
> > > Is this issue relevant elsewhere, e.g., SLP vectorizer? (Maybe that last argument shouldn't be optional ;-)
> > the alternative way may also be used. 
> > 
> > The `FastMathFlagGuard` seems to be the accepted practice though - in all of instcombine and LoopUtils transformations where the created instruction may not be a compare. 
> > 
> > `SLPVectorizer` creates FCmp and uses `propagateIRFlags` from LoopUtils for this. So, SLPVectorizer does not have this bug. However, it's a full blown method that copies IR flags by checking the type of instruction, and we don't need it here in LoopVectorizer. 
> this alternative way also appears in instcombine, in the form of:
> ```
>   auto *FPInst = dyn_cast<Instruction>(RI);
>   if (FPInst && isa<FPMathOperator>(FPInst))
>     FPInst->copyFastMathFlags(BO);
> ```
> and is also used by `SLPVectorizer` eventually via `copyIRFlags()`/`andIRFlags()`.
> 
> 
> If you prefer the `FastMathFlagGuard` way, suggest to also attach a preceding comment, e.g.:
> 
> ```
> // Propagate fast math flags.
> ```
> 
> BTW, curious how such a redundant comparison reached the vectorizer? Such things are better cleaned up earlier, to let the vectorizer and its cost model concentrate on 'real' instructions.
I've added the comment. Would be good to find out how the redundant comparison came in, will take a look.

I think that's an orthogonal performance problem, because there can always be other cases where we have such redundant comparisons flowing through the vectorizer. For example, it could also be a missed optimization or an in-tree/out-of-tree pass ordering issue.

https://reviews.llvm.org/D36244