[PATCH] D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size

Wed Aug 15 14:26:49 PDT 2018

hsaito added inline comments.

================
Comment at: lib/Transforms/Vectorize/VPlan.h:609
   /// VPlan opcodes, extending LLVM IR with idiomatics instructions.
-  enum { Not = Instruction::OtherOpsEnd + 1 };
+  enum { Not = Instruction::OtherOpsEnd + 1, ICmpULE };

----------------
Ayal wrote:
> dcaballe wrote:
> > hsaito wrote:
> > > Ayal wrote:
> > > > dcaballe wrote:
> > > > > I'm worried that this new opcode could be problematic since now we can have compare instructions represented as VPInstructions with Instruction::ICmp and Instruction::FCmp opcodes and VPInstructions with VPInstruction::ICmpULE. Internally, we have a VPCmpInst subclass to model I/FCmp opcodes and their predicates. Do you think it would be better to upstream that subclass first? 
> > > > An alternative of leveraging `Instruction::ICmp` opcode and existing `ICmpInst` subclasses for keeping the Predicate, in a scalable way, could be (devised jointly w/ Gil):
> > > > 
> > > > ```
> > > > +    // Introduce the early-exit compare IV <= BTC to form header block mask.
> > > > +    // This is used instead of IV < TC because TC may wrap, unlike BTC.
> > > > +    VPValue *IV = Plan->getVPValue(Legal->getPrimaryInduction());
> > > > +    VPValue *BTC = Plan->getBackedgeTakenCount();
> > > > +    Value *Undef = UndefValue::get(Legal->getPrimaryInduction()->getType());
> > > > +    auto *ICmp = new ICmpInst(ICmpInst::ICMP_ULE, Undef, Undef);
> > > > +    Plan->addDetachedValue(ICmp);
> > > > +    BlockMask = Builder.createNaryOp(Instruction::ICmp, {IV, BTC}, ICmp);
> > > >      return BlockMaskCache[BB] = BlockMask;
> > > > ```
> > > > 
> > > > and then have `VPInstruction::generateInstruction()` do
> > > > 
> > > > ```
> > > > +  case Instruction::ICmp: {
> > > > +    Value *IV = State.get(getOperand(0), Part);
> > > > +    Value *TC = State.get(getOperand(1), Part);
> > > > +    auto *ICmp = cast<ICmpInst>(getUnderlyingValue());
> > > > +    Value *V = Builder.CreateICmp(ICmp->getPredicate(), IV, TC);
> > > > +    State.set(this, V, Part);
> > > > +    break;
> > > > +  }
> > > > ```
> > > > 
> > > > where `VPlan::addDetachedValue()` is used for disposal purposes only. This has a minor (acceptable?) impact on the underlying IR: it creates/adds-users to `UndefValue`'s.
> > > Pros/cons are easier to discuss with the code in hand. Diego, would you be able to upload the subclassing in Phabricator?
> > > 
> > > The alternative by Ayal/Gil works only because the VPlan modeling is done very late in the vectorization process. That'll make it very hard to move the modeling towards the beginning of vectorization. Please don't do that.
> > > 
> > > My preference is to be able to templatize VPInstruction and Instruction as much as feasible. Is that easier with subclassing? 
> > Yes, I also feel that opening this door could be problematic in the long term. Let me see if I can quickly post the subclass in Phabricator so that we can see which changes are necessary in other places.
> > 
> > > My preference is to be able to templatize VPInstruction and Instruction as much as feasible. Is that easier with subclassing?
> > 
> > The closer the class hierarchies are, the easier will be.
> Extensions of VPInstructions such as VPCmpInst should indeed be uploaded for review and deserve a separate discussion thread and justification. This patch could tentatively make use of it, though for the purpose of this patch an ICmpULE opcode or a detached ICmpInst suffice. An ICmpULE opcode shouldn't be problematic **currently**, as this early-exit is the only VPInstruction compare with a Predicate, right? Note that detached UnderlyingValues could serve as **data containers** for all fields already implemented in the IR hierarchy, and could be constructed at any point of VPlan construction for that purpose. Extending VPInstructions to provide a similar **API** as that of IR Instructions seems to be an orthogonal concern with its own design objectives, and can coexist with detached Values; e.g., a VPCmpInst could hold its Predicate using a detached ICmpInst/FCmpInst.
I go against detached ICmpInst. We'll be moving VPlan modeling before the cost model and creating an IR Instruction before deciding to vectorize is against the VPlan concept.

>seems to be an orthogonal concern with its own design objectives

Not quite. We'd like VPInstruction as easy to use to many LLVM developers and that is an integral part of design/implementation from the beginning.

Having said that, new opcode versus VPCmpInst doesn't block the rest of the review. Other parts of the review should proceed while opcode versus VPCmpInst discussion is in progress on the side.

================
Comment at: lib/Transforms/Vectorize/VPlan.h:1126
+  VPlan(VPBlockBase *Entry = nullptr) : Entry(Entry) {
+    BackedgeTakenCount = new VPValue();
+  }
----------------
Ayal wrote:
> hsaito wrote:
> > Ayal wrote:
> > > dcaballe wrote:
> > > > Instead of using an "empty" VPValue to model the BTC, would it be possible to model the actual operations to compute the BTC? We would only need a sub, right?
> > > The BTC is computed by subtracting 1 from the Trip Count, which in turn is generated by SCEVExpander. To model this decrement would require using an "empty" VPValue to model its Trip Count operand. In any case, both involve scalar instructions that take place before the vectorized loop, currently outside the VPlan'd zone.
> > I'm not a big fan of allocating memory that goes unused in many situations. We can initialize this to nullptr, and create an instance once we know BTC is needed. That'll lose the convenience of being able to check NumUsers, but creating needsBackedgeTakenCount() member function shouldn't be that bad. It's just Legal->foldTailByMasking(), until something else needs BTC, right? 
> > 
> OK. The VPValue can be created on demand, turning `getBackedgeTakenCount()` into `getOrCreateBackedgeTakenCount()`. `NumUsers` should still be checked, as this isolates the decision of creating the IR based on the VPlan.
> In any case, VPlan in general is a tentative construct, destined for destruction w/o being materialized except for the BestPlan, if at all. So holding one VPValue for the BTC, which is always well defined but possibly not always used, seems insignificant.
>VPlan in general is a tentative construct, destined for destruction w/o being materialized except for the >BestPlan, if at all. So holding one VPValue for the BTC, which is always well defined but possibly not always >used, seems insignificant.

VPlan footprint was part of the community concern. We'd like to be better wherever we can. Just as simple as that. Thanks for taking care of it.

Repository:
  rL LLVM

https://reviews.llvm.org/D50480