[PATCH] D50480: [LV] Vectorizing loops of arbitrary trip count without remainder under opt for size
Diego Caballero via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Aug 12 19:32:19 PDT 2018
dcaballe added a comment.
Thanks, Ayal! Some comments below.
Do you see any potential issue that could make modeling this in the VPlan native path complicated once we have predication?
Thanks,
Diego
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4965
+ LLVM_DEBUG(dbgs() << "LV: Aborting - trip count below given threshold for "
+ << "loop with scalar iterations.\n");
return None;
----------------
I'm trying to understand the purpose of thsi check. Prevent masked vectorization if TC is lower than `TinyTripCountInterleaveThreshold` (i.e., 128)?. Should we use an independent threshold for this?
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5218
+ // variable of the vector loop wraps to zero, when tail is folded by masking;
+ // this currently happens when OptForSize, inwhich case IC is set to 1 above.
unsigned IC = PowerOf2Floor((TargetNumRegisters - R.LoopInvariantRegs) /
----------------
inwhich -> in which?
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6355
+ if (Legal->blockNeedsPredication(OrigLoop->getHeader()))
+ CM.InterleaveInfo.reset();
+
----------------
Just curious. Could we prevent the computation or interleave groups for these cases instead of doing a reset?
================
Comment at: lib/Transforms/Vectorize/VPlan.h:609
/// VPlan opcodes, extending LLVM IR with idiomatics instructions.
- enum { Not = Instruction::OtherOpsEnd + 1 };
+ enum { Not = Instruction::OtherOpsEnd + 1, ICmpULE };
----------------
I'm worried that this new opcode could be problematic since now we can have compare instructions represented as VPInstructions with Instruction::ICmp and Instruction::FCmp opcodes and VPInstructions with VPInstruction::ICmpULE. Internally, we have a VPCmpInst subclass to model I/FCmp opcodes and their predicates. Do you think it would be better to upstream that subclass first?
================
Comment at: lib/Transforms/Vectorize/VPlan.h:1126
+ VPlan(VPBlockBase *Entry = nullptr) : Entry(Entry) {
+ BackedgeTakenCount = new VPValue();
+ }
----------------
Instead of using an "empty" VPValue to model the BTC, would it be possible to model the actual operations to compute the BTC? We would only need a sub, right?
Repository:
rL LLVM
https://reviews.llvm.org/D50480
More information about the llvm-commits
mailing list