[llvm-commits] [llvm] r168931 - in /llvm/trunk: include/llvm/Target/TargetTransformImpl.h include/llvm/TargetTransformInfo.h lib/Target/X86/X86ISelLowering.cpp lib/Target/X86/X86ISelLowering.h lib/Target/X86/X86TargetMachine.h lib/Transforms/Scalar/LoopIdiomRecognize.cpp test/Transforms/LoopIdiom/popcnt.ll

Fri Dec 7 18:35:16 PST 2012

On Fri, Dec 7, 2012 at 5:58 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:

>  HI, Chandler:
>
>   The bug Galina reported is false positive -- someone remove --triple=xxx
> in the testing case, which disable the the optimization.
> Now the testing case is moved to the arch-specific folder. It should be ok
> now.
>
>    >segfault on certain loops.
>   Why not show me the loop?
>

It's somewhere buried in the ICU opensource package, as it existed several
months ago. I found it through fairly rigorous testing, and I've not spent
any time trying to reduce it because I spotted the bug by inspection, and
wanted to give you a thorough code review first. Also, it should be trivial
to write a targeted testcase that exercises this code, which should be part
of this passes's regression tests anyways.

>  > there are pretty pervasive problems with this patch.
>  In you long mail, I don't see any real problems other than some typos and
> few code coding std issue.
>

Here is the snippet regarding the crash I have seen in the wild:
-------

> +    // If the popoulation counter's initial value is not zero, insert Add
> Inst.
> +    Value *CntInitVal = CntPhi->getIncomingValueForBlock(PreHead);
> +    ConstantInt *InitConst = dyn_cast<ConstantInt>(CntInitVal);
> +    if (!InitConst || !InitConst->isZero()) {
>

This is incorrect, and will crash LLVM on certain inputs: if InitConst is
null here, we pass the test and pass it to CreateAdd below that eventually
segfaults. Please fix, and please add a test case for this case. I spotted
it by inspection (after some code in the wild crashed). I can try to help
you with a testcase if you need, but it should be quite straight forward to
synthesize one.
-------

Here is the snippet regarding the other one I spotted by inspection:
----

> +      PHINode *Phi = dyn_cast<PHINode>(Inst->getOperand(0));
> +      if (!Phi && Phi->getParent() != LoopEntry)
>

This && should be an || shouldn't it? Why isn't there a test case that
covers this? Please add one as wel as fixing.
----

These are logic bugs, not style issues. Please either fix, revert, or I'll
happily revert until you have time to look at it in more depth.

>  Some "style" is the combination of different reviewer's tastes.  Some
> reviewers' taste are just opposite to you personal opinions.
>

Within the LLVM project, style is part of code review. My comments below
about the style issues with your code are code review that needs to be
addressed, even if they don't seem like important style issues to you.

>  Also see the following interleaving comment.
>
>      public:
>> +  /// PopcntHwSupport - Hardware support for population count. Compared
>> to the
>> +  /// SW implementation, HW support is supposed to significantly boost
>> the
>> +  /// performance when the population is dense, and it may or not may
>> degrade
>>
>
>  "it may or may not" rather than "it may or not may".
>
> This is *REAL* problem. a typo.
>
>
>
>> +  /// performance if the population is sparse. A HW support is
>> considered as
>> +  /// "Fast" if it can outperform, or is on a par with, SW implementaion
>> when
>> +  /// the population is sparse; otherwise, it is considered as "Slow".
>>
>
>  I wonder, why not express this in the interface? You could query for
> fast/slow with an expected density?
>
> Why not?
> for the loop like this, slow HW support may outperform the SW version:
>
>  for (i = 0; I < 32; I++)
>    cnt += (val >> 1) & 1;
>
>
>
>
>
>  +  enum PopcntHwSupport {
>
>> +    None,
>> +    Fast,
>> +    Slow
>> +  };
>>
>
>  Please follow the coding standards for naming enum types and enumerators.
>
> Why it doesn't conform the std?
>
>
>
>
>> +
>>    virtual ~ScalarTargetTransformInfo() {}
>>
>>    /// isLegalAddImmediate - Return true if the specified immediate is
>> legal
>> @@ -122,6 +134,11 @@
>>    virtual bool shouldBuildLookupTables() const {
>>      return true;
>>    }
>> +
>> +  /// getPopcntHwSupport - Return hardware support for population count.
>>
>
>  Please follow the doxygen examples in the coding standards and don't
> duplicate function names in comments.
>
> Ok, I miss this one.
>
>
>
>> +  virtual PopcntHwSupport getPopcntHwSupport(unsigned IntTyWidthInBit)
>> const {
>>
>
>  Why accept the integer type width? does it matter other than it being a
> legal integer width for the target?
>
> The cost are different for int-width.
>
>
>
>> +    return None;
>> +  }
>>  };
>>
>>  /// VectorTargetTransformInfo - This interface is used by the vectorizers
>>
>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=168931&r1=168930&r2=168931&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Thu Nov 29 13:38:54 2012
>> @@ -17670,6 +17670,17 @@
>>    return -1;
>>  }
>>
>> +ScalarTargetTransformInfo::PopcntHwSupport
>> +X86ScalarTargetTransformImpl::getPopcntHwSupport(unsigned TyWidth) const
>> {
>> +  assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2");
>
>
>  This constraint seems strange. It certainly doesn't seem like it should
> be an assert.
>
> It catch the case when this function is fed with nonsense parameter, and
> it at least make sense for x86
>
>
>
>
>  If i have an i24, i cant count the population of it by counting the
> population of it zext'ed to 32 bits...
>
> You have to pass 32 when calling this function.
>
>
>
>>
>
>  +  const X86Subtarget &ST =
>> TLI->getTargetMachine().getSubtarget<X86Subtarget>();
>> +
>> +  // TODO: Currently the __builtin_popcount() implementation using SSE3
>> +  //   instructions is inefficient. Once the problem is fixed, we should
>> +  //   call ST.hasSSE3() instead of ST.hasSSE4().
>>
>
>  __builtin_popcount isn't the issue here, this is in LLVM. What is the
> actual issue? Why isn't "Slow' used here?
>
> Please check my original mail.  It is a real LLVM defect in generate
> optimal instruction sequence using SSE3.
>
>
>  Also LLVM uses FIXME in comments more than TODO.
>
>
> It is in my TODO list.
>
>
>
>>   namespace {
>> +
>> +  class LoopIdiomRecognize;
>> +
>> +  /// This class defines some utility functions for loop idiom
>> recognization.
>>
>
>  Please don't use a class to organize a collection of static methods.
> That is the role of a namespace.
>
> Some reviewers hate this idea.  It is not hard to imagine, this class will
> become more and more complicate.
> At that point, it will not be a container for static functions any more.
>
>
>  And this doesn't need a namespace at all -- this is in a single .cc
> file. Please just use static functions, that is is the overwhelmingly
> predominant pattern in LLVM.
>
>
>> +  class LIRUtil {
>> +  public:
>> +    /// Return true iff the block contains nothing but an uncondition
>> branch
>> +    /// (aka goto instruction).
>> +    static bool isAlmostEmpty(BasicBlock *);
>>
>
>  Please use names for parameters in LLVM
>
>
>  Also, I don't think this needs a helper method. First, you *always* have
> a terminator instruction, the verify assures this. Typically, you would
> then test that &BB->begin() == BB->getTerminatorInst().
>
>
>
>  Or is the point of this function to test for an empty BB *and* an
> unconditional branch? If so, why do you need to handle that -- SimplifyCFG
> should have removed such blocks already?
>
> I just to make sure this code work for any input.
> I don't want to assume the input is CFG optimized.
>
>
>
>> +
>> +    static BranchInst *getBranch(BasicBlock *BB) {
>> +      return dyn_cast<BranchInst>(BB->getTerminator());
>> +    }
>>
>
>  Abstracting this into a method doesn't seem to help readability at all.
> I would rather just see the dyn_cast.
>
>
>> +
>> +    /// Return the condition of the branch terminating the given basic
>> block.
>> +    static Value *getBrCondtion(BasicBlock *);
>>
>
>  What if it doesn't terminate in a branch? I'm not really sure what
> abstraction you're trying to provide here.
>
> Then return 0, the tiny code says that.
>
>
>  +
>> +    /// Derive the precondition block (i.e the block that guards the loop
>> +    /// preheader) from the given preheader.
>> +    static BasicBlock *getPrecondBb(BasicBlock *PreHead);
>>
>
>  I don't understand this.. What is the invariant it exposes? What does it
> test first? Why is this not a method on LoopInfo if this is generally
> useful?
>
> I'm afraid I don't understand what you are saying?
> Did I say "invariant"?
>
>
>
>> +  };
>> +
>> +  /// This class is to recoginize idioms of population-count conducted in
>> +  /// a noncountable loop. Currently it only recognizes this pattern:
>> +  /// \code
>> +  ///   while(x) {cnt++; ...; x &= x - 1; ...}
>> +  /// \endcode
>> +  class NclPopcountRecognize {
>>
>
>  If NCL is an acronym you want to use, capitalize it as an acronym. I
> don't think it is a terribly useful acronym. I would call this just
> PopcountLoopRecognizer (note a noun, as this is a class), and document what
> sets of loops it handles currently. If there is some reason we wouldn't
> re-use this infrastructure for any future extensions, you should explain
> that here.
>
> Population count has tons of variants.
> The loops could be countable loop or non-countable loop.  I don't think it
> is good idea to recognize all variants in one class.
> Since the way we recognize a pattern in non-countable loop and countable
> loop are quite different.
> We have some way to differentiate them in the name.
> If "NCL" doesn't seem to be a good one, what is you suggession?
>
>
>
>
>  Also, have you thought much about the design of these classes?
> Currently, we have LoopIdiomRecognize which is a pass, and which has innate
> logic for forming memset intrinsics out of loops. But we now also have a
> non-pass helper class to recognize a specific kind of loop. How would you
> envision this design scaling up to the next loop idiom?
>
>    Yes, I did.
> It was a big class trying to recognize all interesting pattern. I don't
> want to make it become messy.
> That is why I implement the popcount in a separate class.
> In the future, I'd like new non-trivial recognization is also implemented
> in separate class as well.
> That said, I don't like pull what we have in the pass to separate classes.
> We have to make change one step at a time.
>
> ...
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20121207/ced45618/attachment.html>