[llvm-commits] [llvm] r168931 - in /llvm/trunk: include/llvm/Target/TargetTransformImpl.h include/llvm/TargetTransformInfo.h lib/Target/X86/X86ISelLowering.cpp lib/Target/X86/X86ISelLowering.h lib/Target/X86/X86TargetMachine.h lib/Transforms/Scalar/LoopIdiomRecognize.cpp test/Transforms/LoopIdiom/popcnt.ll

Fri Dec 7 17:58:04 PST 2012

HI, Chandler:

   The bug Galina reported is false positive -- someone remove 
--triple=xxx in the testing case, which disable the the optimization.
Now the testing case is moved to the arch-specific folder. It should be 
ok now.

    >segfault on certain loops.
   Why not show me the loop?

  > there are pretty pervasive problems with this patch.
  In you long mail, I don't see any real problems other than some typos 
and few code coding std issue.
  Some "style" is the combination of different reviewer's tastes. Some 
reviewers' taste are just opposite to you personal opinions.

  Also see the following interleaving comment.

>     public:
>     +  /// PopcntHwSupport - Hardware support for population count.
>     Compared to the
>     +  /// SW implementation, HW support is supposed to significantly
>     boost the
>     +  /// performance when the population is dense, and it may or not
>     may degrade
>
>
> "it may or may not" rather than "it may or not may".
This is *REAL* problem. a typo.
>
>     +  /// performance if the population is sparse. A HW support is
>     considered as
>     +  /// "Fast" if it can outperform, or is on a par with, SW
>     implementaion when
>     +  /// the population is sparse; otherwise, it is considered as
>     "Slow".
>
>
> I wonder, why not express this in the interface? You could query for 
> fast/slow with an expected density?
Why not?
for the loop like this, slow HW support may outperform the SW version:

  for (i = 0; I < 32; I++)
    cnt += (val >> 1) & 1;

>
> +  enum PopcntHwSupport {
>
>     +    None,
>     +    Fast,
>     +    Slow
>     +  };
>
>
> Please follow the coding standards for naming enum types and enumerators.
Why it doesn't conform the std?

>     +
>        virtual ~ScalarTargetTransformInfo() {}
>
>        /// isLegalAddImmediate - Return true if the specified
>     immediate is legal
>     @@ -122,6 +134,11 @@
>        virtual bool shouldBuildLookupTables() const {
>          return true;
>        }
>     +
>     +  /// getPopcntHwSupport - Return hardware support for population
>     count.
>
>
> Please follow the doxygen examples in the coding standards and don't 
> duplicate function names in comments.
Ok, I miss this one.

>     +  virtual PopcntHwSupport getPopcntHwSupport(unsigned
>     IntTyWidthInBit) const {
>
>
> Why accept the integer type width? does it matter other than it being 
> a legal integer width for the target?
The cost are different for int-width.

>     +    return None;
>     +  }
>      };
>
>      /// VectorTargetTransformInfo - This interface is used by the
>     vectorizers
>
>     Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>     URL:
>     http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=168931&r1=168930&r2=168931&view=diff
>     ==============================================================================
>     --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
>     +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Thu Nov 29
>     13:38:54 2012
>     @@ -17670,6 +17670,17 @@
>        return -1;
>      }
>
>     +ScalarTargetTransformInfo::PopcntHwSupport
>     +X86ScalarTargetTransformImpl::getPopcntHwSupport(unsigned
>     TyWidth) const {
>     +  assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2");
>
>
> This constraint seems strange. It certainly doesn't seem like it 
> should be an assert.
It catch the case when this function is fed with nonsense parameter, and 
it at least make sense for x86

>
> If i have an i24, i cant count the population of it by counting the 
> population of it zext'ed to 32 bits...
You have to pass 32 when calling this function.
>
>     +  const X86Subtarget &ST =
>     TLI->getTargetMachine().getSubtarget<X86Subtarget>();
>     +
>     +  // TODO: Currently the __builtin_popcount() implementation
>     using SSE3
>     +  //   instructions is inefficient. Once the problem is fixed, we
>     should
>     +  //   call ST.hasSSE3() instead of ST.hasSSE4().
>
>
> __builtin_popcount isn't the issue here, this is in LLVM. What is the 
> actual issue? Why isn't "Slow' used here?
Please check my original mail.  It is a real LLVM defect in generate 
optimal instruction sequence using SSE3.

>
> Also LLVM uses FIXME in comments more than TODO.

It is in my TODO list.
>
>     namespace {
>     +
>     +  class LoopIdiomRecognize;
>     +
>     +  /// This class defines some utility functions for loop idiom
>     recognization.
>
>
> Please don't use a class to organize a collection of static methods. 
> That is the role of a namespace.
Some reviewers hate this idea.  It is not hard to imagine, this class 
will become more and more complicate.
At that point, it will not be a container for static functions any more.

>
> And this doesn't need a namespace at all -- this is in a single .cc 
> file. Please just use static functions, that is is the overwhelmingly 
> predominant pattern in LLVM.
>
>     +  class LIRUtil {
>     +  public:
>     +    /// Return true iff the block contains nothing but an
>     uncondition branch
>     +    /// (aka goto instruction).
>     +    static bool isAlmostEmpty(BasicBlock *);
>
>
> Please use names for parameters in LLVM
>
> Also, I don't think this needs a helper method. First, you *always* 
> have a terminator instruction, the verify assures this. Typically, you 
> would then test that &BB->begin() == BB->getTerminatorInst().

>
> Or is the point of this function to test for an empty BB *and* an 
> unconditional branch? If so, why do you need to handle that -- 
> SimplifyCFG should have removed such blocks already?
I just to make sure this code work for any input.
I don't want to assume the input is CFG optimized.

>     +
>     +    static BranchInst *getBranch(BasicBlock *BB) {
>     +      return dyn_cast<BranchInst>(BB->getTerminator());
>     +    }
>
>
> Abstracting this into a method doesn't seem to help readability at 
> all. I would rather just see the dyn_cast.
>
>     +
>     +    /// Return the condition of the branch terminating the given
>     basic block.
>     +    static Value *getBrCondtion(BasicBlock *);
>
>
> What if it doesn't terminate in a branch? I'm not really sure what 
> abstraction you're trying to provide here.
Then return 0, the tiny code says that.

>
>     +
>     +    /// Derive the precondition block (i.e the block that guards
>     the loop
>     +    /// preheader) from the given preheader.
>     +    static BasicBlock *getPrecondBb(BasicBlock *PreHead);
>
>
> I don't understand this.. What is the invariant it exposes? What does 
> it test first? Why is this not a method on LoopInfo if this is 
> generally useful?
I'm afraid I don't understand what you are saying?
Did I say "invariant"?

>     +  };
>     +
>     +  /// This class is to recoginize idioms of population-count
>     conducted in
>     +  /// a noncountable loop. Currently it only recognizes this pattern:
>     +  /// \code
>     +  ///   while(x) {cnt++; ...; x &= x - 1; ...}
>     +  /// \endcode
>     +  class NclPopcountRecognize {
>
>
> If NCL is an acronym you want to use, capitalize it as an acronym. I 
> don't think it is a terribly useful acronym. I would call this just 
> PopcountLoopRecognizer (note a noun, as this is a class), and document 
> what sets of loops it handles currently. If there is some reason we 
> wouldn't re-use this infrastructure for any future extensions, you 
> should explain that here.
Population count has tons of variants.
The loops could be countable loop or non-countable loop.  I don't think 
it is good idea to recognize all variants in one class.
Since the way we recognize a pattern in non-countable loop and countable 
loop are quite different.
We have some way to differentiate them in the name.
If "NCL" doesn't seem to be a good one, what is you suggession?

>
>
> Also, have you thought much about the design of these classes? 
> Currently, we have LoopIdiomRecognize which is a pass, and which has 
> innate logic for forming memset intrinsics out of loops. But we now 
> also have a non-pass helper class to recognize a specific kind of 
> loop. How would you envision this design scaling up to the next loop 
> idiom?
>
Yes, I did.
It was a big class trying to recognize all interesting pattern. I don't 
want to make it become messy.
That is why I implement the popcount in a separate class.
In the future, I'd like new non-trivial recognization is also 
implemented in separate class as well.
That said, I don't like pull what we have in the pass to separate classes.
We have to make change one step at a time.

...
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20121207/95d389b5/attachment.html>