<div style="font-family: arial, helvetica, sans-serif; font-size: 10pt"><div dir="ltr">On Fri, Dec 7, 2012 at 5:58 PM, Shuxin Yang <span dir="ltr"><<a href="mailto:shuxin.llvm@gmail.com" target="_blank">shuxin.llvm@gmail.com</a>></span> wrote:<br>
<div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000">
    HI, Chandler:<br>
    <br>
      The bug Galina reported is false positive -- someone remove
    --triple=xxx in the testing case, which disable the the
    optimization. <br>
    Now the testing case is moved to the arch-specific folder. It should
    be ok now. <br>
    <br>
       >segfault on certain loops.<br>
      Why not show me the loop?<br></div></blockquote><div><br></div><div style>It's somewhere buried in the ICU opensource package, as it existed several months ago. I found it through fairly rigorous testing, and I've not spent any time trying to reduce it because I spotted the bug by inspection, and wanted to give you a thorough code review first. Also, it should be trivial to write a targeted testcase that exercises this code, which should be part of this passes's regression tests anyways.</div>
<div style><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
<div class="im">
    <br>
     > there are pretty pervasive problems with this patch.<br></div>
     In you long mail, I don't see any real problems other than some
    typos and few code coding std issue.<br></div></blockquote><div><br></div><div style>Here is the snippet regarding the crash I have seen in the wild:</div><div style>-------</div><div><div class="im" style="font-family:arial,helvetica,sans-serif;font-size:13px">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">+    // If the popoulation counter's initial value is not zero, insert Add Inst.<br>
+    Value *CntInitVal = CntPhi->getIncomingValueForBlock(PreHead);<br>+    ConstantInt *InitConst = dyn_cast<ConstantInt>(CntInitVal);<br>+    if (!InitConst || !InitConst->isZero()) {<br></blockquote><div><br>
</div></div><div style="font-family:arial,helvetica,sans-serif;font-size:13px">This is incorrect, and will crash LLVM on certain inputs: if InitConst is null here, we pass the test and pass it to CreateAdd below that eventually segfaults. Please fix, and please add a test case for this case. I spotted it by inspection (after some code in the wild crashed). I can try to help you with a testcase if you need, but it should be quite straight forward to synthesize one.</div>
</div><div style="font-family:arial,helvetica,sans-serif;font-size:13px">-------</div><div style="font-family:arial,helvetica,sans-serif;font-size:13px"><br></div><div style="font-family:arial,helvetica,sans-serif;font-size:13px">
Here is the snippet regarding the other one I spotted by inspection:</div><div style="font-family:arial,helvetica,sans-serif;font-size:13px">----</div><div style="font-family:arial,helvetica,sans-serif;font-size:13px"><div class="im">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">+      PHINode *Phi = dyn_cast<PHINode>(Inst->getOperand(0));<br>
+      if (!Phi && Phi->getParent() != LoopEntry)<br></blockquote><div><br></div></div><div>This && should be an || shouldn't it? Why isn't there a test case that covers this? Please add one as wel as fixing.</div>
<div>----</div><div><br></div><div style>These are logic bugs, not style issues. Please either fix, revert, or I'll happily revert until you have time to look at it in more depth.</div><div style><br></div></div><div>
 </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
     Some "style" is the combination of different reviewer's tastes. 
    Some reviewers' taste are just opposite to you personal opinions.<br></div></blockquote><div><br></div><div style>Within the LLVM project, style is part of code review. My comments below about the style issues with your code are code review that needs to be addressed, even if they don't seem like important style issues to you.</div>
<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
    <br>
     Also see the following interleaving comment. <br><div class="im">
    <br>
    <blockquote type="cite">
      <div style="font-family:arial,helvetica,sans-serif;font-size:10pt">
        <div dir="ltr">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> 
                public:<br>
                +  /// PopcntHwSupport - Hardware support for population
                count. Compared to the<br>
                +  /// SW implementation, HW support is supposed to
                significantly boost the<br>
                +  /// performance when the population is dense, and it
                may or not may degrade<br>
              </blockquote>
              <div><br>
              </div>
              <div>"it may or may not" rather than "it may or
                not may".</div>
            </div>
          </div>
        </div>
      </div>
    </blockquote></div>
    This is *REAL* problem. a typo. <br><div class="im">
    <blockquote type="cite">
      <div style="font-family:arial,helvetica,sans-serif;font-size:10pt">
        <div dir="ltr">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div> </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                +  /// performance if the population is sparse. A HW
                support is considered as<br>
                +  /// "Fast" if it can outperform, or is on a par with,
                SW implementaion when<br>
                +  /// the population is sparse; otherwise, it is
                considered as "Slow".<br>
              </blockquote>
              <div><br>
              </div>
              <div>I wonder, why not express this in the
                interface? You could query for fast/slow with an
                expected density?</div>
            </div>
          </div>
        </div>
      </div>
    </blockquote></div>
    Why not?<br>
    for the loop like this, slow HW support may outperform the SW
    version:<br>
    <br>
     for (i = 0; I < 32; I++)<br>
       cnt += (val >> 1) & 1;<div class="im"><br>
    <br>
    <br>
    <br>
    <blockquote type="cite">
      <div style="font-family:arial,helvetica,sans-serif;font-size:10pt">
        <div dir="ltr">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div><br>
              </div>
              +  enum PopcntHwSupport {<br>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                +    None,<br>
                +    Fast,<br>
                +    Slow<br>
                +  };<br>
              </blockquote>
              <div><br>
              </div>
              <div>Please follow the coding standards for
                naming enum types and enumerators.</div>
            </div>
          </div>
        </div>
      </div>
    </blockquote></div>
    Why it doesn't conform the std?<div class="im"><br>
    <br>
    <blockquote type="cite">
      <div style="font-family:arial,helvetica,sans-serif;font-size:10pt">
        <div dir="ltr">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div> </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                +<br>
                   virtual ~ScalarTargetTransformInfo() {}<br>
                <br>
                   /// isLegalAddImmediate - Return true if the
                specified immediate is legal<br>
                @@ -122,6 +134,11 @@<br>
                   virtual bool shouldBuildLookupTables() const {<br>
                     return true;<br>
                   }<br>
                +<br>
                +  /// getPopcntHwSupport - Return hardware support for
                population count.<br>
              </blockquote>
              <div><br>
              </div>
              <div>Please follow the doxygen examples in the
                coding standards and don't duplicate function names in
                comments.</div>
            </div>
          </div>
        </div>
      </div>
    </blockquote></div>
    Ok, I miss this one. <br><div class="im">
    <br>
    <blockquote type="cite">
      <div style="font-family:arial,helvetica,sans-serif;font-size:10pt">
        <div dir="ltr">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div> </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                +  virtual PopcntHwSupport getPopcntHwSupport(unsigned
                IntTyWidthInBit) const {<br>
              </blockquote>
              <div><br>
              </div>
              <div>Why accept the integer type width? does it
                matter other than it being a legal integer width for the
                target?</div>
            </div>
          </div>
        </div>
      </div>
    </blockquote></div>
    The cost are different for int-width. <br><div class="im">
    <br>
    <blockquote type="cite">
      <div style="font-family:arial,helvetica,sans-serif;font-size:10pt">
        <div dir="ltr">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div> </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                +    return None;<br>
                +  }<br>
                 };<br>
                <br>
                 /// VectorTargetTransformInfo - This interface is used
                by the vectorizers<br>
                <br>
                Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp<br>
                URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=168931&r1=168930&r2=168931&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=168931&r1=168930&r2=168931&view=diff</a><br>

==============================================================================<br>
                --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
                (original)<br>
                +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Thu
                Nov 29 13:38:54 2012<br>
                @@ -17670,6 +17670,17 @@<br>
                   return -1;<br>
                 }<br>
                <br>
                +ScalarTargetTransformInfo::PopcntHwSupport<br>
                +X86ScalarTargetTransformImpl::getPopcntHwSupport(unsigned
                TyWidth) const {<br>
                +  assert(isPowerOf2_32(TyWidth) && "Ty width
                must be power of 2");</blockquote>
              <div><br>
              </div>
              <div>This constraint seems strange. It certainly
                doesn't seem like it should be an assert.</div>
            </div>
          </div>
        </div>
      </div>
    </blockquote></div>
    It catch the case when this function is fed with nonsense parameter,
    and it at least make sense for x86<div class="im"><br>
     <br>
    <blockquote type="cite">
      <div style="font-family:arial,helvetica,sans-serif;font-size:10pt">
        <div dir="ltr">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div>
                <br>
              </div>
              <div>If i have an i24, i cant count the
                population of it by counting the population of it
                zext'ed to 32 bits...</div>
            </div>
          </div>
        </div>
      </div>
    </blockquote></div>
    You have to pass 32 when calling this function. <br><div class="im">
    <blockquote type="cite">
      <div style="font-family:arial,helvetica,sans-serif;font-size:10pt">
        <div dir="ltr">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div> </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                 </blockquote>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                +  const X86Subtarget &ST =
                TLI->getTargetMachine().getSubtarget<X86Subtarget>();<br>
                +<br>
                +  // TODO: Currently the __builtin_popcount()
                implementation using SSE3<br>
                +  //   instructions is inefficient. Once the problem is
                fixed, we should<br>
                +  //   call ST.hasSSE3() instead of ST.hasSSE4().<br>
              </blockquote>
              <div><br>
              </div>
              <div>__builtin_popcount isn't the issue here,
                this is in LLVM. What is the actual issue? Why isn't
                "Slow' used here?</div>
            </div>
          </div>
        </div>
      </div>
    </blockquote></div>
    Please check my original mail.  It is a real LLVM defect in generate
    optimal instruction sequence using SSE3. <br><div class="im">
    <br>
    <blockquote type="cite">
      <div style="font-family:arial,helvetica,sans-serif;font-size:10pt">
        <div dir="ltr">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div><br>
              </div>
              <div>Also LLVM uses FIXME in comments more than
                TODO.</div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br></div>
    It is in my TODO list. <br><div class="im">
    <blockquote type="cite">
      <div style="font-family:arial,helvetica,sans-serif;font-size:10pt">
        <div dir="ltr">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div> <br>
              </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> 
                namespace {<br>
                +<br>
                +  class LoopIdiomRecognize;<br>
                +<br>
                +  /// This class defines some utility functions for
                loop idiom recognization.<br>
              </blockquote>
              <div><br>
              </div>
              <div>Please don't use a class to organize a
                collection of static methods. That is the role of a
                namespace.</div>
            </div>
          </div>
        </div>
      </div>
    </blockquote></div>
    Some reviewers hate this idea.  It is not hard to imagine, this
    class will become more and more complicate.<br>
    At that point, it will not be a container for static functions any
    more. <br><div class="im">
    <br>
    <blockquote type="cite">
      <div style="font-family:arial,helvetica,sans-serif;font-size:10pt">
        <div dir="ltr">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div><br>
              </div>
              <div>And this doesn't need a namespace at all --
                this is in a single .cc file. Please just use static
                functions, that is is the overwhelmingly predominant
                pattern in LLVM.</div>
              <div> </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                +  class LIRUtil {<br>
                +  public:<br>
                +    /// Return true iff the block contains nothing but
                an uncondition branch<br>
                +    /// (aka goto instruction).<br>
                +    static bool isAlmostEmpty(BasicBlock *);<br>
              </blockquote>
              <div><br>
              </div>
              <div>Please use names for parameters in LLVM</div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    </div><blockquote type="cite">
      <div style="font-family:arial,helvetica,sans-serif;font-size:10pt">
        <div dir="ltr">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div><br>
              </div><div class="im">
              <div>Also, I don't think this needs a helper
                method. First, you *always* have a terminator
                instruction, the verify assures this. Typically, you
                would then test that &BB->begin() ==
                BB->getTerminatorInst().</div>
            </div></div>
          </div>
        </div>
      </div>
    </blockquote><div class="im">
    <br>
    <blockquote type="cite">
      <div style="font-family:arial,helvetica,sans-serif;font-size:10pt">
        <div dir="ltr">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div><br>
              </div>
              <div>Or is the point of this function to test for
                an empty BB *and* an unconditional branch? If so, why do
                you need to handle that -- SimplifyCFG should have
                removed such blocks already?</div>
            </div>
          </div>
        </div>
      </div>
    </blockquote></div>
    I just to make sure this code work for any input. <br>
    I don't want to assume the input is CFG optimized. <br><div class="im">
    <br>
    <blockquote type="cite">
      <div style="font-family:arial,helvetica,sans-serif;font-size:10pt">
        <div dir="ltr">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div>
                 </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                +<br>
                +    static BranchInst *getBranch(BasicBlock *BB) {<br>
                +      return
                dyn_cast<BranchInst>(BB->getTerminator());<br>
                +    }<br>
              </blockquote>
              <div><br>
              </div>
              <div>Abstracting this into a method doesn't seem
                to help readability at all. I would rather just see the
                dyn_cast.</div>
              <div> </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                +<br>
                +    /// Return the condition of the branch terminating
                the given basic block.<br>
                +    static Value *getBrCondtion(BasicBlock *);<br>
              </blockquote>
              <div><br>
              </div>
              <div>What if it doesn't terminate in a branch?
                I'm not really sure what abstraction you're trying to
                provide here.</div>
            </div>
          </div>
        </div>
      </div>
    </blockquote></div>
    Then return 0, the tiny code says that. <br><div class="im">
    <br>
    <blockquote type="cite">
      <div style="font-family:arial,helvetica,sans-serif;font-size:10pt">
        <div dir="ltr">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div>
                <br>
              </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                +<br>
                +    /// Derive the precondition block (i.e the block
                that guards the loop<br>
                +    /// preheader) from the given preheader.<br>
                +    static BasicBlock *getPrecondBb(BasicBlock
                *PreHead);<br>
              </blockquote>
              <div><br>
              </div>
              <div>I don't understand this.. What is the
                invariant it exposes? What does it test first? Why is
                this not a method on LoopInfo if this is generally
                useful?</div>
            </div>
          </div>
        </div>
      </div>
    </blockquote></div>
    I'm afraid I don't understand what you are saying? <br>
    Did I say "invariant"? <br><div class="im">
    <br>
    <blockquote type="cite">
      <div style="font-family:arial,helvetica,sans-serif;font-size:10pt">
        <div dir="ltr">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div> </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                +  };<br>
                +<br>
                +  /// This class is to recoginize idioms of
                population-count conducted in<br>
                +  /// a noncountable loop. Currently it only recognizes
                this pattern:<br>
                +  /// \code<br>
                +  ///   while(x) {cnt++; ...; x &= x - 1; ...}<br>
                +  /// \endcode<br>
                +  class NclPopcountRecognize {<br>
              </blockquote>
              <div><br>
              </div>
              <div>If NCL is an acronym you want to use,
                capitalize it as an acronym. I don't think it is a
                terribly useful acronym. I would call this just
                PopcountLoopRecognizer (note a noun, as this is a
                class), and document what sets of loops it handles
                currently. If there is some reason we wouldn't re-use
                this infrastructure for any future extensions, you
                should explain that here.</div>
            </div>
          </div>
        </div>
      </div>
    </blockquote></div>
    Population count has tons of variants. <br>
    The loops could be countable loop or non-countable loop.  I don't
    think it is good idea to recognize all variants in one class. <br>
    Since the way we recognize a pattern in non-countable loop and
    countable loop are quite different. <br>
    We have some way to differentiate them in the name. <br>
    If "NCL" doesn't seem to be a good one, what is you suggession?<div class="im"><br>
    <br>
    <blockquote type="cite">
      <div style="font-family:arial,helvetica,sans-serif;font-size:10pt">
        <div dir="ltr">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <div><br>
              </div>
              <div><br>
              </div>
              <div>Also, have you thought much about the design
                of these classes? Currently, we have LoopIdiomRecognize
                which is a pass, and which has innate logic for forming
                memset intrinsics out of loops. But we now also have a
                non-pass helper class to recognize a specific kind of
                loop. How would you envision this design scaling up to
                the next loop idiom?</div>
              <div><br>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote></div>
    Yes, I did. <br>
    It was a big class trying to recognize all interesting pattern. I
    don't want to make it become messy. <br>
    That is why I implement the popcount in a separate class. <br>
    In the future, I'd like new non-trivial recognization is also
    implemented in separate class as well. <br>
    That said, I don't like pull what we have in the pass to separate
    classes. <br>
    We have to make change one step at a time. <br>
    <br>
    ...<br>
  </div>

</blockquote></div><br></div></div></div>