<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Chandler, Thank you for your feedback, see the following inline
    comments<br>
    <br>
    <blockquote
cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000"> for the loop like
                this, slow HW support may outperform the SW version:<br>
                <br>
                 for (i = 0; I < 32; I++)<br>
                   cnt += (val >> 1) & 1;</div>
            </blockquote>
            <div><br>
            </div>
            <div style="">I don't understand your response. My comment
              was asking why the constraints in this comment aren't
              reflected in the name of the interface. For example
              "getPopcntSupport" producing a simple "slow" kind of
              support doesn't convey under what conditions it is slow,
              or what the expected use case is. I was wondering about
              maybe 'isSparsePopcntFast' and/or 'isDensePopcntFast'...
              You could in theory take an expected density as input, but
              that seems like overkill.</div>
            <div style=""><br>
            </div>
            <div style="">Anyways, I certainly agree about needing to
              factor the density into the test, just was looking for an
              interface that made it more obvious in addition to the
              comment.</div>
            <div style=""><br>
            </div>
            <div style="">The other thing that might help here is to
              clarify what this routine is really describing. It could
              mean at least any of the following:</div>
            <div style="">1) Is there fast, direct hardware support for
              computing popcnt on an integer? (IE, the popcnt
              instruction on x86)</div>
            <div style="">2) Is there a fast, hardware accelerated form
              of computing popcnt? (IE, lzcnt, tzcnt, potentially even
              bsr or other techniques on x86)</div>
            <div style="">3) Is there a fast lowering of the popcnt
              builtin on this target (IE, there are very fast SSSE3 and
              SSE2 implementations, although the ones I know of here
              target vectors... there are also all kinds of clever
              software implementations that we could emit that would be
              faster than the loop...)</div>
            <div style=""><br>
            </div>
            <div style="">I think the intrinsic is currently modeling #1
              based on the name, but the comment and x86 TODO make me
              wonder if you actually meant something else? (IMO,
              eventually it might make sense to model #3, but we'd need
              to teach LLVM to do all the clever tricks in lowering of
              course... hmmm...)</div>
            <div style=""><br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    If the "val" take value 0, then this loop has only one iteration,
    and hence "sparse" case. <br>
    In this sparse case, the loop may only take few cycles. The
    underlying HW support, which <br>
    could be implemented by a couple of instructions, could take longer
    time to finish. <br>
    Depending on how fast the HW support is, replacing the loop into HW
    support doesn't necessarily win. <br>
    That said, if the HW's support is super fast,  it is safe to blindly
    convert any pattern into the popcount intrinsic. ]<br>
    Otherwise, we must be careful about it. <br>
    <br>
    This is an example of "dense" population count:<br>
    <br>
     for (i = 0; i < 32; i++) <br>
         cnt += val & (1<<i)<br>
    <br>
    It blindly take 32 iterations. In this case, we can convert to
    popount intrinsic regardless the HW support is slow or not. <br>
    <br>
    <blockquote
cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div class="im"><br>
                  <blockquote type="cite">
                    <div
                      style="font-family:arial,helvetica,sans-serif;font-size:10pt">
                      <div dir="ltr">
                        <div class="gmail_extra">
                          <div class="gmail_quote">
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex"> +  virtual
                              PopcntHwSupport
                              getPopcntHwSupport(unsigned
                              IntTyWidthInBit) const {<br>
                            </blockquote>
                            <div><br>
                            </div>
                            <div>Why accept the integer type width? does
                              it matter other than it being a legal
                              integer width for the target?</div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                The cost are different for int-width.<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div style="">Are they different for different integer
              width, or different expected density of one bits? I'm not
              familiar with ARM, but for x86, the instruction latencies
              and throughput tables I have access to only list Bulldozer
              as having different throughput for different integer
              widths, and we don't even model that in the implementation
              of this routine.</div>
          </div>
        </div>
      </div>
    </blockquote>
    The HW support is not necessarily conducted by a *single*
    instruction.  Wide-integer likely take more instructions<br>
    to compute. It is architecture dependent. <br>
    <br>
    Even if for x8664,  it has direct HW support only in SSE 4.1. Before
    that we have to use couple of SSE3 instructions to compute <br>
    the popcount. Please also note that the SSE3 instructions sequence
    beat SSE4.1 popcnt instruction in performance. <br>
    <br>
    <blockquote
cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div class="im">
                  <blockquote type="cite">
                    <div
                      style="font-family:arial,helvetica,sans-serif;font-size:10pt">
                      <div dir="ltr">
                        <div class="gmail_extra">
                          <div class="gmail_quote">
                            <div> </div>
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex"> +    return
                              None;<br>
                              +  }<br>
                               };<br>
                              <br>
                               /// VectorTargetTransformInfo - This
                              interface is used by the vectorizers<br>
                              <br>
                              Modified:
                              llvm/trunk/lib/Target/X86/X86ISelLowering.cpp<br>
                              URL: <a moz-do-not-send="true"
href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=168931&r1=168930&r2=168931&view=diff"
                                target="_blank" class="cremed">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=168931&r1=168930&r2=168931&view=diff</a><br>
==============================================================================<br>
                              ---
                              llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
                              (original)<br>
                              +++
                              llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
                              Thu Nov 29 13:38:54 2012<br>
                              @@ -17670,6 +17670,17 @@<br>
                                 return -1;<br>
                               }<br>
                              <br>
+ScalarTargetTransformInfo::PopcntHwSupport<br>
                              +X86ScalarTargetTransformImpl::getPopcntHwSupport(unsigned

                              TyWidth) const {<br>
                              +  assert(isPowerOf2_32(TyWidth)
                              && "Ty width must be power of 2");</blockquote>
                            <div><br>
                            </div>
                            <div>This constraint seems strange. It
                              certainly doesn't seem like it should be
                              an assert.</div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                It catch the case when this function is fed with
                nonsense parameter, and it at least make sense for x86</div>
            </blockquote>
            <div><br>
            </div>
            <div style="">I understand, but I disagree with that design.
              Specifically, at the IR level we have integer types that
              are not powers of two width. If this is an important
              constraint, than it should be very clearly documented, but
              it would seem more reasonable in a layer like the TTI
              interface to either A) return 'none' or whatever the
              conservative answer is, or B) accept the IR-level type,
              run it through legalization to see the legalized type it
              ends up with, and use that width.</div>
            <div><br>
            </div>
            <div style="">I would suggest B here as it seems a more
              robust solution and to follow the spirit of TTI --
              modeling the result of CodeGen when lowering the code.</div>
          </div>
        </div>
      </div>
    </blockquote>
    I just want to make sure the interface is very simple. If the width
    is not power-of-2, we can simply ignore it <br>
    for the sake of simplicity. <br>
    <blockquote
cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div style=""> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div class="im">
                  <blockquote type="cite">
                    <div
                      style="font-family:arial,helvetica,sans-serif;font-size:10pt">
                      <div dir="ltr">
                        <div class="gmail_extra">
                          <div class="gmail_quote">
                            <div><br>
                            </div>
                            <div>If i have an i24, i cant count the
                              population of it by counting the
                              population of it zext'ed to 32 bits...</div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                You have to pass 32 when calling this function.<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div style="">I mean't to say 'i can count', see above. I
              think the best solution is to pass in the IR-level type
              directly...</div>
            <div style=""><br>
            </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div class="im">
                  <blockquote type="cite">
                    <div
                      style="font-family:arial,helvetica,sans-serif;font-size:10pt">
                      <div dir="ltr">
                        <div class="gmail_extra">
                          <div class="gmail_quote">
                            <div> </div>
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex">  </blockquote>
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex"> +  const
                              X86Subtarget &ST =
                              TLI->getTargetMachine().getSubtarget<X86Subtarget>();<br>
                              +<br>
                              +  // TODO: Currently the
                              __builtin_popcount() implementation using
                              SSE3<br>
                              +  //   instructions is inefficient. Once
                              the problem is fixed, we should<br>
                              +  //   call ST.hasSSE3() instead of
                              ST.hasSSE4().<br>
                            </blockquote>
                            <div><br>
                            </div>
                            <div>__builtin_popcount isn't the issue
                              here, this is in LLVM. What is the actual
                              issue? Why isn't "Slow' used here?</div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                Please check my original mail.  It is a real LLVM defect
                in generate optimal instruction sequence using SSE3.<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div style="">When I said '__builtin_popcount' I was
              referring to the Clang builtin function. The LLVM defect
              has to do with llvm.ctpop implementation. I just didn't
              want a future reader to be confused between Clang builtins
              and LLVM intrinsics.</div>
            <div> <br>
            </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div class="im">
                  <blockquote type="cite">
                    <div
                      style="font-family:arial,helvetica,sans-serif;font-size:10pt">
                      <div dir="ltr">
                        <div class="gmail_extra">
                          <div class="gmail_quote">
                            <div> <br>
                            </div>
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex">  namespace {<br>
                              +<br>
                              +  class LoopIdiomRecognize;<br>
                              +<br>
                              +  /// This class defines some utility
                              functions for loop idiom recognization.<br>
                            </blockquote>
                            <div><br>
                            </div>
                            <div>Please don't use a class to organize a
                              collection of static methods. That is the
                              role of a namespace.</div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                Some reviewers hate this idea.  It is not hard to
                imagine, this class will become more and more
                complicate.<br>
                At that point, it will not be a container for static
                functions any more.<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div style="">But until that day arrives, I think you should
              use a more appropriate software design pattern. I follow
              the YAGNI design philosophy here.</div>
          </div>
        </div>
      </div>
    </blockquote>
    I don't think it is about design patten. It is just about how to
    organize the related functions. <br>
    I can pull these static functions out of the class if they seems to
    be a real eyesore. <br>
    <br>
    <blockquote
cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div class="im"> <br>
                  <blockquote type="cite">
                    <div
                      style="font-family:arial,helvetica,sans-serif;font-size:10pt">
                      <div dir="ltr">
                        <div class="gmail_extra">
                          <div class="gmail_quote">
                            <div><br>
                            </div>
                            <div>And this doesn't need a namespace at
                              all -- this is in a single .cc file.
                              Please just use static functions, that is
                              is the overwhelmingly predominant pattern
                              in LLVM.</div>
                            <div> </div>
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex"> +  class
                              LIRUtil {<br>
                              +  public:<br>
                              +    /// Return true iff the block
                              contains nothing but an uncondition branch<br>
                              +    /// (aka goto instruction).<br>
                              +    static bool isAlmostEmpty(BasicBlock
                              *);<br>
                            </blockquote>
                            <div><br>
                            </div>
                            <div>Please use names for parameters in LLVM</div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                <blockquote type="cite">
                  <div
                    style="font-family:arial,helvetica,sans-serif;font-size:10pt">
                    <div dir="ltr">
                      <div class="gmail_extra">
                        <div class="gmail_quote">
                          <div><br>
                          </div>
                          <div class="im">
                            <div>Also, I don't think this needs a helper
                              method. First, you *always* have a
                              terminator instruction, the verify assures
                              this. Typically, you would then test that
                              &BB->begin() ==
                              BB->getTerminatorInst().</div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </div>
                </blockquote>
                <div class="im"> <br>
                  <blockquote type="cite">
                    <div
                      style="font-family:arial,helvetica,sans-serif;font-size:10pt">
                      <div dir="ltr">
                        <div class="gmail_extra">
                          <div class="gmail_quote">
                            <div><br>
                            </div>
                            <div>Or is the point of this function to
                              test for an empty BB *and* an
                              unconditional branch? If so, why do you
                              need to handle that -- SimplifyCFG should
                              have removed such blocks already?</div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                I just to make sure this code work for any input. <br>
                I don't want to assume the input is CFG optimized.<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div style="">That isn't how LLVM is architected. A very
              fundamental design principle of the IR optimizers is to
              rely on other optimization passes to simplify and
              canonicalize the IR so that each individual transformation
              can be simpler and not handle all cases. While sometimes
              LLVM does duplicate simplification logic, it is usually to
              handle very specific phase ordering problems, which don't
              seem to be the case here.</div>
          </div>
        </div>
      </div>
    </blockquote>
    In general I agree with you.  But, it dose not hurt to be safe at a
    very low cost.<br>
    <br>
    <blockquote
cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div class="im">
                  <blockquote type="cite">
                    <div
                      style="font-family:arial,helvetica,sans-serif;font-size:10pt">
                      <div dir="ltr">
                        <div class="gmail_extra">
                          <div class="gmail_quote">
                            <div> </div>
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex"> +<br>
                              +    static BranchInst
                              *getBranch(BasicBlock *BB) {<br>
                              +      return
                              dyn_cast<BranchInst>(BB->getTerminator());<br>
                              +    }<br>
                            </blockquote>
                            <div><br>
                            </div>
                            <div>Abstracting this into a method doesn't
                              seem to help readability at all. I would
                              rather just see the dyn_cast.</div>
                            <div> </div>
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex"> +<br>
                              +    /// Return the condition of the
                              branch terminating the given basic block.<br>
                              +    static Value
                              *getBrCondtion(BasicBlock *);<br>
                            </blockquote>
                            <div><br>
                            </div>
                            <div>What if it doesn't terminate in a
                              branch? I'm not really sure what
                              abstraction you're trying to provide here.</div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                Then return 0, the tiny code says that.<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div style="">While the code is obvious, the code isn't
              here. This is a place where having just a static helper
              function w/o the forward declaration and interface comment
              would simplify things.</div>
            <div style=""><br>
            </div>
            <div style="">That said, I still don't really see why this
              is worth hoisting into a separate function rather than
              writing out the code.</div>
          </div>
        </div>
      </div>
    </blockquote>
    It was inlined in the code. However, this code is called in several
    place in the same function. <br>
    I factor it out just to reduce the clutter. <br>
    <br>
    Patten match code usually looks nasty, I have got to make it is
    concise enough to make it easier for reader. <br>
    <br>
    <blockquote
cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div class="im">
                  <blockquote type="cite">
                    <div
                      style="font-family:arial,helvetica,sans-serif;font-size:10pt">
                      <div dir="ltr">
                        <div class="gmail_extra">
                          <div class="gmail_quote">
                            <div><br>
                            </div>
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex"> +<br>
                              +    /// Derive the precondition block
                              (i.e the block that guards the loop<br>
                              +    /// preheader) from the given
                              preheader.<br>
                              +    static BasicBlock
                              *getPrecondBb(BasicBlock *PreHead);<br>
                            </blockquote>
                            <div><br>
                            </div>
                            <div>I don't understand this.. What is the
                              invariant it exposes? What does it test
                              first? Why is this not a method on
                              LoopInfo if this is generally useful?</div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                I'm afraid I don't understand what you are saying? <br>
                Did I say "invariant"?<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div style="">No, I'm to understand what exactly you mean by
              a precondition block, because looking at the
              implementation of this routine, it does something
              something extremely specific: it finds a single
              predecessor of the preheader which terminates in a
              conditional branch. And It's still not clear why this is
              not one of the methods on LoopInfo.</div>
          </div>
        </div>
      </div>
    </blockquote>
    By "precondition block", the I mean the block containing the
    precondition to enter the loop. <br>
    <br>
    e.g. for (i = 0; i < n; i++) { ... };  has a precondition block
    testing "i < n" (after the loop is inversed).<br>
       however "do { } while (cond);" dose not have such precondition
    block. <br>
    <br>
       If precondition block present, I try to place the instrinsic func
    in the precondition block in a hope to completely git rid of the
    precondition block. <br>
    <br>
       If just place the intrinsic function in the preheader, I end up
    with following code :<br>
       if (precondition) {<br>
            popcount.intrinsic.<br>
       }<br>
           <br>
       If you still don't understand, try to write a snippet with memcpy
    pattern.  You will find the precondition still remains <br>
    after the pattern is recoganized.  This is one defect I'm going to
    fix. <br>
    <br>
    <br>
    <blockquote
cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div class="im">
                  <blockquote type="cite">
                    <div
                      style="font-family:arial,helvetica,sans-serif;font-size:10pt">
                      <div dir="ltr">
                        <div class="gmail_extra">
                          <div class="gmail_quote">
                            <div> </div>
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex"> +  };<br>
                              +<br>
                              +  /// This class is to recoginize idioms
                              of population-count conducted in<br>
                              +  /// a noncountable loop. Currently it
                              only recognizes this pattern:<br>
                              +  /// \code<br>
                              +  ///   while(x) {cnt++; ...; x &= x
                              - 1; ...}<br>
                              +  /// \endcode<br>
                              +  class NclPopcountRecognize {<br>
                            </blockquote>
                            <div><br>
                            </div>
                            <div>If NCL is an acronym you want to use,
                              capitalize it as an acronym. I don't think
                              it is a terribly useful acronym. I would
                              call this just PopcountLoopRecognizer
                              (note a noun, as this is a class), and
                              document what sets of loops it handles
                              currently. If there is some reason we
                              wouldn't re-use this infrastructure for
                              any future extensions, you should explain
                              that here.</div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                Population count has tons of variants. <br>
                The loops could be countable loop or non-countable
                loop.  I don't think it is good idea to recognize all
                variants in one class. <br>
                Since the way we recognize a pattern in non-countable
                loop and countable loop are quite different. <br>
                We have some way to differentiate them in the name.<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div style="">But we don't have any of these other variants
              yet. Until we do, I'm advocating for a simpler design as
              there is nothing do differentiate.</div>
          </div>
        </div>
      </div>
    </blockquote>
    Nope, we still have couple of variant.  One stupid and most popular
    one being:<br>
    <br>
     for (i = 0; i < n;i++)<br>
        cnt += (val & (i>>i));<br>
    <br>
    <br>
    <br>
    <blockquote
cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000"> If "NCL" doesn't
                seem to be a good one, what is you suggession? </div>
            </blockquote>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div class="im"><br>
                  <br>
                  <blockquote type="cite">
                    <div
                      style="font-family:arial,helvetica,sans-serif;font-size:10pt">
                      <div dir="ltr">
                        <div class="gmail_extra">
                          <div class="gmail_quote">
                            <div><br>
                            </div>
                            <div><br>
                            </div>
                            <div>Also, have you thought much about the
                              design of these classes? Currently, we
                              have LoopIdiomRecognize which is a pass,
                              and which has innate logic for forming
                              memset intrinsics out of loops. But we now
                              also have a non-pass helper class to
                              recognize a specific kind of loop. How
                              would you envision this design scaling up
                              to the next loop idiom?</div>
                            <div><br>
                            </div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                Yes, I did. <br>
                It was a big class trying to recognize all interesting
                pattern. I don't want to make it become messy. <br>
                That is why I implement the popcount in a separate
                class. <br>
                In the future, I'd like new non-trivial recognization is
                also implemented in separate class as well. <br>
                That said, I don't like pull what we have in the pass to
                separate classes. <br>
                We have to make change one step at a time.<br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div style="">Yea, I agree we need some way to partition
              stuff up. I like pulling things out, but I think it would
              be useful to get some pattern in place across the loop
              idioms. I'm not yet sure what the best pattern is though,
              which is why I was asking your thoughts on the subject.</div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>