<html>

  <head>

    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Chandler, Thank you for your feedback, see the following inline

    comments<br>

    <br>

    <blockquote

cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000"> for the loop like

                this, slow HW support may outperform the SW version:<br>

                <br>

                 for (i = 0; I < 32; I++)<br>

                   cnt += (val >> 1) & 1;</div>

            </blockquote>

            <div><br>

            </div>

            <div style="">I don't understand your response. My comment

              was asking why the constraints in this comment aren't

              reflected in the name of the interface. For example

              "getPopcntSupport" producing a simple "slow" kind of

              support doesn't convey under what conditions it is slow,

              or what the expected use case is. I was wondering about

              maybe 'isSparsePopcntFast' and/or 'isDensePopcntFast'...

              You could in theory take an expected density as input, but

              that seems like overkill.</div>

            <div style=""><br>

            </div>

            <div style="">Anyways, I certainly agree about needing to

              factor the density into the test, just was looking for an

              interface that made it more obvious in addition to the

              comment.</div>

            <div style=""><br>

            </div>

            <div style="">The other thing that might help here is to

              clarify what this routine is really describing. It could

              mean at least any of the following:</div>

            <div style="">1) Is there fast, direct hardware support for

              computing popcnt on an integer? (IE, the popcnt

              instruction on x86)</div>

            <div style="">2) Is there a fast, hardware accelerated form

              of computing popcnt? (IE, lzcnt, tzcnt, potentially even

              bsr or other techniques on x86)</div>

            <div style="">3) Is there a fast lowering of the popcnt

              builtin on this target (IE, there are very fast SSSE3 and

              SSE2 implementations, although the ones I know of here

              target vectors... there are also all kinds of clever

              software implementations that we could emit that would be

              faster than the loop...)</div>

            <div style=""><br>

            </div>

            <div style="">I think the intrinsic is currently modeling #1

              based on the name, but the comment and x86 TODO make me

              wonder if you actually meant something else? (IMO,

              eventually it might make sense to model #3, but we'd need

              to teach LLVM to do all the clever tricks in lowering of

              course... hmmm...)</div>

            <div style=""><br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    If the "val" take value 0, then this loop has only one iteration,

    and hence "sparse" case. <br>

    In this sparse case, the loop may only take few cycles. The

    underlying HW support, which <br>

    could be implemented by a couple of instructions, could take longer

    time to finish. <br>

    Depending on how fast the HW support is, replacing the loop into HW

    support doesn't necessarily win. <br>

    That said, if the HW's support is super fast,  it is safe to blindly

    convert any pattern into the popcount intrinsic. ]<br>

    Otherwise, we must be careful about it. <br>

    <br>

    This is an example of "dense" population count:<br>

    <br>

     for (i = 0; i < 32; i++) <br>

         cnt += val & (1<<i)<br>

    <br>

    It blindly take 32 iterations. In this case, we can convert to

    popount intrinsic regardless the HW support is slow or not. <br>

    <br>

    <blockquote

cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000">

                <div class="im"><br>

                  <blockquote type="cite">

                    <div

                      style="font-family:arial,helvetica,sans-serif;font-size:10pt">

                      <div dir="ltr">

                        <div class="gmail_extra">

                          <div class="gmail_quote">

                            <blockquote class="gmail_quote"

                              style="margin:0 0 0 .8ex;border-left:1px

                              #ccc solid;padding-left:1ex"> +  virtual

                              PopcntHwSupport

                              getPopcntHwSupport(unsigned

                              IntTyWidthInBit) const {<br>

                            </blockquote>

                            <div><br>

                            </div>

                            <div>Why accept the integer type width? does

                              it matter other than it being a legal

                              integer width for the target?</div>

                          </div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </div>

                The cost are different for int-width.<br>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div style="">Are they different for different integer

              width, or different expected density of one bits? I'm not

              familiar with ARM, but for x86, the instruction latencies

              and throughput tables I have access to only list Bulldozer

              as having different throughput for different integer

              widths, and we don't even model that in the implementation

              of this routine.</div>

          </div>

        </div>

      </div>

    </blockquote>

    The HW support is not necessarily conducted by a *single*

    instruction.  Wide-integer likely take more instructions<br>

    to compute. It is architecture dependent. <br>

    <br>

    Even if for x8664,  it has direct HW support only in SSE 4.1. Before

    that we have to use couple of SSE3 instructions to compute <br>

    the popcount. Please also note that the SSE3 instructions sequence

    beat SSE4.1 popcnt instruction in performance. <br>

    <br>

    <blockquote

cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000">

                <div class="im">

                  <blockquote type="cite">

                    <div

                      style="font-family:arial,helvetica,sans-serif;font-size:10pt">

                      <div dir="ltr">

                        <div class="gmail_extra">

                          <div class="gmail_quote">

                            <div> </div>

                            <blockquote class="gmail_quote"

                              style="margin:0 0 0 .8ex;border-left:1px

                              #ccc solid;padding-left:1ex"> +    return

                              None;<br>

                              +  }<br>

                               };<br>

                              <br>

                               /// VectorTargetTransformInfo - This

                              interface is used by the vectorizers<br>

                              <br>

                              Modified:

                              llvm/trunk/lib/Target/X86/X86ISelLowering.cpp<br>

                              URL: <a moz-do-not-send="true"

href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=168931&r1=168930&r2=168931&view=diff"

                                target="_blank" class="cremed">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=168931&r1=168930&r2=168931&view=diff</a><br>

==============================================================================<br>

                              ---

                              llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

                              (original)<br>

                              +++

                              llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

                              Thu Nov 29 13:38:54 2012<br>

                              @@ -17670,6 +17670,17 @@<br>

                                 return -1;<br>

                               }<br>

                              <br>

+ScalarTargetTransformInfo::PopcntHwSupport<br>

                              +X86ScalarTargetTransformImpl::getPopcntHwSupport(unsigned

                              TyWidth) const {<br>

                              +  assert(isPowerOf2_32(TyWidth)

                              && "Ty width must be power of 2");</blockquote>

                            <div><br>

                            </div>

                            <div>This constraint seems strange. It

                              certainly doesn't seem like it should be

                              an assert.</div>

                          </div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </div>

                It catch the case when this function is fed with

                nonsense parameter, and it at least make sense for x86</div>

            </blockquote>

            <div><br>

            </div>

            <div style="">I understand, but I disagree with that design.

              Specifically, at the IR level we have integer types that

              are not powers of two width. If this is an important

              constraint, than it should be very clearly documented, but

              it would seem more reasonable in a layer like the TTI

              interface to either A) return 'none' or whatever the

              conservative answer is, or B) accept the IR-level type,

              run it through legalization to see the legalized type it

              ends up with, and use that width.</div>

            <div><br>

            </div>

            <div style="">I would suggest B here as it seems a more

              robust solution and to follow the spirit of TTI --

              modeling the result of CodeGen when lowering the code.</div>

          </div>

        </div>

      </div>

    </blockquote>

    I just want to make sure the interface is very simple. If the width

    is not power-of-2, we can simply ignore it <br>

    for the sake of simplicity. <br>

    <blockquote

cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div style=""> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000">

                <div class="im">

                  <blockquote type="cite">

                    <div

                      style="font-family:arial,helvetica,sans-serif;font-size:10pt">

                      <div dir="ltr">

                        <div class="gmail_extra">

                          <div class="gmail_quote">

                            <div><br>

                            </div>

                            <div>If i have an i24, i cant count the

                              population of it by counting the

                              population of it zext'ed to 32 bits...</div>

                          </div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </div>

                You have to pass 32 when calling this function.<br>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div style="">I mean't to say 'i can count', see above. I

              think the best solution is to pass in the IR-level type

              directly...</div>

            <div style=""><br>

            </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000">

                <div class="im">

                  <blockquote type="cite">

                    <div

                      style="font-family:arial,helvetica,sans-serif;font-size:10pt">

                      <div dir="ltr">

                        <div class="gmail_extra">

                          <div class="gmail_quote">

                            <div> </div>

                            <blockquote class="gmail_quote"

                              style="margin:0 0 0 .8ex;border-left:1px

                              #ccc solid;padding-left:1ex">  </blockquote>

                            <blockquote class="gmail_quote"

                              style="margin:0 0 0 .8ex;border-left:1px

                              #ccc solid;padding-left:1ex"> +  const

                              X86Subtarget &ST =

                              TLI->getTargetMachine().getSubtarget<X86Subtarget>();<br>

                              +<br>

                              +  // TODO: Currently the

                              __builtin_popcount() implementation using

                              SSE3<br>

                              +  //   instructions is inefficient. Once

                              the problem is fixed, we should<br>

                              +  //   call ST.hasSSE3() instead of

                              ST.hasSSE4().<br>

                            </blockquote>

                            <div><br>

                            </div>

                            <div>__builtin_popcount isn't the issue

                              here, this is in LLVM. What is the actual

                              issue? Why isn't "Slow' used here?</div>

                          </div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </div>

                Please check my original mail.  It is a real LLVM defect

                in generate optimal instruction sequence using SSE3.<br>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div style="">When I said '__builtin_popcount' I was

              referring to the Clang builtin function. The LLVM defect

              has to do with llvm.ctpop implementation. I just didn't

              want a future reader to be confused between Clang builtins

              and LLVM intrinsics.</div>

            <div> <br>

            </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000">

                <div class="im">

                  <blockquote type="cite">

                    <div

                      style="font-family:arial,helvetica,sans-serif;font-size:10pt">

                      <div dir="ltr">

                        <div class="gmail_extra">

                          <div class="gmail_quote">

                            <div> <br>

                            </div>

                            <blockquote class="gmail_quote"

                              style="margin:0 0 0 .8ex;border-left:1px

                              #ccc solid;padding-left:1ex">  namespace {<br>

                              +<br>

                              +  class LoopIdiomRecognize;<br>

                              +<br>

                              +  /// This class defines some utility

                              functions for loop idiom recognization.<br>

                            </blockquote>

                            <div><br>

                            </div>

                            <div>Please don't use a class to organize a

                              collection of static methods. That is the

                              role of a namespace.</div>

                          </div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </div>

                Some reviewers hate this idea.  It is not hard to

                imagine, this class will become more and more

                complicate.<br>

                At that point, it will not be a container for static

                functions any more.<br>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div style="">But until that day arrives, I think you should

              use a more appropriate software design pattern. I follow

              the YAGNI design philosophy here.</div>

          </div>

        </div>

      </div>

    </blockquote>

    I don't think it is about design patten. It is just about how to

    organize the related functions. <br>

    I can pull these static functions out of the class if they seems to

    be a real eyesore. <br>

    <br>

    <blockquote

cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div><br>

            </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000">

                <div class="im"> <br>

                  <blockquote type="cite">

                    <div

                      style="font-family:arial,helvetica,sans-serif;font-size:10pt">

                      <div dir="ltr">

                        <div class="gmail_extra">

                          <div class="gmail_quote">

                            <div><br>

                            </div>

                            <div>And this doesn't need a namespace at

                              all -- this is in a single .cc file.

                              Please just use static functions, that is

                              is the overwhelmingly predominant pattern

                              in LLVM.</div>

                            <div> </div>

                            <blockquote class="gmail_quote"

                              style="margin:0 0 0 .8ex;border-left:1px

                              #ccc solid;padding-left:1ex"> +  class

                              LIRUtil {<br>

                              +  public:<br>

                              +    /// Return true iff the block

                              contains nothing but an uncondition branch<br>

                              +    /// (aka goto instruction).<br>

                              +    static bool isAlmostEmpty(BasicBlock

                              *);<br>

                            </blockquote>

                            <div><br>

                            </div>

                            <div>Please use names for parameters in LLVM</div>

                          </div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </div>

                <blockquote type="cite">

                  <div

                    style="font-family:arial,helvetica,sans-serif;font-size:10pt">

                    <div dir="ltr">

                      <div class="gmail_extra">

                        <div class="gmail_quote">

                          <div><br>

                          </div>

                          <div class="im">

                            <div>Also, I don't think this needs a helper

                              method. First, you *always* have a

                              terminator instruction, the verify assures

                              this. Typically, you would then test that

                              &BB->begin() ==

                              BB->getTerminatorInst().</div>

                          </div>

                        </div>

                      </div>

                    </div>

                  </div>

                </blockquote>

                <div class="im"> <br>

                  <blockquote type="cite">

                    <div

                      style="font-family:arial,helvetica,sans-serif;font-size:10pt">

                      <div dir="ltr">

                        <div class="gmail_extra">

                          <div class="gmail_quote">

                            <div><br>

                            </div>

                            <div>Or is the point of this function to

                              test for an empty BB *and* an

                              unconditional branch? If so, why do you

                              need to handle that -- SimplifyCFG should

                              have removed such blocks already?</div>

                          </div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </div>

                I just to make sure this code work for any input. <br>

                I don't want to assume the input is CFG optimized.<br>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div style="">That isn't how LLVM is architected. A very

              fundamental design principle of the IR optimizers is to

              rely on other optimization passes to simplify and

              canonicalize the IR so that each individual transformation

              can be simpler and not handle all cases. While sometimes

              LLVM does duplicate simplification logic, it is usually to

              handle very specific phase ordering problems, which don't

              seem to be the case here.</div>

          </div>

        </div>

      </div>

    </blockquote>

    In general I agree with you.  But, it dose not hurt to be safe at a

    very low cost.<br>

    <br>

    <blockquote

cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000">

                <div class="im">

                  <blockquote type="cite">

                    <div

                      style="font-family:arial,helvetica,sans-serif;font-size:10pt">

                      <div dir="ltr">

                        <div class="gmail_extra">

                          <div class="gmail_quote">

                            <div> </div>

                            <blockquote class="gmail_quote"

                              style="margin:0 0 0 .8ex;border-left:1px

                              #ccc solid;padding-left:1ex"> +<br>

                              +    static BranchInst

                              *getBranch(BasicBlock *BB) {<br>

                              +      return

                              dyn_cast<BranchInst>(BB->getTerminator());<br>

                              +    }<br>

                            </blockquote>

                            <div><br>

                            </div>

                            <div>Abstracting this into a method doesn't

                              seem to help readability at all. I would

                              rather just see the dyn_cast.</div>

                            <div> </div>

                            <blockquote class="gmail_quote"

                              style="margin:0 0 0 .8ex;border-left:1px

                              #ccc solid;padding-left:1ex"> +<br>

                              +    /// Return the condition of the

                              branch terminating the given basic block.<br>

                              +    static Value

                              *getBrCondtion(BasicBlock *);<br>

                            </blockquote>

                            <div><br>

                            </div>

                            <div>What if it doesn't terminate in a

                              branch? I'm not really sure what

                              abstraction you're trying to provide here.</div>

                          </div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </div>

                Then return 0, the tiny code says that.<br>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div style="">While the code is obvious, the code isn't

              here. This is a place where having just a static helper

              function w/o the forward declaration and interface comment

              would simplify things.</div>

            <div style=""><br>

            </div>

            <div style="">That said, I still don't really see why this

              is worth hoisting into a separate function rather than

              writing out the code.</div>

          </div>

        </div>

      </div>

    </blockquote>

    It was inlined in the code. However, this code is called in several

    place in the same function. <br>

    I factor it out just to reduce the clutter. <br>

    <br>

    Patten match code usually looks nasty, I have got to make it is

    concise enough to make it easier for reader. <br>

    <br>

    <blockquote

cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000">

                <div class="im">

                  <blockquote type="cite">

                    <div

                      style="font-family:arial,helvetica,sans-serif;font-size:10pt">

                      <div dir="ltr">

                        <div class="gmail_extra">

                          <div class="gmail_quote">

                            <div><br>

                            </div>

                            <blockquote class="gmail_quote"

                              style="margin:0 0 0 .8ex;border-left:1px

                              #ccc solid;padding-left:1ex"> +<br>

                              +    /// Derive the precondition block

                              (i.e the block that guards the loop<br>

                              +    /// preheader) from the given

                              preheader.<br>

                              +    static BasicBlock

                              *getPrecondBb(BasicBlock *PreHead);<br>

                            </blockquote>

                            <div><br>

                            </div>

                            <div>I don't understand this.. What is the

                              invariant it exposes? What does it test

                              first? Why is this not a method on

                              LoopInfo if this is generally useful?</div>

                          </div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </div>

                I'm afraid I don't understand what you are saying? <br>

                Did I say "invariant"?<br>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div style="">No, I'm to understand what exactly you mean by

              a precondition block, because looking at the

              implementation of this routine, it does something

              something extremely specific: it finds a single

              predecessor of the preheader which terminates in a

              conditional branch. And It's still not clear why this is

              not one of the methods on LoopInfo.</div>

          </div>

        </div>

      </div>

    </blockquote>

    By "precondition block", the I mean the block containing the

    precondition to enter the loop. <br>

    <br>

    e.g. for (i = 0; i < n; i++) { ... };  has a precondition block

    testing "i < n" (after the loop is inversed).<br>

       however "do { } while (cond);" dose not have such precondition

    block. <br>

    <br>

       If precondition block present, I try to place the instrinsic func

    in the precondition block in a hope to completely git rid of the

    precondition block. <br>

    <br>

       If just place the intrinsic function in the preheader, I end up

    with following code :<br>

       if (precondition) {<br>

            popcount.intrinsic.<br>

       }<br>

           <br>

       If you still don't understand, try to write a snippet with memcpy

    pattern.  You will find the precondition still remains <br>

    after the pattern is recoganized.  This is one defect I'm going to

    fix. <br>

    <br>

    <br>

    <blockquote

cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000">

                <div class="im">

                  <blockquote type="cite">

                    <div

                      style="font-family:arial,helvetica,sans-serif;font-size:10pt">

                      <div dir="ltr">

                        <div class="gmail_extra">

                          <div class="gmail_quote">

                            <div> </div>

                            <blockquote class="gmail_quote"

                              style="margin:0 0 0 .8ex;border-left:1px

                              #ccc solid;padding-left:1ex"> +  };<br>

                              +<br>

                              +  /// This class is to recoginize idioms

                              of population-count conducted in<br>

                              +  /// a noncountable loop. Currently it

                              only recognizes this pattern:<br>

                              +  /// \code<br>

                              +  ///   while(x) {cnt++; ...; x &= x

                              - 1; ...}<br>

                              +  /// \endcode<br>

                              +  class NclPopcountRecognize {<br>

                            </blockquote>

                            <div><br>

                            </div>

                            <div>If NCL is an acronym you want to use,

                              capitalize it as an acronym. I don't think

                              it is a terribly useful acronym. I would

                              call this just PopcountLoopRecognizer

                              (note a noun, as this is a class), and

                              document what sets of loops it handles

                              currently. If there is some reason we

                              wouldn't re-use this infrastructure for

                              any future extensions, you should explain

                              that here.</div>

                          </div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </div>

                Population count has tons of variants. <br>

                The loops could be countable loop or non-countable

                loop.  I don't think it is good idea to recognize all

                variants in one class. <br>

                Since the way we recognize a pattern in non-countable

                loop and countable loop are quite different. <br>

                We have some way to differentiate them in the name.<br>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div style="">But we don't have any of these other variants

              yet. Until we do, I'm advocating for a simpler design as

              there is nothing do differentiate.</div>

          </div>

        </div>

      </div>

    </blockquote>

    Nope, we still have couple of variant.  One stupid and most popular

    one being:<br>

    <br>

     for (i = 0; i < n;i++)<br>

        cnt += (val & (i>>i));<br>

    <br>

    <br>

    <br>

    <blockquote

cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000"> If "NCL" doesn't

                seem to be a good one, what is you suggession? </div>

            </blockquote>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000">

                <div class="im"><br>

                  <br>

                  <blockquote type="cite">

                    <div

                      style="font-family:arial,helvetica,sans-serif;font-size:10pt">

                      <div dir="ltr">

                        <div class="gmail_extra">

                          <div class="gmail_quote">

                            <div><br>

                            </div>

                            <div><br>

                            </div>

                            <div>Also, have you thought much about the

                              design of these classes? Currently, we

                              have LoopIdiomRecognize which is a pass,

                              and which has innate logic for forming

                              memset intrinsics out of loops. But we now

                              also have a non-pass helper class to

                              recognize a specific kind of loop. How

                              would you envision this design scaling up

                              to the next loop idiom?</div>

                            <div><br>

                            </div>

                          </div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </div>

                Yes, I did. <br>

                It was a big class trying to recognize all interesting

                pattern. I don't want to make it become messy. <br>

                That is why I implement the popcount in a separate

                class. <br>

                In the future, I'd like new non-trivial recognization is

                also implemented in separate class as well. <br>

                That said, I don't like pull what we have in the pass to

                separate classes. <br>

                We have to make change one step at a time.<br>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div style="">Yea, I agree we need some way to partition

              stuff up. I like pulling things out, but I think it would

              be useful to get some pattern in place across the loop

              idioms. I'm not yet sure what the best pattern is though,

              which is why I was asking your thoughts on the subject.</div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

  </body>

</html>