<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Chandler, Thank you for your feedback, see the following inline
comments<br>
<br>
<blockquote
cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> for the loop like
this, slow HW support may outperform the SW version:<br>
<br>
for (i = 0; I < 32; I++)<br>
cnt += (val >> 1) & 1;</div>
</blockquote>
<div><br>
</div>
<div style="">I don't understand your response. My comment
was asking why the constraints in this comment aren't
reflected in the name of the interface. For example
"getPopcntSupport" producing a simple "slow" kind of
support doesn't convey under what conditions it is slow,
or what the expected use case is. I was wondering about
maybe 'isSparsePopcntFast' and/or 'isDensePopcntFast'...
You could in theory take an expected density as input, but
that seems like overkill.</div>
<div style=""><br>
</div>
<div style="">Anyways, I certainly agree about needing to
factor the density into the test, just was looking for an
interface that made it more obvious in addition to the
comment.</div>
<div style=""><br>
</div>
<div style="">The other thing that might help here is to
clarify what this routine is really describing. It could
mean at least any of the following:</div>
<div style="">1) Is there fast, direct hardware support for
computing popcnt on an integer? (IE, the popcnt
instruction on x86)</div>
<div style="">2) Is there a fast, hardware accelerated form
of computing popcnt? (IE, lzcnt, tzcnt, potentially even
bsr or other techniques on x86)</div>
<div style="">3) Is there a fast lowering of the popcnt
builtin on this target (IE, there are very fast SSSE3 and
SSE2 implementations, although the ones I know of here
target vectors... there are also all kinds of clever
software implementations that we could emit that would be
faster than the loop...)</div>
<div style=""><br>
</div>
<div style="">I think the intrinsic is currently modeling #1
based on the name, but the comment and x86 TODO make me
wonder if you actually meant something else? (IMO,
eventually it might make sense to model #3, but we'd need
to teach LLVM to do all the clever tricks in lowering of
course... hmmm...)</div>
<div style=""><br>
</div>
</div>
</div>
</div>
</blockquote>
If the "val" take value 0, then this loop has only one iteration,
and hence "sparse" case. <br>
In this sparse case, the loop may only take few cycles. The
underlying HW support, which <br>
could be implemented by a couple of instructions, could take longer
time to finish. <br>
Depending on how fast the HW support is, replacing the loop into HW
support doesn't necessarily win. <br>
That said, if the HW's support is super fast, it is safe to blindly
convert any pattern into the popcount intrinsic. ]<br>
Otherwise, we must be careful about it. <br>
<br>
This is an example of "dense" population count:<br>
<br>
for (i = 0; i < 32; i++) <br>
cnt += val & (1<<i)<br>
<br>
It blindly take 32 iterations. In this case, we can convert to
popount intrinsic regardless the HW support is slow or not. <br>
<br>
<blockquote
cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="im"><br>
<blockquote type="cite">
<div
style="font-family:arial,helvetica,sans-serif;font-size:10pt">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> + virtual
PopcntHwSupport
getPopcntHwSupport(unsigned
IntTyWidthInBit) const {<br>
</blockquote>
<div><br>
</div>
<div>Why accept the integer type width? does
it matter other than it being a legal
integer width for the target?</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
The cost are different for int-width.<br>
</div>
</blockquote>
<div><br>
</div>
<div style="">Are they different for different integer
width, or different expected density of one bits? I'm not
familiar with ARM, but for x86, the instruction latencies
and throughput tables I have access to only list Bulldozer
as having different throughput for different integer
widths, and we don't even model that in the implementation
of this routine.</div>
</div>
</div>
</div>
</blockquote>
The HW support is not necessarily conducted by a *single*
instruction. Wide-integer likely take more instructions<br>
to compute. It is architecture dependent. <br>
<br>
Even if for x8664, it has direct HW support only in SSE 4.1. Before
that we have to use couple of SSE3 instructions to compute <br>
the popcount. Please also note that the SSE3 instructions sequence
beat SSE4.1 popcnt instruction in performance. <br>
<br>
<blockquote
cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="im">
<blockquote type="cite">
<div
style="font-family:arial,helvetica,sans-serif;font-size:10pt">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> + return
None;<br>
+ }<br>
};<br>
<br>
/// VectorTargetTransformInfo - This
interface is used by the vectorizers<br>
<br>
Modified:
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp<br>
URL: <a moz-do-not-send="true"
href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=168931&r1=168930&r2=168931&view=diff"
target="_blank" class="cremed">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=168931&r1=168930&r2=168931&view=diff</a><br>
==============================================================================<br>
---
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
(original)<br>
+++
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
Thu Nov 29 13:38:54 2012<br>
@@ -17670,6 +17670,17 @@<br>
return -1;<br>
}<br>
<br>
+ScalarTargetTransformInfo::PopcntHwSupport<br>
+X86ScalarTargetTransformImpl::getPopcntHwSupport(unsigned
TyWidth) const {<br>
+ assert(isPowerOf2_32(TyWidth)
&& "Ty width must be power of 2");</blockquote>
<div><br>
</div>
<div>This constraint seems strange. It
certainly doesn't seem like it should be
an assert.</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
It catch the case when this function is fed with
nonsense parameter, and it at least make sense for x86</div>
</blockquote>
<div><br>
</div>
<div style="">I understand, but I disagree with that design.
Specifically, at the IR level we have integer types that
are not powers of two width. If this is an important
constraint, than it should be very clearly documented, but
it would seem more reasonable in a layer like the TTI
interface to either A) return 'none' or whatever the
conservative answer is, or B) accept the IR-level type,
run it through legalization to see the legalized type it
ends up with, and use that width.</div>
<div><br>
</div>
<div style="">I would suggest B here as it seems a more
robust solution and to follow the spirit of TTI --
modeling the result of CodeGen when lowering the code.</div>
</div>
</div>
</div>
</blockquote>
I just want to make sure the interface is very simple. If the width
is not power-of-2, we can simply ignore it <br>
for the sake of simplicity. <br>
<blockquote
cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div style=""> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="im">
<blockquote type="cite">
<div
style="font-family:arial,helvetica,sans-serif;font-size:10pt">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>If i have an i24, i cant count the
population of it by counting the
population of it zext'ed to 32 bits...</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
You have to pass 32 when calling this function.<br>
</div>
</blockquote>
<div><br>
</div>
<div style="">I mean't to say 'i can count', see above. I
think the best solution is to pass in the IR-level type
directly...</div>
<div style=""><br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="im">
<blockquote type="cite">
<div
style="font-family:arial,helvetica,sans-serif;font-size:10pt">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> </blockquote>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> + const
X86Subtarget &ST =
TLI->getTargetMachine().getSubtarget<X86Subtarget>();<br>
+<br>
+ // TODO: Currently the
__builtin_popcount() implementation using
SSE3<br>
+ // instructions is inefficient. Once
the problem is fixed, we should<br>
+ // call ST.hasSSE3() instead of
ST.hasSSE4().<br>
</blockquote>
<div><br>
</div>
<div>__builtin_popcount isn't the issue
here, this is in LLVM. What is the actual
issue? Why isn't "Slow' used here?</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
Please check my original mail. It is a real LLVM defect
in generate optimal instruction sequence using SSE3.<br>
</div>
</blockquote>
<div><br>
</div>
<div style="">When I said '__builtin_popcount' I was
referring to the Clang builtin function. The LLVM defect
has to do with llvm.ctpop implementation. I just didn't
want a future reader to be confused between Clang builtins
and LLVM intrinsics.</div>
<div> <br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="im">
<blockquote type="cite">
<div
style="font-family:arial,helvetica,sans-serif;font-size:10pt">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> <br>
</div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> namespace {<br>
+<br>
+ class LoopIdiomRecognize;<br>
+<br>
+ /// This class defines some utility
functions for loop idiom recognization.<br>
</blockquote>
<div><br>
</div>
<div>Please don't use a class to organize a
collection of static methods. That is the
role of a namespace.</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
Some reviewers hate this idea. It is not hard to
imagine, this class will become more and more
complicate.<br>
At that point, it will not be a container for static
functions any more.<br>
</div>
</blockquote>
<div><br>
</div>
<div style="">But until that day arrives, I think you should
use a more appropriate software design pattern. I follow
the YAGNI design philosophy here.</div>
</div>
</div>
</div>
</blockquote>
I don't think it is about design patten. It is just about how to
organize the related functions. <br>
I can pull these static functions out of the class if they seems to
be a real eyesore. <br>
<br>
<blockquote
cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="im"> <br>
<blockquote type="cite">
<div
style="font-family:arial,helvetica,sans-serif;font-size:10pt">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>And this doesn't need a namespace at
all -- this is in a single .cc file.
Please just use static functions, that is
is the overwhelmingly predominant pattern
in LLVM.</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> + class
LIRUtil {<br>
+ public:<br>
+ /// Return true iff the block
contains nothing but an uncondition branch<br>
+ /// (aka goto instruction).<br>
+ static bool isAlmostEmpty(BasicBlock
*);<br>
</blockquote>
<div><br>
</div>
<div>Please use names for parameters in LLVM</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<blockquote type="cite">
<div
style="font-family:arial,helvetica,sans-serif;font-size:10pt">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div class="im">
<div>Also, I don't think this needs a helper
method. First, you *always* have a
terminator instruction, the verify assures
this. Typically, you would then test that
&BB->begin() ==
BB->getTerminatorInst().</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<div class="im"> <br>
<blockquote type="cite">
<div
style="font-family:arial,helvetica,sans-serif;font-size:10pt">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>Or is the point of this function to
test for an empty BB *and* an
unconditional branch? If so, why do you
need to handle that -- SimplifyCFG should
have removed such blocks already?</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
I just to make sure this code work for any input. <br>
I don't want to assume the input is CFG optimized.<br>
</div>
</blockquote>
<div><br>
</div>
<div style="">That isn't how LLVM is architected. A very
fundamental design principle of the IR optimizers is to
rely on other optimization passes to simplify and
canonicalize the IR so that each individual transformation
can be simpler and not handle all cases. While sometimes
LLVM does duplicate simplification logic, it is usually to
handle very specific phase ordering problems, which don't
seem to be the case here.</div>
</div>
</div>
</div>
</blockquote>
In general I agree with you. But, it dose not hurt to be safe at a
very low cost.<br>
<br>
<blockquote
cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="im">
<blockquote type="cite">
<div
style="font-family:arial,helvetica,sans-serif;font-size:10pt">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> +<br>
+ static BranchInst
*getBranch(BasicBlock *BB) {<br>
+ return
dyn_cast<BranchInst>(BB->getTerminator());<br>
+ }<br>
</blockquote>
<div><br>
</div>
<div>Abstracting this into a method doesn't
seem to help readability at all. I would
rather just see the dyn_cast.</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> +<br>
+ /// Return the condition of the
branch terminating the given basic block.<br>
+ static Value
*getBrCondtion(BasicBlock *);<br>
</blockquote>
<div><br>
</div>
<div>What if it doesn't terminate in a
branch? I'm not really sure what
abstraction you're trying to provide here.</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
Then return 0, the tiny code says that.<br>
</div>
</blockquote>
<div><br>
</div>
<div style="">While the code is obvious, the code isn't
here. This is a place where having just a static helper
function w/o the forward declaration and interface comment
would simplify things.</div>
<div style=""><br>
</div>
<div style="">That said, I still don't really see why this
is worth hoisting into a separate function rather than
writing out the code.</div>
</div>
</div>
</div>
</blockquote>
It was inlined in the code. However, this code is called in several
place in the same function. <br>
I factor it out just to reduce the clutter. <br>
<br>
Patten match code usually looks nasty, I have got to make it is
concise enough to make it easier for reader. <br>
<br>
<blockquote
cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="im">
<blockquote type="cite">
<div
style="font-family:arial,helvetica,sans-serif;font-size:10pt">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> +<br>
+ /// Derive the precondition block
(i.e the block that guards the loop<br>
+ /// preheader) from the given
preheader.<br>
+ static BasicBlock
*getPrecondBb(BasicBlock *PreHead);<br>
</blockquote>
<div><br>
</div>
<div>I don't understand this.. What is the
invariant it exposes? What does it test
first? Why is this not a method on
LoopInfo if this is generally useful?</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
I'm afraid I don't understand what you are saying? <br>
Did I say "invariant"?<br>
</div>
</blockquote>
<div><br>
</div>
<div style="">No, I'm to understand what exactly you mean by
a precondition block, because looking at the
implementation of this routine, it does something
something extremely specific: it finds a single
predecessor of the preheader which terminates in a
conditional branch. And It's still not clear why this is
not one of the methods on LoopInfo.</div>
</div>
</div>
</div>
</blockquote>
By "precondition block", the I mean the block containing the
precondition to enter the loop. <br>
<br>
e.g. for (i = 0; i < n; i++) { ... }; has a precondition block
testing "i < n" (after the loop is inversed).<br>
however "do { } while (cond);" dose not have such precondition
block. <br>
<br>
If precondition block present, I try to place the instrinsic func
in the precondition block in a hope to completely git rid of the
precondition block. <br>
<br>
If just place the intrinsic function in the preheader, I end up
with following code :<br>
if (precondition) {<br>
popcount.intrinsic.<br>
}<br>
<br>
If you still don't understand, try to write a snippet with memcpy
pattern. You will find the precondition still remains <br>
after the pattern is recoganized. This is one defect I'm going to
fix. <br>
<br>
<br>
<blockquote
cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="im">
<blockquote type="cite">
<div
style="font-family:arial,helvetica,sans-serif;font-size:10pt">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> + };<br>
+<br>
+ /// This class is to recoginize idioms
of population-count conducted in<br>
+ /// a noncountable loop. Currently it
only recognizes this pattern:<br>
+ /// \code<br>
+ /// while(x) {cnt++; ...; x &= x
- 1; ...}<br>
+ /// \endcode<br>
+ class NclPopcountRecognize {<br>
</blockquote>
<div><br>
</div>
<div>If NCL is an acronym you want to use,
capitalize it as an acronym. I don't think
it is a terribly useful acronym. I would
call this just PopcountLoopRecognizer
(note a noun, as this is a class), and
document what sets of loops it handles
currently. If there is some reason we
wouldn't re-use this infrastructure for
any future extensions, you should explain
that here.</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
Population count has tons of variants. <br>
The loops could be countable loop or non-countable
loop. I don't think it is good idea to recognize all
variants in one class. <br>
Since the way we recognize a pattern in non-countable
loop and countable loop are quite different. <br>
We have some way to differentiate them in the name.<br>
</div>
</blockquote>
<div><br>
</div>
<div style="">But we don't have any of these other variants
yet. Until we do, I'm advocating for a simpler design as
there is nothing do differentiate.</div>
</div>
</div>
</div>
</blockquote>
Nope, we still have couple of variant. One stupid and most popular
one being:<br>
<br>
for (i = 0; i < n;i++)<br>
cnt += (val & (i>>i));<br>
<br>
<br>
<br>
<blockquote
cite="mid:CAGCO0KioZnKjLjNE9PYfDEMLO4knuzZGQM+MuYSsGS0Q+JW_og@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> If "NCL" doesn't
seem to be a good one, what is you suggession? </div>
</blockquote>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="im"><br>
<br>
<blockquote type="cite">
<div
style="font-family:arial,helvetica,sans-serif;font-size:10pt">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div><br>
</div>
<div>Also, have you thought much about the
design of these classes? Currently, we
have LoopIdiomRecognize which is a pass,
and which has innate logic for forming
memset intrinsics out of loops. But we now
also have a non-pass helper class to
recognize a specific kind of loop. How
would you envision this design scaling up
to the next loop idiom?</div>
<div><br>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
Yes, I did. <br>
It was a big class trying to recognize all interesting
pattern. I don't want to make it become messy. <br>
That is why I implement the popcount in a separate
class. <br>
In the future, I'd like new non-trivial recognization is
also implemented in separate class as well. <br>
That said, I don't like pull what we have in the pass to
separate classes. <br>
We have to make change one step at a time.<br>
</div>
</blockquote>
<div><br>
</div>
<div style="">Yea, I agree we need some way to partition
stuff up. I like pulling things out, but I think it would
be useful to get some pattern in place across the loop
idioms. I'm not yet sure what the best pattern is though,
which is why I was asking your thoughts on the subject.</div>
</div>
</div>
</div>
</blockquote>
<br>
</body>
</html>