[llvm-commits] [PATCH] Population-count loop idiom recognization
Shuxin Yang
shuxin.llvm at gmail.com
Mon Nov 19 16:36:02 PST 2012
Hi, Nadav:
Thank you so much for the speedy code review. I'm very grateful!
I will fix the problems you pointed out when I return back to office
next week.
Also see the following interleaving text.
Have a nice holiday:-)
Shuxin
On 11/19/2012 12:36 PM, Nadav Rotem wrote:
> + /// CreateCtpop - Create and insert ctpop.
> + CallInst *CreateCtpop(Value *Val);
> +
>
> Why did you need to add this function to the IRBuilder ? I understand that the LoopIdiom pass uses the builder API for creating the memset calls, but I think that memset is much more commonly used. I think that you should create the call to popcnt using a regular call.
You are absolutely right. Actually, myself don't like to create a
member function in IRBuilder
just for that purpose. But I thought the reviewer might ask me "why not
add a member function
to IRBuilder to make thing clean". Now that you are at my side:-), we
should get rid of this function
from IRBuilder.
>
> Regarding the name, it looks like in some places we call it 'CtPop' and in other places we call it 'popcnt'. I think that we need to fix this at some point. It does not need to be a part of this patch.
It is bit difficult to makes it consistent because the intrinsic
function is called "ctpop", however the
rest of existing code in CodeGen calls it "Popcount":-(
>
> class LoopIdiomRecognize : public LoopPass {
> + friend class NclPopcountRecognize;
> Loop *CurLoop;
> + mutable const DataLayout *TD;
> + mutable DominatorTree *DT;
> + mutable ScalarEvolution *SE;
> + mutable TargetLibraryInfo *TLI;
> + mutable const ScalarTargetTransformInfo *STTI;
>
> Why did you have to make these mutable ? Why is "NclPopcountRecognize" a friend ? Is there another way to solve this problem ?
They are declared as mutable because in member functions "getXXXX()
constant" labeled as constant.
however, in these functions these fields could be modified (actually
initialized to meaningful value).
NclPopcountRecognize is friend of LoopIdiomRecognize because it needs to
access those analyizers.
I'm thinking a better way...
>
> +
> + private:
> + const DataLayout *getDataLayout() const {
> + return TD ? TD : TD=getAnalysisIfAvailable<DataLayout>();
> + }
> +
> + DominatorTree *getDominatorTree() const {
> + return DT ? DT : (DT=&getAnalysis<DominatorTree>());
> + }
> +
>
> Why do you do the lazy evaluation and not just initialize everything upon initialization ?
I don't have strong argument. The only argument I could offer is from
efficiency angle:
More often than not, a non-countable loop being recognized doesn't count
population.
So, I'd like the recognizer just take a quick look and give up without
paying the cost of
initializing those fields.
>
> +// return true iff the block contains nothing but goto.
> +bool LIRUtil::isAlmostEmpty(BasicBlock *BB) {
> + if (BranchInst *Br = getBranch(BB)) {
> + return Br->isUnconditional() && BB->size() == 1;
> + }
> + return false;
> +}
>
> The documentation needs to be in the class. Also, I think that you are checking for a completely empty loop. No ?
> Do you also need to check other things here, such as that this is the header of the loop ?
No, it is just used to check if the preheader is almost empty nor not.
If the preheader has some instructions other than the unconditional branch,
it would be bit involved to move instructions, across the preheader,
into the precondition block,
as we need to make sure dependence is preserved.
>
> +/// NclPopcountRecognize - Take a glimpse of the loop to see if we need to go
> +/// ahead recoginizing the idiom.
> +bool NclPopcountRecognize::preliminaryScreen() {
> +
> + if (LIR.STTI->getPopcntHwSupport(32) != ScalarTargetTransformInfo::Fast)
> return false;
>
> Why extra space ? Why are you only checking for 32bits ?
> Why extra space?
Sorry for my stubborn personal taste. It become 2nd nature now:-)
I will get rid of it to make the code conform to the coding std.
>Why check 32bits only?
I should put some comment here.
The reason is that : if a architecture has HW support for popcount, it
likely
support 32-bit. It may or may not support wide integer, later on I will
double check if hw support wider integer. (Seems that I miss this double
check).
Some weird arch may support 8-bit, 16-bit only, I'd like to ignore such
arch to make the test faster.
Thank you again for the code review! I'm very grateful!
More information about the llvm-commits
mailing list