[llvm-commits] [PATCH] Bug 14664: InstCombine missing canonicalization of sub-and into a select

Wed Jan 23 12:52:34 PST 2013

Ahmad, Muhammad T wrote:
> Thanks a lot for the detailed feedback.
>
> It does sound like a lot of work and I might not be able to do it. Maybe you could submit bug(s) to llvm bugzilla and if it's decided that I should do it, then I will surely pick it up.

Fair enough. To solve PR14664, you can add the optimizations for just 
'zext' (to select), and 'sub' and 'and' (with a constant on one side and 
a select between two constants on the other).

> I don't understand a few things though:
> I don't fully understand what you mean by "make sure that codegen produces assembly that isn't any worse than it does now". I mean do I need to change the code in CodeGen to produce assembly "as good" as is currently being produces for this canonical form? What I mean is, once we agree on this as the canonical form, then codegen should be changed if the
> code isn't optimal right?

Right. The new canonical form should use selects when possible, and 
codegen should be fixed if it doesn't do the right thing. Notably, 
people have gotten unhappy with me teaching instcombine to canonicalize 
things when it made the generated code worse, so I would suggest making 
sure you fix codegen before or in the same commit as teaching 
instcombine something new. (I think codegen will already do the right 
things here, but you'd still need to check.)

> Opinion: Also, I honestly don't know why one form (select vs sext) is better or worse than the other. In the end, for this particular pattern, it is going to end up as a select anyway and then one would optimize the codegen for this specific select pattern because the IR doesn't matter as long as it is canonical and the codegen produces optimized assembly for the canonical forms.

My thinking is that any arbitrarily long sequence of instructions 
starting with an i1 can be folded into a single select, so it's not just 
about select vs. sext. The codegen can then be taught to efficiently 
materialize the constants. We aren't really going to emit a cmov for 
this, are we?

Nick

>
> Thanks.
>
> -----Original Message-----
> From: Nick Lewycky [mailto:nicholas at mxc.ca]
> Sent: Saturday, January 19, 2013 6:26 PM
> To: Ahmad, Muhammad T
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm-commits] [PATCH] Bug 14664: InstCombine missing canonicalization of sub-and into a select
>
> Ahmad, Muhammad T wrote:
>> Buzilla URL: http://llvm.org/bugs/show_bug.cgi?id=14664
>>
>> This patch adds the InstCombine that converts any expression of the form '(and (sub 0, (zext A to B)), C)' -->    'select A, C, 0' where type of A is i1.
>>
>> Please have a look (diff attached).
>
> Thanks. First off, I want to lead off with a review of your patch, but I want you to add this optimization in a fundamentally different way.
> Let's start with the patch:
>
> +  if (BinaryOperator *subI = dyn_cast<BinaryOperator>(Op0))
> +    if (subI->getOpcode() == Instruction::Sub)
> +      if (ConstantInt *CI = dyn_cast<ConstantInt>(subI->getOperand(0)))
> +        if (CastInst *zextI = dyn_cast<CastInst>(subI->getOperand(1))) {
> +          if (CI->isZero()&&
> +              zextI->getSrcTy()->isIntegerTy(1)) {
>
> This is a verbose way of writing:
>
>     Value *V;
>     if (match(Op0, m_Sub(m_Zero(), m_ZExt(m_Value(V)))&&
>         V->getType()->isIntegerTy(1)) {
>
> See llvm/Support/PatternMatch.h.
>
> +            Value* A = zextI->getOperand(0);
> +            Value* C = Op1;
> +            Value* zero = Constant::getNullValue(C->getType());
>
> The rest of the file (and part of your own patch) uses "Value *" style, with the * next to the variable, not the type. Please use that style consistently.
>
> +            return SelectInst::Create(A, C, zero);
>
> InstCombine keeps a convenience IRBuilder around. It's a bit more than that because it does some work trying to preserve the right debug info on instructions that are transformed. I'd suggest writing:
>
>     return Builder->CreateSelect(V, C, zero);
>
> Also, thanks for including tests!
>
>
> So I mentioned at the top that I think this should be done differently.
> This missed optimization is a specific symptom of a general problem.
> Fixing the general problem will be a lot more work, but I think you could make it manageable by doing it in two stages.
>
> The neat thing about i1 is that any instruction that takes an i1 and produces a non-i1 can be replaced with a select: zext, sext, uitofp, sitofp, inttoptr. Audit instcombine to make sure that each of those with an i1 operand produces a select. At the same time, make sure that the actual x86 assembly produced for the construct with select is as efficient as the code produced without. That means that the first step in instcombining your example will be to take:
>     (and (sub 0, (zext i1 A to iN)), B)
> and produce:
>     (and (sub 0, (select i1 A, iN 1, iN 0)), B) which is progress, and the first stage is patch just for this part across the five instructions I mentioned.
>
> The second part is making sure that 'sub 0 (select X, C1, C2)' becomes 'select X, -C1, -C2', not just for sub, but for all sorts of instructions with one operand being a select between two constants:
>     binary operators (other operand is constant): add, fadd, sub, fsub, mul, fmul, udiv, sdiv, fdiv, urem, srem, frem, shl, lshr, ashr, and, or, xor
>     conversion operators: trunc, zext, sext, fptrunc, fpext, fptoui, fptosi, uitofp, sitofp, ptrtoint, inttoptr, bitcast.
> Again, the time consuming part isn't necessarily adding these optimizations to instcombine, but making sure that codegen produces assembly which isn't any worse than it does now.
>
> I realize that's a lot of instruction combinations to look through, but the good news is that the code is nearly the same for each one. If you want to start by sending in a patch that does two transforms:
>     zext i1 X to iN -->  select i1 X, iN 1, iN 0
>     sub C1, (select X, C2, C3) -->  select X, C2-C1, C3-C1 I'd be happy to review that.
>
> Nick
>
>> Note: This is my first patch submission to this list and although I am trying to follow protocol as best as I can, if I am missing something, please let me know.
>
>