[llvm-dev] InstCombine wrongful (?) optimization on BinOp with SameOperands

Mon Oct 26 12:05:15 PDT 2015

----- Original Message -----
From: "Hal Finkel" <hfinkel at anl.gov>
To: "Nicolas Brunie" <nicolas.brunie at kalray.eu>
Cc: llvm-dev at lists.llvm.org
Sent: Monday, October 26, 2015 18:40:54
Subject: Re: [llvm-dev] InstCombine wrongful (?) optimization on BinOp with SameOperands

----- Original Message -----
> From: "Nicolas Brunie via llvm-dev" <llvm-dev at lists.llvm.org>
> To: llvm-dev at lists.llvm.org
> Sent: Wednesday, September 30, 2015 1:01:52 AM
> Subject: [llvm-dev] InstCombine wrongful (?) optimization on BinOp with SameOperands
> 
> 
> Hi all,
> I have been looking at the way LLVM optimizes code before forwarding
> it to the backend I develop for my company and while building
> define i32 @test_extract_subreg_func(i32 %x, i32 %y) #0 {
> entry:
> %conv = zext i32 %x to i64
> %conv1 = zext i32 %y to i64
> %mul = mul nuw i64 %conv1, %conv
> %shr = lshr i64 %mul, 32
> %xor = xor i64 %shr, %mul
> %conv2 = trunc i64 %xor to i32
> ret i32 %conv2
> }
> 
> I came upon the following optimization (during instcombine):
> IC: Visiting: %mul = mul nuw i64 %conv, %conv1
> IC: Visiting: %shr = lshr i64 %mul, 32
> IC: Visiting: %conv2 = trunc i64 %shr to i32
> IC: Visiting: %conv3 = trunc i64 %mul to i32
> IC: Visiting: %xor = xor i32 %conv3, %conv2
> IC: ADD: %xor6 = xor i64 %mul, %shr
> IC: Old = %xor = xor i32 %conv3, %conv2
> New = <badref> = trunc i64 %xor6 to i32
> 
> which seems to be performed by SDValue
> DAGCombiner::SimplifyBinOpWithSameOpcodeHands(SDNode *N)

You might have figured this out by now, but no, InstCombine and DAGCombine are two completely different pieces of code. One is driven by the code in lib/Transforms/InstCombine/* and the other in lib/CodeGen/SelectionDAG/DAGCombiner.cpp. InstCombine's job is to move the IR toward our chosen canonical form, which is designed to simplify operations in a way that exposes further optimization opportunities (as well as being generally beneficial). It does not take target costs into account.
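
For illustration only, the rewrite shown in the instcombine log above (two truncs feeding an i32 xor, replaced by one i64 xor followed by a single trunc) can be modeled on plain integers. This is a minimal sketch, not InstCombine's code: the helper names (trunc32, old_form, new_form) are hypothetical, and Python integers are masked by hand to imitate i32/i64.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def trunc32(v):
    """Model `trunc i64 %v to i32`: keep only the low 32 bits."""
    return v & MASK32


def old_form(x, y):
    """xor i32 (trunc %mul), (trunc %shr): two truncs, narrow xor."""
    mul = (x * y) & MASK64   # operands are zero-extended 32-bit values
    shr = mul >> 32          # lshr i64 %mul, 32
    return trunc32(mul) ^ trunc32(shr)


def new_form(x, y):
    """trunc i64 (xor i64 %mul, %shr) to i32: wide xor, one trunc."""
    mul = (x * y) & MASK64
    shr = mul >> 32
    return trunc32(mul ^ shr)
```

Both forms return the same value for any pair of 32-bit inputs, because xor works bit by bit and truncation only drops high bits. The new form has one fewer instruction, which is presumably why it is the preferred canonical form, independent of what a wide xor costs on a given target.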

Yes indeed, I went through the LLVM sources and discovered my mistake. But your explanation helps my understanding, thank you.

> 
> In my backend's architecture truncate is free, but zext is not (and
> i64 is not a desirable type for xor or any binary operation in
> general),

Why, then, have you listed i64 as a legal type?

Because for operations such as mul, add, and in fact xor ... the target does support i64; it is just more costly than i32: the target is a VLIW which can do two 32b adds or a single 64b one each cycle.
So when possible I would like LLVM to forward the information it gathers about how a result is used: i.e. if the 32 MSBs of an i64 result are not used, it would be better to perform only the 32b operation, and to apply this optimization recursively through the 64b DAG until reaching a node whose full 64 bits are actually required.
  It may well be that I did not describe my target correctly to LLVM and thus the 64b DAG is not simplified to 32b. I was under the impression that I should declare i32 as a "preferred" type for these operations and i64 as legal, because I do not want i64 operations to be legalized/expanded, just simplified (but maybe this is the point of the "legal" declaration).
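
To make the narrowing idea concrete, here is a small sketch of the arithmetic it relies on, assuming that only the low 32 bits of a 64-bit result are consumed. The function names are hypothetical and integers are masked by hand to imitate i32/i64; this is not how LLVM implements such a transformation.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def add64_then_trunc(a, b):
    """64-bit add whose result is truncated to 32 bits."""
    return ((a + b) & MASK64) & MASK32


def trunc_then_add32(a, b):
    """Same value computed with a 32-bit add on truncated operands."""
    return ((a & MASK32) + (b & MASK32)) & MASK32


def high_half(a):
    """trunc(lshr i64 a, 32): reads the upper 32 bits of its operand."""
    return (a >> 32) & MASK32
```

For add (and likewise xor and mul), the low 32 bits of the result depend only on the low 32 bits of the operands, so the two add functions always agree and the 64b operation can be replaced by a 32b one. A right shift by 32 is different: it moves the upper half into the low half, so its operand is needed in full. In the function at the top of this thread, %shr reads the upper half of %mul, so that multiplication cannot be narrowed in this way.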

Thanks a lot for digging up this thread, and for the info.

Regards,
Nicolas