[LLVMdev] Possible missed optimization? 2.0

Thu Sep 9 13:16:12 PDT 2010

On Sep 9, 2010, at 12:59 PM, Borja Ferrer wrote:

> Hello, I've noticed a new possible missed optimization while testing more trivial code.
> This time it's not with a xor but with a multiplication instruction, and the example is a little bit more involved.
> 
> C code:
> 
> typedef short t;
> t foo(t a, t b)
> {
>     t a4 = a*b;
>     return a4;
> }
> 
> Argument "a" is passed in R15:R14, argument "b" in R13:R12, and the return value is stored in R15:R14.
> The mul instruction takes two 8-bit regs and returns a 16-bit result in R1:R0. This is handled in the SelectionDAG the same way as on x86 (btw, mul is marked as commutable).

Note that the isCommutable flag is only really useful for two-address instructions. If the two inputs are not constrained, nothing is really won by swapping them.
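
For readers following along, here is a rough C sketch of the arithmetic involved, assuming the 16-bit multiply is lowered to three 8-bit multiplies as the description above suggests. The names mul8x8 and mul16_via_mul8 are made up for illustration and are not taken from the backend. mul8x8 only stands in for the target's 8-bit x 8-bit -> 16-bit mul.

```c
#include <stdint.h>

/* Hypothetical stand-in for the target's mul: 8-bit x 8-bit -> 16-bit. */
static uint16_t mul8x8(uint8_t x, uint8_t y)
{
    return (uint16_t)((uint16_t)x * (uint16_t)y);
}

/* Low 16 bits of a*b built from three 8-bit multiplies.
 * The a.high*b.high term only affects bits 16 and up, so it is dropped. */
static uint16_t mul16_via_mul8(uint16_t a, uint16_t b)
{
    uint8_t a_lo = (uint8_t)(a & 0xFF), a_hi = (uint8_t)(a >> 8);
    uint8_t b_lo = (uint8_t)(b & 0xFF), b_hi = (uint8_t)(b >> 8);

    /* Start with a.low*b.low so the full 16-bit product can go
     * straight into the result. */
    uint16_t result = mul8x8(a_lo, b_lo);

    /* Only the low byte of each cross product reaches the high result byte. */
    uint8_t cross = (uint8_t)(mul8x8(a_hi, b_lo) + mul8x8(a_lo, b_hi));

    return (uint16_t)(result + ((uint16_t)cross << 8));
}
```

The order in which the three partial products are computed does not change the value, only how long each intermediate has to stay live, which is why this looks more like an ordering question than a question about swapping operands.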

[...]

> The difference between the two versions is that the second has one instruction less and saves a scratch register.
> If we start by multiplying the lower parts of both arguments instead of mixing upper and lower parts from the start, we can save r8 in the first example and a later move. Notice that the second version stores the result of a.low*b.low directly into R15:R14. I'm unsure if this is related to http://llvm.org/bugs/show_bug.cgi?id=8112
> I've attached a txt file with the regcoalescing output in case it's useful, as requested in the previous emails.

I haven't looked closely, but on the surface it doesn't sound like a coalescing issue.

It sounds like you want different scheduling, or even different selection DAGs.

Does the -view-*-dags output look correct?