[LLVMdev] Possible missed optimization? 2.0
Jakob Stoklund Olesen
stoklund at 2pi.dk
Thu Sep 9 13:16:12 PDT 2010
On Sep 9, 2010, at 12:59 PM, Borja Ferrer wrote:
> Hello, I've noticed a new possible missed optimization while testing more trivial code.
> This time it's not with a xor but with a multiplication instruction, and the example is a little bit more involved.
>
> C code:
>
> typedef short t;
> t foo(t a, t b)
> {
> t a4 = a*b;
> return a4;
> }
>
> Argument "a" is passed in R15:R14, argument "b" in R13:R12, and the return value is stored in R15:R14.
> The mul instruction takes two 8-bit registers and returns a 16-bit result in R1:R0. This is handled in the SelectionDAG the same way as on x86 (btw, mul is marked as commutable).
Note that the isCommutable flag is only really useful for two-address instructions. If the two inputs are not constrained, nothing is really won by swapping them.
[...]
> The difference between the two versions is that the second has one instruction fewer and saves a scratch register.
> If we start by multiplying the lower parts of both arguments, instead of mixing upper and lower parts from the start, we can save r8 in the first example and a later move. Notice that the second version stores the result of a.low*b.low directly into R15:R14. I'm unsure if this is related to http://llvm.org/bugs/show_bug.cgi?id=8112
> I've attached a txt file with the regcoalescing output in case it's useful, as requested in the previous emails.
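For readers following along without the elided assembly, here is a rough C sketch of the decomposition being discussed: a 16-bit multiply built from three 8x8 multiplies, with a.low*b.low computed first so that its full 16-bit product can be written straight into the result. The helper names mul8x8 and mul16 are made up for illustration and are not part of the backend or of LLVM; mul8x8 only models an 8-bit multiply instruction that yields a 16-bit result.

```c
#include <stdint.h>

/* Hypothetical model of an 8x8 -> 16-bit hardware multiply
   (the instruction described above returns its result in R1:R0). */
static uint16_t mul8x8(uint8_t x, uint8_t y) {
    return (uint16_t)((uint16_t)x * (uint16_t)y);
}

/* 16-bit truncating multiply from three 8-bit multiplies, low parts first.
   The a.high*b.high term is dropped: it only affects bits 16 and above. */
static uint16_t mul16(uint16_t a, uint16_t b) {
    uint8_t a_lo = (uint8_t)(a & 0xFF), a_hi = (uint8_t)(a >> 8);
    uint8_t b_lo = (uint8_t)(b & 0xFF), b_hi = (uint8_t)(b >> 8);

    /* a.low * b.low contributes all 16 bits, so it can seed the result. */
    uint16_t result = mul8x8(a_lo, b_lo);

    /* Only the low byte of each cross product reaches the high result byte. */
    uint8_t cross = (uint8_t)(mul8x8(a_hi, b_lo) + mul8x8(a_lo, b_hi));

    return (uint16_t)(result + ((uint16_t)cross << 8));
}
```

Under these assumptions, mul16(a, b) should agree with (uint16_t)(a * b) for all 16-bit inputs. The order in which the three products are formed is exactly what instruction selection and scheduling decide.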
I haven't looked closely, but on the surface it doesn't sound like a coalescing issue.
It sounds like you want different scheduling, or even different selection DAGs.
Does the -view-*-dags output look correct?