[PATCH][X86] Improve the lowering of BITCAST dag nodes from type f64 to type v2i32 (and vice versa).

Tue May 6 09:28:59 PDT 2014

LGTM. 

Thanks Andrea!

On May 6, 2014, at 7:59 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:

> Hi,
> 
> The goal of this patch is to simplify the bitconvert from type
> MVT::f64 to type MVT::v2i32 (and vice versa).
> 
> When legalizing an ISD::BITCAST dag node from MVT::f64 to MVT::v2i32,
> we now produce a cheaper SCALAR_TO_VECTOR (to a vector of type v2f64)
> followed by a 'free' bitcast to v4i32. The elements of the resulting
> v4i32 are then extracted to eventually build the resulting v2i32
> vector. This is cheaper than introducing a store+load sequence to
> convert the input operand from type f64 to i64.
> 
> Before this patch, during type legalization, the f64 operand of an
> ISD::BITCAST dag node that performs a bitconvert from type MVT::f64 to
> type MVT::v2i32 was initially converted into an i64. The resulting i64
> was then used to build a vector of type v2i64.
> The backend introduces a new v2i64 vector because value type
> MVT::v2i32 is illegal and requires promotion to the next legal vector
> type with the same number of elements (in this case, type MVT::v2i64).
> The conversion from f64 to i64 was done by storing the value to a stack
> location and then loading it back from that same location as an i64.
> 
> This patch is beneficial, for example, in the following case:
> 
> define double @test(double %A) {
>  %1 = bitcast double %A to <2 x i32>
>  %add = add <2 x i32> %1, <i32 3, i32 5>
>  %2 = bitcast <2 x i32> %add to double
>  ret double %2
> }
> 
> Before this patch, we produced:
>   movsd %xmm0, -8(%rsp)
>   movq -8(%rsp), %xmm0
>   pshufd $16, %xmm0, %xmm0
>   paddq .LCPI0_0(%rip), %xmm0
>   pshufd $8, %xmm0, %xmm0
>   movq %xmm0, -16(%rsp)
>   movsd -16(%rsp), %xmm0
>   retq
> 
> With this patch, we produce the much cleaner sequence:
>   pshufd $16, %xmm0, %xmm0
>   paddq .LCPI0_0(%rip), %xmm0
>   pshufd $8, %xmm0, %xmm0
> 
> 
> Function @t4 from test 'ret-mmx.ll' is another example of a function
> that is greatly simplified by this transformation. Before, we produced
> a long sequence of 8 instructions (for @t4). Now the entire function
> is optimized into a single 'movsd' instruction.
> 
> Going back to function @test from the example,
> with this patch we would produce a sequence of pshufd+paddq+pshufd.
> Ideally, we should be able to fold that entire sequence into a single paddd.
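
As a sanity check on that claim, the following Python sketch models the three instructions on a register of four 32-bit lanes and compares the low 64 bits against a single paddd. The function names mirror the mnemonics but are just illustrative models. The constant-pool contents are an assumption: .LCPI0_0 is taken to be <3, 5> promoted to <2 x i64>, and the paddd constant is taken to be <3, 5, 0, 0>.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def pshufd(imm, src):
    # Destination lane i takes the source lane selected by bits [2i+1:2i] of imm.
    return [src[(imm >> (2 * i)) & 3] for i in range(4)]

def paddq(a, b):
    # Add two registers as 2 x i64; lanes are stored as 4 x i32, low half first.
    out = []
    for i in (0, 2):
        qa = a[i] | (a[i + 1] << 32)
        qb = b[i] | (b[i + 1] << 32)
        s = (qa + qb) & MASK64
        out += [s & MASK32, s >> 32]
    return out

def paddd(a, b):
    # Add two registers as 4 x i32 with wraparound.
    return [(x + y) & MASK32 for x, y in zip(a, b)]

def patched_sequence(x):
    # pshufd $16, paddq <3, 5> (assumed constant), pshufd $8.
    t = pshufd(16, x)
    t = paddq(t, [3, 0, 5, 0])
    return pshufd(8, t)

def single_paddd(x):
    # The hoped-for folded form, with an assumed constant of <3, 5, 0, 0>.
    return paddd(x, [3, 5, 0, 0])
```

Only lanes 0 and 1 are compared, since the f64 result lives in the low 64 bits of the register. The carry out of the low half of each paddq lane only reaches the upper 32 bits, which the second pshufd discards, so wraparound cases agree as well.
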
> 
> Another patch will follow that improves the dagcombiner to spot
> sequences of shuffle+binop+shuffle which can be safely folded into a
> single binop.
> 
> Please let me know if this is ok to submit.
> 
> Thanks,
> Andrea Di Biagio
> SN Systems - Sony Computer Entertainment Group.
> <patch-lower-bitcast.diff>