[PATCH][X86] Improve the lowering of BITCAST dag nodes from type f64 to type v2i32 (and vice versa).
Nadav Rotem
nrotem at apple.com
Tue May 6 09:28:59 PDT 2014
LGTM.
Thanks Andrea!
On May 6, 2014, at 7:59 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
> Hi,
>
> The goal of this patch is to simplify the bitconvert from type
> MVT::f64 to type MVT::v2i32 (and vice versa).
>
> When legalizing an ISD::BITCAST dag node from MVT::f64 to MVT::v2i32,
> we now produce a cheaper SCALAR_TO_VECTOR (to a vector of type v2f64)
> followed by a 'free' bitcast to v4i32. The elements of the resulting
> v4i32 are then extracted to eventually build the resulting v2i32
> vector. This is cheaper than introducing a store+load sequence to
> convert the input operand from type f64 to i64.
>
> Without this patch, during type legalization, the f64 operand of an
> ISD::BITCAST dag node that performs a bitconvert from type MVT::f64 to
> type MVT::v2i32 is initially converted into an i64. Then the resulting
> i64 is used to build a vector of type v2i64.
> The backend introduces a new v2i64 vector because value type
> MVT::v2i32 is illegal and requires promotion to the next legal vector
> type with the same number of elements (in this case, type MVT::v2i64).
> The conversion from f64 to i64 is done by storing the value to a stack
> location and then loading the value from that same location as an i64.
>
> This patch is beneficial for example in the following case:
>
> define double @test(double %A) {
>   %1 = bitcast double %A to <2 x i32>
>   %add = add <2 x i32> %1, <i32 3, i32 5>
>   %2 = bitcast <2 x i32> %add to double
>   ret double %2
> }
>
> Before we produced:
> movsd %xmm0, -8(%rsp)
> movq -8(%rsp), %xmm0
> pshufd $16, %xmm0, %xmm0
> paddq .LCPI0_0(%rip), %xmm0
> pshufd $8, %xmm0, %xmm0
> movq %xmm0, -16(%rsp)
> movsd -16(%rsp), %xmm0
> retq
>
> With this patch we produce a much cleaner sequence:
> pshufd $16, %xmm0, %xmm0
> paddq .LCPI0_0(%rip), %xmm0
> pshufd $8, %xmm0, %xmm0
>
>
> Function @t4 from test 'ret-mmx.ll' is another example of a function
> that is greatly simplified by this transformation. Before, we produced
> a long sequence of 8 instructions for @t4. Now the entire function
> is optimized into a single 'movsd' instruction.
>
> Back to function @test from the example,
> with this patch we would produce a sequence of pshufd+paddq+pshufd.
> Ideally we should be able to fold that entire sequence into a single paddd.
>
> Another patch will follow that improves the dagcombiner to spot
> sequences of shuffle+binop+shuffle which can be safely folded into a
> single binop.
>
> Please let me know if this is ok to submit.
>
> Thanks,
> Andrea Di Biagio
> SN Systems - Sony Computer Entertainment Group.
> <patch-lower-bitcast.diff>
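For readers who want to check the claim that the pshufd+paddq+pshufd sequence in the quoted message could be folded into a single paddd, here is a small portable C++ model of those instructions. It is only a sketch and is not code from the patch: the helper names (V4i32, pshufd, paddq, paddd, viaShuffles, viaPaddd, checkEquivalence) are made up for illustration, the constant-pool entry .LCPI0_0 is assumed to hold the two 64-bit values 3 and 5, and the upper 64 bits of %xmm0 are simply zeroed on entry.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of a 128-bit XMM register as four 32-bit lanes.
using V4i32 = std::array<std::uint32_t, 4>;

// Model of PSHUFD: destination lane i takes source lane (imm >> 2*i) & 3.
V4i32 pshufd(const V4i32 &src, unsigned imm) {
  V4i32 dst{};
  for (unsigned i = 0; i < 4; ++i)
    dst[i] = src[(imm >> (2 * i)) & 3];
  return dst;
}

// Model of PADDQ: two independent 64-bit wrapping adds.
V4i32 paddq(const V4i32 &a, const std::array<std::uint64_t, 2> &b) {
  V4i32 dst{};
  for (unsigned i = 0; i < 2; ++i) {
    std::uint64_t lane = (std::uint64_t(a[2 * i + 1]) << 32) | a[2 * i];
    lane += b[i];
    dst[2 * i] = std::uint32_t(lane);
    dst[2 * i + 1] = std::uint32_t(lane >> 32);
  }
  return dst;
}

// Model of PADDD: four independent 32-bit wrapping adds.
V4i32 paddd(const V4i32 &a, const V4i32 &b) {
  V4i32 dst{};
  for (unsigned i = 0; i < 4; ++i)
    dst[i] = a[i] + b[i];
  return dst;
}

// Bit pattern of a double, so that NaN results can still be compared.
std::uint64_t bits(double d) {
  std::uint64_t u;
  std::memcpy(&u, &d, sizeof u);
  return u;
}

double fromBits(std::uint64_t u) {
  double d;
  std::memcpy(&d, &u, sizeof d);
  return d;
}

// The sequence quoted in the message: pshufd $16, paddq, pshufd $8.
// Assumption: .LCPI0_0 is the v2i64 constant <3, 5>.
double viaShuffles(double in) {
  V4i32 x{};                             // upper lanes zeroed in this model
  std::memcpy(x.data(), &in, sizeof in); // f64 sits in the low 64 bits
  x = pshufd(x, 16);                     // spread the two i32s into i64 lanes
  x = paddq(x, {3, 5});
  x = pshufd(x, 8);                      // pack the low halves back together
  double out;
  std::memcpy(&out, x.data(), sizeof out);
  return out;
}

// The folded form: a single 32-bit lane-wise add.
double viaPaddd(double in) {
  V4i32 x{};
  std::memcpy(x.data(), &in, sizeof in);
  x = paddd(x, {3, 5, 0, 0});
  double out;
  std::memcpy(&out, x.data(), sizeof out);
  return out;
}

// Compares both forms on a few inputs, including one whose first 32-bit
// element wraps around when 3 is added.
void checkEquivalence() {
  const double inputs[] = {0.0, 1.0, -2.5, fromBits(0x3FF00000FFFFFFFFULL)};
  for (double d : inputs)
    assert(bits(viaShuffles(d)) == bits(viaPaddd(d)));
}
```

Calling checkEquivalence() from a main function exercises both paths. In this model the two forms agree because any carry out of the low 32 bits of a paddq lane lands in the upper half of that lane, which the final pshufd $8 discards.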