[PATCH] D22840: Split the store of an int value merged from a pair of smaller values into multiple stores.
Wei Mi via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 26 22:36:28 PDT 2016
wmi added inline comments.
================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:12208
@@ +12207,3 @@
+/// are bundled together as an i64 value before being stored into memory.
+/// Quite often it is more efficent to generate separate stores for F and I,
+/// which can remove the bitwise instructions or sink them to colder places.
----------------
chandlerc wrote:
> I would just say "Sometimes" here to avoid people defaulting a new target into splitting the stores without measuring.
Fixed.
================
Comment at: lib/Target/X86/X86ISelLowering.h:769-771
@@ +768,5 @@
+ SDValue Hi) const override {
+ // If the pair to store is a mixture of float and int values, we will
+ // save two bitwise instructions and one float-to-int instruction and
+ // increase one store instruction. It is more likely a win.
+ if (Lo.getOpcode() == ISD::BITCAST || Hi.getOpcode() == ISD::BITCAST) {
----------------
chandlerc wrote:
> It's probably also worth noting that beyond the instruction count difference, there is potentially a more significant benefit because it avoids the float->int domain switch for input value. I have a suspicion that is a significant part of the savings here.
Comment added
================
Comment at: lib/Target/X86/X86ISelLowering.h:778-780
@@ +777,5 @@
+ }
+ // If the pair only contains int values, we will save two bitwise
+ // instructions and increase one store instruction. We leave the case
+ // out for now and wait until we find a case showing it is beneficial.
+ return false;
----------------
chandlerc wrote:
> And here it is probably worth mentioning the other upside of only doing a single memory operation in addition to having minimal instruction count overhead.
I am not sure here. The added memory operation is a store. Store usually will not stall the pipeline because there is store buffer, so I feel the cost of the extra store will be low?
================
Comment at: test/Transforms/InstCombine/split-store.ll:16-31
@@ +15,18 @@
+; CHECK: leaq (%rsp), %rdi
+define void @int32_float_pair(i32 %tmp1, float %tmp2) local_unnamed_addr {
+entry:
+ %ref.tmp = alloca i64, align 8
+ %tmpcast = bitcast i64* %ref.tmp to %"pair1"*
+ %t0 = bitcast i64* %ref.tmp to i8*
+ call void @llvm.lifetime.start(i64 8, i8* %t0)
+ %t1 = bitcast float %tmp2 to i32
+ %retval.sroa.2.0.insert.ext.i = zext i32 %t1 to i64
+ %retval.sroa.2.0.insert.shift.i = shl nuw i64 %retval.sroa.2.0.insert.ext.i, 32
+ %retval.sroa.0.0.insert.ext.i = zext i32 %tmp1 to i64
+ %retval.sroa.0.0.insert.insert.i = or i64 %retval.sroa.2.0.insert.shift.i, %retval.sroa.0.0.insert.ext.i
+ store i64 %retval.sroa.0.0.insert.insert.i, i64* %ref.tmp, align 8
+ call void @goo1(%"pair1"* dereferenceable(8) %tmpcast)
+ call void @llvm.lifetime.end(i64 8, i8* %t0)
+ ret void
+}
+
----------------
chandlerc wrote:
> It would be good to minimize this test case rather than keeping it so close to what happens to come out of Clang for a particular C++ input.
>
> I'd just write direct tests in LLVM IR. Something like:
>
> define void @test1(i32 %i, float %f, i64* %ptr) {
> entry:
> ... = zext ...
> ... = bitcast ...
> ... = zext ...
> ... = shl ...
> ... = or ...
> store ..., i64* %ptr
> ret void
> }
>
>
That makes the test much shorter!
Repository:
rL LLVM
https://reviews.llvm.org/D22840
More information about the llvm-commits
mailing list