[PATCH] D22840: [InstCombine] Split int64 store into separate int32 stores

Tue Jul 26 21:30:00 PDT 2016

chandlerc added inline comments.

================
Comment at: lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp:1287-1292
@@ +1286,8 @@
+/// Instruction sequence of splitted stores:
+///   %ref.tmp = alloca i64, align 8
+///   %1 = bitcast i64* %ref.tmp to i32*
+///   store i32 %int_tmp, i32* %1, align 4
+///   %2 = getelementptr i32, i32* %1, i64 1
+///   %3 = bitcast i32* %2 to float*
+///   store float %float_tmp, float* %3, align 4
+///
----------------
majnemer wrote:
> Is this more canonical than insertelement into a vector and performing a single store?
> I'm not sure we have a canonicalization for this sort of thing, I'm not sure what the best thing to do is here...
> 
> Chandler, what are your thoughts?
I agree with Matt generally -- the original bug should be handled by *splitting* loads and stores late in the pipeline with target specific knowledge. SDAG makes a lot of sense here as a target-independent in representation but target-specific in optimality transform.

Regarding the higher level question you pose David, I think that an integer is the most canonical thing here. I'll try to explain some of why.

The first question is "is one store more canonical than two stores". IMO, the answer should almost always be "yes". The reason is that splitting loads and stores is so much easier than merging. Later on in the pipeline we may have a very hard time effectively merging memory accesses due to memory model constraints, while earlier we may have more information that proves these things are safe. We have consistently moved in this direction both in LLVM and even in Clang itself.

The second question is "what type is the most canonical type?" I think there are no really good answers to that question. When merging adjacent values (as happens at the ABI level for pairs and structs, and in SROA, and elsewhere) there are a few options: a first-class-aggregate of the adjacent types, a vector of integers where the adjacent types are the same width, or an integer the size of the pair.

We've been moving LLVM away from first-class aggregate loads and stores for a while. In retrospect, I'm no longer convinced this was the correct decision, but reversing it would be a huge undertaking and not very easy to do effectively without crippling optimizations.

We also try to avoid forming novel vector types historically because we were worried about how they would lower. These days, we actually do a pretty fantastic job of this, and so I kind of re-evaluated this. However, there are still ways in which a vector of integer types falls down: non-uniform merges (an i32 next to 4 i8s for example), the fact that it still requires float -> int, and worst of all when the non-uniformity extends to overlapping things like unions.

Because of all of the last set of complications, a wide integer type seems a pretty reasonable way to express "a bag of bits". And expressing the extract in terms of bit math is nice because when the components are integers, we often end up effectively combining across the extract operations to further simplify things.

So I'm pretty happy with large integer types in the middle end for canonicalization. If we have lots of missed middle-end optimizations because of that representation that would be fixed by a different canonical form, we should definitely revisit this, but currently the test cases have largely involved missing lowering logic late in the backend.

Repository:
  rL LLVM

https://reviews.llvm.org/D22840