[PATCH] D18546: Prevent X86IselLowering from merging volatile loads

Sanjay Patel via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 30 08:42:34 PDT 2016


spatel added a subscriber: hfinkel.
spatel added a comment.



In http://reviews.llvm.org/D18546#386892, @jyknight wrote:

> With the new code, they do: all 4 loads can be determined to not alias, and thus they are all given the same incoming Chain (EntryToken). This is correct. Then, isConsecutiveLoad says yes.


In http://reviews.llvm.org/D18546#386817, @niravd wrote:

> This is not caught by the current tests, but if you apply http://reviews.llvm.org/D18062 and http://reviews.llvm.org/D14834, which clean up the merging of consecutive stores (and will hopefully be committed soon), merge-consecutive-loads-128.ll will produce wrong code. Specifically, in "merge_4f32_f32_2345_volatile" there are 4 consecutive loads, and the first one is volatile, which should prevent merging them; but they are merged into a single volatile load anyway (this happens in the legalize-types phase of SelectionDAGISel, after dag-combine1).


First, thanks to Nirav and James for working on this.

When I looked at how volatile accesses interact with load merging, it appeared to me that a volatile access was always guaranteed to have a different chain than a non-volatile access. Therefore, because SelectionDAG::isConsecutiveLoad() bails out when the chains are not equal, I thought we were safe. James's comment clears up that misconception.
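To make that concrete, here is a standalone toy model of my reading of the before/after chaining (all names here are my own, not LLVM's):

  // Toy model: under the old chaining, a volatile load got its own
  // chain token, so a chain-equality guard rejected the pair. With
  // alias-based chaining, every load proven not to alias hangs off
  // EntryToken, so the guard no longer distinguishes the volatile one.
  #include <cassert>

  enum Chain { EntryToken, SeparateToken };

  struct Load {
    Chain C;          // stand-in for the chain operand (an SDValue)
    bool IsVolatile;
  };

  static bool sameChain(const Load &A, const Load &B) { return A.C == B.C; }

  int main() {
    Load Plain{EntryToken, false};

    // Old scheme: the volatile load is chained separately.
    Load OldVolatile{SeparateToken, true};
    assert(!sameChain(OldVolatile, Plain)); // guard fires, no merge

    // New scheme: the no-alias volatile load also gets EntryToken.
    Load NewVolatile{EntryToken, true};
    assert(sameChain(NewVolatile, Plain)); // guard is vacuous now
    return 0;
  }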

But the test you referenced raises a question about volatile semantics (at least for me, since I don't have much experience here). Here's the test case for reference:

  define <4 x float> @merge_4f32_f32_2345_volatile(float* %ptr) nounwind uwtable noinline ssp {
    %ptr0 = getelementptr inbounds float, float* %ptr, i64 2
    %ptr1 = getelementptr inbounds float, float* %ptr, i64 3
    %ptr2 = getelementptr inbounds float, float* %ptr, i64 4
    %ptr3 = getelementptr inbounds float, float* %ptr, i64 5
    %val0 = load volatile float, float* %ptr0                    <--- DANGER!
    %val1 = load float, float* %ptr1
    %val2 = load float, float* %ptr2
    %val3 = load float, float* %ptr3
    %res0 = insertelement <4 x float> undef, float %val0, i32 0
    %res1 = insertelement <4 x float> %res0, float %val1, i32 1
    %res2 = insertelement <4 x float> %res1, float %val2, i32 2
    %res3 = insertelement <4 x float> %res2, float %val3, i32 3
    ret <4 x float> %res3
  }
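
For readers who prefer source-level code, this is roughly equivalent to the following (my paraphrase using the Clang/GCC vector extension; it is not part of the review):

  // Rough C++ equivalent of the IR test above (my paraphrase).
  typedef float v4sf __attribute__((vector_size(16)));

  v4sf merge_4f32_f32_2345_volatile(float *ptr) {
    v4sf res = {0, 0, 0, 0};                // the IR uses undef here
    res[0] = *(volatile float *)(ptr + 2);  // must remain one access
    res[1] = ptr[3];
    res[2] = ptr[4];
    res[3] = ptr[5];
    return res;
  }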

And here's the resulting codegen after applying http://reviews.llvm.org/D18062 and http://reviews.llvm.org/D14834:

  vmovss	8(%rdi), %xmm0       <--- this is the load volatile float specified by the IR
  vmovups	8(%rdi), %xmm0       <--- this is a 16-byte merged load that includes the volatile region

We still have the volatile access specified by the IR. But the merged 16-byte vmovups reads bytes 8 through 23 off %rdi, so it covers the volatile float at offset 8 a second time; I'm guessing that's wrong because the volatile region has now been accessed twice. The LangRef seems vague to me about this:
http://llvm.org/docs/LangRef.html#volatile-memory-accesses

Assuming the extra load of the volatile region is wrong, I would prefer that the check for volatility go in isConsecutiveLoad(), triggered by an optional bool param. If we only check volatility in EltsFromConsecutiveLoads(), I'm concerned that other targets may have already copied and modified that code and wouldn't get the fix.
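
Here is a minimal standalone sketch of the shape I have in mind (the parameter name, the stand-in Load type, and the offset bookkeeping are my inventions; the real SelectionDAG::isConsecutiveLoad() operates on SDNodes):

  // Sketch only: stand-in types and names, not LLVM's actual API.
  struct Load {
    const void *Chain; // stand-in for the chain operand
    long Offset;       // byte offset from a common base pointer
    bool IsVolatile;
  };

  // True if LD begins exactly Dist * Bytes past Base on the same chain.
  // With CheckVolatile set, any volatile access disqualifies the pair,
  // so callers like EltsFromConsecutiveLoads would never merge across
  // a volatile load.
  bool isConsecutiveLoad(const Load &LD, const Load &Base, long Bytes,
                         long Dist, bool CheckVolatile = false) {
    if (CheckVolatile && (LD.IsVolatile || Base.IsVolatile))
      return false;
    if (LD.Chain != Base.Chain)
      return false;
    return LD.Offset == Base.Offset + Dist * Bytes;
  }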


http://reviews.llvm.org/D18546