[PATCH] D24745: TLI: Add option to generate dependent stores in scalarization.

Mon Sep 26 12:56:26 PDT 2016

jvesely added a comment.

In https://reviews.llvm.org/D24745#552401, @hfinkel wrote:

> I don't understand why you need this (even if you end up lowering the individual stores using a RMW sequence). Can you please explain?

The problem arises when we use an RMW sequence for two different elements of the same word, for example when storing bytes at addresses A and A+1. Assume that A is 4-byte aligned. The generated code will look like this:
1: r1 = LOAD A
2: r2 = {r1[8:31], x} // This is a sequence of AND/OR instructions that masks off the old bits and ORs in the new ones
3: STORE A, r2
4: r3 = LOAD A
5: r4 = {r3[16:31], y, r3[0:7]} // This is a sequence of AND/OR instructions that masks off the old bits and ORs in the new ones
6: STORE A, r4

The original code has no dependency between 3 and 4, so 1 and 4 are loads from the same location and get merged into a single load:
1: r1 = LOAD A
2: r2 = {r1[8:31], x} // This is a sequence of AND/OR instructions that masks off the old bits and ORs in the new ones
3: STORE A, r2
5: r4 = {r1[16:31], y, r1[0:7]} // This is a sequence of AND/OR instructions that masks off the old bits and ORs in the new ones
6: STORE A, r4
Note that sequences 2,3 and 5,6 are now independent, so the writes can occur in any order. Whichever store is performed last overwrites the other byte with its stale value from r1, which results in data corruption at A.
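
The corruption can be reproduced with a small simulation. This is only a sketch, not LLVM code: the memory word at A is modeled as a Python integer, the byte at A is assumed to occupy bits 0:7 (little-endian layout), and the helper names `insert_byte` and `merged_load_stores` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def insert_byte(word, index, value):
    """Return `word` with byte `index` (0 = bits 0:7) replaced by `value`."""
    shift = 8 * index
    cleared = word & ~(0xFF << shift) & MASK32   # AND: mask off the old bits
    return cleared | ((value & 0xFF) << shift)   # OR: insert the new bits

def merged_load_stores(initial, x, y, first_store):
    """Simulate steps 1, 2, 3, 5, 6 after loads 1 and 4 were merged.

    `first_store` is 3 or 6 and selects which store reaches memory first,
    since nothing orders the two stores relative to each other.
    """
    mem = initial
    r1 = mem                      # 1: the only remaining load
    r2 = insert_byte(r1, 0, x)    # 2
    r4 = insert_byte(r1, 1, y)    # 5: built from the stale r1, not from r2
    order = (r2, r4) if first_store == 3 else (r4, r2)
    for value in order:
        mem = value               # 3 / 6: STORE A
    return mem
```

With an initial word of 0xAABBCCDD, x = 0x11 and y = 0x22, the intended result is 0xAABB2211. The simulation instead yields 0xAABB22DD when store 3 goes first (x is lost) and 0xAABBCC11 when store 6 goes first (y is lost).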

This patch adds a chain dependency between 3 and 4, preventing the load elimination. It should still be possible to eliminate both 3 and 4 (which in turn should enable combining of the bit ops in 2 and 5).
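
For comparison, here is a sketch of the ordered sequence and of the form that could remain once 3 and 4 are eliminated. As before, this is a model rather than LLVM code: the byte at A is assumed to occupy bits 0:7, the names `chained_stores` and `combined_store` are hypothetical, and the combined form is only my reading of what eliminating 3 and 4 would leave behind.

```python
MASK32 = 0xFFFFFFFF

def insert_byte(word, index, value):
    """Return `word` with byte `index` (0 = bits 0:7) replaced by `value`."""
    shift = 8 * index
    cleared = word & ~(0xFF << shift) & MASK32   # AND: mask off the old bits
    return cleared | ((value & 0xFF) << shift)   # OR: insert the new bits

def chained_stores(initial, x, y):
    """Simulate steps 1-6 with load 4 ordered after store 3."""
    mem = initial
    r1 = mem                      # 1: LOAD A
    r2 = insert_byte(r1, 0, x)    # 2
    mem = r2                      # 3: STORE A, r2
    r3 = mem                      # 4: LOAD A, now sees the byte written by 3
    r4 = insert_byte(r3, 1, y)    # 5
    mem = r4                      # 6: STORE A, r4
    return mem

def combined_store(initial, x, y):
    """Assumed result of dropping 3 and 4: one load, merged bit ops, one store."""
    r1 = initial                                      # 1: LOAD A
    return insert_byte(insert_byte(r1, 0, x), 1, y)   # 2 + 5, then 6: STORE A
```

For an initial word of 0xAABBCCDD, x = 0x11 and y = 0x22, both functions return 0xAABB2211, which is the intended result.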

Repository:
  rL LLVM

https://reviews.llvm.org/D24745