[PATCH] D83626: [AMDGPU/MemOpsCluster] Guard new mem ops clustering heuristic logic by a flag

Mon Jul 13 11:55:53 PDT 2020

hsmhsm marked an inline comment as done.
hsmhsm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:562
+    }
+    return NumLoads <= MaxNumLoads;
   }
----------------
rampitec wrote:
> You have problem with extremely wide loads. I am not sure what was in the regression case, but probably something like 8 longs or so. Isn't it better to tweak it instead and just clamp based on the NumBytes as it supposed to be? You are saying you are checking NumBytes, but the return is solely based on NumLoads.
Let's consder your example as you replied to my email:

Stas' Reply:

```
Now say you have 4 incoming loads of <8 x i64>. According to your logic: this is 8*64*4/8 = 256 bytes, MaxNumLoads = 4. You would say it is OK to cluster, because 4 <= 4 == true.
Now say you have 8 loads of <8 x i32>. It is the same 256 bytes, but you inhibit clustering because 8 <= 4 == false.

It is not really based on the number of bytes, it is based on the number of loads.
```

As per my understanding, the crux of the logic is this - When the clustered bytes is even large but it is shared by very less number of load instructions, then consider them for clustering. It is where the real usage of clustered bytes plays a role. 

>From above sence,  your first example is ok for the heuristic, but not the second example.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83626/new/

https://reviews.llvm.org/D83626