[PATCH] D83626: [AMDGPU/MemOpsCluster] Guard new mem ops clustering heuristic logic by a flag

Mahesha S via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jul 13 12:26:40 PDT 2020


hsmhsm marked an inline comment as done.
hsmhsm added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:562
+    }
+    return NumLoads <= MaxNumLoads;
   }
----------------
rampitec wrote:
> hsmhsm wrote:
> > hsmhsm wrote:
> > > rampitec wrote:
> > > > You have a problem with extremely wide loads. I am not sure what was in the regression case, but it was probably something like 8 longs or so. Isn't it better to tweak the heuristic instead and just clamp based on NumBytes, as it is supposed to be? You say you are checking NumBytes, but the return is based solely on NumLoads.
> > > Let's consider the example from your reply to my email:
> > > 
> > > Stas' Reply:
> > > 
> > > 
> > > ```
> > > Now say you have 4 incoming loads of <8 x i64>. According to your logic: this is 8*64*4/8 = 256 bytes, MaxNumLoads = 4. You would say it is OK to cluster, because 4 <= 4 == true.
> > > Now say you have 8 loads of <8 x i32>. It is the same 256 bytes, but you inhibit clustering because 8 <= 4 == false.
> > > 
> > > It is not really based on the number of bytes, it is based on the number of loads.
> > > ```
> > > 
> > > 
> > > 
> > > As I understand it, the crux of the logic is this: even when the clustered byte count is large, if it is shared by only a small number of load instructions, those loads are still considered for clustering. That is where the clustered byte count actually plays its role.
> > > 
> > > In that sense, your first example is acceptable to the heuristic, but the second one is not.
> > In any case, I am not sure what you mean by "just clamp based on NumBytes as it is supposed to be". I would welcome your ideas in a bit more detail so that I understand them fully and can experiment with them quickly.
> > 
> > Also, reworking the heuristic probably means re-running all of the must-run performance benchmarks and making sure we are not regressing there, which means the patch will take its own time before it is committed to master.
> I mean that your logic can allow extra-wide loads, which is not desired. I suggest limiting it. Did you look at the IR from the regression?
> 
> And if you are afraid of further regressions and think more testing is needed, just revert the change that led to the regression.
My honest opinion is this: my hands are currently tied with other urgent issues, and I do not have any time left to think about this. So the safe bet for now is simply to continue with the old logic as it is. Since we are not permanently reverting the change, only disabling it, we can come back to it once I get some breathing room and rework it sooner rather than later.
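
For reference, a minimal sketch of the two clamping strategies being debated above. The helper names and the 64-byte and 128-byte thresholds are assumptions chosen for illustration; they are not the identifiers or values used in SIInstrInfo.cpp or in this patch.

```
#include <algorithm>

// Gate purely on the number of loads, with the budget derived from the
// total clustered bytes (hypothetical budget: one load per 64 clustered
// bytes, never fewer than two).  Four <8 x i64> loads (256 bytes) pass
// because 4 <= 4, while eight <8 x i32> loads (also 256 bytes) fail
// because 8 > 4 -- the effective limit is the load count, not the bytes.
static bool clusterByLoadCount(unsigned NumLoads, unsigned NumBytes) {
  unsigned MaxNumLoads = std::max(2u, NumBytes / 64);
  return NumLoads <= MaxNumLoads;
}

// Clamp on the clustered bytes directly, so extremely wide loads are
// rejected no matter how few instructions produce them.
static bool clusterByTotalBytes(unsigned NumLoads, unsigned NumBytes) {
  constexpr unsigned MaxClusterBytes = 128; // illustrative threshold
  return NumLoads > 1 && NumBytes <= MaxClusterBytes;
}
```

With the first strategy the width of an individual load never enters the decision, which is the concern raised above; the second keeps NumBytes itself as the limiting quantity.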


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83626/new/

https://reviews.llvm.org/D83626




