[PATCH] D123569: [AMDGPU] Try to avoid inserting duplicate s_inst_prefetch

Thu Apr 14 00:09:36 PDT 2022

rampitec added a comment.

In D123569#3450751 <https://reviews.llvm.org/D123569#3450751>, @critson wrote:

> In D123569#3445987 <https://reviews.llvm.org/D123569#3445987>, @rampitec wrote:
>
>> Thanks! Did we ever run any benchmarking on this? I wrote this before actual HW was available.
>
> I was curious, so I did some quick investigation on GFX10.1 (Navi10).
>
> For graphics at the macro scale, I cannot see any performance impact from entirely disabling generation of s_inst_prefetch instructions on our test suite.
>
> Using a micro benchmark, I can see a >20% performance uplift from setting an appropriate mode, and a >20% performance drop from setting an inappropriate mode via s_inst_prefetch.
> So these instructions definitely matter, but it's an open question whether we are using them effectively; at least they don't seem to be hurting performance.
> Additionally, the cost of back-to-back s_inst_prefetch instructions is the same as an s_nop, so I would not expect this patch to change performance; it just saves a few redundant scalar instructions.

Thanks Carl! I would expect this to really matter when a loop falls within a certain size range: not so small that it fits entirely into I$, and not so large that it gets evicted anyway. The impact can be somewhat tricky to measure. Nested loops in particular may expose it. Compute shaders may have a better chance of falling into that range.
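As a rough illustration of the kind of redundancy discussed above, here is a minimal sketch of skipping an s_inst_prefetch when an identical one (same mode operand) is already adjacent to the insertion point. It uses a hypothetical, simplified instruction model (`Inst`, `insertPrefetchIfNeeded` are made-up names), not LLVM's MachineInstr API, and it is not the code from this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified instruction model for illustration only;
// this is NOT LLVM's MachineInstr API.
struct Inst {
  std::string opcode;  // e.g. "s_inst_prefetch", "s_nop", "v_add_f32"
  int64_t imm = 0;     // prefetch mode operand when opcode is s_inst_prefetch
};

// True if I is an s_inst_prefetch with the given mode.
static bool isPrefetchWithMode(const Inst &I, int64_t mode) {
  return I.opcode == "s_inst_prefetch" && I.imm == mode;
}

// Insert "s_inst_prefetch mode" at position pos unless an identical
// prefetch is already directly before or after that position.
// Returns true if an instruction was inserted.
bool insertPrefetchIfNeeded(std::vector<Inst> &block, std::size_t pos,
                            int64_t mode) {
  assert(pos <= block.size() && "insertion point out of range");
  // A back-to-back duplicate does nothing useful; it only costs an
  // issue slot, roughly like an s_nop.
  if (pos > 0 && isPrefetchWithMode(block[pos - 1], mode))
    return false;
  if (pos < block.size() && isPrefetchWithMode(block[pos], mode))
    return false;
  block.insert(block.begin() + static_cast<std::ptrdiff_t>(pos),
               Inst{"s_inst_prefetch", mode});
  return true;
}
```

Checking both neighbors keeps the sketch independent of whether the caller inserts before or after an existing prefetch. A prefetch with a different mode is still inserted, since the sketch makes no assumption about whether overriding the mode is redundant.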

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D123569