[llvm] [DSE] Split memory intrinsics if they are dead in the middle (PR #75478)

Wed Dec 20 12:24:46 PST 2023

bjope wrote:

Thanks a lot for looking into this!

Not sure how much time I have to look at this before the end of the year. But I've downloaded the current patch to run some early testing.

A few things that I noticed already:
- The existing helper for trimming the start (tryToShorten) involves some logic for atomic operations. I wonder if that is needed here as well. I'm a bit unsure, since the helper for trimming the end (tryToShortenEnd) doesn't have any extra logic for atomics. So maybe it isn't needed, but it is perhaps something to look into unless you've already figured out that it isn't.
- The heuristics for when to apply the transform definitely need to be adjusted for my target (I don't think that getMaxMemIntrinsicInlineSizeThreshold() will be enough). We can, for example, memset/memcpy 1/2/4/8 bytes cheaply (a single store), while 7 bytes would require three stores. So even if the "frontsize" can be reduced to something that is smaller than eight and not a power of two, it would still be profitable for our target to memset/memcpy 8 bytes. The same goes for the "rearsize". So depending on the target it might be important to keep not only the start addresses of the two parts aligned, but sometimes also the number of bytes to write.
- I find the (DeadSize >= Threshold) requirement a bit restrictive (at least for us). One example would be a memset writing X+Y+X bytes, with the Y bytes in the middle being dead. If, for example, X=8 and Y=1, then doing the split would result in two 8-byte stores, while a memset of 17 bytes would require 3 stores (in this example I assume that the target can do stores with align=1). So even if we would normally inline such a memset, it would be profitable to split it up and skip writing the middle Y bytes, since the front/rear parts are cheaper when their sizes are well-aligned.
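
To make the store counts in the last two points concrete, here is a rough sketch of the arithmetic. It is not code from the patch or from DSE. The names countStores and splitSavesStores are hypothetical, and the cost model is an assumption: the target can do power-of-two stores of up to 8 bytes at any alignment, and stores do not overlap.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cost model (an assumption, not an LLVM API): the number of
// non-overlapping stores needed to write `bytes` bytes when the target can
// store any power-of-two width up to `maxStore` bytes with align=1.
constexpr unsigned countStores(uint64_t bytes, uint64_t maxStore = 8) {
  assert(maxStore != 0 && (maxStore & (maxStore - 1)) == 0 &&
         "maxStore must be a power of two");
  unsigned stores = 0;
  // Greedily use the widest store first, then halve the width.
  for (uint64_t width = maxStore; width != 0; width /= 2) {
    stores += static_cast<unsigned>(bytes / width);
    bytes %= width;
  }
  return stores;
}

// Hypothetical profitability check: does writing only the front and rear
// parts take fewer stores than writing front + dead middle + rear at once?
constexpr bool splitSavesStores(uint64_t front, uint64_t dead, uint64_t rear) {
  return countStores(front) + countStores(rear) <
         countStores(front + dead + rear);
}

// The cases from the discussion above.
static_assert(countStores(8) == 1, "a single 8-byte store");
static_assert(countStores(7) == 3, "4 + 2 + 1");
static_assert(countStores(17) == 3, "8 + 8 + 1");
static_assert(splitSavesStores(8, 1, 8), "two stores instead of three");
```

Under this model a 7-byte front part costs more than an 8-byte one, which is why shrinking the front or rear size is not always a win, and the X=8, Y=1 case is profitable to split even though DeadSize is only 1.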

I understand that it could be hard to find a heuristic and TTI hooks that work well for any target (and especially for out-of-tree targets like mine). And I don't think that needs to be a goal for the first implementation either. Adding more hooks and conditions could be something that evolves over time, when tuning this for different targets. So some of the above comments are just to highlight some different scenarios that sprang to mind when considering how to tune this for my target.

https://github.com/llvm/llvm-project/pull/75478