[clang] [clang] Better bitfield access units (PR #65742)

Mon Sep 11 04:27:35 PDT 2023

urnathan wrote:

> The advantage of exposing the wide accesses to the optimizer is that it allows memory optimizations, like CSE or DSE, to reason about the entire bitfield as a single unit, and easily eliminate accesses. Reducing the size of bitfield accesses in the frontend is basically throwing away information.

Hm, I've been thinking of it the opposite way round -- merging bitfields is throwing away information (about where cuts might be).  And it's unclear to me how CSE or DSE could make use of a merged access unit to eliminate accesses -- it would seem to me that a merged access unit accessed at a single point would make it look like the whole unit was live? (Can you point me at an example of the analysis you describe happening?)

That the simple x86 example I showed doesn't show the merging being (completely) undone suggests it is hard for CSE and DSE to do the analysis you describe. DSE did work there, to undo the merge, but no dead load elimination happened. And that DSE is merely undoing the gluing that the front end did -- if we didn't glue, we would always get that result.
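To make the distinction concrete, here is a minimal hand-written model of the two strategies. It is a sketch under stated assumptions: the names (`MergedUnit`, `SplitUnits`, `AccessCount`) and the layout of four 8-bit fields are hypothetical, it is not the x86 example referred to above, and it models memory operations with counters rather than showing clang's actual IR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not clang output: four 8-bit fields that could either
// be glued into one 32-bit access unit or kept as four byte-sized units.

// Counts modelled memory operations so the two strategies can be compared.
struct AccessCount {
    unsigned loads = 0;
    unsigned stores = 0;
};

// Merged strategy: all four fields share a single 32-bit access unit.
struct MergedUnit {
    std::uint32_t unit = 0;
    AccessCount count;

    void set(unsigned index, std::uint8_t value) {
        assert(index < 4);
        const unsigned shift = index * 8;
        std::uint32_t word = unit;                 // load the whole unit
        ++count.loads;
        word &= ~(std::uint32_t{0xFF} << shift);   // clear the target field
        word |= std::uint32_t{value} << shift;     // insert the new value
        unit = word;                               // store the whole unit
        ++count.stores;
    }

    std::uint8_t get(unsigned index) {
        assert(index < 4);
        ++count.loads;                             // load the whole unit
        return static_cast<std::uint8_t>(unit >> (index * 8));
    }
};

// Split strategy: each field is its own byte-sized access unit.
struct SplitUnits {
    std::uint8_t field[4] = {0, 0, 0, 0};
    AccessCount count;

    void set(unsigned index, std::uint8_t value) {
        assert(index < 4);
        field[index] = value;                      // store one byte, no load
        ++count.stores;
    }

    std::uint8_t get(unsigned index) {
        assert(index < 4);
        ++count.loads;                             // load one byte
        return field[index];
    }
};
```

In this model, every `MergedUnit::set` is a read-modify-write of all 32 bits, so an optimizer looking at it has to see through the mask and shift to learn that the other three fields are unchanged. `SplitUnits::set` touches only the byte being written, so there is no load to eliminate in the first place.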

> The disadvantage is, as you note here, that sometimes codegen doesn't manage to clean up the resulting code well.

My contention is that the current algorithm both (a) fails to merge some mergeable access units and (b) inappropriately merges some access units *especially* on strict alignment machines.

> I guess my primary question here is, instead of making clang try to guess which accesses will be optimal, can we improve the way the LLVM code generator handles the patterns currently generated by clang? I'm not exactly against changing the IR generated by clang, but changing the IR like this seems likely to improve some cases at the expense of others.

In the testcases that were changed, I did not encounter one that generated worse code. An ARM testcase showed better code due to, IIRC, not merging more than a register width: as with the x86 case, the current code didn't eliminate unnecessary loads, whereas with this change those are gone. I would be very interested in seeing cases that degrade, though.

https://github.com/llvm/llvm-project/pull/65742