[clang] [clang] Better bitfield access units (PR #65742)

Mon Sep 11 11:46:22 PDT 2023

urnathan wrote:

thanks for your comments,

> When I refer to CSE/DSE, I'm mostly talking about keeping values in registers for longer. They don't know anything about individual fields in bitfields. If we split bitfields too narrowly, we end up with extra memory accesses for accessing multiple adjacent fields. And if you have accesses which overlap (access some, but not all, of the same memory), CSE/DSE get much worse.

To be clear by 'accesses which overlap' you mean accesses to bitfields that share parts of a byte? I agree -- that's an unchanged part of the algorithm, bitfields that share parts of a single byte must be in the same access unit.

Or do you mean an access to a non-bitfield adjacent to a bitfield? How does that square with the memory model you pointed to earlier?  Or is it that if you see both such accesses, it must be safe to merge them, or there's some UB already happening? But anyway, this algorithm doesn't affect that -- in both before and after cases something outside of bitfieldAccumulation must be merging them.

> Having a bitfield unit that can't be loaded in a single memory access probably doesn't have much benefit, if there's a natural boundary we can use to split.

I think we're agreeing here.  The current algorithm generates access units that can require multiple accesses, even on a relaxed alignment machine (I refer back to the x86 example).  This replacement tries much harder to not do that -- it only merges access units if that demonstrably lowers the number of accesses needed for the merged access unit from accessing the original set of units.  The motivating case we had was a sequence of 4 byte loads, to fetch an unaligned 32-bit access unit, 1 byte of that being fiddled with, and then that one byte being written back.

https://github.com/llvm/llvm-project/pull/65742