[clang] [clang] Better bitfield access units (PR #65742)

Wed Mar 13 23:43:48 PDT 2024

================
@@ -439,82 +444,194 @@ CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
       Members.push_back(MemberInfo(bitsToCharUnits(StartBitOffset),
                                    MemberInfo::Field, nullptr, *Field));
     }
-    return;
+    return Field;
   }
 
-  // Check if OffsetInRecord (the size in bits of the current run) is better
-  // as a single field run. When OffsetInRecord has legal integer width, and
-  // its bitfield offset is naturally aligned, it is better to make the
-  // bitfield a separate storage component so as it can be accessed directly
-  // with lower cost.
-  auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord,
-                                      uint64_t StartBitOffset) {
-    if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses)
-      return false;
-    if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) ||
-        !DataLayout.fitsInLegalInteger(OffsetInRecord))
-      return false;
-    // Make sure StartBitOffset is naturally aligned if it is treated as an
-    // IType integer.
-    if (StartBitOffset %
-            Context.toBits(getAlignment(getIntNType(OffsetInRecord))) !=
-        0)
-      return false;
-    return true;
-  };
+  // The SysV ABI can overlap bitfield storage units with both other bitfield
+  // storage units /and/ other non-bitfield data members. Accessing a sequence
+  // of bitfields mustn't interfere with adjacent non-bitfields -- they're
+  // permitted to be accessed in separate threads for instance.
+
+  // We split runs of bit-fields into a sequence of "access units". When we emit
+  // a load or store of a bit-field, we'll load/store the entire containing
+  // access unit. As mentioned, the standard requires that these loads and
+  // stores must not interfere with accesses to other memory locations, and it
+  // defines the bit-field's memory location as the current run of
+  // non-zero-width bit-fields. So an access unit must never overlap with
+  // non-bit-field storage or cross a zero-width bit-field. Otherwise, we're
+  // free to draw the lines as we see fit.
+
+  // Drawing these lines well can be complicated. LLVM generally can't modify a
+  // program to access memory that it didn't before, so using very narrow access
+  // units can prevent the compiler from using optimal access patterns. For
+  // example, suppose a run of bit-fields occupies four bytes in a struct. If we
+  // split that into four 1-byte access units, then a sequence of assignments
+  // that doesn't touch all four bytes may have to be emitted with multiple
+  // 8-bit stores instead of a single 32-bit store. On the other hand, if we use
+  // very wide access units, we may find ourselves emitting accesses to
+  // bit-fields we didn't really need to touch, just because LLVM was unable to
+  // clean up after us.
+
+  // It is desirable to have access units be aligned powers of 2 no larger than
+  // a register. (On non-strict alignment ISAs, the alignment requirement can be
+  // dropped.) A three byte access unit will be accessed using 2-byte and 1-byte
+  // accesses and bit manipulation. If no bitfield straddles across the two
+  // separate accesses, it is better to have separate 2-byte and 1-byte access
+  // units, as then LLVM will not generate unnecessary memory accesses, or bit
+  // manipulation. Similarly, on a strict-alignment architecture, it is better
+  // to keep access-units naturally aligned, to avoid similar bit
+  // manipulation synthesizing larger unaligned accesses.
+
+  // We do this in two phases, processing a sequential run of bitfield
+  // declarations.
+
+  // a) Bitfields that share parts of a single byte are, of necessity, placed in
+  // the same access unit. That unit will encompass a consecutive
+  // run where adjacent bitfields share parts of a byte. (The first bitfield of
+  // such an access unit will start at the beginning of a byte.)
+
+  // b) Accumulate adjacent access units when the combined unit is naturally
+  // sized, no larger than a register, and on a strict alignment ISA,
+  // aligned. Note that this requires lookahead to one or more subsequent access
+  // units. For instance, consider a 2-byte access-unit followed by 2 1-byte
+  // units. We can merge that into a 4-byte access-unit, but we would not want
+  // to merge a 2-byte followed by a single 1-byte (and no available tail
+  // padding).
+
+  // This accumulation is prevented when:
+  // *) it would cross a zero-width bitfield (ABI-dependent), or
+  // *) one of the candidate access units contains a volatile bitfield, or
+  // *) fine-grained bitfield access option is in effect.
+
+  CharUnits RegSize =
+      bitsToCharUnits(Context.getTargetInfo().getRegisterWidth());
+  unsigned CharBits = Context.getCharWidth();
+
+  RecordDecl::field_iterator Begin = FieldEnd;
+  CharUnits StartOffset;
+  uint64_t BitSize;
+  CharUnits BestEndOffset;
+  RecordDecl::field_iterator BestEnd = Begin;
+  bool Volatile;
----------------
rjmccall wrote:

Could you add comments to these explaining the invariants / significance?

- `Begin` is the start of the span we're currently building an access unit from.
- `StartOffset` is its offset in chars (probably ought to be called `BeginOffset` just to reinforce that we're talking about the same thing).
- `BestEnd` is the end of the largest acceptable access unit we've found since `Begin`, and it might still be `Begin`.
- `BestEndOffset` is the offset of the end of that access unit in chars.
- `Volatile` is whether the current extension (the fields we've added *since* `BestEnd`) contains a volatile bit-field.
- `BitSize` is the width of the span so far, starting from `Begin`.  I'd probably call this `BitSizeSinceBegin`; it's unambiguous out here, but when you get into the merge logic below, you start computing other sizes, and it gets messy.  Having a clearer name will make the reasoning about what needs to get updated much clearer.

https://github.com/llvm/llvm-project/pull/65742