[libcxx-commits] [clang] [libcxx] [clang] Add builtin to clear padding bytes (prework for P0528R3) (PR #75371)

Sun Oct 26 02:58:19 PDT 2025

huixie90 wrote:

> > Bitfield load and store operations should be done using the same offset/size we normally use to access the bitfield; unconditionally using byte load/store operations will impair optimizations/performance. I guess this might not be possible when unions are involved, but it shouldn't be that hard for the non-union cases.
> > The format of builtin-clear-padding-codegen.cpp seems mostly fine, but consider using update_cc_test_checks.py to automate writing the CHECK lines. Please add a couple tests for empty classes and unions.
> > A few comments in the code outlining how the recursion and the interval representation work would be helpful.
> 
> Thanks very much for your review. and really sorry it took me more than a year to come back to this.
> 
> > unconditionally using byte load/store operations will impair optimizations/performance.
> 
> If you still remember this comment, is it referring to the final "clearing padding step", where I zeroing bytes-by-bytes? If so, apologies for not being familiar with this, what would be the best way of achieving it? So my current approach is
> 
> * Visit recursively to figure out all the bits ranges that data occupied
> * figure out all the holes (padding)
> * generate storing zero bytes-by-bytes for the wholes bytes and bits
> 
> on the last step, for non-bitfield, i was basically doing
> 
> ```c++
>               Address ElementAddr(Element, CGF.Int8Ty, CharUnits::One());
>               CGF.Builder.CreateStore(Zero, ElementAddr);
> ```
> 
> for bitfield, i was basically doing
> 
> ```c++
>           uint8_t mask = ((1 << EndBit) - 1) & ~((1 << StartBit) - 1);
>           auto *MaskValue = ConstantInt::get(CGF.Int8Ty, mask);
>           auto *NewValue = CGF.Builder.CreateAnd(Value, MaskValue);
> ```
> 
> This might not be the most optimised way of doing this. however, I am not familiar with this part of the code what would be the alternative.
> 
> > Bitfield load and store operations should be done using the same offset/size we normally use to access the bitfield;
> 
> Hmm, the puzzle I have is that I am not loading/storing the BitField themselves, but the paddings around them, which may or may not be occupied by other stuff.
> 
> > The format of builtin-clear-padding-codegen.cpp seems mostly fine, but consider using update_cc_test_checks.py to automate writing the CHECK lines. Please add a couple tests for empty classes and unions.
> 
> Absolutely, thanks for pointing out to update_cc_test_checks.py . I was mainly testing using our libc++ test suites and was not sure how to automatically generate these IR codegen tests. will update the test to cover all the cases.

Hi @efriedma-quic , since the last update, the tests were updated with update_cc_test_checks.py. i think the remaining issue is the bit field efficiency. In the current approach, since it is doing a simple BFS and I don't explicit mark if a particular member is inside a union (and indirectly inside a union). And the final clearing pass does not know if a padding bits is from which field. there are also situations where a padding bit is a common padding bit between two BitFields inside the same union, which might be even more complex.   I wonder if you are ok with the current approach where it simply clear bytes by bytes. since this function is mainly for implementing `std::atomic` , I would expect extreme rare use of with Bit Field together.

https://github.com/llvm/llvm-project/pull/75371