[llvm] [MachineLICM] Correctly Apply Register Masks (PR #95746)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Mon Jun 17 04:05:52 PDT 2024
================
@@ -426,38 +426,25 @@ static bool InstructionStoresToFI(const MachineInstr *MI, int FI) {
 static void applyBitsNotInRegMaskToRegUnitsMask(const TargetRegisterInfo &TRI,
                                                 BitVector &RUs,
                                                 const uint32_t *Mask) {
-  // Iterate over the RegMask raw to avoid constructing a BitVector, which is
-  // expensive as it implies dynamically allocating memory.
-  //
-  // We also work backwards.
+  BitVector ClobberedRUs(TRI.getNumRegUnits(), true);
   const unsigned NumRegs = TRI.getNumRegs();
   const unsigned MaskWords = (NumRegs + 31) / 32;
   for (unsigned K = 0; K < MaskWords; ++K) {
-    // We want to set the bits that aren't in RegMask, so flip it.
-    uint32_t Word = ~Mask[K];
-
-    // Iterate all set bits, starting from the right.
-    while (Word) {
-      const unsigned SetBitIdx = countr_zero(Word);
-
-      // The bits are numbered from the LSB in each word.
-      const unsigned PhysReg = (K * 32) + SetBitIdx;
-
-      // Clear the bit at SetBitIdx. Doing it this way appears to generate less
-      // instructions on x86. This works because negating a number will flip all
-      // the bits after SetBitIdx. So (Word & -Word) == (1 << SetBitIdx), but
-      // faster.
-      Word ^= Word & -Word;
-
+    uint32_t Word = Mask[K];
----------------
jayfoad wrote:
If you're interested in the compile time impact you could try skipping the inner loops if `Word` is 0 here.
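
For illustration, here is a standalone sketch of that early-out, with no LLVM dependencies. The inner loops are not shown in the quoted hunk, so the loop body below is an assumption about their shape, and `RegUnitTable` and `clobberedUnits` are hypothetical names, not part of the patch. It relies on the register mask convention that a set bit means the register is preserved, so a word equal to 0 preserves nothing and cannot clear any unit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the target's register -> register-unit mapping:
// UnitsOfReg[R] lists the register units of physical register R.
using RegUnitTable = std::vector<std::vector<unsigned>>;

// Start with every register unit marked clobbered, then clear the units of
// each register the mask preserves (set bit == preserved).
std::vector<bool> clobberedUnits(const RegUnitTable &UnitsOfReg,
                                 unsigned NumRegUnits,
                                 const std::vector<uint32_t> &Mask) {
  const unsigned NumRegs = static_cast<unsigned>(UnitsOfReg.size());
  const unsigned MaskWords = (NumRegs + 31) / 32;
  assert(Mask.size() >= MaskWords && "mask must cover every register");

  std::vector<bool> Clobbered(NumRegUnits, true);
  for (unsigned K = 0; K < MaskWords; ++K) {
    const uint32_t Word = Mask[K];
    // Suggested early-out: no bit set means no register in this word is
    // preserved, so the inner loops would do no work.
    if (Word == 0)
      continue;
    for (unsigned Bit = 0; Bit < 32; ++Bit) {
      const unsigned PhysReg = K * 32 + Bit;
      if (PhysReg >= NumRegs)
        break;
      // Register 0 is treated as "no register" and skipped.
      if (PhysReg != 0 && ((Word >> Bit) & 1u))
        for (unsigned Unit : UnitsOfReg[PhysReg])
          Clobbered[Unit] = false;
    }
  }
  return Clobbered;
}
```

Whether the check pays off depends on how often whole mask words are zero in practice, which would need measuring.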
https://github.com/llvm/llvm-project/pull/95746