[llvm] 651e644 - [BPF] Replace BPFMIPeepholeTruncElim by custom logic in isZExtFree()
Eduard Zingerman via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 21 14:13:53 PDT 2023
Author: Eduard Zingerman
Date: 2023-08-22T00:04:51+03:00
New Revision: 651e644595b72c22fd22f51358cf083146790ed4
URL: https://github.com/llvm/llvm-project/commit/651e644595b72c22fd22f51358cf083146790ed4
DIFF: https://github.com/llvm/llvm-project/commit/651e644595b72c22fd22f51358cf083146790ed4.diff
LOG: [BPF] Replace BPFMIPeepholeTruncElim by custom logic in isZExtFree()
Replace `BPFMIPeepholeTruncElim` by adding an overload for
`TargetLowering::isZExtFree()` aware that zero extension is
free for `ISD::LOAD`.
Short description
=================
The `BPFMIPeepholeTruncElim` pass handles two patterns:
Pattern #1:
  %1 = LDB %0, ...           %1 = LDB %0, ...
  %2 = AND_ri %1, 0xff   ->  %2 = MOV_rr %1   <-- (!)
Pattern #2:
  bb.1:                      bb.1:
    %a = LDB %0, ...           %a = LDB %0, ...
    br %bb3                    br %bb3
  bb.2:                      bb.2:
    %b = LDB %0, ...     ->    %b = LDB %0, ...
    br %bb3                    br %bb3
  bb.3:                      bb.3:
    %1 = PHI %a, %b            %1 = PHI %a, %b
    %2 = AND_ri %1, 0xff       %2 = MOV_rr %1   <-- (!)
Plus variations:
- AND_ri_32 instead of AND_ri
- SLL/SRL instead of AND_ri
- LDH, LDW, LDB32, LDH32, LDW32
Both patterns can be handled by built-in transformations at the
instruction selection phase if a suitable `isZExtFree()` implementation
is provided. The idea is borrowed from `ARMTargetLowering::isZExtFree`.
When evaluated on BPF kernel selftests and the remove_truncate_*.ll LLVM
test cases, this revision performs slightly better than
BPFMIPeepholeTruncElim; see the "Impact" section below for details.
Commit also adds a few test cases to make sure that patterns in
question are handled.
Long description
================
Why this works: Pattern #1
--------------------------
Consider the following example:
define i1 @foo(ptr %p) {
entry:
%a = load i8, ptr %p, align 1
%cond = icmp eq i8 %a, 0
ret i1 %cond
}
Log for `llc -mcpu=v2 -mtriple=bpfel -debug-only=isel` command:
...
Type-legalized selection DAG: %bb.0 'foo:entry'
SelectionDAG has 13 nodes:
t0: ch,glue = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t16: i64,ch = load<(load (s8) from %ir.p), anyext from i8> t0, t2, undef:i64
t19: i64 = and t16, Constant:i64<255>
t17: i64 = setcc t19, Constant:i64<0>, seteq:ch
t11: ch,glue = CopyToReg t0, Register:i64 $r0, t17
t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
...
Replacing.1 t19: i64 = and t16, Constant:i64<255>
With: t16: i64,ch = load<(load (s8) from %ir.p), anyext from i8> t0, t2, undef:i64
and 0 other values
...
Optimized type-legalized selection DAG: %bb.0 'foo:entry'
SelectionDAG has 11 nodes:
t0: ch,glue = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t20: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64
t17: i64 = setcc t20, Constant:i64<0>, seteq:ch
t11: ch,glue = CopyToReg t0, Register:i64 $r0, t17
t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
...
Note:
- Optimized type-legalized selection DAG:
  - `t19 = and t16, 255` has been replaced by `t16` (the load).
  - Patterns like `(and (load ... i8), 255)` are replaced by `load`
    in `DAGCombiner::BackwardsPropagateMask`, called from
    `DAGCombiner::visitAND`.
  - Similarly, patterns like `(srl (shl ..., 56), 56)` are replaced by
    `(and ..., 255)` in `DAGCombiner::visitSRL` (this function is huge;
    look for the `TLI.shouldFoldConstantShiftPairToMask()` call).
Why this works: Pattern #2
--------------------------
Consider the following example:
define i1 @foo(ptr %p) {
entry:
%a = load i8, ptr %p, align 1
br label %next
next:
%cond = icmp eq i8 %a, 0
ret i1 %cond
}
Consider the log for the `llc -mcpu=v2 -mtriple=bpfel -debug-only=isel` command.
Log for the first basic block:
Initial selection DAG: %bb.0 'foo:entry'
SelectionDAG has 9 nodes:
t0: ch,glue = EntryToken
t3: i64 = Constant<0>
t2: i64,ch = CopyFromReg t0, Register:i64 %1
t5: i8,ch = load<(load (s8) from %ir.p)> t0, t2, undef:i64
t6: i64 = zero_extend t5
t8: ch = CopyToReg t0, Register:i64 %0, t6
...
Replacing.1 t6: i64 = zero_extend t5
With: t9: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64
and 0 other values
...
Optimized lowered selection DAG: %bb.0 'foo:entry'
SelectionDAG has 7 nodes:
t0: ch,glue = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %1
t9: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64
t8: ch = CopyToReg t0, Register:i64 %0, t9
Note:
- Initial selection DAG:
  - `%a = load ...` is lowered as `t6 = (zero_extend (load ...))`;
    without the `isZExtFree()` overload added by this commit it would
    instead be lowered as `t6 = (any_extend (load ...))`.
  - The decision to generate `zero_extend` or `any_extend` is
    made in `RegsForValue::getCopyToRegs`, called from
    `SelectionDAGBuilder::CopyValueToVirtualRegister`:
    - if `isZExtFree()` returns true for the load, `zero_extend` is used;
    - otherwise `any_extend` is used.
- Optimized lowered selection DAG:
  - `t6 = (any_extend (load ...))` is replaced by
    `t9 = load ..., zext from i8`.
    This is done by `DAGCombiner.cpp:tryToFoldExtOfLoad()`, called from
    `DAGCombiner::visitZERO_EXTEND`.
Log for second basic block:
Initial selection DAG: %bb.1 'foo:next'
SelectionDAG has 13 nodes:
t0: ch,glue = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t4: i64 = AssertZext t2, ValueType:ch:i8
t5: i8 = truncate t4
t8: i1 = setcc t5, Constant:i8<0>, seteq:ch
t9: i64 = any_extend t8
t11: ch,glue = CopyToReg t0, Register:i64 $r0, t9
t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
...
Replacing.2 t18: i64 = and t4, Constant:i64<255>
With: t4: i64 = AssertZext t2, ValueType:ch:i8
...
Type-legalized selection DAG: %bb.1 'foo:next'
SelectionDAG has 13 nodes:
t0: ch,glue = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t4: i64 = AssertZext t2, ValueType:ch:i8
t18: i64 = and t4, Constant:i64<255>
t16: i64 = setcc t18, Constant:i64<0>, seteq:ch
t11: ch,glue = CopyToReg t0, Register:i64 $r0, t16
t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
...
Optimized type-legalized selection DAG: %bb.1 'foo:next'
SelectionDAG has 11 nodes:
t0: ch,glue = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t4: i64 = AssertZext t2, ValueType:ch:i8
t16: i64 = setcc t4, Constant:i64<0>, seteq:ch
t11: ch,glue = CopyToReg t0, Register:i64 $r0, t16
t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
...
Note:
- Initial selection DAG:
  - `t0` is an input value for this basic block; it corresponds to the
    load instruction (`t9`) from the first basic block.
  - It is accessed within the basic block via
    `t4` (`AssertZext (CopyFromReg t0, ...)`).
  - The `AssertZext` is generated by `RegsForValue::getCopyFromRegs`,
    called from `SelectionDAGBuilder::getCopyFromRegs`; it is generated
    only when `LiveOutInfo` with a known number of leading zeros is
    present for `t0`.
- Known register bits in `LiveOutInfo` are computed by
`SelectionDAG::computeKnownBits` called from
`SelectionDAGISel::ComputeLiveOutVRegInfo`.
- `computeKnownBits()` generates leading zeros information for
`(load ..., zext from ...)` but *does not* generate leading zeros
information for `(load ..., anyext from ...)`.
This is why `isZExtFree()` added in this commit is important.
- Type-legalized selection DAG:
- `t5 = truncate t4` is replaced by `t18 = and t4, 255`
- Optimized type-legalized selection DAG:
  - `t18 = and t4, 255` is replaced by `t4`; this is done by
    `DAGCombiner::SimplifyDemandedBits`, called from
    `DAGCombiner::visitAND`, which simplifies patterns like
    `(and (assertzext ...))`.
Impact
------
This change covers all remove_truncate_*.ll test cases:
- for -mcpu=v4 there are no changes in the generated code;
- for -mcpu=v2 the code generated for remove_truncate_7 and
  remove_truncate_8 is slightly improved; for the other tests it is
  unchanged.
For remove_truncate_7:
Before this revision After this revision
-------------------- -------------------
r1 <<= 0x20 r1 <<= 0x20
r1 >>= 0x20 r1 >>= 0x20
if r1 == 0x0 goto +0x2 <LBB0_2> if r1 == 0x0 goto +0x2 <LBB0_2>
r1 = *(u32 *)(r2 + 0x0) r0 = *(u32 *)(r2 + 0x0)
goto +0x1 <LBB0_3> goto +0x1 <LBB0_3>
<LBB0_2>: <LBB0_2>:
r1 = *(u32 *)(r2 + 0x4) r0 = *(u32 *)(r2 + 0x4)
<LBB0_3>: <LBB0_3>:
r0 = r1 exit
exit
For remove_truncate_8:
Before this revision After this revision
-------------------- -------------------
r2 = *(u32 *)(r1 + 0x0) r2 = *(u32 *)(r1 + 0x0)
r3 = r2 r3 = r2
r3 <<= 0x20 r3 <<= 0x20
r4 = r3 r3 s>>= 0x20
r4 s>>= 0x20
if r4 s> 0x2 goto +0x5 <LBB0_3> if r3 s> 0x2 goto +0x4 <LBB0_3>
r4 = *(u32 *)(r1 + 0x4) r3 = *(u32 *)(r1 + 0x4)
r3 >>= 0x20
if r3 >= r4 goto +0x2 <LBB0_3> if r2 >= r3 goto +0x2 <LBB0_3>
r2 += 0x2 r2 += 0x2
*(u32 *)(r1 + 0x0) = r2 *(u32 *)(r1 + 0x0) = r2
<LBB0_3>: <LBB0_3>:
r0 = 0x3 r0 = 0x3
exit exit
For kernel BPF selftests the statistics are as follows:
- For -mcpu=v4: 9 out of 655 object files have differences;
  in all cases the total number of instructions marginally decreased
  (-27 instructions).
- For -mcpu=v2: 21 out of 655 object files have differences:
  - For 19 object files the number of instructions decreased
    (-129 instructions in total): some redundant `rX &= 0xffff`
    and register-to-register assignments were removed;
  - For 2 object files the number of instructions increased
    (+2 instructions in each file).
Both -mcpu=v2 instruction increases can be reduced to the same
example:
define void @foo(ptr %p) {
entry:
%a = load i32, ptr %p, align 4
%b = sext i32 %a to i64
%c = icmp ult i64 1, %b
br i1 %c, label %next, label %end
next:
call void inttoptr (i64 62 to ptr)(i32 %a)
br label %end
end:
ret void
}
Note that this example uses the value loaded into `%a` both as a
sign-extended value (`%b`) and as a zero-extended value (`%a` passed as
a parameter).
Here is the difference in final assembly code:
Before this revision After this revision
-------------------- -------------------
r1 = *(u32 *)(r1 + 0) r1 = *(u32 *)(r1 + 0)
r1 <<= 32 r1 <<= 32
r1 s>>= 32 r1 s>>= 32
if r1 < 2 goto <LBB0_2> if r1 < 2 goto <LBB0_2>
r1 <<= 32
r1 >>= 32
call 62 call 62
<LBB0_2>: <LBB0_2>:
exit exit
Before this commit `%a` is passed to the call as a sign-extended value,
after this commit `%a` is passed to the call as a zero-extended value.
Both are correct, as the 32-bit sub-register is the same.
The difference comes from `DAGCombiner` operation on the initial DAG:
Initial selection DAG before this commit:
t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
t6: i64 = any_extend t5 <--------------------- (1)
t8: ch = CopyToReg t0, Register:i64 %0, t6
t9: i64 = sign_extend t5
t12: i1 = setcc Constant:i64<1>, t9, setult:ch
Initial selection DAG after this commit:
t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
t6: i64 = zero_extend t5 <--------------------- (2)
t8: ch = CopyToReg t0, Register:i64 %0, t6
t9: i64 = sign_extend t5
t12: i1 = setcc Constant:i64<1>, t9, setult:ch
The node `t9` is processed before node `t6`, and the `load` instruction
is combined into a load with sign extension:
Replacing.1 t9: i64 = sign_extend t5
With: t30: i64,ch = load<(load (s32) from %ir.p), sext from i32> t0, t2, undef:i64
and 0 other values
Replacing.1 t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
With: t31: i32 = truncate t30
and 1 other values
This is done by `DAGCombiner.cpp:tryToFoldExtOfLoad` called from
`DAGCombiner::visitSIGN_EXTEND`. Note that `t5` is used by `t6` which
is `any_extend` in (1) and `zero_extend` in (2).
`tryToFoldExtOfLoad()` rewrites such uses of `t5` differently:
- `any_extend` is simply removed;
- `zero_extend` is replaced by `and t30, 0xffffffff`, which is later
  converted to a pair of shifts. This pair of shifts survives until the
  end of translation.
Differential Revision: https://reviews.llvm.org/D157870
Added:
llvm/test/CodeGen/BPF/remove_truncate_9.ll
Modified:
llvm/lib/Target/BPF/BPF.h
llvm/lib/Target/BPF/BPFISelLowering.cpp
llvm/lib/Target/BPF/BPFISelLowering.h
llvm/lib/Target/BPF/BPFMIPeephole.cpp
llvm/lib/Target/BPF/BPFTargetMachine.cpp
Removed:
################################################################################
diff --git a/llvm/lib/Target/BPF/BPF.h b/llvm/lib/Target/BPF/BPF.h
index 9b7bab785ee974..1f539d3270b712 100644
--- a/llvm/lib/Target/BPF/BPF.h
+++ b/llvm/lib/Target/BPF/BPF.h
@@ -23,14 +23,12 @@ ModulePass *createBPFCheckAndAdjustIR();
FunctionPass *createBPFISelDag(BPFTargetMachine &TM);
FunctionPass *createBPFMISimplifyPatchablePass();
FunctionPass *createBPFMIPeepholePass();
-FunctionPass *createBPFMIPeepholeTruncElimPass();
FunctionPass *createBPFMIPreEmitPeepholePass();
FunctionPass *createBPFMIPreEmitCheckingPass();
void initializeBPFCheckAndAdjustIRPass(PassRegistry&);
void initializeBPFDAGToDAGISelPass(PassRegistry &);
-void initializeBPFMIPeepholePass(PassRegistry&);
-void initializeBPFMIPeepholeTruncElimPass(PassRegistry &);
+void initializeBPFMIPeepholePass(PassRegistry &);
void initializeBPFMIPreEmitCheckingPass(PassRegistry&);
void initializeBPFMIPreEmitPeepholePass(PassRegistry &);
void initializeBPFMISimplifyPatchablePass(PassRegistry &);
diff --git a/llvm/lib/Target/BPF/BPFISelLowering.cpp b/llvm/lib/Target/BPF/BPFISelLowering.cpp
index ffeef14d413b47..f3368b8979d6f5 100644
--- a/llvm/lib/Target/BPF/BPFISelLowering.cpp
+++ b/llvm/lib/Target/BPF/BPFISelLowering.cpp
@@ -224,6 +224,18 @@ bool BPFTargetLowering::isZExtFree(EVT VT1, EVT VT2) const {
return NumBits1 == 32 && NumBits2 == 64;
}
+bool BPFTargetLowering::isZExtFree(SDValue Val, EVT VT2) const {
+ EVT VT1 = Val.getValueType();
+ if (Val.getOpcode() == ISD::LOAD && VT1.isSimple() && VT2.isSimple()) {
+ MVT MT1 = VT1.getSimpleVT().SimpleTy;
+ MVT MT2 = VT2.getSimpleVT().SimpleTy;
+ if ((MT1 == MVT::i8 || MT1 == MVT::i16 || MT1 == MVT::i32) &&
+ (MT2 == MVT::i32 || MT2 == MVT::i64))
+ return true;
+ }
+ return TargetLoweringBase::isZExtFree(Val, VT2);
+}
+
BPFTargetLowering::ConstraintType
BPFTargetLowering::getConstraintType(StringRef Constraint) const {
if (Constraint.size() == 1) {
diff --git a/llvm/lib/Target/BPF/BPFISelLowering.h b/llvm/lib/Target/BPF/BPFISelLowering.h
index 348510355be881..3be1c04bca3d65 100644
--- a/llvm/lib/Target/BPF/BPFISelLowering.h
+++ b/llvm/lib/Target/BPF/BPFISelLowering.h
@@ -144,6 +144,7 @@ class BPFTargetLowering : public TargetLowering {
// For 32bit ALU result zext to 64bit is free.
bool isZExtFree(Type *Ty1, Type *Ty2) const override;
bool isZExtFree(EVT VT1, EVT VT2) const override;
+ bool isZExtFree(SDValue Val, EVT VT2) const override;
unsigned EmitSubregExt(MachineInstr &MI, MachineBasicBlock *BB, unsigned Reg,
bool isSigned) const;
diff --git a/llvm/lib/Target/BPF/BPFMIPeephole.cpp b/llvm/lib/Target/BPF/BPFMIPeephole.cpp
index c46e21d8f063e7..f0edf706bd8fd7 100644
--- a/llvm/lib/Target/BPF/BPFMIPeephole.cpp
+++ b/llvm/lib/Target/BPF/BPFMIPeephole.cpp
@@ -606,180 +606,3 @@ FunctionPass* llvm::createBPFMIPreEmitPeepholePass()
{
return new BPFMIPreEmitPeephole();
}
-
-STATISTIC(TruncElemNum, "Number of truncation eliminated");
-
-namespace {
-
-struct BPFMIPeepholeTruncElim : public MachineFunctionPass {
-
- static char ID;
- const BPFInstrInfo *TII;
- MachineFunction *MF;
- MachineRegisterInfo *MRI;
-
- BPFMIPeepholeTruncElim() : MachineFunctionPass(ID) {
- initializeBPFMIPeepholeTruncElimPass(*PassRegistry::getPassRegistry());
- }
-
-private:
- // Initialize class variables.
- void initialize(MachineFunction &MFParm);
-
- bool eliminateTruncSeq();
-
-public:
-
- // Main entry point for this pass.
- bool runOnMachineFunction(MachineFunction &MF) override {
- if (skipFunction(MF.getFunction()))
- return false;
-
- initialize(MF);
-
- return eliminateTruncSeq();
- }
-};
-
-static bool TruncSizeCompatible(int TruncSize, unsigned opcode)
-{
- if (TruncSize == 1)
- return opcode == BPF::LDB || opcode == BPF::LDB32;
-
- if (TruncSize == 2)
- return opcode == BPF::LDH || opcode == BPF::LDH32;
-
- if (TruncSize == 4)
- return opcode == BPF::LDW || opcode == BPF::LDW32;
-
- return false;
-}
-
-// Initialize class variables.
-void BPFMIPeepholeTruncElim::initialize(MachineFunction &MFParm) {
- MF = &MFParm;
- MRI = &MF->getRegInfo();
- TII = MF->getSubtarget<BPFSubtarget>().getInstrInfo();
- LLVM_DEBUG(dbgs() << "*** BPF MachineSSA TRUNC Elim peephole pass ***\n\n");
-}
-
-// Reg truncating is often the result of 8/16/32bit->64bit or
-// 8/16bit->32bit conversion. If the reg value is loaded with
-// masked byte width, the AND operation can be removed since
-// BPF LOAD already has zero extension.
-//
-// This also solved a correctness issue.
-// In BPF socket-related program, e.g., __sk_buff->{data, data_end}
-// are 32-bit registers, but later on, kernel verifier will rewrite
-// it with 64-bit value. Therefore, truncating the value after the
-// load will result in incorrect code.
-bool BPFMIPeepholeTruncElim::eliminateTruncSeq() {
- MachineInstr* ToErase = nullptr;
- bool Eliminated = false;
-
- for (MachineBasicBlock &MBB : *MF) {
- for (MachineInstr &MI : MBB) {
- // The second insn to remove if the eliminate candidate is a pair.
- MachineInstr *MI2 = nullptr;
- Register DstReg, SrcReg;
- MachineInstr *DefMI;
- int TruncSize = -1;
-
- // If the previous instruction was marked for elimination, remove it now.
- if (ToErase) {
- ToErase->eraseFromParent();
- ToErase = nullptr;
- }
-
- // AND A, 0xFFFFFFFF will be turned into SLL/SRL pair due to immediate
- // for BPF ANDI is i32, and this case only happens on ALU64.
- if (MI.getOpcode() == BPF::SRL_ri &&
- MI.getOperand(2).getImm() == 32) {
- SrcReg = MI.getOperand(1).getReg();
- if (!MRI->hasOneNonDBGUse(SrcReg))
- continue;
-
- MI2 = MRI->getVRegDef(SrcReg);
- DstReg = MI.getOperand(0).getReg();
-
- if (!MI2 ||
- MI2->getOpcode() != BPF::SLL_ri ||
- MI2->getOperand(2).getImm() != 32)
- continue;
-
- // Update SrcReg.
- SrcReg = MI2->getOperand(1).getReg();
- DefMI = MRI->getVRegDef(SrcReg);
- if (DefMI)
- TruncSize = 4;
- } else if (MI.getOpcode() == BPF::AND_ri ||
- MI.getOpcode() == BPF::AND_ri_32) {
- SrcReg = MI.getOperand(1).getReg();
- DstReg = MI.getOperand(0).getReg();
- DefMI = MRI->getVRegDef(SrcReg);
-
- if (!DefMI)
- continue;
-
- int64_t imm = MI.getOperand(2).getImm();
- if (imm == 0xff)
- TruncSize = 1;
- else if (imm == 0xffff)
- TruncSize = 2;
- }
-
- if (TruncSize == -1)
- continue;
-
- // The definition is PHI node, check all inputs.
- if (DefMI->isPHI()) {
- bool CheckFail = false;
-
- for (unsigned i = 1, e = DefMI->getNumOperands(); i < e; i += 2) {
- MachineOperand &opnd = DefMI->getOperand(i);
- if (!opnd.isReg()) {
- CheckFail = true;
- break;
- }
-
- MachineInstr *PhiDef = MRI->getVRegDef(opnd.getReg());
- if (!PhiDef || PhiDef->isPHI() ||
- !TruncSizeCompatible(TruncSize, PhiDef->getOpcode())) {
- CheckFail = true;
- break;
- }
- }
-
- if (CheckFail)
- continue;
- } else if (!TruncSizeCompatible(TruncSize, DefMI->getOpcode())) {
- continue;
- }
-
- BuildMI(MBB, MI, MI.getDebugLoc(), TII->get(BPF::MOV_rr), DstReg)
- .addReg(SrcReg);
-
- if (MI2)
- MI2->eraseFromParent();
-
- // Mark it to ToErase, and erase in the next iteration.
- ToErase = &MI;
- TruncElemNum++;
- Eliminated = true;
- }
- }
-
- return Eliminated;
-}
-
-} // end default namespace
-
-INITIALIZE_PASS(BPFMIPeepholeTruncElim, "bpf-mi-trunc-elim",
- "BPF MachineSSA Peephole Optimization For TRUNC Eliminate",
- false, false)
-
-char BPFMIPeepholeTruncElim::ID = 0;
-FunctionPass* llvm::createBPFMIPeepholeTruncElimPass()
-{
- return new BPFMIPeepholeTruncElim();
-}
diff --git a/llvm/lib/Target/BPF/BPFTargetMachine.cpp b/llvm/lib/Target/BPF/BPFTargetMachine.cpp
index c47e8274b2e2f3..3926885c05a392 100644
--- a/llvm/lib/Target/BPF/BPFTargetMachine.cpp
+++ b/llvm/lib/Target/BPF/BPFTargetMachine.cpp
@@ -42,7 +42,6 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeBPFTarget() {
PassRegistry &PR = *PassRegistry::getPassRegistry();
initializeBPFCheckAndAdjustIRPass(PR);
initializeBPFMIPeepholePass(PR);
- initializeBPFMIPeepholeTruncElimPass(PR);
initializeBPFDAGToDAGISelPass(PR);
}
@@ -155,7 +154,6 @@ void BPFPassConfig::addMachineSSAOptimization() {
if (!DisableMIPeephole) {
if (Subtarget->getHasAlu32())
addPass(createBPFMIPeepholePass());
- addPass(createBPFMIPeepholeTruncElimPass());
}
}
diff --git a/llvm/test/CodeGen/BPF/remove_truncate_9.ll b/llvm/test/CodeGen/BPF/remove_truncate_9.ll
new file mode 100644
index 00000000000000..3b9293d38fd01f
--- /dev/null
+++ b/llvm/test/CodeGen/BPF/remove_truncate_9.ll
@@ -0,0 +1,81 @@
+; RUN: llc -mcpu=v2 -march=bpf < %s | FileCheck %s
+; RUN: llc -mcpu=v4 -march=bpf < %s | FileCheck %s
+
+; Zero extension instructions should be eliminated at instruction
+; selection phase for all test cases below.
+
+; In BPF zero extension is implemented as &= or a pair of <<=/>>=
+; instructions, hence simply check that &= and >>= do not exist in
+; generated code (<<= remains because %c is used by both call and
+; lshr in a few test cases).
+
+; CHECK-NOT: &=
+; CHECK-NOT: >>=
+
+define void @shl_lshr_same_bb(ptr %p) {
+entry:
+ %a = load i8, ptr %p, align 1
+ %b = zext i8 %a to i64
+ %c = shl i64 %b, 56
+ %d = lshr i64 %c, 56
+ %e = icmp eq i64 %d, 0
+ ; hasOneUse() is a common requirement for many CombineDAG
+ ; transformations, make sure that it does not matter in this case.
+ call void @sink1(i8 %a, i64 %b, i64 %c, i64 %d, i1 %e)
+ ret void
+}
+
+define void @shl_lshr_diff_bb(ptr %p) {
+entry:
+ %a = load i16, ptr %p, align 2
+ %b = zext i16 %a to i64
+ %c = shl i64 %b, 48
+ %d = lshr i64 %c, 48
+ br label %next
+
+; Jump to the new basic block creates a COPY instruction for %d, which
+; might be materialized as noop or as AND_ri (zero extension) at the
+; start of the basic block. The decision depends on TLI.isZExtFree()
+; results, see RegsForValue::getCopyToRegs(). Check below verifies
+; that COPY is materialized as noop.
+next:
+ %e = icmp eq i64 %d, 0
+ call void @sink2(i16 %a, i64 %b, i64 %c, i64 %d, i1 %e)
+ ret void
+}
+
+define void @load_zext_same_bb(ptr %p) {
+entry:
+ %a = load i8, ptr %p, align 1
+ ; zext is implicit in this context
+ %b = icmp eq i8 %a, 0
+ call void @sink3(i8 %a, i1 %b)
+ ret void
+}
+
+define void @load_zext_diff_bb(ptr %p) {
+entry:
+ %a = load i8, ptr %p, align 1
+ br label %next
+
+next:
+ %b = icmp eq i8 %a, 0
+ call void @sink3(i8 %a, i1 %b)
+ ret void
+}
+
+define void @load_zext_diff_bb_2(ptr %p) {
+entry:
+ %a = load i32, ptr %p, align 4
+ br label %next
+
+next:
+ %b = icmp eq i32 %a, 0
+ call void @sink4(i32 %a, i1 %b)
+ ret void
+}
+
+declare void @sink1(i8, i64, i64, i64, i1);
+declare void @sink2(i16, i64, i64, i64, i1);
+declare void @sink3(i8, i1);
+declare void @sink4(i32, i1);