[PATCH] D144829: [BPF] Add a few new insns under cpu=v4
Yonghong Song via Phabricator via cfe-commits
cfe-commits@lists.llvm.org
Mon Jul 24 00:26:42 PDT 2023
yonghong-song added inline comments.
================
Comment at: llvm/lib/Target/BPF/BPFInstrInfo.td:379
+ "$dst = (s8)$src",
+ [(set GPR:$dst, (sra (shl GPR:$src, (i64 56)), (i64 56)))]>;
+ def MOVSX_rr_16 : ALU_RR<BPF_ALU64, BPF_MOV, 16,
----------------
eddyz87 wrote:
> I think it is possible to avoid matching expansion pattern `(sra (shl GPR:$src, (i64 56))` here, and instead turn off the expansion when `movsx` is available.
>
> I tried the change below and all BPF codegen tests are passing. Do I miss something?
>
> ---
>
> ```
> diff --git a/llvm/lib/Target/BPF/BPFISelLowering.cpp b/llvm/lib/Target/BPF/BPFISelLowering.cpp
> index 9a7357d6ad04..5e84af009591 100644
> --- a/llvm/lib/Target/BPF/BPFISelLowering.cpp
> +++ b/llvm/lib/Target/BPF/BPFISelLowering.cpp
> @@ -132,9 +132,11 @@ BPFTargetLowering::BPFTargetLowering(const TargetMachine &TM,
> setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i64, Custom);
>
> setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i1, Expand);
> - setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i8, Expand);
> - setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i16, Expand);
> - setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i32, Expand);
> + if (!STI.hasMovsx()) {
> + setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i8, Expand);
> + setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i16, Expand);
> + setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i32, Expand);
> + }
>
> // Extended load operations for i1 types must be promoted
> for (MVT VT : MVT::integer_valuetypes()) {
> diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td b/llvm/lib/Target/BPF/BPFInstrInfo.td
> index a1d532e60db2..29bec72aa92d 100644
> --- a/llvm/lib/Target/BPF/BPFInstrInfo.td
> +++ b/llvm/lib/Target/BPF/BPFInstrInfo.td
> @@ -376,11 +376,11 @@ let Predicates = [BPFHasMovsx] in {
> def MOVSX_rr_8 : ALU_RR<BPF_ALU64, BPF_MOV, 8,
> (outs GPR:$dst), (ins GPR:$src),
> "$dst = (s8)$src",
> - [(set GPR:$dst, (sra (shl GPR:$src, (i64 56)), (i64 56)))]>;
> + [(set GPR:$dst, (sext_inreg GPR:$src, i8))]>;
> def MOVSX_rr_16 : ALU_RR<BPF_ALU64, BPF_MOV, 16,
> (outs GPR:$dst), (ins GPR:$src),
> "$dst = (s16)$src",
> - [(set GPR:$dst, (sra (shl GPR:$src, (i64 48)), (i64 48)))]>;
> + [(set GPR:$dst, (sext_inreg GPR:$src, i16))]>;
> def MOVSX_rr_32 : ALU_RR<BPF_ALU64, BPF_MOV, 32,
> (outs GPR:$dst), (ins GPR32:$src),
> "$dst = (s32)$src",
> @@ -388,11 +388,11 @@ let Predicates = [BPFHasMovsx] in {
> def MOVSX_rr_32_8 : ALU_RR<BPF_ALU, BPF_MOV, 8,
> (outs GPR32:$dst), (ins GPR32:$src),
> "$dst = (s8)$src",
> - [(set GPR32:$dst, (sra (shl GPR32:$src, (i32 24)), (i32 24)))]>;
> + [(set GPR32:$dst, (sext_inreg GPR32:$src, i8))]>;
> def MOVSX_rr_32_16 : ALU_RR<BPF_ALU, BPF_MOV, 16,
> (outs GPR32:$dst), (ins GPR32:$src),
> "$dst = (s16)$src",
> - [(set GPR32:$dst, (sra (shl GPR32:$src, (i32 16)), (i32 16)))]>;
> + [(set GPR32:$dst, (sext_inreg GPR32:$src, i16))]>;
> }
> }
> ```
This indeed simplifies the code. I will incorporate your change into the patch. Thanks!
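For reference, a minimal source-level sketch (my own example, not part of the patch) of code that should go through the `sext_inreg` patterns once the expansion is turned off; with `-mcpu=v4` the expectation is a single movsx instruction:

```
/* Hypothetical example only.  Compile with:
 *   clang --target=bpf -O2 -mcpu=v4 -S sext8.c
 * The cast truncates to 8 bits and sign-extends back to 64 bits, which
 * survives as a SIGN_EXTEND_INREG node and should select MOVSX_rr_8,
 * roughly:
 *   r0 = (s8)r1
 * Without movsx (cpu <= v3) the old expansion emits a shift pair instead:
 *   r0 <<= 56
 *   r0 s>>= 56
 * Exact register allocation is an assumption. */
long sext8(long x) {
  return (signed char)x;
}
```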
================
Comment at: llvm/lib/Target/BPF/BPFMIPeephole.cpp:321
+
+ std::map<unsigned, unsigned> ReverseCondOpMap;
----------------
eddyz87 wrote:
> Is this map unused?
Right, it is unused; it is a leftover from an earlier version. Will remove.
================
Comment at: llvm/lib/Target/BPF/BPFMIPeephole.cpp:412
+ int CurrNumInsns = 0;
+ std::map<MachineBasicBlock *, int> SoFarNumInsns;
+ std::map<MachineBasicBlock *, MachineBasicBlock *> FollowThroughBB;
----------------
eddyz87 wrote:
> Nitpick: in my llvm-objdump revisions Fangrui suggested using `DenseMap` in most cases (since `std::map` allocates a node for each pair).
Will try to use DenseMap.
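For illustration, a minimal sketch of what the switch could look like (hypothetical code, not the actual patch; the surrounding pass logic is not shown):

```
// Hypothetical sketch only: the two maps from the snippet above, switched
// from std::map to DenseMap.  DenseMap stores key/value pairs in a single
// open-addressed table, avoiding the per-node allocation std::map performs.
#include "llvm/ADT/DenseMap.h"
#include "llvm/CodeGen/MachineBasicBlock.h"

using namespace llvm;

DenseMap<MachineBasicBlock *, int> SoFarNumInsns;
DenseMap<MachineBasicBlock *, MachineBasicBlock *> FollowThroughBB;

// The lookup/insert pattern stays essentially the same as with std::map:
//   SoFarNumInsns[MBB] = CurrNumInsns;
//   auto It = FollowThroughBB.find(MBB);
//   if (It != FollowThroughBB.end())
//     MachineBasicBlock *Next = It->second;
```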
================
Comment at: llvm/test/CodeGen/BPF/movsx.ll:30
+}
+; CHECK: w0 = w1 # encoding: [0xbc,0x10,0x00,0x00,0x00,0x00,0x00,0x00]
+
----------------
eddyz87 wrote:
> This does not seem right, as it does not sign-extend the 8-bit argument to a 16-bit value.
This is probably due to the ABI, where the caller performs the sign extension of narrow arguments and return values. For example,
```
$ cat t1.c
__attribute__((noinline)) short f1(char a) {
  return a * a;
}
int f2(int a) {
  return f1(a);
}
$ clang --target=bpf -O2 -mcpu=v4 -S t1.c
f1:                                     # @f1
# %bb.0:                                # %entry
        w0 = w1
        w0 *= w0
        exit
.Lfunc_end0:
        .size   f1, .Lfunc_end0-f1
                                        # -- End function
        .globl  f2                      # -- Begin function f2
        .p2align        3
        .type   f2,@function
f2:                                     # @f2
# %bb.0:                                # %entry
        w1 = (s8)w1
        call f1
        w0 = (s16)w0
        exit
```
You can see that in function f2() the sign extension is done properly at the call site, and that is probably why the compiler did not generate sign-extension code inside f1(). I will modify the test so that it exercises a proper sign extension like f2() above.
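As a sketch of one possible direction (my own hypothetical example, not the actual updated test), the truncation and re-extension can be forced inside the function body, so the caller-side extension cannot hide it:

```
/* Hypothetical C counterpart of a reworked test case.  Compile with:
 *   clang --target=bpf -O2 -mcpu=v4 -S
 * The argument arrives as a full 32-bit value; the cast forces an 8-bit
 * sign extension inside the body, which should now select the 32-bit
 * movsx form, roughly:
 *   w0 = (s8)w1
 * instead of a shift pair.  The exact CHECK lines are assumptions. */
int sext_in_body(int a) {
  return (signed char)a;
}
```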
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D144829/new/
https://reviews.llvm.org/D144829