[PATCH] D144829: [BPF] Add a few new insns under cpu=v4
Yonghong Song via Phabricator via cfe-commits
cfe-commits@lists.llvm.org
Mon Jul 24 00:26:42 PDT 2023
yonghong-song added inline comments.
================
Comment at: llvm/lib/Target/BPF/BPFInstrInfo.td:379
+ "$dst = (s8)$src",
+ [(set GPR:$dst, (sra (shl GPR:$src, (i64 56)), (i64 56)))]>;
+ def MOVSX_rr_16 : ALU_RR<BPF_ALU64, BPF_MOV, 16,
----------------
eddyz87 wrote:
> I think it is possible to avoid matching expansion pattern `(sra (shl GPR:$src, (i64 56))` here, and instead turn off the expansion when `movsx` is available.
>
> I tried the change below and all BPF codegen tests are passing. Do I miss something?
>
> ---
>
> ```
> diff --git a/llvm/lib/Target/BPF/BPFISelLowering.cpp b/llvm/lib/Target/BPF/BPFISelLowering.cpp
> index 9a7357d6ad04..5e84af009591 100644
> --- a/llvm/lib/Target/BPF/BPFISelLowering.cpp
> +++ b/llvm/lib/Target/BPF/BPFISelLowering.cpp
> @@ -132,9 +132,11 @@ BPFTargetLowering::BPFTargetLowering(const TargetMachine &TM,
> setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i64, Custom);
>
> setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i1, Expand);
> - setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i8, Expand);
> - setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i16, Expand);
> - setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i32, Expand);
> + if (!STI.hasMovsx()) {
> + setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i8, Expand);
> + setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i16, Expand);
> + setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i32, Expand);
> + }
>
> // Extended load operations for i1 types must be promoted
> for (MVT VT : MVT::integer_valuetypes()) {
> diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td b/llvm/lib/Target/BPF/BPFInstrInfo.td
> index a1d532e60db2..29bec72aa92d 100644
> --- a/llvm/lib/Target/BPF/BPFInstrInfo.td
> +++ b/llvm/lib/Target/BPF/BPFInstrInfo.td
> @@ -376,11 +376,11 @@ let Predicates = [BPFHasMovsx] in {
> def MOVSX_rr_8 : ALU_RR<BPF_ALU64, BPF_MOV, 8,
> (outs GPR:$dst), (ins GPR:$src),
> "$dst = (s8)$src",
> - [(set GPR:$dst, (sra (shl GPR:$src, (i64 56)), (i64 56)))]>;
> + [(set GPR:$dst, (sext_inreg GPR:$src, i8))]>;
> def MOVSX_rr_16 : ALU_RR<BPF_ALU64, BPF_MOV, 16,
> (outs GPR:$dst), (ins GPR:$src),
> "$dst = (s16)$src",
> - [(set GPR:$dst, (sra (shl GPR:$src, (i64 48)), (i64 48)))]>;
> + [(set GPR:$dst, (sext_inreg GPR:$src, i16))]>;
> def MOVSX_rr_32 : ALU_RR<BPF_ALU64, BPF_MOV, 32,
> (outs GPR:$dst), (ins GPR32:$src),
> "$dst = (s32)$src",
> @@ -388,11 +388,11 @@ let Predicates = [BPFHasMovsx] in {
> def MOVSX_rr_32_8 : ALU_RR<BPF_ALU, BPF_MOV, 8,
> (outs GPR32:$dst), (ins GPR32:$src),
> "$dst = (s8)$src",
> - [(set GPR32:$dst, (sra (shl GPR32:$src, (i32 24)), (i32 24)))]>;
> + [(set GPR32:$dst, (sext_inreg GPR32:$src, i8))]>;
> def MOVSX_rr_32_16 : ALU_RR<BPF_ALU, BPF_MOV, 16,
> (outs GPR32:$dst), (ins GPR32:$src),
> "$dst = (s16)$src",
> - [(set GPR32:$dst, (sra (shl GPR32:$src, (i32 16)), (i32 16)))]>;
> + [(set GPR32:$dst, (sext_inreg GPR32:$src, i16))]>;
> }
> }
> ```
This indeed simplifies the code. I will incorporate your change into the patch. Thanks!
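For reference, a minimal source-level sketch (my own example, not part of the patch) of code that should go through the `sext_inreg` patterns once the expansion is turned off; with `-mcpu=v4` the expectation is a single movsx instruction:

```
/* Hypothetical example only.  Compile with:
 *   clang --target=bpf -O2 -mcpu=v4 -S sext8.c
 * The cast truncates to 8 bits and sign-extends back to 64 bits, which
 * survives as a SIGN_EXTEND_INREG node and should select MOVSX_rr_8,
 * roughly:
 *   r0 = (s8)r1
 * Without movsx (cpu <= v3) the old expansion emits a shift pair instead:
 *   r0 <<= 56
 *   r0 s>>= 56
 * Exact register allocation is an assumption. */
long sext8(long x) {
  return (signed char)x;
}
```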
================
Comment at: llvm/lib/Target/BPF/BPFMIPeephole.cpp:321
+
+ std::map<unsigned, unsigned> ReverseCondOpMap;
----------------
eddyz87 wrote:
> Is this map unused?
Right, it is unused; it is a leftover from an earlier version. Will remove.
================
Comment at: llvm/lib/Target/BPF/BPFMIPeephole.cpp:412
+ int CurrNumInsns = 0;
+ std::map<MachineBasicBlock *, int> SoFarNumInsns;
+ std::map<MachineBasicBlock *, MachineBasicBlock *> FollowThroughBB;
----------------
eddyz87 wrote:
> Nitpick: in my llvm-objdump revisions Fangrui suggested using `DenseMap` in most cases (since `std::map` allocates a node for each pair).
Will try to use DenseMap.
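For illustration, a minimal sketch of what the switch could look like (hypothetical code, not the actual patch; the surrounding pass logic is not shown):

```
// Hypothetical sketch only: the two maps from the snippet above, switched
// from std::map to DenseMap.  DenseMap stores key/value pairs in a single
// open-addressed table, avoiding the per-node allocation std::map performs.
#include "llvm/ADT/DenseMap.h"
#include "llvm/CodeGen/MachineBasicBlock.h"

using namespace llvm;

DenseMap<MachineBasicBlock *, int> SoFarNumInsns;
DenseMap<MachineBasicBlock *, MachineBasicBlock *> FollowThroughBB;

// The lookup/insert pattern stays essentially the same as with std::map:
//   SoFarNumInsns[MBB] = CurrNumInsns;
//   auto It = FollowThroughBB.find(MBB);
//   if (It != FollowThroughBB.end())
//     MachineBasicBlock *Next = It->second;
```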
================
Comment at: llvm/test/CodeGen/BPF/movsx.ll:30
+}
+; CHECK: w0 = w1 # encoding: [0xbc,0x10,0x00,0x00,0x00,0x00,0x00,0x00]
+
----------------
eddyz87 wrote:
> This does not seem right, as it does not sign-extend the 8-bit argument to a 16-bit value.
This is probably due to the ABI, where the caller performs the sign extension of narrow arguments and return values. For example,
```
$ cat t1.c
__attribute__((noinline)) short f1(char a) {
  return a * a;
}
int f2(int a) {
  return f1(a);
}
$ clang --target=bpf -O2 -mcpu=v4 -S t1.c
f1:                                     # @f1
# %bb.0:                                # %entry
        w0 = w1
        w0 *= w0
        exit
.Lfunc_end0:
        .size   f1, .Lfunc_end0-f1
                                        # -- End function
        .globl  f2                      # -- Begin function f2
        .p2align        3
        .type   f2,@function
f2:                                     # @f2
# %bb.0:                                # %entry
        w1 = (s8)w1
        call f1
        w0 = (s16)w0
        exit
```
You can see that in function f2() the sign extension is done properly at the call site, and that is probably why the compiler did not generate sign-extension code inside f1(). I will modify the test so that it exercises a proper sign extension like f2() above.
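As a sketch of one possible direction (my own hypothetical example, not the actual updated test), the truncation and re-extension can be forced inside the function body, so the caller-side extension cannot hide it:

```
/* Hypothetical C counterpart of a reworked test case.  Compile with:
 *   clang --target=bpf -O2 -mcpu=v4 -S
 * The argument arrives as a full 32-bit value; the cast forces an 8-bit
 * sign extension inside the body, which should now select the 32-bit
 * movsx form, roughly:
 *   w0 = (s8)w1
 * instead of a shift pair.  The exact CHECK lines are assumptions. */
int sext_in_body(int a) {
  return (signed char)a;
}
```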
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D144829/new/
https://reviews.llvm.org/D144829