[PATCH] D72184: [BPF] support atomic instructions

Tue Nov 17 09:27:06 PST 2020

ast added inline comments.

================
Comment at: llvm/lib/Target/BPF/BPFInstrInfo.td:684
+  let Inst{47-32} = addr{15-0}; // offset
+  let Inst{11-8} = val;
+  let Inst{7-4} = Opc.Value;
----------------
jackmanb wrote:
> yonghong-song wrote:
> > jackmanb wrote:
> > > jackmanb wrote:
> > > > jackmanb wrote:
> > > > > Sorry I'm a beginner with the LLVM code, could you explain what `val` does? I didn't notice this when I looked through here before.
> > > > > 
> > > > > To try and get a clue I tried just removing this line and then compiling the following code:
> > > > > 
> > > > > ```C
> > > > > // SPDX-License-Identifier: GPL-2.0
> > > > > #include <stdatomic.h>
> > > > > 
> > > > > #include <linux/bpf.h>
> > > > > #include <bpf/bpf_helpers.h>
> > > > > #include <bpf/bpf_tracing.h>
> > > > > 
> > > > > __u64 test_data_64 = 0;
> > > > > __u64 test1_result = 0;
> > > > > 
> > > > > SEC("fentry/bpf_fentry_test1")
> > > > > int BPF_PROG(test1, int a)
> > > > > {
> > > > >         /* atomic_fetch_add(&test_data_64, 1); */
> > > > >         test1_result = __sync_fetch_and_add(&test_data_64, 1);
> > > > >         return 0;
> > > > > }
> > > > > ```
> > > > > 
> > > > > And I was able to load and run the program, with the kernel on my WIP branch: https://github.com/bjackman/linux-bpf/tree/wips/bpf-atomics-v0
> > > > > 
> > > > > The result looks like this:
> > > > > 
> > > > > ```shell
> > > > > $ llvm-objdump -d atomics_test.o
> > > > > 
> > > > > atomics_test.o: file format elf64-bpf
> > > > > 
> > > > > 
> > > > > Disassembly of section fentry/bpf_fentry_test1:
> > > > > 
> > > > > 0000000000000000 <test1>:
> > > > >        0:       b7 01 00 00 01 00 00 00 r1 = 1
> > > > >        1:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll
> > > > >        3:       db 12 00 00 01 00 00 00 r1 = atomic_fetch_add((u64 *)(r2 + 0), PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
> > > > > Stack dump:
> > > > > 0.      Program arguments: llvm-objdump -d atomics_test.o 
> > > > > Segmentation fault
> > > > > ```
> > > > > 
> > > > > Aside from the fact that llvm-objdump crashed, the encoding `db 12 00 00 01 00 00 00` seems correct to me. If I add the `let Inst{11-8} = val` back in I get `db 12 00 00 01 01 00 00` which I don't understand.
> > > > Ah wait, I guess this is adding a 3rd operand register? In my example it's unclear because it is R1 which is also the dst operand. I was envisaging we just have semantics like `src = atomic_fetch_add(dst+off, src)` but you are instead proposing `dst = atomic_fetch_add(dst+off, val)`?
> > > Sorry I mean `dst = atomic_fetch_add(src+off, val)`
> > > Sorry I'm a beginner with the LLVM code, could you explain what `val` does? I didn't notice this when I looked through here before.
> > 
> > For the below statement:
> >   test1_result = __sync_fetch_and_add(&test_data_64, 1);
> > 
> > 'val' represents the register which holds the value '1'.
> > 
> > bit 4-7 is also used in compare-and-exchange insn as you need a memory location, in-register old/new values.
> > 
> > > 
> > > To try and get a clue I tried just removing this line and then compiling the following code:
> > > 
> > > ```C
> > > // SPDX-License-Identifier: GPL-2.0
> > > #include <stdatomic.h>
> > > 
> > > #include <linux/bpf.h>
> > > #include <bpf/bpf_helpers.h>
> > > #include <bpf/bpf_tracing.h>
> > > 
> > > __u64 test_data_64 = 0;
> > > __u64 test1_result = 0;
> > > 
> > > SEC("fentry/bpf_fentry_test1")
> > > int BPF_PROG(test1, int a)
> > > {
> > >         /* atomic_fetch_add(&test_data_64, 1); */
> > >         test1_result = __sync_fetch_and_add(&test_data_64, 1);
> > >         return 0;
> > > }
> > > ```
> > > 
> > > And I was able to load and run the program, with the kernel on my WIP branch: https://github.com/bjackman/linux-bpf/tree/wips/bpf-atomics-v0
> > > 
> > > The result looks like this:
> > > 
> > > ```shell
> > > $ llvm-objdump -d atomics_test.o
> > > 
> > > atomics_test.o: file format elf64-bpf
> > > 
> > > 
> > > Disassembly of section fentry/bpf_fentry_test1:
> > > 
> > > 0000000000000000 <test1>:
> > >        0:       b7 01 00 00 01 00 00 00 r1 = 1
> > >        1:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll
> > >        3:       db 12 00 00 01 00 00 00 r1 = atomic_fetch_add((u64 *)(r2 + 0), PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
> > > Stack dump:
> > > 0.      Program arguments: llvm-objdump -d atomics_test.o 
> > > Segmentation fault
> > > ```
> > > 
> > 
> > the crash may come from that the 'val' is not encoded properly. I will double check.
> > 
> > > Aside from the fact that llvm-objdump crashed, the encoding `db 12 00 00 01 00 00 00` seems correct to me. If I add the `let Inst{11-8} = val` back in I get `db 12 00 00 01 01 00 00` which I don't understand.
> > 
> > in this particular case, yes, as final asm code looks like
> >    r1 = atomic_fetch_add((u64 *)(r2 + 0), r1)
> > where the value "r1" and the result "r1" shared the same register. A little bit compiler work is need to enforce "val" register and result register always the same.
> > 
> > My current design allows:
> >   r3 = atomic_fetch_add((u64 *)(r2 + 0), r1)
> > and I think this is more flexible as people may later on still use "r1". But let me know whether you prefer
> >   r1 = atomic_fetch_add((u64 *)(r2 + 0), r1)
> > always.
> > 
> > 
> Got it - this design will introduce some impedance mismatch with x86, since XADD has only two operands (`r3 = atomic_fetch_add((u64 *)(r2 + 0), r1)` cannot be implemented in a single instruction). It also hard-codes RAX as the third operand for [cmp]xchg so my current kernel implementation hard-codes R0 - that's also what Alexei supported in the separate email thread.
> 
> Will you be available to discuss this in the BPF office hours this week? There are a few possible ways forward, we just need to decide which tradeoff is best.
+1 to Brendan's point.

(set GPR:$dst, (OpNode ADDRri:$addr, GPR:$val))
is a 3 op instruction. All bpf insns are 2 op. I think it's incorrect to make such departure from 2 op for this set of insns,
because it moves it further away from x64.
$val reg and $dst reg need to be the same to match 'lock xadd' semantics of x64 insn.
It's not only about reducing complexity of JIT, but to make compiler perform register allocation with actual HW in mind. Otherwise code gen with extra reg will assume that $dst can be different, but in reality it won't and JIT would have to emit extra insns to avoid corrupting $dst. So if there is no constraint on $dst==$val then reg alloc will work against the JIT and against the HW which we lead to redundant code after JIT in almost all cases.

As far as fetch_and_sub() I think it's ok to keep it as a separate BPF insn though there is no x64 xsub insn.
long foo(long *a, long b) {
 return __sync_fetch_and_sub(a, b);
}
compiles into
  mov     rax, rsi
  neg     rax
  lock            xadd    qword ptr [rdi], rax
  ret
The BPF JIT will emit two x64 insn for single BPF atomic_fetch_sub insn.

BPF atomic_fetch_add insn should map to exactly one x64 'lock xadd' insn in all conditions.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72184/new/

https://reviews.llvm.org/D72184