[llvm] [RFC][BPF] Do atomic_fetch_*() pattern matching with memory ordering (PR #107343)

Wed Sep 4 19:19:51 PDT 2024

https://github.com/yonghong-song created https://github.com/llvm/llvm-project/pull/107343

For atomic fetch_and_*() operations, do pattern matching with memory ordering seq_cst and monotonic (relaxed). For fetch_and_*() operations with seq_cst ordering, atomic_fetch_*() instructions are generated. For monotonic ordering, locked insns are generated. The main motivation is to resolve the kernel issue [1].

The following are memory ordering which could be supported:
  seq_cst, acq_rel, release, acquire, relaxed
This patch only supports seq_cst and relaxed. We could pattern match other memory ordering as well. Current gcc style __sync_fetch_and_*() operations are all seq_cst.

To use relaxed memory ordering, the _Atomic type is needed. The following is an example:

```
$ cat test.c
\#include <stdatomic.h>
void f1(_Atomic int *i) {
   (void)__c11_atomic_fetch_and(i, 10, memory_order_relaxed);
}
void f2(_Atomic int *i) {
   (void)__c11_atomic_fetch_and(i, 10, memory_order_seq_cst);
}
$ cat run.sh
clang  -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -O2 --target=bpf -c test.c -o test.o && llvm-objdump -d test.o
$ ./run.sh

test.o: file format elf64-bpf

Disassembly of section .text:

0000000000000000 <f1>:
       0:       b4 02 00 00 0a 00 00 00 w2 = 0xa
       1:       c3 21 00 00 50 00 00 00 lock *(u32 *)(r1 + 0x0) &= w2
       2:       95 00 00 00 00 00 00 00 exit

0000000000000018 <f2>:
       3:       b4 02 00 00 0a 00 00 00 w2 = 0xa
       4:       c3 21 00 00 51 00 00 00 w2 = atomic_fetch_and((u32 *)(r1 + 0x0), w2)
       5:       95 00 00 00 00 00 00 00 exit
```

Here is another example with global _Atomic variable:

```
$ cat test1.c
\#include <stdatomic.h>

_Atomic int i;

void f1(void) {
   (void)__c11_atomic_fetch_and(&i, 10, memory_order_relaxed);
}
void f2(void) {
   (void)__c11_atomic_fetch_and(&i, 10, memory_order_seq_cst);
}
$ cat run.sh
clang  -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -O2 --target=bpf -c test1.c -o test1.o && llvm-objdump -d test1.o
$ ./run.sh

test1.o:        file format elf64-bpf

Disassembly of section .text:

0000000000000000 <f1>:
       0:       b4 01 00 00 0a 00 00 00 w1 = 0xa
       1:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll
       3:       c3 12 00 00 50 00 00 00 lock *(u32 *)(r2 + 0x0) &= w1
       4:       95 00 00 00 00 00 00 00 exit

0000000000000028 <f2>:
       5:       b4 01 00 00 0a 00 00 00 w1 = 0xa
       6:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll
       8:       c3 12 00 00 51 00 00 00 w1 = atomic_fetch_and((u32 *)(r2 + 0x0), w1)
       9:       95 00 00 00 00 00 00 00 exit
```

Note that in the above compilations, '-g' is not used. The reason is due to the following IR related to _Atomic type:
```
$clang  -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -O2 --target=bpf -g -S -emit-llvm test1.c
```
The related debug info for test1.c:
```
!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
!1 = distinct !DIGlobalVariable(name: "i", scope: !2, file: !3, line: 3, type: !16, isLocal: false, isDefinition: true)
...
!16 = !DIDerivedType(tag: DW_TAG_atomic_type, baseType: !17)
!17 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
```

If compiling test.c, the related debug info:
```
...
!19 = distinct !DISubprogram(name: "f1", scope: !1, file: !1, line: 3, type: !20, scopeLine: 3, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !25) !20 = !DISubroutineType(types: !21)
!21 = !{null, !22}
!22 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !23, size: 64) !23 = !DIDerivedType(tag: DW_TAG_atomic_type, baseType: !24) !24 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed) ...
!26 = !DILocalVariable(name: "i", arg: 1, scope: !19, file: !1, line: 3, type: !22)
```

All the above suggests _Atomic behaves like a modifier (e.g. const, restrict, volatile).
This seems true based on doc [1].

Without proper handling DW_TAG_atomic_type, BTF generation will be incorrect since
the current implementation assumes no existence of DW_TAG_atomic_type. So we have
two choices here:
  (1). llvm bpf backend processes DW_TAG_atomic_type but ignores it in BTF encoding.
  (2). Add another type, e.g., BTF_KIND_ATOMIC to BTF. BTF_KIND_ATOMIC behaves as a
       modifier like const/volatile/restrict.

For choice (1), the following is a hack which can make '-g' work for test1.c:
```

diff --git a/llvm/lib/Target/BPF/BTFDebug.cpp b/llvm/lib/Target/BPF/BTFDebug.cpp index 4d847abea731..fd61bb811111 100644
--- a/llvm/lib/Target/BPF/BTFDebug.cpp
+++ b/llvm/lib/Target/BPF/BTFDebug.cpp
@@ -1444,8 +1444,14 @@ void BTFDebug::processGlobals(bool ProcessingMapDef) {
       DIGlobal = GVE->getVariable();
       if (SecName.starts_with(".maps"))
         visitMapDefType(DIGlobal->getType(), GVTypeId);
-      else
-        visitTypeEntry(DIGlobal->getType(), GVTypeId, false, false);
+      else {
+        const DIType *Ty = DIGlobal->getType();
+        auto *DTy = dyn_cast<DIDerivedType>(Ty);
+        if (DTy && DTy->getTag() == dwarf::DW_TAG_atomic_type)
+          visitTypeEntry(DTy->getBaseType(), GVTypeId, false, false);
+        else
+          visitTypeEntry(Ty, GVTypeId, false, false);
+      }
       break;
     }
```
You can see that basicaly dwarf::DW_TAG_atomic_type is skipped during BTF generation.
Other changes are needed to avoid other usages of dwarf::DW_TAG_atomic_type.

But I prefer adding BTF_KIND_ATOMIC if we indeed intends to use _Atomic in bpf programs.
This probably is the only way if the _Atomic type is used e.g. at global variable
where corresponding types in skeleton needs also be _Atomic.

Please let me know your opinion.

 [1] https://lore.kernel.org/bpf/7b941f53-2a05-48ec-9032-8f106face3a3@linux.dev/
 [2] https://dwarfstd.org/issues/131112.1.html

>From 0e227d67ff199090ec6584c430f785e416366dbd Mon Sep 17 00:00:00 2001
From: Yonghong Song <yonghong.song at linux.dev>
Date: Tue, 3 Sep 2024 21:26:17 -0700
Subject: [PATCH] [RFC][BPF] Do atomic_fetch_*() pattern matching with memory
 ordering

For atomic fetch_and_*() operations, do pattern matching with memory ordering
seq_cst and monotonic (relaxed). For fetch_and_*() operations with seq_cst
ordering, atomic_fetch_*() instructions are generated. For monotonic ordering,
locked insns are generated. The main motivation is to resolve the kernel
issue [1].

The following are memory ordering which could be supported:
  seq_cst, acq_rel, release, acquire, relaxed
This patch only supports seq_cst and relaxed. We could pattern match
other memory ordering as well. Current gcc style __sync_fetch_and_*()
operations are all seq_cst.

To use relaxed memory ordering, the _Atomic type is needed.
The following is an example:

```
$ cat test.c
\#include <stdatomic.h>
void f1(_Atomic int *i) {
   (void)__c11_atomic_fetch_and(i, 10, memory_order_relaxed);
}
void f2(_Atomic int *i) {
   (void)__c11_atomic_fetch_and(i, 10, memory_order_seq_cst);
}
$ cat run.sh
clang  -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -O2 --target=bpf -c test.c -o test.o && llvm-objdump -d test.o
$ ./run.sh

test.o: file format elf64-bpf

Disassembly of section .text:

0000000000000000 <f1>:
       0:       b4 02 00 00 0a 00 00 00 w2 = 0xa
       1:       c3 21 00 00 50 00 00 00 lock *(u32 *)(r1 + 0x0) &= w2
       2:       95 00 00 00 00 00 00 00 exit

0000000000000018 <f2>:
       3:       b4 02 00 00 0a 00 00 00 w2 = 0xa
       4:       c3 21 00 00 51 00 00 00 w2 = atomic_fetch_and((u32 *)(r1 + 0x0), w2)
       5:       95 00 00 00 00 00 00 00 exit
```

Here is another example with global _Atomic variable:

```
$ cat test1.c
\#include <stdatomic.h>

_Atomic int i;

void f1(void) {
   (void)__c11_atomic_fetch_and(&i, 10, memory_order_relaxed);
}
void f2(void) {
   (void)__c11_atomic_fetch_and(&i, 10, memory_order_seq_cst);
}
$ cat run.sh
clang  -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -O2 --target=bpf -c test1.c -o test1.o && llvm-objdump -d test1.o
$ ./run.sh

test1.o:        file format elf64-bpf

Disassembly of section .text:

0000000000000000 <f1>:
       0:       b4 01 00 00 0a 00 00 00 w1 = 0xa
       1:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll
       3:       c3 12 00 00 50 00 00 00 lock *(u32 *)(r2 + 0x0) &= w1
       4:       95 00 00 00 00 00 00 00 exit

0000000000000028 <f2>:
       5:       b4 01 00 00 0a 00 00 00 w1 = 0xa
       6:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll
       8:       c3 12 00 00 51 00 00 00 w1 = atomic_fetch_and((u32 *)(r2 + 0x0), w1)
       9:       95 00 00 00 00 00 00 00 exit
```

Note that in the above compilations, '-g' is not used. The reason is due to the following IR
related to _Atomic type:
```
$clang  -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -O2 --target=bpf -g -S -emit-llvm test1.c
```
The related debug info for test1.c:
```
!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
!1 = distinct !DIGlobalVariable(name: "i", scope: !2, file: !3, line: 3, type: !16, isLocal: false, isDefinition: true)
...
!16 = !DIDerivedType(tag: DW_TAG_atomic_type, baseType: !17)
!17 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
...

If compiling test.c, the related debug info:
```
...
!19 = distinct !DISubprogram(name: "f1", scope: !1, file: !1, line: 3, type: !20, scopeLine: 3, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !25)
!20 = !DISubroutineType(types: !21)
!21 = !{null, !22}
!22 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !23, size: 64)
!23 = !DIDerivedType(tag: DW_TAG_atomic_type, baseType: !24)
!24 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
...
!26 = !DILocalVariable(name: "i", arg: 1, scope: !19, file: !1, line: 3, type: !22)
```

All the above suggests _Atomic behaves like a modifier (e.g. const, restrict, volatile).
This seems true based on doc [1].

Without proper handling DW_TAG_atomic_type, BTF generation will be incorrect since
the current implementation assumes no existence of DW_TAG_atomic_type. So we have
two choices here:
  (1). llvm bpf backend processes DW_TAG_atomic_type but ignores it in BTF encoding.
  (2). Add another type, e.g., BTF_KIND_ATOMIC to BTF. BTF_KIND_ATOMIC behaves as a
       modifier like const/volatile/restrict.

For choice (1), the following is a hack which can make '-g' work for test1.c:
```
diff --git a/llvm/lib/Target/BPF/BTFDebug.cpp b/llvm/lib/Target/BPF/BTFDebug.cpp
index 4d847abea731..fd61bb811111 100644
--- a/llvm/lib/Target/BPF/BTFDebug.cpp
+++ b/llvm/lib/Target/BPF/BTFDebug.cpp
@@ -1444,8 +1444,14 @@ void BTFDebug::processGlobals(bool ProcessingMapDef) {
       DIGlobal = GVE->getVariable();
       if (SecName.starts_with(".maps"))
         visitMapDefType(DIGlobal->getType(), GVTypeId);
-      else
-        visitTypeEntry(DIGlobal->getType(), GVTypeId, false, false);
+      else {
+        const DIType *Ty = DIGlobal->getType();
+        auto *DTy = dyn_cast<DIDerivedType>(Ty);
+        if (DTy && DTy->getTag() == dwarf::DW_TAG_atomic_type)
+          visitTypeEntry(DTy->getBaseType(), GVTypeId, false, false);
+        else
+          visitTypeEntry(Ty, GVTypeId, false, false);
+      }
       break;
     }
```
You can see that basicaly dwarf::DW_TAG_atomic_type is skipped during BTF generation.
Other changes are needed to avoid other usages of dwarf::DW_TAG_atomic_type.

But I prefer adding BTF_KIND_ATOMIC if we indeed intends to use _Atomic in bpf programs.
This probably is the only way if the _Atomic type is used e.g. at global variable
where corresponding types in skeleton needs also be _Atomic.

Please let me know your opinion.

 [1] https://lore.kernel.org/bpf/7b941f53-2a05-48ec-9032-8f106face3a3@linux.dev/
 [2] https://dwarfstd.org/issues/131112.1.html
---
 llvm/lib/Target/BPF/BPFInstrInfo.td | 44 ++++++++++++++++++++++-------
 1 file changed, 34 insertions(+), 10 deletions(-)

diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td b/llvm/lib/Target/BPF/BPFInstrInfo.td
index f7e17901c7ed5e..cf256531a8939a 100644
--- a/llvm/lib/Target/BPF/BPFInstrInfo.td
+++ b/llvm/lib/Target/BPF/BPFInstrInfo.td
@@ -864,26 +864,50 @@ class XFALU32<BPFWidthModifer SizeOp, BPFArithOp Opc, string OpcodeStr,
 
 let Constraints = "$dst = $val" in {
   let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in {
-    def XFADDW32 : XFALU32<BPF_W, BPF_ADD, "u32", "add", atomic_load_add_i32>;
-    def XFANDW32 : XFALU32<BPF_W, BPF_AND, "u32", "and", atomic_load_and_i32>;
-    def XFORW32  : XFALU32<BPF_W, BPF_OR,  "u32", "or",  atomic_load_or_i32>;
-    def XFXORW32 : XFALU32<BPF_W, BPF_XOR, "u32", "xor", atomic_load_xor_i32>;
+    def XFADDW32 : XFALU32<BPF_W, BPF_ADD, "u32", "add", atomic_load_add_i32_seq_cst>;
+    def XFANDW32 : XFALU32<BPF_W, BPF_AND, "u32", "and", atomic_load_and_i32_seq_cst>;
+    def XFORW32  : XFALU32<BPF_W, BPF_OR,  "u32", "or",  atomic_load_or_i32_seq_cst>;
+    def XFXORW32 : XFALU32<BPF_W, BPF_XOR, "u32", "xor", atomic_load_xor_i32_seq_cst>;
   }
 
   let Predicates = [BPFHasALU32] in {
-    def XFADDD : XFALU64<BPF_DW, BPF_ADD, "u64", "add", atomic_load_add_i64>;
+    def XFADDD : XFALU64<BPF_DW, BPF_ADD, "u64", "add", atomic_load_add_i64_seq_cst>;
   }
-  def XFANDD : XFALU64<BPF_DW, BPF_AND, "u64", "and", atomic_load_and_i64>;
-  def XFORD  : XFALU64<BPF_DW, BPF_OR,  "u64", "or",  atomic_load_or_i64>;
-  def XFXORD : XFALU64<BPF_DW, BPF_XOR, "u64", "xor", atomic_load_xor_i64>;
+  def XFANDD : XFALU64<BPF_DW, BPF_AND, "u64", "and", atomic_load_and_i64_seq_cst>;
+  def XFORD  : XFALU64<BPF_DW, BPF_OR,  "u64", "or",  atomic_load_or_i64_seq_cst>;
+  def XFXORD : XFALU64<BPF_DW, BPF_XOR, "u64", "xor", atomic_load_xor_i64_seq_cst>;
+}
+
+let Predicates = [BPFHasALU32] in {
+    def : Pat<(atomic_load_add_i32_monotonic ADDRri:$addr, GPR32:$val),
+              (XADDW32 ADDRri:$addr, GPR32:$val)>;
+    def : Pat<(atomic_load_add_i64_monotonic ADDRri:$addr, GPR:$val),
+              (XADDD ADDRri:$addr, GPR:$val)>;
 }
 
 // atomic_load_sub can be represented as a neg followed
 // by an atomic_load_add.
-def : Pat<(atomic_load_sub_i32 ADDRri:$addr, GPR32:$val),
+def : Pat<(atomic_load_sub_i32_seq_cst ADDRri:$addr, GPR32:$val),
           (XFADDW32 ADDRri:$addr, (NEG_32 GPR32:$val))>;
-def : Pat<(atomic_load_sub_i64 ADDRri:$addr, GPR:$val),
+def : Pat<(atomic_load_sub_i64_seq_cst ADDRri:$addr, GPR:$val),
           (XFADDD ADDRri:$addr, (NEG_64 GPR:$val))>;
+def : Pat<(atomic_load_sub_i32_monotonic ADDRri:$addr, GPR32:$val),
+          (XADDW32 ADDRri:$addr, (NEG_32 GPR32:$val))>;
+def : Pat<(atomic_load_sub_i64_monotonic ADDRri:$addr, GPR:$val),
+          (XADDD ADDRri:$addr, (NEG_64 GPR:$val))>;
+
+def : Pat<(atomic_load_and_i32_monotonic ADDRri:$addr, GPR32:$val),
+          (XANDW32 ADDRri:$addr, GPR32:$val)>;
+def : Pat<(atomic_load_and_i64_monotonic ADDRri:$addr, GPR:$val),
+          (XANDD ADDRri:$addr, GPR:$val)>;
+def : Pat<(atomic_load_or_i32_monotonic ADDRri:$addr, GPR32:$val),
+          (XORW32 ADDRri:$addr, GPR32:$val)>;
+def : Pat<(atomic_load_or_i64_monotonic ADDRri:$addr, GPR:$val),
+          (XORD ADDRri:$addr, GPR:$val)>;
+def : Pat<(atomic_load_xor_i32_monotonic ADDRri:$addr, GPR32:$val),
+          (XXORW32 ADDRri:$addr, GPR32:$val)>;
+def : Pat<(atomic_load_xor_i64_monotonic ADDRri:$addr, GPR:$val),
+          (XXORD ADDRri:$addr, GPR:$val)>;
 
 // Atomic Exchange
 class XCHG<BPFWidthModifer SizeOp, string OpcodeStr, PatFrag OpNode>