[PATCH] D139645: [WebAssembly] Fold adds with global addresses into load offset

Thu Dec 22 16:34:37 PST 2022

luke added inline comments.

================
Comment at: llvm/test/CodeGen/WebAssembly/negative-base-reg.ll:20-22
+; CHECK:      local.get $push[[L1:[0-9]+]]=, 0
+; CHECK-NEXT: i32.const $push[[V:[0-9]+]]=, 1
+; CHECK-NEXT: i32.store args+128($pop[[L1]]), $pop[[V]]
----------------
luke wrote:
> luke wrote:
> > luke wrote:
> > > dschuff wrote:
> > > > luke wrote:
> > > > > @dschuff This got shuffled about, and now the global address gets folded into the store. But is the offset here still unfolded? Or by "offset" here is it also referring to the global address too
> > > > OK, so looking back at https://github.com/llvm/llvm-project/issues/29497 and D24053 this actually does look like the problem that we fixed back then.
> > > > IIRC the problem is that sometimes the address operand of the store (i.e. local 0, or L1 here) can have a negative value (here it's -128), but the store's effective address calculation (i.e. the operand plus the constant offset, L1 + args + 128 with this CL applied) is unsigned, so it will overflow. 
> > > > So we have to ensure that the calculation that recombines the native local value with the compile-time constant (here 128) happens with an add instruction rather than getting folded. Does that make sense?
> > > Thanks, that makes sense. 
> > > And from what I understand, that would then mean we can't optimise for the case in [[ https://github.com/llvm/llvm-project/issues/57771 | this GitHub issue ]], at least not in the case where the argument being passed is signed. 
> > > 
> > > At the risk of going down the rabbit hole here, do you think there's a way to generate a gep that results in an `ISD::ADD` with `nuw`? I.e. is there a way to say that `%n` is unsigned here so that we are still allowed to fold `@global_i32` in?
> > > 
> > > ```
> > > define i32 @load_i32_global_address_with_folded_offset(i32 %n) {
> > >   %s = getelementptr inbounds i32, i32* @global_i32, i32 %n
> > >   %t = load i32, i32* %s
> > >   ret i32 %t
> > > }
> > > ```
> > > 
> > > https://reviews.llvm.org/D15544 suggests that it used to be enough to just specify `inbounds`:
> > > 
> > > ```
> > > ; CHECK-LABEL: load_i32_with_folded_gep_offset:
> > > ; CHECK: i32.load  $push0=, 24($0){{$}}
> > > define i32 @load_i32_with_folded_gep_offset(i32* %p) {
> > >   %s = getelementptr inbounds i32, i32* %p, i32 6
> > >   %t = load i32, i32* %s
> > >   ret i32 %t
> > > }
> > > ```
> > > 
> > > 
> > > But now it looks like the pointer needs to be manually calculated:
> > > ```
> > > ; CHECK-LABEL: load_i32_with_folded_offset:
> > > ; CHECK: i32.load  $push0=, 24($0){{$}}
> > > define i32 @load_i32_with_folded_offset(ptr %p) {
> > >   %q = ptrtoint ptr %p to i32
> > >   %r = add nuw i32 %q, 24
> > >   %s = inttoptr i32 %r to ptr
> > >   %t = load i32, ptr %s
> > >   ret i32 %t
> > > }
> > > ```
> > So I guess my question is given the following C:
> > 
> > ```
> > extern int *data;
> > int f(unsigned idx) {
> >   return data[idx];
> > }
> > ```
> > 
> > Should it not be semantically possible to somehow generate `i32.load data($0)` from it, given that:
> > 
> > a) we know the argument is unsigned
> > b) WebAssembly performs unsigned addition of the offset and address
> > 
> Just also writing this down here so I remember later: Because the spec defines the memarg offset as an unsigned integer, the addition of the offset to the base address won't wrap. 
> 
> And it looks like it may be impossible to do this optimisation because the signed-ness of an integer is lost in LLVM IR.
To illustrate my point: Given the following C:

```
extern char data[1024];
char h(unsigned idx) {
  return data[idx];
}

char i(signed idx) {
  return data[idx];
}
```

The emitted IR (`clang -emit-llvm -S --target=wasm32 foo.c -O3`) is exactly the same for the two functions:

```
; Function Attrs: mustprogress nofree norecurse nosync nounwind readonly willreturn
define hidden signext i8 @h(i32 noundef %0) local_unnamed_addr #0 {
  %2 = getelementptr inbounds [1024 x i8], ptr @data, i32 0, i32 %0
  %3 = load i8, ptr %2, align 1, !tbaa !2
  ret i8 %3
}

; Function Attrs: mustprogress nofree norecurse nosync nounwind readonly willreturn
define hidden signext i8 @i(i32 noundef %0) local_unnamed_addr #0 {
  %2 = getelementptr inbounds [1024 x i8], ptr @data, i32 0, i32 %0
  %3 = load i8, ptr %2, align 1, !tbaa !2
  ret i8 %3
}
```

If we really wanted to cater for this, could we maybe get clang to generate the `ptrtoint -> add nuw -> inttoptr` sequence instead of a gep when it knows that the type is unsigned?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139645/new/

https://reviews.llvm.org/D139645