[PATCH] D139645: [WebAssembly] Fold adds with global addresses into load offset

Thu Dec 22 10:44:11 PST 2022

dschuff accepted this revision.
dschuff added inline comments.

================
Comment at: llvm/test/CodeGen/WebAssembly/userstack.ll:336
+; global address, so we should fold the global address into the offset, but not
+; the frame offset.
 ; CHECK-LABEL: frame_offset_with_global_address
----------------
aheejin wrote:
> dschuff wrote:
> > dschuff wrote:
> > > luke wrote:
> > > > dschuff wrote:
> > > > > so the result of this is that the frame offset will end up in a const or add expression (consumed by the i32.load8_u), and the global address will be folded?
> > > > > Is this better than what we did before (i.e. the address in a const and the frame offset folded)?
> > > > In this case the frame offset wasn't being folded, although I'm not sure if that is intentional or not. This is the codegen before the patch:
> > > > 
> > > > ```
> > > > frame_offset_with_global_address:
> > > > 	.functype	frame_offset_with_global_address () -> (i32)
> > > > 	i32.const	$push0=, str
> > > > 	global.get	$push5=, __stack_pointer
> > > > 	i32.const	$push6=, 16
> > > > 	i32.sub 	$push9=, $pop5, $pop6
> > > > 	i32.const	$push7=, 12
> > > > 	i32.add 	$push8=, $pop9, $pop7
> > > > 	i32.add 	$push1=, $pop0, $pop8
> > > > 	i32.load8_u	$push2=, 0($pop1)
> > > > 	i32.const	$push3=, 67
> > > > 	i32.and 	$push4=, $pop2, $pop3
> > > > 	return	$pop4
> > > > ```
> > > > 
> > > > With the patch the global address gets folded in which saves a `const` and an `add` instruction:
> > > > 
> > > > ```
> > > > frame_offset_with_global_address:
> > > > 	.functype	frame_offset_with_global_address () -> (i32)
> > > > 	global.get	$push3=, __stack_pointer
> > > > 	i32.const	$push4=, 16
> > > > 	i32.sub 	$push7=, $pop3, $pop4
> > > > 	i32.const	$push5=, 12
> > > > 	i32.add 	$push6=, $pop7, $pop5
> > > > 	i32.load8_u	$push1=, str($pop6)
> > > > 	i32.const	$push0=, 67
> > > > 	i32.and 	$push2=, $pop1, $pop0
> > > > 	return	$pop2
> > > > ```
> > > > 
> > > > 
> > > Ah ok yeah this test was added in https://reviews.llvm.org/D90577 which fixed a bug in the case below the one you are modifying in this CL. So I guess what's happening now is that this IR is being ISel'ed differently and is ending up in the first case in eliminateFrameIndex instead of the second.
> > Maybe we could add to the comment something like "(this allows the global address to be relocated)"
> > 
> > Since this test doesn't cover that second case anymore, I wonder if we have any tests that do. I would assume that clause is still needed, maybe at least for when we aren't using DAG ISel...
> I commented the `if` added in D90577 out and ran this test, and this test still passes. When it was added that wouldn't have been the case, so I'm not sure what changed since then to make this pass. So what I'm saying is, this test is not covering the `if` in D90577 even now. 🤷🏻 Not sure what we should do. Delete the test and come up with a new test that covers the case? But that shouldn't be this CL's responsibility..
> 
> 
> @dschuff And I have another question. I am not very familiar with this prolog-epilog code..
> 1. What is the difference between what [[ https://github.com/llvm/llvm-project/blob/beb3fa2d2efbdc1fbedee2d5c587eae1364652d3/llvm/lib/Target/WebAssembly/WebAssemblyRegisterInfo.cpp#L71-L87 | this upper snippet ]] does and [[ https://github.com/llvm/llvm-project/blob/beb3fa2d2efbdc1fbedee2d5c587eae1364652d3/llvm/lib/Target/WebAssembly/WebAssemblyRegisterInfo.cpp#L89-L113 | the lower snippet ]] does? Both seem to be about folding offsets.
> 
> 2. Why do we need to do custom folding here, and why isn't this taken care of the normal isel patterns? Is this for the case of fast isel?
Oh, that's interesting.
It looks like this test now actually does cover the behavior in this CL, so maybe it's good to keep.

1. IIRC the main difference is that the first one is targeting a load that has the FI operand and the second is targeting an add with the FI operand (so different patterns that come out of ISel. My guess is what happened here is that at the time the test was written it covered this code but the patterns coming from ISel and the front half of the MI passes are different now, so it doesn't anymore. It would be interesting to see if there are any redundancies (or other optimization opportunities) for FI elimination but I agree that none of that needs to block this CL.

2. The reason there's custom logic here is that code comes out of ISel and goes through many of the MI passes (until after regalloc IIRC) with the FrameIndex operands as part of the IR, and this pass doesn't run until fairly late. So most of the passes that could optimize it have run already.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139645/new/

https://reviews.llvm.org/D139645