[PATCH] D119338: [AArch64][SVE] Add structured load/store opcodes to getMemOpInfo

David Sherwood via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 11 01:41:41 PST 2022


david-arm added inline comments.


================
Comment at: llvm/test/CodeGen/AArch64/ldN-reg-imm-alloca.ll:317
+  %alloca1 = alloca <vscale x 2 x double>, i32 13
+  %alloca2 = alloca <vscale x 4 x i32>, i32 2
+  %base = getelementptr <vscale x 2 x double>, <vscale x 2 x double>* %alloca1, i64 9, i64 0
----------------
sdesmalen wrote:
> david-arm wrote:
> > sdesmalen wrote:
> > > alloca2 is unused? (also true for other cases)
> > This is something I asked @kmclaughlin to do because it's the only way to expose some of the code changes in this patch; all the tests ending in _valid_imm do this for that reason. If you look at `isAArch64FrameOffsetLegal`, we return a StackOffset, which is zero for every test in this file except ones like this. Having a non-zero StackOffset helped to ensure we were calculating the remainder/offset correctly using the Scale property set in `getMemOpInfo` (there is a small standalone sketch of this split after the thread below). We can remove the test, but then I'm worried we're not fully testing the changes, that's all.
> > 
> > For example, in `ld3b_f32_valid_imm` you'll notice the addvl just before the ld3b, which happens precisely because StackOffset is non-zero.
> I assumed that was what the `gep` was for. Maybe the second alloca is only needed because of how this is written. If you write:
> 
>   %alloca1 = alloca <vscale x 64 x double>, align 4                                              
>   %alloca1.bc = bitcast <vscale x 64 x double>* %alloca1 to <vscale x 2 x double>*               
>   %base = getelementptr <vscale x 2 x double>, <vscale x 2 x double>* %alloca1.bc, i64 28, i64 0 
>   %ld4 = call <vscale x 8 x double> @llvm.aarch64.sve.ld4.nxv8f64(<vscale x 2 x i1> %pg, double* %base)
> 
> Then that results in:
> 
>   ld4d    { z0.d, z1.d, z2.d, z3.d }, p0/z, [sp, #28, mul vl]
> 
> Whereas
> 
>   %alloca1 = alloca <vscale x 64 x double>, align 4                                              
>   %alloca1.bc = bitcast <vscale x 64 x double>* %alloca1 to <vscale x 2 x double>*               
>   %base = getelementptr <vscale x 2 x double>, <vscale x 2 x double>* %alloca1.bc, i64 32, i64 0 
>   %ld4 = call <vscale x 8 x double> @llvm.aarch64.sve.ld4.nxv8f64(<vscale x 2 x i1> %pg, double* %base)
> 
> Results in:
> 
>   <x8 = calculations for sp + 32 * sizeof(VL)>
>   ld4d    { z0.d, z1.d, z2.d, z3.d }, p0/z, [x8]
> Sure, I'd be happy with that if it works and @kmclaughlin can confirm it leads to a non-zero StackOffset - if we can avoid the second alloca, then all the better!
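
To make the immediate-range reasoning above concrete: for ld4d the scaled
immediate is a signed 4-bit field multiplied by the number of registers in
the group, so the encodable offsets are multiples of 4 in [-32, 28] vector
lengths - which is why the gep index of 28 folds into the instruction while
32 does not. Below is a minimal standalone sketch of that split; it is not
the actual logic in `isAArch64FrameOffsetLegal`, and the names `MemOpScale`
and `splitFrameOffset` are hypothetical:

  #include <cstdint>
  #include <cstdio>

  // Immediate constraints for an SVE structured load such as ld4d: the
  // raw immediate field is in [-8, 7] and is scaled by the number of
  // registers in the group (4 for ld4d), giving offsets of
  // Scale * [-8, 7] vector lengths.
  struct MemOpScale {
    int64_t Scale;  // VLs per immediate step (4 for ld4d)
    int64_t MinImm; // smallest raw immediate (-8)
    int64_t MaxImm; // largest raw immediate (7)
  };

  // Split an offset (in VLs) into a part foldable into the immediate and
  // a remainder that must be materialised separately, e.g. with an ADDVL
  // into a scratch register before the load.
  static void splitFrameOffset(int64_t OffsetVL, const MemOpScale &Info,
                               int64_t &Imm, int64_t &RemainderVL) {
    Imm = OffsetVL / Info.Scale;
    if (Imm < Info.MinImm)
      Imm = Info.MinImm;
    if (Imm > Info.MaxImm)
      Imm = Info.MaxImm;
    RemainderVL = OffsetVL - Imm * Info.Scale;
  }

  int main() {
    MemOpScale LD4 = {4, -8, 7};
    int64_t Imm, Rem;
    // 28 VLs: Imm=7 (7*4 = #28, mul vl), Rem=0 -> folds directly.
    splitFrameOffset(28, LD4, Imm, Rem);
    printf("28 VLs: imm=%lld rem=%lld\n", (long long)Imm, (long long)Rem);
    // 32 VLs: Imm=7, Rem=4 -> the remainder forces extra address
    // arithmetic, matching the x8 computation shown in the output above.
    splitFrameOffset(32, LD4, Imm, Rem);
    printf("32 VLs: imm=%lld rem=%lld\n", (long long)Imm, (long long)Rem);
    return 0;
  }

The second call is the situation david-arm describes: a non-zero remainder
(the non-zero StackOffset) is exactly what exercises the Scale handling
added to `getMemOpInfo` in this patch.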


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119338/new/

https://reviews.llvm.org/D119338


