[PATCH] D119338: [AArch64][SVE] Add structured load/store opcodes to getMemOpInfo
David Sherwood via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Feb 11 01:41:41 PST 2022
david-arm added inline comments.
================
Comment at: llvm/test/CodeGen/AArch64/ldN-reg-imm-alloca.ll:317
+ %alloca1 = alloca <vscale x 2 x double>, i32 13
+ %alloca2 = alloca <vscale x 4 x i32>, i32 2
+ %base = getelementptr <vscale x 2 x double>, <vscale x 2 x double>* %alloca1, i64 9, i64 0
----------------
sdesmalen wrote:
> david-arm wrote:
> > sdesmalen wrote:
> > > alloca2 is unused? (also true for other cases)
> > This is something I asked @kmclaughlin to do because it's the only way to expose some of the code changes in this patch; all the tests ending in _valid_imm do it for that reason. If you look at `isAArch64FrameOffsetLegal`, we return a StackOffset, which is zero for every test in this file except ones like this. Having a non-zero StackOffset helped to ensure we were calculating the remainder/offset correctly using the Scale property set in `getMemOpInfo`. We can remove the test, but then I'm worried we're not fully testing the changes, that's all.
> >
> > For example, in `ld3b_f32_valid_imm` you'll notice the addvl just before the ld3b, which happens precisely because the StackOffset is non-zero.
> I assumed that was what the `gep` was for. Maybe it's because of how this is written. If you write:
>
> %alloca1 = alloca <vscale x 64 x double>, align 4
> %alloca1.bc = bitcast <vscale x 64 x double>* %alloca1 to <vscale x 2 x double>*
> %base = getelementptr <vscale x 2 x double>, <vscale x 2 x double>* %alloca1.bc, i64 28, i64 0
> %ld4 = call <vscale x 8 x double> @llvm.aarch64.sve.ld4.nxv8f64(<vscale x 2 x i1> %pg, double* %base)
>
> Then that results in:
>
> ld4d { z0.d, z1.d, z2.d, z3.d }, p0/z, [sp, #28, mul vl]
>
> Whereas
>
> %alloca1 = alloca <vscale x 64 x double>, align 4
> %alloca1.bc = bitcast <vscale x 64 x double>* %alloca1 to <vscale x 2 x double>*
> %base = getelementptr <vscale x 2 x double>, <vscale x 2 x double>* %alloca1.bc, i64 32, i64 0
> %ld4 = call <vscale x 8 x double> @llvm.aarch64.sve.ld4.nxv8f64(<vscale x 2 x i1> %pg, double* %base)
>
> Results in:
>
> <x8 = calculations for sp + 32 * sizeof(VL)>
> ld4d { z0.d, z1.d, z2.d, z3.d }, p0/z, [x8]
Sure, I'd be happy with that if it works and @kmclaughlin can confirm it still leads to the non-zero StackOffset. If we can avoid the second alloca, all the better! (A consolidated sketch of the suggestion is below.)
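For reference, here is the single-alloca pattern from the suggestion above as a self-contained sketch (the function name is hypothetical, and the intrinsic declaration simply mirrors the calls quoted in this thread). The gep index of 32 puts the offset outside the ld4d reg+imm range (multiples of 4 in [-32, 28], in units of VL), so the non-zero part of the StackOffset has to be materialised into a register:

  ; Hypothetical consolidation of the example quoted above.
  declare <vscale x 8 x double> @llvm.aarch64.sve.ld4.nxv8f64(<vscale x 2 x i1>, double*)

  define <vscale x 8 x double> @ld4d_out_of_range_imm(<vscale x 2 x i1> %pg) {
    ; One alloca large enough that we can gep past the ld4d immediate range.
    %alloca1 = alloca <vscale x 64 x double>, align 4
    %alloca1.bc = bitcast <vscale x 64 x double>* %alloca1 to <vscale x 2 x double>*
    ; Index 32 is the first multiple of 4 beyond the largest encodable
    ; immediate (#28, mul vl), so the address must be computed separately.
    %base = getelementptr <vscale x 2 x double>, <vscale x 2 x double>* %alloca1.bc, i64 32, i64 0
    %ld4 = call <vscale x 8 x double> @llvm.aarch64.sve.ld4.nxv8f64(<vscale x 2 x i1> %pg, double* %base)
    ret <vscale x 8 x double> %ld4
  }

Run through llc with -mtriple=aarch64 -mattr=+sve, the expectation per the thread is a register-based ld4d rather than the [sp, #imm, mul vl] form; dropping the gep index to 28 should flip it back to the immediate form.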
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D119338/new/
https://reviews.llvm.org/D119338