[PATCH] D110933: [RISCV] Add a test showing incorrect RVV stack alignment

Mon Oct 4 10:04:45 PDT 2021

frasercrmck added a comment.

In D110933#3039424 <https://reviews.llvm.org/D110933#3039424>, @frasercrmck wrote:

> As @HsiangKai says, since VLEN is at least 64 we know any combination of vector objects is always aligned to at least 8 (I suspect this may add yet another issue when supporting VLEN=32 @craig.topper, @rogfer01). We could possibly track vector object alignment separately and realign the stack pointer if there's anything >= 8. We might need the base pointer in that scenario?

Ah yes I now see why this isn't good enough. We can calculate the base sp alignment easy enough. But if we needed to calculate offsets for multiple aligned objects, e.g.:

  | <vscale x 1 x i64> align 16 | <- SP + alignTo(VLENB, 32) + alignTo(VLENB, 16)
  |-----------------------------|
  | <vscale x 1 x i64> align 32 | <- SP + alignTo(VLENB, 32)
  |-----------------------------|
  | <vscale x 1 x i64>          | <- SP

If we only know that VLENB >= 8 then we need to dynamically "align" each offset by an unknown amount. This is what you meant by option 2, right @StephenFan?

In D110933#3038463 <https://reviews.llvm.org/D110933#3038463>, @StephenFan wrote:

> 2. Change the RVV frame layout to what D93750 <https://reviews.llvm.org/D93750> did. Instead of calculating the total RVVStackSize and allocate them all at once, D93750 <https://reviews.llvm.org/D93750> will allocate the RVV stack objects one by one. Thus we can realign for every RVV stack object separately when it was allocating.

Unless we do something more blunt like scale VLENB up by a known good multiplier. But that would quickly start to generate huge offsets.

  | <vscale x 1 x i64> align 16 | <- SP + VLENB*6
  |-----------------------------|
  | <vscale x 1 x i64> align 32 | <- SP + VLENB*4
  |-----------------------------|
  | <vscale x 1 x i64>          | <- SP

I suppose we could dynamically capture `alignTo(VLENB, MaxRVVAlign)` in a register and use that in offset computations? Since that's a static amount, if we know that the maximum alignment is at most 8 then nothing needs to change. It would still overalign but less so.

  | <vscale x 1 x i64> align 16 | <- SP + 2*alignTo(VLENB, 32)
  |-----------------------------|
  | <vscale x 1 x i64> align 32 | <- SP + alignTo(VLENB, 32)
  |-----------------------------|
  | <vscale x 1 x i64>          | <- SP

I think we'd have to materialize that amount each time, though. We can't keep it live through the whole function.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110933/new/

https://reviews.llvm.org/D110933