[PATCH] D155618: [RISCV] Reduce alignment of vector constant pool entries

Fri Jul 21 15:32:25 PDT 2023

reames abandoned this revision.
reames added a comment.

Abandoning as the benefit here appears to be marginal at best due to a misunderstanding pointed out by @efriedma's comment.  (My inline response expands a bit on his point.)

In D155618#4513630 <https://reviews.llvm.org/D155618#4513630>, @kito-cheng wrote:

> That remind me this patch: D112531 <https://reviews.llvm.org/D112531>, and back to psABI part, yes, we don't have too much word on fixed length vector, one point I am hesitate here is P-extension (but it never ratified and no clear timeline.) use GPR to hold fixed-length vector, then that will require align to XLEN.

P can be safely ignored at this point; we can worry about it when we add toolchain support.  If ever.

In D155618#4513655 <https://reviews.llvm.org/D155618#4513655>, @craig.topper wrote:

>> Note: I'd originally tried to do something here which was more target independent, but I found that a) reducing alignment caused massive test diffs, and b) exposed what appeared to be a number of missing folds on x86. Thus the target specific hook approach taken here.
>
> SSE arithmetic ops can only fold aligned loads. Is that what you were seeing?

Yep, that looks like what I was seeing.

================
Comment at: llvm/test/CodeGen/RISCV/rvv/fixed-vectors-constant-pool-alignment.ll:5
 ; CHECK: .section	.rodata.cst16,"aM", at progbits,16
-; CHECK: .p2align	4, 0x0
+; CHECK-NOT: .p2align
 ; CHECK: .byte
----------------
efriedma wrote:
> Messing with the alignment of constants in .rodata.cst16 is going to have nearly zero effect; the data is getting emitted in 16-byte chunks, so there's no padding anyway.  (Not sure if this is what you meant to test.)
I went and dug through this carefully.

As you point out, the data fragments do end up being concatenated together into the .rodata section, and the alignment of the entries within the section doesn't end up mattering much.  We group 16b chunks into .rodata.cst16, and 32b chunks into .rodata.cst16.  These are then combined to form the final .rodata section, and thus the internal alignment doesn't create any additional padding within that section.  This change does end up reducing the alignment requirement of the .rodata section as a whole (from 32 to 8 in this example), but that's a pretty marginal gain.  

When I wrote the patch, I was thinking the data fragments were in the .text section between the functions.  This is not true.  

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155618/new/

https://reviews.llvm.org/D155618