[llvm] [NVPTX] Add NVPTXIncreaseAligmentPass to improve vectorization (PR #144958)

Tue Jun 24 15:51:16 PDT 2025

================
@@ -0,0 +1,131 @@
+//===-- NVPTXIncreaseAlignment.cpp - Increase alignment for local arrays --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// A simple pass that looks at local memory arrays that are statically
+// sized and sets an appropriate alignment for them. This enables vectorization
+// of loads/stores to these arrays if not explicitly specified by the client.
+//
+// TODO: Ideally we should do a bin-packing of local arrays to maximize
+// alignments while minimizing holes.
+//
+//===----------------------------------------------------------------------===//
+
+#include "NVPTX.h"
+#include "llvm/IR/DataLayout.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Pass.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/MathExtras.h"
+
+using namespace llvm;
+
+static cl::opt<bool>
+    MaxLocalArrayAlignment("nvptx-use-max-local-array-alignment",
+                           cl::init(false), cl::Hidden,
+                           cl::desc("Use maximum alignment for local memory"));
+
+static constexpr Align MaxPTXArrayAlignment = Align::Constant<16>();
+
+/// Get the maximum useful alignment for an array. This is more likely to
+/// produce holes in the local memory.
+///
+/// Choose an alignment large enough that the entire array could be loaded with
+/// a single vector load (if possible). Cap the alignment at
+/// MaxPTXArrayAlignment.
+static Align getAggressiveArrayAlignment(const unsigned ArraySize) {
+  return std::min(MaxPTXArrayAlignment, Align(PowerOf2Ceil(ArraySize)));
+}
+
+/// Get the alignment of arrays that reduces the chances of leaving holes when
+/// arrays are allocated within a contiguous memory buffer (like shared memory
+/// and stack). Holes are still possible before and after the array allocation.
+///
+/// Choose the largest alignment such that the array size is a multiple of the
+/// alignment. If all elements of the buffer are allocated in order of
+/// alignment (higher to lower) no holes will be left.
+static Align getConservativeArrayAlignment(const unsigned ArraySize) {
+  return commonAlignment(MaxPTXArrayAlignment, ArraySize);
+}
+
+/// Find a better alignment for local arrays
+static bool updateAllocaAlignment(const DataLayout &DL, AllocaInst *Alloca) {
+  // Looking for statically sized local arrays
+  if (!Alloca->isStaticAlloca())
+    return false;
+
+  // For now, we only support array allocas
----------------
Artem-B wrote:

Aggregates can contain arrays, so your reasoning about dynamic indexing applies there, too.

Also, dynamic indexing (and SROA failures) indirectly applies to aggregates as well. E.g `x = condition ? f3.x : f3.y`.
Another source of problems is when code takes an address of the local variable and passes it around. Being able to access such a pointer with high enough alignment is useful, IMO. Think of capturing lambdas. They end up in all sorts of weird use cases and often capture things without the user being aware of the specific details.

Would applying the alignment to aggregates add a lot more complexity? Even if you are correct that most of the cases where it may end up being useful would be for arrays, I think there's benefit applying it uniformly to all aggregates. If anything, it avoids having to explain why we've singled out only arrays for the benefit.

https://github.com/llvm/llvm-project/pull/144958