[PATCH] D145163: Add support for vectorization of interleaved memory accesses for scalable VF

Fri Mar 10 07:35:37 PST 2023

reames added inline comments.

================
Comment at: llvm/include/llvm/IR/IRBuilder.h:771

+  /// Create a masked interleaved load using a masked load and deinterliving
+  /// intrinsics.
----------------
This is the wrong interface.  The IRBuilder interface should provide a way to create the interleave and deinterleave intrinsic calls.  That interface should generate shuffles for fixed vectors.  Then the calling logic in the vectorizer should worry about emitting the load/store.  (That's the existing structure in fact.)
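
To make the fixed-vector half of that concrete, here is a rough stand-alone sketch of the shuffle masks involved. It is only a model in plain C++, not the IRBuilder API: `interleaveMask`, `deinterleaveMask`, and `applyShuffle` are hypothetical names made up for this illustration. For scalable vectors the element count isn't a compile-time constant, so masks like these can't be written out, which is why that path needs the intrinsics.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model, not LLVM code: mask that interleaves Factor vectors of
// VF elements each, assuming the inputs are concatenated into one wide vector.
// For VF = 4, Factor = 2 this is <0, 4, 1, 5, 2, 6, 3, 7>.
std::vector<int> interleaveMask(unsigned VF, unsigned Factor) {
  assert(VF > 0 && Factor > 0 && "model covers fixed-width vectors only");
  std::vector<int> Mask;
  Mask.reserve(static_cast<std::size_t>(VF) * Factor);
  for (unsigned I = 0; I < VF; ++I)
    for (unsigned J = 0; J < Factor; ++J)
      Mask.push_back(static_cast<int>(J * VF + I));
  return Mask;
}

// Hypothetical model: mask that pulls member Index out of every group of
// Factor elements in a wide interleaved vector.
// For VF = 4, Factor = 2, Index = 1 this is <1, 3, 5, 7>.
std::vector<int> deinterleaveMask(unsigned VF, unsigned Factor,
                                  unsigned Index) {
  assert(Index < Factor && "member index must be inside the group");
  std::vector<int> Mask;
  Mask.reserve(VF);
  for (unsigned I = 0; I < VF; ++I)
    Mask.push_back(static_cast<int>(Index + I * Factor));
  return Mask;
}

// Models a single-source shufflevector: Result[i] = Src[Mask[i]].
std::vector<int> applyShuffle(const std::vector<int> &Src,
                              const std::vector<int> &Mask) {
  std::vector<int> Result;
  Result.reserve(Mask.size());
  for (int Idx : Mask) {
    assert(Idx >= 0 && static_cast<std::size_t>(Idx) < Src.size());
    Result.push_back(Src[static_cast<std::size_t>(Idx)]);
  }
  return Result;
}
```

With that split, the caller only decides which wide load or store to emit; the interleave/deinterleave helper picks shuffles or intrinsics based on the vector type.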

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2987

 Value *InnerLoopVectorizer::createBitOrPointerCast(Value *V, VectorType *DstVTy,
                                                    const DataLayout &DL) {
----------------
The changes to this function are NFC for fixed length vectors, and a generally useful scalable cleanup.  Please separate and land this change without the need for further review.

This applies *only* to the changes in this function so as to shrink the diff for future review.

================
Comment at: llvm/test/Transforms/LoopVectorize/sve-interleaved-accesses.ll:1
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -mtriple=aarch64-none-linux-gnu -S -passes=loop-vectorize,instcombine -force-vector-width=4 -force-vector-interleave=1 -enable-interleaved-mem-accesses=true -enable-sve-interleaved-mem-accesses=true -mattr=+sve -scalable-vectorization=on -runtime-memory-check-threshold=24 < %s | FileCheck %s
----------------
This should be in the AArch64 sub-tree, and probably precommitted.  Depending on your confidence in the AArch64 code, you may want to separate that into its own review.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145163/new/

https://reviews.llvm.org/D145163