[PATCH] D119121: [test-suite] Add unit tests for vectorizer memory runtime checks.

Wed Feb 16 15:07:28 PST 2022

Meinersbur added a comment.

[suggestion] I am still not convinced `__attribute__((noinline))` is useful and give the wrong impression that its semantics solves the problem. What should not be inlined is the `VectorFn` lambda. Is it possible to attach a noinline attribute to it? Or maybe only call it indirectly through a `__attribute__((optnone))` function. As a suggestion, I will leave it up to you whether you think it is worth it.

In D119121#3312322 <https://reviews.llvm.org/D119121#3312322>, @fhahn wrote:

>> [not a change request] If we vary interleaving as well, we would not be restricted to the target architecture's vector width.
>
> I was wondering whether we should rely on the vectorizer choosing vectorization factors and interleave counts automatically or if we should force them instead. My reasoning for letting the compiler chose is that we get different combinations for different targets, possibly increasing coverage overall. We could chose the VF automatically and cover a range of user-provided interleave counts ?

[not a change request] My reasoning was to force range of VFs (or interleave count since it does not require a specific instruction set), so the test covers all possible VFs without requiring access to platforms for each of them to test. The native choice by LoopVectorize can still be tested in addition to that, in case the overlap test makes a difference.

In D119121#3325503 <https://reviews.llvm.org/D119121#3325503>, @dmgreen wrote:

> I tried copying this into a godbolt link, and is suggest you may need to `#include <functional>` under some systems. There are also some warnings that might be worth cleaning up: https://godbolt.org/z/sMTjfKbGo

gcc indeed does not compile it: https://godbolt.org/z/3acnP796E It would be nice to keep the test-suite buildable with gcc as a comparison/reference.

================
Comment at: SingleSource/UnitTests/Vectorizer/runtime-checks.cpp:124
+
+  for (int i = -200; i <= 200; i++)
+    CheckWithOffsetSecond(i);
----------------

================
Comment at: SingleSource/UnitTests/Vectorizer/runtime-checks.cpp:60-62
+    std::unique_ptr<Ty[]> Input1(new Ty[NumArrayElements]);
+    std::unique_ptr<Ty[]> Reference(new Ty[NumArrayElements]);
+    std::unique_ptr<Ty[]> ToCheck(new Ty[NumArrayElements]);
----------------
fhahn wrote:
> Meinersbur wrote:
> > It might be more predictable if reusing the same allocation. Otherwise each allocation may have a different alignment and effectively some offsets (relative to virtual address space, eg. page boundaries) are skipped. In case the vectorizer adds a prologue to ensure vector memory accesses are aligned.
> > 
> Sounds good, I moved it out to `checkOverlappingMemoryOneRuntimeCheck`. Was that what you had in mind?
yes, thank you.

================
Comment at: SingleSource/UnitTests/Vectorizer/runtime-checks.cpp:71
+    // Run scalar function to generate reference output.
+    auto *ReferenceStart = &Reference[0] + NumArrayElements / 2;
+    ScalarFn(ReferenceStart + Offset, ReferenceStart, N);
----------------
fhahn wrote:
> Meinersbur wrote:
> > [style] No almost-always-auto
> Fixed, thanks!
Can make the `&Reference[NumArrayElements / 2]` change too? Its the exact same as `&Reference[0] + NumArrayElements / 2`. `&Reference[0]` is only a more convoluted way to say  `Reference.get()`.

Repository:
  rT test-suite

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119121/new/

https://reviews.llvm.org/D119121