[PATCH] D149842: Scalarizer: limit scalarization for small element types

Thu Jun 1 02:41:07 PDT 2023

nhaehnle added a comment.

Thank you for taking a look. I'm going to address the low-level issues you pointed out immediately.

================
Comment at: llvm/lib/Transforms/Scalar/Scalarizer.cpp:66

+static cl::opt<unsigned> ClScalarizeMinBits(
+    "scalarize-min-bits", cl::init(0), cl::Hidden,
----------------
bjope wrote:
> I think I misunderstood this at first.
> 
> My interpretation was that if setting this to 16 it would not scalarize vectors with element sizes up to 16 bits.
> So it wouldn't scalarize `<16 x i8>` or `<4 x i16>` while it would scalarize `<2 x i24>` and `<2 x i32>`.
> 
> But this size is not mapping to the element sizes, right?
> We could not get some kind of vector split/re-partition from `<16 x i8>` to `<2 x i8>`.  So it is not really scalarizing as the value still will be a vector.
> 
> Not sure exactly how to rephrase it to make that clearer (considering that I misunderstood this to be an element size).
> 
> Maybe I got fooled by the slogan for this patch "limit scalarization for small element types". I actually thought that I would see something that prevented scalarization from happening when the //element size// was smaller than a threshold. But what the patch actually seem to do is to prevent scalarization to happen //for large vector factors// (it just splits it up into using smaller vectors instead of scalarizing).
> 
> So everywhere in this pass when it says "scalarize" I guess one should read it as "split" (or "resize" or something similar). For example code comments saying "Perform actual scalarization" could be followed by code that emit vector operations.
Yeah, this is a fair point and naming is difficult. This is related to the fact that this pass is really meant for GPUs, where we use vector types in a way that's a bit different from CPUs.

On CPUs, the intention of vector types is that they ultimately get mapped to dedicated vector registers.

On GPUs (at least all modern GPUs that I'm aware of), there are no CPU-style vector registers. Instead, the intention of vector types is that they get mapped to contiguous sequences of "scalar" registers (itself a somewhat problematic term because of SIMT, but let's go with that for now).

What this change aims to do with min-bits=32 is essentially scalarization in that sense: vector types are broken up until they are either scalar types or they are vectors that fit in a single "scalar" register.

Does that make sense?

================
Comment at: llvm/lib/Transforms/Scalar/Scalarizer.cpp:204
+  // Return the alignment of fragment I.
+  Align getFragmentAlign(unsigned I) {
+    return commonAlignment(VecAlign, I * SplitSize);
----------------
bjope wrote:
> nit: You've changed `I` to `Frag` in other places where we no longer refer to a vector index. I think it doesn't hurt to do it here as well for consistency.
Sure, makes sense.

================
Comment at: llvm/lib/Transforms/Scalar/Scalarizer.cpp:390

 // Return component I, creating a new Value for it if necessary.
+Value *Scatterer::operator[](unsigned Frag) {
----------------
bjope wrote:
> So this is now "component Frag"?
> 
> Although generally it's a little bit messy with terminology. Is a "component" also a "fragment"?
Yeah, I'm changing this comment.

================
Comment at: llvm/lib/Transforms/Scalar/Scalarizer.cpp:424
   } else {
     // Search through a chain of InsertElementInsts looking for element I.
     // Record other elements in the cache.  The new V is still suitable
----------------
bjope wrote:
> I think this should say "looking for element Frag".
> 
> 
Yes.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D149842/new/

https://reviews.llvm.org/D149842