[PATCH] D149842: Scalarizer: limit scalarization for small element types

Thu Jun 1 03:04:32 PDT 2023

bjope added inline comments.

================
Comment at: llvm/lib/Transforms/Scalar/Scalarizer.cpp:66

+static cl::opt<unsigned> ClScalarizeMinBits(
+    "scalarize-min-bits", cl::init(0), cl::Hidden,
----------------
nhaehnle wrote:
> bjope wrote:
> > I think I misunderstood this at first.
> > 
> > My interpretation was that if setting this to 16 it would not scalarize vectors with element sizes up to 16 bits.
> > So it wouldn't scalarize `<16 x i8>` or `<4 x i16>` while it would scalarize `<2 x i24>` and `<2 x i32>`.
> > 
> > But this size is not mapping to the element sizes, right?
> > We could now get some kind of vector split/re-partition from `<16 x i8>` to `<2 x i8>`. So it is not really scalarizing, as the value will still be a vector.
> > 
> > Not sure exactly how to rephrase it to make that clearer (considering that I misunderstood this to be an element size).
> > 
> > Maybe I got fooled by the slogan for this patch, "limit scalarization for small element types". I actually thought that I would see something that prevented scalarization from happening when the //element size// was smaller than a threshold. But what the patch actually seems to do is prevent scalarization from happening //for large vector factors// (it just splits the vector into smaller vectors instead of scalarizing).
> > 
> > So everywhere in this pass where it says "scalarize", I guess one should read it as "split" (or "resize" or something similar). For example, code comments saying "Perform actual scalarization" could be followed by code that emits vector operations.
> Yeah, this is a fair point and naming is difficult. This is related to the fact that this pass is really meant for GPUs, where we use vector types in a way that's a bit different from CPUs.
> 
> On CPUs, the intention of vector types is that they ultimately get mapped to dedicated vector registers.
> 
> On GPUs (at least all modern GPUs that I'm aware of), there are no CPU-style vector registers. Instead, the intention of vector types is that they get mapped to contiguous sequences of "scalar" registers (itself a somewhat problematic term because of SIMT, but let's go with that for now).
> 
> What this change aims to do with min-bits=32 is essentially scalarization in that sense: vector types are broken up until they are either scalar types or they are vectors that fit in a single "scalar" register.
> 
> Does that make sense?
Ok, I see. And replacing every "scalarize" with "reduce vector factor" would be a rather large change. Maybe not worth it as long as it is obvious that the pass splits vectors by reducing the vector factor, and that it sometimes stops "before reaching vector factor 1" (reaching 1 would be the same as having fully scalarized the vector).

We use the Scalarizer downstream. We run it at the beginning of llc to scalarize most operations while leaving, for example, wide loads/stores intact. Otherwise we would need to deal with legalizing lots of vector operations at ISel instead (although I think it would also impact passes run before ISel in the backend so it's a bit). So our goal with the Scalarizer is just to get rid of (most) vector operations at the beginning of the backend.
We could perhaps benefit from this new functionality in the future, for example by leaving `<2 x iN>` around for certain operations when that would match the target instruction set.
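
To check my understanding of the "reduce vector factor" reading, here is a rough sketch of the splitting arithmetic as I interpret it from this thread. It is not the code in the patch: `SplitPlan` and `planSplit` are hypothetical names, and the rule it encodes (pack `MinBits / ElemBits` elements per piece when at least two fit, otherwise fully scalarize, and leave the vector alone if it already fits in one piece) is my assumption.

```cpp
#include <cassert>

// Hypothetical summary of how one fixed vector type would be broken up.
struct SplitPlan {
  unsigned ElemsPerPiece;  // 1 means every piece is a plain scalar
  unsigned NumPieces;      // full pieces plus at most one smaller remainder
  unsigned RemainderElems; // size of the trailing smaller piece, 0 if none
};

// Assumed rule, not taken from the patch: a piece holds as many elements as
// fit in MinBits; if fewer than two fit, fall back to full scalarization.
inline SplitPlan planSplit(unsigned NumElems, unsigned ElemBits,
                           unsigned MinBits) {
  assert(NumElems > 0 && ElemBits > 0 && "expected a non-empty fixed vector");

  unsigned PerPiece = MinBits / ElemBits;
  if (PerPiece < 2)
    PerPiece = 1;        // element is too wide to pack: scalarize fully
  if (PerPiece > NumElems)
    PerPiece = NumElems; // whole vector already fits in one piece

  SplitPlan Plan;
  Plan.ElemsPerPiece = PerPiece;
  Plan.NumPieces = (NumElems + PerPiece - 1) / PerPiece; // round up
  Plan.RemainderElems = NumElems % PerPiece;
  return Plan;
}
```

Under that assumption, `planSplit(16, 8, 16)` describes `<16 x i8>` becoming eight `<2 x i8>` pieces, while `planSplit(2, 32, 16)` describes `<2 x i32>` being fully scalarized, which matches the behaviour described earlier in this thread.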

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D149842/new/

https://reviews.llvm.org/D149842