[PATCH] D42724: [X86] Don't make 512-bit vectors legal when preferred vector width is 256 bits and 512 bits aren't required

Craig Topper via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 22 12:15:13 PDT 2023


craig.topper added inline comments.


================
Comment at: llvm/trunk/lib/Target/X86/X86TargetMachine.cpp:275
+  // Extract required-vector-width attribute.
+  unsigned RequiredVectorWidth = UINT32_MAX;
+  if (F.hasFnAttribute("required-vector-width")) {
----------------
danilaml wrote:
> craig.topper wrote:
> > danilaml wrote:
> > > craig.topper wrote:
> > > > danilaml wrote:
> > > > > @craig.topper Sorry for commenting on such an old review, but I was investigating some codegen differences for very similar IR and came across this code (the attribute was later renamed to `min-legal-vector-width`, but otherwise it's the same on main). Is RequiredVectorWidth intended to be initialized to `UINT32_MAX`? What is the rationale? It forces the maximum vector width if the function is missing the attribute for some reason, ignoring the `prefer-*` attributes. To me it seems that the conservative approach would be to set it to `0` and increase it according to the attribute, since zero-length vectors are always legal/"required".
> > > > It is intentionally set to UINT32_MAX if the attribute is missing. If the IR contains any 512-bit inline assembly, function arguments, returns, or X86-specific vector intrinsics, the backend will crash or violate the ABI. The presence of the attribute indicates that those cases have been checked and nothing requires 512-bit vectors.
> > > Not sure I understand. Without the attribute, the backend will ignore prefer-vector-width and generate avx512 asm regardless. How is setting the default to 0 worse?
> > If there are 512-bit x86 intrinsics in the IR, it will crash the compiler. I assume compiler crashes are worse than suboptimal code.
> > 
> > Prefer vector width is still checked in many other places to prevent aggressive use of 512 bit vectors. For example, `X86TTIImpl::getRegisterBitWidth` will still tell the vectorizer that the register width is 256 bits.
> > 
> > Are you finding the attribute missing in code compiled with clang or another frontend?
> Doesn't seem to crash (although I haven't tried inline assembly): https://llvm.godbolt.org/z/h6jEYrnaa
> 
> In my case it's another frontend (a JIT compiler). I noticed that the compiler would generate suboptimal code using avx512, even though the target cpu has prefer256 tuning, and found that the issue is the missing attribute (I also noticed that the target knows that expanding a certain intrinsic using avx512 is more costly than using avx2, but still uses the highest ISA available; that's another issue entirely). Now I'm wondering what to set it to.
> Also, stuff like SLPVectorizer doesn't really care about `getRegisterBitWidth`, since it usually just checks whether some operation/type is legal and looks at the cost returned by the target hooks.
It crashes if prefer-vector-width<=256 is also specified, or if a CPU that implies a preferred vector width <=256 is used: https://llvm.godbolt.org/z/G5458499K
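
To answer "what to set it to": here is a minimal sketch of what a JIT frontend could do, assuming it has verified that nothing in the function body (intrinsics, inline asm, vector arguments/returns) needs vectors wider than the value it passes. The helper name is hypothetical; the attribute strings are the ones discussed above.

  #include "llvm/ADT/StringExtras.h"
  #include "llvm/IR/Function.h"

  // Hypothetical helper for a JIT frontend. WidestUsed is the widest vector
  // width (in bits) the function body actually requires.
  static void markVectorWidths(llvm::Function &F, unsigned WidestUsed) {
    // Promise the backend that nothing needs wider vectors, so it is free
    // to honor the preference below instead of assuming UINT32_MAX.
    F.addFnAttr("min-legal-vector-width", llvm::utostr(WidestUsed));
    // Tuning hint only: keep codegen at 256-bit vectors on avx512 targets.
    F.addFnAttr("prefer-vector-width", "256");
  }

With both attributes present, the backend stops treating 512-bit vectors as legal for that function, while a function that genuinely uses 512-bit intrinsics can pass 512 as WidestUsed and keep working.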


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D42724/new/

https://reviews.llvm.org/D42724


