[PATCH] D122594: [ArgPromotion] Update min-legal-vector-width when do promotion

Mon Apr 4 10:14:50 PDT 2022

efriedma added a comment.

In D122594#3424141 <https://reviews.llvm.org/D122594#3424141>, @pengfei wrote:

> In D122594#3424091 <https://reviews.llvm.org/D122594#3424091>, @efriedma wrote:
>
>>> So my point is there are only ABI and ABI undefined behaviors, i.e, non ABI implementations. "unstable ABI" is not ABI.
>>
>> I'm translating this to "The formal ABI has a maximum vector width, and vectors in arguments are restricted to that width. The compiler should report a fatal error if it sees an vector wider than the maximum width."  Am I understanding correctly?  (I can't come up with any other interpretation of your use of "undefined" here.)
>
> No. My understanding on "undefined" is it is allowed (for the sake of optimization) but not guaranteed (sacrificing ABI compatibility).
> In this example https://godbolt.org/z/P59Gjjohv, I take the optimization as ABI undefined behavior, because it creates 512bit vector under AVX2 target. But it mostly doesn't matter because both caller and callee are in the same module, so it is allowed.
> How about they are in different modules, or one of them is built into library? I believe we don't have real cases for it, because it's mostly error out by front end and middle end won't do such optimizations. But this is the reason why I don't consider them as ABI.
>
> Anyway, I think our controversy on "unstable ABI" and "ABI undefined" won't affect what's the patch trying to fix, if we can make agreement on `abi512_minlegal256` is a bug where arguments should be passed by ZMM register.

If a vector wider than the ABI width is not rejected, the behavior needs to be defined.  Whatever the stability story, at least for any given caller and callee, they have to agree.  (Undefined behavior doesn't make sense here; it should be either well-defined, or an error.)

Consider the following:

  target triple = "x86_64"

  define i32 @caller(<32 x i16>* %a) #0 {
    call void @callee1(<32 x i16>* %a)
    call void @callee2(<32 x i16> zeroinitializer)
    ret i32 0
  }

  define internal void @callee1(<32 x i16>*) #0 {
    %2 = load <32 x i16>, <32 x i16>* %0, align 4
    ret void
  }

  define void @callee2(<32 x i16>) #0 {
    ret void
  }

  attributes #0 = { noinline "target-cpu"="skylake-avx512" "min-legal-vector-width"="256" }

Without this patch, we run -argpromotion, everything is fine.  With this patch, we mark up caller and callee1, but not callee2, so we've broken the call to callee2.

Given that, I think there are only two possibilities here:

1. This patch is wrong: we can't increase min-vector-width like this patch is doing.
2. The IR is ill-formed in some way, and the verifier should emit an error.

Note that this is complicated by the fact that we support IR where different functions have different target attributes, so a function that doesn't support avx-512 can call one that does, or vice versa.  (This is usually used with runtime CPU detection.)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122594/new/

https://reviews.llvm.org/D122594