[PATCH] D103874: [IR] Rename the shufflevector's undef mask to poison

Juneyoung Lee via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jun 8 19:50:43 PDT 2021


aqjune added a comment.

In D103874#2806519 <https://reviews.llvm.org/D103874#2806519>, @efriedma wrote:

> I noted the cases where it looks like the undef->poison change might actually impact code using compiler intrinsic functions that have external specifications.  The relevant specifications say the elements in question are "undefined", without really specifying what that means.
>
> Currently, for the Intel intrinsics, we treat "undefined" as something more conservative than LLVM undef; see https://github.com/llvm/llvm-project/blob/d2012d965d60c3258b3a69d024491698f8aec386/clang/lib/CodeGen/CGBuiltin.cpp#L12483 .  Maybe we should make the cast intrinsics more conservative to match.  And maybe we should do the same for OpenCL.  Would need to do some backend work to make sure we don't regress the generated code, though.

Makes sense, the PR (https://llvm.org/PR32176) that is left at the comment says it should be something like `freeze poison` as well. (BTW, this means the current shufflevector lowering is already incorrect as well..)

Then, _mm256_castsi128_si256 should be lowered into something like this:

  %fr = freeze <2 x i64> poison
  shufflevector <2 x i64> %x, <2 x i64> %fr, <4 x i32> <i32 0, i32 1, i32 2, i32 3>

BTW, the Intel intrinsic guide for _mm256_castsi128_si256 ( https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm256_castsi128_si256&expand=628 ) says:

  This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.

We should teach the backend to understand the shufflevector+freeze form and lower it into an efficient assembly.

But, in practice, the frozen element won't be used in most of the cases; the middle-end's demanded elements analysis will trigger instcombine to almost always remove the freeze.
What do you think? If people agree, I'll create a separate patch that lowers this to the new freeze+shufflevector format (since it is already incorrect).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103874/new/

https://reviews.llvm.org/D103874



More information about the llvm-commits mailing list