[PATCH] D47401: [X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits where possible.

Simon Pilgrim via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 18 12:45:53 PDT 2018


RKSimon added a comment.

Are there any LLVM-side fast-isel tests for these?



================
Comment at: cfe/trunk/lib/Headers/avx512fintrin.h:9855
+  __v8di __t6 = (__v8di)_mm512_##op(__t4, __t5); \
+  return __t6[0];
 
----------------
craig.topper wrote:
> RKSimon wrote:
> > Would it be dumb to allow VLX-capable CPUs to use the 128/256-bit variants of VPMAXUQ etc.? Or is it better to focus on improving SimplifyDemandedElts to handle this (and the many other reduction cases that all tend to keep to the original vector width)?
> I'm not sure how to do that from clang. Should we be using a reduction intrinsic and doing custom lowering in the backend?
I believe we're still missing the clang reduction intrinsics.

We also have the problem of the 'fast math only' accumulator argument on the 'experimental' reductions, which we can't actually change because arm64 already relies on them in production code.

But we could maybe propose clang integer reduction intrinsics?

TBH I don't think any of that needs to be done for this patch; it's just an indication that x86 reductions aren't great yet.
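
For reference, the width-halving shape discussed above can be modelled in plain scalar C. This is only a sketch, not the code in avx512fintrin.h: the function name reduce_max_epi64_model is hypothetical, and the array stands in for the eight 64-bit lanes of a 512-bit vector. Each pass compares the upper half of the live lanes against the lower half, so the live width goes 512 -> 256 -> 128 -> one element, which is where narrower VLX forms of the max instruction could in principle be used.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a signed 64-bit max reduction over 8 lanes.
   It mirrors the halving tree only; it is not the header implementation. */
static int64_t reduce_max_epi64_model(const int64_t v[8])
{
    int64_t t[8];
    for (size_t i = 0; i < 8; ++i)
        t[i] = v[i];

    /* width = 4: 512-bit -> 256-bit worth of live lanes
       width = 2: 256-bit -> 128-bit
       width = 1: 128-bit -> single element */
    for (size_t width = 4; width >= 1; width /= 2) {
        for (size_t i = 0; i < width; ++i) {
            int64_t hi = t[i + width];   /* lane from the upper half */
            if (hi > t[i])
                t[i] = hi;               /* keep the larger value in the lower half */
        }
    }
    return t[0];                         /* analogous to `return __t6[0];` in the diff */
}
```

After the first pass only lanes 0..3 matter, and after the second only lanes 0..1, so nothing in the later steps depends on the full 512-bit width.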


Repository:
  rL LLVM

https://reviews.llvm.org/D47401




