[PATCH] D58282: [x86] scalarize extract element 0 of FP math

Fri Feb 15 06:45:47 PST 2019

spatel created this revision.
spatel added reviewers: RKSimon, craig.topper, andreadb.
Herald added subscribers: jdoerfert, hiraditya, mcrosier.
Herald added a project: LLVM.

This is another step toward ensuring that we produce optimal code for reductions, but there are other potential benefits, as seen in the test diffs:

1. Memory loads may be scalarized, resulting in more efficient code.
2. Memory stores may be scalarized, resulting in more efficient code.
3. Complex ops like fdiv/sqrt are scalarized, and the scalar instructions may be faster depending on the uarch.
4. Even simple ops like add/sub/mul/round, when scalarized to addss/subss/mulss/roundss, may run faster or cause less frequency throttling depending on the uarch.
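The transform rests on a simple identity: FP vector math is lane-wise, so element 0 of a vector op's result depends only on element 0 of each operand. The sketch below models that identity in portable C++ and is not the actual DAG combine in X86ISelLowering.cpp. The `V4` type and both helper functions are hypothetical names chosen for illustration, with a 4-lane `std::array<float, 4>` standing in for a `<4 x float>` value in an XMM register.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical stand-in for a <4 x float> vector held in an XMM register.
using V4 = std::array<float, 4>;

// Vector form: apply the op to every lane (like addps/mulps/divps),
// then keep only lane 0 (like "extractelement <4 x float> %r, i32 0").
// Lanes 1-3 are computed and then thrown away.
template <typename Op>
float extract0OfVectorOp(const V4& a, const V4& b, Op op) {
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = op(a[i], b[i]);
  return r[0];
}

// Scalarized form: extract lane 0 of each operand first, then perform a
// single scalar op (like addss/mulss/divss). Only the needed lane is computed.
template <typename Op>
float scalarOpOfExtract0(const V4& a, const V4& b, Op op) {
  return op(a[0], b[0]);
}
```

Because each lane is computed independently with the same IEEE operation, both forms return the same value for lane 0. The scalarized form simply skips the work on lanes 1-3, which is where the potential wins listed above come from.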

The TODO comment in the patch suggests one or more follow-ups for opcodes that can currently cause regressions.
The tests for "minimum" and "maximum" IR in extractelement-fp.ll are commented out because they currently crash independently of this patch. I'm not sure what that problem is yet.

https://reviews.llvm.org/D58282

Files:
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/test/CodeGen/X86/avx1-logical-load-folding.ll
  llvm/test/CodeGen/X86/avx512-hadd-hsub.ll
  llvm/test/CodeGen/X86/avx512-intrinsics-fast-isel.ll
  llvm/test/CodeGen/X86/exedeps-movq.ll
  llvm/test/CodeGen/X86/extractelement-fp.ll
  llvm/test/CodeGen/X86/ftrunc.ll
  llvm/test/CodeGen/X86/haddsub.ll
  llvm/test/CodeGen/X86/scalar-int-to-fp.ll
  llvm/test/CodeGen/X86/vec_extract.ll
  llvm/test/CodeGen/X86/vector-reduce-fadd-fast.ll
  llvm/test/CodeGen/X86/vector-reduce-fmul-fast.ll
