[PATCH] D89697: * [x86] Implement smarter instruction lowering for FP_TO_UINT from vXf32 to vXi32 for SSE2 and AVX2 by using the exact semantic of the CVTTPS2SI instruction.

Simon Pilgrim via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 19 07:13:02 PDT 2020


RKSimon added a reviewer: andrew.w.kaylor.
RKSimon added a comment.

For the costs: ideally you need to run the code snippet through llvm-mca (you can do this in godbolt) for various CPUs of that level (e.g. avx1 -> sandybridge/btver2/bdver2, avx2 -> znver2/haswell etc.) and use the worst-case throughput cost of those runs. For older targets (sse2...) we're more limited on testable CPU targets; I tend to just use slm's costs, as they tend to match weak pre-avx CPUs.



================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:21176
+    if (VT == MVT::v4i32 && SrcVT == MVT::v4f32 || 
+        VT == MVT::v8i32 && SrcVT == MVT::v8f32)
+    {
----------------
Add parentheses to avoid operator precedence warnings:

```
if ((VT == MVT::v4i32 && SrcVT == MVT::v4f32) || 
    (VT == MVT::v8i32 && SrcVT == MVT::v8f32))
```


================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:21201
+          DAG.getNode(ISD::AND, dl, VT, Big, IsOverflown));
+    }
+
----------------
Please clang-format this block.
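
For readers without the full diff: the quoted tail appears to merge two CVTTPS2SI results. Below is a minimal scalar sketch of that idea for a single lane, not the patch's actual code. It assumes the usual CVTTPS2SI behaviour of returning 0x80000000 for NaN or out-of-range inputs. `Big` and `IsOverflown` mirror the names in the quoted line; `cvtt_lane`, `fp_to_uint_lane` and `small` are hypothetical names used only for illustration.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical scalar model of one CVTTPS2SI lane: truncate toward zero and
// return the "integer indefinite" value 0x80000000 for NaN or out-of-range input.
static int32_t cvtt_lane(float x) {
  if (std::isnan(x) || x >= 2147483648.0f || x < -2147483648.0f)
    return INT32_MIN;
  return static_cast<int32_t>(x);
}

// Hypothetical scalar model of the fp-to-uint lowering for one lane.
static uint32_t fp_to_uint_lane(float x) {
  // Direct conversion: correct for inputs in [0, 2^31).
  uint32_t small = static_cast<uint32_t>(cvtt_lane(x));
  // Conversion of (x - 2^31): correct for inputs in [2^31, 2^32).
  uint32_t big = static_cast<uint32_t>(cvtt_lane(x - 2147483648.0f));
  // All-ones when the sign bit of `small` is set (models an arithmetic
  // shift right by 31); in that case `small` is exactly 0x80000000.
  uint32_t is_overflown = (small & 0x80000000u) ? 0xFFFFFFFFu : 0u;
  // Matches the shape of the quoted line: Small | (Big & IsOverflown).
  return small | (big & is_overflown);
}
```

For inputs below 2^31 the mask is zero and the direct conversion is returned unchanged; for inputs in [2^31, 2^32) the direct conversion contributes only the top bit and `big` supplies the remaining bits. Negative and NaN inputs are not meaningful here.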


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89697/new/

https://reviews.llvm.org/D89697
