[llvm] [X86][BF16] Improve vectorization of BF16 (PR #88486)

Wed May 1 19:05:53 PDT 2024

================
@@ -56517,17 +56501,40 @@ static SDValue combineFP16_TO_FP(SDNode *N, SelectionDAG &DAG,
 
 static SDValue combineFP_EXTEND(SDNode *N, SelectionDAG &DAG,
                                 const X86Subtarget &Subtarget) {
+  EVT VT = N->getValueType(0);
+  bool IsStrict = N->isStrictFPOpcode();
+  SDValue Src = N->getOperand(IsStrict ? 1 : 0);
+  EVT SrcVT = Src.getValueType();
+
+  SDLoc dl(N);
+  if (SrcVT.getScalarType() == MVT::bf16) {
+    if (!IsStrict && Src.getOpcode() == ISD::FP_ROUND &&
+        Src.getOperand(0).getValueType() == VT)
----------------
phoebewang wrote:

There was some related discussion in https://github.com/llvm/llvm-project/commit/3cf8535dbf0bf5fafa99ea1f300e2384a7254fba

`bfloat` is different from `half` in two ways:

- `bfloat` has fewer fraction bits (7, versus 10 for `half`), so by design precision is less of a concern for it than for other types (even `half`);
- `half` is an IEEE type, while `bfloat` is not, so we don't necessarily have to follow IEEE semantics for it.

https://github.com/llvm/llvm-project/pull/88486