[clang] [X86] Replace BF16 to F32 conversions with generic conversions (PR #169781)

Thu Nov 27 04:39:00 PST 2025

================
@@ -156,46 +156,43 @@ __bf16 test_mm_cvtness_sbh(float A) {
 
 __m128 test_mm_cvtpbh_ps(__m128bh A) {
   // CHECK-LABEL: test_mm_cvtpbh_ps
-  // CHECK: sext <4 x i16> %{{.*}} to <4 x i32>
-  // CHECK: call <4 x i32> @llvm.x86.sse2.pslli.d(<4 x i32> %{{.*}}, i32 %{{.*}})
+  // CHECK: fpext <8 x bfloat> %{{.*}} to <8 x float>
----------------
phoebewang wrote:

Oh, we can optimize it back to <4 x bfloat>. The difference still shows up at O0, but I don't think we care about performance there.
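
For reference, a minimal scalar sketch of why the removed `sext` + `pslli.d` sequence and the new `fpext` give the same result: a bfloat16 value is the upper 16 bits of the corresponding IEEE-754 binary32 value, so widening amounts to a 16-bit left shift of the raw bits. The helper name `bf16_bits_to_float` is hypothetical and only for illustration; it is not part of the patch or of any header.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): widen raw bfloat16 bits to float.
 * bfloat16 shares the sign bit and 8-bit exponent with binary32 and keeps
 * only the top 7 mantissa bits, so the conversion is exact. */
static float bf16_bits_to_float(uint16_t bits) {
    /* Place the 16 bfloat bits in the high half of a 32-bit word.
     * Sign- or zero-extending first makes no difference, because the
     * shift discards whatever was in the upper 16 bits. */
    uint32_t wide = (uint32_t)bits << 16;

    float out;
    memcpy(&out, &wide, sizeof out); /* bit cast without aliasing issues */
    return out;
}
```

For example, the bit pattern `0x3F80` widens to `0x3F800000`, which is `1.0f`. The old CHECK lines did this per lane over four elements; `fpext` expresses the same conversion generically.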

https://github.com/llvm/llvm-project/pull/169781