[llvm] [X86][SSE] Don't emit SSE2 load instructions in SSE1-only mode (PR #134547)

Sun Apr 6 09:48:12 PDT 2025

llvmbot wrote:

@llvm/pr-subscribers-backend-x86

Author: Stefan Schmidt (thrimbor)

<details>
<summary>Changes</summary>

This fixes a regression I traced back to https://github.com/llvm/llvm-project/commit/8b43c1be23119c1024bed0a8ce392bc73727e2e2 / https://github.com/llvm/llvm-project/pull/79000

The regression caused an SSE2 instruction, `movsd`, to be emitted as a replacement for an SSE instruction, `movaps`, even though the target may not support SSE2, such as when building with clang using `-march=pentium3`.

The test was produced by reducing an actual occurrence of the issue in production code. I'm not super familiar with tests for optimization passes, so it may be possible to improve this further, and I'll happily do so if advised.

The problematic optimization is part of the LLVM 19 and 20 releases. Is it possible to have this fix backported, and if so, what's the process for that?

---
Full diff: https://github.com/llvm/llvm-project/pull/134547.diff


2 Files Affected:

- (modified) llvm/lib/Target/X86/X86FixupVectorConstants.cpp (+5-3) 
- (modified) llvm/test/CodeGen/X86/sse1.ll (+13) 


``````````diff

diff --git a/llvm/lib/Target/X86/X86FixupVectorConstants.cpp b/llvm/lib/Target/X86/X86FixupVectorConstants.cpp
index 40024baf93fdb..324167b53f5b6 100644
--- a/llvm/lib/Target/X86/X86FixupVectorConstants.cpp
+++ b/llvm/lib/Target/X86/X86FixupVectorConstants.cpp
@@ -333,6 +333,7 @@ bool X86FixupVectorConstantsPass::processInstruction(MachineFunction &MF,
                                                      MachineInstr &MI) {
   unsigned Opc = MI.getOpcode();
   MachineConstantPool *CP = MI.getParent()->getParent()->getConstantPool();
+  bool HasSSE2 = ST->hasSSE2();
   bool HasSSE41 = ST->hasSSE41();
   bool HasAVX2 = ST->hasAVX2();
   bool HasDQI = ST->hasDQI();
@@ -396,9 +397,10 @@ bool X86FixupVectorConstantsPass::processInstruction(MachineFunction &MF,
   case X86::MOVUPDrm:
   case X86::MOVUPSrm:
     // TODO: SSE3 MOVDDUP Handling
-    return FixupConstant({{X86::MOVSSrm, 1, 32, rebuildZeroUpperCst},
-                          {X86::MOVSDrm, 1, 64, rebuildZeroUpperCst}},
-                         128, 1);
+    return FixupConstant(
+        {{X86::MOVSSrm, 1, 32, rebuildZeroUpperCst},
+         {HasSSE2 ? X86::MOVSDrm : 0, 1, 64, rebuildZeroUpperCst}},
+        128, 1);
   case X86::VMOVAPDrm:
   case X86::VMOVAPSrm:
   case X86::VMOVUPDrm:
diff --git a/llvm/test/CodeGen/X86/sse1.ll b/llvm/test/CodeGen/X86/sse1.ll
index 8ac86d11d89e6..b5758c3356c82 100644
--- a/llvm/test/CodeGen/X86/sse1.ll
+++ b/llvm/test/CodeGen/X86/sse1.ll
@@ -251,5 +251,18 @@ define <2 x float> @PR31672() #0 {
 
 declare <2 x float> @llvm.sqrt.v2f32(<2 x float>) #1
 
+define void @movaps_test(ptr nocapture noundef writeonly %v) {
+; X86-LABEL: movaps_test:
+; X86:       # %bb.0:
+; X86-NEXT:    movl 4(%esp), %eax
+; X86-NEXT:    movaps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
+
+; X64-LABEL: movaps_test:
+; X64:       # %bb.0:
+; X64-NEXT:    movaps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+  store <2 x float> <float 2.560000e+02, float 5.120000e+02>, ptr %v, align 4
+  ret void
+}
+
 attributes #0 = { nounwind "unsafe-fp-math"="true" }
 

``````````
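
To illustrate the gating idea in the diff above, here is a minimal standalone sketch. It is not the pass's real code. The names `Opcode`, `NoOpcode`, `FixupEntry` (reduced to two fields), `pickLoad` and `selectConstantLoad` are hypothetical stand-ins, and it assumes that a candidate with opcode 0 is simply skipped, which is what the `HasSSE2 ? X86::MOVSDrm : 0` entry appears to rely on.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for X86 opcodes; NoOpcode (0) marks a disabled entry.
enum Opcode : unsigned { NoOpcode = 0, MOVSSrm = 1, MOVSDrm = 2, MOVAPSrm = 3 };

// One candidate replacement: a narrower load that reads MemBitWidth bits
// and zeroes the rest of the register.
struct FixupEntry {
  unsigned Op;
  unsigned MemBitWidth;
};

// Returns the first enabled candidate whose load width covers all bits of the
// constant that must be preserved, or the original opcode if none qualifies.
unsigned pickLoad(unsigned OrigOp, unsigned UsedBits,
                  const std::vector<FixupEntry> &Fixups) {
  for (const FixupEntry &F : Fixups) {
    assert(F.MemBitWidth > 0 && "candidate must load at least one bit");
    if (F.Op == NoOpcode)
      continue; // Entry gated off because the subtarget lacks the instruction.
    if (UsedBits <= F.MemBitWidth)
      return F.Op;
  }
  return OrigOp;
}

// Mirrors the shape of the patched call: the 64-bit candidate is only offered
// when SSE2 is available, so an SSE1-only target keeps the full-width load.
unsigned selectConstantLoad(bool HasSSE2, unsigned UsedBits) {
  return pickLoad(MOVAPSrm, UsedBits,
                  {{MOVSSrm, 32}, {HasSSE2 ? MOVSDrm : NoOpcode, 64}});
}
```

In this sketch, a constant whose low 64 bits are significant (like the `<2 x float>` store in the new test) keeps the `movaps` stand-in when `HasSSE2` is false and switches to the `movsd` stand-in only when it is true.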

</details>


https://github.com/llvm/llvm-project/pull/134547