[llvm] [X86] Add rewrite pattern for SSE41/AVX1 roundss/sd + blendps/pd (PR #172056)

Phoebe Wang via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 23 05:38:26 PST 2025


phoebewang wrote:

> The point of this pr should be to make sure that, when we expand something like `roundss`, into `extract -> round -> insert` as per #171227, we don't always get back the original asm instruction. For example:
> 
> ```
> define <4 x float> @floor_ss(<4 x float> %x, <4 x float> %y) nounwind {
> ; SSE41-LABEL: floor_ss:
> ; SSE41:       ## %bb.0:
> ; SSE41-NEXT:    roundss $9, %xmm0, %xmm1
> ; SSE41-NEXT:    movaps %xmm1, %xmm0
> ; SSE41-NEXT:    retq
> ;
> ; AVX-LABEL: floor_ss:
> ; AVX:       ## %bb.0:
> ; AVX-NEXT:    vroundss $9, %xmm0, %xmm1, %xmm0
> ; AVX-NEXT:    retq
> ```
> 
> so while in AVX, roundss is changed back into a single instruction, in SSE41 the transformation gives:
> 
> ```
> roundss $9, %xmm0, %xmm0
> blendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
> ↓                                              -- via the implemented pattern
> roundss $9, %xmm0, %xmm1
> movaps %xmm1, %xmm0      ;  this is needed, as result is in the wrong register
> ```
> 
> so it actually didn't achieve anything, we still need two instructions. I think this defeats the point of the pr to some extent?

%xmm0 is required by the ABI for the return value, and SSE instructions have only two operands, so the destination register is also a source. In practice this is not a problem; alternatively, you can swap the order of %x and %y to eliminate the mov instruction.
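To illustrate the register constraint, here is a rough Python model of it, not LLVM code. It assumes the test body extracts lane 0 of %x, floors it, and inserts it into %y (the IR body is not shown above), and that the first and second vector arguments arrive in %xmm0 and %xmm1. The helper names `roundss_floor` and `lower_sse41` are made up for this sketch, and the instruction strings are what the toy model emits, not verified llc output.

```python
import math

def roundss_floor(dst, src):
    # Toy model of two-operand SSE `roundss $9, src, dst` (AT&T order):
    # lane 0 of dst becomes floor(src[0]); lanes 1..3 of dst are kept.
    # Only finite inputs are handled (math.floor rejects inf/nan).
    return [float(math.floor(src[0]))] + list(dst[1:])

def lower_sse41(first, second, swapped=False):
    # Hypothetical lowering: arguments arrive in xmm0/xmm1,
    # and the result has to end up in xmm0.
    regs = {"xmm0": list(first), "xmm1": list(second)}
    asm = []
    if not swapped:
        # extract from first arg, insert into second arg:
        # the pass-through lanes live in xmm1, so xmm1 is the destination.
        regs["xmm1"] = roundss_floor(regs["xmm1"], regs["xmm0"])
        asm.append("roundss $9, %xmm0, %xmm1")
        # The result sits in xmm1, so a copy into xmm0 is needed.
        regs["xmm0"] = list(regs["xmm1"])
        asm.append("movaps %xmm1, %xmm0")
    else:
        # extract from second arg, insert into first arg:
        # the pass-through lanes are already in xmm0, so no copy is needed.
        regs["xmm0"] = roundss_floor(regs["xmm0"], regs["xmm1"])
        asm.append("roundss $9, %xmm1, %xmm0")
    return regs["xmm0"], asm

x = [1.5, 2.0, 3.0, 4.0]
y = [-0.5, 6.0, 7.0, 8.0]
res, asm = lower_sse41(x, y)
res_swapped, asm_swapped = lower_sse41(x, y, swapped=True)
```

In this model the original operand order needs two instructions and the swapped order needs one, because only in the swapped case do the pass-through lanes already sit in the return register.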

https://github.com/llvm/llvm-project/pull/172056

