[clang] [llvm] Clang: convert `__m64` intrinsics to unconditionally use SSE2 instead of MMX. (PR #96540)
James Y Knight via cfe-commits
cfe-commits at lists.llvm.org
Tue Jun 25 15:33:53 PDT 2024
jyknight wrote:
> I guess the clang calling convention code never uses MMX types for passing/returning values?
Correct, Clang never uses MMX types in its calling convention. This is actually _wrong_ according to the 32-bit x86 psABI. You're supposed to pass the first 3 MMX args in mm0-2, and return the first MMX value in mm0. Yet, in conflict with those rules, the psABI also states that all functions MUST be entered in x87 mode, and that you must call emms before returning. _shrug_.
We did attempt to implement the arg/return-passing rules for MMX in llvm/lib/Target/X86/X86CallingConv.td, but it doesn't actually apply to the IR Clang emits, since Clang never uses the `x86_mmx` type, except as needed around the MMX LLVM-builtins and inline-asm.
Anyhow, I propose that we _do not_ attempt to fix Clang's ABI to conform with the 32-bit psABI. We've gotten it wrong for a decade, and at this point, "fixing" it to use MMX registers would be worse than not doing so.
> Have you looked at the code quality? #41665 mentions potential issues with widening vectors.
I've glanced at it. In optimized code, the codegen looks pretty good. Unoptimized code looks pretty bad _before_ my changes, and looks about the same after. I have not attempted to measure performance of any MMX-intrinsics-using code.
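For anyone following along who hasn't looked at the widening question: here is a rough, portable sketch of the general idea (do the 64-bit vector op in a 128-bit vector, then keep the low half). This is _not_ the code from the PR or from Clang's headers. It uses the GCC/Clang vector extensions, and the names `v4u16`, `v8u16`, `widen`, `narrow`, `add_pi16_sketch`, and `example` are all made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types: a 64-bit vector (the size of __m64) and a
   128-bit vector (the size of an SSE2 register). */
typedef uint16_t v4u16 __attribute__((vector_size(8)));
typedef uint16_t v8u16 __attribute__((vector_size(16)));

/* Widen: copy the 64-bit value into the low half of a zeroed 128-bit vector. */
static v8u16 widen(v4u16 v) {
    v8u16 w = {0};
    memcpy(&w, &v, sizeof v);
    return w;
}

/* Narrow: keep only the low 64 bits of the 128-bit vector. */
static v4u16 narrow(v8u16 w) {
    v4u16 v;
    memcpy(&v, &w, sizeof v);
    return v;
}

/* Lane-wise wrapping 16-bit add, done at 128 bits and then truncated.
   The upper four lanes are computed and discarded. */
static v4u16 add_pi16_sketch(v4u16 a, v4u16 b) {
    return narrow(widen(a) + widen(b));
}

/* Small usage example: the last lane wraps around, as a 16-bit add should. */
static void example(void) {
    v4u16 a = {1, 2, 3, 65535};
    v4u16 b = {10, 20, 30, 1};
    v4u16 r = add_pi16_sketch(a, b);
    assert(r[0] == 11 && r[1] == 22 && r[2] == 33 && r[3] == 0);
}
```

The widen/narrow round trip is the part the code-quality question is about: an optimizer can usually fold it away, while unoptimized builds tend to keep the extra copies.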
> This doesn't touch inline asm or _mm_empty; I guess you're leaving that for a followup?
Correct. That needs additional work, which I have not done.
I do plan to add to this PR another commit that deletes all the `__builtin_*` MMX functions, which are no longer used, after the header changes here.
However, that will still leave all those MMX intrinsics in place on the LLVM side, and I'm not sure how to go about removing them. Should we just do it, and break bitcode backwards-compatibility for those files? Or, if we do need to preserve bitcode compat, how do we best achieve it? Perhaps we convert them into inline-asm in the bitcode upgrader?
https://github.com/llvm/llvm-project/pull/96540