[PATCH] D54121: [FPEnv] Add constrained FCMP intrinsic

Thu Nov 8 09:08:27 PST 2018

craig.topper added a comment.

The X86 builtin story is weird. There should be 9 builtins; I'm not sure how you found 12. Eight of them represent the encodings used by the SSE1/SSE2 cmpps/pd/ss/sd instructions (the ps forms are listed below). The 9th is a builtin that takes a 5-bit immediate to cover the 32 values that the AVX vcmpps/pd/ss/sd instructions accept.

  TARGET_BUILTIN(__builtin_ia32_cmpeqps, "V4fV4fV4f", "ncV:128:", "sse")
  TARGET_BUILTIN(__builtin_ia32_cmpltps, "V4fV4fV4f", "ncV:128:", "sse")
  TARGET_BUILTIN(__builtin_ia32_cmpleps, "V4fV4fV4f", "ncV:128:", "sse")
  TARGET_BUILTIN(__builtin_ia32_cmpunordps, "V4fV4fV4f", "ncV:128:", "sse")
  TARGET_BUILTIN(__builtin_ia32_cmpneqps, "V4fV4fV4f", "ncV:128:", "sse")
  TARGET_BUILTIN(__builtin_ia32_cmpnltps, "V4fV4fV4f", "ncV:128:", "sse")
  TARGET_BUILTIN(__builtin_ia32_cmpnleps, "V4fV4fV4f", "ncV:128:", "sse")
  TARGET_BUILTIN(__builtin_ia32_cmpordps, "V4fV4fV4f", "ncV:128:", "sse")

All 9 builtins map to the same IR intrinsic that takes a 5-bit immediate. Or at least they used to. I think some map directly to fcmp these days. We have 8 separate SSE builtins because that's what gcc did way back in the SSE1 days. When AVX expanded to 32 comparison predicates, gcc decided to use one builtin with an immediate instead of adding 24 more. Clang uses the 8 legacy builtins to match gcc and to prevent users from trying to use an AVX encoding when targeting an SSE-only CPU. Within the avxintrin.h file, I believe we should have wrappers around the builtin for all 32 possible encodings that just pass the correct immediate to the builtin.
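
To make the "several named entry points, one immediate-taking operation" shape concrete, here is a minimal scalar sketch in portable C. It is not the clang implementation: `model_cmp_imm` and the `model_cmp*` wrappers are hypothetical names invented for illustration, the code compares single floats instead of vectors, and it models only the 8 legacy encodings (immediates 0 through 7), not the full 32 AVX predicates. The immediate values follow the SSE cmpps encoding order shown in the list above.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* The eight SSE1/SSE2 predicate encodings, in the same order as the
 * builtin list above (eq, lt, le, unord, neq, nlt, nle, ord). */
enum sse_cmp_pred {
  PRED_EQ = 0,
  PRED_LT = 1,
  PRED_LE = 2,
  PRED_UNORD = 3,
  PRED_NEQ = 4,
  PRED_NLT = 5,
  PRED_NLE = 6,
  PRED_ORD = 7
};

/* Hypothetical scalar stand-in for the single operation that takes an
 * immediate. Assumption: only the 8 legacy encodings are modeled here. */
static inline bool model_cmp_imm(float a, float b, unsigned imm) {
  assert(imm < 8 && "this sketch models only the 8 legacy encodings");
  bool unord = isnan(a) || isnan(b);
  switch (imm) {
  case PRED_EQ:    return a == b;      /* false if either input is NaN */
  case PRED_LT:    return a < b;
  case PRED_LE:    return a <= b;
  case PRED_UNORD: return unord;
  case PRED_NEQ:   return !(a == b);   /* true if either input is NaN */
  case PRED_NLT:   return !(a < b);
  case PRED_NLE:   return !(a <= b);
  default:         return !unord;      /* PRED_ORD */
  }
}

/* Hypothetical legacy-style wrappers: each one only supplies its
 * immediate, so a caller cannot request an encoding outside 0..7. */
static inline bool model_cmpeq(float a, float b)    { return model_cmp_imm(a, b, PRED_EQ); }
static inline bool model_cmplt(float a, float b)    { return model_cmp_imm(a, b, PRED_LT); }
static inline bool model_cmple(float a, float b)    { return model_cmp_imm(a, b, PRED_LE); }
static inline bool model_cmpunord(float a, float b) { return model_cmp_imm(a, b, PRED_UNORD); }
static inline bool model_cmpneq(float a, float b)   { return model_cmp_imm(a, b, PRED_NEQ); }
static inline bool model_cmpnlt(float a, float b)   { return model_cmp_imm(a, b, PRED_NLT); }
static inline bool model_cmpnle(float a, float b)   { return model_cmp_imm(a, b, PRED_NLE); }
static inline bool model_cmpord(float a, float b)   { return model_cmp_imm(a, b, PRED_ORD); }
```

Note that the negated predicates are not the same as the reversed ordered ones: `model_cmpnlt(NAN, 1.0f)` is true, whereas a plain `a >= b` would be false for a NaN input. That distinction is part of why the predicates are kept as separate encodings.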

Repository:
  rL LLVM

https://reviews.llvm.org/D54121