[cfe-commits] [PATCH] Clean up and fix X86 features

Jung-uk Kim jkim at FreeBSD.org
Thu Nov 15 16:28:30 PST 2012


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-11-15 18:47:04 -0500, Jung-uk Kim wrote:
> On 2012-11-15 16:52:49 -0500, Eli Friedman wrote:
>> On Thu, Nov 15, 2012 at 10:41 AM, Jung-uk Kim <jkim at freebsd.org> 
>> wrote:
>>> [This is actually PR14344 but I was told to submit the patch 
>>> here. Please see the PR14344 for the PR history.]
>>> 
>>> I have an AMD Family 10h processor and I realized that LZCNT
>>> and POPCNT are not enabled by default.  Then, I looked at 
>>> lib/Basic/Targets.cpp and found it needs some love (e.g.,
>>> sync. with LLVM's X86.td). :-)
>>> 
>>> Please see the attached patch.
>>> 
>>> - AMD Barcelona ("amdfam10") and later processors have LZCNT and
>>>   POPCNT instructions.
>>> - AMD Piledriver ("bdver2") and later processors have BMI, FMA,
>>>   and F16C instructions.
>>> - Intel Ivy Bridge ("core-avx-i") and later processors have F16C
>>>   instructions.
>>> - Do not set SIMD sets (i.e., MMX, SSE3, and AVX) explicitly when
>>>   they are already set implicitly via higher instruction sets.
>>> - Do not enable the POPCNT instruction with the SSE4* instruction
>>>   sets, as it is neither part of those specifications nor uses
>>>   SIMD registers. [1]
>>> 
>>> Jung-uk Kim
>>> 
>>> [1] There is a bit of history behind this.  AMD called them
>>> ABM and they added "-mabm" option to GCC, which enabled both
>>> LZCNT and POPCNT instructions.  OTOH, Intel implemented POPCNT
>>> first (Arrandale), then LZCNT later (upcoming Haswell).
>>> Therefore, GCC had to separate the flag and added "-mpopcnt"
>>> and "-mlzcnt" (but kept "-mabm" for backward compatibility).
> 
>> Tests are important here; please include the relevant changes for
>>  clang/test/Preprocessor/predefined-arch-macros.c .
> 
> Attached (not tried yet).  Please note I had to add missing tests
> for AMD processors.

The attached patch restores bug-for-bug compatibility with GCC,
i.e., SSE4/SSE4.2/AVX/FMA/FMA4/XOP enable POPCNT.  Although I am not
really in favor of this, it may be important for some people.
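
To make the trade-off concrete, here is a minimal standalone sketch of
the two behaviors.  It is not the code in lib/Basic/Targets.cpp; the
names FeatureMap, kSimdLadder, enableFeature, hasFeature, and the
gccCompat flag are all hypothetical and exist only for illustration.
It assumes a simple linear "ladder" of SIMD levels in which enabling
one level implies every lower one, and it optionally turns on POPCNT
for SSE4.2 and above, as GCC does.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical feature table: feature name -> enabled.
using FeatureMap = std::map<std::string, bool>;

// Hypothetical SIMD ladder, lowest level first. Enabling one entry
// implies every entry before it.
static const std::vector<std::string> kSimdLadder = {
    "mmx", "sse", "sse2", "sse3", "ssse3", "sse41", "sse42", "avx"};

// Position of "sse42" in kSimdLadder.
static const std::size_t kSse42Index = 6;

// Enable `name`. If it is a ladder entry, also enable all lower
// levels. With gccCompat set, SSE4.2 and above additionally enable
// POPCNT (the "bug-for-bug" behavior); without it, POPCNT must be
// requested on its own.
void enableFeature(FeatureMap &features, const std::string &name,
                   bool gccCompat) {
  features[name] = true;
  for (std::size_t i = 0; i < kSimdLadder.size(); ++i) {
    if (kSimdLadder[i] != name)
      continue;
    for (std::size_t j = 0; j < i; ++j)
      features[kSimdLadder[j]] = true;
    if (gccCompat && i >= kSse42Index)
      features["popcnt"] = true;
    break;
  }
}

// True only if `name` is present in the table and enabled.
bool hasFeature(const FeatureMap &features, const std::string &name) {
  auto it = features.find(name);
  return it != features.end() && it->second;
}
```

With gccCompat set, enableFeature(f, "sse42", true) leaves "popcnt"
enabled along with "mmx" through "sse41"; with it cleared, "popcnt"
stays off until it is enabled explicitly.  Features outside the
ladder, such as "lzcnt", never imply anything else in this sketch.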

Jung-uk Kim
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iEYEARECAAYFAlCliK0ACgkQmlay1b9qnVNBHgCgwhA7Lv7AyY2YjStwC/9qjOh7
mxAAnjjaWyQFCwLncqDF0fu3r4POQA2M
=z7ng
-----END PGP SIGNATURE-----
-------------- next part --------------
--- lib/Basic/Targets.cpp
+++ lib/Basic/Targets.cpp
@@ -1854,58 +1854,50 @@
     break;
   case CK_Pentium3:
   case CK_Pentium3M:
-    setFeatureEnabled(Features, "mmx", true);
     setFeatureEnabled(Features, "sse", true);
     break;
   case CK_PentiumM:
   case CK_Pentium4:
   case CK_Pentium4M:
   case CK_x86_64:
-    setFeatureEnabled(Features, "mmx", true);
     setFeatureEnabled(Features, "sse2", true);
     break;
   case CK_Yonah:
   case CK_Prescott:
   case CK_Nocona:
-    setFeatureEnabled(Features, "mmx", true);
     setFeatureEnabled(Features, "sse3", true);
     break;
   case CK_Core2:
-    setFeatureEnabled(Features, "mmx", true);
     setFeatureEnabled(Features, "ssse3", true);
     break;
   case CK_Penryn:
-    setFeatureEnabled(Features, "mmx", true);
     setFeatureEnabled(Features, "sse4.1", true);
     break;
   case CK_Atom:
-    setFeatureEnabled(Features, "mmx", true);
     setFeatureEnabled(Features, "ssse3", true);
     break;
   case CK_Corei7:
-    setFeatureEnabled(Features, "mmx", true);
     setFeatureEnabled(Features, "sse4", true);
     break;
   case CK_Corei7AVX:
-    setFeatureEnabled(Features, "mmx", true);
     setFeatureEnabled(Features, "avx", true);
     setFeatureEnabled(Features, "aes", true);
     setFeatureEnabled(Features, "pclmul", true);
     break;
   case CK_CoreAVXi:
-    setFeatureEnabled(Features, "mmx", true);
     setFeatureEnabled(Features, "avx", true);
     setFeatureEnabled(Features, "aes", true);
     setFeatureEnabled(Features, "pclmul", true);
     setFeatureEnabled(Features, "rdrnd", true);
+    setFeatureEnabled(Features, "f16c", true);
     break;
   case CK_CoreAVX2:
-    setFeatureEnabled(Features, "mmx", true);
     setFeatureEnabled(Features, "avx2", true);
     setFeatureEnabled(Features, "aes", true);
     setFeatureEnabled(Features, "pclmul", true);
     setFeatureEnabled(Features, "lzcnt", true);
     setFeatureEnabled(Features, "rdrnd", true);
+    setFeatureEnabled(Features, "f16c", true);
     setFeatureEnabled(Features, "bmi", true);
     setFeatureEnabled(Features, "bmi2", true);
     setFeatureEnabled(Features, "rtm", true);
@@ -1946,23 +1938,33 @@
     setFeatureEnabled(Features, "3dnowa", true);
     break;
   case CK_AMDFAM10:
-    setFeatureEnabled(Features, "sse3", true);
     setFeatureEnabled(Features, "sse4a", true);
     setFeatureEnabled(Features, "3dnowa", true);
+    setFeatureEnabled(Features, "lzcnt", true);
+    setFeatureEnabled(Features, "popcnt", true);
     break;
   case CK_BTVER1:
     setFeatureEnabled(Features, "ssse3", true);
     setFeatureEnabled(Features, "sse4a", true);
+    setFeatureEnabled(Features, "lzcnt", true);
+    setFeatureEnabled(Features, "popcnt", true);
     break;
   case CK_BDVER1:
+    setFeatureEnabled(Features, "xop", true);
+    setFeatureEnabled(Features, "lzcnt", true);
+    setFeatureEnabled(Features, "aes", true);
+    setFeatureEnabled(Features, "pclmul", true);
+    break;
   case CK_BDVER2:
-    setFeatureEnabled(Features, "avx", true);
     setFeatureEnabled(Features, "xop", true);
+    setFeatureEnabled(Features, "lzcnt", true);
     setFeatureEnabled(Features, "aes", true);
     setFeatureEnabled(Features, "pclmul", true);
+    setFeatureEnabled(Features, "bmi", true);
+    setFeatureEnabled(Features, "fma", true);
+    setFeatureEnabled(Features, "f16c", true);
     break;
   case CK_C3_2:
-    setFeatureEnabled(Features, "mmx", true);
     setFeatureEnabled(Features, "sse", true);
     break;
   }
@@ -2021,12 +2023,12 @@
         Features["ssse3"] = Features["sse41"] = Features["sse42"] =
         Features["popcnt"] = Features["avx"] = Features["fma"] = true;
     else if (Name == "fma4")
-        Features["mmx"] = Features["sse"] = Features["sse2"] = Features["sse3"] =
+      Features["mmx"] = Features["sse"] = Features["sse2"] = Features["sse3"] =
         Features["ssse3"] = Features["sse41"] = Features["sse42"] =
         Features["popcnt"] = Features["avx"] = Features["sse4a"] =
         Features["fma4"] = true;
     else if (Name == "xop")
-        Features["mmx"] = Features["sse"] = Features["sse2"] = Features["sse3"] =
+      Features["mmx"] = Features["sse"] = Features["sse2"] = Features["sse3"] =
         Features["ssse3"] = Features["sse41"] = Features["sse42"] =
         Features["popcnt"] = Features["avx"] = Features["sse4a"] =
         Features["fma4"] = Features["xop"] = true;
--- test/Preprocessor/predefined-arch-macros.c
+++ test/Preprocessor/predefined-arch-macros.c
@@ -467,6 +467,7 @@
 // CHECK_CORE_AVX_I_M32: #define __MMX__ 1
 // CHECK_CORE_AVX_I_M32: #define __PCLMUL__ 1
 // CHECK_CORE_AVX_I_M32: #define __RDRND__ 1
+// CHECK_CORE_AVX_I_M32: #define __F16C__ 1
 // CHECK_CORE_AVX_I_M32: #define __SSE2__ 1
 // CHECK_CORE_AVX_I_M32: #define __SSE3__ 1
 // CHECK_CORE_AVX_I_M32: #define __SSE4_1__ 1
@@ -487,6 +488,7 @@
 // CHECK_CORE_AVX_I_M64: #define __MMX__ 1
 // CHECK_CORE_AVX_I_M64: #define __PCLMUL__ 1
 // CHECK_CORE_AVX_I_M64: #define __RDRND__ 1
+// CHECK_CORE_AVX_I_M64: #define __F16C__ 1
 // CHECK_CORE_AVX_I_M64: #define __SSE2_MATH__ 1
 // CHECK_CORE_AVX_I_M64: #define __SSE2__ 1
 // CHECK_CORE_AVX_I_M64: #define __SSE3__ 1
@@ -516,6 +518,7 @@
 // CHECK_CORE_AVX2_M32: #define __PCLMUL__ 1
 // CHECK_CORE_AVX2_M32: #define __POPCNT__ 1
 // CHECK_CORE_AVX2_M32: #define __RDRND__ 1
+// CHECK_CORE_AVX2_M32: #define __F16C__ 1
 // CHECK_CORE_AVX2_M32: #define __RTM__ 1
 // CHECK_CORE_AVX2_M32: #define __SSE2__ 1
 // CHECK_CORE_AVX2_M32: #define __SSE3__ 1
@@ -542,6 +545,7 @@
 // CHECK_CORE_AVX2_M64: #define __PCLMUL__ 1
 // CHECK_CORE_AVX2_M64: #define __POPCNT__ 1
 // CHECK_CORE_AVX2_M64: #define __RDRND__ 1
+// CHECK_CORE_AVX2_M64: #define __F16C__ 1
 // CHECK_CORE_AVX2_M64: #define __RTM__ 1
 // CHECK_CORE_AVX2_M64: #define __SSE2_MATH__ 1
 // CHECK_CORE_AVX2_M64: #define __SSE2__ 1
@@ -1008,12 +1012,33 @@
 // CHECK_ATHLON_FX_M64: #define __tune_k8__ 1
 // CHECK_ATHLON_FX_M64: #define __x86_64 1
 // CHECK_ATHLON_FX_M64: #define __x86_64__ 1
+// RUN: %clang -march=amdfam10 -m32 -E -dM %s -o - 2>&1 \
+// RUN:     -target i386-unknown-linux \
+// RUN:   | FileCheck %s -check-prefix=CHECK_AMDFAM10_M32
+// CHECK_AMDFAM10_M32: #define __3dNOW_A__ 1
+// CHECK_AMDFAM10_M32: #define __3dNOW__ 1
+// CHECK_AMDFAM10_M32: #define __MMX__ 1
+// CHECK_AMDFAM10_M32: #define __LZCNT__ 1
+// CHECK_AMDFAM10_M32: #define __POPCNT__ 1
+// CHECK_AMDFAM10_M32: #define __SSE2_MATH__ 1
+// CHECK_AMDFAM10_M32: #define __SSE2__ 1
+// CHECK_AMDFAM10_M32: #define __SSE3__ 1
+// CHECK_AMDFAM10_M32: #define __SSE4A__ 1
+// CHECK_AMDFAM10_M32: #define __SSE_MATH__ 1
+// CHECK_AMDFAM10_M32: #define __SSE__ 1
+// CHECK_AMDFAM10_M32: #define __i386 1
+// CHECK_AMDFAM10_M32: #define __i386__ 1
+// CHECK_AMDFAM10_M32: #define __amdfam10 1
+// CHECK_AMDFAM10_M32: #define __amdfam10__ 1
+// CHECK_AMDFAM10_M32: #define __tune_amdfam10__ 1
 // RUN: %clang -march=amdfam10 -m64 -E -dM %s -o - 2>&1 \
 // RUN:     -target i386-unknown-linux \
 // RUN:   | FileCheck %s -check-prefix=CHECK_AMDFAM10_M64
 // CHECK_AMDFAM10_M64: #define __3dNOW_A__ 1
 // CHECK_AMDFAM10_M64: #define __3dNOW__ 1
 // CHECK_AMDFAM10_M64: #define __MMX__ 1
+// CHECK_AMDFAM10_M64: #define __LZCNT__ 1
+// CHECK_AMDFAM10_M64: #define __POPCNT__ 1
 // CHECK_AMDFAM10_M64: #define __SSE2_MATH__ 1
 // CHECK_AMDFAM10_M64: #define __SSE2__ 1
 // CHECK_AMDFAM10_M64: #define __SSE3__ 1
@@ -1027,14 +1052,86 @@
 // CHECK_AMDFAM10_M64: #define __tune_amdfam10__ 1
 // CHECK_AMDFAM10_M64: #define __x86_64 1
 // CHECK_AMDFAM10_M64: #define __x86_64__ 1
+// RUN: %clang -march=btver1 -m32 -E -dM %s -o - 2>&1 \
+// RUN:     -target i386-unknown-linux \
+// RUN:   | FileCheck %s -check-prefix=CHECK_BTVER1_M32
+// CHECK_BTVER1_M32-NOT: #define __3dNOW_A__ 1
+// CHECK_BTVER1_M32-NOT: #define __3dNOW__ 1
+// CHECK_BTVER1_M32: #define __MMX__ 1
+// CHECK_BTVER1_M32: #define __LZCNT__ 1
+// CHECK_BTVER1_M32: #define __POPCNT__ 1
+// CHECK_BTVER1_M32: #define __SSE2_MATH__ 1
+// CHECK_BTVER1_M32: #define __SSE2__ 1
+// CHECK_BTVER1_M32: #define __SSE3__ 1
+// CHECK_BTVER1_M32: #define __SSE4A__ 1
+// CHECK_BTVER1_M32: #define __SSE_MATH__ 1
+// CHECK_BTVER1_M32: #define __SSE__ 1
+// CHECK_BTVER1_M32: #define __SSSE3__ 1
+// CHECK_BTVER1_M32: #define __i386 1
+// CHECK_BTVER1_M32: #define __i386__ 1
+// CHECK_BTVER1_M32: #define __btver1 1
+// CHECK_BTVER1_M32: #define __btver1__ 1
+// CHECK_BTVER1_M32: #define __tune_btver1__ 1
+// RUN: %clang -march=btver1 -m64 -E -dM %s -o - 2>&1 \
+// RUN:     -target i386-unknown-linux \
+// RUN:   | FileCheck %s -check-prefix=CHECK_BTVER1_M64
+// CHECK_BTVER1_M64-NOT: #define __3dNOW_A__ 1
+// CHECK_BTVER1_M64-NOT: #define __3dNOW__ 1
+// CHECK_BTVER1_M64: #define __MMX__ 1
+// CHECK_BTVER1_M64: #define __LZCNT__ 1
+// CHECK_BTVER1_M64: #define __POPCNT__ 1
+// CHECK_BTVER1_M64: #define __SSE2_MATH__ 1
+// CHECK_BTVER1_M64: #define __SSE2__ 1
+// CHECK_BTVER1_M64: #define __SSE3__ 1
+// CHECK_BTVER1_M64: #define __SSE4A__ 1
+// CHECK_BTVER1_M64: #define __SSE_MATH__ 1
+// CHECK_BTVER1_M64: #define __SSE__ 1
+// CHECK_BTVER1_M64: #define __SSSE3__ 1
+// CHECK_BTVER1_M64: #define __amd64 1
+// CHECK_BTVER1_M64: #define __amd64__ 1
+// CHECK_BTVER1_M64: #define __btver1 1
+// CHECK_BTVER1_M64: #define __btver1__ 1
+// CHECK_BTVER1_M64: #define __tune_btver1__ 1
+// CHECK_BTVER1_M64: #define __x86_64 1
+// CHECK_BTVER1_M64: #define __x86_64__ 1
+// RUN: %clang -march=bdver1 -m32 -E -dM %s -o - 2>&1 \
+// RUN:     -target i386-unknown-linux \
+// RUN:   | FileCheck %s -check-prefix=CHECK_BDVER1_M32
+// CHECK_BDVER1_M32: #define __AES__ 1
+// CHECK_BDVER1_M32: #define __AVX__ 1
+// CHECK_BDVER1_M32-NOT: #define __3dNOW_A__ 1
+// CHECK_BDVER1_M32-NOT: #define __3dNOW__ 1
+// CHECK_BDVER1_M32: #define __FMA4__ 1
+// CHECK_BDVER1_M32: #define __MMX__ 1
+// CHECK_BDVER1_M32: #define __LZCNT__ 1
+// CHECK_BDVER1_M32: #define __POPCNT__ 1
+// CHECK_BDVER1_M32: #define __PCLMUL__ 1
+// CHECK_BDVER1_M32: #define __SSE2_MATH__ 1
+// CHECK_BDVER1_M32: #define __SSE2__ 1
+// CHECK_BDVER1_M32: #define __SSE3__ 1
+// CHECK_BDVER1_M32: #define __SSE4A__ 1
+// CHECK_BDVER1_M32: #define __SSE4_1__ 1
+// CHECK_BDVER1_M32: #define __SSE4_2__ 1
+// CHECK_BDVER1_M32: #define __SSE_MATH__ 1
+// CHECK_BDVER1_M32: #define __SSE__ 1
+// CHECK_BDVER1_M32: #define __SSSE3__ 1
+// CHECK_BDVER1_M32: #define __XOP__ 1
+// CHECK_BDVER1_M32: #define __i386 1
+// CHECK_BDVER1_M32: #define __i386__ 1
+// CHECK_BDVER1_M32: #define __bdver1 1
+// CHECK_BDVER1_M32: #define __bdver1__ 1
+// CHECK_BDVER1_M32: #define __tune_bdver1__ 1
 // RUN: %clang -march=bdver1 -m64 -E -dM %s -o - 2>&1 \
 // RUN:     -target i386-unknown-linux \
 // RUN:   | FileCheck %s -check-prefix=CHECK_BDVER1_M64
+// CHECK_BDVER1_M64: #define __AES__ 1
 // CHECK_BDVER1_M64: #define __AVX__ 1
 // CHECK_BDVER1_M64-NOT: #define __3dNOW_A__ 1
 // CHECK_BDVER1_M64-NOT: #define __3dNOW__ 1
 // CHECK_BDVER1_M64: #define __FMA4__ 1
 // CHECK_BDVER1_M64: #define __MMX__ 1
+// CHECK_BDVER1_M64: #define __LZCNT__ 1
+// CHECK_BDVER1_M64: #define __POPCNT__ 1
 // CHECK_BDVER1_M64: #define __PCLMUL__ 1
 // CHECK_BDVER1_M64: #define __SSE2_MATH__ 1
 // CHECK_BDVER1_M64: #define __SSE2__ 1
@@ -1053,5 +1150,67 @@
 // CHECK_BDVER1_M64: #define __tune_bdver1__ 1
 // CHECK_BDVER1_M64: #define __x86_64 1
 // CHECK_BDVER1_M64: #define __x86_64__ 1
+// RUN: %clang -march=bdver2 -m32 -E -dM %s -o - 2>&1 \
+// RUN:     -target i386-unknown-linux \
+// RUN:   | FileCheck %s -check-prefix=CHECK_BDVER2_M32
+// CHECK_BDVER2_M32: #define __AES__ 1
+// CHECK_BDVER2_M32: #define __AVX__ 1
+// CHECK_BDVER2_M32-NOT: #define __3dNOW_A__ 1
+// CHECK_BDVER2_M32-NOT: #define __3dNOW__ 1
+// CHECK_BDVER2_M32: #define __F16C__ 1
+// CHECK_BDVER2_M32: #define __BMI__ 1
+// CHECK_BDVER2_M32: #define __FMA__ 1
+// CHECK_BDVER2_M32: #define __FMA4__ 1
+// CHECK_BDVER2_M32: #define __MMX__ 1
+// CHECK_BDVER2_M32: #define __LZCNT__ 1
+// CHECK_BDVER2_M32: #define __POPCNT__ 1
+// CHECK_BDVER2_M32: #define __PCLMUL__ 1
+// CHECK_BDVER2_M32: #define __SSE2_MATH__ 1
+// CHECK_BDVER2_M32: #define __SSE2__ 1
+// CHECK_BDVER2_M32: #define __SSE3__ 1
+// CHECK_BDVER2_M32: #define __SSE4A__ 1
+// CHECK_BDVER2_M32: #define __SSE4_1__ 1
+// CHECK_BDVER2_M32: #define __SSE4_2__ 1
+// CHECK_BDVER2_M32: #define __SSE_MATH__ 1
+// CHECK_BDVER2_M32: #define __SSE__ 1
+// CHECK_BDVER2_M32: #define __SSSE3__ 1
+// CHECK_BDVER2_M32: #define __XOP__ 1
+// CHECK_BDVER2_M32: #define __i386 1
+// CHECK_BDVER2_M32: #define __i386__ 1
+// CHECK_BDVER2_M32: #define __bdver2 1
+// CHECK_BDVER2_M32: #define __bdver2__ 1
+// CHECK_BDVER2_M32: #define __tune_bdver2__ 1
+// RUN: %clang -march=bdver2 -m64 -E -dM %s -o - 2>&1 \
+// RUN:     -target i386-unknown-linux \
+// RUN:   | FileCheck %s -check-prefix=CHECK_BDVER2_M64
+// CHECK_BDVER2_M64: #define __AES__ 1
+// CHECK_BDVER2_M64: #define __AVX__ 1
+// CHECK_BDVER2_M64-NOT: #define __3dNOW_A__ 1
+// CHECK_BDVER2_M64-NOT: #define __3dNOW__ 1
+// CHECK_BDVER2_M64: #define __F16C__ 1
+// CHECK_BDVER2_M64: #define __BMI__ 1
+// CHECK_BDVER2_M64: #define __FMA__ 1
+// CHECK_BDVER2_M64: #define __FMA4__ 1
+// CHECK_BDVER2_M64: #define __MMX__ 1
+// CHECK_BDVER2_M64: #define __LZCNT__ 1
+// CHECK_BDVER2_M64: #define __POPCNT__ 1
+// CHECK_BDVER2_M64: #define __PCLMUL__ 1
+// CHECK_BDVER2_M64: #define __SSE2_MATH__ 1
+// CHECK_BDVER2_M64: #define __SSE2__ 1
+// CHECK_BDVER2_M64: #define __SSE3__ 1
+// CHECK_BDVER2_M64: #define __SSE4A__ 1
+// CHECK_BDVER2_M64: #define __SSE4_1__ 1
+// CHECK_BDVER2_M64: #define __SSE4_2__ 1
+// CHECK_BDVER2_M64: #define __SSE_MATH__ 1
+// CHECK_BDVER2_M64: #define __SSE__ 1
+// CHECK_BDVER2_M64: #define __SSSE3__ 1
+// CHECK_BDVER2_M64: #define __XOP__ 1
+// CHECK_BDVER2_M64: #define __amd64 1
+// CHECK_BDVER2_M64: #define __amd64__ 1
+// CHECK_BDVER2_M64: #define __bdver2 1
+// CHECK_BDVER2_M64: #define __bdver2__ 1
+// CHECK_BDVER2_M64: #define __tune_bdver2__ 1
+// CHECK_BDVER2_M64: #define __x86_64 1
+// CHECK_BDVER2_M64: #define __x86_64__ 1
 //
 // End X86/GCC/Linux tests ------------------
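
The predefined macros checked above matter because user code commonly
keys off them.  The following is a small sketch of that pattern, not
part of the patch: countBits is a hypothetical helper, and it assumes
a GCC-compatible compiler that provides __builtin_popcount.  When
__POPCNT__ is defined (for example with -march=amdfam10 after this
change) the builtin can be lowered to the POPCNT instruction;
otherwise a portable loop is used.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: count the set bits in a 32-bit value.
int countBits(std::uint32_t x) {
#if defined(__POPCNT__)
  // The target advertises POPCNT, so let the compiler use it.
  return __builtin_popcount(x);
#else
  // Portable fallback: clear the lowest set bit until none remain.
  int n = 0;
  while (x != 0) {
    x &= x - 1;
    ++n;
  }
  return n;
#endif
}
```

Both branches return the same result for every input, so the macro
only selects how the count is computed, never what it is.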