[cfe-dev] march=native on Broadwell yields non-Broadwell ISA?

Steven Noonan steven at uplinklabs.net
Thu Mar 19 07:53:10 PDT 2015


It seems that -march=native doesn't detect that I'm on a Broadwell host.

$ clang -v
clang version 3.6.0 (tags/RELEASE_360/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix

$ cat rsqrt.c
#include <math.h>

float rsqrtf(float f)
{
       return 1.0f / sqrtf(f);
}


With -march=native (note the non-VEX-encoded SSE instructions):

$ clang -O3 -march=native -S -o - rsqrt.c | showasm
.LCPI0_0:
rsqrtf:                                 # @rsqrtf
       pushq   %rax
.Ltmp0:
       movaps  %xmm0, %xmm1
       xorps   %xmm0, %xmm0
       sqrtss  %xmm1, %xmm0
       ucomiss %xmm0, %xmm0
       jnp     .LBB0_2
       movaps  %xmm1, %xmm0
       callq   sqrtf
.LBB0_2:                                # %.split
       movss   .LCPI0_0(%rip), %xmm1
       divss   %xmm0, %xmm1
       movaps  %xmm1, %xmm0
       popq    %rax
       retq
.Ltmp1:


With -march=broadwell (note the use of VEX-encoded instructions):

$ clang -O3 -march=broadwell -S -o - rsqrt.c | showasm
.LCPI0_0:
rsqrtf:                                 # @rsqrtf
       pushq   %rax
.Ltmp0:
       vmovaps %xmm0, %xmm1
       vxorps  %xmm0, %xmm0, %xmm0
       vsqrtss %xmm1, %xmm0, %xmm0
       vucomiss        %xmm0, %xmm0
       jnp     .LBB0_2
       vmovaps %xmm1, %xmm0
       callq   sqrtf
.LBB0_2:                                # %.split
       vmovss  .LCPI0_0(%rip), %xmm1
       vdivss  %xmm0, %xmm1, %xmm0
       popq    %rax
       retq
.Ltmp1:


Host CPU details:

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 61
model name      : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
stepping        : 4
microcode       : 0x1a
cpu MHz         : 2600.000
cache size      : 4096 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 20
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64
monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1
sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm tpr_shadow vnmi
flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms
invpcid rtm rdseed adx smap xsaveopt
bugs            :
bogomips        : 5187.73
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:
[...]

$ ./cpuid -c 0 -d
CPU 0:
CPUID 00000000:00 = 00000014 756e6547 6c65746e 49656e69 | ....GenuntelineI
CPUID 00000001:00 = 000306d4 00100800 7ffafbff bfebfbff | ................
CPUID 00000002:00 = 76036301 00f0b5ff 00000000 00c30000 | .c.v............
CPUID 00000003:00 = 00000000 00000000 00000000 00000000 | ................
CPUID 00000004:00 = 1c004121 01c0003f 0000003f 00000000 | !A..?...?.......
CPUID 00000004:01 = 1c004122 01c0003f 0000003f 00000000 | "A..?...?.......
CPUID 00000004:02 = 1c004143 01c0003f 000001ff 00000000 | CA..?...........
CPUID 00000004:03 = 1c03c163 03c0003f 00000fff 00000006 | c...?...........
CPUID 00000004:04 = 00000000 00000000 00000000 00000000 | ................
CPUID 00000005:00 = 00000040 00000040 00000003 11142120 | @...@....... !..
CPUID 00000006:00 = 00000077 00000002 00000009 00000000 | w...............
CPUID 00000007:00 = 00000000 021c2fbb 00000000 00000000 | ...../..........
CPUID 00000008:00 = 00000000 00000000 00000000 00000000 | ................
CPUID 00000009:00 = 00000000 00000000 00000000 00000000 | ................
CPUID 0000000a:00 = 07300403 00000000 00000000 00000603 | ..0.............
CPUID 0000000b:00 = 00000001 00000002 00000100 00000000 | ................
CPUID 0000000b:01 = 00000004 00000004 00000201 00000000 | ................
CPUID 0000000b:02 = 00000000 00000000 00000002 00000000 | ................
CPUID 0000000c:00 = 00000000 00000001 00000001 00000000 | ................
CPUID 0000000d:00 = 00000007 00000340 00000340 00000000 | ....@...@.......
CPUID 0000000d:01 = 00000001 00000000 00000000 00000000 | ................
CPUID 0000000d:02 = 00000100 00000240 00000000 00000000 | ....@...........
CPUID 0000000e:00 = 00000000 00000000 00000000 00000000 | ................
CPUID 0000000f:00 = 00000000 00000000 00000000 00000000 | ................
CPUID 00000010:00 = 00000000 00000000 00000000 00000000 | ................
CPUID 00000011:00 = 00000000 00000000 00000000 00000000 | ................
CPUID 00000012:00 = 00000000 00000000 00000000 00000000 | ................
CPUID 00000013:00 = 00000000 00000000 00000000 00000000 | ................
CPUID 00000014:00 = 00000000 00000001 00000001 00000000 | ................
CPUID 80000000:00 = 80000008 00000000 00000000 00000000 | ................
CPUID 80000001:00 = 00000000 00000000 00000121 2c100800 | ........!......,
CPUID 80000002:00 = 65746e49 2952286c 726f4320 4d542865 | Intel(R) Core(TM
CPUID 80000003:00 = 37692029 3036352d 43205530 40205550 | ) i7-5600U CPU @
CPUID 80000004:00 = 362e3220 7a484730 00000000 00000000 |  2.60GHz........
CPUID 80000005:00 = 00000000 00000000 00000000 00000000 | ................
CPUID 80000006:00 = 00000000 00000000 01006040 00000000 | ........@`......
CPUID 80000007:00 = 00000000 00000000 00000000 00000100 | ................
CPUID 80000008:00 = 00003027 00000000 00000000 00000000 | '0..............

Ideas?

- Steven


