[PATCH] D33099: AMD Jaguar scheduler doesn't correctly model 256-bit AVX instructions
Andrew V. Tischenko via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jun 8 04:41:40 PDT 2017
avt77 added inline comments.
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:450
+ "VBROADCASTF128", "VBROADCASTSSrr", "VINSERTF128rr",
+ "VMOVAP(D|S)rm", "VMOVDDUPYrr", "VMOVS(H|L)DUPYrr", "VMOVUP(D|S)Yrm",
+ "VORP(S|D)Yrr", "VPERMILP(D|S)Yri", "VSHUFP(D|S)Yrri", "VUNPCK(H|L)P(D|S)rr",
----------------
RKSimon wrote:
> "VMOVAP(D|S)rm" etc. are memory loads - they should be in the Ld version
From my point of view, the rm-version stores a register value into memory, while the mr-version loads a value from memory into a register. Am I right?
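For reference, LLVM's X86 opcode names list the destination operand first in the operand-form suffix: `rm` means a register destination with a memory source (a load), and `mr` means a memory destination (a store or read-modify-write). The Python sketch below expands the simple `(A|B)` alternations from the hunk above and classifies each resulting opcode by that suffix. It is only an illustration: it does not reproduce TableGen's own regex matching, and the helpers `expand` and `operand_form` are hypothetical.

```python
import re
from itertools import product

# Patterns copied from the X86ScheduleBtVer2.td hunk above.
PATTERNS = [
    "VBROADCASTF128", "VBROADCASTSSrr", "VINSERTF128rr",
    "VMOVAP(D|S)rm", "VMOVDDUPYrr", "VMOVS(H|L)DUPYrr", "VMOVUP(D|S)Yrm",
    "VORP(S|D)Yrr", "VPERMILP(D|S)Yri", "VSHUFP(D|S)Yrri", "VUNPCK(H|L)P(D|S)rr",
]

# Operand-form suffixes, longest first so "rri" is not mistaken for "ri".
FORMS = [
    ("rri", "register + immediate"),
    ("rmi", "load + immediate"),
    ("rr", "register <- register"),
    ("ri", "register + immediate"),
    ("rm", "load"),    # register destination, memory source
    ("mr", "store"),   # memory destination, register source
]


def expand(pattern):
    """Expand simple (A|B) alternations into every concrete opcode name."""
    # With one capture group, re.split alternates literal text and group bodies.
    parts = re.split(r"\(([^()]*)\)", pattern)
    choices = [[p] if i % 2 == 0 else p.split("|") for i, p in enumerate(parts)]
    return ["".join(c) for c in product(*choices)]


def operand_form(opcode):
    """Classify an opcode by its trailing operand-form suffix."""
    for suffix, form in FORMS:
        if opcode.endswith(suffix):
            return form
    return "unknown"


if __name__ == "__main__":
    for pat in PATTERNS:
        for op in expand(pat):
            print(f"{op:20} {operand_form(op)}")
```

Under this reading, only the `VMOVAP(D|S)rm` and `VMOVUP(D|S)Yrm` entries are loads.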
================
Comment at: test/CodeGen/X86/recip-fastmath.ll:344
; BTVER2-NEXT: vrcpps %xmm0, %xmm1 # sched: [2:1.00]
+; BTVER2-NEXT: vmovaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [1:1.00]
; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [2:1.00]
----------------
RKSimon wrote:
> Latency should be 5cy
Why? In fact, we should have a throughput (tp) of 0.5 for the XMM form (see below). I'll fix it.
Instruction  Operands    ISA  Macro-ops  Pipes    Latency  Throughput
VMOVAPD      xmm1, xmm2  AVX  1          FPA|FPM  1        0.5
VMOVAPD      ymm1, ymm2  AVX  2          FPA|FPM  1        1
VMOVAPS      xmm1, xmm2  AVX  1          FPA|FPM  1        0.5
VMOVAPS      ymm1, ymm2  AVX  2          FPA|FPM  1        1
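Reading the table's columns as macro-ops and eligible pipes, a minimal model that divides the macro-op count by the number of pipes reproduces the listed throughput: 0.5 for the XMM forms and 1 for the YMM forms. The Python sketch below encodes the four register-to-register rows, computes that value, and parses and prints the `sched: [latency:throughput]` annotation used in the BTVER2 check lines. The one-macro-op-per-pipe-per-cycle issue model and all helper names are assumptions for illustration, not the BtVer2 scheduling model itself.

```python
import re
from dataclasses import dataclass


@dataclass
class Row:
    mnemonic: str
    operands: str
    macro_ops: int
    pipes: tuple
    latency: int


# Register-to-register rows from the table above.
ROWS = [
    Row("VMOVAPD", "xmm1, xmm2", 1, ("FPA", "FPM"), 1),
    Row("VMOVAPD", "ymm1, ymm2", 2, ("FPA", "FPM"), 1),
    Row("VMOVAPS", "xmm1, xmm2", 1, ("FPA", "FPM"), 1),
    Row("VMOVAPS", "ymm1, ymm2", 2, ("FPA", "FPM"), 1),
]

SCHED = re.compile(r"sched: \[(\d+):(\d+\.\d+)\]")


def reciprocal_throughput(row):
    # Assumption: each macro-op can issue to any listed pipe,
    # and each pipe accepts one macro-op per cycle.
    return row.macro_ops / len(row.pipes)


def parse_sched(line):
    """Return (latency, throughput) from a 'sched: [L:T]' annotation."""
    m = SCHED.search(line)
    if m is None:
        raise ValueError(f"no sched annotation in: {line!r}")
    return int(m.group(1)), float(m.group(2))


def format_sched(latency, throughput):
    """Render an annotation in the same '[L:T.TT]' style as the check lines."""
    return f"sched: [{latency}:{throughput:.2f}]"


if __name__ == "__main__":
    for row in ROWS:
        print(row.mnemonic, row.operands,
              format_sched(row.latency, reciprocal_throughput(row)))
```

For example, the XMM rows come out as `sched: [1:0.50]`, whereas the check line quoted above currently carries `[1:1.00]`.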
https://reviews.llvm.org/D33099