[PATCH] D33099: AMD Jaguar scheduler doesn't correctly model 256-bit AVX instructions

Andrew V. Tischenko via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 8 04:41:40 PDT 2017


avt77 added inline comments.


================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:450
+                  "VBROADCASTF128", "VBROADCASTSSrr", "VINSERTF128rr",
+                  "VMOVAP(D|S)rm", "VMOVDDUPYrr", "VMOVS(H|L)DUPYrr", "VMOVUP(D|S)Yrm",
+                  "VORP(S|D)Yrr", "VPERMILP(D|S)Yri", "VSHUFP(D|S)Yrri", "VUNPCK(H|L)P(D|S)rr",
----------------
RKSimon wrote:
> "VMOVAP(D|S)rm" etc. are memory loads - they should be in the Ld version
From my point of view, the rm version stores a register value into memory, while the mr version loads the value from memory into the register. Am I right?


================
Comment at: test/CodeGen/X86/recip-fastmath.ll:344
 ; BTVER2-NEXT:    vrcpps %xmm0, %xmm1 # sched: [2:1.00]
+; BTVER2-NEXT:    vmovaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [1:1.00]
 ; BTVER2-NEXT:    vmulps %xmm1, %xmm0, %xmm0 # sched: [2:1.00]
----------------
RKSimon wrote:
> Latency should be 5cy
Why? In fact, we should have a throughput of 0.5 for XMM (see below). I'll fix it.

VMOVAPD  xmm1  xmm2  AVX  1  FPA|FPM  1  0,5
VMOVAPD  ymm1  ymm2  AVX  2  FPA|FPM  1  1
VMOVAPS  xmm1  xmm2  AVX  1  FPA|FPM  1  0,5
VMOVAPS  ymm1  ymm2  AVX  2  FPA|FPM  1  1
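
For illustration only (this is not part of the patch), here is a minimal Python sketch that turns the rows above into the `sched: [latency:throughput]` form used in the test checks. It assumes that the last two columns are latency and throughput, and that the decimal comma should be read as a decimal point. The names `Timing`, `parse_row` and `sched_comment` are hypothetical helpers made up for this example.

```python
from typing import NamedTuple


class Timing(NamedTuple):
    mnemonic: str
    width: str        # "xmm" or "ymm", taken from the first operand
    latency: int
    throughput: float


def parse_row(line: str) -> Timing:
    """Parse one whitespace- or tab-separated row of the table.

    Assumption: the last two fields are latency and throughput,
    and the throughput may use a decimal comma ("0,5").
    """
    fields = line.split()
    mnemonic, first_operand = fields[0], fields[1]
    latency = int(fields[-2])
    throughput = float(fields[-1].replace(",", "."))
    return Timing(mnemonic, first_operand[:3], latency, throughput)


def sched_comment(t: Timing) -> str:
    """Format a timing the way the test checks print it."""
    return f"sched: [{t.latency}:{t.throughput:.2f}]"


ROWS = """
VMOVAPD xmm1 xmm2 AVX 1 FPA|FPM 1 0,5
VMOVAPD ymm1 ymm2 AVX 2 FPA|FPM 1 1
VMOVAPS xmm1 xmm2 AVX 1 FPA|FPM 1 0,5
VMOVAPS ymm1 ymm2 AVX 2 FPA|FPM 1 1
"""

timings = [parse_row(row) for row in ROWS.strip().splitlines()]

if __name__ == "__main__":
    for t in timings:
        print(t.mnemonic, t.width, sched_comment(t))
```

Under those assumptions, the XMM rows come out as `[1:0.50]` and the YMM rows as `[1:1.00]`, which is the difference between the current check line and the value I intend to use for XMM.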


https://reviews.llvm.org/D33099




