<div dir="ltr">Fusing an fmul and an fadd into an FMA is not allowed by default.<br><a href="http://llvm.org/docs/LangRef.html#floating-point-environment">http://llvm.org/docs/LangRef.html#floating-point-environment</a><br><br>The minimum requirements are 'contract' on the fadd and an FMA-capable target; 'reassoc' also works, but it may enable other (possibly unintended) transforms.<br><a href="https://godbolt.org/z/-k6G2h">https://godbolt.org/z/-k6G2h</a><br><br>define float @fma(float %x, float %y, float %z) {<br> %m = fmul float %x, %y<br> %a = fadd contract float %m, %z<br> ret float %a<br>}</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Sep 2, 2019 at 7:29 AM Roman Lebedev via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">It appears you need 'reassoc' on fmul/fadd:<br>
<a href="https://godbolt.org/z/nuTzx2" rel="noreferrer" target="_blank">https://godbolt.org/z/nuTzx2</a><br>
<br>
On Mon, Sep 2, 2019 at 2:20 PM Uday Kumar Reddy B via llvm-dev<br>
<<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br>
><br>
> Hello,<br>
><br>
> On the appended reasonably simple test case that has an fmul/fadd<br>
> sequence on <8 x float> vector types, I don't see the x86-64 code<br>
> generator (with cpu set to haswell or later types) turning it into<br>
> AVX2 FMA instructions. Here's the snippet in the output it generates:<br>
><br>
> $ llc -O3 -mcpu=skylake<br>
><br>
> ---------------------<br>
> .LBB0_2: # =>This Inner Loop Header: Depth=1<br>
> vbroadcastss (%rsi,%rdx,4), %ymm0<br>
> vmulps (%rdi,%rcx), %ymm0, %ymm0<br>
> vaddps (%rax,%rcx), %ymm0, %ymm0<br>
> vmovups %ymm0, (%rax,%rcx)<br>
> incq %rdx<br>
> addq $32, %rcx<br>
> cmpq $15, %rdx<br>
> jle .LBB0_2<br>
> -----------------------<br>
><br>
> $ llc --version<br>
> LLVM (<a href="http://llvm.org/" rel="noreferrer" target="_blank">http://llvm.org/</a>):<br>
> LLVM version 8.0.0<br>
> Optimized build.<br>
> Default target: x86_64-unknown-linux-gnu<br>
> Host CPU: skylake<br>
> (llvm commit 198009ae8db11d7c0b0517f17358870dc486fcfb from Aug 31)<br>
><br>
> Using opt -O3 followed by llc leads to the same vmulps / vaddps<br>
> sequence. (Adding -mattr=fma doesn't help, although I assume it<br>
> isn't needed given the CPU type.) The result is the same even with<br>
> -mcpu=haswell.<br>
><br>
> This is a common pattern involved in a reduction with two things on<br>
> the RHS. The three things in play here are (%rax,%rcx), (%rdi,%rcx),<br>
> and %ymm0. If another register is used to hold a loaded value, the<br>
> vfmadd instruction could be used in multiple ways. I suspect I'm<br>
> missing something, which is why I'm not already posting this on<br>
> llvm-bugs. Is this expected behavior?<br>
><br>
> -------------------------------------------------------------------------------------------<br>
> ; ModuleID = 'LLVMDialectModule'<br>
> source_filename = "LLVMDialectModule"<br>
><br>
> declare i8* @malloc(i64)<br>
><br>
> declare void @free(i8*)<br>
><br>
> define <8 x float>* @fma(<8 x float>* %0, float* %1, <8 x float>* %2) {<br>
> br label %4<br>
><br>
> 4: ; preds = %7, %3<br>
> %5 = phi i64 [ %19, %7 ], [ 0, %3 ]<br>
> %6 = icmp slt i64 %5, 16<br>
> br i1 %6, label %7, label %20<br>
><br>
> 7: ; preds = %4<br>
> %8 = getelementptr <8 x float>, <8 x float>* %0, i64 %5<br>
> %9 = load <8 x float>, <8 x float>* %8, align 16<br>
> %10 = getelementptr float, float* %1, i64 %5<br>
> %11 = load float, float* %10, align 16<br>
> %12 = getelementptr <8 x float>, <8 x float>* %2, i64 %5<br>
> %13 = load <8 x float>, <8 x float>* %12, align 16<br>
> %14 = insertelement <8 x float> undef, float %11, i32 0<br>
> %15 = shufflevector <8 x float> %14, <8 x float> undef, <8 x i32> zeroinitializer<br>
> %16 = fmul <8 x float> %15, %9<br>
> %17 = fadd <8 x float> %16, %13<br>
> %18 = getelementptr <8 x float>, <8 x float>* %2, i64 %5<br>
> store <8 x float> %17, <8 x float>* %18, align 16<br>
> %19 = add i64 %5, 1<br>
> br label %4<br>
><br>
> 20: ; preds = %4<br>
> ret <8 x float>* %2<br>
> }<br>
<br>
Roman<br>
<br>
> -------------------------------------------------------------------------------------------------------<br>
> _______________________________________________<br>
> LLVM Developers mailing list<br>
> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
</blockquote></div>
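<div dir="ltr">A note on the first point in the reply at the top of this message (fusing is not allowed by default): a fused multiply-add rounds once, after the add, while a separate fmul and fadd round the product first, so the two forms can return different values. The C sketch below is only an illustration under stated assumptions. The function names are hypothetical, and the second function is a stand-in for a fused operation rather than a real FMA: it relies on a float*float product being exact in double.</div>

```c
/* Unfused form: the product is rounded to float before the add,
   mirroring a separate fmul followed by fadd.
   The volatile store forces that intermediate rounding and keeps the
   compiler from contracting the two operations by itself. */
static float mul_then_add(float x, float y, float z) {
    volatile float m = x * y;
    return m + z;
}

/* Stand-in for the fused form (an assumption of this sketch, not a
   real FMA): a float*float product is exact in double, so the product
   is not rounded to float before the add. */
static float exact_product_then_add(float x, float y, float z) {
    double p = (double)x * (double)y; /* exact: 24 + 24 bits fit in 53 */
    return (float)(p + (double)z);
}
```

<div dir="ltr">With x = y = 1 + 2^-12 (1.000244140625f) and z = -(1 + 2^-11) (-1.00048828125f), the exact product is 1 + 2^-11 + 2^-24. Rounding it to float drops the 2^-24 term (a tie, rounded to even), so mul_then_add returns 0, while exact_product_then_add returns 2^-24. Since fusing changes the result, the optimizer needs permission such as 'contract' before it may fuse.</div>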