Folding nodes in instruction selection - please review
Jakob Stoklund Olesen
stoklund at 2pi.dk
Wed Jul 10 11:51:06 PDT 2013
On Jul 8, 2013, at 4:47 AM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
> Hi,
>
> I analyzed the DAG node folding algorithm in LLVM, compared its output with the code generated by the Intel compiler, and came to the conclusion that the LLVM code is not always optimal.
> This is an example:
>
> %b1 = fadd <8 x float> %a1, <float AAA, float AAA, float AAA, float AAA, float AAA, float AAA, float AAA, float AAA >
> %b2 = fadd <8 x float> %a2, <float AAA, float AAA, float AAA, float AAA, float AAA, float AAA, float AAA, float AAA >
> %c = fmul <8 x float> %b1, %b2
>
> The result (1) below is not better than (2), because loading the constant is not a problem, but spilling %ymm2, which may be required in (1), is not cheap.
>
> (1)
> vmovaps .LCPI1_0(%rip), %ymm2
> vaddps %ymm2, %ymm1, %ymm1
> vaddps %ymm2, %ymm0, %ymm0
> vmulps %ymm1, %ymm0, %ymm0
>
> (2)
> vaddps .LCPI1_0(%rip), %ymm1, %ymm1
> vaddps .LCPI1_0(%rip), %ymm0, %ymm0
> vmulps %ymm1, %ymm0, %ymm0
Hi Elena,
(2) has more micro-ops in the load/store unit, so I don’t believe that it is always better. The instruction selector can’t make this decision without knowing the register pressure.
The register allocator should turn (1) into (2) when it runs out of registers; it shouldn’t spill %ymm2. Make sure that the MI::canFoldAsLoad() property is working correctly. See InlineSpiller.cpp:
    // Before rematerializing into a register for a single instruction, try to
    // fold a load into the instruction. That avoids allocating a new register.
    if (RM.OrigMI->canFoldAsLoad() &&
        foldMemoryOperand(Ops, RM.OrigMI)) {
      Edit->markRematerialized(RM.ParentVNI);
      ++NumFoldedLoads;
      return true;
    }
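
To illustrate the transformation from (1) to (2), here is a toy model in standalone C++. It is only a sketch and not LLVM code: ToyInstr, its CanFoldAsLoad field (standing in for the canFoldAsLoad() property) and foldLoadIntoUsers are hypothetical names made up for this example, and operands are plain strings in the AT&T order used above. It also ignores the real constraints on which operand of an instruction may be a memory reference.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy instruction, not an LLVM class.
struct ToyInstr {
  std::string Opcode;
  std::vector<std::string> Operands; // register names or memory references
  bool CanFoldAsLoad;                // stands in for the canFoldAsLoad() property
};

// Given a load of the form "op MEM, REG", rewrite each user that reads REG so
// that it reads MEM directly. At most one operand per user is replaced.
// Returns the number of users that were rewritten.
int foldLoadIntoUsers(const ToyInstr &Load, std::vector<ToyInstr> &Users) {
  if (!Load.CanFoldAsLoad || Load.Operands.size() < 2)
    return 0;
  const std::string &Mem = Load.Operands[0]; // memory source of the load
  const std::string &Reg = Load.Operands[1]; // register defined by the load
  int NumFolded = 0;
  for (ToyInstr &User : Users) {
    for (std::string &Op : User.Operands) {
      if (Op == Reg) {
        Op = Mem; // the user now reads the constant from memory
        ++NumFolded;
        break;
      }
    }
  }
  return NumFolded;
}
```

Applied to the three arithmetic instructions of (1) with the vmovaps as the load, this rewrites the two vaddps instructions to read .LCPI1_0(%rip) directly and leaves the vmulps alone, which is the shape of (2). The vmovaps and %ymm2 are then no longer needed.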
Thanks,
/jakob