[llvm-commits] [RFC/PATCH] introduce 'UseSSEx' predicates

Sun Aug 26 03:21:28 PDT 2012

On Fri, 2012-08-24 at 19:23 -0700, Eli Friedman wrote:
> On Fri, Aug 24, 2012 at 6:55 PM, Michael Liao <michael.liao at intel.com> wrote:
> > On Fri, 2012-08-24 at 18:14 -0700, Eli Friedman wrote:
> >> On Wed, Aug 22, 2012 at 10:27 AM, Michael Liao <michael.liao at intel.com> wrote:
> >> > Hi
> >> >
> >> > The attached patch adds several new predicates, namely UseSSE1, UseSSE2,
> >> > UseSSE3, UseSSSE3, UseSSE41, and UseSSE42.
> >> >
> >> > Because of the penalty for intermixing SSE and AVX instructions, we need
> >> > to prevent legacy SSE insns from being generated unless they are
> >> > explicitly requested through intrinsics. For patterns supported by both
> >> > SSE and AVX, so far we have forced the AVX insn to be tried first by
> >> > relying on AddedComplexity or td location. That is error-prone and can
> >> > introduce bugs accidentally.
> >> >
> >> > 'UseSSEx' is disabled when AVX is turned on. For insns supported by both
> >> > AVX and SSE, we need this predicate to force VEX encoding only.
> >> >
> >> > For insns not inherited by AVX, we still use the previous predicates,
> >> > i.e. 'HasSSEx'. So far, these insns fall into the following categories:
> >> >   * SSE insns with MMX operands
> >> >   * SSE insns with GPR/mem operands only (xFENCE, PREFETCH, CLFLUSH,
> >> >     CRC, etc.)
> >> >   * SSE4A insns.
> >> >   * MMX insns.
> >> >   * x87 insns added by SSE.
> >> >
> >> > With this patch, several inter-mixing cases are found and fixed from
> >> > regression tests.
> >>
> >> --- a/test/ExecutionEngine/MCJIT/test-common-symbols.ll
> >> +++ b/test/ExecutionEngine/MCJIT/test-common-symbols.ll
> >> @@ -1,4 +1,4 @@
> >> -; RUN: %lli -use-mcjit -O0 -disable-lazy-compilation=false %s
> >> +; RUN: %lli -use-mcjit -disable-lazy-compilation=false %s
> >>
> >>  ; The intention of this test is to verify that symbols mapped to COMMON in ELF
> >>  ; work as expected.
> >>
> >> What is the point of this change?
> >
> > So far, MCJIT cannot support the creation of a '.rodata' section. This is
> > a workaround. I will file a bug for MCJIT.
> 
> Oh, I see...
> 
> How hard would it be to fix fast-isel so it can select vcvtsi2sdq?

fast-isel only looks at patterns from Instruction definitions, and we have
no way to specify a source operand as implicitly defined there. It seems to
me that supporting this would take non-trivial effort.

However, here is a somewhat hacky way to enable it: add a unary version of
vcvtsi2sdq, e.g.

let isCodeGenOnly = 1 in {
  // Register-register form, so MRMSrcReg rather than MRMSrcMem.
  def Unary_VCVTSI2SD64rr : SI<0x2A, MRMSrcReg, (outs FR64:$dst),
                               (ins GR64:$src),
                               "vcvtsi2sd{q}\t{$src, $dst|$dst, $src}",
                               [(set FR64:$dst, (sint_to_fp GR64:$src))]>;
}

It's OK to omit the middle operand, as it is encoded in VEX.vvvv, which is
1111 (i.e. register 0, since the field is stored inverted) by default. So
the physical insn will always have %xmm0 as the middle operand, i.e.

    vcvtsi2sdq %rN, %xmm0, %xmmN
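
To make the encoding point concrete, here is a minimal Python sketch of how
the inverted VEX.vvvv field maps 1111 to register 0. The function names are
made up for illustration, and the W=1, L=0, pp=11 (F2) values used for
vcvtsi2sdq are my assumption about the prefix fields, not something taken
from the patch.

```python
def encode_vvvv(reg: int) -> int:
    """Return the 4-bit VEX.vvvv field for register number `reg` (0..15).

    The field holds the one's complement of the register number, so
    register 0 is encoded as 0b1111.
    """
    if not 0 <= reg <= 15:
        raise ValueError("register number must be in 0..15")
    return ~reg & 0xF


def vex3_last_byte(w: int, reg: int, l: int, pp: int) -> int:
    """Pack the last byte of a 3-byte VEX prefix as W | vvvv | L | pp."""
    return (w & 1) << 7 | encode_vvvv(reg) << 3 | (l & 1) << 2 | (pp & 0b11)


# Hypothetical usage: the default (all-ones) vvvv field names register 0.
default_vvvv = encode_vvvv(0)
prefix_byte = vex3_last_byte(w=1, reg=0, l=0, pp=0b11)
```

Leaving the field at all ones is therefore the same as naming %xmm0
explicitly, which is why the unary definition still encodes a valid insn.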

Semantically, there is nothing wrong with this (for a scalar conversion).
But on the performance side, the dependency on %xmm0 may be an issue,
depending on where %xmm0 was last updated.

A possible fix is to lower it into the 3-operand form post-RA and select the
physical register whose last update is farthest away.
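
As a rough illustration of that selection rule, here is a toy Python model,
not LLVM code. The function name and the representation of the preceding
instructions as a list of written-register sets are assumptions made only
for this sketch.

```python
from typing import Iterable, List, Set


def pick_farthest_updated(defs_before: List[Set[str]],
                          candidates: Iterable[str]) -> str:
    """Pick the candidate register whose most recent write is farthest back.

    defs_before[i] is the set of registers written by the i-th instruction
    preceding the conversion, in program order (last entry = nearest).
    A register that is never written in the window counts as farthest.
    """
    n = len(defs_before)

    def distance(reg: str) -> int:
        # Walk backwards from the nearest instruction to find the last write.
        for back, defs in enumerate(reversed(defs_before), start=1):
            if reg in defs:
                return back
        return n + 1  # no write seen in the window

    # max() keeps the first candidate on ties, so candidate order breaks ties.
    return max(candidates, key=distance)


# Hypothetical window: xmm1 written first, then xmm0, then xmm2 (nearest).
window = [{"xmm1"}, {"xmm0"}, {"xmm2"}]
chosen = pick_farthest_updated(window, ["xmm0", "xmm1", "xmm2"])
```

In this window xmm1 is chosen, since its last write is three instructions
back, so the conversion is least likely to wait on it.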

Without such a post-RA fix, this pattern should only be selected when
fast-isel is enabled, which requires a new predicate in td that checks the
TM options.

Yours
- Michael

> 
> -Eli