[all-commits] [llvm/llvm-project] 21a4fa: [ARM] Move double vector insert patterns using vin...

Mon Feb 22 01:30:22 PST 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 21a4faab60c34b8a8c4d09a5ffac50ded8163208
      https://github.com/llvm/llvm-project/commit/21a4faab60c34b8a8c4d09a5ffac50ded8163208
  Author: David Green <david.green at arm.com>
  Date:   2021-02-22 (Mon, 22 Feb 2021)

  Changed paths:
    M llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
    M llvm/lib/Target/ARM/ARMInstrMVE.td
    M llvm/lib/Target/ARM/ARMInstrVFP.td
    M llvm/test/CodeGen/Thumb2/LowOverheadLoops/fast-fp-loops.ll
    M llvm/test/CodeGen/Thumb2/mve-div-expand.ll
    M llvm/test/CodeGen/Thumb2/mve-float16regloops.ll
    M llvm/test/CodeGen/Thumb2/mve-gather-ind16-scaled.ll
    M llvm/test/CodeGen/Thumb2/mve-gather-ptrs.ll
    M llvm/test/CodeGen/Thumb2/mve-masked-ldst.ll
    M llvm/test/CodeGen/Thumb2/mve-minmax.ll
    M llvm/test/CodeGen/Thumb2/mve-shuffle.ll
    M llvm/test/CodeGen/Thumb2/mve-shufflemov.ll
    M llvm/test/CodeGen/Thumb2/mve-simple-arith.ll
    M llvm/test/CodeGen/Thumb2/mve-vcvt.ll
    M llvm/test/CodeGen/Thumb2/mve-vld2.ll
    M llvm/test/CodeGen/Thumb2/mve-vld3.ll
    M llvm/test/CodeGen/Thumb2/mve-vld4.ll
    M llvm/test/CodeGen/Thumb2/mve-vldst4.ll
    M llvm/test/CodeGen/Thumb2/mve-vst2.ll
    M llvm/test/CodeGen/Thumb2/mve-vst3.ll
    M llvm/test/CodeGen/Thumb2/mve-vst4.ll

  Log Message:
  -----------
  [ARM] Move double vector insert patterns using vins to DAG combine

This removes the existing patterns for inserting two lanes into an
f16/i16 vector register using VINS, instead using a DAG combine to
pattern match the same code sequences. The tablegen patterns were
already on the large side (foreach LANE = [0, 2, 4, 6]) and were not
handling all the cases they could. Moving that to a DAG combine, whilst
not less code, allows us to better control and expand the selection of
VINSs. Additionally this allows us to remove the AddedComplexity on
VCVTT.

The extra trick that this has learned in the process is to move two
adjacent lanes using a single f32 vmov, allowing some extra
inefficiencies to be removed.

Differenial Revision: https://reviews.llvm.org/D96876