[all-commits] [llvm/llvm-project] 338314: [ARM] Lower v16i8 -> i64 VMLA reductions.

Wed Jul 14 10:11:53 PDT 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 338314f9c26d4594d49fdd3a7656d71c77255c54
      https://github.com/llvm/llvm-project/commit/338314f9c26d4594d49fdd3a7656d71c77255c54
  Author: David Green <david.green at arm.com>
  Date:   2021-07-14 (Wed, 14 Jul 2021)

  Changed paths:
    M llvm/lib/Target/ARM/ARMISelLowering.cpp
    M llvm/test/CodeGen/Thumb2/mve-vecreduce-mla.ll

  Log Message:
  -----------
  [ARM] Lower v16i8 -> i64 VMLA reductions.

MVE does not have a VMLALV instruction that can perform v16i8 -> i64
reductions, like it does for v8i16->i64 and v4i32->i64 reductions. That
means that the pattern to create them will be spilt up by type
legalization, creating a lot of instructions.

This extends the patterns for matching i64 reductions a little to handle
the v16i8->i64 case. We need to turn them into a pair of v8i16->i64
VMLALVs that each perform half of the reduction and are summed together
(so the later is a VMLALVA). The order of the lanes does not matter for
the reduction so we generate a MVEEXT for the extension, that will
either be folded into a extending load or can be optimized to a
VREV/VMOVL. Some of the resulting codegen isn't optimal, but will be
improved in a later patch.

Differential Revision: https://reviews.llvm.org/D105680