[all-commits] [llvm/llvm-project] 8f0ba6: [X86] Add X64 test coverage to smul-with-overflow.ll

Fri Jul 22 09:36:14 PDT 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 8f0ba6c40527f705adebf43d1e1eb14713b912dd
      https://github.com/llvm/llvm-project/commit/8f0ba6c40527f705adebf43d1e1eb14713b912dd
  Author: Simon Pilgrim <llvm-dev at redking.me.uk>
  Date:   2022-07-22 (Fri, 22 Jul 2022)

  Changed paths:
    M llvm/test/CodeGen/X86/smul-with-overflow.ll

  Log Message:
  -----------
  [X86] Add X64 test coverage to smul-with-overflow.ll

  Commit: 939cf9b1bea4b5daee1d1b63860b0e958703656f
      https://github.com/llvm/llvm-project/commit/939cf9b1bea4b5daee1d1b63860b0e958703656f
  Author: Simon Pilgrim <llvm-dev at redking.me.uk>
  Date:   2022-07-22 (Fri, 22 Jul 2022)

  Changed paths:
    M llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
    M llvm/lib/Target/AArch64/AArch64ISelLowering.h
    M llvm/test/CodeGen/AArch64/parity.ll

  Log Message:
  -----------
  [AArch64] Use neon instructions for i64/i128 ISD::PARITY calculation

As noticed on D129765 and reported in Issue #56531, AArch64 targets can use the NEON ctpop + add-reduce instructions to speed up scalar ctpop, but we fail to do this for parity calculations.
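The two strategies can be sketched in portable C. This is a minimal illustration under stated assumptions, not the AArch64 lowering itself, and both function names are hypothetical. `parity64_ctpop_reduce` mimics the NEON idea: count bits per byte (as CNT does on a vector of 8 bytes), add the eight byte counts together (as the ADDV add-reduce does), and keep bit 0. `parity64_xor_fold` shows the usual scalar expansion, which folds the value onto itself with shifts and XORs until the parity of all 64 bits sits in bit 0.

```c
#include <stdint.h>

/* Hypothetical helper: popcount of a single byte (stands in for one CNT lane). */
static unsigned byte_popcount(uint8_t b) {
    unsigned n = 0;
    while (b) {
        b &= (uint8_t)(b - 1); /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Sketch of the ctpop + add-reduce approach: per-byte counts, horizontal
   sum of the eight counts, then keep only the low bit. */
unsigned parity64_ctpop_reduce(uint64_t x) {
    unsigned sum = 0;
    for (int i = 0; i < 8; i++)
        sum += byte_popcount((uint8_t)(x >> (8 * i)));
    return sum & 1u;
}

/* Sketch of the generic scalar parity expansion: XOR-fold halves until
   bit 0 holds the XOR of all 64 bits. */
unsigned parity64_xor_fold(uint64_t x) {
    x ^= x >> 32;
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return (unsigned)(x & 1u);
}
```

Both functions return the same result for every input; the difference the log message discusses is only in how many machine instructions each shape needs on AArch64.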

I'm not sure where the cutoff should be for specific CPUs, but i64 (plus the i128 special case) shows a definite reduction in instruction count. i32 is about the same (though scalar <-> NEON transfers are probably more costly?), and promoting sub-i32 types looks like a definite regression compared to the parity expansion optimized for those widths.
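Because parity is the XOR of all bits, the parity of a 128-bit value equals the parity of its two 64-bit halves XORed together. The sketch below uses that identity to reduce an i128 parity to a single 64-bit parity. It is only an illustration of the arithmetic: `parity128` is a hypothetical helper, and this is not necessarily how the patch lowers the i128 case.

```c
#include <stdint.h>

/* Hypothetical helper: parity of a 128-bit value passed as two 64-bit halves. */
unsigned parity128(uint64_t lo, uint64_t hi) {
    uint64_t x = lo ^ hi; /* XOR of the halves preserves total parity */
    x ^= x >> 32;         /* then fold the remaining 64 bits down to bit 0 */
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return (unsigned)(x & 1u);
}
```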

Differential Revision: https://reviews.llvm.org/D130246

Compare: https://github.com/llvm/llvm-project/compare/7b81a81d5f9c...939cf9b1bea4