[all-commits] [llvm/llvm-project] 7457f5: [X86] Fold VPERMV3(X, M, Y) -> VPERMV(CONCAT(X, Y), WI...

Mon Jan 13 06:14:18 PST 2025

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 7457f51f6cf61b960e3e6e45e63378debd5c1d5c
      https://github.com/llvm/llvm-project/commit/7457f51f6cf61b960e3e6e45e63378debd5c1d5c
  Author: Simon Pilgrim <llvm-dev at redking.me.uk>
  Date:   2025-01-13 (Mon, 13 Jan 2025)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast_from_memory.ll
    M llvm/test/CodeGen/X86/avx512-shuffles/partial_permute.ll
    M llvm/test/CodeGen/X86/pr97968.ll
    M llvm/test/CodeGen/X86/shuffle-strided-with-offset-512.ll
    M llvm/test/CodeGen/X86/shuffle-vs-trunc-512.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-3.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-5.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-6.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-7.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-8.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-2.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-3.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-4.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-5.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-6.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-7.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-8.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-2.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-6.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-7.ll
    M llvm/test/CodeGen/X86/zero_extend_vector_inreg_of_broadcast_from_memory.ll

  Log Message:
  -----------
  [X86] Fold VPERMV3(X,M,Y) -> VPERMV(CONCAT(X,Y),WIDEN(M)) iff the CONCAT is free (#122485)

This extends the existing fold which concatenates X and Y if they are sequential subvectors extracted from the same source.

By using combineConcatVectorOps we can recognise other patterns where X and Y can be concatenated for free (e.g. sequential loads, concatenating repeated instructions etc.), which allows the VPERMV3 fold to be a lot more aggressive.

This required combineConcatVectorOps to be extended to fold the additional case of "concat(extract_subvector(x,lo), extract_subvector(x,hi)) -> extract_subvector(x)", similar to the original VPERMV3 fold where "x" was larger than the concat result type.

This also exposes more cases where we have repeated vector/subvector loads if they have multiple uses - e.g. where we're loading a ymm and the lo/hi xmm pairs independently - in the past we've always considered this to be relatively benign, but I'm not certain if we should now do more to keep these from splitting?

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications