[PATCH] D56506: [X86][SSE] Allow SplitOpsAndApply to split to lowest common vector size

Craig Topper via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 9 13:01:14 PST 2019


craig.topper added a comment.

PMULDQ/PMULUDQ interacts poorly with the fact that we convert zext/sext to zext_vector_inreg/sext_vector_inreg before type legalization. We split the PMULDQ/PMULUDQ when we create them, and then SimplifyDemandedBits can't optimize the zext/sext to aext because the splitting messed up the use count. The zext/sext then becomes a split zext_invec/sext_invec, but SimplifyDemandedBits won't turn those into aext_invec. So it's really a gross ordering problem that probably goes away with -x86-experimental-vector-widening-legalization, since we won't eagerly create zext_invec/sext_invec ops.

For this AVG case, I've considered whether we could emit a v48i8 pavg and let type legalization custom widen it to v64i8 using undef and then split it. I think that requires us to use a (v64i8 (insert_subvector undef, (v48i8 X))) to widen the inputs in custom legalization. Generic legalization would then need support for legalizing the v64i8 insert_subvector with a v48i8 input. Once it's widened and split, we should have one v64i8 pavg, two v32i8 pavg, or four v16i8 pavg, depending on the target.
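To make the arithmetic of that widen-then-split idea concrete, here is a small standalone sketch. It is not LLVM code: widenAndSplit, roundUpToPowerOf2, and Piece are hypothetical names, and the model assumes the widened type is the next power of two and that the target's widest legal i8 vector is a power of two.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of one piece after widening and splitting.
struct Piece {
  unsigned NumElts;     // lanes in this piece
  unsigned DefinedElts; // lanes carrying real data; the rest are undef
};

// Smallest power of two that is >= N.
unsigned roundUpToPowerOf2(unsigned N) {
  unsigned P = 1;
  while (P < N)
    P <<= 1;
  return P;
}

// Widen NumElts lanes to a power of two (padding with undef), then split
// into equal pieces no wider than MaxLegalElts.
std::vector<Piece> widenAndSplit(unsigned NumElts, unsigned MaxLegalElts) {
  assert(MaxLegalElts != 0 && (MaxLegalElts & (MaxLegalElts - 1)) == 0 &&
         "this sketch assumes a power-of-two legal width");
  unsigned Widened = roundUpToPowerOf2(NumElts);
  unsigned PieceElts = std::min(Widened, MaxLegalElts);
  std::vector<Piece> Pieces;
  for (unsigned Start = 0; Start < Widened; Start += PieceElts) {
    unsigned Defined = 0;
    if (Start < NumElts)
      Defined = std::min(PieceElts, NumElts - Start);
    Pieces.push_back({PieceElts, Defined});
  }
  return Pieces;
}
```

For 48 lanes this yields one 64-lane piece, two 32-lane pieces, or four 16-lane pieces for legal widths of 64, 32, and 16 respectively. With a legal width of 16, the last piece has no defined lanes at all.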

Another idea is that we could teach custom type legalization to split the v48i8 avg as it widens, into v16i8 undef, v16i8 avg, and v32i8 avg for avx2.
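One way to read that second idea is as a greedy decomposition of the defined lanes into power-of-two chunks, with the undef remainder needing no operation. The sketch below is only an illustration of that reading, not LLVM code; splitDefinedLanes and floorPowerOf2 are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Largest power of two that is <= N (N must be nonzero).
unsigned floorPowerOf2(unsigned N) {
  unsigned P = 1;
  while (P <= N / 2)
    P <<= 1;
  return P;
}

// Cover NumElts defined lanes with power-of-two chunks, largest first,
// none wider than MaxLegalElts. Undef padding lanes get no chunk.
std::vector<unsigned> splitDefinedLanes(unsigned NumElts,
                                        unsigned MaxLegalElts) {
  assert(MaxLegalElts != 0 && (MaxLegalElts & (MaxLegalElts - 1)) == 0 &&
         "this sketch assumes a power-of-two legal width");
  std::vector<unsigned> Chunks;
  unsigned Remaining = NumElts;
  while (Remaining > 0) {
    unsigned Chunk = floorPowerOf2(std::min(Remaining, MaxLegalElts));
    Chunks.push_back(Chunk);
    Remaining -= Chunk;
  }
  return Chunks;
}
```

With 48 lanes and a 32-lane legal width (the avx2 case), this gives a 32-lane chunk followed by a 16-lane chunk, matching the v32i8 avg and v16i8 avg above, while the remaining v16i8 undef is simply never computed.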


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56506/new/

https://reviews.llvm.org/D56506




