[PATCH] D158195: [RISCV] Combine (vrot{l,r} vxi16, 8) -> vrev8

Fri Aug 18 10:47:24 PDT 2023

craig.topper added a comment.

In D158195#4598747 <https://reviews.llvm.org/D158195#4598747>, @luke wrote:

> In D158195#4596052 <https://reviews.llvm.org/D158195#4596052>, @craig.topper wrote:
>
>> If the rotate came in as a fshl/fshr intrinsic or as shl+shr+or would we already get vrev8 for fixed vectors? Is it only the shuffle case that is being optimized?
>
> Yeah, we already get vrev8 for these. DAGCombiner canonicalises them to bswap before they would be legalised to vl nodes:
>
>   define <4 x i16> @rot_via_fshr(<4 x i16> %a) {
>     %res = call <4 x i16> @llvm.fshr.v4i16(<4 x i16> %a, <4 x i16> %a, <4 x i16> <i16 8, i16 8, i16 8, i16 8>)
>     ret <4 x i16> %res
>   }
>   
>   declare <4 x i16> @llvm.fshr.v4i16(<4 x i16> %a, <4 x i16> %b, <4 x i16> %c)
>   
>   define <4 x i16> @rot_via_shift(<4 x i16> %a, <4 x i16> %amt) {
>     %1 = shl <4 x i16> %a, <i16 8, i16 8, i16 8, i16 8>
>     %2 = lshr <4 x i16> %a, <i16 8, i16 8, i16 8, i16 8>
>     %3 = or <4 x i16> %1, %2
>     ret <4 x i16> %3
>   }
>
>
>
>   === rot_via_fshr
>   Initial selection DAG: %bb.0 'rot_via_fshr:'
>   SelectionDAG has 13 nodes:
>     t0: ch,glue = EntryToken
>             t2: nxv2i16,ch = CopyFromReg t0, Register:nxv2i16 %0
>           t4: v4i16 = extract_subvector t2, Constant:i64<0>
>           t6: v4i16 = BUILD_VECTOR Constant:i16<8>, Constant:i16<8>, Constant:i16<8>, Constant:i16<8>
>         t7: v4i16 = rotr t4, t6
>       t9: nxv2i16 = insert_subvector undef:nxv2i16, t7, Constant:i64<0>
>     t11: ch,glue = CopyToReg t0, Register:nxv2i16 $v8, t9
>     t12: ch = RISCVISD::RET_GLUE t11, Register:nxv2i16 $v8, t11:1
>   
>   
>   Optimized lowered selection DAG: %bb.0 'rot_via_fshr:'
>   SelectionDAG has 11 nodes:
>     t0: ch,glue = EntryToken
>             t2: nxv2i16,ch = CopyFromReg t0, Register:nxv2i16 %0
>           t4: v4i16 = extract_subvector t2, Constant:i64<0>
>         t13: v4i16 = bswap t4
>       t9: nxv2i16 = insert_subvector undef:nxv2i16, t13, Constant:i64<0>
>     t11: ch,glue = CopyToReg t0, Register:nxv2i16 $v8, t9
>     t12: ch = RISCVISD::RET_GLUE t11, Register:nxv2i16 $v8, t11:1
>
>
>
>   === rot_via_shift
>   Initial selection DAG: %bb.0 'rot_via_shift:'
>   SelectionDAG has 18 nodes:
>     t0: ch,glue = EntryToken
>       t2: nxv2i16,ch = CopyFromReg t0, Register:nxv2i16 %0
>     t4: v4i16 = extract_subvector t2, Constant:i64<0>
>       t6: nxv2i16,ch = CopyFromReg t0, Register:nxv2i16 %1
>     t7: v4i16 = extract_subvector t6, Constant:i64<0>
>     t9: v4i16 = BUILD_VECTOR Constant:i16<8>, Constant:i16<8>, Constant:i16<8>, Constant:i16<8>
>           t10: v4i16 = shl t4, t9
>           t11: v4i16 = srl t4, t9
>         t12: v4i16 = or t10, t11
>       t14: nxv2i16 = insert_subvector undef:nxv2i16, t12, Constant:i64<0>
>     t16: ch,glue = CopyToReg t0, Register:nxv2i16 $v8, t14
>     t17: ch = RISCVISD::RET_GLUE t16, Register:nxv2i16 $v8, t16:1
>   
>   
>   Optimized lowered selection DAG: %bb.0 'rot_via_shift:'
>   SelectionDAG has 11 nodes:
>     t0: ch,glue = EntryToken
>             t2: nxv2i16,ch = CopyFromReg t0, Register:nxv2i16 %0
>           t4: v4i16 = extract_subvector t2, Constant:i64<0>
>         t19: v4i16 = bswap t4
>       t14: nxv2i16 = insert_subvector undef:nxv2i16, t19, Constant:i64<0>
>     t16: ch,glue = CopyToReg t0, Register:nxv2i16 $v8, t14
>     t17: ch = RISCVISD::RET_GLUE t16, Register:nxv2i16 $v8, t16:1
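
For reference, the rotr -> bswap step in the dumps above rests on a simple identity: rotating a 16-bit value by 8 in either direction swaps its two bytes. A minimal C++ sketch of that identity follows. The helper names (rotl16, rotr16, bswap16) are made up for illustration and are not LLVM APIs.

```cpp
#include <cstdint>

// Illustrative helpers only; none of these names come from LLVM.

// Rotate a 16-bit value left by n bits, for n in 1..15.
constexpr std::uint16_t rotl16(std::uint16_t x, unsigned n) {
  const std::uint32_t v = x;
  return static_cast<std::uint16_t>((v << n) | (v >> (16 - n)));
}

// Rotate a 16-bit value right by n bits, for n in 1..15.
constexpr std::uint16_t rotr16(std::uint16_t x, unsigned n) {
  const std::uint32_t v = x;
  return static_cast<std::uint16_t>((v >> n) | (v << (16 - n)));
}

// Swap the low and high bytes of a 16-bit value.
constexpr std::uint16_t bswap16(std::uint16_t x) {
  const std::uint32_t v = x;
  return static_cast<std::uint16_t>(((v & 0x00ffu) << 8) | ((v & 0xff00u) >> 8));
}

// For i16, a rotate by 8 in either direction is a byte swap.
static_assert(rotl16(0x1234, 8) == bswap16(0x1234), "rotl by 8 is bswap");
static_assert(rotr16(0x1234, 8) == bswap16(0x1234), "rotr by 8 is bswap");
```

This only holds when the rotate amount is exactly half the element width and the element is two bytes wide, which is why the title of the patch restricts itself to vxi16 and an amount of 8.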

How ugly would it be to do it as a special case during the shuffle lowering instead?
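
To make the question concrete, here is a rough sketch of the kind of check a shuffle-lowering special case might need. It assumes the shuffle is a single-source byte shuffle whose mask swaps the two bytes of each 16-bit element. The function name isByteSwap16Mask is hypothetical and works on a plain vector of indices, not on LLVM's actual shuffle node types.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not an LLVM API. Returns true if `mask`, read as a
// single-source shuffle over i8 lanes, swaps the two bytes of every 16-bit
// element, i.e. it has the form <1, 0, 3, 2, 5, 4, ...>.
// A negative index is treated as an undef lane and matches anything.
bool isByteSwap16Mask(const std::vector<int>& mask) {
  // Need a whole number of 16-bit elements.
  if (mask.empty() || mask.size() % 2 != 0)
    return false;
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (mask[i] < 0)
      continue; // undef lane
    // Lane i must read from its neighbour within the same byte pair.
    if (static_cast<std::size_t>(mask[i]) != (i ^ 1))
      return false;
  }
  return true;
}
```

For example, the mask <1, 0, 3, 2> would be accepted and the identity mask <0, 1, 2, 3> rejected. Whether a check like this is less ugly than combining the rotate node afterwards is exactly the open question.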

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D158195