[PATCH] D158195: [RISCV] Combine (vrot{l,r} vxi16, 8) -> vrev8

Fri Aug 18 12:50:10 PDT 2023

reames added a comment.

In D158195#4599594 <https://reviews.llvm.org/D158195#4599594>, @craig.topper wrote:

> In D158195#4598747 <https://reviews.llvm.org/D158195#4598747>, @luke wrote:
>
>> In D158195#4596052 <https://reviews.llvm.org/D158195#4596052>, @craig.topper wrote:
>>
>>> If the rotate came in as a fshl/fshr intrinsic or as shl+shr+or, would we already get vrev8 for fixed vectors? Is it only the shuffle case that is being optimized?
>>
>> Yeah we already get vrev8 for these, DAGCombiner canonicalises them before they would be legalised to vl nodes:
>>
>>   define <4 x i16> @rot_via_fshr(<4 x i16> %a) {
>>     %res = call <4 x i16> @llvm.fshr.v4i16(<4 x i16> %a, <4 x i16> %a, <4 x i16> <i16 8, i16 8, i16 8, i16 8>)
>>     ret <4 x i16> %res
>>   }
>>   
>>   declare <4 x i16> @llvm.fshr.v4i16(<4 x i16> %a, <4 x i16> %b, <4 x i16> %c)
>>   
>>   define <4 x i16> @rot_via_shift(<4 x i16> %a, <4 x i16> %amt) {
>>     %1 = shl <4 x i16> %a, <i16 8, i16 8, i16 8, i16 8>
>>     %2 = lshr <4 x i16> %a, <i16 8, i16 8, i16 8, i16 8>
>>     %3 = or <4 x i16> %1, %2
>>     ret <4 x i16> %3
>>   }
>>
>>
>>
>>   === rot_via_fshr
>>   Initial selection DAG: %bb.0 'rot_via_fshr:'
>>   SelectionDAG has 13 nodes:
>>     t0: ch,glue = EntryToken
>>             t2: nxv2i16,ch = CopyFromReg t0, Register:nxv2i16 %0
>>           t4: v4i16 = extract_subvector t2, Constant:i64<0>
>>           t6: v4i16 = BUILD_VECTOR Constant:i16<8>, Constant:i16<8>, Constant:i16<8>, Constant:i16<8>
>>         t7: v4i16 = rotr t4, t6
>>       t9: nxv2i16 = insert_subvector undef:nxv2i16, t7, Constant:i64<0>
>>     t11: ch,glue = CopyToReg t0, Register:nxv2i16 $v8, t9
>>     t12: ch = RISCVISD::RET_GLUE t11, Register:nxv2i16 $v8, t11:1
>>   
>>   
>>   Optimized lowered selection DAG: %bb.0 'rot_via_fshr:'
>>   SelectionDAG has 11 nodes:
>>     t0: ch,glue = EntryToken
>>             t2: nxv2i16,ch = CopyFromReg t0, Register:nxv2i16 %0
>>           t4: v4i16 = extract_subvector t2, Constant:i64<0>
>>         t13: v4i16 = bswap t4
>>       t9: nxv2i16 = insert_subvector undef:nxv2i16, t13, Constant:i64<0>
>>     t11: ch,glue = CopyToReg t0, Register:nxv2i16 $v8, t9
>>     t12: ch = RISCVISD::RET_GLUE t11, Register:nxv2i16 $v8, t11:1
>>
>>
>>
>>   === rot_via_shift
>>   Initial selection DAG: %bb.0 'rot_via_shift:'
>>   SelectionDAG has 18 nodes:
>>     t0: ch,glue = EntryToken
>>       t2: nxv2i16,ch = CopyFromReg t0, Register:nxv2i16 %0
>>     t4: v4i16 = extract_subvector t2, Constant:i64<0>
>>       t6: nxv2i16,ch = CopyFromReg t0, Register:nxv2i16 %1
>>     t7: v4i16 = extract_subvector t6, Constant:i64<0>
>>     t9: v4i16 = BUILD_VECTOR Constant:i16<8>, Constant:i16<8>, Constant:i16<8>, Constant:i16<8>
>>           t10: v4i16 = shl t4, t9
>>           t11: v4i16 = srl t4, t9
>>         t12: v4i16 = or t10, t11
>>       t14: nxv2i16 = insert_subvector undef:nxv2i16, t12, Constant:i64<0>
>>     t16: ch,glue = CopyToReg t0, Register:nxv2i16 $v8, t14
>>     t17: ch = RISCVISD::RET_GLUE t16, Register:nxv2i16 $v8, t16:1
>>   
>>   
>>   Optimized lowered selection DAG: %bb.0 'rot_via_shift:'
>>   SelectionDAG has 11 nodes:
>>     t0: ch,glue = EntryToken
>>             t2: nxv2i16,ch = CopyFromReg t0, Register:nxv2i16 %0
>>           t4: v4i16 = extract_subvector t2, Constant:i64<0>
>>         t19: v4i16 = bswap t4
>>       t14: nxv2i16 = insert_subvector undef:nxv2i16, t19, Constant:i64<0>
>>     t16: ch,glue = CopyToReg t0, Register:nxv2i16 $v8, t14
>>     t17: ch = RISCVISD::RET_GLUE t16, Register:nxv2i16 $v8, t16:1
>
> How ugly would it be to do it as a special case during the shuffle lowering instead?

Another possibility would be a RISCV-specific combine that turns the shuffle into a bswap before lowering, but handling this as a special case in the shuffle lowering doesn't seem bad to me.
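For illustration, here is a minimal C sketch of the equivalence that either approach would rely on. It is not LLVM code, and every function name in it is hypothetical. The sketch views a byte vector as little-endian i16 lanes, as on RISC-V. Under that view, the byte shuffle mask <1,0,3,2,...> swaps the two bytes of each lane. That is the same as rotating each lane by 8 in either direction, and it matches what vrev8 computes on 16-bit elements. The matcher accepts negative mask entries as undef, following LLVM's -1 convention for undef shuffle lanes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 16-bit lane left by amt bits (precondition: amt < 16). */
static uint16_t rotl16(uint16_t x, unsigned amt) {
    assert(amt < 16);
    if (amt == 0)
        return x;
    return (uint16_t)((x << amt) | (x >> (16 - amt)));
}

/* Rotate a 16-bit lane right by amt bits (precondition: amt < 16). */
static uint16_t rotr16(uint16_t x, unsigned amt) {
    assert(amt < 16);
    if (amt == 0)
        return x;
    return (uint16_t)((x >> amt) | (x << (16 - amt)));
}

/* Hypothetical matcher: true if a byte-shuffle mask swaps each pair of
   adjacent bytes, i.e. <1,0,3,2,...>. Negative entries are treated as
   undef and accepted. */
static bool is_swap_adjacent_bytes_mask(const int *mask, size_t n) {
    if (n == 0 || n % 2 != 0)
        return false;
    for (size_t i = 0; i < n; ++i) {
        if (mask[i] >= 0 && (size_t)mask[i] != (i ^ 1u))
            return false;
    }
    return true;
}

/* Apply a byte shuffle: out[i] = in[mask[i]] (undef lanes become 0). */
static void shuffle_bytes(const uint8_t *in, const int *mask,
                          uint8_t *out, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = mask[i] >= 0 ? in[mask[i]] : 0;
}

/* Read 16-bit lane `lane` from a byte vector, little-endian as on RISC-V. */
static uint16_t lane16(const uint8_t *bytes, size_t lane) {
    return (uint16_t)(bytes[2 * lane] | (bytes[2 * lane + 1] << 8));
}

/* Check that, for every i16 lane, the swap-adjacent-bytes shuffle gives
   the same value as rotating the lane by 8 left or right. */
static bool shuffle_matches_rotate8(const uint8_t *bytes, size_t n) {
    int mask[64];
    uint8_t shuffled[64];
    if (n == 0 || n > 64 || n % 2 != 0)
        return false;
    for (size_t i = 0; i < n; ++i)
        mask[i] = (int)(i ^ 1u);
    shuffle_bytes(bytes, mask, shuffled, n);
    for (size_t lane = 0; lane < n / 2; ++lane) {
        uint16_t x = lane16(bytes, lane);
        uint16_t y = lane16(shuffled, lane);
        if (y != rotl16(x, 8) || y != rotr16(x, 8))
            return false;
    }
    return true;
}
```

Because rotating a 16-bit lane by 8 left or right gives the same result, both vrotl and vrotr by 8 can map to the same vrev8.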

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D158195