[llvm] [X86] Use RORX over SHR imm (PR #77964)

Sun Jan 14 21:09:13 PST 2024

KanRobert wrote:

> I don't know that there is a benefit to eliminating a MOV in most cases. Obviously depending on REX and whatever but I think it's generally worse?

The total cycle count of MOV+SHR is the same as RORX: https://godbolt.org/z/z98vevYMq
`MOV+SHR` has the longer encoding only when both registers are R8-R15. But `MOV` itself does have a cost, so it's hard to say which is better without testing.

> Changing and 8bit or 16bit shift to a 32bit RORX could have a false dependency so I'd definitely remove those. Thanks for pointing that out. Otherwise it should be fine?

32-bit/64-bit can introduce a false dependency too. Suppose the source register holds `00 00 00 ff 44 33 22 11`. If we shift right by 8, the high 32 bits are zeros, but if we rotate, the value becomes `11 00 00 00 ff 44 33 22`: nothing uses the high 32 bits, but they are no longer zero.
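
To make the byte example concrete, here is a minimal C sketch that models a 64-bit logical shift and a 64-bit rotate in plain C. The helper names `shr64`, `rotr64`, and `high32` are hypothetical and only for illustration; this is not code from the PR, and it says nothing about what the backend actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Logical shift right: vacated high bits are filled with zeros (SHR-like). */
static uint64_t shr64(uint64_t x, unsigned n) {
    assert(n < 64);
    return x >> n;
}

/* Rotate right: bits shifted out of the low end re-enter at the top
 * (ROR/RORX-like). n is restricted to 1..63 to avoid an undefined
 * 64-bit shift in C. */
static uint64_t rotr64(uint64_t x, unsigned n) {
    assert(n > 0 && n < 64);
    return (x >> n) | (x << (64 - n));
}

/* Upper 32 bits of a 64-bit value. */
static uint32_t high32(uint64_t x) {
    return (uint32_t)(x >> 32);
}
```

With the source value `0x000000ff44332211` and a count of 8, `shr64` leaves the upper half zero, while `rotr64` moves the low byte `0x11` into the top byte, so the upper half is non-zero even though no user reads it.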

> I have two questions, first is where this should better be done. I also think there may be other situations where a normally less optimal instruction could be used when it reduces flag spilling.

I don't have a clear answer for this yet. It might be in `X86DAGToDAGISel::tryShiftAmountMod` or in a peephole optimization.

https://github.com/llvm/llvm-project/pull/77964