[llvm] [X86] Use RORX over SHR imm (PR #77964)

Sun Jan 14 19:46:53 PST 2024

Bryce-MW wrote:

I appreciate the comment! I actually was thinking about some of your points but I wasn't sure how much of an issue they were so I am glad that you brought them up.

I think this is definitely a benefit for the flag spilling case. Probably worthwhile even when flags other than those set by SHR are read.

But the code size is a concern. Especially when replacing SHR by 1 which is 2 bytes vs 6 for RORX. Even the MOV + full SHR is 5 bytes and I don't know that there is a benefit to eliminating a MOV in most cases. Obviously depending on REX and whatever but I think it's generally worse?

Changing and 8bit or 16bit shift to a 32bit RORX could have a false dependency so I'd definitely remove those. Thanks for pointing that out. Otherwise it should be fine?

I have two questions, first is where this should better be done. I haven't looked at codegen much before so I'm not so sure on where things happen. I also think there may be other situations where a normally less optimal instruction could be used when it reduces flag spilling. I had another, more complex case I was going to address in a separate PR.

The other is if similar concerns apply to the SHLX/SHRX transformations. I haven't checked the code size on those but part of the reason I did this in the way I did was looking at those.

I am also hoping to do some kind of more general testing to see what the impact is outside of the LLVM tests. I'll definitely check on my codebase at work which is performance-sensitive. I also recall seeing someone with an automated performance testing system somewhere.

Thanks!

https://github.com/llvm/llvm-project/pull/77964