https://github.com/labrinea approved this pull request. The instruction sequences look better. Also, BIF/BIT/BSL have higher throughput (4) than TBL (2), yet the same execution latency (2). https://github.com/llvm/llvm-project/pull/121474