RKSimon wrote: I'd really like to see the codegen diff for a benchmark that shows a real perf difference - my gut feeling is that something else is being combined as a side effect and it's nothing to do with the GFNI instruction by itself. https://github.com/llvm/llvm-project/pull/91721