https://github.com/kuhar requested changes to this pull request.

Could we exhaustively test this on int8 against some reference implementation (maybe something in NumPy) just to be confident that this implementation is correct?

https://github.com/llvm/llvm-project/pull/84720
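A minimal sketch of the kind of exhaustive int8 check the review asks for. The review does not say which operation is being implemented, so this sketch assumes, purely for illustration, signed floor division on i8. `candidate_floordiv_i8` is a hypothetical Python model of the implementation under test (truncating division plus a correction step), and NumPy's `np.floor_divide` serves as the reference. Division by zero and `INT8_MIN / -1` are skipped on the assumption that they are undefined for the op under test. Adjust this if the real op defines them.

```python
import numpy as np

I8_MIN, I8_MAX = -128, 127


def wrap_i8(x: int) -> int:
    """Wrap a Python int into the signed 8-bit range (two's complement)."""
    return ((x + 128) % 256) - 128


def candidate_floordiv_i8(a: int, b: int) -> int:
    """HYPOTHETICAL implementation under test: floor division expressed as
    truncating division followed by a round-toward-negative-infinity fix-up."""
    # Truncating (round-toward-zero) division using integer arithmetic only.
    q = abs(a) // abs(b)
    signs_differ = (a < 0) != (b < 0)
    if signs_differ:
        q = -q
    # If the division was inexact and the true quotient is negative,
    # truncation rounded up, so step down by one to get the floor.
    if q * b != a and signs_differ:
        q -= 1
    return wrap_i8(q)


def exhaustive_check():
    """Compare the candidate against NumPy on every int8 operand pair."""
    vals = np.arange(I8_MIN, I8_MAX + 1, dtype=np.int16)
    a, b = np.meshgrid(vals, vals, indexing="ij")
    a, b = a.ravel(), b.ravel()
    # Assumed-undefined inputs: division by zero and INT8_MIN / -1 (overflow).
    keep = (b != 0) & ~((a == I8_MIN) & (b == -1))
    a, b = a[keep], b[keep]
    # Reference computed in int16 so NumPy itself never overflows.
    expected = np.floor_divide(a, b)
    mismatches = []
    for x, y, e in zip(a.tolist(), b.tolist(), expected.tolist()):
        got = candidate_floordiv_i8(x, y)
        if got != e:
            mismatches.append((x, y, e, got))
    return len(a), mismatches


if __name__ == "__main__":
    checked, mismatches = exhaustive_check()
    print(f"checked {checked} pairs, {len(mismatches)} mismatches")
    for x, y, e, got in mismatches[:10]:
        print(f"  {x} / {y}: expected {e}, got {got}")
```

Since int8 has only 256 × 256 operand pairs, the whole input space can be enumerated directly, and the same harness can be reused for other ops by swapping in a different candidate and reference function.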