https://github.com/philnik777 commented: I'll try to figure out how to let the compiler vectorize this and post some example code here. I think that should be significantly faster on at least some platforms. https://github.com/llvm/llvm-project/pull/128832