krzysz00 wrote: High-level question: why is this in GPU? Shouldn't the translation from bf16 to i16 be either a pass over Arith or part of the SPIR-V lowering? See also how, when going to LLVM, we replace all the 8-bit float types with i8: https://github.com/llvm/llvm-project/pull/138087