vzakhari wrote: Just FYI, it seems the `SUM(DIM)` case in `digits_2` might be optimized further by proving that the LHS and RHS accesses never overlap. Flang currently creates an unnecessary temporary for it. https://github.com/llvm/llvm-project/pull/118556