khaki3 wrote: Calling runtime routines might be optimal on CPUs. Perhaps we could apply only some of the passes in OptimizedBufferization to OpenACC or OpenMP code. Any comments are welcome. https://github.com/llvm/llvm-project/pull/103503