michalpaszkowski wrote: For a reason unknown to me, the OpenCL CTS test relationals/shuffle_array_cast, which was passing before and still passes, has a significantly shorter run time after this PR (~50% difference). https://github.com/llvm/llvm-project/pull/96514