[llvm] [AMDGPU] Fold uniform readfirstlane + cndmask (PR #70188)

Thu Oct 26 00:45:32 PDT 2023

Pierre-vh wrote:

I made a couple of changes:

- It now folds directly into an S_CSELECT, which I think leads to better codegen.
- Only allow S_CSELECT -1, 0, because the V_CNDMASK mask must have all lanes set (or none), so that every lane selects the same value. We don't know what EXEC looks like, so if any lane is cleared in the mask and it happens to be the only active lane, the fold falls apart.
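The C++ sketch below checks the reasoning above. It models the instruction semantics on a hypothetical wave64. The helper names `s_cselect`, `readfirstlane_cndmask`, `original`, and `folded` are made up for illustration, and this is not the SIFoldOperands implementation. It assumes V_CNDMASK_B32 picks src1 in lanes whose mask bit is set and src0 otherwise, and that V_READFIRSTLANE_B32 returns the value of the lowest lane enabled in EXEC.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wave64 model of the pattern; illustration only, not LLVM code.

// S_CSELECT: scalar (uniform) select, dst = SCC ? src0 : src1.
uint64_t s_cselect(bool scc, uint64_t src0, uint64_t src1) {
  return scc ? src0 : src1;
}

// V_CNDMASK_B32 followed by V_READFIRSTLANE_B32: lane i holds
// (mask bit i ? src1 : src0), and readfirstlane returns the value of the
// lowest lane set in EXEC. EXEC is assumed to be non-zero.
uint32_t readfirstlane_cndmask(uint32_t src0, uint32_t src1, uint64_t mask,
                               uint64_t exec) {
  assert(exec != 0 && "readfirstlane needs at least one active lane");
  int first = 0;
  while (((exec >> first) & 1u) == 0)
    ++first;  // find the first active lane
  return ((mask >> first) & 1u) ? src1 : src0;
}

// Original sequence: mask = S_CSELECT -1, 0, then cndmask + readfirstlane.
uint32_t original(bool scc, uint32_t src0, uint32_t src1, uint64_t exec) {
  uint64_t mask = s_cselect(scc, ~0ull, 0);  // all lanes or no lanes
  return readfirstlane_cndmask(src0, src1, mask, exec);
}

// Folded form: a single scalar select that never looks at EXEC.
uint32_t folded(bool scc, uint32_t src0, uint32_t src1) {
  return static_cast<uint32_t>(s_cselect(scc, src1, src0));
}
```

Because the mask is either all ones or all zeros, `original` returns the same value for any non-zero EXEC, and it always matches `folded`. With a mask such as S_CSELECT -1, 1 instead, SCC = 0 yields mask 1, and `readfirstlane_cndmask(7, 9, 1, exec)` returns 9 when lane 0 is active but 7 when only lane 1 is. No EXEC-independent S_CSELECT can reproduce that.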

> > > Should I abandon this, or is this still a worthwhile addition? There are some positive code changes, but not many tests are affected, so I'm not sure it's worth it. OTOH, the code is already done, so it doesn't hurt to just land this too. No strong opinion.
> > 
> > 
> > Is it improving cross-block cases? Add a comment that it should be removable after GlobalISel?
> 
> My preference is still not to add this complexity to SIFoldOperands. Are there still "positive code changes" if you rebase this on #69703?

The affected tests don't overlap. I think this SIFoldOperands transform can handle the i32 ext cases too (the other patch is for i64).

https://github.com/llvm/llvm-project/pull/70188