[llvm] AMDGPU: Improve cost handling of canonicalize (PR #101479)

Thu Aug 1 07:47:20 PDT 2024

================
@@ -7,10 +7,10 @@
 ; Simple 3-pair chain with loads and stores
 define amdgpu_kernel void @test1_as_3_3_3_v2f16(ptr addrspace(3) %a, ptr addrspace(3) %b, ptr addrspace(3) %c) {
 ; GCN-LABEL: @test1_as_3_3_3_v2f16(
-; GCN-NEXT:    [[TMP2:%.*]] = load <2 x half>, ptr addrspace(3) [[A:%.*]], align 2
-; GCN-NEXT:    [[TMP4:%.*]] = load <2 x half>, ptr addrspace(3) [[B:%.*]], align 2
-; GCN-NEXT:    [[TMP5:%.*]] = fmul <2 x half> [[TMP2]], [[TMP4]]
-; GCN-NEXT:    store <2 x half> [[TMP5]], ptr addrspace(3) [[C:%.*]], align 2
+; GCN-NEXT:    [[TMP1:%.*]] = load <2 x half>, ptr addrspace(3) [[A:%.*]], align 2
----------------
cdevadas wrote:

Nit: The only diff in this test is the change in the TMP variable indices. Maybe pre-commit these changes?
Several more tests in this file show the same index-only changes; the exception is the last test, `canonicalize_v2f16`.

https://github.com/llvm/llvm-project/pull/101479