<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/153109>153109</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [NVPTX] Performance regression in IR that uses `<1 x float>`
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
            Artem-B
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Artem-B
      </td>
    </tr>
</table>

<pre>
    The introduction of v2f32 support appears to cause a performance regression in IR that uses `<1 x float>` vectors.

LLVM then tends to use `shufflevector` to build `<2 x float>`, and our lowering for that ends up doing it the hard way, which regresses performance in some of our benchmarks.

Minimized reproducer: https://godbolt.org/z/8efcrna8b  

One kernel constructs `<2 x float>` using `insertelement`, and all of it is removed during lowering. The other kernel, which uses `<1 x float>` and `shufflevector`, ends up doing a lot of unnecessary work.

```
  %i4 = shufflevector <1 x float> %i1, <1 x float> %i2, <2 x i32> <i32 0, i32 1>

->
  cvt.u64.u32     %rd3, %r1;
  cvt.u64.u32     %rd4, %r2;
  shl.b64         %rd5, %rd4, 32;
  or.b64          %rd6, %rd3, %rd5;
```

Vs:
```
  %i4 = insertelement <2 x float> undef, float %i1, i64 0
  %a = insertelement <2 x float> %i4, float %i2, i64 1

->
 ...[nothing. LLVM removes vector creation and uses the original inputs %i1/%i2 ]
```
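
For reference, here is a rough C++ model of what the four PTX instructions in the `shufflevector` case compute. It is only a sketch: `pack_lanes` is a made-up name, and it assumes `%r1`/`%r2` hold the raw 32-bit values of `%i1`/`%i2` and that `unsigned int` is 32 bits wide. The point is that the sequence just packs two 32-bit lanes into one 64-bit register, which the `insertelement` variant avoids entirely.

```cpp
// Sketch only: models the emitted PTX sequence, not the actual lowering code.
static_assert(sizeof(unsigned int) == 4, "model assumes 32-bit unsigned int");
static_assert(sizeof(unsigned long long) == 8, "model assumes 64-bit unsigned long long");

// pack_lanes is a hypothetical helper name used for illustration.
constexpr unsigned long long pack_lanes(unsigned int r1, unsigned int r2) {
    unsigned long long rd3 = r1;         // cvt.u64.u32 %rd3, %r1;  zero-extend lane 0
    unsigned long long rd4 = r2;         // cvt.u64.u32 %rd4, %r2;  zero-extend lane 1
    unsigned long long rd5 = rd4 << 32;  // shl.b64 %rd5, %rd4, 32; lane 1 to upper half
    return rd3 | rd5;                    // or.b64  %rd6, %rd3, %rd5;
}

// Lane 0 lands in the low 32 bits, lane 1 in the high 32 bits.
static_assert(pack_lanes(1u, 2u) == 0x0000000200000001ULL, "lane layout");
```

With the bit patterns of 1.0f and 2.0f as inputs, `pack_lanes(0x3F800000u, 0x40000000u)` yields `0x400000003F800000`.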
</pre>