[llvm] [LoadStoreVectorizer] Fill gaps in load/store chains to enable vectorization (PR #159388)
Drew Kersnar via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 24 07:31:36 PDT 2025
================
@@ -45,29 +45,32 @@ define half @fh(ptr %p) {
; ENABLED-LABEL: fh(
; ENABLED: {
; ENABLED-NEXT: .reg .b16 %rs<10>;
-; ENABLED-NEXT: .reg .b32 %r<13>;
+; ENABLED-NEXT: .reg .b32 %r<17>;
; ENABLED-NEXT: .reg .b64 %rd<2>;
; ENABLED-EMPTY:
; ENABLED-NEXT: // %bb.0:
; ENABLED-NEXT: ld.param.b64 %rd1, [fh_param_0];
-; ENABLED-NEXT: ld.v4.b16 {%rs1, %rs2, %rs3, %rs4}, [%rd1];
-; ENABLED-NEXT: ld.b16 %rs5, [%rd1+8];
-; ENABLED-NEXT: cvt.f32.f16 %r1, %rs2;
-; ENABLED-NEXT: cvt.f32.f16 %r2, %rs1;
-; ENABLED-NEXT: add.rn.f32 %r3, %r2, %r1;
-; ENABLED-NEXT: cvt.rn.f16.f32 %rs6, %r3;
-; ENABLED-NEXT: cvt.f32.f16 %r4, %rs4;
-; ENABLED-NEXT: cvt.f32.f16 %r5, %rs3;
-; ENABLED-NEXT: add.rn.f32 %r6, %r5, %r4;
-; ENABLED-NEXT: cvt.rn.f16.f32 %rs7, %r6;
-; ENABLED-NEXT: cvt.f32.f16 %r7, %rs7;
-; ENABLED-NEXT: cvt.f32.f16 %r8, %rs6;
-; ENABLED-NEXT: add.rn.f32 %r9, %r8, %r7;
-; ENABLED-NEXT: cvt.rn.f16.f32 %rs8, %r9;
-; ENABLED-NEXT: cvt.f32.f16 %r10, %rs8;
-; ENABLED-NEXT: cvt.f32.f16 %r11, %rs5;
-; ENABLED-NEXT: add.rn.f32 %r12, %r10, %r11;
-; ENABLED-NEXT: cvt.rn.f16.f32 %rs9, %r12;
+; ENABLED-NEXT: .pragma "used_bytes_mask 1023";
----------------
dakersnar wrote:
Apologies, I should have been clearer. This pragma comes from the companion PR for this change, which is up for review here: https://github.com/llvm/llvm-project/pull/159387. I split the PRs to make review easier, but the test output in this PR assumes the other one is merged first.
The pragma is documented here: https://docs.nvidia.com/cuda/parallel-thread-execution/#pragma-strings-used-bytes-mask.
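For anyone reading the diff without the PTX docs open, here is a rough sketch of where the value 1023 could come from. It assumes that bit i of the mask marks byte i of the widened load as used, which is my reading of the pragma and should be checked against the linked documentation. The helper name `used_bytes_mask` and the `(offset, size)` tuples are made up for illustration and are not taken from the pass.

```python
def used_bytes_mask(accesses):
    """Return a mask with bit i set for every byte i covered by an access.

    `accesses` is a list of (byte_offset, byte_size) pairs. This is a
    hypothetical helper for illustration, not the LoadStoreVectorizer's code.
    """
    mask = 0
    for offset, size in accesses:
        for byte in range(offset, offset + size):
            mask |= 1 << byte
    return mask


# The old output had a ld.v4.b16 at [%rd1] (four 2-byte elements, bytes 0..7)
# and a ld.b16 at [%rd1+8] (bytes 8..9).
fh_accesses = [(0, 2), (2, 2), (4, 2), (6, 2), (8, 2)]
fh_mask = used_bytes_mask(fh_accesses)
print(fh_mask, hex(fh_mask))  # 1023 0x3ff
```

Under that assumption, the ten bytes actually read by `fh` give the low ten bits set, which matches the `used_bytes_mask 1023` in the new CHECK line.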
https://github.com/llvm/llvm-project/pull/159388