[PATCH] D147713: [RISCV] Combine concat_vectors of loads into strided loads

Fri Apr 7 07:40:13 PDT 2023

luke added inline comments.

================
Comment at: llvm/lib/Target/RISCV/RISCVISelLowering.cpp:11475-11477
+    if (!allowsMemoryAccessForAlignment(*DAG.getContext(), DAG.getDataLayout(),
+                                        WideVecVT, *MMO))
+      break;
----------------
luke wrote:
> Is it legal to increase the alignment here?
> E.g. for these loads
> 
> ```
> %0 = load <4 x i8>, ptr %pix1, align 1
> %add.ptr = getelementptr inbounds i8, ptr %pix1, i64 %idx.ext
> %2 = load <4 x i8>, ptr %add.ptr, align 1
> ```
> 
> Can we use an align of 4 * 1:
> 
> ```
> %0 = call <2 x i32> @llvm.riscv.strided.load ptr %pix1, i64 %idx.ext, align 4
> ```
> 
I have a feeling the answer is no, which would mean that we can't combine this in x264 SAD:

```c
  #include <stdint.h>
  #include <stdlib.h>
  typedef uint8_t pixel;

  #define PIXEL_SAD_C( name, lx, ly )                 \
      int name( pixel *pix1, intptr_t i_stride_pix1,  \
                pixel *pix2, intptr_t i_stride_pix2 ) \
  {                                                   \
      int i_sum = 0;                                  \
      for( int y = 0; y < ly; y++ )                   \
      {                                               \
          for( int x = 0; x < lx; x++ )               \
          {                                           \
              i_sum += abs( pix1[x] - pix2[x] );      \
          }                                           \
          pix1 += i_stride_pix1;                      \
          pix2 += i_stride_pix2;                      \
      }                                               \
      return i_sum;                                   \
  }

  PIXEL_SAD_C(x264_pixel_sad_4x4, 4, 4)
```

Nothing here guarantees that `pix1`/`pix2`/`i_stride_pix1`/`i_stride_pix2` are 4-byte (word) aligned, so we can't use `vlse32.v`. Unless we know the target has fast unaligned access?
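
To make the concern concrete, here is a minimal sketch of the condition a 32-bit strided load would need: every row address `base + i * stride` has to be a multiple of the element size. `strided_rows_aligned` is a hypothetical helper written for illustration only; it is not part of the patch, LLVM, or x264.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): returns true when every row
 * address base + i * stride, for 0 <= i < rows, is a multiple of
 * elem_bytes. elem_bytes is assumed to be a power of two. */
static bool strided_rows_aligned(uintptr_t base, intptr_t stride,
                                 size_t rows, size_t elem_bytes)
{
    uintptr_t mask = (uintptr_t)elem_bytes - 1;
    for (size_t i = 0; i < rows; i++) {
        /* Unsigned arithmetic wraps, so negative strides work too. */
        uintptr_t addr = base + (uintptr_t)stride * i;
        if (addr & mask)
            return false;
    }
    return true;
}
```

For `x264_pixel_sad_4x4` the check would be `strided_rows_aligned((uintptr_t)pix1, i_stride_pix1, 4, 4)`, and likewise for `pix2`. Both arguments come from the caller, so as far as I can tell the compiler has no way to prove this statically.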

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147713/new/

https://reviews.llvm.org/D147713