[flang-commits] [llvm] [lldb] [mlir] [openmp] [flang] [mlir][Vector] Add patterns for efficient i4 -> i8 conversion emulation (PR #79494)

Fri Jan 26 07:31:58 PST 2024

MacDue wrote:

> It gets difficult to get this working for scalable at this level as we would have to introduce SVE or LLVM intrinsics to model the interleave in a scalable way.

There are already LLVM intrinsics for that (`llvm.experimental.vector.interleave2`), so I don't think it would be hard to extend this to support SVE.

I wrote this little test, which seemed to build fine and to generate reasonable-looking code:
```mlir
func.func @test_sve_i4_extend(%inMem: memref<?xi4>) -> vector<[8]xi32> {
  %c0 = arith.constant 0 : index
  %c4 = arith.constant 4 : i8
  %in = vector.load %inMem[%c0] : memref<?xi4>, vector<[8]xi4>
  %shift = vector.splat %c4 : vector<[4]xi8>
  // Reinterpret each pair of i4 elements as a single i8 byte.
  %0 = vector.bitcast %in : vector<[8]xi4> to vector<[4]xi8>
  // Low nibbles: shift left by 4, then arithmetic-shift right by 4 to sign-extend.
  %1 = arith.shli %0, %shift : vector<[4]xi8>
  %2 = arith.shrsi %1, %shift : vector<[4]xi8>
  // High nibbles: a single arithmetic shift right by 4 sign-extends them.
  %3 = arith.shrsi %0, %shift : vector<[4]xi8>
  // Interleave the low and high nibbles back into element order.
  %4 = "llvm.intr.experimental.vector.interleave2"(%2, %3) : (vector<[4]xi8>, vector<[4]xi8>) -> vector<[8]xi8>
  %5 = arith.extsi %4 : vector<[8]xi8> to vector<[8]xi32>
  return %5 : vector<[8]xi32>
}
```
This lowers to the following SVE assembly:
```
test_sve_i4_extend: 
	ptrue	p0.s
	ld1sb	{ z0.s }, p0/z, [x1]
	lsl	z1.s, z0.s, #28
	asr	z0.s, z0.s, #4
	asr	z1.s, z1.s, #28
	zip2	z2.s, z1.s, z0.s
	zip1	z0.s, z1.s, z0.s
	movprfx	z1, z2
	sxtb	z1.s, p0/m, z2.s
	sxtb	z0.s, p0/m, z0.s
	ret
```
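For reference, here is the shift arithmetic from the test above applied to a single packed byte. This assumes little-endian nibble order (element 0 of the `vector<[8]xi4>` sits in the low nibble of byte 0), and the byte value `0xA7` is an arbitrary illustration; $\gg_s$ denotes an arithmetic (sign-preserving) right shift, as performed by `arith.shrsi`:

```latex
\begin{aligned}
b &= \mathtt{0xA7} = 1010\,0111_2 \quad (-89 \text{ as } \mathtt{i8}) \\
\text{lo} &= (b \ll 4) \gg_s 4 = 0111\,0000_2 \gg_s 4 = 0000\,0111_2 = 7 \\
\text{hi} &= b \gg_s 4 = 1111\,1010_2 = -6 \\
\mathrm{interleave2}([\text{lo}], [\text{hi}]) &= [7,\ -6]
\end{aligned}
```

Here $7$ and $-6$ are the signed `i4` values of the nibbles `0x7` and `0xA`. Across whole vectors, `interleave2` yields `[lo0, hi0, lo1, hi1, ...]`, which restores the original element order under the stated assumption.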

In the vector dialect, I think `llvm.intr.experimental.vector.interleave2` could nicely become `vector.scalable.interleave` :slightly_smiling_face:

https://github.com/llvm/llvm-project/pull/79494