[clang] [llvm] [HLSL][DXIL][SPIRV] Create llvm dot intrinsic and use for HLSL (PR #102872)

Mon Aug 12 10:06:26 PDT 2024

farzonl wrote:

> AArch64 has a udot and sdot instruction (and a usdot instruction). They perform a "partial" reduction though, producing a v4i32 from two v16i8 inputs. We would like to use those from the vectorizer and have recently added a partial-reduction intrinsic, but doing it with a higher level intrinsic might be a little nicer.

We haven't done it yet, but our plan here is to create a default expansion in `TargetLoweringBase.cpp`. Any backend that has specializations can then add them, in your case to `AArch64ISelLowering.cpp`.

> 
> It would seem like a "udot" can be represented already as `vecreduce.add(mul(zext, zext))`, and fdot is simpler still. Is there any particular reason to add a new intrinsic for it if it is already representable as a vecreduce? And it would feel like a shame if it couldn't be used with the actual AArch64 instructions.
> 
There was a whole discussion on dot in https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294/13; check out `kparzysz`'s posts. Essentially, yes, we could represent dot this way, but then we would not be able to benefit from the ubiquity of the hardware-specific dot lowerings that are showing up across GPU and convolution use cases.
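
To make the two shapes being discussed concrete, here is a rough scalar sketch in C++. It is not code from this PR. The names `dotFull` and `dotPartial` are hypothetical. It only models the arithmetic: the full reduction is what `vecreduce.add(mul(zext, zext))` computes, and the partial one mimics a udot-style v16i8 to v4i32 step, assuming a zero accumulator.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec16u8 = std::array<std::uint8_t, 16>;
using Vec4u32 = std::array<std::uint32_t, 4>;

// Full reduction: scalar model of vecreduce.add(mul(zext(a), zext(b))).
// Each i8 element is widened to i32 before the multiply, so products do not wrap.
std::uint32_t dotFull(const Vec16u8 &a, const Vec16u8 &b) {
  std::uint32_t sum = 0;
  for (std::size_t i = 0; i < a.size(); ++i)
    sum += std::uint32_t{a[i]} * std::uint32_t{b[i]};
  return sum;
}

// Partial reduction (hypothetical helper): 16 x i8 inputs produce 4 x i32 lanes.
// Lane j holds the sum of the widened products of elements 4j .. 4j+3.
// Assumption: the accumulator starts at zero; the real instruction adds into
// an existing destination register.
Vec4u32 dotPartial(const Vec16u8 &a, const Vec16u8 &b) {
  Vec4u32 lanes{};
  for (std::size_t i = 0; i < a.size(); ++i)
    lanes[i / 4] += std::uint32_t{a[i]} * std::uint32_t{b[i]};
  return lanes;
}
```

Summing the four lanes of `dotPartial` gives the same value as `dotFull`, which is why a full dot can be built from the partial form plus one small final add.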
> @SamTebbs33 @NickGuy-Arm FYI.

https://github.com/llvm/llvm-project/pull/102872