[Mlir-commits] [mlir] [mlir][x86vector] AVX512-BF16 Dot op (PR #124800)

Thu Jan 30 14:22:12 PST 2025

rengolin wrote:

> To fill this gap, different downstream projects came up with their own way to represent target information and pass that to upstream code in some way. For example, we have some AVX2 specific transformations with an API that allows downstream projects to enable them using target information. I believe designing with that in mind would help transition to whatever we end up adopting without major rework.

That's the idea.

> Well, AVX512 alone has [thousands of intrinsics](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=AVX_512), even when many in that list can be folded together. I think, loading all AVX512 ops vs just the supported subsets could really make a difference.

We're not proposing to complete the extension by any stretch of the imagination. This would be wrong on too many levels. :)

> It's also difficult to know how these operations will be used in the long run. We already have some passes running on SME operations and I wouldn't be surprised if we ended up introducing some transformations/canonicalizations of the AVX2 operations that we generate.

Transformations, yes. Canonicalizations, no. We want to have a "special lowering" from a `vector.contract` into a series of FMAs, broadcasts, and permutes that implements an efficient algorithm using more registers than standard LLVM vectorization does. We have a local prototype that does that for F32 and reaches really high performance. We want to test the waters on BF16.
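To make the shape of that lowering concrete, here is a minimal sketch of the general broadcast-plus-FMA register-tiling pattern. It is not the local prototype mentioned above, and it is plain Python modelling vector lanes with lists, not MLIR. The names `VEC`, `broadcast`, `fma`, and `microkernel` are all hypothetical, and permutes are left out for brevity.

```python
VEC = 4  # hypothetical number of lanes per "vector register"


def broadcast(x, lanes=VEC):
    """Model a broadcast: replicate one scalar across all lanes."""
    return [x] * lanes


def fma(a, b, c):
    """Model a lane-wise fused multiply-add: a * b + c."""
    return [x * y + z for x, y, z in zip(a, b, c)]


def microkernel(A, B):
    """Compute C = A @ B with an M x (N / VEC) tile of accumulators.

    A is M x K and B is K x N, both as lists of rows.
    N must be a multiple of VEC in this sketch.
    """
    M, K, N = len(A), len(B), len(B[0])
    if N % VEC != 0:
        raise ValueError("N must be a multiple of VEC")
    n_vecs = N // VEC

    # All accumulators stay live for the whole K loop. On real hardware
    # these would be vector registers, which is why the tile sizes are
    # chosen to fill (but not spill) the register file.
    acc = [[[0.0] * VEC for _ in range(n_vecs)] for _ in range(M)]

    for k in range(K):
        # Load each vector of row k of B once and reuse it for every row of A.
        b_vecs = [B[k][j * VEC:(j + 1) * VEC] for j in range(n_vecs)]
        for i in range(M):
            a_bc = broadcast(A[i][k])  # one scalar of A, splat across lanes
            for j in range(n_vecs):
                acc[i][j] = fma(a_bc, b_vecs[j], acc[i][j])

    # Flatten the accumulator tile back into M rows of N elements.
    return [[x for vec in row for x in vec] for row in acc]
```

For example, `microkernel([[1, 2], [3, 4]], [[1, 0, 2, 0], [0, 1, 0, 2]])` yields `[[1.0, 2.0, 2.0, 4.0], [3.0, 4.0, 6.0, 8.0]]`. The point of the pattern is that each loaded vector of B and each broadcast of A feeds several FMAs, so the ratio of arithmetic to loads grows with the tile size.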

I do not believe we should have such a low-level dialect in the first place (as @Groverkss said, this would be akin to ROCm); it does not make sense, as you also pointed out. We know that, and agree with the sentiment. So, we expose the problems in the current design and come up with a plan to refactor it.

My preference is to not need the `intr` namespace and match lowering directly based on types. But refactoring the dialect now, with partial information, would likely mean having to do it again a month from now. So, we continue with the current warts, then form a plan to remove them.

> My comment is mostly giving visibility to some concerns that were brought to my attention about the current state of some of these dialects. It's great to hear that there is a plan to improve this. I'll be happy to help to the extent possible :)

Awesome! The intention was to grab the attention of people who care, so we can have an upstream design that works across usages, not one particular to our own. We have two or three more to go and we'll have all we need for a transform that can convert contractions into efficient micro-kernels at the compiler level. With that example upstream, we can begin to discuss the pros and cons and redesign both `x86vector` and `amx` (and if Arm is game, their dialects, too).

We'll be counting on you to help us!

https://github.com/llvm/llvm-project/pull/124800