[llvm-dev] Complex proposal v3 + roundtable agenda

Fri Nov 13 02:13:27 PST 2020

Hi,

On 11/12/20 7:53 PM, Cameron McInally via llvm-dev wrote:
> On Thu, Nov 12, 2020 at 12:03 PM Florian Hahn via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Hi,
>>
>> There’s growing interest among our users in making better use of dedicated hardware instructions for complex math, and I would like to restart the discussion on the topic. Given that the original thread was started a while ago, apologies if I missed anything already discussed earlier on the list or at the round-table. The original mail is quoted below.
>>
>> In particular, I’m interested in the AArch64 side of things, like using FCMLA [1] for complex multiplications to start with.
>>
>> To get the discussion going, I’d like to share an alternative pitch. Instead of starting with adding complex types, we could start with adding a set of intrinsics that operate on complex values packed into vectors instead.
>>
>> Starting with intrinsics would allow us to bring up the lowering of those intrinsics to target-specific nodes incrementally without having to make substantial changes across the codebase, as adding new types would require. Initially, we could try to match IR patterns that correspond to complex operations late in the pipeline. We can then work on incrementally moving the point where the intrinsics are introduced earlier in the pipeline, as we adapt more passes to deal with them. This way, we won’t have to teach all passes about complex types at once or risk losing out on all the existing combines on the corresponding floating-point operations.
>>
>> I think if we introduce a small set of intrinsics for complex math (like @llvm.complex.multiply) we could use them to improve code generation in key passes like the vectorizers and deliver large improvements to our users fairly quickly. There might be some scenarios which require a dedicated IR type, but I think we can get a long way with a set of specialized intrinsics at a much lower cost. If we later decide that dedicated IR types are needed, replacing the intrinsics should be easy and we will benefit from having already updated various passes to deal with the intrinsics.
>>
>> We took a similar approach when adding matrix support to LLVM and I think that worked out very well in the end. The implementation upstream generates equivalent or better code than our earlier implementation using dedicated IR matrix types, while being simpler and impacting a much smaller area of the codebase.
>>
>> An independent issue to discuss is how to generate complex math intrinsics.
>> As part of the initial bring-up, I’d propose matching the code Clang generates for operations on std::complex<> & co to introduce the complex math intrinsics. This won’t be perfect and will miss cases, but it allows us to deliver initial improvements without requiring extensive updates to existing libraries or frontends. I don’t think either the intrinsic-only or the complex-type variant is inherently more convenient for frontends to emit.
>>
>> To better illustrate what this approach could look like, I put up a set of rough patches that introduce a @llvm.complex.multiply intrinsic (https://reviews.llvm.org/D91347), replace a set of fadd/fsub/fmul instructions with @llvm.complex.multiply (https://reviews.llvm.org/D91353) and lower the intrinsic to FCMLA on AArch64 (https://reviews.llvm.org/D91354). Note that those are just rough proof-of-concept patches.
>>
>> Cheers,
>> Florian
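
For reference, here is a minimal scalar sketch of the arithmetic a complex multiply on packed vectors would stand for, i.e. the fmul/fsub/fadd pattern mentioned above. It is not taken from the patches: the function name complex_multiply_packed and the interleaved (re, im) layout are assumptions made for illustration, and it ignores the special handling of NaN and infinity that a fully conforming complex multiply may need.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference for a packed complex multiply.
// Assumption: both inputs hold interleaved (re, im) pairs, so element 2k is
// the real part and element 2k+1 the imaginary part of the k-th complex value.
std::vector<float> complex_multiply_packed(const std::vector<float> &a,
                                           const std::vector<float> &b) {
  assert(a.size() == b.size() && a.size() % 2 == 0);
  std::vector<float> out(a.size());
  for (std::size_t i = 0; i < a.size(); i += 2) {
    // (ar + i*ai) * (br + i*bi) expands to four multiplies,
    // one subtract (real part) and one add (imaginary part).
    out[i] = a[i] * b[i] - a[i + 1] * b[i + 1];
    out[i + 1] = a[i] * b[i + 1] + a[i + 1] * b[i];
  }
  return out;
}
```

A pattern matcher of the kind described would have to recognize exactly this group of six floating-point operations per element pair before it could replace them with a single intrinsic call.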
> Hi Florian,
>
> The proposed experimental intrinsics are a difficult detour to accept
> for performance reasons. With a complex type, the usual algebraic
> simplifications fall out for free (or close to it). Teaching existing
> optimizations how to handle the new complex intrinsics seems like a
> LOT of unnecessary work.
>
> That said, we recently had this same conversation at Simon Moll's
> native predication sync-up meeting. Simon had some convincing ways to
> work around predicated intrinsic optimization (e.g. the
> PredicatedInstruction class). Maybe we should explore a more
> generalized solution that would cover complex intrinsics too?
The generalized pattern matching in the VP reference patch is not
VP-specific, i.e. it is parameterized in the abstraction. That means we
can lift InstCombine and InstSimplify once on top of that abstraction and
then instantiate that (it's literally a template parameter) for (0)
regular LLVM instructions, (1) constrained fp intrinsics, (2) complex
intrinsics, (3) VP, and hypothetically even (4) constrained/complex/vp
intrinsics.
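
As a rough illustration of the idea only (this is not the code from the VP reference patch; every type and name below is made up for the sketch), a rewrite rule can be written once against a traits parameter and then instantiated for different node flavors:

```cpp
#include <cassert>
#include <string>

// Abstract operation kinds that the shared rule reasons about.
enum class Op { Mul, Const, Other };

// Flavor 0: a toy "regular instruction" node identified by an opcode string.
struct PlainNode {
  std::string opcode;  // "fmul", "const", "var"
  const PlainNode *lhs;
  const PlainNode *rhs;
  double value;        // only meaningful for "const"
};

// Flavor 1: a toy "intrinsic call" node identified by a callee name.
struct CallNode {
  std::string callee;  // made-up names such as "toy.vp.fmul"
  const CallNode *args[2];
  double value;        // only meaningful for "toy.const"
};

// Each flavor supplies a traits struct that maps it onto the abstraction.
struct PlainTraits {
  using Node = PlainNode;
  static Op kind(const Node &n) {
    if (n.opcode == "fmul") return Op::Mul;
    if (n.opcode == "const") return Op::Const;
    return Op::Other;
  }
  static const Node *operand(const Node &n, int i) {
    assert(i == 0 || i == 1);
    return i == 0 ? n.lhs : n.rhs;
  }
  static double constant(const Node &n) { return n.value; }
};

struct CallTraits {
  using Node = CallNode;
  static Op kind(const Node &n) {
    if (n.callee == "toy.vp.fmul") return Op::Mul;
    if (n.callee == "toy.const") return Op::Const;
    return Op::Other;
  }
  static const Node *operand(const Node &n, int i) {
    assert(i == 0 || i == 1);
    return n.args[i];
  }
  static double constant(const Node &n) { return n.value; }
};

// The rule itself is written once: x * 1 -> x (in either operand order).
// It returns the simplified node, or the input node if nothing matched.
template <typename Traits>
const typename Traits::Node *simplifyMulOne(const typename Traits::Node &n) {
  if (Traits::kind(n) != Op::Mul) return &n;
  const auto *l = Traits::operand(n, 0);
  const auto *r = Traits::operand(n, 1);
  if (r && Traits::kind(*r) == Op::Const && Traits::constant(*r) == 1.0) return l;
  if (l && Traits::kind(*l) == Op::Const && Traits::constant(*l) == 1.0) return r;
  return &n;
}
```

Here simplifyMulOne<PlainTraits> and simplifyMulOne<CallTraits> share one rule body; supporting another flavor in this toy setup only means writing another traits struct.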

I'll send out a separate RFC on how that generalized pattern match works
- it's about time we got working on this, since use cases keep piling up.

- Simon
>
> Digressing a bit, have we ever discussed using a branch to develop
> something like complex support? That way we would avoid an
> experimental intrinsic implementation, but also not disturb the
> codebase until the implementation is complete.
>
> -Cameron