[PATCH] D10555: [X86] Replace avx2.pbroadcast intrinsics with native IR.

Tue Aug 18 10:41:11 PDT 2015

Hi all,

> On Aug 17, 2015, at 12:35 PM, Simon Pilgrim via llvm-commits <llvm-commits at lists.llvm.org> wrote:
> 
> RKSimon added a comment.
> 
> In https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D10555-23225988&d=BQIGaQ&c=eEvniauFctOgLOKGJOplqw&r=Kar_gr4lUTX9iRgFLHyIPCR7VD3VlB3-02h_Q5v9oWk&m=UMYXsw4HORZbnm0C8rRLWxGb7kLk4gKnpOVQB3o_65c&s=xAskr7HRDjZzxFaprjSR49_YDil6Xc7Ad8SeAc5KdNk&e= , @chandlerc wrote:
> 
>> FWIW, I really like this patch. Is there anything we can do to make this work?
> 
> 
> It appears we have a few things that need to be decided before going any further:
> 
> 1 - When is it permitable to replace a (sub)target-specific intrinsic with a non-specific implementation in the headers (e.g. using __builtin_shufflevector for these broadcasts)?

I would say as often as possible. The fewer intrinsics we have the better!

> 
> As long as the expected instruction remains in debug code I'm keen for this to be encouraged - we can add suitable tests, remove those builtin intrinsics to AutoUpgrade.cpp until 4.0 and get much cleaner headers.

To me the debug code problem is a false constraint. Although I am also keen to have the debug experience as close as possible to what the users wrote, they must be aware that intrinsics can and will be replaced by the compiler. Assuming differently means that the users know how intrinsics are represented, which is not a good thing I think.

> 
> 2 - When is it permitable to replace a (sub)target-specific intrinsic in IR/DAG creation, and should that occur in InstCombine or in the target ISel code someplace?
> 
> I'd vote for InstCombine as we already appear to have a critical mass of intrinsics here.

You mean a place where we replace intrinsics by LLVM IR?
If so, I assume this is a proposal to solve the debug vs. IR problem, since those instcombines shouldn’t run at O0. I have to admit I do not like that because it means we still have to support the intrinsics in codegen when the instcombine does not run. Moreover, it sounds artificial to have a pass to expose the IR, whereas the front-end could have done it in the first place.

> 
> 3 - What are we going to do to fix the issue introduced by the header refactor removing the target guards, causing a tricky to decipher 'Cannot select: intrinsic %llvm.x86.vcvtps2ph.128' style backend error for intrinsics that are implemented as macros?

I do not get that part. If those are macros, won’t we just generate something correct, though arguably not efficient.
The “Cannot select:” problem is more general to this header guards refactoring and should be addressed separately. E.g., we have the exact same problem with ARM when neon intrinsics are used but the CPU does not support neon.

My 2c.

Cheers,
-Quentin

> 
> A quick+nasty solution would be to add header guards at least around each of those macros.
> 
> 
> https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D10555&d=BQIGaQ&c=eEvniauFctOgLOKGJOplqw&r=Kar_gr4lUTX9iRgFLHyIPCR7VD3VlB3-02h_Q5v9oWk&m=UMYXsw4HORZbnm0C8rRLWxGb7kLk4gKnpOVQB3o_65c&s=WjDNOXqcuqQGomWATsi-879XugITRofSdefnL_fthzY&e= 
> 
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Dcommits&d=BQIGaQ&c=eEvniauFctOgLOKGJOplqw&r=Kar_gr4lUTX9iRgFLHyIPCR7VD3VlB3-02h_Q5v9oWk&m=UMYXsw4HORZbnm0C8rRLWxGb7kLk4gKnpOVQB3o_65c&s=BFu2ZsX0IXH9UejGbdAoOp2GtTOT9J4spPG6hFZZ-e0&e=