[PATCH] D24816: [Target] move reciprocal estimate settings from TargetOptions to TargetLowering

Thu Sep 22 09:45:30 PDT 2016

spatel added a comment.

In https://reviews.llvm.org/D24816#549338, @mehdi_amini wrote:

> > There are independent goals, but I see this is an intermediate step:
>
> > 
>
> > 1. It allows the settings to be different per function.
>
>
> The technical solution to handle this "per instruction" would be totally different though. These attributes would become obsolete as soon as you have the per instruction solution.
>  Now if there is not plan to get something per-instruction  in the near future, why not.

These are all excellent questions. I think I'm going to learn something today. :)

I agree that the 'per instruction' solution would obsolete this. I should have made that clearer in the patch summary, and I will make that clear in the commit message if this patch is committed.

So why don't we just get to the ideal solution right now? It's just not high enough on anyone's priority list (including mine) to get it done immediately. Partly, this is because of an issue that you properly raised and I mentioned in the post-commit thread for r268539:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160104/323154.html

So while I'd like to do 'per instruction', I think it's (much?) more time and work than this. It's worth mentioning that I wouldn't have made this patch a priority if I didn't think that all of reciprocal estimate codegen had been issued an existential threat - thanks, Eric! :)

It's possible that I misinterpreted the messages from r268539, but I took the possibility of a PPC revert as a chance that x86 would be the next domino to fall. After reviewing the diffs here, I still don't know why x86 is immune to the LTO-breakage problem if PPC is not. Neither appears to be directly predicating codegen on a CPU model, just CPU subtarget features.

FWIW, I think the AArch patch absolutely needs to be re-worked so that it's not checking 'isExynosM1()'. This goes back to Eric's last comment and I think is your primary concern: -mrecip was not intended to be a subtarget-based tuning parameter. It's a programmer-based optimization hint that says, "I want to make a speed/accuracy trade-off for these particular FP operations using these parameters to tune the codegen." At least, that's my understanding: it's a refinement of -ffast-math.

> What happens during inlining with this patch?

[spends some quality time in the debugger because I've never looked at how the inliner works]
The caller's attributes are applied to the inlined code. My initial take is that this is legal, but not ideal (ie, all of -mrecip is supposed to be gated by fast/unsafe math). functionsHaveCompatibleAttributes() or something in the cost model might need updating to improve this?

https://reviews.llvm.org/D24816