[PATCH] D21033: Add runtime support for __cpu_model (__builtin_cpu_supports)

Wed Jun 8 15:16:52 PDT 2016

On Wed, Jun 08, 2016 at 08:49:17PM +0000, Eric Christopher via llvm-commits wrote:
> On Tue, Jun 7, 2016 at 3:47 PM Joerg Sonnenberger <joerg at bec.de> wrote:
> >   if (predict_false(!initialised)) {
> >      set cpuid fields;
> >      initialised = 1;
> >   }
> >   use cpuid fields
> >
> > Runtime impact is minimal, at most a barrier before initialised=1 should
> > be needed on all sane architectures. Running the init code twice is fine
> > as it computes the same data again.
> >
> 
> This won't work for any of the ifunc implementations that use this feature
> for dynamic load time resolution as I was alluding to above.

Yes, it does. It is even more important for ifunc to avoid ctors, as
there are nasty chances for order bugs otherwise. GCC tries to even
accomodate somewhat for this by providing an explicit
__builtin_cpu_init, but I count that as one more bug of the interface.

> > > > (2) For the purpose of `__builtin_cpu_supports`, only the flags are
> > > > needed. That means 90% of the code complexity here is unnecessary.
> > >
> > > This isn't just for __builtin_cpu_supports. That's just what Alina
> > needed here.
> >
> > That kind of proves my point thought. It is a lot of code for a very
> > questionable feature.
> >
>
> I don't follow. Could you elaborate.

CPU names have become practically useless for anything but marketing
recently. Even the family-like names supposed to be available by the GCC
docs are only of limited use when it comes to distingiush performance.
E.g. the difference between desktop, mobile and server versions is
completely ignored.

> > > This is talking about a new interface. While I'm somewhat sympathetic
> > > to your goals, having a compiler-rt that's a drop in replacement for
> > > libgcc unfortunately needs this feature set to be implemented.
> >
> > If you just want ot be compatible with GCC to mark some checkbox, you
> > can just return 0. That's a perfectly valid implementation. I consider
> > this interface bad enough to go as far as saying that it might be the
> > only useful implementation choice...
> >
> >
> No, no it's not a valid implementation. It needs to work to be implemented.

For __builtin_cpu_supports, I am willing to somewhat compromise. That's
still easy enough to map directly to feature flags without a lot of
bloat. __builtin_cpu_is is a different beast...

> > > Also see arguments against implementing ifunc in the first place versus
> > > just a (possibly hidden) global that's set at program start time.
> >
> > I'm somewhat reluctant about ifunc as well, but primarily because the
> > tooling support is stupidly inconsistent. It's a lovely example of
> > static linking becoming a third class citzen in GNUland. That said, I
> > never said the *builtins* shouldn't be recognized. Just that it makes a
> > lot more sense to keep all the string handling out of a true low-level
> > library and making it much more future proof at the same time.
> >
> >
> You keep conflating your desire for a better interface for the builtins to
> the compiler-rt support for the builtins that exist. I understand you don't
> like the current interface, but given how long they've been implemented and
> the fairly small overall impact to compiler-rt and maintainability I don't
> see how you don't want to support them. If you really don't like having
> runtime support for them then we can easily provide a way for you to
> compile them out (though I'd suggest you also add patches to clang to nix
> the builtin unless you want to use glibc).

So far I have literally not seen a single user. Four years isn't even
that long. Please look again at the code for supporting
__builtin_cpu_is. Now look at the implementation of the software
floating point. The former is in the same order of code size as the
latter. Now look at how often new CPU generations appear and yes, it
becomes a maintainance burden. Especially because this API is designed
to not allow any error handling. As I said, just returning 0 is a
perfectly valid implementation. Not the most useful one for sure, but
certainly valid.

Joerg