[cfe-dev] time of inline assembler evaluation in template specialization

Fri Jun 24 01:09:17 PDT 2011

On 23.06.2011 at 22:11, Douglas Gregor wrote:
> On Jun 23, 2011, at 12:19 PM, Titus von Boxberg wrote:
>> 
>> On 23.06.2011 at 18:45, John McCall wrote:
>> 
>>> On Jun 22, 2011, at 11:52 PM, Titus von Boxberg wrote:
>>>> I tried to use template specialization for selecting
>>>> the right inline assembly code.
>>>> 
>>>> Example:
>>>> -----------------
>>>> template<int CPU>
>>>> struct A {
>>>> };
>>>> 
>>>> template<>
>>>> struct A<8> {
>>>> void f(void) {
>>>> asm("ldr r4,=1\n\t" ::: "r4");
>>>> }
>>>> };
>>>> 
>>>> template<>
>>>> struct A<1> {
>>>> void f(void) {
>>>> asm("movl $1, %%eax\n\t" ::: "eax");
>>>> }
>>>> };
>>>> 
>>>> 
>>>> int main(void)
>>>> {
>>>> A<1>	a;
>>>> a.f();
>>>> }
>>>> ---------------
>>>> clang says:
>>>> 
>>>> error: unknown register name 'r4' in asm
>>>> asm("ldr r4,=1\n\t" ::: "r4");
>>>> 
>>>> g++ compiles the code, both for arm (A<8>) and x86.
>>>> 
>>>> Even if the register name r4 in the clobber list would really
>>>> be invalid, I'd expect that the compiler would only analyze
>>>> the syntactical correctness of the asm statement itself while
>>>> parsing the template.
>>>> Otherwise this application of template specialization cannot work.
>>>> 
>>>> Is this assumption correct?
>>> 
>>> No.  A program is ill-formed, no diagnostic required, if it contains
>>> a template specialization with no valid instantiations.  You cannot
>>> rely on (non-dependent) code in dead template specializations
>>> to not be checked for validity.
>> 
>> Clear, but elegantly bypassing my question ;-)
>> What is - in this context - validity, then?
>> Why is it not the mere syntactical correctness of the asm statement?
>> asm statements are a nonstandard extension, anyway; so clang is free
>> to do it this or the other way.
>> If clang would handle it like gcc the combination of template specialization
>> and assembly language would work; and I'd recommend to follow gcc
>> here since there is no other standard.
>> 
>> Since you cut away my second question:
>> clang already behaves inconsistently (from the user perspective)
>> because it analyzes the register lists but not the statement itself.
>> That this might be (don't know if it can be with clang) explained
>> by the different stages of the compiler / assembler is no excuse.
>> 
>> And this behaviour is also present when the assembly code is dependent on
>> some type parameter.
>> 
>> What do you think?
> 
> 
> Diagnosing ill-formed code within a template definition is a quality-of-implementation issue. Clang's philosophy is to diagnose everything that is allowed and reasonable. It's certainly reasonable to check the constraints, but our architecture makes it a bit hard to check the actual statement. That's a QoI issue that could conceivably be fixed in the future.
> 
> Besides, in the example you give, we're not even talking about a template definition: this is actual code that presumably only works in GCC because one of the inline functions isn't ever emitted. I don't think it makes any sense whatsoever to accept this code.
> 
Sorry to insist, and sorry if I was unclear.
There is no "ill-formed", "invalid", or otherwise bad C++ code in my example.
It's like any other template instantiation / specialization:
it only works because the right (specialized) template code gets selected and emitted.

There is no standard for inline asm, so doing it like gcc (which is
what http://clang.llvm.org/compatibility.html#inline-asm claims to be clang's intention)
might be a good idea.
gcc checks the syntactic correctness of the asm
statement while parsing the template code, and analyzes the content of
the asm statement during template instantiation / code generation
(the assembly text itself is not analyzed by gcc at all; it is passed through to gas).
With gcc you can write "r4", "blah" or "eax" in the register lists as long as e.g.
the input / output lists are well formed (which can be checked at the "C++ code level").
That the register lists take the form of string constants already points to
the different levels of analysis. The register names are not special words
at the "C++ code level" (sorry, I have no better phrase),
yet that is what I feel clang is erroneously trying to enforce.

All I want to say is:
Doing it the gcc way is what I'd suggest for clang. And you already said that it's
"a bit hard" to make clang consistent in the other direction, i.e. to analyze
the complete content of the asm statement during (template) parsing,
before code actually gets emitted.

The reason to accept this code is that one gains the option of
using template specialization, instead of having to rely on the preprocessor,
to select the right (assembly) code, which is perfectly reasonable.
The software I'm working on contains quite a bit of platform-specific
(specialized) template code that gets selected based on CPU, OS, and the like.
Being able to use assembly inside template specializations surely isn't
an everyday use case, but it definitely can make "sense".
In the case that prompted my question, it's a specialized lock that also
works on platforms that lack std::atomic_flag (e.g. MacOS/clang ;-)
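
For comparison, here is a minimal sketch of the workaround that clang's
current behaviour forces: each specialization is additionally guarded by
the preprocessor, so only the asm for the target being compiled is ever
parsed. It assumes a GCC-compatible compiler (extended asm, and the
predefined macros __i386__, __x86_64__, __arm__). The names CPU_X86,
CPU_ARM, CPU_GENERIC, NATIVE_CPU and load_one are hypothetical and only
serve the illustration; the values 1 and 8 mirror A<1> and A<8> above.

```cpp
// Hypothetical CPU tags; 1 and 8 mirror the A<1> / A<8> example.
const int CPU_GENERIC = 0;
const int CPU_X86     = 1;
const int CPU_ARM     = 8;

// Primary template intentionally left undefined: using an unsupported
// CPU tag is a compile-time error.
template<int CPU>
struct A;

#if defined(__i386__) || defined(__x86_64__)
// Only parsed when targeting x86, so the asm is always valid here.
template<>
struct A<CPU_X86> {
    int f() {
        int v;
        asm("movl $1, %0" : "=r"(v));   // AT&T syntax: v = 1
        return v;
    }
};
const int NATIVE_CPU = CPU_X86;

#elif defined(__arm__)
// Only parsed when targeting 32-bit ARM.
template<>
struct A<CPU_ARM> {
    int f() {
        int v;
        asm("mov %0, #1" : "=r"(v));    // v = 1
        return v;
    }
};
const int NATIVE_CPU = CPU_ARM;

#else
// Portable fallback without any assembly.
template<>
struct A<CPU_GENERIC> {
    int f() { return 1; }
};
const int NATIVE_CPU = CPU_GENERIC;
#endif

// Callers stay free of #ifdefs: the specialization is picked by tag.
int load_one() {
    A<NATIVE_CPU> a;
    return a.f();
}
```

This keeps the call sites clean, but it is exactly the dependence on the
preprocessor that I would like to avoid: with the gcc behaviour the
#if / #elif guards around the specializations would not be necessary.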

Maybe it would help to look at clang's code?
When time permits, I'll try to figure out what would be needed
to make clang gcc-compatible.

Regards
Titus