[cfe-dev] source code rewriting for invalid ast nodes

Manuel Klimek klimek at google.com
Thu Oct 9 06:01:28 PDT 2014

On Thu Oct 09 2014 at 2:57:14 PM Marc Greim <marc.greim at mytum.de> wrote:

> First of all thanks for your patience and help with my problem.
> You are right, I cannot change the code generator (and it may possibly
> even be exchanged for another generator, but that is a future problem).
> Your solution seems right to me in the simple case of the example.
> But I think it fails for me due to the complexity of the generated code
> for 2 reasons:
>         1. The generated code uses macro defines to declare variables. I
> only change some defines, but I don't know exactly which variables are
> affected. (Still, this may be solvable by changing those defines, recording
> the variables, changing them back and recording the uses, or by recording
> macro expansion operations for those cases.)

If you know the locations where you change the defines, you can figure out
which variables are affected. Clang has all the information about macro
expansion stored in the AST / the source-manager.

>         2. There may be the case that x is not used directly but together
> with operations (or worse, a function call). The resulting type in this case
> would require that I deduce the type of the operator[] parameter for a
> statement under the condition of a changed type of x.

Yep. I'd first look at how many of those there are; often, even in >100 MLOC
code bases, there are only a handful of corner cases.
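
To make the corner case concrete, here is a minimal sketch of what the rewrite target could look like. Everything in it is an assumption for illustration: the wrapper X (with a made-up raw() accessor and operator+) stands in for the class from the thread, and myfunc is the helper name used earlier. Overloading the helper for both int and X means that 'myfunc(array, expr)' stays valid whether 'expr' ends up as an int or as an X.

```cpp
#include <cassert>

// Hypothetical stand-in for the thread's class X: wraps an int but has no
// operator int(), so a plain `array[x]` does not compile.
class X {
public:
    explicit X(int v) : value_(v) {}
    int raw() const { return value_; }  // explicit access only (assumed name)
    X operator+(int rhs) const { return X(value_ + rhs); }
private:
    int value_;
};

// Hypothetical helper the rewrite targets. One overload per index type.
template <typename T>
T& myfunc(T* array, int index) {
    assert(index >= 0);
    return array[index];
}

template <typename T>
T& myfunc(T* array, const X& index) {
    // Simulation-specific bookkeeping for X indices would go here.
    return myfunc(array, index.raw());
}
```

With 'int data[4]' and 'X x(1)', both 'myfunc(data, x + 1)' and 'myfunc(data, 3)' compile, so the rewrite does not need to know the exact type of the expression inside the brackets, only where the brackets are.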

> I think I will give that solution a try, but it seems complex to implement
> and very limited in its ability to accept variations in the generated code.

It seems to me that it allows you to be much more specific in addressing
what you need.

> My hope is still that there is a "keep the node in the ast" solution for
> clang that would allow me to rewrite such a statement based on the
> parameter type without caring about the exact statement within the brackets.

But then the question is how can you be sure that it works afterwards? For
that it seems like you'd need to know the exact information you would get
when analyzing the unchanged code first (which is why I suggested that).
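
As a rough sketch of the "analyze first, then change everything at once" idea: record each edit as an offset/length/replacement triple against the unchanged source, then apply them all in one pass. The Edit struct and applyEdits function below are hypothetical names for illustration, not Clang API; in a real tool Clang's Rewriter or tooling::Replacement would play this role.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One recorded edit: replace `length` bytes at `offset` with `text`.
// Offsets always refer to the original, still-parsing source.
struct Edit {
    std::size_t offset;
    std::size_t length;
    std::string text;
};

// Apply all edits in a single pass. Edits are applied back to front so that
// the offsets of earlier edits are not shifted by later replacements.
std::string applyEdits(std::string source, std::vector<Edit> edits) {
    std::sort(edits.begin(), edits.end(),
              [](const Edit& a, const Edit& b) { return a.offset > b.offset; });
    std::size_t limit = source.size();
    for (const Edit& e : edits) {
        assert(e.offset + e.length <= limit);  // in bounds and non-overlapping
        source.replace(e.offset, e.length, e.text);
        limit = e.offset;
    }
    return source;
}
```

For the source 'int x; int i = array[x];', the edits {0, 3, "X"} and {15, 8, "myfunc(array, x)"} together yield 'X x; int i = myfunc(array, x);', and the intermediate state with only one of them applied never has to be parsed.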

> Greetings,
> Marc
> On Thu, Oct 9, 2014 at 2:27 PM, Manuel Klimek <klimek at google.com> wrote:
>> I assume you cannot change the code generator?
>> Why can't you:
>> 1. generate the code; parse it with the current version (having the 'int
>> x')
>> 2. find all 'int x's you want to change to 'X x'; also find all uses of
>> them (including uses in array[x]); output all this information in some
>> format
>> 3. run over all those cases; now you can change 'int x' to 'X x' and
>> 'array[x]' to 'myfunc(array, x)' at the same time
>> 4. reap benefits; codebase is never in a non-parsing state
>> Cheers,
>> /Manuel
>> On Thu Oct 09 2014 at 2:10:51 PM Marc Greim <marc.greim at mytum.de> wrote:
>>> In this particular example "array[x]" was generated externally while I
>>> changed the declaration "int x;" to "X x;". That is the point where I need
>>> to patch the generated code in order to fix the missing operator[] error
>>> for type X and allow other simulation-relevant operations.
>>> On Thu, Oct 9, 2014 at 2:04 PM, Manuel Klimek <klimek at google.com> wrote:
>>>> On Thu Oct 09 2014 at 1:59:13 PM Marc Greim <marc.greim at mytum.de>
>>>> wrote:
>>>>> In general I would also say that doing code transformation should only
>>>>> be done on valid code since one needs to know what actually happens. This
>>>>> particular problem is unfortunately rather specific. Sorry if the given
>>>>> example was not sufficient.
>>>>> Maybe the problem becomes clearer when class X is defined as wrapper
>>>>> for int with special functionality.
>>>>> The code that I try to transform is part of a hardware simulation
>>>>> written/generated in c++. The code is guaranteed to be valid except for the
>>>>> partial substitution of int variables by X variables. The resulting invalid
>>>>> code statements are only invalid in the sense that a parameter type is not
>>>>> right. Due to the complexity of the code it is not feasible/possible to
>>>>> predict where type X and where int is used for such operations. Since some
>>>>> operators (e.g. [] for pointers) cannot be overloaded with standard c++
>>>>> code, errors will come up. For that and other reasons it is necessary to
>>>>> rewrite the code to use custom functions instead of the operator itself.
>>>>> This is where my problem with clang lies. Those nodes are removed from the
>>>>> AST due to the missing operator[]/missing type conversion. But those are the
>>>>> nodes I need to preserve in order to run a matcher and transform the code
>>>>> as needed for the simulation. Again, as noted before, adding operator int()
>>>>> to class X is not a solution since that would create many ambiguity
>>>>> problems.
>>>>> So to boil the problem down further: how can I force clang to ignore
>>>>> wrong types that are passed to operators/functions and build the AST with
>>>>> such nodes?
>>>>> Again, I suspect that adding built-in operators for those cases is the
>>>>> way to go, but I don't know how to iterate over all types and then create
>>>>> empty dummy functions for that.
>>>>> I hope this describes my problem sufficiently.
>>>> I still don't fully understand what the current situation is.
>>>> So you have code that calls array[x] with a class type x? How did you
>>>> produce that code?
>>>>> On Thu, Oct 9, 2014 at 12:53 PM, Manuel Klimek <klimek at google.com>
>>>>> wrote:
>>>>>> The general advice is to only do code transformations on valid code.
>>>>>> I don't know enough about your problem to understand why that is not
>>>>>> possible.
>>>>>> On Tue Oct 07 2014 at 4:37:52 PM Marc Greim <marc.greim at mytum.de>
>>>>>> wrote:
>>>>>>> Hello,
>>>>>>> I'm using clang to rewrite (generated) source code. Usually that can
>>>>>>> be done by running matchers on the ast in combination with a rewriter
>>>>>>> instance.
>>>>>>> Unfortunately this fails in the case of operator[] where the index
>>>>>>> argument cannot be converted to a valid type (e.g. class X{} x; int *
>>>>>>> array; int i = array[x]; ). The corresponding ast nodes are missing since
>>>>>>> the "array[x]" statement has no valid representation.
>>>>>>> How can I detect and rewrite the code in cases like the above example?
>>>>>>> "array[x]" -> "someFunc(array,x)"
>>>>>>> ExternalSemaSource::CorrectTypo doesn't get called in this case
>>>>>>> (maybe because all tokens are valid?), so that attempt failed.
>>>>>>> Using ExternalSemaSource::LookupUnqualified also failed so far,
>>>>>>> because I haven't found a method to get a valid SourceRange for that code
>>>>>>> part. It would also require manual parsing of that code part, which seems
>>>>>>> like a "dirty" solution to me.
>>>>>>> The only idea that I have left is to declare builtin operators for
>>>>>>> any type with Sema::AddBuiltinCandidate, but that may result in many
>>>>>>> operator definitions. Also I have no idea how to iterate over all types and
>>>>>>> how to declare these functions. However this may be the best solution,
>>>>>>> because then matchers can be used to find and rewrite those code parts.
>>>>>>> I would appreciate any help to find a solution.
>>>>>>> Regards,
>>>>>>> Marc
>>>>>>> P.S. I am aware that adding operator int() to class X of the above
>>>>>>> example would allow those statements but that is not an option, since X
>>>>>>> cannot be represented as int in my case and additional operations need to
>>>>>>> be performed; such an operator may also mess up other parts of the code.
>>>>>>> _______________________________________________
>>>>>>> cfe-dev mailing list
>>>>>>> cfe-dev at cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev