[cfe-dev] source code rewriting for invalid ast nodes

Thu Oct 9 06:54:19 PDT 2014

>
>
>
>>         2. There may be cases where x is not used directly but together
>> with operations (or, worse, a function call). The resulting type in this case
>> would require that I deduce the type of the operator[] parameter for a
>> statement under the condition of a changed type of x.
>>
>
> Yep. I'd first look at how many of those there are; often, even in > 100MLOC
> code bases, there are only a handful of corner cases.
>

You are right; after looking through my generated code, it doesn't seem to
be a problem.

>> My hope is still that there is a "keep the node in the ast" solution for
>> clang that would allow me to rewrite such a statement based on the
>> parameter type without caring about the exact statement within the brackets.
>>
>
> But then the question is how can you be sure that it works afterwards? For
> that it seems like you'd need to know the exact information you would get
> when analyzing the unchanged code first (which is why I suggested that).
>

That is why I thought that I might force clang to keep those nodes by
declaring empty dummy builtin operator overloads for any pointer type,
which I could later replace with the help of the AST. That way I thought I
could be sure that the source code transformations are valid. Basically, the
idea was to use internal clang methods to define something along the lines
of:
template <typename T>
T & operator[](T * array, const X & index){...}
which unfortunately is not a valid C++ declaration, since operator[] can
only be declared as a member function of a class. Then again, I lack the
knowledge of clang's inner workings to do that, and the idea may be way
more complicated or impractical than I thought.

Again, thanks for your help. Your solution should be more practical than
trying to handle every theoretical case in a general manner, as I first
intended.

Regards,
Marc Greim

On Thu, Oct 9, 2014 at 3:01 PM, Manuel Klimek <klimek at google.com> wrote:

>
>
> On Thu Oct 09 2014 at 2:57:14 PM Marc Greim <marc.greim at mytum.de> wrote:
>
>> First of all thanks for your patience and help with my problem.
>>
>> You are right, I cannot change the code generator (and it may even be
>> exchanged for another generator, but that is a future problem).
>>
>> Your solution seems right to me in the simple case of the example.
>> But I think it fails for me due to the complexity of the generated code
>> for 2 reasons:
>>         1. The generated code uses macro defines to declare variables. I
>> only change some defines, but I don't know exactly which variables are
>> affected. (This may still be solvable by changing those defines, recording
>> the variables, changing them back and recording their uses, or by recording
>> macro expansions for those cases.)
>>
>
> If you know the locations where you change the defines, you can figure out
> which variables are affected. Clang has all the information about macro
> expansion stored in the AST / the source-manager.
>
>
>>         2. There may be cases where x is not used directly but together
>> with operations (or, worse, a function call). The resulting type in this case
>> would require that I deduce the type of the operator[] parameter for a
>> statement under the condition of a changed type of x.
>>
>
> Yep. I'd first look at how many of those there are; often, even in > 100MLOC
> code bases, there are only a handful of corner cases.
>
>
>> I think I will give that solution a try, but it seems complex to
>> implement and very limited in its ability to accept variations in the
>> generated code.
>>
>
> It seems to me that it allows you to be much more specific in addressing
> what you need.
>
>
>> My hope is still that there is a "keep the node in the ast" solution for
>> clang that would allow me to rewrite such a statement based on the
>> parameter type without caring about the exact statement within the brackets.
>>
>
> But then the question is how can you be sure that it works afterwards? For
> that it seems like you'd need to know the exact information you would get
> when analyzing the unchanged code first (which is why I suggested that).
>
>
>>
>> Greetings,
>> Marc
>>
>>
>> On Thu, Oct 9, 2014 at 2:27 PM, Manuel Klimek <klimek at google.com> wrote:
>>
>>> I assume you cannot change the code generator?
>>>
>>> Why can't you:
>>> 1. generate the code; parse it with the current version (having the 'int
>>> x')
>>> 2. find all 'int x's you want to change to 'X x;'; also find all uses of
>>> them (including uses in array[x]); output all this information in some
>>> format
>>> 3. run over all those cases; now you can change 'int x' to 'X x' and
>>> 'array[x]' to 'myfunc(array, x)' at the same time
>>> 4. reap benefits; codebase is never in a non-parsing state
>>>
>>> Cheers,
>>> /Manuel
>>>
>>> On Thu Oct 09 2014 at 2:10:51 PM Marc Greim <marc.greim at mytum.de> wrote:
>>>
>>>> In this particular example "array[x]" was generated externally while I
>>>> changed the declaration "int x;" to "X x;". That is the point where I need
>>>> to patch the generated code in order to fix the missing operator[] error
>>>> for type X and allow other simulation relevant operations.
>>>>
>>>> On Thu, Oct 9, 2014 at 2:04 PM, Manuel Klimek <klimek at google.com>
>>>> wrote:
>>>>
>>>>> On Thu Oct 09 2014 at 1:59:13 PM Marc Greim <marc.greim at mytum.de>
>>>>> wrote:
>>>>>
>>>>>> In general I would also say that code transformations should
>>>>>> only be done on valid code, since one needs to know what actually happens.
>>>>>> This particular problem is unfortunately rather specific. Sorry if the
>>>>>> given example was not sufficient.
>>>>>>
>>>>>> Maybe the problem becomes clearer when class X is defined as a wrapper
>>>>>> for int with special functionality.
>>>>>>
>>>>>> The code that I try to transform is part of a hardware simulation
>>>>>> written/generated in C++. The code is guaranteed to be valid except for the
>>>>>> partial substitution of int variables by X variables. The resulting invalid
>>>>>> statements are only invalid in the sense that a parameter type is not
>>>>>> right. Due to the complexity of the code it is not feasible to
>>>>>> predict where type X and where int is used for such operations. Since some
>>>>>> operators (e.g. [] for pointers) cannot be overloaded in standard C++,
>>>>>> errors will come up. For that and other reasons it is necessary to
>>>>>> rewrite the code to use custom functions instead of the operator itself.
>>>>>> This is where my problem with clang lies: those nodes are removed from the
>>>>>> AST due to the missing operator[]/missing type conversion, but those are the
>>>>>> nodes I need to preserve in order to run a matcher and transform the code
>>>>>> as needed for the simulation. Again, as noted before, adding operator int()
>>>>>> to class X is not a solution, since that would create many ambiguity
>>>>>> problems.
>>>>>>
>>>>>>
>>>>>> So to boil the problem further down: How can I force clang to ignore
>>>>>> wrong types that are passed to operators/functions and build the AST with
>>>>>> such nodes?
>>>>>>
>>>>>> Again, I suspect that adding built-in operators for those cases is
>>>>>> the way to go, but I don't know how to iterate over all types and then
>>>>>> create empty dummy functions for that.
>>>>>>
>>>>>> I hope this describes my problem sufficiently.
>>>>>>
>>>>>
>>>>> I still don't fully understand what the current situation is.
>>>>> So you have code that calls array[x] with a class type x? How did you
>>>>> produce that code?
>>>>>
>>>>>
>>>>>>
>>>>>> On Thu, Oct 9, 2014 at 12:53 PM, Manuel Klimek <klimek at google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> The general advice is to only do code transformations on valid code.
>>>>>>> I don't know enough about your problem to understand why that is not
>>>>>>> possible.
>>>>>>>
>>>>>>> On Tue Oct 07 2014 at 4:37:52 PM Marc Greim <marc.greim at mytum.de>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I'm using clang to rewrite (generated) source code. Usually that
>>>>>>>> can be done by running matchers on the ast in combination with a rewriter
>>>>>>>> instance.
>>>>>>>>
>>>>>>>> Unfortunately this fails in the case of operator[] where the index
>>>>>>>> argument cannot be converted to a valid type (e.g. class X{} x; int *
>>>>>>>> array; int i = array[x]; ). The corresponding ast nodes are missing since
>>>>>>>> the "array[x]" statement has no valid representation.
>>>>>>>>
>>>>>>>> How can I detect and rewrite the code in cases like above example?
>>>>>>>> "array[x]" -> "someFunc(array,x)"
>>>>>>>>
>>>>>>>> ExternalSemaSource::CorrectTypo doesn't get called in this case
>>>>>>>> (maybe because all tokens are valid?), so that attempt failed.
>>>>>>>>
>>>>>>>> Using ExternalSemaSource::LookupUnqualified also failed so far,
>>>>>>>> because I haven't found a method to get a valid SourceRange for that code
>>>>>>>> part. It would also require manual parsing of that code part, which seems
>>>>>>>> like a "dirty" solution to me.
>>>>>>>>
>>>>>>>> The only idea that I have left is to declare builtin operators for
>>>>>>>> any type with Sema::AddBuiltinCandidate, but that may result in many
>>>>>>>> operator definitions. Also I have no idea how to iterate over all types and
>>>>>>>> how to declare these functions. However this may be the best solution,
>>>>>>>> because then matchers can be used to find and rewrite those code parts.
>>>>>>>>
>>>>>>>> I would appreciate any help to find a solution.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Marc
>>>>>>>>
>>>>>>>>
>>>>>>>> P.S. I am aware that adding operator int() to class X of above
>>>>>>>> example would allow those statements but that is not an option, since X
>>>>>>>> cannot be represented as int in my case and additional operations need to
>>>>>>>> be performed; such an operator may also mess up other parts of the code.
>>>>>>>> _______________________________________________
>>>>>>>> cfe-dev mailing list
>>>>>>>> cfe-dev at cs.uiuc.edu
>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>