[LLVMdev] ISRs for PIC16 [was [llvm]r79631 ...]

Mon Aug 24 21:24:39 PDT 2009

Hi Ali,

Thanks for bringing this up. You're definitely under very tight design  
constraints from the hardware. I can certainly sympathize.

I think two design elements are being conflated here, and it would be  
worthwhile splitting them out. For correctness, you need to make sure  
any routines called from an ISR don't clobber equivalent routines  
called from mainline code. For efficiency, you want to overlay the  
static stack frames of functions as much as possible when you can  
prove those frames do not interfere. I believe you can solve these  
problems orthogonally with a bit of fiddling.

Parallel call trees for the ISR and the mainline have the following  
primary components:
*  Constructing the parallel trees themselves such that there's no  
overlap
*  Keeping things straight in the presence of user-written assembly
*  Indirect calls (function pointers)

To straightforwardly create parallel call trees, one for the mainline  
and one for the ISR, I believe it will be best to create two versions  
of every function that's compiled very early in the process, before  
any lowering of frames to static addresses or anything of that sort is  
done. The ISR-tree functions should be flagged as such and have their  
names mangled to reflect. Any call instructions in the ISR-tree  
function should be modified to call the equivalent ISR-tree function  
of the callee. Thus, with no further analysis, we have disjoint call  
trees for mainline code and ISR code. We also have a bunch of extra  
functions laying around, but they are unreachable, so existing  
optimizations should get rid of them for us.

When it comes time to allocate static memory space for the call  
frames, we simply have to do it twice. Once for the mainline  
functions, and once for the ISR functions. We can assert that nothing  
can overlap between the trees since we know that the trees are  
disjoint except via the ISR itself. Thus, we don't overlay anything  
from one tree with anything from the other tree. Varargs and should  
come along for the ride without any special handling since the  
functions are distinct well before those are lowered to reference  
static buffers.

User assembly throws a big wrench into the works, but it's not  
irreconcilable. I suggest requiring the user to flag assembly  
functions as being either for mainline or for ISR code. Doing so via a  
naming convention would likely tie in the easiest with the function  
cloning done above. With the convention being sufficiently magic  
(read: not legal C identifiers so as to avoid collisions), the  
assembler can issue diagnostics if a call instruction is seen that  
references a function not in the proper tree. That is, the  
disjointness of the trees will be enforceable even in user assembly  
with a bit of cooperation from the assembler.

Function pointers are where things get fun. To do these, we need to  
determine at run time whether we need to call the ISR or the mainline  
version of a function, and since the same pointer can be dereferenced  
from both trees, and we're not guaranteed the pointer assignment will  
take place in the same tree, we can't rely on static analysis to  
figure out which of the two versions to reference. We need to defer  
'til run time, and do so in a way that's not too horribly inefficient.  
We would likewise prefer not to have function pointers become  
unmanageably large, since they already essentially need to carry  
around a pointer to the function itself and a pointer to the argument  
frame for that function. Doubling that is not reasonable. If we use  
descriptor handles, however, we can reduce the load of what we're  
passing around at runtime to a single value. We build a list of  
functions whose addresses are taken, and create a descriptor table for  
them. Eventually, the descriptor handle will be relocated to the  
actual address of the proper half of the descriptor (since the  
invocation point knows at compile time which tree it's in).

Conceptually, something like this. Depending on what level of program  
memory access you have (sorry, hazy memory about what the enhanced  
PIC16 was or wasn't going to do about that), a trickier actual  
implementation may be better.
struct fptr_descriptor {
   void (*main_tree_fn)();
   void *main_tree_args;
   void (*isr_tree_fn)();
   void *isr_tree_args;
};

Each translation unit builds up a table of these, with function  
pointers being a relocation indicating the appropriate slot in the  
table. For global unreachable function removal to work, we just need  
to consider all functions in this table as being called (both  
versions). Good analysis should enable even better results, but the  
simple solution should still be pretty good.

All of this will work without any consideration of optimizing call  
frame allocation. That can be done completely separately without  
interfering (sorry for the pun) with any of these bits, including the  
function pointers, since we've kept the complete separation of ISR and  
mainline trees.

To summarize, make two copies of each function very early and keep the  
ISR and mainline trees separate all the way through. Unreachable  
function removal will get rid of any copies that aren't needed.  
Function pointers become a descriptor table that reference both  
copies, and the runtime invokes the proper bit based on which tree the  
call point is in.

Best regards,

   Jim

On Aug 24, 2009, at 12:12 PM, Alireza.Moshtaghi at microchip.com wrote:

>
> Up front I apologize for the lengthy email. Since our patch was not
> accepted, Chris asked us to follow up with this issue on llvm-dev. For
> the sake of completeness, let me give a bit of background and the
> problems that we are facing. I tried to capture as much as possible  
> here
> so Please do give us feedback.
> Don't take your stack for granted.... PIC16 has very poor pointer
> handling, hence stack access are very limited and inefficient; so we  
> do
> static analysis of function frame size and statically allocate the
> function frame in memory (including locals, arguments and spills);  
> then,
> to save memory, we overlay the frame of functions that are not on the
> same call branch.
> Consequently, functions are non reentrant. We can sacrifice recursion
> (because of other limitations in pic16 architecture) but still
> reentrancy is needed because ISR thread can call same function that is
> also called from main thread; examples are stdlib functions and
> intrinsic for math (div/mul/float/etc)
> Here is what we did - but was rejected and we would like to know which
> part (if any) we can keep and which part we must change:
> 1- Root ISR function is defined by user. Since the ISR is located at
> certain address in program memory, we used the section attribute of
> function to encode that the function needs to be at a section with  
> fixed
> address (our in-house linker takes care of the details). We did not  
> add
> new interrupt attribute because it seemed redundant because we can
> encode whatever is needed in the section attribute in the front-end.
> 2- Added two passes at the end of llvm-ld. First pass does call graph
> analysis to distinguish ISR thread from main thread and add few more
> pieces of info to section attribute of the function to indicate  
> whether
> it is called from main thread or ISR thread; if called from both, the
> function has to be cloned. To clone, code and data must be cloned. Not
> having stack, and accessing local variables through their symbolic
> reference, we need to also clone the local variables and modify the
> cloned function to use different naming convention for local variables
> (there are other ways to do this also but in practice, that requires  
> two
> different frameIndexes for one function one for locals and the other  
> for
> args and spills). So after this pass, the main thread and ISR thread  
> are
> completely disjoint; this takes care of stdlib functions as well  
> because
> they are defined in .bc or .a library files (there is a caveat on
> function pointers that I'm going to explain later). The second pass  
> does
> simple function coloring to identify which function frames can be
> overlaid. Color info is also added to section attribute.
> 3- Code generation is mostly similar for main vs ISR threads except  
> for
> function interface of root ISR; but the most important problem is the
> intrinsic functions for non-native operations such as mul/div and
> floating point. So while generating code for these calls, we use
> different naming conventions depending on whether we are generating  
> code
> for a function on ISR-thread or main-thread (this information is in  
> the
> section attribute of the function). Two versions of intrinsics exist  
> in
> the pic16 specific library and the linker will use according to naming
> convention.
> 4- There is a bit of cleanup left to be done in AsmPrinter and we are
> good to go.
> 5- Handling of function pointers is not implemented yet. But since we
> don't have dynamic linking, through alias analysis we should be able  
> to
> find targets of each pointer and when taking address of these  
> functions,
> depending on whether it is in isr or main thread, we can resolve the
> problem efficiently. This can be done in a separate pass; the only  
> thing
> that we have to disallow is calling same function pointer from both  
> the
> ISR and main threads.
> 6- Handling of varargs requires that a separate frame be created for
> arguments for each call site with varargs. This is going to be handled
> while codegen and by the overlay algorithm to save memory.
>
> If you have come this far, thank you for being interested in my brain
> dysfunction. And I am really interested to know your input.
>
> -Ali
>
>
>>>> We should discuss this on llvmdev, I think it came up before but
> there
>>>> was no conclusive plan that was proposed.
>>>>
>>> The approach that we thought for PIC16 can be described in a
>>> single line as below.
>>>
>>> "Keep the functions called from ISR and main separate by cloning the
>>> shared ones".
>>>
>>>
>>>> In short, I think that this sort of thing should be modeled as an
>>>> attribute on function definitions, not declarations.  I don't
>>>> understand why you need to clone entire call graphs, but this is
> best
>>>> discussed on llvmdev.
>>>>
>>> It isn't cloning the entire call graph. It's cloning only the shared
>>> ones as mentioned above.
>>>
>>> The information that a function is ISR is encoded in the section
> string
>>> for the lack of a better attribute.
>>>
>>> Plus we also wanted to encode some information to inform codegen
> about
>>> whether we are codegenerating an interrupt line function. Again the
>>> section string is used to encode the same. Probably annotations
> might be
>>> a better place but that does not work currently (and also, accessing
>>> them slower). This information is required so that the codegen can
> emit
>>> calls to appropriate intrinsics.
>>>
>>> So, IMO, the approach is simple, but probably the implementation
> wasn't
>>> clean.
>>>
>>> Ideas welcome.
>>>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev