[llvm-dev] Altering the return address, for a function with multiple return paths

Joan Lluch via llvm-dev llvm-dev at lists.llvm.org
Sun Jul 21 08:34:51 PDT 2019


Hi Tsur

> In high-level languages, optionally returning values of different types is usually handled with
> a tagged union and a switch statement in the caller.

This is not really returning different types, because the value you return is of the union type. The function is still declared as returning a union, not a dynamic type. Union members are reinterpretations of the same memory location (or register content). They tend to be very efficient when you want to look at the raw bit representation, but they are generally discouraged in object-oriented languages unless you really need to go down to the actual memory representation for some reason.
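
To make the point concrete, here is a minimal C++ sketch of the tagged-union pattern under discussion. The names Tag, Value, compute and describe are made up for illustration and do not come from the thread; the constants 7 and 2.5 are arbitrary.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical tag recording which union member is currently valid.
enum class Tag { Int8, Float64 };

// Hypothetical tagged union: both members share the same storage.
struct Value {
    Tag tag;
    union {
        std::int8_t i8;
        double f64;
    };
};

// The declared return type is always Value, whichever member is filled in.
Value compute(bool want_float) {
    Value v;
    if (want_float) {
        v.tag = Tag::Float64;
        v.f64 = 2.5;
    } else {
        v.tag = Tag::Int8;
        v.i8 = 7;
    }
    return v;
}

// The caller has to inspect the tag at runtime to know which member to read.
std::string describe(const Value& v) {
    switch (v.tag) {
        case Tag::Int8:
            return "int8:" + std::to_string(static_cast<int>(v.i8));
        case Tag::Float64:
            return "float64:" + std::to_string(v.f64);
    }
    return "unknown";
}
```

Whatever the callee does, the caller receives a Value and returns to the same place; the switch in describe is the check that the original question wants to avoid.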

> My intention is to skip this by giving the callee two different addresses to return to depending on what it did with the input.

As I said, I think this would require a redefinition of the language in order to be able to specify these two return addresses. I can’t really imagine that, but it occurs to me that you should be able to achieve a similar goal, which is ultimately to avoid switch statements and tagged objects, through proper use of class inheritance. Think of a base class and several subclasses, where each subclass deals with a particular type. From the point of view of object memory usage it’s almost the same as a union, because you will only use one object instance at any given time and object member variables start at the beginning of the object memory, so they take up the same memory as if they were a union.
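
A rough C++ sketch of the inheritance idea, assuming a hypothetical base class Number with one subclass per returned type (none of these names come from the thread):

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical base class: callers only ever see this interface.
struct Number {
    virtual ~Number() = default;
    virtual std::string describe() const = 0;
};

// One subclass per concrete type the function may produce.
struct Int8Number : Number {
    std::int8_t value;
    explicit Int8Number(std::int8_t v) : value(v) {}
    std::string describe() const override {
        return "int8:" + std::to_string(static_cast<int>(value));
    }
};

struct Float64Number : Number {
    double value;
    explicit Float64Number(double v) : value(v) {}
    std::string describe() const override {
        return "float64:" + std::to_string(value);
    }
};

// The callee picks the subclass; the declared return type stays the same.
std::unique_ptr<Number> compute(bool want_float) {
    if (want_float) {
        return std::make_unique<Float64Number>(2.5);
    }
    return std::make_unique<Int8Number>(std::int8_t{7});
}
```

The caller writes compute(flag)->describe() with no switch and no explicit tag; the virtual call selects the right code. In typical C++ implementations each such object also carries a hidden vtable pointer, so the layout is close to, but not exactly, that of a bare union.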

> for high-level jitted languages, this can simplify the "type inference" pass. 

This still requires the compiler to know the type of an object at runtime, which is a problem that class object instances solve. In the case of unions, the inferred type will always be a union; the compiler is unable to determine which member you want to use by looking at the tag that you may have provided. It just doesn’t work like that, if I understand what you are attempting to do.

> Another question on the topic. If I manage the stack myself somehow and replace ret with inline assembly jmp, will
> the processor be able to prefetch instructions beyond the jmp? 


I am not fully qualified to answer this question, as I’m not that well versed in processor internals. I think that processors are able to prefetch instructions that will be executed after an unconditional jump, but I am unsure about that. In any case, if you replace ‘ret’ instructions with ‘jmp’, you must still generate the proper epilog code to restore any modified registers and make sure that the stack pointer or frame pointer points to the caller’s stack frame.

I think it would be useful if you gave some context about why you actually need this feature. It still looks to me like something that could be defined for a (possible) new language, not something that the LLVM compiler can take advantage of for existing languages like C or C++.

Joan



> On 21 Jul 2019, at 15:23, Tsur Herman <tsur.herman at gmail.com> wrote:
> 
> In high-level languages, optionally returning values of different types is usually handled with
> a tagged union and a switch statement in the caller.
> 
> My intention is to skip this by giving the callee two different addresses to return to depending on what it did with the input.
> 
> for high-level jitted languages, this can simplify the "type inference" pass. 
> 
> Another question on the topic. If I manage the stack myself somehow and replace ret with inline assembly jmp, will
> the processor be able to prefetch instructions beyond the jmp? 
> 
> On Sun, Jul 21, 2019 at 3:14 PM Joan Lluch <joan.lluch at icloud.com> wrote:
> Hi Jay,
> 
> This trick can certainly be used by someone coding directly in assembly language, but I do not think it is possible for a compiler to do so. High-level language functions are supposed to have a single entry point and a single return address, namely the instruction immediately following the call. Virtually all high-level languages and their compilers are designed according to these semantics, and processors are optimized for that too. Inside the callee, the compiler may optimise the actual placement of the return code or it may duplicate code to avoid branching. The compiler may also perform tail call optimisations that modify the standard return procedure, but the proper epilog code will effectively be executed in all cases, with an identical return value and with execution transferred to the same return address.
> 
> In order for a compiler to implement what you suggest, I think that some explicit semantics would have to be incorporated into the high-level languages being compiled. Currently, in order to declare a function that returns a Float64 or an Int8 depending on external conditions, the user must use either function overloads, function templates, or closures (in languages supporting them). In all these cases, either the user must explicitly declare a function for every type, or the compiler generates a separate function for every type use case. So in reality the case where a single function may return multiple types does not happen. My point is that since in high-level languages there’s no way to specify multiple return types for the same function, there’s no real use case where the compiler may want to do so. Unless I misunderstood your question.
> 
> Joan
> 
> 
> > On 21 Jul 2019, at 11:06, Tsur Herman via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > 
> > Playing around with calling conventions, naked functions and epilogue/prologue...
> > Is it possible/expressible/feasible to alter the return address the function will return to?
> > 
> > For example, when a function may return an Int8 or a Float64, depending on some external state
> > (user, or random variable), instead of checking the returned type in the calling function, is it possible
> > to pass 2 potential return addresses one suitable for Int8 and one suitable for Float64 and let the function return to the right place?
> > 
> > If it is possible, what are the implications? Does this inhibit optimization opportunities somehow?
> 
