[PATCH] Add an option to allow JumpInstrTables to set unnamed_addr and jumptable on all address-taken functions

Sat Jun 28 00:59:26 PDT 2014

Tom Roeder wrote:
> On Fri, Jun 27, 2014 at 10:43 AM, Nick Lewycky<nicholas at mxc.ca>  wrote:
>> Tom Roeder wrote:
>>>
>>> On Thu, Jun 26, 2014 at 11:06 PM, Nick Lewycky<nicholas at mxc.ca>   wrote:
>>>>
>>>> Tom Roeder wrote:
>>>>>
>>>>>
>>>>> On Tue, Jun 24, 2014 at 12:00 PM, Tom Roeder<tmroeder at google.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On Mon, Jun 23, 2014 at 12:18 PM, Nick Lewycky<nicholas at mxc.ca>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Tom Roeder wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> I don't see why this should have anything to do with a front end
>>>>>>>> component, though it might make sense eventually to have a high-level
>>>>>>>> flag like that in clang that sets a lower-level flag. Currently, I
>>>>>>>> just pass flags directly to LTO through clang with
>>>>>>>>
>>>>>>>> -Wl,--plugin-opt=-jump-table-type=arity
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Mostly I want to have a single place to decide whether we're going to
>>>>>>> do the jump table transform or not. The reason it goes in the frontend
>>>>>>> is because it actually changes language semantics, the frontend may
>>>>>>> want to emit different IR, or may want to warn the user that
>>>>>>> "&func1 == my_funcptr" isn't going to work when CFI is on. If you did
>>>>>>> have it in the frontend, I don't see why you would also need a plugin
>>>>>>> option.
>>>>>>
>>>>>>
>>>>>>
>>>>>> That makes sense; I'll look into it.
>>>>>
>>>>>
>>>>>
>>>>> It looks fairly straightforward to add the option in clang; however,
>>>>> after thinking about it some more, I think that even with a frontend
>>>>> option, there still needs to be a way to trigger CFI from the back end
>>>>> directly, since there needs to be some way to write CFI regression
>>>>> tests in LLVM directly without using clang.
>>>>
>>>>
>>>>
>>>> Sure. For LLVM you can write a .ll file and observe the .s llc emits, and
>>>> you can write a .ll file and run any pass over it and observe the
>>>> resulting .ll. I think you just need to write .ll files with unnamed_addr
>>>> jumptable on the functions then observe the .s files, I don't think you
>>>> have a use for opt tests? For clang you can write a test that you get the
>>>> .ll you expect for a given .c file.
>>>>
>>>> That's the common way of testing llvm and clang. You can write arbitrary
>>>> shell to do things like llvm-link + opt + llc in a test if you want, but
>>>> I don't think you should ever need that?
>>>>
>>>> I think all your use cases are covered by this? If not, which?
>>>
>>>
>>> That's true for jumptable, but it won't be true for CFI. One use case
>>> is LTO: the only way clang gets to communicate with LLVM at LTO time
>>> (other than the IR itself) is through flags passed to the linker that
>>> get passed on to LLVMgold.so. If there's no backend flag to indicate
>>> CFI, then there's no way to make it happen at LTO time, since it's not
>>> represented in IR.
>>
>>
>> I agree. We have an unsolved problem in general for passing flags from
>> builds down to LTO. If you want an LTO flag that turns on CFI codegen I
>> won't object to that. Similarly you may want to add an llc flag for testing
>> CFI codegen too.
>
> OK. Would you object to JumpInstrTables looking for the CFI flag and
> using that as an indication to mark all address-taken functions
> jumptable + unnamed_addr?

The problem is that unnamed_addr is an actual semantic change. Imagine
if you were asking for a pass to mark every function readonly or
noreturn or something. You could have a correct program coming in and
correct optimizer behaviour all the way through until you perform this
transform, and then the rest of codegen runs on the result. Suppose the
resulting program has a bug: have you learned that CFI is buggy? That
the input program is buggy? That the rest of llvm is buggy? None of the
above. The conclusion you arrive at is "somebody marked unnamed_addr on
functions without proving this transform was safe". If you really want
that I'm not going to veto it; I just don't see the utility.
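
To make the semantic change concrete, here is a minimal C sketch of the
kind of code that depends on function addresses being significant, i.e.
the "&func1 == my_funcptr" pattern mentioned earlier in the thread. The
names handler_fn, handler_a, handler_b, is_handler_a and dispatch are
hypothetical, made up for illustration. Standard C guarantees the
comparison below; unnamed_addr tells LLVM the address is not significant,
so a program written this way is exactly what a blanket marking has not
been proven safe for.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
typedef int (*handler_fn)(int);

int handler_a(int x) { return x + 1; }
int handler_b(int x) { return x * 2; }

/* Relies on function-pointer identity: in standard C this is true
   only when fp points to handler_a. */
bool is_handler_a(handler_fn fp) {
    return fp == &handler_a;
}

/* Dispatches on pointer identity rather than on an explicit tag. */
int dispatch(handler_fn fp, int x) {
    if (is_handler_a(fp))
        return -fp(x);  /* special case for handler_a */
    return fp(x);
}
```

Built as ordinary C, is_handler_a(handler_a) is true and
is_handler_a(handler_b) is false. Whether that still holds after every
address-taken function has been marked unnamed_addr is the part nobody
has checked.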

The code generator is expected to produce a program with behaviour
equivalent to the IR it's given. When it can't -- for example a
'musttail' call that can't be emitted as a tail call -- the code
generator emits a fatal error. The same goes for global variables with
initializers we can't emit a relocation for.

Nick