[LLVMdev] RFC: Machine Level IR text-based serialization format

Wed Apr 29 19:44:21 PDT 2015

> On 2015 Apr 29, at 19:13, Hayden Livingston <halivingston at gmail.com> wrote:
> 
> What is missing in the current textual format that doesn't allow going
> all the way to machine code?

Nothing.

What's missing is the ability to serialize the machine level itself.
Since many passes have to run to get from .ll to .s, it's currently
hard (impossible?) to test individual machine level passes robustly.
Having a way to serialize machine IR will let us test each pass in
isolation.

> Is the reason for this project that the current .ll format can't
> always be converted to bitcode?

Nope, .ll and .bc can represent the same things.

> 
> On Wed, Apr 29, 2015 at 3:24 PM, Alex L <arphaman at gmail.com> wrote:
>> 
>> 
>> 2015-04-29 11:40 GMT-07:00 Duncan P. N. Exon Smith <dexonsmith at apple.com>:
>> 
>>> 
>>>> On 2015-Apr-29, at 06:40, Krzysztof Parzyszek <kparzysz at codeaurora.org>
>>>> wrote:
>>>> 
>>>> On 4/28/2015 7:13 PM, Alex L wrote:
>>>>> 
>>>>> 
>>>>> 2015-04-28 16:26 GMT-07:00 Matthias Braun <matze at braunis.de
>>>>> <mailto:matze at braunis.de>>:
>>>>> 
>>>>>   For that use case it is worth keeping the following things in mind:
>>>>>   - Please try to keep the output of the various dump functions, esp.
>>>>>   MachineInstr::dump(), MachineOperand::dump(),
>>>>>   MachineBasicBlock::dump() as close as possible to the format you use
>>>>>   for serializing.
>>>>> [...]
>>>>> 
>>>>> Ideally the new syntax would replace the existing print/dump syntax.
>>>>> The new syntax will omit certain information when it can be inferred
>>>>> (e.g. the TiedTo and IsEarlyClobber attributes for register operands
>>>>> that I mentioned earlier in this thread), so maybe we could have some
>>>>> sort of verbose dumping option where absolutely everything is dumped.
>>>> 
>>>> 
>>>> I think that the new syntax is less readable than the current format of
>>>> the "dump" functions, and in the long term it would be better to have
>>>> something more human-friendly.  However, using YAML has the advantage that
>>>> it's easier to parse than the direct output of "dump", so it will take
>>>> less time to implement a YAML-based solution.  My concern is that you may
>>>> run out of time to complete this, and the file format is not the most
>>>> important thing in this project.  Getting it to work, if only as a proof of
>>>> concept, would be very helpful to everyone.  Coming up with a fancier
>>>> grammar and implementing a parser for it could be done later on top of the
>>>> initial implementation.
>>>> 
>>>> -Krzysztof
>>> 
>>> Until I got to this email, I was opposed to using YAML here -- I'd
>>> prefer a custom grammar and parser -- but I find Krzysztof's point
>>> here pretty convincing.
>>> 
>>> Starting with a (hybrid) YAML representation seems like a reasonable
>>> way to bootstrap a machine IR.  Once it's in place and working, we
>>> can come back and strip away the YAML parts until it's human-
>>> friendly.  (And since YAML is machine-friendly, upgrade scripts for
>>> testcases should be straightforward.)
>> 
>> 
>> I think that this would be a good approach.
>> I will work on the proposed YAML hybrid format for now and will begin
>> sending out the patches soon. Once it's working, people can evaluate it
>> for themselves and see if it suits them or if we need to change it to a
>> custom format.
>> 
>>> 
>>> 
>>> BTW, we probably need some sort of LangRef document for this.  Maybe
>>> docs/MIRLangRef.rst?
>> 
>> 
>> That's fine with me.
>> 
>> Alex
>> 