[PATCH] D27642: DebugInfo: Added support for Checksum debug info feature (LLVM IR part)

Adrian Prantl via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 13 09:02:15 PST 2016


aprantl added inline comments.


================
Comment at: docs/LangRef.rst:4018
+                     data: "000102030405060708090a0b0c0d0e0f")
+
 .. _DIBasicType:
----------------
aaboud wrote:
> mehdi_amini wrote:
> > aaboud wrote:
> > > mehdi_amini wrote:
> > > > aaboud wrote:
> > > > > aprantl wrote:
> > > > > > aaboud wrote:
> > > > > > > mehdi_amini wrote:
> > > > > > > > rnk wrote:
> > > > > > > > > aaboud wrote:
> > > > > > > > > > mehdi_amini wrote:
> > > > > > > > > > > What are the valid list of "type"? What type is the "type" field? Why the prefix `ChecksumType_` and not just `MD5`?
> > > > > > > > > > > Why not a simpler approach where the type is an MDString? (That already how you parse it in DIChecksum::getChecksumType anyway)
> > > > > > > > > > Right now the valid list of types are:
> > > > > > > > > > ChecksumType_MD5
> > > > > > > > > > ChecksumType_SHA1
> > > > > > > > > > 
> > > > > > > > > > I will add this to the document.
> > > > > > > > > > 
> > > > > > > > > > Answering your other questions:
> > > > > > > > > > 1. The current solution does not handle the type as string, but as an enumeration with a known token (see LLexer.cpp)
> > > > > > > > > > 2. Omitting the prefix ChecksumType_ will not allow us to recognize the token.
> > > > > > > > > > 3. This is similar to what is done with DIFlag...
> > > > > > > > > > 4. Allowing the user to write the type as string "MD5" is easier to implement, however, checking invalid types will not be elegant as this solution.
> > > > > > > > > > 
> > > > > > > > > > Bottom line, I am ready to go with either solutions.
> > > > > > > > > > Let's hear what others think.
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > I think we could fold DIChecksum into DIFile. It would look like:
> > > > > > > > >   !DIFile(filename: ..., checksumkind: ChecksumType_MD5, checksum: !"deadbeef")
> > > > > > > > > 
> > > > > > > > > This adds a dead integer to DIFile when checksums aren't being used, but it's probably more efficient and less work than having a separate node.
> > > > > > > > I rather have a much simpler implementation, the validation can be done in the verifier easily and it would handle similarly in-memory validation, bitcode and textual IR (the first two are the most important by the way, textual IR is not what we should optimize for).  
> > > > > > > > 
> > > > > > > > With the change that @rnk suggested that would make it quite a smaller patch I believe.
> > > > > > > Fine with me.
> > > > > > I'm in favor of rolling it into DIFile. As Paul noted, DWARF 5 also allows for an MD5 checksum on each file entry in the line table section so I'm expecting this to become the normal case eventually.
> > > > > > 
> > > > > > What's the problem with recognizing MD5 as a context-sensitive token in the parser?
> > > > > It is not only MD5, right? should the LLexer recognize all possible types of checksum, i.e. MD5, SHA1, etc.?
> > > > > Or like it is implemented in this patch, it only needs to recognize the checksum type prefix (whatever it will be).
> > > > > 
> > > > > So, how do you prefer to implement this?
> > > > > 1. Prefix - (if yes, what should be the prefix?)
> > > > > 2. Recognize all types
> > > > > 3. Have a wild string
> > > > Alternative representation:
> > > > 
> > > > `!DIFile(filename: ..., checksum: !"MD5:deadbeef")`
> > > > 
> > > > 
> > > Once again, I do not mind how we implement it, I just need to hear an agreement from most of the participants in this review.
> > > however, I think that with this approach it might complicate the code in CodeViewDebug, as it needs to parse the checksum string to get the type.
> > > 
> > > I would like to hear Reid opinion on this suggestion before I go and implement it.
> > Is there already a patch up that shows the `CodeViewDebug`, so that I can see how it gets more complicated there?
> > I'd expect it to be a `StringSwitch` on the prefix in the string instead of a switch on the enum value and that's it.
> Sorry, I do not have this patch, and I am not planing to.
> I hope that Reid can implement this in CodeViewDebug once I commit this patch.
> This is why I would like to hear his agreement to your proposal before I go this direction.
I was skeptical at first about using a string to represent the checksum, but I came around to agree that this format is likely the most straight-forward way to add this to the IR assembler language.

`!DIFile(filename: ..., checksum: !"MD5:deadbeef") // with checksum being an optional nullable field.`

We probably still want a DIChecksum wrapper around the string that converts the hex string into an llvm::MD5, etc, and/or stores it as a byte string for a more efficient representation.


================
Comment at: include/llvm/IR/DebugInfoMetadata.h:491
+                             bool ShouldCreate = true) {
+    // Do not create checksum with "None" checksum type.
+    if (Type == None)
----------------
aaboud wrote:
> mehdi_amini wrote:
> > What is the reason to have the "None"?
> We need to pass the checksum data and type to the DICompileUnit together with the filename and directory. However, not always we have a checksum data to send, and that is why we need an indicator that there is no checksum available.
> 
> If you try to create a checksum with None type Checksum::get() will return a nullptr.
> This is the best way I could find to deal with the issue.
> Please, let me know if you have better way to do that.
Why not allow for a nullptr to be passed in that interface?


https://reviews.llvm.org/D27642





More information about the llvm-commits mailing list