[PATCH] D92902: [llvm-elfabi] Add flag to keep timestamp when output is the same

Haowei Wu via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Dec 14 18:42:40 PST 2020


haowei marked 3 inline comments as done.
haowei added inline comments.


================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:661-663
+static Error restoreStatOnFile(StringRef Filename,
+                               const sys::fs::file_status &Stat,
+                               bool PreserveDates) {
----------------
jhenderson wrote:
> Rather than copying the llvm-objcopy function into this file, maybe it would make more sense to pull the function into the Support library (this might require some small interface changes, such as the `-` check being pulled out of the function.
I agree copying this function is not optimal choice. I don't know what is the best place to put it when I upload the change. But now I prefer to avoid using it for simplicity reasons.


================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:675
+
+  if (PreserveDates)
+    if (auto EC = sys::fs::setLastAccessAndModificationTime(
----------------
jhenderson wrote:
> I might not be following this fully, but does this function have any useful effect if `PreserveDates` is false? Aside from error checking, it looks like it just opens and then closes the file in that case. If that's the case, maybe the `PreserveDates` check can be pulled out too.
It sets the permission bits. It is a use case for llvm-objcpy since it wants to retain the original file's stat (both permission and timestamp if flag allows).  In elf-abi's case, it is not necessary to set the execution bits. Anyway, I prefer to avoid using it in the latest diff.


================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:712
+    Stat.permissions(static_cast<sys::fs::perms>(0777));
+  std::unique_ptr<uint8_t[]> InputData;
+  size_t InputSize = 0;
----------------
jhenderson wrote:
> I agree with @grimar that using a vector type would make more sense here, probably. I'd just change this line to `std::vector<uint8_t>`, then resize the vector at the point when you know the size. It also means you don't need a separate `InputSize` variable. You could even use `std::insert` or similar to append the data to the vector without needing to explicitly resize the vector.
I changed to vector in latest diff and set the size in constructor.


================
Comment at: llvm/lib/InterfaceStub/ELFObjHandler.cpp:674
+
+  if (PreserveDates) {
+    ErrorOr<std::unique_ptr<MemoryBuffer>> BufOrError =
----------------
jhenderson wrote:
> haowei wrote:
> > jhenderson wrote:
> > > I'm not convinced that this is a good approach - it requires copying the entire buffer contents, if I read it correctly, in the case where --preserve-dates isn't used (which presumably will be the norm). llvm-objcopy simply restores the stat of the file after writing it (see `restoreStatOnFile`).
> > > 
> > > Is the intent that there is a behaviour difference between objcopy and elfabi here? In the former, the timestamp is always updated, even if the output is unchanged, whereas in the latter in the current patch it isn't. I guess it depends on how your build system works which is the more useful behaviour (I can see benefits to both).
> > > 
> > > I think you can preserve the existing behaviour using the `restoreStatOnFile` approach and avoiding a copy of the file contents by doing something like the following:
> > > # Read existing file into buffer.
> > > # Write new file into file output buffer.
> > > # Compare buffer contents.
> > > # If unchanged, run `restoreStatOnFile` behaviour on output file.
> > > 
> > > Something similar probably applies to the TBE output too.
> > I didn't work on llvm-objcopy before so I am not familiar with its approach, sorry about that.
> > 
> > The restoreStatOnFile did sound like a better approach at first glance. However, after I written down the code, I feel it may not be a better approach here.
> > 
> > In my case:
> > 1. Map existing file into MemoryBuffer(mmap)
> > 2. Copy content of MemoryBuffer into a buffer on heap
> > 3. Write ELF Stub into FileBuffer(mmap)
> > 4. Compare the FileBuffer with buffer on heap.
> > 5. Commit the change
> > 6. If unchanged, set PreserveDatas flag and run `restoreStatOnFile` to restore stat and timestamp.
> > 
> > Since input and output are the same file, I cannot omit step 2 to copy the existing file into a dedicate buffer. I think it has similar cost or maybe higher cost compared to write the ELFStub into buffer and copy the buffer to file later. Since the existing file can be quite big (It can be a legit ELF file, not a stub) while ELFStub is quite small. In other words, this approach did not make the cost lower. Code wise, I think it is a bit more complicated.
> > 
> > I am not intended to make elfabi to behave differently than objcopy. I just want to make it easier to implement and avoid unnecessary memory copies. If I understood correctly, objcopy will always write the output file and restore file stats. If `PreserveDates` is set, it will also restore "last access and  modification time", regardless of the content of the output. It does not seem to compare the content of existing file, which makes it easier to use `restoreStatOnFile` than my case.
> > 
> > In TBE case, I tried to use `restoreStatOnFile` but it looks like I cannot. I have to manually close `raw_fd_ostream Out` before I call `restoreStatOnFile`. And I will get a assertion failure in testing: `llvm/lib/Support/raw_ostream.cpp:795: void llvm::raw_fd_ostream::close(): Assertion 'ShouldClose' failed.`
> > 
> > Also if I have to use `restoreStatOnFile`, it looks like I have to copy this function from objcopy unless I found a better library to put it, which I didn't find one at first glance. It does not look like a good choice to create a new library and put this single function there as well.
> > 
> > Please let me know if you have better suggestions. 
> I'm not sure I really follow everything you've explained here. Are you saying that `MemoryBuffer` is backed by the file, i.e. it needs the file open to be usable? If that's the case, then fair enough. I think I previously thought it was a copy of the file, copied into memory, already (and therefore there would be no need to copy it again).
MemoryBuffer use mmap when possible (https://llvm.org/doxygen/MemoryBuffer_8cpp_source.html#l00322). So since elfabi need to overwrite the file, it need to copy the file to the memory instead of just mmap to the memory, as mmap is backed by the file. If a memory copy is unavoidable, I prefer to use my previous approach for simplicity.

The build system we intended to use this tool is GN/Ninja. In the current stage, the use case is that for each shared library in our build, a '.tbe' (the text file) file will be generated by elfabi. For a ELF file that will be linked against the shared library, we set the '.tbe' as one of its deps. Therefore, for incremental builds, if the shared library is changed by a code change but does not affect linking, the '.tbe' file's content will stay the same. With this patch, when '.tbe' file is unchanged, the timestamp will not be updated, the ELF file that depends on the shared library will not be relinked by the build system.

We still prefer to use the approach that avoid updating the file when elfabi know the output is the same, than always updating the file but restore the file stats. The implementation is simpler and it has less chance to cause data race in case some other program read the file when it is updated in the middle. Both '.tbe' files and ELF stub file generated from elfabi tool are quite small as well.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92902/new/

https://reviews.llvm.org/D92902



More information about the llvm-commits mailing list