[PATCH] D37099: Added optional validation of svn sources to Dockerfiles.

Ilya Biryukov via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 31 01:59:43 PDT 2017


ilya-biryukov added inline comments.


================
Comment at: utils/docker/scripts/llvm_checksum/llvm_checksum_utils.py:1
+"""Contains helper functions to compute checksums for LLVM checkouts.
+"""
----------------
klimek wrote:
> Why 2 files? (generally, I dislike files named "util" :)
> 
Totally agree, `util` is a bad choice. 
One file(`util`) is a source code of a library, the other is a source code of an executable.
The idea is to split the command argument parsing logic from an actual library code.
How about `_lib` instead of `_util`? Or do you think merging the files is better?


================
Comment at: utils/docker/scripts/llvm_checksum/llvm_checksum_utils.py:27-28
+                         is_ignored,
+                         content_hasher,
+                         hash_algo=hashlib.sha256):
+  """ Computes checksums for all files and symlinks under path.
----------------
klimek wrote:
> Why's the algo not part of the content_hasher?
> Can't we just use a hasher that has the exact interface we need?
- `content_hasher` is responsible for reading files and replacing the contents of the file, while `hash_algo` is responsible for providing hash functions.
- we don't call `content_hasher` for broken symlinks and use `hash_algo` on the symlink target instead. It does not make sense to do any replacements in there, as it's not a file.
- it's nice that this function controls the lifetime of `hash_algo()` objects. Otherwise, there's a chance the client will create hasher only once and use that for all subsequent calls.

It also seems nice to have the checksumming algorithm name as part of the function interface.


================
Comment at: utils/docker/scripts/llvm_checksum/llvm_checksum_utils.py:48
+
+  def proc_file(fullpath):
+    if os.path.islink(fullpath):
----------------
klimek wrote:
> Call it process_file? (proc_file sounds like a file in /proc :P)
All short names are already taken by Linux :-(
Renamed to `process_file`.


================
Comment at: utils/docker/scripts/llvm_checksum/llvm_checksum_utils.py:122-123
+  hasher = hash_algo()
+  # Feed the number of files to the hasher.
+  hasher.update(str(len(file_checksums)))
+  for kind, file_path, checksum in file_checksums:
----------------
klimek wrote:
> Given that we feed in paths, this seems redundant?
Totally agree, removed it.


================
Comment at: utils/docker/scripts/llvm_checksum/llvm_checksum_utils.py:131-133
+    elif kind == FileKind.VALID_SYMLINK:
+      hasher.update("@symlink")
+    else:
----------------
klimek wrote:
> Why not just feed the content (file content or symlink target)?
To distinguish between a broken symlink and a file with content equivalent to the broken symlink's target.


https://reviews.llvm.org/D37099





More information about the llvm-commits mailing list