[PATCH] D37099: Added optional validation of svn sources to Dockerfiles.

Manuel Klimek via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 31 02:15:46 PDT 2017


klimek added inline comments.


================
Comment at: utils/docker/scripts/llvm_checksum/llvm_checksum_utils.py:27-28
+                         is_ignored,
+                         content_hasher,
+                         hash_algo=hashlib.sha256):
+  """ Computes checksums for all files and symlinks under path.
----------------
ilya-biryukov wrote:
> klimek wrote:
> > Why's the algo not part of the content_hasher?
> > Can't we just use a hasher that has the exact interface we need?
> - `content_hasher` is responsible for reading files and replacing the contents of the file, while `hash_algo` is responsible for providing hash functions.
> - we don't call `content_hasher` for broken symlinks and use `hash_algo` on the symlink target instead. It does not make sense to do any replacements in there, as it's not a file.
> - it's nice that this function controls the lifetime of `hash_algo()` objects. Otherwise, there's a chance the client will create hasher only once and use that for all subsequent calls.
> 
> It also seems nice to have the checksumming algorithm name as part of the function interface.
I think my main concern is that we're mixing levels of abstraction; generally, we have a strategy how to do the hashing, and we have an algorithm that does the visitation (giving project structure, whitelist, etc).

I'd probably just do a visit_project function that takes a class with perhaps 2 functions:
visit_file(path, content)
visit_symlink(path, link)

Then we'd have a Checksum class that implements these 2 functions, has a hasher member and does the hashing in the visitation, cutting at the responsibilities.

That way, we also could split this up into multiple files more easily:
project_tree.py (perhaps with a better name :) allows to visit all files in the project.
llvm_checksum.py - can now be the main method and the checksum'ing parts of the algo


================
Comment at: utils/docker/scripts/llvm_checksum/llvm_checksum_utils.py:131-133
+    elif kind == FileKind.VALID_SYMLINK:
+      hasher.update("@symlink")
+    else:
----------------
ilya-biryukov wrote:
> klimek wrote:
> > Why not just feed the content (file content or symlink target)?
> To distinguish between a broken symlink and a file with content equivalent to the broken symlink's target.
Why do we care?


https://reviews.llvm.org/D37099





More information about the llvm-commits mailing list