[PATCH] D37099: Added optional validation of svn sources to Dockerfiles.

Thu Aug 31 03:45:44 PDT 2017

ilya-biryukov added inline comments.

================
Comment at: utils/docker/scripts/llvm_checksum/llvm_checksum_utils.py:27-28
+                         is_ignored,
+                         content_hasher,
+                         hash_algo=hashlib.sha256):
+  """ Computes checksums for all files and symlinks under path.
----------------
klimek wrote:
> ilya-biryukov wrote:
> > klimek wrote:
> > > Why's the algo not part of the content_hasher?
> > > Can't we just use a hasher that has the exact interface we need?
> > - `content_hasher` is responsible for reading files and replacing the contents of the file, while `hash_algo` is responsible for providing hash functions.
> > - we don't call `content_hasher` for broken symlinks and use `hash_algo` on the symlink target instead. It does not make sense to do any replacements in there, as it's not a file.
> > - it's nice that this function controls the lifetime of `hash_algo()` objects. Otherwise, there's a chance the client will create hasher only once and use that for all subsequent calls.
> > 
> > It also seems nice to have the checksumming algorithm name as part of the function interface.
> I think my main concern is that we're mixing levels of abstraction; generally, we have a strategy how to do the hashing, and we have an algorithm that does the visitation (giving project structure, whitelist, etc).
> 
> I'd probably just do a visit_project function that takes a class with perhaps 2 functions:
> visit_file(path, content)
> visit_symlink(path, link)
> 
> Then we'd have a Checksum class that implements these 2 functions, has a hasher member and does the hashing in the visitation, cutting at the responsibilities.
> 
> That way, we also could split this up into multiple files more easily:
> project_tree.py (perhaps with a better name :) allows to visit all files in the project.
> llvm_checksum.py - can now be the main method and the checksum'ing parts of the algo
Good point, will do.

================
Comment at: utils/docker/scripts/llvm_checksum/llvm_checksum_utils.py:131-133
+    elif kind == FileKind.VALID_SYMLINK:
+      hasher.update("@symlink")
+    else:
----------------
klimek wrote:
> ilya-biryukov wrote:
> > klimek wrote:
> > > Why not just feed the content (file content or symlink target)?
> > To distinguish between a broken symlink and a file with content equivalent to the broken symlink's target.
> Why do we care?
Frankly, we don't. Broken symlinks and files are different beasts, but for our use-case this is probably irrelevant.
Could remove this logic altogether and make the script simpler.

https://reviews.llvm.org/D37099