[PATCH] D143344: [bazel] Port zstd support

Aaron Siddhartha Mondal via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Mar 28 16:25:46 PDT 2023


aaronmondal added inline comments.


================
Comment at: utils/bazel/WORKSPACE:126
+maybe(
+    http_archive,
+    name = "llvm_zstd",
----------------
GMNGeoffrey wrote:
> aaronmondal wrote:
> > GMNGeoffrey wrote:
> > > Can we use `new_git_repository` here instead?
> > This was because of the hash failure for archives with github right? I'm not sure whether `new_git_repository` is a good idea here since we are using a stable archive (which AFAIK is guaranteed to be hash-stable). At least according to the bazel docs 
> > 
> > > Prefer http_archive to git_repository. The reasons are:
> > 
> >   - Git repository rules depend on system git(1) whereas the HTTP downloader is built into Bazel and has no system dependencies.
> >   - http_archive supports a list of urls as mirrors, and git_repository supports only a single remote.
> >   - http_archive works with the repository cache, but not git_repository. See #5116 for more information.
> > 
> > I'm only really worried about the third point here. You know the best practices better though, so I'm happy to change this if you prefer `new_git_repository`.
> Oh I actually didn't know about that third point. Yeah that sucks. You make good points that this is pretty stable. I guess I'm used to frequently-updating dependencies like LLVM :-) Let's keep it as `http_archive`.
> 
> An aside on hash stability:
> 
> Those archives are only sort-of guaranteed to be hash stable because everyone depends on that. gzip doesn't actually guarantee hash-stability by the spec and neither does git guarantee `git archive` stability. It recently changed the algorithm in a way that changed the hash. That broke the world when GitHub updated its git version: https://github.blog/2023-02-21-update-on-the-future-stability-of-source-code-archives-and-hashes. It seems that GitHub had a guarantee that "release assets" would be hash-stable and many people interpreted that to include the "source archive" link present on the release page (right next to the things manually uploaded), but GitHub didn't. Separately, someone from Bazel had received a private commitment from support that those would remain stable, but obviously that commitment didn't involve anyone who actually worked on the relevant code :-D Anyway it was a fun dumpster fire, but I still think that it would be better if Bazel provided a way that didn't rely on these unstable hashes. Probably using the GitHub API with the ref arg, as suggested in those GitHub docs, would provide a way to at least have a `new_github_repository` rule. Anyway, this is all fine for now :-)
Thanks for linking that article I didn't know about this.

> If you rely on stable archives for security (ensuring you don’t accidentally trigger a tarbomb, for example), we recommend you switch to release assets instead of using source downloads.

Well looks like those living at head will seriously need to look for new options 🤣

> it would be better if Bazel provided a way that didn't rely on these unstable hashes

Yep. This is definitely a big issue. I'm actually surprised that the `git_archive` variants are *less* cacheable than `http_archive`s. My intuition was like "hey if a new commit comes in just treat it like a git pull" or something, but apparently that's not the case. I'll keep track of https://github.com/bazelbuild/bazel/pull/15536 so that we can switch when the git repo rules allow caching.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143344/new/

https://reviews.llvm.org/D143344



More information about the llvm-commits mailing list