[cfe-dev] modules and distributed builds

Thompson, John John_Thompson at playstation.sony.com
Fri Jan 23 15:43:19 PST 2015


I've been asked about this, but it's currently beyond my knowledge.  Can someone shed some light?  Does Clang need fixing for this?

***
> >- Does it allow for processes being terminated whilst creating cache entries?
>
> A subsequent run will clean up orphaned file locks from a previously crashed or aborted process,
> but it's not clear if other modules cache files are dealt with.
>
> I think the build system should probably include deleting the entire cache directory during a
> normal "realclean" or perhaps even a "clean" operation as well.  If a build reports errors in
> intermediate files that's typically the first thing I would try, i.e. do a clean and rebuild.

When we distribute builds we may abort processes at any point during the build.  It is essential
that doing this does not prevent other processes in the build from completing successfully.  In
this scenario, failing the build with an error and then doing a clean and a rebuild is not an option.

Our requirement is that if two compilation jobs A and B are running simultaneously and will create
a cache entry C, then:
1. A and B will arbitrate access to the cache as necessary to create and read C.
2. Aborting either A or B at any moment (without any opportunity for clean up by the aborted
process) will not prevent the other from completing successfully.

The important case is when A holds the lock and is writing C but is terminated before completing
the write.  It is essential that C won't be left in a state where B (either waiting on the same
lock or running at a later date) will think it is valid, try to use it, and error out.

This can be avoided if the procedure for writing to the cache goes like this:

1. Take the lock
2. If C does not exist then:
2.1. Create C.TEMP and write its full contents
2.2. Rename C.TEMP to C
3. Release lock

This guarantees (in the absence of power outages, disk failures, etc.) that an invalid C will never
be left for B to fail on.

There are some minor implementation wrinkles in making the rename as atomic as possible (on
Windows C.TEMP and C must be on the same volume so that a copy is not required), but this ought to
be straightforward to implement if it hasn't been already.
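
To make the write procedure concrete, here is a rough sketch in Python.  It is only an
illustration of the temp-file-then-rename idea, not a description of what Clang does today.  The
names publish_cache_entry and build_contents are hypothetical, and the sketch assumes the caller
already holds the cache lock (steps 1 and 3); how the lock is taken is left out.  The temp file
is created in the same directory as C so that the rename never crosses a volume.

```python
import os
import tempfile


def publish_cache_entry(cache_path, build_contents):
    """Create cache_path (C) via a temp file and a rename.

    Hypothetical helper: build_contents is a callable returning the bytes
    of the cache entry.  The caller is assumed to hold the cache lock.
    Returns True if this call created C, False if C already existed.
    """
    # Step 2: only do the work if C does not exist yet.
    if os.path.exists(cache_path):
        return False

    cache_dir = os.path.dirname(os.path.abspath(cache_path))

    # Step 2.1: create C.TEMP next to C (same directory, hence same volume).
    fd, temp_path = tempfile.mkstemp(dir=cache_dir, suffix=".TEMP")
    try:
        with os.fdopen(fd, "wb") as temp_file:
            temp_file.write(build_contents())
            temp_file.flush()
            os.fsync(temp_file.fileno())  # push the data to disk before renaming

        # Step 2.2: rename C.TEMP to C.  A process killed before this line
        # leaves only a stray temp file behind, never a partial C.
        os.replace(temp_path, cache_path)
    except BaseException:
        # Best-effort cleanup when we fail without being killed outright.
        try:
            os.unlink(temp_path)
        except FileNotFoundError:
            pass
        raise
    return True
```

If the process is killed outright, the cleanup branch never runs and a stray *.TEMP file can be
left in the cache directory.  That is harmless to readers of C, but something would need to
remove such files eventually.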

Thanks.

-John
