[llvm-dev] [pre-RFC] Data races in concurrent ThinLTO processes

Peter Collingbourne via llvm-dev llvm-dev at lists.llvm.org
Mon Mar 26 18:09:57 PDT 2018


On Mon, Mar 26, 2018 at 6:03 PM, <katya.romanova at sony.com> wrote:

> Hi Steven,
>
> Look at my replies inline (below your comments).
>
> Katya.
>
>
>
> *From:* stevenwu at apple.com <stevenwu at apple.com>
> *Sent:* Thursday, March 22, 2018 4:46 PM
> *To:* Romanova, Katya <katya.romanova at sony.com>
> *Cc:* Teresa Johnson <tejohnson at google.com>; Mehdi AMINI
> <joker.eph at gmail.com>; Rafael Avila de Espindola
> <rafael.espindola at gmail.com>; Peter Collingbourne <peter at pcc.me.uk>;
> Hans Wennborg <hans at chromium.org>; Reid Kleckner <rnk at google.com>;
> bd1976llvm at gmail.com; llvm-dev at lists.llvm.org
> *Subject:* Re: [pre-RFC] Data races in concurrent ThinLTO processes
>
>
>
> Hi Katya
>
>
>
> Thanks for investigating this. Here are my thoughts inline.
>
>
>
> On Mar 22, 2018, at 1:32 AM, katya.romanova at sony.com wrote:
>
>
>
>
>
> Hello,
>
>
>
> I am sending the following proposal to discuss issues and solutions
> regarding data races in concurrent ThinLTO processes.
>
>
>
> This caught my attention when we encountered a race condition in ThinLTO
> with caching.
>
> While looking into how ThinLTO handles cache file reads/writes/deletes, I
> spotted a number of problems: some of them are related to data races,
> others to file/resource sharing between different processes. I want to
> point out these problems and start a discussion about potential solutions.
> I would like to get your feedback.
>
>
>
>
>
> *Problem #1: In certain common situations, ‘rename’ can fail, causing a
> data race later on.*
>
> Look at the following code in the function ‘write’ in
> ThinLTOCodeGenerator.cpp
>
>
>
>   std::error_code EC =
>       sys::fs::createTemporaryFile("Thin", "tmp.o", TempFD, TempFilename);
>   if (EC) {
>     errs() << "Error: " << EC.message() << "\n";
>     report_fatal_error("ThinLTO: Can't get a temporary file");
>   }
>   {
>     raw_fd_ostream OS(TempFD, /* ShouldClose */ true);
>     OS << OutputBuffer.getBuffer();
>   }
>   // Rename to final destination (hopefully race condition won't matter here)
>   EC = sys::fs::rename(TempFilename, EntryPath);
>
>
>
> Compared to a race-prone direct write of a buffer to a file, this code
> avoids a data race by first writing the buffer to a temp file and then
> renaming that temp file to become the final file in the cache. After
> r315079, when ‘rename’ became more POSIX-compliant, this scheme guarantees
> atomicity of writing a file to the cache (except on Wine, where a
> non-atomic ::MoveFileExW is used in some cases).
>
>
>
> However, ‘rename’ might fail and return an error code in several different
> situations. If this happens, we will fall back to a direct write to the
> file, which is neither atomic nor race-free (causing problem #3).
>
> Problem #2 below is an example of a situation where 'rename' can fail on
> both Windows and Unix-like systems, causing us to fall back to the
> non-atomic write.
>
>
>
> *Solution:*
>
> All solutions for problem #3 (see below) will take care of problem #1.
>
>
>
> I don't think there is a race here, but it can probably be optimized. I
> don't see LTOCodeGenerator fall back to a non-atomic write in the code. If
> the rename fails, it will just report a fatal error and exit. Maybe a
> better solution is to just leave the temp file and use it as the output.
> The file will be recompiled if needed again (it turns into a cache miss).
>
>
> I think there is a non-atomic write (and a race) here. See below. If
> rename fails, we remove the temp file and write to the cache file directly
> (OS << OutputBuffer.getBuffer()).
>
>
>
>     // Rename to final destination (hopefully race condition won't matter here)
>     EC = sys::fs::rename(TempFilename, EntryPath);
>     if (EC) {
>       sys::fs::remove(TempFilename);
>       raw_fd_ostream OS(EntryPath, EC, sys::fs::F_None);
>       if (EC)
>         report_fatal_error(Twine("Failed to open ") + EntryPath +
>                            " to save cached entry\n");
>       OS << OutputBuffer.getBuffer();
>     }
>   }
>
>
>
> We could do it in one of the following ways:
>
> Option 1: If the rename fails, we could simply use OutputBuffer (which has
> the same content as our temp file), bypassing the cache altogether.
>
> Option 2: We could try reading from the existing cache first and then, if
> this read fails, use OutputBuffer (in this case, we stay closer to the
> existing code). I don’t understand why we have to read from the existing
> cache first (even if we just created this cache entry). I suspect this was
> done for a reason. There is a comment below with an explanation, but it
> doesn’t make things clearer to me. If this reason/explanation makes sense,
> then we should do Option 2. This is exactly what I previously proposed in
> Solution #0 for Problem #3.
>
>
>
>         // Commit to the cache (if enabled)
>         CacheEntry.write(*OutputBuffer);
>
>         if (SavedObjectsDirectoryPath.empty()) {
>           // We need to generate a memory buffer for the linker.
>           if (!CacheEntryPath.empty()) {
>             // Cache is enabled, reload from the cache
>             // We do this to lower memory pressure: the buffer is on the heap
>             // and releasing it frees memory that can be used for the next input
>             // file. The final binary link will read from the VFS cache
>             // (hopefully!) or from disk if the memory pressure wasn't too high.
>             auto ReloadedBufferOrErr = CacheEntry.tryLoadingBuffer();
>             if (auto EC = ReloadedBufferOrErr.getError()) {
>               // On error, keeping the preexisting buffer and printing a
>               // diagnostic is more friendly than just crashing.
>               errs() << "error: can't reload cached file '" << CacheEntryPath
>                      << "': " << EC.message() << "\n";
>             } else {
>               OutputBuffer = std::move(*ReloadedBufferOrErr);
>             }
>           }
>
> I don’t understand why we are reading from the cache file (via
> tryLoadingBuffer) when we already have the same content in OutputBuffer.
> The comment says this is done to reduce memory pressure, but it’s not
> clear to me why that helps, since we still have to keep the content of the
> cache file in a buffer, pointed to by ReloadedBufferOrErr.
>
>
>
>
>
>
>
>
>
> *Problem #2 (Use of POSIX-compliant ‘rename’ function is not always
> suitable for ThinLTO)*
>
> We just encountered this issue at Sony. For ThinLTO, we use the function
> ‘rename’, which after r315079 doesn’t support renaming across different
> logical volumes on Windows.
>
> Let’s say a user’s %TEMP% directory is on volume C:\, but the caching
> directory they passed is located on volume D:\; then ‘rename’ fails. Since
> we don’t support renaming across different volumes after r315079, we fall
> back to writing to the cache file directly (which is not atomic) and hit a
> data race.
>
> We anticipate that this will be a common situation for our users, many of
> whom are Windows “power users” and have multiple disks for different
> purposes.
>
>
>
> I think this problem doesn’t affect only Windows. The same issue might
> happen on Linux/macOS if the user's $TEMP/$TMP directory is located on a
> different file system from the user-specified caching directory.
>
>
>
> *Solution #1 (not so good):*
>
> The patch below solves this problem on Windows; however, we lose the
> advantage that ‘rename’ gained in r315079 (i.e., its atomicity), which is
> not good.
>
>
>
> This patch assumes that the function 'rename' in Windows/Path.inc is
> expected to work when the source and destination are located on different
> logical volumes or different file systems. Note that the function ‘rename’
> is not ThinLTO-specific and might have other consumers who want this
> behavior (which was the behavior before r315079). However, it seems that
> this assumption was later changed to make ‘rename’ more POSIX-compliant.
> In that case, we should add a comment to the function 'rename' so that its
> users know that renaming across different volumes is not supported by
> design. Or we could have two different functions: one POSIX-compliant, not
> allowing renames across different volumes, and another,
> non-POSIX-compliant, that allows them.
>
>
>
> Before r315079, the function 'rename' (in the case when the destination
> file isn't opened) boiled down to calling the function 'MoveFileExW' with
> the MOVEFILE_COPY_ALLOWED flag set, allowing renaming files across
> volumes.
>
> The current implementation of the function ‘rename’ (in 'rename_internal')
> essentially calls 'SetFileInformationByHandle'. This function is fast and
> atomic, so in a sense it's an improvement over the previously used
> 'MoveFileExW'. However, 'SetFileInformationByHandle' doesn't work when the
> source and destination are located on different volumes.
>
>
>
> My fix simply falls back to calling 'MoveFileExW' when
> 'SetFileInformationByHandle' returns the error 'ERROR_NOT_SAME_DEVICE'.
>
>
>
>
>
> +    // Wine doesn't support SetFileInformationByHandle in rename_internal.
> +    // rename_internal doesn't work across different disks. In both of these
> +    // cases fall back to using MoveFileEx.
>      if (EC ==
> -        std::error_code(ERROR_CALL_NOT_IMPLEMENTED, std::system_category())) {
> -      // Wine doesn't support SetFileInformationByHandle in rename_internal.
> -      // Fall back to MoveFileEx.
> +        std::error_code(ERROR_CALL_NOT_IMPLEMENTED, std::system_category()) ||
> +        EC == std::error_code(ERROR_NOT_SAME_DEVICE, std::system_category())) {
>        SmallVector<wchar_t, MAX_PATH> WideFrom;
>        if (std::error_code EC2 = realPathFromHandle(FromHandle, WideFrom))
>          return EC2;
>        if (::MoveFileExW(WideFrom.begin(), WideTo.begin(),
> -                        MOVEFILE_REPLACE_EXISTING))
> +                        MOVEFILE_COPY_ALLOWED | MOVEFILE_REPLACE_EXISTING))
>          return std::error_code();
>        return mapWindowsError(GetLastError());
>      }
>
>
>
> Note that when the source and destination are located on different logical
> volumes, we will effectively perform a copy followed by a delete, which is
> not an atomic operation. Since copying to a different volume might be
> quite time-consuming, we also increase the probability that some other
> process starts renaming to the same destination file.
>
>
>
> So, this fix for ‘rename’ is not good for ThinLTO (we should do something
> else in this case). ‘rename’ is a generic function with many different
> users, but unless renaming across volumes is needed by someone other than
> ThinLTO, this patch shouldn’t be accepted.
>
>
>
>
>
> *Solution #2 (better)*
>
> Here I try to fix only the ThinLTO issue, not the generic ‘rename’ issue
> as in solution #1:
>
>
>
> For ThinLTO+caching, we don’t have to create temp files in the %TEMP% or
> %TMP% directories (or $TMP/$TEMP). We can create them in the same location
> where the cached files are stored, or in a ‘temp’ subfolder within that
> folder (see the sketch after this list). With this approach:
>
> -  If the patch described in Solution #1 gets accepted (for the sake of
> other consumers), then from ThinLTO’s perspective ‘rename’ will stay
> atomic.
>
> -  If the patch doesn’t get accepted, then rename won’t fail, since we
> only rename files within the same directory.
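>
> A minimal sketch of this approach (assuming the EntryPath and OutputBuffer
> variables from ThinLTOCodeGenerator.cpp; the temp-file suffix is
> illustrative):
>
>   // Create the temp file next to the final cache entry, so that
>   // sys::fs::rename never has to cross a volume/file-system boundary.
>   SmallString<128> TempModel(EntryPath);
>   TempModel += "-Thin-%%%%%%.tmp.o";
>   int TempFD;
>   SmallString<128> TempFilename;
>   if (std::error_code EC =
>           sys::fs::createUniqueFile(TempModel, TempFD, TempFilename))
>     report_fatal_error("ThinLTO: Can't get a temporary file");
>   {
>     raw_fd_ostream OS(TempFD, /* ShouldClose */ true);
>     OS << OutputBuffer.getBuffer();
>   }
>   // A same-directory rename: atomic, and it can't fail with
>   // EXDEV / ERROR_NOT_SAME_DEVICE.
>   std::error_code EC = sys::fs::rename(TempFilename, EntryPath);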
>
>
>
> *Solution #3*
>
> All the solutions for problem #3 (see below) will take care of problem #2.
>
>
>
> I am surprised that we are not doing #2 already. Rename across volumes is
> not atomic, and thus it can cause issues.
>
>
> Yes, we are calling createTemporaryFile, which creates a file in the
> location pointed to by $TEMP, $TMP, etc.:
>
>         sys::fs::createTemporaryFile("Thin", "tmp.o", TempFD,
>                                      TempFilename);
>
> So, it looks like this problem applies to Unix-like systems too (assuming
> that your $TEMP and cache directory are on different file systems).
>
>
>
>
>
>
>
> *Problem #3 (data race)*
>
> Look at the following code in ThinLTOCodeGenerator.cpp. This code gets
> executed if the ‘rename’ function fails (e.g., because of problem #1 or
> problem #2 described above).
>
>
>
>   EC = sys::fs::rename(TempFilename, EntryPath);
>   if (EC) {
>     sys::fs::remove(TempFilename);
>     raw_fd_ostream OS(EntryPath, EC, sys::fs::F_None);
>     if (EC)
>       report_fatal_error(Twine("Failed to open ") + EntryPath +
>                          " to save cached entry\n");
>     OS << OutputBuffer.getBuffer();  // Two ThinLTO processes can write to
>                                      // the same file here, causing a data
>                                      // race.
>   }
>
>
>
> Here, we have a data race problem. We actually stumbled across it while
> testing one of the games with ThinLTO+caching.
>
>
>
> We need to find a generic solution to the data race problem, while
> ensuring ‘atomicity’ of writes to cache files. I’m not a ThinLTO expert
> and might not be aware of some ThinLTO constraints. Let me know which
> problems you see with the solutions below. Hopefully, this will trigger
> further discussion.
>
>
>
>
>
> *Solution #0 (it won’t require many modifications):*
>
> (a) If ‘rename’ fails, do not write into the cache directly (we don’t want
> non-atomic writes!).
>
> (b) If ‘rename’ fails, try to read from the cache:
>
>     auto ReloadedBufferOrErr = CacheEntry.tryLoadingBuffer();
>
> (c) If reading from the cache fails, simply use the file that you just
> compiled directly, bypassing the cache entirely (see the sketch below).
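>
> A rough sketch of Solution #0 (hypothetical; the variable names follow
> ThinLTOCodeGenerator.cpp, and the surrounding control flow is assumed):
>
>   std::error_code EC = sys::fs::rename(TempFilename, EntryPath);
>   if (EC) {
>     sys::fs::remove(TempFilename);
>     // (b) Another process may have won the race; try to reuse its result.
>     auto ReloadedBufferOrErr = CacheEntry.tryLoadingBuffer();
>     if (!ReloadedBufferOrErr.getError())
>       OutputBuffer = std::move(*ReloadedBufferOrErr);
>     // (c) Otherwise keep the OutputBuffer we just compiled and bypass the
>     // cache for this entry; in no case do we write to EntryPath directly.
>   }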
>
>
>
> This solution might cause cache misses (causing recompilations), but at
> least it should prevent data races when two ThinLTO processes write to the
> same file (which can produce an invalid cache!). Correctness is more
> important than optimization. It’s worth noting another shortcoming of this
> solution: unlike the generic solutions #1, #2 and #3 described below, it
> won’t solve problem #4.
>
>
>
> BTW, we should do something about Wine support in ‘rename’ (which is
> non-atomic). I didn’t want to list it as a separate problem here, but it
> is a problem. I could file a separate bug, if necessary.
>
>
>
>
>
> *Solution #1 (naïve generic solution):*
>
> (a) If a process wants to write to the cache file, it opens it with
> 'exclusive read/write' access (see the sketch after this list).
>
> (b) If a process wants to write to a cache file that is ‘exclusively’
> opened by some other process, we could assume that the cache file will be
> successfully created by the first process and simply return from the
> ‘write’ function. Different processes writing to a cache file of the same
> name are writing the same content, right?
>
> (c) If a process needs to read from the cache file, it might wait a bit
> while the file is open for 'exclusive write'. If within a certain period
> the cache file doesn’t get released by a writer, or gets removed by a
> pruner – oh well, we have a cache miss. After all, the cache is just an
> acceleration. Correctness is more important.
>
> (d) If we are concerned about the data integrity of a cache file (e.g., a
> writer started to write to a file and something bad happened in the
> middle), we could do additional checks (I suspect this might already be
> implemented in ThinLTO). A writer could add a unique hash at the end of
> the cache file, or a CRC32 for each section of the file, as an indication
> that the write to this file was successful. A reader then checks that this
> unique hash or CRC32 checksum matches its expectations (see the second
> sketch below).
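>
> A hypothetical sketch of steps (a)/(b) on a POSIX system (Windows would
> need CreateFileW with a zero share mode or LockFileEx instead; the helper
> name is invented for illustration):
>
>   #include <fcntl.h>
>   #include <sys/file.h>
>   #include <unistd.h>
>
>   // Open the cache entry and try to take an exclusive advisory lock
>   // without blocking. If another process already holds the lock, assume
>   // it will produce the same content and skip the write entirely.
>   bool tryExclusiveCacheWrite(const char *EntryPath, int &FD) {
>     FD = ::open(EntryPath, O_CREAT | O_WRONLY, 0644);
>     if (FD < 0)
>       return false;
>     if (::flock(FD, LOCK_EX | LOCK_NB) != 0) {
>       ::close(FD);  // Another writer owns this entry: step (b) applies.
>       return false;
>     }
>     return true;    // We hold the exclusive lock; safe to write.
>   }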
>
>
>
> I'm sure there are good reasons why this "naive" solution wasn't
> implemented in the first place. If this solution is picked by the LLVM
> community, I could write a working prototype for it that we could apply to
> ThinLTO.
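>
> For step (d), a hypothetical trailer-checksum scheme (the trailer format
> is invented for illustration; a real implementation would likely use CRC32
> or a stronger hash instead of FNV-1a, which is used here only to keep the
> sketch self-contained):
>
>   #include <cstdint>
>   #include <cstring>
>   #include <string>
>
>   static uint64_t fnv1a(const char *Data, size_t Size) {
>     uint64_t H = 0xcbf29ce484222325ULL;
>     for (size_t I = 0; I < Size; ++I) {
>       H ^= static_cast<unsigned char>(Data[I]);
>       H *= 0x100000001b3ULL;
>     }
>     return H;
>   }
>
>   // A writer appends fnv1a(payload) after the payload; a reader checks
>   // that the trailer matches, treating a mismatch (torn write) as a
>   // cache miss rather than using a corrupt entry.
>   bool verifyCacheEntry(const std::string &Buffer) {
>     if (Buffer.size() < sizeof(uint64_t))
>       return false;
>     size_t PayloadSize = Buffer.size() - sizeof(uint64_t);
>     uint64_t Stored;
>     std::memcpy(&Stored, Buffer.data() + PayloadSize, sizeof(Stored));
>     return Stored == fnv1a(Buffer.data(), PayloadSize);
>   }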
>
>
>
>
>
> *Solution #2 (better generic solution):*
>
> It looks like cache file sharing between several ThinLTO processes is a
> classic readers-writers problem.
>
> https://en.wikipedia.org/wiki/Readers%E2%80%93writers_problem
>
>
>
> We could use read-write locks for global inter-process file sharing.
>
> https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock
>
>
>
> ThinLTO's writer (creator/deleter) must acquire a write lock before
> writing to a file and release it when it's done writing. ThinLTO’s reader
> must acquire a read lock before reading a file and release it when it's
> done reading. On Unix-like systems, read-write locks are part of POSIX
> threads (pthreads); see the sketch below. There is an implementation of
> Slim read-write locks on Windows, but unfortunately it is in-process only
> (i.e., the locks can’t be shared among different processes), so it won’t
> work for us.
>
> https://msdn.microsoft.com/en-us/library/windows/desktop/aa904937(v=vs.85).aspx
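>
> As a rough sketch of the POSIX side (hypothetical; a pthread rwlock is
> process-shared only if it lives in memory visible to all processes, e.g. a
> shm segment whose name, like "/thinlto-<hash>", is derived from the cache
> entry):
>
>   #include <fcntl.h>
>   #include <pthread.h>
>   #include <sys/mman.h>
>   #include <unistd.h>
>
>   // Map a process-shared rwlock keyed on the cache entry name.
>   pthread_rwlock_t *openSharedRWLock(const char *ShmName) {
>     int FD = ::shm_open(ShmName, O_CREAT | O_RDWR, 0644);
>     if (FD < 0)
>       return nullptr;
>     (void)::ftruncate(FD, sizeof(pthread_rwlock_t));
>     void *Mem = ::mmap(nullptr, sizeof(pthread_rwlock_t),
>                        PROT_READ | PROT_WRITE, MAP_SHARED, FD, 0);
>     ::close(FD);
>     if (Mem == MAP_FAILED)
>       return nullptr;
>     // NOTE: initialization must happen exactly once across all processes;
>     // a real implementation has to coordinate that (elided here).
>     pthread_rwlockattr_t Attr;
>     pthread_rwlockattr_init(&Attr);
>     pthread_rwlockattr_setpshared(&Attr, PTHREAD_PROCESS_SHARED);
>     pthread_rwlock_init(static_cast<pthread_rwlock_t *>(Mem), &Attr);
>     return static_cast<pthread_rwlock_t *>(Mem);
>   }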
>
>
>
> However, on Windows, we could implement read-write locks ourselves, using
> a combination of a named mutex and a named semaphore (any unique global
> name could be used for creating the named mutex/semaphore; the simplest
> one is the cache file name itself).
>
>
>
> Here is an example of an implementation:
>
> http://www.zachburlingame.com/2011/06/simple-reader-writer-lock-in-c-on-win32/
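>
> As a rough illustration (this is not the linked implementation; it is a
> simplified, write-lock-only sketch, and path separators would have to be
> sanitized out of the kernel object name):
>
>   #include <windows.h>
>   #include <string>
>
>   // An inter-process write lock derived from the cache file name. A named
>   // mutex is visible machine-wide, so two ThinLTO processes that hash to
>   // the same cache entry contend on the same lock. A full readers-writer
>   // scheme would pair this with a named semaphore counting readers.
>   class CacheWriteLock {
>     HANDLE Mutex;
>   public:
>     explicit CacheWriteLock(const std::wstring &SanitizedEntryName)
>         : Mutex(::CreateMutexW(
>               nullptr, FALSE,
>               (L"Global\\thinlto-" + SanitizedEntryName).c_str())) {}
>     bool acquire(DWORD TimeoutMs) {
>       return Mutex &&
>              ::WaitForSingleObject(Mutex, TimeoutMs) == WAIT_OBJECT_0;
>     }
>     void release() { ::ReleaseMutex(Mutex); }
>     ~CacheWriteLock() {
>       if (Mutex)
>         ::CloseHandle(Mutex);
>     }
>   };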
>
>
>
> If this solution is picked, I could write a working prototype for it that
> we could apply to ThinLTO.
>
>
>
> *Solution #3 (hybrid generic solution)*
>
> Basically, use solution #2 as a base and integrate some optimizations and
> completeness-checking features from solution #1 (1.b and 1.d,
> respectively).
>
>
>
> *Problem #4 (cache misses that could have been avoided via
> synchronization)*
>
>
>
> With our current approach, we might often race for cache file resources
> and, as a result, have cache misses. Of course, this is not as dangerous
> as the data race in problem #3, but it definitely lowers the efficiency of
> caching.
>
>
>
> (a) The reader checks that a file exists, but then the pruner deletes it
> before the reader has a chance to read it. Cache miss.
>
> (b) The reader starts reading a file (block by block) and at this time the
> pruner removes the file. The read might fail. This is OS-specific, but
> likely a cache miss.
>
> (c) If the writer renames a temp file onto a file that is currently being
> read by a reader, I don't expect anything good to come of it (the behavior
> is OS-specific). In the best-case scenario the read will fail; in the
> worst case the reader will read the wrong content. So, it's a cache miss
> or a correctness issue.
>
> (d) This is not a very likely event, but what if two processes try to
> rename to the same file at the same time?
>
>
>
> *Solution:*
>
> Solutions #1, #2 and #3 for problem #3 will take care of problem #4.
>
>
>
>
>
> *(Potential) problem #5:*
>
> The implementation of the function ‘rename’ on Windows used by ThinLTO
> heavily depends on the function CreateFileW or, to be more exact, on its
> flag FILE_SHARE_DELETE being supported and working correctly on *all* the
> file systems that can be mounted on Windows. Is the FILE_SHARE_DELETE flag
> supported on all the non-native Windows file systems that we care about?
> Is it supported on Wine? On FAT32?
>
>
>
> If the FILE_SHARE_DELETE flag is not supported, ‘rename’ fails, and we
> have problem #3.
>
>
>
> *Solution:*
>
> All the solutions for problem #3 will take care of problem #5.
>
>
>
>
>
> Please let me know what you think about the problems/solutions discussed
> here. Hopefully this pre-RFC will soon shape into an RFC.
>
>
>
> I am not really concerned about any of these problems except Problem #2.
> From my understanding, the LTO cache is for incremental builds only.
>
> Yes.
>
>
>
> For every file in the cache, there should be a single writer; otherwise,
> it is a hash collision and will result in a miscompile (or, most likely, a
> link error). The cached file will only be used when you rebuild the
> binary. Many of the data races occur in situations that are not supported.
>
> Are you aware of any other data races when we share the LTO cache between
> two links, besides the one that we identified above? Please let me know.
> I’d rather know all the potential problems/limitations in advance.
>
>
>
> With its current design, you cannot share the LTO cache between two links.
> For example, if you are building llc and libLTO from LLVM and they share
> many of the same static archives with common LTO object files, you still
> need to use two different LTO caches.
>
> But what if we fix this particular problem? We could either do it the way
> Peter implemented it for the new LTO API or the way we discussed above. I
> assume that after fixing this, we will be able to share the same cache
> between two links.
>
>
>
> The question is, how exactly should we fix it for the C LTO API? I’m aware
> of only two major C LTO API stakeholders (Apple and Sony), so most likely
> nobody else cares much about the legacy C API. Peter generously proposed
> to port the non-racy algorithm from the new LTO API to the legacy C API.
>

What I proposed was that I would review such a patch, not that I would
write one. Apologies for the miscommunication.

Peter

> I looked into the new LTO API today, and it looks like the data race is
> fixed there, though the code is heavier C++ and will be harder to read.
> Alternatively, I could write a simpler patch for the legacy C LTO API,
> doing what we discussed above (i.e., creating temp files in the cache
> directory and, if renaming the temp file into a cache file fails,
> bypassing reading from the cache file).
>
>
>
> What do you think?
>
>
>
>
>
> Steven
>
>
>
>
>
>
>
> Thank you!
>
> Katya.
>
>
>



-- 
Peter

