[PATCH] D19921: [esan] EfficiencySanitizer shadow memory

Wed May 4 20:31:06 PDT 2016

bruening marked 14 inline comments as done.
bruening added a comment.

> The OS-related stuff like fix_mmap_addr should be in a esan_posix.cc file, which can include external headers (like the ones in asan do).

>  That way you don't need to hard-code values or anything, you just include the required headers.

Making it #if LINUX should be sufficient.  This matches what tsan does.

> Can you explain a bit better why there's so much complexity around the mapping? I have some inline comments about it.

Please see the commit message and esan_shadow.h comments: this supports multiple scales, as different tools need different shadow configurations.  I would argue that it is not much more complex than other mappings: it simply varies the offset for the different scales.

================
Comment at: lib/esan/esan.cpp:66
@@ +65,3 @@
+  if (WhichTool == ESAN_CacheFrag)
+    ShadowMapping.initMapping(2); // 4B:1B, so 4 to 1 == >>2.
+
----------------
aizatsky wrote:
> Looks like mapping won't be initialized for other tools.
Other tools need to set their own mapping, but I think your point is that it's too easy to accidentally be uninitialized, so I will add an else case here.

================
Comment at: lib/esan/esan.cpp:72
@@ +71,3 @@
+  uptr AppStart, AppEnd;
+  for (int i = 0; getAppRegion(i, &AppStart, &AppEnd); i++) {
+    uptr ShStart = appToShadow(AppStart);
----------------
filcab wrote:
> Why loop over the regions and mapping them to different shadow regions? Why not do the same as ASan and use the system's virtual memory to your advantage, by simply mapping a huge region which covers everything you want, especially since your shadow is so much smaller?
ASan also has multiple shadow regions.  Any direct-map shadow approach has to use different shadow regions for the different 64-bit app regions.  Our shadow mapping supports both larger and smaller scales than ASan (please see the commit message and the esan_shadow.h comments).

================
Comment at: lib/esan/esan.cpp:74
@@ +73,3 @@
+    uptr ShStart = appToShadow(AppStart);
+    uptr ShEnd = appToShadow(AppEnd - 1) + 1; // Do not pass end itself!
+    VPrintf(1, "Shadow #%d: %zx-%zx (%zuGB)\n", i, ShStart, ShEnd,
----------------
filcab wrote:
> Why?
> Do we want the byte after the last shadow byte and this may not be the same as the shadow of the byte past the last byte of the region? When does that happen?
The mapping expression does not work for the excluded endpoint.

================
Comment at: lib/esan/esan.cpp:75
@@ +74,3 @@
+    uptr ShEnd = appToShadow(AppEnd - 1) + 1; // Do not pass end itself!
+    VPrintf(1, "Shadow #%d: %zx-%zx (%zuGB)\n", i, ShStart, ShEnd,
+            (ShEnd - ShStart) >> 30);
----------------
filcab wrote:
> `[start,end)` is a better way to show half-open ranges (like what ASan does when it shows the mapping).
Address ranges are pretty much always open-ended.  It looks a little cluttered with the brackets there next to the parentheses for the size.

================
Comment at: lib/esan/esan_interceptors.cpp:425
@@ +424,3 @@
+  if (*addr) {
+    if (!isAppMem((uptr)*addr) || !isAppMem((uptr)*addr + sz - 1)) {
+      VPrintf(1, "mmap conflict: %p-%p is not in an app region\n",
----------------
aizatsky wrote:
> filcab wrote:
> > What if `addr+sz` gets to the next region?
> theoretically addr & addr + sz - 1 could be in app memory, but in different regions. Do we worry about such scenario? 
That would be a pretty big mmap.  Tsan does not check for it.  I suppose it's not hard to check.

================
Comment at: lib/esan/esan_shadow.h:47
@@ +46,3 @@
+//
+//   shadow(app) = ((app & 0x00000fff'ffffffff) + offs) >> scale
+//
----------------
aizatsky wrote:
> filcab wrote:
> > Don't abbreviate "offset" (same for next line).
> Mask choice surprises me. Why 3 f's?
To distinguish PIE, lib/stack, and low app addresses, which differ by that 1st nibble of the mask.

================
Comment at: lib/esan/esan_shadow.h:50
@@ +49,3 @@
+// Where offs for 1:1 is 0x00001200'00000000 and for others is tweaked
+// from a pure "<< scale" strategy:
+//   scale == 3: 0x0000900'000000000
----------------
filcab wrote:
> What is that strategy, and why the tweaks?
That is the entirety of it, to shift the offs by the scale, but it does not work for two of the scales and has to be tweaked to avoid failing the double-shadow test.

================
Comment at: lib/esan/esan_shadow.h:51
@@ +50,3 @@
+// from a pure "<< scale" strategy:
+//   scale == 3: 0x0000900'000000000
+//   scale == 2: 0x0000440'000000000
----------------
aizatsky wrote:
> Why can't we use the same shadow offset for all scales? You'll need 2 loads now to calculate shadow address. Why not:
> 
> (app & mask) >> scale + offs? 
It fails the double-shadow test for scales 1 and 2.

There are not two loads at execution time for the fastpath in the compiler instrumentation.

================
Comment at: lib/esan/esan_shadow.h:77
@@ +76,3 @@
+// that need no data load.
+class Mapping {
+public:
----------------
filcab wrote:
> `ShadowMapping`
That's the var name.

================
Comment at: lib/esan/esan_shadow.h:86
@@ +85,3 @@
+  static const uptr App4Start  = 0xffffffffff600000u;
+  static const uptr App4End    = 0xffffffffff601000u;
+  static const uptr ShadowMask = 0x00000fffffffffffu;
----------------
filcab wrote:
> An array of regions would be better than a bunch of constants.
See the comment above.

================
Comment at: lib/esan/esan_shadow.h:90
@@ +89,3 @@
+  uptr ShadowScale;
+  uptr ShadowOffs;
+  void initMapping(uptr Scale) {
----------------
filcab wrote:
> `Scale`
> `Offset`
> (We should just mention it's "shadow"-related on the type and be done with it)
The other constants here have App in them, this distinguishes.

================
Comment at: lib/esan/esan_shadow.h:157
@@ +156,3 @@
+bool isShadowMem(uptr Mem) {
+// We assume this is only really used for debugging and so there's
+// no need to hardcode the mapping results.
----------------
filcab wrote:
> Should we `#ifndef NDEBUG`, to make sure it's only there in debug versions?
Really it's more that it's assumed to not be on a critical path.

http://reviews.llvm.org/D19921