[PATCH] D64537: [WebAssembly] Implement thread-local storage (local-exec model)

Thomas Lively via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jul 11 15:30:38 PDT 2019


tlively added inline comments.


================
Comment at: clang/include/clang/Basic/BuiltinsWebAssembly.def:33
+// Thread-local storage
+TARGET_BUILTIN(__builtin_wasm_tls_size, "z", "nc", "bulk-memory")
+
----------------
quantum wrote:
> quantum wrote:
> > aheejin wrote:
> > > Why is it `c`(const)? According to [[ https://github.com/llvm/llvm-project/blob/e6695821e592f95bffe1340b28be7bcfcce04284/clang/include/clang/Basic/Builtins.h#L104-L108 | this comment ]], this is true if this function has no side effects and doesn't read memory, i.e., the result should be only dependent on its arguments. Can't wasm globals be memory locations in machines?
> > I was thinking that since the global is immutable, so the value is always a constant.
> According to @tlively, there is no solid definition on whether a global (especially a constant one), counts as memory access. For now, I am going to change this to not constant. We can always change it back later.
I think this requires more conversation.


================
Comment at: lld/test/wasm/data-layout.ll:43
+; CHECK-NEXT:       - Index:           3
+; CHECK-NEXT:         Type:            I32
 ; CHECK-NEXT:         Mutable:         false
----------------
These globals don't have enough information to tell the reader what they even are, and they don't have anything to do with the data layout, so how about skipping these in the test with a comment saying what is being skipped?


================
Comment at: lld/test/wasm/tls.ll:57
+;   memory.init 0, 0
+;   end
+
----------------
quantum wrote:
> aheejin wrote:
> > Hmm, I think there might be a way to actually print disassembly results. There are '*.test' files in test directory, in which we have some examples. For example, [[ https://github.com/llvm/llvm-project/blob/master/lld/test/wasm/export-table.test | this test ]] has a sequence of commands you want to run, and you can put input files in a separate directory. That test uses `obj2yaml`, but can we possibly use `llvm-objdump` or something to disassemble?
> We already know that we can do something like
> 
>     Run: obj2yaml %t.wasm | sed -n '/Body:/{s/^\s*Body:\s*//;s/../0x& /gp}'  | llvm-mc -disassemble -triple=wasm32
> 
> to compare the actual assembly. As for `llvm-objdump`, it seems to be unable to disassemble the WASM properly:
> 
> 
> ```
> .../tools/lld/test/wasm/Output/tls.ll.tmp.wasm:	file format WASM
> 
> 
> Disassembly of section CODE:
> 
> 00000000 CODE:
>         # 4 functions in section.
>        1: 02 00                        	block   	invalid_type
>        3: 0b                           	end
>        4: 10 00                        	call	0
>        6: 20 00                        	local.get	0
>        8: 24 01                        	global.set	1
>        a: 20 00                        	local.get	0
>        c: 41 00                        	i32.const	0
>        e: 41 08                        	i32.const	8
>       10: fc 08 00 00                  	memory.init	0, 0
>       14: 0b                           	end
>       15: 0f                           	return
>       16: 00                           	llvm-objdump: lib/Target/WebAssembly/WebAssemblyGenAsmWriter.inc:2032: void llvm::WebAssemblyInstPrinter::printInstruction(const llvm::MCInst *, llvm::raw_ostream &): Assertion `Bits != 0 && "Cannot print this instruction."' failed.
> 
> ```
It might be worth filing an LLVM bug for this (or possibly fixing in a separate CL).


================
Comment at: lld/wasm/Symbols.cpp:208
+    if (Segment->OutputSeg->Name == ".tdata")
+      return Segment->OutputSegmentOffset + Offset;
     return Segment->OutputSeg->StartVA + Segment->OutputSegmentOffset + Offset;
----------------
It would be great to have an explanatory comment here.


================
Comment at: lld/wasm/Writer.cpp:777
+      break;
+    }
+
----------------
quantum wrote:
> aheejin wrote:
> > Is it guaranteed that there's only one TLS segment?
> Yes, all the TLS input segments will be merged into `.tdata`.
--no-merge-data-segments!


================
Comment at: lld/wasm/Writer.cpp:250
+      TLSSize->Global->Global.InitExpr.Value.Int32 = Seg->Size;
+    }
   }
----------------
What happens when there are multiple TLS sections and `--no-merge-data-segments` is used? I assume their sizes should be added together?


================
Comment at: lld/wasm/Writer.cpp:638
+  if (Name.startswith(".tbss."))
+    return ".tdata";
   return Name;
----------------
Does this mean we can't control whether .tdata or .tbss comes first? Is that important for anything?


================
Comment at: llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp:1095
+WebAssemblyTargetLowering::LowerGlobalTLSAddress(SDValue Op,
+                                                 SelectionDAG &DAG) const {
+  SDLoc DL(Op);
----------------
quantum wrote:
> aheejin wrote:
> > If you do the conversion in `WebAssemblyISelDAGToDAG.cpp`, you don't need to create `WebAssemblyISD` node,  `SDTypeProfile`, and `SDNode` for every single instruction you want to generate. This `WebAssemblyISelLowering.cpp` is a part of legalization so it cannot generate machine instructions directly, whereas `WebAssemblyISelDAGToDAG.cpp` is a part of instruction selection so you have direct access to machine instructions. I think moving routines there can be cleaner?
> It seems most other architectures do it in `*TargetLowering`, so I decided to do the same.
It would be good to try @aheejin's suggestion out and see if it leads to cleaner code. Cargo culting other targets is a good way to get started, but we should still try to make our code as simple as possible.


================
Comment at: llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.td:124
+def WebAssemblyconsti32 : SDNode<"WebAssemblyISD::CONST_I32",
+                                    SDT_WebAssemblyConstI32>;
 
----------------
aheejin wrote:
> Nit: indentations look weird for both defs
I agree with @aheejin that it would be ideal if we didn't need to add these new WebAssemblyISD items. We've come this far without needing them, so it would be good to keep that going.


================
Comment at: llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp:192
       Stripped |= stripAtomics(M);
-      Stripped |= stripThreadLocals(M);
     }
----------------
quantum wrote:
> aheejin wrote:
> > Looks like this feature requires both bulk memory and mutable global. Shouldn't we strip thread locals if either is not enabled?
> Do we want to do that? I think having your thread locals vanish if you didn't pass `-mbulk-memory` is surprising. It might make more sense to strip thread locals if `-pthread` is not passed.
I have a patch up to make `-pthread` imply `-mbulk-memory`, so this shouldn't be a problem from a UI perspective. I agree with @aheejin that we should strip TLS if the necessary features aren't present to preserver feature-correctness.

All the errors for using TLS without bulk memory should also apply to mutable-globals, which I hadn't considered before. I will update my patch so -pthread implies -mmutable-globals as well.


================
Comment at: llvm/test/CodeGen/WebAssembly/target-features-tls.ll:23
 ; ATOMICS-NEXT: .ascii "atomics"
 ; ATOMICS-NEXT: .tbss.foo,"",@
----------------
quantum wrote:
> aheejin wrote:
> > Is thread local feature still dependent on atomics? If not, do we still need to test this with both atomics on and off? This now looks rather dependent on bulk memory and mutable global. Should we test this feature with those options instead? (I can be mistaken, so please let me know if so)
> On second thought, this test is somewhat meaningless now, since the thread local is always created regardless of whether atomics are enabled. What does @tlively think?
I agree with @aheejin that we should change the test to toggle bulk-memory and mutable-global and strip TLS if those are not both present.


================
Comment at: llvm/test/CodeGen/WebAssembly/tls.ll:16
+  ; O2-NEXT: i32.const tls
+  ; O2-NEXT: i32.add
+  ; O2-NEXT: return
----------------
quantum wrote:
> aheejin wrote:
> > If the only difference of O0 and O2 results is the order of `global.get` and `i32.const`, you can use [[ https://llvm.org/docs/CommandGuide/FileCheck.html#the-check-dag-directive | `CHECK-DAG` ]] directive, like
> > ```
> > ; CHECK-DAG: global.get __tls_base
> > ; CHECK-DAG: i32.const tls
> > ...
> > ```
> > 
> > The same for other functions too. Maybe we don't need separate O0 and O2 testing in this file.
> Will do. I didn't know `CHECK-DAG` existed.
If the point of testing with -O0 and -O2 is to test both with and without FastIsel, it might be better to pass `-fast-isel` to explicitly enable it instead.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64537/new/

https://reviews.llvm.org/D64537





More information about the llvm-commits mailing list