[PATCH] D95125: [WebAssembly] Enable loop unrolling

Tue Jan 26 03:39:15 PST 2021

SjoerdMeijer added a comment.

WebAssembly and runtimes are not my field, so take my remarks with a bit (or a lot) of salt, but I do have a few opinions on loop unrolling. :-)
I look at this problem slightly differently:

> What concerns me is that some existing users will definitely be harmed by this, if the default for -O2 changes.

I.e., the way I look at this is that we deliver on one of WebAssembly's promises:

> WebAssembly aims to execute at native speed by taking advantage of common hardware capabilities available on a wide range of platforms.

If this is the goal, i.e. if we want parity with native execution, we have to apply the same tricks as ahead-of-time compiled code; otherwise we will never achieve it. Loop unrolling is known to greatly improve performance, and also to be an enabler for other optimisations. Thus, without loop unrolling I don't think near-native execution speeds will be feasible.
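To make the transformation concrete, here is a minimal hand-written sketch of what unrolling does to a simple reduction loop. The function names and the unroll factor of 4 are hypothetical, chosen for illustration only; the factor the compiler actually picks depends on its cost model and the target.

```c
#include <stddef.h>

/* Rolled form: one add plus one compare-and-branch per element. */
long sum_rolled(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Unrolled by a hypothetical factor of 4: one compare-and-branch per
 * four elements, at the cost of a larger loop body. */
long sum_unrolled4(const int *a, size_t n) {
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; ++i)
        s += a[i];
    return s;
}
```

The larger straight-line body is what gives later passes (scheduling, vectorisation, redundancy elimination) more to work with, and it is also where the code-size cost comes from.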

I agree that a speedup of around 10% for a size cost of 10% is a good trade-off, but that doesn't mean users will actually get this. The perf data is shown for a few selected, compute-intensive loops, which may not be representative of existing code (at least not yet). One possibility is that people see minor code-size increases but large perf gains; another is that there is neither an increase in code size nor an improvement in performance. It all depends. One way to find out and quantify this, if necessary, is to measure an existing, representative code base.

So this sounds like a good proposal to me, but I would turn it into something positive:

> But if we do this I'd recommend we at least update changelogs and docs in preparation, to try to minimize the annoyance for users.

I.e., some good changes and good performance improvements are coming! :-) If people would like to opt out, there are a few possibilities:

- Use optimisation level -Os, which makes a more size-sensitive decision about the performance vs. code-size trade-off.
- Use the compiler flag `-fno-unroll-loops`, but I don't know how to set that in this environment. Probably that could be set with some environment or make variable?
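As an aside that goes beyond the two options above: Clang also accepts a per-loop pragma, which may be enough when only a handful of hot loops matter for size. A minimal sketch, assuming a Clang-based toolchain (the function name `scale` is hypothetical):

```c
#include <stddef.h>

/* Scales n elements in place. The pragma asks Clang not to unroll this
 * particular loop; other loops in the translation unit are unaffected.
 * Compilers that do not know the pragma ignore it, possibly with a warning. */
void scale(float *v, size_t n, float k) {
#pragma clang loop unroll(disable)
    for (size_t i = 0; i < n; ++i)
        v[i] *= k;
}
```

Unlike `-fno-unroll-loops`, this needs a source change, so it is only an option for code that users control themselves.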

If deployment is the issue, there are a few other strategies. The switch can be made with a possibility to opt out, see above.
Or people can opt in, e.g. by setting an EXPERIMENTAL_OPTIMIZATIONS flag/value/option, and thus get a taste of the perf uplifts if they are curious, with the clear expectation set that this will become the default at some point.

Just my 2 cents.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95125/new/

https://reviews.llvm.org/D95125