[PATCH] D15393: [X86] Order the local stack symbols to improve code size and locality.

Tue Feb 9 00:17:23 PST 2016

On Wed, Dec 9, 2015 at 2:30 PM, Hans Wennborg via llvm-commits <
llvm-commits at lists.llvm.org> wrote:

> hans added a comment.
>
> This looks very interesting.
>
> Maybe update the commit message and/or add to the comments in the file why
> this is beneficial for code size, since it wasn't obvious for me at least.
> Do I understand correctly that the idea is that addressing objects closer
> to %esp generally requires  less code because it can use a tighter
> addressing mode?
>

Sorry I'm a bit late to the party here, but I've done some similar analysis
as what I expect led Zia to writing this patch. I don't have anything
material to add to the patch review itself, but I did think that some
concrete data I found would be useful.

Below is a pie chart of the total text size of one of our titles (the
binary is similar in size to clang) broken down by the size of the
instruction (the pie slice size represents to total number of *bytes* of
all the instructions falling in that bucket; not the number of instructions
that fell into that bucket).

As you can see, while most of the text size is accounted for by 3, 4, and 7
byte instructions, there are some outliers.

The table below drills down into specifically movq instructions and the
addressing modes that are present on them (please pardon the ambiguous
commas, this is taken from an old notebook; basically every other comma is
a comma of the list of strings, and the commas in between separate operands
e.g. the first entry in the "12" row is the addressing mode from "movq $0,
424(%rsp)").

The takeaway is that there are very clear pairings: 4<->7, 5<->8, 9<->12
due to having to bump up from an imm8 to an imm32 (which adds 3 bytes). At
least the 9<->12 ones show that this effect is prevalent for stack
operations, and the overall size of the 12 byte slice seems to corroborate
the .5-1.5% code size improvements that Zia is seeing with this patch. (not
that this patch perfectly annihilates that slice, but that it establishes
some baseline expectations and .5-1.5% is in the ballpark)

One interesting thing to also note: the 11, 9, and 12 rows have such
bloated instruction sizes primarily because of needing to store an
immediate "0" as an imm32. (we have some open PR's about this I think).

-- Sean Silva

>
>
> ================
> Comment at: lib/CodeGen/PrologEpilogInserter.cpp:708
> @@ -707,1 +707,3 @@
>
> +  std::vector<int> ObjectsToAllocate;
> +
> ----------------
> Maybe use SmallVector instead?
>
> ================
> Comment at: lib/CodeGen/PrologEpilogInserter.cpp:710
> @@ -708,2 +709,3 @@
> +
>    // Then assign frame offsets to stack objects that are not used to spill
>    // callee saved registers.
> ----------------
> Since this block of code no longer does the actual assigning, maybe change
> the comment to "prepare to assign offsets ..." or "gather objects that need
> stack offsets ..." or something like tthat.
>
> ================
> Comment at: lib/Target/X86/X86FrameLowering.cpp:2678
> @@ +2677,3 @@
> +  // indexing into it, instead of having to search for it every time.
> +  std::vector<X86FrameSortingObject>
> SortingObjects(MFI->getObjectIndexEnd());
> +
> ----------------
> Maybe SmallVector?
>
> ================
> Comment at: lib/Target/X86/X86FrameLowering.cpp:2722
> @@ +2721,3 @@
> +  // For FP, it should be flipped.
> +  i = 0;
> +  for (auto &Obj : SortingObjects) {
> ----------------
> Declare i here, instead of at the start of the function.
>
> ================
> Comment at: lib/Target/X86/X86FrameLowering.cpp:2727
> @@ +2726,3 @@
> +    else
> +      // All invalid items are sorted at the end, so it's safe to stop.
> +      break;
> ----------------
> I think breaking early would be more in line with LLVM style.
>
>   if (!Obj.IsValid)
>     break;
>   ObjectsToAllocate[i++] = ...
>
> ================
> Comment at: lib/Target/X86/X86FrameLowering.h:205
> @@ +204,3 @@
> +    bool IsValid;                 // true if we care about this Object.
> +    unsigned int ObjectIndex;     // Index of Object into MFI list.
> +    unsigned int ObjectSize;      // Size of Object in bytes.
> ----------------
> Just "unsigned" is more common in LLVM code
>
> ================
> Comment at: lib/Target/X86/X86FrameLowering.h:224
> @@ +223,3 @@
> +  /// at the end of our list.
> +  struct X86FrameSortingAlgorithm {
> +    inline bool operator() (const X86FrameSortingObject& a,
> ----------------
> rnk wrote:
> > Ditto, sink this to the .cpp file to reduce visibility.
> How about calling it X86FrameSortingComparator instead?
>
>
> http://reviews.llvm.org/D15393
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160209/c8b36de0/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2016-02-08 at 11.42.25 PM.png
Type: image/png
Size: 118106 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160209/c8b36de0/attachment-0001.png>