[PATCH] [X86][SchedModel] Add missing scheduling model for SSE related instructions

Quentin Colombet qcolombet at apple.com
Fri Feb 14 17:31:44 PST 2014


Hi nadav, craig.topper, chandlerc, grosbach,

The attached patch sets a scheduling model for almost all[1] SSE related instructions (i.e., the instructions located in X86InstrSSE.td).

During my experiments on the LLVM test suite, spec2k, and spec2k6, I did see different scheduling decisions for some of the tests, but none had a measurable impact on the runtime (see Experiments for the details).

@Chandler, could you run some performance measurements with that change, like you did for the addressing mode matcher? (Of course, anyone willing to measure performance here is welcome :D)

Thanks for your review!

Note: I apologize for the size of the patch, but the patched file is about 9K lines, so there are a lot of places to patch!

** Context **

Currently the scheduling model for the X86 architecture is incomplete. In particular, many SSE related instructions lack a scheduling model.
This patch is a step toward completing the model.
Put another way, this patch is a step toward removing the following FIXME:
  // FIXME: SSE4 and AVX are unimplemented. This flag is set to allow
  // the scheduler to assign a default model to unrecognized opcodes.
  let CompleteModel = 0;

** Proposed Approach **

The patch defines new generic scheduling classes, or refines existing ones, to match the behavior of the SSE instructions.
It also maps those scheduling classes onto the related SSE instructions.
The idea behind this approach is to make targeting future X86 architectures easier, using Haswell vs. Sandy Bridge (and Ivy Bridge) as a guide.
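
As a rough illustration of the three pieces involved (a generic class, its mapping onto an instruction, and a per-subtarget cost), here is a minimal TableGen sketch. It is a fragment meant to live inside a target's .td files, not a standalone file. All names (MyModel, MyPort0, MyPort1, MyPort01, WriteMyVecOp, MYVECOPrr) and the numbers are hypothetical and are not taken from the patch.

```tablegen
// Hypothetical names and values; see the note above.

// 1. A generic scheduling class, shared by every subtarget.
def WriteMyVecOp : SchedWrite;

// 2. Map the class onto an instruction. Sched<[...]> sets the
//    instruction's SchedRW list.
def MYVECOPrr : Instruction, Sched<[WriteMyVecOp]> {
  // Operands, pattern, and encoding are omitted in this sketch.
}

// 3. Each subtarget model states what the class costs on its pipeline.
def MyModel : SchedMachineModel {
  let IssueWidth = 4;     // assumed value
  let CompleteModel = 0;  // unmapped opcodes still get a default model
}

let SchedModel = MyModel in {
  def MyPort0  : ProcResource<1>;
  def MyPort1  : ProcResource<1>;
  def MyPort01 : ProcResGroup<[MyPort0, MyPort1]>;

  // The write can issue on either port and takes 3 cycles (assumed).
  def : WriteRes<WriteMyVecOp, [MyPort01]> { let Latency = 3; }
}
```

Because the instruction only names the generic class, a new subtarget would only need to supply its own WriteRes entries, without touching the instruction definitions.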

The mapping and the definitions of the new scheduling classes have been chosen based on the Intel optimization manual, Agner Fog’s instruction tables, and http://www.realworldtech.com/haswell-cpu/.

Note: For targets that use the old itinerary model, this patch is not meant to change their behavior (if it does, this was not intended). Because of that choice, some modifications require a bit of gymnastics to preserve the old itinerary while also attaching a new scheduling model.
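
One possible way to keep both kinds of information side by side is to bundle the itinerary class and the scheduling class in a single record and read each field where it is needed. The sketch below only illustrates that idea and does not claim to reproduce the patch; MyOpInfo, MY_SHUFFLE_ITIN, WriteMyShuffle, MyShuffleInfo, and MYSHUFrr are hypothetical names.

```tablegen
// Hypothetical helper carrying both kinds of scheduling information.
class MyOpInfo<InstrItinClass itin, SchedWrite sched> {
  InstrItinClass Itin = itin;  // read by itinerary-based targets
  SchedWrite Sched = sched;    // read by SchedMachineModel-based targets
}

def MY_SHUFFLE_ITIN : InstrItinClass;
def WriteMyShuffle  : SchedWrite;
def MyShuffleInfo   : MyOpInfo<MY_SHUFFLE_ITIN, WriteMyShuffle>;

// The instruction keeps its itinerary and also gets a SchedWrite.
def MYSHUFrr : Instruction, Sched<[MyShuffleInfo.Sched]> {
  let Itinerary = MyShuffleInfo.Itin;
  // Operands, pattern, and encoding are omitted in this sketch.
}
```

With this layout, a target that still relies on itineraries sees the same Itinerary value as before, while the new field is only consulted by targets that define a SchedMachineModel.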
 
** Experiments **

Tested CPUs: Ivy Bridge (MacBook Pro Late 2012, clock speed fixed at 2900MHz), Haswell (iMac Late 2013, clock speed fixed at 3500MHz).
Tested Flags: Os and O3, with -march=core-avx-i on Ivy Bridge and -march=core-avx2 on Haswell.
Tested Benchmarks: LLVM test-suite, spec2k, and spec2k6.
Only the benchmarks having different schedules are reported.

For the detailed numbers, see the attached files (assuming the upload worked!).

“Improvements”:
Ivy Bridge:
- O3:
bullet: 1% (about 0.02s), probably just noise.
- Os:
nurbs: 2% (about 0.04s), probably just noise.

Haswell:
- O3:
JM/ldecod: 2% (about 0.007s), probably just noise.

- Os:
253.perlbmk: 1% (about 0.02s), probably just noise.


“Regressions”:
Ivy Bridge:
- O3:
Povray: 4% (although 4% means 0.1s here). The resulting code is identical except for one function (Read_Png_Image), which has less spill code with the new scheduling model. I have profiled the application and this function does not show up in the hot path. The slowdown is likely a side effect of the function addresses having changed. Put another way: noise.
Others: noise.

- Os:
None.

Haswell:
- O3:
tramp3d-v4: 1% (about 0.02s): swapped two vpextrw instructions. Noise.
bullet: 1% (about 0.03s): the new schedule implied different spilling decisions.

- Os:
bullet: 1% (about 0.03s): the new schedule implied different spilling decisions.

{F37741}

{F37742}

{F37744}

{F37745}

[1] Some instructions are IMHO too specialized to make a SchedModel out of them. E.g., the pmaskmov family is one of them: it has different port uses for nearly every combination of operands. If we want to create a model for those, we would have to create a "simple" definition for each variant (i.e., no use of multiclass with register classes as arguments, etc.)

Thanks,
-Quentin

http://llvm-reviews.chandlerc.com/D2809

Files:
  lib/Target/X86/X86InstrSSE.td
  lib/Target/X86/X86SchedHaswell.td
  lib/Target/X86/X86SchedSandyBridge.td
  lib/Target/X86/X86Schedule.td
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D2809.1.patch
Type: text/x-patch
Size: 106490 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140214/3e465595/attachment.bin>