[PATCH] D115497: [Inline] Disable deferred inlining

Jonas Paulsson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jan 14 09:13:16 PST 2022


jonpa added a comment.

In D115497#3241189 <https://reviews.llvm.org/D115497#3241189>, @nikic wrote:

> In D115497#3241114 <https://reviews.llvm.org/D115497#3241114>, @jonpa wrote:
>
>> This seems to cause a huge regression on SystemZ/imagick. If I use "-mllvm -inline-deferral" with trunk, I see a 25% improvement!
>>
>> There is a slight regression on xalanc with deferred inlining, but otherwise this doesn't really seem to hurt much on this architecture.
>>
>> My question then is if we should re-enable it on SystemZ, or if you would recommend some alternative that perhaps could give us performance back on imagick while having a sounder algorithm?
>
> This doesn't really seem like something that should be controlled by target options. It's hard to discuss alternatives without knowing more details about the issue affecting imagick -- there //are// some variations on the "deferred inlining" concept we could implement, but I don't know what would help imagick in particular. Changes to inlining heuristics tend to be very hit and miss.
>
> I believe the main inlining-related peculiarity of the SystemZ target is that it increases all inlining thresholds by a factor of three (https://github.com/llvm/llvm-project/blob/c0671e2c9b5c70fbcda277dcd5321d052ca2a2ee/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h#L39). No other non-GPU target does this, and I imagine that it can make inlining behavior on SystemZ very different from other targets. It might be worthwhile to double check whether that threshold multiplier is really needed.

Yes, that's a good point. We found earlier that this was the best setting based on the benchmarks. Now that the overall inlining algorithm has changed, we should revisit this, which I did: '3' still seems better than '1' and '2', but imagick also looks good again with '4', and across the benchmarks overall '5' is even better... So readjusting this value currently looks like the way to go.
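
As a rough illustration of why a larger multiplier changes the outcome: the debug output further down shows thresholds of 750 with multiplier 3 and 1000 with multiplier 4, which is consistent with a base threshold of 250 scaled by the multiplier. The sketch below is not the inliner's code. The names `BASE_THRESHOLD`, `would_inline` and `smallest_multiplier` are made up for illustration, and it ignores bonuses and per-call-site attributes. It only checks the two call-site costs seen in the logs (685 and 900) against multipliers 1 to 5.

```python
# Illustrative model only: "inline if cost < base * multiplier".
# BASE_THRESHOLD = 250 is inferred from the logged thresholds
# (750 / 3 and 1000 / 4); it is an assumption, not read from LLVM.
BASE_THRESHOLD = 250


def would_inline(cost: int, multiplier: int, base: int = BASE_THRESHOLD) -> bool:
    """Return True if a call site with this cost fits under the scaled threshold."""
    return cost < base * multiplier


def smallest_multiplier(cost: int, candidates=range(1, 6)):
    """Return the smallest multiplier in candidates that admits the cost, or None."""
    for m in candidates:
        if would_inline(cost, m):
            return m
    return None


if __name__ == "__main__":
    for cost in (685, 900):
        print(f"cost={cost}: smallest multiplier = {smallest_multiplier(cost)}")
```

Under these assumptions the cost-685 call sites fit from multiplier 3 upward, while the cost-900 call sites need at least 4, which matches what the logs show for imagick.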

On imagick I see just one function that got inlined with 'inline-deferral', @SetPixelCacheNexusPixels, and I am guessing this is the key difference. This function has the 'hot' and 'internal' attributes. It has no loops but makes several calls (@PLT). I also saw that, for some reason, when it is not inlined it is called from 12 different functions, yet when it is inlined the file has fewer instructions in total... The debug output from the inliner:

trunk (threshold multiplier = 3):

  Inlining calls in: SetPixelCacheNexusPixels
  Inlining calls in: SetPixelCacheNexusPixels
  Updated inlining SCC: (SetPixelCacheNexusPixels)
      Inlining (cost=685, threshold=750), Call:   %call62 = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* %cache_info, %struct._RectangleInfo* nonnull %region, i32 zeroext 1, %struct._NexusInfo* %27, %struct._ExceptionInfo* %exception) #22
      Inlining (cost=685, threshold=750), Call:   %call76 = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* %clone_info, %struct._RectangleInfo* nonnull %region, i32 zeroext 1, %struct._NexusInfo* %59, %struct._ExceptionInfo* %exception) #23
      Inlining (cost=685, threshold=750), Call:   %call130 = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* %cache_info, %struct._RectangleInfo* nonnull %region115, i32 zeroext 1, %struct._NexusInfo* %108, %struct._ExceptionInfo* %exception) #23
      Inlining (cost=685, threshold=750), Call:   %call144 = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* %clone_info, %struct._RectangleInfo* nonnull %region115, i32 zeroext 1, %struct._NexusInfo* %140, %struct._ExceptionInfo* %exception) #23
      NOT Inlining (cost=900, threshold=750), Call:   %call37 = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* nonnull %1, %struct._RectangleInfo* nonnull %region, i32 zeroext %8, %struct._NexusInfo* %nexus_info, %struct._ExceptionInfo* %exception) #20
      NOT Inlining (cost=900, threshold=750), Call:   %call37.i = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* nonnull %1, %struct._RectangleInfo* nonnull %region.i, i32 zeroext %8, %struct._NexusInfo* %nexus_info, %struct._ExceptionInfo* %exception) #20
      NOT Inlining (cost=900, threshold=750), Call:   %call37.i.i = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* nonnull %6, %struct._RectangleInfo* nonnull %region.i.i, i32 zeroext %13, %struct._NexusInfo* %4, %struct._ExceptionInfo* %exception) #20
      NOT Inlining (cost=900, threshold=750), Call:   %call37.i.i = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* nonnull %5, %struct._RectangleInfo* nonnull %region.i.i, i32 zeroext %12, %struct._NexusInfo* %3, %struct._ExceptionInfo* %exception) #21
      NOT Inlining (cost=900, threshold=750), Call:   %call37.i.i.i = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* nonnull %11, %struct._RectangleInfo* nonnull %region.i.i.i, i32 zeroext %18, %struct._NexusInfo* %9, %struct._ExceptionInfo* %exception) #21
      NOT Inlining (cost=900, threshold=750), Call:   %call = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* %4, %struct._RectangleInfo* nonnull %region, i32 zeroext %cond, %struct._NexusInfo* %nexus_info, %struct._ExceptionInfo* %exception) #22
      NOT Inlining (cost=135, threshold=135), Call:   %call37.i.i = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* nonnull %10, %struct._RectangleInfo* nonnull %region.i.i, i32 zeroext %17, %struct._NexusInfo* %8, %struct._ExceptionInfo* %exception) #22
      NOT Inlining (cost=135, threshold=135), Call:   %call37.i.i = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* nonnull %12, %struct._RectangleInfo* nonnull %region.i.i, i32 zeroext %19, %struct._NexusInfo* %10, %struct._ExceptionInfo* %exception) #23
      NOT Inlining (cost=135, threshold=135), Call:   %call37.i.i.i = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* nonnull %15, %struct._RectangleInfo* nonnull %region.i.i.i, i32 zeroext %22, %struct._NexusInfo* %13, %struct._ExceptionInfo* %exception) #21
      NOT Inlining (cost=135, threshold=135), Call:   %call37.i.i.i88 = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* nonnull %83, %struct._RectangleInfo* nonnull %region.i.i.i21, i32 zeroext %90, %struct._NexusInfo* %81, %struct._ExceptionInfo* %exception) #21
      NOT Inlining (cost=900, threshold=750), Call:   %call37.i = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* nonnull %5, %struct._RectangleInfo* nonnull %region.i, i32 zeroext %12, %struct._NexusInfo* %3, %struct._ExceptionInfo* %exception) #21
      NOT Inlining (cost=900, threshold=750), Call:   %call37.i.i = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* nonnull %8, %struct._RectangleInfo* nonnull %region.i.i, i32 zeroext %15, %struct._NexusInfo* %6, %struct._ExceptionInfo* %exception) #21
      NOT Inlining (cost=900, threshold=750), Call:   %call37.i = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* nonnull %1, %struct._RectangleInfo* nonnull %region.i, i32 zeroext %8, %struct._NexusInfo* %nexus_info, %struct._ExceptionInfo* %exception) #21
      NOT Inlining (cost=900, threshold=750), Call:   %call37.i = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* nonnull %6, %struct._RectangleInfo* nonnull %region.i, i32 zeroext %13, %struct._NexusInfo* %4, %struct._ExceptionInfo* %exception) #21
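
Output like this is easier to compare across runs once it is tallied. The following is a minimal sketch of a helper for that, assuming the decision lines always start with "Inlining (cost=..., threshold=...)" or "NOT Inlining (...)" as they do above. The function name `tally` and the abbreviated `SAMPLE` text (call text elided) are only for illustration.

```python
import re
from collections import Counter

# Matches the decision prefix of lines such as
#   "Inlining (cost=685, threshold=750), Call: ..."
#   "NOT Inlining (cost=900, threshold=750), Call: ..."
# It does not match the "Inlining calls in: ..." header lines.
DECISION = re.compile(r"^\s*(NOT )?Inlining \(cost=(-?\d+), threshold=(-?\d+)\)")


def tally(log_text: str) -> Counter:
    """Count (inlined, cost, threshold) triples in inliner debug output."""
    counts = Counter()
    for line in log_text.splitlines():
        m = DECISION.match(line)
        if m:
            inlined = m.group(1) is None
            counts[(inlined, int(m.group(2)), int(m.group(3)))] += 1
    return counts


# Abbreviated sample: one line per decision kind, call text elided.
SAMPLE = """\
  Inlining calls in: SetPixelCacheNexusPixels
      Inlining (cost=685, threshold=750), Call: ...
      NOT Inlining (cost=900, threshold=750), Call: ...
      NOT Inlining (cost=135, threshold=135), Call: ...
"""

if __name__ == "__main__":
    for (inlined, cost, threshold), n in sorted(tally(SAMPLE).items()):
        verdict = "inlined" if inlined else "not inlined"
        print(f"{n} x {verdict} (cost={cost}, threshold={threshold})")
```

Feeding the full debug output of each configuration through such a helper gives one line per (decision, cost, threshold) combination, which makes the difference between trunk, -inline-deferral and a changed multiplier easier to see.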

With -inline-deferral:

  Inlining calls in: SetPixelCacheNexusPixels
  Inlining calls in: SetPixelCacheNexusPixels
  Updated inlining SCC: (SetPixelCacheNexusPixels)
      Inlining (cost=395, threshold=750), Call:   %call62 = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* %cache_info, %struct._RectangleInfo* nonnull %region, i32 zeroext 1, %struct._NexusInfo* %27, %struct._ExceptionInfo* %exception) #22
      Inlining (cost=395, threshold=750), Call:   %call76 = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* %clone_info, %struct._RectangleInfo* nonnull %region, i32 zeroext 1, %struct._NexusInfo* %51, %struct._ExceptionInfo* %exception) #22
      Inlining (cost=395, threshold=750), Call:   %call130 = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* %cache_info, %struct._RectangleInfo* nonnull %region115, i32 zeroext 1, %struct._NexusInfo* %92, %struct._ExceptionInfo* %exception) #22
      Inlining (cost=395, threshold=750), Call:   %call144 = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* %clone_info, %struct._RectangleInfo* nonnull %region115, i32 zeroext 1, %struct._NexusInfo* %116, %struct._ExceptionInfo* %exception) #22
      Inlining (cost=610, threshold=750), Call:   %call37 = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* nonnull %1, %struct._RectangleInfo* nonnull %region, i32 zeroext %8, %struct._NexusInfo* %nexus_info, %struct._ExceptionInfo* %exception) #20
      Inlining (cost=-14390, threshold=750), Call:   %call = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* %4, %struct._RectangleInfo* nonnull %region, i32 zeroext %cond, %struct._NexusInfo* %nexus_info, %struct._ExceptionInfo* %exception) #22

With threshold multiplier = 4:

  Inlining calls in: SetPixelCacheNexusPixels
  Inlining calls in: SetPixelCacheNexusPixels
  Updated inlining SCC: (SetPixelCacheNexusPixels)
      Inlining (cost=685, threshold=1000), Call:   %call62 = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* %cache_info, %struct._RectangleInfo* nonnull %region, i32 zeroext 1, %struct._NexusInfo* %41, %struct._ExceptionInfo* %exception) #23
      Inlining (cost=685, threshold=1000), Call:   %call76 = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* %clone_info, %struct._RectangleInfo* nonnull %region, i32 zeroext 1, %struct._NexusInfo* %73, %struct._ExceptionInfo* %exception) #23
      Inlining (cost=685, threshold=1000), Call:   %call130 = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* %cache_info, %struct._RectangleInfo* nonnull %region115, i32 zeroext 1, %struct._NexusInfo* %122, %struct._ExceptionInfo* %exception) #23
      Inlining (cost=685, threshold=1000), Call:   %call144 = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* %clone_info, %struct._RectangleInfo* nonnull %region115, i32 zeroext 1, %struct._NexusInfo* %154, %struct._ExceptionInfo* %exception) #23
      Inlining (cost=900, threshold=1000), Call:   %call37 = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* nonnull %1, %struct._RectangleInfo* nonnull %region, i32 zeroext %8, %struct._NexusInfo* %nexus_info, %struct._ExceptionInfo* %exception) #20
      Inlining (cost=-14100, threshold=1000), Call:   %call = call fastcc %struct._PixelPacket* @SetPixelCacheNexusPixels(%struct._CacheInfo* %4, %struct._RectangleInfo* nonnull %region, i32 zeroext %cond, %struct._NexusInfo* %nexus_info, %struct._ExceptionInfo* %exception) #22
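
A note on the negative costs in the last line of the two logs where everything gets inlined: -14390 and -14100 are exactly 15000 below 610 and 900. That would be consistent with the large bonus the cost model gives when the callee has local linkage and the call site is its last remaining use, so that the function body can be dropped afterwards. The sketch below just checks that arithmetic. The constant value 15000 is my assumption about that bonus, inferred from the numbers above, and the names are illustrative.

```python
# Assumption: the "last call to a static function" bonus is 15000.
# Inferred from the logged costs; not read out of the LLVM sources here.
LAST_CALL_TO_STATIC_BONUS = 15000


def cost_with_last_call_bonus(raw_cost: int) -> int:
    """Cost after subtracting the assumed last-call bonus."""
    return raw_cost - LAST_CALL_TO_STATIC_BONUS


if __name__ == "__main__":
    # 610: cost of the non-constant-argument call sites with -inline-deferral.
    # 900: cost of the same kind of call site with multiplier 4.
    for raw in (610, 900):
        print(f"raw cost {raw} -> {cost_with_last_call_bonus(raw)}")
```

If that reading is right, the last call site is inlined regardless of the threshold once all the other call sites are gone, which fits the observation that the whole function disappears in both of these runs.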

I will do some more measurements to see which value for the multiplier looks best. Do you see any reason not to use this as a tuning parameter, or is there another way you would prefer? (I have been assuming that it *is* the target-independent variable to play with...)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115497/new/

https://reviews.llvm.org/D115497


