[PATCH] Move memset value into register

Tue Sep 17 23:23:07 PDT 2013

Thank you for sharing thoughts, Sean.

I wonder if any target other than x86 can have similar problem? Should this
MI-level peephole pass be x86 specific?

Thanks,
--Serge

2013/9/18 Sean Silva <silvas at purdue.edu>

> On Wed, Sep 18, 2013 at 12:17 AM, Alex Rosenberg <alexr at leftfield.org>wrote:
>
>> FWIW, Sean ran into this exact issue earlier in the Summer. On x86, these
>> long instructions with 32-bit immediate zeros in them added up to a
>> substantial amount of code. IIRC, he noticed lots of those in global
>> constructors.
>>
>> Sean?
>>
>>
> The issue I saw this Summer was a regression of our internal branch vs.
> trunk (i.e., trunk didn't have the code size issue). The issue appeared to
> be fixed in newer versions of the toolchain repo that had merged up to LLVM
> 3.3.
>
> I agree with Chandler though that this would best be done as an MI-level
> peephole pass. Besides the register pressure thing, if optimizing for size,
> there are some pretty "low-level" code size hazards that are basically due
> to quirks in the instruction encoding, Off the top of my head I can think
> of:
> * once an offset from a base pointer becomes large enough (doesn't fit in
> a signed byte), the immediate is encoded as 32-bit instead. Each time you
> can avoid this you save 3 bytes of code size basically.
> * Some AVX2 instructions interpret these immediates differently (as
> multiples of the logical vector element size), which increases the actual
> range of an imm8 operand, so the simple rule "keep the offset from the base
> pointer so that it fits in an imm8" isn't quite so simple.
> * some registers need larger encodings for the indexing (r8-r15 need
> 1-byte REX (but that may already be required for a 64-bit operation);
> rsp+imm8 needs a SIB, since the "usual" encoding for that is special-cased
> for use for RIP-rel addressing). These usually will save you only 1 byte.
>
> -- Sean Silva
>
>
>
>> Alex
>>
>> On Sep 16, 2013, at 9:20 PM, Nadav Rotem <nrotem at apple.com> wrote:
>>
>> Hi,
>>
>> I am sorry for missing this patch the last time you sent it.  Do you have
>> any performance numbers that show the profitability of this transformation
>> ?  I also feel like SelectionDAG is not the right place to implement this
>> kind of optimization. I think that introducing virtual registers would
>> inhibit other optimizations on the DAG.  I also agree that this
>> optimization can be generalized to any sequence of machine instructions
>> that repeatedly use the same constant.
>>
>> I have a few comments on the patch itself.
>>
>> You can commit the documentation of this method even without the change.
>>
>> +/// \brief Lower the call to 'memset' intrinsic function into a series
>> of store
>> +/// operations.
>> +///
>> +/// \param DAG Selection DAG where lowered code is placed.
>> +/// \param dl Link to corresponding IR location.
>>
>> ...
>>
>> +/// The function tries to replace 'llvm.memset' intrinsic with several
>> store
>> +/// operations and value calculation code. This is usually profitable
>> for small
>> +/// memory size.
>>  static SDValue getMemsetStores(SelectionDAG &DAG, SDLoc dl,
>>                                 SDValue Chain, SDValue Dst,
>>                                 SDValue Src, uint64_t Size,
>>
>>
>> The code below is a little confusing.  Can you please add braces to the
>> else clause, or at least separate the next statement with a line break ?
>>
>>
>> +  } else
>> +    LongestValueInReg = MemSetValue;
>> +  SDValue  RegInitialization = Chain;
>> +
>>
>>
>> Also here. Line break after the first "Value = “ line. I would also add
>> braces around the last “Value = “ line.
>>
>> +          TLI.isTruncateFree(LargestVT, VT)) {
>> +        if (ValueReg != 0)
>> +          Value = DAG.getCopyFromReg(RegInitialization, dl, ValueReg,
>> LargestVT);
>> +        Value = DAG.getNode(ISD::TRUNCATE, dl, VT, Value);
>> +      }
>>        else
>>          Value = getMemsetValue(Src, VT, DAG, dl);
>>      }
>>
>>
>> Thanks,
>> Nadav
>>
>>
>> On Sep 16, 2013, at 2:55 PM, Chandler Carruth <chandlerc at google.com>
>> wrote:
>>
>> I wonder whether this is the best approach.
>>
>> Specifically, this is trading off register pressure for code size. I
>> think it is probably the correct call in many cases, but perhaps not in all
>> cases. Also, it has nothing to do with memset, and everything to do with
>> any sequence of immediate operand movs.
>>
>> I think this would be done somewhat better as an MI peephole that takes
>> into account register pressure and how many mov instructions share the
>> immediate to tradeoff the register pressure and potential instructions to
>> materialize the constant into a register against the shrink of immediate
>> operand movs.
>>
>> But yes, Nadav or Andy would have more insights here...
>>
>>
>> On Mon, Sep 16, 2013 at 10:00 AM, Serge Pavlov <sepavloff at gmail.com>wrote:
>>
>>> Friendly ping.
>>>
>>>
>>> 2013/9/10 Serge Pavlov <sepavloff at gmail.com>
>>>
>>>>   Cosmetical update to the patch
>>>>   If block size is small, the size of largest move chunk may be smaller
>>>> than the
>>>>   natural register width and target may not have resisters of such
>>>> width. Example
>>>>   is 32-bit target that does not have 16-bit registers. Avoid
>>>> allocation such
>>>>   illegal registers.
>>>>
>>>> Hi bkramer, nadav,
>>>>
>>>> http://llvm-reviews.chandlerc.com/D1484
>>>>
>>>> CHANGE SINCE LAST DIFF
>>>>   http://llvm-reviews.chandlerc.com/D1484?vs=3692&id=4161#toc
>>>>
>>>> Files:
>>>>   include/llvm/Target/TargetLowering.h
>>>>   lib/CodeGen/SelectionDAG/SelectionDAG.cpp
>>>>   lib/Target/X86/X86ISelLowering.cpp
>>>>   lib/Target/X86/X86ISelLowering.h
>>>>   test/CodeGen/X86/memset-sse-stack-realignment.ll
>>>>   test/CodeGen/X86/memset.ll
>>>>   test/CodeGen/X86/tlv-1.ll
>>>>
>>>
>>>
>>>
>>> --
>>> Thanks,
>>> --Serge
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>

-- 
Thanks,
--Serge
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130918/96cb2506/attachment.html>