[PATCH] Move memset value into register

Tue Sep 17 01:43:09 PDT 2013

OK, I see the idea. We should treat memset (and memcpy) as "generic"
assignments, which are lower enough. It indeed is different from say GCC.

Thank you very much.

2013/9/17 Chandler Carruth <chandlerc at google.com>

> On Tue, Sep 17, 2013 at 12:44 AM, Serge Pavlov <sepavloff at gmail.com>wrote:
>
>> Thank you for the feedback!
>>
>> This solution obviously increases register pressure. But it requires only
>> one additional data register. If there are no available registers at all,
>> spilling is unavoidable if a temporary is required in the nearby code.
>> Also, memset is rarely a part of complex expression, it also decreases
>> chances that registers are exhausted.
>>
>
> I think you may be making an assumption which is reasonable in many
> compilers, but not LLVM. We aggressively form memset "calls" trusting the
> backend to lower them to instructions. This will show up inside of loops
> and other register constrained code paths, and spilling would be a
> surprising consequence.
>
>
>>
>> The main idea behind such kind of optimization is that we have a
>> "macro-operation" (memset in this case) which we can lower efficiently
>> using our knowledge what this operation does. Caching immediates is not the
>> only problem with memset. Alignment is another. For instance, on ARM target
>> the operation 'memset(p,v,32)' is lowered to 32 byte moves, although a
>> couple of conditional moves can align the destination and other moves can
>> be made by words. Making "limited" optimizations could increase quality of
>> generated code in such cases without increase of compile time.
>>
>
> Ok, but memset is less "macro" in LLVM than almost any other compiler and
> we shouldn't really be concerned with compile times here.
>
> Decomposing memset into something more clever than byte moves is still
> clearly goodness though, and should absolutely be done in the SDISel (or
> elsewhere).
>
>
>>
>> Of course, generic optimization that puts immediates into registers looks
>> a better solution, although more complex. Could you please point out a
>> place in the code to get idea how an optimization pass can learn about
>> current register pressure?
>>
>
> I'll let others advise you better here than I would. I know we have "trace
> metrics" in the MI layer that model this now, but I couldn't give you well
> informed advice about exactly where to slot such an optimization.
>
>
>>
>> Thanks,
>> --Serge
>>
>>
>> 2013/9/17 Nadav Rotem <nrotem at apple.com>
>>
>>> Hi,
>>>
>>> I am sorry for missing this patch the last time you sent it.  Do you
>>> have any performance numbers that show the profitability of this
>>> transformation ?  I also feel like SelectionDAG is not the right place to
>>> implement this kind of optimization. I think that introducing virtual
>>> registers would inhibit other optimizations on the DAG.  I also agree that
>>> this optimization can be generalized to any sequence of machine
>>> instructions that repeatedly use the same constant.
>>>
>>> I have a few comments on the patch itself.
>>>
>>> You can commit the documentation of this method even without the change.
>>>
>>> +/// \brief Lower the call to 'memset' intrinsic function into a series
>>> of store
>>> +/// operations.
>>> +///
>>> +/// \param DAG Selection DAG where lowered code is placed.
>>> +/// \param dl Link to corresponding IR location.
>>>
>>> ...
>>>
>>> +/// The function tries to replace 'llvm.memset' intrinsic with several
>>> store
>>> +/// operations and value calculation code. This is usually profitable
>>> for small
>>> +/// memory size.
>>>  static SDValue getMemsetStores(SelectionDAG &DAG, SDLoc dl,
>>>                                 SDValue Chain, SDValue Dst,
>>>                                 SDValue Src, uint64_t Size,
>>>
>>>
>>> The code below is a little confusing.  Can you please add braces to the
>>> else clause, or at least separate the next statement with a line break ?
>>>
>>>
>>> +  } else
>>> +    LongestValueInReg = MemSetValue;
>>> +  SDValue  RegInitialization = Chain;
>>> +
>>>
>>>
>>> Also here. Line break after the first "Value = “ line. I would also add
>>> braces around the last “Value = “ line.
>>>
>>> +          TLI.isTruncateFree(LargestVT, VT)) {
>>> +        if (ValueReg != 0)
>>> +          Value = DAG.getCopyFromReg(RegInitialization, dl, ValueReg,
>>> LargestVT);
>>> +        Value = DAG.getNode(ISD::TRUNCATE, dl, VT, Value);
>>> +      }
>>>        else
>>>          Value = getMemsetValue(Src, VT, DAG, dl);
>>>      }
>>>
>>>
>>> Thanks,
>>> Nadav
>>>
>>>
>>> On Sep 16, 2013, at 2:55 PM, Chandler Carruth <chandlerc at google.com>
>>> wrote:
>>>
>>> I wonder whether this is the best approach.
>>>
>>> Specifically, this is trading off register pressure for code size. I
>>> think it is probably the correct call in many cases, but perhaps not in all
>>> cases. Also, it has nothing to do with memset, and everything to do with
>>> any sequence of immediate operand movs.
>>>
>>> I think this would be done somewhat better as an MI peephole that takes
>>> into account register pressure and how many mov instructions share the
>>> immediate to tradeoff the register pressure and potential instructions to
>>> materialize the constant into a register against the shrink of immediate
>>> operand movs.
>>>
>>> But yes, Nadav or Andy would have more insights here...
>>>
>>>
>>> On Mon, Sep 16, 2013 at 10:00 AM, Serge Pavlov <sepavloff at gmail.com>wrote:
>>>
>>>> Friendly ping.
>>>>
>>>>
>>>> 2013/9/10 Serge Pavlov <sepavloff at gmail.com>
>>>>
>>>>>   Cosmetical update to the patch
>>>>>   If block size is small, the size of largest move chunk may be
>>>>> smaller than the
>>>>>   natural register width and target may not have resisters of such
>>>>> width. Example
>>>>>   is 32-bit target that does not have 16-bit registers. Avoid
>>>>> allocation such
>>>>>   illegal registers.
>>>>>
>>>>> Hi bkramer, nadav,
>>>>>
>>>>> http://llvm-reviews.chandlerc.com/D1484
>>>>>
>>>>> CHANGE SINCE LAST DIFF
>>>>>   http://llvm-reviews.chandlerc.com/D1484?vs=3692&id=4161#toc
>>>>>
>>>>> Files:
>>>>>   include/llvm/Target/TargetLowering.h
>>>>>   lib/CodeGen/SelectionDAG/SelectionDAG.cpp
>>>>>   lib/Target/X86/X86ISelLowering.cpp
>>>>>   lib/Target/X86/X86ISelLowering.h
>>>>>   test/CodeGen/X86/memset-sse-stack-realignment.ll
>>>>>   test/CodeGen/X86/memset.ll
>>>>>   test/CodeGen/X86/tlv-1.ll
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Thanks,
>>>> --Serge
>>>>
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>
>>>>
>>>
>>>
>>
>>
>> --
>> Thanks,
>> --Serge
>>
>
>

-- 
Thanks,
--Serge
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130917/9a36c0d0/attachment.html>