[llvm-dev] Loop Opt WG Meeting Minutes for Sep 11, 2019

Thu Jan 9 08:13:19 PST 2020

Hi Evgenii,

The specific issue that we ran into turned out to be related to the expansion
of a remainder instruction, which prevented it from being considered for
rematerialization by the register allocator (RA). However, the example you
provided falls into the general category of problems with LICM and live-range
extension, which is where we started. I don't know the details, but it looks
like when determining the cost of sinking or rematerializing we need to take a
more holistic view than evaluating one instruction at a time. Is that possible?

Adding Hussain to the discussion as well.

Bardia Mahjour
Compiler Optimizations
IBM Toronto Software Lab

From:	Evgenii Stepanov <eugenis at google.com>
To:	Bardia Mahjour <bmahjour at ca.ibm.com>
Cc:	Florian Hahn <florian_hahn at apple.com>, LLVM Dev
            <llvm-dev at lists.llvm.org>, tcorring at amd.com
Date:	2020/01/07 02:15 PM
Subject:	[EXTERNAL] Re: [llvm-dev] Loop Opt WG Meeting Minutes for Sep
            11, 2019

Sorry for reviving this old thread.
Is this the case that you are talking about?
void use(int *);
void f(int *p) {
  for (int i = 0; i < 1000; ++i) {
    use(p);
    use(p + 1);
    use(p + 2);
    use(p + 3);
  }
}

LICM hoists all the (p + N) computations out of the loop, and there is
nothing that could sink them back.
entry:
  %add.ptr = getelementptr inbounds i32, i32* %p, i64 1
  %add.ptr1 = getelementptr inbounds i32, i32* %p, i64 2
  %add.ptr2 = getelementptr inbounds i32, i32* %p, i64 3
...
for.body:
...
  tail call void @_Z3usePi(i32* %p)
  tail call void @_Z3usePi(i32* nonnull %add.ptr)
  tail call void @_Z3usePi(i32* nonnull %add.ptr1)
  tail call void @_Z3usePi(i32* nonnull %add.ptr2)

With more calls to use(), these common expressions will be
precomputed, spilled, and then reloaded inside the loop. No
individual instruction is profitable to sink or rematerialize into
the loop, because doing so would simply shorten the live range of (p+N)
at the cost of extending the live range of (p).
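The trade-off described above can be sketched with a deliberately simplified,
hypothetical pressure model in C. The function names (live_across_loop,
greedy_hoisted, group_hoisted) are illustrative and are not LLVM APIs. The
model assumes that each hoisted (p + N) address occupies one register across
the loop, and that p stays live across the loop only if some address is
recomputed from it inside the loop. It compares deciding one instruction at a
time with deciding for the whole group of addresses at once.

```c
#include <assert.h>

/* Hypothetical, simplified model (not LLVM's RA cost model): returns how many
   values stay live across the loop when `hoisted` of the `n` (p + N) addresses
   are computed before the loop and the rest are recomputed inside it.
   Assumption: p is live across the loop only if some address is sunk. */
int live_across_loop(int n, int hoisted) {
    assert(hoisted >= 0 && hoisted <= n);
    int p_live = (hoisted < n) ? 1 : 0;
    return hoisted + p_live;
}

/* One instruction at a time: sink one more address only if that single move
   strictly lowers the count. Returns how many addresses remain hoisted. */
int greedy_hoisted(int n) {
    int hoisted = n;
    while (hoisted > 0 &&
           live_across_loop(n, hoisted - 1) < live_across_loop(n, hoisted))
        hoisted--;
    return hoisted;
}

/* Whole group: evaluate every split and keep the cheapest one.
   Returns how many addresses remain hoisted. */
int group_hoisted(int n) {
    int best = n;
    for (int h = n - 1; h >= 0; h--)
        if (live_across_loop(n, h) < live_across_loop(n, best))
            best = h;
    return best;
}
```

Under these assumptions, greedy_hoisted(8) returns 8: sinking any single
address trades one live value for another, so all eight addresses stay live
across the loop. group_hoisted(8) returns 0, leaving only p live
(live_across_loop(8, 0) is 1). The exhaustive search is trivial here only
because the addresses are interchangeable; a real allocator would have to
choose which groups to consider, which is where the compile-time concern comes
in.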

I see this problem in ARM MTE stack instrumentation. We use a virtual
frame pointer there, which makes all local variable accesses look like
(p+N) in the above example.

On Fri, Sep 13, 2019 at 8:36 AM Bardia Mahjour via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> Thanks Florian.
>
> Tim, you said:
> > Some cases can be undone by rematerialization, but not all, and it can
> > involve a lot of effort which increases compile time.
>
> Do you have examples of cases where rematerialization is not possible? We
> are interested in learning about any previous attempts to address the
> issue in RA. Have you tried it?
>
> Bardia Mahjour
> Compiler Optimizations
> IBM Toronto Software Lab
> bmahjour at ca.ibm.com (905) 413-2336
>
>
>
>
> From: Florian Hahn <florian_hahn at apple.com>
> To: Bardia Mahjour <bmahjour at ca.ibm.com>
> Cc: via llvm-dev <llvm-dev at lists.llvm.org>, tcorring at amd.com
> Date: 2019/09/13 11:16 AM
> Subject: [EXTERNAL] Re: [llvm-dev] Loop Opt WG Meeting Minutes for Sep
> 11, 2019
> Sent by: florian_hahn at apple.com
>
> ________________________________
>
>
>
> Hi,
>
> On Sep 11, 2019, at 17:51, Bardia Mahjour via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> ---------------------------
> Wed, Sep 11, 2019:
> ---------------------------
>
> - LICM vs Loop Sink Strategy (Whitney)
> - LICM and SCEV expander hoist code with no regard to increased
> live ranges. This is a long-standing issue where historically the
> preference has been to keep LICM more aggressive.
>
>
> This issue also motivated adding metadata to disable LICM
> (llvm.loop.licm.disable) recently: https://reviews.llvm.org/D64557

>
> - Two questions from the IBM side:
> a. This problem is not specific to the POWER platform, so we are
> wondering whether other people are interested.
> b. Where would be the best place to address this issue?
> - Since it's hard to come up with an accurate register pressure
> estimator in opt, it's probably better done fairly late,
> maybe after instruction scheduling.
> - A good place to start would be instruction rematerialization in
> the register allocator.
> - The problem is that the logic in the register allocator can only
> deal with a single instruction (instead of groups of instructions)
> at a time.
> - Start by handling one instruction at a time, then apply the
> same logic to groups of instructions iteratively to see the
> impact on performance and compile time.
> - The live-range editor may have utilities to help with code motion.
> - Lazy code motion may be a good long-term solution, but no one seems
> to be actively working on it.
>
> - Announcements:
> - flang call moved so we are no longer in conflict!
>
> - Philip is working on making the loop vectorizer robust in the face of
> multiple exits. There are two subproblems:
> 1. The vectorizer currently gives up because SCEV is not providing exit
> counts (due to a bug?). This is relatively easy to fix, and
> Philip will have a patch for it soon.
> 2. The loop exit cannot be analyzed because it is data-dependent; this
> is currently handled via predication. There is a lot of room
> for improvement, especially for read-only loops.
> Please let him know if you are interested.
>
>
> - Status Updates
> - Data Dependence Graph (https://reviews.llvm.org/D65350) (Bardia)
> - All review comments are addressed. Waiting for approval.
> - Bugzilla bugs update (Vivek)
> - Florian has a patch fixing loop bugs related to max trip count.
>
> ----------------------------
> Tentative Agenda for Sept 25
> ----------------------------
>
> Presentation from Marc Moreno Maza about his work on delinearization.
>
> - Status Updates
> - Follow up on multi-dimensional array indexing RFC (Siddharth)
> - Impact of Loop Rotation on existing passes (Min-Yih)
> - Data Dependence Graph (https://reviews.llvm.org/D65350) (Bardia)
> - Bugzilla bugs update (Vivek)
> - Others?
>
>
> Bardia Mahjour
> Compiler Optimizations
> IBM Toronto Software Lab
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
>
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
