[LLVMdev] LLVMdev Digest, Vol 112, Issue 56

David Blubaugh davidblubaugh2000 at yahoo.com
Mon Oct 21 14:27:32 PDT 2013


Has anyone worked with or used the LLVM backend or compiler for Haskell?

David






On Monday, October 21, 2013 5:26 PM, "llvmdev-request at cs.uiuc.edu" <llvmdev-request at cs.uiuc.edu> wrote:
 
Send LLVMdev mailing list submissions to
    llvmdev at cs.uiuc.edu

To subscribe or unsubscribe via the World Wide Web, visit
    http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
or, via email, send a message with subject or body 'help' to
    llvmdev-request at cs.uiuc.edu

You can reach the person managing the list at
    llvmdev-owner at cs.uiuc.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of LLVMdev digest..."


Today's Topics:

   1. Re: Bug #16941 (Dmitry Babokin)
   2. Re: Bug #16941 (Nadav Rotem)
   3. Re: Feature request for include llvm-mc in llvm.org/builds
      (Reid Kleckner)
   4. Re: Feature request for include llvm-mc in llvm.org/builds
      (Reid Kleckner)
   5. Re: Bug #16941 (Dmitry Babokin)
   6. Re: First attempt at recognizing pointer reduction
      (Arnold Schwaighofer)
   7. [lld] Handle _GLOBAL_OFFSET_TABLE symbol (Simon Atanasyan)
   8. Re: [lld] Handle _GLOBAL_OFFSET_TABLE symbol (Shankar Easwaran)
   9. Re: First attempt at recognizing pointer reduction (Renato Golin)
  10. Re: [lld] Handle _GLOBAL_OFFSET_TABLE symbol (Simon Atanasyan)
  11. Re: First attempt at recognizing pointer reduction
      (Arnold Schwaighofer)
  12. Re: [lld] Handle _GLOBAL_OFFSET_TABLE symbol (Shankar Easwaran)
  13. Re: llvm.org bug trend (Robinson, Paul)


----------------------------------------------------------------------

Message: 1
Date: Mon, 21 Oct 2013 22:12:07 +0400
From: Dmitry Babokin <babokin at gmail.com>
To: Nadav Rotem <nrotem at apple.com>
Cc: Ilia Filippov <ili.filippov at gmail.com>,    LLVM Developers Mailing
    List <llvmdev at cs.uiuc.edu>
Subject: Re: [LLVMdev] Bug #16941
Message-ID:
    <CACRFwuiGHNo_QdX_Ty+gij4PzHhMyyzpvm-2Lco0gdqNXSw8LQ at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Nadav,

You are absolutely right, it's an ISPC workload. I've checked SSE4 and it's
also severely affected.

We use intrinsics only for the <N x i32> <=> i32 conversion, i.e. movmsk.ps.
For the rest we use general LLVM instructions, and I would really like to
stick with that approach. We rely on LLVM's ability to produce efficient code
from general LLVM IR. Relying on intrinsics too much would be a crutch and
a path to nowhere for many reasons :)

What is the reason for this transformation, if it doesn't lead to efficient
code?

Dmitry.



On Mon, Oct 21, 2013 at 7:01 PM, Nadav Rotem <nrotem at apple.com> wrote:

> Hi Dmitry.
>
> This looks like an ISPC workload. ISPC works around a limitation in
> selection dag which does not know how to legalize mask types when both 128
> and 256 bit registers are available. ISPC works around this problem by
> expanding the mask to i32s and using intrinsics. Can you please verify that
> this regression only happens on AVX ? Can you change ISPC to use intrinsics
> ?
>
> Thanks
> Nadav
>
> Sent from my iPhone
>
> > On Oct 21, 2013, at 4:04, Dmitry Babokin <babokin at gmail.com> wrote:
> >
> > Nadav,
> >
> > Could you please have a look at bug #16941 and let us know what you
> think about it? It's performance regression after one of your commits.
> >
> > Thanks.
> >
> > Dmitry.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20131021/83d77e1f/attachment-0001.html>

------------------------------

Message: 2
Date: Mon, 21 Oct 2013 11:18:46 -0700
From: Nadav Rotem <nrotem at apple.com>
To: Dmitry Babokin <babokin at gmail.com>
Cc: Ilia Filippov <ili.filippov at gmail.com>,    LLVM Developers Mailing
    List <llvmdev at cs.uiuc.edu>
Subject: Re: [LLVMdev] Bug #16941
Message-ID: <567CDFA6-9E85-4943-869C-1C471B2143D8 at apple.com>
Content-Type: text/plain; charset="us-ascii"

Hi Dmitry, 

ISPC does some instruction selection as part of vectorization (on ASTs!) by placing intrinsics for specific operations.  The SEXT to i32 pattern was implemented because LLVM did not support vector-selects when this code was written.  

Can you submit a small SSE4 test case that demonstrates the problem?  Select is the canonical form of this operation, and SEXT is usually more difficult to lower.  
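
As an illustration of the difference, a minimal IR sketch of the two forms
(hypothetical <4 x i32> values; the 'and' variant matches the select only when
the false value is zero):

  ; ISPC-style lowering: the i1 mask is widened to i32 lanes, blend via 'and'
  %mask32    = sext <4 x i1> %cmp to <4 x i32>
  %blend.and = and <4 x i32> %mask32, %val

  ; canonical form: a vector select directly on the i1 mask
  %blend.sel = select <4 x i1> %cmp, <4 x i32> %val, <4 x i32> zeroinitializer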

Thanks,
Nadav

On Oct 21, 2013, at 11:12 AM, Dmitry Babokin <babokin at gmail.com> wrote:

> Nadav,
> 
> You are absolutely right, it's an ISPC workload. I've checked SSE4 and it's also severely affected.
> 
> We use intrinsics only for the <N x i32> <=> i32 conversion, i.e. movmsk.ps. For the rest we use general LLVM instructions, and I would really like to stick with that approach. We rely on LLVM's ability to produce efficient code from general LLVM IR. Relying on intrinsics too much would be a crutch and a path to nowhere for many reasons :)
> 
> What is the reason for this transformation, if it doesn't lead to efficient code?
> 
> Dmitry.
> 
> 
> 
> On Mon, Oct 21, 2013 at 7:01 PM, Nadav Rotem <nrotem at apple.com> wrote:
> Hi Dmitry.
> 
> This looks like an ISPC workload. ISPC works around a limitation in selection dag which does not know how to legalize mask types when both 128 and 256 bit registers are available. ISPC works around this problem by expanding the mask to i32s and using intrinsics. Can you please verify that this regression only happens on AVX ? Can you change ISPC to use intrinsics ?
> 
> Thanks
> Nadav
> 
> Sent from my iPhone
> 
> > On Oct 21, 2013, at 4:04, Dmitry Babokin <babokin at gmail.com> wrote:
> >
> > Nadav,
> >
> > Could you please have a look at bug #16941 and let us know what you think about it? It's performance regression after one of your commits.
> >
> > Thanks.
> >
> > Dmitry.
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20131021/4224f7d4/attachment-0001.html>

------------------------------

Message: 3
Date: Mon, 21 Oct 2013 11:23:01 -0700
From: Reid Kleckner <rnk at google.com>
To: Yonggang Luo <luoyonggang at gmail.com>
Cc: LLVM Dev <llvmdev at cs.uiuc.edu>
Subject: Re: [LLVMdev] Feature request for include llvm-mc in
    llvm.org/builds
Message-ID:
    <CACs=tyJvw+XgZ9S2n3NvSZBYMqqFaiUPfyQvuMZ9GMWczmCU-w at mail.gmail.com>
Content-Type: text/plain; charset="gb2312"

I can confirm I get the same behavior, and that's a real bug.  If you use
--target=i686-pc-win32, you get COFF, and that should be a good workaround
for now.  There must be a conditional somewhere that isn't handling mingw
correctly.


On Sat, Oct 19, 2013 at 7:58 AM, Yonggang Luo <luoyonggang at gmail.com> wrote:

> 2013/10/19 Rafael Espíndola <rafael.espindola at gmail.com>:
> > On 19 October 2013 06:01, Yonggang Luo <luoyonggang at gmail.com>
> wrote:
> >> I found that accessing llvm-mc from the clang driver is impossible, and I
> >> want to use llvm-mc to compile assembly files. How do I do that?
> >
> > Try "clang -integrated-as -c test.s"
>
> Thank you very much. I used the following command and it compiled successfully:
> clang -integrated-as -c -v --target=i686-pc-mingw sqrt.s
>
>
> The output file format is ELF32-i386. I want to know whether
> there is a way to output COFF format with target=i686-pc-mingw,
> because I want to compile the following asm file for both linux/gcc and
> windows/visual C++.
>
> .global sqrt
> .type sqrt,@function
> sqrt: fldl 4(%esp)
> fsqrt
> fstsw %ax
> sub $12,%esp
> fld %st(0)
> fstpt (%esp)
> mov (%esp),%ecx
> and $0x7ff,%ecx
> cmp $0x400,%ecx
> jnz 1f
> and $0x200,%eax
> sub $0x100,%eax
> sub %eax,(%esp)
> fstp %st(0)
> fldt (%esp)
> 1: add $12,%esp
> fstpl 4(%esp)
> fldl 4(%esp)
> ret
>
> >
> > Cheers,
> > Rafael
>
>
>
> --
> Yours sincerely,
> Yonggang Luo
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu        http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20131021/c1bc27b9/attachment-0001.html>

------------------------------

Message: 4
Date: Mon, 21 Oct 2013 11:43:57 -0700
From: Reid Kleckner <rnk at google.com>
To: Yonggang Luo <luoyonggang at gmail.com>
Cc: LLVM Dev <llvmdev at cs.uiuc.edu>
Subject: Re: [LLVMdev] Feature request for include llvm-mc in
    llvm.org/builds
Message-ID:
    <CACs=tyKD=LzoyHG90V8HU_W6L_Qs9w3ZMxUCUgjk_qGJ5RY-Ww at mail.gmail.com>
Content-Type: text/plain; charset="gb2312"

Ah, so clang only understands the spelling mingw32, not mingw.  That'll
give you COFF.  :)
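
So, presumably, the working invocation from the earlier message just needs the
triple spelling changed, e.g.:

  clang -integrated-as -c -v --target=i686-pc-mingw32 sqrt.s

which should then emit a COFF object rather than ELF.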


On Mon, Oct 21, 2013 at 11:23 AM, Reid Kleckner <rnk at google.com> wrote:

> I can confirm I get the same behavior, and that's a real bug.  If you use
> --target=i686-pc-win32, you get COFF, and that should be a good workaround
> for now.  There must be a conditional somewhere that isn't handling mingw
> correctly.
>
>
> On Sat, Oct 19, 2013 at 7:58 AM, Yonggang Luo <luoyonggang at gmail.com> wrote:
>
>> 2013/10/19 Rafael Espíndola <rafael.espindola at gmail.com>:
>> > On 19 October 2013 06:01, Yonggang Luo <luoyonggang at gmail.com>
>> wrote:
>> >> I found that accessing llvm-mc from the clang driver is impossible, and I
>> >> want to use llvm-mc to compile assembly files. How do I do that?
>> >
>> > Try "clang -integrated-as -c test.s"
>>
>> Thank you very much. I used the following command and it compiled successfully:
>> clang -integrated-as -c -v --target=i686-pc-mingw sqrt.s
>>
>>
>> The output file format is ELF32-i386. I want to know whether
>> there is a way to output COFF format with target=i686-pc-mingw,
>> because I want to compile the following asm file for both linux/gcc and
>> windows/visual C++.
>>
>> .global sqrt
>> .type sqrt,@function
>> sqrt: fldl 4(%esp)
>> fsqrt
>> fstsw %ax
>> sub $12,%esp
>> fld %st(0)
>> fstpt (%esp)
>> mov (%esp),%ecx
>> and $0x7ff,%ecx
>> cmp $0x400,%ecx
>> jnz 1f
>> and $0x200,%eax
>> sub $0x100,%eax
>> sub %eax,(%esp)
>> fstp %st(0)
>> fldt (%esp)
>> 1: add $12,%esp
>> fstpl 4(%esp)
>> fldl 4(%esp)
>> ret
>>
>> >
>> > Cheers,
>> > Rafael
>>
>>
>>
>> --
>> Yours sincerely,
>> Yonggang Luo
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu        http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20131021/0c1eb922/attachment-0001.html>

------------------------------

Message: 5
Date: Mon, 21 Oct 2013 23:09:59 +0400
From: Dmitry Babokin <babokin at gmail.com>
To: Nadav Rotem <nrotem at apple.com>
Cc: Ilia Filippov <ili.filippov at gmail.com>,    LLVM Developers Mailing
    List <llvmdev at cs.uiuc.edu>
Subject: Re: [LLVMdev] Bug #16941
Message-ID:
    <CACRFwujWT6hGihJh9NSbAjBC87FTaHEd8mZ=muryXJZtxPK3iw at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Nadav,

You are right, ISPC may issue intrinsics as a result of AST selection,
though I believe that we should stick to LLVM IR whenever possible.
Intrinsics tend to be boundaries for optimizations (on both data and
control flow) and are generally not optimizable. LLVM may improve over time
from a performance standpoint and we would benefit from it (or it may play
against us, like in this case). We can change our IR generation, but not in
favor of intrinsics (in the long term, that is; we may use them as a
workaround, of course).

I'm not sure that select is really the canonical form of this operation, as
it really implies an AND in this case. But this is a philosophical question,
so there's no point arguing :) In any case it should lead to more efficient
code, which means that either a) this transformation should not happen, or
b) code generation for this instruction combination should be tuned. This
should benefit LLVM in general, IMHO. It may also be the case that this just
leads to bad code in our specific environment, but at this point that
doesn't seem to be the case.

I'll try to come up with a small SSE4 reproducer.

By the way, I'm curious: is there any reason why you focus on SSE4, not AVX?
It seems that the vectorizer should care most about the latest silicon.

Dmitry.


On Mon, Oct 21, 2013 at 10:18 PM, Nadav Rotem <nrotem at apple.com> wrote:

> Hi Dmitry,
>
> ISPC does some instruction selection as part of vectorization (on ASTs!)
> by placing intrinsics for specific operations.  The SEXT to i32 pattern was
> implemented because LLVM did not support vector-selects when this code was
> written.
>
> Can you submit a small SSE4 test case that demonstrates the problem?
>  Select is the canonical form of this operation, and SEXT is usually more
> difficult to lower.
>
> Thanks,
> Nadav
>
> On Oct 21, 2013, at 11:12 AM, Dmitry Babokin <babokin at gmail.com> wrote:
>
> Nadav,
>
> You are absolutely right, it's an ISPC workload. I've checked SSE4 and it's
> also severely affected.
>
> We use intrinsics only for the <N x i32> <=> i32 conversion, i.e. movmsk.ps.
> For the rest we use general LLVM instructions, and I would really like to
> stick with that approach. We rely on LLVM's ability to produce efficient code
> from general LLVM IR. Relying on intrinsics too much would be a crutch and
> a path to nowhere for many reasons :)
>
> What is the reason for this transformation, if it doesn't lead to
> efficient code?
>
> Dmitry.
>
>
>
> On Mon, Oct 21, 2013 at 7:01 PM, Nadav Rotem <nrotem at apple.com> wrote:
>
>> Hi Dmitry.
>>
>> This looks like an ISPC workload. ISPC works around a limitation in
>> selection dag which does not know how to legalize mask types when both 128
>> and 256 bit registers are available. ISPC works around this problem by
>> expanding the mask to i32s and using intrinsics. Can you please verify that
>> this regression only happens on AVX ? Can you change ISPC to use intrinsics
>> ?
>>
>> Thanks
>> Nadav
>>
>> Sent from my iPhone
>>
>> > On Oct 21, 2013, at 4:04, Dmitry Babokin <babokin at gmail.com> wrote:
>> >
>> > Nadav,
>> >
>> > Could you please have a look at bug #16941 and let us know what you
>> think about it? It's performance regression after one of your commits.
>> >
>> > Thanks.
>> >
>> > Dmitry.
>>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20131021/5209cc70/attachment-0001.html>

------------------------------

Message: 6
Date: Mon, 21 Oct 2013 14:58:55 -0500
From: Arnold Schwaighofer <aschwaighofer at apple.com>
To: Renato Golin <renato.golin at linaro.org>
Cc: LLVM Dev <llvmdev at cs.uiuc.edu>
Subject: Re: [LLVMdev] First attempt at recognizing pointer reduction
Message-ID: <C6D42D81-67E5-45D7-9430-14321A29634B at apple.com>
Content-Type: text/plain; charset=windows-1252


On Oct 21, 2013, at 1:00 PM, Renato Golin <renato.golin at linaro.org> wrote:

> Hi Arnold,
> 
> To sum up my intentions, I want to understand how the reduction/induction variable detection works in LLVM, so that I can know better how to detect different patterns in memory, not just the stride vectorization. 

To detect memory access patterns you will want to look at the SCEV of a pointer (or at a set of SCEVs/pointers). This way you get a canonical form.

For example these should be the SCEVs of "int a[2*i] = ; a[2*i+1] =":

{ptr,   +, 8}_loop
{ptr+4, +, 8}_loop

Each access on its own requires a gather/scatter (2 loads/stores when vectorized (VF=2) + inserts/extracts). But when we look at both at once we see that we only need two loads/stores in total (plus some interleaving operations).
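
As a rough sketch (hypothetical value names; VF=2 and i32 elements, as implied
by the 8-byte stride above), the two interleaved stores could then become one
shuffle plus a single wide store instead of two scatters:

  ; %v.even holds the two a[2*i] values, %v.odd the two a[2*i+1] values
  %ileave = shufflevector <2 x i32> %v.even, <2 x i32> %v.odd, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
  store <4 x i32> %ileave, <4 x i32>* %ptr, align 4   ; memory order: e0, o0, e1, o1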

What other patterns (than strided accesses) do you want to vectorize?

> 
> For instance, even if the relationship between each loop would be complicated, I know that in each loop, all three reads are sequential. So, at least, I could use a Load<3> rather than three loads.


Yes. Detecting this is what is described in the paper I referred to in an earlier post (Auto-vectorization of interleaved data for SIMD), and what gcc is doing (AFAICT). You want to take a set of accesses in the loop and recognize that they are consecutive in memory (currently we look at every access on its own; if an access is not sequential we assume we have to gather/scatter). Once you know that you have consecutive accesses spread across different instructions, you can generate more efficient code instead of scatters/gathers. You would want to do the evaluation of which accesses are consecutive in SCEV, I think.

For your example, you first want to recognize strided accesses (this is separate from the induction/reduction mechanism):

for (i = 0 .. n, +1)
  a[2*i] =
  a[2*i+1] =

You want this part first because without it the loop vectorizer is not going to vectorize, due to the cost of gather/scatter. But if it can recognize that in cases like this the "gather/scatter" is not as costly, and we emit the right code, we will start to vectorize such loops.

Once we can do that, we need to support strided pointer inductions to get your example.

for (i = 0 .. n, +1)
  *a++ =
  *a++ =
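
Roughly, in IR (i32 elements and value names assumed purely for illustration),
that pointer-induction form would be:

  loop:
    %p = phi i32* [ %a, %entry ], [ %p.next, %loop ]
    %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
    store i32 %v0, i32* %p                         ; first  *a++ =
    %p1 = getelementptr inbounds i32* %p, i64 1
    store i32 %v1, i32* %p1                        ; second *a++ =
    %p.next = getelementptr inbounds i32* %p, i64 2
    %i.next = add nuw nsw i64 %i, 1
    %exitcond = icmp eq i64 %i.next, %n
    br i1 %exitcond, label %exit, label %loop

SCEV would still see the two stores as {%a,+,8} and {%a+4,+,8}, i.e. the same
strided pattern as the array form; the additional work is recognizing %p as a
strided pointer induction.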


> I guess this is why Nadav was hinting at the SLP vectorizer, not the loop vec. Since the operation on all three loaded values is exactly the same AND the writes are also sequential, I can reduce that to: load<3> -> op<3> ->
> store<3>. That should work even on machines that don't have de-interleaved access (assuming they're pointers to simple types, etc).

Getting this example in the SLP vectorizer is easier but we won't get the most efficient code (i.e. the one that gcc emits) because we would have <3 x i8> stores/loads. With vectorization of interleaved data you can load/store more elements (from several iterations) with a single load.

> 
> 
> On 21 October 2013 17:29, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> can you post a hand-created vectorized IR of how a reduction would work on your example?
> 
> I'm not there yet, just trying to understand the induction/reduction code first and what comes out of it.
> 
> 
> I think the right approach to get examples like this is that we need to recognize strided pointer inductions (we only support strides of one currently).
> 
> I see. I'll have a look at IK_PtrInduction and see what patterns I can spot.
> 
> Do you envisage a new IK type for strided induction, or just work with the PtrInduction to extend the concept of a non-unit stride (ie. separate Step from ElementSize)?

Either representation should be fine. I think the bigger task is not recognizing the induction but recognizing consecutive strided memory accesses, though. First, I think we want to be able to do:

for (i = 0 .. n, +1)
  a[3*i] =
  a[3*i+1] =
  a[3*i+2] =

And next,

for (i = 0 .. n, +1)
  *a++ =
  *a++ =
  *a++ =
  
Because to get the latter, you need the former.


Have you compared the performance of the kernel (gcc vectorized) you showed vs a scalar version? I would be curious about the speed-up.


Thanks,
Arnold


------------------------------

Message: 7
Date: Tue, 22 Oct 2013 00:08:01 +0400
From: Simon Atanasyan <simon at atanasyan.com>
To: llvmdev <llvmdev at cs.uiuc.edu>
Subject: [LLVMdev] [lld] Handle _GLOBAL_OFFSET_TABLE symbol
Message-ID:
    <CAGyS+DRSEmE1aqyCHn=zpxFBR7Ynjj5MyPxMp01yJXXQBzgp8g at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Hi,

What is the recommended way of handling (defining, assigning a value to)
the _GLOBAL_OFFSET_TABLE symbol? It looks like the Hexagon and X86_64 targets
use different APIs to achieve the same result.

tia,

Simon


------------------------------

Message: 8
Date: Mon, 21 Oct 2013 15:16:58 -0500
From: Shankar Easwaran <shankare at codeaurora.org>
To: Simon Atanasyan <simon at atanasyan.com>, llvmdev
    <llvmdev at cs.uiuc.edu>
Subject: Re: [LLVMdev] [lld] Handle _GLOBAL_OFFSET_TABLE symbol
Message-ID: <52658BBA.4000203 at codeaurora.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

It's a DefinedAtom whose value is set by the Target Handlers.

On 10/21/2013 3:08 PM, Simon Atanasyan wrote:
> Hi,
>
> What is the recommended way of handling (defining, assigning a value to)
> the _GLOBAL_OFFSET_TABLE symbol? It looks like the Hexagon and X86_64 targets
> use different APIs to achieve the same result.
>
> tia,
>
> Simon
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu        http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>


-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation



------------------------------

Message: 9
Date: Mon, 21 Oct 2013 21:40:38 +0100
From: Renato Golin <renato.golin at linaro.org>
To: Arnold Schwaighofer <aschwaighofer at apple.com>
Cc: LLVM Dev <llvmdev at cs.uiuc.edu>
Subject: Re: [LLVMdev] First attempt at recognizing pointer reduction
Message-ID:
    <CAMSE1kc7H9Ev9LKYhEUbkj1NGuNDCq_1k90SvsaYXKx5e3SO4w at mail.gmail.com>
Content-Type: text/plain; charset="windows-1252"

On 21 October 2013 20:58, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:

> For example these should be the SCEVs of "int a[2*i] = ; a[2*i+1] =":
>
> {ptr,   +, 8}_loop
> {ptr+4, +, 8}_loop
>
> Each access on its own requires a gather/scatter (2 loads/stores when
> vectorized (VF=2) + inserts/extracts). But when we look at both at once we
> see that we only need two loads/stores in total (plus some interleaving
> operations).
>


Yes, I've been studying SCEV when trying to understand some other patterns
where the vectorizer was unable to detect the exit count (basically this
case, with a nested loop). It does make it easier to spot patterns in
the code.

The patch I attached here was not to help vectorize anything, but to let me
jump over the validation step, so that I could start working with the
patterns themselves during the actual vectorization. The review request was
only to understand if the checks I was making made sense, but it turned out
a lot more valuable than that.


Getting this example in the SLP vectorizer is easier but we won't get the
> most efficient code (i.e. the one that gcc emits) because we would have <3
> x i8> stores/loads. With vectorization of interleaved data you can
> load/store more elements (from several iterations) with a single load.
>

So, this was the other pattern I was looking for, as a stepping stone into
the full vectorizer. But I'm not sure this will help the strided access in
any way.



Either representation should be fine. I think the bigger task is not
> recognizing the induction but recognizing consecutive strided memory
> accesses, though. First, I think we want to be able to do:
>
> for (i = 0 .. n, +1)
>   a[3*i] =
>   a[3*i+1] =
>   a[3*i+2] =
>
> And next,
>
> for (i = 0 .. n, +1)
>   *a++ =
>   *a++ =
>   *a++ =
>
> Because to get the latter, you need the former.
>

Makes total sense. I'll change my approach.


Have you compared the performance of the kernel (gcc vectorized) you showed
> vs a scalar version? I would be curious about the speed-up.
>

4x faster, on both Cortex A9 and A15. :)

Thanks for the tips, I hope I can find more time to work on it this week,
since Linaro Connect is in the coming week and the US dev meeting is on the
next.

cheers,
--renato
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20131021/c0d72575/attachment-0001.html>

------------------------------

Message: 10
Date: Tue, 22 Oct 2013 00:47:44 +0400
From: Simon Atanasyan <simon at atanasyan.com>
To: Shankar Easwaran <shankare at codeaurora.org>
Cc: llvmdev <llvmdev at cs.uiuc.edu>
Subject: Re: [LLVMdev] [lld] Handle _GLOBAL_OFFSET_TABLE symbol
Message-ID:
    <CAGyS+DSCbWZEfuWONXqnuLxvDEsLbdM=i=UL8FKM+wdf9F-LHw at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

The Hexagon target adds a new atom using the addAbsoluteAtom() function
and then assigns a virtual address in the finalizeSymbolValues()
routine. The X86_64 target uses the addAtom() function to add an object of
the GLOBAL_OFFSET_TABLEAtom class to do the same thing. What is the
reason for this difference? Is GLOBAL_OFFSET_TABLEAtom just a
convenient wrapper that eliminates the need to assign an address to
the atom explicitly in the finalizeSymbolValues() routine?

On Tue, Oct 22, 2013 at 12:16 AM, Shankar Easwaran
<shankare at codeaurora.org> wrote:
> It's a DefinedAtom whose value is set by the Target Handlers.
>
> On 10/21/2013 3:08 PM, Simon Atanasyan wrote:
>> What is the recommended way of handling (defining, assigning a value to)
>> the _GLOBAL_OFFSET_TABLE symbol? It looks like the Hexagon and X86_64 targets
>> use different APIs to achieve the same result.

-- 
Simon


------------------------------

Message: 11
Date: Mon, 21 Oct 2013 15:51:53 -0500
From: Arnold Schwaighofer <aschwaighofer at apple.com>
To: Renato Golin <renato.golin at linaro.org>
Cc: LLVM Dev <llvmdev at cs.uiuc.edu>
Subject: Re: [LLVMdev] First attempt at recognizing pointer reduction
Message-ID: <F1F91875-68E1-437A-9629-ABFF3ABB9466 at apple.com>
Content-Type: text/plain; charset=windows-1252

Whenever you find time to work on this, it will be of great help!

I wanted to work on this at some point, but my near-term tasks won't allow it. So I much appreciate any help!


Thanks,
Arnold

On Oct 21, 2013, at 3:40 PM, Renato Golin <renato.golin at linaro.org> wrote:

> On 21 October 2013 20:58, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> For example these should be the SCEVs of "int a[2*i] = ; a[2*i+1] =":
> 
> {ptr,   +, 8}_loop
> {ptr+4, +, 8}_loop
> 
> Each access on its own requires a gather/scatter (2 loads/stores when vectorized (VF=2) + inserts/extracts). But when we look at both at once we see that we only need two loads/stores in total (plus some interleaving operations).
> 
> 
> Yes, I've been studying SCEV when trying to understand some other patterns where the vectorizer was unable to detect the exit count (basically this case, with a nested loop). It does make it easier to spot patterns in the code.
> 
> The patch I attached here was not to help vectorize anything, but to let me jump over the validation step, so that I could start working with the patterns themselves during the actual vectorization. The review request was only to understand if the checks I was making made sense, but it turned out a lot more valuable than that.
> 
> 
> Getting this example in the SLP vectorizer is easier but we won't get the most efficient code (i.e. the one that gcc emits) because we would have <3 x i8> stores/loads. With vectorization of interleaved data you can load/store more elements (from several iterations) with a single load.
> 
> So, this was the other pattern I was looking for, as a stepping stone into the full vectorizer. But I'm not sure this will help the strided access in any way.
> 
> 
> 
> Either representation should be fine. I think the bigger task is not recognizing the induction but recognizing consecutive strided memory accesses, though. First, I think we want to be able to do:
> 
> for (i = 0 .. n, +1)
>   a[3*i] =
>   a[3*i+1] =
>   a[3*i+2] =
> 
> And next,
> 
> for (i = 0 .. n, +1)
>   *a++ =
>   *a++ =
>   *a++ =
> 
> Because to get the latter, you need the former.
> 
> Makes total sense. I'll change my approach.
> 
> 
> Have you compared the performance of the kernel (gcc vectorized) you showed vs a scalar version? I would be curious about the speed-up.
> 
> 4x faster, on both Cortex A9 and A15. :)

Nice.

> 
> Thanks for the tips, I hope I can find more time to work on it this week, since Linaro Connect is in the coming week and the US dev meeting is on the next.
> 
> cheers,
> --renato




------------------------------

Message: 12
Date: Mon, 21 Oct 2013 15:58:29 -0500
From: Shankar Easwaran <shankare at codeaurora.org>
To: Simon Atanasyan <simon at atanasyan.com>
Cc: llvmdev <llvmdev at cs.uiuc.edu>
Subject: Re: [LLVMdev] [lld] Handle _GLOBAL_OFFSET_TABLE symbol
Message-ID: <52659575.7090002 at codeaurora.org>
Content-Type: text/plain; charset=UTF-8; format=flowed

On 10/21/2013 3:47 PM, Simon Atanasyan wrote:
> The Hexagon target adds a new atom using the addAbsoluteAtom() function
> and then assigns a virtual address in the finalizeSymbolValues()
> routine. The X86_64 target uses the addAtom() function to add an object of
> the GLOBAL_OFFSET_TABLEAtom class to do the same thing. What is the
> reason for this difference?
This should be fixed and both should use the addAbsoluteAtom function.

> Is GLOBAL_OFFSET_TABLEAtom just a
> convenient wrapper that eliminates the need to assign an address to
> the atom explicitly in the finalizeSymbolValues() routine?
It's being used as a wrapper currently, as this is an AbsoluteAtom and
not a DefinedAtom. This should also be fixed.

The problem is that this is being treated as a DefinedAtom when it should be
an AbsoluteAtom, in my opinion.

Shankar Easwaran

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation


------------------------------

Message: 13
Date: Mon, 21 Oct 2013 21:09:34 +0000
From: "Robinson, Paul" <Paul_Robinson at playstation.sony.com>
To: "llvmdev at cs.uiuc.edu" <llvmdev at cs.uiuc.edu>, "cfe-dev at cs.uiuc.edu"
    <cfe-dev at cs.uiuc.edu>
Subject: Re: [LLVMdev] llvm.org bug trend
Message-ID:
    <E3B07FDB86BFF041819DC057DEED8FEA5596B747 at USCULXMSG02.am.sony.com>
Content-Type: text/plain; charset="us-ascii"

This week, for the very first time since I started sampling (14 months ago),
the weekly open-bug count went DOWN.

Traffic on llvm-bugs indicates that Bill Wendling is doing a lot of this.  Huzzah!
--paulr

From: cfe-dev-bounces at cs.uiuc.edu [mailto:cfe-dev-bounces at cs.uiuc.edu] On Behalf Of Robinson, Paul
Sent: Tuesday, July 30, 2013 11:32 AM
To: llvmdev at cs.uiuc.edu; cfe-dev at cs.uiuc.edu
Subject: [cfe-dev] llvm.org bug trend

Over most of the past year, I have been keeping an eye on the overall LLVM.org open-bug count.
Sampling the count (almost) every Monday morning, it is a consistently non-decreasing number.
I thought I'd post something about it to the Dev lists, as the count broke 4000 this past week.
For your entertainment here's a chart that Excel produced from the data. (To make it more
dramatic, I carefully did not use a proper zero point on the X-axis.)

I do not have per-category breakdowns, sorry, just the raw total.

Makes me think more seriously about cruising the bug list for something that looks like
I could actually fix it...
--paulr

[inline chart: weekly LLVM.org open-bug count (image001.png, attached below)]



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20131021/e18e8251/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 11909 bytes
Desc: image001.png
URL: <http://lists.cs.uiuc.edu/pipermail/llvmdev/attachments/20131021/e18e8251/attachment.png>

------------------------------

_______________________________________________
LLVMdev mailing list
LLVMdev at cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


End of LLVMdev Digest, Vol 112, Issue 56
****************************************
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131021/4b2e4fcb/attachment.html>

