<div dir="ltr">Hmm, I'm not able to get those .ll files to compile if I disable SSE, and I end up with SSE instructions (including sqrtpd) if I don't disable it.</div><div class="gmail_extra"><br><br><div class="gmail_quote">
On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <span dir="ltr"><<a href="mailto:peter@uformia.com" target="_blank">peter@uformia.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  
    
  
  <div text="#000000" bgcolor="#FFFFFF">
    <div>Is there something specifically
      required to enable SSE? If it's not detected as available (based
      on the target triple?) then I don't think we enable it
      specifically.<br>
      <br>
      Also it seems that it should handle converting to/from the vector
      types, although I can see it getting confused about needing to do
      that if it thinks SSE isn't available at all.<div><div class="h5"><br>
      <br>
      On 19/07/2013 3:47 PM, Craig Topper wrote:<br>
    </div></div></div><div><div class="h5">
    <blockquote type="cite">
      <div dir="ltr">Hmm, maybe SSE isn't being enabled, so it's falling
        back to emulating sqrt?</div>
      <div class="gmail_extra"><br>
        <br>
        <div class="gmail_quote">On Thu, Jul 18, 2013 at 10:45 PM, Peter
          Newman <span dir="ltr"><<a href="mailto:peter@uformia.com" target="_blank">peter@uformia.com</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div text="#000000" bgcolor="#FFFFFF">
              <div>In the disassembly, I'm seeing three cases of<br>
                call        76719BA1<br>
                <br>
                I am assuming this is the sqrt function as this is the
                only function called in the LLVM IR.<br>
                <br>
                The code at 76719BA1 is:<br>
                <br>
                76719BA1  push        ebp  <br>
                76719BA2  mov         ebp,esp <br>
                76719BA4  sub         esp,20h <br>
                76719BA7  and         esp,0FFFFFFF0h <br>
                76719BAA  fld         st(0) <br>
                76719BAC  fst         dword ptr [esp+18h] <br>
                76719BB0  fistp       qword ptr [esp+10h] <br>
                76719BB4  fild        qword ptr [esp+10h] <br>
                76719BB8  mov         edx,dword ptr [esp+18h] <br>
                76719BBC  mov         eax,dword ptr [esp+10h] <br>
                76719BC0  test        eax,eax <br>
                76719BC2  je          76719DCF <br>
                76719BC8  fsubp       st(1),st <br>
                76719BCA  test        edx,edx <br>
                76719BCC  js          7671F9DB <br>
                76719BD2  fstp        dword ptr [esp] <br>
                76719BD5  mov         ecx,dword ptr [esp] <br>
                76719BD8  add         ecx,7FFFFFFFh <br>
                76719BDE  sbb         eax,0 <br>
                76719BE1  mov         edx,dword ptr [esp+14h] <br>
                76719BE5  sbb         edx,0 <br>
                76719BE8  leave            <br>
                76719BE9  ret              <br>
                <br>
                <br>
                As you can see at 76719BD5, it modifies ECX.<br>
                <br>
                I don't know that this is the sqrtpd function (for
                example, I'm not seeing any SSE instructions here?) but
                whatever it is, it's being called from the IR I attached
                earlier, and is modifying ECX under some circumstances.
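                For context on why a clobbered ECX matters: under the common 32-bit x86 calling conventions (cdecl and stdcall), EAX, ECX and EDX are caller-saved, so a called routine is allowed to overwrite them. The sketch below is only an illustration of that rule; the names `Reg`, `is_caller_saved` and `must_spill_across_call` are hypothetical and are not part of LLVM.

```cpp
#include <cassert>

// Hypothetical model of the 32-bit x86 general-purpose registers.
enum class Reg { EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP };

// Under cdecl and stdcall, EAX, ECX and EDX are caller-saved (volatile):
// a callee may overwrite them without restoring the old value.
constexpr bool is_caller_saved(Reg r) {
    return r == Reg::EAX || r == Reg::ECX || r == Reg::EDX;
}

// A value that must survive a call cannot simply stay in a caller-saved
// register; the caller has to spill it before the call and reload it after.
constexpr bool must_spill_across_call(Reg r, bool live_after_call) {
    return live_after_call && is_caller_saved(r);
}
```

If the generated code treats the operation as a single instruction but a real call is emitted instead, a value kept live in ECX would fall into the `must_spill_across_call(Reg::ECX, true)` case without ever being spilled, which would be consistent with the symptom described here.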
                <div>
                  <div><br>
                    <br>
                    On 19/07/2013 3:29 PM, Craig Topper wrote:<br>
                  </div>
                </div>
              </div>
              <div>
                <div>
                  <blockquote type="cite">
                    <div dir="ltr">That should map directly to sqrtpd
                      which can't modify ecx.</div>
                    <div class="gmail_extra"><br>
                      <br>
                      <div class="gmail_quote">On Thu, Jul 18, 2013 at
                        10:27 PM, Peter Newman <span dir="ltr"><<a href="mailto:peter@uformia.com" target="_blank">peter@uformia.com</a>></span>
                        wrote:<br>
                        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                          <div text="#000000" bgcolor="#FFFFFF">
                            <div>Sorry, that should have been
                              llvm.x86.sse2.sqrt.pd
                              <div>
                                <div><br>
                                  <br>
                                  On 19/07/2013 3:25 PM, Craig Topper
                                  wrote:<br>
                                </div>
                              </div>
                            </div>
                            <div>
                              <div>
                                <blockquote type="cite">
                                  <div dir="ltr">What is
                                    "frep.x86.sse2.sqrt.pd"? I'm only
                                    familiar with things prefixed with
                                    "llvm.x86".</div>
                                  <div class="gmail_extra"><br>
                                    <br>
                                    <div class="gmail_quote">On Thu, Jul
                                      18, 2013 at 10:12 PM, Peter Newman
                                      <span dir="ltr"><<a href="mailto:peter@uformia.com" target="_blank">peter@uformia.com</a>></span>
                                      wrote:<br>
                                      <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                        <div text="#000000" bgcolor="#FFFFFF">
                                          <div>After stepping through
                                            the produced assembly, I
                                            believe I have a culprit.<br>
                                            <br>
                                            One of the calls to
                                            @frep.x86.sse2.sqrt.pd is
                                            modifying the value of ECX -
                                            while the produced code is
                                            expecting it to still
                                            contain its previous value.<br>
                                            <br>
                                            Peter N
                                            <div>
                                              <div><br>
                                                <br>
                                                On 19/07/2013 2:09 PM,
                                                Peter Newman wrote:<br>
                                              </div>
                                            </div>
                                          </div>
                                          <div>
                                            <div>
                                              <blockquote type="cite">
                                                <div>I've attached the
                                                  module->dump() that
                                                  our code is producing.
                                                  Unfortunately this is
                                                  the smallest test case
                                                  I have available.<br>
                                                  <br>
                                                  This is before any
                                                  optimization passes
                                                  are applied. There are
                                                  two separate modules
                                                  in existence at the
                                                  time, and there are no
                                                  guarantees about the
                                                  order the surrounding
                                                  code calls those
                                                  functions, so there
                                                  may be some
                                                  interaction between
                                                  them? There shouldn't
                                                  be, since they don't refer
                                                  to any common memory
                                                  etc. There is no
                                                  multi-threading
                                                  occurring.<br>
                                                  <br>
                                                  The function in
                                                  module-dump.ll (called
                                                  crashfunc in this
                                                  file) is called with<br>
                                                  -        func_params    0x0018f3b0    double [3]<br>
                                                          [0x0]    -11.339976634695301    double<br>
                                                          [0x1]    -9.7504239056205506    double<br>
                                                          [0x2]    -5.2900856817382804    double<br>
                                                  at the time of the
                                                  exception.<br>
                                                  <br>
                                                  This is compiled for an
                                                  "i686-pc-win32"
                                                  triple. All of the
                                                  non-intrinsic
                                                  functions referred to
                                                  in these modules are
                                                  the standard
                                                  equivalents from the
                                                  MSVC library (e.g.
                                                  @asin is the standard
                                                  C lib    double asin(
                                                  double ) ).<br>
                                                  <br>
                                                  Hopefully this is
                                                  reproducible for you.<br>
                                                  <br>
                                                  --<br>
                                                  PeterN<br>
                                                  <br>
                                                  On 18/07/2013 4:37 PM,
                                                  Craig Topper wrote:<br>
                                                </div>
                                                <blockquote type="cite">
                                                  <div dir="ltr">Are you
                                                    able to send any IR
                                                    for others to
                                                    reproduce this
                                                    issue?</div>
                                                  <div class="gmail_extra"><br>
                                                    <br>
                                                    <div class="gmail_quote">On
                                                      Wed, Jul 17, 2013
                                                      at 11:23 PM, Peter
                                                      Newman <span dir="ltr"><<a href="mailto:peter@uformia.com" target="_blank">peter@uformia.com</a>></span>
                                                      wrote:<br>
                                                      <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Unfortunately,


                                                        this doesn't
                                                        appear to be the
                                                        bug I'm hitting.
                                                        I applied the
                                                        fix to my source
                                                        and it didn't
                                                        make a
                                                        difference.<br>
                                                        <br>
                                                        Also, further
                                                        testing showed the
                                                        same behavior with
                                                        other SIMD
                                                        instructions.
                                                        The common
                                                        factor is that in
                                                        each case ECX
                                                        is set to
                                                        0x7fffffff, and
                                                        the faulting
                                                        operation uses
                                                        xmm ptr
                                                        ecx+offset.<br>
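                                                        To make the alignment point concrete: movapd requires its memory operand to be 16-byte aligned and faults otherwise. The following is a minimal sketch of that check; the helper names `effective_address` and `is_xmm_aligned` and the displacement values are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// movapd needs a memory operand whose address is a multiple of 16.
constexpr std::uint32_t kXmmAlignment = 16;

// Effective address of [base + displacement], with 32-bit wraparound.
constexpr std::uint32_t effective_address(std::uint32_t base,
                                          std::uint32_t displacement) {
    return base + displacement;
}

// True when the address satisfies the 16-byte alignment requirement.
constexpr bool is_xmm_aligned(std::uint32_t address) {
    return (address % kXmmAlignment) == 0;
}
```

With ECX at 0x7fffffff, a 16-byte-multiple displacement such as 0x10 gives 0x8000000f, which fails `is_xmm_aligned`. An aligned base such as 0x1000 with the same displacement passes, so the stray ECX value alone would be enough to explain an alignment fault.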
                                                        <br>
                                                        Additionally,
                                                        turning the
                                                        optimization
                                                        level passed to
                                                        createJIT down
                                                        appears to avoid
                                                        it, so I'm now
                                                        leaning towards
                                                        a bug in one of
                                                        the optimization
                                                        passes.<br>
                                                        <br>
                                                        I'm going to dig
                                                        through the
                                                        passes
                                                        controlled by
                                                        that parameter
                                                        and see if I can
                                                        narrow down
                                                        which
                                                        optimization is
                                                        causing it.<br>
                                                        <br>
                                                        Peter N
                                                        <div>
                                                          <div><br>
                                                          <br>
                                                          On 17/07/2013
                                                          1:58 PM,
                                                          Solomon Boulos
                                                          wrote:<br>
                                                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                                          As someone off
                                                          list just told
                                                          me, perhaps my
                                                          new bug is the
                                                          same issue:<br>
                                                          <br>
                                                             <a href="http://llvm.org/bugs/show_bug.cgi?id=16640" target="_blank">http://llvm.org/bugs/show_bug.cgi?id=16640</a><br>
                                                          <br>
                                                          Do you happen
                                                          to be using
                                                          FastISel?<br>
                                                          <br>
                                                          Solomon<br>
                                                          <br>
                                                          On Jul 16,
                                                          2013, at 6:39
                                                          PM, Peter
                                                          Newman <<a href="mailto:peter@uformia.com" target="_blank">peter@uformia.com</a>>




                                                          wrote:<br>
                                                          <br>
                                                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                                          Hello all,<br>
                                                          <br>
                                                          I'm currently
                                                          in the process
                                                          of debugging a
                                                          crash
                                                          occurring in
                                                          our program.
                                                          In LLVM 3.2
                                                          and 3.3 it
                                                          appears that
                                                          JIT generated
                                                          code is
                                                          attempting to
                                                          access
                                                          unaligned
                                                          memory with an
                                                          SSE2
                                                          instruction.
                                                          However this
                                                          only happens
                                                          under certain
                                                          conditions
                                                          that seem (but
                                                          may not be)
                                                          related to the
                                                          stack's state
                                                          on calling the
                                                          function.<br>
                                                          <br>
                                                          Our program
                                                          acts as a
                                                          front-end,
                                                          using the LLVM
                                                          C++ API to
                                                          generate a JIT
                                                          generated
                                                          function. This
                                                          function is
                                                          primarily
                                                          mathematical,
                                                          so we use the
                                                          Vector types
                                                          to take
                                                          advantage of
                                                          SIMD
                                                          instructions
                                                          (as well as a
                                                          few SSE2
                                                          intrinsics).<br>
                                                          <br>
                                                          This worked in
                                                          LLVM 2.8 but
                                                          started
                                                          failing in 3.2
                                                          and has
                                                          continued to
                                                          fail in 3.3.
                                                          It fails with
                                                          no
                                                          optimizations
                                                          applied to the
                                                          LLVM
                                                          Function/Module.
                                                          It crashes
                                                          with what is
                                                          reported as a
                                                          memory access
                                                          error
                                                          (accessing
                                                          0xffffffff),
                                                          however it's
                                                          suggested that
                                                          this is how
                                                          the SSE fault
                                                          raising
                                                          mechanism
                                                          appears.<br>
                                                          <br>
                                                          The generated
                                                          instruction
                                                          varies, but it
                                                          seems to often
                                                          be similar to
                                                          (I don't have
                                                          it in front of
                                                          me, sorry):<br>
                                                          movapd xmm0,
                                                          xmm[ecx+0x???????]<br>
                                                          Where the xmm
                                                          register
                                                          changes, and
                                                          the second
                                                          parameter is a
                                                          memory access.<br>
                                                          ECX is always
                                                          set to
                                                          0x7fffffff -
                                                          however I
                                                          don't know if
                                                          this is part
                                                          of the SSE
                                                          error
                                                          reporting
                                                          process or is
                                                          part of the
                                                          situation
                                                          causing the
                                                          error.<br>
                                                          <br>
                                                          I haven't
                                                          worked out
                                                          exactly what
                                                          code path etc
                                                          is causing
                                                          this crash.
                                                          I'm hoping
                                                          that someone
                                                          can tell me if
                                                          there were any
                                                          changed
                                                          requirements
                                                          for working
                                                          with SIMD in
                                                          LLVM 3.2 (or
                                                          earlier, we
                                                          haven't tried
                                                          3.0 or 3.1). I
                                                          currently
                                                          suspect the
                                                          use of
                                                          GlobalVariable
                                                          (we first
                                                          discovered the
                                                          crash when
                                                          using a
                                                          feature that
                                                          uses them),
                                                          however I have
                                                          attempted
                                                          using
                                                          setAlignment
                                                          on the
                                                          GlobalVariables
                                                          without any
                                                          change.<br>
                                                          <br>
                                                          --<br>
                                                          Peter N<br>
_______________________________________________<br>
                                                          LLVM
                                                          Developers
                                                          mailing list<br>
                                                          <a href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a>
                                                                  <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>
                                                          <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>
                                                          </blockquote>
                                                          </blockquote>
                                                          <br>
                                                          </div>
                                                        </div>
                                                      </blockquote>
                                                    </div>
                                                    <br>
                                                    <br clear="all">
                                                    <div><br>
                                                    </div>
                                                    -- <br>
                                                    ~Craig </div>
                                                </blockquote>
                                                <br>
                                              </blockquote>
                                              <br>
                                            </div>
                                          </div>
                                        </div>
                                      </blockquote>
                                    </div>
                                    <br>
                                    <br clear="all">
                                    <div><br>
                                    </div>
                                    -- <br>
                                    ~Craig </div>
                                </blockquote>
                                <br>
                              </div>
                            </div>
                          </div>
                        </blockquote>
                      </div>
                      <br>
                      <br clear="all">
                      <div><br>
                      </div>
                      -- <br>
                      ~Craig </div>
                  </blockquote>
                  <br>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
        <br clear="all">
        <div><br>
        </div>
        -- <br>
        ~Craig
      </div>
    </blockquote>
    <br>
  </div></div></div>

</blockquote></div><br><br clear="all"><div><br></div>-- <br>~Craig
</div>