[Openmp-dev] Bug in clang/openmp: max-reduction fails if data type is "long double"

Tue Mar 15 06:48:04 PDT 2016

Here is the LLVM IR you asked for (attached below).

Regards,

Stefan

On 15.03.2016 at 13:38, Alexey Bataev wrote:
> Hi, could you please send the LLVM IR for your example?
> You can generate it like this: clang -c -S -emit-llvm -o openmpbug.ll openmpbug.cc
> and attach the openmpbug.ll file.
>
> Best regards,
> Alexey Bataev
> =============
> Software Engineer
> Intel Compiler Team
>
> On 15.03.2016 at 12:43, Cownie, James H via Openmp-dev wrote:
>> This looks like an inconsistency in our knowledge of LLVM/Clang.
>>
>> Historically LLVM did not support 16-byte floating-point types, so the OpenMP runtime does not have support for them when compiled with clang (because the necessary routines couldn't be compiled!).
>>
>> If your code really is using 16-byte floating-point numbers, then the checks in the runtime that exclude those routines from the build need to be changed so that the routines are compiled.
>>
>> It'd be worth checking, though, that the "long double" really is 16 bytes, so maybe you could just check sizeof(real_t).
>>
>> p.s. The initial value of maxval matters less than it may seem: the OpenMP standard says that the per-thread reduction values for max are initialized with the most negative value of the type, and the value you put in maxval is only combined with the per-thread results at the end.
>>
>> -- Jim
>>
>> James Cownie <james.h.cownie at intel.com>
>> SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)
>> Tel: +44 117 9071438
>>
>> -----Original Message-----
>> From: Openmp-dev [mailto:openmp-dev-bounces at lists.llvm.org] On Behalf Of Stefan Illy via Openmp-dev
>> Sent: Friday, March 11, 2016 7:36 AM
>> To: openmp-dev at lists.llvm.org
>> Subject: [Openmp-dev] Bug in clang/openmp: max-reduction fails if data type is "long double"
>>
>> Hello everybody,
>>
>> When I compile the simple test program attached below I get the
>> following error message:
>>
>> openmpbug.cc:(.text+0x2de): undefined reference to
>> `__sync_val_compare_and_swap_16'
>> openmpbug.cc:(.text+0x3a7): undefined reference to
>> `__sync_val_compare_and_swap_16'
>> clang-3.8: error: linker command failed with exit code 1 (use -v to see
>> invocation)
>>
>> This is caused by the OMP max-reduction statement. It also fails if I
>> switch from "max" to "min".
>> I use version 3.8 of clang on an Ubuntu 14.04 LTS system. It also fails
>> with version 3.9 (trunk, self-compiled).
>> If I switch the floating point data type (real_t) from "long double" to
>> "double" or "float", the code compiles and runs without problems.
>>
>> I hope this helps to make clang+openmp even better!
>>
>>
>> Here comes the simple test program:
>> ---------------------------------------SNIP-------------------------------------------
>> #include <algorithm>  // std::max
>> #include <iostream>
>>
>> using namespace std;
>>
>> //typedef float real_t;
>> //typedef double real_t;
>> typedef long double real_t;
>>
>> int
>> main()
>> {
>>      real_t maxval = -1.0e-10;
>> #pragma omp parallel for reduction(max: maxval)
>>      for (int i = 1; i <= 1000; i++) maxval = max(maxval, real_t(i));
>>      cout << maxval << endl;
>>      return 0;
>> }
>> ------------------------------------SNIP------------------------------------------------

-------------- next part --------------
; ModuleID = 'openmpbug.cc'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%"class.std::ios_base::Init" = type { i8 }
%"class.std::basic_ostream" = type { i32 (...)**, %"class.std::basic_ios" }
%"class.std::basic_ios" = type { %"class.std::ios_base", %"class.std::basic_ostream"*, i8, i8, %"class.std::basic_streambuf"*, %"class.std::ctype"*, %"class.std::num_put"*, %"class.std::num_get"* }
%"class.std::ios_base" = type { i32 (...)**, i64, i64, i32, i32, i32, %"struct.std::ios_base::_Callback_list"*, %"struct.std::ios_base::_Words", [8 x %"struct.std::ios_base::_Words"], i32, %"struct.std::ios_base::_Words"*, %"class.std::locale" }
%"struct.std::ios_base::_Callback_list" = type { %"struct.std::ios_base::_Callback_list"*, void (i32, %"class.std::ios_base"*, i32)*, i32, i32 }
%"struct.std::ios_base::_Words" = type { i8*, i64 }
%"class.std::locale" = type { %"class.std::locale::_Impl"* }
%"class.std::locale::_Impl" = type { i32, %"class.std::locale::facet"**, i64, %"class.std::locale::facet"**, i8** }
%"class.std::locale::facet" = type <{ i32 (...)**, i32, [4 x i8] }>
%"class.std::basic_streambuf" = type { i32 (...)**, i8*, i8*, i8*, i8*, i8*, i8*, %"class.std::locale" }
%"class.std::ctype" = type <{ %"class.std::locale::facet.base", [4 x i8], %struct.__locale_struct*, i8, [7 x i8], i32*, i32*, i16*, i8, [256 x i8], [256 x i8], i8, [6 x i8] }>
%"class.std::locale::facet.base" = type <{ i32 (...)**, i32 }>
%struct.__locale_struct = type { [13 x %struct.__locale_data*], i16*, i32*, i32*, [13 x i8*] }
%struct.__locale_data = type opaque
%"class.std::num_put" = type { %"class.std::locale::facet.base", [4 x i8] }
%"class.std::num_get" = type { %"class.std::locale::facet.base", [4 x i8] }

$_ZSt3maxIeERKT_S2_S2_ = comdat any

@_ZStL8__ioinit = internal global %"class.std::ios_base::Init" zeroinitializer, align 1
@__dso_handle = external global i8
@_ZSt4cout = external global %"class.std::basic_ostream", align 8
@llvm.global_ctors = appending global [1 x { i32, void ()*, i8* }] [{ i32, void ()*, i8* } { i32 65535, void ()* @_GLOBAL__sub_I_openmpbug.cc, i8* null }]

; Function Attrs: uwtable
define internal void @__cxx_global_var_init() #0 section ".text.startup" {
  call void @_ZNSt8ios_base4InitC1Ev(%"class.std::ios_base::Init"* @_ZStL8__ioinit)
  %1 = call i32 @__cxa_atexit(void (i8*)* bitcast (void (%"class.std::ios_base::Init"*)* @_ZNSt8ios_base4InitD1Ev to void (i8*)*), i8* getelementptr inbounds (%"class.std::ios_base::Init", %"class.std::ios_base::Init"* @_ZStL8__ioinit, i32 0, i32 0), i8* @__dso_handle) #2
  ret void
}

declare void @_ZNSt8ios_base4InitC1Ev(%"class.std::ios_base::Init"*) #1

declare void @_ZNSt8ios_base4InitD1Ev(%"class.std::ios_base::Init"*) #1

; Function Attrs: nounwind
declare i32 @__cxa_atexit(void (i8*)*, i8*, i8*) #2

; Function Attrs: norecurse uwtable
define i32 @main() #3 {
  %1 = alloca i32, align 4
  %maxval = alloca x86_fp80, align 16
  %i = alloca i32, align 4
  %2 = alloca x86_fp80, align 16
  store i32 0, i32* %1, align 4
  store x86_fp80 0xKBFDDDBE6FECEBDEDD800, x86_fp80* %maxval, align 16
  store i32 1, i32* %i, align 4
  br label %3

; <label>:3                                       ; preds = %11, %0
  %4 = load i32, i32* %i, align 4
  %5 = icmp sle i32 %4, 1000
  br i1 %5, label %6, label %14

; <label>:6                                       ; preds = %3
  %7 = load i32, i32* %i, align 4
  %8 = sitofp i32 %7 to x86_fp80
  store x86_fp80 %8, x86_fp80* %2, align 16
  %9 = call dereferenceable(16) x86_fp80* @_ZSt3maxIeERKT_S2_S2_(x86_fp80* dereferenceable(16) %maxval, x86_fp80* dereferenceable(16) %2)
  %10 = load x86_fp80, x86_fp80* %9, align 16
  store x86_fp80 %10, x86_fp80* %maxval, align 16
  br label %11

; <label>:11                                      ; preds = %6
  %12 = load i32, i32* %i, align 4
  %13 = add nsw i32 %12, 1
  store i32 %13, i32* %i, align 4
  br label %3

; <label>:14                                      ; preds = %3
  %15 = load x86_fp80, x86_fp80* %maxval, align 16
  %16 = call dereferenceable(272) %"class.std::basic_ostream"* @_ZNSolsEe(%"class.std::basic_ostream"* @_ZSt4cout, x86_fp80 %15)
  %17 = call dereferenceable(272) %"class.std::basic_ostream"* @_ZNSolsEPFRSoS_E(%"class.std::basic_ostream"* %16, %"class.std::basic_ostream"* (%"class.std::basic_ostream"*)* @_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_)
  ret i32 0
}

; Function Attrs: inlinehint nounwind uwtable
define linkonce_odr dereferenceable(16) x86_fp80* @_ZSt3maxIeERKT_S2_S2_(x86_fp80* dereferenceable(16) %__a, x86_fp80* dereferenceable(16) %__b) #4 comdat {
  %1 = alloca x86_fp80*, align 8
  %2 = alloca x86_fp80*, align 8
  %3 = alloca x86_fp80*, align 8
  store x86_fp80* %__a, x86_fp80** %2, align 8
  store x86_fp80* %__b, x86_fp80** %3, align 8
  %4 = load x86_fp80*, x86_fp80** %2, align 8
  %5 = load x86_fp80, x86_fp80* %4, align 16
  %6 = load x86_fp80*, x86_fp80** %3, align 8
  %7 = load x86_fp80, x86_fp80* %6, align 16
  %8 = fcmp olt x86_fp80 %5, %7
  br i1 %8, label %9, label %11

; <label>:9                                       ; preds = %0
  %10 = load x86_fp80*, x86_fp80** %3, align 8
  store x86_fp80* %10, x86_fp80** %1, align 8
  br label %13

; <label>:11                                      ; preds = %0
  %12 = load x86_fp80*, x86_fp80** %2, align 8
  store x86_fp80* %12, x86_fp80** %1, align 8
  br label %13

; <label>:13                                      ; preds = %11, %9
  %14 = load x86_fp80*, x86_fp80** %1, align 8
  ret x86_fp80* %14
}

declare dereferenceable(272) %"class.std::basic_ostream"* @_ZNSolsEe(%"class.std::basic_ostream"*, x86_fp80) #1

declare dereferenceable(272) %"class.std::basic_ostream"* @_ZNSolsEPFRSoS_E(%"class.std::basic_ostream"*, %"class.std::basic_ostream"* (%"class.std::basic_ostream"*)*) #1

declare dereferenceable(272) %"class.std::basic_ostream"* @_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_(%"class.std::basic_ostream"* dereferenceable(272)) #1

; Function Attrs: uwtable
define internal void @_GLOBAL__sub_I_openmpbug.cc() #0 section ".text.startup" {
  call void @__cxx_global_var_init()
  ret void
}

attributes #0 = { uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nounwind }
attributes #3 = { norecurse uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #4 = { inlinehint nounwind uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.ident = !{!0}

!0 = !{!"clang version 3.8.0 (tags/RELEASE_380/final)"}