Message ID | 4DA1047C-92D3-485A-9457-61655ED14681@lps.ens.fr |
---|---|
State | New |
Hi Dominique,
> which means that -fexternal-blas should disable the inlining.
It is not surprising that a highly tuned BLAS library is better than
a simple inlining for large matrices.
I did some tests by adjusting n; it seems the inline version is
faster for n<=22, which is not too bad.
Regarding your other test case: This tests matrix*vector
multiplication, which is not implemented yet :-)
Regards,
Thomas
The patch causes the following regressions:

FAIL: gfortran.dg/coarray/dummy_1.f90 -fcoarray=single -O2 -latomic (internal compiler error)
FAIL: gfortran.dg/coarray/dummy_1.f90 -fcoarray=single -O2 -latomic (test for excess errors)
FAIL: gfortran.dg/coarray/dummy_1.f90 -fcoarray=lib -O2 -lcaf_single -latomic (internal compiler error)
FAIL: gfortran.dg/coarray/dummy_1.f90 -fcoarray=lib -O2 -lcaf_single -latomic (test for excess errors)
FAIL: gfortran.dg/array_function_3.f90 -O (internal compiler error)
FAIL: gfortran.dg/array_function_3.f90 -O (test for excess errors)
FAIL: gfortran.dg/bind_c_vars.f90 -g -flto (test for excess errors)
FAIL: gfortran.dg/bound_2.f90 -O0 (internal compiler error)
FAIL: gfortran.dg/bound_2.f90 -O0 (test for excess errors)
FAIL: gfortran.dg/bound_2.f90 -O1 (internal compiler error)
FAIL: gfortran.dg/bound_2.f90 -O1 (test for excess errors)
FAIL: gfortran.dg/bound_2.f90 -O2 (internal compiler error)
FAIL: gfortran.dg/bound_2.f90 -O2 (test for excess errors)
FAIL: gfortran.dg/bound_2.f90 -O3 -fomit-frame-pointer (internal compiler error)
FAIL: gfortran.dg/bound_2.f90 -O3 -fomit-frame-pointer (test for excess errors)
FAIL: gfortran.dg/bound_2.f90 -O3 -fomit-frame-pointer -funroll-loops (internal compiler error)
FAIL: gfortran.dg/bound_2.f90 -O3 -fomit-frame-pointer -funroll-loops (test for excess errors)
FAIL: gfortran.dg/bound_2.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions (internal compiler error)
FAIL: gfortran.dg/bound_2.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions (test for excess errors)
FAIL: gfortran.dg/bound_2.f90 -O3 -g (internal compiler error)
FAIL: gfortran.dg/bound_2.f90 -O3 -g (test for excess errors)
FAIL: gfortran.dg/bound_2.f90 -Os (internal compiler error)
FAIL: gfortran.dg/bound_2.f90 -Os (test for excess errors)
FAIL: gfortran.dg/bound_2.f90 -g -flto (internal compiler error)
FAIL: gfortran.dg/bound_7.f90 -O0 (test for excess errors)
FAIL: gfortran.dg/bound_7.f90 -O1 (internal compiler error)
FAIL: gfortran.dg/bound_7.f90 -O1 (test for excess errors)
FAIL: gfortran.dg/bound_7.f90 -O2 (internal compiler error)
FAIL: gfortran.dg/bound_7.f90 -O2 (test for excess errors)
FAIL: gfortran.dg/bound_7.f90 -O3 -fomit-frame-pointer (internal compiler error)
FAIL: gfortran.dg/bound_7.f90 -O3 -fomit-frame-pointer (test for excess errors)
FAIL: gfortran.dg/bound_7.f90 -O3 -fomit-frame-pointer -funroll-loops (internal compiler error)
FAIL: gfortran.dg/bound_7.f90 -O3 -fomit-frame-pointer -funroll-loops (test for excess errors)
FAIL: gfortran.dg/bound_7.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions (internal compiler error)
FAIL: gfortran.dg/bound_7.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions (test for excess errors)
FAIL: gfortran.dg/bound_7.f90 -O3 -g (internal compiler error)
FAIL: gfortran.dg/bound_7.f90 -O3 -g (test for excess errors)
FAIL: gfortran.dg/bound_7.f90 -Os (internal compiler error)
FAIL: gfortran.dg/bound_7.f90 -Os (test for excess errors)
FAIL: gfortran.dg/bound_7.f90 -g -flto (internal compiler error)
FAIL: gfortran.dg/bound_7.f90 -g -flto (test for excess errors)
FAIL: gfortran.dg/bound_8.f90 -O0 (internal compiler error)
FAIL: gfortran.dg/bound_8.f90 -O0 (test for excess errors)
FAIL: gfortran.dg/bound_8.f90 -O1 (internal compiler error)
FAIL: gfortran.dg/bound_8.f90 -O1 (test for excess errors)
FAIL: gfortran.dg/bound_8.f90 -O2 (internal compiler error)
FAIL: gfortran.dg/bound_8.f90 -O2 (test for excess errors)
FAIL: gfortran.dg/bound_8.f90 -O3 -fomit-frame-pointer (internal compiler error)
FAIL: gfortran.dg/bound_8.f90 -O3 -fomit-frame-pointer (test for excess errors)
FAIL: gfortran.dg/bound_8.f90 -O3 -fomit-frame-pointer -funroll-loops (internal compiler error)
FAIL: gfortran.dg/bound_8.f90 -O3 -fomit-frame-pointer -funroll-loops (test for excess errors)
FAIL: gfortran.dg/bound_8.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions (internal compiler error)
FAIL: gfortran.dg/bound_8.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions (test for excess errors)
FAIL: gfortran.dg/bound_8.f90 -O3 -g (internal compiler error)
FAIL: gfortran.dg/bound_8.f90 -O3 -g (test for excess errors)
FAIL: gfortran.dg/bound_8.f90 -Os (internal compiler error)
FAIL: gfortran.dg/bound_8.f90 -Os (test for excess errors)
FAIL: gfortran.dg/bound_8.f90 -g -flto (internal compiler error)
FAIL: gfortran.dg/bound_8.f90 -g -flto (test for excess errors)

I think the line

if (!upper && as->type == AS_ASSUMED_SHAPE && dim)

should be something such as (untested)

if (!upper && dim && as && as->type == AS_ASSUMED_SHAPE)

Dominique

> On 5 Apr 2015, at 22:37, Thomas Koenig <tkoenig@netcologne.de> wrote:
>
> Hi Dominique,
>
>> which means that -fexternal-blas should disable the inlining.
>
> It is not surprising that a highly tuned BLAS library is better than
> a simple inlining for large matrices.
>
> I did some tests by adjusting n; it seems the inline version is
> faster for n<=22, which is not too bad.
>
> Regarding your other test case: This tests matrix*vector
> multiplication, which is not implemented yet :-)
>
> Regards,
>
> Thomas
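
The reordering proposed above relies on C's left-to-right, short-circuit evaluation of `&&`: once `as` is tested before `as->type`, a null `as` makes the condition false instead of being dereferenced. The following is only a minimal sketch of that idea, assuming the ICEs come from a null `as` (the message suggests this but marks the fix as untested). The `struct array_spec`, the two-value enum, and `guarded_check` are hypothetical stand-ins, not the real gfortran declarations.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the gfortran definitions. */
enum array_type { AS_EXPLICIT, AS_ASSUMED_SHAPE };

struct array_spec {
  enum array_type type;
};

/* Guarded form of the condition: `as` is checked for null before
   `as->type` is read.  Because && short-circuits, a null `as` simply
   yields false and the member access is never evaluated. */
static bool guarded_check(bool upper, const struct array_spec *as, bool dim)
{
  return !upper && dim && as && as->type == AS_ASSUMED_SHAPE;
}
```

With these stand-ins, `guarded_check(false, NULL, true)` returns false, whereas the original operand order would read `as->type` through the null pointer first.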
> On 6 Apr 2015, at 01:15, Dominique d'Humières <dominiq@lps.ens.fr> wrote:
>
> The patch causes the following regressions:
>
> FAIL: gfortran.dg/coarray/dummy_1.f90 -fcoarray=single -O2 -latomic (internal compiler error)
> …
> FAIL: gfortran.dg/bound_8.f90 -g -flto (test for excess errors)
>
> I think the line
>
> if (!upper && as->type == AS_ASSUMED_SHAPE && dim)
>
> should be something such as (untested)
>
> if (!upper && dim && as && as->type == AS_ASSUMED_SHAPE)

This fixes a first batch of ICEs. A second one is fixed by

if (kind && lower->expr_type == EXPR_CONSTANT)

While the first change is obvious, I am not sure about the second one. With these changes I am left with the following regressions:

FAIL: gfortran.dg/function_optimize_1.f90 -O scan-tree-dump-times original "matmul_r4" 1
FAIL: gfortran.dg/function_optimize_7.f90 -O scan-tree-dump-times original "matmul_r4" 1
FAIL: gfortran.dg/function_optimize_2.f90 -O scan-tree-dump-times original "matmul_r4" 1
FAIL: gfortran.dg/matmul_bounds_2.f90 -O1 output pattern test
FAIL: gfortran.dg/matmul_bounds_2.f90 -O2 output pattern test
FAIL: gfortran.dg/matmul_bounds_2.f90 -O3 -fomit-frame-pointer output pattern test
FAIL: gfortran.dg/matmul_bounds_2.f90 -O3 -fomit-frame-pointer -funroll-loops output pattern test
FAIL: gfortran.dg/matmul_bounds_2.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions output pattern test
FAIL: gfortran.dg/matmul_bounds_2.f90 -O3 -g output pattern test
FAIL: gfortran.dg/matmul_bounds_2.f90 -Os output pattern test
FAIL: gfortran.dg/matmul_bounds_3.f90 -O1 output pattern test
FAIL: gfortran.dg/matmul_bounds_3.f90 -O2 output pattern test
FAIL: gfortran.dg/matmul_bounds_3.f90 -O3 -fomit-frame-pointer output pattern test
FAIL: gfortran.dg/matmul_bounds_3.f90 -O3 -fomit-frame-pointer -funroll-loops output pattern test
FAIL: gfortran.dg/matmul_bounds_3.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions output pattern test
FAIL: gfortran.dg/matmul_bounds_3.f90 -O3 -g output pattern test
FAIL: gfortran.dg/matmul_bounds_3.f90 -Os output pattern test
FAIL: gfortran.dg/realloc_on_assign_11.f90 -O3 -fomit-frame-pointer execution test
FAIL: gfortran.dg/realloc_on_assign_11.f90 -O3 -fomit-frame-pointer -funroll-loops execution test
FAIL: gfortran.dg/realloc_on_assign_11.f90 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test
FAIL: gfortran.dg/realloc_on_assign_11.f90 -O3 -g execution test
FAIL: gfortran.dg/realloc_on_assign_11.f90 -Os execution test
FAIL: gfortran.dg/realloc_on_assign_7.f03 -O1 execution test
FAIL: gfortran.dg/realloc_on_assign_7.f03 -O2 execution test
FAIL: gfortran.dg/realloc_on_assign_7.f03 -O3 -fomit-frame-pointer execution test
FAIL: gfortran.dg/realloc_on_assign_7.f03 -O3 -fomit-frame-pointer -funroll-loops execution test
FAIL: gfortran.dg/realloc_on_assign_7.f03 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions execution test
FAIL: gfortran.dg/realloc_on_assign_7.f03 -O3 -g execution test
FAIL: gfortran.dg/realloc_on_assign_7.f03 -Os execution test

The gfortran.dg/matmul_bounds_* failures are due to a change in the run-time error (is there a good reason for that?), and the gfortran.dg/realloc_on_assign_* failures are due to segfaults at run time.

Dominique

>> On 5 Apr 2015, at 22:37, Thomas Koenig <tkoenig@netcologne.de> wrote:
>>
>> Hi Dominique,
>>
>>> which means that -fexternal-blas should disable the inlining.
>>
>> It is not surprising that a highly tuned BLAS library is better than
>> a simple inlining for large matrices.
>>
>> I did some tests by adjusting n; it seems the inline version is
>> faster for n<=22, which is not too bad.
>>
>> Regarding your other test case: This tests matrix*vector
>> multiplication, which is not implemented yet :-)
>>
>> Regards,
>>
>> Thomas
--- induct.f90 2005-10-11 22:53:32.000000000 +0200
+++ induct_vmc.f90 2015-04-05 19:06:30.000000000 +0200
@@ -1644,18 +1644,17 @@ contains
 coil_tmp_vector(1) = -sin(theta)
 coil_tmp_vector(2) = cos(theta)
 coil_tmp_vector(3) = 0.0_longreal
- coil_current_vec(1) = dot_product(rotate_coil(1,:),coil_tmp_vector(:))
- coil_current_vec(2) = dot_product(rotate_coil(2,:),coil_tmp_vector(:))
- coil_current_vec(3) = dot_product(rotate_coil(3,:),coil_tmp_vector(:))
+ coil_current_vec = matmul(rotate_coil,coil_tmp_vector)
 !
 do j = 1, 9
 c_vector(3) = 0.5 * h_coil * z1gauss(j)
 !
 ! rotate coil vector into the global coordinate system and translate it
 !
- rot_c_vector(1) = dot_product(rotate_coil(1,:),c_vector(:)) + dx
- rot_c_vector(2) = dot_product(rotate_coil(2,:),c_vector(:)) + dy
- rot_c_vector(3) = dot_product(rotate_coil(3,:),c_vector(:)) + dz
+ rot_c_vector = matmul(rotate_coil,c_vector)
+ rot_c_vector(1) = rot_c_vector(1) + dx
+ rot_c_vector(2) = rot_c_vector(2) + dy
+ rot_c_vector(3) = rot_c_vector(3) + dz
 !
 do k = 1, 9
 q_vector(1) = 0.5_longreal * a * (x2gauss(k) + 1.0_longreal)
@@ -1664,9 +1663,7 @@ contains
 !
 ! rotate quad vector into the global coordinate system
 !
- rot_q_vector(1) = dot_product(rotate_quad(1,:),q_vector(:))
- rot_q_vector(2) = dot_product(rotate_quad(2,:),q_vector(:))
- rot_q_vector(3) = dot_product(rotate_quad(3,:),q_vector(:))
+ rot_q_vector = matmul(rotate_quad,q_vector)
 !
 ! compute and add in quadrature term
 !
@@ -1756,18 +1753,17 @@ contains
 coil_tmp_vector(1) = -sin(theta)
 coil_tmp_vector(2) = cos(theta)
 coil_tmp_vector(3) = 0.0_longreal
- coil_current_vec(1) = dot_product(rotate_coil(1,:),coil_tmp_vector(:))
- coil_current_vec(2) = dot_product(rotate_coil(2,:),coil_tmp_vector(:))
- coil_current_vec(3) = dot_product(rotate_coil(3,:),coil_tmp_vector(:))
+ coil_current_vec = matmul(rotate_coil,coil_tmp_vector)
 !
 do j = 1, 9
 c_vector(3) = 0.5 * h_coil * z1gauss(j)
 !
 ! rotate coil vector into the global coordinate system and translate it
 !
- rot_c_vector(1) = dot_product(rotate_coil(1,:),c_vector(:)) + dx
- rot_c_vector(2) = dot_product(rotate_coil(2,:),c_vector(:)) + dy
- rot_c_vector(3) = dot_product(rotate_coil(3,:),c_vector(:)) + dz
+ rot_c_vector = matmul(rotate_coil,c_vector)
+ rot_c_vector(1) = rot_c_vector(1) + dx
+ rot_c_vector(2) = rot_c_vector(2) + dy
+ rot_c_vector(3) = rot_c_vector(3) + dz
 !
 do k = 1, 9
 q_vector(1) = 0.5_longreal * a * (x2gauss(k) + 1.0_longreal)
@@ -1776,9 +1772,7 @@ contains
 !
 ! rotate quad vector into the global coordinate system
 !
- rot_q_vector(1) = dot_product(rotate_quad(1,:),q_vector(:))
- rot_q_vector(2) = dot_product(rotate_quad(2,:),q_vector(:))
- rot_q_vector(3) = dot_product(rotate_quad(3,:),q_vector(:))
+ rot_q_vector = matmul(rotate_quad,q_vector)
 !
 ! compute and add in quadrature term
 !
@@ -2061,18 +2055,17 @@ contains
 !
 ! compute current vector for the coil in the global coordinate system
 !
- coil_current_vec(1) = dot_product(rotate_coil(1,:),coil_tmp_vector(:))
- coil_current_vec(2) = dot_product(rotate_coil(2,:),coil_tmp_vector(:))
- coil_current_vec(3) = dot_product(rotate_coil(3,:),coil_tmp_vector(:))
+ coil_current_vec = matmul(rotate_coil,coil_tmp_vector)
 !
 do j = 1, 9
 c_vector(3) = 0.5 * h_coil * z1gauss(j)
 !
 ! rotate coil vector into the global coordinate system and translate it
 !
- rot_c_vector(1) = dot_product(rotate_coil(1,:),c_vector(:)) + dx
- rot_c_vector(2) = dot_product(rotate_coil(2,:),c_vector(:)) + dy
- rot_c_vector(3) = dot_product(rotate_coil(3,:),c_vector(:)) + dz
+ rot_c_vector = matmul(rotate_coil,c_vector)
+ rot_c_vector(1) = rot_c_vector(1) + dx
+ rot_c_vector(2) = rot_c_vector(2) + dy
+ rot_c_vector(3) = rot_c_vector(3) + dz
 !
 do k = 1, 9
 q_vector(1) = 0.5_longreal * a * (x2gauss(k) + 1.0_longreal)
@@ -2081,9 +2074,7 @@ contains
 !
 ! rotate quad vector into the global coordinate system
 !
- rot_q_vector(1) = dot_product(rotate_quad(1,:),q_vector(:))
- rot_q_vector(2) = dot_product(rotate_quad(2,:),q_vector(:))
- rot_q_vector(3) = dot_product(rotate_quad(3,:),q_vector(:))
+ rot_q_vector = matmul(rotate_quad,q_vector)
 !
 ! compute and add in quadrature term
 !
@@ -2204,18 +2195,17 @@ contains
 !
 ! compute current vector for the coil in the global coordinate system
 !
- coil_current_vec(1) = dot_product(rotate_coil(1,:),coil_tmp_vector(:))
- coil_current_vec(2) = dot_product(rotate_coil(2,:),coil_tmp_vector(:))
- coil_current_vec(3) = dot_product(rotate_coil(3,:),coil_tmp_vector(:))
+ coil_current_vec = matmul(rotate_coil,coil_tmp_vector)
 !
 do j = 1, 9
 c_vector(3) = 0.5 * h_coil * z1gauss(j)
 !
 ! rotate coil vector into the global coordinate system and translate it
 !
- rot_c_vector(1) = dot_product(rotate_coil(1,:),c_vector(:)) + dx
- rot_c_vector(2) = dot_product(rotate_coil(2,:),c_vector(:)) + dy
- rot_c_vector(3) = dot_product(rotate_coil(3,:),c_vector(:)) + dz
+ rot_c_vector = matmul(rotate_coil,c_vector)
+ rot_c_vector(1) = rot_c_vector(1) + dx
+ rot_c_vector(2) = rot_c_vector(2) + dy
+ rot_c_vector(3) = rot_c_vector(3) + dz
 !
 do k = 1, 9
 q_vector(1) = 0.5_longreal * a * (x2gauss(k) + 1.0_longreal)
@@ -2224,9 +2214,7 @@ contains
 !
 ! rotate quad vector into the global coordinate system
 !
- rot_q_vector(1) = dot_product(rotate_quad(1,:),q_vector(:))
- rot_q_vector(2) = dot_product(rotate_quad(2,:),q_vector(:))
- rot_q_vector(3) = dot_product(rotate_quad(3,:),q_vector(:))
+ rot_q_vector = matmul(rotate_quad,q_vector)
 !
 ! compute and add in quadrature term
 !
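
The rewrite in this diff rests on the identity that a matrix*vector product is the vector of row-wise dot products, so three `dot_product` calls over the rows of a 3x3 matrix give the same result as one `matmul`. A rough sketch of that equivalence follows, written in C rather than Fortran purely for illustration. The names `row_dot`, `matvec`, `rotate_translate`, and the fixed size `N` are hypothetical and do not appear in induct.f90.

```c
#include <stddef.h>

#define N 3 /* the vectors in the diff have three components */

/* Row-wise form: one dot product of matrix row `row` with v,
   mirroring dot_product(m(row,:), v(:)) in the removed lines. */
static double row_dot(double m[N][N], size_t row, const double v[N])
{
  double sum = 0.0;
  for (size_t k = 0; k < N; k++)
    sum += m[row][k] * v[k];
  return sum;
}

/* Whole-array form: out = m * v, mirroring matmul(m, v) in the added lines. */
static void matvec(double m[N][N], const double v[N], double out[N])
{
  for (size_t i = 0; i < N; i++) {
    out[i] = 0.0;
    for (size_t k = 0; k < N; k++)
      out[i] += m[i][k] * v[k];
  }
}

/* Rotate, then translate component by component, mirroring the
   rot_c_vector = matmul(...) line followed by the + dx, + dy, + dz lines. */
static void rotate_translate(double m[N][N], const double v[N],
                             const double shift[N], double out[N])
{
  matvec(m, v, out);
  for (size_t i = 0; i < N; i++)
    out[i] += shift[i];
}
```

For any inputs, `out[i]` from `matvec` equals `row_dot(m, i, v)` because both accumulate the same terms in the same order; the translation then has to be applied as a separate step, which is why the new code adds three lines after each `matmul` for `rot_c_vector`.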