Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Change the assignment paradigm of the allocatable array in test-drive demos #22

Open
zoziha opened this issue Sep 2, 2022 · 4 comments

Comments

@zoziha
Copy link

zoziha commented Sep 2, 2022

Currently, test-drive uses this paradigm:

testsuites = [ &
    new_testsuite("suite1", collect_suite1), &
    new_testsuite("suite2", collect_suite2) &
    ]

In fpm test -V with default flags will result in the following warning:

test\check.f90:17:54:

   17 |     type(testsuite_type), allocatable :: testsuites(:)
      |                                                      ^
note: 'testsuites' declared here
test\check.f90:28:18:

   28 |                  ]
      |                  ^
Warning: 'testsuites.dim[0].lbound' may be used uninitialized [-Wmaybe-uninitialized]

Should we pursue zero warnings here and use the following allocation style to update the test-drive demos?

allocate (testsuites, source=[ &
          new_testsuite("suite1", collect_suite1), &
          new_testsuite("suite2", collect_suite2) &
          ])
@arjenmarkus
Copy link
Member

arjenmarkus commented Sep 2, 2022 via email

@awvwgk
Copy link
Member

awvwgk commented Sep 2, 2022

This is indeed a false positive warning. We use automatic LHS allocation here. An alternative would be to allocate the array with a dummy size of zero and than assign.

@zoziha
Copy link
Author

zoziha commented Sep 2, 2022

Actually, I think LHS is the most efficient, it only involves fewer operations; but I also feel that allocate(.., source=..) will be optimized by the compiler, I tried to write performance test code below, it seems that we are too worrying about the impact of a small amount of allocation on performance, I believe that the number of unit tests for ordinary users will not reach the order of N=10000000. Even if it is achieved, the efficiency difference of these schemes will be very close:

Bench Code with N=10000000
!> Time functions
module time_m

    implicit none
    private

    public :: tic, toc

    integer, save :: time_save !! save the time

contains

    !> Start timer
    impure subroutine tic()
        call system_clock(time_save)
    end subroutine tic

    !> Stop timer and return the time
    impure subroutine toc(t)
        class(*), optional :: t    !! time in seconds
        integer :: time_now, time_rate

        call system_clock(time_now, time_rate)

        associate (dt => real(time_now - time_save)/time_rate)

            if (present(t)) then
                select type (t)
                type is (real)
                    t = dt
                type is (double precision)
                    t = real(dt, 8)
                type is (integer)
                    t = nint(dt)
                type is (character(*))
                    write (*, "(2a,g0.3,a)") t, ', time elapsed: ', dt, " s"
                class default
                    write (*, '(a)') 'Error: unknown type of t in toc()'
                end select
            else
                write (*, "(a,g0.3,a)") 'Time elapsed: ', dt, " s"
            end if

        end associate

    end subroutine toc

end module time_m

program main

    use time_m, only: tic, toc
    implicit none

    type node_t
        real(8), allocatable :: item(:)
    end type node_t

    integer, parameter :: N = 10000000
    type(node_t) :: x(N), y(N), z(N)
    integer :: i

    call tic()
    do i = 1, N
        x(i)%item = [real(8) :: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    end do
    call toc("1. LHS")

    call tic()
    do i = 1, N
        allocate (z(i)%item, source=[real(8) :: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    end do
    call toc("2. allocate(z, source=..)")

    call tic()
    do i = 1, N
        allocate (y(i)%item(0))
        y(i)%item = [real(8) :: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    end do
    call toc("3. allocate(y(0)), LHS")

end program main
>> fpm run --profile release   # On Windows-MSYS2 gfortran, N=10000000 (on my laptop)
1. LHS, time elapsed: 1.73 s
2. allocate(z, source=..), time elapsed: 1.86 s
3. allocate(y(0)), LHS, time elapsed: 3.30 s

On godbolt.org with N=300000:

# ifort 2021.6.0
1. LHS, time elapsed: .593E-01 s
2. allocate(z, source=..), time elapsed: .636E-01 s
3. allocate(y(0)), LHS, time elapsed: .893E-01 s
# gfortran 12.2
1. LHS, time elapsed: 0.410E-1 s
2. allocate(z, source=..), time elapsed: 0.420E-1 s
3. allocate(y(0)), LHS, time elapsed: 0.570E-1 s

From the bench results, we can see that the performances of 1 and 2 are close, but the efficiency ranking is 1 > 2 > 3.

Using allocate(.., source=..) does make the code look a little unintuitive (complicated), but in fact allocate(.., source=..) is the way we may use most.

A Different Bench Code with N=10000: 1 > 3 > 2
!> Time functions
module time_m

    implicit none
    private

    public :: tic, toc

    integer, save :: time_save !! save the time

contains

    !> Start timer
    impure subroutine tic()
        call system_clock(time_save)
    end subroutine tic

    !> Stop timer and return the time
    impure subroutine toc(t)
        class(*), optional :: t    !! time in seconds
        integer :: time_now, time_rate

        call system_clock(time_now, time_rate)

        associate (dt => real(time_now - time_save)/time_rate)

            if (present(t)) then
                select type (t)
                type is (real)
                    t = dt
                type is (double precision)
                    t = real(dt, 8)
                type is (integer)
                    t = nint(dt)
                type is (character(*))
                    write (*, "(2a,g0.3,a)") t, ', time elapsed: ', dt, " s"
                class default
                    write (*, '(a)') 'Error: unknown type of t in toc()'
                end select
            else
                write (*, "(a,g0.3,a)") 'Time elapsed: ', dt, " s"
            end if

        end associate

    end subroutine toc

end module time_m

program main

    use time_m, only: tic, toc
    implicit none

    type node_t
        real(8), allocatable :: item(:)
    end type node_t

    integer, parameter :: N = 20000
    type(node_t) :: x(N), y(N), z(N)
    integer :: i
    real(8), allocatable :: items(:)

    allocate (items, source=[real(8) :: (i, i=1, N)])

    call tic()
    do i = 1, N
        x(i)%item = items
    end do
    call toc("1. LHS")

    call tic()
    do i = 1, N
        allocate (z(i)%item, source=items)
    end do
    call toc("2. allocate(z, source=..)")

    call tic()
    do i = 1, N
        allocate (y(i)%item(0))
        y(i)%item = items
    end do
    call toc("3. allocate(y(0)), LHS")

end program main
>> fpm run --profile release   # On Windows-MSYS2 gfortran, N=10000 (on my laptop)
1. LHS, time elapsed: 0.672 s
2. allocate(z, source=..), time elapsed: 0.828 s
3. allocate(y(0)), LHS, time elapsed: 0.688 s

@awvwgk
Copy link
Member

awvwgk commented Sep 2, 2022

The collect interface is only invoked once, the actual performance critical part is running the body of the tests which are referenced via a procedure pointer. I wouldn't worry too much about the performance difference between those solutions.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants