AArch64: significantly improve formatted input performance by using optimized libc functions on ARM64 #642

Open
wants to merge 1 commit into
base: master

Collaborator

pawosm-arm commented Dec 19, 2018

Our experiments show that using memset and memcpy instead of explicit while loops gives a dramatic speedup on formatted input, at least on AArch64 hosts.
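To illustrate the kind of replacement described above, here is a minimal sketch in C. It is not taken from the patch: the function names `fill_item_loop` and `fill_item_libc` and the scenario (copying part of an input record into a fixed-width CHARACTER buffer and blank-padding the rest) are assumptions made for this example only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the flang runtime): copy up to `width`
 * bytes of an input record into a fixed-width buffer and blank-pad the
 * remainder, one byte at a time with explicit while loops. */
void fill_item_loop(char *dst, size_t width, const char *src, size_t len)
{
    size_t n = len < width ? len : width;
    size_t i = 0;

    while (i < n) {          /* byte-by-byte copy */
        dst[i] = src[i];
        i++;
    }
    while (i < width) {      /* byte-by-byte blank padding */
        dst[i] = ' ';
        i++;
    }
}

/* Same result using libc: memcpy for the copy, memset for the padding.
 * dst and src must not overlap, as memcpy requires. */
void fill_item_libc(char *dst, size_t width, const char *src, size_t len)
{
    size_t n = len < width ? len : width;

    memcpy(dst, src, n);
    memset(dst + n, ' ', width - n);
}
```

Both functions leave the destination buffer in the same state; only the way the bytes are written differs, which is where an optimized libc can help.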

…ptimized libc functions on ARM64

Signed-off-by: Paul Osmialowski <[email protected]>
@sscalpone
Member

Do you have test cases that show the performance improvements?
Why do you think this change should be architecture dependent?

@pawosm-arm
Collaborator Author

pawosm-arm commented Dec 21, 2018

The test case I used is the following:

! *********************************************************
        program main

        implicit none
!       ---------------------------------------------------
        character(len=500) :: cart
        real(kind=8) :: t1,t2

!       ---------------------------------------------------
        open(unit=9,status='old',file='my_file.txt')
        open(unit=10,file='my_new_file.txt')
        call cpu_time(t1)
        do
!               read each line
                read(9,fmt='(A)') cart
!       ************************************************************
!       ************************************************************
!                       convert  process
!       ************************************************************
!       ************************************************************

                if(cart(1:4)=='/end') then
                        write(10,*) 'this is the end!'
                        exit
                else
                        write(10,*) cart
                endif
        enddo
        call cpu_time(t2)
        close(unit=9)
        close(unit=10)

        print*,' write and read :',t2-t1




!       ---------------------------------------------------

        end program main
! *********************************************************

my_file.txt.gz

Compiled with gfortran, it gives much better timing results than when compiled with flang. My patch dramatically improves the timing of the flang-compiled program.

@pawosm-arm
Collaborator Author

Regarding the architecture dependency: the string.h functions in glibc were carefully optimized for AArch64, and this can be observed in the results of the above test case. I can't guarantee the same for other architectures, nor can I guarantee that replacing a local loop with a function call will never cause a performance drop on any of them.
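For readers unfamiliar with how such a change can be limited to one architecture, below is a rough sketch in C of a compile-time guard. The macro name `FIO_USE_LIBC_FILL` and the function `blank_fill` are hypothetical and chosen for this example; only `__aarch64__`, the macro GCC and Clang predefine when targeting AArch64, is a real identifier. The actual patch may be structured differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical switch: use the libc routine only where it has been
 * measured to help; keep the original loop everywhere else. */
#if defined(__aarch64__)
#define FIO_USE_LIBC_FILL 1
#else
#define FIO_USE_LIBC_FILL 0
#endif

/* Fill n bytes starting at dst with blanks. */
void blank_fill(char *dst, size_t n)
{
#if FIO_USE_LIBC_FILL
    memset(dst, ' ', n);
#else
    while (n-- > 0)
        *dst++ = ' ';
#endif
}
```

With a guard like this, other architectures keep their existing behavior until someone measures whether the libc call is a win there too.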
