AArch64: significantly improve formatted input performance by using optimized libc functions on ARM64 #642

pawosm-arm · 2018-12-19T22:11:52Z

Our experiments showed that using memset and memcpy instead of explicit while loops gives a dramatic speedup on formatted input (at least on AArch64 hosts).
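To illustrate the kind of replacement described here, a minimal sketch in C follows. It is not the code from this patch: the function names `blank_fill_loop`, `blank_fill_libc`, `copy_loop`, and `copy_libc` are hypothetical, and they only contrast a byte-by-byte while loop with the equivalent memset or memcpy call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers for illustration; not taken from the flang runtime. */

/* Fill n bytes at dst with blanks, one byte per iteration. */
void blank_fill_loop(char *dst, size_t n)
{
    while (n-- > 0)
        *dst++ = ' ';
}

/* Same result through libc; the library is free to use wider stores. */
void blank_fill_libc(char *dst, size_t n)
{
    memset(dst, ' ', n);
}

/* Copy n bytes from src to dst, one byte per iteration. */
void copy_loop(char *dst, const char *src, size_t n)
{
    while (n-- > 0)
        *dst++ = *src++;
}

/* Same result through libc; src and dst must not overlap. */
void copy_libc(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
}
```

Both variants of each pair leave the destination buffer in the same state, so a swap of this kind changes only how the bytes are written, not what is written.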

…ptimized libc functions on ARM64 Signed-off-by: Paul Osmialowski <[email protected]>

sscalpone · 2018-12-21T01:41:55Z

Do you have test cases that show the performance improvements?
Why do you think this change should be architecture dependent?

pawosm-arm · 2018-12-21T15:35:44Z

The test case I used is the following:

! *********************************************************
        program main

        implicit none
!       ---------------------------------------------------
        character(len=500) :: cart
        real(kind=8) :: t1,t2

!       ---------------------------------------------------
        open(unit=9,status='old',file='my_file.txt')
        open(unit=10,file='my_new_file.txt')
        call cpu_time(t1)
        do
!               read each line
                read(9,fmt='(A)') cart
!       ************************************************************
!       ************************************************************
!                       convert  process
!       ************************************************************
!       ************************************************************

                if(cart(1:4)=='/end') then
                        write(10,*) 'this is the end!'
                        exit
                else
                        write(10,*) cart
                endif
        enddo
        call cpu_time(t2)
        close(unit=9)
        close(unit=10)

        print*,' write and read :',t2-t1




!       ---------------------------------------------------

        end program main
! *********************************************************

my_file.txt.gz

Compiled with gfortran, it gives much better timing results than when compiled with flang. My patch dramatically improves the timing of the flang-compiled program.

pawosm-arm · 2018-12-21T15:40:14Z

Regarding architecture dependency: the string.h functions in glibc were carefully optimized for AArch64, and this can be observed in the results of the above test case. I can't guarantee the same for other architectures, nor can I guarantee that, on every architecture, replacing a local loop with a function call will never cause a performance drop.
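The architecture-specific choice described above could be expressed with a preprocessor guard. The sketch below is an assumption about the general shape, not the code in this patch: `pad_with_blanks` is a hypothetical name, and `__aarch64__` is the macro that GCC and Clang predefine when targeting AArch64.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill n bytes at dst with blanks. */
void pad_with_blanks(char *dst, size_t n)
{
#if defined(__aarch64__)
    /* AArch64 target: rely on the libc routine. */
    memset(dst, ' ', n);
#else
    /* Other targets: keep the explicit loop. */
    while (n-- > 0)
        *dst++ = ' ';
#endif
}
```

With a guard like this, targets other than AArch64 keep the original loop, so their behavior and timing stay as they were before the change.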

AArch64: significantly improve formatted input performance by using o…

b14c65f

…ptimized libc functions on ARM64 Signed-off-by: Paul Osmialowski <[email protected]>

pawosm-arm force-pushed the formatted_input branch from ae1cee3 to b14c65f on December 20, 2018 10:49

pawosm-arm mentioned this pull request Jan 3, 2019

Introduce buffered I/O and replace getc with buffered read #647

Closed

pawosm-arm requested review from bryanpkc, shivaramaarao, xoviat and alokkrsharma July 12, 2022 19:28
