Hoist malloc/free out of loops. #1500

Open
Leporacanthicus opened this issue Feb 24, 2022 · 2 comments
Labels: codegen (conversion of FIR/MLIR to LLVM IR), FIR (FIR language related), Optimizer (optimization pass(es) related)

Comments

@Leporacanthicus
Collaborator

Avoiding malloc/free altogether would be even better, but that is not always doable.

The example below calls malloc and free (as the implementation of fir.allocmem and fir.freemem) on every iteration of the loop to implement WHERE.

This is another performance improvement found while investigating the performance of the SNAP application.

Here's an example:

program m
  implicit none
  REAL(8), ALLOCATABLE, DIMENSION(:,:,:,:) :: qim

  integer(4) :: nang, noct,nx, ny, ng
  integer(4) :: ll

  real(8) :: total

  real :: time_start
  real :: time_end

  nx = 40
  ny = 40
  noct = 6
  ng = 18


  print *, "Total size = ", nx * ny * noct * ng * 8
  ALLOCATE(qim(nx,ny,noct,ng))

  qim = 1.0

  total = 0.0

  call CPU_TIME(time_start)
  do ll = 1, 20000
     where (qim >= 1.0) qim = 1.0
  end do
  total = sum(qim)
  call CPU_TIME(time_end)
  
  print "(A, F6.3)", " Time taken (s): ", time_end - time_start
  print *, total * 8, " This should match total size"

end program m
@banach-space added the codegen, FIR, and Optimizer labels Feb 24, 2022
@Leporacanthicus
Collaborator Author

Hand-modified MLIR which, when compiled with tco + clang -O1, runs about 10% faster. That gain comes ONLY from moving the malloc/free out of the loop.

module attributes {fir.defaultkind = "a1c4d8i4l4r4", fir.kindmap = "", llvm.target_triple = "x86_64-unknown-linux-gnu"}  {
  func @_QQmain() {
    %0 = fir.alloca i32 {bindc_name = "istat", uniq_name = "_QFEistat"}
    %1 = fir.alloca i32 {bindc_name = "j", uniq_name = "_QFEj"}
    %2 = fir.alloca i32 {bindc_name = "k", uniq_name = "_QFEk"}
    %3 = fir.alloca i32 {bindc_name = "ll", uniq_name = "_QFEll"}
    %4 = fir.alloca i32 {bindc_name = "nang", uniq_name = "_QFEnang"}
    %5 = fir.alloca i32 {bindc_name = "ng", uniq_name = "_QFEng"}
    %6 = fir.alloca i32 {bindc_name = "noct", uniq_name = "_QFEnoct"}
    %7 = fir.alloca i32 {bindc_name = "nx", uniq_name = "_QFEnx"}
    %8 = fir.alloca i32 {bindc_name = "ny", uniq_name = "_QFEny"}
    %9 = fir.address_of(@_QFEqim) : !fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>>
    %10 = fir.alloca f32 {bindc_name = "time_end", uniq_name = "_QFEtime_end"}
    %11 = fir.alloca f32 {bindc_name = "time_start", uniq_name = "_QFEtime_start"}
    %12 = fir.alloca f64 {bindc_name = "total", uniq_name = "_QFEtotal"}
    %c40_i32 = arith.constant 40 : i32
    fir.store %c40_i32 to %7 : !fir.ref<i32>
    %c40_i32_0 = arith.constant 40 : i32
    fir.store %c40_i32_0 to %8 : !fir.ref<i32>
    %c6_i32 = arith.constant 6 : i32
    fir.store %c6_i32 to %6 : !fir.ref<i32>
    %c18_i32 = arith.constant 18 : i32
    fir.store %c18_i32 to %5 : !fir.ref<i32>
    %c-1_i32 = arith.constant -1 : i32
    %13 = fir.address_of(@_QQcl.2E2F6D616C6C6F632E66393000) : !fir.ref<!fir.char<1,13>>
    %14 = fir.convert %13 : (!fir.ref<!fir.char<1,13>>) -> !fir.ref<i8>
    %c22_i32 = arith.constant 22 : i32
    %15 = fir.call @_FortranAioBeginExternalListOutput(%c-1_i32, %14, %c22_i32) : (i32, !fir.ref<i8>, i32) -> !fir.ref<i8>
    %16 = fir.address_of(@_QQcl.546F74616C2073697A65203D20) : !fir.ref<!fir.char<1,13>>
    %c13 = arith.constant 13 : index
    %17 = fir.convert %16 : (!fir.ref<!fir.char<1,13>>) -> !fir.ref<i8>
    %18 = fir.convert %c13 : (index) -> i64
    %19 = fir.call @_FortranAioOutputAscii(%15, %17, %18) : (!fir.ref<i8>, !fir.ref<i8>, i64) -> i1
    %c8_i32 = arith.constant 8 : i32
    %20 = fir.load %7 : !fir.ref<i32>
    %21 = fir.load %8 : !fir.ref<i32>
    %22 = arith.muli %20, %21 : i32
    %23 = fir.load %6 : !fir.ref<i32>
    %24 = arith.muli %22, %23 : i32
    %25 = fir.load %5 : !fir.ref<i32>
    %26 = arith.muli %24, %25 : i32
    %27 = arith.muli %c8_i32, %26 : i32
    %28 = fir.call @_FortranAioOutputInteger32(%15, %27) : (!fir.ref<i8>, i32) -> i1
    %29 = fir.call @_FortranAioEndIoStatement(%15) : (!fir.ref<i8>) -> i32
    %true = arith.constant true
    %30 = fir.absent !fir.box<none>
    %31 = fir.address_of(@_QQcl.2E2F6D616C6C6F632E66393000) : !fir.ref<!fir.char<1,13>>
    %c23_i32 = arith.constant 23 : i32
    %c1 = arith.constant 1 : index
    %32 = fir.load %7 : !fir.ref<i32>
    %c0_i32 = arith.constant 0 : i32
    %33 = fir.convert %9 : (!fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>>) -> !fir.ref<!fir.box<none>>
    %34 = fir.convert %c1 : (index) -> i64
    %35 = fir.convert %32 : (i32) -> i64
    %36 = fir.call @_FortranAAllocatableSetBounds(%33, %c0_i32, %34, %35) : (!fir.ref<!fir.box<none>>, i32, i64, i64) -> none
    %c1_1 = arith.constant 1 : index
    %37 = fir.load %8 : !fir.ref<i32>
    %c1_i32 = arith.constant 1 : i32
    %38 = fir.convert %9 : (!fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>>) -> !fir.ref<!fir.box<none>>
    %39 = fir.convert %c1_1 : (index) -> i64
    %40 = fir.convert %37 : (i32) -> i64
    %41 = fir.call @_FortranAAllocatableSetBounds(%38, %c1_i32, %39, %40) : (!fir.ref<!fir.box<none>>, i32, i64, i64) -> none
    %c1_2 = arith.constant 1 : index
    %42 = fir.load %6 : !fir.ref<i32>
    %c2_i32 = arith.constant 2 : i32
    %43 = fir.convert %9 : (!fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>>) -> !fir.ref<!fir.box<none>>
    %44 = fir.convert %c1_2 : (index) -> i64
    %45 = fir.convert %42 : (i32) -> i64
    %46 = fir.call @_FortranAAllocatableSetBounds(%43, %c2_i32, %44, %45) : (!fir.ref<!fir.box<none>>, i32, i64, i64) -> none
    %c1_3 = arith.constant 1 : index
    %47 = fir.load %5 : !fir.ref<i32>
    %c3_i32 = arith.constant 3 : i32
    %48 = fir.convert %9 : (!fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>>) -> !fir.ref<!fir.box<none>>
    %49 = fir.convert %c1_3 : (index) -> i64
    %50 = fir.convert %47 : (i32) -> i64
    %51 = fir.call @_FortranAAllocatableSetBounds(%48, %c3_i32, %49, %50) : (!fir.ref<!fir.box<none>>, i32, i64, i64) -> none
    %52 = fir.convert %9 : (!fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>>) -> !fir.ref<!fir.box<none>>
    %53 = fir.convert %31 : (!fir.ref<!fir.char<1,13>>) -> !fir.ref<i8>
    %54 = fir.call @_FortranAAllocatableAllocate(%52, %true, %30, %53, %c23_i32) : (!fir.ref<!fir.box<none>>, i1, !fir.box<none>, !fir.ref<i8>, i32) -> i32
    fir.store %54 to %0 : !fir.ref<i32>
    %cst = arith.constant 1.000000e+00 : f32
    %55 = fir.convert %cst : (f32) -> f64
    %56 = fir.load %9 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>>
    %57 = fir.box_addr %56 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>) -> !fir.heap<!fir.array<?x?x?x?xf64>>
    %58 = fir.convert %57 : (!fir.heap<!fir.array<?x?x?x?xf64>>) -> i64
    %c0_i64 = arith.constant 0 : i64
    %59 = arith.cmpi ne, %58, %c0_i64 : i64
    %60:2 = fir.if %59 -> (i1, !fir.heap<!fir.array<?x?x?x?xf64>>) {
      %false = arith.constant false
      %c0_18 = arith.constant 0 : index
      %122:3 = fir.box_dims %56, %c0_18 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
      %c1_19 = arith.constant 1 : index
      %123:3 = fir.box_dims %56, %c1_19 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
      %c2_20 = arith.constant 2 : index
      %124:3 = fir.box_dims %56, %c2_20 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
      %c3_21 = arith.constant 3 : index
      %125:3 = fir.box_dims %56, %c3_21 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
      %126 = fir.if %false -> (!fir.heap<!fir.array<?x?x?x?xf64>>) {
        %127 = fir.allocmem !fir.array<?x?x?x?xf64>, %122#1, %123#1, %124#1, %125#1 {uniq_name = ".auto.alloc"}
        fir.result %127 : !fir.heap<!fir.array<?x?x?x?xf64>>
      } else {
        fir.result %57 : !fir.heap<!fir.array<?x?x?x?xf64>>
      }
      fir.result %false, %126 : i1, !fir.heap<!fir.array<?x?x?x?xf64>>
    } else {
      %true_18 = arith.constant true
      %122 = fir.address_of(@_QQcl.16373402fa3425c6e46827601abb88fa) : !fir.ref<!fir.char<1,76>>
      %c25_i32 = arith.constant 25 : i32
      %123 = fir.address_of(@_QQcl.2E2F6D616C6C6F632E66393000) : !fir.ref<!fir.char<1,13>>
      %124 = fir.convert %122 : (!fir.ref<!fir.char<1,76>>) -> !fir.ref<i8>
      %125 = fir.convert %123 : (!fir.ref<!fir.char<1,13>>) -> !fir.ref<i8>
      %126 = fir.call @_FortranAReportFatalUserError(%124, %125, %c25_i32) : (!fir.ref<i8>, !fir.ref<i8>, i32) -> none
      fir.result %true_18, %57 : i1, !fir.heap<!fir.array<?x?x?x?xf64>>
    }
    %c0 = arith.constant 0 : index
    %61:3 = fir.box_dims %56, %c0 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
    %c1_4 = arith.constant 1 : index
    %62:3 = fir.box_dims %56, %c1_4 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
    %c2 = arith.constant 2 : index
    %63:3 = fir.box_dims %56, %c2 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
    %c3 = arith.constant 3 : index
    %64:3 = fir.box_dims %56, %c3 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
    %65 = fir.shape %61#1, %62#1, %63#1, %64#1 : (index, index, index, index) -> !fir.shape<4>
    %66 = fir.array_load %60#1(%65) : (!fir.heap<!fir.array<?x?x?x?xf64>>, !fir.shape<4>) -> !fir.array<?x?x?x?xf64>
    %c1_5 = arith.constant 1 : index
    %c0_6 = arith.constant 0 : index
    %67 = arith.subi %61#1, %c1_5 : index
    %68 = arith.subi %62#1, %c1_5 : index
    %69 = arith.subi %63#1, %c1_5 : index
    %70 = arith.subi %64#1, %c1_5 : index
    %71 = fir.do_loop %arg0 = %c0_6 to %70 step %c1_5 unordered iter_args(%arg1 = %66) -> (!fir.array<?x?x?x?xf64>) {
      %122 = fir.do_loop %arg2 = %c0_6 to %69 step %c1_5 unordered iter_args(%arg3 = %arg1) -> (!fir.array<?x?x?x?xf64>) {
        %123 = fir.do_loop %arg4 = %c0_6 to %68 step %c1_5 unordered iter_args(%arg5 = %arg3) -> (!fir.array<?x?x?x?xf64>) {
          %124 = fir.do_loop %arg6 = %c0_6 to %67 step %c1_5 unordered iter_args(%arg7 = %arg5) -> (!fir.array<?x?x?x?xf64>) {
            %125 = fir.array_update %arg7, %55, %arg6, %arg4, %arg2, %arg0 : (!fir.array<?x?x?x?xf64>, f64, index, index, index, index) -> !fir.array<?x?x?x?xf64>
            fir.result %125 : !fir.array<?x?x?x?xf64>
          }
          fir.result %124 : !fir.array<?x?x?x?xf64>
        }
        fir.result %123 : !fir.array<?x?x?x?xf64>
      }
      fir.result %122 : !fir.array<?x?x?x?xf64>
    }
    fir.array_merge_store %66, %71 to %60#1 : !fir.array<?x?x?x?xf64>, !fir.array<?x?x?x?xf64>, !fir.heap<!fir.array<?x?x?x?xf64>>
    fir.if %60#0 {
      %122 = fir.load %9 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>>
      %c0_18 = arith.constant 0 : index
      %123:3 = fir.box_dims %122, %c0_18 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
      %c1_19 = arith.constant 1 : index
      %124:3 = fir.box_dims %122, %c1_19 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
      %c2_20 = arith.constant 2 : index
      %125:3 = fir.box_dims %122, %c2_20 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
      %c3_21 = arith.constant 3 : index
      %126:3 = fir.box_dims %122, %c3_21 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
      fir.if %59 {
        fir.freemem %57 : !fir.heap<!fir.array<?x?x?x?xf64>>
      }
      %127 = fir.shape_shift %123#0, %61#1, %124#0, %62#1, %125#0, %63#1, %126#0, %64#1 : (index, index, index, index, index, index, index, index) -> !fir.shapeshift<4>
      %128 = fir.embox %60#1(%127) : (!fir.heap<!fir.array<?x?x?x?xf64>>, !fir.shapeshift<4>) -> !fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>
      fir.store %128 to %9 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>>
    }
    %cst_7 = arith.constant 0.000000e+00 : f32
    %72 = fir.convert %cst_7 : (f32) -> f64
    fir.store %72 to %12 : !fir.ref<f64>
    %73 = fir.call @_FortranACpuTime() : () -> f64
    %74 = fir.convert %73 : (f64) -> f32
    fir.store %74 to %11 : !fir.ref<f32>
    %c1_i32_8 = arith.constant 1 : i32
    %75 = fir.convert %c1_i32_8 : (i32) -> index
    %c20000_i32 = arith.constant 20000 : i32
    %76 = fir.convert %c20000_i32 : (i32) -> index
    %c1_9 = arith.constant 1 : index

      %123 = fir.load %9 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>>
      %c0_18 = arith.constant 0 : index
      %124:3 = fir.box_dims %123, %c0_18 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
      %c1_19 = arith.constant 1 : index
      %125:3 = fir.box_dims %123, %c1_19 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
      %c2_20 = arith.constant 2 : index
      %126:3 = fir.box_dims %123, %c2_20 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
      %c3_21 = arith.constant 3 : index
      %127:3 = fir.box_dims %123, %c3_21 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
      %131 = fir.allocmem !fir.array<?x?x?x?x!fir.logical<4>>, %124#1, %125#1, %126#1, %127#1 {uniq_name = ".array.expr"}

    %77 = fir.do_loop %arg0 = %75 to %76 step %c1_9 -> index {
      %122 = fir.convert %arg0 : (index) -> i32
      fir.store %122 to %3 : !fir.ref<i32>
      %128 = fir.box_addr %123 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>) -> !fir.heap<!fir.array<?x?x?x?xf64>>
      %129 = fir.shape_shift %124#0, %124#1, %125#0, %125#1, %126#0, %126#1, %127#0, %127#1 : (index, index, index, index, index, index, index, index) -> !fir.shapeshift<4>
      %130 = fir.array_load %128(%129) : (!fir.heap<!fir.array<?x?x?x?xf64>>, !fir.shapeshift<4>) -> !fir.array<?x?x?x?xf64>
      %cst_22 = arith.constant 1.000000e+00 : f64
      %132 = fir.shape %124#1, %125#1, %126#1, %127#1 : (index, index, index, index) -> !fir.shape<4>
      %133 = fir.array_load %131(%132) : (!fir.heap<!fir.array<?x?x?x?x!fir.logical<4>>>, !fir.shape<4>) -> !fir.array<?x?x?x?x!fir.logical<4>>
      %c1_23 = arith.constant 1 : index
      %c0_24 = arith.constant 0 : index
      %134 = arith.subi %124#1, %c1_23 : index
      %135 = arith.subi %125#1, %c1_23 : index
      %136 = arith.subi %126#1, %c1_23 : index
      %137 = arith.subi %127#1, %c1_23 : index
      %138 = fir.do_loop %arg1 = %c0_24 to %137 step %c1_23 unordered iter_args(%arg2 = %133) -> (!fir.array<?x?x?x?x!fir.logical<4>>) {
        %158 = fir.do_loop %arg3 = %c0_24 to %136 step %c1_23 unordered iter_args(%arg4 = %arg2) -> (!fir.array<?x?x?x?x!fir.logical<4>>) {
          %159 = fir.do_loop %arg5 = %c0_24 to %135 step %c1_23 unordered iter_args(%arg6 = %arg4) -> (!fir.array<?x?x?x?x!fir.logical<4>>) {
            %160 = fir.do_loop %arg7 = %c0_24 to %134 step %c1_23 unordered iter_args(%arg8 = %arg6) -> (!fir.array<?x?x?x?x!fir.logical<4>>) {
              %161 = fir.array_fetch %130, %arg7, %arg5, %arg3, %arg1 : (!fir.array<?x?x?x?xf64>, index, index, index, index) -> f64
              %162 = arith.cmpf oge, %161, %cst_22 : f64
              %163 = fir.convert %162 : (i1) -> !fir.logical<4>
              %164 = fir.array_update %arg8, %163, %arg7, %arg5, %arg3, %arg1 : (!fir.array<?x?x?x?x!fir.logical<4>>, !fir.logical<4>, index, index, index, index) -> !fir.array<?x?x?x?x!fir.logical<4>>
              fir.result %164 : !fir.array<?x?x?x?x!fir.logical<4>>
            }
            fir.result %160 : !fir.array<?x?x?x?x!fir.logical<4>>
          }
          fir.result %159 : !fir.array<?x?x?x?x!fir.logical<4>>
        }
        fir.result %158 : !fir.array<?x?x?x?x!fir.logical<4>>
      }
      fir.array_merge_store %133, %138 to %131 : !fir.array<?x?x?x?x!fir.logical<4>>, !fir.array<?x?x?x?x!fir.logical<4>>, !fir.heap<!fir.array<?x?x?x?x!fir.logical<4>>>
      %139 = fir.shape %124#1, %125#1, %126#1, %127#1 : (index, index, index, index) -> !fir.shape<4>
      %cst_25 = arith.constant 1.000000e+00 : f32
      %140 = fir.convert %cst_25 : (f32) -> f64
      %141 = fir.load %9 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>>
      %142 = fir.box_addr %141 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>) -> !fir.heap<!fir.array<?x?x?x?xf64>>
      %143 = fir.convert %142 : (!fir.heap<!fir.array<?x?x?x?xf64>>) -> i64
      %c0_i64_26 = arith.constant 0 : i64
      %144 = arith.cmpi ne, %143, %c0_i64_26 : i64
      %145:2 = fir.if %144 -> (i1, !fir.heap<!fir.array<?x?x?x?xf64>>) {
        %false = arith.constant false
        %c0_33 = arith.constant 0 : index
        %158:3 = fir.box_dims %141, %c0_33 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
        %c1_34 = arith.constant 1 : index
        %159:3 = fir.box_dims %141, %c1_34 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
        %c2_35 = arith.constant 2 : index
        %160:3 = fir.box_dims %141, %c2_35 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
        %c3_36 = arith.constant 3 : index
        %161:3 = fir.box_dims %141, %c3_36 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
        %162 = fir.if %false -> (!fir.heap<!fir.array<?x?x?x?xf64>>) {
          %163 = fir.allocmem !fir.array<?x?x?x?xf64>, %158#1, %159#1, %160#1, %161#1 {uniq_name = ".auto.alloc"}
          fir.result %163 : !fir.heap<!fir.array<?x?x?x?xf64>>
        } else {
          fir.result %142 : !fir.heap<!fir.array<?x?x?x?xf64>>
        }
        fir.result %false, %162 : i1, !fir.heap<!fir.array<?x?x?x?xf64>>
      } else {
        %true_33 = arith.constant true
        %158 = fir.address_of(@_QQcl.16373402fa3425c6e46827601abb88fa) : !fir.ref<!fir.char<1,76>>
        %c31_i32 = arith.constant 31 : i32
        %159 = fir.address_of(@_QQcl.2E2F6D616C6C6F632E66393000) : !fir.ref<!fir.char<1,13>>
        %160 = fir.convert %158 : (!fir.ref<!fir.char<1,76>>) -> !fir.ref<i8>
        %161 = fir.convert %159 : (!fir.ref<!fir.char<1,13>>) -> !fir.ref<i8>
        %162 = fir.call @_FortranAReportFatalUserError(%160, %161, %c31_i32) : (!fir.ref<i8>, !fir.ref<i8>, i32) -> none
        fir.result %true_33, %142 : i1, !fir.heap<!fir.array<?x?x?x?xf64>>
      }
      %c0_27 = arith.constant 0 : index
      %146:3 = fir.box_dims %141, %c0_27 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
      %c1_28 = arith.constant 1 : index
      %147:3 = fir.box_dims %141, %c1_28 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
      %c2_29 = arith.constant 2 : index
      %148:3 = fir.box_dims %141, %c2_29 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
      %c3_30 = arith.constant 3 : index
      %149:3 = fir.box_dims %141, %c3_30 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
      %150 = fir.shape %146#1, %147#1, %148#1, %149#1 : (index, index, index, index) -> !fir.shape<4>
      %151 = fir.array_load %145#1(%150) : (!fir.heap<!fir.array<?x?x?x?xf64>>, !fir.shape<4>) -> !fir.array<?x?x?x?xf64>
      %c1_31 = arith.constant 1 : index
      %c0_32 = arith.constant 0 : index
      %152 = arith.subi %146#1, %c1_31 : index
      %153 = arith.subi %147#1, %c1_31 : index
      %154 = arith.subi %148#1, %c1_31 : index
      %155 = arith.subi %149#1, %c1_31 : index
      %156 = fir.do_loop %arg1 = %c0_32 to %155 step %c1_31 unordered iter_args(%arg2 = %151) -> (!fir.array<?x?x?x?xf64>) {
        %158 = fir.do_loop %arg3 = %c0_32 to %154 step %c1_31 unordered iter_args(%arg4 = %arg2) -> (!fir.array<?x?x?x?xf64>) {
          %159 = fir.do_loop %arg5 = %c0_32 to %153 step %c1_31 unordered iter_args(%arg6 = %arg4) -> (!fir.array<?x?x?x?xf64>) {
            %160 = fir.do_loop %arg7 = %c0_32 to %152 step %c1_31 unordered iter_args(%arg8 = %arg6) -> (!fir.array<?x?x?x?xf64>) {
              %c1_33 = arith.constant 1 : index
              %161 = arith.addi %arg7, %c1_33 : index
              %162 = arith.addi %arg5, %c1_33 : index
              %163 = arith.addi %arg3, %c1_33 : index
              %164 = arith.addi %arg1, %c1_33 : index
              %165 = fir.array_coor %131(%139) %161, %162, %163, %164 : (!fir.heap<!fir.array<?x?x?x?x!fir.logical<4>>>, !fir.shape<4>, index, index, index, index) -> !fir.ref<!fir.logical<4>>
              %166 = fir.load %165 : !fir.ref<!fir.logical<4>>
              %167 = fir.convert %166 : (!fir.logical<4>) -> i1
              %168 = fir.if %167 -> (!fir.array<?x?x?x?xf64>) {
                %169 = fir.array_update %arg8, %140, %arg7, %arg5, %arg3, %arg1 : (!fir.array<?x?x?x?xf64>, f64, index, index, index, index) -> !fir.array<?x?x?x?xf64>
                fir.result %169 : !fir.array<?x?x?x?xf64>
              } else {
                fir.result %arg8 : !fir.array<?x?x?x?xf64>
              }
              fir.result %168 : !fir.array<?x?x?x?xf64>
            }
            fir.result %160 : !fir.array<?x?x?x?xf64>
          }
          fir.result %159 : !fir.array<?x?x?x?xf64>
        }
        fir.result %158 : !fir.array<?x?x?x?xf64>
      }
      fir.array_merge_store %151, %156 to %145#1 : !fir.array<?x?x?x?xf64>, !fir.array<?x?x?x?xf64>, !fir.heap<!fir.array<?x?x?x?xf64>>
      fir.if %145#0 {
        %158 = fir.load %9 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>>
        %c0_33 = arith.constant 0 : index
        %159:3 = fir.box_dims %158, %c0_33 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
        %c1_34 = arith.constant 1 : index
        %160:3 = fir.box_dims %158, %c1_34 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
        %c2_35 = arith.constant 2 : index
        %161:3 = fir.box_dims %158, %c2_35 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
        %c3_36 = arith.constant 3 : index
        %162:3 = fir.box_dims %158, %c3_36 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
        fir.if %144 {
          fir.freemem %142 : !fir.heap<!fir.array<?x?x?x?xf64>>
        }
        %163 = fir.shape_shift %159#0, %146#1, %160#0, %147#1, %161#0, %148#1, %162#0, %149#1 : (index, index, index, index, index, index, index, index) -> !fir.shapeshift<4>
        %164 = fir.embox %145#1(%163) : (!fir.heap<!fir.array<?x?x?x?xf64>>, !fir.shapeshift<4>) -> !fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>
        fir.store %164 to %9 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>>
      }
      %157 = arith.addi %arg0, %c1_9 : index
      fir.result %157 : index
    }

    fir.freemem %131 : !fir.heap<!fir.array<?x?x?x?x!fir.logical<4>>>

    %78 = fir.convert %77 : (index) -> i32
    fir.store %78 to %3 : !fir.ref<i32>
    %79 = fir.load %9 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>>
    %c0_10 = arith.constant 0 : index
    %80:3 = fir.box_dims %79, %c0_10 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
    %c1_11 = arith.constant 1 : index
    %81:3 = fir.box_dims %79, %c1_11 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
    %c2_12 = arith.constant 2 : index
    %82:3 = fir.box_dims %79, %c2_12 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
    %c3_13 = arith.constant 3 : index
    %83:3 = fir.box_dims %79, %c3_13 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
    %84 = fir.box_addr %79 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>) -> !fir.heap<!fir.array<?x?x?x?xf64>>
    %85 = fir.shape_shift %80#0, %80#1, %81#0, %81#1, %82#0, %82#1, %83#0, %83#1 : (index, index, index, index, index, index, index, index) -> !fir.shapeshift<4>
    %86 = fir.embox %84(%85) : (!fir.heap<!fir.array<?x?x?x?xf64>>, !fir.shapeshift<4>) -> !fir.box<!fir.array<?x?x?x?xf64>>
    %87 = fir.absent !fir.box<i1>
    %c0_14 = arith.constant 0 : index
    %88 = fir.address_of(@_QQcl.2E2F6D616C6C6F632E66393000) : !fir.ref<!fir.char<1,13>>
    %c33_i32 = arith.constant 33 : i32
    %89 = fir.convert %86 : (!fir.box<!fir.array<?x?x?x?xf64>>) -> !fir.box<none>
    %90 = fir.convert %88 : (!fir.ref<!fir.char<1,13>>) -> !fir.ref<i8>
    %91 = fir.convert %c0_14 : (index) -> i32
    %92 = fir.convert %87 : (!fir.box<i1>) -> !fir.box<none>
    %93 = fir.call @_FortranASumReal8(%89, %90, %c33_i32, %91, %92) : (!fir.box<none>, !fir.ref<i8>, i32, i32, !fir.box<none>) -> f64
    fir.store %93 to %12 : !fir.ref<f64>
    %94 = fir.call @_FortranACpuTime() : () -> f64
    %95 = fir.convert %94 : (f64) -> f32
    fir.store %95 to %10 : !fir.ref<f32>
    %96 = fir.address_of(@_QQcl.28412C2046362E3329) : !fir.ref<!fir.char<1,9>>
    %c9 = arith.constant 9 : index
    %97 = fir.convert %96 : (!fir.ref<!fir.char<1,9>>) -> !fir.ref<i8>
    %98 = fir.convert %c9 : (index) -> i64
    %c-1_i32_15 = arith.constant -1 : i32
    %99 = fir.address_of(@_QQcl.2E2F6D616C6C6F632E66393000) : !fir.ref<!fir.char<1,13>>
    %100 = fir.convert %99 : (!fir.ref<!fir.char<1,13>>) -> !fir.ref<i8>
    %c36_i32 = arith.constant 36 : i32
    %101 = fir.call @_FortranAioBeginExternalFormattedOutput(%97, %98, %c-1_i32_15, %100, %c36_i32) : (!fir.ref<i8>, i64, i32, !fir.ref<i8>, i32) -> !fir.ref<i8>
    %102 = fir.address_of(@_QQcl.2054696D652074616B656E202873293A20) : !fir.ref<!fir.char<1,17>>
    %c17 = arith.constant 17 : index
    %103 = fir.convert %102 : (!fir.ref<!fir.char<1,17>>) -> !fir.ref<i8>
    %104 = fir.convert %c17 : (index) -> i64
    %105 = fir.call @_FortranAioOutputAscii(%101, %103, %104) : (!fir.ref<i8>, !fir.ref<i8>, i64) -> i1
    %106 = fir.load %10 : !fir.ref<f32>
    %107 = fir.load %11 : !fir.ref<f32>
    %108 = arith.subf %106, %107 : f32
    %109 = fir.call @_FortranAioOutputReal32(%101, %108) : (!fir.ref<i8>, f32) -> i1
    %110 = fir.call @_FortranAioEndIoStatement(%101) : (!fir.ref<i8>) -> i32
    %c-1_i32_16 = arith.constant -1 : i32
    %111 = fir.address_of(@_QQcl.2E2F6D616C6C6F632E66393000) : !fir.ref<!fir.char<1,13>>
    %112 = fir.convert %111 : (!fir.ref<!fir.char<1,13>>) -> !fir.ref<i8>
    %c38_i32 = arith.constant 38 : i32
    %113 = fir.call @_FortranAioBeginExternalListOutput(%c-1_i32_16, %112, %c38_i32) : (i32, !fir.ref<i8>, i32) -> !fir.ref<i8>
    %114 = fir.load %12 : !fir.ref<f64>
    %cst_17 = arith.constant 8.000000e+00 : f64
    %115 = arith.mulf %114, %cst_17 : f64
    %116 = fir.call @_FortranAioOutputReal64(%113, %115) : (!fir.ref<i8>, f64) -> i1
    %117 = fir.address_of(@_QQcl.20546869732073686F756C64206D6174636820746F74616C2073697A65) : !fir.ref<!fir.char<1,29>>
    %c29 = arith.constant 29 : index
    %118 = fir.convert %117 : (!fir.ref<!fir.char<1,29>>) -> !fir.ref<i8>
    %119 = fir.convert %c29 : (index) -> i64
    %120 = fir.call @_FortranAioOutputAscii(%113, %118, %119) : (!fir.ref<i8>, !fir.ref<i8>, i64) -> i1
    %121 = fir.call @_FortranAioEndIoStatement(%113) : (!fir.ref<i8>) -> i32
    return
  }
  fir.global internal @_QFEqim : !fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>> {
    %0 = fir.zero_bits !fir.heap<!fir.array<?x?x?x?xf64>>
    %c0 = arith.constant 0 : index
    %1 = fir.shape %c0, %c0, %c0, %c0 : (index, index, index, index) -> !fir.shape<4>
    %2 = fir.embox %0(%1) : (!fir.heap<!fir.array<?x?x?x?xf64>>, !fir.shape<4>) -> !fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>
    fir.has_value %2 : !fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>
  }
  fir.global internal @_QFEqin : !fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>> {
    %0 = fir.zero_bits !fir.heap<!fir.array<?x?x?x?xf64>>
    %c0 = arith.constant 0 : index
    %1 = fir.shape %c0, %c0, %c0, %c0 : (index, index, index, index) -> !fir.shape<4>
    %2 = fir.embox %0(%1) : (!fir.heap<!fir.array<?x?x?x?xf64>>, !fir.shape<4>) -> !fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>
    fir.has_value %2 : !fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>
  }
  func private @_FortranAioBeginExternalListOutput(i32, !fir.ref<i8>, i32) -> !fir.ref<i8> attributes {fir.io, fir.runtime}
  fir.global linkonce @_QQcl.2E2F6D616C6C6F632E66393000 constant : !fir.char<1,13> {
    %0 = fir.string_lit "./malloc.f90\00"(13) : !fir.char<1,13>
    fir.has_value %0 : !fir.char<1,13>
  }
  func private @_FortranAioOutputAscii(!fir.ref<i8>, !fir.ref<i8>, i64) -> i1 attributes {fir.io, fir.runtime}
  fir.global linkonce @_QQcl.546F74616C2073697A65203D20 constant : !fir.char<1,13> {
    %0 = fir.string_lit "Total size = "(13) : !fir.char<1,13>
    fir.has_value %0 : !fir.char<1,13>
  }
  func private @_FortranAioOutputInteger32(!fir.ref<i8>, i32) -> i1 attributes {fir.io, fir.runtime}
  func private @_FortranAioEndIoStatement(!fir.ref<i8>) -> i32 attributes {fir.io, fir.runtime}
  func private @_FortranAAllocatableSetBounds(!fir.ref<!fir.box<none>>, i32, i64, i64) -> none attributes {fir.runtime}
  func private @_FortranAAllocatableAllocate(!fir.ref<!fir.box<none>>, i1, !fir.box<none>, !fir.ref<i8>, i32) -> i32 attributes {fir.runtime}
  func private @_FortranAReportFatalUserError(!fir.ref<i8>, !fir.ref<i8>, i32) -> none attributes {fir.runtime}
  fir.global linkonce @_QQcl.16373402fa3425c6e46827601abb88fa constant : !fir.char<1,76> {
    %0 = fir.string_lit "array left hand side must be allocated when the right hand side is a scalar\00"(76) : !fir.char<1,76>
    fir.has_value %0 : !fir.char<1,76>
  }
  func private @_FortranACpuTime() -> f64 attributes {fir.runtime}
  func private @_FortranASumReal8(!fir.box<none>, !fir.ref<i8>, i32, i32, !fir.box<none>) -> f64 attributes {fir.runtime}
  func private @_FortranAioBeginExternalFormattedOutput(!fir.ref<i8>, i64, i32, !fir.ref<i8>, i32) -> !fir.ref<i8> attributes {fir.io, fir.runtime}
  fir.global linkonce @_QQcl.28412C2046362E3329 constant : !fir.char<1,9> {
    %0 = fir.string_lit "(A, F6.3)"(9) : !fir.char<1,9>
    fir.has_value %0 : !fir.char<1,9>
  }
  fir.global linkonce @_QQcl.2054696D652074616B656E202873293A20 constant : !fir.char<1,17> {
    %0 = fir.string_lit " Time taken (s): "(17) : !fir.char<1,17>
    fir.has_value %0 : !fir.char<1,17>
  }
  func private @_FortranAioOutputReal32(!fir.ref<i8>, f32) -> i1 attributes {fir.io, fir.runtime}
  func private @_FortranAioOutputReal64(!fir.ref<i8>, f64) -> i1 attributes {fir.io, fir.runtime}
  fir.global linkonce @_QQcl.20546869732073686F756C64206D6174636820746F74616C2073697A65 constant : !fir.char<1,29> {
    %0 = fir.string_lit " This should match total size"(29) : !fir.char<1,29>
    fir.has_value %0 : !fir.char<1,29>
  }
}

Diff for the changes:

--- malloc.mlir	2022-02-24 11:31:21.781794342 +0000
+++ malloc-opt.mlir	2022-02-24 11:24:04.069330281 +0000
@@ -170,9 +170,7 @@
     %c20000_i32 = arith.constant 20000 : i32
     %76 = fir.convert %c20000_i32 : (i32) -> index
     %c1_9 = arith.constant 1 : index
-    %77 = fir.do_loop %arg0 = %75 to %76 step %c1_9 -> index {
-      %122 = fir.convert %arg0 : (index) -> i32
-      fir.store %122 to %3 : !fir.ref<i32>
+
       %123 = fir.load %9 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>>
       %c0_18 = arith.constant 0 : index
       %124:3 = fir.box_dims %123, %c0_18 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
@@ -182,11 +180,15 @@
       %126:3 = fir.box_dims %123, %c2_20 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
       %c3_21 = arith.constant 3 : index
       %127:3 = fir.box_dims %123, %c3_21 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index) -> (index, index, index)
+      %131 = fir.allocmem !fir.array<?x?x?x?x!fir.logical<4>>, %124#1, %125#1, %126#1, %127#1 {uniq_name = ".array.expr"}
+
+    %77 = fir.do_loop %arg0 = %75 to %76 step %c1_9 -> index {
+      %122 = fir.convert %arg0 : (index) -> i32
+      fir.store %122 to %3 : !fir.ref<i32>
       %128 = fir.box_addr %123 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>) -> !fir.heap<!fir.array<?x?x?x?xf64>>
       %129 = fir.shape_shift %124#0, %124#1, %125#0, %125#1, %126#0, %126#1, %127#0, %127#1 : (index, index, index, index, index, index, index, index) -> !fir.shapeshift<4>
       %130 = fir.array_load %128(%129) : (!fir.heap<!fir.array<?x?x?x?xf64>>, !fir.shapeshift<4>) -> !fir.array<?x?x?x?xf64>
       %cst_22 = arith.constant 1.000000e+00 : f64
-      %131 = fir.allocmem !fir.array<?x?x?x?x!fir.logical<4>>, %124#1, %125#1, %126#1, %127#1 {uniq_name = ".array.expr"}
       %132 = fir.shape %124#1, %125#1, %126#1, %127#1 : (index, index, index, index) -> !fir.shape<4>
       %133 = fir.array_load %131(%132) : (!fir.heap<!fir.array<?x?x?x?x!fir.logical<4>>>, !fir.shape<4>) -> !fir.array<?x?x?x?x!fir.logical<4>>
       %c1_23 = arith.constant 1 : index
@@ -307,10 +309,12 @@
         %164 = fir.embox %145#1(%163) : (!fir.heap<!fir.array<?x?x?x?xf64>>, !fir.shapeshift<4>) -> !fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>
         fir.store %164 to %9 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>>
       }
-      fir.freemem %131 : !fir.heap<!fir.array<?x?x?x?x!fir.logical<4>>>
       %157 = arith.addi %arg0, %c1_9 : index
       fir.result %157 : index
     }
+
+    fir.freemem %131 : !fir.heap<!fir.array<?x?x?x?x!fir.logical<4>>>
+
     %78 = fir.convert %77 : (index) -> i32
     fir.store %78 to %3 : !fir.ref<i32>
     %79 = fir.load %9 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>>
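
In source terms, the diff above has roughly the effect of the sketch below: the logical mask buffer is allocated once before the `ll` loop and released once after it, rather than once per iteration. This is only an illustration under stated assumptions. `mask` and `niter` are hypothetical names; `mask` stands in for the compiler-generated `.array.expr` temporary, and a compiler is free to lower `where (mask)` differently.

```fortran
program hoisted_mask
  implicit none
  integer, parameter :: niter = 20000            ! same trip count as the reproducer
  real(8), allocatable :: qim(:,:,:,:)
  logical, allocatable :: mask(:,:,:,:)          ! hypothetical stand-in for ".array.expr"
  integer :: ll

  allocate(qim(40,40,6,18))
  qim = 1.0d0

  ! Allocate the mask once, before the loop, instead of once per iteration.
  allocate(mask(40,40,6,18))
  do ll = 1, niter
     mask = qim >= 1.0d0          ! evaluate the WHERE mask into the reused buffer
     where (mask) qim = 1.0d0     ! masked assignment driven by the stored mask
  end do
  deallocate(mask)                ! single free, after the loop

  ! 40*40*6*18 elements of 1.0, times 8 bytes each.
  if (sum(qim) * 8 /= 1382400.0d0) error stop "unexpected total"
  print *, sum(qim) * 8, " This should match total size"
end program hoisted_mask
```

Because the shape of `qim` does not change inside the loop, the buffer size is loop-invariant, which is what makes the hoist legal in this case.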

@jeffhammond

I understand why hoisting memory management is good, but why does where (qim >= 1.0) qim = 1.0 need that at all? Is this statement not syntactic sugar for a loop over a logical expression? Why is the compiler trying to generate a temporary at all?
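
One general reason a temporary can be needed: in Fortran the WHERE mask is evaluated in full before any masked assignment takes place, so a mask that reads elements other than the one being assigned cannot always be fused into a single element-by-element loop. The small sketch below (array names are illustrative, not from the issue) shows the two forms giving different results. In the `qim` case the mask for each element reads only that same element, so no such dependence exists.

```fortran
program where_mask_order
  implicit none
  integer :: a(2), b(2), i

  ! WHERE: the whole mask a(2:1:-1) > 0 is evaluated before any assignment,
  ! so both elements see the original values and both are negated.
  a = [1, 1]
  where (a(2:1:-1) > 0) a = -a          ! a becomes [-1, -1]

  ! Naive fused loop: the test for i = 2 reads b(1), which iteration
  ! i = 1 has already overwritten, so b(2) is left unchanged.
  b = [1, 1]
  do i = 1, 2
     if (b(3 - i) > 0) b(i) = -b(i)     ! b becomes [-1, 1]
  end do

  if (any(a /= [-1, -1])) error stop "unexpected WHERE result"
  if (any(b /= [-1, 1])) error stop "unexpected loop result"
  print *, "where:", a
  print *, "loop: ", b
end program where_mask_order
```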

Why wouldn't the compiler try to transform

where (qim >= 1.0) qim = 1.0

to something like this

forall ( i1 = 1 : size(qim, 1), i2 = 1 : size(qim, 2), &
         i3 = 1 : size(qim, 3), i4 = 1 : size(qim, 4) )
    if (qim(i1,i2,i3,i4) >= 1.0) then
        qim(i1,i2,i3,i4) = 1.0
    end if
end forall

(or something similar)?
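
Standard Fortran does not allow an IF construct inside FORALL (its body is limited to assignments, WHERE, and nested FORALL), so a compilable form of this idea needs DO CONCURRENT or a mask in the FORALL header. A minimal sketch, assuming the same `qim` shape as the reproducer and an initial value of 2.0 chosen only so that the clamp is observable:

```fortran
program fused_where
  implicit none
  real(8), allocatable :: qim(:,:,:,:)
  integer :: i1, i2, i3, i4

  allocate(qim(40,40,6,18))
  qim = 2.0d0   ! start above the threshold so the clamp has a visible effect

  ! Test and assignment fused into one loop nest; no mask array is stored.
  do concurrent (i1 = 1:size(qim,1), i2 = 1:size(qim,2), &
                 i3 = 1:size(qim,3), i4 = 1:size(qim,4))
     if (qim(i1,i2,i3,i4) >= 1.0d0) qim(i1,i2,i3,i4) = 1.0d0
  end do

  if (any(qim /= 1.0d0)) error stop "clamp failed"
  print *, "all elements clamped to 1.0"
end program fused_where
```

Each iteration reads and writes only its own element, which is what makes the fused form equivalent to the WHERE statement here.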
