
Benchmark regressions of 1.3 vs 1.2 #32985

Closed · KristofferC opened this issue Aug 20, 2019 · 4 comments
Labels: regression (Regression in behavior compared to a previous version)
Milestone: 1.3

Opening this issue to not pollute #32973. The benchmark run in that PR showed some regressions vs 1.2 (https://github.com/JuliaCI/BaseBenchmarkReports/blob/24d2f8dc38422bf7c21691c2fdfa243d894ec38f/c181990_vs_9392955/report.md).

I'll go through the results and post the ones I believe are worth looking into further.

KristofferC added the regression label and the 1.3 milestone on Aug 20, 2019
KristofferC commented Aug 20, 2019

Some things to look through (the columns below are the time ratio and memory ratio vs 1.2, with each benchmark's noise tolerance in parentheses):

Array indexing

["array", "index", "(\"sumvector_view\", \"Array{Float32,2}\")"]	1.72 (50%) ❌	1.00 (1%)

Looks like a fluke


Broadcasting memory allocations

["broadcast", "dotop", "(\"Float64\", (1000, 1000), 2)"]	1.01 (5%)	Inf (1%) 

Fixed by #33007.


IdDict stuff

["collection", "deletion", "(\"IdDict\", \"Int\", \"filter\")"]	1.61 (25%) ❌	1.08 (1%) ❌
["collection", "initialization", "(\"IdDict\", \"Int\", \"iterator\")"]	1.90 (25%) ❌	1.19 (1%) ❌
["collection", "initialization", "(\"IdDict\", \"Int\", \"loop\")"]	1.84 (25%) ❌	1.19 (1%) ❌
["collection", "initialization", "(\"IdDict\", \"Int\", \"loop\", \"sizehint!\")"]	2.04 (25%) ❌	1.24 (1%) ❌
["collection", "queries & updates", "(\"IdDict\", \"Int\", \"push!\", \"new\")"]	2.38 (25%) ❌	1.50 (1%) 

etc

Maybe fixed by: #33009
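
As a rough illustration of what the IdDict initialization benchmarks above exercise, here is a minimal repro sketch (not the exact BaseBenchmarks definition; assumes BenchmarkTools is installed):

```julia
using BenchmarkTools

ps = [i => i for i in 1:10^4]

# "iterator"-style initialization: construct the IdDict directly from pairs
@btime IdDict($ps);

# "loop"-style initialization: insert the pairs one by one
@btime begin
    d = IdDict{Int,Int}()
    for (k, v) in $ps
        d[k] = v
    end
end;
```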


IO

Some of these are expected to slow down due to new locks in IO code

["io", "serialization", "(\"deserialize\", \"Vector{String}\")"]	1.23 (5%) ❌	1.00 (1%)
["io", "skipchars"]	4.97 (5%) ❌	1.10 (1%) ❌
["micro", "printfd"]	2.61 (5%) ❌	1.20 (1%) ❌

Arithmetic

["linalg", "arithmetic", "(\"+\", \"Bidiagonal\", \"Bidiagonal\", 1024)"]	3.13 (45%) ❌	1.01 (1%) ❌
["linalg", "arithmetic", "(\"+\", \"Bidiagonal\", \"Bidiagonal\", 256)"]	4.18 (45%) ❌	1.04 (1%) ❌
["linalg", "arithmetic", "(\"+\", \"Diagonal\", \"Diagonal\", 1024)"]	2.83 (45%) ❌	1.01 (1%) ❌
["linalg", "arithmetic", "(\"+\", \"Vector\", \"Vector\", 1024)"]	2.90 (45%) ❌	1.01 (1%) ❌
["linalg", "arithmetic", "(\"+\", \"Vector\", \"Vector\", 256)"]	4.98 (45%) ❌	1.04 (1%) ❌
["linalg", "arithmetic", "(\"-\", \"Vector\", \"Vector\", 256)"]	5.49 (45%) ❌	1.04 (1%) ❌

etc.

Analyzed in #32985 (comment).

Fixed in #33079.


Random

Probably due to the thread-local RNGs

["random", "collections", "(\"rand\", \"RandomDevice\", \"small Set\")"]	1.70 (25%) ❌	1.00 (1%)
["random", "collections", "(\"rand\", \"RandomDevice\", \"small String\")"]	1.50 (25%) ❌	1.00 (1%)
["random", "collections", "(\"rand\", \"RandomDevice\", \"small Vector\")"]	1.46 (25%) ❌	1.00 (1%)

Problem

["problem", "monte carlo", "euro_option_devec"]	1.50 (5%) ❌	1.00 (1%)

This is because randn() got slower.
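
A quick way to check that claim directly on each Julia version (a sketch, assuming BenchmarkTools):

```julia
using BenchmarkTools

@btime randn();        # scalar draws
@btime randn(10_000);  # bulk draws, for comparison
```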


Scalar with big numbers

This is #31759

["scalar", "arithmetic", "(\"add\", \"Complex{Float32}\", \"BigFloat\")"]	

Sparse constructors ❓

Constructor regression might be #31724

["sparse", "constructors", "(\"IJV\", 100)"]	3.58 (5%) ❌	1.02 (1%) ❌
["sparse", "index", "(\"spmat\", \"row\", \"array\", 1000)"]	1.49 (30%) ❌	1.00 (1%)
["sparse", "index", "(\"spmat\", \"row\", \"logical\", 1000)"]	1.57 (30%) ❌	1.00 (1%)

etc
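
A rough repro sketch for the IJV constructor row (the dimensions are assumptions, not the exact BaseBenchmarks setup; assumes BenchmarkTools):

```julia
using SparseArrays, BenchmarkTools

Is = rand(1:100, 500)
Js = rand(1:100, 500)
Vs = rand(500)

@btime sparse($Is, $Js, $Vs);
```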

Others

["linalg", "small exp #29116"]	3.21 (5%) ❌	1.10 (1%) ❌

#32985 (comment)

["micro", "mandel"]	2.04 (5%) ❌	1.00 (1%)

This is likely due to the new hypot implementation.
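
Since abs on a Complex number is implemented via hypot, the mandel slowdown can be probed directly (a sketch, assuming BenchmarkTools):

```julia
using BenchmarkTools

z = 0.5 + 0.5im
@btime abs($z);          # goes through hypot(real(z), imag(z))
@btime hypot(0.5, 0.5);
```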

KristofferC commented Aug 21, 2019

The arithmetic regression can be reproduced with e.g.

```julia
# 1.3
julia> using BenchmarkTools

julia> n = 1024; a = rand(n); b = rand(n); @btime a+b;
  2.083 μs (5 allocations: 8.22 KiB)

# 1.2
julia> n = 1024; a = rand(n); b = rand(n); @btime a+b;
  711.142 ns (1 allocation: 8.13 KiB)
```

Comparing profile traces vs 1.2 (the 1.3 one is shown below):

```
julia> Profile.print()
1155 ./task.jl:333; (::REPL.var"##26#27"{REPL.REPLBackend})()
 1155 ...ia/stdlib/v1.4/REPL/src/REPL.jl:118; macro expansion
  1155 ...ia/stdlib/v1.4/REPL/src/REPL.jl:86; eval_user_input(::Any, ::REPL.REPLBackend)
   1155 ./boot.jl:330; eval(::Module, ::Any)
    1155 ./REPL[17]:3; measure(::Array{Float64,1}, ::Array{Float64,1},...
     1155 ./arraymath.jl:47; +
      107 ./broadcast.jl:802; broadcast_preserving_zero_d(::Function, ::Arr...
       24 ./broadcast.jl:1232; broadcasted(::Function, ::Array{Float64,1}, :...
        23 ./broadcast.jl:1234; broadcasted
      996 ./broadcast.jl:803; broadcast_preserving_zero_d(::Function, ::Arr...
       984 ./broadcast.jl:814; materialize(::Base.Broadcast.Broadcasted{Base...
        983 ./broadcast.jl:834; copy
         843 ./broadcast.jl:858; copyto!
          843 ./broadcast.jl:903; copyto!
           843 ./simdloop.jl:77; macro expansion
            324 ./array.jl:780; macro expansion
            519 ./broadcast.jl:904; macro expansion
             519 ./broadcast.jl:558; getindex
              244 ./broadcast.jl:597; _broadcast_getindex
               244 ./broadcast.jl:621; _getindex
                244 ./broadcast.jl:591; _broadcast_getindex
                 244 ./array.jl:742; getindex
              275 ./float.jl:401; _broadcast_getindex
         140 ./broadcast.jl:196; similar
          140 ./abstractarray.jl:671; similar
           140 ./abstractarray.jl:672; similar
            140 ./boot.jl:421; Array
             140 ./boot.jl:413; Array
              140 ./boot.jl:404; Array
       1   ./simdloop.jl:0; materialize(::Base.Broadcast.Broadcasted{Base...
       6   ./simdloop.jl:75; materialize(::Base.Broadcast.Broadcasted{Base...
      51  ./broadcast.jl:804; broadcast_preserving_zero_d(::Function, ::Arr...
       3 ./tuple.jl:19; length(::Tuple)

```

It seems this might be related to the broadcast_preserving_zero_d machinery added in #32122 (but I haven't actually verified that). cc @mbauman
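
One way to probe that hypothesis (a sketch assuming Julia ≥ 1.3 and BenchmarkTools; `broadcast_preserving_zero_d` is an internal function, taken from the trace above):

```julia
using BenchmarkTools

a = rand(1024); b = rand(1024)

# plain broadcast, roughly what `a + b` lowered to on 1.2
@btime broadcast(+, $a, $b);

# the zero-d-preserving wrapper that the 1.3 trace above shows `+` going through
@btime Base.Broadcast.broadcast_preserving_zero_d(+, $a, $b);
```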

KristofferC commented Aug 21, 2019

I believe the `["linalg", "small exp #29116"]` regression (3.21 time ratio) is also due to the array arithmetic regression:

```
       554 ./arraymath.jl:39; -
        367 ./broadcast.jl:802; broadcast_preserving_zero_d(::Function, ::Ar...
         68 ./broadcast.jl:1232; broadcasted(::Function, ::Array{Float64,2},...
          58 ./broadcast.jl:1234; broadcasted
        60  ./broadcast.jl:803; broadcast_preserving_zero_d(::Function, ::Ar...
         46 ./broadcast.jl:814; materialize(::Base.Broadcast.Broadcasted{Bas...
          40 ./broadcast.jl:834; copy
           17 ./broadcast.jl:858; copyto!
            8 ./broadcast.jl:902; copyto!
             8 ./broadcast.jl:885; preprocess
              6 ./promotion.jl:399; preprocess_args
            9 ./broadcast.jl:903; copyto!
           23 ./broadcast.jl:196; similar
            23 ./abstractarray.jl:671; similar
             23 ./abstractarray.jl:672; similar
              23 ./boot.jl:421; Array
               23 ./boot.jl:414; Array
                23 ./boot.jl:406; Array

```

comes up on 1.3, while on 1.2 that section is just

```

       74  ./arraymath.jl:39; -
        64 ./broadcast.jl:752; broadcast(::typeof(-), ::Array{Float64,2}, ::...
         45 ./boot.jl:406; materialize

```

KristofferC commented Aug 26, 2019

For e.g. the skipchars regression, would it make sense to have an API where someone who is going to make a bunch of tight calls to read can pull the lock toggle outside their loop?
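
Purely as a hypothetical illustration of that suggestion (`withlock` and `read_nolock` are invented names, not real Base API):

```julia
# Hypothetical sketch: acquire the stream's lock once, then do many reads
# without per-call locking. None of these helpers exist in Base today.
function read_bytes_nolock(io::IO, n::Integer)
    out = Vector{UInt8}(undef, n)
    withlock(io) do                         # hypothetical: take io's internal lock once
        for i in 1:n
            out[i] = read_nolock(io, UInt8) # hypothetical lock-free read
        end
    end
    return out
end
```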

KristofferC referenced this issue Aug 26, 2019
* reconstruct PR #31118

* reconstruct PR 31118 2

* Check arguments of SparseMatrixCSC #31024 #31435

* fix SuiteSparse test

* added NEWS, fixed tests

* loosen restrictions - resize to useful length

* cleaned up NEWS, revert minor change

* add non-checking and checking constructor - improve check performance
mbauman added a commit that referenced this issue Aug 26, 2019:

Followup to https://github.com/JuliaLang/julia/pull/31724/files#r317686891; instead of widening the index type dynamically based upon the index vector length, just throw an error in the case where the index type cannot hold all the indices in a CSC format. This previously was an OOB access (and likely segfault) in 1.2, so throwing an error here is not a breaking change -- and throwing an error restores type stability, addressing the performance regression flagged in #32985.

KristofferC pushed commits that referenced this issue Aug 27, 2019 (cherry picked from commit 9725fb4).