Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accelerate to calcuration modularity #34

Open
Inazuma110 opened this issue Mar 12, 2020 · 2 comments
Open

Accelerate to calcuration modularity #34

Inazuma110 opened this issue Mar 12, 2020 · 2 comments

Comments

@Inazuma110
Copy link
Contributor

SimpleHypergraphs.jl version v1.6.0

(sum(eP) - sum( ha.Ed[d]*sum(volP_volV.^d) for d in 1:ha.max_hes)) / nhe(h)

This line takes time. I succeeded acceleration. I confirm this program works correctly and fast some hypergraphs and partitions.

    mr = 0
    v = 0
    for i in 1:nhv(h) v += length(gethyperedges(h, i)) end
    dict = Dict([])
    for i in 1:nhe(h)
      dict[length(getvertices(h, i))] = get(dict, length(getvertices(h, i)), 0) + 1
    end

    va = Dict([i => 0.0 for i in 1:length(partition)])
    for (i, cluster) in enumerate(partition)
      for node in cluster va[i] += length(gethyperedges(h, node)) end
      va[i] /= v
    end

    for (key, val) in dict
      tmp = 0
      for (key2, val2) in va
        tmp += (val2 ^ key)
      end
      tmp *= val
      mr += tmp
    end

    # (sum(eP) - sum( ha.Ed[d]*sum(volP_volV.^d) for d in 1:ha.max_hes)) / nhe(h)
    (sum(eP) - mr) / nhe(h)

However, I can't description why acceleration... (Originally, I was improving other line.)
So you can ignore and close this issue. Please refer this program if you like.
And sorry my poor English. Thank you.

@pszufe
Copy link
Owner

pszufe commented Mar 12, 2020

Hello thank you for the feedback.

I tried to run your code. In my tests (see below) your code takes twice as long to compute.
Could you perhaps update the test below with an example illustrating a case where your code is actually faster?
Additionally, in case of degenerate hypergraphs, yields wrong results (but this should be fairly easy to repair if the performance is good).

using SimpleHypergraphs
using Random
using StatsBase
using BenchmarkTools
Random.seed!(0)
m = sample([nothing, true], Weights([0.8,0.2]),(30,30))
h = Hypergraph(m)

part = SimpleHypergraphs.randompartition(h,5)
ha = SimpleHypergraphs.HypergraphAggs(h)

function modularity2(h::Hypergraph, partition::Vector{Set{Int}},
        ha::SimpleHypergraphs.HypergraphAggs=SimpleHypergraphs.HypergraphAggs(h))        
    @boundscheck sum(length.(partition)) == nhv(h)
    @boundscheck union(partition...) == Set(1:nhv(h))
    volP_volV = [sum(ha.deg_vs[i] for i in p)/ha.volV for p in partition]
    eP = [count(i-> ha.hes[i]>0 && (keys(h.he2v[i])  p), 1:nhe(h)) for p in partition]
    mr = 0
    v = 0
    for i in 1:nhv(h) v += length(gethyperedges(h, i)) end
    dict = Dict([])
    for i in 1:nhe(h)
      dict[length(getvertices(h, i))] = get(dict, length(getvertices(h, i)), 0) + 1
    end

    va = Dict([i => 0.0 for i in 1:length(partition)])
    for (i, cluster) in enumerate(partition)
      for node in cluster va[i] += length(gethyperedges(h, node)) end
      va[i] /= v
    end

    for (key, val) in dict
      tmp = 0
      for (key2, val2) in va
        tmp += (val2 ^ key)
      end
      tmp *= val
      mr += tmp
    end

    # (sum(eP) - sum( ha.Ed[d]*sum(volP_volV.^d) for d in 1:ha.max_hes)) / nhe(h)
    (sum(eP) - mr) / nhe(h)
end

And now the tests:

julia> modu1 = @btime modularity($h,$part,$ha);
  12.700 μs (211 allocations: 8.27 KiB)

julia> modu2 = @btime modularity2($h,$part,$ha);
  21.500 μs (377 allocations: 11.29 KiB)

julia> modu1 ≈ modu2
true

@Inazuma110
Copy link
Contributor Author

First, generate hypergraph. It has clearly some clusters.

function create_hypergraph(npcs, hepcs, he_rate=0.5, noise=10)
  node_num = 0
  he_num = 0
  clusters = length(npcs)
  for npc in npcs
    node_num += npc
  end
  for hepc in hepcs
    he_num += hepc
  end
  h = Hypergraph{Int64}(node_num, he_num)
  step = 1
  fin_node = 1
  for (npc, hepc) in zip(npcs, hepcs)
    for i in 1:hepc
      # he_size = rand(1:npc)
      he = randsubseq(fin_node:fin_node+npc-1, he_rate)
      he = [(i in he ? 1 : nothing) for i in 1:nhv(h)]
      for (j, v) in enumerate(he)
        h[j, step] = v
      end
      step += 1
    end
    fin_node += npc
  end

 

  for i in 1:noise
    he = randsubseq(1:node_num, 0.5)
    he = Dict([i => 1 for i in he])
    add_hyperedge!(h, vertices=he)
  end

  h
end
h = create_hypergraph([100 for i in 1:5], [20 for i in 1:5], 0.5, 4)
rp = randompartition(h, 30)
  • npcs is node per clusters. This is array.
  • hepcs is hyperedge per clusters. This is array.
  • Arbitrary node is included hyperedges with he_rate probability.
  • Generated hypergraph has noise hyperedges. There are not fully included Arbitrary cluster.

Generated like this incidence matrix.
image

julia> mod1 = @btime modularity(h, rp)
  589.906 μs (3526 allocations: 194.89 KiB)
-1.5186397093946605e-53

julia> mod2 = @btime modularity2(h, rp)
  222.981 μs (5718 allocations: 152.38 KiB)
-1.518639709394661e-53

And accelerate when using findcommunities.

  • Using conventional modularity.
julia> cfm = CFModularityCNMLike(1000)
CFModularityCNMLike(1000)

julia> @benchmark findcommunities(h, cfm)
BenchmarkTools.Trial:
  memory estimate:  335.32 MiB
  allocs estimate:  6643143
  --------------
  minimum time:     1.214 s (0.00% GC)
  median time:      1.781 s (3.26% GC)
  mean time:        1.911 s (3.10% GC)
  maximum time:     2.739 s (4.38% GC)
  --------------
  samples:          3
  evals/sample:     1
  • Using my modularity
julia> @benchmark findcommunities(h, cfm)
BenchmarkTools.Trial:
  memory estimate:  265.13 MiB
  allocs estimate:  10089341
  --------------
  minimum time:     585.823 ms (0.00% GC)
  median time:      761.730 ms (7.58% GC)
  mean time:        976.228 ms (5.85% GC)
  maximum time:     1.728 s (6.59% GC)
  --------------
  samples:          6
  evals/sample:     1

However, I understand that not everything is better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants