Accelerate modularity calculation #34

Inazuma110 · 2020-03-12T18:18:20Z

SimpleHypergraphs.jl version v1.6.0

Line 62 in 1b03ab3

(sum(eP) - sum( ha.Ed[d]*sum(volP_volV.^d) for d in 1:ha.max_hes)) / nhe(h)

This line is slow. I managed to speed it up, and I have confirmed that the code below gives correct results and runs faster on some hypergraphs and partitions.

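    mr = 0  # term subtracted from sum(eP) at the end
    v = 0   # total volume: sum of all vertex degrees
    for i in 1:nhv(h) v += length(gethyperedges(h, i)) end
    # dict maps hyperedge size => number of hyperedges of that size
    dict = Dict([])
    for i in 1:nhe(h)
      dict[length(getvertices(h, i))] = get(dict, length(getvertices(h, i)), 0) + 1
    end

    # va[i] is the volume of cluster i divided by the total volume
    va = Dict([i => 0.0 for i in 1:length(partition)])
    for (i, cluster) in enumerate(partition)
      for node in cluster va[i] += length(gethyperedges(h, node)) end
      va[i] /= v
    end

    # for each hyperedge size, add (count of that size) * sum over clusters of va^size
    for (key, val) in dict
      tmp = 0
      for (key2, val2) in va
        tmp += (val2 ^ key)
      end
      tmp *= val
      mr += tmp
    end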

    # (sum(eP) - sum( ha.Ed[d]*sum(volP_volV.^d) for d in 1:ha.max_hes)) / nhe(h)
    (sum(eP) - mr) / nhe(h)
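
Written as a formula, the loop above appears to compute the following. This is only a sketch of my reading of the code; the symbols $D$, $E_d$, $A_i$, $k$, $e(A_i)$ and $\mathrm{vol}$ are introduced here for illustration and do not come from the library.

$$
\mathrm{mr} = \sum_{d \in D} E_d \sum_{i=1}^{k} \left( \frac{\mathrm{vol}(A_i)}{\mathrm{vol}(V)} \right)^{d},
\qquad
q = \frac{\sum_{i=1}^{k} e(A_i) - \mathrm{mr}}{|E|}
$$

Here $D$ is the set of hyperedge sizes that actually occur (the keys of `dict`), $E_d$ is `dict[d]`, $A_i$ is the $i$-th cluster of `partition`, $k$ is `length(partition)`, $\mathrm{vol}(A_i)$ is the sum of the degrees of its nodes, $\mathrm{vol}(V)$ is `v`, $e(A_i)$ is `eP[i]`, and $|E|$ is `nhe(h)`. The original one-liner instead ranges over every $d$ from 1 to `ha.max_hes`. Assuming `ha.Ed[d]` is the number of hyperedges of size $d$, the two sums should agree whenever no hyperedge is empty, because sizes that do not occur contribute zero. An empty hyperedge would add a $d = 0$ term to the loop version (each cluster contributes $x^0 = 1$), which the original sum starting at $d = 1$ does not have.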

However, I can't explain why this is faster... (Originally, I was trying to improve a different line.)
So feel free to ignore and close this issue. Please refer to this code if you like.
Sorry for my poor English. Thank you.

pszufe · 2020-03-12T20:41:03Z

Hello, thank you for the feedback.

I tried to run your code. In my tests (see below) your code takes almost twice as long to compute (21.5 μs vs. 12.7 μs).
Could you perhaps update the test below with an example illustrating a case where your code is actually faster?
Additionally, for degenerate hypergraphs it yields wrong results (but this should be fairly easy to fix if the performance is good).

using SimpleHypergraphs
using Random
using StatsBase
using BenchmarkTools
Random.seed!(0)
m = sample([nothing, true], Weights([0.8,0.2]),(30,30))
h = Hypergraph(m)

part = SimpleHypergraphs.randompartition(h,5)
ha = SimpleHypergraphs.HypergraphAggs(h)

function modularity2(h::Hypergraph, partition::Vector{Set{Int}},
        ha::SimpleHypergraphs.HypergraphAggs=SimpleHypergraphs.HypergraphAggs(h))        
    @boundscheck sum(length.(partition)) == nhv(h)
    @boundscheck union(partition...) == Set(1:nhv(h))
    volP_volV = [sum(ha.deg_vs[i] for i in p)/ha.volV for p in partition]
    eP = [count(i-> ha.hes[i]>0 && (keys(h.he2v[i]) ⊆ p), 1:nhe(h)) for p in partition]
    mr = 0
    v = 0
    for i in 1:nhv(h) v += length(gethyperedges(h, i)) end
    dict = Dict([])
    for i in 1:nhe(h)
      dict[length(getvertices(h, i))] = get(dict, length(getvertices(h, i)), 0) + 1
    end

    va = Dict([i => 0.0 for i in 1:length(partition)])
    for (i, cluster) in enumerate(partition)
      for node in cluster va[i] += length(gethyperedges(h, node)) end
      va[i] /= v
    end

    for (key, val) in dict
      tmp = 0
      for (key2, val2) in va
        tmp += (val2 ^ key)
      end
      tmp *= val
      mr += tmp
    end

    # (sum(eP) - sum( ha.Ed[d]*sum(volP_volV.^d) for d in 1:ha.max_hes)) / nhe(h)
    (sum(eP) - mr) / nhe(h)
end

And now the tests:

julia> modu1 = @btime modularity($h,$part,$ha);
  12.700 μs (211 allocations: 8.27 KiB)

julia> modu2 = @btime modularity2($h,$part,$ha);
  21.500 μs (377 allocations: 11.29 KiB)

julia> modu1 ≈ modu2
true

Inazuma110 · 2020-03-13T12:05:57Z

First, I generate a hypergraph that clearly has some clusters.

function create_hypergraph(npcs, hepcs, he_rate=0.5, noise=10)
  node_num = 0
  he_num = 0
  clusters = length(npcs)
  for npc in npcs
    node_num += npc
  end
  for hepc in hepcs
    he_num += hepc
  end
  h = Hypergraph{Int64}(node_num, he_num)
  step = 1
  fin_node = 1
  for (npc, hepc) in zip(npcs, hepcs)
    for i in 1:hepc
      # he_size = rand(1:npc)
      he = randsubseq(fin_node:fin_node+npc-1, he_rate)
      he = [(i in he ? 1 : nothing) for i in 1:nhv(h)]
      for (j, v) in enumerate(he)
        h[j, step] = v
      end
      step += 1
    end
    fin_node += npc
  end

 

  for i in 1:noise
    he = randsubseq(1:node_num, 0.5)
    he = Dict([i => 1 for i in he])
    add_hyperedge!(h, vertices=he)
  end

  h
end
h = create_hypergraph([100 for i in 1:5], [20 for i in 1:5], 0.5, 4)
rp = randompartition(h, 30)

npcs is an array giving the number of nodes in each cluster.
hepcs is an array giving the number of hyperedges in each cluster.
Each node of a cluster is included in each of that cluster's hyperedges with probability he_rate.
The generated hypergraph also has noise hyperedges, which are not fully contained in any single cluster.

This generates an incidence matrix with the cluster structure described above.

julia> mod1 = @btime modularity(h, rp)
  589.906 μs (3526 allocations: 194.89 KiB)
-1.5186397093946605e-53

julia> mod2 = @btime modularity2(h, rp)
  222.981 μs (5718 allocations: 152.38 KiB)
-1.518639709394661e-53

It is also faster when used in findcommunities.

Using the existing modularity:

julia> cfm = CFModularityCNMLike(1000)
CFModularityCNMLike(1000)

julia> @benchmark findcommunities(h, cfm)
BenchmarkTools.Trial:
  memory estimate:  335.32 MiB
  allocs estimate:  6643143
  --------------
  minimum time:     1.214 s (0.00% GC)
  median time:      1.781 s (3.26% GC)
  mean time:        1.911 s (3.10% GC)
  maximum time:     2.739 s (4.38% GC)
  --------------
  samples:          3
  evals/sample:     1

Using my modularity:

julia> @benchmark findcommunities(h, cfm)
BenchmarkTools.Trial:
  memory estimate:  265.13 MiB
  allocs estimate:  10089341
  --------------
  minimum time:     585.823 ms (0.00% GC)
  median time:      761.730 ms (7.58% GC)
  mean time:        976.228 ms (5.85% GC)
  maximum time:     1.728 s (6.59% GC)
  --------------
  samples:          6
  evals/sample:     1

However, I understand that it is not better in every case.
