Add Commit Networks #263

Merged
merged 16 commits into from
Aug 28, 2024
Changes from 15 commits
4 changes: 4 additions & 0 deletions NEWS.md
@@ -12,17 +12,21 @@
- Add line-based code coverage reports into CI pipeline. Coverage reports are generated by `coverage.R` (PR #262, 10cac49d005e87c3964cc61711e7f5acef749626, b3b9f4ac7a9911bd00293c68fac88e0f9033bdfb, c815d18dc6266d620a7a145493417b87ac08679e, e8093525fdaf46e54f2f7fcc6358ca7892e795e5, 32d04823e2007c63d2a43ce59bea3057327c19a7)
- Add the possibility to split data time-based by multiple data sources (PR #261, 1088395f46b84028c8d7c463ca86b5dc38500c26, e1f79fc9e40cd6f41c946be42db364b2101cfe10, 0bb187fec0fd801d7634bf8d5180525770f6ab0b, 371a97ac6ebf3de4fe9360dea79d62e2ed3ef585)
- Add tests for uncovered functionality in `util-misc.R` and `util-networks.R` (PR #264, ff30f3238b1bf2539280d0d055a5d925c197c271, af80551d0615a49b86e45ff596bd75941ee88f91)
- Add commit network as a new type of network. It uses commits as vertices and connects them either via cochange or commit interactions. This includes adding new config parameters and a function for adding vertex attributes to a commit network (PR #263, ab73271781e8e9a0715f784936df4b371d64c338, ab73271781e8e9a0715f784936df4b371d64c338, cd9a930fcb54ff465c2a5a7c43cfe82ac15c134d)

### Changed/Improved

- Change the default value for the `issues.from.source` configuration parameter. Instead of reading JIRA and GitHub issues together, which was the previous default, the new default value causes only GitHub issue data to be read. To restore the previous default behavior and read data from both issue sources, this now needs to be manually configured when needed. (PR #264, 5ff83c364f6bfc1e6ff95e9c5f1087e031c48a5d, 8c8080cb9caf115f19d9f145ad6e6c108b131a67, 8bcbc81db521877908d2e5c2989082ed672f2a3b)
- Replace deprecated `igraph` functions by their preferred alternatives (PR #264, 0df9d5bf6bafbb5d440f4c47db4ec901cf11f037)
- Deprecate support for R version 3.6 (PR #264, c8e6f45111e487fadbe7f0a13c7595eb23f3af6e, fb3f5474259d4a88f4ff545691cca9d1ccde90e3)
- Explicitly add R version 4.4 to the CI test pipeline (c8e6f45111e487fadbe7f0a13c7595eb23f3af6e)
- Refactor function `construct.edge.list.from.key.value.list` to be more readable (PR #263, 05c3bc09cb1d396fd59c34a88030cdca58fd04dd)

### Fixed

- Fix the creation of edgelists for issue-based artifact-networks by correctly iterating over the issue data (PR #264, 321d85043112971c04998249c14a0677a32c9004)
- Fix networks based upon commit interaction data to also have the attribute `artifact.type` (PR #263, 849123a8b7d898fbb1343745ecffc1f6000c9367)
- Fix endless recursion that could occur when commit interaction data was configured and commit data was empty (PR #263, 3fb7437b68950303916b62984fa449732c70353e)

## 4.4

11 changes: 9 additions & 2 deletions README.md
@@ -234,6 +234,11 @@ There are four types of networks that can be built using this library: author ne
* The vertices in an artifact network denote any kind of artifact, e.g., source-code artifact (such as features or files) or communication artifact (such as mail threads or issues). All artifact-type vertices are uniquely identifiable by their name. There are only unipartite edges among artifacts in this type of network.
* The relations (i.e., the edges' meaning and source) can be configured using the [`NetworkConf`](#networkconf) attribute `artifact.relation`. The relation also describes which kinds of artifacts are represented as vertices in the network. (For example, if "mail" is selected as `artifact.relation`, only mail-thread vertices are included in the network.)

- Commit networks
* The vertices in a commit network denote the commits in the data. All vertices are uniquely identifiable by the hash of the commit. There are only unipartite edges among commits in this type of network.
* The relations (i.e., the edges' meaning and source) can be configured using the [`NetworkConf`](#networkconf) attribute `commit.relation`. The relation also describes the type of data used for network construction (`cochange` uses commit data, `commit.interaction` uses commit interaction data).

- Bipartite networks
* The vertices in a bipartite network denote both authors and artifacts. There are only bipartite edges from authors to artifacts in this type of network.
* The relations (i.e., the edges' meaning and source) can be configured using the [`NetworkConf`](#networkconf) attribute `artifact.relation`.
@@ -249,6 +254,7 @@ Relations determine which information is used to construct edges among the verti
- `cochange`
* For author networks (configured via `author.relation` in the [`NetworkConf`](#networkconf)), authors who change the same source-code artifact are connected with an edge.
* For artifact networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), source-code artifacts that are concurrently changed in the same commit are connected with an edge.
* For commit networks (configured via `commit.relation` in the [`NetworkConf`](#networkconf)), commits are connected if they change the same artifact.
* For bipartite networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), authors get linked to all source-code artifacts they have changed in their respective commits.

- `mail`
@@ -269,6 +275,7 @@ Relations determine which information is used to construct edges among the verti
- `commit.interaction`
* For author networks (configured via `author.relation` in the [`NetworkConf`](#networkconf)), authors who contribute to interacting commits are connected with an edge.
* For artifact networks (configured via `artifact.relation` in the [`NetworkConf`](#networkconf)), artifacts are connected when there is an interaction between two commits that occur in the artifacts.
* For commit networks (configured via `commit.relation` in the [`NetworkConf`](#networkconf)), commits are connected when they interact in the commit interaction data.
* This relation does not apply for bipartite networks.
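
The two relations available for commit networks can be pictured with a minimal base-R sketch. This is not the library's implementation. The data frames, the column names `hash` and `artifact`, the helper names `cochange.edges` and `interaction.edges`, and the edge direction (from `base.hash` to `commit.hash`) are assumptions made for illustration; only the column names `commit.hash` and `base.hash` are taken from the test data in this PR.

```r
## Hypothetical commit data: one row per (commit, artifact) pair.
commits = data.frame(
    hash = c("c1", "c2", "c3", "c3"),
    artifact = c("A", "A", "A", "B"),
    stringsAsFactors = FALSE
)

## Hypothetical commit interaction data.
interactions = data.frame(
    commit.hash = c("c2", "c3"),
    base.hash = c("c1", "c1"),
    stringsAsFactors = FALSE
)

## 'cochange': connect every pair of commits that change the same artifact.
cochange.edges = function(commit.data) {
    hashes.per.artifact = split(commit.data[["hash"]], commit.data[["artifact"]])
    edges = lapply(hashes.per.artifact, function(hashes) {
        hashes = unique(hashes)
        ## an artifact changed by a single commit yields no edge
        if (length(hashes) < 2) {
            return(NULL)
        }
        pairs = t(combn(hashes, 2))
        return(data.frame(from = pairs[, 1], to = pairs[, 2], stringsAsFactors = FALSE))
    })
    ## drop the list names so that they do not end up in the row names
    return(do.call(rbind, unname(edges)))
}

## 'commit.interaction': one edge per row of the interaction data
## (the direction base -> interacting commit is an assumption).
interaction.edges = function(interaction.data) {
    return(data.frame(from = interaction.data[["base.hash"]],
                      to = interaction.data[["commit.hash"]],
                      stringsAsFactors = FALSE))
}

## artifact "A" links c1-c2, c1-c3, and c2-c3; artifact "B" contributes nothing
cochange.edges(commits)
interaction.edges(interactions)
```

Either edge list could then be handed to a graph library, with the commit hashes serving as the unique vertex names.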

#### Edge-construction algorithms for author networks
@@ -623,7 +630,7 @@ Updates to the parameters can be done by calling `NetworkConf$update.variables(.
- `author.relation`
* The relation(s) among authors, encoded as edges in an author network
* **Note**: The author--artifact relation in bipartite and multi networks is configured by `artifact.relation`!
* possible values: [*`"mail"`*, `"cochange"`, `"issue"`]
* possible values: [*`"mail"`*, `"cochange"`, `"issue"`, `"commit.interaction"`]
- `author.directed`
* The directedness of edges in an author network
* [`TRUE`, *`FALSE`*]
@@ -642,7 +649,7 @@ Updates to the parameters can be done by calling `NetworkConf$update.variables(.
- `artifact.relation`
* The relation(s) among artifacts, encoded as edges in an artifact network
* **Note**: Additionally, this relation configures also the author--artifact relation in bipartite and multi networks!
* possible values: [*`"cochange"`*, `"callgraph"`, `"mail"`, `"issue"`]
* possible values: [*`"cochange"`*, `"callgraph"`, `"mail"`, `"issue"`, `"commit.interaction"`]
- `artifact.directed`
* The directedness of edges in an artifact network
* **Note**: This parameter does only affect the `issue` relation, as the `cochange` relation is always undirected, while the `callgraph` relation is always directed. For the `mail`, we currently do not have data available to exhibit edge information.
11 changes: 10 additions & 1 deletion showcase.R
@@ -24,6 +24,7 @@
## Copyright 2021 by Niklas Schneider <[email protected]>
## Copyright 2022 by Jonathan Baumann <[email protected]>
## Copyright 2024 by Maximilian Löffler <[email protected]>
## Copyright 2024 by Leo Sendelbach <[email protected]>
## All Rights Reserved.


@@ -65,6 +66,7 @@ ARTIFACT = "feature" # function, feature, file, featureexpression (only relevant

AUTHOR.RELATION = "mail" # mail, cochange, issue
ARTIFACT.RELATION = "cochange" # cochange, callgraph, mail, issue
COMMIT.RELATION = "commit.interaction" # commit.interaction, cochange


## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /
@@ -73,13 +75,16 @@ ARTIFACT.RELATION = "cochange" # cochange, callgraph, mail, issue
## initialize project configuration
proj.conf = ProjectConf$new(CF.DATA, CF.SELECTION.PROCESS, CASESTUDY, ARTIFACT)
proj.conf$update.value("commits.filter.base.artifact", TRUE)
proj.conf$update.value("commit.interactions", TRUE)
## specify that custom event timestamps should be read from 'custom-events.list'
proj.conf$update.value("custom.event.timestamps.file", "custom-events.list")
proj.conf$print()

## initialize network configuration
net.conf = NetworkConf$new()
net.conf$update.values(updated.values = list(author.relation = AUTHOR.RELATION, artifact.relation = ARTIFACT.RELATION))
net.conf$update.values(updated.values = list(author.relation = AUTHOR.RELATION,
artifact.relation = ARTIFACT.RELATION,
commit.relation = COMMIT.RELATION))
net.conf$print()

## get ranges
@@ -141,6 +146,7 @@ x$get.author.network()
x$update.network.conf(updated.values = list(author.directed = FALSE))
x$get.author.network()
x$get.artifact.network()
x$get.commit.network()
x$reset.environment()
x$get.networks()
x$update.network.conf(updated.values = list(author.only.committers = FALSE, author.directed = FALSE))
@@ -201,6 +207,7 @@ y$update.network.conf(updated.values = list(edge.attributes = c("date")))
y$get.author.network()
y$update.network.conf(updated.values = list(edge.attributes = c("hash")))
y$get.artifact.network()
y$get.commit.network()
y$get.networks()
y$update.network.conf(updated.values = list(author.only.committers = FALSE, author.directed = TRUE))
h = y$get.bipartite.network()
@@ -232,6 +239,8 @@ sample.pull.requests = add.vertex.attribute.author.issue.count(my.networks, x.da
## add vertex attributes for the project-level network
x.net.as.list = list("1970-01-01 00:00:00-2030-01-01 00:00:00" = x$get.author.network())
sample.entire = add.vertex.attribute.author.commit.count(x.net.as.list, x.data, aggregation.level = "complete")
## add vertex attributes to commit network. Default value 'NO_AUTHOR' is used if vertex is not in commit data
add.vertex.attribute.commit.network(x$get.commit.network(), x.data, attr.name = "author.name", default.value = "NO_AUTHOR")


## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /
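
The comment on `add.vertex.attribute.commit.network` in the hunk above mentions a default value for vertices that are missing from the commit data. A minimal base-R sketch of that lookup idea follows. It does not reproduce the function added in this PR: the helper name `lookup.with.default`, the column name `hash`, and the sample data are assumptions made for illustration.

```r
## Hypothetical inputs: vertex names are commit hashes,
## and the commit data maps hashes to author names.
vertex.hashes = c("c1", "c2", "c9")
commit.data = data.frame(
    hash = c("c1", "c2"),
    author.name = c("Olaf", "Thomas"),
    stringsAsFactors = FALSE
)

## Look up an attribute per hash and fall back to a default for unknown hashes.
lookup.with.default = function(hashes, commit.data, attr.name, default.value) {
    ## match() returns NA for hashes that do not occur in the commit data
    values = commit.data[[attr.name]][match(hashes, commit.data[["hash"]])]
    values[is.na(values)] = default.value
    return(values)
}

## "c9" is not part of the commit data and therefore receives the default value
lookup.with.default(vertex.hashes, commit.data, "author.name", "NO_AUTHOR")
```

Note that this sketch also overwrites attribute values that are genuinely `NA` in the commit data; whether that is desirable depends on the use case.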
10 changes: 6 additions & 4 deletions tests/test-data.R
@@ -564,15 +564,15 @@ test_that("Compare two ProjectData Objects with commit.interactions", {
proj.data.two$set.commits(create.empty.commits.list())

## create empty data frame of correct size
commit.interactions.data.expected = data.frame(matrix(nrow = 4, ncol = 8))
commit.interactions.data.expected = data.frame(matrix(nrow = 4, ncol = 9))
## assure that the correct type is used
for(i in seq_len(8)) {
for(i in seq_len(9)) {
commit.interactions.data.expected[[i]] = as.character(commit.interactions.data.expected[[i]])
}
## set everything except for authors as expected
colnames(commit.interactions.data.expected) = c("commit.hash", "base.hash", "func", "file",
"base.func", "base.file", "base.author",
"interacting.author")
"base.func", "base.file","artifact.type",
"base.author", "interacting.author")
commit.interactions.data.expected[["commit.hash"]] =
c("0a1a5c523d835459c42f33e863623138555e2526",
"418d1dc4929ad1df251d2aeb833dd45757b04a6f",
@@ -588,6 +588,8 @@ test_that("Compare two ProjectData Objects with commit.interactions", {
commit.interactions.data.expected[["base.func"]] = c("test2.c::test2", "test2.c::test2",
"test3.c::test_function", "test2.c::test2")
commit.interactions.data.expected[["base.file"]] = c("test2.c", "test2.c", "test3.c", "test2.c")
commit.interactions.data.expected[["artifact.type"]] = c("CommitInteraction", "CommitInteraction",
"CommitInteraction", "CommitInteraction")

expect_equal(proj.data.two$get.commit.interactions(), commit.interactions.data.expected)
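
The explicit `as.character` conversion in this test is needed because `matrix()` without data is filled with logical `NA`, so a data frame built from it starts with logical columns. A small sketch of the same pattern, with a hypothetical helper name `empty.character.frame` and a shortened column list chosen only for illustration:

```r
## Hypothetical helper: build an all-NA data frame whose columns are all of type character.
empty.character.frame = function(n.rows, column.names) {
    df = data.frame(matrix(nrow = n.rows, ncol = length(column.names)))
    ## every column starts out as logical NA and is converted one by one
    for (i in seq_along(column.names)) {
        df[[i]] = as.character(df[[i]])
    }
    colnames(df) = column.names
    return(df)
}

expected = empty.character.frame(4, c("commit.hash", "base.hash", "artifact.type"))
## inspect the column classes after conversion
sapply(expected, class)
```

Without the conversion, a comparison that also checks column types would likely flag a mismatch between logical and character columns, even when all values are `NA`.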

2 changes: 2 additions & 0 deletions tests/test-networks-artifact.R
@@ -252,6 +252,7 @@ patrick::with_parameters_test_that("Network construction with commit-interaction
"test3.c::test_function", "test2.c::test2"),
base.author = c("Olaf", "Thomas", "Karl", "Thomas"),
interacting.author = c("Thomas", "Karl", "Olaf", "Thomas"),
artifact.type = c("File", "File", "File", "File"),
weight = c(1, 1, 1, 1),
type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA),
relation = c("commit.interaction", "commit.interaction", "commit.interaction", "commit.interaction")
@@ -301,6 +302,7 @@ patrick::with_parameters_test_that("Network construction with commit-interaction
base.file = c("test2.c", "test2.c", "test3.c", "test2.c"),
base.author = c("Olaf", "Thomas", "Karl", "Thomas"),
interacting.author = c("Thomas", "Karl", "Olaf", "Thomas"),
artifact.type = c("Function", "Function", "Function", "Function"),
weight = c(1, 1, 1, 1),
type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA),
relation = c("commit.interaction", "commit.interaction", "commit.interaction", "commit.interaction")
1 change: 1 addition & 0 deletions tests/test-networks-author.R
@@ -720,6 +720,7 @@ patrick::with_parameters_test_that("Network construction with commit-interaction
base.func = c("test2.c::test2", "test2.c::test2",
"test3.c::test_function", "test2.c::test2"),
base.file = c("test2.c", "test2.c", "test3.c", "test2.c"),
artifact.type = c("CommitInteraction", "CommitInteraction", "CommitInteraction", "CommitInteraction"),
weight = c(1, 1, 1, 1),
type = c(TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA, TYPE.EDGES.INTRA),
relation = c("commit.interaction", "commit.interaction", "commit.interaction", "commit.interaction")