-
Hello, to get the parameters (mean/median depth etc.) I would suggest using pandas DataFrame operations to do your various aggregations. From there you can merge your original dataframe with the depth information. One assumption I'm making here is that the names of child and parent nodes are unique (so they will be matched with the correct depth information). Another question I have is: how would the breadth of the tree and the virality (Wiener index) be calculated?
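A quick way to sanity-check that uniqueness assumption before merging (a minimal sketch; the small `data` frame here just mirrors the shape of the sample data used later in this thread):

```python
import pandas as pd

# Sketch: the merge is only safe if each child name appears exactly once
data = pd.DataFrame(
    [["a", "k"], ["b", "k"], ["c", "b"]],
    columns=["child", "parent_id"],
)
assert data["child"].is_unique, "duplicate child names would attach the wrong depth"
```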
-
Thanks, would it be possible for you to guide me to some functional code to get started? Also, will the time aspect of the tree be captured during the transformation? Following is my understanding of the Wiener index.
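For reference, a minimal sketch of the standard definition (the Wiener index as the sum of shortest-path distances over all unordered pairs of nodes), applied to the sample edges used later in this thread; the `wiener_index` helper is hypothetical and uses plain BFS rather than bigtree:

```python
from collections import defaultdict, deque

def wiener_index(edges):
    """Sum of shortest-path distances over all unordered node pairs."""
    # Build an undirected adjacency list from (child, parent) edges
    adj = defaultdict(list)
    for child, parent in edges:
        adj[child].append(parent)
        adj[parent].append(child)
    total = 0
    # BFS from every node; trees are unweighted, so BFS gives distances
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # each unordered pair was counted twice

edges = [("a", "k"), ("b", "k"), ("c", "b"), ("d", "c"), ("e", "k"),
         ("f", "k"), ("g", "k"), ("h", "f"), ("i", "f"), ("j", "e")]
print(wiener_index(edges))
```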
-
Below is some functional code showing what I mean by doing the aggregation / mathematical operations using pandas DataFrame operations.

**Set up your data**

```python
import pandas as pd

data = pd.DataFrame(
    [
        ["a", "k", "2022-01-01 01:31"],
        ["b", "k", "2022-01-01 03:12"],
        ["c", "b", "2022-01-01 03:16"],
        ["d", "c", "2022-01-01 04:20"],
        ["e", "k", "2022-01-01 05:38"],
        ["f", "k", "2022-01-01 08:34"],
        ["g", "k", "2022-01-01 08:50"],
        ["h", "f", "2022-01-01 20:27"],
        ["i", "f", "2022-01-02 00:34"],
        ["j", "e", "2022-01-02 00:36"],
    ],
    columns=["child", "parent_id", "posting_date"],
)
```

**Convert dataframe -> tree -> dataframe to extract depth information**

```python
from bigtree import dataframe_to_tree_by_relation, tree_to_dataframe
tree = dataframe_to_tree_by_relation(data)
# Preview tree
tree.show(attr_list=["depth"])
# k [depth=1]
# ├── a [depth=2]
# ├── b [depth=2]
# │ └── c [depth=3]
# │ └── d [depth=4]
# ├── e [depth=2]
# │ └── j [depth=3]
# ├── f [depth=2]
# │ ├── h [depth=3]
# │ └── i [depth=3]
# └── g [depth=2]
data_depth = tree_to_dataframe(tree, name_col="child", attr_dict={"depth": "depth"})
# Preview data_depth
#        path child  depth
#  0       /k     k      1
#  1     /k/a     a      2
#  2     /k/b     b      2
#  3   /k/b/c     c      3
#  4 /k/b/c/d     d      4
#  5     /k/e     e      2
#  6   /k/e/j     j      3
#  7     /k/f     f      2
#  8   /k/f/h     h      3
#  9   /k/f/i     i      3
# 10     /k/g     g      2
```

**Combine dataframes and perform dataframe operations**

```python
data_all = pd.merge(data, data_depth, on=["child"], how="left")
data_all["date"] = pd.to_datetime(data_all["posting_date"]).dt.date
data_all.groupby("date").agg(depth_sum=("depth", "sum"), depth_count=("depth", "count"))
#             depth_sum  depth_count
# date
# 2022-01-01         20            8
# 2022-01-02          6            2
```

From the result of the sum and count of depth, you can retrieve the daily mean (or cumulative mean) depth. To retrieve the median depth, you can collect the depth values as a list, etc. I would think DataFrame operations would still be the most efficient way of performing your calculations.
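As a concrete follow-up, a minimal sketch of reading the daily mean and median depth straight off the merged frame with pandas' built-in aggregations, using the `data_all` frame constructed above:

```python
# Daily mean and median depth in one groupby, no manual sum/count needed
print(data_all.groupby("date")["depth"].agg(["mean", "median"]))
#             mean  median
# date
# 2022-01-01   2.5     2.0
# 2022-01-02   3.0     3.0
```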
-
Thanks. One more question: is it necessary to hard-code the data? For instance, the data I shared here is just a representation, but my original data runs to 18,000 rows.
-
The way `bigtree` is useful here is just to provide a quick and easy way to derive the depth of each node by constructing a tree. If you have an alternative way to derive the depth of each node, then you can bypass the use of bigtree. Hope this helps! If this answers your question, you can mark the comment above as the answer and close this thread 👍
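To make those two points concrete, here is a minimal sketch that loads the data from a file instead of hard-coding it, and derives each node's depth without bigtree by walking parent pointers; the file name `comments.csv` is a hypothetical placeholder:

```python
import pandas as pd

# Hypothetical file name; swap in your actual 18,000-row dataset
data = pd.read_csv("comments.csv")  # columns: child, parent_id, posting_date

# Map each child to its parent, then count hops up to the root
parent = dict(zip(data["child"], data["parent_id"]))

def node_depth(node):
    depth = 1
    while node in parent:  # the root never appears as a child
        node = parent[node]
        depth += 1
    return depth

data["depth"] = data["child"].map(node_depth)
```

If the tree is very deep, a memoised recursive variant (depth(n) = 1 + depth(parent[n])) would avoid re-walking shared ancestor chains, but for trees of this shape the plain loop above is already cheap.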