-
Hello, to get the parameters (mean/median depth etc.) I would suggest using pandas DataFrame operations to do your various aggregations. From there you can merge your original dataframe with the depth information. One assumption I'm making here is that the names of child and parent nodes are unique (so they will be matched with the correct depth information). Another question I have is: how would the breadth of the tree and the virality (Wiener index) be calculated?
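A quick way to sanity-check that uniqueness assumption before merging (a minimal sketch; the small `data` frame here just mirrors the shape of the sample data used later in this thread):

```python
import pandas as pd

# Sketch: the merge is only safe if each child name appears exactly once
data = pd.DataFrame(
    [["a", "k"], ["b", "k"], ["c", "b"]],
    columns=["child", "parent_id"],
)
assert data["child"].is_unique, "duplicate child names would attach the wrong depth"
```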
-
Thanks, would it be possible for you to guide me to some functional code to get started? Also, will the time aspect of the tree be captured during the transformation? Following is my understanding of the Wiener index.
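For reference, a minimal sketch of the standard definition (the Wiener index as the sum of shortest-path distances over all unordered pairs of nodes), applied to the sample edges used later in this thread; the `wiener_index` helper is hypothetical and uses plain BFS rather than bigtree:

```python
from collections import defaultdict, deque

def wiener_index(edges):
    """Sum of shortest-path distances over all unordered node pairs."""
    # Build an undirected adjacency list from (child, parent) edges
    adj = defaultdict(list)
    for child, parent in edges:
        adj[child].append(parent)
        adj[parent].append(child)
    total = 0
    # BFS from every node; trees are unweighted, so BFS gives distances
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # each unordered pair was counted twice

edges = [("a", "k"), ("b", "k"), ("c", "b"), ("d", "c"), ("e", "k"),
         ("f", "k"), ("g", "k"), ("h", "f"), ("i", "f"), ("j", "e")]
print(wiener_index(edges))
```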
-
Below is some functional code showing what I mean by doing the aggregation / mathematical operations using pandas DataFrame operations.

**Set up your data**

```python
import pandas as pd

data = pd.DataFrame(
    [
        ["a", "k", "2022-01-01 01:31"],
        ["b", "k", "2022-01-01 03:12"],
        ["c", "b", "2022-01-01 03:16"],
        ["d", "c", "2022-01-01 04:20"],
        ["e", "k", "2022-01-01 05:38"],
        ["f", "k", "2022-01-01 08:34"],
        ["g", "k", "2022-01-01 08:50"],
        ["h", "f", "2022-01-01 20:27"],
        ["i", "f", "2022-01-02 00:34"],
        ["j", "e", "2022-01-02 00:36"],
    ],
    columns=["child", "parent_id", "posting_date"],
)
```

**Convert dataframe -> tree -> dataframe to extract depth information**

```python
from bigtree import dataframe_to_tree_by_relation, tree_to_dataframe
tree = dataframe_to_tree_by_relation(data)
# Preview tree
tree.show(attr_list=["depth"])
# k [depth=1]
# ├── a [depth=2]
# ├── b [depth=2]
# │ └── c [depth=3]
# │ └── d [depth=4]
# ├── e [depth=2]
# │ └── j [depth=3]
# ├── f [depth=2]
# │ ├── h [depth=3]
# │ └── i [depth=3]
# └── g [depth=2]
data_depth = tree_to_dataframe(tree, name_col="child", attr_dict={"depth": "depth"})
# Preview data_depth
#        path child  depth
#  0       /k     k      1
#  1     /k/a     a      2
#  2     /k/b     b      2
#  3   /k/b/c     c      3
#  4 /k/b/c/d     d      4
#  5     /k/e     e      2
#  6   /k/e/j     j      3
#  7     /k/f     f      2
#  8   /k/f/h     h      3
#  9   /k/f/i     i      3
# 10     /k/g     g      2
```

**Combine dataframes and perform dataframe operations**

```python
data_all = pd.merge(data, data_depth, on=["child"], how="left")
data_all["date"] = pd.to_datetime(data_all["posting_date"]).dt.date
data_all.groupby("date").agg(depth_sum=("depth", "sum"), depth_count=("depth", "count"))
#             depth_sum  depth_count
# date
# 2022-01-01         20            8
# 2022-01-02          6            2
```

From the result of the sum and count of depth, you can retrieve the daily mean (or cumulative mean) depth. To retrieve the median depth, you can collect the depth values as a list, etc. I would think DataFrame operations would still be the most efficient way of performing your calculations.
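As a concrete follow-up, a minimal sketch of reading the daily mean and median depth straight off the merged frame with pandas' built-in aggregations, using the `data_all` frame constructed above:

```python
# Daily mean and median depth in one groupby, no manual sum/count needed
print(data_all.groupby("date")["depth"].agg(["mean", "median"]))
#             mean  median
# date
# 2022-01-01   2.5     2.0
# 2022-01-02   3.0     3.0
```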
-
Thanks. One more question: is it necessary to hard-code the data? For instance, the data I shared here is just a representation, but my original data runs to 18,000 rows.
-
The way `bigtree` is useful here is just to provide a quick and easy way to derive the depth of each node by constructing a tree. If you have an alternative way to derive the depth of each node, then you can bypass the use of bigtree. Hope this helps! If this answers your question, you can mark the comment above as the answer and close this thread 👍
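To make those two points concrete, here is a minimal sketch that loads the data from a file instead of hard-coding it, and derives each node's depth without bigtree by walking parent pointers; the file name `comments.csv` is a hypothetical placeholder:

```python
import pandas as pd

# Hypothetical file name; swap in your actual 18,000-row dataset
data = pd.read_csv("comments.csv")  # columns: child, parent_id, posting_date

# Map each child to its parent, then count hops up to the root
parent = dict(zip(data["child"], data["parent_id"]))

def node_depth(node):
    depth = 1
    while node in parent:  # the root never appears as a child
        node = parent[node]
        depth += 1
    return depth

data["depth"] = data["child"].map(node_depth)
```

If the tree is very deep, a memoised recursive variant (depth(n) = 1 + depth(parent[n])) would avoid re-walking shared ancestor chains, but for trees of this shape the plain loop above is already cheap.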