SWC io: DFS-sort nodes, optionally do not reindex #110

clbarnes · 2022-11-23T14:21:15Z

Previously, nodes were sorted by parent index, which would ensure that root nodes appear before non-roots but did not guarantee that all children are defined after their parent.
Now, nodes are sorted in depth-first order (addressing their children in order of their index).

Additionally, added the option of return_node_map=None, which does not re-index nodes before writing.
This is useful when exporting SWC files of several neurons where node IDs must persist their global uniqueness.

Finally, adds a label "9" for nodes which have both pre- and
post-synapses.

schlegelp · 2022-11-23T14:48:35Z

Out of curiosity: have you experienced issues with the previous sorting when loading the SWC files elsewhere?

Also: Did you do any benchmarking, i.e. does this impact performance for writing large numbers (say >1000) of SWC?

clbarnes · 2022-11-23T16:24:33Z

Out of curiosity: have you experienced issues with the previous sorting when loading the SWC files elsewhere?

I haven't experienced any issues with the previous sorting but I don't think I use any SWC readers which are actually dependent on the order (and wrote a rust CLI for sanitising any SWC files we're sending elsewhere, because of course I did...). But I'm pretty sure the previous implementation, sorting by parent ID, doesn't ensure parents are defined before children: if you trace from dendrite to soma and then sort by parent node ID you'll have the root first, but the second row will be the end with a parent which isn't defined yet.

We could either drop the convention that they are sorted (which doesn't matter much, at least in the SWC implementations I've used/contributed to), and not sort at all, or sort them in a way which does what it needs to, for which I think the DFS is best suited. It does do some possibly-unnecessary sorts internally, so that each node's children is addressed in a predictable order; could cut a little time out by addressing them in arbitrary order, or reverse insertion order.

I haven't done any benchmarking but can give it a go. My intuition is that it will slow down make_swc_table a bit but that that may not matter so much when it's rolled in to the whole process of file IO.

For me, main thrust of the PR is allowing persisting node IDs, the sorting is just something I noticed along the way.

clbarnes · 2022-11-23T17:47:26Z

The DFS sort costs about 9ms for a randomly-shuffled SWC file from navis.data compared to the current sort. The internal sorts cost about 0.3ms. make_swc_table using the other changes in this PR but with the current sort takes ~64ms (so changing to the DFS sort would increase runtime by 14%). write_swc for a single SWC file takes 100ms (so changing to the DFS sort would increase runtime by 9%).

schlegelp · 2022-11-24T10:13:52Z

Thanks for testing this! I think 9% slow down isn't the end of the world.

I guess there are actually three scenarios to consider here:

Leave node IDs as is (which may be useful when working with e.g. CATMAID data but could cause troubles with some parsers)
Re-index nodes such that their IDs are sequential but don't worry about re-ordering them (i.e. half-compliant)
Re-index AND re-order using DFS

It'd be nice if we could cater for all cases but not super hung up on it. I wonder if this warrants changing/clarifying the parameter though? Perhaps something like this:

reindex_nodes :   'DSF' | bool
                    If 'DSF' (default), will use a depth-first search to reindex and reorder nodes
                    such that each parent is guaranteed to be defined before its childs. This comes
                    at the cost of a small overhead but makes the file fully compliant with the SWC
                    format (although most modern parsers don't strictly require this).
                    If True, will re-index nodes but not re-order the nodes (slightly faster) and good
                    enough for most parsers.
                    If False, will not reindex nodes (i.e. their current IDs will be written).

Returns 
--------
node_id_map :    dict
                   If reindex_nodes is not False will return a dictionary mapping the old to the re-indexed node IDs.

If you think scenario 2 is unnecessary and complicates things, I'd be happy to drop it but I think changing the parameter name could be helpful. Thoughts?

clbarnes · 2022-11-24T12:25:48Z

Yes, those are the cases I arrived at for the rust tool too. I guess probably best handled as two bools, "sort" (maybe topo_sort) and "reindex"? Or a single IntEnum, but they're not very clear/common. I would always return the index mapping on reindex (if we're generating it anyway, no reason not to return it). I'd be inclined to return None in its place if no reindexing happened; easier to do type annotations that way, but that would be a breaking change (although so would changing the parameter name).

Previously, nodes were sorted by parent index, which would ensure that root nodes appear before non-roots but did not guarantee that all children are defined after their parent. Now, nodes are sorted in depth-first order (addressing their children in order of their index). Additionally, added the option of return_node_map=None, which does not re-index nodes before writing. This is useful when exporting a SWC files of several neurons along with where node IDs must persist their global uniqueness. Finally, adds a label "9" for nodes which have both pre- and post-synapses.

clbarnes force-pushed the swc-ids branch from 2f28845 to d87fc26 Compare November 23, 2022 14:24

clbarnes added 7 commits July 28, 2023 10:36

Import defaultdict

a8356c5

SWC io: import numpy

168d990

Minor bugfix in writer

be3cc1a

Fix radius bug

e19a6e2

swc: Add 'pre and post' label to header

e169670

Test and fix SWC DFS sorting

2e8296f

clbarnes force-pushed the swc-ids branch from ff99172 to 2e8296f Compare July 28, 2023 10:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SWC io: DFS-sort nodes, optionally do not reindex #110

SWC io: DFS-sort nodes, optionally do not reindex #110

clbarnes commented Nov 23, 2022 •

edited

Loading

schlegelp commented Nov 23, 2022

clbarnes commented Nov 23, 2022 •

edited

Loading

clbarnes commented Nov 23, 2022 •

edited

Loading

schlegelp commented Nov 24, 2022

clbarnes commented Nov 24, 2022

SWC io: DFS-sort nodes, optionally do not reindex #110

Are you sure you want to change the base?

SWC io: DFS-sort nodes, optionally do not reindex #110

Conversation

clbarnes commented Nov 23, 2022 • edited Loading

schlegelp commented Nov 23, 2022

clbarnes commented Nov 23, 2022 • edited Loading

clbarnes commented Nov 23, 2022 • edited Loading

schlegelp commented Nov 24, 2022

clbarnes commented Nov 24, 2022

clbarnes commented Nov 23, 2022 •

edited

Loading

clbarnes commented Nov 23, 2022 •

edited

Loading

clbarnes commented Nov 23, 2022 •

edited

Loading