SWC io: DFS-sort nodes, optionally do not reindex #110
base: master
Out of curiosity: have you experienced issues with the previous sorting when loading the SWC files elsewhere? Also, did you do any benchmarking, i.e. does this impact performance when writing large numbers (say >1000) of SWC files?
I haven't experienced any issues with the previous sorting, but I don't think I use any SWC readers that actually depend on the order (and I wrote a Rust CLI for sanitising any SWC files we're sending elsewhere, because of course I did...). But I'm pretty sure the previous implementation, sorting by parent ID, doesn't ensure parents are defined before children: if you trace from dendrite to soma and then sort by parent node ID, you'll have the root first, but the second row will be the end node, whose parent isn't defined yet. We could either drop the convention that they are sorted (which doesn't matter much, at least in the SWC implementations I've used/contributed to) and not sort at all, or sort them in a way that does what it needs to, for which I think DFS is best suited. It does do some possibly unnecessary sorts internally, so that each node's children are addressed in a predictable order; we could cut a little time by addressing them in arbitrary order, or in reverse insertion order. I haven't done any benchmarking but can give it a go. My intuition is that it will slow things down. For me, the main thrust of the PR is allowing node IDs to persist; the sorting is just something I noticed along the way.
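To illustrate the dendrite-to-soma case, here is a minimal sketch using plain `(node_id, parent_id)` tuples with `-1` marking the root (an assumption for illustration, not the library's internal representation). The chain is traced from a tip (1) to the soma (4) and then rerooted at the soma; `parents_defined_first` is a hypothetical checker, not an existing function.

```python
# Each row: (node_id, parent_id); parent_id == -1 marks the root.
# Hypothetical chain traced from a dendrite tip (1) to the soma (4),
# then rerooted at the soma.
rows = [(1, 2), (2, 3), (3, 4), (4, -1)]


def parents_defined_first(rows):
    """Return True if every non-root row's parent appears on an earlier row."""
    seen = set()
    for node, parent in rows:
        if parent != -1 and parent not in seen:
            return False
        seen.add(node)
    return True


# The old approach: sort rows by parent ID.
by_parent = sorted(rows, key=lambda r: r[1])
print(by_parent)                         # [(4, -1), (1, 2), (2, 3), (3, 4)]
print(parents_defined_first(by_parent))  # False: node 1's parent (2) comes later
```

The root does come first, but the tip is emitted second, before its parent.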
The DFS sort costs about 9 ms for a randomly shuffled SWC file from
Thanks for testing this! I think a 9% slowdown isn't the end of the world. I guess there are actually three scenarios to consider here:
It'd be nice if we could cater for all cases, but I'm not super hung up on it. I wonder if this warrants changing/clarifying the parameter, though? Perhaps something like this:
If you think scenario 2 is unnecessary and complicates things, I'd be happy to drop it, but I think changing the parameter name could be helpful. Thoughts?
Yes, those are the cases I arrived at for the Rust tool too. I guess it's probably best handled as two bools, "sort" (maybe `topo_sort`) and "reindex"? Or a single `IntEnum`, but those aren't very clear/common. I would always return the index mapping on reindex (if we're generating it anyway, there's no reason not to return it). I'd be inclined to return `None` in its place if no reindexing happened; that's easier to type-annotate, but it would be a breaking change (although so would changing the parameter name).
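A rough sketch of the two-flag idea, assuming rows are `(node_id, parent_id)` tuples with `-1` for roots. The names `prepare_rows`, `topo_sort` and `reindex` are hypothetical and only mirror the proposal above; they are not the library's actual API. The node map is returned whenever reindexing happens and is `None` otherwise, which keeps the return type a simple `Optional`.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, int]  # (node_id, parent_id); parent_id == -1 marks a root


def _dfs_sort(rows: List[Row]) -> List[Row]:
    """Depth-first order; roots and siblings are visited in ascending ID order."""
    parent_of = dict(rows)
    children: Dict[int, List[int]] = defaultdict(list)
    for node, parent in rows:
        children[parent].append(node)  # roots are collected under key -1
    out: List[Row] = []
    stack = sorted(children[-1], reverse=True)  # smallest root popped first
    while stack:
        node = stack.pop()
        out.append((node, parent_of[node]))
        stack.extend(sorted(children[node], reverse=True))
    return out


def prepare_rows(
    rows: List[Row], topo_sort: bool = True, reindex: bool = True
) -> Tuple[List[Row], Optional[Dict[int, int]]]:
    """Hypothetical helper: optionally DFS-sort, then optionally renumber.

    Returns (rows, node_map); node_map maps old -> new IDs when
    reindex=True and is None otherwise.
    """
    rows = list(rows)
    if topo_sort:
        rows = _dfs_sort(rows)
    if not reindex:
        return rows, None
    # New IDs are 1..n in output row order; parents are remapped to match.
    node_map = {old: new for new, (old, _) in enumerate(rows, start=1)}
    new_rows = [(node_map[n], -1 if p == -1 else node_map[p]) for n, p in rows]
    return new_rows, node_map


rows = [(30, 20), (20, -1), (40, 20)]
print(prepare_rows(rows))                 # ([(1, -1), (2, 1), (3, 1)], {20: 1, 30: 2, 40: 3})
print(prepare_rows(rows, reindex=False))  # ([(20, -1), (30, 20), (40, 20)], None)
```

Keeping the flags independent covers sorting without renumbering, renumbering without sorting, and both.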
Previously, nodes were sorted by parent index, which would ensure that root nodes appear before non-roots but did not guarantee that all children are defined after their parent.
Now, nodes are sorted in depth-first order (addressing their children in order of their index).
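The ordering described above can be sketched as an iterative pre-order DFS. This is an illustration under assumptions (plain `(node_id, parent_id)` tuples, `-1` for roots, a hypothetical `dfs_order` helper), not the code from this PR.

```python
from collections import defaultdict


def dfs_order(rows):
    """Pre-order DFS over (node_id, parent_id) rows; parent_id == -1 marks a root.

    Roots, and each node's children, are visited in ascending ID order, so every
    reachable node is emitted after its parent. Nodes whose parent is missing
    from rows are never reached.
    """
    children = defaultdict(list)
    roots = []
    for node, parent in rows:
        if parent == -1:
            roots.append(node)
        else:
            children[parent].append(node)
    order = []
    # Push in descending order so the smallest ID is popped (visited) first.
    stack = sorted(roots, reverse=True)
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(sorted(children[node], reverse=True))
    return order


# Soma 4 with a branch at 3: 3 -> 2 -> 1 and 3 -> 5.
print(dfs_order([(1, 2), (2, 3), (3, 4), (4, -1), (5, 3)]))  # [4, 3, 2, 1, 5]
```

The per-node `sorted` calls are the "possibly unnecessary sorts" mentioned in the discussion; dropping them would still yield a valid parent-first order, just not a deterministic one across inputs.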
Additionally, adds the option `return_node_map=None`, which does not re-index nodes before writing. This is useful when exporting SWC files of several neurons where node IDs must remain globally unique.
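Why skipping re-indexing matters for multi-neuron exports, as a small sketch: renumbering each neuron to `1..n` independently makes IDs collide across files, whereas the original IDs (assumed here to be globally unique, as in the description) do not. The neuron data and the `reindex`/`ids` helpers are hypothetical.

```python
def reindex(rows):
    """Renumber (node_id, parent_id) rows to 1..n in row order; -1 stays the root marker."""
    new_id = {old: new for new, (old, _) in enumerate(rows, start=1)}
    return [(new_id[n], -1 if p == -1 else new_id[p]) for n, p in rows]


def ids(rows):
    """Set of node IDs in a list of rows."""
    return {n for n, _ in rows}


# Two hypothetical neurons whose original node IDs are globally unique.
neuron_a = [(101, -1), (102, 101)]
neuron_b = [(205, -1), (206, 205), (207, 206)]

print(sorted(ids(reindex(neuron_a)) & ids(reindex(neuron_b))))  # [1, 2]: collisions
print(sorted(ids(neuron_a) & ids(neuron_b)))                    # []: still unique
```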
Finally, adds a label "9" for nodes which have both pre- and post-synapses.
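A sketch of how a combined label might be chosen per node. Only label 9 comes from the description above; the presynapse (7) and postsynapse (8) values and the `synapse_label` helper are assumptions for illustration. Without a dedicated label, one of the two synapse types would presumably have to take precedence for such nodes.

```python
# Assumed label values: only BOTH_LABEL (9) is stated in the PR description.
PRESYNAPSE_LABEL = 7   # assumption
POSTSYNAPSE_LABEL = 8  # assumption
BOTH_LABEL = 9         # nodes with both pre- and post-synapses


def synapse_label(n_pre: int, n_post: int, default: int = 0) -> int:
    """Hypothetical helper: pick a node label from its synapse counts."""
    if n_pre and n_post:
        return BOTH_LABEL
    if n_pre:
        return PRESYNAPSE_LABEL
    if n_post:
        return POSTSYNAPSE_LABEL
    return default


print(synapse_label(1, 2))  # 9
print(synapse_label(3, 0))  # 7
```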