Releases: tskit-dev/tskit
C API 0.99.8
Minor feature release
New features
- Add tsk_treeseq_genetic_relatedness for calculating genetic relatedness between pairs of sets of nodes (@brieuclehmann, #1021, #1023, #974, #973, #898).
- Exposed tsk_table_collection_set_indexes to the API (@benjeffery, #870, #921).
Breaking changes
- Added an options argument to tsk_table_collection_equals and table equality methods to allow for more flexible equality criteria (e.g., ignore top-level metadata and schema or provenance tables). Existing code should add an extra final parameter 0 to retain the current behaviour (@mufernando, @jeromekelleher, #896, #897, #913, #917).
- Changed default behaviour of tsk_table_collection_clear to not clear provenances and added an options argument to optionally clear provenances and schemas (@benjeffery, #929, #1001).
- Renamed tsk_treeseq_trait_regression to tsk_treeseq_trait_linear_model.
Python 0.3.2
Minor feature release
Breaking changes
- Change several methods (simplify(), trees(), Tree()) so most parameters are keyword only, not positional. This allows reordering of parameters, so that deprecated parameters can be moved, and the parameter order in similar functions, e.g. TableCollection.simplify and TreeSequence.simplify(), can be made consistent (@hyanwong, #374, #846, #851).
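As a rough illustration of what "keyword only" means for calling code, the sketch below uses a bare `*` in a Python signature. The function `simplify_sketch` is a hypothetical stand-in written for this example and is not tskit's implementation; its parameter list is assumed for illustration only.

```python
def simplify_sketch(samples=None, *, filter_sites=True, keep_unary=False):
    """Hypothetical stand-in: everything after the bare * must be passed by name."""
    return {
        "samples": samples,
        "filter_sites": filter_sites,
        "keep_unary": keep_unary,
    }


# Passing options by keyword works and is independent of parameter order.
result = simplify_sketch([0, 1], keep_unary=True)

# Passing the same option positionally is rejected by Python itself.
try:
    simplify_sketch([0, 1], True)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

Code that previously passed such options positionally would need to name them after this kind of change.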
Features
- Tree accessor functions (e.g. ts.first(), ts.at()) pass extra parameters such as sample_indexes to the underlying Tree constructor; also root_threshold can be specified when calling ts.trees() (@hyanwong, #847, #848).
- Genomic intervals returned by python functions are now namedtuples, allowing .left, .right and .span usage (@hyanwong, #784, #786, #811).
- Added include_terminal parameter to edge diffs iterator, to output the last edges at the end of a tree sequence (@hyanwong, #783, #787).
- #832 - Add metadata_bytes method to allow access to raw TableCollection metadata (@benjeffery, #842).
- tskit.is_unknown_time can now check arrays (@benjeffery, #857).
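The namedtuple intervals mentioned in this list can be pictured with a small standard-library sketch. The `Interval` class below is written for illustration and is an assumption about the general shape (two fields plus a derived span), not a copy of tskit's own class.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Illustrative half-open genomic interval [left, right)."""

    left: float
    right: float

    @property
    def span(self) -> float:
        # Length of the interval, derived from the two stored fields.
        return self.right - self.left


iv = Interval(2.0, 5.5)
left, right = iv  # still unpacks like a plain tuple
```

Because the object remains a tuple, existing code that unpacks `(left, right)` keeps working while new code can use attribute access.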
C API 0.99.7
Minor feature release
- Added TSK_INCLUDE_TERMINAL option to tsk_diff_iter_init to output the last edges at the end of a tree sequence (@hyanwong, #783, #787).
- Added tsk_bug_assert for assertions that should be compiled into release binaries (@benjeffery, #860).
Python 0.3.1
Minor bugfix release
Bugfixes
- #823 - Fix mutation time error when using simplify(keep_input_roots=True) (@petrelharp, #823).
- #821 - Fix mutation rows with unknown time never being equal (@petrelharp, #822).
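The second fix concerns times stored as a NaN marker. As a hedged illustration of why such values never compare equal under ordinary floating-point equality, and how comparing bit patterns sidesteps this, here is a standard-library sketch. The bit pattern `SKETCH_UNKNOWN_BITS` and the helper `is_unknown` are hypothetical and chosen for this example; they are not tskit's constants or functions.

```python
import struct

# Hypothetical quiet-NaN bit pattern with a distinguishing payload (assumption).
SKETCH_UNKNOWN_BITS = 0x7FF8000000000001
SKETCH_UNKNOWN = struct.unpack("<d", struct.pack("<Q", SKETCH_UNKNOWN_BITS))[0]


def is_unknown(x: float) -> bool:
    """Compare the raw 64-bit representation instead of using ==."""
    return struct.pack("<d", x) == struct.pack("<Q", SKETCH_UNKNOWN_BITS)


# IEEE 754: a NaN is never equal to anything, including itself.
naive_equal = SKETCH_UNKNOWN == SKETCH_UNKNOWN

# The bit-level check recognises the marker and rejects ordinary values.
marker_found = is_unknown(SKETCH_UNKNOWN)
plain_value_found = is_unknown(1.0)
```

Any row comparison that relies on `==` for such a column will report inequality for two otherwise identical rows, which is the kind of behaviour the fix addresses.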
C API 0.99.6
Bugfixes
- #823 - Fix mutation time error when using tsk_table_collection_simplify with TSK_KEEP_INPUT_ROOTS (@petrelharp, #823).
Python 0.3.0
Major feature release
This release adds metadata schemas, set-like operations, mutation times, SVG drawing improvements and many others. This release also comes with wheels for windows, osx and linux.
❤️ Many thanks go to the tskit community and contributors for their awesome work on this release. ❤️
Breaking changes
- The default display order for tree visualisations has been changed to minlex (see below) to stabilise the node ordering and to make trees more readily comparable. The old behaviour is still available with order="tree".
- File system operations such as dump/load now raise an appropriate OSError instead of tskit.FileFormatError. Loading from an empty file now raises an EOFError.
- Bad tree topologies are detected earlier, so that it is no longer possible to create a TreeSequence object which contains a parent with contradictory children on an interval. Previously an error was thrown when some operation building the trees was attempted (@jeromekelleher, #709).
- The TableCollection object no longer implements the iterator protocol. Previously list(tables) returned a sequence of (table_name, table_instance) tuples. This has been replaced with the more intuitive and future-proof TableCollection.name_map and TreeSequence.tables_dict attributes, which perform the same function (@jeromekelleher, #500, #694).
- The arguments to TreeSequence.genotype_matrix, TreeSequence.haplotypes and TreeSequence.variants must now be keyword arguments, not positional. This is to support the change from impute_missing_data to isolated_as_missing in the arguments to these methods (@benjeffery, #716, #794).
New features
- New methods to perform set operations on TableCollections and TreeSequences. TableCollection.subset subsets and reorders table collections by nodes (@mufernando, @petrelharp, #663, #690). TableCollection.union forms the node-wise union of two table collections (@mufernando, @petrelharp, #381, #623).
- Mutations now have an optional double-precision floating-point time column. If not specified, this defaults to a particular NaN value (tskit.UNKNOWN_TIME) indicating that the time is unknown. For a tree sequence to be considered valid it must meet new criteria for mutation times, see mutation requirements. Also added function TableCollection.compute_mutation_times. Table sorting orders mutations by non-increasing time per-site, which is also a requirement for a valid tree sequence (@benjeffery, #672).
- Add support for trees with internal samples for the Kendall-Colijn tree distance metric (@daniel-goldstein, #610).
- Add background shading to SVG tree sequences to reflect tree position along the sequence (@hyanwong, #563).
- Tables with a metadata column now have a metadata_schema that is used to validate and encode metadata that is passed to add_row and decode metadata on calls to table[j] and e.g. tree_sequence.node(j). See metadata (@benjeffery, #491, #542, #543, #601).
- The tree-sequence now has top-level metadata with a schema (@benjeffery, #666, #644, #642).
- Add classes to SVG drawings to allow easy adjustment and styling, and document the new tskit.Tree.draw_svg() and tskit.TreeSequence.draw_svg() methods. This also fixes #467 for duplicate SVG entity ids in Jupyter notebooks (@hyanwong, #555).
- Add a to_nexus function that outputs a tree sequence in Nexus format (@saunack, #550).
- Add extension of Kendall-Colijn tree distance metric for tree sequences computed by TreeSequence.kc_distance (@daniel-goldstein, #548).
- Add an optional node traversal order in tskit.Tree that uses the minimum lexicographic order of leaf nodes visited. This ordering ("minlex_postorder") adds more determinism because it constrains the order in which children of a node are visited (@brianzhang01, #411).
- Add an order argument to the tree visualisation functions which supports two node orderings: "tree" (the previous default) and "minlex", which stabilises the node ordering (making it easier to compare trees). The default node ordering is changed to "minlex" (@brianzhang01, @jeromekelleher, #389, #566).
- Add _repr_html_ to tables, so that Jupyter notebooks render them as HTML tables (@benjeffery, #514).
- Remove support for kc_distance on trees with unary nodes (@daniel-goldstein, #508).
- Improve Kendall-Colijn tree distance algorithm to operate in O(n^2) time instead of O(n^2 * log(n)) where n is the number of samples (@daniel-goldstein, #490).
- Add a metadata column to the migrations table. Works similarly to existing metadata columns on other tables (@benjeffery, #505).
- Add a metadata column to the edges table. Works similarly to existing metadata columns on other tables (@benjeffery, #496).
- Allow sites with missing data to be output by the haplotypes method, by default replacing with -. Errors are no longer raised for missing data with isolated_as_missing=True; the error types returned for bad alleles (e.g. multiletter or non-ascii) have also changed from _tskit.LibraryError to TypeError, or ValueError if the missing data character clashes (@hyanwong, #426).
- Access the number of children of a node in a tree directly using tree.num_children(u) (@hyanwong, #436).
- User-specified allele mapping for genotypes in variants and genotype_matrix (@jeromekelleher, #430).
- New root_threshold option for the Tree class, which allows us to efficiently iterate over 'real' roots when we have missing data (@jeromekelleher, #462).
- Add tree.as_dict_of_dicts() function to enable use with networkx. See the tutorial (@winni2k, #457).
- Add tree_sequence.to_macs() function to convert tree sequence to MACS format (@winni2k, #727).
- Add a keep_input_roots option to simplify which, if enabled, adds edges from the MRCAs of samples in the simplified tree sequence back to the roots in the input tree sequence (@jeromekelleher, #775, #782).
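The "minlex" ordering described in this list can be sketched without tskit. The example below is a minimal illustration, assuming a tree given as a plain dict mapping each parent to its children and taking "minimum lexicographic order of leaf nodes" to mean sorting children by the smallest leaf ID beneath them. The function name `minlex_postorder` here is this sketch's own and is not the library routine.

```python
def minlex_postorder(children, root):
    """Postorder traversal visiting children in order of their smallest leaf ID."""
    min_leaf = {}

    def fill(u):
        # A leaf's minimum is itself; an internal node takes the minimum over its children.
        kids = children.get(u, [])
        min_leaf[u] = u if not kids else min(fill(v) for v in kids)
        return min_leaf[u]

    fill(root)
    order = []

    def visit(u):
        for v in sorted(children.get(u, []), key=min_leaf.__getitem__):
            visit(v)
        order.append(u)

    visit(root)
    return order


# The same topology stored with two different child orders.
tree_a = {6: [5, 4], 5: [3, 2], 4: [1, 0]}
tree_b = {6: [4, 5], 4: [0, 1], 5: [2, 3]}

order_a = minlex_postorder(tree_a, 6)
order_b = minlex_postorder(tree_b, 6)
```

Both inputs produce the same traversal, which is the sense in which such an ordering makes trees easier to compare.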
Bugfixes
- #453 - Fix LibraryError when tree.newick() is called with large node time values (@jeromekelleher, #637).
Deprecated
- The sample_counts feature has been deprecated and is now ignored. Sample counts are now always computed.
- For TreeSequence.genotype_matrix, TreeSequence.haplotypes and TreeSequence.variants the impute_missing_data argument i...
C API 0.99.5
Breaking changes
- The macro TSK_IMPUTE_MISSING_DATA is renamed to TSK_ISOLATED_NOT_MISSING (@benjeffery, #716, #794).
New features
- Add a TSK_KEEP_INPUT_ROOTS option to simplify which, if enabled, adds edges from the MRCAs of samples in the simplified tree sequence back to the roots in the input tree sequence (@jeromekelleher, #775, #782).
Python 0.3.0beta2
BETA PRE-RELEASE
Second beta of 0.3.0
Changes from beta 1
- Mutation times can be a mixture of known and unknown as long as for each individual site they are either all known or all unknown (@benjeffery, #761).
- Metadata and schemas are stored as canonical JSON to aid byte-wise comparison. Metadata schemas have improved equality methods (@benjeffery, #764).
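To show why a canonical JSON form helps byte-wise comparison, here is a small standard-library sketch. The helper `canonical_json` is written for this example, and the specific choices (sorted keys, compact separators, UTF-8) are assumptions about what "canonical" could mean rather than a statement of tskit's exact encoding.

```python
import json


def canonical_json(obj) -> bytes:
    """Serialise with sorted keys and no optional whitespace (illustrative choice)."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


# Two dicts with the same content but different insertion order.
first = canonical_json({"b": 1, "a": [1, 2]})
second = canonical_json({"a": [1, 2], "b": 1})

# Without canonicalisation the default encodings differ byte for byte.
default_first = json.dumps({"b": 1, "a": [1, 2]}).encode("utf-8")
default_second = json.dumps({"a": [1, 2], "b": 1}).encode("utf-8")
```

With a fixed key order and spacing, equal content yields equal bytes, so comparing stored metadata reduces to comparing byte strings.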
Bugfixes
- Fix too small buffer for newick, causing LibraryError for tree.newick() (@jeromekelleher, #754).
C API 0.99.4
Note
- The TSK_VERSION_PATCH macro was incorrectly set to 4 for 0.99.3, so both 0.99.4 and 0.99.3 have the same value.
Changes
- Mutation times can be a mixture of known and unknown as long as for each
individual site they are either all known or all unknown (@benjeffery, #761).
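The per-site rule above can be expressed as a short check. This is a sketch under stated assumptions: mutations are given as plain `(site_id, time)` pairs, and `math.isnan` stands in for the library's unknown-time test. The function `site_times_consistent` is hypothetical and not part of the C or Python API.

```python
import math
from collections import defaultdict


def site_times_consistent(mutations):
    """Return True if, at every site, times are all known or all unknown."""
    seen = defaultdict(set)
    for site, time in mutations:
        # Record whether this mutation's time is unknown (NaN stand-in).
        seen[site].add(math.isnan(time))
    # A site mixing known and unknown times would record both True and False.
    return all(len(flags) == 1 for flags in seen.values())


nan = float("nan")
valid = site_times_consistent([(0, 1.0), (0, 2.5), (1, nan), (1, nan)])
invalid = site_times_consistent([(0, 1.0), (0, nan)])
```

The first input mixes known and unknown times across sites but not within one, so it passes; the second mixes them within site 0 and fails.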
Bugfixes
- Fix for including core.h under C++ (@petrelharp, #755).
Python 0.3.0.beta1
BETA PRE-RELEASE
Major feature release for metadata schemas, set-like operations, mutation times, SVG drawing improvements and many others. This release comes with wheels for windows, osx and linux.
Breaking changes
- The default display order for tree visualisations has been changed to minlex (see below) to stabilise the node ordering and to make trees more readily comparable. The old behaviour is still available with order="tree".
- File system operations such as dump/load now raise an appropriate OSError instead of tskit.FileFormatError. Loading from an empty file now raises an EOFError.
- Bad tree topologies are detected earlier, so that it is no longer possible to create a TreeSequence object which contains a parent with contradictory children on an interval. Previously an error was thrown when some operation building the trees was attempted (@jeromekelleher, #709).
- The TableCollection object no longer implements the iterator protocol. Previously list(tables) returned a sequence of (table_name, table_instance) tuples. This has been replaced with the more intuitive and future-proof TableCollection.name_map and TreeSequence.tables_dict attributes, which perform the same function (@jeromekelleher, #500, #694).
New features
- New methods to perform set operations on TableCollections and TreeSequences. TableCollection.subset subsets and reorders table collections by nodes (@mufernando, @petrelharp, #663, #690). TableCollection.union forms the node-wise union of two table collections (@mufernando, @petrelharp, #381, #623).
- Mutations now have an optional double-precision floating-point time column. If not specified, this defaults to a particular NaN value (tskit.UNKNOWN_TIME) indicating that the time is unknown. For a tree sequence to be considered valid it must meet new criteria for mutation times, see :ref:sec_mutation_requirements. Also added function TableCollection.compute_mutation_times. Table sorting orders mutations by non-increasing time per-site, which is also a requirement for a valid tree sequence (@benjeffery, #672).
- Add support for trees with internal samples for the Kendall-Colijn tree distance metric (@daniel-goldstein, #610).
- Add background shading to SVG tree sequences to reflect tree position along the sequence (@hyanwong, #563).
- Tables with a metadata column now have a metadata_schema that is used to validate and encode metadata that is passed to add_row and decode metadata on calls to table[j] and e.g. tree_sequence.node(j). See :ref:sec_metadata (@benjeffery, #491, #542, #543, #601).
- The tree-sequence now has top-level metadata with a schema (@benjeffery, #666, #644, #642).
- Add classes to SVG drawings to allow easy adjustment and styling, and document the new tskit.Tree.draw_svg() and tskit.TreeSequence.draw_svg() methods. This also fixes #467 for duplicate SVG entity ids in Jupyter notebooks (@hyanwong, #555).
- Add a nexus function that outputs a tree sequence in Nexus format (@saunack, #550).
- Add extension of Kendall-Colijn tree distance metric for tree sequences computed by TreeSequence.kc_distance (@daniel-goldstein, #548).
- Add an optional node traversal order in tskit.Tree that uses the minimum lexicographic order of leaf nodes visited. This ordering ("minlex_postorder") adds more determinism because it constrains the order in which children of a node are visited (@brianzhang01, #411).
- Add an order argument to the tree visualisation functions which supports two node orderings: "tree" (the previous default) and "minlex", which stabilises the node ordering (making it easier to compare trees). The default node ordering is changed to "minlex" (@brianzhang01, @jeromekelleher, #389, #566).
- Add _repr_html_ to tables, so that Jupyter notebooks render them as HTML tables (@benjeffery, #514).
- Remove support for kc_distance on trees with unary nodes (@daniel-goldstein, #508).
- Improve Kendall-Colijn tree distance algorithm to operate in O(n^2) time instead of O(n^2 * log(n)) where n is the number of samples (@daniel-goldstein, #490).
- Add a metadata column to the migrations table. Works similarly to existing metadata columns on other tables (@benjeffery, #505).
- Add a metadata column to the edges table. Works similarly to existing metadata columns on other tables (@benjeffery, #496).
- Allow sites with missing data to be output by the haplotypes method, by default replacing with -. Errors are no longer raised for missing data with impute_missing_data=False; the error types returned for bad alleles (e.g. multiletter or non-ascii) have also changed from _tskit.LibraryError to TypeError, or ValueError if the missing data character clashes (@hyanwong, #426).
- Access the number of children of a node in a tree directly using tree.num_children(u) (@hyanwong, #436).
- User-specified allele mapping for genotypes in variants and genotype_matrix (@jeromekelleher, #430).
- New root_threshold option for the Tree class, which allows us to efficiently iterate over 'real' roots when we have missing data (@jeromekelleher, #462).
- Add tree.as_dict_of_dicts() function to enable use with networkx. See :ref:sec_tutorial_networkx (@winni2k, #457).
Bugfixes
- #453 - Fix LibraryError when tree.newick() is called with large node time values (@jeromekelleher, #637).
Deprecated
- The sample_counts feature has been deprecated and is now ignored. Sample counts are now always computed.