File formats

tsinfer uses the excellent zarr library to encode data in a form that is both compact and efficient to process. See the API documentation for details on how to construct and manipulate these files using Python. The tsinfer list command prints a summary of these files.

Samples File

The samples file is tsinfer's input format. Data must be converted into this format, using the SampleData class, before tsinfer can process it.

Todo

Document the structure of the samples file.

Ancestors File

The ancestors file contains the ancestral haplotype data inferred from the sample data in the Generate ancestors step.

Todo

Document the structure of the ancestors file.

Tree sequences

The goal of tsinfer is to infer correlated genealogies from variation data, and it uses the very efficient succinct tree sequence data structure to encode this output. Please see the tskit documentation for details on how to process and manipulate such tree sequences.

The intermediate .ancestors.trees file produced by the Match ancestors step is also a tree sequence and can be loaded and analysed using the tskit API.