samtools · tomwhite · Jun 5, 2018 · Jun 5, 2018 · Jun 6, 2018 · Jun 7, 2018
diff --git a/SAMv1.tex b/SAMv1.tex
@@ -1204,6 +1204,48 @@ \subsection{C source code for computing bin number and overlapping bins}\label{s
 \end{verbatim}
 }
 
+\subsection{Splitting BAM}\label{sec:splitting}
+A BAM file can be processed in parallel by conceptually dividing the file into
+splits (typically of a fixed, but arbitrary, number of bytes) and, for each
+split, processing the alignments from the first known alignment at or after
+the split start up to (but not including) the first known alignment of the
+next split.
+
+A splitting BAM index is a linear index of virtual file offsets of alignment
+start positions. The index must contain the virtual file offset for the first
+alignment, and a virtual file offset for the overall length of the BAM
+file.\footnote{In the unlikely event the BAM file has no alignment records,
+the index will consist of a single entry for the overall length of the
+BAM file.} It does not need to contain a virtual file offset for every
+alignment, merely a subset. A granularity of $n$ means that an offset is
+written for every $n$th alignment.
+
+To find the alignments for a split that covers a byte range {\tt [beg,\,end)},
+use the index to find the smallest virtual file offset, {\tt v1}, that falls
+in this range, and the smallest virtual file offset, {\tt v2}, that is
+greater than or equal to {\tt end}. If {\tt v1} does not exist, then the
+split has no alignments. Otherwise, it has alignments in the range
+{\tt [v1,\,v2)}. This method will map a set of contiguous, non-overlapping
+{\it file ranges} that cover the whole BAM file to a set of contiguous,
+non-overlapping {\it virtual file ranges} that cover the whole file.
+
+Splitting BAM index filenames have a {\tt .sbi} extension added to the BAM
+filename (so {\tt foo.bam.sbi} is the splitting BAM index filename for
+{\tt foo.bam}). Index files contain a header followed by a list of
+virtual file offsets sorted in ascending order.
+
+\begin{table}[ht]
+\centering
+{\small
+\begin{tabular}{|l|l|l|p{8.15cm}|l|r|}
+  \cline{1-6}
+  \multicolumn{3}{|c|}{\bf Field} & \multicolumn{1}{c|}{\bf Description} & \multicolumn{1}{c|}{\bf Type} & \multicolumn{1}{c|}{\bf Value} \\\cline{1-6}
+  \multicolumn{3}{|l|}{\sf magic} & Magic string & {\tt char[4]} & {\tt SBI\char92 1}\\\cline{1-6}
+  \multicolumn{3}{|l|}{\sf granularity} & Number of alignments between offsets, or $-1$ if unspecified & {\tt int32\_t} & \\\cline{1-6}
+  \multicolumn{6}{|c|}{\textcolor{gray}{\it List of offsets}} \\\cline{2-6}
+  & \multicolumn{2}{l|}{\sf offset} & Virtual file offset of the alignment & {\tt uint64\_t} & \\\cline{1-6}
+\end{tabular}}
+\end{table}
+
 \pagebreak
 
 \begin{appendices}
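
The index layout and split lookup described in the new subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the patch: the function names (`write_sbi`, `read_sbi`, `split_virtual_range`) are hypothetical, the fields are assumed to be little-endian as elsewhere in BAM (the table does not say so), and a virtual file offset is assumed to "fall in" a byte range when its compressed block offset (the upper 48 bits) does, which is implemented by comparing against `beg << 16` and `end << 16`.

```python
import struct
from bisect import bisect_left

SBI_MAGIC = b"SBI\x01"  # the magic value from the table, i.e. "SBI\1"


def write_sbi(offsets, granularity=-1):
    """Pack magic, int32 granularity, then uint64 offsets in ascending order."""
    # Assumption: little-endian byte order ("<"), as used elsewhere in BAM.
    offsets = sorted(offsets)
    fmt = "<4si%dQ" % len(offsets)
    return struct.pack(fmt, SBI_MAGIC, granularity, *offsets)


def read_sbi(data):
    """Return (granularity, offsets) parsed from the bytes of an index file."""
    magic, granularity = struct.unpack_from("<4si", data, 0)
    if magic != SBI_MAGIC:
        raise ValueError("not a splitting BAM index")
    count = (len(data) - 8) // 8  # 8-byte header, then 8 bytes per offset
    offsets = list(struct.unpack_from("<%dQ" % count, data, 8))
    return granularity, offsets


def split_virtual_range(offsets, beg, end):
    """Return (v1, v2) for the byte range [beg, end), or None if it is empty."""
    if not offsets:
        raise ValueError("index must hold at least the end-of-file entry")
    # Assumption: compare virtual offsets against the byte positions shifted
    # into the compressed-offset part of a virtual file offset.
    i = bisect_left(offsets, beg << 16)  # smallest offset >= beg
    j = bisect_left(offsets, end << 16)  # smallest offset >= end
    # If end lies past the end of the file, fall back to the last entry,
    # which the text requires to be the overall file length.
    j = min(j, len(offsets) - 1)
    if i >= j:
        return None  # no indexed alignment starts inside this split
    return offsets[i], offsets[j]


if __name__ == "__main__":
    # Hypothetical index for a 60000-byte BAM file; the last entry is the
    # file length expressed as a virtual file offset.
    index = [100 << 16, (100 << 16) | 5000, (20000 << 16) | 300,
             45000 << 16, 60000 << 16]
    granularity, parsed = read_sbi(write_sbi(index, granularity=2))
    for beg in range(0, 60000, 20000):
        print(beg, beg + 20000, split_virtual_range(parsed, beg, beg + 20000))
```

Because each split's `v2` is the next non-empty split's `v1`, the returned virtual ranges tile the file without gaps or overlaps, which is the property the subsection states. A binary search is used here only because the offsets are stored sorted; a linear scan would give the same result.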