Base class for parallel processing of sequencing reads.
Implement __call__ to define the work for each chunk of reads (e.g., a chromosome). Unaligned reads are permitted, but the work then cannot rely on any biologically meaningful chunking of the reads unless a partition() function is implemented. If unaligned reads are used and no partition() is implemented, reads will be split arbitrarily into chunks.
def check_command(self, cmd)
Determine whether it appears that a command may be run.
cmd (str): command to check for runnability
OSError: if it's possible to verify that running the given command would fail
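How pararead performs this check is not shown in this reference. As a minimal sketch, assuming the check amounts to looking the executable up on PATH, a function like the one below would behave as described. The name `command_is_runnable` is hypothetical and is not part of pararead.

```python
import shutil


def command_is_runnable(cmd):
    """Hypothetical stand-in: raise OSError if `cmd` cannot be found."""
    # shutil.which returns None when no matching executable is located.
    if shutil.which(cmd) is None:
        raise OSError("Command not found: {}".format(cmd))
    return True
```

A caller would typically run this once before dispatching any work, so that a missing external tool fails fast rather than inside a worker process.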
def chunk_reads(*args, **kwargs)
def combine(self, good_chromosomes, strict=False, chrom_sep=None)
Aggregate output from independent read chunks into single output file.
good_chromosomes (Iterable[str]): identifier (e.g., chromosome) for each chunk of reads processed.
strict (bool): whether to throw an exception upon encountering a missing file. If not, simply log a warning message and continue the aggregation process that's underway, working with what is available.
chrom_sep (str): delimiter between output from each chromosome.
Iterable[str]: path to each file successfully combined.
pararead.exceptions.MissingOutputFileException: if executing in strict mode, and there's a reads chunk key for which the derived filepath does not exist.
pararead.exceptions.IllegalChunkException: if a chunk of reads outside of those declared to be of interest is requested to participate in the combination.
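The strict versus lenient behavior described for combine can be illustrated with a small standalone sketch. It is not pararead's implementation: `combine_chunk_files`, the `chunk_paths` mapping, and `MissingOutputFileError` are hypothetical names introduced here, and the check for chunks outside those of interest is omitted.

```python
import logging
import os

_LOGGER = logging.getLogger(__name__)


class MissingOutputFileError(Exception):
    """Hypothetical stand-in for pararead's MissingOutputFileException."""


def combine_chunk_files(chunk_paths, output_path, strict=False, chrom_sep=None):
    """Concatenate per-chunk files; return the paths actually combined.

    chunk_paths is an assumed mapping from chunk key to that chunk's file.
    """
    combined = []
    with open(output_path, "w") as out:
        for key, path in chunk_paths.items():
            if not os.path.isfile(path):
                if strict:
                    raise MissingOutputFileError(
                        "No output for chunk {!r}: {}".format(key, path))
                # Lenient mode: warn and keep going with what is available.
                _LOGGER.warning("Skipping missing file for chunk %s", key)
                continue
            # Write the delimiter only between chunks that were combined.
            if chrom_sep and combined:
                out.write(chrom_sep)
            with open(path) as chunk_file:
                out.write(chunk_file.read())
            combined.append(path)
    return combined
```

In lenient mode the return value is what tells the caller which chunks made it into the output, which is why the documented return is the list of successfully combined paths rather than the output path.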
Action to take when processing an empty reads chunk.
str): key for the empty chunk of reads.
def fetch_chunk(self, chromosome)
Pull a chunk of sequencing reads from a file.
chromosome (str): identifier for chunk of reads to select.
Iterable[pysam.AlignedSegment]: collection of aligned reads
def fetch_file(self, file_key)
Retrieve one of the files registered with pararead.
file_key (str): which file to fetch
object: likely pysam.AlignmentFile -- file ADT instance associated with the requested key.
pararead.exceptions.CommandOrderException: if the indicated file hasn't been registered.
Refer to the pararead files mapping.
Mapping[str, object]: pararead files mapping.
def get_chrom_size(self, chrom)
Determine the size of the given chromosome.
chrom (str): name of chromosome of interest.
int: size of chromosome of interest.
pararead.exceptions.CommandOrderException: if there's no chromosome sizes map yet.
pararead.exceptions.UnknownChromosomeException: if requested chromosome is not in the sizes map.
pysam.AlignmentFile | pysam.VariantFile: instance of the reads file abstraction appropriate for the given type of input data (e.g., BAM or VCF).
pararead.exceptions.CommandOrderException: if a command prerequisite for a parallel reads processor operation has not yet been performed.
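The two failure modes above (no sizes map yet versus a name missing from the map) can be sketched as follows. This is an illustration only: `lookup_chrom_size` and both exception classes are hypothetical stand-ins, and representing "no map yet" as None is an assumption.

```python
class CommandOrderError(Exception):
    """Hypothetical stand-in for pararead's CommandOrderException."""


class UnknownChromosomeError(Exception):
    """Hypothetical stand-in for pararead's UnknownChromosomeException."""


def lookup_chrom_size(chrom, chrom_sizes):
    """Return the size of chrom from a name-to-length mapping.

    chrom_sizes is None when no sizes map has been built yet (assumption).
    """
    if chrom_sizes is None:
        raise CommandOrderError("Chromosome sizes have not been loaded yet")
    try:
        return chrom_sizes[chrom]
    except KeyError:
        raise UnknownChromosomeError("Unknown chromosome: {}".format(chrom))


# Illustrative sizes map with example values.
sizes = {"chr1": 248956422, "chrM": 16569}
print(lookup_chrom_size("chrM", sizes))  # prints 16569
```

Keeping the two errors distinct lets a caller tell a sequencing problem in its own code (sizes requested too early) apart from a bad chromosome name in the input.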
def register_files(self, **file_builder_kwargs)
Add to module map any large/unpicklable variables required by __call__.
pararead.exceptions.FileTypeException: if path to the reads file given doesn't appear to match one of the supported file types.
def run(self, chunksize=None, interleave_chunk_sizes=False)
Run the defined processing, partitioned across each unit (chromosome).
chunksize (int): number of reads per processing chunk; if unspecified, a default heuristic is used that sizes chunks such that each core gets about 4 chunks.
interleave_chunk_sizes (bool): whether to interleave reads chunk sizes. If off (default), just use the distribution that Python determines.
Iterable[str]: names of chromosomes for which result is non-null.
pararead.exceptions.MissingHeaderException: if attempting to run with an unaligned reads file in the context of an aligned file requirement.
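The "about 4 chunks per core" default matches the heuristic that CPython's multiprocessing.Pool.map applies when no chunksize is given. Whether pararead delegates to that code or reimplements it is not stated here, so the function below is only a sketch of the arithmetic; `default_chunksize` is a hypothetical name.

```python
def default_chunksize(n_items, n_workers, chunks_per_worker=4):
    """Mirror CPython's Pool.map default: ceil(n_items / (n_workers * 4))."""
    if n_items == 0:
        return 0
    chunksize, extra = divmod(n_items, n_workers * chunks_per_worker)
    # Round up so that no items are left without a chunk.
    if extra:
        chunksize += 1
    return chunksize


print(default_chunksize(1000, 4))  # prints 63
print(default_chunksize(32, 2))    # prints 4
```

With 1000 items and 4 workers, the target is 16 chunks; 1000 / 16 is 62.5, which rounds up to 63 items per chunk.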
pararead v0.7.0, generated by