Package pararead Documentation

Class ParaReadProcessor

Base class for parallel processing of sequencing reads.

Implement __call__ to define the work for each chunk of reads (e.g., a chromosome). Unaligned reads are permitted, but the work then cannot rely on any biologically meaningful chunking of the reads unless a partition() function is implemented. If unaligned reads are used and no partition() is implemented, reads will be split into chunks arbitrarily.

def __init__(self, path_reads_file, cores, outfile=None, action=None, temp_folder_parent_path=None, limit=None, allow_unaligned=False, require_new_outfile=False, by_chromosome=True, intermediate_output_type='txt', output_type='txt', retain_temp=False)

Parameters:

  • path_reads_file (str): data location (aligned BAM/SAM file).
  • cores (int | str): number of processors to use.
  • outfile (str): path to the output file; either this or action is required.
  • action (str): name for what the child class is doing, used to derive the outfile name if unspecified; if outfile is unspecified, this is required.
  • temp_folder_parent_path (str): temporary folder; if unspecified, this will match the folder containing the output file.
  • limit (list[str]): which chromosomes to process; all are processed by default.
  • allow_unaligned (bool): whether to allow unaligned reads.
  • require_new_outfile (bool): whether to raise an exception if the output file already exists.
  • by_chromosome (bool): whether to chunk reads on a per-chromosome basis, implicitly imposing a requirement for aligned reads.
  • intermediate_output_type (str): type of output file generated for each chunk of reads processed.
  • output_type (str): type of final output file generated; this is used both by the intermediate files that are created and by the combine() step that creates the final output.
  • retain_temp (bool): whether to retain temporary files.

Raises:

  • ValueError: if given neither an outfile path nor an action name, or if the output file already exists and a new one is required.
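
The outfile/action rules above can be sketched in plain Python. This is an illustration only, not pararead's implementation: the helper name resolve_outfile and the derived filename pattern "<action>.<output_type>" are assumptions made for the example.

```python
import os


def resolve_outfile(outfile=None, action=None, output_type="txt",
                    require_new_outfile=False):
    """Hypothetical helper mirroring the documented outfile/action rules."""
    if not outfile:
        if not action:
            # Neither an explicit path nor a name to derive one from.
            raise ValueError("Either outfile or action is required.")
        # Assumed naming scheme; the real derivation may differ.
        outfile = "{}.{}".format(action, output_type)
    if require_new_outfile and os.path.exists(outfile):
        # The caller asked for a fresh file, but the path is already taken.
        raise ValueError("Output file already exists: {}".format(outfile))
    return outfile
```

Under these assumptions, resolve_outfile(action="count") yields "count.txt", while calling it with neither argument raises ValueError.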
def check_command(self, cmd)

Determine whether it appears that a command may be run.

Parameters:

  • cmd (str): command to check for runnability

Raises:

  • OSError: if it's possible to verify that running the given command would fail.
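
One common way to check whether a command can be run is to look it up on the PATH with the standard library's shutil.which. The sketch below shows that approach; the helper names command_available and require_command are hypothetical, and pararead's own check may work differently.

```python
import shutil


def command_available(cmd):
    """Return True if cmd resolves to an executable on the current PATH."""
    return shutil.which(cmd) is not None


def require_command(cmd):
    """Hypothetical wrapper: raise OSError when cmd cannot be found."""
    if not command_available(cmd):
        raise OSError("Command not found on PATH: {}".format(cmd))
    return cmd
```

A caller could run require_command("samtools") before starting work, so that a missing tool fails early rather than partway through processing.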
def chunk_reads(*args, **kwargs)

def combine(self, good_chromosomes, strict=False, chrom_sep=None)

Aggregate output from independent read chunks into single output file.

Parameters:

  • good_chromosomes (Iterable[str]): identifier (e.g., chromosome) for each chunk of reads processed.
  • strict (bool): whether to throw an exception upon encountering a missing file. If not, simply log a warning message and continue the aggregation, working with what is available.
  • chrom_sep (str): delimiter between output from each chromosome.

Returns:

  • Iterable[str]: path to each file successfully combined.

Raises:

  • pararead.exceptions.MissingOutputFileException: if executing in strict mode, and there's a reads chunk key for which the derived filepath does not exist.
  • pararead.exceptions.IllegalChunkException: if a chunk of reads outside of those declared to be of interest is requested to participate in the combination.
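
The aggregation behavior described above (concatenate per-chunk output, optionally insert a delimiter, and either fail or warn on a missing file) can be sketched without pararead. Everything here is a stand-in: combine_chunks is a hypothetical function, the "<key>.txt" naming of per-chunk files is an assumption, and FileNotFoundError takes the place of MissingOutputFileException.

```python
import logging
import os
import tempfile


def combine_chunks(chunk_keys, folder, outfile, strict=False, chrom_sep=None):
    """Concatenate per-chunk files into outfile; return the paths combined."""
    combined = []
    with open(outfile, "w") as out:
        for key in chunk_keys:
            # Assumed naming scheme for per-chunk files: "<key>.txt".
            path = os.path.join(folder, "{}.txt".format(key))
            if not os.path.isfile(path):
                if strict:
                    raise FileNotFoundError(path)
                # Non-strict mode: warn and keep going with what exists.
                logging.getLogger(__name__).warning("Missing chunk file: %s", path)
                continue
            if chrom_sep is not None and combined:
                # Delimiter goes between outputs, not before the first one.
                out.write(chrom_sep)
            with open(path) as chunk:
                out.write(chunk.read())
            combined.append(path)
    return combined


# Small demonstration: two chunk files exist, a third ("chr3") is missing.
with tempfile.TemporaryDirectory() as tmp:
    for key, text in [("chr1", "a\n"), ("chr2", "b\n")]:
        with open(os.path.join(tmp, key + ".txt"), "w") as handle:
            handle.write(text)
    result = os.path.join(tmp, "combined.txt")
    used = combine_chunks(["chr1", "chr2", "chr3"], tmp, result, chrom_sep="--\n")
    with open(result) as handle:
        combined_text = handle.read()
```

In the demonstration, the missing chr3 file only triggers a warning because strict is left off; passing strict=True would raise instead.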
def empty_action(read_chunk_key=None)

Action to take when processing an empty reads chunk.

Parameters:

  • read_chunk_key (str): key for the empty chunk of reads.

def fetch_chunk(self, chromosome)

Pull a chunk of sequencing reads from a file.

Parameters:

  • chromosome (str): identifier for chunk of reads to select.

Returns:

  • Iterable[pysam.AlignedSegment]: collection of aligned reads.

def fetch_file(self, file_key)

Retrieve one of the files registered with pararead.

Parameters:

  • file_key (str): which file to fetch

Returns:

  • object: file ADT instance (likely a pysam.AlignmentFile) associated with the requested key.

Raises:

  • pararead.exceptions.CommandOrderException: if the indicated file hasn't been registered.

def files(self)

Refer to the pararead files mapping.

Returns:

  • Mapping[str, object]: pararead files mapping.

def get_chrom_size(self, chrom)

Determine the size of the given chromosome.

Parameters:

  • chrom (str): name of chromosome of interest.

Returns:

  • int: size of chromosome of interest.

Raises:

  • pararead.exceptions.CommandOrderException: if there's no chromosome sizes map yet.
  • pararead.exceptions.UnknownChromosomeException: if the requested chromosome is not in the sizes map.

def readsfile(self)

Returns:

  • pysam.AlignmentFile | pysam.VariantFile: instance of the reads file abstraction appropriate for the given type of input data (e.g., BAM or VCF).

Raises:

  • pararead.exceptions.CommandOrderException: if a command prerequisite for a parallel reads processor operation has not yet been performed.

def register_files(self, **file_builder_kwargs)

Add to the module map any large/unpicklable variables required by __call__.

Raises:

  • pararead.exceptions.FileTypeException: if the given path to the reads file doesn't appear to match one of the supported file types.

def run(self, chunksize=None, interleave_chunk_sizes=False)

Run the defined processing, partitioned across units (e.g., chromosomes).

Parameters:

  • chunksize (int): number of reads per processing chunk; if unspecified, a default heuristic sizes the chunks so that each core gets about 4 chunks.
  • interleave_chunk_sizes (bool): whether to interleave reads chunk sizes. If off (default), just use the distribution that Python determines.

Returns:

  • Iterable[str]: names of chromosomes for which result is non-null.

Raises:

  • pararead.exceptions.MissingHeaderException: if attempting to run with an unaligned reads file in the context of an aligned file requirement.
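
The "about 4 chunks per core" default for chunksize matches the heuristic that CPython's multiprocessing.Pool.map applies when no chunksize is given. The sketch below reproduces that arithmetic for illustration; default_chunksize is a hypothetical name, and whether pararead delegates to exactly this heuristic is an assumption based on the wording above.

```python
def default_chunksize(n_items, n_workers):
    """Chunk size multiprocessing.Pool.map picks when chunksize is None."""
    if n_items == 0:
        return 0
    # Aim for roughly four chunks per worker, rounding the size up.
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


print(default_chunksize(1000, 4))  # 63
print(default_chunksize(24, 2))    # 3
```

With 1000 items and 4 workers, the target is 16 chunks, so each chunk holds 63 items and the last one is smaller.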

Version Information: pararead v0.7.0, generated by lucidoc v0.4.4