cogent3.core.seq_storage.SeqsData

class SeqsData(*, data: Mapping[str, str | bytes | NumpyIntArrayType], alphabet: c3_alphabet.CharAlphabet[Any], offset: dict[str, int] | None = None, check: bool = True, reversed_seqs: set[str] | None = None)

The builtin cogent3 implementation of sequence storage underlying a SequenceCollection. The sequence data is stored as numpy arrays. Indexing this object (using an int or seq name) returns a SeqDataView, which can realise the corresponding slice as a string, bytes, or numpy array via the alphabet.
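The storage model described above can be illustrated with a small standalone sketch. This is not cogent3's implementation: `ToyAlphabet`, `ToyStore` and `ToyView`, along with their attribute names, are hypothetical and exist only for this example, and plain Python lists stand in for the numpy arrays cogent3 uses. It shows sequences encoded once as alphabet indices, with a view that decodes them to a string, bytes, or index array on request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyAlphabet:
    """Hypothetical stand-in for a character alphabet."""

    chars: str

    def to_indices(self, seq: str) -> list:
        # Encoding doubles as validation: an unknown character raises ValueError.
        return [self.chars.index(c) for c in seq]

    def from_indices(self, indices: list) -> str:
        return "".join(self.chars[i] for i in indices)


@dataclass(frozen=True)
class ToyView:
    """Hypothetical view: records which slice is wanted and decodes it lazily."""

    store: "ToyStore"
    seqid: str
    start: int = 0
    stop: Optional[int] = None

    @property
    def array_value(self) -> list:
        return self.store.encoded[self.seqid][self.start : self.stop]

    @property
    def str_value(self) -> str:
        return self.store.alphabet.from_indices(self.array_value)

    @property
    def bytes_value(self) -> bytes:
        return self.str_value.encode("utf8")


class ToyStore:
    """Hypothetical storage: holds every sequence as a list of alphabet indices."""

    def __init__(self, data: dict, alphabet: ToyAlphabet):
        self.alphabet = alphabet
        self.encoded = {name: alphabet.to_indices(seq) for name, seq in data.items()}
        self.names = tuple(data)

    def __getitem__(self, key):
        # An int selects by position in names, a str selects by sequence name.
        seqid = self.names[key] if isinstance(key, int) else key
        return ToyView(self, seqid)


store = ToyStore({"s1": "ACGT", "s2": "GGTA"}, ToyAlphabet("TCAG"))
view = store["s2"]
print(view.array_value)  # [3, 3, 0, 2]
print(view.str_value)  # GGTA
```

Keeping the encoded form as the single source of truth means the string and bytes forms are derived only when a caller asks for them.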

Attributes
alphabet

the character alphabet for validating, encoding, decoding sequences

names

returns the names of the sequences in the storage

offset

annotation offsets for each sequence

reversed_seqs

names of sequences that are reverse complemented

Methods

add_seqs(seqs[, force_unique_keys, offset])

Returns a new SeqsData object with added sequences.

copy(**kwargs)

shallow copy of self

get_hash(seqid)

returns hash of seqid

get_seq_bytes(*, seqid[, start, stop, step])

Return ungapped sequence corresponding to seqid as a bytes string.

get_seq_length(seqid)

return length for seqid

get_seq_str(*, seqid[, start, stop, step])

Return ungapped sequence corresponding to seqid as a string.

get_view(seqid)

returns view of sequence data for seqid

from_seqs

get_seq_array

to_alphabet

Notes

Methods on this object accept only plus strand start, stop and step indices for selecting segments of data. It can return the gap coordinates for a sequence as used by IndelMap.
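To illustrate what the plus strand restriction means for a caller, here is a minimal standalone sketch. It assumes a plain DNA string and half-open coordinates; the helper names `to_plus_strand` and `minus_strand_segment` are hypothetical and are not part of cogent3. A segment requested in minus strand coordinates is first mapped to plus strand indices, selected, and only then reverse complemented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_plus_strand(start: int, stop: int, length: int) -> tuple:
    """Map a half-open [start, stop) minus strand interval to plus strand indices."""
    return length - stop, length - start


def minus_strand_segment(plus_seq: str, start: int, stop: int) -> str:
    # 1. Convert the requested minus strand coordinates to plus strand ones.
    p_start, p_stop = to_plus_strand(start, stop, len(plus_seq))
    # 2. Select the segment using plus strand indices only.
    segment = plus_seq[p_start:p_stop]
    # 3. Reverse complement the result to express it on the minus strand.
    return segment.translate(COMPLEMENT)[::-1]


plus = "AACCGT"  # its reverse complement is "ACGGTT"
print(minus_strand_segment(plus, 1, 4))  # CGG
```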