Read data from Binary Sequence Alignment/Map (BAM) file
BAMStruct = bamread(File,RefSeq,Range)
[BAMStruct,HeaderStruct] = bamread(File,RefSeq,Range)
... = bamread(File,RefSeq,Range,Name,Value)
BAMStruct = bamread(File,RefSeq,Range) reads the alignment records in File, a BAM-formatted file, that align to RefSeq, a reference sequence, in the range specified by Range. It returns the alignment data in BAMStruct, a MATLAB® array of structures.
String specifying a file name or path and file name of a BAM-formatted file. If you specify only a file name, that file must be on the MATLAB search path or in the Current Folder.
Either of the following:
Two-element vector specifying the begin and end positions of the range on the reference sequence, RefSeq. Both values are one-based and must be positive. The second value must be greater than or equal to the first value.
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Logical that controls whether bamread returns only the alignment records that are fully contained within the range specified by Range. Choices are true or false (default).
Logical that controls whether bamread reads the optional tags in addition to the first 11 fields for each alignment in the BAM-formatted file. Choices are true (default) or false.
String specifying a nonexistent file name or a path and file name for saving the alignment records in the specified range of a specific reference sequence. The ToFile name-value pair argument creates a SAM-formatted file. If you specify only a file name, the file is saved to the MATLAB Current Folder.
The SAM-formatted file is always one-based, even if you set the ZeroBased name-value pair argument to true. You can use the SAM-formatted file as input when creating a BioMap object.
Logical specifying whether bamread uses zero-based indexing when reading a file. The logical controls the return of zero-based or one-based positions in the Position and MatePosition fields in BAMStruct. Choices are true or false (default), which returns one-based positions.
This name-value pair argument affects the Position and MatePosition fields of BAMStruct. It does not affect the Range input argument or the SAM file created when using the ToFile name-value pair argument. SAM files are always one-based.
An N-by-1 array of structures containing sequence alignment and mapping information from a BAM-formatted file, where N is the number of alignment records stored in the specified range. Each structure contains the following fields.
MATLAB structure containing header information for the BAM-formatted file in the following fields.
* These structures and their fields appear in the output structure only if they are present in the BAM file. The information in these structures depends on the information present in the BAM file.
Read multiple alignment records from the ex1.bam file that align to two different reference sequences.
data1 = bamread('ex1.bam', 'seq1', [100 200])
data2 = bamread('ex1.bam', 'seq2', [100 200])

data1 =
59x1 struct array with fields:
    QueryName
    Flag
    Position
    MappingQuality
    CigarString
    MatePosition
    InsertSize
    Sequence
    Quality
    Tags
    ReferenceIndex
    MateReferenceIndex

data2 =
79x1 struct array with fields:
    QueryName
    Flag
    Position
    MappingQuality
    CigarString
    MatePosition
    InsertSize
    Sequence
    Quality
    Tags
    ReferenceIndex
    MateReferenceIndex
Read alignments from the ex1.bam file that are fully contained in the 100 to 200 bp range of the seq1 reference sequence.
data3 = bamread('ex1.bam', 'seq1', [100 200], 'full', true)
data3 =
31x1 struct array with fields:
    QueryName
    Flag
    Position
    MappingQuality
    CigarString
    MatePosition
    InsertSize
    Sequence
    Quality
    Tags
    ReferenceIndex
    MateReferenceIndex
Read alignments from the ex1.bam file that align to the 100 to 300 bp range of the seq1 reference sequence. Read the same alignments using zero-based indexing. Compare the position of the 27th record in the two outputs.
data_one = bamread('ex1.bam','seq1', [100 300]);
data_zero = bamread('ex1.bam','seq1', [100 300], 'zerobased', true);

data_one(27).Position
ans = 135

data_zero(27).Position
ans = 134
The bamread function requires a BAM file.
Use the baminfo function to investigate the size and content, including reference sequence names, of a BAM-formatted file before using the bamread function to read the file contents into a MATLAB array of structures.
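As a rough sketch of that inspection step, the following lists the reference sequence names in a file and then reads a small window from the first one. It assumes the ex1.bam sample file from the examples above is available, and that the ScanDictionary option and ScannedDictionary field of baminfo behave as described on the baminfo reference page. The variable names are illustrative only.

```matlab
% Inspect the BAM file before reading it (assumes ex1.bam is on the path).
% 'ScanDictionary' asks baminfo to scan the file for reference names.
info = baminfo('ex1.bam', 'ScanDictionary', true);

% Names of the reference sequences found by the scan (cell array of strings).
refNames = info.ScannedDictionary;

% Read a small, one-based window from the first reference sequence.
data = bamread('ex1.bam', refNames{1}, [100 200]);
```

Checking the reference names first avoids guessing at the RefSeq argument, and starting with a narrow Range keeps the returned structure array small.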
If your BAM-formatted file is too large to read using available memory, try either of the following:
Use a smaller range.
Use bamread without specifying outputs, but with the ToFile name-value pair argument, to create a SAM-formatted file. You can then use samread with the BlockRead name-value pair argument to read the SAM-formatted file in blocks. Alternatively, you can pass the SAM-formatted file to the BioIndexedFile constructor function to construct a BioIndexedFile object, which you can use to create a BioMap object.
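The second workaround might look like the following sketch. It assumes the ex1.bam sample file is available. The output file name (generated with tempname) and the block size are hypothetical choices made for illustration, not values prescribed by bamread or samread.

```matlab
% Hypothetical output file; tempname returns a name that does not exist yet,
% which is what the ToFile argument expects.
samFile = [tempname '.sam'];

% Write the selected records to a SAM-formatted file without returning them.
bamread('ex1.bam', 'seq1', [100 200], 'ToFile', samFile);

% Read the SAM file back in blocks so that only part of it is in memory
% at a time. blockSize is an illustrative value.
blockSize = 10;
firstBlock  = samread(samFile, 'BlockRead', [1 blockSize]);
secondBlock = samread(samFile, 'BlockRead', [blockSize + 1, 2 * blockSize]);
```

Each call to samread returns only the requested block of entries, so the block size can be tuned to the memory that is available.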
Use the BAMStruct output argument that bamread returns to construct a BioMap object, which lets you explore, access, filter, and manipulate all or a subset of the data, before doing subsequent analyses or viewing the data.
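A minimal sketch of that workflow, again assuming the ex1.bam sample file is available, is shown below. The variable names are illustrative, and the calls after the constructor are just two of the ways a BioMap object can be queried.

```matlab
% Read the records that align to positions 100-200 of seq1.
data = bamread('ex1.bam', 'seq1', [100 200]);

% Construct a BioMap object from the structure array returned by bamread.
bm = BioMap(data);

% Query the object: number of reads and their start positions.
numReads = bm.NSeqs;
starts   = getStart(bm);
```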
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 16, 2078–2079.