
emblread

Read data from EMBL file

Syntax

EMBLData = emblread(File)
EMBLSeq = emblread(File, 'SequenceOnly', SequenceOnlyValue)
EMBLData = emblread(File, 'TimeOut', TimeOutValue)

Input Arguments

File

Either of the following:

  • Character vector or string specifying a file name, a path and file name, or a URL pointing to a file. The referenced file is an EMBL-formatted file. If you specify only a file name, that file must be on the MATLAB® search path or in the MATLAB Current Folder.

  • Character vector or string that contains the text of an EMBL-formatted file.

Tip

You can use the getembl function with the 'ToFile' property to retrieve data from the European Molecular Biology Laboratory (EMBL) database and create an EMBL-formatted file.
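The following sketch shows both accepted forms of the File argument: a file name, and the text of an EMBL-formatted file. It assumes the Bioinformatics Toolbox and network access for getembl. The file name rat_protein.txt is only an example, and fileread is used here simply as one way to load the file's text into a character vector.

```matlab
% Sketch: the two accepted forms of the File argument.
% Assumes Bioinformatics Toolbox + network access; 'rat_protein.txt' is an example name.
getembl('X00558', 'ToFile', 'rat_protein.txt');   % save entry X00558 as an EMBL flat file

% 1) Pass a file name (the file must be on the search path or in the current folder)
fromFile = emblread('rat_protein.txt');

% 2) Pass the text of an EMBL-formatted file, loaded here with fileread
emblText = fileread('rat_protein.txt');
fromText = emblread(emblText);

% Both calls should describe the same entry
fprintf('Same accession: %d\n', strcmp(fromFile.Accession, fromText.Accession));
```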

SequenceOnlyValue

Logical value that controls whether emblread reads only the sequence, without the metadata. Choices are true or false (default).

TimeOutValue

Connection timeout in seconds, specified as a positive scalar. The default value is 5.

Output Arguments

EMBLData

Structure with fields corresponding to EMBL data.

EMBLSeq

Character vector representing the sequence.

Description

EMBLData = emblread(File) reads data from File, an EMBL-formatted file, and creates EMBLData, a MATLAB structure that stores each EMBL two-character line type code in a separate field (for example, the ID line maps to Identification and the AC line to Accession). Parsing is based on release 107 of the EMBL-Bank flat file format. For a list of the EMBL two-character line type codes, see ftp://ftp.ebi.ac.uk/pub/databases/embl/doc/usrman.txt.
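A quick way to see how the line types map to fields is to list the fields of the returned structure. This is a minimal sketch that assumes rat_protein.txt already holds entry X00558, created with getembl as in the Examples section.

```matlab
% Sketch: inspect the structure that emblread returns.
% Assumes rat_protein.txt holds entry X00558 (created with getembl, see Examples).
s = emblread('rat_protein.txt');

disp(fieldnames(s))      % field names derived from the EMBL line types
disp(s.Accession)        % contents of the AC line
disp(s.Identification)   % ID line, parsed into a nested structure
```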

Note

Topology information was not included in EMBL flat files before release 87 of the database. When reading a file created before release 87, emblread returns an empty Identification.Topology field.

Note

Starting with release 87, the ID line of EMBL flat files no longer includes the entry name. When reading a file created in release 87 or later, emblread returns the accession number in the Identification.EntryName field.
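Because both notes concern fields of Identification, code that must handle older and newer files can check them explicitly. A minimal sketch, assuming rat_protein.txt holds an EMBL entry as in the Examples section; the messages printed are illustrative only.

```matlab
% Sketch: check the ID-line fields affected by the release 87 format change.
% Assumes rat_protein.txt holds an EMBL entry, as in the Examples section.
s = emblread('rat_protein.txt');
id = s.Identification;

if isempty(id.Topology)
    disp('No topology recorded (file may predate release 87).')
else
    fprintf('Topology: %s\n', id.Topology);
end

% For files from release 87 on, EntryName holds the accession number
fprintf('EntryName: %s\n', id.EntryName);
```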

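EMBLSeq = emblread(File, 'SequenceOnly', SequenceOnlyValue) controls whether emblread reads only the sequence, without the metadata. Choices are true or false (default). When SequenceOnlyValue is true, EMBLSeq is a character vector containing only the sequence.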
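When only the bases are needed, the SequenceOnly option returns a character vector instead of a structure. A minimal sketch, assuming rat_protein.txt holds entry X00558 as in the Examples section.

```matlab
% Sketch: read only the sequence and skip the metadata.
% Assumes rat_protein.txt holds entry X00558, as in the Examples section.
seq = emblread('rat_protein.txt', 'SequenceOnly', true);

% seq is a character vector, not a structure
fprintf('class: %s, length: %d\n', class(seq), numel(seq));
disp(seq(1:min(20, end)))   % first few bases
```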

EMBLData = emblread(File, 'TimeOut', TimeOutValue) sets the connection timeout, in seconds, used when reading data from a remote file or URL.
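For slow connections, a longer timeout can be passed when File is a URL. This is a sketch only: the ENA Browser API URL below is an assumption about a service that returns entry X00558 as an EMBL flat file, and any URL that serves EMBL-formatted text can be substituted. The 20-second value is arbitrary.

```matlab
% Sketch: read an entry straight from a URL with a longer timeout.
% ASSUMPTION: this ENA Browser API URL returns entry X00558 as an EMBL flat
% file; substitute any URL that serves EMBL-formatted text.
url = 'https://www.ebi.ac.uk/ena/browser/api/embl/X00558';

% Wait up to 20 seconds for the connection instead of the default 5
rec = emblread(url, 'TimeOut', 20);
disp(rec.Accession)
```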

Examples


Download the sequence information for accession X00558 from the web and save it to a file.

out = getembl('X00558','ToFile','rat_protein.txt');

Read data from the EMBL file.

seqData = emblread('rat_protein.txt')
seqData = 

  struct with fields:

            Identification: [1×1 struct]
                 Accession: 'X00558'
           SequenceVersion: 'X00558.1'
               DateCreated: '13-JUN-1985  Rel. 06, Created '
               DateUpdated: '18-APR-2005  Rel. 83, Last updated, Version 4 '
               Description: 'Rat liver apolipoprotein A-I mRNA  apoA-I    ...'
                   Keyword: 'apolipoprotein; lipoprotein; signal peptide. ...'
           OrganismSpecies: 'Rattus norvegicus  Norway rat                ...'
    OrganismClassification: [3×75 char]
                 Organelle: ''
                 Reference: {[1×1 struct]}
    DatabaseCrossReference: [4×75 char]
                  Comments: ''
                  Assembly: ''
                   Feature: [22×75 char]
                 BaseCount: [1×1 struct]
                  Sequence: 'agctccgggggaggtcgcccacatccttcgggatgaaagctgcag...'
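Several fields in this output, such as OrganismClassification and Feature, are character matrices, and Reference is a cell array of structures. The sketch below converts the matrices to cell arrays of trimmed lines. It assumes rat_protein.txt was created by the getembl call above, and that each matrix row holds one blank-padded flat-file line, which the 75-character row width suggests but this page does not state.

```matlab
% Sketch: convert the character-matrix fields of seqData into cell arrays.
% Assumes rat_protein.txt was created by the getembl call above.
seqData = emblread('rat_protein.txt');

% Each row appears to hold one blank-padded flat-file line
classification = strtrim(cellstr(seqData.OrganismClassification));
features       = strtrim(cellstr(seqData.Feature));
fprintf('%d classification lines, %d feature-table lines\n', ...
        numel(classification), numel(features));

% Reference is a cell array with one structure per reference
firstRef = seqData.Reference{1};
disp(fieldnames(firstRef))
```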

Version History

Introduced before R2006a