eomegaps

 

Wiki

The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.

Please help by correcting and extending the Wiki pages.

Function

Single sequence with profile (ClustalO wrapper)

Description

eomegaps is a wrapper for clustalo. It takes a single sequence and one profile and produces a combined output alignment. In clustalo this is the only way to align a single sequence to a profile: the single sequence is treated as a profile and aligned using a profile-profile alignment.

Clustal-Omega (clustalo) is a general-purpose multiple sequence alignment (MSA) program for proteins. It produces high-quality MSAs and can handle datasets of hundreds of thousands of sequences in reasonable time.

In its current form Clustal-Omega can align only protein sequences, not DNA/RNA sequences. Support for DNA/RNA is envisioned for a future version.

Algorithm

Clustal-Omega uses HMMs for the alignment engine, based on the HHalign package from Johannes Söding [1]. Guide trees are optionally made using mBed [2], which can cluster very large numbers of sequences in O(N*log(N)) time. Multiple alignment then proceeds by aligning larger and larger alignments using HHalign, following the clustering given by the guide tree.
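
The order in which a guide tree drives the progressive alignment can be illustrated with a short Python sketch. It is not Clustal-Omega code: the nested-tuple tree, the leaf names and the function merge_order are hypothetical, and no alignment is actually computed. It only lists which groups would be aligned with each other, and in which order, so that each step joins larger alignments than the ones below it in the tree.

```python
def merge_order(tree):
    """Return (leaves, steps) for a guide tree given as nested 2-tuples.

    leaves: leaf names under this node, left to right.
    steps:  list of (left_group, right_group) merges in the order a
            progressive aligner would perform them (children first).
    """
    if isinstance(tree, str):
        # A leaf is a single sequence; nothing to merge yet.
        return [tree], []
    left, right = tree
    left_leaves, left_steps = merge_order(left)
    right_leaves, right_steps = merge_order(right)
    # Both subtrees must be aligned before they can be joined.
    steps = left_steps + right_steps + [(tuple(left_leaves), tuple(right_leaves))]
    return left_leaves + right_leaves, steps


# Hypothetical guide tree over five sequences.
guide_tree = ((("A", "B"), "C"), ("D", "E"))
leaves, steps = merge_order(guide_tree)
for left_group, right_group in steps:
    print("align", left_group, "with", right_group)
```

In the real program each merge is an HMM-based profile-profile alignment performed by HHalign; the sketch covers only the traversal order.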

Usage

Here is a sample session with eomegaps:


% eomegaps op1.ali tsw:opsd_human 
Single sequence with profile (ClustalO wrapper)
(aligned) output sequence set [op1.aln]: 
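
The same run can be prepared from a script. The following Python sketch only builds the argument list; the helper name build_eomegaps_command is hypothetical, while the qualifiers (-profile, -sequence, -outseq, -outformat, -auto) are the ones listed in the command line arguments section. It assumes eomegaps is on the PATH when the command is eventually executed.

```python
def build_eomegaps_command(profile, sequence, outseq, outformat=None):
    """Return an argument list for a non-interactive eomegaps run.

    profile:   pre-aligned multiple sequence file (e.g. "op1.ali")
    sequence:  single sequence to add, given as a USA (e.g. "tsw:opsd_human")
    outseq:    output alignment file name
    outformat: optional value for -outformat (e.g. "clustal")
    """
    cmd = [
        "eomegaps",
        "-profile", profile,
        "-sequence", sequence,
        "-outseq", outseq,
        "-auto",  # turn off prompts so the run does not wait for input
    ]
    if outformat is not None:
        cmd += ["-outformat", outformat]
    return cmd


cmd = build_eomegaps_command("op1.ali", "tsw:opsd_human", "op1.aln")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd) on a machine where EMBOSS and clustalo are installed.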


Command line arguments

Single sequence with profile (ClustalO wrapper)
Version: EMBOSS:6.6.0.0

   Standard (Mandatory) qualifiers:
  [-profile]           infile     Pre-aligned multiple sequence file (aligned
                                  columns will be kept fixed)
  [-sequence]          sequence   File containing single sequence to align
  [-outseq]            seqoutset  [.] Sequence set filename
                                  and optional format (output USA)

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers:
   -indistfile         infile     Pairwise distance matrix input file (skips
                                  distance computation)
   -inguidefile        infile     Guide tree input file (skips distance
                                  computation and guide tree clustering step)
   -dealign            toggle     [N] Dealign input sequences
   -cluster            menu       [mbed] Method (Values: mbed (mBED method);
                                  full (Full slow method); iter (Iteration
                                  used full slower method))
   -maxiterations      integer    [0] Number of (combined guide tree/HMM)
                                  iterations (Integer from 0 to 2000000000)
   -maxgiterations     integer    [2000000000] Maximum guide tree iterations
                                  (Integer from 0 to 2000000000)
   -maxhiterations     integer    [2000000000] Maximum number of HMM
                                  iterations (Integer from 0 to 2000000000)
   -maxseqs            integer    [2000000000] Maximum number of sequences
                                  (Integer from 2 to 2000000000)
   -maxlenseq          integer    [2000000000] Maximum length of sequence
                                  (Integer from 1 to 2000000000)
   -self               toggle     [N] Set options automatically (might
                                  overwrite some options
   -outformat          menu       [fasta] Format (Values: fasta (FASTA);
                                  clustal (Aln clustal); msf (MSF); phylip
                                  (Phylip); selex (Selex); stockholm
                                  (Stockholm); vienna (ViennaRNA))
   -outdistfile        outfile    [*.eomegaps] Pairwise distance matrix output
                                  file, only available in cluster mode 'full'
   -outguidefile       outfile    [*.eomegaps] Guide tree output file
   -log                toggle     [N] Log progress to standard output if not
                                  used for output

   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin2            integer    Start of the sequence to be used
   -send2              integer    End of the sequence to be used
   -sreverse2          boolean    Reverse (if DNA)
   -sask2              boolean    Ask for begin/end/reverse
   -snucleotide2       boolean    Sequence is nucleotide
   -sprotein2          boolean    Sequence is protein
   -slower2            boolean    Make lower case
   -supper2            boolean    Make upper case
   -scircular2         boolean    Sequence is circular
   -squick2            boolean    Read id and sequence only
   -sformat2           string     Input sequence format
   -iquery2            string     Input query fields or ID list
   -ioffset2           integer    Input start position offset
   -sdbname2           string     Database name
   -sid2               string     Entryname
   -ufo2               string     UFO features
   -fformat2           string     Features format
   -fopenfile2         string     Features file name

   "-outseq" associated qualifiers
   -osformat3          string     Output seq format
   -osextension3       string     File name extension
   -osname3            string     Base file name
   -osdirectory3       string     Output directory
   -osdbname3          string     Database name to add
   -ossingle3          boolean    Separate file for each entry
   -oufo3              string     UFO features
   -offormat3          string     Features format
   -ofname3            string     Features file name
   -ofdirectory3       string     Output directory

   "-outdistfile" associated qualifiers
   -odirectory         string     Output directory

   "-outguidefile" associated qualifiers
   -odirectory         string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit

Qualifier | Type | Description | Allowed values | Default

Standard (Mandatory) qualifiers
[-profile] (Parameter 1) | infile | Pre-aligned multiple sequence file (aligned columns will be kept fixed) | Input file | Required
[-sequence] (Parameter 2) | sequence | File containing single sequence to align | Readable sequence | Required
[-outseq] (Parameter 3) | seqoutset | Sequence set filename and optional format (output USA) | Writeable sequences | <*>.format

Additional (Optional) qualifiers
(none)

Advanced (Unprompted) qualifiers
-indistfile | infile | Pairwise distance matrix input file (skips distance computation) | Input file | Required
-inguidefile | infile | Guide tree input file (skips distance computation and guide tree clustering step) | Input file | Required
-dealign | toggle | Dealign input sequences | Toggle value Yes/No | No
-cluster | list | Method | mbed (mBED method); full (Full slow method); iter (Iteration used full slower method) | mbed
-maxiterations | integer | Number of (combined guide tree/HMM) iterations | Integer from 0 to 2000000000 | 0
-maxgiterations | integer | Maximum guide tree iterations | Integer from 0 to 2000000000 | 2000000000
-maxhiterations | integer | Maximum number of HMM iterations | Integer from 0 to 2000000000 | 2000000000
-maxseqs | integer | Maximum number of sequences | Integer from 2 to 2000000000 | 2000000000
-maxlenseq | integer | Maximum length of sequence | Integer from 1 to 2000000000 | 2000000000
-self | toggle | Set options automatically (might overwrite some options) | Toggle value Yes/No | No
-outformat | list | Format | fasta (FASTA); clustal (Aln clustal); msf (MSF); phylip (Phylip); selex (Selex); stockholm (Stockholm); vienna (ViennaRNA) | fasta
-outdistfile | outfile | Pairwise distance matrix output file, only available in cluster mode 'full' | Output file | <*>.eomegaps
-outguidefile | outfile | Guide tree output file | Output file | <*>.eomegaps
-log | toggle | Log progress to standard output if not used for output | Toggle value Yes/No | No

Associated qualifiers

"-sequence" associated sequence qualifiers
-sbegin2, -sbegin_sequence | integer | Start of the sequence to be used | Any integer value | 0
-send2, -send_sequence | integer | End of the sequence to be used | Any integer value | 0
-sreverse2, -sreverse_sequence | boolean | Reverse (if DNA) | Boolean value Yes/No | N
-sask2, -sask_sequence | boolean | Ask for begin/end/reverse | Boolean value Yes/No | N
-snucleotide2, -snucleotide_sequence | boolean | Sequence is nucleotide | Boolean value Yes/No | N
-sprotein2, -sprotein_sequence | boolean | Sequence is protein | Boolean value Yes/No | N
-slower2, -slower_sequence | boolean | Make lower case | Boolean value Yes/No | N
-supper2, -supper_sequence | boolean | Make upper case | Boolean value Yes/No | N
-scircular2, -scircular_sequence | boolean | Sequence is circular | Boolean value Yes/No | N
-squick2, -squick_sequence | boolean | Read id and sequence only | Boolean value Yes/No | N
-sformat2, -sformat_sequence | string | Input sequence format | Any string |
-iquery2, -iquery_sequence | string | Input query fields or ID list | Any string |
-ioffset2, -ioffset_sequence | integer | Input start position offset | Any integer value | 0
-sdbname2, -sdbname_sequence | string | Database name | Any string |
-sid2, -sid_sequence | string | Entryname | Any string |
-ufo2, -ufo_sequence | string | UFO features | Any string |
-fformat2, -fformat_sequence | string | Features format | Any string |
-fopenfile2, -fopenfile_sequence | string | Features file name | Any string |

"-outseq" associated seqoutset qualifiers
-osformat3, -osformat_outseq | string | Output seq format | Any string |
-osextension3, -osextension_outseq | string | File name extension | Any string |
-osname3, -osname_outseq | string | Base file name | Any string |
-osdirectory3, -osdirectory_outseq | string | Output directory | Any string |
-osdbname3, -osdbname_outseq | string | Database name to add | Any string |
-ossingle3, -ossingle_outseq | boolean | Separate file for each entry | Boolean value Yes/No | N
-oufo3, -oufo_outseq | string | UFO features | Any string |
-offormat3, -offormat_outseq | string | Features format | Any string |
-ofname3, -ofname_outseq | string | Features file name | Any string |
-ofdirectory3, -ofdirectory_outseq | string | Output directory | Any string |

"-outdistfile" associated outfile qualifiers
-odirectory | string | Output directory | Any string |

"-outguidefile" associated outfile qualifiers
-odirectory | string | Output directory | Any string |

General qualifiers
-auto | boolean | Turn off prompts | Boolean value Yes/No | N
-stdout | boolean | Write first file to standard output | Boolean value Yes/No | N
-filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N
-options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N
-debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N
-verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y
-help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N
-warning | boolean | Report warnings | Boolean value Yes/No | Y
-error | boolean | Report errors | Boolean value Yes/No | Y
-fatal | boolean | Report fatal errors | Boolean value Yes/No | Y
-die | boolean | Report dying program messages | Boolean value Yes/No | Y
-version | boolean | Report version number and exit | Boolean value Yes/No | N
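
A script that assembles advanced qualifiers may want to check values before calling the program. The Python sketch below is one way to do that, not part of EMBOSS: the helper check_options and the dictionary layout are hypothetical, while the allowed values and integer ranges are copied from the qualifier table above.

```python
# Allowed values copied from the qualifier table in this document.
CLUSTER_METHODS = {"mbed", "full", "iter"}
OUTPUT_FORMATS = {"fasta", "clustal", "msf", "phylip", "selex", "stockholm", "vienna"}
INTEGER_RANGES = {
    "maxiterations": (0, 2000000000),
    "maxgiterations": (0, 2000000000),
    "maxhiterations": (0, 2000000000),
    "maxseqs": (2, 2000000000),
    "maxlenseq": (1, 2000000000),
}


def check_options(options):
    """Return a list of problems found in a {qualifier: value} mapping.

    Qualifier names are given without the leading '-'. Qualifiers that
    are not covered by the tables above are ignored.
    """
    problems = []
    for name, value in options.items():
        if name == "cluster" and value not in CLUSTER_METHODS:
            problems.append(f"-cluster: unknown method {value!r}")
        elif name == "outformat" and value not in OUTPUT_FORMATS:
            problems.append(f"-outformat: unknown format {value!r}")
        elif name in INTEGER_RANGES:
            low, high = INTEGER_RANGES[name]
            if not (isinstance(value, int) and low <= value <= high):
                problems.append(f"-{name}: expected integer from {low} to {high}")
    return problems


print(check_options({"cluster": "full", "maxseqs": 100}))
print(check_options({"cluster": "fast", "maxseqs": 1}))
```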

Input file format

eomegaps reads a profile (a file of pre-aligned sequences) and a single sequence, plus optional pairwise distance matrix and guide tree files.
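
Because the profile must already be aligned, every record in it should have the same length once gap characters are counted. The following Python sketch checks this for FASTA-formatted text such as op1.ali. The helper names read_fasta and is_aligned and the two toy records are hypothetical, and the parser is deliberately minimal (it keeps only the first word of each header as the name).

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}, joining wrapped lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word after '>' as the record name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def is_aligned(records):
    """True if there is at least one record and all have equal length."""
    return len({len(seq) for seq in records.values()}) == 1


# Toy profile, not real data.
profile_text = """>seqA toy record
MK-LV
>seqB toy record
MKALV
"""
records = read_fasta(profile_text)
print(is_aligned(records))
```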

Input files for usage example

'tsw:opsd_human' is a sequence entry in the example protein database 'tsw'

File: op1.ali

>OPSD_ABYKO O42294 Rhodopsin (Fragment)
-----------------------------------------YLVNPAAYAALGAYMFLLI
LIGFP---INFLTLYVTLEHKKLRTPLNYILLNLAVANLFMVLGGFTTTMYTSMHGYFVL
GRLGCNLEAFFATLGGEIALWSLVVLAIERWIVVCKPISNFRFTEDHAIMGLAFTWVMAL
ACAVPPLVGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVIYMFIVHFLIPLSVIFFCYGR
LLCAVKEAPAAQQE-------------SETTQRAEKEVSRMVVIMVIGFLVCWLPYASVA
WWIFCNQGSDFGPIFMTLPSFFAKSAAIYNPMIYICMNKQFRHCMI--------------
-------------------------
>OPSD_AMBTI Q90245 Rhodopsin
MNGTEGPNFYV-------PFSNKSGVVRSPFEYP-----QYYLAEPWQYSVLAAYMFLLI
LLGFP---VNFLTLYVTIQHKKLRTPLNYILLNLAFANHFMVFGGFPVTMYSSMHGYFVF
GQTGCYIEGFFATMGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVMMTWIMAL
ACAAPPLFGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFLVHFTIPLMIIFFCYGR
LVCTVKEAAAQQQE-------------SATTQKAEKEVTRMVIIMVVAFLICWVPYASVA
FYIFSNQGTDFGPIFMTVPAFFAKSSAIYNPVIYIVLNKQFRNCMIT---TICCGKNPFG
DDETTSAATSKTEASSVSSSQVSPA
>OPSD_ANOCA P41591 Rhodopsin
MNGTEGQNFYV-------PMSNKTGVVRNPFEYP-----QYYLADPWQFSALAAYMFLLI
LLGFP---INFLTLFVTIQHKKLRTPLNYILLNLAVANLFMVLMGFTTTMYTSMNGYFIF
GTVGCNIEGFFATLGGEMGLWSLVVLAVERYVVICKPMSNFRFGETHALIGVSCTWIMAL
ACAGPPLLGWSRYIPEGMQCSCGVDYYTPTPEVHNESFVIYMFLVHFVTPLTIIFFCYGR
LVCTVKAAAAQQQE-------------SATTQKAEREVTRMVVIMVISFLVCWVPYASVA
FYIFTHQGSDFGPVFMTIPAFFAKSSAIYNPVIYILMNKQFRNCMIM---TLCCGKNPLG
DEETSAG--TKTETSTVSTSQVSPA
>OPSD_APIME Q17053 Rhodopsin, long-wavelength (Opsin, green-sensitive)
MIAVSGPSYEAFSYGGQARFNNQTVVDKVPPDMLHLIDANWYQYPPLNPMWHGILGFVIG
MLGFVSAMGNGMVVYIFLSTKSLRTPSNLFVINLAISNFLMMFCMSPPMVINCYYETWVL
GPLFCQIYAMLGSLFGCGSIWTMTMIAFDRYNVIVKGLSGKPLSINGALIRIIAIWLFSL
GWTIAPMFGWNRYVPEGNMTACGTDYFNRGL--LSASYLVCYGIWVYFVPLFLIIYSYWF
IIQAVAAHEKNMREQAKKMNVASLRSSENQNTSAECKLAKVALMTISLWFMAWTPYLVIN
FSGIF-NLVKISPLFTIWGSLFAKANAVYNPIVYGISHPKYRAALFAKFPSLACAAEPSS
DAVSTTSGTTTVTD-----NEKSNA
>OPSD_ASTFA P41590 Rhodopsin
MNGTEGPYFYV-------PMSNATGVVRSPYEYP-----QYYLAPPWAYACLAAYMFFLI
LVGFP---VNFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSLNGYFVF
GRLGCNLEGFFATFGGINSLWCLVVLSIERWVVVCKPMSNFRFGENHAIMGVAFTWFMAL
ACTVPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFVVHFLTPLFVITFCYGR
LVCTVKEAAAQQQE-------------SETTQRAEREVTRMVILMFIAYLVCWLPYASVS
WWIFTNQGSEFGPIFMTVPAFFAKSSSIYNPVIYICLNKQFRHCMIT---TLCCGKNPFE
EEEGASTTASKTEASSVSSVSPA--
>OPSD_ATHBO Q9YGZ1 Rhodopsin
MNGTEGPYFYI-------PMLNTTGVVRSPYEYP-----QYYLVNPAAYAVLGAYMFFLI
LVGFP---INFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTIYTSMHGYFVL
GRLGCNVEGFSATLGGEIALWSLVVLAIERWVVVCKPISNFRFGENHAIMGVAFTWFMAA
ACAVPPLFGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFTCHFCIPLMVVFFCYGR
LVCAVKEAAAAQQE-------------SETTQRAEREVTRMVIIMVVSFLVSWVPYASVA
WYIFTHQGSEFGPLFMTIPAFFAKSSSIYNPMIYICMNKQFRHCMIT---TLCCGKNPFE
EEEGASSTASKTEASSVSSSSVSPA
>OPSD_BATMU O42300 Rhodopsin (Fragment)
-----------------------------------------YLVSPAAYAALGAYMFLLI
LIGFP---VNFLTLYVTLEHKKLRTPLNYILLNLAVADLFMVLGGFTTTMYTSMHGYFVL
GRLGCNLEGFFATLGGEIALWSLVVLAIERWIVVCKPISNFRFTEDNAIMGLAFSWVMAL
TCAVPPLVGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVIYMFIVHFPIPLSVIFFCYGR
LLCAVKEAAAAQQE-------------SETTQRAEKEVSRMVVILVIGFLVCWLPYASVA
WWIFCNQGSDFGPIFMTLPSFFAKRPAIYNPMIYICMNKQFRHCMI--------------
-------------------------
>OPSD_BATNI O42301 Rhodopsin (Fragment)
-----------------------------------------YLVSPAAYAALGAYMFLLI
LIGFP---VNFLTLYVTLEHKKLRTPLNYILLNLAVADLFMVLGGFTTTMYTSMHGYFVL
GRLGCNLEGFFATLGGEIALWSLVVLAIERWIVVCKPISKFRFTEDNAIMGLAFSWVMAL
ACAVPPLVGWLRYIPEGMQCTCGVDYYTRAEGFDNESFVIYMFIVHFLIPLSVIFFCYGR
LLCAVKEAAAAQQE-------------SETTQRAEKEVSRMVVIMVIGFLVCWLPYASVA
WWIFCNQGSDFGPIFMTLPSFFAKRPAIYNPMIYICMNKQFRHCMI--------------
-------------------------
>OPSD_BOVIN P02699 Rhodopsin
MNGTEGPNFYV-------PFSNKTGVVRSPFEAP-----QYYLAEPWQFSMLAAYMFLLI
MLGFP---INFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVF
GPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMAL
ACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQ
LVFTVKEAAAQQQE-------------SATTQKAEKEVTRMVIIMVIAFLICWLPYAGVA
FYIFTHQGSDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVT---TLCCGKNPLG
DDEASTTVS-KTET-----SQVAPA
>OPSD_BUFBU P56514 Rhodopsin
MNGTEGPNFYI-------PMSNKTGVVRSPFEYP-----QYYLAEPWQYSILCAYMFLLI
LLGFP---INFMTLYVTIQHKKLRTPLNYILLNLAFANHFMVLCGFTVTMYSSMNGYFIL
GATGCYVEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFSENHAVMGVAFTWIMAL
SCAVPPLLGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFVVHFTIPLIIIFFCYGR
LVCTVKEAAAQQQE-------------SATTQKAEKEVTRMVIIMVVFFLICWVPYASVA
FFIFSNQGSEFGPIFMTVPAFFAKSSSIYNPVIYIMLNKQFRNCMIT---TLCCGKNPFG
EDDASSAATSKTEASSVSSSQVSPA
>OPSD_BUFMA P56515 Rhodopsin
MNGTEGPNFYI-------PMSNKTGVVRSPFEYP-----QYYLAEPWQYSVLCAYMFLLI
LLGFP---INFMTLYVTIQHKKLRTPLNYILLNLAFANHFMVLCGFTVTMYSSMNGYFVF
GQTGCYVEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFSENHAIMGVAFTWIMAL
ACAAPPLFGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFVVHFLIPLIIIFFCYGR
LVCTVKEAAAQQQE-------------SATTQKAEKEVTRMVIIMVVFFLICWVPYASVA
FFIFTHQGSEFGPVFMTIPAFFAKSSSIYNPVIYIMLNKQFRNCMIT---TLCCGKNPFG
DEDASSAATSKTEASSVSSSQVSPA
>OPSD_CALPD Q6W3E1 Rhodopsin
MNGTEGPNFYV-------PFSNKTGVVRSPFEEP-----QYYLAEPWQFSCLAAYMFMLI
VLGFP---INFLTLYVTIQHKKLRTPLNYILLNLAIADLFMVFGGFTTTLYTSLHGYFVF
GPTGCDLEGFFATLGGEIALWSLVVLAIERYIVVCKPMSNFRFGENHAIMGVAFTWVMAL
ACAAPPLVGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMVVIFFCYGQ
LVFTVKEAAAQQQE-------------SATTQKAEKEVTRMVIIMVIAFLICWLPYAGVA
FYIFTHQGSNFGPILMTLPAFFAKTSAVYNPVIYIMLNKQFRTCMLT---TLCCGKIPLG
DDEASATAS-KTET-----SQVAPA

Database entry: tsw:opsd_human

ID   OPSD_HUMAN              Reviewed;         348 AA.
AC   P08100; Q16414; Q2M249;
DT   01-AUG-1988, integrated into UniProtKB/Swiss-Prot.
DT   01-AUG-1988, sequence version 1.
DT   13-JUN-2012, entry version 145.
DE   RecName: Full=Rhodopsin;
DE   AltName: Full=Opsin-2;
GN   Name=RHO; Synonyms=OPN2;
OS   Homo sapiens (Human).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
OC   Catarrhini; Hominidae; Homo.
OX   NCBI_TaxID=9606;
RN   [1]
RP   NUCLEOTIDE SEQUENCE [GENOMIC DNA].
RX   MEDLINE=84272729; PubMed=6589631; DOI=10.1073/pnas.81.15.4851;
RA   Nathans J., Hogness D.S.;
RT   "Isolation and nucleotide sequence of the gene encoding human
RT   rhodopsin.";
RL   Proc. Natl. Acad. Sci. U.S.A. 81:4851-4855(1984).
RN   [2]
RP   NUCLEOTIDE SEQUENCE [GENOMIC DNA].
RA   Suwa M., Sato T., Okouchi I., Arita M., Futami K., Matsumoto S.,
RA   Tsutsumi S., Aburatani H., Asai K., Akiyama Y.;
RT   "Genome-wide discovery and analysis of human seven transmembrane helix
RT   receptor genes.";
RL   Submitted (JUL-2001) to the EMBL/GenBank/DDBJ databases.
RN   [3]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RC   TISSUE=Retina;
RX   PubMed=17974005; DOI=10.1186/1471-2164-8-399;
RA   Bechtel S., Rosenfelder H., Duda A., Schmidt C.P., Ernst U.,
RA   Wellenreuther R., Mehrle A., Schuster C., Bahr A., Bloecker H.,
RA   Heubner D., Hoerlein A., Michel G., Wedler H., Koehrer K.,
RA   Ottenwaelder B., Poustka A., Wiemann S., Schupp I.;
RT   "The full-ORF clone resource of the German cDNA consortium.";
RL   BMC Genomics 8:399-399(2007).
RN   [4]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
RX   PubMed=15489334; DOI=10.1101/gr.2596504;
RG   The MGC Project Team;
RT   "The status, quality, and expansion of the NIH full-length cDNA
RT   project: the Mammalian Gene Collection (MGC).";
RL   Genome Res. 14:2121-2127(2004).
RN   [5]
RP   NUCLEOTIDE SEQUENCE [GENOMIC DNA] OF 1-120.
RX   PubMed=8566799; DOI=10.1016/0378-1119(95)00688-5;
RA   Bennett J., Beller B., Sun D., Kariko K.;
RT   "Sequence analysis of the 5.34-kb 5' flanking region of the human
RT   rhodopsin-encoding gene.";


  [Part of this file has been deleted for brevity]

FT                                /FTId=VAR_004816.
FT   VARIANT     209    209       V -> M (effect not known).
FT                                /FTId=VAR_004817.
FT   VARIANT     211    211       H -> P (in RP4; dbSNP:rs28933993).
FT                                /FTId=VAR_004818.
FT   VARIANT     211    211       H -> R (in RP4).
FT                                /FTId=VAR_004819.
FT   VARIANT     216    216       M -> K (in RP4).
FT                                /FTId=VAR_004820.
FT   VARIANT     220    220       F -> C (in RP4).
FT                                /FTId=VAR_004821.
FT   VARIANT     222    222       C -> R (in RP4).
FT                                /FTId=VAR_004822.
FT   VARIANT     255    255       Missing (in RP4).
FT                                /FTId=VAR_004823.
FT   VARIANT     264    264       Missing (in RP4).
FT                                /FTId=VAR_004824.
FT   VARIANT     267    267       P -> L (in RP4).
FT                                /FTId=VAR_004825.
FT   VARIANT     267    267       P -> R (in RP4).
FT                                /FTId=VAR_004826.
FT   VARIANT     292    292       A -> E (in CSNBAD1).
FT                                /FTId=VAR_004827.
FT   VARIANT     296    296       K -> E (in RP4; dbSNP:rs29001653).
FT                                /FTId=VAR_004828.
FT   VARIANT     297    297       S -> R (in RP4).
FT                                /FTId=VAR_004829.
FT   VARIANT     342    342       T -> M (in RP4).
FT                                /FTId=VAR_004830.
FT   VARIANT     345    345       V -> L (in RP4).
FT                                /FTId=VAR_004831.
FT   VARIANT     345    345       V -> M (in RP4).
FT                                /FTId=VAR_004832.
FT   VARIANT     347    347       P -> A (in RP4).
FT                                /FTId=VAR_004833.
FT   VARIANT     347    347       P -> L (in RP4; common variant).
FT                                /FTId=VAR_004834.
FT   VARIANT     347    347       P -> Q (in RP4).
FT                                /FTId=VAR_004835.
FT   VARIANT     347    347       P -> R (in RP4; dbSNP:rs29001566).
FT                                /FTId=VAR_004836.
FT   VARIANT     347    347       P -> S (in RP4; dbSNP:rs29001637).
FT                                /FTId=VAR_004837.
SQ   SEQUENCE   348 AA;  38893 MW;  6F4F6FCBA34265B2 CRC64;
     MNGTEGPNFY VPFSNATGVV RSPFEYPQYY LAEPWQFSML AAYMFLLIVL GFPINFLTLY
     VTVQHKKLRT PLNYILLNLA VADLFMVLGG FTSTLYTSLH GYFVFGPTGC NLEGFFATLG
     GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVAFT WVMALACAAP PLAGWSRYIP
     EGLQCSCGID YYTLKPEVNN ESFVIYMFVV HFTIPMIIIF FCYGQLVFTV KEAAAQQQES
     ATTQKAEKEV TRMVIIMVIA FLICWVPYAS VAFYIFTHQG SNFGPIFMTI PAFFAKSAAI
     YNPVIYIMMN KQFRNCMLTT ICCGKNPLGD DEASATVSKT ETSQVAPA
//

Output file format

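eomegaps writes the combined alignment in the default Clustal-Omega output format, FASTA, as in the example below. The -outformat qualifier selects one of the other listed formats.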
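
The description of -profile states that the aligned columns of the profile are kept fixed. One way to check this on an output alignment is sketched below in Python. The helper names and the toy sequences are hypothetical. The check assumes that the profile rows keep their names in the output and that the only permitted change is the insertion of columns that are gaps in every profile row.

```python
def drop_all_gap_columns(rows):
    """Remove columns in which every row has a gap ('-')."""
    if not rows:
        return []
    keep = [i for i in range(len(rows[0])) if any(row[i] != "-" for row in rows)]
    return ["".join(row[i] for i in keep) for row in rows]


def profile_kept_fixed(profile, output):
    """Compare profile rows before and after the single sequence was added.

    profile, output: {name: aligned sequence}. Returns True when the
    profile rows in the output differ from the input only by columns
    that are gaps in every profile row.
    """
    names = list(profile)
    if any(name not in output for name in names):
        return False
    before = drop_all_gap_columns([profile[name] for name in names])
    after = drop_all_gap_columns([output[name] for name in names])
    return before == after


# Toy data, not real sequences.
profile = {"seqA": "MK-LV", "seqB": "MKALV"}
output = {"query": "MKALIV", "seqA": "MK-L-V", "seqB": "MKAL-V"}
print(profile_kept_fixed(profile, output))
```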

Output files for usage example

File: op1.aln

>OPSD_HUMAN P08100 Rhodopsin (Opsin-2)
MNGTEGPNFYV-------PFSNATGVVRSPFEYP-----QYYLAEPWQFSMLAAYMFLLI
VLGFP---INFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVF
GPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMAL
ACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQ
LVFTVKEAAAQQQE-------------SATTQKAEKEVTRMVIIMVIAFLICWVPYASVA
FYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLT---TICCGKNPLG
DDEASATVS-KTET-----SQVAPA
>OPSD_ABYKO O42294 Rhodopsin (Fragment)
-----------------------------------------YLVNPAAYAALGAYMFLLI
LIGFP---INFLTLYVTLEHKKLRTPLNYILLNLAVANLFMVLGGFTTTMYTSMHGYFVL
GRLGCNLEAFFATLGGEIALWSLVVLAIERWIVVCKPISNFRFTEDHAIMGLAFTWVMAL
ACAVPPLVGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVIYMFIVHFLIPLSVIFFCYGR
LLCAVKEAPAAQQE-------------SETTQRAEKEVSRMVVIMVIGFLVCWLPYASVA
WWIFCNQGSDFGPIFMTLPSFFAKSAAIYNPMIYICMNKQFRHCMI--------------
-------------------------
>OPSD_AMBTI Q90245 Rhodopsin
MNGTEGPNFYV-------PFSNKSGVVRSPFEYP-----QYYLAEPWQYSVLAAYMFLLI
LLGFP---VNFLTLYVTIQHKKLRTPLNYILLNLAFANHFMVFGGFPVTMYSSMHGYFVF
GQTGCYIEGFFATMGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVMMTWIMAL
ACAAPPLFGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFLVHFTIPLMIIFFCYGR
LVCTVKEAAAQQQE-------------SATTQKAEKEVTRMVIIMVVAFLICWVPYASVA
FYIFSNQGTDFGPIFMTVPAFFAKSSAIYNPVIYIVLNKQFRNCMIT---TICCGKNPFG
DDETTSAATSKTEASSVSSSQVSPA
>OPSD_ANOCA P41591 Rhodopsin
MNGTEGQNFYV-------PMSNKTGVVRNPFEYP-----QYYLADPWQFSALAAYMFLLI
LLGFP---INFLTLFVTIQHKKLRTPLNYILLNLAVANLFMVLMGFTTTMYTSMNGYFIF
GTVGCNIEGFFATLGGEMGLWSLVVLAVERYVVICKPMSNFRFGETHALIGVSCTWIMAL
ACAGPPLLGWSRYIPEGMQCSCGVDYYTPTPEVHNESFVIYMFLVHFVTPLTIIFFCYGR
LVCTVKAAAAQQQE-------------SATTQKAEREVTRMVVIMVISFLVCWVPYASVA
FYIFTHQGSDFGPVFMTIPAFFAKSSAIYNPVIYILMNKQFRNCMIM---TLCCGKNPLG
DEETSAG--TKTETSTVSTSQVSPA
>OPSD_APIME Q17053 Rhodopsin, long-wavelength (Opsin, green-sensitive)
MIAVSGPSYEAFSYGGQARFNNQTVVDKVPPDMLHLIDANWYQYPPLNPMWHGILGFVIG
MLGFVSAMGNGMVVYIFLSTKSLRTPSNLFVINLAISNFLMMFCMSPPMVINCYYETWVL
GPLFCQIYAMLGSLFGCGSIWTMTMIAFDRYNVIVKGLSGKPLSINGALIRIIAIWLFSL
GWTIAPMFGWNRYVPEGNMTACGTDYFNRGL--LSASYLVCYGIWVYFVPLFLIIYSYWF
IIQAVAAHEKNMREQAKKMNVASLRSSENQNTSAECKLAKVALMTISLWFMAWTPYLVIN
FSGIF-NLVKISPLFTIWGSLFAKANAVYNPIVYGISHPKYRAALFAKFPSLACAAEPSS
DAVSTTSGTTTVTD-----NEKSNA
>OPSD_ASTFA P41590 Rhodopsin
MNGTEGPYFYV-------PMSNATGVVRSPYEYP-----QYYLAPPWAYACLAAYMFFLI
LVGFP---VNFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSLNGYFVF
GRLGCNLEGFFATFGGINSLWCLVVLSIERWVVVCKPMSNFRFGENHAIMGVAFTWFMAL
ACTVPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFVVHFLTPLFVITFCYGR
LVCTVKEAAAQQQE-------------SETTQRAEREVTRMVILMFIAYLVCWLPYASVS
WWIFTNQGSEFGPIFMTVPAFFAKSSSIYNPVIYICLNKQFRHCMIT---TLCCGKNPFE
EEEGASTTASKTEASSVSSVSPA--
>OPSD_ATHBO Q9YGZ1 Rhodopsin
MNGTEGPYFYI-------PMLNTTGVVRSPYEYP-----QYYLVNPAAYAVLGAYMFFLI


  [Part of this file has been deleted for brevity]

LVCAVKEAAAAQQE-------------SETTQRAEREVTRMVIIMVVSFLVSWVPYASVA
WYIFTHQGSEFGPLFMTIPAFFAKSSSIYNPMIYICMNKQFRHCMIT---TLCCGKNPFE
EEEGASSTASKTEASSVSSSSVSPA
>OPSD_BATMU O42300 Rhodopsin (Fragment)
-----------------------------------------YLVSPAAYAALGAYMFLLI
LIGFP---VNFLTLYVTLEHKKLRTPLNYILLNLAVADLFMVLGGFTTTMYTSMHGYFVL
GRLGCNLEGFFATLGGEIALWSLVVLAIERWIVVCKPISNFRFTEDNAIMGLAFSWVMAL
TCAVPPLVGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVIYMFIVHFPIPLSVIFFCYGR
LLCAVKEAAAAQQE-------------SETTQRAEKEVSRMVVILVIGFLVCWLPYASVA
WWIFCNQGSDFGPIFMTLPSFFAKRPAIYNPMIYICMNKQFRHCMI--------------
-------------------------
>OPSD_BATNI O42301 Rhodopsin (Fragment)
-----------------------------------------YLVSPAAYAALGAYMFLLI
LIGFP---VNFLTLYVTLEHKKLRTPLNYILLNLAVADLFMVLGGFTTTMYTSMHGYFVL
GRLGCNLEGFFATLGGEIALWSLVVLAIERWIVVCKPISKFRFTEDNAIMGLAFSWVMAL
ACAVPPLVGWLRYIPEGMQCTCGVDYYTRAEGFDNESFVIYMFIVHFLIPLSVIFFCYGR
LLCAVKEAAAAQQE-------------SETTQRAEKEVSRMVVIMVIGFLVCWLPYASVA
WWIFCNQGSDFGPIFMTLPSFFAKRPAIYNPMIYICMNKQFRHCMI--------------
-------------------------
>OPSD_BOVIN P02699 Rhodopsin
MNGTEGPNFYV-------PFSNKTGVVRSPFEAP-----QYYLAEPWQFSMLAAYMFLLI
MLGFP---INFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVF
GPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMAL
ACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQ
LVFTVKEAAAQQQE-------------SATTQKAEKEVTRMVIIMVIAFLICWLPYAGVA
FYIFTHQGSDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVT---TLCCGKNPLG
DDEASTTVS-KTET-----SQVAPA
>OPSD_BUFBU P56514 Rhodopsin
MNGTEGPNFYI-------PMSNKTGVVRSPFEYP-----QYYLAEPWQYSILCAYMFLLI
LLGFP---INFMTLYVTIQHKKLRTPLNYILLNLAFANHFMVLCGFTVTMYSSMNGYFIL
GATGCYVEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFSENHAVMGVAFTWIMAL
SCAVPPLLGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFVVHFTIPLIIIFFCYGR
LVCTVKEAAAQQQE-------------SATTQKAEKEVTRMVIIMVVFFLICWVPYASVA
FFIFSNQGSEFGPIFMTVPAFFAKSSSIYNPVIYIMLNKQFRNCMIT---TLCCGKNPFG
EDDASSAATSKTEASSVSSSQVSPA
>OPSD_BUFMA P56515 Rhodopsin
MNGTEGPNFYI-------PMSNKTGVVRSPFEYP-----QYYLAEPWQYSVLCAYMFLLI
LLGFP---INFMTLYVTIQHKKLRTPLNYILLNLAFANHFMVLCGFTVTMYSSMNGYFVF
GQTGCYVEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFSENHAIMGVAFTWIMAL
ACAAPPLFGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFVVHFLIPLIIIFFCYGR
LVCTVKEAAAQQQE-------------SATTQKAEKEVTRMVIIMVVFFLICWVPYASVA
FFIFTHQGSEFGPVFMTIPAFFAKSSSIYNPVIYIMLNKQFRNCMIT---TLCCGKNPFG
DEDASSAATSKTEASSVSSSQVSPA
>OPSD_CALPD Q6W3E1 Rhodopsin
MNGTEGPNFYV-------PFSNKTGVVRSPFEEP-----QYYLAEPWQFSCLAAYMFMLI
VLGFP---INFLTLYVTIQHKKLRTPLNYILLNLAIADLFMVFGGFTTTLYTSLHGYFVF
GPTGCDLEGFFATLGGEIALWSLVVLAIERYIVVCKPMSNFRFGENHAIMGVAFTWVMAL
ACAAPPLVGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMVVIFFCYGQ
LVFTVKEAAAQQQE-------------SATTQKAEKEVTRMVIIMVIAFLICWLPYAGVA
FYIFTHQGSNFGPILMTLPAFFAKTSAVYNPVIYIMLNKQFRTCMLT---TLCCGKIPLG
DDEASATAS-KTET-----SQVAPA

Data files

None.

Notes

None.

References

[1] Söding J. (2005). Protein homology detection by HMM-HMM comparison. Bioinformatics, 21(7), 951-960.

[2] Blackshields G, Sievers F, Shi W, Wilm A, Higgins DG. Sequence embedding for fast construction of guide trees for multiple sequence alignment. Algorithms Mol Biol. 2010 May 14;5:21.

[3] http://www.genetics.wustl.edu/eddy/software/#squid

[4] Wilbur and Lipman, 1983; PMID 6572363

[5] Thompson JD, Higgins DG, Gibson TJ. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673-4680.

[6] Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG. (2007). Clustal W and Clustal X version 2.0. Bioinformatics, 23, 2947-2948.

[7] Kimura M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16, 111-120.

[8] Edgar RC. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res., 32(5), 1792-1797.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.
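
Since the exit status is always 0, a calling script cannot use it to detect a failed run. A possible workaround, sketched below in Python, is to inspect the output file instead. The helper name output_looks_usable is hypothetical. The check assumes the default FASTA output format and assumes that a failed run leaves the output file missing or empty, which the source does not confirm.

```python
import os


def output_looks_usable(path):
    """Return True if path exists, is non-empty and starts like FASTA.

    Assumption: the alignment was written in the default FASTA format,
    so the first character of a valid file is '>'.
    """
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    with open(path) as handle:
        return handle.read(1) == ">"


# op1.aln is the output file name from the usage example above.
print(output_looks_usable("op1.aln"))
```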

Known bugs

None.

See also

Program name Description
edialign Local multiple alignment of sequences
emma Multiple sequence alignment (ClustalW wrapper)
eomega Multiple sequence alignment (ClustalO wrapper)
eomegapp Profile with profile (ClustalO wrapper)
eomegash Sequence with HMM (ClustalO wrapper)
eomegasp Sequence with profile (ClustalO wrapper)
infoalign Display basic information about a multiple sequence alignment
mse Multiple sequence editor
plotcon Plot conservation of a sequence alignment
prettyplot Draw a sequence alignment with pretty formatting
showalign Display a multiple sequence alignment in pretty format
tranalign Generate an alignment of nucleic coding regions from aligned proteins

Author(s)

Alan Bleasby
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

Please report all bugs to the EMBOSS bug team (emboss-bug@emboss.open-bio.org), not to the original author.

History

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments

None.