Your sequence must be a protein (amino acid) sequence in FASTA format, with or without the header line. For example:
>tr|Q9FAE8|Q9FAE8_9BURK Flagellin OS=Acidovorax avenae subsp. avenae OX=80870 GN=N1141-fla1 PE=3 SV=1
MASTINTNVSSLTAQRNLSLSQSSLNTSIQRLSSGLRINSAKDDAAGLAISERFTSQIRG
LNQAVRNANDGISLAQTAEGALKSTGDILQRVRELAVQSANATNSSGDRKAIQAEVGQLL
SEMDRIAGNTEFNGQKLLDGSFGSATFQVGANANQTITATTGNFRTNNYGAQLTASATGA
ATTGATAGSAGAAAGTVVIAGLQTKTVNVAAAGTASDIASAVNAVADSTGVTASARNVSE
MKFSGTGSFTLAVKGDNSTAANVTFNVSATSTAAGLAEAVKAFNDVSSQTGVTAKLNSDS
SGLILTNESGNDINIANGSSSAAGITLASQDAVTTQSSGTLTFTSATAAGTGVTVASRGT
VEYKSDKGYTVSGTGGTMTNATATSSTLTKVSDIDVSTVDGSTKALKIIDAALSAVNGQR
ASFGALQSRFETTVNNLQSTSENMSASRSRIQDADFAAETANLSRSQILQQAGTAMVAQANQLPQGVLSLLK
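If you want to check a query before pasting it in, the input rules above can be sketched in Python. This is a minimal illustration and not part of AnnoView: the function name `parse_single_protein` and the set of accepted residue letters are assumptions made for this example, and the server may apply different checks.

```python
# Hypothetical pre-submission check; AnnoView's own validation may differ.
# Accepted letters: the 20 standard amino acids plus common ambiguity/rare codes.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBXZUO*")


def parse_single_protein(text):
    """Return (header, sequence) for one record; header is None if absent."""
    header = None
    seq_lines = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A second header, or a header after sequence data, means more
            # than one record was pasted.
            if header is not None or seq_lines:
                raise ValueError("only one sequence is allowed per query")
            header = line[1:]
        else:
            seq_lines.append(line)
    sequence = "".join(seq_lines).upper()
    if not sequence:
        raise ValueError("no sequence found")
    invalid = set(sequence) - VALID_RESIDUES
    if invalid:
        raise ValueError(f"non-amino-acid characters: {sorted(invalid)}")
    return header, sequence
```

Note that a letter check like this cannot tell a nucleotide sequence from a protein one, because A, C, G, and T are also valid amino acid codes.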
The search uses DIAMOND in its default mode to perform a protein sequence similarity search against the AnnoView database, which is based on GTDB data (Release 95) and the AnnoTree database. The protein homology search criteria include the E-value, the coverage cut-off, and which database to search (bacteria or archaea). Annotations come from the AnnoTree database, in which every protein sequence is annotated with KEGG orthology identifiers, Pfam protein families, and TIGRFAM protein families.
This is not currently supported. Future versions of AnnoView will be updated to the latest GTDB release.
No. Only one protein sequence is allowed at a time.
No. You won’t be able to see the result even if the internet connection is restored; you’ll have to run the search again with your query.
No. The search result can only be saved after the gene neighborhood is displayed. This is something that will be developed for later versions of the tool.
Running a query against the archaea database typically takes around 15 seconds, while a query against the bacteria database typically takes about 6 minutes. These times may increase depending on the number of concurrent users.
Yes, our tool provides the flexibility to display multiple protein hits if they are found within a single genome. After entering your query, you will be directed to an intermediate page where you can choose which protein hits and their associated gene neighborhoods you would like to view. If there are multiple hits within one genome, you have the option to display either a single hit or select multiple hits for further exploration. This allows you to customize your results based on your specific interests or research requirements.
AnnoView currently accepts .gbk, .gff, and .csv files.
The .csv file should include the following columns: GTDB/nucleotide ID (must be in the first column), organism/species name (must be in the second column), GTDB gene ID/protein ID, start position, end position, CDS length, and at least one column with metadata to annotate the genes with (gene name, product, KEGG, Pfam, TIGRFAM, etc.). The following columns are optional: sequence, domain, phylum, class, order, family, genus, default center (to set a gene to center the rows around), and extra metadata/function annotation columns. Download a template here: Slr4_example.
Currently, the maximum size for a single file upload is 5 MB, and the maximum total file size for multiple files is 16 MB.
Only one .csv file can be uploaded at a time.
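The upload limits above can be checked locally before uploading. The following is a rough sketch only: `check_upload` is a hypothetical helper, and it assumes 1 MB means 1024 × 1024 bytes, which may not match how the server counts.

```python
import os

# Limits taken from the answers above; the byte definition of "MB" is an assumption.
ALLOWED_EXTENSIONS = {".gbk", ".gff", ".csv"}
MB = 1024 * 1024
MAX_SINGLE = 5 * MB
MAX_TOTAL = 16 * MB


def check_upload(files):
    """files: list of (filename, size_in_bytes). Returns a list of problems."""
    problems = []
    for name, size in files:
        ext = os.path.splitext(name)[1].lower()
        if ext not in ALLOWED_EXTENSIONS:
            problems.append(f"{name}: unsupported format {ext or '(none)'}")
        if size > MAX_SINGLE:
            problems.append(f"{name}: larger than 5 MB")
    if sum(size for _, size in files) > MAX_TOTAL:
        problems.append("total size exceeds 16 MB")
    csv_count = sum(1 for name, _ in files if name.lower().endswith(".csv"))
    if csv_count > 1:
        problems.append("only one .csv file can be uploaded at a time")
    return problems
```

For files on disk, the sizes can be collected with `os.path.getsize(path)` and passed in alongside the file names.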
Column explanation | .csv downloaded from Search GTDB | .csv downloaded from uploading NCBI .gbk/.gff | Columns required for upload |
Genome/assembly/nucleotide accession | GTDB ID | Nucleotide ID | Yes and must be the first column |
Organism name | Species | Organism | Yes and must be the second column |
Taxonomy level | Domain | - | Optional |
Taxonomy level | Phylum | - | Optional |
Taxonomy level | Class | - | Optional |
Taxonomy level | Order | - | Optional |
Taxonomy level | Family | - | Optional |
Taxonomy level | Genus | - | Optional |
Gene product | - | Product | Optional |
Protein ID | GTDB Gene ID | Protein ID | Yes |
Gene name | - | Gene | Optional |
Start location | Start | Start | Yes |
Stop location | End | End | Yes |
Orientation | Strand | Strand | Optional |
Nucleotide sequence length | CDS Length | CDS Length | Yes |
KEGG orthology | KEGG | - | Optional |
Pfam protein family | Pfam | - | Optional |
TIGRFAM protein family | TIGRFAM | - | Optional |
Center gene used for gene neighborhood clustering | Default Center | - | Optional |
Protein sequence | Sequence | Sequence | Optional |
Customized protein annotation by user | - | - | Optional |
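The column requirements in the table above can be expressed as a small header check. This is a sketch under assumptions: the header spellings are taken from the two download formats in the table, `check_header` is a hypothetical helper, and AnnoView itself may accept other spellings or apply further checks.

```python
import csv
import io

# Header names as listed in the table (GTDB download / NCBI-derived download).
FIRST_COLUMN = {"GTDB ID", "Nucleotide ID"}
SECOND_COLUMN = {"Species", "Organism"}
PROTEIN_ID = {"GTDB Gene ID", "Protein ID"}
REQUIRED_ANYWHERE = ["Start", "End", "CDS Length"]
# Columns that do not count as gene annotations; anything else (Gene, Product,
# KEGG, Pfam, TIGRFAM, or a user-defined column) is treated as an annotation.
NON_ANNOTATION = (
    FIRST_COLUMN | SECOND_COLUMN | PROTEIN_ID | set(REQUIRED_ANYWHERE)
    | {"Strand", "Sequence", "Default Center",
       "Domain", "Phylum", "Class", "Order", "Family", "Genus"}
)


def check_header(csv_text):
    """Return a list of problems found in the first (header) row."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    header = [h.strip() for h in header]
    problems = []
    if not header or header[0] not in FIRST_COLUMN:
        problems.append("first column must be the GTDB/nucleotide ID")
    if len(header) < 2 or header[1] not in SECOND_COLUMN:
        problems.append("second column must be the organism/species name")
    if not PROTEIN_ID & set(header):
        problems.append("missing protein ID column")
    for name in REQUIRED_ANYWHERE:
        if name not in header:
            problems.append(f"missing required column: {name}")
    if not [h for h in header if h and h not in NON_ANNOTATION]:
        problems.append("at least one annotation column is required")
    return problems
```

An empty list means the header satisfies the requirements as read from the table; it says nothing about the values in the rows below it.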
You can download the figure in .svg format, and edit it using a vector graphic editor such as Adobe Illustrator or Inkscape. An alternative way of saving or editing the visualization is to download the gene neighborhood dataset in .csv format. You can then add columns that contain taxonomic information, default centering instructions for the visualization, or protein annotations to the table. This table in .csv format can be re-uploaded to AnnoView.
Yes. Gene neighborhoods can be sorted and aligned based on a clustering algorithm implemented in our server. You can do this by picking a center gene first, and then right clicking on the gene and choosing “center on that gene”. This centers the rows on the first gene with a matching annotation (whichever annotations are currently being visualized) in each row. Alternatively, you can download the .csv file and change the default center gene to whichever gene you want (one in each row). You can also change the row order manually by dragging the gene neighborhood labels on the left side. Each row can also be moved left or right by grabbing it, so that you can align rows by hand. Rows can also be flipped by right clicking on a row and choosing “Flip track”.
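To illustrate what centering on the first matching gene means, here is a small sketch. It is not AnnoView's implementation: the function `center_offsets`, the tuple layout, and the choice to align gene midpoints are all assumptions made for this example.

```python
def center_offsets(rows, annotation):
    """Compute a horizontal shift per row that aligns the first matching gene.

    rows: dict mapping a row label to a list of (annotation, start, end)
          tuples in genomic order.
    Returns a dict mapping each label to the shift that places the midpoint
    of the first gene carrying `annotation` at position 0, or None when the
    row has no such gene.
    """
    offsets = {}
    for label, genes in rows.items():
        offsets[label] = None
        for ann, start, end in genes:
            if ann == annotation:
                # Assumption: rows are aligned on the gene's midpoint.
                offsets[label] = -(start + end) / 2
                break  # only the first match in the row is used
    return offsets


rows = {
    "genome_1": [("K1", 0, 100), ("K2", 200, 400)],
    "genome_2": [("K2", 1000, 1200), ("K2", 1500, 1600)],
    "genome_3": [("K9", 0, 10)],
}
print(center_offsets(rows, "K2"))
```

In this toy input, `genome_2` has two genes annotated `K2` and only the first one determines the shift, while `genome_3` has no match and is left unshifted.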
Yes. You can select one or multiple gene neighborhood tracks by clicking on their labels on the left, then right click on the labels and choose “Delete all selected tracks”.
These genes don’t have any annotations under the currently selected annotation category.
You can use the search function in the toolbar and type the annotation of the gene you want to locate. The browser will then highlight all the genes with that specific annotation. The genes in the search bar are sorted according to their frequency, allowing users to locate the most commonly occurring genes within the current gene neighborhoods.
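The frequency ordering described above can be pictured with a short sketch. This is only an illustration using made-up gene names; `annotations_by_frequency` is a hypothetical helper, and how AnnoView breaks ties is not stated, so the tie order here is an assumption.

```python
from collections import Counter


def annotations_by_frequency(rows):
    """rows: list of rows, each a list of annotation strings (empty = unannotated).

    Returns the distinct annotations, most frequent first. Ties keep the order
    in which the annotations were first seen.
    """
    counts = Counter(ann for genes in rows for ann in genes if ann)
    return [ann for ann, _ in counts.most_common()]


example_rows = [
    ["fliC", "flgL", ""],
    ["fliC", "fliD"],
    ["fliC", "flgL"],
]
print(annotations_by_frequency(example_rows))
```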
Chrome.
Please contact h29tan@uwaterloo.ca or use the following question form.