DNA & RNA
A database providing information on the structure of assembled genomes, assembly names and other meta-data, statistical reports, and links to genomic sequence data.
A curated set of metadata for culture collections, museums, herbaria and other natural history collections. The records display collection codes, information about the collections' home institutions, and links to relevant data at NCBI.
- BioProject (formerly Genome Project)
A collection of genomics, functional genomics, and genetics studies and links to their resulting datasets. This resource describes project scope, material, and objectives and provides a mechanism to retrieve datasets that are often difficult to find due to inconsistent annotation, multiple independent submissions, and the varied nature of diverse data types which are often stored in different databases.
The BioSample database contains descriptions of biological source materials used in experimental assays.
- Consensus CDS (CCDS)
A collaborative effort to identify a core set of human and mouse protein coding regions that are consistently annotated and of high quality.
- Database of Short Genetic Variations (dbSNP)
Includes single nucleotide variations, microsatellites, and small-scale insertions and deletions. dbSNP contains population-specific frequency and genotype data, experimental conditions, molecular context, and mapping information for both neutral variations and clinical mutations.
The NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data on a daily basis. GenBank consists of several divisions, most of which can be accessed through the Nucleotide database. The exceptions are the EST and GSS divisions, which are accessed through the Nucleotide EST and Nucleotide GSS databases, respectively.
- Influenza Virus
A compilation of data from the NIAID Influenza Genome Sequencing Project and GenBank. It provides tools for flu sequence analysis, annotation and submission to GenBank. This resource also has links to other flu sequence resources, and publications and general information about flu viruses.
- NCBI Pathogen Detection Project
A project involving the collection and analysis of bacterial pathogen genomic sequences originating from food, environmental and patient isolates. Currently, an automated pipeline clusters and identifies sequences supplied primarily by public health laboratories to assist in the investigation of foodborne disease outbreaks and discover potential sources of food contamination.
- Nucleotide Database
A collection of nucleotide sequences from several sources, including GenBank, RefSeq, the Third Party Annotation (TPA) database, and PDB. Searching the Nucleotide Database will yield available results from each of its component databases.
Database of related DNA sequences that originate from comparative studies: phylogenetic, population, environmental and, to a lesser degree, mutational. Each record in the database is a set of DNA sequences. For example, a population set provides information on genetic variation within an organism, while a phylogenetic set may contain sequences, and their alignment, of a single gene obtained from several related organisms.
A public registry of nucleic acid reagents designed for use in a wide variety of biomedical research applications, together with information on reagent distributors, probe effectiveness, and computed sequence similarities.
- RefSeqGene
A collection of human gene-specific reference genomic sequences. RefSeqGene is a subset of NCBI's RefSeq database, and its sequences are defined based on review from curators of locus-specific databases and the genetic testing community. They form a stable foundation for reporting mutations, for establishing consistent intron and exon numbering conventions, and for defining the coordinates of other biologically significant variation. RefSeqGene is a part of the Locus Reference Genomic (LRG) Collaboration.
- Reference Sequence (RefSeq)
A collection of curated, non-redundant genomic DNA, transcript (RNA), and protein sequences produced by NCBI. RefSeqs provide a stable reference for genome annotation, gene identification and characterization, mutation and polymorphism analysis, expression studies, and comparative analyses. The RefSeq collection is accessed through the Nucleotide and Protein databases.
- Sequence Read Archive (SRA)
The Sequence Read Archive (SRA) stores sequencing data from the next generation of sequencing platforms, including Roche 454 GS System®, Illumina Genome Analyzer®, Life Technologies AB SOLiD System®, Helicos Biosciences Heliscope®, Complete Genomics®, and Pacific Biosciences SMRT®.
- Third Party Annotation (TPA) Database
A database that contains sequences built from the existing primary sequence data in GenBank. The sequences and corresponding annotations are experimentally supported and have been published in a peer-reviewed scientific journal. TPA records are retrieved through the Nucleotide Database.
- Trace Archive
A repository of DNA sequence chromatograms (traces), base calls, and quality estimates for single-pass reads from various large-scale sequencing projects.
BLAST executables for local use are provided for Solaris, Linux, Windows, and Mac OS X systems. See the README file in the FTP directory for more information. Pre-formatted databases for BLAST nucleotide, protein, and translated searches are also available for download under the db subdirectory.
Sequence databases for use with the stand-alone BLAST programs. The files in this directory are pre-formatted databases that are ready to use with BLAST.
Sequence databases in FASTA format for use with the stand-alone BLAST programs. These databases must be formatted using formatdb before they can be used with BLAST.
- FTP: GenBank
This site contains files for all sequence records in GenBank in the default flat file format. The files are organized by GenBank division, and the full contents are described in the README.genbank file.
This site contains all nucleotide and protein sequence records in the Reference Sequence (RefSeq) collection. The "release" directory contains the most current release of the complete collection, while data for selected organisms (such as human, mouse and rat) are available in separate directories. Data are available in FASTA and flat file formats. See the README file for details.
- FTP: Sequence Read Archive (SRA) Download Facility
This site contains next-generation sequencing data organized by the submitted sequencing project.
This site contains the trace chromatogram data organized by species. Data include chromatogram, quality scores, FASTA sequences from automatic base calls, and other ancillary information in tab-delimited text as well as XML formats. See the README file for details.
This site contains the UniVec and UniVec_Core databases in FASTA format. See the README.uv file for details.
This site contains whole genome shotgun sequence data organized by the 4-digit project code. Data include GenBank and GenPept flat files, quality scores and summary statistics. See the README.genbank.wgs file for more information.
An online form that provides an interface for researchers, consortia and organizations to register their BioProjects. This serves as the starting point for the submission of genomic and genetic data for the study. The data does not need to be submitted at the time of BioProject registration.
- GenBank: BankIt
A web-based sequence submission tool for one or a few submissions to the GenBank database, designed to make the submission process quick and easy.
- GenBank: Barcode
Tool for submission to the GenBank database of Barcode short nucleotide sequences from a standard genetic locus for use in species identification.
- GenBank: Sequin
A stand-alone software tool developed by NCBI for submitting and updating entries to public sequence databases (GenBank, EMBL, or DDBJ). It is capable of handling simple submissions that contain a single short mRNA sequence, complex submissions containing long sequences, multiple annotations, segmented sets of DNA, as well as sequences from phylogenetic and population studies with alignments. For simple submissions, use the online submission tool BankIt instead.
- GenBank: tbl2asn
A command-line program that automates the creation of sequence records for submission to GenBank using many of the same functions as Sequin. It is used primarily for submission of complete genomes and large batches of sequences.
- Sequence Read Archive Submission
This link describes how submitters of SRA data can obtain a secure NCBI FTP site for their data, and also describes the allowed data formats and directory structures.
A single entry point for submitters to link to and find information about all of the data submission processes at NCBI. Currently, this serves as an interface for the registration of BioProjects and BioSamples and submission of data for WGS and GTR. Future additions to this site are planned.
- Trace Archive Submission
This link describes how submitters of trace data can obtain a secure NCBI FTP site for their data, and also describes the allowed data formats and directory structures.
Finds regions of local similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as to help identify members of gene families.
- Batch Entrez
Allows you to retrieve records from many Entrez databases by uploading a file of GI or accession numbers from the Nucleotide or Protein databases, or a file of unique identifiers from other Entrez databases. Search results can be saved in various formats directly to a local file on your computer.
Tools that provide access to data within NCBI's Entrez system outside of the regular web query interface. They provide a method of automating Entrez tasks within software applications. Each utility performs a specialized retrieval task, and can be used simply by writing a specially formatted URL.
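As a minimal sketch of what "a specially formatted URL" can look like, the Python snippet below assembles a request for the EFetch utility. The helper name `build_efetch_url` and the example accessions are illustrative assumptions; the base address and the `db`, `id`, `rettype` and `retmode` parameters follow the public E-utilities documentation, which should be checked for current usage limits and options.

```python
from urllib.parse import urlencode

# Base address of the E-utilities, as given in the public documentation.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_efetch_url(db, ids, rettype="fasta", retmode="text"):
    """Return an EFetch URL requesting the given records from one Entrez database.

    build_efetch_url is a hypothetical helper written for this example.
    """
    params = {
        "db": db,               # Entrez database name, e.g. "nuccore"
        "id": ",".join(ids),    # comma-separated identifiers or accessions
        "rettype": rettype,     # record format to return
        "retmode": retmode,     # encoding of the returned data
    }
    # safe="," keeps the separator between identifiers readable in the URL.
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params, safe=',')}"


if __name__ == "__main__":
    # Example accessions chosen only for illustration.
    print(build_efetch_url("nuccore", ["NM_000546", "NM_001126112"]))
```

The function only builds the URL; fetching it (for example with `urllib.request.urlopen`) is left to the caller so that request rates can be controlled.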
- Genome BLAST
This tool compares nucleotide or protein sequences to genomic sequence databases and calculates the statistical significance of matches using the Basic Local Alignment Search Tool (BLAST) algorithm.
- Genome Remapping Service
NCBI's Remap tool allows users to project annotation data and convert locations of features from one genomic assembly to another or to RefSeqGene sequences through a base-by-base analysis. Options are provided to adjust the stringency of remapping, and summary results are displayed on the web page. Full results can be downloaded for viewing in NCBI's Genome Workbench graphical viewer, and annotation data for the remapped features, as well as summary data, are also available for download.
- Genome Workbench
An integrated application for viewing and analyzing sequence data. With Genome Workbench, you can view data in publicly available sequence databases at NCBI, and mix these data with your own data.
- Open Reading Frame Finder (ORF Finder)
A graphical analysis tool that finds all open reading frames in a user's sequence or in a sequence already in the database. Sixteen different genetic codes can be used. The deduced amino acid sequence can be saved in various formats and searched against protein databases using BLAST.
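The idea of scanning a sequence for open reading frames can be sketched in a few lines of Python. This is not the ORF Finder implementation: it is a simplified illustration that assumes the standard genetic code only (ATG start; TAA, TAG and TGA stops), scans just the three forward-strand frames, and uses a hypothetical function name `find_orfs`.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code
START_CODON = "ATG"


def find_orfs(seq, min_codons=1):
    """Return (start, end, frame) tuples for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    min_codons counts codons from the start codon up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGCC"))
```

A fuller version would also scan the reverse complement and accept alternative genetic codes, as the tool described above does.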
The Primer-BLAST tool uses Primer3 to design PCR primers to a sequence template. The potential products are then automatically analyzed with a BLAST search against user-specified databases to check their specificity for the intended target.
A utility for computing alignments of proteins to genomic nucleotide sequences. It is based on a variation of the Needleman-Wunsch global alignment algorithm and specifically accounts for introns and splice signals. Due to this algorithm, ProSplign is accurate in determining splice sites and tolerant to sequencing errors.
- Sequence Viewer
Provides a configurable graphical display of a nucleotide or protein sequence and features that have been annotated on that sequence. In addition to use on NCBI sequence database pages, this viewer is available as an embeddable webpage component. Detailed documentation including an API Reference guide is available for developers wishing to embed the viewer in their own pages.
A utility for computing cDNA-to-Genomic sequence alignments. It is based on a variation of the Needleman-Wunsch global alignment algorithm and specifically accounts for introns and splice signals. Due to this algorithm, Splign is accurate in determining splice sites and tolerant to sequencing errors.
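For readers unfamiliar with the underlying method, the textbook Needleman-Wunsch recurrence can be sketched as follows. This is the plain global alignment algorithm with a linear gap penalty, not the intron- and splice-aware variation used by Splign or ProSplign; the function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of a and b (linear gap penalty)."""
    # Row 0: aligning the empty prefix of a with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] with the empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Either align a[i-1] with b[j-1], or open a gap in one sequence.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "ACT"))
```

Only two rows of the scoring matrix are kept, which is enough to obtain the score; recovering the alignment itself would require storing the full matrix or traceback pointers.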
A system for quickly identifying segments of a nucleic acid sequence that may be of vector origin. VecScreen searches a query sequence for segments that match any sequence in a specialized non-redundant vector database (UniVec).
- Save text searches and set up automated searches with E-mailed results
- Submit data to NCBI
- Submit sequence data to NCBI
- Retrieve all sequences for an organism or taxon
- View/download features around an object or between two objects on a chromosome
- Find a curated version of a sequence record (NCBI Reference Sequence)
- Find published information on a gene or sequence
- Find transcript sequences for a gene
- Link from an object on a map to another resource
- Design PCR primers and check them for specificity
- Download a large, custom set of records from NCBI