1. What is the NetAffx® Analysis Center?
The NetAffx Analysis Center distributes regular annotation updates that report associations between GeneChip® array probe sets and the constantly evolving set of transcript sequences and genome assembly builds from public data sources. Primary Affymetrix catalog arrays, including 3' IVT expression, exon, gene, and genotyping products, are supported.
2. How often do annotation updates happen?
The annotation updates are released twice a year, in mid-May and November, for all arrays. If the genome assembly build for a given organism is updated, that change is usually reflected in the following annotation update.
3. How do I know the annotation update has been released?
4. Will my browser work with the NetAffx site?
The NetAffx Analysis Center supports Internet Explorer versions 7 and 8 on PC and Mac. Firefox 3, Opera 9 and Safari 3 and later also work, although these browsers are unsupported.
5. What if I forget my Affymetrix.com password and/or login?
We will email your password to you after you supply the answer to your security question. Please click on the "Forgot Password?" link to retrieve the password. Alternatively, you can contact Affymetrix Application Support at 1-888-DNA-CHIP (1-888-362-2447) or email@example.com to reset your password.
6. The NetAffx site is not responding. Am I logging in correctly?
When the load on the servers is too high, it may help to close the browser and log back in to the site. Site maintenance, when it occurs, takes place in the early morning hours (North American Pacific Time) and occasionally leaves the Affymetrix website inoperable for a short period of time.
7. How do Quick/Batch/Standard Queries work on the NetAffx site?
The NetAffx site allows you to search through the content of as many as three arrays at once. There are three types of search; for particulars on the search terms, see the help on each respective Query page.
Quick Query searches for keywords from public transcripts, gene names, functional annotations and probe set IDs.
Standard Query allows searches for terms from more than 30 specific fields, including probe set design data, annotations from associated public transcripts, gene names, gene ontology annotations and genomic locations.
Batch Query enables users to upload lists of terms for a NetAffx query. At this time, Batch Query is limited to 3,000 probe sets or accessions or 250 keywords uploaded. The Batch Query accepts plain text files with one entry per line.
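Lists longer than these limits have to be uploaded in several files. The following is a minimal sketch of one way to prepare them, assuming the terms are already in memory as strings; the function name `split_batches` and the de-duplication step are illustrative choices, not part of the NetAffx site.

```python
from typing import Iterable, List

MAX_IDS = 3000       # probe sets or accessions per upload (limit stated above)
MAX_KEYWORDS = 250   # keywords per upload (limit stated above)


def split_batches(terms: Iterable[str], limit: int = MAX_IDS) -> List[str]:
    """Return plain-text file bodies with one term per line.

    Blank entries and repeated terms are dropped (an assumption made here
    for convenience), and each body holds at most `limit` terms.
    """
    cleaned = []
    seen = set()
    for term in terms:
        term = term.strip()
        if term and term not in seen:
            seen.add(term)
            cleaned.append(term)
    return [
        "\n".join(cleaned[i:i + limit]) + "\n"
        for i in range(0, len(cleaned), limit)
    ]
```

Each returned string can be written to its own text file and uploaded separately; pass `limit=MAX_KEYWORDS` when the list contains keywords rather than probe set IDs or accessions.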
8. Probe Match Tool
The Probe Match Tool takes any nucleotide sequence and compares it against the individual probes on a given catalog array, returning a list of the probe sets that correspond to the query sequence. This is convenient for users with nucleotide sequences not included in the public sequence record.
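The idea of matching a query against individual probes can be illustrated with a small sketch. This is not the NetAffx implementation: it assumes exact substring matching on either strand, and the `probes` mapping (probe set ID to probe sequences) is hypothetical input that you would build yourself from probe sequence data.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def match_probe_sets(query, probes):
    """Count, per probe set, the probes found exactly in the query.

    `probes` maps a probe set ID to a list of probe sequences.
    Both strands of the query are checked. Probe sets with no
    matching probe are omitted from the result.
    """
    forward = query.upper()
    reverse = reverse_complement(forward)
    hits = {}
    for probe_set_id, sequences in probes.items():
        count = sum(
            1 for p in sequences
            if p.upper() in forward or p.upper() in reverse
        )
        if count:
            hits[probe_set_id] = count
    return hits
```

The short sequences used to try this out can be arbitrary; real probes are longer, and a production tool would index the probes rather than scan them one by one.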
9. UCSC Query
The UCSC Query page allows users to see the genomic context of probes, probe sets, genes and public transcripts in the UCSC browser for any Affymetrix catalog GeneChip array with probes that have been mapped to a UCSC-hosted genome assembly.
Links to the UCSC browser are also available on all probe set, exon cluster or transcript cluster details pages.
10. BLAST Tool
The NetAffx BLAST page allows the comparison of a user-submitted nucleotide sequence to the target and consensus sequences (see Question 28) of 3' IVT expression arrays.
While the Probe Match Tool is most useful for relating probe sets to an exact match of a nucleotide sequence, the BLAST Tool is more useful for finding probe sets related to orthologous, homologous or paralogous nucleotide sequences.
Using the NetAffx Website
11. How can I find information for a given probe set?
From a Standard Query search page, select the "Probe Set ID" search field and enter the name or ID of the probe set of interest in the search term field.
A deep link to retrieve information for a given probe set can be constructed using instructions on the Direct Access to Probe Set Information page.
12. How can I find the information for a list of probe sets, keywords or gene names?
Batch Query enables users to upload lists of terms for a NetAffx query. At this time, Batch Query is limited to 3,000 probe sets or accessions or 250 keywords uploaded. Batch Query accepts plain text files with one entry per line. Click on the Batch Query link for the application area of interest in the left-hand column of any NetAffx page. For example, here's the Batch Query page for 3' IVT expression arrays.
13. How do I find the gene associated with a SNP?
The "Associated Genes" section in the NetAffx details page for the SNP lists the gene relationships. For SNPs that fall within genes, the structural region of the gene (exon, intron, UTR, CDS, etc.) that the SNP overlaps is indicated. For genes that are either upstream or downstream of the SNP, the distance between the SNP position and the start of the transcript is indicated. You can retrieve the NetAffx details page for a SNP represented on a mapping array by querying with either a probe set ID or a RefSNP ID.
14. Can I save my results?
When you have a list of probe sets, exon clusters or transcript clusters in table view, the export link will appear above the table. As many as 3,000 probe sets can be saved in one of several file formats through the "Export" link at the top of the results page.
15. How do I find my gene of interest?
From a Standard Query search page, select the "Gene Symbol" search field and enter the official name of the gene of interest in the search term field. Official names of genes can be obtained from UniGene.
16. Why do I see differences in genomic coordinates between the records at the public sequence databases and the ones in the NetAffx Analysis Center?
To ensure consistency, genomic coordinates for transcript records and probe sets are recalculated for each annotation update. This eliminates conflicts between genomic positions from differing genome builds, as well as version conflicts among the different sources used to assemble the NetAffx database.
17. How do I convert SNP IDs to TSC IDs?
For the RefSNPs represented on a GeneChip mapping array, perform a batch query using the RefSNP IDs and display results in a custom view with the TSC ID field, which can then be downloaded. TSC IDs are available only for the Mapping 10K and 100K Arrays.
18. Why are there so many transcripts for a particular probe set?
Most of the probe sets are related to more than one transcript, and while the NetAffx annotation collection removes some redundant transcripts, all associated alternative transcripts and known variants are provided for each probe set.
The NetAffx database provides transcript sequence data from GenBank, RefSeq, Ensembl and organism-specific databases such as WormBase and FlyBase. The NetAffx annotation collection removes redundant GenBank transcripts but does not remove duplicates that are derived from different data sources.
19. What are custom annotation views?
On the NetAffx navigation menu, the annotation views menu selection allows users to create custom table views for an array type, selecting from more than 80 different data columns to be included in the custom view.
Any custom view can be selected for any query from the same technology (e.g., 3' IVT, exon/gene or mapping arrays) using the pull-down menu above the results table on the Query Results page.
20. Why are there so few annotations for my array of interest?
The NetAffx database covers transcript sequence data from GenBank, RefSeq, Ensembl and organism-specific databases such as WormBase or FlyBase, and does not track EST-based sequence data. Expression arrays are designed using EST sequence evidence in addition to transcript sequences. The NetAffx Analysis Center will report EST-based evidence known at the time of array design. The NetAffx annotation coverage for an array is only as thorough as the respective public sequence repositories. If public sequence information is lacking for a given organism, so is the NetAffx annotation.
21. What information is on the NetAffx details page?
The details page for each given probe set is a compilation of all nucleotide, genomic, proteomic and functional data that has been collected from all transcripts believed to be interrogated by the probe set on the basis of the NetAffx transcript assignment pipeline.
For definitions of the content on the details pages, glossaries are available for every term employed in the different application areas: 3' IVT Expression, whole-transcript exon/gene and mapping.
NetAffx Annotation Data
22. What data do NetAffx annotation updates include?
Annotation updates incorporate current releases from GenBank, RefSeq, Ensembl, UniGene, Entrez Gene, UniProt and UCSC, as well as sequences from other organism-specific databases. The sources for each annotation update since release 20 are documented and available on the NetAffx Support Materials Page.
23. Why can't I download all the annotations that I want through the NetAffx export?
You can export the results of a query of as many as 10,000 probe sets to your desktop. To facilitate GeneChip array data analysis, the NetAffx site also provides all annotation data in bulk files.
You will find a link to download the annotations in bulk comma-separated values (CSV) files on the support page for your array.
24. How are annotation updates generated?
To optimize the coverage and quality of probe set annotations, the NetAffx annotation pipeline consists of a set of tiered methods that provide the broadest coverage of the arrays without compromising on the assignment confidence.
For full details, see the white paper, Transcript Assignment for NetAffx Annotations.
25. The annotations I used for my analysis are different than the ones I have now, and I can't find the same result I got.
Archival NetAffx annotation updates starting from July 2006 (release 20) are available on the support page for each respective GeneChip array.
26. What is "design time"?
The "Probe Design Information" section of a probe set details page describes the original evidence used to design the probe set. This design time data could be years old and therefore may not reflect the most accurate assessment of what the probe set detects based on currently available public sequence repositories.
However, the design time reference data is particularly useful in cases where a probe set may not be associated with a well-characterized sequence from the current public transcript record.
27. What does the grade of an IVT probe set mean?
The NetAffx Analysis Center tracks five levels of relationships between IVT probe sets and the current transcript record. The letter annotation grade corresponds to the class of evidence described in the annotation description field, also summarized below. For more information, see annotation grade below or read the white paper, Transcript Assignment for NetAffx Annotations.
- Grade A - Nine or more probes from the probe set match this transcript perfectly. For all other grades, fewer than nine probes had perfect matches to the transcript.
- Grade B - The transcript and the probe set's target sequence overlap on the genome. Some probes fail to match the transcript presumably because the 3' end of the transcript is truncated in the record.
- Grade C - The transcript and the probe set's consensus/exemplar sequence overlap on the genome. The transcript does not overlap the target sequence, presumably due to truncation of the 3' end.
- Grade E - No transcripts are known to correspond to this probe set at this time, but a UniGene EST cluster is known to correspond to it.
- Grade R - No transcript currently supports this probe set. Annotation is based on a representative sequence from the original design data.
Only the transcripts with the highest available assignment grade are referenced in the NetAffx Analysis Center. So, if transcripts with Grade A transcript assignments are available, Grade B, C and matching EST data (Grades E and R) will not be displayed.
Note that due to the sheer volume of information, EST evidence related to a probe set is not updated; only non-EST transcript associations are updated.
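The "highest available grade" rule above can be sketched as a small filter. This is an illustration only: the ranking A, B, C, E, R follows the order in which the grades are listed above (the relative order of E and R is an assumption), and the `(transcript_id, grade)` tuples are a hypothetical input format.

```python
# Ranking assumed from the order in which the grades are listed above.
GRADE_ORDER = ["A", "B", "C", "E", "R"]


def best_grade_assignments(assignments):
    """Keep only the assignments that carry the highest available grade.

    `assignments` is a list of (transcript_id, grade) tuples for one
    probe set. Returns the subset with the best grade, in input order.
    """
    if not assignments:
        return []
    rank = {grade: i for i, grade in enumerate(GRADE_ORDER)}
    best = min(rank[grade] for _, grade in assignments)
    return [(t, g) for t, g in assignments if rank[g] == best]
```

For a probe set with one Grade A and two Grade B assignments, only the Grade A assignment is kept; if no Grade A assignment exists, both Grade B assignments are returned.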
28. What is a consensus or exemplar sequence?
Consensus/exemplar sequences are compiled at the time of array design to represent all the known transcripts that the GeneChip® probe set measures. A consensus sequence results from base-calling algorithms that align and combine full-length and partial cDNA and EST sequence data into groups. An exemplar sequence is a representative transcript sequence for each gene.
29. Where can I find probe, cluster, exemplar and consensus sequences?
The probe sequences are described on the bottom of the probe set details page.
30. Why are the alleles reported in the NetAffx SNP details page different from the alleles in dbSNP?
At the time of array design, Affymetrix follows its own rules for choosing one of the two strands as the forward strand for allele definition. As a result, Affymetrix may report alleles that are complementary to those listed in dbSNP for the same SNP. For details on this allele naming convention, see the FAQ entry.
31. Which strand is given on the NetAffx SNP details page?
At the time of array design, Affymetrix follows its own rules for choosing one of the two strands as the forward strand for allele definition. This strand is reported on the NetAffx SNP details page. For details on this allele naming convention, see the FAQ entry.
32. What are orthologs and how do I use them? Where does ortholog data come from?
Orthologs are genes in different species, typically with similar function, that evolved from a common ancestor by speciation. Ortholog annotations can be used to infer the function of a poorly annotated gene in one species by comparing it with a well-annotated ortholog from a different species. Sometimes, GeneChip arrays designed from one species can be used to study gene expression in an orthologous species. Ortholog annotations are useful in such cases to identify genes that display interesting expression patterns.
Ortholog annotations in the NetAffx Analysis Center are derived from the HomoloGene database at NCBI. These orthologs may be "curated" (derived from literature) or "computed" (by determining reciprocal best hits in an all-versus-all comparison of protein sequences from different organisms using a program such as BLASTP).
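The "reciprocal best hits" idea mentioned above can be sketched in a few lines. This is a generic illustration, not the HomoloGene pipeline: it assumes you already have similarity scores (for example, BLASTP bit scores) as dictionaries keyed by `(query, subject)` pairs, and all identifiers are made up.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` maps (query, subject) pairs to a numeric score.
    On ties, the pair seen first is kept.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)
```

A protein whose best hit in the other organism points back to a different protein is excluded, which is what makes the criterion stricter than a one-directional best hit.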
33. What are Array Comparison Spreadsheets?
Array Comparison Spreadsheets list similar probe sets between two GeneChip arrays. Similarity between probe sets on different arrays is assessed by comparing the design sequences (probes along with consensus/exemplar or transcript cluster sequences) between two arrays using the BLASTN program. The identity cutoff and other rules used to define this similarity are described in the User's Guide to Product Comparison Spreadsheets. The spreadsheets can be "cross-species" or "same-species" comparison spreadsheets. The cross-species spreadsheets compare probe sets between two arrays from different species, for example, the Human Genome U133 Plus 2.0 Array versus the Canine Genome 2.0 Array. The same-species comparison spreadsheets compare probe sets between two different versions of an array from the same species, for example, the Human Genome U133 Plus 2.0 Array versus the Human Genome U95 Set.
NetAffx Download Files
34. What are the annotation update download files?
To support in-house customer bioinformatics efforts, the NetAffx annotation collection is available in a bulk download as a comma-separated values (CSV) file by array product. These files are readable by spreadsheet programs such as Microsoft Excel and can be easily imported into an RDBMS such as Microsoft Access, SQLite or MySQL. The download file is provided in a compressed (zip) archive and is bundled with a README file describing details of the file content and format.
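As an illustration of the RDBMS import, here is a minimal sketch that loads CSV text into SQLite using only the Python standard library. The function name, the table name and the column names used when querying are assumptions for the example; consult the bundled README for the actual columns. Skipping lines that begin with "#" is likewise an assumption about possible comment lines and is harmless if none are present.

```python
import csv
import io
import sqlite3


def load_annotation_csv(conn, table, csv_text):
    """Create `table` from the CSV header row and insert every data row.

    All columns are created without a declared type. Lines starting
    with '#' are skipped (assumed to be comments, see the README).
    """
    lines = [ln for ln in csv_text.splitlines() if not ln.startswith("#")]
    reader = csv.reader(io.StringIO("\n".join(lines)))
    header = next(reader)
    # Quote identifiers so column names with spaces are accepted.
    columns = ", ".join('"{}"'.format(h.replace('"', '""')) for h in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), reader
    )
    conn.commit()
```

After unzipping the download, read the CSV file into a string, open a connection with `sqlite3.connect("annotations.db")` and pass both to the function; the table can then be queried with ordinary SQL.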
35. Where can I find annotation update download files?
Links to download files are listed on the support page for each respective product. They can also be downloaded from the Annotation Files Index Listing, or as listed in the NetAffx SDK XML notification.
36. Where is the guide for annotation download files contents?
Details on the file content and format are described in a README file that is bundled with each downloadable data file in a compressed zip archive.
37. What are CSV and TSV?
Comma- and tab-separated values files (CSV and TSV, respectively) are plain text files that hold spreadsheet-style tables, with the individual columns separated by commas or tabs. They are readily machine-readable and can be loaded into spreadsheet programs such as Excel or an RDBMS such as Microsoft Access, SQLite or MySQL.
Because individual data fields can contain commas, the CSV data fields are enclosed in double quotes.
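Because of the quoting, splitting a line on commas by hand will break fields that contain commas; a CSV-aware parser avoids this. The sketch below converts quoted CSV text to TSV with Python's standard `csv` module. The function name is illustrative, and it assumes that no field contains a tab character.

```python
import csv
import io


def csv_to_tsv(csv_text):
    """Convert quoted CSV text to tab-separated text.

    The csv module handles the double-quoted fields, so commas inside
    a field stay within that field instead of creating extra columns.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

A naive `line.split(",")` on a row such as `"ps_1","kinase, type 2"` yields three pieces, whereas the parser above yields the intended two fields.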
Details on the file content and format are described in a README file that is bundled with each downloadable data file in a compressed zip archive.