Coping with uncertainties in the description of sequence variants
This page discusses how to cope with uncertainties in the description of sequence variants; where available, we link to existing recommendations. Do you agree or disagree, did we miss cases, or do you want to make suggestions? Please contact us by e-mail (to: HGVSmn @ JohanDenDunnen.nl) or use the HGVS-mutnomen Facebook.
Often clear changes can be detected in the genome, but a precise description in relation to the genomic sequence is not possible. Examples include cytogenetically detectable changes, changes detected using Fluorescence In Situ Hybridisation (FISH) and changes detected using array technology. Other examples are cases where genomic deletions / duplications are detected indirectly using RNA analysis; although the change can be described exactly at RNA-level, the genomic sequence spanning the break point of the rearrangement is required to describe the change at DNA-level. Finally, there are examples, mostly from older publications, where the changes detected are not described precisely ("an 11 nucleotide deletion in exon 3") or uniquely ("changing amino acid Gly-17 to Arg").
In many diseases changes are found which delete or duplicate (sets of) whole exons and thereby seriously affect the function of the gene. These changes are detected with technologies like PCR, Southern blotting, MAPH and MLPA. The analysis, however, does not reveal the breakpoints at a molecular level. Incorporation of these variants in sequence variation databases is very important, above all to be able to determine their frequency. In addition, e.g. in Duchenne/Becker Muscular Dystrophy (DMD/BMD), the breakpoints can be predictors for the severity of the disease: a truncated reading frame causes DMD, an open reading frame BMD. There is a clear demand to be able to include such changes in catalogs of all DNA sequence variants found.
It is clear that the goal should be to describe these variants as precisely as possible. However, in real life this is often not possible, and one should ask whether a lot of detail adds useful information. The deletion of genomic sequences can be detected using several different methods, including FISH, Southern blot, quantitative PCR, MAPH and MLPA. One could argue that, to be as precise as possible, the description of the deletion should include information on the probe sequences used. With some methods, the presence of a probe signal only indicates that a (large) part of the probe is present (e.g. BAC-derived FISH signals). In PCR-based methods the absence of a signal does not mean that the entire target sequence is missing; PCR results are negative when only one of the two primer annealing sites is deleted or mutated (see FAQ).
In MLPA, a pair of short oligonucleotides is used to detect the copy number of a specific sequence (mostly an exon). Effectively, when the signal for a probe is decreased, it only indicates that the pair of oligonucleotide sequences (20-30 nucleotides) do not both hybridize to the target sequence and can therefore not be ligated, extended and detected. It is common practice (after excluding other variants when only 1 exon probe is affected) to describe the result as a change affecting the entire set of sequences (exons), like "a deletion of exons 23-27" or "c.3163-?_3786+?del". If the location of the probe had to be included, the description would become too complex to be useful. Furthermore, assuming the probe hybridizes from position c.3211 to c.3236, which location should be taken? Only part of the target sequence might be deleted.
To indicate uncertainties in the description of sequence changes, the question mark ("?") and brackets ("()") are used. When the exact position of a change is not known, the range of the uncertainty is listed between brackets (like (5' border_3' border)) and one should describe the change on DNA-level as precisely as possible. When it is difficult to give an exact nucleotide position for a specific probe/sequence tested, a rule of thumb is to use the central nucleotide.
Based on these considerations the description of a deletion has the format: (last-positive_first-negative)_(last-negative_first-positive). Details of the description are based on the technology used to detect the change:
For further details see below.
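As a minimal sketch, the basic format above can be assembled from the four positions that delimit the uncertainty. The function name, the default "c." prefix and the example coordinates are illustrative assumptions and are not taken from the recommendations; a border that was not tested is written as "?".

```python
def describe_uncertain_deletion(last_positive, first_negative,
                                last_negative, first_positive,
                                prefix="c."):
    """Build (last-positive_first-negative)_(last-negative_first-positive)del.

    Each argument is a position (int or str), or None when that border
    is unknown. All names here are hypothetical helpers for illustration.
    """
    def pos(p):
        # An unknown border is written as a question mark.
        return "?" if p is None else str(p)

    start = f"({pos(last_positive)}_{pos(first_negative)})"
    end = f"({pos(last_negative)}_{pos(first_positive)})"
    return f"{prefix}{start}_{end}del"


# Hypothetical coordinates, chosen only to illustrate the format.
print(describe_uncertain_deletion(100, 250, 900, 1200))
# Outer borders not tested on either side.
print(describe_uncertain_deletion(None, 250, 900, None))
```

Keeping the four borders as separate arguments makes it explicit which regions were actually tested and which were not.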
For duplications the same recommendations hold, with the same basic format (last-positive_first-negative)_(last-negative_first-positive), except that duplications are designated by "dup" instead of "del" (see Recommendations). It should be noted, however, that the description "dup" may by definition (see Standards) only be used when the additional copy is located directly 3'-flanking the original copy (a tandem duplication). In most cases there will be no experimental proof; one simply detects the presence of an additional copy that can be anywhere in the genome (inserted / transposed). Discussions are ongoing on how best to include this uncertainty in the description (see Recommendations). It should be clear, though, that describing a duplication like c.3163-?_3786+?dup is in general not correct.
When the exact position of a change is not known, the range of the uncertainty is listed between brackets ("()", see Recommendation). Similarly, when insertions have not been specified (e.g. "ins5") or when an insertion was not sequenced but its length estimated (e.g. from gel electrophoresis), brackets are used to indicate the uncertainty.
The description should use the basic format (last-positive_first-negative)_(last-negative_first-positive), be based on the reference sequence used and include the position of the most extreme region(s) tested (e.g. segment PCR-ed, probes used for hybridisation, etc.).
For clarity, and to keep descriptions from specific tests (e.g. MLPA) from becoming too complicated, it is allowed to describe changes on the assumption that when a probe for a specific exon scores deleted (duplicated), the entire exon is deleted (duplicated), i.e. detailed knowledge regarding the exact location of the probe sequence used is not used in the description.
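The exon-level shorthand can be sketched as follows, reproducing the example c.3163-?_3786+?del given earlier for a deletion of exons 23-27. The function name and its arguments are hypothetical; the sketch assumes the first coding position of the first affected exon and the last coding position of the last affected exon are known from the reference sequence, while the intronic breakpoints are not.

```python
def describe_exon_level_deletion(first_exon_start, last_exon_end):
    """Describe a deletion of whole exons with unknown intronic breakpoints.

    first_exon_start: first coding-DNA position of the first affected exon
    last_exon_end:    last coding-DNA position of the last affected exon
    (hypothetical helper; probe locations are deliberately ignored)
    """
    if first_exon_start > last_exon_end:
        raise ValueError("first_exon_start must not exceed last_exon_end")
    # "-?" marks an unknown breakpoint in the upstream intron,
    # "+?" an unknown breakpoint in the downstream intron.
    return f"c.{first_exon_start}-?_{last_exon_end}+?del"


# Positions taken from the example in the text (exons 23-27).
print(describe_exon_level_deletion(3163, 3786))
```

The sketch covers deletions only, since a "dup" description additionally implies a tandem arrangement that these tests usually do not demonstrate.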
NOTE: it should be clearly indicated which technology was used (MLPA, PCR, etc.) and where primer/probe sequences can be found. For rearrangements affecting a single exon only, it should be indicated whether DNA sequencing was performed to exclude variants affecting the primer/probe target sequences. When different probes for one exon score differently, this information must be used in the description of the sequence change (see FAQ).
Deletions are designated by "del" after an indication of the first and last nucleotide(s) deleted (see Recommendations).
Many chromosomal rearrangements, especially in the past, have been detected using techniques like Southern blotting and Fluorescence In Situ Hybridisation (FISH). The description of these changes was often in tabular or graphical format based on either position-ordered probe names or relative chromosomal positions. Especially when FISH was used, relatively little or often even no actual DNA sequence of the probe(s) used was known. In those days, even when a probe sequence was known, this information was of little help to determine the probe location more precisely.
With the availability of the reference human genome sequence, this situation has dramatically changed. Now, any piece of DNA probe sequence can be used to position that probe with great precision on the human genome map. In addition, the clones used to generate the human genome sequence are freely available and have become the preferred probes for new FISH experiments. The latter is especially true for genome-wide array-CGH experiments using ordered PAC/BAC-clones. As a consequence of these developments it is now possible to describe these changes based on DNA sequences. NOTE: see Discussion.
Following the recommendation to describe rearrangements using the format (5' border_3' border), for FISH probes this becomes (last-positive-clone_first-negative-clone)_(last-negative-clone_first-positive-clone).
Basically, chromosomal rearrangements and other DNA sequence variants detected using array technology can, based on the array-probe sequences used, be described as those for FISH-detected rearrangements (see FISH-detected rearrangements). An advantage here is that the array probes used are often exactly defined, being mostly relatively short 20-60-mer oligonucleotide sequences. This information can be used to exactly describe the rearrangements at the nucleotide level.
For deletions the basic format is (last-probe-present_first-probe-deleted)_(last-probe-deleted_first-probe-present).
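A minimal sketch of how the four borders could be derived from an ordered set of probe results is given below. The function name, the input layout (one representative position per probe, e.g. its central nucleotide, plus a present/deleted flag) and the "g." prefix are assumptions made for illustration only; the sketch expects a single contiguous run of deleted probes.

```python
from typing import List, Tuple


def deletion_from_probes(probes: List[Tuple[int, bool]],
                         prefix: str = "g.") -> str:
    """Derive (last-present_first-deleted)_(last-deleted_first-present)del.

    probes: (position, present) pairs, one per probe (hypothetical layout).
    """
    ordered = sorted(probes)
    deleted = [i for i, (_, present) in enumerate(ordered) if not present]
    if not deleted:
        raise ValueError("no deleted probe found")
    first, last = deleted[0], deleted[-1]
    # Reject results where a present probe lies between deleted probes.
    if any(ordered[i][1] for i in range(first, last + 1)):
        raise ValueError("deleted probes are not contiguous")
    # A missing flanking probe leaves that border unknown ("?").
    last_present = ordered[first - 1][0] if first > 0 else "?"
    first_present = ordered[last + 1][0] if last + 1 < len(ordered) else "?"
    return (f"{prefix}({last_present}_{ordered[first][0]})"
            f"_({ordered[last][0]}_{first_present})del")


# Hypothetical probe positions and results.
results = [(1000, True), (2000, True), (3000, False),
           (4000, False), (5000, True)]
print(deletion_from_probes(results))
```

Sorting by position first means the probes can be supplied in any order, for example as they appear in an array export.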
A nomenclature system to describe cytogenetically detectable rearrangements was suggested early on (see ISCN 1985). Current recommendations in this area are made by the "Standing Committee on Human Cytogenetic Nomenclature" and were published recently as ISCN 2013.
When two sequence changes are found in one gene of an individual but it is unknown whether they are located on the same or on different chromosomes, the change is described using the format c.[76A>C(;)483G>C] (see Recommendations).
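The format for two changes of unknown phase can be sketched as a simple string template. The function name is a hypothetical helper; the example reproduces the description given above.

```python
def combine_unknown_phase(first_change: str, second_change: str,
                          prefix: str = "c.") -> str:
    """Join two changes in one gene when it is unknown whether they are
    on the same chromosome: "(;)" separates them inside square brackets."""
    return f"{prefix}[{first_change}(;){second_change}]"


print(combine_unknown_phase("76A>C", "483G>C"))  # c.[76A>C(;)483G>C]
```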