Once the mutations present in the DNA of cancer cells are known, researchers can identify which proteins have been altered. This can help:
to make a diagnosis, by spotting which altered proteins are responsible for the tumour’s progression;
to choose an appropriate treatment, by spotting which altered protein could be targeted by a drug and/or which altered protein could be involved in drug resistance.
SEQUENCE THE DNA
To establish the DNA profile of a tumour, the DNA of a biopsy has to be sequenced.
DNA is made of two ‘complementary’ strands. Each strand is a succession of four molecules (nucleotides) symbolized by the letters a, c, g and t: a always binds to t and c always binds to g.
‘DNA sequencing’ entails determining the order in which the four nucleotides appear in one of the two DNA strands.
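The pairing rule described above means that reading one strand is enough to know the other. A minimal sketch of that idea in Python follows; the function name and the example sequence are made up for illustration.

```python
# Pairing rule from the text: a always binds to t, c always binds to g.
PAIRING = {"a": "t", "t": "a", "c": "g", "g": "c"}


def complement(strand: str) -> str:
    """Return, position by position, the nucleotide facing each letter of `strand`."""
    return "".join(PAIRING[nucleotide] for nucleotide in strand.lower())


example = "gattaca"  # made-up sequence, for illustration only
partner = complement(example)
print(example)
print(partner)

# Applying the rule twice gives back the starting strand.
print(complement(partner) == example)
```

Because each letter has exactly one partner, the two strands carry the same information, which is why sequencing only one of them is sufficient.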
NEXT GENERATION SEQUENCING (NGS) TECHNIQUES
One DNA sequencing technique has revolutionized the life sciences: high-throughput sequencing, known as NGS (Next Generation Sequencing).
NGS is a technique with which scientists can sequence huge quantities of DNA in record time.
Over 10 billion fragments of DNA, each about 150 to 300 nucleotides long, can be sequenced within only a few hours!
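To get a feel for these figures, the short calculation below multiplies the number of fragments by their length. It is only a back-of-the-envelope sketch: the function names are made up, and the 6 billion letters used for comparison is the size quoted further on in this text for the DNA of one cell.

```python
FRAGMENTS = 10_000_000_000       # "over 10 billion fragments"
GENOME_LETTERS = 6_000_000_000   # size of the 'text' quoted later in this document


def total_nucleotides(fragments: int, fragment_length: int) -> int:
    """Total number of letters read in one sequencing run."""
    return fragments * fragment_length


def genome_equivalents(fragments: int, fragment_length: int) -> int:
    """How many times the 6-billion-letter text could be covered by the run."""
    return total_nucleotides(fragments, fragment_length) // GENOME_LETTERS


for length in (150, 300):
    print(length, total_nucleotides(FRAGMENTS, length), genome_equivalents(FRAGMENTS, length))
```

In other words, a single run reads between 1.5 and 3 trillion letters, the equivalent of several hundred copies of the text.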
NGS SEQUENCING, STEP BY STEP
The complementary strands are synthesized in the presence of a, c, g and t – the four nucleotides – each of which is linked to a different-coloured fluorophore (a fluorescent chemical compound). One nucleotide is inserted at a time by a protein known as DNA polymerase. Nucleotide a always binds nucleotide t and nucleotide c always binds nucleotide g.
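The principle described above can be mimicked with a toy simulation: at each cycle one labelled nucleotide is inserted, a colour is recorded, and the colours are then translated back into letters. This is a simplified sketch, not a description of how a real sequencing instrument works. The colour assigned to each nucleotide is an arbitrary assumption (the text only says the colours differ), and the function names are hypothetical.

```python
from __future__ import annotations

# Pairing rule from the text.
PAIRING = {"a": "t", "t": "a", "c": "g", "g": "c"}

# Assumed colours: the text does not say which colour goes with which nucleotide.
COLOUR_OF = {"a": "green", "c": "blue", "g": "yellow", "t": "red"}
NUCLEOTIDE_OF = {colour: nucleotide for nucleotide, colour in COLOUR_OF.items()}


def synthesize_and_record(template: str) -> list[str]:
    """Insert one labelled nucleotide per cycle and return the colour seen at each cycle."""
    colours = []
    for nucleotide in template:
        inserted = PAIRING[nucleotide]       # what the polymerase adds opposite the template
        colours.append(COLOUR_OF[inserted])  # the flash recorded for this cycle
    return colours


def decode(colours: list[str]) -> str:
    """Translate the recorded colours back into the synthesized strand."""
    return "".join(NUCLEOTIDE_OF[colour] for colour in colours)


template = "tacg"  # made-up fragment
flashes = synthesize_and_record(template)
synthesized = decode(flashes)
print(flashes)
print(synthesized)
```

Reading the succession of colours is therefore equivalent to reading the sequence of the newly synthesized strand, one letter per cycle.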
INDEX THE MUTATIONS
To index mutations found in the DNA of the tumour cells, the DNA sequences obtained by NGS are aligned and compared with the DNA sequence of the human reference genome.
It is a huge undertaking that can only be accomplished with the help of bioinformatics tools.
ALIGNING, COMPARING, VALIDATING
When these DNA sequences are aligned, researchers can ‘see’ which nucleotides in the DNA of cancer cells differ from those in the human reference genome.
For a difference to be validated as a ‘mutation’, a minimum number of fragments must show the same difference.
Several hundred mutations are usually validated.
In the example given here, there is a t -> a mutation (*).
Nucleotide a is found in several fragments where it replaces nucleotide t found in the DNA sequence of the human reference genome (in grey).
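The validation rule described above (a difference counts only if enough fragments show it) can be sketched in a few lines of Python. This is a deliberately simplified illustration: the fragments are assumed to be already aligned, the sequences are made up, and the threshold of 3 fragments is an arbitrary assumption, not a value given in the text. Real analysis tools are far more elaborate.

```python
from collections import Counter, defaultdict


def call_mutations(reference, aligned_fragments, min_fragments=3):
    """List the positions where at least `min_fragments` fragments agree on a
    nucleotide that differs from the reference.

    aligned_fragments: list of (start, sequence) pairs, where `start` is the
    0-based position of the fragment's first nucleotide on the reference.
    """
    observed = defaultdict(Counter)  # position -> nucleotide -> number of fragments
    for start, sequence in aligned_fragments:
        for offset, nucleotide in enumerate(sequence):
            observed[start + offset][nucleotide] += 1

    mutations = []
    for position in sorted(observed):
        for nucleotide, count in observed[position].items():
            if nucleotide != reference[position] and count >= min_fragments:
                mutations.append((position, reference[position], nucleotide, count))
    return mutations


reference = "ctacagtgaaatc"  # made-up stretch of reference genome
fragments = [
    (0, "ctacagag"),   # carries a instead of t at position 6
    (2, "acagagaa"),   # same difference
    (4, "agagaaatc"),  # same difference
    (3, "cagtgaaa"),   # identical to the reference
    (5, "gtgcaatc"),   # isolated difference at position 8, seen only once
]

for position, ref, alt, count in call_mutations(reference, fragments):
    print(f"position {position}: {ref} -> {alt} (seen in {count} fragments)")
```

Here the t -> a difference is kept because three fragments agree on it, while the difference seen in a single fragment is discarded as a probable reading error.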
ON THE HUMAN REFERENCE GENOME
We refer to the human genome as though there were only one. But nothing is further from the truth: each one of us has a unique genome.
There are about 3 million differences between our genome and our neighbour’s!
“THE” human genome is in fact a mosaic of bits of genomes which belong to a dozen different individuals.
This ‘mosaic’ human genome is used as a reference among researchers. A mutation is identified as such if a nucleotide in the DNA that is being studied differs from the nucleotide at the same position in the DNA of the reference genome.
A mutation is defined by a change observed (for example a -> t) and its position (chr7:140’753’336) on the current version of the human reference genome (GRCh38.p12).
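The definition above has three ingredients: the change, the position, and the version of the reference genome. One possible way of recording them is sketched below; the class and function names are hypothetical, and the sketch assumes positions are counted from 1, as is usual for this kind of notation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    assembly: str    # version of the reference genome, e.g. "GRCh38.p12"
    chromosome: str  # e.g. "chr7"
    position: int    # position on the chromosome (assumed to be counted from 1)
    reference: str   # nucleotide found in the reference genome
    observed: str    # nucleotide found in the DNA being studied


def parse_mutation(location: str, change: str, assembly: str) -> Mutation:
    """Build a Mutation from strings such as "chr7:140’753’336" and "a -> t"."""
    chromosome, raw_position = location.split(":")
    # Keep only the digits, so that any thousands separator is ignored.
    digits = "".join(character for character in raw_position if character.isdigit())
    reference, observed = (part.strip() for part in change.split("->"))
    return Mutation(assembly, chromosome, int(digits), reference, observed)


m = parse_mutation("chr7:140’753’336", "a -> t", "GRCh38.p12")
print(m)
```

Keeping the genome version alongside the position matters, because the same position number can point to a different nucleotide in another version of the reference.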
LOOKING FOR A NEEDLE IN A HAYSTACK
Analyzing all the DNA present in cancer cells is a very long and complex process.
Searching for several dozen ‘spelling mistakes’ in a text that is 6 billion letters long is a real challenge, especially if the text is present in millions of copies – since there are millions of cells in one tumour sample.
...AN ALTERNATIVE: TARGETED SEQUENCING
It would be far easier if researchers were able to select the DNA they would like to analyze; such techniques now exist.
Today, researchers can sequence specifically:
→ a panel of 50 to 500 genes that are known to be frequently altered in tumours. This represents only a tiny fraction of the human genome, on the order of 0.025% (i.e. between 500,000 and several million nucleotides).
→ only the parts of the DNA that code for protein. These parts of genes are called exons. The sum of the exons in a genome is called the exome, and represents less than 2% of the genome.
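The proportions quoted for the two approaches can be checked with simple arithmetic. The sketch below assumes a genome of about 3 billion nucleotides for one copy (half of the 6 billion letters mentioned earlier, since each cell carries two copies); the function names are made up.

```python
GENOME_SIZE = 3_000_000_000  # assumption: one copy of the genome, about 3 billion nucleotides


def percent_of_genome(nucleotides: int) -> float:
    """Share of the genome, in percent, covered by `nucleotides` letters."""
    return 100 * nucleotides / GENOME_SIZE


def nucleotides_for(percent: float) -> int:
    """Number of letters corresponding to a given percentage of the genome."""
    return round(GENOME_SIZE * percent / 100)


# Gene panel: 0.025% of the genome, expressed in letters.
print(nucleotides_for(0.025))

# Bounds of the panel sizes quoted in the text, expressed in percent.
print(percent_of_genome(500_000))
print(percent_of_genome(3_000_000))

# Exome: "less than 2%" of the genome, so at most this many letters.
print(nucleotides_for(2))
```

Under this assumption, 0.025% corresponds to 750,000 letters, and the exome to at most 60 million letters: in both cases far less text to read than the whole genome.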
INTERPRET THE MUTATIONS
Once a panel of 50 genes has been sequenced and the data analysed with bioinformatics tools, the result is a list of several dozen to several hundred mutations of interest.
Certain mutations play no role in the development of cancer or in the choice of treatment. For example, some mutations do not alter the protein’s sequence at all, while others do alter it but, if they do not hit the functional site, may have no consequence for the protein’s function.
So how do researchers decide which mutation is biologically more important than another?
This marks the beginning of a long investigation!
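The difference mentioned above, between mutations that alter a protein’s sequence and those that do not, comes from the way DNA is read three letters at a time, several triplets often standing for the same amino acid. The sketch below illustrates this with a small excerpt of the standard genetic code; the function name is hypothetical and only the handful of triplets needed for the example is included.

```python
# Excerpt of the standard genetic code (DNA triplets -> amino acid).
# Only the triplets needed for this example are listed.
TRIPLET_TO_AMINO_ACID = {
    "gtt": "Val", "gtc": "Val", "gta": "Val", "gtg": "Val",
    "gaa": "Glu", "gag": "Glu",
    "gat": "Asp", "gac": "Asp",
}


def effect_on_protein(triplet: str, index: int, new_nucleotide: str) -> str:
    """Describe what replacing one letter of a triplet does to the amino acid."""
    mutated = triplet[:index] + new_nucleotide + triplet[index + 1:]
    before = TRIPLET_TO_AMINO_ACID[triplet]
    after = TRIPLET_TO_AMINO_ACID[mutated]
    if before == after:
        return f"{triplet} -> {mutated}: still {before}, protein sequence unchanged"
    return f"{triplet} -> {mutated}: {before} becomes {after}, protein sequence altered"


print(effect_on_protein("gtg", 2, "a"))  # third letter changed
print(effect_on_protein("gtg", 1, "a"))  # second letter changed
```

The same t -> a change thus leaves the protein untouched in one position of the triplet and replaces a valine (V) with a glutamate (E) in another. Whether such a replacement matters for the protein’s function is a separate question, which the sketch does not address.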
THE IMPORTANCE OF DATABANKS
When it is known, information about a given mutation and the effect it might have on a protein’s function or its reaction to a given drug is indexed in various specialized databanks. As science and technology advance, this information is updated (corrected or completed) over time.
WHEN NOTHING IS KNOWN…
In about 25% of cases, the effects a mutation may have on a given protein are not known. In such cases, researchers use bioinformatics tools, such as molecular modelling, to predict the effects.
Sometimes, however, there is no information available on the effects of certain mutations. These mutations are called ‘variants of unknown clinical significance’ (VUS).
After complex bioinformatics and statistical analyses of the data, researchers can identify and interpret about a dozen mutations, which are then described to the medical team: this is a key step!
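The order of the investigation described above (first the databanks, then prediction, and otherwise a VUS) can be summarized as a simple decision sequence. The sketch below is purely illustrative: the ‘databank’ is a toy dictionary whose single entry repeats what the example report in this text says about BRAF V600E, and the gene and mutation names in the prediction table are invented placeholders.

```python
# Toy stand-in for a specialized databank. The only entry repeats what the
# example report in this text states about BRAF V600E.
DATABANK = {
    ("BRAF", "V600E"): {"classification": "pathogenic", "drug": "vemurafenib"},
}

# Invented placeholder standing for the output of a prediction tool.
PREDICTIONS = {
    ("GENE_A", "X1Y"): "predicted to disturb the protein's function",
}


def interpret(gene: str, change: str) -> str:
    """Return a one-line interpretation, from best documented to unknown."""
    key = (gene, change)
    if key in DATABANK:
        entry = DATABANK[key]
        return f"{gene} {change}: {entry['classification']}, targeted by {entry['drug']}"
    if key in PREDICTIONS:
        return f"{gene} {change}: {PREDICTIONS[key]} (prediction only)"
    return f"{gene} {change}: variant of unknown clinical significance (VUS)"


for gene, change in [("BRAF", "V600E"), ("GENE_A", "X1Y"), ("GENE_B", "A2B")]:
    print(interpret(gene, change))
```

In practice each of these steps involves expert judgement, and the databanks themselves change over time, so the same mutation may be interpreted differently a few years apart.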
Example of a report addressed to the patient’s oncologist
“The BRAF V600E mutation has been identified in the patient’s tumour cells.
This mutation is indexed as ‘pathogenic’ (or ‘driver’) in several international databanks and in many scientific publications. The functional impact of this mutation is well established: it is responsible for tumour progression.
The mutation is somatic and cannot be used to screen other members of the family.
One known drug specifically targets BRAF V600E: vemurafenib.”