What is DNA profiling?

A DEFINITION

When researchers create the DNA profile of a tumour, the tumour cells are examined so that all their molecular anomalies (mutations) can be indexed and interpreted.

WHY CREATE A DNA PROFILE?

Once the mutations present in the DNA of cancer cells are known, researchers can identify which proteins have been altered. This can help:

1. to make a diagnosis, by spotting which altered proteins are responsible for the tumour’s progression;

2. to choose a suitable treatment, by spotting which altered protein could be targeted by a drug and/or which altered protein could be involved in drug resistance.

CREATING A DNA PROFILE

1. SEQUENCE THE DNA

To create the DNA profile of a tumour, the DNA of a biopsy has to be sequenced.

DNA is made of two ‘complementary’ strands. Each strand is a succession of four molecules (nucleotides) symbolized by the letters a, c, g and t: a always binds to t and c always binds to g.

‘DNA sequencing’ entails determining the order in which the four nucleotides appear in one of the two DNA strands.
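Because a always binds to t and c always binds to g, knowing one strand is enough to write down the other. The short Python sketch below illustrates this with the lowercase letters used in the text. The function names are illustrative only, and the second function reflects the fact that the two strands are read in opposite directions.

```python
# Pairing rule from the text: a binds to t, c binds to g.
PAIRS = {"a": "t", "t": "a", "c": "g", "g": "c"}

def complement(strand: str) -> str:
    """Return the nucleotide that faces each position of `strand`."""
    return "".join(PAIRS[n] for n in strand)

def reverse_complement(strand: str) -> str:
    """Return the complementary strand, read in its own direction."""
    return complement(strand)[::-1]

print(complement("gattaca"))          # ctaatgt
print(reverse_complement("gattaca"))  # tgtaatc
```

This is why sequencing only one of the two strands is sufficient.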

NEXT-GENERATION SEQUENCING (NGS) TECHNIQUES

One DNA sequencing technique has revolutionized the life sciences: high-throughput next-generation sequencing, or NGS.

NGS is a technique with which scientists can sequence huge quantities of DNA in record time.

Over 10 billion fragments of DNA, each about 150 to 300 nucleotides long, can be sequenced within only a few hours!
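To get a feel for these numbers, the arithmetic can be written out as a small Python sketch. The fragment count and fragment lengths come from the text; the genome size of about 3 billion nucleotides for one copy of the human genome is an assumption added here for comparison, and the function names are illustrative.

```python
GENOME_SIZE = 3_000_000_000  # assumption: about 3 billion nucleotides in one genome copy
FRAGMENTS = 10_000_000_000   # "over 10 billion fragments" (figure from the text)

def nucleotides_read(fragments: int, fragment_length: int) -> int:
    """Total number of nucleotides read in one run."""
    return fragments * fragment_length

def genome_equivalents(total_nucleotides: int) -> float:
    """How many times the assumed genome size fits into the run."""
    return total_nucleotides / GENOME_SIZE

for length in (150, 300):
    total = nucleotides_read(FRAGMENTS, length)
    print(f"{length} nucleotides per fragment: {total:.1e} nucleotides, "
          f"about {genome_equivalents(total):.0f} genome equivalents")
```

Under these assumptions, one run reads between 1.5 and 3 trillion nucleotides, which is 500 to 1,000 times the size of one genome copy.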

NGS SEQUENCING, STEP BY STEP

A tumour sample (a biopsy) is taken.

All the DNA is extracted from the biopsy’s cells.

All the DNA is fragmented.

Each DNA fragment (double strand) is separated into two single strands.

Millions of single-stranded DNA fragments are obtained.

The single-stranded DNA fragments are fixed onto a plate. Each fragment is then copied several hundred times.

The complementary strands are synthesized in the presence of a, c, g and t – the four nucleotides – each of which is linked to a different-coloured fluorophore (a fluorescent chemical compound). One nucleotide is inserted at a time by a protein known as DNA polymerase. Nucleotide a always binds nucleotide t and nucleotide c always binds nucleotide g.

1st cycle: the incorporation of one nucleotide emits a coloured signal.

2nd cycle: the incorporation of a second nucleotide emits a coloured signal.

3rd cycle: the incorporation of a third nucleotide emits a coloured signal, and so on.

Here is an illustration of what an image looks like after one cycle, on a plate with 100 DNA fragments.

Here is a fraction of a real image after one cycle, on a 10 cm² plate on which there are about 10 billion DNA fragments.

When successive images are analyzed, researchers can decipher – in only a few hours! – the DNA sequence of several billion DNA fragments that are 150 to 300 nucleotides long.
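The way successive images turn into sequences can be sketched in a few lines of Python. This is a toy model, not how an actual instrument's software works: the colour assigned to each nucleotide is hypothetical, and each image is reduced to a simple list with one colour per fragment position on the plate.

```python
# Hypothetical colour code: real instruments use their own dye assignments.
COLOUR_TO_NUCLEOTIDE = {"green": "a", "blue": "c", "yellow": "g", "red": "t"}

def decode_reads(images: list[list[str]]) -> list[str]:
    """images[cycle][spot] is the colour seen at one spot during one cycle.
    Returns one sequence per spot, with one letter per cycle."""
    n_spots = len(images[0])
    reads = []
    for spot in range(n_spots):
        letters = (COLOUR_TO_NUCLEOTIDE[image[spot]] for image in images)
        reads.append("".join(letters))
    return reads

images = [
    ["green", "red", "blue"],    # 1st cycle
    ["blue", "red", "yellow"],   # 2nd cycle
    ["red", "green", "yellow"],  # 3rd cycle
]
print(decode_reads(images))  # ['act', 'tta', 'cgg']
```

Each spot keeps its place from one image to the next, so reading the colours of a single spot cycle after cycle gives the sequence of the strand being synthesized there.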

Though not yet used in the clinic, recent sequencing techniques can deal with very long fragments of DNA.

2. INDEX THE MUTATIONS

To index mutations found in the DNA of the tumour cells, the DNA sequences obtained by NGS are aligned and compared with the DNA sequence of the human reference genome.

This is a huge undertaking, and can only be accomplished with the help of bioinformatics analysis tools.
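The idea of alignment can be illustrated with a deliberately naive Python sketch: slide a sequenced fragment along the reference and keep the position where the fewest letters differ. The sequences below are made up, the function names are illustrative, and real alignment tools rely on far more efficient methods than trying every position.

```python
def mismatches(read: str, reference: str, offset: int) -> int:
    """Count the positions where `read` differs from the reference at `offset`."""
    window = reference[offset:offset + len(read)]
    return sum(1 for r, w in zip(read, window) if r != w)

def align(read: str, reference: str) -> tuple[int, int]:
    """Return (best offset, number of mismatches), trying every offset in turn."""
    offsets = range(len(reference) - len(read) + 1)
    best = min(offsets, key=lambda o: mismatches(read, reference, o))
    return best, mismatches(read, reference, best)

reference = "ggcatctgattgcacgta"  # made-up reference sequence
read = "gatagca"                   # made-up fragment with one difference
print(align(read, reference))      # (7, 1)
```

The fragment fits best at offset 7 (counting from 0), where a single letter differs from the reference. With billions of fragments and a reference of billions of letters, this brute-force search would be far too slow, which is why dedicated tools are needed.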

ALIGNING, COMPARING, VALIDATING

When these DNA sequences are aligned, researchers can ‘see’ which nucleotides in the DNA of cancer cells differ from those in the human reference genome.

For a difference to be validated as a ‘mutation’, a minimum number of fragments must show the same difference.

Several hundred mutations are usually validated.

In the example given here, there is a t -> a mutation (*).

Nucleotide a is found in several fragments where it replaces nucleotide t found in the DNA sequence of the human reference genome (in grey).
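The validation rule can be sketched in Python as follows. The fragments are assumed to be already aligned, positions are counted from 0, and the threshold of three supporting fragments is an arbitrary value chosen for the example; the sequences and names are made up.

```python
from collections import Counter

def call_differences(reference: str, aligned_reads: list[tuple[int, str]],
                     min_support: int = 3) -> list[tuple[int, str, str]]:
    """aligned_reads holds (offset, sequence) pairs already placed on the reference.
    Returns (position, reference letter, observed letter) for every difference
    seen in at least `min_support` fragments."""
    support = Counter()
    for offset, seq in aligned_reads:
        for i, letter in enumerate(seq):
            pos = offset + i
            if letter != reference[pos]:
                support[(pos, reference[pos], letter)] += 1
    return sorted(d for d, n in support.items() if n >= min_support)

reference = "ggcatctgattgcacgta"
reads = [
    (5, "ctgatagc"),  # a instead of t at position 10
    (7, "gatagca"),   # same difference
    (8, "atagcacg"),  # same difference
    (9, "ttgcacgt"),  # identical to the reference
    (3, "atcagatt"),  # isolated difference at position 6
]
print(call_differences(reference, reads))  # [(10, 't', 'a')]
```

The t -> a difference at position 10 is seen in three fragments and is kept. The difference at position 6 appears in a single fragment, so it is not validated.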

ON THE HUMAN REFERENCE GENOME

We refer to the human genome as though there were only one. But nothing could be further from the truth: each one of us has a unique genome.

There are about 3 million differences between our genome and our neighbour’s!

“THE” human genome is in fact a mosaic of bits of genomes which belong to a dozen different individuals.

This ‘mosaic’ human genome is used as a reference among researchers. A mutation is identified as such if a nucleotide in the DNA that is being studied differs from the nucleotide at the same position in the DNA of the reference genome.

A mutation is defined by the change observed (for example a -> t) and its position (chr7:140,753,336) on the current version of the human reference genome (GRCh38.p12).
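In a program, such a definition maps naturally onto a small record. The Python sketch below is one possible representation, not a standard format: the class, its field names and the input string layout are all hypothetical, and the position is assumed to be counted from 1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mutation:
    chromosome: str  # e.g. "chr7"
    position: int    # position on the reference genome (assumed to start at 1)
    reference: str   # nucleotide in the reference genome
    observed: str    # nucleotide found in the sample
    assembly: str    # version of the reference genome

def parse_mutation(text: str, assembly: str) -> Mutation:
    """Parse a string such as "chr7:140753336 a>t" (hypothetical layout)."""
    location, change = text.split()
    chromosome, position = location.split(":")
    # Tolerate thousands separators such as ' or , by keeping digits only.
    digits = "".join(ch for ch in position if ch.isdigit())
    reference, observed = change.split(">")
    return Mutation(chromosome, int(digits), reference, observed, assembly)

m = parse_mutation("chr7:140753336 a>t", "GRCh38.p12")
print(m.chromosome, m.position, m.reference, m.observed, m.assembly)
```

Storing the version of the reference genome alongside the position matters, because the same position number can point to a different nucleotide in another version.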

LOOKING FOR A NEEDLE IN A HAYSTACK

Analyzing all the DNA present in cancer cells is a very long and complex process.

Searching for several dozen ‘spelling mistakes’ in a text that is 6 billion letters long is a real challenge, especially if the text is present in millions of copies – since there are millions of cells in one tumour sample.

...AN ALTERNATIVE: TARGETED SEQUENCING

It would be far easier if researchers were able to select the DNA they would like to analyze; such techniques now exist.

Today, researchers can sequence specifically:

→ a panel of 50 to 500 genes that are known to be frequently altered in tumours. This represents 0.025% of the human genome (i.e. between 500,000 and several million nucleotides).

→ only the parts of the DNA that code for protein. Within genes, the parts that actually code for protein are called exons. The sum of the exons in a genome is called the exome, and represents less than 2% of the genome.
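These percentages can be turned into numbers of nucleotides with a short calculation. The Python sketch below assumes a genome of about 3 billion nucleotides (one copy), which is a figure added here and not given in this section; the function name is illustrative.

```python
GENOME_SIZE = 3_000_000_000  # assumption: about 3 billion nucleotides in one genome copy

def share_of_genome(nucleotides: int) -> float:
    """Percentage of the assumed genome covered by a target of the given size."""
    return 100 * nucleotides / GENOME_SIZE

print(f"{share_of_genome(750_000):.3f}%")     # 0.025%
print(f"{share_of_genome(60_000_000):.1f}%")  # 2.0%
```

Under this assumption, 0.025% of the genome corresponds to about 750,000 nucleotides, and an exome of less than 2% corresponds to fewer than 60 million nucleotides.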

3. INTERPRET THE MUTATIONS

Once a panel of 50 genes has been sequenced and analysed with bioinformatics tools, the result is a list of several dozen to several hundred mutations of interest.

Certain mutations play no role in the development of cancer or in the choice of treatment. For example, some mutations do not alter the protein’s sequence. Others do alter a protein’s sequence but, if they do not affect its functional site, they may have no consequence for the protein’s function.
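Why some mutations leave the protein's sequence untouched can be shown with the standard genetic code, in which three consecutive nucleotides (a codon) specify one amino acid and several codons specify the same amino acid. The Python sketch below builds the standard codon table and compares a codon before and after a one-letter change. The function name and the chosen codons are illustrative, and amino acids are written with their usual one-letter symbols.

```python
from itertools import product

BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: each three-letter codon maps to one amino acid (* = stop).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify(codon: str, index: int, new_letter: str) -> str:
    """Describe the effect of replacing one letter of a codon."""
    mutated = codon[:index] + new_letter + codon[index + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return f"{codon} -> {mutated}: silent ({before} unchanged)"
    return f"{codon} -> {mutated}: protein altered ({before} becomes {after})"

print(classify("gtg", 2, "a"))  # gtg -> gta: silent (V unchanged)
print(classify("gtg", 1, "a"))  # gtg -> gag: protein altered (V becomes E)
```

The first change still codes for valine (V), so the protein is identical. The second replaces valine with glutamate (E); whether that matters then depends on where the amino acid sits in the protein.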

So how do researchers decide which mutation is biologically more important than another?

This marks the beginning of a long investigation!

THE IMPORTANCE OF DATABANKS

When it is known, information about a given mutation and the effect it might have on a protein’s function or on its response to a given drug is indexed in various specialized databanks. As science and technology advance, this information is updated (corrected or completed) over time.

WHEN NOTHING IS KNOWN…

In about 25% of cases, the effects a mutation may have on a given protein are not known. In these cases, researchers use bioinformatics tools, such as molecular modelling, to predict the effects.

Sometimes, however, there is no information available on the effects of certain mutations. These mutations are called ‘variants of unknown clinical significance’ (VUS).
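The order of this investigation (databank first, prediction second, VUS as a last resort) can be summarized in a small Python sketch. Everything here is hypothetical: the miniature databank, the `predict` callback standing in for a modelling tool, and the gene and mutation names other than BRAF V600E.

```python
from typing import Callable, Optional

# Hypothetical miniature databank; real databanks are far larger and updated over time.
DATABANK = {("BRAF", "V600E"): "pathogenic"}

def interpret(gene: str, change: str,
              predict: Optional[Callable[[str, str], Optional[str]]] = None) -> str:
    """Look the mutation up, fall back on a prediction, otherwise report a VUS."""
    known = DATABANK.get((gene, change))
    if known is not None:
        return known
    if predict is not None:
        predicted = predict(gene, change)
        if predicted is not None:
            return f"predicted: {predicted}"
    return "variant of unknown clinical significance (VUS)"

def toy_predictor(gene: str, change: str) -> Optional[str]:
    """Stand-in for a modelling tool; only knows one made-up case."""
    return "likely no effect" if (gene, change) == ("GENE1", "A10S") else None

print(interpret("BRAF", "V600E"))                 # pathogenic
print(interpret("GENE1", "A10S", toy_predictor))  # predicted: likely no effect
print(interpret("GENE2", "K5R", toy_predictor))   # variant of unknown clinical significance (VUS)
```

Keeping predicted results visibly separate from indexed ones reflects the difference in how much is actually known about each mutation.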

THE RESULTS

After complex bioinformatics and statistical analyses of the data, researchers can identify and interpret about a dozen mutations, which are then described to the medical team: this is a key step!

Example of a report addressed to the patient’s oncologist

“The BRAF V600E mutation has been identified in the patient’s tumour cells.

This mutation is indexed as ‘pathogenic’ (or ‘driver’) in several international databanks and in many scientific publications. The functional effect of this mutation is well established: it is responsible for tumour progression.

The mutation is somatic and cannot be used to screen other members of the family.

One known drug specifically targets BRAF V600E: vemurafenib.”
