What is a protein?

A DEFINITION

From bacteria and viruses to plants and humans, proteins are molecules that are essential for the creation and function of every single living organism.

They are like workers with a specific field of expertise: highly dynamic, they interact with one another and are responsible for almost all the bodily functions an organism needs to stay alive.

WHAT IS A PROTEIN FOR?

TRANSPORT…

Over 2,000 human proteins are involved in some form of transport (source UniProtKB).

As an example, the haemoglobin protein transports the oxygen we breathe from our lungs to every other organ.

There are about 250 million haemoglobin proteins in each red blood cell.

 

DEFENCE...

When an infectious agent (virus, fungus, bacterium or parasite) enters our body, our immune system recognises it as a ‘foreign body’.

As a consequence, our organism will produce proteins known as antibodies whose role is to seek out the infectious agents and have them wiped out.

Antibodies also play an important part in wiping out cancer cells.

Hundreds of different proteins are involved in our body’s defence (source UniProtKB).

SUPPORT… 

Proteins support tissue cohesion in our body.

The collagen protein plays an important role in structuring our bones, our cartilage and our skin. There are about 50 different kinds of collagen proteins (source UniProtKB).

CATALYSIS...

Enzymes are proteins which are specifically involved in chemical reactions.

As an example, during digestion, enzymes split food up so that it can be assimilated by our body. The pepsin protein splits proteins, the lipase protein splits certain fats, the lactase protein splits lactose, etc.

Other enzymes are involved in DNA repair, such as the BRCA1 protein for example, or in producing pain signals, such as the COX2 protein.

Our body produces over 4,300 different enzymes (source UniProtKB). And there is still a lot to discover!

WHAT DO PROTEINS LOOK LIKE?

Proteins are like necklaces where each pearl is a molecule known as an amino acid. There are 20 different kinds of amino acids, each symbolized by one letter: L, A, T, V, K and so on. The order in which the amino acids follow one another is known as the protein sequence.

Every ‘protein necklace’ folds into a specific 3D structure, itself directly influenced by the underlying sequence of amino acids. The protein’s 3D structure is very important for the protein’s function.

Different proteins have different sequences of amino acids. Proteins are also of very varied lengths – from a mere 10 amino acids to 30,000! The titin protein has the longest sequence (source UniProtKB).

 

THE SIZE OF PROTEINS

Proteins are too small (100 to 1,000 angstroms) to be seen under a conventional light microscope.

However, several techniques are used to visualize the 3D structure of proteins. One of these is known as molecular modelling.

This figure illustrates how researchers are able to visualize proteins using bioinformatics tools. Here is a representation of the 3D structure of the BRAF protein.

HOW AND WHERE ARE PROTEINS MADE?

CELLS: PROTEIN MANUFACTURERS

Proteins are synthesized inside each of our cells.

A human being is made up of about 100,000 billion cells, each of which has a specific role depending on where it is found: in our brain, blood, muscle, heart or skin for instance.

Each cell makes proteins according to its needs and to those of our body.

For instance, certain blood cells make haemoglobin to ferry oxygen throughout our body, brain cells make myelin that protects our neurons, and the cells in our eyes make proteins that help us see.

HOW ARE PROTEINS MADE?

Each cell produces the proteins it needs by using the genetic information found in every protein’s corresponding gene (DNA).

About 20,000 genes, spread across 23 pairs of chromosomes, form the total of basic recipes required to produce over 1,000,000 different proteins.

 

 

DNA, WHERE GENETIC INFORMATION IS STORED

Like balls of highly compacted wool, each chromosome is made out of one extremely compact thread of DNA. DNA is a succession of molecules known as nucleotides. There are 4 different nucleotides: a, c, g and t – where a stands for adenine, c for cytosine, g for guanine and t for thymine.

As an illustration, the DNA of chromosome 7 is made up of about 159 million nucleotides and contains over 900 genes (source GenBank).

GENES, RECIPES FOR MAKING PROTEINS

Genes are bits of DNA of varied lengths.

The gene that codes for the insulin protein is about 4,000 nucleotides long (source GenBank).

The longest human gene has 2,400,000 nucleotides and codes for a muscle protein known as dystrophin.

FROM GENES TO PROTEINS

When a cell needs to make a specific protein, it begins by making a ‘photocopy’ of the protein’s recipe, i.e. its gene. These copies are made by yet another protein called RNA polymerase.

The copy – called messenger RNA – is transferred to large molecular complexes known as ribosomes that are, literally, protein-manufacturing machines.

Ribosomes ‘read’ the instructions on the messenger RNA, thus gradually building up the protein as amino acids are added to the growing chain.

One single ribosome is able to assemble 10 to 20 amino acids per second (source).
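
To get a feel for this speed, the short Python sketch below estimates how long one ribosome would need to assemble a protein of a given length. The lengths used are the 10 and 30,000 amino acids quoted earlier on this page. The function name `assembly_time_seconds` is made up for this illustration, and the estimate assumes a constant rate and ignores every step other than adding amino acids to the chain.

```python
def assembly_time_seconds(length_aa, rate_aa_per_s):
    """Seconds needed to chain `length_aa` amino acids at a constant rate."""
    return length_aa / rate_aa_per_s

# Amino acids per second: the range quoted in the text.
SLOW, FAST = 10, 20

for name, length in [("10 amino acids", 10), ("30,000 amino acids", 30_000)]:
    quickest = assembly_time_seconds(length, FAST)
    slowest = assembly_time_seconds(length, SLOW)
    print(f"{name}: {quickest:g} to {slowest:g} seconds")
```

Under these assumptions, the shortest proteins would be finished in about a second, while a chain of 30,000 amino acids would keep a ribosome busy for 25 to 50 minutes.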

FROM NUCLEOTIDES TO AMINO ACIDS...

Scientists have needed to resort to a great deal of imagination to understand how cells use the information contained in their DNA to make proteins. Once the structure of DNA had been discovered in 1953, it took a further 10 years to crack the ‘genetic code’.

“…a biochemical enigma such as the link that existed between DNA (4 nucleotides) and proteins (20 amino acids) was reduced to an abstract problem of manipulating symbols. […] The goal was to establish a mathematical link between messages written using two different alphabets.” (source [FR]).
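
Part of this abstract problem is simple counting: with an alphabet of 4 nucleotides, how many letters must a 'word' have so that there are at least 20 different words, one per amino acid? The Python sketch below works this out. The function name `shortest_word_length` is hypothetical and only serves the illustration.

```python
def shortest_word_length(alphabet_size, words_needed):
    """Smallest word length n such that alphabet_size ** n >= words_needed."""
    n = 1
    while alphabet_size ** n < words_needed:
        n += 1
    return n

NUCLEOTIDES = 4   # a, c, g and t
AMINO_ACIDS = 20

# Number of distinct words for each word length from 1 to 3.
for n in range(1, 4):
    print(f"words of {n} nucleotide(s): {NUCLEOTIDES ** n} possibilities")

print("shortest usable word length:", shortest_word_length(NUCLEOTIDES, AMINO_ACIDS))
```

Words of one or two nucleotides give only 4 and 16 possibilities, which is too few for 20 amino acids. Words of three nucleotides give 64 possibilities, which is more than enough.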

THE GENETIC CODE

This is a representation of the genetic code, which is read from the circle’s centre to its perimeter.

A group of three nucleotides, called a codon, corresponds to one amino acid.

In this way, the gtg codon codes for amino acid V (Valine), and the codon gag codes for amino acid E (Glutamate).

Note that different codons can code for the same amino acid: as an illustration, codons gaa and gag both code for amino acid E (Glutamate). This is why the genetic code is said to be redundant.
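
Reading a sequence codon by codon can be sketched in a few lines of Python. The table below is deliberately partial: it contains only the three codons mentioned above, whereas the complete genetic code has 64. The function name `translate` is an assumption made for this example.

```python
# Partial codon table: only the codons mentioned in the text.
CODON_TABLE = {
    "gtg": "V",  # Valine
    "gag": "E",  # Glutamate
    "gaa": "E",  # Glutamate (same amino acid as gag)
}

def translate(dna):
    """Read a DNA string three letters at a time and return amino acid letters."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    # A codon missing from the partial table raises KeyError.
    return "".join(CODON_TABLE[codon] for codon in codons)

print(translate("gtggaggaa"))  # V, then E, then E
```

The last two codons differ by one letter yet give the same amino acid, which is the redundancy described above.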

SO MANY LETTERS, SOMETIMES CONFUSING…

Proteins are a succession of amino acids. There are 20 different amino acids, each of which is symbolized by a distinctive letter: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W or Y.
A for Alanine, C for Cysteine, G for Glycine and T for Threonine, …!

DNA is a succession of nucleotides. There are 4 different nucleotides, symbolized by four letters: a, c, g and t.
a for adenine, c for cytosine, g for guanine and t for thymine!
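
The source of the confusion is easy to show in Python: the four nucleotide letters are also amino acid letters. The sketch below follows this page's habit of writing nucleotides in lower case and amino acids in upper case, which is a convention assumed here rather than a universal rule. The function names `could_be_dna` and `could_be_protein` are hypothetical.

```python
AMINO_ACID_LETTERS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 letters listed above
NUCLEOTIDE_LETTERS = set("acgt")                   # the 4 letters listed above

def could_be_dna(sequence):
    """True if every letter is a, c, g or t (case is ignored)."""
    return set(sequence.lower()) <= NUCLEOTIDE_LETTERS

def could_be_protein(sequence):
    """True if every letter is one of the 20 amino acid letters (case is ignored)."""
    return set(sequence.upper()) <= AMINO_ACID_LETTERS

# Letters that belong to both alphabets once case is ignored.
shared = sorted(AMINO_ACID_LETTERS & {n.upper() for n in NUCLEOTIDE_LETTERS})
print("letters used in both alphabets:", shared)

print(could_be_dna("gattaca"), could_be_protein("gattaca"))
print(could_be_dna("LATVK"), could_be_protein("LATVK"))
```

A string such as "gattaca" passes both checks, so the letters alone cannot tell whether it is a stretch of DNA or a short protein. Only the context, or the upper and lower case convention, settles it.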

HOW MANY PROTEINS IN ONE CELL?

The genome of each human cell contains about 20,000 genes – that is to say 20,000 master recipes that will give rise to about 1 million different proteins.

Cells only make the proteins they need. A human cell contains about 10,000 different proteins. Each protein can be present in 100 to 10 million copies! (source [FR], video, Molecular Art – Molecular Science).
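
The figures above can be combined with a little arithmetic, sketched here in Python. The variable names are made up for this example. The two bounds on the total number of protein molecules are a crude assumption: they pretend that every protein is present at the minimum number of copies, or every protein at the maximum, which is not how a real cell behaves.

```python
GENES = 20_000
DIFFERENT_PROTEINS_TOTAL = 1_000_000
DIFFERENT_PROTEINS_PER_CELL = 10_000
COPIES_MIN, COPIES_MAX = 100, 10_000_000

# Average number of different proteins arising from one gene.
proteins_per_gene = DIFFERENT_PROTEINS_TOTAL // GENES
print("different proteins per gene, on average:", proteins_per_gene)

# Fraction of all the different proteins that one cell contains.
share = DIFFERENT_PROTEINS_PER_CELL / DIFFERENT_PROTEINS_TOTAL
print(f"share of the different proteins found in one cell: {share:.0%}")

# Crude bounds on the number of protein molecules in one cell.
lower = DIFFERENT_PROTEINS_PER_CELL * COPIES_MIN
upper = DIFFERENT_PROTEINS_PER_CELL * COPIES_MAX
print(f"protein molecules per cell: between {lower:,} and {upper:,}")
```

On these numbers, one gene gives rise to about 50 different proteins on average, and a single cell holds roughly 1% of all the different proteins the genome can produce.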

 

 

PROTEINS AND DISEASES

 
A person falls ill when a protein is altered, when there are too many proteins, or when there are too few.

DISEASE (CASE 1): DYSFUNCTIONAL PROTEINS

Mutations can alter a protein’s functional site.

A change in a gene’s DNA can, in turn, change the sequence of amino acids in its corresponding protein. If this happens, the protein’s shape can change causing it to become too active for example or stopping it from interacting with other proteins. As a result, the biological processes in which the protein is usually involved can be affected.

Example 1: when the CFTR protein is altered, it can no longer transport chloride ions and this is what causes cystic fibrosis. Over 4,200 human proteins – when altered – are associated with genetic diseases (source UniProtKB).

Example 2: when the BRAF protein is altered, it cannot control cell division anymore, which can lead to cancer. It has been estimated that, once altered, several hundreds of proteins are involved in cancer.
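
How a change in DNA does or does not change the protein can be sketched with the codons used in the genetic code section. The Python example below swaps a single nucleotide in the codon gag and looks up the result. The codon table is partial and the helper `mutate` is hypothetical. It illustrates the principle only and is not a model of any particular disease.

```python
# Partial codon table, limited to the codons used in the genetic code section.
CODON_TABLE = {"gtg": "V", "gag": "E", "gaa": "E"}

def mutate(codon, position, new_nucleotide):
    """Return the codon with the nucleotide at `position` (0, 1 or 2) replaced."""
    return codon[:position] + new_nucleotide + codon[position + 1:]

original = "gag"
# Two single-letter changes: one in the middle, one at the end of the codon.
for mutant in (mutate(original, 1, "t"), mutate(original, 2, "a")):
    before, after = CODON_TABLE[original], CODON_TABLE[mutant]
    effect = "amino acid changed" if before != after else "amino acid unchanged"
    print(f"{original} -> {mutant}: {before} -> {after} ({effect})")
```

The first change turns E (Glutamate) into V (Valine) and so alters the protein sequence. The second leaves the amino acid untouched because the genetic code is redundant.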

DISEASE (CASE 2): TOO MANY PROTEINS, OR TOO FEW

A change in the expression rate of a given protein can be at the heart of a defective biological process and hence a disease.

Example 1: when the level of the COX2 protein is too high (or when the protein is too active, biologists are still not sure), too many pain signals are produced.

Example 2: when bacteria or viruses infect our body, they bring bacterial and viral proteins with them.

Example 3: some people suffering from diabetes do not make enough insulin.

 

PROTEINS AND DRUGS

Drugs usually target proteins that have key roles in a disease’s causes or symptoms.

A drug interacts with a protein much in the way a key is inserted into a lock, thus restoring the protein’s biological function. This is how it is possible to treat the causes of a disease, or at least to ease its symptoms.

The painkiller ibuprofen (in red) nesting in the COX protein (cross-section view). When ibuprofen interacts with the protein, it stops it from producing pain signals.

An anti-cancer drug nesting in an altered BRAF protein (cross-section view). When the drug interacts with the protein, it blocks its activity and hence the cells from dividing.

PRECISION MEDICINE