How can artificial intelligence help us identify causes of cancer?

Phoebe He, a scientist in the Alexandrov Lab at the University of California San Diego, explains how machine learning and artificial intelligence are crucial to the work of the Mutographs project.

The Mutographs project is based around the study of mutational signatures. These signatures describe patterns of changes, known as mutations, in the DNA of tumour tissue. With the help of powerful technology known as sequencing, we are able to read how a person’s DNA has changed, or mutated, in their tumour tissue compared to their normal tissue.

We are investigating the causes of somatic mutations, which occur during a person’s lifetime rather than being inherited from a parent. However, in most cases, the patterns that these somatic mutations create are indistinguishable to the human eye. Fortunately, scientists from the Mutographs project found a way to use artificial intelligence and machine learning to meaningfully separate these patterns of mutations to create mutational signatures. We can use these signatures to identify what has caused the cancer-related changes to DNA and therefore what factors might be linked to the development of the disease.

Two types of machine learning

Image credit: Si-Gal (iSTOCK)

Machine learning is the process of training computers to self-improve by identifying novel patterns, making decisions, and emulating human behaviour. Sometimes, computers can be used as a replacement for human labour, for example, to automatically classify emails or to recognise faces through a camera lens. In these cases, computers learn from examples of previous decisions made by humans, and they improve by continuously correcting themselves against those examples. This is called “supervised machine learning”.

Computers can also be used to recognise things that we as humans cannot see. This includes meaningfully separating the patterns of mutations found in cancer tissues. This “unsupervised machine learning” is used by scientists and researchers on the Mutographs project to study the unknown signatures of the processes that cause cancer.

Using machine learning to separate mutational signatures

Image credit: Bb3cxv (Wikimedia commons)

Identifying mutational signatures becomes very difficult for cancers with complex, multiple, and even completely unknown origins. By way of analogy, imagine a group of coloured lights arranged to focus at a single spot. Each coloured light can be turned off, or turned on at a different intensity. Imagine that we want to identify which lights are turned on and at what intensity, but we can only see the bright white light at the focus point, not the colours of the individual lights. This would be an impossible task if we could only use a single observation. However, when many observations are available, each with a different set of coloured lights turned on at various intensities, we can solve this problem using unsupervised machine learning. The computer can compare the observations to work out which lights are shining, and at what intensity, at the focus point. This is similar to how we want to separate mutational signatures (symbolised by the different coloured lights) from the somatic mutations observed in different people’s tumours (symbolised by the bright white light at the focus point).

The machine learning approach developed by the Mutographs team uses a specific algorithm, known as “nonnegative matrix factorization”, to separate the mutational signatures just as we would want to separate the coloured lights. The results can tell us which mutational signatures are present and what percentage of the observed somatic mutations they have caused in each person’s tumour.
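To make the idea concrete, here is a minimal sketch of nonnegative matrix factorization on made-up data. It is not the Mutographs team's software. It uses a standard textbook method (multiplicative updates) and assumes that NumPy is installed. The function name `nmf`, the toy numbers, and the choice of four mutation types, two signatures, and six tumours are all invented for illustration.

```python
import numpy as np


def nmf(V, n_signatures, n_iter=2000, seed=0, eps=1e-12):
    """Approximate a nonnegative matrix V (mutation types x tumours) as W @ H.

    W holds the signatures (one per column) and H holds how many mutations
    each signature contributes to each tumour (one tumour per column).
    """
    rng = np.random.default_rng(seed)
    n_types, n_samples = V.shape
    # Start from random positive guesses for both factors.
    W = rng.random((n_types, n_signatures)) + eps
    H = rng.random((n_signatures, n_samples)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each signature sums to 1; H absorbs the scale,
    # so the product W @ H is unchanged.
    scale = W.sum(axis=0)
    W /= scale
    H *= scale[:, None]
    return W, H


# Hypothetical toy data: 2 hidden signatures over 4 mutation types.
true_signatures = np.array([
    [0.7, 0.0],
    [0.2, 0.1],
    [0.1, 0.2],
    [0.0, 0.7],
])
# Mutations caused by each signature in 6 made-up tumours.
true_exposures = np.array([
    [100.0, 0.0, 50.0, 80.0, 10.0, 30.0],
    [0.0, 100.0, 50.0, 20.0, 90.0, 60.0],
])
# What we actually observe: only the mixed mutation counts per tumour.
V = true_signatures @ true_exposures

W, H = nmf(V, n_signatures=2)
reconstruction_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
# Fraction of each tumour's mutations attributed to each signature.
contributions = H / H.sum(axis=0)
```

In terms of the analogy, each column of `V` is one observation of the combined light, the columns of `W` play the role of the individual coloured lights, and `contributions` gives the share of each light in each observation. In this sketch the number of signatures is supplied by hand; with real data it is not known in advance and has to be estimated.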

Artificial intelligence helps to provide explanations for cancer

An example of a mutational signature

After studying tumour samples from 20,000 people with this machine learning approach, scientists from the Mutographs project have identified approximately 50 universal mutational signatures, with usually fewer than five present in each person. Over half of the identified signatures have been linked to specific causes of cancer, such as ultraviolet (UV) light exposure or tobacco smoking, while the rest are currently under study. By separating each person’s mutation patterns into biologically meaningful mutational signatures, we can provide explanations for what contributed to their cancer. Additionally, by studying tumours from groups of people with the same novel mutational signatures, scientists may be able to identify new causes of cancer.

Building computational tools for everybody

The Alexandrov Lab, located at the Moores Cancer Center at the University of California San Diego, is currently developing and refining the computational tools that allow the machine learning algorithms to be used on a personal computer. The goal is to make it easy for scientists around the world to separate and identify mutational signatures using available data on mutations in cancer. These tools are being made freely available to everybody as they become operational.

Moores Cancer Center, La Jolla, California, USA