
Will Harnessing Big Data Finally Lead to a Cure for Cancer?


Josh Miller, Kenneth Polonsky, and Robert Grossman (Photo by Robert Kozloff)

Could big data lead to the next breakthrough in cancer treatment?

A major project from the University of Chicago is betting on it.

The Genomic Data Commons, housed at UChicago, is one of the largest open access repositories for cancer data in the world. Publicly launched last year, it contains 4.1 petabytes of data from National Cancer Institute-supported research programs (for scale, one petabyte equals one million gigabytes).

For the first time ever, cancer data from over 14,000 patient cases, including some of the most comprehensive data sets in the world, has been standardized and housed under one digital roof, available for anyone to access at any time. The idea is that researchers can access and analyze data sets faster than ever before, as well as contribute to the ever-growing database of information. Former vice president Joe Biden is a fan.
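The GDC exposes its data through a public REST API at api.gdc.cancer.gov, alongside the web portal. As a minimal sketch, the snippet below builds a query payload in the GDC's filter format to list cases by primary tumor site; the field name `primary_site` follows the GDC data dictionary, but treat all field names here as illustrative and check the current API documentation before relying on them.

```python
import json

def build_case_filter(primary_site):
    """Build GDC-style query parameters selecting cases by primary tumor site.

    The GDC API expects the "filters" parameter as a JSON-encoded string
    describing an operation tree ("op"/"content").
    """
    return {
        "filters": json.dumps({
            "op": "in",
            "content": {
                "field": "primary_site",
                "value": [primary_site],
            },
        }),
        "format": "json",
        "size": "10",  # number of case records to return
    }

params = build_case_filter("Lung")
# To actually run the query (network access required):
#   import requests
#   response = requests.get("https://api.gdc.cancer.gov/cases", params=params)
print(params["filters"])
```

This only constructs the request; sending it returns paginated JSON case records that can be fed into downstream analysis.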

Chicago Inno chatted with the GDC's principal investigator, Robert Grossman, to hear how the project has fared in its first year, what next-generation data commons will look like, and how big data can help cure cancer.

Chicago Inno: Let’s talk about the scale of this thing. What’s the scope of what the GDC covers, and how much is it in terms of all the cancer-related data in the nation and the world?

Grossman: We’re one of the largest open access repositories for cancer data in the world. Within the next six months, we’ll be approximately doubling the size of the GDC. Shortly, we’ll be adding functionality so people can understand information about specific genes [and] about specific genomic variations. Instead of looking at the data from a high level, we can basically look across all the data gene by gene and across all the other genomic variations.

Where does the GDC fit in and streamline the research process?

On the research side, the majority of researchers in cancer, I think, find the amount of data frustrating. They want to use all available data, but to set up an environment, manage it, and keep it secure and compliant, the process is just overwhelming. Our role is to bring together the large public research data sets, consistently analyze them, and make them available in a digestible form to the research community to accelerate the pace of research. We do that with the best available pipeline developed by the cancer research community.

You’re one year into launching the GDC. How widely has it been used, and what are a few use cases you’ve seen so far?

Each day between 1,000 and 1,800 people access the Genomic Data Commons. So it’s well used. I think we’ve all been very excited by the number of people who use this every day, and we’re very excited about the ability to visualize genes and expression and other genomic variations that we’ll be making available in the next several weeks to increase the usage even more.

Right now, when you publish research, it’s not always easy to see exactly what datasets were used. As people start to use GDC datasets, we have IDs that are used in the publications, so we can track over time which discoveries are enabled by the GDC. But at this point we don’t have a good handle on that, because it’s done by people without our help. They get the data from the GDC, and we, by and large, are not involved in it.


So you’re hands off when it comes to the research side. Would you like to be more hands-on at some point? Maybe having an in-house research team that is always combing through the data to recommend best studies?

We actually do research on our own. I have a couple hats I wear. If I take off my GDC hat and put on my researcher hat, then we have a research team that uses data from the GDC and is making discoveries with it. Within the next six to nine months we’ll be making announcements of discoveries using the GDC data. We’re growing our team that does that. We’re very excited, and it really does make research much easier at the scale we’re working at.

What's next for the technology?

We not only have the ability to run these large-scale commons like the GDC, which is used by 1,500 people per day; we now also have the ability to build out [newer] data commons that can grow as large as the GDC. We’ll be announcing a couple of other commons with the Gen 3 technology.

One project is called BloodPAC. We used what we call the next generation version of the GDC, the Gen 3 GDC, and built a data commons for liquid biopsies. Over time, as the technology matures, when patients have a treatment, they'll get blood drawn, and from the cell-free DNA in the blood we’ll be able to understand how they reacted to chemo and whether the chemo needs to change.

Why is big data important for understanding and treating cancer, specifically?

The reason this started with cancer is that cancer ran into the big data wall a little sooner than the other fields. But it has certainly run into it, and that’s one of the reasons they got organized as a community.

At a simple level, cancer is in some sense about mutations, some of which are quite rare, which result in different tumors, each with a genomic signature. Each of these tumors is best attacked with different drugs. The more new drugs we develop the better, but the best combinations of these drugs are determined by the genomic signature of the tumor. The reason [big data] hit cancer first is that you need a lot of data to understand the right genomic signature to get the right combination of drugs to target the tumor.

One of the two new commons that we’ll be announcing is not in cancer, so I think the technology is very, very broad. Most diseases can benefit from this data. This approach will be used more and more. For the other commons, we’re bringing in behavioral data from wearables. It’s not just next-gen sequencing that’s generating large amounts of data; these next commons will also have data from a lot of new large-scale imaging technologies and wearable technologies, because all the different types of measurements are relevant to understanding disease.

So do you think data will help us finally find a cure for cancer?

Having cancer data brought together and shared is the best way we can understand what combination of drugs works for what tumor with today’s knowledge. By sharing cancer data we directly impact patient care by improving our understanding of which combination of drugs works for which particular patient, based on the genomic signature of the tumor. That’s today.

As we bring together this collection of data, we can also begin to try to understand basic facts about how tumors grow, how heterogeneous tumors are, and what the targets are, so we can create new drugs that work for particular tumors with particular genomic signatures. It helps us do the research required to understand which new drugs can be created to target specific classes of tumors.

In a slightly longer time frame, as we add more types of data, such as data on the micro-environment, the immunology around the tumor, and the microbiome, we gain an even better understanding of how to develop new drugs and how we might apply new technologies, such as immunotherapies. On each of those time scales, by increasing the amount of data and by basically removing the barriers that keep data from being shared, we directly impact the patient today.

Note: The interview has been edited for length and clarity.

