"MathWorks tools are integral to the custom analysis work that we perform. MATLAB frees us to focus on data analysis instead of programming. It greatly speeds up our coding process."
Dr. Hongyue Dai, Rosetta Inpharmatics/Merck & Company
Rosetta Inpharmatics, a wholly owned subsidiary of Merck & Company, recently collaborated with the Netherlands Cancer Institute (NKI) to develop a tool that enables clinicians to determine a breast cancer patient's prognosis based on the gene expression profile of the primary tumor. This project is an example of how programmers, when equipped with the right software, can respond promptly to researchers' requests for custom analysis tools.
"MathWorks tools are integral to this type of custom analysis work," says Dr. Hongyue Dai, Director of Custom Analysis and MATLAB Tools at Rosetta Inpharmatics/Merck Research Laboratories. "MATLAB frees us to focus on data analysis instead of programming. It greatly speeds up our coding process because we don't have to write lower-level routines that are already in the MATLAB library."
It is difficult to determine the best course of treatment for a patient with breast cancer. Patients at the same stage of the disease and receiving the same treatment can have markedly different outcomes. Chemotherapy and hormone therapy reduce the risk of distant metastases by about a third, but studies show that 70% – 80% of patients receiving this treatment would have survived without it.*
Dr. Dai and his colleagues were asked to develop a tool that would enable cancer researchers to determine which genes in breast cancer patients were strong predictors of future metastases. To do this, they would need software that coupled powerful statistics capabilities with the ability to handle large datasets rapidly. The software had to be flexible enough to allow for trial and error when selecting features and constructing classifiers.
“One of the key challenges in microarray experiments is image analysis,” explains Dr. Dai. His team needed an effective means of extracting signal intensities from TIFF images of microarray slides to determine how strongly each gene was expressed in a given sample. Because the TIFF images are too large and complex to be processed by hand, the programmers needed to preprocess the images and design a batch process to extract the relevant data.
* Early Breast Cancer Trialists’ Collaborative Group
NKI researchers collaborating with the Rosetta team followed the progress of a group of 117 patients for more than five years. They examined the original DNA samples of patients who had a poor outcome to identify genes whose expression level was associated with that result. With that data, Dr. Dai’s programmers used MATLAB to perform the DNA microarray analysis that identified genes strongly predictive of distant metastases.
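As a rough illustration of what "identifying genes associated with outcome" can look like in code, the sketch below ranks genes by the absolute Pearson correlation between their expression values and a binary outcome label. It is written in Python rather than MATLAB, the article does not say which statistic the team actually used, and the function names and toy data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no information
    return cov / sqrt(var_x * var_y)

def rank_genes(expression, outcome):
    """Return (gene, correlation) pairs sorted by |correlation|, strongest first.

    expression: dict mapping gene name -> expression values, one per patient
    outcome:    list of 0/1 labels (1 = poor outcome), one per patient
    """
    scored = [(gene, pearson(values, outcome))
              for gene, values in expression.items()]
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)

# Hypothetical toy data: three genes measured in six patients.
expression = {
    "gene_a": [0.1, 0.2, 0.1, 0.9, 1.0, 0.8],  # high in poor-outcome patients
    "gene_b": [0.5, 0.4, 0.6, 0.5, 0.4, 0.6],  # unrelated to outcome
    "gene_c": [0.9, 0.8, 1.0, 0.2, 0.1, 0.3],  # low in poor-outcome patients
}
outcome = [0, 0, 0, 1, 1, 1]

ranking = rank_genes(expression, outcome)
for gene, r in ranking:
    print(f"{gene}: r = {r:+.3f}")
```

Sorting on the absolute value keeps both up-regulated and down-regulated genes near the top of the list, since either direction can carry prognostic information.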
“Since MATLAB and the Image Processing Toolbox are fully integrated and the MATLAB platform is very good for matrix calculation, we did not have to spend time writing the low-level image processing and the basic data analysis routines like vector and matrix calculations,” notes Dr. Dai.
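To make the batch extraction step concrete, here is a minimal sketch of background-corrected spot quantification on a regular grid. It is in Python rather than MATLAB, it works on an in-memory pixel array instead of reading TIFF files, and the grid layout, the median-ring background estimate, and all names are assumptions made for illustration, not the team's actual pipeline.

```python
from statistics import mean, median

def spot_signal(image, top, left, size, margin=1):
    """Background-corrected mean intensity of one square spot.

    image:  2D list of pixel intensities (rows of numbers)
    top, left, size: position and side length of the spot's bounding box
    margin: width of the surrounding ring used to estimate local background
    """
    rows, cols = len(image), len(image[0])
    foreground, background = [], []
    for r in range(max(0, top - margin), min(rows, top + size + margin)):
        for c in range(max(0, left - margin), min(cols, left + size + margin)):
            inside = top <= r < top + size and left <= c < left + size
            (foreground if inside else background).append(image[r][c])
    local_bg = median(background) if background else 0
    # Clamp at zero so noisy spots never report a negative signal.
    return max(0.0, mean(foreground) - local_bg)

def extract_grid(image, origin, pitch, size, n_rows, n_cols):
    """Quantify every spot on a regular grid; returns a 2D list of signals."""
    top0, left0 = origin
    return [
        [spot_signal(image, top0 + i * pitch, left0 + j * pitch, size)
         for j in range(n_cols)]
        for i in range(n_rows)
    ]

# Hypothetical 4 x 8 image: uniform background of 10 with two 2 x 2 spots.
image = [[10] * 8 for _ in range(4)]
for r in (1, 2):
    for c in (1, 2):
        image[r][c] = 110  # bright spot
    for c in (5, 6):
        image[r][c] = 60   # dimmer spot

signals = extract_grid(image, origin=(1, 1), pitch=4, size=2, n_rows=1, n_cols=2)
print(signals)
```

In a batch setting, a loop over slide images would call `extract_grid` once per image and collect the results into one expression matrix.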
The programmers then developed an unsupervised, hierarchical clustering algorithm in MATLAB that enabled them to group the patients’ tumors based on the dominant expression features. Next, they built a classifier from the genes that carry the prognosis information. They discovered that 70 genes correlated tightly with the patients’ outcome, indicating that a prognosis could be determined based on the gene expression profile of the primary tumor.
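The clustering step can be sketched as a small agglomerative procedure. The example below is in Python rather than MATLAB and assumes average linkage with a correlation-based distance. The article does not specify the linkage or distance the team chose, and the tumor names and profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt

def correlation_distance(x, y):
    """1 - Pearson correlation: near 0 for similar shapes, near 2 for opposite."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # treat a flat profile as uncorrelated
    return 1.0 - cov / (sx * sy)

def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    profiles: dict mapping sample name -> expression profile (list of floats)
    Returns a list of clusters, each a sorted list of sample names.
    """
    clusters = [[name] for name in profiles]

    def linkage(pair):
        # Average pairwise distance between members of the two clusters.
        a, b = pair
        dists = [correlation_distance(profiles[p], profiles[q])
                 for p in clusters[a] for q in clusters[b]]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2), key=linkage)
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]

# Hypothetical expression profiles for four tumors over four genes.
profiles = {
    "tumor_1": [1.0, 0.8, 0.1, 0.2],
    "tumor_2": [0.9, 1.0, 0.2, 0.1],
    "tumor_3": [0.1, 0.2, 0.9, 1.0],
    "tumor_4": [0.2, 0.1, 1.0, 0.8],
}
groups = average_linkage(profiles, n_clusters=2)
print(sorted(groups))
```

A correlation-based distance groups tumors by the shape of their expression profile rather than by absolute intensity, which is one common reason to prefer it over Euclidean distance for expression data.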
The Informatics group also used MATLAB to prototype algorithms and code for their commercial product, the Rosetta Resolver Gene Expression Data Analysis System. Based on the same premise as the breast cancer prediction tool, Rosetta Resolver includes tools for high-powered analysis, visualization, and storage of gene expression data. Dr. Dai notes that MATLAB considerably accelerated the prototyping process on this product.
Challenge: To accurately predict the clinical outcome for breast cancer patients
Solution: Use MathWorks products to develop a tool that lets clinicians make a prognosis based on the gene expression profile of the patient’s primary tumor