Proteomics

GenePattern provides the following support for the analysis of proteomic data:

All modules support the analysis of mass spectrometry (MS) based proteomic data and are designed to work with data stored in the universal mzXML format. The algorithms are tuned to the SELDI (surface-enhanced laser desorption/ionization), MALDI (matrix-assisted laser desorption/ionization), and LC-MS (liquid chromatography-mass spectrometry) platforms.

PEPPeR (LC-MS)

For the analysis of LC-MS data, GenePattern provides support for the algorithms defined by PEPPeR, a Platform for Experimental Proteomic Pattern Recognition:

  • Landmark matching propagates identified peptides over time onto accurate-mass LC-MS features so as to maximize the total number of identified peptides from disparate data acquisition methods. By combining accurate mass with local retention time information, it can determine the likely identity of an unknown peak from its location relative to known peaks.
  • Peak matching attempts to group similar features (or peaks) across multiple LC-MS sample runs by incorporating m/z and retention time (RT) variation. Although peak matching can be performed on virtually any type of LC-MS data, it is typically performed after landmark matching.
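
As a rough illustration of the peak matching idea, the sketch below groups features from several runs when their m/z and retention time agree within fixed tolerances. It is not the PEPPeR algorithm: the greedy grouping strategy, the `Feature` class, the `match_peaks` function, and the tolerance values are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feature:
    run: str    # identifier of the LC-MS run (hypothetical field)
    mz: float   # mass-to-charge ratio
    rt: float   # retention time, in minutes


def match_peaks(features: List[Feature],
                mz_tol_ppm: float = 10.0,
                rt_tol: float = 0.5) -> List[List[Feature]]:
    """Greedily group features whose m/z and RT fall within tolerance.

    Illustrative only: tolerances are assumed values, and each group
    holds at most one feature per run.
    """
    groups: List[List[Feature]] = []
    for f in sorted(features, key=lambda x: (x.mz, x.rt)):
        for g in groups:
            mean_mz = sum(x.mz for x in g) / len(g)
            mean_rt = sum(x.rt for x in g) / len(g)
            mz_ok = abs(f.mz - mean_mz) / mean_mz * 1e6 <= mz_tol_ppm
            rt_ok = abs(f.rt - mean_rt) <= rt_tol
            new_run = all(x.run != f.run for x in g)
            if mz_ok and rt_ok and new_run:
                g.append(f)
                break
        else:
            # No compatible group was found, so start a new one.
            groups.append([f])
    return groups


example = [
    Feature("run1", 500.000, 10.0),
    Feature("run2", 500.002, 10.1),
    Feature("run1", 500.001, 20.0),  # same mass, different RT
    Feature("run2", 700.000, 10.0),
]
matched = match_peaks(example)
```

Here the two features near m/z 500 and RT 10 end up in one group, while the feature at RT 20 stays separate even though its mass agrees, which is why retention time is used alongside m/z.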

The PEPPeR modules are based on work published by Jaffe, Mani, et al. in PEPPeR, a Platform for Experimental Proteomic Pattern Recognition (Molecular & Cellular Proteomics 5:1927-1941, 2006).

ProteoArray (LC-MS)

GenePattern's ProteoArray module provides the following support for the analysis of LC-MS data:

  • For a series of LC-MS experiments in mzXML format, GenePattern provides the ability to detect and align features across runs. This module is provided by Brian Piening of the Fred Hutchinson Cancer Research Center.

SELDI/MALDI

GenePattern provides the following support for the analysis of SELDI/MALDI data:

  • Quality assessment of the input spectrum as a function of the area under the spectrum and the area under the spectrum after removing the noise component of the signal.
  • Peak detection using digital convolution (moving window) filters, which apply smoothing, background correction, and peak enhancement to the spectrum before the final peak locations are identified.
  • Spectra comparison, which filters the noise from two spectra and then compares the spectra using a cross correlation function.
  • A proteomics pipeline provides automated processing of SELDI/MALDI data. In addition to quality assessment and peak detection, the pipeline incorporates a range of normalization methods and sophisticated peak alignment algorithms for matching peaks across multiple samples. Starting with spectra from a set of samples, the pipeline outputs matched peaks as features, and normalized intensities of these peaks for each sample. Several aspects of the pipeline are fully customizable.
  • Integration with other GenePattern analysis modules. By representing peaks as features, the peak detection and proteomics pipeline modules create output files similar to those used as input for the modules that support gene expression analysis. Analyses such as clustering, classification, and differential marker selection are based on pattern recognition and applicable to the analysis of both proteomic data and gene expression data.
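
The moving-window approach to peak detection described above can be sketched as follows. This is a minimal illustration, not the GenePattern implementation: the choice of a moving average for smoothing, a moving minimum for background estimation, and the function names, window sizes, and height threshold are all assumptions.

```python
from typing import List


def moving_average(y: List[float], window: int) -> List[float]:
    """Smooth a signal with a centered moving average (shorter at edges)."""
    half = window // 2
    out = []
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def moving_minimum(y: List[float], window: int) -> List[float]:
    """Estimate the background as the minimum within a centered window."""
    half = window // 2
    return [min(y[max(0, i - half): i + half + 1]) for i in range(len(y))]


def detect_peaks(intensity: List[float],
                 smooth_window: int = 3,
                 baseline_window: int = 21,
                 min_height: float = 1.0) -> List[int]:
    """Return indices of local maxima after smoothing and baseline removal."""
    smoothed = moving_average(intensity, smooth_window)
    baseline = moving_minimum(smoothed, baseline_window)
    corrected = [s - b for s, b in zip(smoothed, baseline)]
    peaks = []
    for i in range(1, len(corrected) - 1):
        is_max = corrected[i] > corrected[i - 1] and corrected[i] >= corrected[i + 1]
        if is_max and corrected[i] >= min_height:
            peaks.append(i)
    return peaks


# Synthetic spectrum: flat background of 5 with two small peaks.
spectrum = [5.0] * 40
spectrum[9:12] = [7.0, 11.0, 7.0]
spectrum[29:32] = [6.0, 9.0, 6.0]
peak_indices = detect_peaks(spectrum)
```

Subtracting the estimated background before thresholding means `min_height` is measured relative to the local baseline rather than to zero, so the same threshold can be applied across regions with different background levels.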

The modules for the analysis of SELDI/MALDI data are based on work published by Mani and Gillette in Proteomic Data Analysis: Pattern Recognition for Medical Diagnosis and Biomarker Discovery (in Mehmed Kantardzic and Jozef Zurada (Eds.), Next Generation of Data Mining Applications, Wiley-IEEE Press).

Data Formats

Proteomics analysis modules are designed for easy access:

  • All proteomics modules read and write data using mzXML or comma-separated value (csv) files. mzXML files are generally used for LC-MS data and csv files for SELDI/MALDI data.
  • GenePattern provides support for data conversion, including support for converting to and from mzXML files.
  • If you consistently convert between different file formats, you can write a simple converter and add it to GenePattern as a new module.
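
As an example of what the core of such a converter might look like, the sketch below decodes the peak list of a single mzXML scan and writes it as csv text. It assumes the common mzXML encoding in which peaks are stored as base64 text holding big-endian (network byte order) m/z-intensity pairs at 32- or 64-bit precision, and it does not handle compressed peak data. The function names and the csv column headers are hypothetical, and packaging the script as a GenePattern module is not shown.

```python
import base64
import csv
import io
import struct
from typing import List, Tuple


def decode_peaks(encoded: str, precision: int = 32) -> List[Tuple[float, float]]:
    """Decode a base64 mzXML peak string into (m/z, intensity) pairs.

    Assumes uncompressed, big-endian, m/z-intensity ordered data.
    """
    raw = base64.b64decode(encoded)
    fmt_char = "f" if precision == 32 else "d"
    count = len(raw) // struct.calcsize(">" + fmt_char)
    values = struct.unpack(">" + fmt_char * count, raw)
    # Even positions hold m/z values, odd positions hold intensities.
    return list(zip(values[0::2], values[1::2]))


def peaks_to_csv(pairs: List[Tuple[float, float]]) -> str:
    """Serialize peak pairs as csv text with an assumed two-column header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["mz", "intensity"])
    writer.writerows(pairs)
    return buf.getvalue()


# Round trip on a hand-built payload of two peaks.
payload = base64.b64encode(
    struct.pack(">4f", 100.5, 10.0, 200.25, 20.0)
).decode("ascii")
csv_text = peaks_to_csv(decode_peaks(payload))
```

A complete converter would also need to parse the mzXML document itself to locate each scan's peak element and read its precision and byte-order attributes rather than assuming them.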

Modules

View current GenePattern proteomics modules