1. GSEA Tutorial - Overview
The GSEA Desktop Application Tutorial provides a brief overview of the main features of the GSEA
application. It is organized in a series of slides which may be navigated by pressing "Next", or you
may jump to any section of interest using the links to the left.
For more detailed information, see the Documentation page.
2. GSEA Tutorial - Ways to Run GSEA
You can run GSEA in multiple ways:
- The GSEA desktop application provides an easy-to-use graphical interface.
When you launch the application from the download page of the GSEA web site,
as you will do in this tutorial,
you are using Java Web Start technology
(http://java.sun.com/products/javawebstart/)
to download, install, and start the application.
- The GSEA .jar file provides command line access to GSEA and
allows you to run the GSEA desktop application without being connected to the internet.
You can download the .jar file from the download page of the GSEA web site.
- R-GSEA makes GSEA available from the R programming environment.
- A GSEA GenePattern module makes GSEA available from GenePattern.
3. GSEA Tutorial - Launching GSEA
To launch GSEA:
- Go to the Downloads page.
- Register as instructed.
- Click the Launch icon to start the GSEA Desktop Application.
When GSEA starts, the main window appears. The main components of the user interface are:
- The navigation bar on the left, which provides quick access to common GSEA operations.
- The Processes panel in the bottom left corner, which provides information about the status of your analyses.
- The main panel on the right, which is used to display dialogs and results.
When you start GSEA, the main panel displays the Home page.
As you open new pages, tabs will appear next to the Home tab.
To close a page, click the close (X) icon on the tab.
4. GSEA Tutorial - Loading Data
Click the Load Data icon in the navigation bar.
The Load Data page appears. You use this page to load your data files:
expression datasets, phenotype labels
(e.g., tumor vs. normal), gene sets, and chip annotations.
Once imported, these files are stored in memory and are available to the program for analysis.
GSEA-supported data files are tab-delimited ASCII text files with special
file extensions that identify them.
For example, expression data usually has the extension *.gct,
phenotypes *.cls, gene sets *.gmt, and
chip annotations *.chip. Click the More on file formats
help button to view detailed
descriptions of all the data file formats.
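If you keep many data files in one folder, a small script can tell you what kind of GSEA data each file holds before you load it. The following is a minimal sketch that relies only on the extensions listed above; the function name and the example file names are hypothetical, and the script does not inspect file contents.

```python
from pathlib import Path

# Extension-to-type mapping taken from the paragraph above.
GSEA_FILE_TYPES = {
    ".gct": "expression dataset",
    ".cls": "phenotype labels",
    ".gmt": "gene sets",
    ".chip": "chip annotations",
}


def describe_gsea_file(filename: str) -> str:
    """Return the kind of GSEA data a file holds, judged by extension only."""
    suffix = Path(filename).suffix.lower()
    return GSEA_FILE_TYPES.get(suffix, "unknown")


# Hypothetical file names, used only to exercise the function.
for name in ["P53.cls", "c1.symbols.gmt", "notes.txt"]:
    print(name, "->", describe_gsea_file(name))
```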

GSEA provides several ways to load data:
- Click the Browse for files button.
When the Open window appears, select the file(s) to load and then
click the Open button. To select multiple files, use SHIFT-click or CTRL-click.
- Click the Load last dataset used button. GSEA loads the data used in the most
recent gene set enrichment analysis.
- Drag-and-drop the files from a file browser window into the drag-and-drop pane.
When the files that you want to load are listed in that pane, click the Load these files button.
To remove files from the drag-and-drop pane, click the Clear button.
- The Recently Used Files pane contains files that you have used previously. (The first time you start GSEA, this
pane is empty.) Double-click a file to load it.
The Object Cache pane lists the data that you have loaded into memory.
5. GSEA Tutorial - Loading the P53 Sample Data
The GSEA web site provides several sample datasets that correspond to results from the
GSEA paper by Subramanian, Tamayo, et al. (PNAS, 2005). For the tutorial, you will use the P53 sample data.
To download the P53 sample files:
- Go to the Datasets page.
- Download the three p53 data files. For each file:
right-click on the file, select Save link as and save the file to your local drive.
- Confirm that the saved files have a .gct or .cls file extension.
If a .txt file extension has been appended, remove it.
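Some browsers append .txt when saving these files. If you prefer to fix the names with a script instead of by hand, the sketch below shows one way to do it. The function names are hypothetical, and the script only touches files whose name ends in .gct.txt or .cls.txt.

```python
from pathlib import Path

GSEA_EXTENSIONS = {".gct", ".cls"}


def strip_appended_txt(path: Path) -> Path:
    """Return the path without a trailing .txt if a GSEA extension sits beneath it."""
    inner_suffix = Path(path.stem).suffix.lower()
    if path.suffix.lower() == ".txt" and inner_suffix in GSEA_EXTENSIONS:
        return path.with_name(path.stem)
    return path


def fix_downloads(folder: Path) -> list[Path]:
    """Rename files such as 'data.gct.txt' to 'data.gct'; return the new paths."""
    renamed = []
    for path in sorted(folder.iterdir()):
        target = strip_appended_txt(path)
        # Never overwrite a file that already has the correct name.
        if target != path and not target.exists():
            path.rename(target)
            renamed.append(target)
    return renamed
```

To use it, call fix_downloads with the folder where you saved the three files, then confirm the names in your file browser.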
To load the P53 data into GSEA:
- Go to the Load Data page of the GSEA application.
- Click Browse for files.
- Select the three files that you just downloaded.
- Click Open.

6. GSEA Tutorial - Analysis Parameters
Now that you have loaded your data files, you are ready to run the gene set enrichment analysis.
Click the Run GSEA icon in the navigation bar. The Run GSEA page
displays the parameters for the analysis.
There are three categories of parameters:
- Required: Essential parameters which you must specify before the analysis can be run.
- Basic: Additional parameters with standard defaults. Typically, accepting the defaults is fine. Click Show to see these parameters.
- Advanced: Parameters that control further details of the GSEA algorithm and the
Java implementation. Most users do not need to change these. Click Show to see these parameters.
For descriptions of the parameters, click the ? help button.
7. GSEA Tutorial - Running the Gene Set Enrichment Analysis
To run the analysis, set the parameters and click the Run button.
- Use the drop-down selector to pick the p53_hgu95av2 dataset.
- Use the ... button to pick one or more gene sets. GSEA displays a window that lists gene sets in a
number of different tabs. For this example, on the GeneMatrix (from website) tab, select
c1.v2.symbols.gmt.
- Type in or choose the number of permutations to perform. Typically,
you start with a small number (perhaps 5) and, when that successfully completes,
try a full set of 1,000 permutations. For now, choose 5.
- Use the ... button to pick a phenotype. In this sample data, the two available choices
(MUT_vs_WT and WT_vs_MUT) describe the same comparison.
- Use the ... button to select the chip annotation file that matches
the probe identifiers in your expression dataset.
For this example, on the Chips (from website) tab, choose the HG_U95Av2.chip file.
- Leave the Collapse dataset to gene symbols parameter set to true.
This indicates that you want the probe sets in your dataset collapsed to gene symbols.
- Leave the Permutation type parameter set to phenotype.
- Click Run to start the analysis.
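To see why 5 permutations is only a quick check that the run completes, it can help to look at how a permutation test behaves. The sketch below is not the GSEA algorithm: it assumes a toy statistic (the absolute difference in class means for a single gene) and shuffles the phenotype labels, which is the idea behind the phenotype permutation type. With n permutations, the resulting p-value can only be a multiple of 1/n. The function name and the example values are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(values, labels, n_permutations, seed=0):
    """Empirical p-value for a difference in class means under label shuffling."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("expected exactly two phenotype classes")

    def difference(current_labels):
        first = [v for v, lab in zip(values, current_labels) if lab == classes[0]]
        second = [v for v, lab in zip(values, current_labels) if lab == classes[1]]
        return abs(mean(first) - mean(second))

    observed = difference(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # reassign phenotype labels to samples at random
        if difference(shuffled) >= observed:
            hits += 1
    return hits / n_permutations


# Hypothetical expression values for one gene across six samples.
values = [5.1, 4.8, 5.3, 2.0, 2.2, 1.9]
labels = ["MUT", "MUT", "MUT", "WT", "WT", "WT"]
print("5 permutations:", permutation_p_value(values, labels, 5))
print("1000 permutations:", permutation_p_value(values, labels, 1000))
```

With 5 permutations the p-value can only be 0, 0.2, 0.4, and so on, which is too coarse to interpret; 1,000 permutations gives a resolution of 0.001.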
8. GSEA Tutorial - Keeping Identifiers Consistent Between Platforms
Typically, the gene or probe identifiers in your expression dataset are the probe identifiers for the
DNA chip array used to produce the data. When running the gene set enrichment analysis, it is critical
that all of your data files use the same gene or probe identifiers. You can either use the probe
identifiers native to your expression dataset, or collapse each probe set into a gene vector and use
HUGO gene symbols as your identifiers.
When you run the gene set enrichment analysis, the value you choose for the Collapse dataset to gene symbols
parameter tells GSEA which identifiers you want to use:
- Choose true (default) to have GSEA collapse each probe set in your expression dataset into a single gene vector, which is
identified by its HUGO gene symbol. In this case, you are using HUGO gene symbols for the analysis.
The gene sets that you use for the analysis must use HUGO gene symbols to identify the genes in the gene sets.
- Choose false to use your expression dataset "as is." In this case, you are using the probe identifiers that are in your
expression dataset for the analysis. The gene sets that you use for the analysis must also use these probe
identifiers to identify the genes in the gene sets.
Collapsing the probe sets eliminates multiple probes for the same gene, which can inflate enrichment scores, and
facilitates the biological interpretation of the gene set enrichment analysis results.
Therefore, the GSEA team recommends leaving the default value for this parameter.
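The idea of collapsing can be sketched in a few lines. The example below assumes that, for each sample, the collapsed value for a gene is the maximum across that gene's probes; this is an assumption made for illustration, so check the parameter descriptions for the collapse method that GSEA actually applies. The function name, probe identifiers, and values are hypothetical.

```python
from collections import defaultdict


def collapse_to_gene_symbols(expression, probe_to_symbol):
    """Collapse probe-level rows into one row per gene symbol.

    expression: dict of probe id -> list of values (one per sample)
    probe_to_symbol: dict of probe id -> gene symbol (as a chip file would give)
    Assumption: for each sample, keep the maximum value among a gene's probes.
    """
    by_gene = defaultdict(list)
    for probe, values in expression.items():
        symbol = probe_to_symbol.get(probe)
        if symbol:  # probes without a symbol are dropped in this sketch
            by_gene[symbol].append(values)
    return {
        symbol: [max(column) for column in zip(*rows)]
        for symbol, rows in by_gene.items()
    }


# Two hypothetical probes map to the same gene and become a single row.
expression = {
    "probe_a": [1.0, 4.0],
    "probe_b": [3.0, 2.0],
    "probe_c": [5.0, 5.5],
}
probe_to_symbol = {"probe_a": "TP53", "probe_b": "TP53", "probe_c": "MDM2"}
print(collapse_to_gene_symbols(expression, probe_to_symbol))
```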
9. GSEA Tutorial - Viewing Program Progress and Results
Use the Processes panel at the lower left corner to
view the status of analyses run in this session, including the currently running analysis:
- The blue Running label indicates the currently running analysis. You can click on this label to
pause or stop an analysis, as shown in the next slide.
- If a red Error appears, click on it for a description of the error.
If you need help resolving an error, include this error text in an e-mail message to gsea@broad.mit.edu.
- When the analysis completes, click the green Success label to display the results in a web browser.
For help interpreting the results, see Interpreting GSEA Results in the GSEA User Guide.
- Click the analysis name to view the parameters used in the analysis (a new Run GSEA page appears, which
you can use to re-run the analysis).
- Click the status bar at the bottom of the window to display the execution log, which shows analysis progress
(for example, the number of permutations completed).
10. GSEA Tutorial - Stopping or Pausing a Running Analysis
- Click the blue Running label to display the thread control panel.
- You can pause the analysis or change the amount of the
computer's processor being used for the analysis.

11. GSEA Tutorial - Running the Leading Edge Analysis
After running a gene set enrichment analysis, you can use the leading edge analysis to examine the genes in the leading edge subsets of selected enriched gene sets. Genes that appear in multiple subsets are more likely to be of interest than those that appear in only one.
To run a leading edge analysis, click the Leading Edge Analysis icon on the GSEA main page.
When GSEA displays the Leading Edge Analysis page:
- Click the ... button to select a Gene Set Enrichment Report from the application cache (analyses that you have run).
- Click the Load GSEA Results button to display the gene sets that were analyzed in that report.
- SHIFT-click or CTRL-click to select the gene sets to analyze. For this example, click the FDR column header to sort the gene sets
by FDR and select the 11 gene sets with an FDR < 0.01.
- Click the Run leading edge analysis button to start the analysis.
- The analysis displays four graphs showing the overlap among the leading edge subsets of the selected gene sets.
For help interpreting the results, see Interpreting Leading Edge Analysis Results in the GSEA User Guide.
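The point that genes shared by several leading edge subsets deserve attention can be illustrated with a simple count. The sketch below assumes you already have each leading edge subset as a list of gene identifiers; the gene set names and gene identifiers are placeholders, and the function is not part of GSEA.

```python
from collections import Counter


def genes_by_subset_count(leading_edges):
    """Count how many leading edge subsets each gene appears in.

    leading_edges: dict of gene set name -> iterable of gene identifiers
    Returns (gene, count) pairs, most widely shared genes first.
    """
    counts = Counter()
    for genes in leading_edges.values():
        counts.update(set(genes))  # count each gene once per subset
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Placeholder subsets, used only to exercise the function.
leading_edges = {
    "SET_A": ["G1", "G2", "G3"],
    "SET_B": ["G2", "G3"],
    "SET_C": ["G3", "G4"],
}
for gene, count in genes_by_subset_count(leading_edges):
    print(gene, count)
```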

12. GSEA Tutorial - Browsing MSigDB Gene Sets
The power of the gene set enrichment analysis is a function of how well your gene sets
represent meaningful coordinated or concordant gene expression behavior that reflects
actual biological processes or states. You are welcome to use curated gene sets from the
Molecular Signatures Database (MSigDB), which is maintained by the GSEA team.
You can browse the MSigDB from the Molecular Signatures Database page
of the GSEA web site or the Browse MSigDB page of the GSEA application. To browse
the MSigDB from the GSEA application:
- Click the Browse MSigDB icon in the
navigation bar. An empty Browse MSigDB page appears.
- Click the Load database button to display the latest MSigDB gene sets.
From this page you can:
- Use the fields at the top of the page to filter the gene sets displayed in the table.
- Select a gene set from the table and right-click to display information about the gene set.
- When the table displays the gene sets that you are interested in, export the
selected gene sets to a gene set file.
GSEA exports the gene set files to your default output folder (Help>Show GSEA output folder). The gene set files are
tab-delimited ASCII text files that can be viewed in Excel or Notepad.
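Because the exported files are plain tab-delimited text, you can also read them with a short script. The sketch below assumes the export is in the .gmt layout, where each line holds a gene set name, a description, and then the member genes; confirm this against the Data Formats page before relying on it. The function name and sample content are hypothetical.

```python
def parse_gmt(text):
    """Parse GMT-style text into {gene set name: list of genes}.

    Assumed layout per line: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets


# Placeholder content standing in for an exported gene set file.
sample = (
    "SET_A\tfirst example set\tG1\tG2\tG3\n"
    "SET_B\tsecond example set\tG2\tG4\n"
)
for name, genes in parse_gmt(sample).items():
    print(name, len(genes), "genes")
```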
13. GSEA Tutorial - Viewing Analysis History
Click the Analysis History icon in the navigation bar to display the Analysis History page, which
records and displays analyses that you have run. The left panel lists the reports run in the
current session and organizes previously run reports by date. Click on an analysis in the left
panel to display information about that analysis in the right panel.
In the right panel of the Analysis History page:
- You can view the parameters used in the analysis.
- You can choose to re-run an analysis with the exact same set of parameters by clicking the
Show in ToolRunner button.
- You can choose to automatically load or not load data from the previous analysis (perhaps
you are on a different computer
or are only interested in the previous parameters to use with different datasets).
- You can view files produced by the analysis. Double-click the index.html
file to display the analysis results in a web browser.
Note: When you run an analysis, by default, GSEA writes the analysis results to the GSEA output folder (Help>Show GSEA output folder). The Analysis History page is simply a convenient way to browse the reports in this folder.
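Since the reports are ordinary folders on disk, you can list them outside the application as well. The sketch below assumes only what is stated above, namely that each report contains an index.html file. The output folder path shown is hypothetical; use the folder that Help>Show GSEA output folder opens on your machine.

```python
from pathlib import Path


def find_reports(output_folder):
    """Return folders under output_folder that contain an index.html report."""
    root = Path(output_folder)
    if not root.is_dir():
        return []
    return sorted(path.parent for path in root.rglob("index.html"))


# Hypothetical location; replace with your own GSEA output folder.
for report in find_reports(Path.home() / "gsea_home" / "output"):
    print(report)
```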
14. GSEA Tutorial - Sharing Results with Collaborators
Sharing GSEA analysis results with collaborators is easy. Click Help>Show GSEA output folder to display the folder that holds the GSEA reports, navigate to the subfolder for the report that you want to share, zip it up, and send it to your collaborator. All reports and their hyperlinks are preserved.
Alternatively, when you run an analysis, you can have GSEA create the zip for you by setting the
Make a zipped file with all reports parameter to true (by default, the parameter is set to false).
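If you zip reports often, the manual steps can be scripted. The sketch below uses Python's standard shutil.make_archive and keeps the report folder name inside the archive so that relative links between the report pages continue to work after unzipping. The function name is hypothetical and the paths are yours to supply.

```python
import shutil
from pathlib import Path


def zip_report(report_folder, destination_folder):
    """Zip one report folder and return the path of the archive."""
    report = Path(report_folder)
    if not report.is_dir():
        raise FileNotFoundError(f"no such report folder: {report}")
    base_name = Path(destination_folder) / report.name
    archive = shutil.make_archive(
        str(base_name),
        "zip",
        root_dir=str(report.parent),  # archive paths are relative to the parent
        base_dir=report.name,         # so the report folder itself is included
    )
    return Path(archive)

# Example call (paths are placeholders):
# zip_report("path/to/output/my_report", "path/to/share")
```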
15. GSEA Tutorial - Setting Preferences
The Options menu provides several preferences to control the application and
algorithm defaults.
One useful preference is the location of your GSEA output folder, which holds all of the analysis results (Help>Show GSEA output folder). By default, the
output folder is a subfolder of your GSEA home folder.
To change the location of your default output folder, click Options>Preferences.
When the Preferences window appears, change the default output folder and click OK.
16. GSEA Tutorial - Creating Data Files for GSEA
The gene set enrichment analysis requires four files: an expression dataset file, phenotype labels file, gene sets file, and chip annotations file. All four files are tab-delimited ASCII text files that can be created and edited using Excel or any text editor.
- Expression dataset file: This file contains your expression data: genes/probes, samples, and expression values for each probe in each sample. Your expression data can come from any source (Affymetrix, cDNA two-color ratio data, and so on). You create an expression data file by converting your expression data into a gct, res, or pcl formatted file. Typically, your expression data is already in a tab-delimited ASCII text file, which can be turned into a gct, res, or pcl formatted file with relatively minor edits.
- Phenotype label file: This file lists your phenotype labels and associates each sample in your dataset with a phenotype. You can create this file or have GSEA create it for you (you supply the phenotype information and GSEA creates the appropriate file).
- Gene sets file: This file defines the gene sets to be analyzed. You can use the gene sets that
are available on the Broad FTP site, export gene sets from the MSigDB, or create your own. If you have gene sets that you want to use, GSEA provides a Chip-to-Chip utility, which converts gene/probe identifiers from one DNA chip platform to another (or to HUGO gene symbols).
- Chip annotations file: This file maps probe identifiers to HUGO gene symbols. GSEA uses it to collapse each probe set
in your dataset to a single gene vector (if you choose to collapse your dataset) and to annotate the gene set enrichment report.
The chip annotations files for common DNA chip platforms are available on the Broad FTP site. If necessary (for example, if you are
using custom chips), you can create your own chip annotations file.
For descriptions of all of the GSEA file formats, see Data Formats.
For more information about creating the data files, see
Preparing Data Files for GSEA in the
GSEA User Guide.
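As an illustration of how little is needed to produce these files, the sketch below builds the text of a gct file and a cls file. The layouts are assumptions based on my reading of the Data Formats page: a gct file with a #1.2 version line, a line giving the row and sample counts, and a NAME/Description header; and a cls file with a counts line, a line naming the classes, and one label per sample. Verify both against Data Formats before using real data. The function names and example values are hypothetical.

```python
def format_gct(sample_names, rows):
    """Build gct text from rows of (name, description, values).

    Assumed layout: '#1.2', then '<rows><TAB><samples>', then a header line,
    then one tab-delimited line per gene or probe.
    """
    lines = ["#1.2", f"{len(rows)}\t{len(sample_names)}"]
    lines.append("\t".join(["NAME", "Description", *sample_names]))
    for name, description, values in rows:
        if len(values) != len(sample_names):
            raise ValueError(f"{name}: expected {len(sample_names)} values")
        lines.append("\t".join([name, description, *(str(v) for v in values)]))
    return "\n".join(lines) + "\n"


def format_cls(labels):
    """Build cls text for categorical phenotype labels, one label per sample.

    Assumed layout: '<samples> <classes> 1', then '# <class names>',
    then the label of each sample in dataset order.
    """
    classes = list(dict.fromkeys(labels))  # unique labels, first-seen order
    return (
        f"{len(labels)} {len(classes)} 1\n"
        f"# {' '.join(classes)}\n"
        f"{' '.join(labels)}\n"
    )


# Placeholder data: two probes measured in three samples.
print(format_gct(["s1", "s2", "s3"], [("probe_a", "na", [1.0, 2.5, 3.0]),
                                      ("probe_b", "na", [0.5, 0.7, 0.9])]))
print(format_cls(["MUT", "MUT", "WT"]))
```

The sample order in the cls file must match the sample columns of the expression dataset, since the labels are matched to samples by position.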
17. GSEA Tutorial - Examples from Published GSEA Results
The GSEA web site provides the datasets
that correspond to results from the GSEA paper by Subramanian, Tamayo, et al. (PNAS, 2005):
- Go to the Downloads page.
- Near the bottom of the page, click view datasets.
Note: Because the random number generators used for sample permutation differ and because different seeds are used,
the numbers in the reports on the web site, or in reports that you run with the sample data, will not precisely match those in the paper.
However, the significant sets are identical to the published results.
18. GSEA Tutorial - Getting Help for GSEA
As you begin to use GSEA, you can get help in several ways:
- Click Help>GSEA documentation to view the Documentation page, which includes the GSEA User Guide and a Frequently Asked Questions (FAQ) page.
- Click the Help button, which appears on most GSEA windows, to display context-sensitive help.
- If you cannot find the information that you are looking for in the documentation, e-mail us at gsea@broad.mit.edu.
Thanks for taking the time for this Quick Tour of GSEA.
If you have questions, comments, or suggestions, we'd like to hear them: gsea@broad.mit.edu.