Key Dates
Requirements.
- The project must be approved by the course instructor and completed individually. It must analyze gene expression in cancer, using TCGA or ICGC data.
- You must have full ownership and copyright over all result material. Please read: https://viterbigrad.usc.edu/2018/01/usc-intellectual-property-opportunities-rights-responsibilities/ . If you volunteer in a lab and work on a sponsored project, you cannot use that data for your final project.
Initial Project Outline & timeline (10%)
Please see an example: https://github.com/davcraig75/final_project
Due Thursday, November 11th, end of day, submitted as a GitHub repository
- Title
- Example: Differential Gene Expression in Stage 1 Lung Adenocarcinomas by Number of Cigarettes Per Day Using DESeq2.
- Author
- Example:
- David W. Craig
- Overview of project:
- I will identify differentially expressed genes in lung adenocarcinomas between heavy smokers and light smokers. This analysis will use the DESeq2 package (http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html) and complete the entire vignette. I will use the TCGA cohort, and have identified 388 HTSeq files for tumors that fit within my cohort, with 121 light smokers and 271 heavy smokers.
- Data:
- Identification of data, and demonstration of availability
- Milestone 1
- One to five sentences describing the measurable point
- Milestone 2
- One to five sentences describing the measurable point
- Deliverable
- R Markdown/Notebook/Jupyter.
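The example overview above splits the cohort into heavy and light smokers. A minimal Python sketch of that split is shown here, assuming a hypothetical list of clinical records with a `cigarettes_per_day` field; the records, the `smoking_group` helper, and the cutoff value are all illustrative, not taken from any real cohort.

```python
from collections import Counter


def smoking_group(cigarettes_per_day, threshold):
    """Label a patient 'heavy' or 'light'; return None when the value is missing."""
    if cigarettes_per_day is None:
        return None
    return "heavy" if cigarettes_per_day >= threshold else "light"


# Made-up clinical records for illustration only.
patients = [
    {"case_id": "case-01", "cigarettes_per_day": 5.0},
    {"case_id": "case-02", "cigarettes_per_day": 40.0},
    {"case_id": "case-03", "cigarettes_per_day": None},  # not collected
    {"case_id": "case-04", "cigarettes_per_day": 22.5},
]

THRESHOLD = 20.0  # hypothetical cutoff; choose and justify your own

groups = {
    p["case_id"]: smoking_group(p["cigarettes_per_day"], THRESHOLD)
    for p in patients
}

# Count only patients that could be assigned to a group.
group_sizes = Counter(g for g in groups.values() if g is not None)
print(dict(group_sizes))
```

Reporting these counts in the outline makes it easy to confirm that the group sizes add up to the number of files you identified, and shows how many patients are lost to missing values.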
Milestone 1 (10%)
Tuesday, November 22nd
Milestone 2/RC1 (10%)
Tuesday November 29th
Final Project Due Date
December 3rd: 11:50PM
Grading
- Plan (10%)
- Milestone 1 (10%)
- Milestone 2 (10%)
- Organization/Readability (25%)
- Repeatability (25%)
- Final Product (20%)
Expectations of Final Project
The deadline is end of day, December 3rd.
Formatted Github
Use a formatted GitHub repository, with headers (e.g. header 2 for each section) and 1 to 2 sentences stating what each graph or analysis shows compared to expectations. This does not need to be elaborate; in some respects, you will benchmark it against the vignette you are working with.
- Pictures should be visible inline.
- Please truncate any output that would run for pages, for example by using the head function.
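One way to keep notebook output short is to print only the first few rows of a table. The sketch below is a minimal Python illustration for a Jupyter deliverable, assuming a hypothetical results table with made-up values; the `head` helper is defined here and simply mirrors what the `head` function does in R.

```python
import csv
import io
from itertools import islice

# Hypothetical results table with made-up values; in a real notebook this
# would be read from your own results file.
results_csv = io.StringIO(
    "gene,log2FoldChange,padj\n"
    "BRAF,1.8,0.001\n"
    "TP53,-0.9,0.020\n"
    "KRAS,1.1,0.030\n"
    "EGFR,0.4,0.200\n"
    "MYC,-0.2,0.600\n"
)


def head(rows, n=5):
    """Return the first n items of an iterable, similar to head() in R."""
    return list(islice(rows, n))


reader = csv.DictReader(results_csv)
preview = head(reader, n=3)

# Print only the preview instead of the whole table.
for row in preview:
    print(row["gene"], row["log2FoldChange"], row["padj"])
```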
Gene Expression Vignette
Please provide a link to a CSV file of differentially expressed genes using HUGO symbols, not Ensembl IDs: for example, BRAF, not ENSG00000157764. The file can contain all genes or just those identified as significant; typically the latter.
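Converting Ensembl IDs to HUGO symbols before writing the CSV can be sketched as follows. This is a minimal Python illustration under stated assumptions: the lookup table is a small hand-written stand-in for a real annotation source, the result values and version suffixes are made up, and `strip_version` and `to_hugo_rows` are hypothetical helper names.

```python
import csv
import io

# Hypothetical lookup table. The BRAF entry comes from the text above; in
# practice the mapping would come from an annotation source of your choice.
ENSEMBL_TO_HUGO = {
    "ENSG00000157764": "BRAF",
    "ENSG00000141510": "TP53",
    "ENSG00000133703": "KRAS",
}


def strip_version(ensembl_id):
    """Drop a trailing version suffix, e.g. ENSG00000157764.14 -> ENSG00000157764."""
    return ensembl_id.split(".", 1)[0]


def to_hugo_rows(results, mapping):
    """Replace Ensembl IDs with HUGO symbols; skip genes without a known symbol."""
    rows = []
    for ensembl_id, log2fc, padj in results:
        symbol = mapping.get(strip_version(ensembl_id))
        if symbol is not None:
            rows.append({"gene": symbol, "log2FoldChange": log2fc, "padj": padj})
    return rows


# Made-up results for illustration only.
results = [
    ("ENSG00000157764.14", 1.8, 0.001),
    ("ENSG00000141510.18", -0.9, 0.020),
    ("ENSG00000999999.1", 0.1, 0.900),  # fake ID with no symbol, so it is dropped
]

rows = to_hugo_rows(results, ENSEMBL_TO_HUGO)

# An in-memory buffer keeps the sketch self-contained; swap it for
# open("deg_hugo.csv", "w", newline="") to write an actual file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["gene", "log2FoldChange", "padj"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Dropping genes without a symbol is only one possible choice; keeping the Ensembl ID in a second column is another, as long as the gene column itself holds HUGO symbols.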
Evaluate your genes using https://www.gsea-msigdb.org/gsea/msigdb/annotate.jsp
Known Issues
Please indicate any issues.
Last Section: Conclusions
Give your opinion of your analysis. This is subjective, and is graded on completion more than on content.
One-week extensions are available on request.
5% will be taken off the total score.
Meetings are available to go over your project before you finalize it.
https://docs.google.com/spreadsheets/d/1hf2GS7cybhf2Slf-xFWb6eGlf6VBox1neVAOxEW2n9g/edit?usp=sharing
Examples:
- RECOMMENDED: Choose a cancer and analysis from GDC (TCGA plus other US-funded cancer genome studies) or ICGC (the International Cancer Genome Consortium, which includes TCGA),
- e.g. Hispanic breast cancer, or smokers vs. non-smokers, and complete the analysis with DESeq2:
- http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
- Supplemented with https://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/designmatrices.html
- Background paper:
- Key elements:
- Identify a specific sub-type of cancer you wish to study
- Identify possible covariates to examine
- Examples:
- Identification of differentially expressed genes in TCGA lung cancer by cigarettes per day, controlling for stage and race
- Resources
- Option 1 (International Genome Database): https://dcc.icgc.org/
- Option 2: https://portal.gdc.cancer.gov/
- Clinical Data
- Cohort:
- Cancer Type and any variables narrowing it down.
- Example: Stage 1A & 1B Adenomas and Adenocarcinomas, Lung Cancer
- Data: While there are several types of data that you may start with, please start with star_counts. Note that STAR counts are not covered in the vignette, so you will need to work out how to load them yourself.
- Variables of Interest (at least 1)
- Please be aware that you need to check whether the data you want is available for your particular study. Studies don't always collect what you want. Definitely check! Also, ICGC has more data than TCGA/GDC, so feel free to use that resource.
- Example: Cigarettes per day as a categorical variable, divided into high vs. low at 3 packs per day
- Variables to control for that are not of interest (at least 1)
- Examples: Race, Sex
- Location of Data and details about data.
- Variables of Interest (at least 1)
- Cohort:
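Because STAR counts are not covered in the vignette, a starting point for loading them may help. The sketch below is a minimal Python illustration, assuming the file is a tab-separated table with a leading `#` comment line, a header row, four `N_*` summary rows, and an `unstranded` raw-count column; check this layout against your own downloaded files. The embedded excerpt is made up, and `read_star_counts` is a hypothetical helper name.

```python
import csv
import io

# Made-up excerpt in the assumed layout of a STAR counts file.
SAMPLE_TSV = (
    "# gene-model: GENCODE v36\n"
    "gene_id\tgene_name\tgene_type\tunstranded\tstranded_first\tstranded_second\n"
    "N_unmapped\t\t\t1000\t1000\t1000\n"
    "N_multimapping\t\t\t2000\t2000\t2000\n"
    "N_noFeature\t\t\t300\t5000\t5100\n"
    "N_ambiguous\t\t\t400\t100\t90\n"
    "ENSG00000157764.14\tBRAF\tprotein_coding\t1520\t760\t765\n"
    "ENSG00000141510.18\tTP53\tprotein_coding\t3041\t1500\t1541\n"
)


def read_star_counts(handle, column="unstranded"):
    """Return {gene_id: raw count}, skipping comment lines and N_* summary rows."""
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    counts = {}
    for row in reader:
        if row["gene_id"].startswith("N_"):
            continue  # alignment summary rows, not genes
        counts[row["gene_id"]] = int(row[column])
    return counts


# For a real file, pass an open file handle instead of the in-memory excerpt.
counts = read_star_counts(io.StringIO(SAMPLE_TSV))
print(counts)
```

The raw integer counts are used here because DESeq2 expects un-normalized counts as input. Which count column is appropriate depends on how the library was prepared, so the `unstranded` default is an assumption to verify for your cohort.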