
Final Projects

Key Dates

Requirements.

Initial Project Outline & timeline (10%)

Please see an example: https://github.com/davcraig75/final_project 

Due Thursday, November 11th, end of day, as a GitHub repository

  • Title
    • Example: Differential Gene Expression in Stage 1 Lung Adenocarcinomas by Number of Cigarettes Per Day Using DESeq2.
  • Author
    • Example: 
      • David W. Craig
  • Overview of project:
    • I will identify differentially expressed genes in lung adenocarcinomas between heavy smokers and light smokers. This analysis will use the DESeq2 package (http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html) and complete the entire vignette. For this analysis, I will use the TCGA cohort; I have identified 388 HTSeq files for tumors that fit within my cohort, with 121 light smokers and 271 heavy smokers.
  • Data:
    • Identification of data, and demonstration of availability
  • Milestone 1
    • One to five sentences describing the measurable point
  • Milestone 2
    • One to five sentences describing the measurable point
  • Deliverable
    • R MarkDown/Notebook/Jupyter.

Milestone 1 (10%)

Tuesday November 22nd

Milestone 2/RC1 (10%)

Tuesday November 29th

Final Project Due Date

December 3rd: 11:50PM

Grading

  • Plan (10%)
  • Milestone 1 (10%)
  • Milestone 2 (10%)
  • Organization/Readability (25%)
  • Repeatability (25%)
  • Final Product (20%)

Expectations of Final Project

Deadline is End of day Dec. 3rd.

Formatted Github

Use a formatted GitHub repository with headers (e.g. header 2 for each section) and 1 to 2 sentences stating what each graph or analysis shows compared to expectations. This does not need to be elaborate; in some respects, you will benchmark it against the vignette you are working with.

  1. Pictures should be inline visible.
  2. Please truncate any output that runs to pages, for example by using the head function.
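For a Jupyter deliverable, truncation might look like the minimal sketch below. The `head` helper and the `results` list are hypothetical stand-ins written for illustration; the default of 6 rows mirrors R's `head`.

```python
from itertools import islice


def head(rows, n=6):
    """Return at most the first n items of any iterable."""
    return list(islice(rows, n))


# Hypothetical results table: 5000 (gene, adjusted p-value) rows, made up for illustration.
results = [(f"GENE{i}", i * 0.001) for i in range(1, 5001)]

# Show only the first few rows instead of pages of output.
shown = head(results)
for gene, padj in shown:
    print(f"{gene}\t{padj:.3f}")
print(f"... {len(results) - len(shown)} more rows not shown")
```

If you are working with a data frame library, its own truncation method would normally replace this helper.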

Gene Expression Vignette

Please provide a link to a CSV file of differentially expressed genes named by HUGO gene symbol, not Ensembl ID: thus BRAF, not ENSG00000157764. The file can contain all genes or only those identified as significant; typically it is the latter.
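One way to produce such a file is sketched below. It assumes results arrive as (Ensembl ID, log2 fold change, adjusted p-value) rows. The two-entry lookup table, the helper names, and the numbers are illustrative only; a real analysis would take symbols from a full annotation source. The column names `log2FoldChange` and `padj` follow the DESeq2 results table.

```python
import csv
import io

# Illustrative lookup only (assumption): a real project needs a complete annotation table.
ENSEMBL_TO_HUGO = {
    "ENSG00000157764": "BRAF",
    "ENSG00000141510": "TP53",
}


def strip_version(ensembl_id):
    """Drop a trailing version suffix, e.g. ENSG00000157764.14 -> ENSG00000157764."""
    return ensembl_id.split(".", 1)[0]


def to_hugo_rows(results):
    """Replace Ensembl IDs with HUGO symbols, skipping genes without a known symbol."""
    rows = []
    for ensembl_id, log2fc, padj in results:
        symbol = ENSEMBL_TO_HUGO.get(strip_version(ensembl_id))
        if symbol is None:
            continue  # no symbol available for this ID
        rows.append((symbol, log2fc, padj))
    return rows


def write_csv(rows, handle):
    """Write a header line followed by one line per gene."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["gene", "log2FoldChange", "padj"])
    writer.writerows(rows)


# Made-up example values; the third ID is deliberately absent from the lookup.
example_results = [
    ("ENSG00000157764.14", 1.8, 0.002),
    ("ENSG00000141510.18", -1.2, 0.010),
    ("ENSG00000999999.1", 0.4, 0.600),
]

buffer = io.StringIO()
write_csv(to_hugo_rows(example_results), buffer)
print(buffer.getvalue())
```

Writing to an open file instead of `io.StringIO` gives the CSV to commit and link. Silently skipping unmapped IDs is a design choice of this sketch; listing them under Known Issues may be preferable.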

Evaluate your genes at: https://www.gsea-msigdb.org/gsea/msigdb/annotate.jsp

Known Issues

Please indicate any issues.

Last Section Conclusions

Give your opinion of your analysis. This section is subjective and is graded on completion rather than content.

1-week extensions are available on request

5% is taken off the total score

Meetings are available to go over your project before you have finalized it.

https://docs.google.com/spreadsheets/d/1hf2GS7cybhf2Slf-xFWb6eGlf6VBox1neVAOxEW2n9g/edit?usp=sharing

 

Examples:

  • RECOMMENDED: Choose a cancer and analysis from GDC (TCGA plus other US-funded cancer genome studies) or ICGC (International Cancer Genome Consortium, which includes TCGA)
    • e.g. Hispanic breast cancer, or smokers vs. non-smokers, and complete the analysis with DESeq2:
    • http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
    • Supplemented w/ https://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/designmatrices.html
    • Background paper:
    • Key elements:
      • Identify a specific sub-type of cancer you wish to study
      • Identify possible covariates to examine
      • Examples:
        • Identification of differentially expressed genes in TCGA Lung-Cancer by Cigarettes Per Day, controlling for stage and Race
      • Resources
        • Option 1 (International Genome Database): https://dcc.icgc.org/
        • Option 2: https://portal.gdc.cancer.gov/
      • Clinical Data
        • Cohort:
          • Cancer Type and any variables narrowing it down.
          • Example: Stage 1A & 1B adenomas and adenocarcinomas, lung cancer
        • Data: While there are several types of data that you may start with, please start with star_counts. Note that STAR counts are not covered in the vignette, so you will need to work out how to import them yourself.
          • Variables of Interest (at least 1)
            • Please check that the data you need is actually available for your particular study. Studies don't always collect what you want, so definitely check! ICGC also has more data than TCGA/GDC, so feel free to use that resource.
            • Examples: Cigarettes per day as a categorical variable, divided into high vs. low at 3 packs per day
          • Variables to control for that are not of interest (at least 1)
            • Examples: Race, Sex
          • Location of Data and details about data.
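The data items above can be sketched in a few lines. Everything in the sketch is an assumption to verify against your own downloads: the embedded sample imitates the layout I understand GDC STAR count files to have (a leading `#` comment line, a header row, `N_*` summary rows, then one row per gene), its counts are made up, and the helper names are hypothetical. It also assumes 20 cigarettes per pack and treats exactly 3 packs per day as "high".

```python
import csv
import io

# Toy stand-in for one STAR counts file; layout is assumed, values are made up.
SAMPLE_TSV = (
    "# gene-model: GENCODE v36\n"
    "gene_id\tgene_name\tgene_type\tunstranded\tstranded_first\tstranded_second\n"
    "N_unmapped\t\t\t1000\t1000\t1000\n"
    "N_multimapping\t\t\t500\t500\t500\n"
    "ENSG00000157764.14\tBRAF\tprotein_coding\t120\t60\t61\n"
    "ENSG00000141510.18\tTP53\tprotein_coding\t340\t170\t172\n"
)


def read_star_counts(handle, column="unstranded"):
    """Return {gene_id: raw integer count}, skipping comment lines and N_* summary rows."""
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    counts = {}
    for row in reader:
        if row["gene_id"].startswith("N_"):
            continue  # alignment summary row, not a gene
        counts[row["gene_id"]] = int(row[column])
    return counts


CIGARETTES_PER_PACK = 20  # assumption: standard pack size
THRESHOLD_PACKS = 3       # from the example: high vs. low split at 3 packs per day


def smoking_group(cigarettes_per_day):
    """Label a sample 'high' or 'low'; None marks a missing clinical value."""
    if cigarettes_per_day is None:
        return None
    threshold = CIGARETTES_PER_PACK * THRESHOLD_PACKS
    return "high" if cigarettes_per_day >= threshold else "low"


counts = read_star_counts(io.StringIO(SAMPLE_TSV))
print(counts)
print([smoking_group(value) for value in (10, 60, None)])
```

Which count column is appropriate depends on how the libraries were prepared, so `unstranded` is only a default here. Samples whose group comes back as `None` lack the variable of interest and would need to be dropped or reported.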