RNA Seq Analysis
RNA Seq Analysis
Introduction
Pancreatic Adenocarcinoma (PAAD) is the third most common cause of death from cancer, with an overall 5-year survival rate of less than 5%, and is predicted to become the second leading cause of cancer mortality in the United States by 2030.
Ribonucleic acid ( RNA ) is a polymeric molecule essential in various biological roles in coding, decoding, regulation and expression of genes.
RNA-Seq (RNA sequencing), is a sequencing technique to detect the quantity of RNA in a biological sample at a given moment. Here we have a dataset of normalized RNA Sequencing reads for pancreatic cancer tumors . The measurement consists of ~20,000 genes for 185 pancreatic cancer tumors. The file format is GCT , a tab-delimited file used for sharing gene expression data and metadata (details for each sample) for samples.
The GCT file is like multi-dimensional DataFrame, which consists of 3 DataFrames combined in 2-D.
These are:
- data_df: It has 18465 rows (Gene ID) abd 183 columns (Sample Name/ID)
- row_metadata_df: It has row metadata and When we see the type, It is empty dataframe. This means in our data, the row metadata is not present.
- col_metadata_df: It has 183 columns (Sample Names/ID) and 124 rows (Column metadata like histological_type, Patient_ID, status(is he alive or not)) for each sample.
For more details clink on the link