
RNA Seq Analysis

Introduction

Pancreatic adenocarcinoma (PAAD) is the third most common cause of cancer death, with an overall 5-year survival rate of less than 5%, and it is predicted to become the second leading cause of cancer mortality in the United States by 2030.

Ribonucleic acid (RNA) is a polymeric molecule essential to the coding, decoding, regulation, and expression of genes.

RNA-Seq (RNA sequencing) is a sequencing technique that measures the quantity of RNA in a biological sample at a given moment. Here we have a dataset of normalized RNA-Seq reads for pancreatic cancer tumors. The measurements cover ~20,000 genes for 185 pancreatic cancer tumors. The file format is GCT, a tab-delimited format used for sharing gene expression data together with metadata (details for each sample).

The GCT file is like a multi-dimensional DataFrame: it combines three DataFrames in a single 2-D layout.

These are:

  • data_df: It has 18465 rows (gene IDs) and 183 columns (sample names/IDs).
  • row_metadata_df: It holds the row metadata. On inspection, it is an empty DataFrame, which means row metadata is not present in our data.
  • col_metadata_df: It has 183 columns (sample names/IDs) and 124 rows of column metadata for each sample, such as histological_type, Patient_ID, and status (whether the patient is alive or not).
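
To make the three-part layout concrete, here is a minimal sketch that splits a GCT v1.3 text into the three frames using only pandas. It is not the parser used for this analysis: the `split_gct` helper, the toy sample IDs, the gene names, and all values are made up for illustration. It also assumes the column metadata is kept with fields as rows and samples as columns, as described above.

```python
import io
import pandas as pd

# Tiny made-up GCT v1.3 text (illustrative only):
# 2 genes x 3 samples, 0 row-metadata fields, 2 column-metadata fields.
TOY_GCT = "\n".join([
    "#1.3",
    "2\t3\t0\t2",
    "id\tS1\tS2\tS3",
    "histological_type\ttypeA\ttypeB\ttypeA",
    "status\talive\tdead\talive",
    "GENE1\t1.5\t2.0\t0.5",
    "GENE2\t3.0\t0.0\t4.5",
])


def split_gct(handle):
    """Split an open GCT v1.3 text handle into (data, row metadata, column metadata)."""
    version = handle.readline().strip()
    if version != "#1.3":
        raise ValueError(f"expected a #1.3 GCT file, got {version!r}")

    # Second line: data rows, data columns, row-metadata fields, column-metadata fields.
    n_rows, n_cols, n_row_meta, n_col_meta = (int(x) for x in handle.readline().split())

    # The rest is one tab-delimited table; read everything as text first.
    full = pd.read_csv(handle, sep="\t", index_col=0, dtype=str)

    # The first n_col_meta rows describe samples; the remaining rows describe genes.
    # The first n_row_meta columns describe genes; the remaining columns are samples.
    col_metadata_df = full.iloc[:n_col_meta, n_row_meta:]
    row_metadata_df = full.iloc[n_col_meta:, :n_row_meta]
    data_df = full.iloc[n_col_meta:, n_row_meta:].astype(float)

    if data_df.shape != (n_rows, n_cols):
        raise ValueError("data block does not match the dimensions in the header")
    return data_df, row_metadata_df, col_metadata_df


data_df, row_metadata_df, col_metadata_df = split_gct(io.StringIO(TOY_GCT))
print(data_df.shape, row_metadata_df.empty, col_metadata_df.shape)
```

With no row-metadata fields in the header, `row_metadata_df` comes back with zero columns, so `row_metadata_df.empty` is `True`, which mirrors the empty row metadata noted above.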

For more details, click on the link.

Image of the gene distribution across all samples

gene_distribution

Image of the Type 1 IFN genes (25 genes): their distribution across Exocrine samples.

gene_25
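
A subset like the one in the second image could be pulled out roughly as follows. This is a sketch under assumptions, not the code behind the figures: the toy frames, the placeholder gene names, the `subset_expression` helper, and the idea of matching the histological_type field by the substring "exocrine" are all hypothetical.

```python
import pandas as pd

# Toy stand-ins for data_df (genes x samples) and col_metadata_df (fields x samples).
data_df = pd.DataFrame(
    {"S1": [1.0, 2.0, 3.0], "S2": [2.0, 4.0, 6.0], "S3": [0.0, 1.0, 5.0]},
    index=["GENE_A", "GENE_B", "GENE_C"],
)
col_metadata_df = pd.DataFrame(
    {"S1": ["exocrine"], "S2": ["other"], "S3": ["exocrine"]},
    index=["histological_type"],
)

# Hypothetical gene set; the real list of 25 Type 1 IFN genes is not reproduced here.
gene_set = ["GENE_A", "GENE_C"]


def subset_expression(data_df, col_metadata_df, genes, field, label):
    """Return expression for `genes` in samples whose metadata `field` contains `label`."""
    values = col_metadata_df.loc[field].astype(str)
    mask = values.str.contains(label, case=False, regex=False)
    samples = values[mask].index
    # Keep only genes that are actually present in the expression table.
    present = [g for g in genes if g in data_df.index]
    return data_df.loc[present, samples]


subset = subset_expression(data_df, col_metadata_df, gene_set, "histological_type", "exocrine")
per_sample_mean = subset.mean(axis=0)  # one summary value per selected sample
print(subset)
print(per_sample_mean)
```

If matplotlib is installed, `subset.T.plot.box()` is one way to draw a per-gene distribution across the selected samples.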