Characterizing Intron Splicing Sites Using Unsupervised Clustering
January 2022 - May 2022
After transcription, the nascent pre-mRNA undergoes splicing: a ribonucleoprotein complex called the spliceosome cuts out certain sections of the transcript, known as introns, and the remaining regions, called exons, are stitched together to form the mature mRNA, which is then translated into a protein. In this project, I sought to characterize the possible structures of these splice sites and explore the patterns that determine them. This involved extracting the nucleotide sequences around the splicing positions and then using NPLB, an unsupervised clustering algorithm, to identify architectures within these sequences. I then studied the patterns in these architectures, the conservation scores of the splice sites, and the correlations of both with other parameters of the gene sequence.
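
The sequence-extraction step can be illustrated with a minimal sketch. It is not the project's actual pipeline: the function name `extract_splice_windows`, the `flank` parameter, and the 0-based, half-open intron coordinates are all assumptions made for illustration. It collects a fixed-width window around each donor (exon-to-intron) and acceptor (intron-to-exon) boundary, which is the kind of fixed-length input a sequence clustering tool would typically take.

```python
from typing import List, Tuple


def extract_splice_windows(
    sequence: str,
    introns: List[Tuple[int, int]],
    flank: int = 5,
) -> Tuple[List[str], List[str]]:
    """Return (donor_windows, acceptor_windows) around each intron boundary.

    Assumptions (hypothetical, not taken from the project):
    - `introns` holds 0-based, half-open (start, end) intron coordinates.
    - Each window spans `flank` bases on either side of the boundary.
    - Introns too close to the sequence ends for a full window are skipped.
    """
    donors: List[str] = []
    acceptors: List[str] = []
    for start, end in introns:
        if start - flank < 0 or end + flank > len(sequence):
            continue  # a full window does not fit; skip this intron
        # Donor site: last `flank` exon bases + first `flank` intron bases.
        donors.append(sequence[start - flank:start + flank])
        # Acceptor site: last `flank` intron bases + first `flank` exon bases.
        acceptors.append(sequence[end - flank:end + flank])
    return donors, acceptors


# Toy example: exon (10 nt) + intron (16 nt, GT...AG) + exon (10 nt).
toy_gene = "ACGTACGTAC" + "GTAAGTCCCCTTTCAG" + "GATTACAGAT"
donor_windows, acceptor_windows = extract_splice_windows(
    toy_gene, [(10, 26)], flank=3
)
```

Windows of equal length keep positions aligned across sites, so per-position patterns such as the GT and AG dinucleotides at the intron ends fall in the same columns for every sequence.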
