There are often many ways to solve a computational problem. Can we formalize that some problems are harder than others? Are some problems inherently difficult? How would you determine a “good” solution? Throughout the semester, we will explore topics such as:
Along the way, we will build a broad understanding of the process of algorithm analysis, a useful skill when working with cross-disciplinary technical teams.
By the end of the course, we will be able to:
Machine Learning (ML) is the art of solving a computational problem using a computer without an explicit program. ML is so pervasive today that applications such as image recognition, stock trading, email spam detection, product recommendation, medical diagnosis, predictive maintenance, and cybersecurity are constantly used by organizations around us, sometimes without our awareness.
In this course, we will rigorously apply machine learning techniques to real-world data to solve real-world problems. We will briefly study the major principles underlying diverse machine learning approaches to help retain strategies such as anomaly detection, ensemble learning, and deep learning with neural networks. The main tools of the course will be the Python-based Anaconda and Java-based Weka data science platforms. Datasets will be drawn from online resources such as Kaggle, UCI KDD, and open source repositories. We will also use Jupyter notebooks to present and demonstrate machine learning pipelines.
Bioinformatics is an interdisciplinary field that researches and develops methods and tools for understanding biological data, most of which comes in large and highly complex data sets. As an interdisciplinary scientific field, bioinformatics combines statistics and mathematics, information engineering, chemistry, physics, computer science, and biology to analyze and interpret various biological data. Biological studies in this area use computer algorithms, programming, and pipelines that yield varying degrees of precision and accuracy. A common use of bioinformatics is in genomics, where the aim is to identify candidate genes and single nucleotide polymorphisms in order to better understand the basis of diseases, unique adaptations, desirable variables, and/or differences. Bioinformatics also tries to understand the organizational principles within protein and nucleic acid sequences.
This semester-long research class explores methodologies and frameworks for effective representation learning on spatial transcriptomics data and, by extension, develops an optimized pipeline for real-time analysis of this data. The exploratory methods applied to spatial transcriptomics data include autoencoders and foundation models. The project will be developed in the Python programming language with its associated libraries (PyTorch, pandas, and NumPy, to name a few). There are two deliverables: first, code that builds the models best suited to representation learning on the data, guided by feature engineering and/or inference; and second, a design, primarily documentation with accompanying code, for deploying the model for real-time analysis.
Many students are curious about research and mentorship, and want to be more connected and to network with faculty. During every office hours session we go off on a tangent about research and how it works in industry and academia; students often seek career advice and want to hear more about our experiences and research. Thus, Dr. Yasin has been organizing research chats every month.