Dataset Analysis

Glioma EDA & Predictive Modeling

This project uses a dataset from the UCI Machine Learning Repository that contains 3 clinical features and 20 molecular features for each patient. The goal was to predict a patient's glioma grade by finding the optimal subset of genes and clinical features to improve the glioma grading process. Using Logistic Regression evaluated with 10-fold cross-validation, the model achieved 87.3% accuracy, 80.2% precision, and 92.9% recall.
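
The write-up does not include the modeling code or name the library that was used. As a minimal sketch only, the snippet below shows how 10-fold cross-validated logistic regression and the accuracy, precision, and recall metrics fit together. It is written in plain Python so it runs without extra packages. The function names (`fit_logistic`, `predict`, `metrics`, `k_fold_cv`), the gradient-descent settings, and the synthetic two-feature data are hypothetical assumptions. This is not the glioma model and will not reproduce the reported scores.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=100):
    """Fit logistic-regression weights by batch gradient descent (bias stored last)."""
    n = len(X[0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for xi, yi in zip(X, y):
            # Gradient of the log-loss for one row: (p - y) * x
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict(w, X, threshold=0.5):
    """Label a row 1 when its predicted probability reaches the threshold."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) >= threshold)
            for xi in X]


def metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) with class 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def k_fold_cv(X, y, k=10, seed=0):
    """Average (accuracy, precision, recall) over k shuffled folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    totals = [0.0, 0.0, 0.0]
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        w = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores = metrics([y[i] for i in test_idx], predict(w, [X[i] for i in test_idx]))
        totals = [t + s for t, s in zip(totals, scores)]
    return tuple(t / k for t in totals)


if __name__ == "__main__":
    # Synthetic, linearly separable toy data (NOT the glioma dataset).
    rng = random.Random(42)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    y = [int(a + b > 0) for a, b in X]
    acc, prec, rec = k_fold_cv(X, y, k=10)
    print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

Since precision is TP / (TP + FP) and recall is TP / (TP + FN), a recall well above precision, as in the reported results, means the model misses relatively few positive-class cases while producing comparatively more false positives.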

Diabetes EDA & Predictive Modeling

This dataset was found on Kaggle and includes the patient variables Age, Hypertension, Heart Disease, BMI, HbA1c Level, Blood Glucose Level, and Diabetes. The dataset was cleaned and transformed for Linear Regression predictive modeling. Because the dataset's original source is unknown, its data should not be considered trustworthy; this was a quick modeling exercise on a large dataset of unverified origin.
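
The write-up does not show the cleaning or modeling steps, so the following is only a minimal, dependency-free sketch of the general idea. It assumes rows arrive as dictionaries of strings (for example from `csv.DictReader`), drops rows with missing or non-numeric values, and fits an ordinary least-squares model by solving the normal equations (X^T X) beta = X^T y. All function names are hypothetical, and the project's actual cleaning and transformation steps are not described in the source.

```python
def clean_rows(rows, columns):
    """Keep only rows in which every requested column parses as a float."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append([float(row[c]) for c in columns])
        except (KeyError, TypeError, ValueError):
            continue  # drop rows with a missing or malformed value
    return cleaned


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_ols(X, y):
    """Ordinary least squares via the normal equations; returns [intercept, coef_1, ...]."""
    Xb = [[1.0] + list(row) for row in X]  # prepend an intercept column
    p = len(Xb[0])
    XtX = [[sum(r[i] * r[j] for r in Xb) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xb, y)) for i in range(p)]
    return solve(XtX, Xty)


def predict_ols(beta, row):
    """Linear prediction: intercept + sum(coef * x)."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))
```

For example, `clean_rows(rows, ["age", "hypertension", "heart_disease", "bmi", "HbA1c_level", "blood_glucose_level", "diabetes"])` followed by `fit_ols` on the first six columns would regress the Diabetes indicator on the other variables (these column names are assumed, not taken from the dataset). Because Diabetes is a 0/1 outcome, such a fit is a linear probability model, and its predictions are not guaranteed to stay within [0, 1].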

Heart Disease EDA & Predictive Modeling

This dataset, a reprocessed version of the Cleveland heart disease data from the UCI Machine Learning Repository, was obtained from Kaggle. The project was completed for my DSA 610 class and aimed to analyze variables that may directly affect a person's likelihood of developing heart disease. The original dataset and two variations of it were each used to build Logistic Regression, Linear Regression, Decision Tree, and Random Forest predictive models, and each model's scores were evaluated. Learn more about the methods, EDA, models, and results in "Heart Disease Modeling Presentation".
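
The actual methods are in the presentation. The general pattern of scoring several models on several versions of a dataset can be sketched as a small evaluation harness: each dataset variant gets one shuffled train/test split, every model trains on the same training rows, and held-out accuracy is recorded per (dataset, model) pair. This is an illustrative sketch under stated assumptions. The helper names are hypothetical, accuracy stands in for whatever scores the project reported, and the two trainers shown (a majority-class baseline and a one-split decision stump, which is a depth-1 decision tree) are placeholders, not the Logistic Regression, Linear Regression, Decision Tree, and Random Forest models used in the project.

```python
import random
from typing import Callable, Dict, List, Tuple

Rows = List[List[float]]
Dataset = Tuple[Rows, List[int]]
# A "trainer" fits on (X, y) and returns a function that labels one row.
Trainer = Callable[[Rows, List[int]], Callable[[List[float]], int]]


def split(X, y, test_fraction=0.25, seed=0):
    """Shuffle row indices with a fixed seed, then hold out the last test_fraction."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    cut = int(len(order) * (1 - test_fraction))
    train, test = order[:cut], order[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def compare(datasets: Dict[str, Dataset],
            trainers: Dict[str, Trainer]) -> Dict[Tuple[str, str], float]:
    """Fit every trainer on every dataset variant and record held-out accuracy."""
    results = {}
    for d_name, (X, y) in datasets.items():
        X_tr, y_tr, X_te, y_te = split(X, y)  # one shared split per variant
        for m_name, train in trainers.items():
            predictor = train(X_tr, y_tr)
            results[(d_name, m_name)] = accuracy(y_te, [predictor(x) for x in X_te])
    return results


def majority_class(X, y):
    """Baseline stand-in: always predict the most common training label."""
    label = max(set(y), key=y.count)
    return lambda x: label


def decision_stump(X, y):
    """Depth-1 decision tree stand-in: best single (feature, threshold) split."""
    best = (-1.0, 0, 0.0, 1)  # (train accuracy, feature, threshold, label above threshold)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds sit midway between consecutive distinct values.
        for thr in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            for above in (0, 1):
                acc = accuracy(y, [above if row[f] > thr else 1 - above for row in X])
                if acc > best[0]:
                    best = (acc, f, thr, above)
    _, f, thr, above = best
    return lambda x: above if x[f] > thr else 1 - above
```

Reusing one split per dataset variant keeps the comparison fair across models. Real models would be wrapped as trainers with the same `(X, y) -> predictor` shape and passed into `compare` alongside the dataset variants.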
