Dylan Skalman

These are some of the data-driven explorations and predictive modeling projects I’ve completed, showcasing advanced analysis techniques applied to real-world biomedical datasets.

NCBI Intein Investigation

This project investigated large-scale misrepresentation of intein-containing protein records in the National Center for Biotechnology Information (NCBI) database, using legacy intein datasets from prior peer-reviewed research.

Download Write Up Download Presentation

Glioma EDA & Predictive Modeling

A UCI dataset with 3 clinical and 20 molecular features was analyzed to classify glioma grades. Logistic Regression with 10-fold cross-validation yielded: 87.3% accuracy, 80.2% precision, 92.9% recall.
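For readers less familiar with these metrics, here is a minimal sketch of how accuracy, precision, and recall are defined, and how samples can be split into 10 folds. It is not the project's actual code: the function names `k_fold_indices` and `classification_metrics` are hypothetical, and the labels are toy values rather than the glioma data.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k spreads the samples so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Precision: of the samples predicted positive, the share that truly are.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: of the truly positive samples, the share that were found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Toy labels for illustration only (assumption: 1 marks the positive class).
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
    print(classification_metrics(y_true, y_pred))
    print([len(fold) for fold in k_fold_indices(25, k=10)])
```

In 10-fold cross-validation, each fold serves once as the held-out test set while the model is trained on the other nine. The metrics are then computed on the held-out predictions, either averaged across folds or pooled.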

View on Kaggle Read the Write-up

Diabetes EDA & Predictive Modeling

A Kaggle dataset with variables such as BMI, glucose, and HbA1c level was cleaned and modeled using Linear Regression. Because the data source was unverified, the project was treated as a fast-paced modeling experiment.

View on Kaggle

Heart Disease EDA & Predictive Modeling

Based on reprocessed UCI Cleveland data, this project compares Logistic Regression, Linear Regression, Decision Tree, and Random Forest models. Several variants of the dataset were compared for predictive accuracy.

Original Dataset View on Kaggle Download PDF