Healthcare Access Risk Screening System

Machine Learning
Python
Healthcare Equity
Built a machine learning system to identify patients at risk of delayed care due to structural barriers.
Published

January 15, 2026

Overview

Delayed healthcare access isn’t just an individual problem — it’s driven by structural factors that compound for certain populations. This project developed a risk screening tool to identify patients vulnerable to delayed care by combining clinical and social determinants of health (SDOH) data.

Problem Statement

Healthcare access barriers disproportionately affect marginalized communities. Transportation, cost, language, and discrimination all delay care initiation. Early identification of at-risk patients could enable targeted interventions — but most screening tools ignore these structural factors entirely.

Data & Methods

  • Integrated synthetic patient data with social determinants (ZIP-code-level transportation access, insurance type, language preference)
  • Built a logistic regression model with a binary care-delay indicator as the outcome
  • Conducted fairness audits using group fairness metrics (demographic parity, equalized odds) across race, insurance status, and language groups
  • Created a risk stratification framework to prioritize outreach
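The fairness audit and risk stratification steps above can be sketched with NumPy. This is a minimal illustration, not the project's code. The function names are hypothetical, and so are the 0.3/0.6 tier cutoffs. The metric definitions are also assumptions: demographic parity difference is taken as the largest gap in flag rates across groups, and equalized-odds difference as the larger of the true-positive-rate and false-positive-rate gaps.

```python
import numpy as np

# Hypothetical helpers and cutoffs for illustration; not the project's implementation.

def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, true positive rate, and false positive rate."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    rates = {}
    for g in np.unique(groups):
        mask = groups == g
        yt, yp = y_true[mask], y_pred[mask]
        pos, neg = yt == 1, yt == 0
        rates[g] = {
            # Share of the group flagged as at risk
            "selection_rate": yp.mean(),
            # Recall: share of truly delayed patients the model flags
            "tpr": yp[pos].mean() if pos.any() else np.nan,
            # Share of non-delayed patients flagged anyway
            "fpr": yp[neg].mean() if neg.any() else np.nan,
        }
    return rates

def demographic_parity_difference(y_true, y_pred, groups):
    """Largest gap in selection rate between any two groups."""
    sel = [r["selection_rate"] for r in group_rates(y_true, y_pred, groups).values()]
    return max(sel) - min(sel)

def equalized_odds_difference(y_true, y_pred, groups):
    """Larger of the TPR gap and FPR gap across groups."""
    rates = group_rates(y_true, y_pred, groups).values()
    tprs = [r["tpr"] for r in rates]
    fprs = [r["fpr"] for r in rates]
    return max(np.nanmax(tprs) - np.nanmin(tprs), np.nanmax(fprs) - np.nanmin(fprs))

def stratify(scores, cutoffs=(0.3, 0.6)):
    """Map predicted risk to outreach tiers: 0 = low, 1 = medium, 2 = high."""
    return np.digitize(scores, cutoffs)

y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["a"] * 4 + ["b"] * 4
print(demographic_parity_difference(y_true, y_pred, groups))  # → 0.0
print(equalized_odds_difference(y_true, y_pred, groups))      # → 0.5
print(stratify([0.1, 0.45, 0.8]))                              # → [0 1 2]
```

The toy example shows why both metrics are audited: groups "a" and "b" are flagged at the same rate (demographic parity difference 0), yet group "b" has half the recall, which only the equalized-odds check exposes.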

Results

The model achieved an overall AUC of 0.78, but before mitigation its recall varied meaningfully across demographic groups: it missed a larger share of truly at-risk patients in some groups than in others. Reweighting the loss function to balance false negatives across groups improved equalized odds while keeping overall performance reasonable.
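One way to reweight the loss is sketched below under stated assumptions. The weighting scheme is hypothetical: every group's positive (delayed-care) cases receive the same total weight, so false negatives in a small group cost as much in aggregate as those in a large group. The plain gradient-descent fit and all function names are likewise illustrative. The synthetic data only exercises the mechanics and does not reproduce the reported results.

```python
import numpy as np

# Hypothetical reweighting scheme and helpers; not the project's implementation.

def fn_balancing_weights(y, groups):
    """Give every group's positives the same total weight; negatives keep weight 1."""
    y, groups = np.asarray(y), np.asarray(groups)
    w = np.ones(len(y), dtype=float)
    uniq = np.unique(groups)
    n_pos = (y == 1).sum()
    for g in uniq:
        mask = (groups == g) & (y == 1)
        if mask.any():
            w[mask] = n_pos / (len(uniq) * mask.sum())
    return w

def _with_intercept(X):
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X])

def fit_weighted_logreg(X, y, w, lr=0.1, epochs=2000):
    """Logistic regression by batch gradient descent on a sample-weighted log loss."""
    Xb = _with_intercept(X)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    beta = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        # Gradient of sum_i w_i * logloss_i / sum_i w_i
        beta -= lr * (Xb.T @ (w * (p - y))) / w.sum()
    return beta

def predict_proba(beta, X):
    return 1.0 / (1.0 + np.exp(-_with_intercept(X) @ beta))

def recall_by_group(y_true, y_pred, groups):
    """Share of true positives the model flags, per group."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    return {g: y_pred[(groups == g) & (y_true == 1)].mean() for g in np.unique(groups)}

# Synthetic demo: an 80/20 group split with two features
rng = np.random.default_rng(0)
n = 400
groups = rng.choice(["a", "b"], size=n, p=[0.8, 0.2])
X = rng.normal(size=(n, 2))
y = (X[:, 0] + rng.normal(size=n) > 0.5).astype(int)

for name, w in [("uniform", np.ones(n)), ("balanced", fn_balancing_weights(y, groups))]:
    beta = fit_weighted_logreg(X, y, w)
    y_hat = (predict_proba(beta, X) >= 0.5).astype(int)
    print(name, recall_by_group(y, y_hat, groups))
```

Comparing per-group recall under uniform and balanced weights mirrors the before/after comparison described above. On real data, the decision threshold and the weighting scheme would both need tuning and validation on held-out patients.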

Key Insights

  • Raw predictive accuracy masks bias in who gets missed by the model
  • Access barriers are structural, not individual — the model needs to reflect that
  • Risk stratification tools require explicit fairness constraints, not just algorithmic tweaking

What I Learned

This project reinforced that fairness in healthcare ML isn't a feature to add at the end; it has to shape how you frame the problem from the start. Whose data is missing? Whose outcomes are you measuring? These questions matter as much as model architecture.