Mesopotamian Journal of Artificial Intelligence in Healthcare

Data Mining Driven Segmentation of Health Insurance Policyholders Using K-Means Clustering

Abstract

This study illustrates a data-driven approach to segmenting health insurance policyholders by applying K-Means clustering to an open insurance dataset. Key demographic and financial features (age, body mass index (BMI), number of dependents, annual medical spending, and premium payment) were first normalized to ensure comparability. Silhouette analysis identified k = 3 as the optimal number of clusters, yielding three segments: (1) young, low-cost individuals, (2) middle-aged, medium-cost individuals, and (3) older, high-cost individuals. The cluster centroids provide actionable profiles that insurers can use for targeted marketing, risk profiling, and the development of customized plans. A set of visualizations (scatter plots, boxplots, histograms, and bar charts) illustrates the separation between these segments and the distribution of features within each. The preprocessing workflow (missing-value treatment, encoding of categorical features, and feature scaling) is documented in a flowchart for reproducibility. The results demonstrate that simple unsupervised learning techniques can yield interpretable customer segmentations, offering a foundation for more advanced predictive modeling and individualized insurance policies.
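The pipeline summarized in the abstract (scale the features, run K-Means for several values of k, pick k by mean silhouette score, then read off cluster profiles) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data is synthetic and was deliberately built with three groups, the feature names and all function names are hypothetical, z-score scaling stands in for whatever normalization the study used, and a simple farthest-point seeding is swapped in for the more common k-means++ or random restarts. It uses only the Python standard library; in practice a library such as scikit-learn would be the usual choice.

```python
import math
import random

FEATURES = ["age", "bmi", "dependents", "charges", "premium"]  # hypothetical names


def make_synthetic_policyholders(rng, per_group=40):
    """Synthetic stand-in for the dataset: three invented groups, not real data."""
    # (mean age, mean BMI, possible dependents, mean charges, mean premium)
    groups = [(25, 24, (0, 1), 3000, 2000),
              (45, 28, (2, 3), 9000, 4000),
              (62, 32, (0, 1), 20000, 7000)]
    rows = []
    for age, bmi, deps, charges, premium in groups:
        for _ in range(per_group):
            rows.append([rng.gauss(age, 3), rng.gauss(bmi, 1),
                         float(rng.choice(deps)), rng.gauss(charges, 500),
                         rng.gauss(premium, 200)])
    return rows


def standardize(rows):
    """Z-score each column so features on different scales are comparable."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def kmeans(points, k, rng, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    # Farthest-point seeding: a simplification chosen for this sketch.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        centroids.append(max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


def silhouette(points, labels):
    """Mean silhouette coefficient (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:  # convention: singleton clusters score 0
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


rng = random.Random(0)
raw = make_synthetic_policyholders(rng)
scaled = standardize(raw)

# Score each candidate k and keep the one with the highest mean silhouette.
scores = {}
for k in range(2, 7):
    labels, _ = kmeans(scaled, k, rng)
    scores[k] = silhouette(scaled, labels)
best_k = max(scores, key=scores.get)

# Report cluster profiles in the original units, ordered by mean age.
labels, _ = kmeans(scaled, best_k, rng)
profiles = []
for j in range(best_k):
    members = [r for r, lab in zip(raw, labels) if lab == j]
    if members:
        profiles.append([sum(c) / len(members) for c in zip(*members)])
profiles.sort()
for prof in profiles:
    print({name: round(v, 1) for name, v in zip(FEATURES, prof)})
print("silhouette by k:", {k: round(s, 3) for k, s in scores.items()})
```

Because the toy data was generated from three well-separated groups, the silhouette score here favors k = 3 by construction; on a real dataset the peak may be less pronounced, and the profiles are only as meaningful as the features that go into them. Reporting centroids in the original units rather than in scaled units is what makes the segments readable as, for example, "young, low-cost".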

Recommended Citation

(2025) "Data Mining Driven Segmentation of Health Insurance Policyholders Using K-Means Clustering," Mesopotamian Journal of Artificial Intelligence in Healthcare: Vol. 3: Iss. 1, Article 17.
DOI: https://doi.org/10.58496/MJAIH/2025/018

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.
