Mesopotamian Journal of Artificial Intelligence in Healthcare

Authors

    Abstract

    This study illustrates a data-driven approach to segmenting health insurance policyholders using K-Means clustering on an open insurance dataset. Key demographic and financial features (age, body mass index (BMI), number of dependents, annual medical spending, and premium payment) were first normalized to ensure comparability. Silhouette analysis indicated an optimal number of clusters of k = 3, yielding three segments: (1) young, low-cost individuals, (2) middle-aged, medium-cost individuals, and (3) older, high-cost individuals. The cluster centroids provide actionable profiles that insurers can use for targeted marketing, risk profiling, and the development of customized plans. A set of visualizations (scatter plots, boxplots, histograms, and bar charts) illustrates the separation between these segments and the distribution of features within each. The preprocessing workflow (missing-value treatment, encoding of categorical features, and feature scaling) was documented in a flowchart for reproducibility. The results demonstrate that simple unsupervised learning techniques can yield interpretable customer segmentations, offering a foundation for more advanced predictive modeling and individualized insurance policies.
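    The pipeline described in the abstract (feature scaling, K-Means clustering, and selection of k by silhouette score) can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: the `records` list is hypothetical toy data rather than the study's dataset, only three of the five features are included for brevity, z-score standardization stands in for the unspecified normalization, and a deterministic farthest-point initialization replaces the random seeding that K-Means implementations typically use. In practice, a library such as scikit-learn provides equivalent routines.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical toy records (age, BMI, annual charges); NOT the study's data.
records = [
    (22, 21.0, 2100), (25, 22.0, 2600), (27, 21.5, 3000), (24, 22.5, 2400),
    (41, 27.0, 8200), (45, 27.5, 9100), (43, 26.5, 8700), (47, 28.0, 9600),
    (61, 32.0, 21000), (64, 33.0, 24500), (66, 32.5, 23000), (59, 33.5, 22000),
]

def standardize(rows):
    """Z-score each column so that all features share a comparable scale."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(r, mus, sds)) for r in rows]

def dist(a, b):
    """Euclidean distance between two points."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization
    (an assumption made here for reproducibility)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = tuple(mean(c) for c in zip(*members))
    return labels, centroids

def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two non-empty clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: coefficient defined as 0
            scores.append(0.0)
            continue
        a = mean(own)  # mean distance to the point's own cluster
        b = min(       # mean distance to the nearest other cluster
            mean([dist(p, q) for q, l in zip(points, labels) if l == other])
            for other in set(labels) if other != labels[i])
        scores.append((b - a) / max(a, b))
    return mean(scores)

points = standardize(records)
scores = {k: silhouette(points, kmeans(points, k)[0]) for k in range(2, 6)}
best_k = max(scores, key=scores.get)  # k with the highest mean silhouette
labels, centroids = kmeans(points, best_k)
print(best_k, labels)
```

    The centroids returned here are in standardized units; to read them as segment profiles of the kind the abstract describes, they would need to be mapped back to the original feature scales.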

    Creative Commons License

    This work is licensed under a Creative Commons Attribution 4.0 International License.
