This study aimed to identify the patients at highest risk of periodontal disease (PD) and to provide recommendations for the effective application of data mining (DM) techniques in establishing evidence‐based dental‐care policies for vulnerable groups at high risk of PD.
This study used the SEMMA (Sample, Explore, Modify, Model, and Assess) methodology to construct DM models from data acquired in the fifth and sixth Korea National Health and Nutrition Examination Surveys (2010–2015). We analyzed the sociodemographic and comorbidity variables that influence PD by applying three popular DM techniques (decision‐tree, neural‐network, and regression models), and we compared the results of these three models to select the one with the highest predictive power and reliability.
Comparison of the three DM algorithms showed that the decision‐tree model performed best on every assessment criterion: average squared error, misclassification rate, receiver operating characteristic (ROC) index, Gini coefficient, and Kolmogorov–Smirnov (KS) statistic. The decision‐tree model revealed that age and smoking status exert major effects on the risk of PD, that stress and education level exert effects in rural areas, and that education level, sex, hyperlipidemia, and alcohol intake exert effects in urban areas.
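To make the five comparison criteria concrete, here is a minimal Python sketch of how they can be computed from one model's predicted probabilities for a binary PD outcome. It is an illustration under stated assumptions, not the study's software. The function name `assess_binary_model`, the 0/1 outcome coding, the 0.5 classification cutoff, and the toy data are all hypothetical. The sketch takes the ROC index to be the area under the ROC curve and the Gini coefficient to be 2 × ROC index - 1, which are common definitions. Lower average squared error and misclassification rate are better, and higher ROC index, Gini coefficient, and KS statistic are better. Running the function on each candidate model's predictions from the same assessment data would let the models be ranked.

```python
from typing import Dict, Sequence


def assess_binary_model(y_true: Sequence[int], p_hat: Sequence[float],
                        cutoff: float = 0.5) -> Dict[str, float]:
    """Hypothetical helper: fit statistics for one binary classifier.

    y_true: observed outcomes, assumed coded 1 = PD, 0 = no PD.
    p_hat:  the model's predicted probabilities that the outcome is 1.
    cutoff: assumed probability threshold for classifying a case as PD.
    """
    n = len(y_true)
    if n == 0 or n != len(p_hat):
        raise ValueError("inputs must be non-empty and of equal length")

    # Average squared error: mean squared gap between outcome and probability.
    ase = sum((y - p) ** 2 for y, p in zip(y_true, p_hat)) / n

    # Misclassification rate: share of cases whose thresholded prediction
    # disagrees with the observed outcome.
    misclass = sum((p >= cutoff) != (y == 1) for y, p in zip(y_true, p_hat)) / n

    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes must be present")

    # ROC index (area under the ROC curve) in its Mann-Whitney form: the share
    # of (PD, non-PD) pairs in which the PD case gets the higher probability,
    # with ties counted as one half.
    concordant = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                concordant += 1.0
            elif pp == pn:
                concordant += 0.5
    roc_index = concordant / (len(pos) * len(neg))

    # Gini coefficient as a rescaling of the ROC index onto [-1, 1].
    gini = 2.0 * roc_index - 1.0

    # KS statistic: largest gap between the true-positive rate and the
    # false-positive rate over all thresholds equal to an observed score.
    ks = 0.0
    for t in sorted(set(p_hat)):
        tpr = sum(p >= t for p in pos) / len(pos)
        fpr = sum(p >= t for p in neg) / len(neg)
        ks = max(ks, abs(tpr - fpr))

    return {"ase": ase, "misclassification": misclass,
            "roc_index": roc_index, "gini": gini, "ks": ks}


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the survey.
    outcomes = [1, 1, 0, 1, 0, 0]
    probabilities = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
    for name, value in assess_binary_model(outcomes, probabilities).items():
        print(f"{name}: {value:.4f}")
```

On the toy data this gives a ROC index of 8/9, a Gini coefficient of 7/9, a KS statistic of 2/3, and a misclassification rate of 1/3. The pairwise ROC computation is quadratic in the number of cases. It is chosen here for readability, and a rank-based version would be preferable for survey-sized data.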
We demonstrated that the decision‐tree model is an effective DM technique for identifying the complex risk factors for PD. These results are expected to help improve the equity and effectiveness of dental‐care policies for vulnerable groups at high risk of PD.