IBM SPSS Modeler, a data mining and text analytics software package, was used to build the prediction models and perform the other analytical tasks. To build each model, test its accuracy and compare the models with each other, the authors performed the following steps:
1- Entering and correcting the data structure
2- Determining the dependent and independent variables and the type of variables
3- Partitioning the data into training and test sets
4- Building the statistical models
5- Calculation of accuracy, sensitivity and specificity rates
6- Creating Gain Charts to compare models
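The partitioning in step 3 can be illustrated with a short Python sketch. The study does not report the split proportions, so the 70/30 split, the fixed seed and the function name `partition` below are hypothetical. The sketch only shows the idea of a reproducible random split into training and test sets, not the exact procedure performed in SPSS Modeler.

```python
import random


def partition(records, train_fraction=0.7, seed=42):
    """Randomly split records into a training set and a test set.

    train_fraction and seed are illustrative assumptions; the study does not
    report the proportions it used.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    rng = random.Random(seed)   # fixed seed makes the split reproducible
    shuffled = list(records)    # copy so the caller's data is not reordered
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Usage with 100 hypothetical patient IDs.
train, test = partition(range(100))
print(len(train), len(test))  # 70 30
```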
Results
Draw a Gain Chart
The gain chart plots one curve per algorithm; the closer an algorithm's curve lies to the BEST-RET line, the better the algorithm performs. The green curve, which represents the Chaid algorithm, lies above the curves of the other algorithms in both the test partition (Test) and the training partition (Train).
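As an illustration of how such a chart is built, the following Python sketch computes the points of a cumulative gains curve: cases are ranked by the model's predicted probability of retinopathy, and for each fraction of ranked cases the percentage of all actual positive cases captured so far is recorded. The ideal ("best") curve is obtained by ranking every positive case first. The function names and toy data are hypothetical, and SPSS Modeler's own chart may group cases into quantiles rather than plotting every case.

```python
def cumulative_gains(scores, labels):
    """Return (percent of cases, percent of positives captured) points.

    scores: predicted probability of the positive class (e.g. retinopathy)
    labels: 1 for an actual positive case, 0 otherwise
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive case is required")
    # Rank cases from the highest to the lowest predicted probability.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    points = [(0.0, 0.0)]
    captured = 0
    for i, (_, label) in enumerate(ranked, start=1):
        captured += label
        points.append((100.0 * i / n, 100.0 * captured / total_pos))
    return points


def best_gains(labels):
    """Ideal curve: every positive case is ranked ahead of every negative one."""
    return cumulative_gains(labels, labels)


# Hypothetical predicted probabilities and actual labels for five patients.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 0, 0]
print(cumulative_gains(scores, labels))
# [(0.0, 0.0), (20.0, 50.0), (40.0, 50.0), (60.0, 100.0), (80.0, 100.0), (100.0, 100.0)]
print(best_gains(labels)[:3])
# [(0.0, 0.0), (20.0, 50.0), (40.0, 100.0)]
```

A model whose curve stays closer to the best curve captures more of the patients with retinopathy among its highest-ranked cases.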
Calculate accuracy, sensitivity and specificity indicators
The predictions of each model are compared with the actual classes to count the true positive, true negative, false positive and false negative values. Each cell of the resulting confusion matrix contains the raw number of cases for one combination of predicted and actual class. The statistical parameters (sensitivity, specificity and overall classification accuracy) were calculated for the four models and are presented in Table 2. Accuracy, sensitivity and specificity estimate the probability that positive and negative labels are correct, and together they evaluate the usefulness of each algorithm as a single model.
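Using the standard definitions, with retinopathy as the positive class, accuracy = (TP + TN) / (TP + TN + FP + FN), sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). A minimal Python sketch of these calculations follows; the function names and the example labels are hypothetical and are not data from this study.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # ill and predicted ill
        elif p == positive:
            fp += 1  # healthy but predicted ill
        elif a == positive:
            fn += 1  # ill but predicted healthy
        else:
            tn += 1  # healthy and predicted healthy
    return tp, tn, fp, fn


def ratio(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def classification_rates(actual, predicted, positive=1):
    tp, tn, fp, fn = confusion_counts(actual, predicted, positive)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


# Hypothetical labels: 1 = retinopathy, 0 = no retinopathy.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(classification_rates(actual, predicted))
# accuracy 0.8, sensitivity 0.75, specificity 5/6 (about 0.833)
```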
In addition, the number and percentage of missing values were reported for each variable in order to identify the algorithm best suited to data with missing values.
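The per-variable missing-data summary can be sketched as follows, assuming each patient record is a Python dictionary in which a missing value is stored as `None` or the key is absent. The function name, variable names and records are illustrative only.

```python
def missing_summary(rows, variables):
    """Return {variable: (missing_count, missing_percent)} over all records.

    A value counts as missing when it is None or the key is absent.
    """
    n = len(rows)
    summary = {}
    for var in variables:
        missing = sum(1 for row in rows if row.get(var) is None)
        summary[var] = (missing, 100.0 * missing / n if n else 0.0)
    return summary


# Hypothetical records; "BMI" and "HbA1C" are illustrative variable names.
rows = [
    {"BMI": 27.1, "HbA1C": None},
    {"BMI": None, "HbA1C": 7.2},
    {"BMI": 31.0},  # HbA1C not recorded
    {"BMI": 24.5, "HbA1C": 6.8},
]
print(missing_summary(rows, ["BMI", "HbA1C"]))
# {'BMI': (1, 25.0), 'HbA1C': (2, 50.0)}
```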
According to Chart 4, the variables related to retinopathy, ranked by their influence in the Chaid algorithm, were as follows:
BMI had the greatest impact in this algorithm, followed in order of importance by duration of illness, medication, patient's age, hypertension, gender, cholesterol and hemoglobin A1C.
Discussion
A significance level of 0.05 was used in the software for both splitting and merging. The results in Table 2 show that the Cart model achieved the best classification accuracy on the test sample, at 73.44%. The C5.0 and Chaid models, at 71.65% and 68.9%, were not far behind the Cart model, followed by the Quest model at 65.07%. Sensitivity analysis is often used to determine the degree to which each predictive feature contributes to identifying the output class values (27).
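To illustrate how a significance level is used for merging, the sketch below applies Pearson's chi-square test to two categories of a predictor against a binary outcome (retinopathy yes/no) and merges them when the p-value exceeds 0.05. This is a simplified, hypothetical version of a single CHAID merge step: with a 2x2 table there is one degree of freedom, so the p-value can be computed as erfc(sqrt(x/2)). Full CHAID implementations handle predictors with many categories, repeat the merging, and may adjust p-values (for example, with a Bonferroni correction) before choosing a split.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 contingency table.

    table[i][j] is the number of cases in predictor category i with outcome j
    (here j = 0 for retinopathy and j = 1 for no retinopathy).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip cells of an empty row or column
                stat += (table[i][j] - expected) ** 2 / expected
    # A 2x2 table has 1 degree of freedom, and P(chi2_1 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def should_merge(category_a, category_b, alpha_merge=0.05):
    """Hypothetical CHAID-style merge rule: merge two predictor categories
    when their outcome distributions do not differ significantly."""
    _, p_value = chi_square_2x2([category_a, category_b])
    return p_value > alpha_merge


# Counts are [with retinopathy, without retinopathy] per category (made-up data).
print(should_merge([30, 70], [32, 68]))  # True: similar distributions, merge
print(should_merge([30, 70], [60, 40]))  # False: significant difference, keep apart
```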
As used in this study, the specificity rate represents the percentage of cases that the test correctly diagnosed as positive. It is therefore significant for the authors, because the higher the specificity error, the higher the percentage of ill people who are wrongly diagnosed as healthy, which undermines trust in the test. Sensitivity, in turn, represents the percentage of cases in which the test correctly places healthy people in the healthy category; its error causes healthy people to be wrongly classified as ill. Sensitivity is thus of lower importance than specificity here, and a high specificity rate is of particular importance to the researchers.
Based on the sensitivity and specificity rates for the test sample in Table 2, the authors conclude that the Chaid model has the highest specificity rate, at 72.2%, followed by the Quest model (59.58%), the Cart model (57.14%) and the C5.0 model (49.05%). This suggests that the Chaid model gives the best results in the table. On careful inspection, the gains chart for the test sample also supports this claim, showing that the Chaid model performs better than the other models.
Unlike the other models, the Chaid algorithm can treat missing values as a category of their own and create splits on them, which matters here because of the large amount of missing data. For some of the variables in Table 2, the authors observed better performance of this model compared to the other models.
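The idea of treating missing values as a category in their own right can be sketched as a cross-tabulation in which `None` is mapped to an extra category before counting. The label `MISSING`, the function name and the example values below are hypothetical. The resulting counts are the kind of table a chi-square test such as CHAID's would then use when deciding whether to split on, or merge, that category.

```python
from collections import Counter

MISSING = "MISSING"  # hypothetical label for the extra category


def crosstab_with_missing(values, outcomes):
    """Count (category, outcome) pairs, mapping None to its own category."""
    counts = Counter()
    for value, outcome in zip(values, outcomes):
        category = MISSING if value is None else value
        counts[(category, outcome)] += 1
    return dict(counts)


# Hypothetical hypertension values (None = not recorded) and outcomes (1 = retinopathy).
hypertension = ["yes", None, "no", None, "yes"]
retinopathy = [1, 1, 0, 0, 1]
print(crosstab_with_missing(hypertension, retinopathy))
# {('yes', 1): 2, ('MISSING', 1): 1, ('no', 0): 1, ('MISSING', 0): 1}
```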
A study evaluating the Cart, Chaid and Quest algorithms investigated the capabilities of these three types of decision tree algorithms and reported results on their prediction performance. Among the three algorithms, Chaid generated the largest number of classification rules and displayed the greatest prediction accuracy (29).
In a study of breast cancer diagnosis using decision tree models and SVM, the authors evaluated the classification performance of four decision tree models, Chaid, C&R, Quest and C5, and then compared the results with those of SVM. The significance analysis showed that "cell size uniformity" was the most important feature in Chaid and Quest for differentiating cancerous from healthy samples (30).
A study comparing decision tree algorithms for evaluating the severity and type of collisions in an urban network (case study: Mashhad city) used four decision tree algorithms, Quest, Chaid, Cart and C5, and compared the resulting models with one another.
Conclusion
In this study, the classification performance of four decision tree algorithms, Chaid, Cart, Quest and C5.0, in the diagnosis of retinopathy was evaluated using data from diabetic patients referred to the Diabetes Center of Yazd.
The results showed that, taking into account specificity, the gains chart and the handling of missing data, the best-performing model was the decision tree built with the Chaid algorithm.
Acknowledgment
The authors would like to thank the Healthcare Data Modeling Center of Shahid Sadoughi University of Medical Sciences. This study was part of an MSc thesis with ethical code IR.SSU.SPH.REC.1399.063.
Author Contribution
All authors contributed to data collection and modeling.