Prediction of Diabetes Using Various Feature Selection and Machine Learning Paradigms

EasyChair Preprint 6587
13 pages • Date: September 13, 2021

Abstract

Many health experts have identified diabetes as one of the most widespread diseases. Large numbers of people in both underdeveloped and developed countries suffer from it. According to a survey by the World Health Organisation (WHO), almost 170 million people have been diagnosed with diabetes, and this number is predicted to double within the coming decade. Many metabolites, for example glucose, are considered a key cause of diabetes when present in large amounts. Health officials around the globe have stressed the need to detect and treat the disease at an early stage. Building on advances in technology and data mining techniques, this paper develops a classifier and compares different data mining techniques by their accuracy in detecting diabetes from different symptoms and features. The machine learning techniques were applied to the Diabetes data-set provided by the Biostatistics program at Vanderbilt. The best accuracy (93.95%) was observed with the Genetic Algorithm as the feature selection technique combined with Random Forest for classification. Thus, Random Forest along with a Genetic Algorithm can be used for efficient diagnosis and prediction of diabetes.

Keyphrases: ANOVA, Decision Tree, Diabetes Prediction, Genetic Algorithm, K-Nearest Neighbour, Logistic Regression, Naive Bayes, Random Forest, Support Vector Machines, algorithm based feature selection, classification algorithm, mutual information, stochastic gradient descent, vanderbilt dataset