Assignment 6
Use the Diabetes Readmissions data (on Canvas)
KNN, Support Vectors, Naïve Bayes, and Ensemble Models
1. Use the training and test samples for the Diabetes data set that you used for the Logistic Regression analysis.
2. Use the Train data set to build K-Nearest Neighbors, SVM, and Naïve Bayes models to predict the “Readmitted” variable.
3. Summarize the performance of these 3 separate models.
4. Build another model, called the “Ensemble” model, across all 7 models that you have built thus far (Logistic Regression, Classification Trees, LDA, QDA, KNN, SVM, and Naïve Bayes) for predicting the “Readmitted” variable. [To save time and effort, you can use results from the earlier assignments for this purpose.] The ensemble model simply takes the predictions of “Readmitted” for each observation from the 7 models that you already built and applies the “Majority rule”. To illustrate, if the predictions from Logistic Regression, Classification Trees, LDA, QDA, KNN, SVM, and NB are Yes, Yes, Yes, No, Yes, No, No respectively for an observation, resulting in 4 predictions of Yes and 3 predictions of No, then the Ensemble model would yield a prediction of Yes, since the majority of the models predict “Yes” (4 > 3). Predict all observations in Training and Test using the Ensemble model.
5. Summarize your results for both Training and Test. Which of the models yielded the best predictions overall, and which models yielded the best predictions of “Yes”? Do you like the Ensemble model? Did it predict as well as you expected it to predict?
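The majority rule in step 4 and the summary in step 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the assignment does not prescribe a language, the names `MODELS`, `majority_vote`, `ensemble_predict`, and `summarize` are hypothetical, and each model's predictions are assumed to be available as a list of "Yes"/"No" labels in the same observation order.

```python
from collections import Counter

# Hypothetical short names for the 7 models listed in step 4.
MODELS = ["Logistic", "Tree", "LDA", "QDA", "KNN", "SVM", "NB"]


def majority_vote(votes):
    """Return the label that most models predicted for one observation.

    With 7 models and two labels ("Yes"/"No") a tie cannot occur.
    """
    label, _ = Counter(votes).most_common(1)[0]
    return label


def ensemble_predict(predictions):
    """Apply the majority rule to every observation.

    predictions: dict mapping each name in MODELS to a list of "Yes"/"No"
    labels, one per observation, all in the same order (an assumption).
    """
    columns = [predictions[name] for name in MODELS]
    return [majority_vote(row) for row in zip(*columns)]


def summarize(actual, predicted, positive="Yes"):
    """Overall accuracy, plus the share of actual "Yes" cases predicted "Yes"."""
    pairs = list(zip(actual, predicted))
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    positives = [(a, p) for a, p in pairs if a == positive]
    yes_rate = (
        sum(a == p for a, p in positives) / len(positives)
        if positives
        else float("nan")
    )
    return {"accuracy": accuracy, "yes_rate": yes_rate}


# The worked example from step 4: 4 votes for Yes, 3 for No.
example = ["Yes", "Yes", "Yes", "No", "Yes", "No", "No"]
print(majority_vote(example))  # Yes
```

Running `summarize` once on the Training predictions and once on the Test predictions, for each of the 7 models and for the ensemble, gives the numbers needed to compare overall accuracy against how well each model predicts “Yes”.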