Statistical Learning Assignment – Semester 1, 2021

 

INSTRUCTIONS:

1. The assignment must be typed (not handwritten). You may use either Microsoft Word (or similar) or R markdown in RStudio for the assignment. Note that the final project will require the use of R markdown. Your answer should be no longer than 10 A4 pages [single sided] with a font size no smaller than 11 point.

2. The assignment due date is listed on the Wattle (Turn-it-in) site. Upload the assignment through Wattle using Turn-it-in. You should submit your assignment in two parts. If you are using R markdown:

a. A PDF file [or HTML file] of your assignment (this should include important R code to highlight what you have done).

b. A ‘.Rmd’ file [an R markdown file].

If you are using Microsoft Word (or similar):

a. A Word file of your assignment (this should include important R code to highlight what you have done).

b. A ‘.R’ file of your R code.

3. In answering the questions, write your answers clearly and concisely. Use appropriate graphs and tables when you think they help to describe your point or thinking process. Do not just “print” a set of results. Every result should be discussed and have a reason for being presented. No points will be awarded unless you clearly discuss what you are doing.

4. No late assignments will be accepted.

5. You should not discuss the assignment (questions, solutions, code, etc.) with your classmates or other individuals. You can discuss these with me or your tutor (Dr. Ha Nguyen) during our consultation times. You must independently write your own solutions. This includes all computer code, English, and mathematics. University policies on academic integrity will be strictly enforced. See http://www.anu.edu.au/students/program-administration/assessments-exams/academic-honesty-plagiarism for more details.

6. Have fun with the exploration!

1. (100 points) We will explore some of the techniques we are considering by examining data on housing prices. We will use the data from the prediction competition available on Kaggle: https://www.kaggle.com/c/house-prices-advanced-regression-techniques. For this question you will need to create an account on Kaggle. Please let me know if you don’t want to use Kaggle based on privacy concerns.

a. Create an account on Kaggle. What is your Kaggle username? Download the training and test data.

b. Consider a multiple regression model to examine the relationship between housing sale prices (Y) in Ames, Iowa, USA from 2006 to 2010 and their covariate information (x). While 79 covariates are available, for this assignment we will only use the following: LotArea, OverallCond, GrLivArea, FullBath, TotRmsAbvGrd, PoolArea. As the real test data does not contain the response Y (SalePrice), split the training data in half. The first half will be the new training data and the second half will be your personal test data. For this assignment set α = 0.
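As a rough illustration of the split described above, the following R sketch keeps only the six covariates plus SalePrice and divides the rows in file order. The helper name `split_in_half` and the synthetic `toy` data frame are assumptions made for this example; with the real download, the data frame would come from `read.csv("train.csv")` instead.

```r
# Covariates named in the assignment; SalePrice is the response.
covariates <- c("LotArea", "OverallCond", "GrLivArea",
                "FullBath", "TotRmsAbvGrd", "PoolArea")

# Split by row order: first half -> personal training, second half -> personal test.
# split_in_half() is a hypothetical helper, not part of any package.
split_in_half <- function(df, keep = c(covariates, "SalePrice")) {
  df <- df[, keep, drop = FALSE]
  n_train <- floor(nrow(df) / 2)
  list(train = head(df, n_train),
       test  = tail(df, nrow(df) - n_train))
}

# Synthetic stand-in so the sketch runs without the Kaggle download.
set.seed(1)
toy <- data.frame(
  Id           = 1:10,
  LotArea      = round(runif(10, 5000, 15000)),
  OverallCond  = sample(1:9, 10, replace = TRUE),
  GrLivArea    = round(runif(10, 800, 3000)),
  FullBath     = sample(1:3, 10, replace = TRUE),
  TotRmsAbvGrd = sample(4:10, 10, replace = TRUE),
  PoolArea     = 0,
  SalePrice    = round(runif(10, 1e5, 4e5))
)

parts <- split_in_half(toy)
str(parts$train)

# With the real file this would be, e.g.:
# parts <- split_in_half(read.csv("train.csv"))
```

If the number of rows is odd, this sketch puts the extra row in the test half; that choice is arbitrary.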

i. (20 points) Using all of the training data together (personal training and test data), conduct an exploratory data analysis. In doing your analysis, make sure to identify any unusual points and discuss why they are unusual. For this assignment do not remove any unusual points, only comment on them (if they exist). In addition to visualisations of the raw data, consider the natural log transformation of the response. You may also consider any transformations of the covariates. For the rest of the assignment, if you believe the transformations are appropriate (provide justification – this can simply be a discussion), use those transformations.

ii. (6 points) Using just your personal training data and the covariate GrLivArea, based on traditional regression approaches (possibly: t-tests, F-tests, etc.), determine if there exists a non-linear (quadratic, cubic, etc.) relationship between the covariate and the response. How flexible should the model be? Make sure to fully outline any tests and conclusions.
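One traditional way to compare polynomial degrees is a sequence of F-tests on nested models. The sketch below is only an illustration on simulated data: the data frame `train`, the column `logPrice`, and the coefficients used to simulate it are all assumptions, not values from the Kaggle data.

```r
set.seed(42)
n <- 200
train <- data.frame(GrLivArea = runif(n, 500, 3000))
# Simulated log sale price with a mild quadratic term (an assumption for illustration).
train$logPrice <- 11 + 6e-4 * train$GrLivArea - 1e-7 * train$GrLivArea^2 +
  rnorm(n, sd = 0.2)

# Nested polynomial fits of increasing degree.
fit1 <- lm(logPrice ~ poly(GrLivArea, 1), data = train)
fit2 <- lm(logPrice ~ poly(GrLivArea, 2), data = train)
fit3 <- lm(logPrice ~ poly(GrLivArea, 3), data = train)

# Sequential F-tests: each row compares a model with the one listed above it.
ftab <- anova(fit1, fit2, fit3)
print(ftab)
```

The hypotheses, test statistics, and conclusions still need to be written out in words; the table alone does not answer the question.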

iii. (6 points) Using your personal training and personal testing data, along with the notion of squared error loss, determine if there exists a non-linear (quadratic, cubic, etc.) relationship between the covariate and the response. How flexible should the model be?
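Under squared error loss, flexibility can be judged by fitting each candidate degree on the training half and computing the mean squared error on the testing half. The following sketch uses simulated data; the names `sim`, `logPrice`, and `test_mse`, and the range of degrees 1 to 5, are assumptions chosen for the example.

```r
set.seed(7)
n <- 400
sim <- data.frame(GrLivArea = runif(n, 500, 3000))
# Simulated response (an assumption; replace with the real data).
sim$logPrice <- 11 + 6e-4 * sim$GrLivArea - 1e-7 * sim$GrLivArea^2 +
  rnorm(n, sd = 0.2)

# First half for fitting, second half for evaluation.
train <- sim[1:200, ]
test  <- sim[201:400, ]

# Test-set mean squared error for polynomial degrees 1 to 5.
test_mse <- sapply(1:5, function(d) {
  fit <- lm(logPrice ~ poly(GrLivArea, d), data = train)
  mean((test$logPrice - predict(fit, newdata = test))^2)
})
names(test_mse) <- paste0("degree_", 1:5)

print(test_mse)
which.min(test_mse)  # degree with the smallest test MSE
```

Small differences in test MSE between degrees may not be meaningful, so the smallest value is a starting point for discussion rather than an automatic answer.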

iv. (6 points) Consider all the covariates which we are using in this assignment: LotArea, OverallCond, GrLivArea, FullBath, TotRmsAbvGrd, PoolArea. Using just your personal training data and traditional regression approaches, determine if any of the variables are statistically significant. Are you able to reduce the model (i.e. not use all the covariates)? Here you do not need to consider any non-linearities or interactions. Make sure to fully outline any tests and conclusions.

v. (6 points) Based on the ordering of the covariates in your final model in the previous question, using your personal training and personal testing data, along with the notion of squared error loss, determine which covariates should be included in the model.

vi. (6 points) Consider all the covariates which we are using in this assignment: LotArea, OverallCond, GrLivArea, FullBath, TotRmsAbvGrd, PoolArea. Using just your personal training data and traditional regression approaches, determine if PoolArea has a statistically significant interaction with any of the other covariates. You may have up to five interactions in your model. Make sure to fully outline any tests and conclusions.

vii. (6 points) Based on the ordering of the covariates in your final model in the previous question, using your personal training and personal testing data, along with the notion of squared error loss, determine which interactions should be included in the model.

viii. (6 points) Consider all the covariates which we are using in this assignment: LotArea, OverallCond, GrLivArea, FullBath, TotRmsAbvGrd, PoolArea. You may now consider any modelling that you wish using your personal training data. You may also consider any type of model selection approach (i.e. traditional or based on squared-error loss for the testing data). Make sure to fully outline any tests and conclusions. Calculate the mean-squared error on your personal testing data.

ix. (6 points) Using your final model from Question 1(b)viii and the Kaggle test data, submit a prediction file to Kaggle. See Kaggle for details on what the file should look like. What was your score and rank?
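A minimal sketch of writing a prediction file follows, assuming the competition expects a CSV with the columns `Id` and `SalePrice` (check the sample submission on Kaggle to confirm). The IDs and log-scale predictions below are made-up placeholders, and the `exp()` step applies only if the model was fitted to the natural log of SalePrice.

```r
# Hypothetical IDs and log-scale predictions for three test-set houses.
test_ids <- c(1461, 1462, 1463)
log_pred <- c(11.8, 12.0, 12.1)

# Back-transform to the original price scale before submitting.
submission <- data.frame(Id = test_ids, SalePrice = exp(log_pred))

# Written to a temporary directory here; use any path you like.
out_file <- file.path(tempdir(), "submission.csv")
write.csv(submission, out_file, row.names = FALSE, quote = FALSE)

readLines(out_file, n = 1)  # header line of the file
```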

Note: as discussed on the site (https://www.kaggle.com/c/titanic/details/evaluation), “[t]he Kaggle leader-board has a public and private component. 50% of your predictions for the test set have been randomly assigned to the public leader-board (the same 50% for all users). Your score on this public portion is what will appear on the leader board. At the end of the contest, we will reveal your score on the private 50% of the data, which will determine the final winner. This method prevents users from ‘over-fitting’ to the leader-board.”

x. (6 points) Examining the leader board you can see that one individual has a perfect score (when I last looked). Is this surprising? What explanation might there be for this?

xi. (6 points) This Kaggle competition is using Root Mean Squared Logarithmic Error instead of Mean Squared Error. Provide a discussion about the difference between the two.

xii. (20 points) Provide a full discussion of your final model from Question 1(b)viii. This may include, but is not limited to, discussions of the coefficients, visualisations of the fitted model, and model .
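For reference, the two criteria can be written as below for observed prices $y_i$ and predictions $\hat{y}_i$, $i = 1, \dots, n$. This assumes RMSLE is taken as the root mean squared error between the natural logs of the predicted and observed prices; some definitions use $\log(1 + y)$ in place of $\log y$, so the competition's evaluation page should be checked.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
\qquad
\mathrm{RMSLE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( \log \hat{y}_i - \log y_i \right)^2 }
```

Since $\log \hat{y}_i - \log y_i = \log(\hat{y}_i / y_i)$, each term in the RMSLE depends on the ratio of predicted to observed price, whereas each term in the MSE depends on their difference.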

 
