PSet 3
Note: Grading is based both on your graphs and on your verbal explanations. Follow all best practices as discussed in class, including choosing appropriate parameters for all graphs. Do not expect the assignment questions to spell out precisely how the graphs should be drawn. Sometimes guidance will be provided, but the absence of guidance does not mean that all choices are ok.
Read Graphical Data Analysis with R, Chapters 6 and 7.
1. Crime
[10 points]
Data source: https://data.ny.gov/Public-Safety/Index-Crimes-by-County-and-Agency-Beginning-1990/ca8h-8gjq
You do not need to submit the data with your assignment. You may either download it and read it from your local copy, or read it directly from the website with df <- read_csv("https://data.ny.gov/api/views/ca8h-8gjq/rows.csv") (read_csv() is from the readr package).
a) Create a parallel coordinates plot showing the number of crimes in each of the categories listed for 2020. Show actual counts; do not rescale. Your graph should have one line for each county in New York State. (Use GGally::ggparcoord().)
b) Now experiment with alpha blending, splines, and rescaling to create the clearest version of the plot that you can. What patterns do you observe? (Use GGally::ggparcoord().)
c) Create an interactive parallel coordinates plot of the same data, coloring by Region. Discuss outliers, clusters, and correlations in detail.
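A minimal sketch of the first plot, assuming the downloaded file has a County column, a Year column, and one column per crime category. The category names Murder, Robbery, and Burglary and all values below are placeholders on a made-up data frame, not the real file; check names(df) before adapting it. Because the file reports counts by agency, the sketch sums them to one row per county before plotting, so each county contributes exactly one line.

```r
# Toy stand-in for the NY index-crimes file: one row per reporting agency.
# Column names and values are made up for illustration only.
crimes <- data.frame(
  County   = c("Albany", "Albany", "Bronx", "Erie", "Erie"),
  Year     = c(2020, 2020, 2020, 2020, 2019),
  Murder   = c(3, 1, 10, 4, 6),
  Robbery  = c(20, 5, 300, 40, 45),
  Burglary = c(50, 12, 200, 90, 80)
)
categories <- c("Murder", "Robbery", "Burglary")

# Keep 2020 only, then sum the agency-level counts so that each county
# becomes one row (and therefore one line in the plot).
crimes_2020 <- subset(crimes, Year == 2020)
county_totals <- aggregate(crimes_2020[categories],
                           by = list(County = crimes_2020$County),
                           FUN = sum, na.rm = TRUE)

# Draw the plot only if GGally is installed. scale = "globalminmax" leaves
# the raw counts unscaled, so every axis shares the same count scale.
if (requireNamespace("GGally", quietly = TRUE)) {
  p <- GGally::ggparcoord(county_totals,
                          columns = match(categories, names(county_totals)),
                          scale = "globalminmax")
  print(p)
}
```

For the second version of the plot, ggparcoord() also accepts alphaLines (line transparency), splineFactor (spline smoothing), and other scale options such as "uniminmax" and "std".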
2. Sleep
[10 points]
Data: SleepStudy from the Lock5withR package
Draw the following graphs and answer the questions.
a) Is there an association between ClassYear and AnxietyStatus? Between ClassYear and NumEarlyClass? Justify your answers with mosaic plots.
b) Perform chi-square tests for associations between the pairs of variables graphed in part a). What are the results? Discuss them in relation to the mosaic plots.
c) How is the relationship between anxiety status and number of early classes affected by class year? Create a mosaic plot showing all three variables, treating anxiety status as the dependent variable. Discuss the results.
d) Use the base pairs() function to draw a mosaic pairs plot of all categorical (factor) variables in SleepStudy. (Note: the vcd package must be loaded for pairs() to find the correct method.) Name a pair of variables that appear to have a very strong association, and a pair of variables that appear not to be associated.
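A sketch of the base-R calls for the mosaic plots and chi-square tests described above, plus the vcd mosaic pairs plot. It runs on a made-up data frame that borrows SleepStudy's variable names; the values are random, so its test results say nothing about the real study. Treating ClassYear and NumEarlyClass as factors, and splitting the dependent variable last and horizontally, are assumptions of this sketch (one common mosaic-plot convention), not requirements stated in the assignment.

```r
# Made-up stand-in for Lock5withR's SleepStudy, using the same variable names.
# With the real data: data(SleepStudy, package = "Lock5withR")
set.seed(1)
n <- 200
sleep <- data.frame(
  ClassYear     = factor(sample(1:4, n, replace = TRUE)),
  NumEarlyClass = factor(sample(0:3, n, replace = TRUE)),
  AnxietyStatus = factor(sample(c("normal", "moderate", "severe"), n, replace = TRUE),
                         levels = c("normal", "moderate", "severe"))
)

# Two-way tables; xtabs() keeps the variable names as dimension names.
tab_anx   <- xtabs(~ ClassYear + AnxietyStatus, data = sleep)
tab_early <- xtabs(~ ClassYear + NumEarlyClass, data = sleep)

# Mosaic plots with the dependent variable split last (horizontally).
mosaicplot(tab_anx, dir = c("v", "h"), main = "AnxietyStatus by ClassYear")
mosaicplot(tab_early, dir = c("v", "h"), main = "NumEarlyClass by ClassYear")

# Pearson chi-square tests of independence on the same tables.
test_anx   <- chisq.test(tab_anx)
test_early <- chisq.test(tab_early)

# Three-way mosaic: ClassYear and NumEarlyClass split vertically first,
# AnxietyStatus (the dependent variable) split last, horizontally.
tab3 <- xtabs(~ ClassYear + NumEarlyClass + AnxietyStatus, data = sleep)
mosaicplot(tab3, dir = c("v", "v", "h"),
           main = "AnxietyStatus by ClassYear and NumEarlyClass")

# Mosaic pairs plot of all factor columns; the table method for pairs()
# comes from vcd, so vcd is loaded first.
if (requireNamespace("vcd", quietly = TRUE)) {
  library(vcd)
  pairs(table(Filter(is.factor, sleep)))
}
```

If a table has cells with small expected counts, chisq.test() warns that the approximation may be incorrect; chisq.test(tab, simulate.p.value = TRUE) is one alternative.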
3. Wait List
[10 points]
The file stats_wl.csv contains information about waitlist movement for a Fall 2021 Columbia University undergraduate statistics class.
There are 640 rows and 4 variables:
- Name: name of student (actual names were replaced with names generated by the randomNames package)
- Date: since SSOL updates overnight, waitlist positions were collected each morning during the change of program period
- Priority: position on the waitlist; for example, 1 = top position on the list
- Status: final outcome. Registered = received a place in the class and remained; Dropped Class = received a place in the class and left; Left List = left the waiting list; Joined = remained on the waiting list at the end of the change of program period. (Note that the status reflects what ultimately happened, not what the status was on a particular date.)
Create an alluvial diagram that shows waitlist movement during the change of program period. It is not necessary to include the Name column in the diagram, but it should be possible to observe the movement of individual students: for example, that the student who was 22nd on the waitlist on Sept 9th moved up to 15th place on Sept 16th and then left the list.
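A minimal sketch using the ggplot2 and ggalluvial packages (an assumption; the assignment does not name a package), on a made-up three-student version of the waitlist data. It assumes the data stay in long form, one row per student per date, with Name used only as the alluvium identifier so that each student's path can be followed without displaying names. The toy data have every student on every date; students missing from some dates in the real file may need extra handling.

```r
# Made-up waitlist in long ("lodes") form: one row per student per date.
# Names, dates, and positions are placeholders, not values from stats_wl.csv.
wl <- data.frame(
  Name     = rep(c("Student A", "Student B", "Student C"), times = 2),
  Date     = rep(as.Date(c("2021-09-09", "2021-09-16")), each = 3),
  Priority = c(1, 2, 3, 1, 3, 2),
  Status   = rep(c("Registered", "Left List", "Joined"), times = 2)
)

# Dates become the axes and waitlist positions the strata; both are factors
# so their levels sort chronologically and numerically.
wl$Date     <- factor(wl$Date)
wl$Priority <- factor(wl$Priority, levels = sort(unique(wl$Priority)))
wl$n        <- 1  # each row represents one student on one date

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggalluvial", quietly = TRUE)) {
  library(ggplot2)
  library(ggalluvial)
  # alluvium = Name ties each student's positions together across dates,
  # so individual movements stay traceable without showing the names.
  p <- ggplot(wl, aes(x = Date, y = n, stratum = Priority, alluvium = Name)) +
    geom_flow(aes(fill = Status)) +
    geom_stratum() +
    geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
    labs(x = "Date", y = "Number of students")
  print(p)
}
```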