STAT 339: HOMEWORK 6 (BAYES NETS AND BAYESIAN CLASSIFICATION)

Instructions.

Create a directory called hw6 in your stat339 GitHub repo. Your main writeup should be called hw6.pdf.

You may also use any typesetting software to prepare your writeup, but the final document should be a PDF. LaTeX is highly encouraged.

I will access your work by cloning your repository; make sure that any file path information is written relative to your repo – don't use absolute paths on your machine, or the code won't run for me!


Last Revised: December 19, 2021.

Due via GitHub Friday 1/7/21.

1. Bayes Nets

0. Consider the Bayes net depicted in Fig. 1, which comes from the BRML book.

Each variable is binary.


Figure 1. Bayes Net for diagnosis of lung disease at a chest clinic

(a) Write down the factorization of the joint distribution that is implied by the graph.

(b) According to the model, can you predict whether someone has visited Asia based on whether or not they are a smoker? That is, are s and a independent?

(c) Does knowing that someone is a smoker help you predict whether they visited Asia if you also have a chest x-ray? That is, are s and a conditionally independent given x? Explain the intuition behind these two results.
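Fig. 1 itself is not reproduced here, so as a purely illustrative Python sketch, here is how (conditional) independence claims like these can be checked numerically on a small hand-built network. The three-variable structure and all CPT values below are invented for illustration; they are NOT the chest-clinic network from Fig. 1.

    # Numerically checking independence claims in a tiny hand-built Bayes net.
    import itertools

    p_a = {0: 0.99, 1: 0.01}                      # p(a): visited Asia?
    p_s = {0: 0.50, 1: 0.50}                      # p(s): smoker?
    p_x1 = {(0, 0): 0.05, (0, 1): 0.20,           # p(x = 1 | a, s): abnormal x-ray
            (1, 0): 0.30, (1, 1): 0.60}

    def joint(a, s, x):
        """p(a, s, x) under the factorization p(a) p(s) p(x | a, s)."""
        px = p_x1[(a, s)] if x == 1 else 1.0 - p_x1[(a, s)]
        return p_a[a] * p_s[s] * px

    # (b)-style check: is p(a, s) = p(a) p(s)?  (Sum x out of the joint.)
    for a, s in itertools.product((0, 1), repeat=2):
        p_as = sum(joint(a, s, x) for x in (0, 1))
        assert abs(p_as - p_a[a] * p_s[s]) < 1e-12   # holds: a and s independent

    # (c)-style check: does p(a | s, x = 1) change with s?  (Here it does:
    # conditioning on a common effect couples its causes.)
    def p_a1_given(s, x):
        den = sum(joint(a, s, x) for a in (0, 1))
        return joint(1, s, x) / den

    print(p_a1_given(s=0, x=1), p_a1_given(s=1, x=1))   # ~0.057 vs ~0.029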

2. Naive Bayes With Categorical Features

1. Spam Filtering (Adapted from BRML 10.5) This problem is about a hypothetical classifier to label emails as either “spam” or “not spam”. The questions do not involve actually implementing the classifier, just examining and reflecting on its mathematical/statistical properties.

Each email is represented by a vector of binary features:

xn = (xn1, …, xnD)

where each xnd ∈ {0, 1}. Each entry of the vector indicates whether a particular symbol or word (out of D symbols/words in the vocabulary) appears in the email. The symbols/words are things like

money, cash, !!!, viagra, . . . , etc.

so that, for example, xn2 = 1 if the word ‘cash’ appears in email n. (Note: this is a different way of representing the contents of a document than the Federalist Papers example from class, though the basic classification goal is essentially the same.)

The training dataset consists of a set of vectors along with the class label tn for each email, where tn = 1 indicates that email n is spam, and tn = 0 indicates that it is not spam. Therefore, the training set consists of a set of pairs {(xn, tn)}, n = 1, . . . , N.

The naive Bayes model for the joint probability of the category (tn) and contents (abstracted as xn) of email n is

p(xn, tn | π, θ) = p(tn | π) ∏_{d=1}^{D} p(xnd | tn, θ)

Explicitly, the parameters are (π, θ01, . . . , θ0D, θ11, . . . , θ1D), where

π := p(tn = 1 | π),      for all n

θ1d := p(xnd = 1 | tn = 1, θ)  for all n

θ0d := p(xnd = 1 | tn = 0, θ)   for all n

That is to say, each tn | π ∼ Bernoulli(π), and each xnd | tn = c, θ ∼ Bernoulli(θcd). The same parameters are assumed to apply for every email of the same type (spam or not spam), which is why n does not appear in their definitions.
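To make the generative story concrete, here is a minimal Python simulation sketch; D and all parameter values below are invented for illustration, not estimates of anything.

    # Sampling emails from the naive Bayes generative model described above.
    import numpy as np

    rng = np.random.default_rng(0)
    D = 4
    pi = 0.3                                        # pi = p(t_n = 1)
    theta = np.array([[0.05, 0.10, 0.02, 0.20],     # theta_0d = p(x_nd = 1 | t_n = 0)
                      [0.60, 0.40, 0.30, 0.25]])    # theta_1d = p(x_nd = 1 | t_n = 1)

    def sample_email():
        t = rng.binomial(1, pi)          # t_n | pi ~ Bernoulli(pi)
        x = rng.binomial(1, theta[t])    # x_nd | t_n = c ~ Bernoulli(theta_cd)
        return t, x

    for _ in range(3):
        print(sample_email())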

(a) Derive expressions for the maximum likelihood estimates of θ and π, in terms of the training data. Assume that the labels tn are conditionally independent given π, and that the xn are conditionally independent of each other given the tn and θ. That is, assume

p(t1, . . . , tN | π) = ∏_{n=1}^{N} p(tn | π)

p(x1, . . . , xN | t1, . . . , tN, θ) = ∏_{n=1}^{N} ∏_{d=1}^{D} p(xnd | tn, θ)
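As an aside, the estimates this derivation leads to are simple empirical frequencies. A minimal Python sketch of computing them (array shapes assumed: X is an N × D binary matrix, t a length-N binary label vector):

    # Sketch: counts-based MLEs for the model above; the closed forms
    # fall out of the derivation requested in part (a).
    import numpy as np

    def fit_mle(X, t):
        pi_hat = t.mean()                       # fraction of spam in training set
        theta_hat = np.vstack([
            X[t == 0].mean(axis=0),             # theta_hat_0d, d = 1..D
            X[t == 1].mean(axis=0),             # theta_hat_1d, d = 1..D
        ])
        return pi_hat, theta_hat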

(b) Given a trained model (i.e., given the MLEs π̂MLE and θ̂MLE for the parameters), give an expression for the posterior probability that a new email is spam, that is, for p(tnew = 1 | xnew, θ̂MLE, π̂MLE), where tnew and xnew are the category and feature vector, respectively, for a new email. The expression should be stated explicitly in terms of π̂MLE, the entries of θ̂MLE, and the binary entries of xnew only, such that if you had numbers for each of these things, you could plug them in to calculate a numerical value for the posterior probability.
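However you write the expression, it can be checked numerically. The following Python sketch evaluates the posterior via Bayes' rule; log space is used only for numerical stability, and pi_hat and theta_hat are assumed to be the MLEs from part (a) (the numbers below are invented).

    # Sketch: p(t_new = 1 | x_new, theta_hat, pi_hat) via Bayes' rule.
    import numpy as np

    def posterior_spam(x_new, pi_hat, theta_hat):
        # log prod_d theta_cd^x_d (1 - theta_cd)^(1 - x_d), one entry per class c
        log_lik = (x_new * np.log(theta_hat)
                   + (1 - x_new) * np.log(1 - theta_hat)).sum(axis=1)
        log_joint = log_lik + np.log(np.array([1 - pi_hat, pi_hat]))
        log_joint -= log_joint.max()          # guard against underflow
        joint = np.exp(log_joint)
        return joint[1] / joint.sum()         # normalize over the two classes

    theta_hat = np.array([[0.05, 0.10, 0.02, 0.20],
                          [0.60, 0.40, 0.30, 0.25]])
    print(posterior_spam(np.array([1, 0, 1, 0]), 0.3, theta_hat))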

(c) If the word “viagra” never appears in the spam training data, discuss what effect this will have on the classification of a new email that contains the word “viagra”, assuming we are using the MLE parameter estimates. Explain how you might counter this effect.

(d) What effect will misspelled words (such as “v1agra”) have on the spam filter? How could a spammer try to fool a naive Bayes spam filter if they know that the spam filter is a naive Bayes classifier?


2. Naive Bayes Classification as Regression

Show that, when using the naive Bayes classifier above, for fixed θ and π, the log odds that an email is spam, defined as

log [ p(tn = 1 | xn, θ, π) / p(tn = 0 | xn, θ, π) ],

can be written as

w0 + ∑_{d=1}^{D} wd xnd

for some suitably chosen weight functions wd, d = 0, . . . , D, of the parameters π and θ (which do not depend on the data, provided we have chosen values for π and θ). That is, the log odds that the email is spam is a linear function of the entries in xn. Find explicit expressions for these weight functions w0, w1, . . . , wD in terms of π and the entries in θ only.
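One standard route to the result, sketched here in LaTeX as a checkpoint (the final regrouping into w0 and the wd is left as the exercise):

    \log\frac{p(t_n = 1 \mid x_n, \theta, \pi)}{p(t_n = 0 \mid x_n, \theta, \pi)}
      = \log\frac{\pi}{1-\pi}
      + \sum_{d=1}^{D}\left[ x_{nd}\log\frac{\theta_{1d}}{\theta_{0d}}
      + (1 - x_{nd})\log\frac{1-\theta_{1d}}{1-\theta_{0d}} \right]

Each bracketed term is affine in xnd, so the constants collect into w0 and the coefficient of each xnd becomes wd.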

3. Naive Bayes for Cancer Screening

The data for this problem consists of several diagnostic variables measured on tumors from each of 699 breast cancer patients (modified from a dataset in the University of California Irvine Machine Learning Repository [1]).

  • The class variable, t, is binary: Is the tumor malignant?
  • The nine diagnostic variables (which make up the 699 × 9 feature matrix X) are measurements of things like mean cell size, variability of cell sizes, various shape measures, etc. Each diagnostic variable has been coded on an integer scale ranging from 1 to 10.

I have randomly divided the full dataset into training and test sets: cancer_train.csv and cancer_test.csv, containing 2/3 and 1/3 of the cases, respectively. In row n of the .csv file:

  • The first entry is an ID code (don’t use this for classification)
  • The second is the target, tn, the binary Malignant label (0 or 1)
  • The remaining columns are the diagnostic features, where each xnd has a value in the set {1, 2, . . . , 10}, for n = 1 to 699 and d = 1 to 9, with the exception of missing values (see below).

Some of the cases have missing values for one of the features, BareNuclei.

These missing values are denoted by -1 in the data. Be sure to handle these as missing, not as a data value. Note also that for several features, not all of the values 1-10 might appear in both tumor types, but they could in principle.
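A minimal Python loading sketch, assuming the file name uses an underscore (cancer_train.csv) and that the file has a one-line header (adjust skiprows if it does not):

    # Sketch: load the training data and mask -1 entries as missing (NaN).
    import numpy as np

    train = np.loadtxt("cancer_train.csv", delimiter=",", skiprows=1)
    t = train[:, 1].astype(int)      # column 2: the binary Malignant label
    X = train[:, 2:]                 # columns 3-11: the nine diagnostic features
    X[X == -1] = np.nan              # -1 codes a missing value, not a data value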

Your mission (should you choose to accept it) is to design a naive Bayes classifier that reports, for a novel case, a probability that it is malignant. In order to do this, you will need to make some subjective design decisions about how to represent the data-generating process.

You may choose to use the feature values as they are, or to bin them (since they consist of ordered values). If you choose to bin, you might select bins that have equal numbers of feature values, or bins that have approximately equal numbers of cases aggregated over classes, or use some other scheme; up to you.
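If you do bin, the following Python sketch shows one version of the quantile option mentioned above (the bin count is an arbitrary choice):

    # Sketch: quantile-based binning of one ordinal feature column, pooled
    # over classes. NaNs (missing values) pass through untouched.
    import numpy as np

    def quantile_bin(col, n_bins=5):
        inner = np.linspace(0, 1, n_bins + 1)[1:-1]      # interior quantile levels
        edges = np.nanquantile(col, inner)               # bin boundaries
        binned = np.digitize(col, edges).astype(float)   # bin index 0..n_bins-1
        binned[np.isnan(col)] = np.nan                   # keep missing as missing
        return binned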

(a) Implement a training function, train_naive_bayes(), that takes in a set of training data and returns a classifier function. The classifier function should take an N × D matrix Xnew as input and return an N × 2 array of probabilities, where the entry in row n and column c is the posterior probability that case n has class c.

(It is up to you if you want your function to take a .csv file consisting of training data directly, or preprocess the data first and pass in t and X as separate arguments — probably the latter will make your code more generalizable.)

Your training function should have two modes, which can be selected via an argument. In the first mode it should find the maximum likelihood estimates of the prevalence parameter π (πc := p(t = c | π)) and the class-conditional distribution parameters θ, and should return a classifier that uses these to classify the new data.

Note that because the features (columns of the input matrix) are different kinds of things, we probably want to have a separate probability vector θcd for each feature and each class. That is, we should let θcdk := p(xnd = k | tn = c, θcd), where k indexes the possible values of feature d (however you decided to bin them, if you did, or just 1 through 10 if you didn’t), for each c and d.

In the second mode, the user should be able to specify Dirichlet priors for π and for each θcd, and the resulting classifier should return the array of posterior predictive probabilities that each tumor in Xnew belongs to each category.

Recall that this is defined as

p(tnew = c | xnew, X, t) = ∫ p(tnew = c | xnew, π, θ) p(π, θ | X, t) dπ dθ,

where X and t are the training data, the first factor inside the integral is the posterior probability that the new tumor belongs to class c for specific parameter settings π and θ, and the second factor represents the posterior density of that combination of π and θ. However, since we are using a conjugate prior, the result of this integral has a very simple form, which we derived in class (and so you do not actually need to work with this integral!).
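For reference, the simple form in question is the standard Dirichlet-categorical posterior predictive, quoted here rather than re-derived (the counts N_{cdk} and N_{cd} are notation introduced just for this formula):

    p(x_{\text{new},d} = k \mid t_{\text{new}} = c, \text{data})
      = \frac{N_{cdk} + \alpha/K}{N_{cd} + \alpha}

where N_{cdk} counts training cases of class c whose feature d is observed equal to k, and N_{cd} = ∑_k N_{cdk} (this differs from the class size N_c only when feature d has missing values). The predictive for π is smoothed analogously using the γc.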

The specification of the parameters of the Dirichlet priors on π and each θcd could in principle involve separate parameter vectors for each, but to simplify things assume that all of the priors on the θcd are symmetric Dirichlet distributions, that is, that

θcd ∼ Dir(α/K, α/K, . . . , α/K)

governed by a single scalar parameter α.

However, the prior on π should be allowed to be an arbitrary Dir(γ1, . . . , γC) distribution over C probabilities (C = 2 for this data), since we likely do not expect malignant and nonmalignant tumors to be equally common.
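Putting the pieces together, here is a compact Python sketch of one possible implementation of the requested interface. The conventions used (np.nan marks missing values, feature codes run 1..K, labels run 0..C-1) are assumptions of this sketch, not requirements.

    # Sketch: train_naive_bayes with an MLE mode and a Bayesian (Dirichlet) mode.
    # X: N x D array of codes in {1, ..., K} with np.nan marking missing values;
    # t: length-N array of labels in {0, ..., C-1}.
    import numpy as np

    def train_naive_bayes(X, t, mode="bayes", alpha=1.0, gamma=(1.0, 1.0), K=10):
        C = len(gamma)
        N, D = X.shape
        N_c = np.array([(t == c).sum() for c in range(C)], dtype=float)

        # counts[c, d, k]: class-c training cases with feature d equal to k + 1
        counts = np.zeros((C, D, K))
        for c in range(C):
            Xc = X[t == c]
            for d in range(D):
                col = Xc[:, d]
                col = col[~np.isnan(col)]          # missing values simply drop out
                for k in range(K):
                    counts[c, d, k] = np.sum(col == k + 1)

        n_obs = counts.sum(axis=2, keepdims=True)  # observed cases per (c, d)
        if mode == "mle":
            prior = N_c / N                        # pi_hat_c
            theta = counts / n_obs                 # theta_hat_cdk (may contain 0s!)
        else:                                      # Dirichlet posterior predictive
            prior = (N_c + np.asarray(gamma)) / (N + np.sum(gamma))
            theta = (counts + alpha / K) / (n_obs + alpha)

        def classify(X_new):
            log_post = np.tile(np.log(prior), (X_new.shape[0], 1))  # M x C
            for n in range(X_new.shape[0]):
                for d in range(D):
                    v = X_new[n, d]
                    if np.isnan(v):                # naive Bayes skips missing features
                        continue
                    log_post[n] += np.log(theta[:, d, int(v) - 1])
                log_post[n] -= log_post[n].max()   # stabilize before exponentiating
            post = np.exp(log_post)
            return post / post.sum(axis=1, keepdims=True)

        return classify

With C = 2, classify returns the required N × 2 array. Note that in MLE mode a zero count produces log(0) = -inf, which is exactly the pathology part (b) below asks about.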

(b) Explain the shortcomings of maximum likelihood estimation when it comes to the possibility of seeing a feature take the value k in the test set that did not appear in any of the cases in the training set.

(c) Discuss why a naive Bayes classifier trivially handles missing features, whereas a KNN classifier would have problems.

(d) For the Bayesian method, use cross-validation to find the best choice of the prior parameter α for the Dirichlet priors on the θcd. Here, “best” is defined in terms of the mean cross-validation error. You do not need to worry about finding the best γ’s empirically — you can treat these as fixed.
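A Python sketch of that search, reusing the train_naive_bayes sketch above (the fold count, seed, and α grid are arbitrary choices):

    # Sketch: pick alpha by k-fold cross-validated misclassification rate.
    import numpy as np

    def cv_error(X, t, alpha, n_folds=5, gamma=(1.0, 1.0), seed=0):
        idx = np.random.default_rng(seed).permutation(len(t))
        errs = []
        for fold in np.array_split(idx, n_folds):
            train_idx = np.setdiff1d(idx, fold)
            clf = train_naive_bayes(X[train_idx], t[train_idx],
                                    mode="bayes", alpha=alpha, gamma=gamma)
            pred = clf(X[fold]).argmax(axis=1)
            errs.append(np.mean(pred != t[fold]))
        return float(np.mean(errs))

    # Usage (X_train, t_train as loaded earlier):
    # alphas = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]
    # best_alpha = min(alphas, key=lambda a: cv_error(X_train, t_train, a))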

 

[1] http://archive.ics.uci.edu/ml/