 Artificial Intelligence代写

## Final Review Exercise Sheet

Artificial Intelligence代写 If we know that A and B are conditionally independent given C, which of the following statements are guaranteed to be true?

### 1 Rules of Probability

1.1 Marginalization

Which of the following statements are guaranteed to be true?

(a)∑ a P(A = a) = 1

(b) ∑ a P(A = a|B = b) = P(B = b)

(c) ∑ a P(A = a, B = b) = P(B = b)

(d) ∑ b P(A = a|B = b) = P(A = a)

(e) ∑ b P(A = a|B = b) = 1

(f) ∑ a P(A = a|B = b) = 1

(g) ∑ a P(A = a, B = b|C = c) = P(B = b|C = c)

#### 1.2 Conditional Independence

If we know that A and B are conditionally independent given C, which of the following statements are guaranteed to be true?  Artificial Intelligence代写

(a) P(A|C) = P(B|C)

(b) P(A, B|C) = P(A, B)

(c) P(A, B|C) = P(A|C)P(B|C)

(d) P(A, B, C) = P(C)P(A|C)P(B|C)

(e) P(A|B, C) = P(A|B)

(f) P(B|A, C) = P(B|C)

#### 1.3 Product Rule

Use the product rule to rewrite P(A, B, C) in 12 distinct ways. (For this problem, changing the order of

variables and rearranging products do not count as “distinct” answers.)

#### 1.4 Bayes’ Rule

Suppose that on any given day there is a 40% chance that I will be in a bad mood if I read the news, and a 10% chance that I will be in a bad mood if I do not read the news. Suppose also that each day there is a 70% chance that I read the news. If I am in a bad mood, what is the probability that I have read the news that day?

### 2 Belief Networks and d-Separation  Artificial Intelligence代写

2.1 Specifying a Belief Network

For this problem, we are interested in three medical symptoms: sneezing, fever, and itchy eyes; in addition,we are interested in two potential causes: allergies and the flflu. Suppose we know that a fever can be caused only by the flflu, itchy eyes can be caused only by allergies, and sneezing can be caused by either the flflu or allergies. We want to create a belief network with 5 random variables: Flu, Allergies, Sneezing, Fever, and ItchyEyes.

(a) Draw a DAG that represents the causal relationships between the 5 random variables mentioned above.

(b) List all of the probabilities that belong to the CPTs for this belief network.

(c) List all the pairs of variables that are marginally independent in this belief network.

(d) List all the pairs of variables that are conditionally independent given the evidence set E = {Sneezing}.

(e) List all the pairs of variables that are conditionally independent given the evidence set E = {Flu}.

#### 2.2 d-Separation Exercise

(a) Does D d-separate C and F?

(b) Do D and E d-separate C and F?

(c) Write down all pairs of nodes which are inde

pendent of each other.

(d) Which pairs of nodes are independent of each

other given B?

(e) P(A, F|E) = P(A|E)P(F|E)?

### 3 Polytrees

3.1 Identifying Polytrees

Which of the following DAGs are polytrees? (Here, each DAG is represented as a list of edges.)

(a) A B, B C

(b) A B, C B

(c) A B, C B, C D

(d) A C, A D, B C, B D

### 4 Variable Elimination

4.1 Variable Elimination for the Alarm Network

Consider the belief network with 5 variables: Earthquake, Burglary, Alarm, JamalCalls, MayaCalls. Suppose we want to compute the probabilities P(B = 1|J = 1, M = 0) and P(B = 0|J = 1, M = 0).

(a) Express the conditional probabilities P(B = 1|J = 1, M = 0) and P(B = 0|J = 1, M = 0) in terms of the probabilities P(B = 1, J = 1, M = 0) and P(B = 0, J = 1, M = 0).

(b) Letting b represent either 0 or 1, express P(B = b, J = 1, M = 0) in terms of the CPTs of the belief network. (Do not get rid of redundant operations. This corresponds to the brute-force enumeration method.)    Artificial Intelligence代写

(c) Count the number of additions and multiplications needed to compute the answer to part (b). (Include operations for both b = 0 and b = 1.)

(d) Rewrite the answer in (b) by pulling as many constant factors as possible out of sums. (Hint: to get the most effiffifficient answer, you should have a nested sum: there will be a sum over values of E inside a sum over values of A.)

(e) Count the number of additions and multiplications needed to compute the answer to part (d). (Include operations for both b = 0 and b = 1.)

(f) Using the elimination ordering J, M, E, A, write out the sequence of factors that would be introduced when applying the variable elimination algorithm to this computation. This answer should correspond closely to your answer in part (d).

### 5 Maximum Likelihood Estimation  Artificial Intelligence代写

5.1 Working with Complete Data

Let the graph be as follows:

Nodes: A, B, C, D, E, F

Edges: A B, A C, B D, B E, E C, C F, D F

Consider a complete data set of T i.i.d. examples: {(at, bt, ct, dt, et, ft)}t T =1. Compute the maximum likeli-hood estimates of the conditional probability tables (CPTs) shown below for this data set. Complete these CPT estimates in terms of the indication function denoted by I(a, at) = 1 if a = at and 0 otherwise.

(a) P(A = a)

(b) P(B = b|A = a)

(c) P(C = c|A = a, E = eArtificial Intelligence代写

(d) P(D = d|B = b)

(e) P(E = e|B = b)

(f) P(F = f|C = c, D = d)

#### 5.2 Log-Likelihood for Belief Nets

Suppose we have a belief network with nodes X1, . . . , Xn, and let Pa(Xi) denote the parents of node Xi .

A fully-observed dataset for this model can be written as The log-likelihood of this model is L = log P(data). Show that the log-likelihood can be written as count(Xi = x,Pa(Xi) = π) log P(Xi = x|Pa(Xi) = π)

### 6 Expectation Maximization

6.1 EM for Naive Bayes Model

Consider a model with the following structure:

A B C

Suppose all of the variables are binary, and suppose only A and C are observable in the dataset. If we have T observations, then the dataset is of the form {(at, ct)}T t=1.

(a) If we want to apply EM to this dataset, list all of the CPT entries that would need to be estimated.

(Hint: there are 5 CPT entries in this model.)  Artificial Intelligence代写

(b) Find formulas for the EM update rules for the CPT entries listed in part (a). (Hint: to simplify the formulas, you can fifirst express the EM updates in terms of equality-testing functions and the probability P(B|A, C). Then, you can fifind a formula for P(B|A, C) in terms of the CPT entries.)

(c) Write pseudocode to compute one iteration of EM updates for this model. You should express this as a function with the following signature:

em update(data, cpt estimates)

Here, data is a 2D array (with T rows and 2 columns) where data[t] represents the observed value at and data[t] represents the observed value ct. Furthermore, cpt estimates is a map (or dictionary) where the keys are 5 strings that represent each of the CPT entries of the model. You may choose an arbitrary but consistent encoding for the map keys (for example: you might encode the map key corresponding to P(A = 1|B = 1) as the string “A1|B1”). This function should return a map of updated CPT estimates. You may defifine additional helper functions if needed.

### 7 Hidden Markov Models  Artificial Intelligence代写

7.1 DJ HMM

Assume there’s a DJ playing songs at a party, and the songs she plays can be grouped into two types: slow songs and fast songs. The probability that the fifirst song she plays is slow is (so the probability that the fifirst song is fast is .

If she plays a fast song, the probability that she plays a slow song immediately afterwards is . If she plays a slow song, the probability that she follows it with a slow song is . We will assume that the selection of every song is conditionally independent of every single song before it except the one song immediately before it.

4For each song the DJ plays, the audience can have one of two reactions to it: either to dance or not to dance. If the DJ plays a fast song, the probability that the audience will dance to it is .If she plays a slow song, the probability that the audience will dance to it is . We will assume that whether the audience dances or not depends only on whether the current song is fast or slow.  Artificial Intelligence代写

#### 7.1.1 Get This Party Started

Use the f orward algorithm to fifind the probability that, i f the DJ plays only two s ongs, the audience will dance to both of them.

#### 7.1.2 Panic! At the Disco

Use the forward algorithm and the probabilities you computed for the previous problem to fifind the probability that the audience danced during the fifirst two songs, did not dance during the third song, and the third song was fast.

#### 7.1.3 I Could Have Danced All Night

Assume that the DJ plays 10 s ongs i n total, and the 8th s ong i s slow. Use the backward algorithm to fifind the probability that the audience won’t dance during the 9th and 10th s ongs.