算法代写 – p(w|C) 建模 – Algorithm代写 – COMP6714代写
算法代写

算法代写 – p(w|C) 建模 – Algorithm代写 – COMP6714代写

COMP6714 ASSIGNMENT 1

 

算法代写  Consider the following pseudo code which performs list intersection based on the divide – and-conquer paradigm. Note that the input lists are ···

 

Q1. (25 marks) 算法代写

Consider the following pseudo code which performs list intersection based on the divide – and-conquer paradigm. Note that the input lists are not necessarilysorted.

(1)Complete the above pseudo code. You can assume that you can invoke the following member methods on a List object L:

  • L.len returns the length of the list L. You can also use the usually indexing and slicing operation on the list (as in python).

(2) Think of a method to divide each input list into k sub-lists (k 2) without changing the main logic of the algorithm you implemented in the fifirst part. You should be able to describe the only change succinctly

Q2. (25 marks)

Consider the scenario of dynamic inverted index construction. Assume that t sub-indexes (each of M pages) will be created if one chooses the no-merge strategy.

(1)Show that if the logarithmic merge strategy is used, it will result in at most dlog2 te sub-indexes.

(2)Prove that the total I/O cost of the logarithmic merge is O(t · M · log2 t)

Q3. (25 marks) 算法代写

The following list of Rs and Ns represents relevant (R) and nonrelevant (N) returned documents in a ranked list of 20 documents retrieved in response to a query from a collection of 10, 000 documents. The top of the ranked list is on the left of the list. This list shows 6 relevant documents. Assume that there are 8 relevant documents in total in the collection.

R R N N N                 N N N R N                  R N N N R               N N N N R

(Note that spaces above are just added to make the list easier to read)

(1)What is the precision of the system on the top-20?

(2)What is the F1 on the top-20?

(3)What is/are the uninterpolated precision(s) of the system at 25% recall?

(4)What is the interpolated precision at 33% recall?

(5)Assume that these 20 documents are the complete result set of the system. What is the MAP for the query? Assume, now, instead, that the system returned the entire 10, 000 documents in a ranked list, and these are the fifirst 20 results returned.

(6)What is the largest possible MAP that this system could have?

(7)What is the smallest possible MAP that this system could have?

(8)In a set of experiments, only the top-20 results are evaluated by hand. The result in (5) is used to approximate the range (6) to (7). For this example, how large (in absolute terms) can the error for the MAP be by calculating (5) instead of (6) and (7) for this query?

COMP6714
COMP6714

Q4. (25 marks) 算法代写

Suppose we have a document collection with an extremely small vocabulary with only 6 words w1, w2, . . . , w6. The following table shows the estimated background language model p(w|C) using the whole collection of documents (2nd column) and the word counts for document d1 (3rd column) and d2 (4th column), where c(w, di) is the count of word w in document di . Let Q = {w1, w2, w3, w4, w5, w6} be a query.

(1) Suppose we do not smooth the language model for d1 and d2. Compute the likeli hood of the query for both d1 and d2, i.e., p(Q|d1) and p(Q|d2) (Do not compute the log-likelihood. You should use the scientifific notation (e.g., 0.0061 should be 6.1 × 103 ) Which document would be ranked higher?

2) Suppose we now smooth the language model for d1 and d2 using the Jelinek-Mercer smoothing method with λ = 0.8 (i.e., p(w|d) = λ·pmle(w|Md)+(1λ)·pmle(w|Mc)). Recompute the likelihood of the query for both d1 and d2, i.e., p(Q|d1) and p(Q|d2) (Do not compute the log-likelihood. You should use the scientifific notation) Which document would be ranked higher?

Submission Instructions 算法代写

You need to write your solutions to the questions in a pdf fifile named ass1.pdf. You must include your name and student ID in the fifile, and the fifile can be opened correctly on CSE machines. You need to show the key steps to get the full mark.

Note: Collaboration is allowed. However, each person must independently write up his/her own solution.

You can then submit the fifile by give cs6714 ass1 ass1.pdf. The fifile size is limited to 5MB.

Late Penalty: -10% per day for the fifirst two days, and -20% per day for the following days

 

更多其他:代写作业 数学代写 物理代写 生物学代写 程序编程代写  墨尔本essay代写

合作平台:天才代写 幽灵代  写手招聘  paper代写

发表回复