
A3: Parallel KNN


Introduction

So far we’ve seen two machine learning protocols: nearest neighbor classification and decision trees. We saw that KNN classification tends to be accurate but slow, whereas the binary decision tree classifier was a little less accurate but much faster. In this assignment we will aim to mitigate the speed issues of KNN by parallelizing the code to run over multiple processes, which can be distributed over the cores in your processor.

In practice, this can lead to speedups of up to a factor of 2 (for an Intel Ice Lake 2-core CPU) or up to a factor of 64 (for an AMD Threadripper 64-core CPU), depending on the details of your processor. The reason for these speedups is that testing a KNN classifier is embarrassingly parallel, which allows us to perform the classification independently for each of the vectors in the test set. This makes ML tasks such as the KNN classification problem you considered in A1 a near-ideal problem to use to demonstrate your new-found ability to use system calls to take advantage of multi-processing within the operating system.

This assignment will build on work that you did in A1 and A2. You will use the same KNN algorithm as A1, but will use multiple processes to compute the classification of the test images in parallel.

There is one further way that we will build on the previous assignments: we will write a KNN classifier that allows us to dynamically change the distance measure used to define the “closeness” of two images through the use of function pointers.

Program Details

The program you will write takes several command line arguments. Some of these arguments have default values. See the starter code for exactly how to use these arguments:

– The value k of the kNN algorithm

– The number of children that the parent process will create to classify the images in the test set

– A string that specifies the distance metric that your code will use

– An option “-v” (verbose) to allow printing of extra debugging information. You can see how this option is used in the starter code, and can add additional print statements. Be careful to ensure that any additional debugging statements you add enhance the readability of your program.

– The name of the file containing the training set of images

– The name of the file containing the test set of images

The only output that the program will produce (when -v is not used) is the accuracy achieved by kNN when classifying the images in the test data set.


Distance Measures in K-Nearest Neighbor Classification

The standard distance measure used in nearest-neighbor classification is the Euclidean distance, which reads

d(x, y) = sqrt( (x[1] − y[1])^2 + (x[2] − y[2])^2 + … + (x[D] − y[D])^2 )

In the case of our images, the values x[i] and y[i] correspond to pixels from the data sets that either take the value 0 or 255. This distance measure is well motivated geometrically, since it gives the distance between two points in a (Euclidean) D-dimensional space. However, for our data sets, it’s unclear whether this distance measure truly reflects the notion of what it means for two images to be closely related. Conceptually, the most direct way to investigate this is to change the Euclidean distance to another distance measure that better reflects the geometry of the training data set.

You will be comparing the performance of KNN classifiers using the Euclidean distance and an alternative distance metric, known as the cosine distance.

The cosine distance is motivated using the standard expression for the dot-product between two vectors

x · y = ||x|| ||y|| cos(theta)

where the ||x|| and ||y|| terms refer to the lengths of the vectors (using Pythagoras). The cosine distance uses the above formula to observe that the angle theta can be used as a measure of the similarity between two vectors. Specifically, assume x, y are vectors of non-negative real numbers. Then if x = y, we find that theta is 0. In contrast, if x and y are orthogonal then theta = pi/2. This means that theta itself, which is a measure of the angle between two vectors, can be used as a measure of the degree of similarity between two vectors. Specifically,

d_cos(x, y) = (2/pi) * theta = (2/pi) * arccos( (x · y) / (||x|| ||y||) )

The factor of 2/pi is introduced just to make the cosine distance (in our case) bounded between 0 and 1, but doesn’t serve any deeper purpose for us.

Our aim is to use function pointers to write general code that allows you to swap out the distance function used without having to rewrite any of the code you wrote for the classifier. This is important in ML applications: when tuning classifiers like KNN it is common to experiment with different distance metrics (or feature maps), so engineering the code so that such changes can be tested rapidly and repeatably is essential for developing a good classifier for a dataset. That makes this refactoring exercise an excellent demonstration of the power of function pointers.

The Format of the Input Data

The two files that comprise the training and the test data sets are in the same binary format as in A2. Several datasets can be found on teach.cs in /u/csc209h/winter/pub/datasets/a2_datasets
Until you have some confidence in the correctness of your program, you should use smaller datasets for testing, especially if you are testing your work on teach.cs. This will avoid overloading the server, and you will be able to see results faster.

Analyzing the speedup from parallelization

An interesting parameter when running your program is the number of children the parent uses. In an ideal world, using two parallel processes instead of one sequential process will allow us to classify the same number of images in half the time. Or more generally, using n times the number of processes lets us finish classification n times faster. Run your program for n = 2, 3, 4, 5, … 10 child processes and report the time classification took for each value of n. The starter code includes functions for timing how long the classification of all test images took. Do you observe a linear speed-up, where k child processes finish the job k times faster? If not, why do you think that is the case? (You do not need to submit your observations.)
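One common shape for this kind of parallel split is sketched below, under loudly labeled assumptions: the function names (run_children, classify_range), the single shared pipe, and the "one contiguous slice per child" partition are all illustrative choices, not the starter code's design. Each child classifies its slice of the test set, writes its count of correct classifications into the pipe, and the parent sums the counts.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the real per-image KNN loop: here it just pretends every
   image in [start, end) was classified correctly. */
static int classify_range(int start, int end) {
    return end - start;
}

/* Fork num_procs children, give each a contiguous slice of the n_images
   test images, and return the total number classified correctly. */
int run_children(int n_images, int num_procs) {
    int fd[2];
    if (pipe(fd) == -1) { perror("pipe"); exit(1); }

    int per_child = n_images / num_procs;
    for (int i = 0; i < num_procs; i++) {
        int start = i * per_child;
        /* the last child absorbs any leftover images */
        int end = (i == num_procs - 1) ? n_images : start + per_child;
        pid_t pid = fork();
        if (pid == -1) { perror("fork"); exit(1); }
        if (pid == 0) {                     /* child */
            close(fd[0]);                   /* child only writes */
            int correct = classify_range(start, end);
            if (write(fd[1], &correct, sizeof(correct)) == -1) {
                perror("write"); exit(1);
            }
            close(fd[1]);
            exit(0);
        }
    }
    close(fd[1]);                           /* parent only reads */

    int total = 0, correct;
    while (read(fd[0], &correct, sizeof(correct)) > 0)
        total += correct;
    close(fd[0]);
    while (wait(NULL) > 0) ;                /* reap all children */
    return total;
}
```

Note that the parent must close its copy of the write end before the read loop; otherwise read() never returns 0, because a write end of the pipe is still open.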

Important Hints

Here are a few things to remember as you implement and test the functions:
1. When you look at the starter code you will see a few lines that look like the following. These lines are there to prevent compiler warnings about unused variables, and should be removed when you make use of the variables.

(void)K;
(void)dist_metric;
(void)num_procs;

2. Test thoroughly before submitting a final version. We are using automated testing tools, so your program must compile and run according to the specifications. In particular, your program must produce output in the specified format.

3. As always, build up functionality one small piece at a time.

4. Check for errors on every system or library call that is not a print. Send any potential error messages to stderr, use perror correctly, and exit when appropriate for the error.

5. Be very careful to initialize pointers to NULL and to make sure that you copy any strings that you use.

6. Close all pipes that you do not need to communicate between the parent process and its children.
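Hint 6 has a subtle consequence worth seeing concretely. The sketch below assumes a one-pipe-per-child design (an illustrative choice, not necessarily the starter code's): child i inherits the read ends of every pipe created before it was forked, and must close those as well as the unused ends of its own pipe, or readers may never see EOF. All names here are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define NUM_CHILDREN 4   /* illustrative */

/* Each child writes its index into its own pipe; the parent reads one
   value per pipe and returns the sum (0 + 1 + 2 + 3 = 6). */
int collect(void) {
    int fds[NUM_CHILDREN][2];
    for (int i = 0; i < NUM_CHILDREN; i++) {
        if (pipe(fds[i]) == -1) { perror("pipe"); exit(1); }
        pid_t pid = fork();
        if (pid == -1) { perror("fork"); exit(1); }
        if (pid == 0) {                     /* child i */
            close(fds[i][0]);               /* writes only on its own pipe */
            for (int j = 0; j < i; j++)
                close(fds[j][0]);           /* read ends inherited from
                                               earlier iterations */
            int val = i;
            if (write(fds[i][1], &val, sizeof(val)) == -1) {
                perror("write"); exit(1);
            }
            close(fds[i][1]);
            exit(0);
        }
        close(fds[i][1]);                   /* parent reads, never writes */
    }
    int sum = 0, val;
    for (int i = 0; i < NUM_CHILDREN; i++) {
        if (read(fds[i][0], &val, sizeof(val)) != sizeof(val)) {
            perror("read"); exit(1);
        }
        close(fds[i][0]);
        sum += val;
    }
    while (wait(NULL) > 0) ;                /* reap all children */
    return sum;
}
```

The parent closing each write end immediately after forking is what guarantees that its later read() calls return as soon as the corresponding child finishes.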

Submission and Marking

Your program must compile on teach.cs, so remember to test it there before submission. We will be using the same compile flags in the Makefile.
Your submission minimally needs to include 3 files: knn.c, classifier.c and Makefile. Do not commit any datasets or executable files.
Your program should not produce any error or warning messages when compiled. As with previous assignments, programs that do not compile will receive a 0. Programs that produce warning messages will be penalized.
Submit your work by committing and pushing the files to the MarkUs git repo.
