ECS709P Final


Question 1

(a)

Since the vacuum cleaner is equipped with only one camera, the robot must be able to perform the following tasks to produce a map of the room:

  • Depth estimation from the acquired images, either from a single static image or from optical flow while the robot moves around the room.
  • Semantic segmentation of the image to classify which parts of the scene can be traversed.
  • Combining these two sources of information, possibly with a SLAM algorithm, to produce a map of the room.
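As a rough illustration of the final fusion step, the sketch below marks map cells as free or occupied from per-column depth readings and their "walkable" segmentation labels. The function name, the 1-D grid and the data layout are simplified assumptions for illustration, not part of any particular SLAM system:

```python
def update_occupancy(grid, depths, walkable, cell_size=0.1):
    """Fuse one scan of depth + segmentation into a 1-D occupancy grid.

    grid     : dict mapping cell index -> 'free' or 'occupied'
    depths   : forward distances in metres, one per image column
    walkable : booleans from semantic segmentation, same length
    """
    for d, ok in zip(depths, walkable):
        cell = int(round(d / cell_size))   # quantise depth into a grid cell
        if not ok:
            grid[cell] = 'occupied'        # obstacle readings always win
        elif grid.get(cell) != 'occupied':
            grid[cell] = 'free'
    return grid
```

A real system would project each pixel into 2-D map coordinates using the camera pose from SLAM; the 1-D grid here only illustrates the voting logic.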

(b)

The task the robot is performing is called visual localization, i.e. estimating its own position from the keypoints observed in the current view and comparing them to the map it built earlier. Such local image features rely heavily on the following properties:

  • The features must have high repeatability: a high proportion of the interest points detected under different viewing angles and lighting conditions should remain detectable most of the time.
  • The features must also be distinctive enough to be told apart from one another in the environment.
  • The features need to be local, i.e. detectable from a small region of the image by a simple algorithm.
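The repeatability property above can be measured as the fraction of interest points from one view that are re-detected, within a pixel tolerance, in a second view. A minimal sketch, assuming the keypoints have already been mapped into a common frame (e.g. via a known homography); the function name is illustrative:

```python
def repeatability(kps_a, kps_b, tol=2.0):
    """Fraction of view-A keypoints re-detected within `tol` pixels in view B.

    kps_a, kps_b: lists of (x, y) coordinates, already mapped into a
    common frame (e.g. via a known homography between the two views).
    """
    if not kps_a:
        return 0.0
    hits = 0
    for xa, ya in kps_a:
        # count a hit if any view-B detection lies within the tolerance
        if any((xa - xb) ** 2 + (ya - yb) ** 2 <= tol ** 2
               for xb, yb in kps_b):
            hits += 1
    return hits / len(kps_a)
```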

(c)

To reduce energy consumption, we can avoid heavier feature detectors based on deep learning and instead choose a classical detector such as SIFT or ORB. For the matching distance function, we can use the Euclidean distance (for floating-point descriptors such as SIFT) or the Hamming distance (for binary descriptors such as ORB). These classical detectors and distance functions are well suited to deployment on a mobile robot because of their low computational complexity, and they can also be implemented in hardware for maximum efficiency.
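A brute-force matcher under the Hamming distance can be sketched in a few lines. Binary descriptors are stored here as plain integers; this mimics what a brute-force matcher for ORB-style descriptors does. The function names and the threshold value are illustrative assumptions:

```python
def hamming(a, b):
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def match(query, train, max_dist=16):
    """Nearest-neighbour matching of binary descriptors under Hamming distance.

    Returns (query_index, train_index) pairs whose distance is at most
    `max_dist`.
    """
    matches = []
    for i, q in enumerate(query):
        # nearest descriptor in the train set, by Hamming distance
        j, d = min(((j, hamming(q, t)) for j, t in enumerate(train)),
                   key=lambda jd: jd[1])
        if d <= max_dist:
            matches.append((i, j))
    return matches
```

The XOR-and-popcount at the core of `hamming` is exactly what makes binary descriptors cheap to match in hardware.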


Question 2

(a)

The drone is looking straight down and needs to distinguish different land types. Since it is not flying too high, the ground textures should look spatially homogeneous, i.e. the texture should be very similar within a region of the same land type. I therefore think the following approaches should be suitable:

  • Statistical approaches that depend on pixel or pixel-pair values (first-order or second-order statistics) are suitable for low energy consumption. For example, concrete and grass differ strongly in colour and local texture, so a well-designed statistical descriptor should be able to distinguish them. However, these descriptors might not be very robust to noise, and might not distinguish between different types of grass.
  • Fourier-transform-based descriptors are also suitable for this task. They characterise an image by its frequency components and are quite robust to noise. Since terrains such as forest, grassland and concrete have very different frequency content, these descriptors should be able to tell them apart.
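The first-order statistical idea in the first bullet can be sketched as follows; the descriptor here (mean and variance of a patch) is a deliberately minimal stand-in for richer statistics such as co-occurrence features:

```python
def texture_stats(patch):
    """First-order statistical descriptor of a grayscale patch.

    patch: flat list of pixel intensities (0-255).
    Returns (mean, variance); two land types with different brightness
    and texture roughness often separate on these two numbers alone.
    """
    n = len(patch)
    mean = sum(patch) / n
    variance = sum((p - mean) ** 2 for p in patch) / n
    return mean, variance
```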

(b)

To distinguish different types of agricultural vegetation, we need descriptors that can capture shapes, sizes and fine details. I think the following approaches should be suitable:

  • Haar feature descriptors are suitable for this task. They capture the shapes and sizes of objects by computing differences between the sums of pixel intensities in adjacent rectangular regions.
  • Data-driven approaches such as deep learning are also suitable for this task. They can capture the fine details of objects by learning from a large amount of data.
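A two-rectangle Haar feature can be computed in constant time from an integral image (summed-area table); the sketch below shows a vertical-edge feature, with all function names being illustrative:

```python
def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img over rows < y and cols < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1][x0:x1] in O(1) using the integral image."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

def haar_vertical_edge(ii, y0, x0, h, w):
    """Two-rectangle Haar feature: left-half sum minus right-half sum."""
    half = w // 2
    left = rect_sum(ii, y0, x0, y0 + h, x0 + half)
    right = rect_sum(ii, y0, x0 + half, y0 + h, x0 + w)
    return left - right
```

The constant-time rectangle sums are what makes Haar features cheap enough for embedded platforms such as a drone.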

Comparing the choice of descriptors in Questions 2(a) and 2(b), we can see that the classification requirements are different. In Question 2(a), the land types differ strongly in their frequency components and simple statistics, so statistical or Fourier-transform-based descriptors are suitable. In Question 2(b), however, the different types of agricultural vegetation are very similar in these respects; to tell them apart we must capture the shapes, sizes and fine details of the plants, so Haar feature descriptors and deep learning are more appropriate.

(c)

Since the drone needs to recognise roads and rivers consistently from the camera image while it is flying, a possible pipeline would be the following:

  • The drone detects the presence of roads and rivers by performing semantic segmentation on the image.
  • The drone then combines the segmentation results with the SLAM output to produce a map of the environment, i.e. after data collection the drone can stitch together the local maps and the segmentation results taken every 5 seconds (or at a similar interval) to generate the final map.
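The stitching step can be sketched as a simple voting scheme, assuming each frame's segmentation has already been transformed into map cells using the SLAM pose. The data layout here is a simplified assumption for illustration:

```python
from collections import Counter, defaultdict

def fuse_segmentations(frames):
    """Fuse per-frame segmentation results into one global label map.

    frames: list of (pose, labels), where pose = (tx, ty) is the frame's
    offset in map cells and labels = {(x, y): class_name} in frame
    coordinates. Cells observed several times are resolved by majority vote.
    """
    votes = defaultdict(Counter)
    for (tx, ty), labels in frames:
        for (x, y), cls in labels.items():
            votes[(x + tx, y + ty)][cls] += 1   # translate into map frame
    return {cell: c.most_common(1)[0][0] for cell, c in votes.items()}
```

Majority voting makes the final map robust to an occasional mislabelled frame.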

Some alternative choices are:

  • Instead of semantic segmentation, we can use deep learning to perform object detection. However, object detection only returns the presence and an approximate location of the object, not its exact extent. Depending on the application, this could be sufficient and would save some energy.
  • Instead of performing visual SLAM, we can use GPS to estimate the position of the drone. Depending on the actual flying conditions, however, GPS might not be able to track the drone accurately during high-speed surveying.

Question 3

(a)

The camera worn on the officer's chest can only capture the people in front of the officer, and on a busy morning people may occlude each other from time to time. To count the number of people the officer encounters during any 2-minute interval, a possible pipeline would be the following:

  • The camera detects the presence of people by performing object detection on the image.
  • For each detected person, the camera then performs object segmentation to separate the person from the background.
  • The camera then applies a matching model, such as a Siamese network or DeepMatching, to re-identify the same person across frames; each person in view is assigned a persistent ID.
  • The camera then reports the number of unique IDs it has seen in the last 2 minutes.
Some considerations:
  1. The police officer is also walking, so we cannot simply use motion detection and object trajectories to ID and count people.
  2. People may occlude each other from time to time, and they can also walk out of or into the camera's view, so we need person re-identification to match the same person across frames.
  3. Simple object detection tends to over-count the number of people, since it will detect the same person multiple times; object segmentation helps isolate each person from the background.
  4. Simple feature matching might not distinguish people across varying poses, lighting and backgrounds, so we need a matching model such as a Siamese network or DeepMatching to re-identify the same person across frames.
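The final counting step of the pipeline above can be sketched as a sliding window over unique track IDs; the class and method names are illustrative:

```python
from collections import deque

class PeopleCounter:
    """Count unique track IDs observed in the last `window` seconds."""

    def __init__(self, window=120.0):
        self.window = window
        self.events = deque()              # (timestamp, track_id), oldest first

    def observe(self, t, track_id):
        """Record that `track_id` was re-identified at time `t`."""
        self.events.append((t, track_id))

    def count(self, now):
        """Number of distinct IDs seen within the window ending at `now`."""
        while self.events and self.events[0][0] < now - self.window:
            self.events.popleft()          # discard observations that are too old
        return len({tid for _, tid in self.events})
```

Counting distinct IDs rather than detections is what prevents the same person from being counted once per frame.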

Given this pipeline, the potential limitations are:

  1. The camera might not detect every person in its view. For example, people across the street might appear too small to be detected.
  2. The camera might fail to re-identify the same person across frames, especially when the person changes in size, pose or lighting.
  3. The segmentation might fail to extract every instance of a person from the background. For example, if a person wears a coat the same colour as the wall behind them, the segmentation might miss them.

(b)

The challenges introduced by the viewpoint of the body-worn camera include:

  • The camera will be blocked from time to time by the officer's arm or tools, and also by people standing in front of the officer. Under these circumstances the camera loses a large part of its view.
  • The extra motion introduced by the officer's movement makes it harder to obtain a clean, crisp image, so re-identification becomes more difficult.

Some typical errors that might occur:

  • The camera fails to detect people far away, or while its view is blocked.
  • The camera fails to re-identify the same person in different frames, possibly due to occlusion, or because the person's second appearance is too different from the first.

(c)

We assume that we can build a test set in which the expected number of people in each frame is known.

The types of error this camera can generate are:

  • False positive: the camera detects a person but there is no person in the frame.
  • False negative: the camera fails to detect a person but there is a person in the frame.
  • Mis-counting error: the camera fails to re-identify the same person in different frames, generating a new ID for the same person.
  • Re-identification error: the camera re-identifies a different person as a previously seen one, so one fewer ID is generated than there are distinct people.

We can evaluate the accuracy of the results numerically against this test set by measuring the errors generated by the computer vision pipeline described in Question 3(a). We can calculate the precision, recall and F1 score for each type of error, as well as overall precision, recall and F1 for the whole pipeline, to evaluate how the system would work end-to-end in practice.
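The per-error-type and overall scores reduce to the standard formulas; a minimal sketch computing them from raw counts:

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```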

 
