
Final Project

Data Science pipeline on-premises and on-the-cloud

Objective

In this assignment, you will be given a real-world dataset and are required to implement a complete data science pipeline, both on-premises and on the cloud. This includes understanding the business problem, preparing and exploring the data, performing feature engineering, and building and deploying models.

Introducing the business scenario

You work for a travel booking website that wants to improve the customer experience for flights that were delayed. The company wants to create a service that tells customers how likely a flight is to be delayed, based on the weather conditions, before they book a flight to or from one of the busiest airports in the US.

You are tasked with solving part of this problem by using machine learning (ML) to estimate how likely a flight is to be delayed based on the available weather data. You have been given access to a dataset of the on-time performance of domestic flights operated by large air carriers. You can use this data to train an ML model that predicts whether a flight to or from one of the busiest airports will be delayed.
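One way to frame this as a binary classification task is sketched below. It is a minimal sketch under several assumptions that are not part of the assignment: each flight record is a dict with `Origin`, `Dest`, and `ArrDelay` (arrival delay in minutes) fields, "busiest" means the airports that appear most often in the data, and a flight counts as delayed when it arrives 15 or more minutes late. The helper names `busiest_airports` and `label_flights` are hypothetical.

```python
from collections import Counter

# Assumed delay threshold in minutes (a hypothetical choice for this sketch).
DELAY_THRESHOLD_MIN = 15


def busiest_airports(rows, n):
    """Return the n airports that appear most often as origin or destination."""
    counts = Counter()
    for row in rows:
        counts[row["Origin"]] += 1
        counts[row["Dest"]] += 1
    return {airport for airport, _ in counts.most_common(n)}


def label_flights(rows, airports):
    """Keep flights to or from `airports` and attach a binary `is_delayed` label.

    Rows with a missing arrival delay (e.g. cancelled flights) are skipped.
    """
    labelled = []
    for row in rows:
        if row["Origin"] not in airports and row["Dest"] not in airports:
            continue
        if row["ArrDelay"] in (None, ""):
            continue
        delayed = float(row["ArrDelay"]) >= DELAY_THRESHOLD_MIN
        labelled.append({**row, "is_delayed": int(delayed)})
    return labelled


# Toy records with the assumed field names (not real data).
flights = [
    {"Origin": "ATL", "Dest": "ORD", "ArrDelay": "22"},
    {"Origin": "ATL", "Dest": "LAX", "ArrDelay": "-3"},
    {"Origin": "BOS", "Dest": "ATL", "ArrDelay": ""},
    {"Origin": "SEA", "Dest": "PDX", "ArrDelay": "40"},
]
top = busiest_airports(flights, 1)   # {'ATL'}
data = label_flights(flights, top)   # two rows, labelled 1 and 0
```

Skipping rows without an arrival delay keeps the label well defined; whether cancelled or diverted flights should instead count as delayed is a modelling decision to make explicitly.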

About the dataset

The provided dataset contains scheduled and actual departure and arrival times reported by certified US air carriers that account for at least 1 percent of domestic scheduled passenger revenues. The data was collected by the Office of Airline Information, Bureau of Transportation Statistics (BTS).

The dataset contains the date, time, origin, destination, airline, distance, and delay status of flights between 2014 and 2018. The data are split across 60 compressed files, one per month over the five years from 2014 to 2018 (12 months × 5 years), and each file contains a CSV with that month's flight details. The data can be downloaded from this link: [compressed data].
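Because the data come as 60 monthly archives, it can help to stream rows directly out of each archive instead of unzipping them by hand. The following is a minimal sketch using only the Python standard library. It assumes each archive is a ZIP file holding CSV files with a header row, encoded as UTF-8. The function name `iter_monthly_rows` is hypothetical, and the real archive names and encoding should be checked against the downloaded files.

```python
import csv
import io
import zipfile


def iter_monthly_rows(zip_paths, encoding="utf-8"):
    """Yield every CSV row (as a dict) from each monthly ZIP archive in order.

    `zip_paths` may contain file paths or binary file-like objects, since
    zipfile.ZipFile accepts both. Assumes each CSV inside has a header row.
    """
    for path in zip_paths:
        with zipfile.ZipFile(path) as archive:
            for member in archive.namelist():
                if not member.lower().endswith(".csv"):
                    continue  # ignore any non-CSV files bundled in the archive
                # ZipFile.open returns a binary stream; wrap it as text for csv.
                with io.TextIOWrapper(
                    archive.open(member), encoding=encoding, newline=""
                ) as text:
                    yield from csv.DictReader(text)
```

For example, `iter_monthly_rows(sorted(glob.glob("data/*.zip")))` would stream every month in file-name order, where `data/` is a hypothetical download folder. Because rows are yielded lazily, the five years of data are only held in memory if the caller collects them.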

Features of the dataset

The datasets used in this assignment were compiled by the Office of Airline Information, Bureau of Transportation Statistics (BTS), as part of the Airline On-Time Performance Data. The list of attributes is available at the following link: [dataset attributes].

Tasks

The tasks of this assignment are divided into two parts, as follows:

Part A – Data Science on-premises

(60 marks)

In this part you are expected to:

– Understand the dataset and describe the business problem;

– Document an exploratory data analysis and, wherever possible, draw conclusions from it;

– Use popular visualization tools (matplotlib, seaborn, or Tableau) to answer questions about the data;

– Implement machine learning techniques to predict whether flights will be delayed.
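Two quick checks support the exploratory analysis and the modelling steps listed above. The first is how the delay rate varies across a feature, a typical EDA question whose answer can then be plotted. The second is the accuracy a trivial model would get by always predicting the majority class. When on-time flights outnumber delayed ones, a trained model should be judged against that baseline rather than against 50%. The sketch below assumes rows carry a hypothetical `is_delayed` field (1 = delayed, 0 = not delayed) and uses only the standard library. The helper names are illustrative.

```python
from collections import defaultdict


def delay_rate_by(rows, key):
    """Return {group value: fraction of delayed flights} for the given column."""
    totals = defaultdict(int)
    delayed = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        delayed[row[key]] += row["is_delayed"]
    return {group: delayed[group] / totals[group] for group in totals}


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    labels = list(labels)
    positives = sum(labels)
    return max(positives, len(labels) - positives) / len(labels)


# Toy rows (not real data).
rows = [
    {"Origin": "ATL", "is_delayed": 1},
    {"Origin": "ATL", "is_delayed": 0},
    {"Origin": "ORD", "is_delayed": 0},
    {"Origin": "ORD", "is_delayed": 0},
]
print(delay_rate_by(rows, "Origin"))                               # {'ATL': 0.5, 'ORD': 0.0}
print(majority_baseline_accuracy(r["is_delayed"] for r in rows))   # 0.75
```

A model that reports 75% accuracy on this toy set would therefore be no better than always answering "not delayed". That is why precision and recall on the delayed class are worth reporting alongside accuracy.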

You are given a Jupyter notebook named “onpremises.ipynb”, which contains starter code and instructions for completing this part. You are required to answer the questions in this notebook and upload it as your response to this part.

Part B – Data Science on-cloud

(40 marks)

In this part you are expected to:

– Use your skills to carry out the machine learning pipeline using Amazon SageMaker.

– Compare the results of implementing the ML pipeline on premises versus on the cloud.
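If the cloud part uses Amazon SageMaker's built-in XGBoost algorithm, which is an assumption since the assignment only names SageMaker, the training CSV must have the target in the first column and no header row. A minimal standard-library sketch of that conversion follows. The column names and the helper `to_sagemaker_csv` are hypothetical, and categorical columns such as airline or airport would first need to be encoded as numbers. Uploading the files to S3 and launching the training job are not shown.

```python
import csv
import io


def to_sagemaker_csv(rows, label_col, feature_cols):
    """Serialise rows as header-less CSV with the label in the first column.

    This matches the text/csv layout expected by SageMaker's built-in XGBoost
    (assumed to be the algorithm used); features must already be numeric.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow([row[label_col]] + [row[col] for col in feature_cols])
    return out.getvalue()


# Toy rows with hypothetical numeric features.
rows = [
    {"Distance": 606, "CRSDepHour": 7, "is_delayed": 1},
    {"Distance": 1946, "CRSDepHour": 18, "is_delayed": 0},
]
print(to_sagemaker_csv(rows, "is_delayed", ["Distance", "CRSDepHour"]), end="")
# 1,606,7
# 0,1946,18
```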

You are given a Jupyter notebook named “oncloud.ipynb”, which contains starter code and instructions for completing this part. You are required to answer the questions in this notebook and upload it as your response to this part.
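A fair comparison of the on-premises and cloud pipelines scores both models on the same held-out test set with the same metrics. The sketch below computes accuracy, precision, recall, and F1 for binary labels using only the standard library. The helper name `binary_metrics` is hypothetical, and the label and prediction lists are placeholders, not results from either notebook.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Placeholder labels and predictions, not real results.
y_test = [1, 0, 1, 0, 0]
on_prem_pred = [1, 0, 0, 0, 0]
cloud_pred = [1, 1, 1, 0, 0]
for name, pred in [("on-premises", on_prem_pred), ("cloud", cloud_pred)]:
    print(name, binary_metrics(y_test, pred))
```

In this toy example both models reach the same accuracy (0.8) but trade precision against recall differently. That is why reporting several metrics side by side gives a more informative comparison than accuracy alone.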

Deliverables

You are required to submit a compressed (e.g. ZIP) file containing the following files to the unit's Canvas website:

1- A Python Jupyter Notebook with the code for Part A

2- A Python Jupyter Notebook with the code for Part B

3- [Optional] A PDF document with your reflection on the unit, highlighting what you liked and what you didn't.
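The submission archive can be built with Python's standard `zipfile` module. This minimal sketch assumes the completed notebooks keep the names “onpremises.ipynb” and “oncloud.ipynb” given in the assignment. The reflection file name `reflection.pdf` and the helper `build_submission` are hypothetical. The PDF is added only when it exists because it is optional.

```python
import os
import zipfile

REQUIRED = ["onpremises.ipynb", "oncloud.ipynb"]
OPTIONAL = ["reflection.pdf"]  # hypothetical name for the optional reflection PDF


def build_submission(archive_path, folder="."):
    """Zip the required notebooks (and the optional PDF, if present) from `folder`."""
    missing = [n for n in REQUIRED if not os.path.exists(os.path.join(folder, n))]
    if missing:
        raise FileNotFoundError(f"missing required files: {missing}")
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name in REQUIRED + OPTIONAL:
            path = os.path.join(folder, name)
            if os.path.exists(path):
                # Store files at the archive root, without the folder prefix.
                archive.write(path, arcname=name)
    return archive_path
```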
