代写数据科学-Data Science代写-Final Project代写
代写数据科学

代写数据科学-Data Science代写-Final Project代写

Final Project

Data Science pipeline on-premises and on-the-cloud

代写数据科学 You work for a travel booking website that wants to improve the customer experience for flights that were delayed.

Objective

In this assignment, you will be provided with a real-world dataset, and you are required to implement the whole pipeline of building the data science pipeline on-premises and on-the-cloud. This includes understanding the business problem, preparing data, exploring the data, performing feature engineering, building, and deploying models.

Introducing the business scenario  代写数据科学

You work for a travel booking website that wants to improve the customer experience for flights that were delayed. The company wants to create a service to let the customers know the likelihood of the flight being delayed based on the weather conditions before they book the flight to or from the busiest airports in US.

You are tasked for solving parts of this problem by using machine learning(ML) to identify how likely the flight will be delayed based on the available weather data. You have been given access to the dataset of the on-time performance of the domestic flights that are operated by large air carriers.You can use this data to train a ML model to predict if the flight will be delayed for the busiest airports.

About the dataset

The provided dataset contains scheduled and actual departure and arrival times reported by certified US air carriers that account for at least 1 percent of domestic scheduled passenger revenues. The data was collected by the Office of Airline Information, Bureau of Transportation Statistics (BTS).

The dataset contains date, time, origin, destination, airline, distance, and delay status of flights for flights between 2014 and 2018.The data are in 60 compressed files, where each file contains a CSV for the flight details in a month for the five years (from 2014 – 2018). The data can be downloaded from this link: [compressed data].

Features of the dataset  代写数据科学

Dataset(s) used in this assignment were compiled by the Office of Airline Information, Bureau of Transportation Statistics (BTS), Airline On-Time Performance Data, available with the following link: [dataset attributes].

Tasks

The tasks of this assignment are divided in two parts as follows:

Part A –Data Science on-premises

(60 marks)

In this part you are expected to:

– Understand the dataset and describe the business problem;

– Document an exploratory data analysis and whenever possible draw conclusions about the analysis;  代写数据科学

– Employ popular graphical modules (matplotlib, seaborn or tableau) to answer questions;

– Implement machine learning techniques to predict whether the flights will be delayed or not.

You are given a Jupyter notebook named “onpremises.ipynb”, which contains a starter code and instructions to go ahead with this part. You are required to answer the questions in this notebook and upload it as a response to this part.

代写数据科学
代写数据科学

Part B –Data Science on-cloud  代写数据科学

(40 marks)

In this part you are expected to:

– Use your skills in performing the machine learning pipeline using the

Amazon SageMaker.

– Compare the results of implementing the ML pipeline on premises versus on the cloud.

You are given a Jupyter notebook named “oncloud.ipynb”, which contains a starter code and instructions to go ahead with this part. You are required to answer the questions in this notebook and upload it as a response to this part.

Deliverables

You are required to submit a compressed (e.g. ZIP) file to the Canvas website of the unit with the following files:

1- A Python Jupyter Notebook with the code for parts A

2- A Python Jupyter Notebook with the code for parts B

3- [Optional] A PDF document with your reflection on the unit highlighting what you likedand what you didn’t.

 

更多代写:澳洲金融代上网课  gmat作弊  高中网课代上  个人陈述代写英文  分析论文代写  土木工程作业代写

合作平台:essay代写 论文代写 写手招聘 英国留学生代写 

数据科学代写
数据科学代写

发表回复