Case analysis

Part (a) Data Cleaning and Basic Data Exploration

I imported the Excel data using the pandas library in Python. The basic descriptive statistics for the numerical columns are shown in the table below:

	Age	Experience	Income	Family	CCAvg	Mortgage	Personal Loan	Brokerage Account	GIC Account	Online	CreditCard
count	4499	4499	4499	4499	4499	4499	4499	4499	4499	4499	4499
mean	47.2683	77.2683	74.3034	2.39809	1.95183	56.0856	0.10669	0.108691	0.0649033	0.592132	0.296733
std	11.4547	11.4547	46.3106	1.14704	1.75648	101.788	0.308754	0.311286	0.246383	0.491493	0.456868
min	25	55	8	1	0	0	0	0	0	0	0
25%	37	67	39	1	0.7	0	0	0	0	0	0
50%	47	77	64	2	1.6	0	0	0	0	1	0
75%	57	87	99	3	2.6	99.5	0	0	0	1	1
max	69	99	224	4	10	617	1	1	1	1	1
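A table like the one above could be produced roughly as follows. This is only a minimal sketch: the file name `bank_data.xlsx` is hypothetical (the report does not give it), and the three sample rows are made up so that the snippet runs on its own.

```python
import pandas as pd

# Hypothetical loading step; the real file name is not stated in the report:
# df = pd.read_excel("bank_data.xlsx")

# Tiny made-up sample standing in for the real data (illustration only).
df = pd.DataFrame({
    "Age": [25, 47, 69],
    "Income": [8, 64, 224],
    "Family": [1, 2, 4],
    "Advisor Name": ["A", "B", "C"],  # non-numeric, skipped by describe()
})

# By default describe() summarizes numeric columns only:
# count, mean, std, min, quartiles and max.
summary = df.describe()
print(summary)
```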

From the descriptive statistics, we can see that several variables are binary:

Personal Loan
Brokerage Account
GIC Account
Online
CreditCard

Family is a discrete variable taking values in [1,2,3,4]. All the remaining variables are continuous. From the distribution of the data, we observed that few clients have mortgages (the median Mortgage is 0), and that Income is skewed, looking more like a power-law distribution than a normal one.
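The observations above can be checked programmatically. The sketch below uses made-up values rather than the report's data, and the rule "a column is binary if it only contains 0 and 1" is an assumption about how the binary variables were identified.

```python
import pandas as pd

# Made-up sample for illustration; not the report's data.
df = pd.DataFrame({
    "Income": [8, 20, 30, 39, 45, 64, 70, 99, 150, 224],
    "Family": [1, 2, 3, 4, 1, 2, 3, 4, 1, 2],
    "Online": [0, 1, 1, 0, 1, 1, 0, 1, 1, 0],
    "Mortgage": [0, 0, 0, 0, 0, 0, 0, 99, 0, 617],
})

# Treat a column as binary if its only distinct values are 0 and 1.
binary_cols = [c for c in df.columns if set(df[c].unique()) <= {0, 1}]

# Positive sample skewness indicates a long right tail.
income_skew = df["Income"].skew()

# Share of clients with no mortgage at all.
no_mortgage_share = (df["Mortgage"] == 0).mean()

print(binary_cols, income_skew, no_mortgage_share)
```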

Part (b) Visualization

Visualization 1: Correlations between features

The first visualization is the pairwise scatter plot and histogram for the numerical and boolean features.

The plot above shows pairwise scatter plots for a subset of the numerical and boolean features. We can see that:

1. Experience and Age are almost perfectly correlated, so we will drop one of them.

2. Income shows no clear correlation with Mortgage, Family size, or the use of the bank's other financial products, including credit cards.
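One way to back up the first observation numerically is to inspect the correlation matrix and drop one feature of a highly correlated pair. This is a sketch on made-up data: the constant offset between Age and Experience and the 0.95 cutoff are assumptions chosen for illustration.

```python
import pandas as pd

# Made-up sample in which Experience is Age shifted by a constant,
# so the two columns are perfectly correlated (illustration only).
df = pd.DataFrame({
    "Age": [25, 37, 47, 57, 69],
    "Income": [99, 8, 224, 39, 64],
})
df["Experience"] = df["Age"] + 30

# Pearson correlation between every pair of numeric columns.
corr = df.corr()
print(corr)

# Drop one feature of the pair if the correlation exceeds an assumed cutoff.
threshold = 0.95
if abs(corr.loc["Age", "Experience"]) > threshold:
    df = df.drop(columns=["Experience"])
```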

Visualization 2: features grouped by result

Since we want to see which types of clients are more likely to accept the personal loan promotion, we grouped the data by the outcome (Personal Loan) and plotted the distribution of each feature. Because Experience is highly correlated with Age, we removed Experience from these plots.

Above is the overlaid histogram, grouped by Personal Loan.

We can see that:

1. People with higher income are more likely to use Personal Loan.

2. The likelihood of getting a personal loan increases with family size.

3. People with higher CCAvg are more likely to use Personal Loan.
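The same comparison can be made numerically by grouping on the outcome. The sketch below uses a handful of made-up rows; only the column names come from the report.

```python
import pandas as pd

# Made-up rows for illustration; not the report's data.
df = pd.DataFrame({
    "Personal Loan": [0, 0, 0, 1, 1, 0],
    "Income": [40, 60, 50, 150, 170, 30],
    "Family": [1, 2, 1, 3, 4, 2],
    "CCAvg": [0.7, 1.6, 1.0, 4.0, 5.0, 0.5],
})

# Average Income and CCAvg for clients who did (1) and did not (0) accept.
group_means = df.groupby("Personal Loan")[["Income", "CCAvg"]].mean()

# Acceptance rate per family size (mean of a 0/1 column is a proportion).
rate_by_family = df.groupby("Family")["Personal Loan"].mean()

print(group_means)
print(rate_by_family)
```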

Visualization 3: Does bank branch make a difference

We are also interested in whether some branches are better at promoting personal loans to their clients. We therefore plotted each branch's total outstanding mortgage, together with the percentage of its clients who accepted the personal loan promotion:

We can see that branches with a larger business volume do not typically have a higher conversion rate. In fact, the conversion rate appears to be unrelated to the size of their business.
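The two quantities compared here can be computed with a single group-by. This is a sketch only: the column name `Branch` and the sample rows are assumptions, since the report does not show the branch column.

```python
import pandas as pd

# Made-up rows; "Branch" is an assumed column name.
df = pd.DataFrame({
    "Branch": ["North", "North", "North", "South", "South"],
    "Mortgage": [100, 0, 200, 50, 0],
    "Personal Loan": [0, 0, 1, 1, 0],
})

# Total outstanding mortgage and conversion rate for each branch.
by_branch = df.groupby("Branch").agg(
    total_mortgage=("Mortgage", "sum"),
    conversion_rate=("Personal Loan", "mean"),
)
print(by_branch)
```

In this toy sample the branch with the larger mortgage volume has the lower conversion rate, which is the kind of pattern the plot is meant to reveal.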

Visualization 4: Does advisor make a difference

The same visualization was then applied to advisors.

Similarly, the advisors with the most clients (in terms of mortgage demand) did not have the highest conversion rates either.

Part (c) Business analytics

We will group by Advisor name and then calculate how successful each advisor was in terms of:

selling most Personal Loan products
highest conversion rate

Below are the 10 most successful advisors by each criterion.

Sold most personal loan products

Advisor Name	Personal Loan
Gita Pinelli	16
Kathaleen Horgan	15
Prudence Masters	15
Jacqueline Leveque	14
Corazon Eastin	13
Erik Clinard	12
Vicki Sowers	12
Val Sauceda	12
Siobhan Flaugher	12
Carylon Race	12

Highest conversion rate

Advisor Name	Conversion Rate
Prudence Masters	0.182927
Kathaleen Horgan	0.159574
Gita Pinelli	0.152381
Eulah Kicklighter	0.148148
Jaquelyn Cubbage	0.144578
Weston Jeon	0.142857
Jacqueline Leveque	0.142857
Val Sauceda	0.139535
Corazon Eastin	0.138298
Carylon Race	0.136364
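Both rankings can be derived from the same grouping: the sum of the 0/1 Personal Loan column gives the number of loans sold, and its mean gives the conversion rate. The sketch below uses made-up advisors and rows; only the column names are taken from the tables above.

```python
import pandas as pd

# Made-up advisors and outcomes for illustration.
df = pd.DataFrame({
    "Advisor Name": ["Ann", "Ann", "Ann", "Ann", "Bob", "Bob", "Cy"],
    "Personal Loan": [1, 1, 0, 0, 1, 0, 1],
})

per_advisor = df.groupby("Advisor Name")["Personal Loan"]

# Criterion 1: number of personal loans sold (sum of the 0/1 column).
top_by_sales = per_advisor.sum().sort_values(ascending=False).head(10)

# Criterion 2: conversion rate (mean of the 0/1 column).
top_by_rate = per_advisor.mean().sort_values(ascending=False).head(10)

print(top_by_sales)
print(top_by_rate)
```

In this toy sample an advisor with a single client tops the rate ranking, so it may be worth requiring a minimum number of clients before comparing conversion rates.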

Part (d) Artificial Intelligence

In this part we fit 3 different models to the data to predict whether a customer will respond positively to a promotion and apply for a personal loan (the Personal Loan column).

The features we used are based on historical behavior of the customers:

Age
Experience
Income
Family
CCAvg
Mortgage
Brokerage Account
GIC Account
Online
CreditCard

We classify the customers into Personal Loan = 1 and Personal Loan = 0. The dataset is randomly split into an 80% training set and a 20% testing set. Because the outcome is binary, we tried 3 different classification models:

Logistic Regression

The logistic regression model got the following classification result:

              precision    recall  f1-score   support

           0       0.94      0.98      0.96       802
           1       0.70      0.47      0.56        98

    accuracy                           0.92       900
   macro avg       0.82      0.72      0.76       900
weighted avg       0.91      0.92      0.91       900

It has an overall accuracy of 92%, which is not bad. However, the recall for customers who would want a personal loan is merely 47%, which means we would fail to identify more than half of the selling opportunities.
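To make the 47% recall concrete, the figures in the report above can be turned into approximate customer counts. This is simple arithmetic on the rounded values from the table, so the counts are estimates rather than the exact confusion matrix.

```python
# Rounded values taken from the logistic regression report (class 1 row).
support_pos = 98   # test customers who actually took a personal loan
precision = 0.70
recall = 0.47

# Recall = found / support, so the number of positives found is roughly:
found = round(recall * support_pos)
missed = support_pos - found

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(found, missed, round(f1, 2))
```

Roughly 46 of the 98 interested customers are found and about 52 are missed, and the recomputed F1 agrees with the 0.56 shown in the table.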

Support Vector Machine

The SVM model got the following results:

              precision    recall  f1-score   support

           0       0.90      0.99      0.94       802
           1       0.67      0.10      0.18        98

    accuracy                           0.90       900
   macro avg       0.78      0.55      0.56       900
weighted avg       0.88      0.90      0.86       900

The SVM did not perform as well as logistic regression. It is strongly biased toward predicting Personal Loan = 0, with a recall of only 10% on the positive class.

Random forest

Finally, we tried the random forest model, which turned out to be the best of the 3:

              precision    recall  f1-score   support

           0       0.96      0.99      0.97       802
           1       0.92      0.62      0.74        98

    accuracy                           0.95       900
   macro avg       0.94      0.81      0.86       900
weighted avg       0.95      0.95      0.95       900

The random forest model performed the best, with 95% overall accuracy and, most importantly, 62% recall on customers who would actually apply for a Personal Loan, which is the highest of the 3 candidate models.
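The workflow in this part can be sketched with scikit-learn as follows. This is a sketch under stated assumptions, not the code behind the results above: the data is synthetic (generated to have about 10% positives, like the real outcome), and the hyperparameters and random seeds are assumed defaults, so the printed numbers will differ from the reports shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in data (about 10% positives); not the bank data.
X, y = make_classification(
    n_samples=4499, n_features=10, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)

# Random 80% / 20% split; with 4499 rows the test set has 900 rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The three model families named in the report, with assumed settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Machine": SVC(),
    "Random forest": RandomForestClassifier(random_state=0),
}

recalls = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(name)
    print(classification_report(y_test, y_pred, zero_division=0))
    # Recall on the positive class (Personal Loan = 1 in the report).
    recalls[name] = recall_score(y_test, y_pred, zero_division=0)
```

Collecting the positive-class recall in `recalls` makes it easy to compare models on the metric that matters most here, the share of interested customers that each model finds.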
