## Case analysis

### Part (a) Data Cleaning and Basic Data Exploration

I imported the Excel data with the pandas library in Python. Basic descriptive statistics for the numerical columns are shown in the table below:

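| | Age | Experience | Income | Family | CCAvg | Mortgage | Personal Loan | Brokerage Account | GIC Account | Online | CreditCard |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 4499 | 4499 | 4499 | 4499 | 4499 | 4499 | 4499 | 4499 | 4499 | 4499 | 4499 |
| mean | 47.2683 | 77.2683 | 74.3034 | 2.39809 | 1.95183 | 56.0856 | 0.10669 | 0.108691 | 0.0649033 | 0.592132 | 0.296733 |
| std | 11.4547 | 11.4547 | 46.3106 | 1.14704 | 1.75648 | 101.788 | 0.308754 | 0.311286 | 0.246383 | 0.491493 | 0.456868 |
| min | 25 | 55 | 8 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 25% | 37 | 67 | 39 | 1 | 0.7 | 0 | 0 | 0 | 0 | 0 | 0 |
| 50% | 47 | 77 | 64 | 2 | 1.6 | 0 | 0 | 0 | 0 | 1 | 0 |
| 75% | 57 | 87 | 99 | 3 | 2.6 | 99.5 | 0 | 0 | 0 | 1 | 1 |
| max | 69 | 99 | 224 | 4 | 10 | 617 | 1 | 1 | 1 | 1 | 1 |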
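A summary table like the one above can be produced along the following lines. This is a minimal sketch, not the exact code used for the report: the file name `bank_data.xlsx` is hypothetical (the source does not name the file), and a small synthetic frame stands in for the real data so that the snippet runs on its own.

```python
import pandas as pd

# In the real analysis the frame would come from the Excel file, e.g.
#   df = pd.read_excel("bank_data.xlsx")   # hypothetical file name
# A tiny synthetic stand-in keeps this sketch self-contained.
df = pd.DataFrame({
    "Age": [25, 37, 47, 57, 69],
    "Income": [8, 39, 64, 99, 224],
    "Family": [1, 1, 2, 3, 4],
    "Personal Loan": [0, 0, 0, 0, 1],
    "Advisor Name": ["A", "B", "A", "C", "B"],  # non-numeric, excluded below
})

# describe() on the numeric columns gives count, mean, std, min, quartiles, max
summary = df.select_dtypes(include="number").describe()
print(summary.round(2))
```

Selecting numeric columns first keeps text columns such as the advisor name out of the summary.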

The descriptive statistics show that several columns are binary:

• Personal Loan
• Brokerage Account
• GIC Account
• Online
• CreditCard

Family is a discrete variable taking values in {1, 2, 3, 4}. All the other variables are continuous. From the distribution of the data, we observed that few clients have mortgages (the median Mortgage is 0), and that Income is right-skewed (mean 74.3 versus median 64), so it looks closer to a power-law distribution than to a normal one.
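The column typing and the skewness observation can be checked programmatically. The sketch below is one possible heuristic, not the method used in the report: the helper `column_kind` and its `max_discrete` cut-off are illustrative assumptions, and the data is synthetic.

```python
import pandas as pd

def column_kind(series: pd.Series, max_discrete: int = 10) -> str:
    """Label a numeric column as binary, discrete, or continuous (heuristic)."""
    values = set(series.dropna().unique())
    if values <= {0, 1}:
        return "binary"
    if len(values) <= max_discrete and all(float(v).is_integer() for v in values):
        return "discrete"
    return "continuous"

# Synthetic stand-in for a few of the bank columns
df = pd.DataFrame({
    "Income": [8, 20, 39, 45, 64, 70, 99, 110, 150, 224, 30, 55],
    "Family": [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
    "Online": [0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1],
    "Mortgage": [0, 0, 0, 0, 0, 0, 0, 99.5, 0, 617, 0, 0],
})

kinds = {col: column_kind(df[col]) for col in df.columns}
print(kinds)

# Positive skewness indicates a long right tail
income_skew = df["Income"].skew()
# Share of clients with no mortgage at all
no_mortgage_share = (df["Mortgage"] == 0).mean()
print(round(income_skew, 2), round(no_mortgage_share, 2))
```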

### Visualization 1: Correlations between features

The first visualization is a pairwise scatter plot, with histograms, of the numerical and boolean features. We can see that:

1. Experience and Age are almost perfectly correlated, so we will drop one of them.

2. Income does not visibly correlate with Mortgage, Family size, or use of the bank's other financial products, including credit cards.
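The same observations can be checked numerically with a correlation matrix. In the descriptive statistics, Experience and Age have identical standard deviations and means that differ by exactly 30, which is consistent with one being a shifted copy of the other. The sketch below assumes that relationship for its synthetic data; the 0.95 threshold is an arbitrary illustrative choice.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
age = rng.integers(25, 70, size=200)
df = pd.DataFrame({
    "Age": age,
    "Experience": age + 30,   # synthetic: an exact linear function of Age
    "Income": rng.gamma(shape=2.0, scale=37.0, size=200),
    "Mortgage": rng.choice([0.0, 100.0, 250.0], size=200, p=[0.7, 0.2, 0.1]),
})

corr = df.corr()  # Pearson correlation by default
print(corr.round(2))

# Keep only the upper triangle so each pair is examined once, then drop
# the later column of every pair whose |correlation| exceeds the threshold
threshold = 0.95
upper = corr.abs().where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
reduced = df.drop(columns=to_drop)
print("Dropped:", to_drop)
```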

### Visualization 2: features grouped by result

Since we want to see which types of clients are more likely to accept personal loan promotions, we grouped the data by the result and plotted the distribution of each feature above. Because Experience is highly correlated with Age, we removed that feature.

Above is the overlaid histogram, grouped by Personal Loan.

We can see that:

1. People with higher income are more likely to use Personal Loan.

2. The likelihood of getting a personal loan increases with family size.

3. People with higher CCAvg are more likely to use Personal Loan.
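The three observations above can also be read off a grouped summary rather than a plot. The following is a small sketch on synthetic data (the values are made up for illustration and are not taken from the bank dataset):

```python
import pandas as pd

df = pd.DataFrame({
    "Income": [40, 55, 30, 150, 180, 60, 120, 35],
    "Family": [1, 2, 1, 3, 4, 2, 4, 1],
    "CCAvg": [0.5, 1.2, 0.8, 4.0, 6.5, 1.5, 3.2, 0.7],
    "Personal Loan": [0, 0, 0, 1, 1, 0, 1, 0],
})

# Median of each feature within the declined (0) and accepted (1) groups
by_outcome = df.groupby("Personal Loan")[["Income", "CCAvg"]].median()
print(by_outcome)

# Acceptance rate per family size: the mean of a 0/1 column is a proportion
rate_by_family = df.groupby("Family")["Personal Loan"].mean()
print(rate_by_family)
```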

### Visualization 3: Does bank branch make a difference

We are also interested in whether some branches are better at promoting personal loans to their clients. We therefore plotted each branch's total mortgage outstanding alongside the percentage of its clients who accepted the personal loan promotion:

We can see that branches with larger business volume do not typically have a higher conversion rate. In fact, the conversion rate appears unrelated to the size of their business.
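A per-branch aggregation of this kind might look like the sketch below. The column name `Branch` is an assumption (the source does not show the branch column), the data is synthetic, and the rank correlation is added here only as one way to quantify "unrelated"; the report itself relies on the plot.

```python
import pandas as pd

df = pd.DataFrame({
    "Branch": ["North", "North", "North", "South", "South", "East", "East", "East"],
    "Mortgage": [200.0, 0.0, 350.0, 0.0, 80.0, 0.0, 0.0, 120.0],
    "Personal Loan": [0, 0, 1, 1, 0, 0, 0, 0],
})

per_branch = df.groupby("Branch").agg(
    total_mortgage=("Mortgage", "sum"),          # business volume
    clients=("Personal Loan", "size"),           # number of clients
    conversion_rate=("Personal Loan", "mean"),   # share who accepted
)
print(per_branch.sort_values("total_mortgage", ascending=False))

# Spearman rank correlation, computed as the Pearson correlation of ranks;
# a value near zero would support "conversion is unrelated to volume"
rho = per_branch["total_mortgage"].rank().corr(per_branch["conversion_rate"].rank())
print("rank correlation:", round(rho, 2))
```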

### Visualization 4: Does advisor make a difference

The same visualization was then applied to advisors.

Similarly, the advisors with the most clients (in terms of mortgage demand) did not have the highest conversion rates either.

### Part (c) Business analytics

We will group by Advisor name and then calculate how successful each advisor was in terms of:

• selling the most Personal Loan products
• having the highest conversion rate

Below are the 10 most successful advisors by each criterion.

Sold most personal loan products

| Advisor Name | Personal Loan |
|---|---|
| Gita Pinelli | 16 |
| Kathaleen Horgan | 15 |
| Prudence Masters | 15 |
| Jacqueline Leveque | 14 |
| Corazon Eastin | 13 |
| Erik Clinard | 12 |
| Vicki Sowers | 12 |
| Val Sauceda | 12 |
| Siobhan Flaugher | 12 |
| Carylon Race | 12 |

Highest conversion rate

| Advisor Name | Personal Loan |
|---|---|
| Prudence Masters | 0.182927 |
| Kathaleen Horgan | 0.159574 |
| Gita Pinelli | 0.152381 |
| Eulah Kicklighter | 0.148148 |
| Jaquelyn Cubbage | 0.144578 |
| Weston Jeon | 0.142857 |
| Jacqueline Leveque | 0.142857 |
| Val Sauceda | 0.139535 |
| Corazon Eastin | 0.138298 |
| Carylon Race | 0.136364 |
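Both rankings come from the same group-by: the sum of the 0/1 Personal Loan flag counts loans sold, and its mean is the conversion rate. A minimal sketch on synthetic data (the advisor names here are placeholders, not advisors from the dataset):

```python
import pandas as pd

df = pd.DataFrame({
    "Advisor Name": ["Ann", "Ann", "Ann", "Ann", "Bob", "Bob", "Cy", "Cy", "Cy"],
    "Personal Loan": [1, 0, 0, 1, 1, 0, 0, 0, 1],
})

loans = df.groupby("Advisor Name")["Personal Loan"]

# Criterion 1: number of personal loans sold (sum of the 0/1 flag)
top_by_count = loans.sum().nlargest(10)

# Criterion 2: conversion rate (mean of the 0/1 flag = loans sold / clients)
top_by_rate = loans.mean().nlargest(10)

print(top_by_count)
print(top_by_rate)
```

A conversion rate computed from only a handful of clients is noisy, so a minimum client count may be worth applying before ranking; the source does not say whether such a filter was used.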

### Part (d) Artificial Intelligence

In this part we fit three different models to the data to predict whether a customer will respond positively to a promotion and apply for a personal loan (the Personal Loan column).

The features we used are based on historical behavior of the customers:

• Age
• Experience
• Income
• Family
• CCAvg
• Mortgage
• Brokerage Account
• GIC Account
• Online
• CreditCard

We classify the customers into Personal Loan = 1 and Personal Loan = 0. The dataset is randomly split into an 80% training set and a 20% testing set. Because the target is binary, we tried three classification models.
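An 80/20 split of this kind can be sketched with scikit-learn's `train_test_split`. The data below is synthetic, and the `stratify=y` and `random_state` arguments are assumptions added for illustration; the source only says the split was random. The 900-row support in the reports that follow is consistent with 20% of the 4,499 rows.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in with roughly 10% positives, like the Personal Loan column
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "Income": rng.gamma(2.0, 37.0, size=n),
    "Family": rng.integers(1, 5, size=n),
    "CCAvg": rng.gamma(1.5, 1.3, size=n),
})
df["Personal Loan"] = (rng.random(n) < 0.1).astype(int)

X = df.drop(columns="Personal Loan")
y = df["Personal Loan"]

# stratify=y keeps the share of positives similar in both parts (assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
print(len(X_train), len(X_test))
```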

#### Logistic Regression

The logistic regression model got the following classification result:

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.94 | 0.98 | 0.96 | 802 |
| 1 | 0.70 | 0.47 | 0.56 | 98 |
| accuracy | | | 0.92 | 900 |
| macro avg | 0.82 | 0.72 | 0.76 | 900 |
| weighted avg | 0.91 | 0.92 | 0.91 | 900 |

It has an overall accuracy of 92%, which is not bad. However, recall for customers who actually want a personal loan is only 47%, which means the model would miss more than half of the selling opportunities.
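To make the recall figure concrete, the class-1 row of the logistic regression report can be turned back into approximate customer counts. The inputs are the rounded values from the report, so the counts below are approximations rather than the exact confusion matrix.

```python
# Class-1 figures from the logistic regression report above
support_pos = 98      # test-set customers who actually took a loan
recall = 0.47         # TP / (TP + FN)
precision = 0.70      # TP / (TP + FP)

true_pos = round(recall * support_pos)        # loan takers the model finds
false_neg = support_pos - true_pos            # loan takers the model misses
predicted_pos = round(true_pos / precision)   # customers the model flags
false_pos = predicted_pos - true_pos          # flagged customers who decline

print(f"found {true_pos}, missed {false_neg}, wrongly flagged {false_pos}")
```

On these rounded figures the model finds about 46 of the 98 loan takers and misses about 52, which is where "more than half" comes from.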

#### Support Vector Machine

The SVM model got the following results:

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.90 | 0.99 | 0.94 | 802 |
| 1 | 0.67 | 0.10 | 0.18 | 98 |
| accuracy | | | 0.90 | 900 |
| macro avg | 0.78 | 0.55 | 0.56 | 900 |
| weighted avg | 0.88 | 0.90 | 0.86 | 900 |

The SVM (SVC) performs worse than logistic regression. It is strongly biased towards predicting Personal Loan = 0: recall for class 1 is only 10%.
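The source does not say how the SVM was configured. Two common reasons for an SVM favouring the majority class are unscaled features and class imbalance, so the sketch below compares a plain `SVC` with a scaled, class-weighted pipeline on synthetic data. This is an illustrative experiment under those assumptions, not a diagnosis of the model in this report, and the outcome depends on the data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic imbalanced data (about 10% positives) standing in for the bank data
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)
X[:, 0] *= 1000  # put one feature on a much larger scale than the others

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Default SVC on raw features
plain = SVC().fit(X_train, y_train)
# Standardized features plus class weights inversely proportional to frequency
scaled = make_pipeline(StandardScaler(), SVC(class_weight="balanced"))
scaled.fit(X_train, y_train)

plain_recall = recall_score(y_test, plain.predict(X_test))
scaled_recall = recall_score(y_test, scaled.predict(X_test))
print("class-1 recall, plain SVC:", round(plain_recall, 2))
print("class-1 recall, scaled + balanced:", round(scaled_recall, 2))
```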

#### Random forest

Finally, we tried a random forest model, which turned out to be the best of the three:

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.96 | 0.99 | 0.97 | 802 |
| 1 | 0.92 | 0.62 | 0.74 | 98 |
| accuracy | | | 0.95 | 900 |
| macro avg | 0.94 | 0.81 | 0.86 | 900 |
| weighted avg | 0.95 | 0.95 | 0.95 | 900 |

The random forest model performed best, with 95% overall accuracy and, most importantly, 62% recall on customers who actually apply for a Personal Loan, the highest of the three candidate models.
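A side-by-side comparison like the one in this part can be sketched as a loop over the three estimators. The data is synthetic and the hyperparameters (`max_iter`, `n_estimators`, `random_state`) are illustrative assumptions, so the numbers it prints will not match the reports above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic imbalanced binary problem (about 10% positives)
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    report = classification_report(y_test, model.predict(X_test),
                                   output_dict=True, zero_division=0)
    # Keep overall accuracy and recall on the positive class for comparison
    results[name] = {"accuracy": report["accuracy"],
                     "recall_1": report["1"]["recall"]}
    print(f"{name:20s} accuracy={report['accuracy']:.2f} "
          f"recall(1)={report['1']['recall']:.2f}")
```

Comparing class-1 recall alongside accuracy matters here because, with about 11% positives, a model that always predicts 0 would already reach roughly 89% accuracy.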