Explanation of Dataset
data-set There is only one table named Payments. The table tracks every time someone pays for a service. Every row represents a payment that was made ···
There is only one table named Payments. The table tracks every time someone pays for a service. Every row represents a payment that was made including the time the payment was made, the id of the user that made the payment and the amount that the payment was for
Question to answer data-set
We are interested in seeing the Average Revenue Per Customer (ARPC) over time (up to 365 days). This is used to calculate how much revenue we can expect to get from a customer after a certain amount of time.
In order to calculate the average revenue per customer at a particular day (say day n), we need to calculate 2 things:
- How many customers are at least n days old (at least n days between first purchase and today)?
- Of those customers above, what was the cumulative total of purchases in their first n days (all purchases made within n days of their first purchase).
Taking answer 2 and dividing by answer 1 gives us the the average revenue per customer at n days – Hint: you can use the psql function generate_series to create the numbers between 1 and 365
Example data-set
As an example (not representative of the actual dataset):
Assume we have 10 customers.
1.2 of these customers’ oldest purchase is 25 days ago
2.8 of these customers’ oldest purchase is 50 days ago
3.All 10 customers have an age of at least 25 days
If we want to calculate the ARPC at 25 days, we
- take all customers with an age of at least 25 days (10 customers)
- sum up their purchases that ocurred within their first 25 days as a customer
- divide by the number of customers with an age of at least 25 days
If we want to calculate the ARPC at 50 days, we
- take all customers with an age of at least 50 days (8 customers)
- sum up their purchases that ocurred within their first 50 days as a customer
- divide by the number of customers with an age of at least 50 days
Note: Notice how the purchases of the 2 customers with an age of 25 days are not included in the calcuation of ARPC at 50 days.
Expected Output data-set
The output should be in the form:
Where $x represents the calculation above at day 0, $y represents the calculation at day 1, $z represents it at day 365 etc