1. Funnel Analysis
Chose which metrics:
- Click through rate (Growth):
- (i.e. how many customer clicked on it after seeing the ad)
- # clicked / # shown
- Goal: bring customer to the site.
- Pros:
- able to identify demand from users
- allow estimate to test and improve ad features to maximize CTR (since it’s only take account user behavior after seeing ad)
- Cons:
- No information of the 2nd layer and 3rd layer behavior, thus does not know what user does once they come to the site
- Cost of bringing user to site is not considered
- User conversion remains unsolved, even with high CTR
- (might be click-bait ads)
- (i.e. how many customer clicked on it after seeing the ad)
- Conversion Rate (Growth and Product)
- (i.e. how many customer converted after seeing the ad)
- # converted / # shown
- Pros:
- the final goal of an ad is to get customer, this metric is represents it
- Cons:
- No info about cost
- Highly depend on product: Ad could be great, fair product, low conversion
- Conversion might happen after a long time, even after visit multiple times(it’s hard to link conversion to click)
- Profit, revenue from conversions (Business / Product)
- Not useful for growth, used by large companies (main goal: being profitable)
- total_revenue – clicked * avg_cost_per_click
- Pros:
- Business driven
- Cons:
- Profit might happen after a long time(hard to link profit to click)
- Only consider users after they clicked on ad
- No information about top funnel (people see ad did not click)
a. Data:
Two format of data is available for funnel analysis.
- each row represents a user visited website
- column: user_id, layer1, layer2, layer3, date
- Value: layer1, layer2, layer3 is binary
- if one user was only able to get to layer 2, then it’s 1, 1, 0
- each row represents a campaign or ad
- column: ad, layer1 , layer2, layer3, date
- Value: layer1, layer2, layer3 is continuous variable indicates total amount of user able to get to that layer on that day
b. Steps:
- data preparation: merge / logic check
- note: always check with the flow of funnel
- i.e. layer1 > layer2 > layer3
- plot a total count of visit vs date
- for each campaign / ads/ device/ feature you are interested in
- check for sudden drop, it’s usually a bug, or error in data collection
- if count after sudden drop went back to normal:
- remove those drops using count > 0
- if count after sudden drop do not went back to normal:
- bug need to fix
- if count after sudden drop went back to normal:
# some useful code
# 1. plot date vs layer, hue = interested_feature
data = data.query('layer1 > 0')
g=sns.lineplot(x="date", y="layer1",hue="interested_feature", data=data, legend="")
g.xaxis.set_major_locator(mdates.MonthLocator())
g.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))
g.set_title("Layer1")
plt.show()
# 2. plot date vs multiple layers, hue = interested_feature
#funnel as a time series
funnel = data.groupby(['date', 'interested_feature']).apply(
lambda x: pandas.Series({
'to_layer2' : x['layer2'].mean()/x['layer1'].mean(),
'to_layer3': x['layer3'].mean()/x['layer2'].mean(),
'to_layer4': x['layer4'].mean()/x['layer3'].mean()
})).reset_index()
# Make the dataset long for the plot
funnel = pandas.melt(funnel,id_vars=['date', 'interested_feature'])
# Plot it
# the row will be to_layer2, 3, 4
g = sns.FacetGrid(funnel, hue="interested_feature", row="variable", aspect=4, sharey=False)
g.map(sns.lineplot, "date", "value")
g.set_axis_labels("", "Conversion Rate")
g.set(xticks=funnel.date[2::30].unique())
plt.xticks(rotation=45)
plt.legend(loc='upper right')
plt.show()

- draw business insights:
- Random Forest (Var Imp + PDP) + Decision Tree
- If only a few feature is available,
- for each feature
- if it’s binary, use 2-tail t-test
- if multiple level, multiple comparison using Bonferroni Correction
- simply group data based on interested feature and see count and mean for each layer
- for each feature
- Note: sometimes difference might come from the other feature
- i.e. when grouping, group by 2 features to investigate where the difference come from.
- Remember, a feature could always be a proxy for the other
test = stats.ttest_ind(data.loc[data['interested_feautre'] == 'level1']['layer3'],
data.loc[data['interested_feautre'] != 'level1']['layer3'], equal_var=False)
test
- Some useful index:
- CTR: clicked number / shown number
- Profit: Total Revenue – Clicked * Average cost per click
- Count for layer interested
- Some useful model:
- RF for business insights
- Time series for future prediction
c. General rule:
- More mobile user than desktop
- Growing mobile should be priority for company
- If one device has lower conversion rate:
- Product
- Product team look into the difference among different version and see if UI/UX can be improved
- Marketing
- Different platform may attract different users
- Maybe a proxy for user distribution
- If a platform has higher conversion rate but with low amount of user:
- Maybe it’s only attracting power users, who is likely to convert
- Note:
- this will be more reasonable
- if one platform has lower absolute amount of users
- Solution:
- Collect more data about users,
- find if user distribution is the same across the devices
- Same: product problem
- Not same: not product problem
- Product
- Sudden drop of Conversion Rate:
- Bug / Bad product changes
- Expected Bouncer from home page to search page: 0.5
- If rate is not reasonable for Search page to payment page:
- Look into the characteristics of search queries
- See if highly searched item is not available in our site
- Understand lack of supply compared to user demand.
2. Subscription Retention Problem
- Data:
- Pick a interested billing cycling
- Collect user information subscribed in that same billing cycling
- Column: user_id, sign_up_date, service_rate, ….., billing cycles
- Value: billing cycles is continuous, indicates how many billing cycle that user stays, a.k.a., how long that user subscribed to our service.
- Terms:
- Retention Rate for a billing cycle :
- How many people is still subscript to service at the end of that billing cycle
- Life Time Value (LTV)
- Can be defined as one year average customer value
- i.e. for data less than one year and model is really good,
- Predict monthly retention
- Sum it up for 12 month, get the yearly average customer value
- (Note: if it’s subscription retention problem, need to add one month fee after sum)
- Can be defined as one year average customer value
- Retention Rate for a billing cycle :
- Insights:
- Build a table with interested feature + retention rate + count + revenue
- Retention rate: percentage of current active user
- Count: total amount of user
- Revenue: average of customer revenue (avg(price * month for each customer))
- Build a table with interested feature + retention rate + count + revenue
- Plot with y= i or ii or iii, x = interested feature, hue = different pricing
- Focus on Revenue plot:
- This will give general idea of LTV for each level
- See which level is doing the best, and pay attention to abnormal levels
- Combine with retention rate plot:
- Is a particular level is doing well because of a high retention rate?
- Focus on Revenue plot:
- Goal:
- Shift users towards the optimal price point ,
- Thus increase LTV, revenue gain, more click on ads, grow faster
- Solution
- UI design:
- Change the UI so that, for each level, the optimal price (highest LTV) is more prominent
- Make the best price has most attractive features:
- Look into how users from each level are using the product
- Figure out the tool features they like the most,
- So that best LTV price has the most attractive tool for each level
- Coupon:Targeted / regional coupons
- Example:
- Levels with low count but high LTV and retention_rate:
- Big opportunity, try to increase those customer
- (discount, campaign)
- UI design: