Random Forest — Business Insights

Draw Business Insights from RF
1. Variable importance: Look at the ranking of important variables. If the top variable is the least actionable one (that is, the company has no way to change it), drop it and rebuild the RF. Check whether the top variables are continuous or categorical; continuous variables tend to show up …
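A minimal sketch of the "rank, drop the unactionable variable, rebuild" loop with scikit-learn, assuming a binary target. The data is synthetic and every column name (`customer_age`, `discount_pct`, `support_calls`, `region`, `churned`) and the helper `ranked_importance` are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical, synthetic churn-style data (column names are illustrative only)
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "customer_age": rng.integers(18, 80, n),   # assumed NOT actionable by the company
    "discount_pct": rng.uniform(0, 30, n),     # assumed actionable
    "support_calls": rng.poisson(2, n),        # assumed actionable
    "region": rng.integers(0, 4, n),           # categorical, label-encoded as 0..3
})
# Synthetic target: driven mostly by age, partly by discount and support calls
logit = (0.08 * (df["customer_age"] - 50)
         - 0.1 * df["discount_pct"]
         + 0.3 * df["support_calls"])
df["churned"] = (logit + rng.normal(0, 1, n) > 0).astype(int)


def ranked_importance(data, features, target="churned"):
    """Fit a random forest and return impurity-based importances, highest first."""
    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    rf.fit(data[features], data[target])
    return pd.Series(rf.feature_importances_, index=features).sort_values(ascending=False)


features = ["customer_age", "discount_pct", "support_calls", "region"]
print(ranked_importance(df, features))

# If the top variable cannot be changed by the business, drop it and rebuild the forest
non_actionable = {"customer_age"}
actionable = [f for f in features if f not in non_actionable]
print(ranked_importance(df, actionable))
```

`feature_importances_` is impurity-based, and scikit-learn's documentation notes that this measure favors features with many distinct values, which is one reason continuous variables can crowd the top of the ranking. `sklearn.inspection.permutation_importance` is an alternative worth comparing against when that bias matters.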

Random Forest — Method and Application (Python)

Advantages of RF: little time is needed for tuning, since the default parameters are usually good enough; it is robust to outliers and correlated variables; and it segments continuous variables on its own, because each split chooses its own threshold.
Method:
1. Create a bootstrapped dataset (sample with replacement).
2. Build a decision tree on the bootstrapped dataset, but consider only a random subset of the variables at each split …
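The two method steps can be sketched by hand with scikit-learn decision trees, assuming a binary 0/1 target. The helpers `fit_forest` and `predict_forest` are hypothetical names; the aggregation step (majority vote) goes beyond the excerpt and is the standard way a forest combines its trees. Note that scikit-learn's own `RandomForestClassifier` averages predicted class probabilities rather than counting hard votes, so its results will differ slightly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=50, seed=0):
    """Hand-rolled forest: bootstrap rows, random feature subset at every split."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        # Step 1: bootstrapped dataset, n rows sampled with replacement
        idx = rng.integers(0, n, n)
        # Step 2: a tree that considers only sqrt(n_features) candidate variables per split
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1_000_000)))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_forest(trees, X):
    """Aggregate by majority vote across trees (labels assumed to be 0/1)."""
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)


X, y = make_classification(n_samples=500, n_features=10, random_state=0)
trees = fit_forest(X[:400], y[:400])
acc = (predict_forest(trees, X[400:]) == y[400:]).mean()
print(f"hand-rolled forest accuracy: {acc:.2f}")

# Library version with default parameters, for comparison
rf = RandomForestClassifier(random_state=0).fit(X[:400], y[:400])
print(f"RandomForestClassifier accuracy: {rf.score(X[400:], y[400:]):.2f}")
```

`max_features="sqrt"` is what makes each split look at only a random subset of variables; in recent scikit-learn versions it is also the classifier's default, which is part of why the defaults tend to work without much tuning.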

EDA and Feature Engineering

Data Preparation
Before building an optimization or recommendation model, we need to make sure our data is in “ready-to-go” status. Here, I summarize some ways to clean data for future reference:
1. Descriptive statistics
2. Query and merge
3. Group and plot data
4. Fill NA, replace, and assign values
5. Data transformation on columns (log, datetime)
6. Check …
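A compact pandas sketch of the listed steps, assuming two small hypothetical tables (`orders`, `customers`) whose columns are invented for illustration. Plotting is left out to keep the example dependency-free; a grouped result like `summary` is what would normally be passed to a chart.

```python
import numpy as np
import pandas as pd

# Hypothetical tables; all column names and values are illustrative
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "customer_id": [10, 11, 10, 12, 13],
    "amount": [120.0, np.nan, 35.5, 980.0, 60.0],
    "order_date": ["2021-01-03", "2021-01-15", "2021-02-01", "2021-02-20", "2021-03-05"],
    "channel": ["web", "app", "Web", "store", "app"],
})
customers = pd.DataFrame({"customer_id": [10, 11, 12], "segment": ["A", "B", "A"]})

# 1. Descriptive statistics
print(orders.describe())

# 2. Query and merge (a left join keeps orders whose customer is missing)
big_orders = orders.query("amount > 50")
df = orders.merge(customers, on="customer_id", how="left")

# 4. Fill NA, replace, and assign values
df["amount"] = df["amount"].fillna(df["amount"].median())
df["segment"] = df["segment"].fillna("unknown")
df["channel"] = df["channel"].replace({"Web": "web"})
df = df.assign(is_large=df["amount"] > 500)

# 5. Column transformations: log and datetime
df["log_amount"] = np.log1p(df["amount"])
df["order_date"] = pd.to_datetime(df["order_date"])
df["order_month"] = df["order_date"].dt.month

# 3. Group (the result is what you would plot)
summary = df.groupby("segment")["amount"].agg(["count", "mean"])
print(summary)
```

`np.log1p` is used instead of `np.log` so that zero amounts would not produce `-inf`.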