In a recent post we discussed data preparation for machine learning algorithms and feature selection at a high level. In this post, we discuss feature engineering methods in greater detail.
There are three feature engineering methods we are going to cover with simple examples: bucketing, crossing, and embedding. We will use a toy dataset I created called "Height Data."
Bucketing is a type of feature engineering that is related to one-hot encoding. Whereas one-hot encoding maps categorical features to bits in a bit string, bucketing maps ranges of a variable into a bit string.
Example: Suppose you want to predict height in inches. Age is an obvious feature to include in your model, but if you think closely about height and age, the relationship between them is not linear:
- Variance in height between years is greater before the age of 18 because infants, children, and teenagers are growing rapidly.
- Height generally levels off from 20 to 50 years old once adults reach peak height.
- Height decreases after the age of 50 due to aging changes in bones, muscles, and joints.
Grouping age into buckets allows a model to capture the relationship between age and height by learning a separate weight for each bucket. See the new bucketed data below.
Now we are able to model a different relationship between age and height in each bucket. Of course, the buckets here are arbitrary; yours are best decided by your understanding of, and best guesses at, the relationships in your data. This is where experimenting with multiple models pays off: comparing different bucketings helps you arrive at the single model you want to keep using.
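As a minimal sketch of the bucketing idea, the snippet below maps an age to a bit string with one bit per bucket. The bucket edges (18 and 50) and the bucket names are assumptions chosen to roughly follow the three age ranges described above; they are not taken from the Height Data table.

```python
from bisect import bisect_right

# Hypothetical bucket edges in years; tune these to your own data.
AGE_EDGES = [18, 50]
BUCKET_NAMES = ["age_under_18", "age_18_to_49", "age_50_plus"]


def bucket_age(age):
    """Return a one-hot list with a 1 at the position of the age's bucket."""
    # bisect_right counts how many edges are <= age, which is the bucket index.
    index = bisect_right(AGE_EDGES, age)
    return [1 if i == index else 0 for i in range(len(BUCKET_NAMES))]


for age in (7, 35, 72):
    print(age, dict(zip(BUCKET_NAMES, bucket_age(age))))
```

Each age sets exactly one bit, so a linear model can learn one weight per bucket instead of a single slope for age.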
Crossing is a feature engineering technique where you combine two or more existing features into a new feature. Some models, like Factorization Machines and Neural Networks, do this sort of feature engineering automatically; it's inherent to the structure of the model.
Example: If you're trying to predict height, you would likely consider gender because height varies between males and females. Below, age has been crossed with gender to create new features.
The age range buckets are similar to what we did above. However, each bucket is now split by gender within the same encoded column, which doubles the number of features: one age range per gender. In your own data a variable may have many more values, increasing the number of features created by this method even further. Whether or not that growth in features is useful is, again, a good candidate for trial and error. The Nexosis API will also attempt feature reduction for you, so don’t be afraid to throw lots of features in there.
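A crossed feature can be sketched in a few lines. The example below crosses a bucketed age with gender and one-hot encodes the result; the bucket boundaries, the gender labels, and the column names are all hypothetical and only illustrate how the feature count grows.

```python
from itertools import product

# Hypothetical categories; the real dataset may use different values.
AGE_BUCKETS = ["under_18", "18_to_49", "50_plus"]
GENDERS = ["female", "male"]

# One crossed column per (gender, age bucket) pair: 2 x 3 = 6 features.
CROSSED_NAMES = [f"{g}_{a}" for g, a in product(GENDERS, AGE_BUCKETS)]


def age_bucket(age):
    """Map an age in years to its (assumed) bucket name."""
    if age < 18:
        return "under_18"
    if age < 50:
        return "18_to_49"
    return "50_plus"


def cross_features(age, gender):
    """Return a one-hot list over all gender x age-bucket combinations."""
    if gender not in GENDERS:
        raise ValueError(f"unknown gender label: {gender!r}")
    active = f"{gender}_{age_bucket(age)}"
    return [1 if name == active else 0 for name in CROSSED_NAMES]


print(CROSSED_NAMES)
print(cross_features(35, "male"))
```

The number of crossed columns is the product of the category counts, which is why crossing variables with many values can inflate the feature set quickly.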
Embedding is a more sophisticated feature engineering technique. Embedding methods look for low-dimensional representations of high-dimensional data.
Example: If you are predicting height you might not care about geographic location by city. An embedding method would map cities to a new state feature as seen in the table below.
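The city-to-state mapping can be sketched as a simple lookup. The cities below are illustrative and are not taken from the Height Data set; the `"unknown"` fallback is also an assumption about how you might handle cities missing from the lookup.

```python
# Illustrative lookup table; extend it with the cities in your own data.
CITY_TO_STATE = {
    "Columbus": "Ohio",
    "Cleveland": "Ohio",
    "Austin": "Texas",
    "Dallas": "Texas",
}


def add_state(rows):
    """Return copies of the rows with a coarser 'state' feature added."""
    return [
        {**row, "state": CITY_TO_STATE.get(row["city"], "unknown")}
        for row in rows
    ]


rows = [{"city": "Columbus"}, {"city": "Dallas"}, {"city": "Springfield"}]
for row in add_state(rows):
    print(row)
```

Four distinct cities collapse into two states here, so a one-hot encoding of the new feature needs fewer columns than one of the original.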
Other embedding methods create new features that are linear combinations of existing ones.
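To make the linear-combination idea concrete, here is a minimal sketch that projects three input features down to two new ones. The weights are hand-picked placeholders; in practice they would typically be learned from the data, for example by principal component analysis.

```python
def linear_combinations(row, weights):
    """Project one row of features onto new features.

    Each new feature is the weighted sum of the original features,
    using one list of weights per new feature.
    """
    return [sum(w * x for w, x in zip(ws, row)) for ws in weights]


# Hypothetical weights: two new features built from three original ones.
WEIGHTS = [
    [0.5, 0.5, 0.0],  # average of the first two features
    [0.0, 0.0, 1.0],  # third feature passed through unchanged
]

row = [1.0, 2.0, 3.0]
print(linear_combinations(row, WEIGHTS))
```

The output has one value per weight list, so choosing fewer weight lists than input features yields a lower-dimensional representation.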
In the next post of our data education series we will discuss the importance of experimentation in machine learning.