As a wine fan, Leah was excited to find a wine dataset she could use to build a model that predicts wine quality. The end result didn't go as she expected, and she explains why.
It can be intimidating to be a non-developer in a developer-heavy work environment, especially when our machine learning platform requires developer expertise.
I’ve recently been learning basic coding, and I wanted to prove that I could find a dataset, build a model using our platform, and make predictions, all on my own! I found a wine dataset that piqued my interest: it includes metrics such as acidity, pH, alcohol content, and sugar levels, along with a quality score from 1 to 10 for each wine. I'm a bit of a wine connoisseur, so this was right up my alley!
I want to use a red wine dataset to build a model so that I can later plug in the measurements of my favorite red wines and predict their quality score (1 to 10). I also want to plug in metrics from wines I haven't tried before, predict their quality scores, and, of course, taste test them to confirm. My question takes the form "What is the quality of this wine?", and the answer is a number, so this is a regression problem.
The data is available from the UC Irvine Machine Learning Repository. The CSV file was actually delimited with semicolons rather than commas, so I used Visual Studio Code to swap them for commas. We will cover common data formatting problems like this in a future blog post, so stay tuned.
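If you would rather script the delimiter fix than edit the file by hand, here is a minimal Python sketch using only the standard library's `csv` module. It assumes the file has double-quoted headers and semicolons between fields; the function name `semicolon_to_comma` and the sample text are hypothetical, made up for illustration.

```python
import csv
import io


def semicolon_to_comma(text: str) -> str:
    """Rewrite semicolon-delimited CSV text as comma-delimited CSV text."""
    # Parse fields using ';' as the separator; quoted headers are unquoted here.
    reader = csv.reader(io.StringIO(text), delimiter=";")
    out = io.StringIO()
    # Write the same rows back out with the default ',' separator.
    # Fields are re-quoted only if they contain special characters such as ',' or '"'.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Hypothetical sample with a shortened header, shaped like the red wine file.
sample = '"fixed acidity";"volatile acidity";"quality"\n7.4;0.7;5\n'
print(semicolon_to_comma(sample), end="")
```

To convert a whole file, open it with `open(path, newline="")`, pass its contents through the function, and write the result the same way. If you use pandas, `pd.read_csv(path, sep=";")` reads the original file directly with no conversion step.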
Data upload and model creation
- Using Postman, I uploaded the CSV and named the dataset "wine".
- I then started a session to build a regression model.
- Our platform automatically identified the columns as numeric data types and split the data into training and test sets.
- It then tried hundreds of algorithms, tuning them and selecting the model that best fit the data.
- Once I had a model, I requested its confidence metrics to see whether it was any good. The mean absolute percentage error (MAPE) was 7%, meaning that, on average, the model's predictions are off by 7% of the actual quality score. Not bad!
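For readers curious about the metric itself, here is a short Python sketch of the standard mean absolute percentage error formula: the average of |actual - predicted| / |actual|, multiplied by 100. We assume the platform uses this standard definition; the quality scores and predictions below are hypothetical, not output from the model.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Return MAPE as a percentage: mean(|a - p| / |a|) * 100."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            # Percentage error is undefined when the true value is zero.
            raise ValueError("MAPE is undefined when an actual value is 0")
        total += abs(a - p) / abs(a)
    return 100.0 * total / len(actual)


# Hypothetical true quality scores and model predictions.
actual = [5, 6, 7, 5]
predicted = [5.4, 5.7, 6.5, 5.2]
print(f"MAPE: {mean_absolute_percentage_error(actual, predicted):.2f}%")  # → MAPE: 6.04%
```

Read the other way around: for a wine whose true score is 6, a 7% error works out to about 0.42 points (6 × 0.07).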
Now the fun stuff...but wait
I am finally ready to make wine quality predictions with my regression model! But wait: to get a prediction, I need the same features the model was trained on, such as acidity and sulfur dioxide levels, and these are nowhere to be found on a wine label. I scoured the internet and came up empty. I found an article explaining how to measure acidity with a titration kit, but it turns out there are several different acidity metrics (the dataset alone has fixed acidity, volatile acidity, and citric acid), AND a titration kit doesn't help me get the other variables I need to make a prediction.
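The problem boils down to a simple check: a model can only score a wine if every feature it was trained on is supplied. Here is a minimal Python sketch of that check. The feature names are the 11 input columns of the UCI red wine file; the `missing_features` helper and the label dictionary are hypothetical, for illustration only.

```python
# The 11 input columns of the UCI red wine quality dataset ("quality" is the target).
REQUIRED_FEATURES = [
    "fixed acidity",
    "volatile acidity",
    "citric acid",
    "residual sugar",
    "chlorides",
    "free sulfur dioxide",
    "total sulfur dioxide",
    "density",
    "pH",
    "sulphates",
    "alcohol",
]


def missing_features(wine: dict) -> list:
    """Return the required features that have no value for this wine."""
    return [name for name in REQUIRED_FEATURES if wine.get(name) is None]


# Hypothetical: a bottle label usually gives the alcohol content and none of the other measurements.
from_label = {"alcohol": 13.5}
gaps = missing_features(from_label)
print(f"{len(gaps)} of {len(REQUIRED_FEATURES)} features missing: {gaps}")
```

Only when this list comes back empty does it make sense to send the wine to the model for a prediction.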
Regardless of the outcome of this project, I am still excited to say that, as a non-developer, I was able to use our machine learning platform to successfully create a model. I also learned that for a model to be useful, I must have access to every input it requires to make a prediction. If anyone knows the acidity of a Cabernet (my favorite), I’d love to predict its quality score and taste test it.