
Nexosis @ Work & Play

Building the Bigfoot Classinator: The Model

March 1, 2018 @ 2:56 PM | Musings, Technical


In part 3 of Building the Bigfoot Classinator series, Guy discusses how he built and tested the model.


This is part 3 of a four-part series where I explain how I built the Bigfoot Classinator. In part 1, I talked about the problem we are solving and the approach I used. In part 2, I told you how I munged the data and got it loaded into the Nexosis API. In this part, I'll show you how I built and tested the model.

Go back and read the other parts if you need some catching up. Or blithely plod forward with no guidance from the past. I'm not gonna tell you how to live your life.


It's only a model

Models are the things in machine learning that do the predicting. They are, in a sense, an intersection of your data and an algorithm. Typically, when you build a model, you divide your data into training data and test data, you pick an algorithm and adjust its parameters, and then you build a model. Usually, your first model is, shall we say, suboptimal. And so you tweak your data and your parameters and build it again. And again. And again. Eventually, you get something that is good, or at least good enough.

With the Nexosis API, we do the grunt work for you. So, all you need to do is make an HTTP request to start a model building session. To kick off the model build for the Bigfoot Classinator, I used Postman to make an HTTP POST request against the following URL.

https://ml.nexosis.com/v1/sessions/model

The body of this request provided instruction to the API on the type of model I wanted to build, the field that was to be targeted, and the data source for the model to be built from.

{
    "dataSourceName": "bigfoot-classes",
    "targetColumn": "reportClass",
    "predictionDomain": "classification"
}

The dataSourceName contains the name of the dataset I uploaded. It could also name a view, which joins multiple datasets. This is why it's called dataSourceName and not dataSetName.

The targetColumn is the column I wanted to predict. I wanted to predict the reportClass so I put that in there. Nothing fancy here.

The predictionDomain contains the type of machine learning problem I wanted to solve. Since I was predicting a class, I chose classification. Other options include regression (if I wanted to predict a number instead of a class) and anomalies (if I wanted to find anomalous records). I didn't want those other things for this application, so classification it is.

As always, if you are following along, don't forget to include HTTP headers for your Content-Type of application/json and your api-key.
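
If you would rather script this than click through Postman, here is a minimal Python sketch of the same request using only the standard library. The function name build_model_session_request and the placeholder key are made up for illustration; the URL, body fields, and header names are the ones described above. The request is only constructed here, and the commented-out lines at the bottom show where it would actually be sent.

```python
import json
import urllib.request

SESSIONS_MODEL_URL = "https://ml.nexosis.com/v1/sessions/model"


def build_model_session_request(api_key, data_source_name, target_column,
                                prediction_domain="classification"):
    """Build (but do not send) the POST that starts a model building session."""
    body = {
        "dataSourceName": data_source_name,
        "targetColumn": target_column,
        "predictionDomain": prediction_domain,
    }
    return urllib.request.Request(
        SESSIONS_MODEL_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# "YOUR-API-KEY" is a placeholder, not a real key.
request = build_model_session_request("YOUR-API-KEY", "bigfoot-classes", "reportClass")

# To actually send it (requires a real key and network access):
# with urllib.request.urlopen(request) as response:
#     session = json.load(response)
```

Keeping the construction separate from the sending makes it easy to inspect the URL, headers, and body before anything goes over the wire.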


Wait

Once I submitted this request, I got the following response. Well, not exactly this response. I've trimmed some of the extra bits from it that you don't need to worry about for today. Don't fret. If you're following along, you'll get to see them.

{
    "sessionId": "0161d8d2-771d-46f0-a9db-f9ddcc53ba50",
    "type": "model",
    "status": "requested",
    "predictionDomain": "classification",
    "requestedDate": "2018-02-27T19:52:07.165831+00:00",
    "statusHistory": [
        {
            "date": "2018-02-27T19:52:07.165831+00:00",
            "status": "requested"
        }
    ],
    "dataSourceName": "bigfoot-classes",
    "targetColumn": "reportClass"
}

A lot of what I got back was stuff I put in it. We can see the dataSourceName, targetColumn, and predictionDomain that I entered earlier. We also get a sessionId, a status, and statusHistory. What is all that?

Here's the thing: sometimes it can take a while to build a model. And the more data, the longer it takes. So, instead of keeping an HTTP connection open that long, with all that implies for server-side threading models, we return a sessionId that can be used for polling the server to get the status of the session.

Of course, I wanted to check the status immediately, so I did. I just made a simple HTTP GET to this URL replacing <sessionId> with the sessionId I got back.

https://ml.nexosis.com/v1/sessions/<sessionId>

And I got this result, trimmed as before, in return.

{
    "sessionId": "0161d8d2-771d-46f0-a9db-f9ddcc53ba50",
    "type": "model",
    "status": "started",
    "predictionDomain": "classification",
    "requestedDate": "2018-02-27T19:52:07.165831+00:00",
    "statusHistory": [
        {
            "date": "2018-02-27T19:52:07.165831+00:00",
            "status": "requested"
        },
        {
            "date": "2018-02-27T19:52:07.9877818+00:00",
            "status": "started"
        }
    ],
    "dataSourceName": "bigfoot-classes",
    "targetColumn": "reportClass"
}

We can see now that the model building session had started. It was time to wait.
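
Waiting by hand gets old, so here is a rough Python sketch of a polling loop. The helper names fetch_session and wait_for_session, the 30-second interval, and the cap on checks are all my own choices for illustration. The sketch also assumes that any status other than requested or started means the session is finished one way or another; only requested, started, and completed appear in this post, so treat that as a guess.

```python
import json
import time
import urllib.request

SESSION_URL = "https://ml.nexosis.com/v1/sessions/{session_id}"


def fetch_session(session_id, api_key):
    """GET the current session document from the API."""
    request = urllib.request.Request(
        SESSION_URL.format(session_id=session_id),
        headers={"api-key": api_key},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_session(get_status, interval_seconds=30, max_checks=120,
                     sleep=time.sleep):
    """Call get_status() until the session leaves 'requested' or 'started'."""
    for _ in range(max_checks):
        session = get_status()
        # Assumption: anything other than these two statuses is a final state.
        if session["status"] not in ("requested", "started"):
            return session
        sleep(interval_seconds)
    raise TimeoutError("session did not finish within the allotted checks")


# Usage against the real API (needs a real session ID, key, and network access):
# final = wait_for_session(lambda: fetch_session("YOUR-SESSION-ID", "YOUR-API-KEY"))
```

Passing the status check in as a function keeps the loop independent of the HTTP call, which makes it easy to try out with canned responses.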


Testing the model

Whew! This model took about 40 minutes to build. Here's the final session status.

{
    "sessionId": "0161d8d2-771d-46f0-a9db-f9ddcc53ba50",
    "type": "model",
    "status": "completed",
    "predictionDomain": "classification",
    "modelId": "eceb8e53-706c-49e3-933a-d8b3fed1b871",
    "requestedDate": "2018-02-27T19:52:07.165831+00:00",
    "statusHistory": [
        {
            "date": "2018-02-27T19:52:07.165831+00:00",
            "status": "requested"
        },
        {
            "date": "2018-02-27T19:52:07.9877818+00:00",
            "status": "started"
        },
        {
            "date": "2018-02-27T20:31:09.8452378+00:00",
            "status": "completed"
        }
    ],
    "dataSourceName": "bigfoot-classes",
    "targetColumn": "reportClass"
}
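
As an aside, the statusHistory in that final status is enough to check the "about 40 minutes" figure. The Python sketch below subtracts the started timestamp from the completed one. The parse_timestamp helper is my own: some of these timestamps carry seven fractional digits, which older versions of Python's datetime.fromisoformat reject, so it trims them to six first.

```python
import re
from datetime import datetime


def parse_timestamp(text):
    """Parse an ISO 8601 timestamp, trimming fractional seconds to 6 digits."""
    trimmed = re.sub(r"(\.\d{6})\d+", r"\1", text)
    return datetime.fromisoformat(trimmed)


# The statusHistory entries from the final session status.
status_history = [
    {"date": "2018-02-27T19:52:07.165831+00:00", "status": "requested"},
    {"date": "2018-02-27T19:52:07.9877818+00:00", "status": "started"},
    {"date": "2018-02-27T20:31:09.8452378+00:00", "status": "completed"},
]

times = {entry["status"]: parse_timestamp(entry["date"]) for entry in status_history}
build_time = times["completed"] - times["started"]
print(f"Build took about {build_time.total_seconds() / 60:.0f} minutes")
```

That works out to roughly 39 minutes between started and completed.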

You will note a new field on the final session status: the modelId. It identifies the model that has been created, and that model can be tested using Postman. So, I did just that. I hit the following URL with an HTTP POST, replacing <modelId> with the returned modelId.

https://ml.nexosis.com/v1/models/<modelId>/predict

For the body of the request, I provided reports of two Bigfoot sightings in an array. Personally, I think these are great works of fiction. No commentary on Bigfoot here, it's just that I made them up.

{
    "data": [
        { "observed": "I saw Bigfoot in the woods." },
        { "observed": "I found a big footprint by the woodshed." }
    ]
}

The result, which returned immediately, echoed back the observed sightings text and the reportClass the model predicted for that sighting text.

{
    "data": [
        {
            "observed": "I saw Bigfoot in the woods.",
            "reportClass": "class a"
        },
        {
            "observed": "I found a big footprint by the woodshed.",
            "reportClass": "class b"
        }
    ]
}

Seeing Bigfoot in the woods is certainly a Class A sighting. Finding a big footprint is a Class B sighting. Looks like it worked. Hooray!
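
The same predict call can be sketched in Python. The helper names build_predict_request and summarize_predictions are invented for this example, and the key and model ID below are placeholders. The sketch builds the request without sending it, then reads the response shown above in place of a live call.

```python
import json
import urllib.request

PREDICT_URL = "https://ml.nexosis.com/v1/models/{model_id}/predict"


def build_predict_request(api_key, model_id, sightings):
    """Build (but do not send) the POST that asks the model for predictions."""
    body = {"data": [{"observed": text} for text in sightings]}
    return urllib.request.Request(
        PREDICT_URL.format(model_id=model_id),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


def summarize_predictions(response_body):
    """Pair each sighting's text with the class the model predicted for it."""
    return [(row["observed"], row["reportClass"]) for row in response_body["data"]]


# The response from this post, used here instead of calling the API.
sample_response = {
    "data": [
        {"observed": "I saw Bigfoot in the woods.", "reportClass": "class a"},
        {"observed": "I found a big footprint by the woodshed.", "reportClass": "class b"},
    ]
}

for observed, report_class in summarize_predictions(sample_response):
    print(f"{report_class}: {observed}")
```

To run it for real, pass the result of build_predict_request to urllib.request.urlopen and feed the decoded JSON to summarize_predictions.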


Next time on the Bigfoot Classinator

Next time, I'll be posting the exciting conclusion to this series where I write some code around this model to make something an end user would actually use. As always, the code for this is on GitHub. Go check it out.



Guy Royse

Guy is one of our developer evangelists at Nexosis. He spends his days sharing with developers why our API is so great and his nights reminiscing about Hogwarts and dreaming of retiring to his dream job: Santa Claus.