The wings of a computer engineer
Personal blog for Timothy D Meadows II

ʍɐɔ ʍɐɔ ʍɐɔ

Language Understanding, Part 2

Timothy D Meadows II

In the last article on language understanding, we went over the basic concepts of what language understanding can be and how it might apply to an intelligent application that accepts free-form user input.

Microsoft .NET provides a host of ways to train and consume AI models, both online (Azure) and offline (ML.NET). By default, all code samples use CPU pipelines for maximum compatibility. However, if you have a modern GPU that supports CUDA, you should be able to use GPU pipelines alongside CPU pipelines with minimal code changes, at least for the trainers that support it.

Requirements

First, we need to create an empty C# project targeting .NET 6 or higher in our editor of choice (mine is often Visual Studio).

Next, we need to install the packages we require. In this tutorial that's only one package. You can install it from the NuGet Package Manager Console, or by searching for the package by name under "Manage NuGet Packages" in your IDE.

Install-Package Microsoft.ML  

Models

Not to be confused with AI models, most modern programming paradigms also have a concept known as "models". These models are structures meant to represent how pieces of data look.

When working with AI models you will almost always need a minimum of two models.

  1. A model that represents the structure of our input data.
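using Microsoft.ML.Data; // provides LoadColumn and ColumnName

public class InputData
{
    // The text the user typed.
    [LoadColumn(0)]
    public string Text;

    // The intent we want the model to learn for that text.
    [LoadColumn(1)]
    public string Label;
}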

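Note: LoadColumn specifies the zero-based index of the column each member is read from when the data is loaded from a file. It has no effect on the in-memory list we use later in this article.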

  2. A model that represents the structure of our output data.
public class OutputData  
{
    [ColumnName("PredictedLabel")]
    public string Prediction { get; set; }
}

Note: ColumnName maps the Prediction property to the column named PredictedLabel in the pipeline's output, which is the column we want to read.
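LoadColumn matters mostly when the rows come from a file rather than from memory. Here is a minimal sketch of that case. It assumes a hypothetical tab-separated file named intents.tsv with a header row, the text in column 0 and the label in column 1. The file name and layout are my assumptions for illustration.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

var context = new MLContext();

// "intents.tsv" is a hypothetical file name used for illustration only.
// LoadColumn(0) and LoadColumn(1) on InputData tell the loader which
// file column feeds each member.
IDataView fileData = context.Data.LoadFromTextFile<InputData>(
    path: "intents.tsv",
    separatorChar: '\t',
    hasHeader: true);

// Same shape as the InputData model above, repeated so the sketch compiles on its own.
public class InputData
{
    [LoadColumn(0)]
    public string Text;

    [LoadColumn(1)]
    public string Label;
}
```

The resulting IDataView can be handed to a pipeline in the same way as data loaded from a list.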

Training

Model training requires data that matches the input structure we specified above. Normally this data comes from a data set, often in the form of a spreadsheet or database, and it typically needs to be sanitized or shaped before its use. However, in this article we will just hard-code a small training set for the purpose of education.

var data = new List<InputData>  
{
    new InputData { Text = "Hello", Label = "Greeting" },
    new InputData { Text = "Bye", Label = "Farewell" },
    new InputData { Text = "Can you check order", Label = "OrderStatus" }
};
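With a real data set, the sanitizing step mentioned above might look something like the sketch below. The raw rows, the Sanitize helper, and the cleaning rules (trim, drop empty rows, drop case-insensitive duplicates) are hypothetical choices for illustration, not a fixed recipe.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical raw rows, e.g. as read from a spreadsheet export.
var rawRows = new List<(string Text, string Label)>
{
    ("  Hello ", "Greeting"),
    ("hello", "Greeting"),
    ("", "Farewell"),
    ("Bye", " Farewell"),
};

// Trim whitespace, drop rows with an empty text or label, and remove
// duplicates whose text differs only by letter case.
List<(string Text, string Label)> Sanitize(IEnumerable<(string Text, string Label)> rows) =>
    rows.Select(r => (Text: r.Text.Trim(), Label: r.Label.Trim()))
        .Where(r => r.Text.Length > 0 && r.Label.Length > 0)
        .GroupBy(r => (r.Text.ToLowerInvariant(), r.Label))
        .Select(g => g.First())
        .ToList();

var cleanRows = Sanitize(rawRows);

foreach (var row in cleanRows)
{
    Console.WriteLine($"{row.Text} -> {row.Label}");
}
```

Each cleaned tuple can then be copied into an InputData instance before training.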

Next, we need to load our data into a format that the Microsoft.ML pipeline can consume for training.

var context = new MLContext();  
var trainingData = context.Data.LoadFromEnumerable(data);  
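If you are curious what the pipeline actually receives, you can inspect the schema of the loaded data. This is a small sketch. The Sample class is a hypothetical stand-in for InputData so that the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

var context = new MLContext();

var rows = new List<Sample>
{
    new Sample { Text = "Hello", Label = "Greeting" }
};

IDataView view = context.Data.LoadFromEnumerable(rows);

// Print the name and type of every column the pipeline will be able to see.
foreach (var column in view.Schema)
{
    Console.WriteLine($"{column.Name}: {column.Type}");
}

// Hypothetical stand-in for the InputData model.
public class Sample
{
    public string Text;
    public string Label;
}
```

The column names printed here are the same names the pipeline refers to as "Text" and "Label".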

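Next, we need to build a pipeline and fit (train) our model on the data we are going to give it. Because pipelines are designed to be generic, we also need to tell the pipeline what we are training on and which output column should be populated with our results. Here the label is first mapped to a numeric key, the text is turned into numeric features, the trainer learns from both, and the predicted key is finally mapped back to its original label text.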

var pipeline = context.Transforms.Conversion.MapValueToKey(inputColumnName: "Label", outputColumnName: "Label")
    .Append(context.Transforms.Text.FeaturizeText(inputColumnName: "Text", outputColumnName: "Features"))
    .Append(context.MulticlassClassification.Trainers.SdcaMaximumEntropy("Label", "Features"))
    .Append(context.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

var model = pipeline.Fit(trainingData);  

Consuming

Finally, we can start testing our new model! We do this by first creating an engine for our trained model, in this case a prediction engine.

var engine = context.Model.CreatePredictionEngine<InputData, OutputData>(model);  

Next, we can create a sample input and pass it to our model for classification (prediction).

var sentence = new InputData { Text = "Can you check order 12345?" };  
var intent = engine.Predict(sentence);

Console.WriteLine($"Text: {sentence.Text}");  
Console.WriteLine($"Intent: {intent.Prediction}");  
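With only three training sentences the model will always pick one of the three labels, even for text it has never seen anything like. One way to guard against that is to also read the Score column and fall back when the best score is low. The sketch below repeats the steps from this article so it can run on its own. The ScoredOutput class, the "Unknown" fallback, and the 0.5 threshold are hypothetical choices of mine, not part of the original sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

var context = new MLContext(seed: 0);

var data = new List<InputData>
{
    new InputData { Text = "Hello", Label = "Greeting" },
    new InputData { Text = "Bye", Label = "Farewell" },
    new InputData { Text = "Can you check order", Label = "OrderStatus" }
};

var pipeline = context.Transforms.Conversion.MapValueToKey(inputColumnName: "Label", outputColumnName: "Label")
    .Append(context.Transforms.Text.FeaturizeText(inputColumnName: "Text", outputColumnName: "Features"))
    .Append(context.MulticlassClassification.Trainers.SdcaMaximumEntropy("Label", "Features"))
    .Append(context.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

var model = pipeline.Fit(context.Data.LoadFromEnumerable(data));
var engine = context.Model.CreatePredictionEngine<InputData, ScoredOutput>(model);

var result = engine.Predict(new InputData { Text = "Can you check order 12345?" });

// Score holds one probability-like value per known label;
// the largest one belongs to the predicted label.
float confidence = result.Score.Max();

// 0.5 is an arbitrary threshold chosen for illustration only.
string intent = confidence >= 0.5f ? result.Prediction : "Unknown";
Console.WriteLine($"Intent: {intent} (confidence {confidence:F2})");

public class InputData
{
    public string Text;
    public string Label;
}

// Hypothetical output model that also exposes the per-label scores.
public class ScoredOutput
{
    [ColumnName("PredictedLabel")]
    public string Prediction { get; set; }

    [ColumnName("Score")]
    public float[] Score { get; set; }
}
```

A sensible threshold depends heavily on the data, so treat the value here as a placeholder to tune.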

Lastly, we can see that our sentence contains an entity, in this case an order number. Once we determine that the intent of the sentence is to get the status of that order, we can then work on extracting that entity ourselves.

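// Requires: using System.Text.RegularExpressions;
if (intent.Prediction == "OrderStatus")
{
    // Capture the digits that follow the word "order" and a single whitespace character.
    var orderIdRegex = new Regex(@"order\s(\d+)");
    var match = orderIdRegex.Match(sentence.Text);
    if (match.Success)
    {
        Console.WriteLine($"Entity: {match.Groups[1].Value}");
    }
}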
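The pattern above is case-sensitive and expects exactly one whitespace character before the digits. Users rarely type that consistently, so a slightly more forgiving variant might look like the sketch below. The TryExtractOrderId helper and the decision to accept an optional "#" are my own assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Case-insensitive; allows optional whitespace and an optional "#" before the digits.
var orderIdRegex = new Regex(@"\border\s*#?\s*(?<id>\d+)", RegexOptions.IgnoreCase);

// Returns true and sets orderId when the text contains an order number.
bool TryExtractOrderId(string text, out string orderId)
{
    var match = orderIdRegex.Match(text);
    orderId = match.Success ? match.Groups["id"].Value : string.Empty;
    return match.Success;
}

if (TryExtractOrderId("Can you check order 12345?", out var first))
{
    Console.WriteLine($"Entity: {first}"); // Entity: 12345
}

if (TryExtractOrderId("Where is Order #987", out var second))
{
    Console.WriteLine($"Entity: {second}"); // Entity: 987
}

if (!TryExtractOrderId("Hello", out _))
{
    Console.WriteLine("No order number found");
}
```

The leading \b keeps the pattern from matching inside longer words such as "reorder".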
