Plan
We will create vector embeddings from each record's title and description, then use cosine similarity to score how closely two records match.
By tuning the similarity threshold, we can find stored records whose titles do not match exactly but whose context is the same.
Observation
There are many services for vector embeddings. GitHub Models has some nice free options, so I will use OpenAI's text-embedding-3-small model offered there. For reference, check out here.
Practical
View Full Code
Let’s create a new XAF Blazor project. I am using XPO as the ORM.
Add a new business object with two properties (Title, Description) and a calculated property (SemanticDocument) that combines them.
private string title;
private string description;

[Size(SizeAttribute.DefaultStringMappingFieldSize)]
public string Title
{
    get => title;
    set => SetPropertyValue(nameof(Title), ref title, value);
}

[Size(SizeAttribute.Unlimited)]
public string Description
{
    get => description;
    set => SetPropertyValue(nameof(Description), ref description, value);
}

// Calculated, read-only property that combines Title and Description.
[VisibleInDetailView(false), VisibleInListView(false)]
public string SemanticDocument => $"{Title}\n{Description}";
To store the vector embedding, add a byte-array property, Embedding. To hold the similarity score, add a non-persistent double property, SimilarityScore.
private byte[] embedding;
private double similarityScore;

// Persisted embedding vector, stored as raw bytes and hidden from the UI.
[Size(SizeAttribute.Unlimited)]
[VisibleInDetailView(false), VisibleInListView(false)]
public byte[] Embedding
{
    get => embedding;
    set => SetPropertyValue(nameof(Embedding), ref embedding, value);
}

// Not saved to the database; filled in when a similarity search runs.
[NonPersistent]
[ModelDefault("DisplayFormat", "0.000")]
[ModelDefault("AllowEdit", "False")]
public double SimilarityScore
{
    get => similarityScore;
    set => SetPropertyValue(nameof(SimilarityScore), ref similarityScore, value);
}
The AI Services
To use GitHub Models (text-embedding-3-small from OpenAI), we need the “OpenAI” package reference. We will add it to our platform-agnostic project, i.e., the module project.
dotnet add package OpenAI
Now go to GitHub Models and get your personal access token. We will save that token in appsettings.json like this.
{
  "ConnectionStrings": {
    "ConnectionString": "Data Source=(localdb)\\mssqllocaldb;Integrated Security=SSPI;Pooling=false;Initial Catalog=SemanticSearchBlog",
    "EasyTestConnectionString": "Data Source=(localdb)\\mssqllocaldb;Integrated Security=SSPI;Pooling=false;Initial Catalog=SemanticSearchBlogEasyTest"
  },
  "GitHubModels": {
    "Token": "your_github_tokens"
  },
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft": "Warning",
      "Microsoft.Hosting.Lifetime": "Information",
      "DevExpress.ExpressApp": "Information"
    }
  }
}
Now we will create the Get Embedding service and VectorMath in the Services folder of the module project.
The Get Embedding service creates an EmbeddingClient. Note how the GitHub Models token is read from appsettings.json, with configuration and environment-variable fallbacks. For anything beyond a demo, keep the token in a more secure store.
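// Requires: using OpenAI; using OpenAI.Embeddings; using System.ClientModel;
private EmbeddingClient CreateClient()
{
    // Look in appsettings.json first, then fall back to configuration and environment variables.
    var token = _config["GitHubModels:Token"]
        ?? _config["GITHUB_MODELS_TOKEN"]
        ?? Environment.GetEnvironmentVariable("GITHUB_TOKEN");
    if (string.IsNullOrWhiteSpace(token))
        throw new InvalidOperationException("GitHub Models token is missing.");

    // Point the OpenAI client at the GitHub Models inference endpoint.
    var options = new OpenAIClientOptions
    {
        Endpoint = new Uri("https://models.github.ai/inference")
    };
    return new EmbeddingClient("openai/text-embedding-3-small", new ApiKeyCredential(token), options);
}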
Finally, the service accepts text and returns the generated embedding as a byte array.
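public byte[] GenerateEmbedding(string text)
{
    // One input string in, so one embedding comes back.
    var result = Client.GenerateEmbeddings(new[] { text });
    var embeddings = result.Value;
    var vector = embeddings[0].ToFloats().Span;
    // Convert the float vector to bytes so it fits the Embedding property.
    return FloatVectorToBytes(vector);
}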
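The FloatVectorToBytes helper is not shown above. A minimal sketch of what such a conversion could look like follows, assuming each float is stored as 4 raw bytes in the machine's byte order. The class name EmbeddingBytes and the reverse helper BytesToFloatVector are hypothetical names used only for illustration; the full code may do this differently.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper class; the name is an assumption for this sketch.
public static class EmbeddingBytes
{
    // Packs a float vector into a byte array, 4 bytes per float,
    // using the byte order of the current machine.
    public static byte[] FloatVectorToBytes(ReadOnlySpan<float> vector)
    {
        return MemoryMarshal.AsBytes(vector).ToArray();
    }

    // Reverses FloatVectorToBytes so the vector can be compared later.
    public static float[] BytesToFloatVector(byte[] bytes)
    {
        if (bytes == null)
            throw new ArgumentNullException(nameof(bytes));
        if (bytes.Length % sizeof(float) != 0)
            throw new ArgumentException("Byte length must be a multiple of 4.", nameof(bytes));

        return MemoryMarshal.Cast<byte, float>(bytes.AsSpan()).ToArray();
    }
}
```

With this layout, a 1536-dimension vector (the default size for text-embedding-3-small) takes 6,144 bytes. Because the bytes follow the machine's byte order, this sketch assumes the data is written and read on machines with the same endianness.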
VectorMath.cs computes the cosine similarity and returns a score of at most 1.
View Full Code
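For readers who want to see the calculation itself, here is a minimal sketch of cosine similarity: the dot product of the two vectors divided by the product of their lengths. The class name CosineSketch is hypothetical, and returning 0 for a zero-length vector is an assumption of this sketch, not necessarily what the linked code does.

```csharp
using System;

// Hypothetical class name; a sketch, not the VectorMath code itself.
public static class CosineSketch
{
    // Returns a value between -1 and 1; 1 means the vectors point the same way.
    public static double CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        // Assumption: treat an all-zero vector as "no similarity" instead of dividing by zero.
        if (normA == 0 || normB == 0)
            return 0;

        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

Stored embeddings would first need to be converted from bytes back to floats before being passed to a function like this.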
Call AI Services
Generating embeddings can take a long time for long text, so it is better not to do it in the OnSaving method of the business object. You could create a background service that periodically generates embeddings for newly created records. Here we will use a view controller with an action that generates embeddings on demand for the selected records.
Check the AI Services controller code, in particular the Find Similarity action. There I compute the similarity, keep the top 5 objects, and apply a threshold of 0.5. You can raise the threshold to suit your needs; it has to be tuned for good results.
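The filtering step described above can be sketched independently of XAF. The example below assumes the scores have already been computed and are available as (Title, Score) pairs. The class name SimilarityRanking, the method name TopMatches, and the choice to include scores exactly equal to the threshold are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; the controller in the full code may be structured differently.
public static class SimilarityRanking
{
    // Keeps candidates scoring at or above the threshold,
    // orders them best first, and returns at most `top` of them.
    public static List<(string Title, double Score)> TopMatches(
        IEnumerable<(string Title, double Score)> scored,
        double threshold = 0.5,
        int top = 5)
    {
        return scored
            .Where(c => c.Score >= threshold)
            .OrderByDescending(c => c.Score)
            .Take(top)
            .ToList();
    }
}
```

Raising the threshold returns fewer but closer matches, while lowering it returns more loosely related notes, which is why the value needs tuning against your own data.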
How to Test the App
- Download the project.
- Run it, or create your own project and copy the code into it.
- Run the application and go to the Knowledge Note List View.
- Select a knowledge note and press Get Embeddings.
- Generate embeddings for all notes this way. Finally, select a note and press Find Similarity.
There are many other ways to extend this demo. Bringing AI techniques into existing XAF applications is likely to become a high priority in the near future.