What is SharePoint Syntex?

Nipuna Weerasinghe

Lead Consultant

SharePoint Syntex is a Microsoft 365 service that uses advanced artificial intelligence (AI) and machine learning (ML) to turn an organisation’s expertise and content into knowledge. Its machine learning models learn to understand and identify content, extract key information from documents and add appropriate tags automatically.
This gives an organisation the capability to find and manage its business content easily and convert it into knowledge at scale. It also lets the organisation streamline everyday business processes and tasks while reducing compliance and security risks, because sensitivity and retention labels are applied automatically.

Types of AI models in SharePoint Syntex

Currently, Microsoft provides two types of AI models in SharePoint Syntex.

1. Document understanding model.

This type of model is used when you want to extract information from unstructured documents such as Service Level Agreements (SLAs), Statements of Work (SOWs), letters, etc., where the text entities you want to extract sit within sentences or in a specific section of the document.
For example, an SLA can be written in different ways, but certain information appears consistently in the document: the phrase Service Level Agreement (this “Agreement” or this “Service-Level Agreement”); effective as of, followed by the agreement start date; This Service Level Agreement end date, followed by the agreement end date; and is made by and between, followed by the name of the party to the agreement.
These models are created and managed in a SharePoint content center site.

2. Form processing model.

This model is used when the information is laid out in a structured or semi-structured manner that follows a pre-arranged format, such as a tax invoice or a custom-built client form. Generally, in a tax invoice, entities such as the client address, the tax invoice number, etc. are in the same location on every invoice.
These models are created in Power Apps AI Builder, but their creation is initiated directly from a SharePoint document library.
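To make the "same location on every invoice" idea concrete, here is a minimal Python sketch of position-based extraction. It is only an illustration of the principle, not how AI Builder works internally: the `Word` and `Region` types, the coordinates and the field names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    x: float  # horizontal position on the page (hypothetical units)
    y: float  # vertical position on the page (hypothetical units)


@dataclass(frozen=True)
class Region:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word: Word) -> bool:
        return self.left <= word.x <= self.right and self.top <= word.y <= self.bottom


# Hypothetical layout: where each field sits on every invoice of this format.
FIELD_REGIONS = {
    "invoice_number": Region(left=400, top=40, right=560, bottom=60),
    "client_address": Region(left=40, top=100, right=300, bottom=160),
}


def extract_fields(words, regions=FIELD_REGIONS):
    """Collect the words that fall inside each field's region, in reading order."""
    fields = {}
    for name, region in regions.items():
        inside = sorted((w for w in words if region.contains(w)), key=lambda w: (w.y, w.x))
        fields[name] = " ".join(w.text for w in inside)
    return fields


# Made-up words, as an OCR step might report them.
sample_words = [
    Word("INV-1001", 420, 50),
    Word("12", 50, 110),
    Word("Example", 80, 110),
    Word("Street", 150, 110),
    Word("Total", 420, 700),
]
invoice_fields = extract_fields(sample_words)
```

Because the layout is fixed, no sentence-level understanding is needed here, which is the key difference from the document understanding model.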

The Business use case
Irrespective of the industry, every business produces documents such as SLAs, contracts, SOWs, invoices and forms that contain business-critical information.
When hundreds of these documents pass between clients and the business, each one needs to be categorised and its key information extracted, such as the client’s name, the client’s email, the SLA due date, the SOW start and end dates, the total value of the SOW, etc. That information can then drive automation, such as notifications for the due dates defined in an SLA, and the automatic application of sensitivity and retention labels.
I’ll walk you through a business use case that needs to identify SLAs and SOWs and show you how you can meet the business requirement with SharePoint Syntex.
NOTE: I’ll be using the document understanding model type to create and train the AI model and extract the required information from the SLA and SOW documents.

So how do I do this?

Step 1 – Subscribe and activate SharePoint Syntex.

Log into your Microsoft 365 organization account through this link.

Figure 1

Once you have signed up for a trial or have a valid subscription, activate the feature in the Microsoft 365 admin center.
Go to Settings -> Organizational knowledge and click “Automate content understanding”, as shown below in Figure 2.

Figure 2

Once you see the welcome screen, click the Get started button and select the configure form processing option, as shown in Figure 3.

Figure 3

Step 2 – Create Content Center site

After successfully activating SharePoint Syntex, create the Content Center SharePoint site to house the model as shown below in figures 4 and 5.

NOTE: You can create multiple Content Centers based on your organisational requirements.

Figure 4

Figure 5

Once the site has been created, navigate to the brand-new SharePoint Content Center site as shown below.

Figure 6

Step 3 – Create content types (optional)

NOTE: This step is optional, as you can create a new content type while you create your model in the Content Center, so feel free to skip it and move to step 4.
From the content type gallery in the SharePoint admin center, I’ll create two content types:

Service Level Agreement
Statement of Work

I will publish the content types to the hub site called Customer-Hub, with which I have associated all of my individual clients’ team sites. I’ll then create the required site columns within the content type and publish the content type, as shown below in Figure 7.

Figure 7

Repeat the same process to create the Statement of Work content type. Alternatively, you can create the SLA and SOW content types as two children of the same parent content type.

Step 4 – Create and train the models

Task 1 – Create models

Now I’ll create the SLA model, called “Service Level Agreement”, and associate it with the content type that I created in the previous step (step 3). If you skipped step 3, select “Create a new content type” under Associated content type, as shown in Figure 8.

Navigate to the Models tab from the top navigation bar on the Content Center SharePoint site.
Click + Create a model to create the SLA model, select the “Use an existing content type” option and choose the Service Level Agreement content type from the list of existing content types.
Click Create

You will see the newly created model in the document library on the Content Center. Its file name has the .classifier extension, for example “Service Level Agreement.classifier”.

Figure 8

Repeat the same process to create the SOW model.

Task 2 – Add example files

In this task, I’ll upload sample SLA documents into the Training file library to train the model as shown in the following figures.
As a minimum, you must upload at least five positive samples (documents that are SLAs) and one negative sample (a document that is not an SLA). File types you can upload include PDF, JPEG, PNG, etc.

Figure 9

Figure 10

Figure 11
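Before uploading, it can help to check that your sample set meets the minimum of five positive samples and one negative sample. The Python sketch below is a hypothetical pre-upload check, not part of Syntex; the file names and the `check_training_set` helper are made up for illustration.

```python
MIN_POSITIVE_SAMPLES = 5  # minimum stated in this post
MIN_NEGATIVE_SAMPLES = 1  # minimum stated in this post


def check_training_set(labels):
    """Return a list of problems; an empty list means the minimums are met.

    `labels` maps a file name to True (positive sample) or False (negative sample).
    """
    positives = sum(1 for is_positive in labels.values() if is_positive)
    negatives = len(labels) - positives
    problems = []
    if positives < MIN_POSITIVE_SAMPLES:
        problems.append(
            f"need at least {MIN_POSITIVE_SAMPLES} positive samples, found {positives}"
        )
    if negatives < MIN_NEGATIVE_SAMPLES:
        problems.append(
            f"need at least {MIN_NEGATIVE_SAMPLES} negative sample, found {negatives}"
        )
    return problems


# Hypothetical sample set: five SLAs and one unrelated document.
training_labels = {
    "SLA1.docx": True,
    "SLA2.docx": True,
    "SLA3.docx": True,
    "SLA4.docx": True,
    "SLA5.docx": True,
    "Other1.docx": False,
}
training_problems = check_training_set(training_labels)
```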

Task 3 – Classify the documents and run training

I’ll now add intelligence to the SLA content type by going through the documents and labelling them with an explanation as shown in figure 12.

Figure 12

Next, I’ll go through each of the six files and mark whether it is an example of an SLA by selecting Yes or No, as seen below in Figure 13.

Figure 13

Once you complete this task, you will see your results as follows.

Figure 14

Task 4 – Provide an explanation

After identifying the documents correctly, I’ll start to train the model by explaining why I have labelled some documents as positive and others as negative. Explanations help the model distinguish Service Level Agreements from other types of documents.

Figure 15

Microsoft has provided two options to create explanations.

Blank – with this option, you create the explanation from a blank template.
From a template – with this option, you start from one of the predefined templates that Microsoft provides, such as date, currency, number, SSN, credit card, etc.

I’ll use the blank option to provide my explanation as follows.

Figure 16

The following figure shows how I have created an explanation to identify the documents as an SLA type document.

Figure 17

After I add my explanations, I can train the model on these files by selecting Train Model.

NOTE: The more explanations you add, the more accurately the model classifies documents.
In the figure below, you will see that the document classification accuracy is 100 and that each document’s evaluation is matched against the two explanations.
The second explanation, which I added in addition to the one above, is called “Contains other Key Words”. In it, I have listed keywords that must exist in the SLA documents (e.g., agreement, effective as of, Rider Agreement, Service Levels & Service Credits, Performance Monitoring), so when the model classifies a document, it scans the document to see whether these keywords are present.

Figure 18
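To illustrate what a keyword explanation such as “Contains other Key Words” is checking for, here is a rough Python sketch. It treats the keywords as a strict checklist, which is a simplifying assumption: how Syntex actually weighs explanations is not covered in this post. The sample text and helper names are made up.

```python
# Keywords taken from the "Contains other Key Words" explanation.
SLA_KEYWORDS = [
    "agreement",
    "effective as of",
    "Rider Agreement",
    "Service Levels & Service Credits",
    "Performance Monitoring",
]


def missing_keywords(text, keywords=SLA_KEYWORDS):
    """Return the keywords that do not appear in the text (case-insensitive)."""
    normalised = " ".join(text.lower().split())  # collapse line breaks and extra spaces
    return [keyword for keyword in keywords if keyword.lower() not in normalised]


def looks_like_sla(text):
    """Assumption for this sketch: a document is an SLA only if every keyword is present."""
    return not missing_keywords(text)


# Made-up text for illustration only.
sample_sla_text = (
    'This Service Level Agreement (this "Agreement"), effective as of 1 July 2021, '
    "incorporates the Rider Agreement. Section 4 covers Service Levels & Service Credits "
    "and Section 5 covers Performance Monitoring."
)
sample_other_text = "Statement of Work for the website redesign project."
```

Calling `missing_keywords` on a rejected document shows which phrases were absent, which is a quick way to decide whether an explanation is too strict.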

Next, I’ll test the classification by uploading an SLA document and some other documents to see whether my model is smart enough to classify the document correctly.

Figure 19

Yes! My classification works as expected. Note how it has classified SLA6.docx as positive and the other document as negative.

Task 5 – Train the model to extract the required metadata

Now let us extract the client’s name, start date and due date from the SLA documents. I’ll explain later why I have extracted these bits of information from the SLAs.
Click Create extractor and give the new entity extractor a name (Client Name in this case), as seen in Figures 20 and 21.

Figure 20

Figure 21

I’ll go through the sample files and, in each positive sample, select the value that I want to populate in the content type column. If the file is a negative sample, tick “No label”. For a positive sample, as in this case, where BBQ Ptv Ltd is the client’s name, select the value and click the Save button.

Figure 22

I’ll then train my model by adding an explanation of why BBQ Ptv Ltd is the client’s name in the above agreement. The client’s name in this document comes after the phrase “is made by and between” and before the phrase “, with”.

Figure 23
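The before/after explanation for the client’s name can be pictured as a pattern match between two anchor phrases. The Python sketch below shows the idea with a regular expression. It is an analogy only, not what Syntex runs, and the sample sentence (apart from the two anchor phrases and the client name) is made up.

```python
import re

# The value sits after "is made by and between" and before ", with".
CLIENT_NAME_PATTERN = re.compile(
    r"is made by and between\s+(?P<client>.+?)\s*,\s+with\b",
    re.IGNORECASE | re.DOTALL,
)


def extract_client_name(text):
    """Return the text between the two anchor phrases, or None if they are not found."""
    match = CLIENT_NAME_PATTERN.search(text)
    if match is None:
        return None
    # Collapse any line breaks inside the captured name.
    return " ".join(match.group("client").split())


# Made-up sentence for illustration only.
sample_sentence = (
    "This Service Level Agreement is made by and between BBQ Ptv Ltd, "
    "with its registered office in Sydney."
)
client_name = extract_client_name(sample_sentence)
```

The lazy `.+?` stops at the first “, with”, so a longer sentence after the client name does not leak into the extracted value.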

Once you save the explanation, you will see the evaluation and the accuracy of the explanation as follows.
As Figure 24 shows, if your explanation identifies the value correctly, the label you added (marked in blue) and the label predicted by the model (marked in green) match.

Figure 24
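One simple way to think about the accuracy figure is as the share of sample files where the predicted value matches the value you labelled. The Python sketch below computes that ratio. The formula is an assumption for illustration, since the post does not describe how Syntex calculates its score, and the file names and values are hypothetical.

```python
def extraction_accuracy(labelled, predicted):
    """Fraction of files whose predicted value equals the labelled value.

    Both arguments map a file name to the extracted value, or to None for
    files marked "No label". A file missing from `predicted` counts as None.
    """
    if not labelled:
        raise ValueError("at least one labelled file is required")
    matches = sum(
        1 for name, value in labelled.items() if predicted.get(name) == value
    )
    return matches / len(labelled)


# Hypothetical labels and predictions for four sample files.
labelled_values = {
    "SLA1.docx": "BBQ Ptv Ltd",
    "SLA2.docx": "Example Client A",
    "SLA3.docx": "Example Client B",
    "Other1.docx": None,  # negative sample, marked "No label"
}
predicted_values = {
    "SLA1.docx": "BBQ Ptv Ltd",
    "SLA2.docx": "Example Client A",
    "SLA3.docx": "Example Client",  # truncated prediction, counts as a miss
    "Other1.docx": None,
}
accuracy = extraction_accuracy(labelled_values, predicted_values)
```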

I have added similar extractors for the start date and the due date. The list of extractors and their accuracy is shown in the figure below.

Figure 25

Task 6 – Apply model to libraries

In the previous tasks, we added intelligence to our SLA documents. Now it’s time to apply the model to a SharePoint document library and see how it works there.
First, I’ll apply the trained model to a document library as shown below, then upload some new SLAs and see how the model behaves.

Figure 26

I have uploaded a couple of SLAs and, within a few minutes, we can see that the content type and the other metadata have been applied to them, as shown below. Wonderful, isn’t it?
The user uploads the document; the model identifies the document type, extracts the metadata and populates the columns automatically. Perfect!

Figure 27

Create the SOW model by repeating Task 1 through Task 5.

Figure 28

Task 7 – Apply compliance

We have now tested the model and seen how it extracts information from the documents automatically and sets it as metadata on each document. As the icing on the cake, Microsoft has integrated its compliance features with Syntex, so you can apply compliance labels to documents automatically using Syntex. Let us see how it’s done.
First, I set up the retention labels and Microsoft 365 sensitivity labels in the Microsoft 365 security center and then published them so I can use them in my model.

After setting up the compliance labels, I’ll use them in the model as follows.
Click on the gear icon next to “Model settings” and apply the appropriate retention and sensitivity labels as shown below and reapply the model to the library.

Figure 29

Once you have successfully applied the retention and sensitivity labels, you will see them in the document library, as highlighted in the figure below.

I have set up a DLP policy that is applied automatically when a document is classified as Highly Confidential\Finance; this helps to stop highly confidential finance information from being overshared.
With my retention label, I have declared the document a record to ensure that no one can delete or edit it once the label is applied. However, if you declare your document a record, Syntex cannot apply the sensitivity label, because the retention/record label locks the document. I have raised this concern with Microsoft and hope they will change the order in which labels are applied: sensitivity label first, then retention/record label.
The Sensitivity label allows me to apply encryption, sharing, and conditional access policies to the documents that my models identify.

Figure 30

The image below shows what happens when you try to delete an SLA document that has a retention label (one that declares the document a record) applied to it.

Figure 31

Now the sky is the limit! You can use this metadata for different purposes, such as creating alerts, notifications, etc.
As an example, you can use Power Automate to check the end date of each SLA and, a month before an SLA reaches its due date, send a notification via MS Teams to the team managing it, so that everyone on that team is aware of the due date and can take the necessary action to renew the SLA. Additionally, if you capture the client’s email, you can automatically send the client an email about the SLA due date. Isn’t it great that there is no more manual metadata entry, no more manual checks, and that all notifications and reminders are automated? As soon as you upload the SLA to the SharePoint library, the document is automatically labelled with retention, DLP and sensitivity labels to meet the organisation’s compliance requirements.
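The reminder rule itself is simple date arithmetic. The Python sketch below shows the check that such a flow would perform on the extracted end dates. It is not a Power Automate flow, “a month” is approximated as 30 days, and all records except the BBQ Ptv Ltd client name are made up.

```python
from datetime import date, timedelta

# Assumption: "a month before the due date" is treated as 30 days.
REMINDER_WINDOW = timedelta(days=30)


def slas_due_for_reminder(slas, today):
    """Return the SLAs whose end date falls within the reminder window.

    Each SLA is a dict with "client" and "end_date" keys, mirroring the
    metadata columns populated by the model. Expired SLAs are skipped.
    """
    return [
        sla
        for sla in slas
        if timedelta(0) <= sla["end_date"] - today <= REMINDER_WINDOW
    ]


# Hypothetical metadata as it might be read from the document library.
sla_records = [
    {"client": "BBQ Ptv Ltd", "end_date": date(2021, 7, 20)},
    {"client": "Example Client B", "end_date": date(2021, 12, 31)},
    {"client": "Example Client C", "end_date": date(2021, 6, 1)},  # already expired
]
due_soon = slas_due_for_reminder(sla_records, today=date(2021, 7, 1))
```

In a real flow, each entry in `due_soon` would trigger the Teams notification, and the client email if you have extracted the client’s address.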
I’ll wrap up this post with a summary of the advantages of SharePoint Syntex.

Adds intelligence to your documents.
Extracts key pieces of information and populates them as metadata.
Non-Office documents such as PDF, PNG and JPG files can also be tagged with retention and DLP labels/policies.
Integrates with M365 compliance labels.
Enhances searchability by adding metadata to the documents.
Extracted metadata can be used to initiate workflows.
Automates information and security governance.