# Sunday, January 05, 2020

In the previous post, we saw how to create and train a Form Processing model, a tool released in preview last June and available as part of AI Builder in Power Automate.

In this second post, we are going to use that model in a Flow, so that it takes files from a specific repository, analyzes and extracts data from them, and stores them in a final location together with the extracted data.

The source where the files will be picked up is going to be a OneDrive folder, whereas, on the other end, the destination will be a SharePoint Online document library.

As such, in this post we will:

  • Create a SharePoint Online document library able to store the files' metadata
  • Create the Flow
  • Test the end-to-end scenario

From the previous post, the following document properties have to be stored, and thus the document library needs the corresponding columns (for simplicity, all of them are going to be a "Single Line of Text" type of property):

  • Last Name
  • First Name
  • Address
  • Email
  • Cost
  • Reason

For the source, just create a OneDrive for Business folder.

Create a Power Automate Flow

Switch to Power Automate and create a new solution, or use an existing one, give it a name, select a publisher and set its version (all these properties are mandatory). In fact, AI Builder models can only be used from within a solution-aware Flow (reference here).

Once in the selected, or created, solution, we can add a new Flow.

As a trigger, we select "When a file is created" from the "OneDrive for Business" category. It needs the permissions to read the files in the configured directory, so credentials may be requested to get access to the folder.

This trigger will thus launch the Flow whenever a new file arrives in this folder.

When the Flow gets the file, it is time to use the model created in the previous blog post. Let's select the "Predict" action from the "Common Data Service" group; we can see that it is indeed a Premium feature. In the configuration of the "Predict" action, we can now select the model we have built and named, in this case, "My Request Form Model". For the "Request Payload" JSON parameter, we have to take the content of the HTTP request's body, which gives the following:

{
  "base64Encoded": "string(triggerBody()?['$content'])",
  "mimeType": "application/pdf"
}

The call to the model returns a JSON response that needs to be parsed. Obviously, we should use the "Parse JSON" action to extract the different values from the response. But, for this, we must specify the schema, and the best way to get it is to provide an example of the data returned by "Predict".

We can get such a response by saving the Flow as it is now and testing it. Let's click on "Test" and drop an example of a form in the OneDrive folder. After several seconds, the Flow will run and, hopefully, succeed. If we click on the execution history, we can see the different inputs and outputs of all the executed actions. Then, if we open the "Predict" action, we can copy its "Output" to the clipboard.
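To give an idea of what ends up in the clipboard, here is a hypothetical and simplified fragment of such an output; the exact structure and nesting returned by "Predict" may differ, but each field defined in the model comes back with a "value" and a "confidence" attribute:

{
  "LastName": {
    "value": "<text extracted from the LastName zone>",
    "confidence": 1
  },
  "FirstName": {
    "value": "<text extracted from the FirstName zone>",
    "confidence": 0.95
  }
}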

Now, we go back to editing our Flow and add the "Parse JSON" action, setting the "Response Payload" from the "Predict" action as the "Content" parameter.

For the "Schema", we click on the "Generate from sample" which will pop-up a window where we can paste what we have in the clipboard. After clicking the "Done" button, the "Schema" parameter is filled.

Everything should be fine, one would say.

But, unfortunately, not. It turns out that the schema generator is misled by the data we give it. A typical example is the "confidence" value. If all your fields returned a confidence of 100% during the analysis, the returned data is equal to "1". The consequence is that the schema generator interprets it as an integer, which is wrong: as soon as the model returns a confidence of less than 100%, the schema validation fails. So, to be safe, it is necessary to replace all confidence fields of type integer by number, as illustrated below.
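For example, a fragment of the corrected schema could look like the following sketch; the exact properties depend on the sample that was pasted, and "LastName" is used here only as an illustration:

{
  "type": "object",
  "properties": {
    "LastName": {
      "type": "object",
      "properties": {
        "value": {
          "type": "string"
        },
        "confidence": {
          "type": "number"
        }
      }
    }
  }
}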

We are done with the "Parse JSON", now, let's create the file.

For that, we will use the "Create file" action from the "SharePoint Online" group, then we select the "Site Address", the "Folder Path" (which is nothing but the target document library), and we set the "File Name" and "File Content" parameters to the corresponding attributes of the source file.

This action can't update the properties of the file, leading us to add another action called "Update file properties" from the same "SharePoint Online" group.

As for the previous action, the "Site Address" and the "Library Name" must be set, but rather than specifying the file name, we have to use the "ItemId" from the previous action to identify the file that needs to be updated. The columns added to the library are automatically present as parameters of the action, but the difficulty here is to find the right "value" to set in each of these fields. There is apparently a bug, as the value selector only displays the "value" attribute's name, which is "value", instead of showing the parent attribute name that would be "LastName", "FirstName" and so on. Nevertheless, rather than randomly picking a value and hoping for the best, we can make a quick check once we have picked a dynamic field: by hovering over the "value" field with the mouse, a tooltip pops up, displaying the complete expression, which contains the real attribute name (here "LastName").
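Assuming the "Parse JSON" action kept its default name and an output shaped like the simplified fragment shown earlier, such a tooltip expression could resemble the following sketch (the action name and attribute path are only illustrative and depend on your own Flow):

body('Parse_JSON')?['LastName']?['value']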

Finally, once all these file attributes are set, it is time to save the Flow and test it. Just drop a PDF form in the OneDrive for Business folder, wait a few seconds or minutes, and the file should appear in the SharePoint Online document library with the correct metadata. Or, almost.

As can be seen in the screenshot, the LastName property is not well extracted, which brings us back to my first suggestion for improvement, namely that the field zones should be manually resizable to adjust the location where the data can be. In this case, we could see that the "LastName" zone was far too large, leading to a property that contains not only the name, but also part of the form instructions.

But, otherwise, we can see that the document has the correct properties taken from the content of the PDF.

# Sunday, December 29, 2019

Last June, Microsoft introduced AI Builder, an artificial intelligence platform that can be integrated into Power Platform low-code applications. It is still in preview, and since then, I have wanted to give it a try and find an interesting use case to test these new features.

In the context of automating office document and application processing, AI has its rightful place, and we can see that Microsoft is making important investments in that area.

The use case I took for this post is to automate the processing of request forms in order to extract their data and use it as metadata in SharePoint. Of course, this metadata could directly be used to make decisions or to go further in the processing of a request, but let's keep it simple first.

Thus, the steps are:

  • Get the file
  • Analyze the file and extract the data
  • Upload the file in SharePoint
  • Update the file with the metadata

This post and the next one will guide you through the steps to create this flow.

Create a Form Processing Model

The first thing to do is to create a Form Processing model that will be used later (in the next post) by the Flow.

At the time this post is written, Microsoft offers 4 ready-to-use models:

  • Form Processing to read and process documents or forms
  • Object Detection to analyze images
  • Prediction in order to give an idea of what will happen, based on past data
  • Text Classification in order to analyze the meaning of a text or even perform sentiment analysis

The starter model to use in this case is of course the Form Processing one.

These models are premium features, which means that you'll probably have to activate them or start a 30-day trial period. Unfortunately, after the 30 days of trial, it seems that the models will no longer be usable.

Creating a Form Processing model is done in several steps, but before starting the creation of the model, you will need a minimum of 5 forms to train it. The forms are expected to have the same layout. Indeed, the model will try to identify the zones where the data is; therefore, these data must be in the same locations on the form. An advantage is that it is also able to identify zones where hand-written inputs are present. In other words, the OCR seems to work pretty well.

If the forms are already available, when starting the wizard, the very first step of the Form Processing model creation is simply to give it a name. At this stage, there are also some explanations of the constraints for the model to work, like the one mentioned just before about the layout.

The next step asks to upload the minimum of 5 forms that will be used to identify the zones where the data is. To get a better result, I would advise using the best forms available, in terms of clarity, contrast, etc., which helps the model identify the interesting zones.

After this step, there is still a way to upload other samples of the forms, but the limit is set to a total upload of 4MB. This limit seems, for the time being, a bit low, but with 5 or 6 documents the results are already good. According to this post, it is a feature that will be improved.

The next step is the analysis of the uploaded documents that will lead to the selection of the fields we want to get from the forms.

It shows one of the uploaded forms, with the discovered zones of data and a percentage of accuracy estimated by the model. As the example shows, almost all the fields have been correctly identified, except the first one, where the zone is a bit too large.

Here, I would make one suggestion for improvement: at this stage, we should be able either to go back and upload additional samples, or to manually resize the zones. This is important, because having a zone that is too large makes the extraction of data less accurate (we will see what this means in the second post). Apparently, this is something that will come.

In addition to selecting the fields, we can give and change the names of these fields.

Once this is done, we are back to the list of models we have created, and what remains to be done is to make our model available for use. This step is called publishing. A click on the model will allow us to either make a quick test by uploading a form sample, or to publish it.

Here, I would also have a suggestion: to be able to update the model, for example after having improved it, and to be able to publish further versions of the model.

As we can see, creating a model to process forms is quite easy thanks to the AI in the background, which puts these capabilities at our fingertips. It is still in preview and no doubt these features will be perfected in the near future, but it shows that automation is something taken seriously at Microsoft. The process of creating a model and using it in a Flow is really easy, which also makes it accessible to users who don't have much technical knowledge of artificial intelligence.

We will see in the next post how to use it in Flow, in order to automate the processing of PDF documents.
