Creating XML import files with Powershell

For some time, the FileScan Bridge has offered FileHold customers a standard feature for importing documents with metadata. Known as Manage Imports in the desktop client, this feature makes it easy to integrate with image processing applications such as SmartSoft Capture™, Kodak Alaris Capture Pro™, Abbyy FlexiCapture™, and others. It also opens the door for any software that can properly format an XML file to describe the documents that will be imported to FileHold.

We have a number of examples in our knowledge base of creating these XML files using Microsoft Excel™ or the Windows™ command language, but one of the best ways to create XML files for Manage Imports is with Powershell. This scripting tool is the long-term strategy for scripting with Windows and Microsoft improves it with every Windows version.

There are two styles of XML files that Manage Imports can consume, but for the sake of this article, we will focus on the one that has been around the longest and will work will all versions of the desktop application. In these XML files, the root node is named Batch. The file can describe one document or many. Each document starts with a Page node and each page contains one or more Field nodes. A field is made up of a Name and Value. At a minimum, each document must have a File Name field, but it can contain zero or more index or metadata fields.

If we assume a document with three index fields invoice, vendor, and total, the XML file might appear as follows:

<?xml version="1.0"?>
<Batch>
    <Page>
        <Field>
            <Name>File Name</Name>
            <Value>C:\MyFiles\MyImage.tif</Value>
        </Field>
        <Field>
             <Name>Invoice</Name>
            <Value>I564849</Value>
        </Field>
        <Field>
            <Name>Vendor</Name>
            <Value>Acme Speed Works</Value>
        </Field>
        <Field>
            <Name>Total</Name>
            <Value>5412.99</Value>
        </Field>
    </Page>
</Batch>

Suppose the invoices we are scanning contain a QR Code™ barcode and that barcode contains a list of our three example fields separated by commas. We could automate the process of getting documents into FileHold given three things: a program to read the barcode, a Powershell script to create the XML, and a Manage Imports profile to watch a folder and automatically file the documents it finds. For the remainder of the article, we assume the reader has some familiarity with Powershell and Manage Imports.

Image
QR code example

There are plenty of programs that can read QR Code barcodes, but ZBar is a good one to experiment with as it is open source and available with no license costs. Setting up a Manage Import profile is a simple task, so we are left to create the Powershell script.

ZBar reads barcodes by providing an image file with a few optional command line parameters. The most basic usage is as follows:

zbarimg.exe MyImage.tif

Given our sample data, if the barcode contains “I564849,Acme Speed Works,5412.99” (without quotes) we should expect output like the following:

QR-Code:I564849,Acme Speed Works,5412.99
scanned 1 barcode symbols from 1 images

We can simplify this output to make it easy to process as a simple comma delimited list (Powershell loves those) by adding the raw and quiet output options.

zbarimg.exe –-raw -q MyImage.tif

This will just provide us with the data.

I564849,Acme Speed Works,5412.99

Now our job with Powershell is simple. Convert the delimited list above to the Manage Imports XML format. Powershell plays nicely with dot net, so we can take advantage of the System.Xml.XmlTextWriter. This lets us build the XML file in a stepwise fashion. This is good when we have many documents we would like to include in the same file or many fields in the same document. For this simple example, we will just create a file with one document and our three fields, plus the filename. Note there are other ways to create XML files in Powershell, this one just happens to work nicely for our purpose.

The first part is to start the XML file. We will add some formatting instructions to make the file easy for a human to read in a text editor.

$global:xmlWriter = New-Object System.Xml.XmlTextWriter( ‘MyImage.xml’, $null )
$global:xmlWriter.Formatting = ‘Indented’
$global:xmlWriter.Indentation = 1
$global:xmlWriter.IndentChar = ‘`t’
$global:xmlWriter.WriteStartDocument()
$global:xmlWriter.WriteStartElement( ‘Batch’ )

We will create a general-purpose function called AddField for outputting fields as we will need more than one (four in our example). It will take two parameters, name and value, and assume we have already setup the XML writer object.

function AddField {
        param( [string]$Name, [string]$Value=’’ )
        $global:xmlWriter.WriteStartElement( ‘Field’ )
        $global:xmlWriter.WriteElementString( ‘Name’, $Name )
        $global:xmlWriter.WriteElementString( ‘Value’, $Value )
        $global:xmlWriter.WriteEndElement()
}

Now we need to read our comma delimited list from the barcode and put it in a variable we can process into the separate fields. We will do this in one step using two Powershell features. First, we will execute ZBar using Invoke-Expression and we will pipe its output to ConvertFrom-Csv. This later cmdlet takes our comma delimited list and stuffs it into a Powershell object we will use in a moment. We will use the Header parameter define the names for these three fields.

$CsvHeader = @( ‘Invoice’, ‘Vendor’, ‘Total’ )
$command = ‘zbarimg.exe --raw -q MyImage.tif'
$csvRecord = Invoke-Expression $command | ConvertFrom-Csv -Header $CsvHeader

Now we have our barcode in the object $csvRecord, so it is time to write the meat of the XML file. We will start with creating a new document with the page node, then we will write our image filename, finally we will create the three index fields in a little loop using the AddField function we created above. We use a little Powershell trickery to get the names and values of the fields from the object in a loop.

$xmlWriter.WriteStartElement( 'Page' )
AddField -XmlWriter $global:xmlWriter -Name 'File Name' -Value ‘MyImage.tif’ -ErrorAction Stop
foreach ( $fld in ( $csvRecord | Get-Member -MemberType NoteProperty | select -Property Name ))
{
    AddField -XmlWriter $global:xmlWriter -Name $fld.name -Value $csvRecord.($fld.name)
}

All that is left now is to close off the XML file.

$global:xmlWriter.WriteEndElement()
$global:xmlWriter.WriteEndDocument()
$global:xmlWriter.Flush()
$global:xmlWriter.Close()

And, that is it. A more complete example Powershell script is attached at the end of the article. It will take a folder location and traverse it and any subfolders looking for images. When it finds them, it will look for barcodes, extract the data in a format like the samples above, and create an XML file for each image. Of course, it is just an example. We do not offer any support for it, though our professional services team or your own Powershell experts can help you prepare something similar to match your own needs.

ConvertQR2Xml.zip

Image
Russ Beinder

Russ Beinder is the Chief Technology Officer at FileHold. He is an entrepreneur, a seasoned business analyst, computer technologist and a certified Project Management Professional (PMP). For over 35 years he has used computer technology to help organizations solve business problems.