SmartSoft Capture Part 3 – Using Regular Expressions to Help Make Scanning Smarter
If you’re having issues with capturing certain data, regular expressions may be able to help. So what exactly are regular expressions?
“A regular expression, regex or regexp (sometimes called a rational expression) is, in theoretical computer science and formal language theory, a sequence of characters that define a search pattern, mainly for use in pattern matching with strings, or using a string searching algorithm, i.e. "find and replace"-like operations.” Source: Wikipedia
Um, what?
Basically, what that is saying is that a regular expression describes a rule or pattern. When you use regular expressions, the input text must match the pattern. If it doesn’t match, the data can be modified to conform to the rule, kind of like an advanced search and replace feature.
The regular expression pattern itself is written in a programming-type language. I don’t know too much about writing regular expressions but I once read that the code resembles expletives in comic strips. If you’re interested, there are a lot of resources on the internet that can help you understand how these are written and what they mean.
Using regular expressions in the form templates in SmartSoft Capture helps make the data capture processing stage more intelligent. When the software acquires information from the page, the regular expression checks that the captured data matches the expected pattern, which improves accuracy.
For example, a regular expression for a date can be used in order to verify the date format. If the date format rule is mm-dd-yyyy, the regular expression would be:
^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d$
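If you want to try this date pattern outside of Capture, here is a minimal sketch in Python. It assumes Python's re module behaves the same way as Capture's regex engine for this pattern, which this article does not confirm, and the function name matches_date_format is made up for illustration.

```python
import re

# The date pattern from the article: two-digit month, two-digit day, and a
# four-digit year starting with 19 or 20, separated by -, space, / or .
DATE_PATTERN = re.compile(
    r"^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d$"
)


def matches_date_format(value: str) -> bool:
    """Return True if value has the mm-dd-yyyy shape the pattern describes."""
    return DATE_PATTERN.match(value) is not None


for sample in ["03-15-2021", "3-15-2021", "13-01-2021", "02-31-2021"]:
    print(sample, matches_date_format(sample))
```

Note that the pattern only verifies the format: 02-31-2021 passes even though February 31 is not a real date, and the two separators do not have to be the same character.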
I have encountered a few situations with some customers where Capture was not accurately capturing certain data. For example:
- Putting spaces in between the digits in a Proof of Delivery number when there were no spaces.
- Changing 1’s into |’s (pipes) in a Parts Invoice number.
- Not capturing any data after the colon in a Purchase Order number that was formatted like “X101092079:01”. The pattern is a letter, several digits, colon sign and then more digits.
For each of these issues, using a regular expression in the form template helped immensely. Capture was able to acquire the values more accurately and required less manual intervention by the user.
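As an illustration of the Purchase Order case, the "letter, several digits, colon, more digits" rule could be written as a pattern like the one below. This is only a sketch: the pattern is my guess at one way to express that rule, not the expression actually used in the customer's form template, and it is tested here with Python's re module rather than in Capture.

```python
import re

# Hypothetical pattern for the Purchase Order format described above:
# one letter, one or more digits, a colon, then one or more digits.
PO_PATTERN = re.compile(r"^[A-Za-z]\d+:\d+$")


def matches_po_format(value: str) -> bool:
    """Return True if value looks like a PO number such as X101092079:01."""
    return PO_PATTERN.match(value) is not None


for sample in ["X101092079:01", "X101092079", "101092079:01"]:
    print(sample, matches_po_format(sample))
```

A value that stops at the colon, or is missing the leading letter, fails the check, which is the kind of incomplete capture the customer was seeing.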
To use regular expressions in a form template, enter the regex code in the Format Expression column in the form template designer.
For the Parts Invoice number example above, the regular expression was entered as:
[^|Il]*
Which means:
- The square brackets [ ] denote a "character group" (also called a character class), meaning that any one character from this group is accepted in its place.
- The caret ^ at the start of the group negates it, so the characters that follow are excluded: | (pipe), I (capital i) and l (lowercase L).
- The asterisk * means that the character group can be matched zero or more times.
So in the case of the Parts Invoice number, no pipes, capital I’s, or lowercase L’s will be captured in the value. Smart, right?
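To see the negated character group in action, here is a small Python sketch. It only checks whether a whole string is made up of allowed characters; how Capture applies the expression internally while reading the page is not covered in this article, and the name is_clean is made up for illustration.

```python
import re

# The expression from the article: zero or more characters, none of which
# may be a pipe, a capital I, or a lowercase L.
INVOICE_PATTERN = re.compile(r"[^|Il]*")


def is_clean(value: str) -> bool:
    """Return True if the whole value contains no |, I, or l characters."""
    return INVOICE_PATTERN.fullmatch(value) is not None


for sample in ["4011271", "40||27|", "40l1271"]:
    print(sample, is_clean(sample))
```

Because the asterisk allows zero matches, an empty value also passes this check, so the expression rules out the look-alike characters but does not by itself require the field to contain anything.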
Stay tuned to the FileHold blog for more SmartSoft Capture tips and tricks!