Full text search (boolean and raw query)
Many complex search expressions are possible when using the "Contains in FTS" operator. These are processed by the full text search engine. When you check the boolean option or you choose the raw query search type, you have more flexibility and power than with the simpler basic query.
A "boolean" search request consists of a group of words, phrases, or macros linked by connectors such as AND and OR that indicate the relationship between them. For boolean searches, enable the Boolean search check box in the advanced search screen.
Boolean or raw query examples
What you are looking for | The query that will find it |
---|---|
Documents that must contain "apple" and "pear" and one of "banana" or "mango". Parentheses ensure the precedence of logic operations. The deepest nesting is evaluated first. | (apple and pear) and (banana or mango) |
Documents that are likely to be invoices because the word "invoice" is in the first 20 words of the document and the document contains at least one of "total", "subtotal" or "balance due". This also eliminates red herring documents where "invoice" is part of the name of a project document or contract. | invoice w/20 xfirstword and (total or subtotal or "balance due") and not ("solution design" w/100 xfirstword) and not (contract w/20 xfirstword) |
Documents with Manhattan, New York City area phone numbers. Parentheses, periods, hyphens and other similar characters do not affect the search as they are ignored when indexing. | "212 === ====" or "332 === ====" or "646 === ====" or "917 === ====" |
Documents where the document name references a purchase order and Tesla or Rivian. | s_y_s4 contains (po or "purchase order" or po*) and s_y_s4 contains (tesla or rivian) |
Example searches | What they mean |
---|---|
apple AND pear | Both words must be present |
apple OR pear | Either word can be present |
apple w/5 pear | Apple must occur within 5 words of pear |
apple NOT w/12 pear | Apple must occur, but not within 12 words of pear |
apple AND NOT pear | Apple must be present and pear must not be present |
apple w/5 xfirstword | Apple must occur in the first five words |
Noise words such as “if” and “the” are ignored in searches. Contact FileHold Professional Services if you would like to change any of the noise words.
Search option characters
Search features such as stemming, fuzzy, synonym and phonic searching can be used with “Contains in FTS” or “Does not contain in FTS” type searches. Search terms may include the following special characters to change the standard behavior of the search:
Character | Meaning | Examples |
---|---|---|
? (question mark) | Matches any single character. | appl? would match apply and apple but not apples |
= (equal) | Matches any single digit. | N=== would match N123 but not N1234 or Nabc |
* (asterisk) | Matches any number of characters. Use to search for a term where the spelling is in question or there are multiple possible spellings. | appl* would match apple, application, etc. *cipl* would match principle, participle, etc. ap*ed would match applied, approved, etc. Using a wildcard like this is sometimes called "truncation". The first example is right truncation, the second is both left and right truncation, and the last is middle truncation. |
% (percent) | Fuzzy search. Fuzzy searching will find a word even if it is misspelled. For example, a fuzzy search for apple will find appple. Fuzzy searching can be useful when you are searching text that may contain typographical errors (such as emails), or for text that has been scanned using optical character recognition (OCR). | ba%nana matches words that begin with ba and have at most one difference between the word and banana. b%%anana matches words that begin with b and have at most two differences between the word and banana. |
# (hash) | Phonic search. Phonic searching will find matches for words that sound like the word you are searching for and begin with the same letter. | #smith will find Smithe and Smythe in addition to Smith. |
~ (tilde) | Stemming. Stemming extends a search to cover grammatical variations of words. Stemming does not slow searches noticeably and is almost always helpful in making sure you find what you want. Contact FileHold Professional Services if you would like to change the stemming rules for non-English languages. | apply~ finds apply, applying, applies, and applied. A search for fish~ would find fishing among other variations. |
& (ampersand) | Synonym search. Synonym searching includes synonyms of the word included in a search request. | A search for fast& would also find results with the word quickly. |
: (colon) | The weight character is used for preferring one part of a search term over another. This impacts the relevance column returned with the search results. It works with specific words or phrases and fields. | apple:5 AND pear:1 would count hits on apple as five times more relevant than hits on pear. (( DocumentMetadata:1 contains ((apple))) OR ( //text:4 contains ((pear)))) weights the relevance more heavily on the contents of the document (pear) than the contents of the metadata (apple). |
~~ (double tilde) | Numeric range searching. Integer values are supported from 0 to 2147483647. For purposes of numeric range searching, decimal points and commas are treated as spaces and minus signs are ignored. For example, -123,456.78 would be interpreted as: 123 456 78 (three numbers). Using alphabet customization, the interpretation of punctuation characters can be changed. For example, if you change the comma and period from space to ignore, then 123,456.78 would be interpreted as 12345678. | (invoice or inv or "invoice number") w/5 8455~~9105 would find text where the words invoice, inv, or invoice number appear within five words of a number in the range 8455 to 9105. |
The default behavior for stemming, fuzzy, synonym and phonic searches is controlled by configuration, as these options can have a significant impact on search performance and/or search behavior.
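The three wildcard characters in the table above (?, = and *) can be approximated with regular expressions. The following is a minimal sketch for illustration only; it is not how the search engine is implemented, and the function names wildcard_to_regex and matches are hypothetical. It assumes matching is case-insensitive and applies to one whole word at a time.

```python
import re

def wildcard_to_regex(term):
    """Translate the ?, = and * wildcards into an equivalent regular expression."""
    parts = []
    for ch in term:
        if ch == "?":
            parts.append(".")        # any single character
        elif ch == "=":
            parts.append(r"\d")      # any single digit
        elif ch == "*":
            parts.append(".*")       # any number of characters, including none
        else:
            parts.append(re.escape(ch))
    # Assumption: matching ignores letter case.
    return re.compile("".join(parts), re.IGNORECASE)

def matches(term, word):
    """Return True if the whole word matches the wildcard term."""
    return wildcard_to_regex(term).fullmatch(word) is not None

print(matches("appl?", "apples"))   # False: ? stands for exactly one character
print(matches("N===", "N123"))      # True: each = stands for one digit
```

The same translation explains the phone number example earlier on this page: each = in "212 === ====" stands for exactly one digit in the corresponding word.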
Words and phrases
To search for a phrase, use quotation marks around it, like this: "fruit salad"
If a phrase contains a noise word, the search engine will skip over the noise word when searching for it. For example, a search for statue of liberty would retrieve any document containing the word statue, any intervening word, and the word liberty.
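The noise word behavior described above can be pictured as a phrase match in which each noise word acts as a placeholder for any single word. This is a rough sketch under that assumption; the NOISE_WORDS set is a small illustrative subset rather than the configured list, and phrase_matches is a hypothetical helper.

```python
# Illustrative subset only; the real noise word list is configurable.
NOISE_WORDS = {"if", "the", "of"}

def phrase_matches(phrase, text):
    """Match a phrase word by word, letting noise words match any single word."""
    query = phrase.lower().split()
    words = text.lower().split()
    for start in range(len(words) - len(query) + 1):
        window = words[start:start + len(query)]
        if all(q in NOISE_WORDS or q == w for q, w in zip(query, window)):
            return True
    return False

print(phrase_matches("statue of liberty", "a statue for liberty"))  # True
print(phrase_matches("statue of liberty", "statue liberty"))        # False
```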
Punctuation inside of a search word is treated as a space. For example:
- can't would be treated as a phrase consisting of two words: can and t.
- 1843(c)(8)(ii) would become 1843 c 8 ii (four words).
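The two examples above can be reproduced by splitting a search word on every run of punctuation. This is a simplified sketch: split_search_word is a hypothetical helper, it only recognizes ASCII letters and digits, and it does not account for the special search characters (such as * or ?) or for any alphabet customization.

```python
import re

def split_search_word(word):
    """Treat each run of non-alphanumeric characters as a space."""
    return [part for part in re.split(r"[^0-9A-Za-z]+", word) if part]

print(split_search_word("can't"))           # ['can', 't']
print(split_search_word("1843(c)(8)(ii)"))  # ['1843', 'c', '8', 'ii']
```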
Connectors
AND
Use the AND connector in a search request to connect two expressions, both of which must be found in any document retrieved. For example, apple pie and poached pear would retrieve any document that contains both phrases.
A search for banana and pear w/5 grape would retrieve any document that (1) contains banana, AND (2) contains pear within 5 words of grape.
OR
Use the OR connector in a search request to connect two expressions, at least one of which must be found in any document retrieved. For example, apple pie or poached pear would retrieve any document that contained apple pie, poached pear, or both.
ANDANY
Use ANDANY to change the relevance of the results. The words before ANDANY are required, and the words after ANDANY are optional. For example, (apple and pear) ANDANY (grape or banana) would find only documents containing the words apple and pear, but grape and banana will also be counted as hits in those documents. A system of "best bets" or documents promoted in search results could be created by combining ANDANY with a metadata field built for the purpose. Assume there is a dropdown metadata field called "Search relevance" with one value "zxy_promote" and an internal field ID of 9. Now a search like (apple and pear) ANDANY (grape or banana) ANDANY m_f9 contains zxy_promote:1000 would cause the relevance column in the results to be increased in value for documents tagged with the value.
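If promoted queries like the one above are generated by a script, the query text can be assembled with simple string formatting. The sketch below only builds the query string; promoted_query is a hypothetical helper, and the default field ID, value and weight are taken from the assumed "Search relevance" example above.

```python
def promoted_query(required, optional, field_id=9, value="zxy_promote", weight=1000):
    """Build an ANDANY query that boosts documents tagged with a promotion value."""
    return (
        f"({required}) ANDANY ({optional}) "
        f"ANDANY m_f{field_id} contains {value}:{weight}"
    )

print(promoted_query("apple and pear", "grape or banana"))
# (apple and pear) ANDANY (grape or banana) ANDANY m_f9 contains zxy_promote:1000
```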
W/N (proximity search)
Use the W/N connector in a search request to specify that one word or phrase must occur within N words of the other. For example, apple w/5 pear would retrieve any document that contained apple within 5 words of pear.
The following are examples of search requests using W/N:
- apple or pear w/5 banana
- apple w/5 banana w/10 pear
- apple and banana w/10 pear
The xfirstword term is useful if you want to limit a search to the beginning of a file. For example, apple w/10 xfirstword would search for apple within 10 words of the beginning of a document.
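To make the idea of proximity concrete, the following sketch checks W/N and xfirstword conditions on a list of words. It is an illustration only: within and within_first_words are hypothetical helpers, text is split on whitespace, and it assumes that "within N words" means the two word positions differ by at most N. The engine's exact counting rules may differ.

```python
def positions(words, term):
    """Return every index at which term occurs."""
    return [i for i, w in enumerate(words) if w == term]

def within(text, a, b, n):
    """Rough equivalent of: a w/n b"""
    words = text.lower().split()
    return any(
        abs(i - j) <= n
        for i in positions(words, a)
        for j in positions(words, b)
    )

def within_first_words(text, term, n):
    """Rough equivalent of: term w/n xfirstword"""
    words = text.lower().split()
    return term in words[:n]

print(within("apple tart with poached pear", "apple", "pear", 5))   # True
print(within_first_words("fresh apple tart", "apple", 10))          # True
```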
NOT and NOT W/N (proximity search)
NOT allows you to exclude documents from a search. For example: "apple sauce" AND NOT pear would return all documents that contain the term "apple sauce" but do not contain the word pear. NOT can be the first word in a search term or it can appear immediately following AND or OR.
The NOT W/ ("not within") operator allows you to search for a word or phrase not in association with another word or phrase. For example: apple NOT w/20 pear.
Unlike the W/ operator, NOT W/ is not symmetrical. That is, apple NOT w/20 pear is not the same as pear NOT w/20 apple. In the apple NOT w/20 pear request, it searches for apple and excludes cases where apple is too close to pear. In the pear NOT w/20 apple request, it searches for pear and excludes cases where pear is too close to apple.
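The asymmetry can be demonstrated with a small sketch. It assumes that a NOT w/n b is satisfied when at least one occurrence of a has no occurrence of b within n word positions; not_within is a hypothetical helper, not part of the search engine.

```python
def not_within(text, a, b, n):
    """Rough equivalent of: a NOT w/n b"""
    words = text.lower().split()
    a_pos = [i for i, w in enumerate(words) if w == a]
    b_pos = [i for i, w in enumerate(words) if w == b]
    # Keep the document if any occurrence of a is far from every b.
    return any(all(abs(i - j) > n for j in b_pos) for i in a_pos)

filler = " ".join(["word"] * 30)
text = f"apple pear {filler} apple"

print(not_within(text, "apple", "pear", 20))  # True: the last apple is far from pear
print(not_within(text, "pear", "apple", 20))  # False: the only pear is next to an apple
```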
PRE/N Connector
The PRE/N connector is like W/N, but it also requires that the first expression occur before the second. For example, (apple or pear) pre/5 banana. You can target search terms near the end of the document with xlastword, for example apple pre/10 xlastword.
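The ordering requirement is the only difference from W/N, which the following sketch makes explicit. As before, pre_within is a hypothetical helper and the distance rule (the second word is at most n positions after the first) is an assumption.

```python
def pre_within(text, first, second, n):
    """Rough equivalent of: first pre/n second"""
    words = text.lower().split()
    first_pos = [i for i, w in enumerate(words) if w == first]
    second_pos = [i for i, w in enumerate(words) if w == second]
    # The second word must come after the first, no more than n positions later.
    return any(0 < j - i <= n for i in first_pos for j in second_pos)

print(pre_within("apple then banana", "apple", "banana", 5))  # True
print(pre_within("banana then apple", "apple", "banana", 5))  # False
```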
CONTAINS
When you index a database or other file containing fields, the search engine saves the field information so that you can perform searches limited to a particular field. For example, suppose that you indexed an Access database with a Name field and a Description field. You could search for apple in the Name field like this:
Name contains apple
The search engine automatically collects field information from: databases, META tags in HTML files, XML, and Office and WordPerfect document properties. FileHold also automatically defines full text index fields for each metadata field, the document name, one special field which is a combination of all metadata field values and the document name, and the document version ID.
- m_fn – Searches on a specific metadata field ID where n is the internal ID number of the metadata field. For example: m_f249 contains PO540069 where the ID of the purchase order metadata field is 249. This search is effectively the same as a metadata field search on the purchase order number using contains in FTS. It can be useful for cases where you want to use the OR connector on two metadata fields or other information in the full text index.
- s_y_s4 - Searches on the document name. For example, s_y_s4 contains "quarter end results". This search is effectively the same as a document name search using contains in FTS.
- DocumentMetadata – Searches all metadata. For example: DocumentMetadata contains "apple sauce" searches for the term “apple sauce” in all metadata fields and the document name. This search is effectively the same as a metadata only search for "apple sauce". The metadata fields and document name are all connected as a long string in this field. However, each field is separated by the arbitrary text ZQYXZ. This is intended to prevent strange results in rare cases. For example, if one metadata field ended with the word apple and the next one started with the word sauce, the earlier example would return a nonsensical result as the phrase "apple sauce" does not really exist. Since the actual value stored would be ... apple ZQYXZ sauce ... the phrase "apple sauce" would not accidentally be created during indexing.
- LMID – Searches on the internal document version ID. For example: LMID _2_0_1_ retrieves the document with document version 201. The document version ID is not normally user accessible information, but it can be useful for diagnosing issues with the full text index.
- //text - Searches only the contents of the file. For example, //text contains "lorem ipsum". This search is effectively the same as a file only search for "lorem ipsum".
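The purpose of the ZQYXZ separator in the DocumentMetadata field can be shown with a short sketch. It compares joining field values with and without a separator and then checks for the phrase. The helper names and the sample field values are hypothetical, and the phrase check is a plain word-sequence comparison rather than the engine's own matching.

```python
SEPARATOR = "ZQYXZ"

def combined_field(values, separator=None):
    """Join field values into one string, optionally with a separator word."""
    glue = f" {separator} " if separator else " "
    return glue.join(values)

def contains_phrase(text, phrase):
    """Return True if the words of phrase appear consecutively in text."""
    words = text.lower().split()
    target = phrase.lower().split()
    return any(
        words[i:i + len(target)] == target
        for i in range(len(words) - len(target) + 1)
    )

# One field ends with "apple" and the next begins with "sauce".
fields = ["green apple", "sauce pan"]

print(contains_phrase(combined_field(fields), "apple sauce"))             # True: false hit
print(contains_phrase(combined_field(fields, SEPARATOR), "apple sauce"))  # False
```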
You can also define a field on-the-fly at the time of a search by designating words that begin and end the field using TO. For example, (beginning to end) contains (something).
The beginning TO end part defines the boundaries of the field. The CONTAINS part indicates the words or phrases you are searching for in the field. The only connector allowed in the beginning and end expressions in a field definition is OR. This method for defining fields in the arbitrary content of the text is useful for isolating portions of regular documents like forms. For example, if a form has a section titled "Name" and another titled "Address", possible queries might be (name to address) contains john smith, or (name to (address or xlastword)) contains (oak w/10 lane).
The field boundaries are not considered hits in a search. Only the words being searched for (john smith, oak, lane) are marked as hits.
Punctuation from field names is removed when indexing, with these six exceptions: :&_+=.. Spaces are also removed and hyphens are mapped to underscore characters.
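The field name rules above can be sketched as a small normalization function, which may help when predicting the name to use with CONTAINS. This is an approximation under stated assumptions: normalize_field_name is a hypothetical helper, "punctuation" is taken to mean ASCII punctuation only, and hyphens are mapped to underscores before other punctuation is removed.

```python
import string

# The six punctuation characters that are kept in field names.
KEPT = set(":&_+=.")

def normalize_field_name(name):
    """Approximate how a field name is stored in the index."""
    out = []
    for ch in name:
        if ch == "-":
            out.append("_")      # hyphens are mapped to underscores
        elif ch == " ":
            continue             # spaces are removed
        elif ch in string.punctuation and ch not in KEPT:
            continue             # all other punctuation is removed
        else:
            out.append(ch)
    return "".join(out)

print(normalize_field_name("Purchase-Order #"))  # Purchase_Order
```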