How FileHold searches for documents

Users can only retrieve documents to which they have access. Access is based on user rights and permissions. For a document to appear in a user's search results, the user must be a member of the cabinet, folder, and schema that the document belongs to.
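The access rule above amounts to a three-way membership check applied to every search hit. The following is a minimal sketch of that idea only. The `Document` and `User` classes and the `can_see` and `visible_results` functions are hypothetical names chosen for illustration, not part of any FileHold API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    name: str
    cabinet: str
    folder: str
    schema: str


@dataclass
class User:
    # Hypothetical membership sets; the real product manages these through
    # its own rights and permissions model.
    cabinets: set = field(default_factory=set)
    folders: set = field(default_factory=set)
    schemas: set = field(default_factory=set)


def can_see(user: User, doc: Document) -> bool:
    """Return True only if the user is a member of all three containers."""
    return (
        doc.cabinet in user.cabinets
        and doc.folder in user.folders
        and doc.schema in user.schemas
    )


def visible_results(user: User, hits: list) -> list:
    """Drop any search hit the user has no access to."""
    return [doc for doc in hits if can_see(user, doc)]
```

In this sketch, a user who belongs to the cabinet and folder but not the schema never sees the document, which is why a document can exist in the library and still be missing from one person's results.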

The document management system conducts searches for documents as follows:

  • The "Contains FTS" operator searches metadata, full text file content, and file properties. It searches whole words only, but you can search for partial words using wildcards and build very complex search expressions.
  • The "Contains DB" operator searches only metadata and version properties. It does not have any concept of words, but words, partial words, and phrases can be searched using wildcards.
  • Searching specific metadata fields or version properties typically produces more accurate results than a full text search.
  • You can get faster results if you do not use the "Contains FTS" operator, since the full text index will not be searched.
  • When the user invokes search functionality by right-clicking a cabinet, drawer, folder group, or folder, the search defaults to including only documents in that portion of the library.
  • By default, the contents of the library archive are not included in search results. Select the Include in Archive check box in the Advanced Search options to expand the search to include the Library Archive.
  • If using a simple search or the Boolean search, the search engine ignores all document fields such as those created by file properties.

  • Documents that have been soft or hard deleted from the system are not included in search results.
  • By default, only the latest version of a document is searched. The document usage history and document version history are not included in the search scope. To expand the search to include all document versions, select the Include All Document Versions check box in the advanced search options.
  • Search results always come from the contents of folders. My FileHold results, search results, virtual folders, and document tray contents are not searched as these are only temporary links to the documents in library or library archive folders.
  • Metadata field names that have been edited or deleted are not searched. To search using old metadata field names, select the Search Using Historical Metadata Fields check box in the advanced search options.

When you create a search with multiple criteria, any search criteria using the Contains FTS operator are executed first. The results from the Contains FTS searches are then combined with the results from any other search criteria to produce the final result set. Regardless of how many Contains FTS criteria there are, they are all executed at one time.

If the Contains FTS criteria are very broad, the search may take a long time even if other search criteria would narrow it. If the criteria are too broad, the search may time out and produce no results. By default, this first phase of a search using a Contains FTS operator is not limited in the number of documents that can be returned. Contains FTS searches that can return very large numbers of documents can have a negative impact on overall system performance. This default can be adjusted.
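The execution order described above can be pictured as a two-phase pipeline. The sketch below is illustrative only and makes several assumptions: documents are plain dictionaries, full-text matching is a simple whole-word lookup, and "combined" means that a document must satisfy every criterion (a logical AND). The function name `two_phase_search` is hypothetical.

```python
def two_phase_search(documents, fts_terms, other_filters):
    """Return (intermediate_count, final_ids) for a list of document dicts.

    Each document is assumed to have an 'id' and a 'text' key plus any
    metadata keys that the other filters inspect.
    """
    # Phase 1: evaluate every full-text term in one pass over the documents.
    fts_ids = {
        doc["id"]
        for doc in documents
        if all(term.lower() in doc["text"].lower().split() for term in fts_terms)
    }

    # Phase 2: combine with the remaining criteria (assumed to be AND).
    final_ids = [
        doc["id"]
        for doc in documents
        if doc["id"] in fts_ids and all(check(doc) for check in other_filters)
    ]
    return len(fts_ids), final_ids
```

The first value returned is the size of the intermediate result set. Even when the final result is small, a broad full-text term can make that intermediate set very large, which is the cost the paragraph above warns about.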

General usage

  1. Avoid searching for two (2) letter words on their own. Instead, use words of three (3) or more letters, or add a wildcard (*) before or after the two letters. Two-letter words are not searchable for several technical reasons:
     • The two letters can appear inside a longer word or, very commonly, on their own. Using them as search criteria would greatly slow down the FileHold search system.
     • They also reduce the quality of search results to unacceptable levels, because they appear so frequently inside larger words and on their own. The document management system is designed to ignore common words or characters, as they tend to slow searches without improving the quality of the results. See the table of noise words below.
     • For example, to find a document that describes "ATCommands for communicating with a cellular phone or communication hardware device", we recommend entering ATC* in the search bar. As a side note, the single word "at" is ignored by the full text search engine.
  2. Use whole words when searching for words inside documents. Simple search acts like a "Google" search, and partial words may not return the results you are looking for or may not return any results. If you need to search for partial words, use a wildcard (* or ?) in the search.
  3. If you use a very generic search term, the search engine may time out. Look for unique words or terms, or limit the search to a specific area of the library.
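The whole-word and wildcard guidance above can be illustrated with a small matcher. This is a sketch of the general idea, not the behavior of the FileHold search engine: it assumes text is split into words on non-word characters, that matching is case-insensitive, and that * and ? behave like shell-style wildcards. The function name `whole_word_match` is hypothetical.

```python
import fnmatch
import re


def whole_word_match(term: str, text: str) -> bool:
    """Return True if the term (optionally with * or ?) matches a whole word."""
    words = re.findall(r"\w+", text.lower())
    pattern = term.lower()
    # fnmatchcase compares the pattern against the entire word, so a term
    # without wildcards only matches when it equals a word exactly.
    return any(fnmatch.fnmatchcase(word, pattern) for word in words)


sample = "ATCommands for communicating with a cellular phone"
print(whole_word_match("ATC", sample))   # partial word, no wildcard
print(whole_word_match("ATC*", sample))  # partial word with a wildcard
print(whole_word_match("phon?", sample)) # ? stands for exactly one character
```

Under these assumptions, the partial word on its own finds nothing, while the same letters followed by a wildcard match the full word.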

Full text search - Wildcard search limitation setting

When a wildcard (*) search is performed, or a wildcard is used with the “contains” operator in an advanced search, a term that is common in the repository may return a very large number of intermediate search results that must be processed. A limit can be set on the number of results returned by such a search. Take care not to set a value that makes typical searches impractical. For most users, this value does not need to be changed from the default, which returns all matching results that can be found before the full text search timeout expires.

The new entry in the web config file in C:\Program Files\FileHold Systems\Application Server\FullTextSearch is under <appSettings>:

<add key="LimitNumberOfEntriesToReturn" value="0" />

If the value is set to 0, then the limitation is disabled. It is disabled by default.

If a wildcard full text search exceeds the limit set in the web config file then a message is displayed:

“Your search with the CONTAINS operator would consume more server resources than allowed by your system administrator. Consider narrowing your search conditions to reduce the possible results. Your search would have returned {x} documents and your system administrator has set a limit of {x} documents.”
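To make the setting concrete, the sketch below reads the key from a configuration fragment and applies the rule described above. It is an illustration under stated assumptions: the `<configuration>` root element follows the usual .NET configuration layout, the value 500 is an arbitrary example, and the `read_limit` and `check_limit` functions are hypothetical helpers rather than FileHold code.

```python
import xml.etree.ElementTree as ET

# Example fragment; the root element and the value 500 are assumptions.
SAMPLE_CONFIG = """<configuration>
  <appSettings>
    <add key="LimitNumberOfEntriesToReturn" value="500" />
  </appSettings>
</configuration>"""


def read_limit(config_xml: str) -> int:
    """Return the configured limit, or 0 (disabled) if the key is absent."""
    root = ET.fromstring(config_xml)
    for entry in root.iterfind("./appSettings/add"):
        if entry.get("key") == "LimitNumberOfEntriesToReturn":
            return int(entry.get("value", "0"))
    return 0


def check_limit(hit_count: int, limit: int):
    """Return None if the search may proceed, otherwise a short message."""
    if limit == 0 or hit_count <= limit:
        return None
    return (
        f"Your search would have returned {hit_count} documents and your "
        f"system administrator has set a limit of {limit} documents."
    )
```

With a limit of 0 the check never blocks a search, which mirrors the disabled default. A search is only refused when the number of results exceeds a non-zero limit.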

Noise words (Ignored words)

A noise word is a word such as "the" or "if" that is so common that it is not useful in searches. To save time, noise words are not indexed and are ignored in index searches. All single letters are ignored, as are the words listed in the table below.

Letter    Noise Words

A         a, about, after, all, also, an, and, another, any, are, as, at
B         be, because, been, before, being, between, both, but, by
C         came, can, come, could
D         did, do
E         each, even
F         for, from, further, furthermore
G         get, got
H         had, has, have, he, her, here, hi, him, himself, how, however
I         i, if, in, indeed, into, is, it, its
J         just
L         like
M         made, many, me, might, more, moreover, most, much, must, my
N         never, not, now
O         of, on, only, or, other, our, out, over
S         said, same, see, she, should, since, some, still, such
T         take, than, that, the, their, them, then, there, therefore, these, they, this, those, through, thus, to, too
U         under, up
V         very
W         was, way, we, well, were, what, when, where, which, while, who, will, with, would
Y         you, your
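
The effect of the noise word list on a query can be previewed with a short filter. This is a sketch, not the search engine's implementation: it assumes the query is split on whitespace and compared case-insensitively, and the `is_noise` and `effective_terms` functions are hypothetical names. The word set is copied from the table above.

```python
NOISE_WORDS = set("""
a about after all also an and another any are as at
be because been before being between both but by
came can come could did do each even
for from further furthermore get got
had has have he her here hi him himself how however
i if in indeed into is it its just like
made many me might more moreover most much must my
never not now of on only or other our out over
said same see she should since some still such
take than that the their them then there therefore these they this those
through thus to too under up very
was way we well were what when where which while who will with would
you your
""".split())


def is_noise(term: str) -> bool:
    """Return True for single letters and for words in the noise list."""
    term = term.lower()
    return (len(term) == 1 and term.isalpha()) or term in NOISE_WORDS


def effective_terms(query: str) -> list:
    """Return the query terms that would remain after noise words are dropped."""
    return [term for term in query.split() if not is_noise(term)]
```

Running a query through `effective_terms` before searching shows which words actually contribute. If the list comes back empty, the query consists entirely of ignored words and should be reworded.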