Search Engine Errors

The Search Engine Errors report proactively warns Librarians and System Administrators about documents that cannot be indexed because of encryption, macro security, or digital rights management. For example, a PDF can be encrypted so that the text inside the document is locked and cannot be searched. Similarly, the search engine is blocked from indexing the contents of a Microsoft Excel worksheet that is protected with a password or macro-level security. Note that metadata and title searches still work on encrypted files.

This report can also warn of access permission problems or other IT issues that affect full text search operations. The Domain\FileHold Service account that runs the entire FileHold server system needs full control of the FullTextSearch collection, and the report warns if file permission changes prevent that account from accessing the FullTextSearch collection folder structure.

The Full Text Search report is emailed nightly by a scheduled task on the FileHold server, which sends it via SMTP to your email server for delivery. See Search Engine Configuration for how to set the email address.

Not every alert requires action. The alert emails can include the following types of errors:

  • Files that cannot be indexed for a variety of reasons.

  • Searches for overly common words. When a user searches for a word or phrase that appears in the body of most of the documents in the collection, the report may contain a message like this:

    • Search Job error(s):
      / $E 0137 Too many words retrieved in index E:\FileHoldData\FullTextSearch\DTSIndex a*:2: 65530; financial: 6

    • In this example, a user searched for "a* financial*". This matched virtually every document in the 250,000-document repository: the word "financial" appeared in almost every document (the organization is an investment company), and "a*" uses a wildcard, so any word beginning with the letter "a" near the word "financial" is a candidate match.

    • A user who searches using a single character with a wildcard receives the warning "The full text search query has invalid syntax".

  • Access errors, which record that at a specific point in time something in the FileHoldData repository could not be accessed by the FH_Service account.

    • The DTSearch\FTS folder that contains the Full Text Search (FTS) index files cannot be accessed by the Service account that runs FileHold. This folder is sometimes stored locally on the FileHold web server and sometimes on a NAS or SAN. Permissions can change, or there may be a network issue. Work with your IT department to make sure the Service account that runs FileHold has full control over this directory structure. To quickly find the name of the Service account, go to the FileHold server and check which account runs the FTS Update Index scheduled task or other FileHold tasks. You can also confirm it in the SQL Server security logins, or as the identity of the FH App Pool in the IIS 6 or IIS 7 administration console.

  • Missing full text search files. Antivirus and security software occasionally removes index files. Although this is rare, it is the most common cause of missing files. The index files are used heavily by system and FileHold processes, and some antivirus software treats heavy file activity as suspicious and takes action.
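When a permission change or antivirus activity is suspected, a quick check from the server can help narrow things down before involving IT. The following is a minimal Python sketch, assuming Python 3 is available on the FileHold server and the script is run as the FileHold Service account. The function name check_fts_folder is hypothetical and not part of FileHold. It tries to create and delete a probe file in the index folder, tries to read every file under it, and flags an empty folder. It does not inspect ACLs directly, so it cannot prove the account has full control; it only shows whether basic read and write operations succeed.

```python
import os
import tempfile
from pathlib import Path


def check_fts_folder(root):
    """Return a list of problems found under a full text search index folder.

    Hypothetical helper, not part of FileHold. Run it as the FileHold Service
    account: every check uses the permissions of the account running the script.
    """
    root = Path(root)
    if not root.is_dir():
        return [f"folder missing or not reachable: {root}"]

    problems = []

    # Write test: create and immediately delete a temporary probe file.
    try:
        with tempfile.NamedTemporaryFile(dir=root):
            pass
    except OSError as exc:
        problems.append(f"cannot write to {root}: {exc}")

    # Read test: open every file in the tree and read one byte.
    file_count = 0

    def on_error(exc):
        # Called by os.walk when a subfolder cannot be listed.
        problems.append(f"cannot list {exc.filename}: {exc}")

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_error):
        for name in filenames:
            path = Path(dirpath) / name
            file_count += 1
            try:
                with open(path, "rb") as fh:
                    fh.read(1)
            except OSError as exc:
                problems.append(f"cannot read {path}: {exc}")

    # An empty index folder suggests the files were removed (e.g. by antivirus).
    if file_count == 0:
        problems.append(f"no index files found under {root}")
    return problems


if __name__ == "__main__":
    # Example path taken from the error message above; adjust for your server.
    for problem in check_fts_folder(r"E:\FileHoldData\FullTextSearch\DTSIndex"):
        print(problem)
```

An empty result only means the tested operations succeeded for the account that ran the script; if the script runs under a different account, the results say nothing about the FileHold Service account.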

To manage search engine errors

  1. To hide known errors, click Hide / Hide All.
  2. To show all errors, click Show / Show All.
  3. To create a report of all errors, click Create CSV. Save the file and open it in Microsoft Excel to modify the report, or send it to support@filehold.com for analysis.
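For a quick tally of an exported CSV before opening it in Excel, a minimal Python sketch like the following counts how often each value appears in one column. The column layout of the Create CSV export is not described here, so the column name is passed in as a parameter, and the function name summarize_errors and the column name "Error" in the usage example are hypothetical. The sketch also assumes the file is UTF-8 encoded, with or without a byte order mark.

```python
import csv
from collections import Counter


def summarize_errors(csv_path, column):
    """Count how often each distinct value appears in one column of a CSV.

    Hypothetical helper; `column` must match a header in the exported file.
    """
    # utf-8-sig also accepts files saved with a byte order mark (common with Excel).
    with open(csv_path, newline="", encoding="utf-8-sig") as fh:
        reader = csv.DictReader(fh)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found; headers are {reader.fieldnames}")
        return Counter(row[column] for row in reader)
```

For example, summarize_errors("search_errors.csv", "Error").most_common(10) would list the ten most frequent values in that column, which can help you decide which known errors are safe to hide.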

 

See Also:

Rebuilding the Full Text Search

Search Engine Status

Un-indexed Files

Search Engine Configuration