File formats supported for full text searching

The following file formats are supported for full text indexing and searching of documents in the document management system.

A Windows administrator can narrow the list of indexed document types. For example, indexing the binary content of EXE files or the rows of a database may not be useful.
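How this narrowing is configured on the server is not described here. As a rough illustration only, the Python sketch below shows the idea of an exclusion list keyed on file extension. The set EXCLUDED_EXTENSIONS and the function should_index are hypothetical names, not part of FileHold.

```python
from pathlib import Path

# Hypothetical exclusion list: extensions an administrator has chosen not to
# index, such as executables and database files.
EXCLUDED_EXTENSIONS = {".exe", ".dll", ".mdb", ".accdb"}


def should_index(filename: str, excluded=EXCLUDED_EXTENSIONS) -> bool:
    """Return True unless the file's extension is on the exclusion list."""
    # Lower-case the suffix so "SETUP.EXE" and "setup.exe" are treated alike.
    return Path(filename).suffix.lower() not in excluded


files = ["report.docx", "setup.EXE", "customers.mdb", "notes.txt"]
print([f for f in files if should_index(f)])  # ['report.docx', 'notes.txt']
```

Filtering on extension is the simplest possible rule; a real indexer may also inspect file content.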

Supported file formats

Adobe Acrobat (*.pdf)

Adobe Framemaker MIF (*.mif)

Ami Pro (*.sam)

Ansi Text (*.txt)

ASCII Text

ASF media files (metadata only) (*.asf)

CSV (Comma-separated values) (*.csv)

DBF (*.dbf)

EBCDIC

EML files (emails saved by Outlook Express) (*.eml)

Enhanced Metafile Format (*.emf)

Eudora MBX message files (*.mbx)

Flash (*.swf)

GZIP (*.gz)

HTML (*.htm, *.html)

Ichitaro (versions 5 and later)

JPEG (*.jpg)

Lotus 1-2-3 (*.123, *.wk?)

MBOX email archives, such as those created by Thunderbird, including attachments. In all supported email formats, attachments, including nested attachments (for example, a .doc inside a ZIP attached to an email), are indexed as part of the main document by default.

MHT archives (HTML archives saved by Internet Explorer) (*.mht)

MIME messages, including attachments.

MSG files (emails saved by Outlook), including attachments (*.msg).

Microsoft Access MDB files (*.mdb, *.accdb, including Access 2007 and Access 2010)

Microsoft Document Imaging (*.mdi)

Microsoft Excel (*.xls)

Microsoft Excel 2003 XML (*.xml)

Microsoft Excel 2007 and 2010 (*.xlsx)

Microsoft Outlook data files, including attachments.

Microsoft Outlook/Exchange Messages, Notes, Contacts, Appointments, and Tasks

Microsoft Outlook Express 5 and 6 (*.dbx) message stores

Microsoft PowerPoint (*.ppt)

Microsoft PowerPoint 2007 and 2010 (*.pptx)

Microsoft Rich Text Format (*.rtf)

Microsoft Searchable Tiff (*.tiff)

Microsoft Word for DOS (*.doc)

Microsoft Word for Windows (*.doc)

Microsoft Word 2003 XML (*.xml)

Microsoft Word 2007 and 2010 (*.docx)

Microsoft Works (*.wks)

MP3 (metadata only) (*.mp3)

Multimate Advantage II (*.dox)

Multimate version 4 (*.doc)

OpenOffice versions 1, 2, and 3 documents, spreadsheets, and presentations (*.sxc, *.sxd, *.sxi, *.sxw, *.sxg, *.stc, *.sti, *.stw, *.stm, *.odt, *.ott, *.odg, *.otg, *.odp, *.otp, *.ods, *.ots, *.odf) (includes OASIS Open Document Format for Office Applications)

Quattro Pro (*.wb1, *.wb2, *.wb3, *.qpw)

QuickTime (*.mov, *.m4a, *.m4v)

RAR (*.rar). RAR support currently applies to the Windows version.

TAR (*.tar)

TIFF (*.tif)

TNEF (winmail.dat files)

Treepad HJT files (*.hjt)

Unicode (UCS16, Mac or Windows byte order, or UTF-8)

Visio XML files (*.vdx)

Windows Metafile Format (*.wmf)

WMA media files (metadata only) (*.wma)

WMV video files (metadata only) (*.wmv)

WordPerfect 4.2 (*.wpd, *.wpf)

WordPerfect (5.0 and later) (*.wpd, *.wpf)

WordStar version 1, 2, 3 (*.ws)

WordStar versions 4, 5, 6 (*.ws)

WordStar 2000

Write (*.wri)

XBase (including FoxPro, dBase, and other XBase-compatible formats) (*.dbf)

XML (*.xml)

XML Paper Specification (*.xps)

XSL

XyWrite

ZIP (*.zip)
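The attachment handling described in the email entries above, where attachments and nested attachments are indexed as part of the main document, can be sketched with Python's standard email and zipfile modules. This is a simplified illustration, not FileHold's implementation: collect_text is a hypothetical helper, and because the standard library cannot read .doc files, the nested attachment in the example is a .txt file inside a ZIP rather than a .doc.

```python
import io
import zipfile
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def collect_text(msg: EmailMessage) -> list[str]:
    """Gather indexable text from an email, descending into ZIP attachments."""
    texts = []
    for part in msg.walk():  # walk() also visits attached messages
        if part.is_multipart():
            continue  # multipart containers have no content of their own
        ctype = part.get_content_type()
        if ctype.startswith("text/"):
            texts.append(part.get_content())
        elif ctype == "application/zip":
            data = part.get_payload(decode=True)
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                for name in archive.namelist():
                    # Simplification: only .txt members, read as UTF-8.
                    if name.endswith(".txt"):
                        texts.append(archive.read(name).decode("utf-8"))
    return texts


# Build a sample email whose ZIP attachment contains a text file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("minutes.txt", "Budget approved")

msg = EmailMessage()
msg["Subject"] = "Meeting"
msg.set_content("See the attached minutes.")
msg.add_attachment(buf.getvalue(), maintype="application",
                   subtype="zip", filename="docs.zip")

parsed = BytesParser(policy=policy.default).parsebytes(msg.as_bytes())
print(collect_text(parsed))
```

All text ends up in one list for the parent message, which mirrors the idea of indexing attachments as part of the main document.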

Automatically-detected fields

FileHold document management software automatically detects fields in the following file formats:

Email files (Outlook Express, Eudora, MBOX, EML): To, CC, BCC, From, Sent Via, Sender, Recipient, Subject, Date, Attachments

Outlook items and .MSG files: To, CC, BCC, From, Sent Via, Sender, Recipient, Subject, Date, Sent Date, Delivered Date, Attachments, contact fields (StreetAddress, CompanyName, etc.)

Microsoft Word, Excel, PowerPoint: Document summary information fields

OpenOffice/Open Document Format: Document properties fields

HTML: META tags; <TITLE> is indexed as the HtmlTitle field; <H1>, <H2>, <H3> are indexed as HtmlH1, HtmlH2, HtmlH3, etc.

XML: All fields

DBF: All fields

CSV: All fields (CSV files must have a .csv extension, a list of field names in the first line, and must use tab, comma, or semicolon delimiters)

PDF files: Document Properties

WordPerfect: Document summary information fields

MP3: All metadata fields

JPG, TIFF: EXIF and IPTC metadata fields; XMP (Vista) metadata supported in version 7.40

ASF, WMA, WMV: All metadata fields
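Two of the rules above, the HTML tag-to-field mapping and the CSV requirement for field names on the first line, can be illustrated with Python's standard html.parser and csv modules. This is a sketch under stated assumptions: HtmlFieldParser and csv_fields are hypothetical names, only H1 to H3 are mapped, and delimiter detection uses csv.Sniffer restricted to tab, comma, and semicolon.

```python
import csv
import io
from html.parser import HTMLParser


class HtmlFieldParser(HTMLParser):
    """Collect META tags, <title>, and <h1>-<h3> text as named fields."""

    # Tag-to-field names as listed above; this sketch stops at H3.
    FIELD_FOR_TAG = {"title": "HtmlTitle", "h1": "HtmlH1",
                     "h2": "HtmlH2", "h3": "HtmlH3"}

    def __init__(self):
        super().__init__()
        self.fields = {}      # field name -> list of values
        self._current = None  # field being filled while inside a mapped tag

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.fields.setdefault(attrs["name"], []).append(attrs["content"])
        elif tag in self.FIELD_FOR_TAG:
            self._current = self.FIELD_FOR_TAG[tag]

    def handle_endtag(self, tag):
        if tag in self.FIELD_FOR_TAG:
            self._current = None

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields.setdefault(self._current, []).append(data.strip())


def csv_fields(text: str) -> list[dict]:
    """Read CSV rows as dicts keyed by the field names on the first line."""
    header = text.splitlines()[0]
    # Restrict detection to the three delimiters the rule above allows.
    dialect = csv.Sniffer().sniff(header, delimiters=",;\t")
    return list(csv.DictReader(io.StringIO(text), dialect=dialect))


page = ('<html><head><title>Q3 Report</title>'
        '<meta name="author" content="J. Doe"></head>'
        '<body><h1>Summary</h1><h2>Revenue</h2><p>Body text</p></body></html>')
parser = HtmlFieldParser()
parser.feed(page)
parser.close()
print(parser.fields)

print(csv_fields("Name;Amount;City\nSmith;120;Oslo\nJones;75;Lyon\n"))
```

Sniffing only the header line keeps delimiter detection tied to the field-name row, which is the line the CSV rule requires.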

Other file formats supported

FileHold document management software still indexes, searches, and displays files in other formats, but treats them as binary files: binary codes and other non-text content are displayed along with the text.

Image formats

FileHold document management software can extract and display embedded images in these document formats: Word (.doc/.docx), PowerPoint (.ppt/.pptx), Excel (.xls/.xlsx), Access (.mdb/.accdb), RTF, and email files, including Thunderbird (mbox/.eml) and Outlook (.pst/.msg) files. Images are displayed using the HTML <img> tag without conversion, so only formats that a browser can display, such as .jpg and .png, will appear.
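The display rule can be illustrated with a short Python sketch. It assumes a particular set of browser-displayable image types and embeds the image as a data URI; both choices, and the helper name img_tag, are hypothetical and not taken from FileHold.

```python
import base64
import html
import mimetypes
from typing import Optional

# Assumption: image types that common browsers render natively.
BROWSER_IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif",
                       "image/webp", "image/svg+xml"}


def img_tag(filename: str, data: bytes) -> Optional[str]:
    """Return an <img> tag for an extracted image, or None if it is not displayable."""
    mime, _ = mimetypes.guess_type(filename)
    if mime not in BROWSER_IMAGE_TYPES:
        return None  # no conversion is attempted, so the image is not shown
    encoded = base64.b64encode(data).decode("ascii")
    return f'<img src="data:{mime};base64,{encoded}" alt="{html.escape(filename)}">'


print(img_tag("chart.png", b"\x89PNG\r\n\x1a\n"))  # an <img> tag with a data: URI
print(img_tag("scan.tif", b"II*\x00"))            # None: TIFF is not in the assumed set
```

Skipping unsupported types rather than converting them matches the behavior described above: such images simply do not appear.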

Embedded object and attachment extraction

Embedded objects and attachments are indexed as part of the document that contains them. For example, a spreadsheet object embedded in a PowerPoint presentation would be treated as part of the PowerPoint presentation.

International language support

FileHold document management software supports searching in all languages through Unicode. Text in any character set supported by the Unicode standard, including non-English text, can be indexed and searched.
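Part of handling Unicode text is recognizing its encoding. The list of supported formats mentions 16-bit Unicode in Mac or Windows byte order (presumably big-endian and little-endian) and UTF-8. The Python sketch below detects these from a byte-order mark and then performs a case-insensitive search with str.casefold. The function names are hypothetical, and treating text without a byte-order mark as UTF-8 is an assumption.

```python
import codecs

# Byte-order marks and the codec that decodes (and strips) each one.
# UTF-32 is not handled in this sketch.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),  # little-endian ("Windows" byte order)
    (codecs.BOM_UTF16_BE, "utf-16"),  # big-endian ("Mac" byte order)
]


def decode_unicode_text(data: bytes) -> str:
    """Decode bytes using their byte-order mark; fall back to UTF-8."""
    for bom, codec in BOMS:
        if data.startswith(bom):
            return data.decode(codec)  # these codecs consume the BOM
    return data.decode("utf-8")  # assumption: no BOM means UTF-8


def contains(text: str, term: str) -> bool:
    """Case-insensitive search that also folds characters such as German ß."""
    return term.casefold() in text.casefold()


sample = "Straße in Zürich, 東京"
raw = codecs.BOM_UTF16_BE + sample.encode("utf-16-be")
text = decode_unicode_text(raw)
print(contains(text, "STRASSE"), contains(text, "東京"))  # True True
```

casefold is used instead of lower because it also matches "STRASSE" against "Straße".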