File formats supported for full text searching
The document management system supports full text content indexing and searching for documents in the following file formats.
A Windows administrator can narrow the list of indexed document types. For example, it may not be useful to index the binary content of an EXE file or the rows in a database.
Supported file formats
Adobe Acrobat (*.pdf)
Adobe Framemaker MIF (*.mif)
Ami Pro (*.sam)
Ansi Text (*.txt)
ASCII Text
ASF media files (metadata only) (*.asf)
CSV (Comma-separated values) (*.csv)
DBF (*.dbf)
EBCDIC
EML files (emails saved by Outlook Express) (*.eml)
Enhanced Metafile Format (*.emf)
Eudora MBX message files (*.mbx)
Flash (*.swf)
GZIP (*.gz)
HTML (*.htm, *.html)
Ichitaro (versions 5 and later)
JPEG (*.jpg)
Lotus 1-2-3 (*.123, *.wk?)
MBOX email archives such as Thunderbird, including attachments. In all supported email formats, attachments, including nested attachments (for example, a .doc inside a ZIP attached to an email), are indexed as part of the main document by default.
MHT archives (HTML archives saved by Internet Explorer) (*.mht)
MIME messages, including attachments. In all supported email formats, attachments, including nested attachments (for example, a .doc inside a ZIP attached to an email), are indexed as part of the main document by default.
MSG files (emails saved by Outlook), including attachments (*.msg). In all supported email formats, attachments, including nested attachments (for example, a .doc inside a ZIP attached to an email), are indexed as part of the main document by default.
Microsoft Access MDB files (*.mdb, *.accdb, including Access 2007 and Access 2010)
Microsoft Document Imaging (*.mdi)
Microsoft Excel (*.xls)
Microsoft Excel 2003 XML (*.xml)
Microsoft Excel 2007 and 2010 (*.xlsx)
Microsoft Outlook data files, including attachments. In all supported email formats, attachments, including nested attachments (for example, a .doc inside a ZIP attached to an email), are indexed as part of the main document by default.
Microsoft Outlook/Exchange Messages, Notes, Contacts, Appointments, and Tasks
Microsoft Outlook Express 5 and 6 (*.dbx) message stores
Microsoft PowerPoint (*.ppt)
Microsoft PowerPoint 2007 and 2010 (*.pptx)
Microsoft Rich Text Format (*.rtf)
Microsoft Searchable TIFF (*.tiff)
Microsoft Word for DOS (*.doc)
Microsoft Word for Windows (*.doc)
Microsoft Word 2003 XML (*.xml)
Microsoft Word 2007 and 2010 (*.docx)
Microsoft Works (*.wks)
MP3 (metadata only) (*.mp3)
Multimate Advantage II (*.dox)
Multimate version 4 (*.doc)
OpenOffice versions 1, 2, and 3 documents, spreadsheets, and presentations (*.sxc, *.sxd, *.sxi, *.sxw, *.sxg, *.stc, *.sti, *.stw, *.stm, *.odt, *.ott, *.odg, *.otg, *.odp, *.otp, *.ods, *.ots, *.odf) (includes OASIS Open Document Format for Office Applications)
Quattro Pro (*.wb1, *.wb2, *.wb3, *.qpw)
QuickTime (*.mov, *.m4a, *.m4v)
RAR (*.rar). RAR support currently applies to the Windows version.
TAR (*.tar)
TIFF (*.tif)
TNEF (winmail.dat files)
Treepad HJT files (*.hjt)
Unicode (UCS16, Mac or Windows byte order, or UTF-8)
Visio XML files (*.vdx)
Windows Metafile Format (*.wmf)
WMA media files (metadata only) (*.wma)
WMV video files (metadata only) (*.wmv)
WordPerfect 4.2 (*.wpd, *.wpf)
WordPerfect (5.0 and later) (*.wpd, *.wpf)
WordStar version 1, 2, 3 (*.ws)
WordStar versions 4, 5, 6 (*.ws)
WordStar 2000
Write (*.wri)
XBase (including FoxPro, dBase, and other XBase-compatible formats) (*.dbf)
XML (*.xml)
XML Paper Specification (*.xps)
XSL
XyWrite
ZIP (*.zip)
Automatically-detected fields
FileHold document management software automatically detects fields in the following file formats:
File format | Fields |
---|---|
Email files (Outlook Express, Eudora, MBOX, EML) | To, CC, BCC, From, Sent Via, Sender, Recipient, Subject, Date, Attachments |
Outlook items and .MSG files | To, CC, BCC, From, Sent Via, Sender, Recipient, Subject, Date, Sent Date, Delivered Date, Attachments, contact fields (StreetAddress, CompanyName, etc.) |
Microsoft Word, Excel, PowerPoint | Document summary information fields |
OpenOffice/Open Document Format | Document properties fields |
HTML | META tags; <TITLE> is indexed as HtmlTitle field; <H1>, <H2>, <H3> are indexed as HtmlH1, HtmlH2, HtmlH3, etc. |
XML | All fields |
DBF | All fields |
CSV | All fields (CSV, or comma-separated values, files must have a .csv extension, a list of field names in the first line, and must use tab, comma, or semicolon delimiters) |
PDF files | Document Properties |
WordPerfect | Document summary information fields |
MP3 | All metadata fields |
JPG, TIFF | EXIF and IPTC metadata fields; XMP (Vista) metadata supported in version 7.40 |
ASF, WMA, WMV | All metadata fields |
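The CSV entry in the table lists three requirements for field detection: a .csv extension, field names in the first line, and a tab, comma, or semicolon delimiter. The following Python sketch shows one way to check a file against those requirements before uploading it. It is an illustration only, not FileHold's own detection logic. The function name `csv_field_names` and the rule of choosing the most frequent delimiter in the header line are assumptions.

```python
import csv
import io
from pathlib import Path

# Delimiters the table lists as acceptable for CSV field detection.
ALLOWED_DELIMITERS = ("\t", ",", ";")


def csv_field_names(filename: str, text: str) -> list[str]:
    """Return the field names from the first line, or raise ValueError.

    Hypothetical helper: it only mirrors the stated requirements.
    """
    if Path(filename).suffix.lower() != ".csv":
        raise ValueError("file must have a .csv extension")

    first_line = text.splitlines()[0] if text.strip() else ""
    if not first_line:
        raise ValueError("first line must list the field names")

    # Assumption: the delimiter is whichever allowed character occurs
    # most often in the header line (a single-column file has none).
    delimiter = max(ALLOWED_DELIMITERS, key=first_line.count)

    header = next(csv.reader(io.StringIO(first_line), delimiter=delimiter))
    names = [name.strip() for name in header]
    if any(not name for name in names):
        raise ValueError("every column needs a field name")
    return names
```

For example, `csv_field_names("people.csv", "name;city\nAda;London\n")` yields the two field names `name` and `city`, while a file named `people.txt` is rejected.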
Other file formats supported
FileHold document management software still indexes, searches, and displays other file formats, but it treats them as binary file types. In other words, binary codes and similar non-text content are displayed along with the text.
Image formats
FileHold document management software can extract and display embedded images in these document formats: Word (.doc/.docx), PowerPoint (.ppt/.pptx), Excel (.xls/.xlsx), Access (.mdb/.accdb), RTF, and email files, including Thunderbird (mbox/.eml) and Outlook (.pst/.msg) files. Images are displayed using the HTML <img> tag and are not converted, so only images that a browser can display, such as .jpg and .png, will appear.
Embedded object and attachment extraction
Embedded objects and attachments are indexed as part of the document that contains them. For example, a spreadsheet object embedded in a PowerPoint presentation would be treated as part of the PowerPoint presentation.
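The idea of indexing nested content as part of its container can be sketched as a recursive walk. The Python example below is a conceptual model only, not the product's implementation. The `Item` class and the `indexable_text` function are hypothetical names, and the sample email mirrors the ".doc inside a ZIP attached to an email" case mentioned earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical node: a document, archive, or email with nested children."""
    name: str
    text: str = ""
    children: list["Item"] = field(default_factory=list)


def indexable_text(item: Item) -> str:
    """Join an item's own text with the text of everything nested inside it."""
    parts = [item.text] if item.text else []
    for child in item.children:
        nested = indexable_text(child)  # recurse into attachments and embedded objects
        if nested:
            parts.append(nested)
    return "\n".join(parts)


# An email with a ZIP attachment that contains a Word document.
email = Item("report.msg", "Quarterly report attached.", [
    Item("files.zip", children=[Item("summary.doc", "Revenue grew this quarter.")]),
])
print(indexable_text(email))
```

In this model, a search for "Revenue" matches the email itself, because the text of the nested document is folded into its top-level container.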
International language support
FileHold document management software search supports all languages through Unicode. This allows indexing and searching of non-English text in every character set supported by the Unicode standard.
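The supported formats list mentions Unicode text in either byte order and in UTF-8. As a rough illustration of what telling these encodings apart involves, the Python sketch below decodes text by inspecting its byte order mark (BOM). It is not FileHold's detection method. The function name `decode_text` and the UTF-8 fallback for files without a BOM are assumptions, and the sketch ignores other encodings such as UTF-32.

```python
import codecs


def decode_text(data: bytes) -> str:
    """Decode bytes using a byte order mark when present, else assume UTF-8."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith(codecs.BOM_UTF16_LE):
        # Little-endian UTF-16
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        # Big-endian UTF-16
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # Assumption: text without a BOM is treated as UTF-8.
    return data.decode("utf-8")
```

The same string round-trips through all three encodings, so non-English text such as "日本語" decodes to identical characters whichever encoding the file was saved in.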