
Automatic Document Importation (ADI)

The Automatic Document Importation (ADI) mechanism allows importing a large number of documents into the document management system with minimal user intervention. It runs on the FileHold server to facilitate the mass migration of documents. ADI is similar to the Watched Folders functionality but can also be integrated with various custom migration tools using an API.

Several ADI “jobs” can be created by a user with the Library Administrator role or higher. Each job stores its configuration and status. An administrator can configure the source type (Watched Folder, Watched FTP site, or API), a time restriction for the job, the user account that adds the documents, the source folder, the target location, and so on.

Documents can be imported from three sources:

  • If a Watched Folder is being used for the job, files from a specified directory are added to a queue. Once processed, they are imported into the destination folder in the library using either the specified schema and metadata field values (direct metadata) or indirect metadata. The specified directory can be monitored so that new files are brought into the system automatically, and the input files can optionally be deleted after import.
  • If a Watched FTP site is being used for the job, files from an FTP server can be downloaded and processed. This method is useful when, for example, a scanning company completes a batch of scans and wants to send them to its customer’s FileHold repository. The scans are zipped along with the metadata and stored on an FTP server. The download is triggered either by the appearance of the file on the FTP server or by a notification email sent to a specific email inbox. Direct or indirect metadata methods can be used.
  • If the source is an API, documents, along with their target location in the library and their metadata values, are added to the queue using API calls. See the Knowledge Base for more information on the API.

The user specified in an ADI job becomes the owner of the documents when the files are processed. This user must have the Document Publisher role or higher and must have access to the schema and destination folder.

For each job, the status is shown, including the number of processed documents, pending documents, and errors. Within each job, a detailed list shows each document, its status (pending, completed, or error), the date it was added to the queue, the date it was processed, its source path, and its target folder. These import details can be exported to a CSV file. Once a document has been successfully imported, its summary information and the document with its associated metadata can be viewed. Summary information can also be viewed for pending documents and documents with errors.

The timing of document processing is controlled in two places: the job and a scheduled task. The job settings determine when the source directory is scanned and its documents are added to the queue. When the queued documents are actually processed and imported into the library is controlled by the scheduled task “FH process ADI job”. For example, documents can be added to the queue all day (no time restriction in the job settings) while the actual import happens only at night (via the scheduled task settings), so the FileHold server is not further burdened during the day. By default, the “FH process ADI job” scheduled task runs every 10 minutes indefinitely.
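Both the job's time restriction and a night-only processing schedule come down to checking whether a clock time falls inside a start/end window. The Python sketch below is illustrative only and is not FileHold code; the function name `in_operation_window` is hypothetical, and it assumes that the end time is exclusive and that a window whose end is earlier than its start (for example 22:00 to 06:00) crosses midnight. How FileHold itself handles these edge cases is not described here.

```python
from datetime import time


def in_operation_window(now: time, start: time, end: time) -> bool:
    """Return True if `now` falls inside the start/end window.

    Assumptions (hypothetical, not documented FileHold behavior): the end
    time is exclusive, and a window whose end is earlier than its start
    (for example 22:00 to 06:00) crosses midnight.
    """
    if start <= end:
        # Same-day window, e.g. 08:00 to 18:00.
        return start <= now < end
    # Overnight window: inside if after the start OR before the end.
    return now >= start or now < end


# Queue documents all day, import them only overnight.
night_start, night_end = time(22, 0), time(6, 0)
for t in (time(14, 30), time(23, 15), time(3, 0)):
    print(t, in_operation_window(t, night_start, night_end))
```

With an overnight window of 22:00 to 06:00, 23:15 and 03:00 fall inside the window and 14:30 falls outside it.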

(FileHold 14.2 and higher) Extraction rules can be applied to documents that are imported. The extraction rule is used when the import job is set to use the same schema as the rule. The metadata values that are extracted through the extraction rule take precedence over the metadata values set in the import job. If there is no value mapped in the rule, then the value set in the job is used. Note that if the metadata field is a drop down list, ensure that the value being extracted from the document exists in the list. If the value does not exist then the value set in the job is used.
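The precedence rules above can be summarized as a small decision function. This Python sketch is a hypothetical illustration of the documented rules, not FileHold's implementation; it assumes that "no value mapped in the rule" is represented as `None` and that drop-down values are compared exactly, including case.

```python
from typing import List, Optional


def resolve_field_value(
    job_value: Optional[str],
    extracted_value: Optional[str],
    dropdown_choices: Optional[List[str]] = None,
) -> Optional[str]:
    """Choose the value stored in one metadata field of an imported document.

    job_value:        the static value set in the import job
    extracted_value:  the value found by the extraction rule, or None if the
                      rule has no mapping for this field
    dropdown_choices: the allowed values if the field is a drop-down list
    """
    if extracted_value is None:
        # Nothing mapped in the rule: fall back to the job's value.
        return job_value
    if dropdown_choices is not None and extracted_value not in dropdown_choices:
        # Extracted value is not in the drop-down list: use the job's value.
        return job_value
    # Otherwise the extracted value takes precedence.
    return extracted_value


choices = ["Pending", "Approved"]
print(resolve_field_value("Pending", "Approved", choices))  # Approved
print(resolve_field_value("Pending", "Archived", choices))  # Pending
print(resolve_field_value("Pending", None))                 # Pending
```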

(FileHold 14.2 and higher) Instead of using static values, metadata field values can be read from a CSV file when using a Watched Folder or Watched FTP site import job. This is called "indirect metadata". A delimited text file, such as a CSV, that contains the schema, the full path and name of each document, and the metadata fields and values is used to populate the metadata fields. An auto-filing script can also be used with indirect metadata. See Using Indirect Metadata in an Import Job for more information.

Automatic Document Importation (ADI) is an optional feature that is controlled in the FileHold license. To purchase this feature, contact sales@filehold.com.

Automatic Document Importation Job for a Watched Folder Source

To create an ADI job for a Watched Folder source

  1. In the Web Client, go to Administration Panel > System management > Import Jobs.
  2. In the List of Import Jobs, click Add Job.
  3. Enter the Name of the job.
  4. Enter a Description for the job.
  5. Select Watched Folder as the Source Type. Documents are imported from a specified folder path. This folder can be on the server or in a network location; however, the designated FileHold service account must have full control permissions on the folder. Select this option if you are using direct or indirect metadata.
  6. In the Job Settings area, select the Job is enabled check box to enable the job.
  7. The Restrict operation time fields determine when the documents will be brought into the queue from the Watched Folder. Select the Restrict operation for check box and enter the start and end time that the job will run. If no time is entered, the job runs as a continuous process and documents are added to the queue as soon as they are added to the source (Watched Folder).
  8. In the Max Documents Per Trigger field, enter the maximum number of documents that will be processed per import instance. For example, there can be 100 documents in the source folder but the maximum documents per trigger setting is set to 50 so only 50 documents will be processed when the scheduled task runs. The next 50 documents will be processed when the scheduled task runs again.

TIP: There are two limits to the number of documents that will be processed. In addition to the maximum number of documents there is a timer. The processing will stop when the maximum number of documents is reached or the timer expires. When the timer expires, the document that is currently being processed will be completed. The duration of the timer is set in the library manager web config file and should not normally be changed. It is the "ImportationJobTimeoutSec" key in the appSettings section.

  9. In the User Context field, select the user who will own the imported documents. This must be a user with the Document Publisher role or higher.
  10. In the Post Import Actions field, select an option from the list:
  • None — No changes are made to the document
  • Force document format to electronic record — The document format is converted into an electronic record.
  11. If a Watched Folder source was selected, enter the Source Folder Path. This is the folder that is “watched” for new documents, which are then brought into the queue.
  • You must use a UNC path for remote folder share locations, making sure that the designated FileHold service account has full control of this remote folder, and that the remote folder is properly shared as well.
  • If using indirect metadata, ensure that the indirect file and documents being imported are in the same directory.
  12. Select the Delete Input Files check box to delete the files from the source folder once they are imported into the library.
  13. Select the Automatically add new files to the queue check box to run this job without user intervention; documents are automatically added to the queue when the scheduled task is executed. If this check box is not selected, the job must be run manually.
  14. Select the Use indirect metadata check box if you are using an indirect file that contains the metadata field values for the documents. See Using Indirect Metadata in an Import Job for more information. Fill out the following information:
  • File extension - Enter csv, tab, txt, etc.
  • Field delimiter - Enter the field separator.
  • Value delimiter - Enter the value separator. Enter a character even if you are not using multiple values. Note that the field delimiter and the value delimiter cannot be the same.
  15. Click Select to set the Destination Folder from the library tree.
  16. Select the Document Schema from the list.

TIP: If you choose a document schema here any values for schema and metadata fields in the indirect file will be ignored.

  17. Enter the values in the metadata fields. All fields marked with an asterisk (*) are required.
  18. Click OK to save the job. The job is added to the List of Import Jobs.

TIP: When you first set up an ADI job, there may be missing or incorrect configuration on your server or in an indirect file. These errors are reported in the Windows event log for FileHold. If documents are not being added to the job queue, you will likely find an error in the event log.

Automatic Document Importation Job for a Watched FTP Site Source

When using a Watched FTP site as the source, documents and/or metadata are downloaded from an FTP server and imported. Downloads are triggered by the presence of a file or by an email.

To create an ADI job for a Watched FTP Site source

  1. Complete steps 1-10 as above except select Watched FTP site as the Source Type.
  2. In the FTP Site Settings area, in the Host field, enter the machine name or IP address of the FTP server. Click Test Connection to verify that the host is accessible.
  3. Enter the Port number. The standard FTP port, 21, is used by default.
  4. Select the Encrypted Connection check box if encryption is used in the FTP connection.
  5. In the Authentication area, select Anonymous if the logon type is anonymous. Leave unchecked if using a normal connection type.
  6. If not using an anonymous connection type, enter a User name and Password for the FTP account.
  7. In the FTP Folder Settings area, enter the FTP source folder path. Provide the full path to the Source folder in this field (for example: /FileHold/Data/Source). Make sure the path begins at the base directory to which the FTP server allows connection. The path must start with a forward slash ( / ).
  8. In the Source Filter field, enter the acceptable file types to be transferred. Any files that do not match the filter are ignored. To accept all file types, enter *.*. This field is unavailable if the option “Get filenames from the email body using a regular expression to search for filename details and form a complete filename with replace” is enabled.
  9. In the Local Destination Folder Path field, specify the folder on the local computer to which the files will be downloaded.
  10. In the Post Download Operation area, select any of the following options:
  • Extract archived files — Extracts the contents of the archive files after they are downloaded. Enter the list of valid archive file extensions in the field.
  • Delete archive files after contents are extracted — Select the check box to delete the zipped files after the contents have been extracted.
  • Rename source files — Renames the source files on the FTP site with a new extension. Enter the new file extension in the New File Extension field. Cannot be used with the Delete source files option.
  • Delete source files — Deletes the source files from the FTP source folder. Cannot be used with the Rename source files option.
  11. In the Watched Folder Trigger, select one of the following options:
  • File appears — Once a file appears in the FTP source folder path, the source files are downloaded to the local destination folder.
  • Email message received — Source files are downloaded when a notification email is received in a configurable email box. Use the following table to fill out the information:

| Field | Description |
|---|---|
| POP3 Server | Enter the address of the POP3 server and click Test Connection to verify. |
| Port | Enter the port number. The standard POP3 port, 110, is used by default. |
| Encrypted connection | Select the check box if the connection is encrypted. |
| Authentication | Select Anonymous or enter a User name and Password. |
| Get filenames from the email body using a regular expression to search for filename details and form a complete filename with replace | Select this option to use the regular expressions in the Search and Replace fields below. |
| Search | Provide a regular expression that finds each filename in the body of the email. |
| Replace | Provide an expression that forms a filename from the characters found by the Search expression above. |

  12. Continue to fill out the Local File Processing Settings using steps 12 to 17 above.
  13. Click OK to save the job.
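To see how the Search and Replace fields work together, the Python sketch below applies a search expression to an email body and builds one filename from each match. It is an illustration under assumptions: FileHold likely uses .NET regular expression syntax, in which a replacement refers to a captured group as `$1` rather than Python's `\1`. The email text, the search pattern, and the `.zip` naming convention are all made up for the example.

```python
import re
from typing import List


def filenames_from_email(body: str, search: str, replace: str) -> List[str]:
    """Build one filename per match of `search` in an email body.

    `replace` is expanded against each match, so it can reuse captured
    groups. Python writes a group reference as \\1; .NET uses $1.
    """
    return [match.expand(replace) for match in re.finditer(search, body)]


body = (
    "Hello,\n"
    "Batch ready: scan_20240115_001\n"
    "Batch ready: scan_20240115_002\n"
)
# Hypothetical convention: each batch named in the email is uploaded as <batch>.zip.
search = r"Batch ready: (scan_\d{8}_\d{3})"
replace = r"\1.zip"
print(filenames_from_email(body, search, replace))
# ['scan_20240115_001.zip', 'scan_20240115_002.zip']
```

Testing your expressions against a saved copy of a real notification email in this way can catch mistakes before the job is enabled.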

Manually Running ADI Jobs

If the “Automatically add new files to the queue” option is not enabled for a watched folder job, the job must be run manually.

To manually run a job on a watched folder

  1. In the Web Client, go to Administration Panel > System management > Import Jobs.
  2. In the List of Import Jobs, click the name of the job to run.
  3. In the Summary of job page, click Watch Now. Any files in the source folder are added to the queue for processing.

TIP: If you manually watch a folder, the documents in that folder are immediately added to the job queue, but they are only processed when the "FH process ADI job" scheduled task runs. By default, this is every 10 minutes. If you are testing a job, you can run the Windows task manually to speed up the testing process.

Editing ADI Jobs

To edit a job

  1. In the Web Client, go to Administration Panel > System management > Import Jobs.
  2. In the List of Import Jobs, click the name of the job to edit.
  3. In the Summary of job page, click Edit Job.
  4. Make the job changes and click OK.

Deleting ADI Jobs

To delete a job

  1. In the Web Client, go to Administration Panel > System management > Import Jobs.
  2. In the List of Import Jobs, click the name of the job to delete.
  3. In the Summary of job page, click Delete Job.
  4. At the message prompt, click OK.

Resetting ADI Jobs

Resetting a job removes all pending and failed documents from the queue and the job details, and the import folder is rescanned.

To reset a job

  1. In the Web Client, go to Administration Panel > System management > Import Jobs.
  2. Select the job from the list.
  3. In the Summary of job page, check whether there are any errors. If there are, click Reset Job.
  4. The message “Are you sure you want to reset this import job? All pending and failed documents will be removed and the import folder will be rescanned.” is displayed. Click OK to reset the job.

Viewing ADI Job Details

To view job details

  1. In the Web Client, go to Administration Panel > System management > Import Jobs.
  2. In the List of Import Jobs, click the name of the job to view.
  3. In the Summary of job page, click View Details. The Details of Job page lists the files that were processed:
  • For each document, the document name, schema type, source location, destination folder, date the file was added to the queue, and date the import was completed are displayed.

  • The status of each document (pending, completed, or error) is displayed. An error status indicates that the import failed for that document and that the document must be re-added to the queue.

  • Click Download as CSV to download the job details as a CSV file.

  • To view the details of a specific document, click the document name. In the Details of <file name> Document screen, the metadata fields and summary for the document are shown. In the case of an error, the Error Log message is displayed. Where the status of a document is “completed”, click Go to Document to view the document in the library. To add the document back to the queue for reprocessing, click Re-process Document. Ensure that the issue that caused the error has been corrected before attempting to reprocess the document. Click Previous or Next to move to the previous or next document in the details list. Click Return to Job Details to return to the previous screen.

  • To clear the details of the successfully completed documents, click Clear Completed.

  • To clear the details of unsuccessfully imported documents, click Clear Errors.

  • To reprocess documents that generated errors, click Re-process Errors. On the Reprocess Errors of <job name> page, select the documents to be reprocessed. Ensure that the issue that caused the errors has been corrected before attempting to reprocess the documents. The selected documents are added back into the queue and reprocessed. Documents that are processed successfully have a status of “completed” in the job details; documents that still cannot be processed have a status of “error”.
  4. In the Details of Job page, click Return to Summary to return to the Job Summary page.
  5. In the Job Summary page, click Return to List to return to the List of Import Jobs.
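If you download the job details with Download as CSV, you can recount the statuses outside FileHold, for example to track progress during a long migration. This Python sketch assumes the export has a header row with a column named `Status`; the actual column names in the FileHold export are not documented here, so adjust `status_column` to match your file. The sample rows are invented.

```python
import csv
import io
from collections import Counter


def summarize_job_details(csv_text: str, status_column: str = "Status") -> Counter:
    """Count documents by status (pending, completed, error) in a job-details export.

    Assumption: the export has a header row and a column named `status_column`.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[status_column].strip().lower() for row in reader)


sample = (
    "Document Name,Status\n"
    "invoice1.pdf,Completed\n"
    "invoice2.pdf,Error\n"
    "invoice3.pdf,Pending\n"
    "invoice4.pdf,Completed\n"
)
counts = summarize_job_details(sample)
print(counts["completed"], counts["pending"], counts["error"])  # 2 1 1
```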

Enabling/Disabling ADI Jobs

To enable or disable a job

  1. In the Web Client, go to Administration Panel > System management > Import Jobs.
  2. In the List of Import Jobs, click Enable or Disable next to the job name.

Using Indirect Metadata in an Import Job

(FileHold 14.2 and higher) For a Watched Folder import job, a delimited text file, such as a CSV, that contains the schema, the full path and name of each document, and the metadata fields and values can be used to define the values that populate the metadata fields. This allows you to import documents whose metadata values vary from document to document. Without indirect metadata, the values set in the import job are static unless extraction rules are used.

The "Use Indirect Metadata" option is available in the import job. When it is selected, you specify the file extension (typically csv), the field delimiter (typically a comma or semicolon), and the value delimiter (typically a comma or semicolon), which is used for multiple-selection metadata fields. Note that the field delimiter and the value delimiter cannot be the same. Field and value delimiters can be any Unicode character.

Offline documents can be added with ADI using an API-based import or the indirect metadata method. For the indirect metadata method, the schema listed in the text delimited file must be an offline document schema.

An auto-filing script can also be used when the indirect metadata option is enabled. The auto-filing script configured in the schema or an alternate script can be used.

For offline documents, use only the filename and not the path in the ImportFilename column. The column headers are:

| Column heading | Description |
|---|---|
| ImportType | This value must be set to Document. |
| ImportFilename | For an electronic record or electronic document schema, the fully qualified path and name of the document. For an offline schema, the document name only. |
| DocumentSchema | The name of the document schema to set for the document. |
| (metadata field name) | The name of a metadata field. There must be one column for every required field in the document schema. Optional fields can be added as needed. |

IMPORTANT: Metadata field names and drop-down list values must exactly match the configuration in FileHold, including case. If you notice a blank field after you import documents, a misspelled field name in the indirect file may be the cause.

A sample indirect file, as displayed in Microsoft Excel®, contains the columns described below. Do not change the position of the column headers or the import will not work.

  • ImportType - This is always set to Document.
  • ImportFilename - The full path and name of the document to import. In the case of offline documents, use the document name only (not the path).

IMPORTANT: The FH_Service account must have full control of this directory.

  • DocumentSchema - The name of the document schema being used
  • First Name - The metadata field name in the schema.
  • Last Name - The metadata field name in the schema. Use as many metadata fields as needed in additional columns.

Indirect metadata sample CSV file

The indirect file and documents must be in the same directory location when being imported and the designated FileHold service account must have full control.
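If your legacy system can export document lists, you can generate the indirect file with a script instead of editing it by hand. The Python sketch below uses a hypothetical helper, `build_indirect_file`, which is not part of FileHold: it writes the three fixed columns in the required order, followed by one column per metadata field, and joins multi-select values with the value delimiter. The UNC path, schema name, and field names are examples only. Whether FileHold accepts quoted CSV fields is not stated here, so it is safest to keep delimiter characters out of your values.

```python
import csv
import io

# These three columns must come first, in this order; metadata fields follow.
FIXED_COLUMNS = ["ImportType", "ImportFilename", "DocumentSchema"]


def build_indirect_file(rows, metadata_fields, field_delim=",", value_delim=";"):
    """Return the text of an indirect metadata file.

    rows: dicts with "path" (full path, or the document name alone for an
    offline schema), "schema", and one entry per metadata field. A list
    value is written as a multi-select value joined with `value_delim`.
    """
    if field_delim == value_delim:
        raise ValueError("the field and value delimiters cannot be the same")
    out = io.StringIO()
    writer = csv.writer(out, delimiter=field_delim)  # default line ending is \r\n
    writer.writerow(FIXED_COLUMNS + list(metadata_fields))
    for row in rows:
        values = []
        for field in metadata_fields:
            value = row.get(field, "")
            if isinstance(value, list):
                value = value_delim.join(value)
            values.append(value)
        writer.writerow(["Document", row["path"], row["schema"]] + values)
    return out.getvalue()


rows = [
    {
        "path": r"\\fileserver\import\smith.pdf",  # example UNC path
        "schema": "Employee Record",  # example schema name
        "First Name": "Jane",
        "Last Name": "Smith",
        "Departments": ["HR", "Finance"],  # example multi-select field
    }
]
text = build_indirect_file(rows, ["First Name", "Last Name", "Departments"])
print(text)
```

For offline documents, pass the document name alone as `path`, as described for the ImportFilename column.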

Three example files are available in a zip file at the bottom of this page. You can use them as templates for your own indirect metadata CSV files:

  1. Indirect metadata, no value delimiters (no multi-select values).
  2. Indirect metadata with value delimiters (multi-select values). Multiple values are separated by a semicolon (;).
  3. Offline documents.

TIP: Arbitrary Unicode characters can be used as delimiters by prefixing a decimal Unicode value with a backslash. The most common delimiter characters come from the ASCII punctuation symbols. A tab character can be expressed as \9, a vertical tab as \11, a space as \32, and the \ (backslash) character as \92. A complete list of values is available on the Unicode Consortium website.
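The backslash notation can be decoded as shown in this Python sketch. The function names are hypothetical, and treating any setting that is not a backslash followed by digits as a literal delimiter is an assumption. The sketch also enforces the rule that the field and value delimiters cannot be the same, comparing them after decoding, since `\44` and `,` are the same character.

```python
from typing import Tuple


def decode_delimiter(setting: str) -> str:
    """Turn a delimiter setting into the character it stands for.

    A backslash followed by decimal digits is a Unicode code point, e.g.
    \\9 -> tab, \\11 -> vertical tab, \\32 -> space, \\92 -> backslash.
    Any other setting is assumed to be the literal delimiter.
    """
    if len(setting) > 1 and setting[0] == "\\" and setting[1:].isdecimal():
        return chr(int(setting[1:]))
    return setting


def check_delimiters(field_setting: str, value_setting: str) -> Tuple[str, str]:
    """Decode both delimiters and reject them if they are the same character."""
    field = decode_delimiter(field_setting)
    value = decode_delimiter(value_setting)
    if field == value:
        raise ValueError("the field and value delimiters cannot be the same")
    return field, value


print(repr(decode_delimiter("\\9")))   # '\t'
print(repr(decode_delimiter("\\92")))  # '\\'
print(check_delimiters("\\9", ";"))    # ('\t', ';')
```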

Understanding ADI processing

Like many autonomous tasks in FileHold, ADI normally operates as a Windows scheduled task. By default, the task runs every 10 minutes and stops itself after it processes the maximum number of documents or reaches its timeout value, whichever comes first. Only one instance of the task should ever run at a time.

The maximum number of documents for each job is unlimited by default, and the timer is set to 570 seconds. Because only one instance of the task should run at a time, the Windows scheduled task definition prevents overlapping runs. If the documents are small and can be added quickly, approximately 30 seconds at the end of each 10-minute task interval will be unused. If the last document started before the timer expires is large and takes more than about 30 seconds to process, the task will still be running at its next scheduled start time, so that start is skipped and processing is idle until the following scheduled time.
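The timing behavior described above can be modeled with a short simulation. This Python sketch is not FileHold code: it assumes that both limits are checked before each new document starts (so a document in progress always finishes), that a run starts at time 0 with scheduled starts every 600 seconds, and that a scheduled start is skipped while the previous run is still active. The document durations are invented for illustration.

```python
import math
from typing import Optional, Sequence, Tuple


def simulate_run(
    doc_seconds: Sequence[float],
    max_docs: Optional[int] = None,
    timeout_sec: float = 570,
) -> Tuple[int, float]:
    """Model one run of the import task; return (documents imported, seconds used).

    Both limits are checked before a new document starts, so a document that is
    already being processed always finishes, even if that overruns the timer.
    """
    elapsed = 0.0
    processed = 0
    for seconds in doc_seconds:
        if max_docs is not None and processed >= max_docs:
            break  # maximum documents per trigger reached
        if elapsed >= timeout_sec:
            break  # timer expired
        elapsed += seconds
        processed += 1
    return processed, elapsed


def next_run_start(run_seconds: float, interval_sec: int = 600) -> int:
    """First scheduled start (run began at 0) not blocked by the still-running task."""
    return interval_sec * max(1, math.ceil(run_seconds / interval_sec))


# Small documents: the run stops at 570 s, leaving 30 s idle before the 600 s start.
print(simulate_run([2.0] * 1000), next_run_start(570))  # (285, 570.0) 600
# A 60 s document starting at 565 s overruns to 625 s, so the 600 s start is skipped.
used = simulate_run([5.0] * 113 + [60.0] + [5.0] * 100)
print(used, next_run_start(used[1]))  # (114, 625.0) 1200
```

The `max_docs` parameter models the Max Documents Per Trigger setting: with 100 one-second documents and a limit of 50, one run imports 50 and the next run picks up the rest.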

FileHold does not guarantee the order or priority for processing jobs. It is possible that one job may have a large number of documents and effectively consume all task time and starve other jobs from loading documents. In this case, the maximum number of documents can be set to throttle one or more jobs and allow other jobs to process. Since the task is running every 10 minutes, any changes to the job definition will take effect at the next 10 minute interval. This allows you to modify the maximum number of documents or disable a job to accommodate changes in volume, for example.
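The throttling idea can be illustrated by splitting one run's time budget across jobs. This Python sketch assumes that jobs are visited in list order and that every document in a job takes the same time. FileHold does not guarantee any processing order, so treat this only as an illustration of why a per-job maximum frees time for other jobs. The job names and volumes are invented.

```python
from typing import Dict, List, Optional, Tuple

# (job name, pending documents, seconds per document, max documents or None)
Job = Tuple[str, int, float, Optional[int]]


def share_run(jobs: List[Job], timeout_sec: float = 570) -> Dict[str, int]:
    """Return how many documents each job imports in one run.

    Assumption for illustration only: jobs are visited in list order and share
    one timer. FileHold does not guarantee any processing order.
    """
    elapsed = 0.0
    imported: Dict[str, int] = {}
    for name, pending, per_doc, max_docs in jobs:
        limit = pending if max_docs is None else min(pending, max_docs)
        count = 0
        while count < limit and elapsed < timeout_sec:
            elapsed += per_doc
            count += 1
        imported[name] = count
    return imported


unthrottled = [("migration", 10_000, 1.0, None), ("invoices", 40, 1.0, None)]
throttled = [("migration", 10_000, 1.0, 400), ("invoices", 40, 1.0, None)]
print(share_run(unthrottled))  # {'migration': 570, 'invoices': 0}
print(share_run(throttled))    # {'migration': 400, 'invoices': 40}
```

Unthrottled, the migration job uses the entire 570 seconds and the invoice job imports nothing in that run; capping the migration job at 400 documents lets all 40 invoices through.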

If you are processing a very large number of documents, such as in a migration from a legacy system, you may want to make some additional system adjustments to complete the import as quickly as possible. These adjustments must be made with the help of a Windows administrator. The first adjustment is simple: disable all jobs except the migration job and set its maximum number of documents to unlimited. Next, adjust the maximum duration of the import task to a value appropriate to the volume of documents. The value is in seconds, but it can be set to run for multiple days as needed. Finally, disable the import task in the task scheduler and run it manually from the command line:

fileholdadm /lmprocessimportationjobs
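Before starting a long manual run, you may want to estimate how long the import needs and check the current timer value. The Python sketch below assumes that the maximum duration mentioned above is the "ImportationJobTimeoutSec" key in the appSettings section of the library manager web config file (described in the TIP earlier on this page), that the root element of that file has no XML namespace, and that you have measured an average number of seconds per document on a test import. The 25% safety margin is an arbitrary example value, and the sample XML is a trimmed, hypothetical excerpt.

```python
import math
import xml.etree.ElementTree as ET
from typing import Optional


def read_timeout(web_config_xml: str, key: str = "ImportationJobTimeoutSec") -> Optional[int]:
    """Return the value of `key` from <configuration>/<appSettings>, or None if absent."""
    root = ET.fromstring(web_config_xml)
    for add in root.findall("./appSettings/add"):
        if add.get("key") == key:
            return int(add.get("value"))
    return None


def suggested_timeout(doc_count: int, avg_seconds_per_doc: float, margin: float = 1.25) -> int:
    """Estimate a timer value in seconds, with an arbitrary safety margin."""
    return math.ceil(doc_count * avg_seconds_per_doc * margin)


# Trimmed, hypothetical excerpt of the library manager web config file.
sample = """<configuration>
  <appSettings>
    <add key="ImportationJobTimeoutSec" value="570" />
  </appSettings>
</configuration>"""

print(read_timeout(sample))  # 570
needed = suggested_timeout(200_000, 1.5)
print(needed, round(needed / 86_400, 1))  # 375000 4.3
```

In this example, 200,000 documents at an average of 1.5 seconds each, plus the margin, come to roughly 4.3 days of processing.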

Depending on the expected duration, you may wish to temporarily disable IIS recycling as this can interrupt the task. Recycling is an optional maintenance feature in IIS, but it is typically set to run every day. To improve performance you may also wish to disable the full text search indexing task. This will mean new documents will not be added to the full text index, but they will be queued to be added later.

If you are performing a mass document migration from a legacy system, you are likely using the indirect import option to capture metadata. Before running a job with all your documents, it is advisable to test your import data to make sure it matches the configuration in FileHold. Common problems include missing or misspelled drop-down fields and missing data in required fields. The easiest way to test is on your test FileHold server: load your production configuration on the test server, prepare ADI as above, and run the task manually from the command line. Check the FileHold Windows event log for warning messages about the format of the delimited file, and check the job details for errors related to mismatched data.
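Some of these checks can be automated before you run the job. The Python sketch below compares an indirect file against a hand-built snapshot of your schemas and reports unknown schemas, empty required fields, unknown fields, and drop-down values that do not match exactly, including case. The structure of the `schemas` dictionary is hypothetical and is not a FileHold export format. This does not replace the checks FileHold performs on import; the job details and event log remain the authority.

```python
import csv
import io
from typing import Dict, List

FIXED_COLUMNS = ["ImportType", "ImportFilename", "DocumentSchema"]


def validate_indirect_file(text: str, schemas: Dict[str, dict],
                           field_delim: str = ",", value_delim: str = ";") -> List[str]:
    """Return a list of problems found in an indirect metadata file.

    schemas maps each schema name to a hypothetical snapshot of its setup:
      {"fields": set of field names, "required": set of field names,
       "choices": {drop-down field name: list of allowed values}}
    Every comparison is exact and case-sensitive.
    """
    reader = csv.reader(io.StringIO(text), delimiter=field_delim)
    header = next(reader, [])
    if header[:3] != FIXED_COLUMNS:
        return ["header must start with " + field_delim.join(FIXED_COLUMNS)]
    problems = []
    for line_no, row in enumerate(reader, start=2):
        record = dict(zip(header, row))
        if record.get("ImportType") != "Document":
            problems.append(f"line {line_no}: ImportType must be Document")
        name = record.get("DocumentSchema", "")
        schema = schemas.get(name)
        if schema is None:
            problems.append(f"line {line_no}: unknown schema {name!r}")
            continue
        for field in sorted(schema["required"]):
            if not record.get(field, "").strip():
                problems.append(f"line {line_no}: required field {field!r} is empty")
        for field in header[3:]:
            value = record.get(field, "")
            if not value:
                continue  # blank cells are not checked further
            if field not in schema["fields"]:
                problems.append(f"line {line_no}: {name!r} has no field {field!r}")
            elif field in schema["choices"]:
                for item in value.split(value_delim):
                    if item not in schema["choices"][field]:
                        problems.append(f"line {line_no}: {item!r} is not a {field!r} choice")
    return problems


schemas = {
    "Employee Record": {
        "fields": {"First Name", "Last Name", "Departments"},
        "required": {"Last Name"},
        "choices": {"Departments": ["HR", "Finance", "IT"]},
    }
}
text = (
    "ImportType,ImportFilename,DocumentSchema,First Name,Last Name,Departments\n"
    "Document,C:\\Import\\a.pdf,Employee Record,Jane,Smith,HR;Finance\n"
    "Document,C:\\Import\\b.pdf,Employee Record,John,,hr\n"
    "Document,C:\\Import\\c.pdf,Employee Records,Ann,Lee,IT\n"
)
for problem in validate_indirect_file(text, schemas):
    print(problem)
```

For the sample file, the second data row is flagged for an empty Last Name and for the lowercase drop-down value "hr", and the third for the misspelled schema name.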