
Automatic Document Importation (ADI)

The Automatic Document Importation (ADI) mechanism allows importing a large number of documents into the document management system with minimal user intervention. It runs on the FileHold server to facilitate the mass migration of documents. ADI is similar to the Watched Folders functionality but can also be integrated with various custom migration tools using an API.

Assistance with configuring Automatic Document Importation is a Professional Service and is not covered by FileCare.

A user with the Library Administrator role or higher can create several ADI “jobs”. Each job stores its own configuration and status. An administrator can configure the source type (Watched Folder, Watched FTP site, or API), a time restriction for when the job runs, the user account that adds the documents, the source folder, the target location, and so on.

Documents can be imported from three sources:

  • If a Watched Folder is used for the job, files from a specified directory are added to a queue. Once processed, they are imported into the destination folder in the library using either the specified schema and metadata field values (direct metadata) or indirect metadata. The directory can be monitored so that new files are brought into the system automatically, and the input files can optionally be deleted after import.
  • If a Watched FTP site is used for the job, files from an FTP server can be downloaded and processed. This method is useful, for example, when a scanning company completes a batch of scans and wants to send them into its customer’s FileHold repository. The scans are zipped along with the metadata and stored on an FTP server. The download is then triggered either by the appearance of the file on the FTP server or by a notification email sent to a specific email inbox. Direct or indirect metadata methods can be used.
  • If the source is an API, documents along with their target location in the library and metadata values are added to the queue using API calls. See the Knowledge Base for more information on the API.

The user specified in an ADI job becomes the owner of the documents once the files are processed. This user must have the Document Publisher role or higher and must have access to the schema and the destination folder.

For each job, the status is shown, including the number of processed documents, pending documents, and errors. Within each job, a detailed list of documents is shown with each document's status (pending, completed, error), the date it was added to the queue, the date it was processed, the source path, and the target folder. These import details can be exported to a CSV file. Once a document has been successfully imported, its summary information and the document with its associated metadata can be viewed. Summary information can also be viewed for pending documents and documents with errors.

The time at which documents are processed is controlled in two places: the job and a scheduled task. In the job, you can specify when the source directory is scanned and documents are put into the queue. However, when queued documents are actually processed and imported into the library is controlled by the scheduled task “FH process ADI job”. For example, documents can be added to the queue all day (no time restriction in the job settings), while the actual import occurs only at night (via the scheduled task settings) so that the FileHold server is not additionally burdened during the day. By default, the “FH process ADI job” scheduled task runs every 10 minutes indefinitely.

Extraction rules can be applied to imported documents. An extraction rule is used when the import job is set to use the same schema as the rule. Metadata values extracted by the rule take precedence over the metadata values set in the import job; if the rule has no value mapped for a field, the value set in the job is used. If the metadata field is a drop-down list, ensure that the value extracted from the document exists in the list; if it does not, the value set in the job is used.
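The precedence described above can be expressed as a small decision function, which may help when predicting what a field will contain after import. This is a minimal Python sketch, not FileHold code: the function name and parameters are hypothetical, treating an empty string as "no value" is an assumption, and so is the case-sensitive comparison against drop-down choices.

```python
from typing import Optional, Sequence

def resolve_field_value(
    job_value: Optional[str],
    extracted_value: Optional[str],
    dropdown_choices: Optional[Sequence[str]] = None,
) -> Optional[str]:
    """Pick the value a metadata field receives (hypothetical model).

    An extracted value wins over the job value, unless nothing was extracted
    or, for a drop-down field, the extracted value is not one of the choices.
    """
    # Assumption: None and "" both mean the rule produced no value.
    if extracted_value is None or extracted_value == "":
        return job_value
    # Drop-down field: the extracted value must already exist in the list
    # (exact, case-sensitive match assumed).
    if dropdown_choices is not None and extracted_value not in dropdown_choices:
        return job_value
    # Otherwise the extraction rule takes precedence over the job value.
    return extracted_value

print(resolve_field_value("Unknown", "Initech", ["Acme", "Globex"]))  # Unknown
```

In the usage line, "Initech" is not in the drop-down list, so the job value is kept.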

Metadata field values can be read from a delimited file instead of using static values when a Watched Folder or Watched FTP site import job is used. This is called "indirect metadata". A text delimited file, such as a CSV, containing the schema, the full path and name of each document, and the metadata fields and values defines the values that populate the metadata fields. Document versions can be imported, and auto-filing scripts can be used, when indirect metadata is enabled. See Using Indirect Metadata in an Import Job for more information.

Automatic Document Importation (ADI) is an optional feature that is controlled in the FileHold license. To purchase this feature, contact [email protected]. The FileHold professional services team has used ADI to help customers migrate documents from their legacy systems, including Windows shared folders, SharePoint, Image Now, ImageWare, FileNet, Mango Apps, ImageSilo, Computer Filing Cabinet, ApplicationXtender, boxes of paper and more.

Automatic Document Importation Job for a Watched Folder Source

To create an ADI job for a Watched Folder source

  1. In the Web Client, go to Administration Panel > System management > Import Jobs.
  2. In the List of Import Jobs, click Add Job.
  3. Enter the Name of the job.
  4. Enter a Description for the job.
  5. For the Source Type, select Watched Folder. Documents are imported from a specified folder path. This folder can be on the server or in a network location; however, the designated FileHold service account must be a member of the folder's permissions with full control. Select this option whether you are using direct or indirect metadata.
  6. In the Job Settings area, select the Job is enabled check box to enable the job.
  7. The Restrict operation time fields determine when documents are brought into the queue from the Watched Folder. Select the Restrict operation for check box and enter the start and end times during which the job will run. If no time is entered, the job runs continuously and documents are added to the queue as soon as they are added to the source (Watched Folder).
  8. In the Max Documents Per Trigger field, enter the maximum number of documents processed per import run. For example, if there are 100 documents in the source folder and Max Documents Per Trigger is set to 50, only 50 documents are processed when the scheduled task runs; the next 50 are processed the next time the scheduled task runs.

There are two limits on the number of documents processed: the maximum number of documents and a timer. Processing stops when the maximum number of documents is reached or the timer expires, whichever comes first. When the timer expires, the document currently being processed is completed. The timer duration is set by the "ImportationJobTimeoutSec" key in the appSettings section of the library manager web.config file and should not normally be changed.

  9. In the User Context field, select the user who will own the imported documents. This must be a user with the Document Publisher role or higher.
  10. In the Post Import Actions field, select an option from the list:
     • None — No changes are made to the document.
     • Force document format to electronic record — The document format is converted to an electronic record.
  11. If a Watched Folder source was selected, enter the Source Folder Path. This is the folder that is “watched”; new documents found in it are brought into the queue.
     • You must use a UNC path for remote shared folders. Make sure that the designated FileHold service account has full control of the remote folder and that the folder is properly shared.
     • If using indirect metadata, ensure that the indirect file and the documents being imported are in the same directory.
  12. Select the Delete Input Files check box to delete the files from the source folder once they are imported into the library.
  13. Select the Automatically add new files to the queue check box to run this job without user intervention; documents are automatically added to the queue when the scheduled task is executed. If this check box is not selected, the job must be run manually.
  14. Select the Use indirect metadata check box if you are using an indirect file that contains the metadata field values for the documents. See Using Indirect Metadata in an Import Job for more information. Fill out the following information:
     • File extension - Enter csv, tab, txt, etc.
     • Field delimiter - Enter the field separator.
     • Value delimiter - Enter the value separator. Enter a character even if you are not using multiple values. The field delimiter and the value delimiter cannot be the same.
  15. Click Select to set the Destination Folder from the library tree.
  16. Select the Document Schema from the list.

If you choose a document schema here, any values for the schema and metadata fields in the indirect file are ignored.

  17. Enter the values in the metadata fields. All fields marked with an asterisk (*) are required.
  18. Click OK to save the job. The job is added to the List of Import Jobs.
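The Restrict operation time setting in step 7 only decides whether the watched folder is scanned and documents are added to the queue; the import itself still follows the “FH process ADI job” scheduled task. The following Python sketch models that check. The function name is hypothetical, and treating the end time as exclusive and supporting windows that cross midnight are assumptions rather than documented FileHold behavior.

```python
from datetime import time
from typing import Optional

def may_add_to_queue(now: time, start: Optional[time], end: Optional[time]) -> bool:
    """Return True if the watched folder may be scanned at time `now`."""
    # No restriction entered: the job runs continuously.
    if start is None or end is None:
        return True
    if start <= end:
        # Same-day window, e.g. 08:00-18:00 (end treated as exclusive).
        return start <= now < end
    # Window crossing midnight, e.g. 22:00-06:00 (assumed to be allowed;
    # check whether your FileHold version accepts such a window).
    return now >= start or now < end

print(may_add_to_queue(time(23, 30), time(22, 0), time(6, 0)))  # True
```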

When you first set up an ADI job, there may be missing or incorrect configuration on your server or in an indirect file. These errors are reported in the FileHold Windows event log. If documents are not being added to the job queue, you will likely find an error in the event log.

Automatic Document Importation Job for a Watched FTP Site Source

When using a Watched FTP site as the source, documents and/or metadata are downloaded from an FTP server and imported. Downloads are triggered by the presence of a file or by an email.

To create an ADI job for a Watched FTP site source

  1. Complete steps 1 to 10 above, except select Watched FTP site as the Source Type.
  2. In the FTP Site Settings area, in the Host field, enter the machine name or IP address of the FTP server that hosts the source folder. Click Test Connection to verify that the host is accessible.
  3. Enter the Port number. The standard port 21 is used by default.
  4. Select the Encrypted Connection check box if the FTP connection is encrypted.
  5. In the Authentication area, select Anonymous if the logon type is anonymous. Leave it cleared for a normal (authenticated) connection.
  6. If not using an anonymous connection, enter the User name and Password for the FTP account.
  7. In the FTP Folder Settings area, enter the FTP source folder path. Provide the full path to the source folder (for example: /FileHold/Data/Source). The path must start with a forward slash ( / ) and begin at the base directory to which the FTP server allows connections.
  8. In the Source Filter field, enter the file types that are acceptable for transfer. Files that do not match the filter are filtered out. To accept all file types, enter *.*. This field is unavailable if the option “Get filenames from the email body using a regular expression to search for filename details and form a complete filename with replace” is enabled.
  9. In the Local Destination Folder Path field, specify the folder on the local computer to which the files will be downloaded.
  10. In the Post Download Operation area, select any of the following options:
     • Extract archived files — Extracts the contents of the downloaded archive files. Enter the list of valid archive file extensions in the field.
     • Delete archive files after contents are extracted — Select the check box to delete the zipped files after their contents have been extracted.
     • Rename source files — Renames the source files on the FTP site with a new extension. Enter the new file extension in the New File Extension field. Cannot be used with the Delete source files option.
     • Delete source files — Deletes the source files from the FTP source folder. Cannot be used with the Rename source files option.
  11. In the Watched Folder Trigger area, select one of the following options:
     • File appears — Once a file appears in the FTP source folder path, the source files are downloaded to the local destination folder.
     • Email message received — Source files are downloaded when a notification email is received in a configurable mailbox. Use the following table to fill out the information:

| Field | Description |
|---|---|
| POP3 Server | Enter the address of the POP3 server and click Test Connection to verify it. |
| Port | Enter the port number. The standard port 110 is used by default. |
| Encrypted connection | Select the check box if the connection is encrypted. |
| Authentication | Select Anonymous or enter a User name and Password. |
| Get filenames from the email body using a regular expression to search for filename details and form a complete filename with replace | Select this option to use the regular expressions in the Search and Replace fields below. |
| Search | Provide a regular expression that finds each filename in the body of the email. |
| Replace | Provide a replacement pattern that forms a filename from the characters matched by the Search expression. |

  12. Continue filling out the Local File Processing Settings as described in steps 12 to 17 above.
  13. Click OK to save the job.
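When designing the Search and Replace expressions, it can help to try them against a sample notification email first. The Python sketch below applies a search pattern to every match in an email body and builds one filename per match from the captured groups. Python writes group references as \1 or \g<1>, while .NET regular expressions (which FileHold, as a .NET application, presumably uses) write them as $1; the exact syntax the Replace field accepts, and whether it is applied once per match, are assumptions to verify on a test job. The email text, pattern, and function name are hypothetical.

```python
import re
from typing import List

def filenames_from_email(body: str, search: str, replace: str) -> List[str]:
    """Find each filename fragment with `search`, then build the complete
    filename from the captured groups with the `replace` template."""
    pattern = re.compile(search)
    # Match.expand applies a replacement template (\1, \g<name>) to one match.
    return [m.expand(replace) for m in pattern.finditer(body)]

body = (
    "Hello,\n"
    "Batches SCAN-20240315-001 and SCAN-20240315-002 are on the FTP site.\n"
)
print(filenames_from_email(body, r"SCAN-(\d{8})-(\d{3})", r"batch_\1_\2.zip"))
# ['batch_20240315_001.zip', 'batch_20240315_002.zip']
```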

Manually Running ADI Jobs

If the “Automatically add new files to the queue” option is not enabled for the job, the job must be run manually for a watched folder.

To manually run a job on a watched folder

  1. In the Web Client, go to Administration Panel > System management > Import Jobs.
  2. In the List of Import Jobs, click the name of the job to run.
  3. In the Summary of job page, click Watch Now. Any files in the source folder are added to the queue for processing.

If you manually watch a folder, the documents in that folder are immediately added to the job queue, but they are only processed when the "FH process ADI job" scheduled task runs. By default, this is every 10 minutes. If you are testing a job, you can run the scheduled task manually in Windows to speed up testing.

Editing ADI Jobs

To edit a job

  1. In the Web Client, go to Administration Panel > System management > Import Jobs.
  2. In the List of Import Jobs, click the name of the job to edit.
  3. In the Summary of job page, click Edit Job.
  4. Make the job changes and click OK.

Deleting ADI Jobs

To delete a job

  1. In the Web Client, go to Administration Panel > System management > Import Jobs.
  2. In the List of Import Jobs, click the name of the job to delete.
  3. In the Summary of job page, click Delete Job.
  4. At the message prompt, click OK.

Resetting ADI Jobs

Resetting a job removes all pending and failed documents from the queue and the job details, and the import folder is rescanned.

To reset a job

  1. In the Web Client, go to Administration Panel > System management > Import Jobs.
  2. Select the job from the list.
  3. In the Summary of the job page, check whether there are any errors. If there are, click Reset Job.
  4. The message “Are you sure you want to reset this import job? All pending and failed documents will be removed and the import folder will be rescanned.” is displayed. Click OK to reset the job.

Viewing ADI Job Details

To view job details

  1. In the Web Client, go to Administration Panel > System management > Import Jobs.
  2. In the List of Import Jobs, click the name of the job to view.
  3. In the Summary of job page, click View Details. The Details of Job page lists the files that were processed:
  • The document name, schema type, source location, destination folder, date the file was added to the queue, and date the import was completed are displayed for each document.

  • The status of pending, completed, or error is displayed. An error status indicates that the import failed for that document and that the document must be added to the queue again.

  • Click Download as CSV to download the job details as a CSV file.

  • To view the details of a specific document, click the document name. In the Details of <file name> Document screen, the metadata fields and summary for the document are shown. In the case of an error, the Error Log message is displayed. Where the status of a document is “completed”, click Go to Document to view the document in the library. To reprocess the document and add it back to the queue, click Re-process Document. Ensure that the issue that caused the error has been corrected before attempting to reprocess the document. Click Previous or Next to move to the previous or next document in the details list. Click Return to Job Details to return to the previous screen.

  • To clear the details of the successfully completed documents, click Clear Completed.

  • To clear the details of unsuccessfully imported documents, click Clear Errors.

  • To reprocess all the documents that generated errors, click Re-process Errors. In the Reprocess Errors of <job name>, select the documents to be reprocessed. Ensure that the issue that caused the error has been corrected prior to attempting to reprocess the document. The documents are added back into the queue and reprocessed. If the documents were able to be processed, they will have a status of “completed” in the job details. If the documents were not able to be processed, they will have a status of “error” in the job details.
  4. In the Details of Job page, click Return to Summary to return to the Job Summary page.
  5. In the Job Summary page, click Return to List to return to the List of Import Jobs.

Enabling/Disabling ADI Jobs

To enable or disable a job

  1. In the Web Client, go to Administration Panel > System management > Import Jobs.
  2. In the List of Import Jobs, click Enable or Disable next to the job name.

Using Indirect Metadata in an Import Job

For a Watched Folder import job, a text delimited file, such as a CSV, that contains the schema, the full path and name of each document, and the metadata fields and values can be used to define the values that populate the metadata fields. This allows you to import documents whose metadata values vary from document to document. Without indirect metadata, the metadata values are either the static values set in the import job or values obtained with extraction rules.

The "Use Indirect Metadata" option is available in the import job. When it is selected, you specify the file extension (typically csv), the field delimiter (typically a comma or semicolon), and the value delimiter (typically a comma or semicolon), which separates the values of multiple selection type metadata fields. The field delimiter and the value delimiter cannot be the same. Field and value delimiters can be any Unicode character.
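The following Python sketch shows how the two delimiters relate when reading an indirect file: the field delimiter splits each line into columns, and the value delimiter splits a multiple selection field into its individual values. The function names are hypothetical, the Region field and sample row are made up, and whether FileHold honors CSV quoting rules the way Python's csv module does is an assumption.

```python
import csv
from typing import Dict, List

def read_indirect_rows(lines: List[str], field_delim: str, value_delim: str) -> List[Dict[str, str]]:
    """Parse an indirect file whose first line holds the column headings."""
    if field_delim == value_delim:
        raise ValueError("the field delimiter and the value delimiter cannot be the same")
    # csv needs a one-character delimiter; every later line is one record.
    return list(csv.DictReader(lines, delimiter=field_delim))

def split_values(cell: str, value_delim: str) -> List[str]:
    """Split a multiple selection field into its individual values."""
    return [v for v in cell.split(value_delim) if v]

# Hypothetical file using ; between fields and , between values.
lines = [
    "adiImportType;adiImportFilename;adiDocumentSchema;Region",
    r"Document;\\fs01\scans\inv-001.pdf;Invoice;East,West",
]
row = read_indirect_rows(lines, ";", ",")[0]
print(split_values(row["Region"], ","))  # ['East', 'West']
```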

Offline documents can be added with ADI using an API-based import or the indirect metadata method. For the indirect metadata method, the schema listed in the text delimited file must be an offline document schema.

Document versions can be imported via ADI. For the system to connect versions together, the versions that belong to the same document must be associated with each other in the delimited file. The association can use a metadata ID, document version ID, document ID, an external ID found with a quick search, or an internal ID. The document schema and metadata fields cannot be changed when importing a version record.

An auto-filing script can also be used when the indirect metadata option is enabled. The auto-filing script configured in the schema or an alternate script can be used.

The following table describes the columns used for indirect metadata import. When creating the delimited file, the order of the columns does not matter.

Column heading    Description

adiImportType*

or

ImportType (must be used in versions less than 16)

*required field

Use one of the following:

  • Document – Used to import the first or only version of a document. When additional versions are included in the same indirect file, set adiVersionKeyType to InternalId and use the same InternalId GUID for this and every other version of the same document.
  • Version (version 16 and higher) – Used for importing one or more additional versions of an existing document. The adiVersionKeyType and adiVersionSequence columns must be configured with this import type. It is possible to import all versions of a document in a single indirect file when the Document and Version import types are used together.

adiImportFilename*

or

ImportFilename (must be used in versions less than 16)

*required field

The full path and name of the document to import.

In the case of offline documents, use the document name only (not the path).

The FH_Service account must have full control of this directory. This field is mandatory.

adiDocumentSchema*

or

DocumentSchema (must be used in versions less than 16)

*required field

The name of the document schema that should be assigned. This field is mandatory when the adiImportType is set to Document and must be blank when set to Version.

adiDocumentName

The document name, if different from the import filename. For an offline file, if this field is not provided, the import filename is used for both the original filename and the document name; if this field is provided, the import filename is only used as the original filename. This is an optional field.

adiOwner

Overrides the user set as document owner in the job for each document version. It is supplied as a user GUID. This is an optional field.

adiVersionKeyType

adiVersionKey

adiQuickSearchName

The version key is used to determine how a version record will be associated as a new version of a document. The adiVersionKeyType and adiVersionKey columns are mandatory for version records and mandatory for a document record that will have subsequent versions in the same delimited file when using InternalId. The adiQuickSearchName is mandatory when the version key type is ExternalId.

To import a file as a version of a document, there must be some way to associate the version record with the document it will become a version of. There are generally two cases: either the document and version records are all contained in the same delimited file, or the document already exists in FileHold and only the version records appear in the delimited file. The first case can use an identifier that is unique among all versions of the document, repeated in the delimited file for each record. In the second case, the version records in the delimited file need an external reference to the document in FileHold.

There are five possible values for the version key type. The last four all relate to the external reference scenario. The primary difference between them is that ExternalId is a universal approach that does not require any information internal to FileHold. The other three all require information taken from the FileHold API or database, or otherwise extracted from the system, but they all have a performance advantage over ExternalId.

  • InternalId – The adiVersionKey field value will be interpreted as an id unique to the indirect file. This id must have previously been defined with a Document import type record. The unique id must be in the format of a GUID. The scope of this value is limited to the delimited file.
  • ExternalId – The adiVersionKey field value will be interpreted as a parameter to a quick search. When this version key type is used, a public quick search must also be specified in the adiQuickSearchName column.
  • MetadataId – The adiVersionKey field value will be interpreted as the metadata id.
  • DocumentVersionId – The adiVersionKey field value will be interpreted as the document version id.
  • DocumentId – The adiVersionKey field value will be interpreted as the document id.

adiVersionSequence

This is used to help ensure that versions are added in a specific order. The value starts at 1 and increases by 1 for each version. This sequence number is not related to the FileHold version.

For example, if there are three version records for the same document in a delimited file, sequence numbers 1, 2, and 3 are expected in that order. If there are three version records for the same document in three delimited files, the value 1 is expected for each. The versions are added in the order the delimited files are added to the queue. The version sequence is not required with a document record as this is always the first version of a document. Any related version records in the same file must start with sequence number 1. This is a mandatory column for a version record.

adiCreatedDate

Provides a way to override the created date/time for the document version. This does not affect the last modified date. For example, if a document is added on Jun 1 and this field value is provided as Feb 15, the created date will be Feb 15, but the last modified date will be Jun 1. This also does not affect the action date for the Add Document action, which will be the actual date. The usage log includes a note/column indicating that the actual create date was overridden.

The ability to use this field depends on a system administration setting “Allow document version create date to be overridden.” By default, this is disabled. Changing this value will create an entry in the system audit log and cause a confirmation prompt: “Compliance or regulatory requirements may require this option to remain disabled. Confirm that you would like to enable setting arbitrary create dates on document versions.”

This is an optional column. When this column is not present, the create date will be the date the document is added with ADI.

adiApprovalStatus

Sets the approval status for the document version using the normal enumeration values. The only valid values are Not submitted for approval, Approved, and Not approved. This is an optional column. When this column is not present, the approval status will be not submitted for approval.

adiVCN

The version control number for the document version. This is an optional column.

adiDCN

The document control number for the document version. This is an optional column.

metadata field

The name of a metadata field. There must be one for every required field in the document schema. Optional fields can be added as needed.

Metadata field names and drop down list values must exactly match the configuration in FileHold, including case. If you notice a blank field after importing documents, a misspelled field name in the indirect file may be the cause.

The indirect file and files for import do not need to be in the same directory location when being imported, but the designated FileHold service account must have full control.

A sample indirect file, displayed in Microsoft Excel®, is shown below. One document and one version are being imported.

  • adiImportType - Set to Document or Version
  • adiImportFilename - The full path and name of the document to import. The FH_Service account must have full control of this directory.

  • adiDocumentName - Document name used to rename the original filename.
  • adiOwner - The GUID of the owner.
  • adiVersionKeyType - The method used to determine how the document is associated with the version.
  • adiVersionKey - The unique ID that links the document with the version. The value must be the same for both the document and the version.
  • adiVersionSequence - The order in which the versions are added. Must start at 1.
  • adiDocumentSchema  - The name of the document schema being used.
  • Invoice number - The metadata field name.
  • Total - The metadata field name.
  • Invoice Date - The metadata field name.
  • Vendor - The metadata field name.

Indirect metadata sample CSV file - ADIversionkey
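Indirect files are often produced by an export script from the legacy system. As a sketch under assumptions, the Python below writes a file with the same columns as the sample: one Document record and one Version record linked by a shared InternalId GUID, with adiVersionSequence 1 on the version and adiDocumentSchema left blank for it. The schema name, file paths, and metadata values are hypothetical, and adiDocumentName and adiOwner are left blank because they are optional.

```python
import csv
import io
import uuid

# Column headings from the sample; the last four are the sample schema's
# metadata fields.
COLUMNS = [
    "adiImportType", "adiImportFilename", "adiDocumentName", "adiOwner",
    "adiVersionKeyType", "adiVersionKey", "adiVersionSequence",
    "adiDocumentSchema", "Invoice number", "Total", "Invoice Date", "Vendor",
]

def build_indirect_file(first_path: str, second_path: str) -> str:
    """Return CSV text with one document record and one version record."""
    key = str(uuid.uuid4())  # InternalId: one GUID shared by every version
    document = {
        "adiImportType": "Document",
        "adiImportFilename": first_path,
        "adiVersionKeyType": "InternalId",
        "adiVersionKey": key,
        "adiDocumentSchema": "Invoice",  # hypothetical schema name
        "Invoice number": "INV-1001",
        "Total": "250.00",
        "Invoice Date": "2024-02-15",    # yyyy-mm-dd is a supported format
        "Vendor": "Acme",
    }
    version = {
        "adiImportType": "Version",
        "adiImportFilename": second_path,
        "adiVersionKeyType": "InternalId",
        "adiVersionKey": key,
        "adiVersionSequence": "1",       # first version record in this file
        # adiDocumentSchema stays blank for Version records.
    }
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerow(document)
    writer.writerow(version)
    return buf.getvalue()

print(build_indirect_file(r"C:\Import\inv-1001.pdf", r"C:\Import\inv-1001-v2.pdf"))
```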

Arbitrary Unicode characters can be used as delimiters by prefixing a decimal Unicode value with a backslash. The most common delimiter characters will come from the ASCII Punctuation symbols. A tab character can be expressed as \9, a vertical tab as \11, a space as \32, and the \ (backslash) character as \92. A complete list of values is available on the Unicode Consortium website.
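The backslash notation maps directly to a code point. The Python sketch below converts a delimiter setting into the actual character, which can be handy when generating or checking indirect files. The function name is hypothetical, and treating any other value as a literal single character is an assumption based on the typical comma and semicolon settings.

```python
import re

def parse_delimiter(spec: str) -> str:
    """Turn a delimiter setting into the character it stands for.

    A backslash followed by decimal digits (for example \\9) is a Unicode
    code point; anything else must be a single literal character.
    """
    m = re.fullmatch(r"\\(\d+)", spec)
    if m:
        return chr(int(m.group(1)))
    if len(spec) != 1:
        raise ValueError(f"expected one character or \\<decimal>, got {spec!r}")
    return spec

print(repr(parse_delimiter(r"\9")))  # '\t'
```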

Special field type conditions with indirect metadata

Date fields

The following short date formats are supported in the delimited file with ADI:

  • mm-dd-yyyy
  • mm/dd/yyyy
  • yyyy-mm-dd
  • yyyy/mm/dd

Once the date metadata value is imported into the library, the date format will match the format set in the date metadata field properties in FileHold. For example, if the date format in the delimited file is mm-dd-yyyy and the date format in the Date metadata field is ddmmyyyy, the format for the date field in the metadata pane is ddmmyyyy.
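Before importing, date columns can be checked against the four supported formats. The Python sketch below maps each listed format to a strptime pattern (mm is the month, dd the day, yyyy the four-digit year) and returns the first successful parse. It is a pre-import check under the assumption that FileHold interprets the formats exactly as listed; it does not reproduce FileHold's own parser. It also makes one pitfall visible: a day-first value such as 03-04-2024 is silently read as March 4 because it fits mm-dd-yyyy.

```python
from datetime import date, datetime

# strptime equivalents of the supported short date formats.
SUPPORTED_FORMATS = ("%m-%d-%Y", "%m/%d/%Y", "%Y-%m-%d", "%Y/%m/%d")

def parse_indirect_date(text: str) -> date:
    """Return the date for a value in one of the supported formats."""
    for fmt in SUPPORTED_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # try the next supported format
    raise ValueError(f"unsupported date format: {text!r}")

# Displayed in FileHold using the field's own format, e.g. ddmmyyyy:
print(parse_indirect_date("02-15-2024").strftime("%d%m%Y"))  # 15022024
```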

Dropdown field

Dropdown menu metadata fields can be defined to allow duplicate values. The behavior of indirect metadata assignment is undefined when more than one dropdown menu value matches the value in the indirect metadata. As a general rule, creating duplicate values in a dropdown menu is bad practice.

Drilldown field

The value for only a single node can be specified in the indirect metadata. This value is compared against every node value in the tree to find a match. If the leaf node restriction is defined for the field, the comparison is only performed on leaf nodes. Drilldown menus have the same problem with duplicate values as dropdown menus.
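Because the result is undefined when a value matches more than one dropdown value or drilldown node, it may be worth scanning for ambiguous values before an import. The Python sketch below does this for a drilldown tree represented as nested dictionaries (a hypothetical shape, not a FileHold export format) and for a dropdown list; the leaf_only option mirrors the leaf node restriction. All names and data are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical drilldown shape: each key is a node value mapped to its child
# nodes; an empty dict marks a leaf node.
Tree = Dict[str, "Tree"]

def matching_nodes(tree: Tree, value: str, leaf_only: bool = False) -> List[List[str]]:
    """Return the path to every node whose value equals `value`."""
    matches: List[List[str]] = []

    def walk(node: Tree, path: List[str]) -> None:
        for name, children in node.items():
            here = path + [name]
            if name == value and (not leaf_only or not children):
                matches.append(here)
            walk(children, here)

    walk(tree, [])
    return matches

def duplicate_choices(choices: List[str]) -> List[str]:
    """Return dropdown values that occur more than once."""
    return sorted(v for v, n in Counter(choices).items() if n > 1)

regions: Tree = {
    "Asia": {"Georgia": {}},
    "North America": {"USA": {"Georgia": {}, "Texas": {}}},
}
# More than one path means the value is ambiguous in indirect metadata.
print(matching_nodes(regions, "Georgia"))
```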

Understanding ADI processing

Like many autonomous tasks in FileHold, ADI normally operates as a Windows scheduled task. By default it is set to run every 10 minutes and internally stops after the maximum number of documents is processed or its timeout value is reached, whichever comes first. Only one instance of the task should ever run at a time.
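The two stopping conditions can be modeled in a few lines. This Python sketch is a model for reasoning about throughput, not FileHold's implementation: the function name is hypothetical, import_one stands in for importing a single queued document, and the clock parameter exists only so the behavior can be simulated without waiting. It shows that the limits are checked before each document starts, so a document already in progress when the timer expires is finished.

```python
import time
from typing import Callable, Iterable, List, Optional

def run_import_pass(
    queue: Iterable[str],
    import_one: Callable[[str], None],
    max_docs: Optional[int] = None,  # None = unlimited, the per-job default
    timeout_sec: float = 570.0,      # the 570-second default timer
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Import queued documents until the document limit or the timer is reached.

    Both limits are checked only before a document starts, so a document that
    is in progress when the timer expires is always completed.
    """
    started = clock()
    done: List[str] = []
    for doc in queue:
        if max_docs is not None and len(done) >= max_docs:
            break  # maximum number of documents reached
        if clock() - started >= timeout_sec:
            break  # timer expired; remaining documents wait for the next run
        import_one(doc)
        done.append(doc)
    return done

# Simulated run: every import takes 200 s on a fake clock.
now = [0.0]
def fake_import(doc: str) -> None:
    now[0] += 200.0

print(run_import_pass(["a", "b", "c", "d"], fake_import, clock=lambda: now[0]))
# ['a', 'b', 'c']: 'c' started at 400 s and finished at 600 s
```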

By default, the maximum number of documents for each job is unlimited and the timer is set to 570 seconds. Because only one instance of the task should run at a time, the Windows scheduled task definition is set to prevent overlapping runs. If the documents are small and can be added quickly, approximately 30 seconds of processing time will be unused at the end of each 10-minute task interval. If the last document processed before the timer expires is large and requires more than 30 seconds to process, the task will still be running at its next scheduled time, so that run is skipped and processing is idle until the following scheduled time.
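The timing is simple arithmetic: a 600-second interval minus the 570-second timer leaves about 30 seconds of headroom. The Python sketch below computes when the next pass would start, given when the current pass finishes, assuming the task fires exactly every 10 minutes and that a trigger arriving while the task is still running is skipped. The function name is hypothetical.

```python
import math

def next_start(finish_sec: float, interval_sec: float = 600.0) -> float:
    """Return when the next pass starts, measured from the start of this pass.

    Triggers fire at interval, 2 * interval, ...; the first trigger after the
    pass has finished starts the next pass (a tie counts as still running).
    """
    return (math.floor(finish_sec / interval_sec) + 1) * interval_sec

print(next_start(570 + 20))  # 600.0  (last document overran the timer by 20 s)
print(next_start(570 + 45))  # 1200.0 (overran by 45 s, so the 600 s run is skipped)
```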

FileHold does not guarantee the order or priority in which jobs are processed. One job with a large number of documents may effectively consume all of the task time and starve other jobs. In this case, the maximum number of documents can be set to throttle one or more jobs so that other jobs can be processed. Since the task runs every 10 minutes, any change to a job definition takes effect at the next 10-minute interval. This allows you to modify the maximum number of documents or disable a job to accommodate changes in volume, for example.

If you are processing a very large number of documents, such as in a migration from a legacy system, you may want to make some additional system adjustments to complete the import as quickly as possible. These adjustments must be made with the help of a Windows administrator. The first adjustment is simple: disable all jobs except the migration job and set its maximum number of documents to unlimited. Next, adjust the maximum duration of the import task to a value appropriate to the volume of documents. The value is in seconds, but it can be set to run for multiple days as needed. Finally, disable the import task in the Task Scheduler and run it manually from the command line:

fileholdadm /lmprocessimportationjobs
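Choosing a duration for a one-off migration run is an estimate. As a sketch, the Python below multiplies the number of documents by an average import time per document (which you would need to measure with a test import on your own server) and a safety margin. The margin, the example figures, and the function name are hypothetical; treating the duration as the ImportationJobTimeoutSec key (which is also in seconds) and printing it in the usual .NET appSettings form are assumptions to confirm with your Windows administrator.

```python
import math

def migration_timeout_sec(doc_count: int, avg_sec_per_doc: float, margin: float = 1.25) -> int:
    """Estimate a timer value in seconds for a one-off migration run.

    avg_sec_per_doc should come from a timed test import on your hardware;
    margin is an arbitrary safety factor.
    """
    return math.ceil(doc_count * avg_sec_per_doc * margin)

timeout = migration_timeout_sec(200_000, 1.5)
print(f"{timeout} s is about {timeout / 86400:.1f} days")  # 375000 s is about 4.3 days
# Assumed form of the appSettings entry in the library manager web.config:
print(f'<add key="ImportationJobTimeoutSec" value="{timeout}" />')
```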

Depending on the expected duration, you may wish to temporarily disable IIS recycling, as it can interrupt the task. Recycling is an optional maintenance feature in IIS, but it is typically set to run every day. To improve performance, you may also wish to disable the full text search indexing task; new documents will then not be added to the full text index right away, but they will be queued to be added later. If server-side OCR is enabled, disable that task also.

If you are performing a mass document migration from a legacy system, you are likely using the indirect metadata option to capture metadata. Before running a job with all your documents, it is advisable to test your import data to make sure it matches the configuration in FileHold. Common problems include missing or misspelled dropdown fields and missing data in required fields. The easiest way to test is on a test FileHold server: load your production configuration on the test server, prepare ADI as described above, and run the task manually from the command line. Check the FileHold Windows event log for warnings about the format of the delimited file, and check the job details for errors caused by mismatched data.
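Some of these data problems can be caught before the file reaches FileHold. The Python sketch below checks an indirect file against the column rules in the table earlier in this article: required columns, Document versus Version import types, a blank schema on Version records, version keys, GUID-formatted InternalId values defined by an earlier Document record, a quick search name for ExternalId, and version sequences starting at 1 within the file. It uses the version 16 and later column names, does not compare metadata values with your FileHold configuration, and is an illustration with hypothetical names rather than a complete validator.

```python
import csv
import io
import uuid
from typing import Dict, List, Optional, Set, Tuple

REQUIRED = ("adiImportType", "adiImportFilename", "adiDocumentSchema")
KEY_TYPES = {"InternalId", "ExternalId", "MetadataId", "DocumentVersionId", "DocumentId"}

def _cell(row: Dict[str, Optional[str]], col: str) -> str:
    # DictReader fills missing trailing cells with None.
    return (row.get(col) or "").strip()

def _is_guid(value: str) -> bool:
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False

def check_indirect_file(text: str, delimiter: str = ",") -> List[str]:
    """Return a list of problems found in an indirect metadata file."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    headings = reader.fieldnames or []
    missing = [c for c in REQUIRED if c not in headings]
    if missing:
        return [f"missing column {c}" for c in missing]

    problems: List[str] = []
    internal_ids: Set[str] = set()              # GUIDs defined by Document records
    next_seq: Dict[Tuple[str, str], int] = {}   # next expected sequence per key
    for line_no, row in enumerate(reader, start=2):  # line 1 holds the headings
        where = f"line {line_no}"
        kind = _cell(row, "adiImportType")
        key_type = _cell(row, "adiVersionKeyType")
        key = _cell(row, "adiVersionKey")
        if kind not in ("Document", "Version"):
            problems.append(f"{where}: adiImportType must be Document or Version")
            continue
        if key_type and key_type not in KEY_TYPES:
            problems.append(f"{where}: unknown adiVersionKeyType {key_type!r}")
        if key_type == "ExternalId" and not _cell(row, "adiQuickSearchName"):
            problems.append(f"{where}: ExternalId requires adiQuickSearchName")
        if key_type == "InternalId" and not _is_guid(key):
            problems.append(f"{where}: InternalId key must be a GUID")

        if kind == "Document":
            if not _cell(row, "adiDocumentSchema"):
                problems.append(f"{where}: Document record requires adiDocumentSchema")
            if key_type == "InternalId":
                internal_ids.add(key)
            continue

        # Version record
        if _cell(row, "adiDocumentSchema"):
            problems.append(f"{where}: adiDocumentSchema must be blank for a Version record")
        if not key_type or not key:
            problems.append(f"{where}: Version record requires adiVersionKeyType and adiVersionKey")
            continue
        if key_type == "InternalId" and key not in internal_ids:
            problems.append(f"{where}: InternalId not defined by an earlier Document record")
        expected = next_seq.get((key_type, key), 1)
        if _cell(row, "adiVersionSequence") != str(expected):
            problems.append(f"{where}: expected adiVersionSequence {expected}")
        next_seq[(key_type, key)] = expected + 1
    return problems
```

An empty result means none of these particular checks failed; the FileHold event log and job details remain the final word.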

ADI tracks the names of imported files and will not import the same file twice. It compares the full path of each file with the full paths of previously queued files to determine whether they are the same. If you attempt to import a file that was previously queued for import, it is silently ignored.
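The duplicate check can be pictured as a set of previously queued full paths. This Python sketch uses hypothetical names, and the exact, case-sensitive path comparison is an assumption. It shows the consequence in this model: a file whose full path was queued before is skipped without an error, while the same file placed at a different path counts as new.

```python
from typing import Iterable, List, Set

def queue_new_files(found: Iterable[str], already_queued: Set[str]) -> List[str]:
    """Return the files to queue, skipping any full path queued before.

    Paths are compared exactly as given (case-sensitive, no normalization);
    this is an assumption of the sketch, not documented FileHold behavior.
    """
    added: List[str] = []
    for path in found:
        if path in already_queued:
            continue  # previously queued: silently ignored, no error raised
        already_queued.add(path)
        added.append(path)
    return added

queued = {r"\\fs01\scans\a.pdf"}
print(queue_new_files([r"\\fs01\scans\a.pdf", r"\\fs01\scans\b.pdf"], queued))
```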