An overview of document migration from legacy systems

Legacy document repositories are a reality for most organizations. Once FileHold is in place, you will need a strategy for dealing with both old and new documents. There are many ways to address the challenges of a legacy repository, before or after you start using FileHold. The first step is to identify some critical attributes of the legacy repository. An organization's repository may actually be a collection of many different repositories, and the attributes may vary from one part to another.

Importance of the documents

We can think about the importance of documents to the organization by looking at four different categories.

  1. Operational
  2. Organizational
  3. Historical
  4. Unknown

Operational documents have the most impact on an organization. By definition they are used for the day-to-day running of the organization. They often document process elements such as invoices or employment records. They are frequently added, updated, and accessed.

Organizational documents describe policies and procedures or document other aspects of the organization. They change infrequently, but it is critical they can be accessed as needed.

Historical documents could originate as operational or organizational documents. They are simply no longer current, but there may still be reasons, such as government regulations, that require their retention.

Unknown documents are just that. They could fit into any one of the other three categories. Causes for document importance being unknown include lack of organizational standards for document management or the ability for employees to store documents where they choose. It is also a common side-effect of a corporate acquisition.

Document classification

In broad terms, the quality of document management can be assigned to one of three classifications. 

  1. Informal
  2. Semi-formal
  3. Strictly formal

Informal documentation is common when individuals are given responsibilities for deciding how to store their documents. Some individuals may be very good at storing documents for easy retrieval, retention, etc. while others may do virtually nothing. Organizations with documents in this classification will likely also have documents of unknown importance.

Semi-formal systems often include some structure, but nothing to ensure compliance. These systems are often the trickiest to migrate: they have the appearance of formality, but the underlying data is full of errors and requires a lot of cleansing before migration can take place.

As the name suggests, strictly formal classification systems ensure documents are well managed and probably have a formal records management process. Typically there are technology, training, and audits in place to ensure compliance. This class of documents usually provides the greatest options for a complete and consistent migration of all data.

Storage technology

The technology used to store the legacy documents will help to define the techniques that are possible to extract documents and metadata from the repository.

  1. Paper
  2. Microsoft Windows file system
  3. Other file system
  4. Dedicated Document Management Software (DMS)

Paper is familiar to everyone. You can hire a service bureau to take it offsite, convert it to electronic documents, or safely destroy it. FileHold can work with your service bureau to ensure any electronic documents they create can be easily added to FileHold.

The Windows file system is a well understood place to store documents. This method of storage has a very high compatibility with FileHold since FileHold runs on Windows. The amount of metadata that can be extracted is limited to things like the file name, folder path, and, for certain documents, internal metadata.
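
As a rough illustration of how little metadata a plain file system offers, the following sketch walks a folder tree and records the file name, folder path, and a few basic attributes for each file. It uses only the Python standard library; the function name `inventory` and the field names are assumptions made for this example and are not part of FileHold.

```python
import os
from datetime import datetime, timezone


def inventory(root):
    """Collect the metadata a plain file system exposes for each file under root."""
    rows = []
    for folder, _subfolders, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(folder, name)
            info = os.stat(path)
            rows.append({
                "file_name": name,
                # Folder path relative to the repository root.
                "folder_path": os.path.relpath(folder, root),
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": info.st_size,
                "modified_utc": datetime.fromtimestamp(
                    info.st_mtime, tz=timezone.utc
                ).isoformat(),
            })
    return rows
```

A listing like this can help estimate the size of a repository and the mix of file types before a migration approach is chosen. Internal metadata, such as properties stored inside office documents, would need format-specific tools and is not covered here.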

Other file systems tend to be similar to Windows file systems though they may have limitations such as lower data transfer speeds to the FileHold server.

A dedicated DMS has both the greatest opportunities and challenges and deserves a section of its own.

Dedicated document management software

There are many reasons to move from an existing dedicated DMS to FileHold, such as high annual maintenance costs or software that is low quality, unsupported, or difficult to use. There is no standard for the internal representation of documents and their metadata across document management systems. Products in the marketplace use many different methods for storing documents and metadata and many different ways of getting documents out. Several questions must be answered to determine the most economical method of dealing with a legacy repository.

  1. The legacy DMS has a document export feature.
    1. Does the export change the format of the documents when they are exported? If it is changed, is the new format acceptable for migration?
    2. Can the export feature bulk export documents or does it only export one at a time?
    3. Can the DMS export metadata, indexes, tags, and/or labels?
    4. Can the DMS export document versions?
    5. Is the export feature of the software licensed for use? 
  2. The legacy DMS does NOT have a suitable document export feature.
    1. Are export utilities available from the original vendor or third parties?
    2. Where are the documents stored: file system or database?
    3. Are the documents stored as pages or complete documents? 
    4. What file format is used to store the documents: proprietary or standard? Are the documents or related data encrypted?
    5. Is a database management system such as SQL Server, Oracle, or MySQL used to store the data or is it stored in a proprietary or embedded database format?

Legacy repository solutions overview

    Depending on the attributes of your legacy repository and the answers to the questions above, there are a number of ways to move forward.

    1. Set the repository to read-only. This is a fairly simple solution for file system based repositories. Users can continue to access the documents as needed. They can even add the documents to FileHold when they are needed for ongoing use. Where possible, set up auditing on the repository to look for areas that are not accessed. Plans can be put in place to take these offline after a set period of time, with notification to users. Eventually the legacy repository will be retired.
    2. Virtualize the legacy DMS and archive it. This is a fairly low cost solution, but it can present challenges as the knowledge of how to operate the legacy system starts to fade from your organization. The legacy system may not be supported by its vendor any more. The environment may not be easy to virtualize.
    3. Export the legacy DMS documents into a common or universal format. This can be a low cost solution if there is an existing export feature. The universal format typically does not provide an easy method to search the documents.
    4. If the legacy repository is a network share or logical file system, there will be very little metadata to transfer. Simply add the documents to FileHold using the Add Folder feature. Storage is fairly cheap, which makes this a low cost solution, but the documents must be in a folder tree that is accessible through the Windows file system. This method does not transfer any metadata to FileHold except the original filename. The search capabilities will be limited to the original filename and any full text document content. The FileHold server side OCR feature can automatically add text to image-only documents. As an option, file properties such as the source folder name can be automatically extracted as documents are imported using the standard extraction rules feature.
    5. In the case above, if the folder names have special meaning they can be mapped to specific metadata fields in FileHold using simple tools and techniques.
    6. When documents needed for ongoing operations are in a legacy DMS there are likely special tools that will need to be built or purchased to extract and transform the documents and metadata. FileHold provides out-of-the-box features such as Manage Imports and Automatic Document Importation (ADI) to enable the import of legacy documents and metadata.
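
Where folder names carry meaning, as in item 5 above, the mapping from folder levels to metadata fields can be sketched in a few lines. The example below is illustrative only: the folder convention (`department/year/customer`), the field names, and the function `fields_from_path` are all hypothetical, and the resulting dictionary would still need to be converted into whatever format the chosen import tool expects.

```python
from pathlib import PureWindowsPath

# Hypothetical convention: <department>\<year>\<customer>\<file>
FOLDER_FIELDS = ("department", "year", "customer")


def fields_from_path(relative_path, folder_fields=FOLDER_FIELDS):
    """Map each folder level of a relative file path to a named metadata field."""
    # PureWindowsPath accepts both "\" and "/" separators on any platform.
    parts = PureWindowsPath(relative_path).parts
    folders, file_name = parts[:-1], parts[-1]
    fields = {"file_name": file_name}
    # zip stops at the shorter sequence, so a file stored above the
    # expected depth simply receives fewer fields.
    for field, folder in zip(folder_fields, folders):
        fields[field] = folder
    # Folder levels deeper than the convention are kept for manual review.
    extra = folders[len(folder_fields):]
    if extra:
        fields["unmapped_folders"] = "/".join(extra)
    return fields
```

For example, `fields_from_path("Finance\\2019\\Acme\\inv-001.pdf")` yields the department, year, and customer alongside the file name. Keeping unexpected folder levels in a separate field, rather than discarding them, makes it easier to spot parts of the repository that do not follow the convention.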

    Typical legacy repository migration activities

    There are a number of activities needed to successfully implement a legacy repository migration to FileHold. These activities may be performed by the customer, by a partner, and/or by FileHold; some are mandatory (M) and some are optional (O). In all cases customers should provide a project manager or key user who is responsible for ensuring that all activities required of the customer are completed.

    The following list highlights the activities from a typical migration plan from a dedicated DMS, but many variations are possible to accommodate special situations. In some cases the order of items may be switched as needed; some items can run in parallel.

    Activity | M/O | Description | How | Responsible*
    -------- | --- | ----------- | --- | ------------
    1.1 | M | Document migration analysis and report. There may be preceding activities required if there is no "to be" description of the new system. | Interview customer. | FileHold and Customer
    1.2 | M | Build, configure, and/or acquire any necessary migration tools. | | FileHold and Customer
    2.1 | M | Source data cleansing, part 1. | Manual or automated tools such as file deduplication. | Customer
    2.2 | M | Configure FileHold to accept imported documents. | Details are in the document migration report. | FileHold and Customer
    2.4 | O | Copy data from source system to temporary migration location. | FastCopy and similar tools. | Customer
    2.5 | O | Run migration tools in simulation mode and verify results. | | FileHold and Customer
    2.6 | O | Source data cleansing, part 2. | | Customer
    2.7 | M | Run migration tools in execute mode and verify results. | | FileHold and Customer
    2.8 | M | Import documents into FileHold. | Manage Imports, ADI, Add Folders, etc. | FileHold and Customer
    2.9 | M | Destination data cleansing. | | Customer
    3.1 | O | Perform delta migration to capture any documents added to the source system after the main migration. | Repeat activities 2.1 - 2.9. | FileHold and Customer

    * The FileHold professional services team or a FileHold partner can assist with many of these items as needed. Hours will be deducted from your implementation package as used by FileHold professional services.
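
One common automated step during source data cleansing is finding files whose contents are identical, so that only one copy is migrated. The sketch below shows one possible approach using the Python standard library: group files by size first, then compare SHA-256 digests within each group. The function names are assumptions for this example; this is not a FileHold tool, and it only reports candidates without deleting anything.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root whose contents are byte-for-byte identical."""
    by_size = defaultdict(list)
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Grouping by size before hashing avoids reading files that cannot possibly match, which matters on large repositories or slow network shares. Deciding which copy in each group to keep is still a manual or policy decision.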

    Legacy document migration cost

    Documents in customers' legacy repositories are often critical to the operation of their organization, and migration of these documents to FileHold is a mandatory requirement. The effort required of customers, partners, and FileHold to complete a migration is often the most costly part of any FileHold implementation. As with any expenditure an organization makes, it is important to weigh the cost against the benefit.

    Archiving some or all of the legacy repository is generally much less costly than ensuring every document and element of metadata is correctly mapped and transferred to FileHold. Assessing the value of these documents as described above is the most important part of ensuring the cost of migration will provide an appropriate benefit. FileHold professional services can help with this assessment. Depending on the number of different types of documents in the legacy repository and the complexity imposed by the legacy repository technology, an analysis and report can often be produced in one to three days. Contact sales@filehold.com to receive a quotation suitable for your situation.