Migrating a folder-based repository into FileHold
It is common for document repositories to start out on a network share in a series of folders. Over time these structures become large and unwieldy, and the need for a document management system (DMS) becomes obvious as files are lost, misplaced, deleted, or otherwise become difficult to locate. Document management software solves these problems for all new documents, but it may also be desirable to bring the legacy documents into the software.
Migrating documents from a file share into FileHold
A common approach is to simply load all legacy documents into a single legacy cabinet and use full text search to find documents. This solution is fairly quick and easy to use, but the folder structure that held the legacy documents often carries implied information. That information can be preserved with a little extra planning and configuration in the document management software.
Imagine a scenario where a network share contains all project documents for a company’s clients. Each client has a folder, each project has a folder in the client folder, and each project folder has an Initiating, Planning, Executing, Monitoring and Controlling, and Closing folder. The hierarchy gets deeper as the executing folder has a Work Package and Deliverables folder; the monitoring and controlling folder has a Budget and Timesheet folder. All of these subfolders of the project folder represent implied metadata.
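The implied metadata in this scenario can be read straight off a document's path. The following is a minimal Python sketch of that idea, not part of FileHold; the share root `S:\Projects`, the client and project names, and the `implied_metadata` helper are all assumptions made up for illustration.

```python
from pathlib import PureWindowsPath
from typing import NamedTuple, Tuple


class ImpliedMetadata(NamedTuple):
    client: str
    project: str
    subfolders: Tuple[str, ...]  # e.g. ("Executing", "Deliverables")
    filename: str


def implied_metadata(share_root: str, document_path: str) -> ImpliedMetadata:
    """Split a document path into the metadata implied by its folders.

    Assumes the layout described above: <root>\\<client>\\<project>\\...\\<file>.
    """
    relative = PureWindowsPath(document_path).relative_to(PureWindowsPath(share_root))
    parts = relative.parts
    if len(parts) < 3:
        raise ValueError(f"expected client\\project\\...\\file, got {relative}")
    # Everything between the project folder and the file name is a phase subfolder.
    client, project, *subfolders = parts[:-1]
    return ImpliedMetadata(client, project, tuple(subfolders), parts[-1])


# Hypothetical example path; the names are illustrative only.
example = implied_metadata(
    r"S:\Projects",
    r"S:\Projects\Acme\P-1001\Executing\Deliverables\report.docx",
)
print(example)
```

A document stored directly in the project folder simply yields an empty `subfolders` tuple, which a migration would need to handle as a special case.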
In a document management software repository, metadata is the preferred method for information retrieval. With the right metadata no folder structure is required at all. The software organizes documents into file cabinets. Cabinets have security and access permissions. Each cabinet has one or more drawers. Drawers contain one or more folders. Each folder can have additional security and access permissions. In our example we might implement the repository library with the following structure:
- There is a single cabinet for all clients.
- Each client has its own drawer in the cabinet.
- Each drawer has a standard set of folders including one for projects, one for proposals, etc. This implies that every client project has the same security and access permissions.
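The cabinet, drawer, and folder layout above can be modelled as a small value type, which is handy when a migration script has to compute a destination for each client. This is a sketch only: the cabinet name `Clients`, the folder name `Projects`, and the `/` separator are assumptions for illustration, not FileHold conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryLocation:
    cabinet: str
    drawer: str
    folder: str

    def as_path(self) -> str:
        # The "/" separator is an arbitrary choice for display purposes.
        return "/".join((self.cabinet, self.drawer, self.folder))


def project_location(client: str,
                     cabinet: str = "Clients",
                     folder: str = "Projects") -> LibraryLocation:
    """One cabinet for all clients, one drawer per client, a standard folder for projects."""
    return LibraryLocation(cabinet=cabinet, drawer=client, folder=folder)


print(project_location("Acme").as_path())
```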
With this structure we have everything needed to import the client projects from the shared folders. Each document in the software repository has a schema, or document type. The schema is used to define workflow, retention, and metadata, among other things. All of our projects will have a schema called Project Document. This schema will define several metadata fields. Client will identify the client that the project is for. We can configure this field to be populated automatically, as every project in this folder will be for the same client. There will be a Project Number. This number is automatically generated for every project by our job costing system. Finally, there will be a Category field. This field will be used to capture the project subfolder implied metadata from the existing file share.
The Category metadata field will be defined as a drill down field. This type of metadata field allows us to create a hierarchy of values. In this case the hierarchy will precisely match our project subfolder structure and will have the following values:
- Initiating
- Planning
- Executing>Work package
- Executing>Deliverables
- Monitoring and Controlling>Budget
- Monitoring and Controlling>Timesheet
- Closing
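Mapping a legacy subfolder path to one of these drill-down values is a simple lookup. The sketch below assumes the `>` separator shown in the list and matches folder names case-insensitively, since a folder on the share may be spelled "Work Package" while the field value reads "Work package". The `category_for` helper is hypothetical.

```python
from typing import Sequence

# The drill-down values exactly as listed above.
DRILL_DOWN_VALUES = [
    "Initiating",
    "Planning",
    "Executing>Work package",
    "Executing>Deliverables",
    "Monitoring and Controlling>Budget",
    "Monitoring and Controlling>Timesheet",
    "Closing",
]

# Case-insensitive index from folder path to the canonical field value.
_BY_KEY = {value.casefold(): value for value in DRILL_DOWN_VALUES}


def category_for(subfolders: Sequence[str]) -> str:
    """Return the Category value for a project subfolder path, or raise ValueError."""
    key = ">".join(name.strip() for name in subfolders).casefold()
    try:
        return _BY_KEY[key]
    except KeyError:
        raise ValueError(f"no Category value for subfolders {list(subfolders)!r}") from None


print(category_for(["Executing", "Work Package"]))
print(category_for(["Closing"]))
```

Under this scheme a file sitting directly in the Executing folder has no matching value and raises an error, so such documents would have to be reviewed by hand or given a value of their own.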
When a user adds a document to the project folder, the client name is filled in automatically. The user enters the project number created by the job costing system and selects the correct Category value by drilling through the possible selections.
Since we have a large number of legacy documents, we will not add them one at a time. Instead, we will use the Manage Imports feature of the software to add the metadata and send each document to the correct client's project folder. A managed import requires two sources of input. The most critical piece is a correctly formatted XML file with the details about what documents should be imported, what metadata should be stored with them, and where each document should be placed in the library structure. The second is the document itself.
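To make the three pieces of information concrete (which document, which metadata, which destination), here is a rough Python sketch that serializes them to XML. Every element and attribute name in it (`Import`, `Document`, `SourceFile`, `Destination`, `Schema`, `Field`) is a placeholder invented for this illustration and is not the format that Manage Imports expects; consult the product documentation for the real layout.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable


def build_import_xml(records: Iterable[Dict]) -> str:
    """Serialize import records to XML using placeholder element names."""
    root = ET.Element("Import")  # hypothetical root element
    for record in records:
        doc = ET.SubElement(root, "Document")
        ET.SubElement(doc, "SourceFile").text = record["source_file"]
        ET.SubElement(doc, "Destination").text = record["destination"]
        ET.SubElement(doc, "Schema").text = record["schema"]
        fields = ET.SubElement(doc, "Fields")
        for name, value in record["fields"].items():
            ET.SubElement(fields, "Field", name=name).text = value
    # ElementTree escapes &, < and > in text, which matters for client names
    # such as "Smith & Sons" and for drill-down values containing ">".
    return ET.tostring(root, encoding="unicode")


# Illustrative record; paths and values are made up.
xml_text = build_import_xml([
    {
        "source_file": r"S:\Projects\Smith & Sons\P-1001\Closing\summary.docx",
        "destination": "Clients/Smith & Sons/Projects",
        "schema": "Project Document",
        "fields": {
            "Client": "Smith & Sons",
            "Project Number": "P-1001",
            "Category": "Closing",
        },
    }
])
print(xml_text)
```

Using an XML library rather than writing tags by hand means special characters are escaped automatically, which is easy to get wrong when the file is assembled from plain text output.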
We can build this XML file using a variety of techniques. Typically a software engineer or server system administrator will be required to create the necessary scripts or programs. For the sake of this example we will create a Windows script that will traverse the folder structure and produce the correct XML file. An alternate example using Microsoft Excel is also available.
The following script is provided as an example only. We are unable to support it under FileCare. We recommend you consult a qualified resource in the event you would like to use this script in your environment.
@echo off
rem ***
rem *** createImport [dir]
rem ***
rem *** Build an import XML file for use with Managed Imports
rem *** Copyright(c) FileHold Systems Inc. All rights reserved.
rem ***
rem *** Error messages to stderr
rem ***
echo Create Import for FileHold
setlocal
set IMPORTNAME=documents.xml
set ERRORCOUNT=0
set DOCUMENTCOUNT=0
set PROJECTCOUNT=0
set CLIENTCOUNT=0
set DOCUMENTPATH=%~s1
set DOCUMENTPATHPRETTY=%~1
if "%DOCUMENTPATH%" == "" (set DOCUMENTPATH=.)
echo Searching document path "%DOCUMENTPATHPRETTY%"
rem *** Check that a valid directory was given on the command line
if exist %DOCUMENTPATH%\nul goto :FindLevel1
echo Invalid document path [%DOCUMENTPATH%] >&2
set /a ERRORCOUNT=%ERRORCOUNT% + 1
goto :Statistics
:FindLevel1
call :FindLevel %DOCUMENTPATH%
if "%TOKENS%" == "" goto :Statistics
md %IMPORTPATH% >nul 2>nul
echo ^ > %IMPORTNAME%
echo ^ >> %IMPORTNAME%
for /r %DOCUMENTPATH% %%x in (*.*) do call :ProcessDirectory "%%x"
echo ^ >> %IMPORTNAME%
move /y "%IMPORTNAME%" "%DOCUMENTPATH%" >nul 2>nul
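rem *** "if errorlevel 1" is true for any exit code of 1 or higher; "if not errorlevel 0" would only catch negative codes
if errorlevel 1 (set /a ERRORCOUNT=%ERRORCOUNT% + 1) & (echo Unable to move %IMPORTNAME% to %DOCUMENTPATH% >&2) & (goto :Statistics)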
:Statistics
echo Processing statistics:
echo Documents : %DOCUMENTCOUNT%
echo Clients : %CLIENTCOUNT%
echo Projects : %PROJECTCOUNT%
echo Errors : %ERRORCOUNT%
echo Process complete.
endlocal
goto :EOF
rem ***
rem *** End of script
rem ***
:ProcessDirectory
echo ^