Spectrum Importer

07/02/2018 - Documented Spectrum Importer version 2.6 improvements.

09/08/2016 - Documented an enhancement that implements a Natural Sort by default in the Spectrum 2.0.2 importer and offers a UI-based and profile-based option to revert to a strict ASCII sort if desired.

08/31/2016 - Rearranged the order of some instructional steps for using the Spectrum importer.

05/17/2016 - Added new page for Spectrum Importer.
 

NOTE: This importer is only supported by vMedia version 2.0 or higher.

Overview

 

vMedia’s Spectrum Importer allows you to import and organize over four dozen image file formats as well as non-image files such as doc, docx, wpd, xls, xlsx, and txt. The imported documents immediately appear within the Viewing Module of vMedia. You can also add comments and set other values during import, so that you can organize and quickly find your documents in vMedia.

The vMedia database entries can be linked to the original source files or the original files can be copied to the filing destination directory. In addition, the source files can be deleted after import.

 

Specifications

  • vMedia SQL: This feature is only available in vMedia SQL. Please contact Customer Support if you would like to use this feature, but are not currently using the SQL version of vMedia.
  • Required file format: None.
  • Naming: The name of the TIFF file, along with other file attributes, can be used to set vMedia database values. This is discussed in more detail below.
  • Structure: Files must be placed in at least one subfolder of the base import path.
  • Logging: A detailed log file named “IMPORT_Spectrum_Log.txt” is located in the vMedia installation directory.

 

Procedure

  1. From the Master Control screen, click on the [Import/Export Module] button.
  2. Highlight your desired vMedia database from the list and click [Select].

 

 

  1. From the Bulk Document Import list, click [Spectrum].

NOTE: Once a profile is selected, it loads automatically. Focus remains on the profile selector after the load.

  1. Enter a profile in the Select Profile field at the top of the interface.
  2. Selection Options
    • Click on the drop-down button to display a list of profiles and select your desired profile.
      • Type leading characters to select the desired item.
    • Use [Alt-Down Arrow] to display the drop-down list of profiles.
      • Cycle through the profile options with the up or down arrow keys.
    • Use any combination of the previous selection options.
  3. Edit the [Location] field or accept the default settings.
  4. Click [Source File Index Control] to configure how files are concatenated and any vMedia database values that you want to set.

  • You can set up to three vMedia values using the Index Key fields. To replace a vMedia database field with a data value derived from the source document, use this format:

fieldname = field value, where fieldname is the name of the vMedia database field as specified in the database configuration, and field value is the expression that will be evaluated when the document is saved.

Example: “fileno=pPath” will set the vMedia File Number to the name of the folder the input file is in. This would be useful if your folders were named according to a Case ID or other identifier.

  • Check the box under [Source File Concatenation] if you want your input files to be concatenated. You can click the [Gear] icon to input the expression that defines how concatenation should work.

Example: “LEFT(pFileName,5)” will cause any files with names starting with the same five characters to be treated as a single, concatenated file. This is useful if your input files were named according to a DocID or other identifier. Only recognized serialized formats (.TIF, .PDF, etc.) can be concatenated. “Native” formats such as .XLS, .WPD, .WAV, etc. cannot be concatenated.
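As a rough illustration of the two example expressions above, the following Python sketch mimics what “fileno=pPath” and “LEFT(pFileName,5)” would compute for a handful of paths. It is not vMedia code: the function names and sample paths are hypothetical, and it assumes that pPath resolves to the name of the immediate parent folder and that files are grouped within each source folder.

```python
from collections import defaultdict
from pathlib import PureWindowsPath


def left(text, count):
    """Return the first `count` characters, like LEFT(text, count)."""
    return text[:count]


def group_for_concatenation(paths, key_length=5):
    """Group file names by (folder name, first `key_length` characters).

    The folder name stands in for pPath (used as the File Number) and the
    leading characters stand in for LEFT(pFileName, key_length).
    """
    groups = defaultdict(list)
    for raw_path in paths:
        path = PureWindowsPath(raw_path)
        fileno = path.parent.name              # assumed meaning of pPath
        doc_key = left(path.name, key_length)  # LEFT(pFileName,5)
        groups[(fileno, doc_key)].append(path.name)
    return dict(groups)


sample_paths = [
    r"C:\import\CASE001\DOC01_001.tif",
    r"C:\import\CASE001\DOC01_002.tif",
    r"C:\import\CASE001\DOC02_001.tif",
]
groups = group_for_concatenation(sample_paths)
```

In this sketch the two DOC01 files end up in one group (one concatenated document) and the DOC02 file in another, all with CASE001 as the File Number.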

NOTE: The "Natural Sort" order that is used by recent versions of Windows is not the same as a typical ASCII sort when dealing with mixed alphanumeric strings. Because of this, Spectrum 2.0.2 was enhanced to implement a Natural Sort by default. The enhancement also offers a UI-based and profile-based option to revert to a strict ASCII sort if desired. This File Name Processing Order feature primarily affects file concatenation; however, it also affects the order in which documents are imported into vMedia from each source folder.

  • Under File Name Processing Order, select your desired sort order.
    • Natural / Windows Explorer Order (This is the default sort order. Natural sort order is an ordering of strings in alphabetical order, except that multi-digit numbers are ordered as a single character. For example, "z2" is sorted before "z11" because "2" is sorted as smaller than "11".)
    • ASCII Left to Right Order (All uppercase letters precede lowercase letters; digits and many punctuation marks come before letters; numbers are sorted naïvely as strings; for example, "10" precedes "2".)
  1. Click [Close] to return to the previous window.
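The difference between the two File Name Processing Order options can be seen in a short Python sketch. This is only an approximation for illustration: the helper names are hypothetical, and the natural sort here assumes case-insensitive comparison of the non-numeric parts, which may not match Windows Explorer in every detail.

```python
import re


def natural_key(name):
    """Split a name into text and number runs so digits compare as numbers."""
    parts = re.split(r"([0-9]+)", name)
    # re.split with a capturing group alternates text, digits, text, ...
    # so odd positions are always digit runs.
    return [int(part) if index % 2 else part.lower()
            for index, part in enumerate(parts)]


def natural_order(names):
    """Approximate the Natural / Windows Explorer order."""
    return sorted(names, key=natural_key)


def ascii_order(names):
    """Strict left-to-right character-code order."""
    return sorted(names)


names = ["z11", "z2", "z1"]
print(natural_order(names))  # z1, z2, z11
print(ascii_order(names))    # z1, z11, z2
```

With concatenation enabled, the chosen order determines the sequence in which matching files are joined, so page-numbered names such as "z2" and "z11" are a typical case where the two options differ.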

STREXTRACT() Function for String Extractions

NOTE: pFileName will include the .PDF extension from the file name. To ignore the .PDF extension you can use the following command: strextract(pfilename,"",".")

vMedia has a useful function to perform string extractions when the sub-strings are bound by specific delimiters. That function is called STREXTRACT() and it takes the following parameters (separated by commas):

  1. The string (or variable, such as pFileName) to extract from

  2. Starting delimiter character(s)

  3. Ending delimiter character(s), or empty to end at the end of the string

  4. Occurrence of the starting delimiter to begin string extraction

There are also optional flags that modify the function's behavior: 1 = case-blind delimiter search (for when the delimiter is an alpha string), 2 = ending delimiter may or may not be present, 4 = include delimiters in the returned sub-string.

Example:

filename - "ABC1125_1234555567890000_My Document.PDF"

STREXTRACT(pFileName,"","_",1) will return ABC1125

STREXTRACT(pFileName,"_","_",1) will return 1234555567890000

STREXTRACT(pFileName,"_","",2) will return My Document.PDF

NOTE: This function can be used in vMedia only where an expression is allowed.

"pFileName" is a variable for the Spectrum and Media Importers that corresponds to the name of the file being imported, including the extension. A similar variable for the Hanna Importer is "l_Filename", which is the file name without the extension.
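For readers who want to experiment with delimiter choices outside vMedia, the STREXTRACT() behavior described above can be approximated in Python. This is a sketch under stated assumptions, not the vMedia implementation: the name str_extract is hypothetical, the flags are assumed to combine by addition, and a missing ending delimiter is assumed to return an empty string unless flag 2 is set.

```python
def str_extract(text, begin, end="", occurrence=1, flags=0):
    """Approximate STREXTRACT(text, begin, end, occurrence, flags)."""
    case_blind = bool(flags & 1)      # 1 = case-blind delimiter search
    end_optional = bool(flags & 2)    # 2 = ending delimiter may be absent
    keep_delims = bool(flags & 4)     # 4 = include delimiters in result

    # Lowercasing is assumed to preserve length (true for ASCII names).
    haystack = text.lower() if case_blind else text
    begin_d = begin.lower() if case_blind else begin
    end_d = end.lower() if case_blind else end

    # Find the requested occurrence of the starting delimiter.
    # An empty starting delimiter means "start of the string".
    delim_pos, search_from = 0, 0
    if begin_d:
        for _ in range(occurrence):
            delim_pos = haystack.find(begin_d, search_from)
            if delim_pos == -1:
                return ""
            search_from = delim_pos + len(begin_d)
    start = search_from

    # An empty ending delimiter means "end of the string".
    stop = tail = len(text)
    if end_d:
        found = haystack.find(end_d, start)
        if found != -1:
            stop, tail = found, found + len(end_d)
        elif not end_optional:
            return ""

    return text[delim_pos:tail] if keep_delims else text[start:stop]


name = "ABC1125_1234555567890000_My Document.PDF"
print(str_extract(name, "", "_", 1))   # ABC1125
print(str_extract(name, "_", "_", 1))  # 1234555567890000
print(str_extract(name, "_", "", 2))   # My Document.PDF
print(str_extract(name, "", "."))      # name without the .PDF extension
```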

The top section allows you to define how the Spectrum Importer handles recognized image and PDF format files that are being imported:

  • Skip
    • Recognized source files (.PDF, .TIF, .BMP, etc.) will be skipped and not imported.
  • Copy Source File
    • An entry for the file will be created in the vMedia database and a copy of the file created in the Destination Path.
  • Link to Source File
    • An entry for the file will be created in the vMedia database as a link to the original file. NOTE: No files will be written to the Destination Path.
  • Transform to Serialized TIF
    • An entry for the file will be created in the vMedia database and serialized .TIF files (one for each page) will be created in the destination directory. See below for the “Output Image Control” options.
  • Treat MS Word 2007+ (.DOCX) Format Files as Recognized
    • This option allows MS Word 2007 and higher (.DOCX) files to be treated as a “recognized” format.
  • Erase These Source Files Once Imported or Skipped
    • Once the file is imported the source file will be deleted from the import directory.
    • NOTE: If this option is selected in conjunction with the Skip option above, the source file will be deleted from the import location and NO entry made in the vMedia database. To prevent this, you can make the source files or the import directory “read only” in the Windows operating system.

The center section allows you to define how the Spectrum Importer handles unrecognized format files (.WPD, .DOC, .XLS, .WAV, etc.) that are being imported:

  • Skip
    • Unrecognized source files (.XLS, .WPD, .WAV, etc.) will be skipped and not imported.
  • Copy Source File
    • An entry for the file will be created in the vMedia database and a copy of the file created in the Destination Path.
  • Link to Source File
    • The source files will be imported as a link to the original file. NOTE: No files will be written to the Destination Path.
  • Erase These Source Files Once Imported or Skipped
    • Once the file is imported the source file will be deleted from the import directory.
    • NOTE: If this option is selected in conjunction with the Skip option above, the source file will be deleted from the import location and NO entry made in the vMedia database. To prevent this, you can make the source files or the import directory “read only” in the Windows operating system.
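The way the handling options combine with the erase option, including the Skip plus Erase case called out in the NOTE, can be summarized in a small Python sketch. The mode names and action labels are hypothetical stand-ins chosen for illustration; the sketch only plans actions and does not touch any files.

```python
def plan_actions(mode, erase_source=False):
    """Return the ordered actions for one source file.

    mode: "skip", "copy", "link", or "transform" (recognized files only).
    erase_source: models "Erase These Source Files Once Imported or Skipped".
    """
    actions_by_mode = {
        "skip": [],
        "copy": ["create_db_entry", "copy_to_destination"],
        "link": ["create_db_link_entry"],
        "transform": ["create_db_entry", "write_serialized_tif_pages"],
    }
    if mode not in actions_by_mode:
        raise ValueError(f"unknown mode: {mode}")

    steps = list(actions_by_mode[mode])
    if erase_source:
        # Applies even when the file was skipped, as the NOTE warns.
        steps.append("delete_source")
    return steps


print(plan_actions("copy"))                     # entry plus copy
print(plan_actions("skip", erase_source=True))  # only the deletion
```

The second call shows why the combination deserves care: the only planned action is deleting the source file, with no database entry created.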

The bottom section allows you to specify dynamic processing conditions that will create sub-folders to control page counts in the output location. This is used for high-volume imports; most firms will not need to modify this setting.

Output Image Control:

  • You can set the desired Image Resolution for the resulting image. Higher resolutions result in better quality images but greater storage requirements. Lowering the resolution is generally used to speed up the import process while maintaining sufficient image quality.
    • Check the [Skip Output Image Sizing] option if you want to maintain the original resolution. This only applies to raster/image-based PDF files. Text-based files will be output at the highest resolution (dpi) indicated under Black and White or Full Color.
  • Choose one of the options under [Output Format] or accept the default setting. This setting allows you to control whether the imported documents will be color, black & white or both.
    • Automatic (Examine each page.) – Each page will be examined and imported as color if any color is detected. Otherwise, it will be imported as black & white.
    • Automatic (Examine first page only.) – The first page of each document (i.e. the page containing the VSTART command) will be examined to see if it is color or black & white. That document will then be imported using that setting.
    • Black and White – Each page of each document will be imported as black & white.
    • Color – Each page of each document will be imported as color.
    • Note that color images take longer to import.
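The difference between the two Automatic settings is easiest to see with a multi-page document whose first page is black & white but which contains a color page later. The Python sketch below models this under the assumption that "Examine each page" decides page by page; the function and option names are hypothetical and no real image analysis is performed.

```python
def output_modes(page_has_color, output_format):
    """Return "color" or "bw" for each page of one document.

    page_has_color: list of booleans, one per page (color detected or not).
    output_format: "auto_each_page", "auto_first_page", "bw", or "color".
    """
    page_count = len(page_has_color)
    if output_format == "auto_each_page":
        return ["color" if has_color else "bw" for has_color in page_has_color]
    if output_format == "auto_first_page":
        first_is_color = page_count > 0 and page_has_color[0]
        return ["color" if first_is_color else "bw"] * page_count
    if output_format in ("bw", "color"):
        return [output_format] * page_count
    raise ValueError(f"unknown output format: {output_format}")


pages = [False, True, False]  # only the second page contains color
print(output_modes(pages, "auto_each_page"))   # bw, color, bw
print(output_modes(pages, "auto_first_page"))  # bw, bw, bw
```

In this model, "Examine first page only" would import the color page as black & white, which is the trade-off for examining fewer pages.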

     

  1. Click [Close] to return to the previous window.
  2. [Save Profile] is optional. Click this button if you want to save the current settings for future use.
  3. Click [Import Documents].

 

Zone OCR

Overview

Zone OCR allows Spectrum Importer to identify a specific OCR element to add to the fields being populated during import.

 

There are three independent zones, configured on the OCR Zone #1, OCR Zone #2, and OCR Zone #3 tabs at the bottom, each feeding its own processing variable.

Each OCR Zone allows input for top, left, height and width settings as fractional inches and should be set to capture the area of the desired data.

  • Top: Distance from the top of the page to the top-left corner of the enclosed section.
  • Left: Distance from the left side of the page to the left edge of the section.
  • Height: Height of the section, from its top edge to its bottom edge.
  • Width: Width of the selected section.

(reference image below)
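To make the four measurements concrete, the following Python sketch converts a zone given in fractional inches into a pixel rectangle at a given scan resolution. The OcrZone class and its method are hypothetical helpers for working out values on paper; vMedia itself takes the inch values directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrZone:
    """A rectangular OCR zone measured in inches from the page's top-left."""
    top: float
    left: float
    height: float
    width: float

    def to_pixel_box(self, dpi):
        """Return (x0, y0, x1, y1) in pixels for a page scanned at `dpi`."""
        x0 = round(self.left * dpi)
        y0 = round(self.top * dpi)
        x1 = round((self.left + self.width) * dpi)
        y1 = round((self.top + self.height) * dpi)
        return (x0, y0, x1, y1)


# A zone 1 inch down and 0.5 inch in, 0.25 inch tall and 2 inches wide.
zone = OcrZone(top=1.0, left=0.5, height=0.25, width=2.0)
print(zone.to_pixel_box(300))  # (150, 300, 750, 375)
```

Working backwards is equally simple: measure the target area in pixels in an image viewer and divide by the scan resolution to get the inch values to enter.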

To edit the OCR Text Format Expression, click the settings icon to open the vMedia Expression Editor.

The vMedia Expression Editor allows users to define how the OCR result is formatted.



pOCR1 is the section labeled OCR Zone #1.

pOCR2 is the section labeled OCR Zone #2.

pOCR3 is the section labeled OCR Zone #3.



The Filter section allows users to apply different space filters, depending on how the data should appear when saved in vMedia index fields.

 

 

Related Topic

Command Line Support for Spectrum Importer