Spectrum Importer
07/02/2018 - Documented Spectrum Importer version 2.6 improvements.
09/08/2016 - Documented an enhancement that implements a Natural Sort
by default in the Spectrum 2.0.2 importer and offers a UI-based and
profile-based option to revert to a strict ASCII sort if desired.
08/31/2016 - Rearranged the order of some instructional steps for
using the Spectrum importer.
05/17/2016 - Added new page for Spectrum Importer.
NOTE: This importer is only supported by vMedia version 2.0 or higher.
Overview
vMedia’s Spectrum Importer allows you to import and organize over
four dozen image file formats as well as non-image files such as doc,
docx, wpd, xls, xlsx, and txt. The imported documents immediately
appear within the Viewing Module of vMedia. You can also add comments
and set other values during import, so that you can organize and
quickly find your documents in vMedia.
The vMedia database entries can be linked to the original source
files or the original files can be copied to the filing destination
directory. In addition, the source files can be deleted after
import.
Specifications
- vMedia SQL: This feature is only available in vMedia SQL. Please
contact Customer Support if you would like to use this feature but
are not currently using the SQL version of vMedia.
- Required file format: None.
- Naming: The name of the source file, along with other file
attributes, can be used to set vMedia database values. This is
discussed in more detail below.
- Structure: Files must be placed in at least one subfolder of the
base import path.
  - If the base path is C:\IMPORT\, documents are placed in
  sub-folders under that path.
  - The subfolder name can be used for indexing, for example:
    - Forwarder file number
    - Collection-Master file number
    - Document type, etc.
  - If sub-folders are not used for indexing, they can be named
  anything.
- Logging: A detailed log file named “IMPORT_Spectrum_Log.txt” is
located in the vMedia installation directory.
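The folder layout described under Structure can be illustrated with a short Python sketch. It is not part of vMedia; the function name `iter_import_files` is hypothetical, and the sketch assumes that the indexing value is the name of the folder that directly contains each file (as with pPath). Files sitting loose in the base path are ignored, mirroring the rule that documents must be in a subfolder.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_import_files(base: Path) -> Iterator[Tuple[str, Path]]:
    """Yield (folder_name, file_path) for every file below the base path.

    Hypothetical helper: folder_name is the name of the folder that
    directly contains the file, which could serve as an index value
    such as a file number or document type.
    """
    for path in sorted(base.rglob("*")):
        if not path.is_file():
            continue
        if path.parent == base:
            # Files directly in the base import path are not in a subfolder.
            continue
        yield path.parent.name, path
```

For a base path of C:\IMPORT\ containing a subfolder named after a file number, each file in that subfolder would be paired with the file number as its index value.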
- From the Master Control screen, click on the [Import/Export Module]
button.
- Highlight your desired vMedia database from the list and click
[Select].
- From the Bulk Document Import list, click [Spectrum].
NOTE: Once a profile is selected, the selected profile will load.
Focus remains with the profile selector after the load.
- Enter a profile in the Select Profile field at the top of the
interface.
- Selection Options
  - Click on the drop-down button to display a list of profiles and
  select your desired profile.
  - Type leading characters to select the desired item.
  - Use [Alt-Down Arrow] to display the drop-down list of profiles.
  - Cycle through the profile options with the up or down arrow keys.
  - Use any combination of the previous selection options.
- Edit the [Location] field or accept the default settings.
- Click [Source File Index Control] to configure how files are
concatenated and any vMedia database values that you want to set.
- You can set up to three vMedia values using the Index Key fields.
To replace a vMedia database field with a data value derived from the
source document, use this format: fieldname = field value, where
fieldname is the name of the vMedia database field as specified in
the database configuration, and field value is the expression that
will be evaluated when the document is saved.
Example: “fileno=pPath” will set the vMedia File Number to the name
of the folder the input file is in. This is useful if your folders
are named according to a Case ID or other identifier.
- Check the box under [Source File Concatenation] if you want your
input files to be concatenated. You can click the [Gear] icon to
input the expression that defines how concatenation should work.
Example: “LEFT(pFileName,5)” will cause any files with names starting
with the same five characters to be treated as a single, concatenated
file. This is useful if your input files are named according to a
DocID or other identifier. Only recognized serialized formats (.TIF,
.PDF, etc.) can be concatenated. “Native” formats such as .XLS, .WPD,
.WAV, etc. cannot be concatenated.
NOTE: The "Natural Sort" order that is used by recent versions of
Windows is not the same as a typical ASCII sort when dealing with
mixed alphanumeric strings. Because of this, Spectrum 2.0.2 was
enhanced to use a Natural Sort by default. The enhancement also
offers a UI-based and profile-based option to revert to a strict
ASCII sort if desired. This File Name Processing Order feature
primarily affects file concatenation; however, it also affects the
order in which documents are imported into vMedia from each source
folder.
- Under File Name Processing Order, select your desired
sort order.
- Natural / Windows Explorer Order (This is the
default sort order. Natural sort order is an ordering of strings in
alphabetical order, except that multi-digit numbers are ordered as
a single character. For example, "z2" is sorted before "z11"
because "2" is sorted as smaller than "11".)
- ASCII Left to Right Order (All uppercase letters
precede lowercase letters; digits and many punctuation marks come
before letters; numbers are sorted naïvely as strings; for example,
"10" precedes "2".)
- Click [Close] to return to the previous window.
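To make the difference between the two sort orders concrete, and to show how it interacts with a concatenation expression such as LEFT(pFileName,5), here is a minimal Python sketch. It is an approximation for illustration only: `natural_key` and `group_for_concatenation` are hypothetical names, the natural ordering is assumed to be case-insensitive, and the exact ordering that Windows or the importer applies may differ in edge cases.

```python
import re
from typing import Dict, List, Tuple


def natural_key(name: str) -> List[Tuple[int, int, str]]:
    """Sort key that treats each run of digits as one number.

    Assumption: comparison is case-insensitive and digit runs sort
    before text runs; this only approximates a Windows-style order.
    """
    key = []
    for part in re.findall(r"\d+|\D+", name):
        if part.isdigit():
            key.append((0, int(part), ""))
        else:
            key.append((1, 0, part.lower()))
    return key


def group_for_concatenation(
    names: List[str], prefix_len: int = 5, natural: bool = True
) -> Dict[str, List[str]]:
    """Group file names by their first prefix_len characters.

    Mirrors the idea of LEFT(pFileName,5): files sharing a prefix form
    one group, and the chosen sort order decides the page sequence.
    """
    ordered = sorted(names, key=natural_key if natural else None)
    groups: Dict[str, List[str]] = {}
    for name in ordered:
        groups.setdefault(name[:prefix_len], []).append(name)
    return groups
```

With the names DOC01_p10.tif and DOC01_p2.tif, the natural order places p2 before p10 within the DOC01 group, while the strict ASCII order places p10 first because "1" sorts before "2".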
STREXTRACT() Function for String Extractions
NOTE: pFileName will include the .PDF extension from the file name.
To ignore the .PDF extension you can use the following command:
strextract(pfilename,"",".")
vMedia has a useful function to perform string extractions when the
sub-strings are bound by specific delimiters. That function is called
STREXTRACT() and it takes the following parameters (separated by
commas):
- Name of the string to extract from
- Starting delimiter character(s)
- Ending delimiter character(s), or empty to end at the end of the
string
- Occurrence of the starting delimiter to begin string extraction
There are some optional flags to modify function behavior:
- 1 = Case blind delimiter search (for when the delimiter is an alpha
string)
- 2 = Ending delimiter may or may not be present
- 4 = Include delimiters in the returned sub-string
Example:
File name: "ABC1125_1234555567890000_My Document.PDF"
STREXTRACT(pFileName,"","_",1) will return ABC1125
STREXTRACT(pFileName,"_","_",1) will return 1234555567890000
STREXTRACT(pFileName,"_","",2) will return My Document.PDF
NOTE: This function can be used anywhere in vMedia that an expression
is allowed.
"pFileName" is a variable for the Spectrum and Media Importers that
corresponds to the name of the file being imported, including the
extension. A similar variable for the Hanna Importer is "l_Filename",
which is the file name without the extension.
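For readers who want to experiment with delimiter-based extraction outside vMedia, the following Python sketch imitates the STREXTRACT() behavior described above. It is not the vMedia implementation: the optional flags are omitted, and two behaviors are assumptions made for this sketch (an empty starting delimiter means "start at the beginning of the string", and a missing ending delimiter yields an empty result).

```python
def strextract(text: str, begin: str, end: str = "", occurrence: int = 1) -> str:
    """Return the text between the Nth `begin` delimiter and the next `end`.

    Illustrative stand-in only; flags 1, 2 and 4 are not modeled.
    """
    start = 0
    if begin:
        search_from = 0
        for _ in range(occurrence):
            pos = text.find(begin, search_from)
            if pos == -1:
                return ""  # requested occurrence does not exist
            search_from = pos + len(begin)
        start = search_from
    if not end:
        return text[start:]  # empty ending delimiter: run to end of string
    stop = text.find(end, start)
    # Assumption: with no flag set, a missing ending delimiter gives "".
    return "" if stop == -1 else text[start:stop]


if __name__ == "__main__":
    name = "ABC1125_1234555567890000_My Document.PDF"
    print(strextract(name, "", "_", 1))
    print(strextract(name, "_", "_", 1))
    print(strextract(name, "_", "", 2))
```

Run against the example file name above, the three calls reproduce the three documented results, and `strextract(name, "", ".")` drops the extension.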
The top section allows you to define how the Spectrum Importer
handles recognized image and PDF format files that are being
imported:
- Skip
  - Recognized source files (.PDF, .TIF, .BMP, etc.) will be skipped
  and not imported.
- Copy Source File
  - An entry for the file will be created in the vMedia database and
  a copy of the file created in the Destination Path.
- Link to Source File
  - An entry for the file will be created in the vMedia database as a
  link to the original file. NOTE: No files will be written to the
  Destination Path.
- Transform to Serialized TIF
  - An entry for the file will be created in the vMedia database and
  serialized .TIF files (one for each page) will be created in the
  destination directory. See below for the “Output Image Control”
  options.
- Treat MS Word 2007+ (.DOCX) Format Files as Recognized
  - This option allows MS Word 2007 and higher (.DOCX) files to be
  treated as “recognized” formats.
- Erase These Source Files Once Imported or Skipped
  - Once the file is imported, the source file will be deleted from
  the import directory.
  - NOTE: If this option is selected in conjunction with the Skip
  option above, the source file will be deleted from the import
  location and NO entry made in the vMedia database. To prevent this,
  you can make the source files or the import directory “read only”
  in the Windows operating system.
The center section allows you to define how the Spectrum Importer
handles unrecognized format files (.WPD, .DOC, .XLS, .WAV, etc.) that
are being imported:
- Skip
  - Unrecognized source files (.XLS, .WPD, .WAV, etc.) will be
  skipped and not imported.
- Copy Source File
  - An entry for the file will be created in the vMedia database and
  a copy of the file created in the Destination Path.
- Link to Source File
  - The source files will be imported as a link to the original file.
  NOTE: No files will be written to the Destination Path.
- Erase These Source Files Once Imported or Skipped
  - Once the file is imported, the source file will be deleted from
  the import directory.
  - NOTE: If this option is selected in conjunction with the Skip
  option above, the source file will be deleted from the import
  location and NO entry made in the vMedia database. To prevent this,
  you can make the source files or the import directory “read only”
  in the Windows operating system.
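The interaction between the handling options and the Erase option is easy to misread, so the following Python sketch restates it as a small decision model. This is only a reading of the option descriptions above, not the importer's code: the names `Action`, `Outcome`, `plan`, and `loses_file` are hypothetical, and how Erase behaves together with Link is not described in the text, so the sketch simply mirrors the checkbox.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SKIP = "skip"
    COPY = "copy"
    LINK = "link"
    TRANSFORM = "transform"  # offered for recognized formats only


@dataclass(frozen=True)
class Outcome:
    db_entry: bool               # an entry is created in the vMedia database
    writes_to_destination: bool  # files are written to the Destination Path
    source_deleted: bool         # the source file is erased afterwards


def plan(action: Action, erase_source: bool) -> Outcome:
    """Summarize what the documented options would do for one file."""
    return Outcome(
        db_entry=action is not Action.SKIP,
        writes_to_destination=action in (Action.COPY, Action.TRANSFORM),
        # Assumption: the erase checkbox applies regardless of the action.
        source_deleted=erase_source,
    )


def loses_file(outcome: Outcome) -> bool:
    """True when the source is deleted but nothing was recorded."""
    return outcome.source_deleted and not outcome.db_entry


if __name__ == "__main__":
    for action in Action:
        print(action.name, plan(action, erase_source=True))
```

In this model, Skip combined with Erase is the only combination for which `loses_file` returns True, which is the case the NOTE warns about.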
The bottom section allows you to specify dynamic processing
conditions that create sub-folders to control page counts in the
output location. This is used for high-volume imports; most firms
will not need to modify this setting.
Output Image Control:
- You can set the desired Image Resolution for the resulting image.
Higher resolutions result in better quality images but greater
storage requirements. This setting is generally used to speed up the
import process while maintaining sufficient image quality.
- Check the [Skip Output Image Sizing] option if you want to
maintain the original resolution. This only applies to
raster/image-based PDF files. Text-based files will be output at
the largest resolution (dpi) indicated under Black and White or
Full Color.
- Choose one of the options under [Output Format] or accept the
default setting. This setting allows you to control whether the
imported documents will be color, black & white or both.
- Automatic (Examine each page.) – Each page will be examined and
imported as color if any color is detected. Otherwise, it will be
imported as black & white.
- Automatic (Examine first page only.) – The first page of each
document (i.e. the page containing the VSTART command) will be
examined to see if it is color or black & white. That document
will then be imported using that setting.
- Black and White – Each page of each document will be imported
as black & white.
- Color – Each page of each document will be imported as
color.
- Note that color images take longer to import.
- Click [Close] to return to the previous window.
- [Save Profile] is optional. Click this button if you
want to save the current settings for future use.
- Click [Import Documents].
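The difference between the two Automatic output formats described above can be sketched in Python. This is an illustrative model under stated assumptions, not the importer's detection logic: pages are represented as plain lists of RGB tuples, a pixel counts as "color" when its three channels differ, and the setting names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Page = Sequence[Pixel]


def page_has_color(page: Page) -> bool:
    """Assumption: a pixel is 'color' when its R, G and B values differ."""
    return any(not (r == g == b) for r, g, b in page)


def output_modes(pages: List[Page], setting: str) -> List[str]:
    """Return 'color' or 'bw' for each page under the given setting."""
    if setting == "color":
        return ["color"] * len(pages)
    if setting == "bw":
        return ["bw"] * len(pages)
    if setting == "auto_first_page":
        # The first page decides the mode for the whole document.
        first_is_color = bool(pages) and page_has_color(pages[0])
        return ["color" if first_is_color else "bw"] * len(pages)
    if setting == "auto_each_page":
        # Every page is examined and decided on its own.
        return ["color" if page_has_color(p) else "bw" for p in pages]
    raise ValueError(f"unknown setting: {setting}")
```

For a two-page document whose first page is gray and whose second page contains a red pixel, examining each page yields black & white followed by color, while examining only the first page yields black & white for both.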
Zone OCR Overview
Zone OCR allows the Spectrum Importer to read a specific OCR element
from the page and add it to the fields being populated during import.
There are three independent zones (OCR Zone #1, OCR Zone #2, and OCR
Zone #3, on the bottom tabs), each feeding its own processing
variable.
Each OCR Zone accepts top, left, height and width settings in
fractional inches. Set them to capture the area of the desired data.
- Top: Distance from the top of the page to the top left corner of
the enclosed section.
- Left: Distance from the left side of the page.
- Height: Height of the section, from its top edge to its bottom
edge.
- Width: Width of the section.
(reference image below)
To edit the OCR Text Format Expression, select the settings icon to
open the vMedia Expression Editor, which allows users to define how
the OCR results are formatted.
pOCR1 is the section labeled OCR Zone #1.
pOCR2 is the section labeled OCR Zone #2.
pOCR3 is the section labeled OCR Zone #3.
The Filter section allows users to apply different space filters to
control how the data appears when saved in vMedia index fields.
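Because the zone settings above are entered in fractional inches, it can help to see how they map onto a scanned page. The Python sketch below converts a zone to a pixel rectangle at a given resolution. It is a hypothetical illustration: `OcrZone` and `zone_to_pixels` are not vMedia names, and the sketch assumes the zone is measured from the top left corner of the page.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class OcrZone:
    top: float     # inches from the top of the page
    left: float    # inches from the left side of the page
    height: float  # height of the zone in inches
    width: float   # width of the zone in inches


def zone_to_pixels(zone: OcrZone, dpi: int) -> Tuple[int, int, int, int]:
    """Return (x0, y0, x1, y1) pixel coordinates for the zone at `dpi`."""
    x0 = round(zone.left * dpi)
    y0 = round(zone.top * dpi)
    x1 = round((zone.left + zone.width) * dpi)
    y1 = round((zone.top + zone.height) * dpi)
    return (x0, y0, x1, y1)


if __name__ == "__main__":
    zone = OcrZone(top=0.5, left=1.25, height=0.75, width=3.0)
    print(zone_to_pixels(zone, dpi=300))
```

A zone that is too tight can clip characters at its edges, so a small margin around the expected text is usually a reasonable starting point when choosing the four values.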
Related Topic
Command Line Support for Spectrum Importer