CrawlScape: Browser-based crawl, index and search deployment framework

CrawlScape (CS) is a browser-based user interface for large-scale network crawling and information indexing, well suited to Information Resource Management. Its tabs and sub-menus contain the functions and features of CS, which are detailed in the sections and screen shots below.
CS contains additional features (task scheduling; moving, copying and merging multiple indexes) for scheduling, managing and configuring the complex processes of a distributed search infrastructure. These tools help IT professionals control intricate network and document crawls, their related indexes and the resulting search deployments. Using the automated task-scheduling tools, knowledge, intelligence and information repositories can be configured into a variety of search solutions for different network user groups.
Some uses are:
- Corporate knowledge and intelligence repositories
- Corporate internal (intranet) networks and net-servers
- Internet sites for external crawl and indexing projects
- Local and wide area network shares, FTP and HTTP servers
- Government public, private and corporate security repositories
- Professional applications for specific data analysis, crawl, index and search mapping
- Automated paper and voice conversions with search and retrieval
- ASP data centers managing client crawls and indexing, and ISP custom web-site search
- Archive and escrow retrieval solutions for paper and voice records
- Records and archive long-term storage

Documents and information repository
The system can crawl and index a vast array of document types, databases, email systems, and tuplets that form relationships, known here as polytuplet associations. For example, a JPEG and a text (case) document form a searchable crime-evidence pair, while a table and its foreign-key link form a joint record. Polytuplets associate multiple related records, identifying the relationships of information in a database table schema, for example.
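The polytuplet idea can be sketched as a simple record association. This is an illustrative sketch only; the names `Polytuplet`, `associate` and `matches` are assumptions for the example, not part of the product's API.

```python
from dataclasses import dataclass, field

@dataclass
class Polytuplet:
    """A group of records associated as one searchable unit."""
    label: str
    records: list = field(default_factory=list)

    def associate(self, record_id: str, kind: str) -> None:
        # Each record is tagged with its kind (image, document, table, ...)
        self.records.append((kind, record_id))

    def matches(self, kind: str) -> list:
        # Return all associated record ids of the given kind
        return [rid for k, rid in self.records if k == kind]

# A crime-evidence pair: an image exhibit and its text case document
evidence = Polytuplet("case-1042")
evidence.associate("photo_7.jpg", "image")
evidence.associate("case_1042.txt", "document")
```

A search hit on either member can then surface the whole associated unit rather than an isolated file.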
CrawlScape simplifies the integration of crawl and index processes in complicated, hard-to-manage search environments. Screen shots of the functions mentioned are given above, while the following table summarizes the options and configurations for the framework. The same table appears in the configuration and pricing section for quotations and purchases.
| CrawlScape | Server Install | Multiple Sessions Manager | Distributed Host Servers | Multi-Search Deployment | Scheduler | User Mgmt | URL Crawl | Database Crawl | Email Crawl | Index Merge | Tuplets |
| CS-1 | Single | | | | | | x | | | | |
| CS-2 | Multiple | | | | | x | x | | | | |
| CS-3 | Multiple | | | | x | x | x | | | x | |
| CS-4 | Multiple | x | x | x | x | x | x | | | x | |
| CS-Enterprise | Multiple | x | x | x | x | x | x | x | x | x | x |
CS tabs and sub-menus explained:

Account Administration
Account administration provides controls for user access to the crawl infrastructure. The administrator can create multiple user groups and users within each group. This separates visibility between groups, which may represent distinct departments within a company or government, or distinct entities within a hosted ASP environment.
Server setup
Server setup introduces the notion of multiple servers, in clusters or LAN/WAN distributed systems, each independently providing crawl, indexing, site hosting and search deployments. The crawl manager identifies groups of servers, and individual servers, to be assigned functional responsibilities within the scheduling and crawl-processing framework. One server might crawl intranet networks while another handles Internet sites and database tables; yet another cluster might serve in a load-sharing distributed search deployment for vast document search capability and high user concurrency. CrawlScape provides a user-friendly interface for configuring the server topology to be used across the search enterprise.
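The role assignments described above can be sketched as a small topology map. The role names and the `assign_roles` helper are illustrative assumptions, not CrawlScape's actual configuration interface.

```python
# Hypothetical sketch: assign functional roles to servers in a topology.
ROLES = {"crawl", "index", "host", "search"}

def assign_roles(topology: dict, server: str, roles: set) -> dict:
    """Record which functions a server is responsible for."""
    unknown = roles - ROLES
    if unknown:
        raise ValueError(f"unknown roles: {unknown}")
    topology.setdefault(server, set()).update(roles)
    return topology

topology = {}
assign_roles(topology, "intranet-01", {"crawl"})          # crawls the intranet
assign_roles(topology, "db-02", {"crawl", "index"})       # databases and indexing
assign_roles(topology, "cluster-a", {"host", "search"})   # load-shared search
```

The scheduler can then route each crawl or deployment task only to servers holding the matching role.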
CS Crawl Definitions setup
Crawl definitions setup provides behavior rules for individually assigned crawls. These rules are bundled into named crawl groups that shape the crawler's behavior for a given target network, system or data type. A sales-department crawl and an engineering-department crawl might have distinct behavior controls that affect, for instance, the depth of information extraction. An important use is designating the crawler's behavior with database systems, polytuplets, documents, images and PDFs.
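A crawl group can be pictured as a bundle of defaults plus per-group overrides. The field names (`max_depth`, `follow_external`, `types`) are assumptions for illustration, not CrawlScape's real schema.

```python
# Illustrative crawl-group behavior rules; field names are assumed.
DEFAULT_RULES = {
    "max_depth": 3,           # how deep to follow links/records
    "follow_external": False, # stay inside the target network
    "types": {"html", "pdf", "doc"},
}

def make_crawl_group(name: str, **overrides) -> dict:
    """Bundle default behavior rules with per-group overrides."""
    rules = dict(DEFAULT_RULES)
    rules.update(overrides)
    return {"group": name, "rules": rules}

# Two departments with distinct extraction depth and document types
sales = make_crawl_group("sales", max_depth=2)
engineering = make_crawl_group("engineering", max_depth=6,
                               types={"html", "pdf", "doc", "dwg"})
```

Assigning a group to a crawl then applies the whole rule bundle at once, rather than configuring each crawl by hand.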
URL administration
URL administration is a flexible feature for managing multiple URL target crawl lists. The user may combine lists or dedicate each to its own crawl. This is especially useful where multiple targeted indexes are required for deploying many search instances; these indexes can be merged into one total searchable index and/or kept as unique, independent search instances. A highly valuable purpose of crawl-list management is scheduling crawls based on these lists and distributing them across multiple servers, increasing efficiency and balancing the crawl load. A URL is any share target (99.999.9999.9/Machine division/sales/competitors/), FTP://, File:// or Http:// address.
Sessions and controls
Crawl sessions and control is the launch pad for starting and stopping crawl and indexing processes. Here the user determines which URL lists to assign to which crawl servers. The crawler begins a session as soon as the user starts a crawl, either under manual control or through scheduling that automates a recurring project.
IRM (Information Resource Management)
Move and merge processes are powerful functions for combining individual indexes into enterprise indexes and moving them to the desired search servers across the network. For example, one might combine the financial department's index with the business development (forecasting) index. Each may then operate as an independent search deployment, or the two may be combined into a cross-department search site serving both. Enterprise-wide search follows from this notion of merged department indexes.
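An index merge of the kind described can be sketched over a toy inverted index (term to document set). This is illustrative only; CrawlScape's real index format and merge mechanics are not specified here.

```python
# Sketch: merge department indexes, where each index maps a
# search term to the set of documents containing it.
def merge_indexes(*indexes):
    merged = {}
    for index in indexes:
        for term, docs in index.items():
            # Union the posting sets for terms shared across indexes
            merged.setdefault(term, set()).update(docs)
    return merged

finance = {"forecast": {"fin/q3.xls"}, "budget": {"fin/2024.doc"}}
bizdev = {"forecast": {"bd/pipeline.doc"}}
combined = merge_indexes(finance, bizdev)
```

The source indexes remain usable on their own, so each department keeps its independent search while the merged copy serves the cross-department site.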

Scheduling Sessions
Scheduling and control, one of the more elegant features of CrawlScape, manages and automates crawl and index processes so that IT administrators can act as managers overseeing pre-scheduled, recurring events. This reduces the burden of managing complex crawl and search deployments. Suppose, for example, that daily recurring crawls are required across multiple geographically separated servers; automated scheduling takes care of the recurring process management. For large, dispersed search infrastructures, scheduling provides excellent oversight and management automation.
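A daily recurring schedule of this kind reduces to computing the upcoming run times from a start time and an interval. The `next_runs` helper below is an illustrative sketch, not the product's scheduler.

```python
from datetime import datetime, timedelta

# Sketch: compute the next N run times for a recurring crawl.
def next_runs(start: datetime, every: timedelta, count: int):
    """Return the first `count` run times of a recurring schedule."""
    return [start + every * i for i in range(count)]

# A crawl that recurs daily at 02:00, starting 2024-01-01
runs = next_runs(datetime(2024, 1, 1, 2, 0), timedelta(days=1), 3)
```

Each computed run time would then be dispatched to the servers assigned to that crawl's URL lists.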
Search deployment and control
Search deployment and control facilitates the deployment of search instances across multiple servers. Following the crawl and index processes, the user may deploy search instances and indexes with ease, replacing the need for complicated programming of search front ends and complex interfaces to search indexes. PS can be deployed in a variety of flavors across these servers; the options for PS are shown in the PS product table and in the configuration and pricing section.
Reports, logs and views
Reports are information- and process-monitoring tables. The user can view summaries of all setup parameters within the CS enterprise, or simply monitor the progress of an active crawl session. Intended for IT professionals, the reports and logs support monitoring and evaluation of system and application activity, performance, progress and results.
