CrawlScape: Browser-based crawl, index and search deployment framework

CrawlScape (CS) is a browser-based user interface for large-scale network crawling and information indexing, well suited to Information Resource Management. Its tabs and sub-menus contain the functions and features of CS, which are detailed in the sections and screen shots below.
CS contains additional features (task scheduling; moving, copying and merging multiple indexes) for scheduling, managing and configuring the complex processes of a distributed search infrastructure. These tools help IT professionals control intricate network and document crawls, their related indexes and the resulting search deployments. Using the automated task-scheduling tools, knowledge, intelligence and information repositories can be configured into a variety of search solutions for different network user groups.
Some uses are:
- Corporate knowledge and intelligence repositories
- Corporate internal (intranet) networks and net-servers
- Internet sites for external crawl and indexing projects
- Local and wide area network shares, FTP and HTTP servers
- Government public, private and corporate security repositories
- Professional applications for specific data analysis, crawl, index and search mapping
- Automated paper and voice conversions with search and retrieval
- ASP data centers managing client crawls and indexing, and ISP custom web-site search
- Archive and escrow retrieval solutions for paper and voice records
- Records and archive long-term storage

Documents and information repository
The system can crawl and index a vast array of document types, databases, email systems, and tuplets that form relationships, known here as polytuplet associations. For example, a JPEG and a text (case) document form a searchable crime-evidence pair, while a table and its foreign-key link form a joint record. Polytuplets associate multiple related records, identifying the relationships of information in a database table schema, for example.
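The polytuplet idea can be sketched as a simple record association. This is an illustrative sketch only; the names `Polytuplet`, `associate` and `matches` are assumptions for the example, not part of the product's API.

```python
from dataclasses import dataclass, field

@dataclass
class Polytuplet:
    """A group of records associated as one searchable unit."""
    label: str
    records: list = field(default_factory=list)

    def associate(self, record_id: str, kind: str) -> None:
        # Each record is tagged with its kind (image, document, table, ...)
        self.records.append((kind, record_id))

    def matches(self, kind: str) -> list:
        # Return all associated record ids of the given kind
        return [rid for k, rid in self.records if k == kind]

# A crime-evidence pair: an image exhibit and its text case document
evidence = Polytuplet("case-1042")
evidence.associate("photo_7.jpg", "image")
evidence.associate("case_1042.txt", "document")
```

A search hit on either member can then surface the whole associated unit rather than an isolated file.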
CrawlScape simplifies the integration of crawl and index processes in complicated, hard-to-manage search environments. Screen shots of the functions mentioned are given above, while the following table summarizes the options and configurations for the framework. The same table appears in the configuration and pricing section for quotations and purchases.
| CrawlScape | Server Install | Multiple Sessions Manager | Distributed Host Servers | Multi-Search Deployment | Scheduler | User Mgmt | URL Crawl | Database Crawl | Email Crawl | Index Merge | Tuplets |
| CS-1 | Single | | | | | | x | | | | |
| CS-2 | Multiple | | | | | x | x | | | | |
| CS-3 | Multiple | | | | x | x | x | | | x | |
| CS-4 | Multiple | x | x | x | x | x | x | | | x | |
| CS-Enterprise | Multiple | x | x | x | x | x | x | x | x | x | x |
CS tabs and sub-menus explained:

Account Administration
Account administration provides controls for user access to the crawl infrastructure. The administrator can create multiple user groups and users within each group. This separates visibility between groups, which may represent distinct departments within a company or government, or distinct entities within a hosted ASP environment.
Server setup
Server setup introduces the notion of multiple servers, in clusters or LAN/WAN distributed systems, each independently providing crawl, indexing, site hosting and search deployments. The crawl manager identifies groups of servers, and individual servers, to be assigned functional responsibilities within the scheduling and crawl-processing framework. One server might crawl intranet networks while another handles Internet sites and database tables; yet another cluster might serve in a load-sharing distributed search deployment for vast document search capability and high user concurrency. CrawlScape provides a user-friendly interface for configuring the server topology to be used across the search enterprise.
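The role assignments described above can be sketched as a small topology map. The role names and the `assign_roles` helper are illustrative assumptions, not CrawlScape's actual configuration interface.

```python
# Hypothetical sketch: assign functional roles to servers in a topology.
ROLES = {"crawl", "index", "host", "search"}

def assign_roles(topology: dict, server: str, roles: set) -> dict:
    """Record which functions a server is responsible for."""
    unknown = roles - ROLES
    if unknown:
        raise ValueError(f"unknown roles: {unknown}")
    topology.setdefault(server, set()).update(roles)
    return topology

topology = {}
assign_roles(topology, "intranet-01", {"crawl"})          # crawls the intranet
assign_roles(topology, "db-02", {"crawl", "index"})       # databases and indexing
assign_roles(topology, "cluster-a", {"host", "search"})   # load-shared search
```

The scheduler can then route each crawl or deployment task only to servers holding the matching role.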
CS Crawl Definitions setup
Crawl definitions setup provides behavior rules for individually assigned crawls. These rules are bundled into named crawl groups that shape the crawler's behavior for a given target network, system or data type. A sales-department crawl and an engineering-department crawl might have distinct behavior controls that affect, for instance, the depth of information extraction. An important use is designating the crawler's behavior with database systems, polytuplets, documents, images and PDFs.
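A crawl group can be pictured as a bundle of defaults plus per-group overrides. The field names (`max_depth`, `follow_external`, `types`) are assumptions for illustration, not CrawlScape's real schema.

```python
# Illustrative crawl-group behavior rules; field names are assumed.
DEFAULT_RULES = {
    "max_depth": 3,           # how deep to follow links/records
    "follow_external": False, # stay inside the target network
    "types": {"html", "pdf", "doc"},
}

def make_crawl_group(name: str, **overrides) -> dict:
    """Bundle default behavior rules with per-group overrides."""
    rules = dict(DEFAULT_RULES)
    rules.update(overrides)
    return {"group": name, "rules": rules}

# Two departments with distinct extraction depth and document types
sales = make_crawl_group("sales", max_depth=2)
engineering = make_crawl_group("engineering", max_depth=6,
                               types={"html", "pdf", "doc", "dwg"})
```

Assigning a group to a crawl then applies the whole rule bundle at once, rather than configuring each crawl by hand.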
URL administration
URL administration is a flexible feature for managing multiple URL target crawl lists. The user may combine lists or dedicate each to its own crawl. This is especially useful where multiple targeted indexes are required for deploying many search instances; these indexes can be merged into one total searchable index and/or kept as unique, independent search instances. A highly valuable purpose of crawl-list management is scheduling crawls based on these lists and distributing them across multiple servers, increasing efficiency and balancing the crawl load. A URL is any share target (99.999.9999.9/Machine division/sales/competitors/), FTP://, File:// or Http:// address.
Sessions and controls
Crawl sessions and control is the launch pad for starting and stopping crawl and indexing processes. Here the user determines which URL lists to assign to which crawl servers. The crawler begins a session as soon as the user starts a crawl, either under manual control or through scheduling that automates a recurring project.
IRM (Information Resource Management)
Move and merge processes are powerful functions for combining individual indexes into enterprise indexes and moving them to the desired search servers across the network. For example, one might combine the financial department's index with the business development (forecasting) index. Each may then operate as an independent search deployment, or the two may be combined into a cross-department search site serving both. Enterprise-wide search follows from this notion of merged department indexes.
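An index merge of the kind described can be sketched over a toy inverted index (term to document set). This is illustrative only; CrawlScape's real index format and merge mechanics are not specified here.

```python
# Sketch: merge department indexes, where each index maps a
# search term to the set of documents containing it.
def merge_indexes(*indexes):
    merged = {}
    for index in indexes:
        for term, docs in index.items():
            # Union the posting sets for terms shared across indexes
            merged.setdefault(term, set()).update(docs)
    return merged

finance = {"forecast": {"fin/q3.xls"}, "budget": {"fin/2024.doc"}}
bizdev = {"forecast": {"bd/pipeline.doc"}}
combined = merge_indexes(finance, bizdev)
```

The source indexes remain usable on their own, so each department keeps its independent search while the merged copy serves the cross-department site.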

Scheduling Sessions
Scheduling and control, one of the more elegant features of CrawlScape, manages and automates crawl and index processes so that IT administrators can act as managers overseeing pre-scheduled, recurring events. This reduces the burden of managing complex crawl and search deployments. Suppose, for example, that daily recurring crawls are required across multiple geographically separated servers; automated scheduling takes care of the recurring process management. For large, dispersed search infrastructures, scheduling provides excellent oversight and management automation.
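A daily recurring schedule of this kind reduces to computing the upcoming run times from a start time and an interval. The `next_runs` helper below is an illustrative sketch, not the product's scheduler.

```python
from datetime import datetime, timedelta

# Sketch: compute the next N run times for a recurring crawl.
def next_runs(start: datetime, every: timedelta, count: int):
    """Return the first `count` run times of a recurring schedule."""
    return [start + every * i for i in range(count)]

# A crawl that recurs daily at 02:00, starting 2024-01-01
runs = next_runs(datetime(2024, 1, 1, 2, 0), timedelta(days=1), 3)
```

Each computed run time would then be dispatched to the servers assigned to that crawl's URL lists.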
Search deployment and control
Search deployment and control facilitates the deployment of search instances across multiple servers. Following the crawl and index processes, the user may deploy search instances and indexes with ease, replacing the need for complicated programming of search front ends and complex interfaces to search indexes. PS can be deployed in a variety of flavors across these servers; the options for PS are shown in the PS product table and in the configuration and pricing section.
Reports, logs and views
Reports are information- and process-monitoring tables. The user can view summaries of all setup parameters within the CS enterprise, or simply monitor the progress of an active crawl session. Intended for IT professionals, the reports and logs support monitoring and evaluation of system and application activity, performance, progress and results.
