KnowledgeShape - Information Landscape

Landscape contours
Imagine for a moment a document repository that contains travel documents. This is the total document space of travel information (the index). The information distributed throughout the repository is clustered into smaller spaces, so that one could easily find documents related to “Vacations in the Mediterranean”. Likewise, within this Mediterranean space, we would expect to find documents about places in Greece in a still smaller document space. KS refers to this as a subspace within the Mediterranean travel space. This is a simple example of the complex notion of information clustering in an index. The greater the clustering, the more pronounced the information contours within the space. Information contours are therefore presented as clusters for intuitive navigation and analysis.

This is essentially the foundation upon which PatternScape subspace clustering is predicated, and it can be applied to the most rigorous tests, DNA base-pair sequence profiling for instance. PS ClusterView helps users navigate and visualize the “landscape contours” and content variations of the document space. A landscape can be either undulating or homogeneous. Either way, PS presents clustering even in the most homogeneous conditions, minuscule though the contours might be. Therein lies the value of PS vector distributions (and characteristic equations): they illuminate and differentiate co-existence, occurrence and clustering.

Patterns, Markers, Traits, Symptomatology
Patterns, markers and traits cluster as identifiable elements of documents. Symptoms that cluster in complex combinations indicate diseases and their variants. The more complex the symptoms of a particular disease, the better the visualization. ClusterViews present the inter-relationships and frequency of occurrence of symptom and disease variants.
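As a rough illustration of "frequency of occurrence" for symptom combinations, the sketch below counts how often pairs of elements appear together across records. It is not the ClusterView method, which this document does not specify; the function name `pair_counts` and the sample records are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_counts(records):
    """Count how many records contain each unordered pair of elements."""
    counts = Counter()
    for elements in records:
        # Deduplicate and sort so ("cough", "fever") and ("fever", "cough")
        # are counted as the same pair.
        for pair in combinations(sorted(set(elements)), 2):
            counts[pair] += 1
    return counts

# Hypothetical sample data: each record lists the symptoms it mentions.
records = [
    ["fever", "cough", "fatigue"],
    ["fever", "cough"],
    ["cough", "rash"],
]
counts = pair_counts(records)
top_pair, top_count = counts.most_common(1)[0]
```

Pairs that co-occur in many records would be natural candidates for a cluster; larger combinations could be counted the same way by changing the `2` passed to `combinations`.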

Distinction, Variants, Clusters and Analysis
ClusterView presents result variants in a document space by clustering inter-related information, thereby providing distinction within search results and elements. An element might simply be a term. In the case of DNA or medical analysis, elements combine in various ways. Variants (clustering) show, for example, which combinations present more powerfully and more frequently. A sales professional trying to gain an edge in selling might compare the competition's strengths and weaknesses to determine an effective sales process. A crime scene investigator armed with evidence-matching profiles can, for instance, create a suspect list based on distinct differences in case files.

Vectors and Occurrence (f)
Weights and vector distributions of terms make clustering more pronounced. Vector distributions provide insight into term frequency across the document space, subspaces and individual documents. Vector distributions also show the level of occurrence of terms across documents.
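To make the two kinds of distribution concrete, here is a minimal sketch that computes term frequency within each document and the number of documents in which each term occurs. It assumes simple whitespace tokenization and unweighted counts; the actual PS weighting scheme is not described here, and `term_vectors` and the sample documents are hypothetical.

```python
from collections import Counter

def term_vectors(docs):
    """Return per-document term counts and per-term document counts."""
    # Term frequency: how often each term occurs inside one document.
    tf = {name: Counter(text.lower().split()) for name, text in docs.items()}
    # Document frequency: in how many documents each term occurs at all.
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())
    return tf, df

# Hypothetical travel documents, echoing the earlier example.
docs = {
    "greece": "athens beaches islands beaches",
    "italy": "rome beaches pasta",
    "norway": "fjords hiking",
}
tf, df = term_vectors(docs)
```

Here `tf["greece"]` describes a single document, while `df` describes the whole document space; the same counting restricted to a subset of documents would give the distribution for a subspace.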

In DNA analysis, for example, concentrations, co-occurrence and distribution are important indicators and are easy to visualize. Conversely, in apple pie recipes, one might see little meaning in minuscule ingredient variants, unless of course you are a product marketer and want to sell more pies based on target-market preferences. Thus, how much cinnamon versus sugar to use might be implied by demographics. Naturally, this is a simplification, but it makes the point that both complex and simple uses are imaginable.

Polytuplet Document Associations
When many files are associated with the main search record, it usually implies data table relationships or header file associations, as is the case with JPEG and other image file header identifiers. CS and PS make use of these associations by providing search solutions in which users can see the associations and do not have to construct them in their minds.

A document describing a work of art or a photograph, for example, or a document describing a house for sale, is an excellent candidate for polytuplet combinations. A photographic library or image bank could allow natural-language search of images, the result of which is a searchable image repository. This is well suited to selling a home, since it matches houses against real descriptions rather than database keywords. A user can copy and paste their desires or interests into DocMap, and the landscape of their words will be mapped to search results, which in this case would be polytuplets.

Databases and XML Polytuplets
CrawlScape contains a plug-in for crawling databases in a multi-step process. The process first segments tables into individual records. An XML file is then created that describes how records across related tables belong together (a polytuplet). This allows PS technology to recombine the individual pieces into inter-related records. Real-time search on a dynamic database is possible if the database administrator writes XML records for all database transactions. CrawlScape monitors changes and applies indexing in real time to mirror the database changes. All this is done without letting users manipulate or view database applications, which is ideal for privacy and security while still providing a research environment on the data for the pattern-profiling searcher.
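The segment-then-recombine idea can be sketched as follows, using Python's standard `sqlite3` and `xml.etree.ElementTree` modules. This is only an illustration under stated assumptions: the CrawlScape plug-in's real XML format is not given in this document, so the element names (`polytuplets`, `polytuplet`, `record`), the `houses` and `photos` tables, and the function `build_polytuplets` are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def build_polytuplets(conn):
    """Emit one <polytuplet> per house row, nesting its related photo rows."""
    root = ET.Element("polytuplets")
    houses = conn.execute(
        "SELECT id, description FROM houses ORDER BY id"
    ).fetchall()
    for house_id, description in houses:
        node = ET.SubElement(root, "polytuplet", id=str(house_id))
        # The main record, segmented out of its table.
        ET.SubElement(node, "record", table="houses").text = description
        # Related records, recombined through the foreign key.
        photos = conn.execute(
            "SELECT caption FROM photos WHERE house_id = ? ORDER BY id",
            (house_id,),
        )
        for (caption,) in photos:
            ET.SubElement(node, "record", table="photos").text = caption
    return root

# Hypothetical in-memory database with two related tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE houses (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE photos (id INTEGER PRIMARY KEY, house_id INTEGER, caption TEXT);
    INSERT INTO houses VALUES (1, 'Stone cottage with garden');
    INSERT INTO houses VALUES (2, 'City loft');
    INSERT INTO photos VALUES (1, 1, 'Front garden in spring');
    INSERT INTO photos VALUES (2, 1, 'Kitchen');
    INSERT INTO photos VALUES (3, 2, 'Skyline view');
""")
xml_text = ET.tostring(build_polytuplets(conn), encoding="unicode")
```

Each `polytuplet` element groups a main record with its associated records, so an indexer reading the XML never needs direct access to the database itself.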