Information Landscape:
Landscape contours
Imagine for a moment a document repository that contains
travel documents. This is the total document-space of travel-information
(the index). The information distributed throughout the
repository is clustered into smaller spaces so that one could easily find
documents related to “Vacations in the Mediterranean”. Likewise, within this
Mediterranean “space”, we would expect to find documents about places in
Greece in a smaller document space. KS refers to this as a subspace
within the Mediterranean travel space.
This is a simple example of the complex notion of
information clustering in an index. The greater the
clustering, the more pronounced the information contours within the
space. Information contours are therefore presented as
clusters for intuitive navigation and analysis.
This is essentially the foundation on which PatternScape subspace clustering is
predicated, and it can be applied to the most
rigorous test: DNA base-pair sequence profiling, for instance.
PS ClusterView helps users navigate and visualize the “landscape
contours” and “content variations” of the document space. A landscape
can be either undulating or homogeneous. Either way, PS
presents clustering even in the most homogeneous conditions, minuscule though the
contours might be. Therein lies the value of PS vector
distributions (and characteristic equations): they illuminate and
differentiate co-existence, occurrence and clustering.
Patterns, Markers, Traits, Symptomatology
Patterns, markers and traits cluster as identifiable elements
of documents. Symptoms that cluster in complex combinations
indicate disease and its variants. The more complex the symptoms of a
particular disease, the better the visualization. ClusterViews present the
inter-relationships and frequency of occurrence of symptoms and
disease variants.
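As a rough illustration of "frequency of occurrence" among symptoms, the sketch below counts how often pairs of symptoms appear together across a set of records. It is not the ClusterView method, which is not described here; the function name cooccurrence and the sample records are hypothetical and serve only to make the idea concrete.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(records):
    """Count, for every pair of symptoms, the number of records containing both.

    Hypothetical helper for illustration only.
    """
    pairs = Counter()
    for symptoms in records:
        # Sort and de-duplicate so each pair has one canonical key per record.
        for a, b in combinations(sorted(set(symptoms)), 2):
            pairs[(a, b)] += 1
    return pairs

# Invented sample records (not real clinical data).
records = [
    ["fever", "cough", "fatigue"],
    ["fever", "cough"],
    ["fatigue", "headache"],
]

pairs = cooccurrence(records)
for pair, count in pairs.most_common():
    print(pair, count)
```

Pairs with higher counts are the combinations that present most frequently; a visualization could then place such symptoms closer together.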
Distinction, Variants, Clusters and Analysis
ClusterView presents result variants in a document space by clustering
inter-related information, thereby providing distinction within
search results and elements. An element might simply be a term.
In the case of DNA or medical analysis, elements combine
in various ways. Variants (clustering) show, for example, which combinations present more strongly and more frequently.
A sales professional trying to gain an edge in selling
might compare the competition's strengths and weaknesses
to determine an effective sales process. A crime scene
investigator armed with evidence-matching profiles can,
for instance, create a suspect list based on distinct
differences in case files.
Vectors and Occurrence (f)
Weights and vector distributions of terms make clustering more
pronounced. Vector distributions provide insight into term frequency
across the document space, subspaces and individual documents. Vector
distributions also show the level of occurrence of terms across documents.
In DNA analysis, for example, concentrations, co-occurrence and distribution
are important indicators and easy to visualize. Conversely, in apple pie
recipes, one might see little meaning in minuscule ingredient
variations, unless of course you are a product marketer who wants to sell more pies based on
target-market preferences. How much
cinnamon versus sugar to use might then be implied by demographics. Naturally, this is a
simplification, but it makes the point that both complex and simple uses
are imaginable.
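To make the notion of term frequency across the document space and across individual documents more concrete, here is a minimal sketch using plain word counts. It assumes a simple lowercase word tokenizer and unweighted counts; the actual PS weighting scheme is not described here, and the helper names and sample recipes are hypothetical.

```python
import re
from collections import Counter

def term_vector(text):
    """Count how often each lowercase word occurs in one document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def document_frequency(vectors):
    """Count in how many documents each term occurs at least once."""
    df = Counter()
    for vec in vectors:
        df.update(vec.keys())
    return df

# Invented two-document "space" for illustration.
docs = {
    "recipe_a": "apple pie with cinnamon cinnamon and sugar",
    "recipe_b": "apple pie with sugar sugar sugar",
}

# Per-document vectors: term frequency inside each document.
vectors = {name: term_vector(text) for name, text in docs.items()}

# Frequency of each term over the whole document space.
space_totals = sum(vectors.values(), Counter())

# Level of occurrence: how many documents contain each term.
df = document_frequency(vectors.values())

print(vectors["recipe_a"]["cinnamon"], space_totals["sugar"], df["cinnamon"])
```

Comparing the per-document counts with the space-wide totals shows which terms are concentrated in a few documents (cinnamon) and which are spread across the space (sugar).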
Polytuplet Document Associations
When many files are associated with the main search record, it usually implies data table relationships or header file associations, as is the case with JPEG and other image file header identifiers. CS and PS make use of these associations by providing search solutions in which users can see the associations rather than having to construct them in their minds.
A document describing a work of art or a photograph,
for example, or a document describing a house for sale, is an excellent
candidate for polytuplet combinations. A photographic library or image
bank could allow natural-language search of images, the result of which is a
searchable image repository. This is well suited to selling a home, as it would match
houses with real descriptions rather than
database keywords. A user can copy and paste their desires or interests into DocMap, and the landscape of their words will be mapped to
search results, which in this case would be polytuplets.
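One way to picture matching pasted text against real descriptions is to compare word-count vectors with cosine similarity. This is a stand-in technique chosen for illustration, not the DocMap algorithm, which is not described here; the listings, the query and the function names are all hypothetical.

```python
import math
import re
from collections import Counter

def vec(text):
    """Turn free text into a vector of lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Invented house descriptions, each standing in for the text part of a polytuplet.
listings = {
    "house_1": "sunny cottage with large garden near the sea",
    "house_2": "city apartment close to train station",
}

# Text a user might paste in, written in ordinary language.
query = "I want a garden and a view of the sea"

ranked = sorted(
    listings,
    key=lambda name: cosine(vec(query), vec(listings[name])),
    reverse=True,
)
print(ranked)
```

In a polytuplet setting, each ranked description would carry its associated files (photographs, for instance) along with it in the result.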
Databases and XML Polytuplets
CrawlScape contains a plug-in for crawling databases in a multi-step process.
The process segments tables into individual records. An XML
file is then created to recombine records across related tables
(a polytuplet). This allows PS technology to recombine the
individual pieces into inter-related records. Real-time search on a dynamic database
is made possible if the database administrator writes XML records for
all database transactions. CrawlScape will monitor changes and apply indexing in real time to mirror the database changes. All of this is done without letting users view or manipulate database applications, which is ideal for privacy and security while still providing a research environment on the data for the pattern-profiling searcher.
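The recombination step can be sketched as follows: one main record and its related rows from a second table are written into a single XML element. The element names (polytuplet, record, related), the table and field names, and the build_polytuplet helper are assumptions made for this example; the actual CrawlScape XML format is not described here.

```python
import xml.etree.ElementTree as ET

def build_polytuplet(main, main_table, related_rows, related_table, key, foreign_key):
    """Combine one main record with its related rows into one XML element.

    Hypothetical layout for illustration; not the CrawlScape schema.
    """
    root = ET.Element("polytuplet")
    record = ET.SubElement(root, "record", {"table": main_table})
    for field, value in main.items():
        ET.SubElement(record, field).text = value
    for row in related_rows:
        # Keep only rows whose foreign key points at the main record.
        if row[foreign_key] == main[key]:
            child = ET.SubElement(root, "related", {"table": related_table})
            for field, value in row.items():
                ET.SubElement(child, field).text = value
    return root

# Invented rows, as they might look after tables are segmented into records.
house = {"id": "1", "description": "Three-bedroom house near the beach"}
photos = [
    {"house_id": "1", "file": "front.jpg"},
    {"house_id": "1", "file": "garden.jpg"},
    {"house_id": "2", "file": "other.jpg"},
]

root = build_polytuplet(house, "houses", photos, "photos", "id", "house_id")
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Writing one such XML record per database transaction would give an indexer a self-contained unit to pick up, which is the role the text above assigns to the administrator-written XML records.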