Real-time Content Reuse at Scale
Synopsis: Everyone wants reuse, but not the pain and difficulty that accompany it. This presentation, with a live working model, will describe and demonstrate how AI computational linguistics can be combined with cognitive services and diverse authoring tools and content management systems to automate reusable content discovery at scale, even if reusable content assets reside across heterogeneous content management systems and repositories. It will demonstrate how reuse can be made convenient and assistive for content creators, unobtrusively, in real time, as they write.
What can the audience expect to learn?
Enabling real-time content reuse at scale using computational linguistics and cognitive services
Content creation involves authoring and assembling text, graphics, or other media for subsequent consumption by content consumers. This process typically involves searching for, identifying, and evaluating existing assets that are candidates for reuse. Current approaches help content creators by providing repository search capabilities, but they usually require content creators to leave their authoring tool to search for reusable assets. As a result, content creators often waste valuable time searching for and identifying reusable assets. Searching for reusable assets often becomes a deterrent due to one or more factors, including, but not limited to:
- Interruption of authoring
- Lack of search accuracy and precision
- Time to search, evaluate, and retrieve reusable assets
- Assets residing across multiple content management systems and repositories
- Lack of information about the quality and applicability of reusable assets
Some content creators also add metadata about an asset to be used as search criteria by other content creators. There is therefore a need for an approach that helps content creators discover, use, and manage reusable assets using automated methods.
Described is a method and system for the automated discovery, access, and evaluation of reusable content assets for content creators in real time, while authoring, with minimal interruption and intrusion. The method and reference system monitor content as it is being created and provide a ranked list of reusable assets directly within a wide variety of authoring tools, using a combination of computational linguistics technology and cognitive services.
In accordance with the method and system, a computational linguistic service continually scans new or existing content to extract keywords, such as nouns and noun strings, and metadata from a source document in real time as the content creator writes it. The computational linguistics service analyzes and ranks the extracted keywords based on the frequency, density, and prominence score of each keyword. The keyword prominence score is calculated using a weighted algorithm that factors in the linguistic value of each extracted keyword and the keyword’s context within the source document.
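The ranking step above can be illustrated with a minimal sketch. It is not the presenter's algorithm: it substitutes stop-word filtering for true noun extraction, and the function name `rank_keywords`, the three weights, and the use of first position and title membership as stand-ins for "context" are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a tiny stop-word list stands in for real part-of-speech tagging.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
              "for", "on", "with", "as", "by", "it", "this", "that"}

def rank_keywords(text, title="", top_n=5,
                  w_frequency=0.5, w_position=0.3, w_title=0.2):
    """Rank candidate keywords by a weighted prominence score (illustrative)."""
    tokens = [t for t in re.findall(r"[a-z][a-z\-]+", text.lower())
              if t not in STOP_WORDS]
    if not tokens:
        return []
    counts = Counter(tokens)
    total = len(tokens)
    max_count = max(counts.values())
    first_seen = {}
    for i, tok in enumerate(tokens):
        first_seen.setdefault(tok, i)  # index of first occurrence
    title_terms = set(re.findall(r"[a-z][a-z\-]+", title.lower()))

    ranked = []
    for term, freq in counts.items():
        density = freq / total                      # share of all candidate tokens
        position = 1.0 - first_seen[term] / total   # earlier first use scores higher
        in_title = 1.0 if term in title_terms else 0.0
        prominence = (w_frequency * (freq / max_count)
                      + w_position * position
                      + w_title * in_title)
        ranked.append({"keyword": term, "frequency": freq,
                       "density": round(density, 4),
                       "prominence": round(prominence, 4)})
    # Highest prominence first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda k: (-k["prominence"], k["keyword"]))
    return ranked[:top_n]
```

In a real-time setting, a function like this would be re-run on the current draft whenever the content creator pauses, so that the ranked list tracks the document as it changes.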
Thereafter, the computational linguistic service programmatically interfaces with a separate content reuse index through an application programming interface (API). The ranked keywords from the source document are provided to the content reuse index, which is a repository that contains metadata about the identification, location, and other characteristics of reusable assets whose physical content resides across one or more distributed repositories. The content reuse index uses the discovered and ranked keywords, and optionally supplied keywords, from the active authoring session to perform programmatic discovery of content that is identical, similar, or closely related to the source document under construction. The communication of the ranked keywords and the programmatic discovery of similar content run in the background as the content creator writes.
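As a rough illustration of the hand-off described above, the sketch below assembles a request body that a linguistic service might send to a reuse index. The abstract does not specify the API, so the function name `build_reuse_query`, the field names, and the fixed weight given to author-supplied keywords are all hypothetical.

```python
import json

def build_reuse_query(ranked_keywords, supplied_keywords=None,
                      replace_discovered=False, max_results=10):
    """Assemble a request body for a hypothetical content reuse index API."""
    # Author-supplied keywords get a fixed weight of 1.0 (an assumption).
    supplied = [{"keyword": k, "weight": 1.0, "source": "author"}
                for k in (supplied_keywords or [])]
    if replace_discovered and supplied:
        discovered = []  # supplied keywords are used in place of discovered ones
    else:
        discovered = [{"keyword": k["keyword"], "weight": k["prominence"],
                       "source": "discovered"}
                      for k in ranked_keywords]
    return {"terms": supplied + discovered, "max_results": max_results}

# Example: serialize the body as JSON before sending it in the background.
body = build_reuse_query([{"keyword": "driver", "prominence": 0.93}], ["odbc"])
payload = json.dumps(body)
```

Keeping the request a plain, serializable structure means the same body could be sent from any authoring tool plug-in without sharing code with the index.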
In accordance with the method and system, the content reuse index identifies potentially reusable assets using a cognitive search service. The cognitive search service returns references to potentially reusable assets that were previously cataloged and classified for inclusion in the index. The asset references with the highest match probability are ranked and returned to the content creator, in the authoring tool, along with additional metadata that helps the content creator determine the quality and applicability of the reusable asset candidates.
Upon receiving the potential reusable asset references, the computational linguistic service supplies the additional metadata about those assets to the content creator. In an embodiment of the method and system, the content creator is notified of the reusable content assets through a visual interface residing within the authoring tool. Each asset reference also links to the actual asset for review or retrieval.
In another embodiment of the method and system, keywords that the content creator assigns while creating the source document serve as optional input to the ranked query and are used to discover reusable assets along with, or in place of, the discovered keywords.
In an exemplary embodiment of the method and system, the computational linguistic service is integrated as a plug-in or hardwired into the content creation tool or system of the content creator. The content creator can optionally switch to a view that displays a ranked list of recommended reusable assets. Descriptive metadata about each recommended reusable asset includes, but is not limited to, the title, abstract, use, and quality of the asset.
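To make the index lookup concrete, here is a deliberately simple sketch. It replaces the cognitive search service with weighted keyword overlap over an in-memory list of metadata records, which is a much cruder technique than the one described; the record fields and the function name `search_index` are assumptions.

```python
def search_index(index, terms, max_results=5):
    """Return ranked references to assets whose cataloged keywords overlap `terms`.

    index: list of metadata records (dicts); the assets themselves live elsewhere.
    terms: dict mapping keyword -> weight, e.g. prominence scores.
    """
    total = sum(terms.values())
    if total <= 0:
        return []
    hits = []
    for record in index:
        matched = sum(w for kw, w in terms.items() if kw in record["keywords"])
        if matched > 0:
            hits.append({"asset_id": record["asset_id"],
                         "title": record["title"],
                         "location": record["location"],
                         "score": round(matched / total, 3)})
    # Best score first; ties broken by asset id for stable output.
    hits.sort(key=lambda h: (-h["score"], h["asset_id"]))
    return hits[:max_results]

# Hypothetical index entries pointing at assets in two different repositories.
index = [
    {"asset_id": "A", "title": "Installing the database driver",
     "location": "https://cms-one.example/a", "keywords": {"database", "driver"}},
    {"asset_id": "B", "title": "Restarting the server",
     "location": "https://cms-two.example/b", "keywords": {"server"}},
    {"asset_id": "C", "title": "Printer setup",
     "location": "https://cms-two.example/c", "keywords": {"printer"}},
]
results = search_index(index, {"database": 0.9, "driver": 0.8, "server": 0.3})
```

Only references and descriptive metadata are returned, which matches the idea that the index catalogs assets without holding the physical content.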
The supplied metadata enables the content creator to quickly assess the value of each asset. The content creator can optionally select a recommended content asset for reuse or indicate that none of the recommended assets are valid reuse candidates, further training the cognitive service to improve precision. The content creator’s response is passed back to the reuse index as additional intelligence to train the cognitive search, to initiate a new search, or both.
The content creator can also view the recommended asset in a separate browser, using a link that the asset reuse catalog supplies through the computational linguistics system connector, without having to leave the content creation tool. If the recommended asset is reusable by direct link, the content creator can copy the supplied link into the source document. If the recommended asset is reusable by obtaining a copy of the asset’s source, the content creator can request a copy of the actual asset. The content reuse index then acts as a proxy: it retrieves a rendering of the reusable content in the format requested by the content creator, if that format is available, and returns the rendered instance of the reusable asset to the content creator.
Such a system is format independent and supports both structured and unstructured content. Structured content in containerized data formats, such as DITA, or in any format that supports a micro-content (document object model (DOM)) architecture, provides more granular and precise content discovery and reuse when taxonomic classification is applied to those containers.
Thus, the method and system make asset discovery, evaluation, and reuse a more automated, continuous, and simpler process, at scale, for any number of content creators who work in centralized or distributed organizations with content assets that reside in distributed, non-homogeneous content repositories, and who use a wide variety of different and otherwise incompatible content authoring and content management systems.
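The feedback loop above could be approximated as follows. A real cognitive search service would have its own training mechanism; this sketch only keeps a per-asset boost factor, and the name `apply_feedback`, the step size, and the boost bounds are assumptions.

```python
def apply_feedback(boosts, shown_ids, selected_id=None, step=0.1):
    """Return updated per-asset boosts after one content creator response.

    A selected asset is boosted. If nothing was selected (the creator marked
    every recommendation as invalid), every shown asset is demoted.
    """
    updated = dict(boosts)  # do not mutate the caller's mapping
    if selected_id is not None:
        targets = {selected_id: step}
    else:
        targets = {asset_id: -step for asset_id in shown_ids}
    for asset_id, delta in targets.items():
        current = updated.get(asset_id, 1.0)  # neutral boost by default
        updated[asset_id] = round(max(0.0, min(2.0, current + delta)), 3)
    return updated
```

A boost like this could be multiplied into the search score on later queries, so that repeatedly rejected assets drift down the ranked list.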
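The two reuse paths described above, by direct link or by copy through the index acting as a proxy, can be sketched as a small decision function. The field names `reuse_mode`, `formats`, and `location`, and the returned action labels, are hypothetical.

```python
def resolve_reuse(asset, requested_format=None):
    """Decide how a recommended asset is delivered to the content creator."""
    if asset["reuse_mode"] == "link":
        # Reuse by reference: the creator pastes the link into the document.
        return {"action": "insert_link", "link": asset["location"]}
    formats = asset.get("formats", [])
    if requested_format in formats:
        # Reuse by copy: the index would fetch and render the asset by proxy.
        return {"action": "fetch_copy", "format": requested_format,
                "source": asset["location"]}
    # Requested rendering is not offered; report what is available instead.
    return {"action": "unavailable", "available_formats": formats}
```

Returning the list of available formats in the fallback case lets the authoring tool offer an alternative instead of failing silently.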
Meet the presenter
Mike Iantosca is a senior program advisor and strategist on the corporate leadership team for IBM Information Development world-wide. Mike has decades of experience with the design and development of diverse authoring and content management systems, processes, and standards. A career IBM Information Developer for more than 36 years, Mike currently leads the strategy, design, development, deployment and support of advanced content systems that involve the practical application of AI computational linguistics and cognitive technologies, at scale.