Book Knowledge Materialization In SIE
Semantic Integration Engine (SIE) does not adopt external knowledge as-is. Instead, it constructs knowledge around its own internally defined schema and RDF nodes.
Book is the first practical example of this approach. Starting from an ISBN, SIE retrieves multiple forms of open knowledge and integrates them into the local KnowledgeSpace.
Book Knowledge Materialization
SIE is currently evolving its knowledge design, and Book has been selected as the first target domain.
Books have several characteristics that make them suitable as the first knowledge materialization target.
- ISBN provides a globally recognized identifier
- Bibliographic metadata is relatively structured
- Many open knowledge providers exist
- Books naturally connect to authors, publishers, subjects, and citations
- Books integrate naturally with RDF graphs
For this reason, Book is suitable as the first domain for validating how open knowledge can be connected into locally curated knowledge.
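Because every lookup starts from an ISBN, it helps to normalize the identifier before calling any provider. The following is a minimal sketch, not SIE's actual code: the function names `isbn13_check_digit` and `normalize_isbn` are hypothetical, while the check-digit rules are the standard ISBN-10 and ISBN-13 ones.

```python
import re


def isbn13_check_digit(first12: str) -> str:
    """Return the ISBN-13 check digit for the first 12 digits."""
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def normalize_isbn(raw: str) -> str:
    """Return a hyphen-free ISBN-13, converting a valid ISBN-10 if needed."""
    s = re.sub(r"[\s-]", "", raw).upper()
    if re.fullmatch(r"[0-9]{9}[0-9X]", s):
        # ISBN-10: weights 10..1, with 'X' standing for 10 in the last position.
        last = 10 if s[9] == "X" else int(s[9])
        total = sum((10 - i) * int(d) for i, d in enumerate(s[:9])) + last
        if total % 11 != 0:
            raise ValueError(f"invalid ISBN-10 checksum: {raw!r}")
        body = "978" + s[:9]
        return body + isbn13_check_digit(body)
    if re.fullmatch(r"[0-9]{13}", s):
        if isbn13_check_digit(s[:12]) != s[12]:
            raise ValueError(f"invalid ISBN-13 checksum: {raw!r}")
        return s
    raise ValueError(f"not an ISBN: {raw!r}")
```

Using a single canonical ISBN-13 form as the lookup key means that results from different providers for the same book can be matched without further string handling.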
Knowledge Providers
SIE retrieves Book information from multiple knowledge providers. This section summarizes the kind of knowledge each provider contributes.
The current major knowledge providers are:
- openBD
- Open Library
- DBpedia
- Wikidata
These are not merely APIs. Each provider contributes a different kind of knowledge.
openBD
openBD is a Japanese bibliographic provider.
It mainly provides:
- title
- publisher
- publication date
- authors
- cover image
Within SIE, it serves primarily as the source of Japanese bibliographic metadata.
Open Library
Open Library is important as a provider of library-oriented metadata and subject knowledge.
Typical outputs:
- title
- subjects
- description
- edition keys
- work keys
- source URLs
DBpedia
DBpedia is used as a provider of RDF-oriented graph knowledge.
Typical outputs include:
- RDF subject URI
- sameAs links
- categories
- linked graph information
It provides graph-oriented structures required for RDF connectivity.
Wikidata
Wikidata is important as a provider of global identifiers and graph connectivity.
Typical outputs include:
- entity identifiers
- multilingual labels
- aliases
- related entities
- authority identifiers
- graph relationships
It plays an especially important role in multilingual knowledge integration and global graph connectivity.
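Since each provider contributes different fields, one way to combine their results is to group values by field while remembering which provider supplied each one. The sketch below assumes that provider responses have already been reduced to flat field dictionaries; the `ProviderValue` type, the `collect_by_field` function, and the sample values are illustrative and not taken from SIE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderValue:
    """One field value together with the provider it came from."""
    field: str
    value: str
    provider: str


def collect_by_field(records: dict[str, dict[str, str]]) -> dict[str, list[ProviderValue]]:
    """Group provider outputs by field name, keeping the origin of every value."""
    merged: dict[str, list[ProviderValue]] = {}
    for provider, fields in records.items():
        for name, value in fields.items():
            merged.setdefault(name, []).append(ProviderValue(name, value, provider))
    return merged


# Illustrative, already-normalized provider outputs (not real API responses).
records = {
    "openBD": {"title": "The Hobbit", "publisher": "Example Press"},
    "Open Library": {"title": "The Hobbit", "subjects": "Fantasy fiction"},
    "Wikidata": {"label_en": "The Hobbit"},
}
merged = collect_by_field(records)
```

Nothing is overwritten in this shape: conflicting values from two providers stay side by side, so a later review step can decide which one to accept.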
Types Of RDF Knowledge Around Books
| vocabulary | purpose |
|---|---|
| schema.org | bibliographic and relationship modeling |
| RDF / RDFS | basic RDF graph structure |
| OWL | identity linkage and semantic equivalence |
| SKOS | classification and concept hierarchy |
| PROV | provenance and evidence tracking |
| Dublin Core | lightweight metadata interoperability |
| Wikidata ontology | external knowledge graph linkage |
| FOAF | person-oriented relationship modeling |
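The vocabularies in this table are usually referenced through namespace prefixes. As a rough illustration, the mapping below lists the commonly used namespace IRIs and expands a prefixed name into a full IRI. The prefix labels and the `expand` helper are choices made for this sketch; in particular, schema.org terms are written here with the `https://schema.org/` form, although the `http://` form is also in common use.

```python
# Commonly used namespace IRIs for the vocabularies listed above.
PREFIXES = {
    "schema": "https://schema.org/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "prov": "http://www.w3.org/ns/prov#",
    "dcterms": "http://purl.org/dc/terms/",
    "wd": "http://www.wikidata.org/entity/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'owl:sameAs' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local
```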
schema.org
schema.org is used as the central vocabulary for Book knowledge.
It mainly represents:
- Book
- CreativeWork
- author
- publisher
- citation
- about
- keywords
- datePublished
- inLanguage
It is used both for describing the Book itself and for expressing relationships between the Book and other knowledge.
RDF / RDFS
RDF and RDFS provide the fundamental graph structure of KnowledgeSpace.
Typical usages include:
- rdf:type
- rdfs:label
- rdfs:subClassOf
- resource hierarchy
They are used for KnowledgeNode typing and basic RDF node structure.
OWL
OWL is used for identity linkage with external knowledge graphs.
Typical usages include:
- owl:sameAs
- equivalent resources
- semantic identity
It is important when connecting Book KnowledgeNodes to DBpedia and Wikidata.
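Because owl:sameAs is symmetric and transitive, a set of sameAs links partitions identifiers into groups that all denote the same thing. The following sketch shows one way to compute those groups with a union-find structure. The function name and the sample identifiers are hypothetical, and this is not necessarily how SIE resolves identity.

```python
def same_as_clusters(links):
    """Group identifiers connected by owl:sameAs links into equivalence clusters."""
    parent = {}

    def find(x):
        # Register unseen identifiers as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())


# Placeholder identifiers for illustration only.
links = [
    ("sie:book/a", "dbpedia:Book_A"),
    ("sie:book/a", "wikidata:Book_A"),
    ("sie:book/b", "dbpedia:Book_B"),
]
clusters = same_as_clusters(links)
```

Here the local node for book A ends up in the same cluster as both of its external identifiers, even though the two external identifiers were never linked to each other directly.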
SKOS
SKOS represents Book classification knowledge and concept hierarchies.
Typical usages include:
- Concept
- broader
- narrower
- related
- taxonomy
- subject hierarchy
It is used for Book subjects and category hierarchy management.
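A subject hierarchy expressed with skos:broader can be walked to find every more general concept above a given subject. Note that skos:broader itself is not declared transitive (skos:broaderTransitive is the transitive super-property), so the closure has to be computed explicitly. The sketch below assumes the broader links are available as a plain dictionary; the function name and concept labels are illustrative.

```python
def broader_closure(concept: str, broader: dict[str, list[str]]) -> set[str]:
    """Return every concept reachable from `concept` by following broader links."""
    seen: set[str] = set()
    stack = list(broader.get(concept, []))
    while stack:
        current = stack.pop()
        if current not in seen:  # guards against cycles in imperfect data
            seen.add(current)
            stack.extend(broader.get(current, []))
    return seen


# Illustrative hierarchy: each key maps to its directly broader concepts.
broader = {
    "Fantasy fiction": ["Fiction"],
    "Fiction": ["Literature"],
    "Children's literature": ["Literature"],
}
ancestors = broader_closure("Fantasy fiction", broader)
```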
PROV
PROV represents provenance and knowledge generation processes.
Typical usages include:
- prov:wasDerivedFrom
- prov:wasGeneratedBy
- provider source
- review history
- evidence linkage
Dublin Core
Dublin Core is used for lightweight metadata interoperability.
Typical usages include:
- title
- creator
- subject
- language
- identifier
It is used for interoperability with external library-oriented metadata.
Wikidata Ontology
Wikidata ontology is used for global knowledge graph connectivity.
Typical usages include:
- Wikidata entity
- multilingual labels
- authority identifiers
- graph relationships
It is important for multilingual knowledge integration and global identification.
FOAF
FOAF is used for linkage with person-oriented knowledge.
Typical usages include:
- Person
- name
- homepage
- organization relationship
It is used for relationships with authors, editors, translators, and similar contributors.
In SIE, a materialized Book is not treated as a mere collection of RDF triples; it is organized into knowledge items that can be handled as a KnowledgeNode.
KnowledgeNode does not place all information into one large attribute bag. Instead, it delegates information to purpose-specific value objects so that search, explanation, review, RAG, and external graph integration can clearly determine which information to use.
Basic KnowledgeNode Items
In SIE, an individual piece of knowledge is represented as a KnowledgeNode.
The base KnowledgeNode shape contains the following knowledge items.
- id: The stable KnowledgeNode ID used inside SIE.
- category: The operational coarse category, such as Book, Person, Organization, or Concept.
- identity: Identity information such as RDF node, canonical ID, sameAs links, ISBN, Wikidata ID, and DBpedia URI.
- presentation: Human-facing presentation information such as title, label, name, and description.
- semantics: Semantic and operational properties such as semantic type, role, confidence, lifecycle, and temporal profile.
- structure: Structures directly used for graph traversal, such as hierarchy, classification, part-whole, and correspondence.
- sources: Evidence and provenance information, such as provider, source document, and supporting evidence.
- bindings: Bindings to runtime and domain models, such as CNCF Entity, Tag, and external Entity.
- similarity: Representations for semantic-distance search, such as embeddings, vector search entries, and similarity status.
- operations: Materialization and operational state such as materializedAt, frame, and validation status.
- attributes: Domain-specific extension attributes. For Book, this includes bibliographic and edition-related information.
The important point is that these items are not raw storage locations for RDF data. SIE normalizes RDF predicates, external API responses, and Entity-derived information, then projects them into KnowledgeNode items that are easy to operate on.
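The item list above can be pictured as a record whose fields are purpose-specific value objects rather than one attribute bag. The following is a minimal sketch under that reading. Only the top-level item names come from the list; the value object classes, their fields, and the sample values are assumptions made for illustration and do not reflect SIE's actual types.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Identity:
    """Identity information: RDF node, canonical ID, sameAs links, ISBN."""
    rdf_node: str
    canonical_id: Optional[str] = None
    same_as: list[str] = field(default_factory=list)
    isbn: Optional[str] = None


@dataclass
class Presentation:
    """Human-facing text such as title, label, and description."""
    title: Optional[str] = None
    label: Optional[str] = None
    description: Optional[str] = None


@dataclass
class Source:
    """Where a piece of knowledge came from."""
    provider: str
    document: Optional[str] = None


@dataclass
class KnowledgeNode:
    id: str
    category: str  # e.g. "Book", "Person", "Organization", "Concept"
    identity: Identity
    presentation: Presentation = field(default_factory=Presentation)
    semantics: dict[str, Any] = field(default_factory=dict)
    structure: dict[str, Any] = field(default_factory=dict)
    sources: list[Source] = field(default_factory=list)
    bindings: dict[str, Any] = field(default_factory=dict)
    similarity: dict[str, Any] = field(default_factory=dict)
    operations: dict[str, Any] = field(default_factory=dict)
    attributes: dict[str, Any] = field(default_factory=dict)


# Placeholder ID and values for illustration.
node = KnowledgeNode(
    id="sie:book/example",
    category="Book",
    identity=Identity(rdf_node="sie:book/example", isbn="9780306406157"),
    presentation=Presentation(title="Example Title"),
    sources=[Source(provider="Open Library")],
)
```

Keeping identity, presentation, and sources in separate objects lets a caller such as search or review read only the part it needs, which is the separation the text describes.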
Book Extension Items
When a Book is handled as a KnowledgeNode, Book-specific extension items are added on top of the base items. These items make bibliographic data, external identifiers, relational knowledge, and provenance easier to use in RAG and review workflows.
- bibliographic: Information describing the Book itself, such as title, subtitle, description, publisher, published date, and language.
- bookIdentity: Identifiers that connect the Book to external knowledge, such as ISBN-10, ISBN-13, OCLC, LCCN, NDL, Open Library key, Wikidata ID, and DBpedia URI.
- editionAndWork: Information used to separate Work, Edition, and Manifestation. This becomes the basis for handling multiple editions of the same work.
- contributors: Relationships to people and organizations involved in the Book, such as author, editor, translator, and illustrator.
- classification: Classification and subject information such as subject, keyword, genre, category, and SKOS concept.
- relations: Relationships from the Book to other knowledge nodes, such as citation, about, sameAs, hasPart, isPartOf, and relatedWork.
- media: Media information used for display and verification, such as cover image, thumbnail, and preview URL.
- candidateAssertions: Candidate knowledge retrieved by resolvers but not yet accepted. It is preserved together with confidence, provider, and evidence.
- reviewState: Review state used to project Book knowledge into KnowledgeSpace, such as accepted, rejected, or pending.
Book extension items are not merely stored inside the base KnowledgeNode attributes. They are also projected into identity, presentation, structure, and sources. For example, ISBN and Wikidata ID are projected into identity; title and description into presentation; subject and genre into structure.classifications; and Open Library or DBpedia origins into sources.
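The projection described here can be sketched as a function that reads the Book extension items and fills the corresponding base items. The dictionary layout, the key names inside each item (such as `isbn13`, `wikidataId`, and `origins`), and the function name are assumptions for this example; only the direction of each mapping follows the text.

```python
def project_book_items(book: dict) -> dict:
    """Project Book extension items into base KnowledgeNode items."""
    ident = book.get("bookIdentity", {})
    biblio = book.get("bibliographic", {})
    cls = book.get("classification", {})
    return {
        # ISBN and Wikidata ID go to identity.
        "identity": {k: ident[k] for k in ("isbn13", "wikidataId") if k in ident},
        # Title and description go to presentation.
        "presentation": {k: biblio[k] for k in ("title", "description") if k in biblio},
        # Subject and genre go to structure.classifications.
        "structure": {
            "classifications": list(cls.get("subject", [])) + list(cls.get("genre", []))
        },
        # Provider origins go to sources.
        "sources": list(book.get("origins", [])),
    }


# Illustrative input; values are placeholders.
book = {
    "bookIdentity": {"isbn13": "9780306406157", "oclc": "000000"},
    "bibliographic": {"title": "Example Title", "publisher": "Example Press"},
    "classification": {"subject": ["Fantasy fiction"], "genre": ["Novel"]},
    "origins": ["Open Library", "DBpedia"],
}
projected = project_book_items(book)
```

Fields without a base-item counterpart in this sketch, such as the publisher, simply remain in the Book extension items.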
Projection Into KnowledgeSpace
Only reviewed Book information is projected into KnowledgeSpace. Conceptually, it becomes RDF knowledge like the following.
sie:book/01JV...
rdf:type schema:Book
schema:name "The Hobbit"
schema:isbn "978..."
schema:author sie:person/...
schema:publisher sie:organization/...
schema:description "..."
owl:sameAs <https://dbpedia.org/resource/The_Hobbit>
owl:sameAs <https://www.wikidata.org/entity/Q74287>
prov:wasDerivedFrom <https://openlibrary.org/...>
Values that start out as plain text can later become KnowledgeNodes through review and projection.
"J.R.R. Tolkien" -> sie:person/...
At this point, a Book becomes a knowledge node with local identity, external identifiers, presentation data, classifications, relationships, provenance, and confidence.
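The two steps in this section, projecting only reviewed information and later promoting text values to nodes, can be sketched as follows. The `CandidateAssertion` type, both function names, and the sample identifiers are hypothetical; the review states accepted, rejected, and pending are the ones named in the Book extension items.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateAssertion:
    """A candidate statement about a Book, kept with its origin and confidence."""
    predicate: str
    value: str
    provider: str
    confidence: float
    review_state: str = "pending"  # "accepted", "rejected", or "pending"


def project_accepted(subject: str, candidates: list[CandidateAssertion]) -> list[tuple[str, str, str]]:
    """Emit (subject, predicate, object) triples for accepted assertions only."""
    return [(subject, c.predicate, c.value) for c in candidates if c.review_state == "accepted"]


def promote_literals(triples, reviewed_nodes: dict[str, str]):
    """Replace text values with node IDs where a reviewed mapping exists."""
    return [(s, p, reviewed_nodes.get(o, o)) for s, p, o in triples]


# Placeholder identifiers for illustration.
candidates = [
    CandidateAssertion("schema:name", "The Hobbit", "Open Library", 0.95, "accepted"),
    CandidateAssertion("schema:author", "J.R.R. Tolkien", "Open Library", 0.90, "accepted"),
    CandidateAssertion("schema:publisher", "Unknown House", "DBpedia", 0.40, "pending"),
]
triples = project_accepted("sie:book/example", candidates)
linked = promote_literals(triples, {"J.R.R. Tolkien": "sie:person/example"})
```

The pending publisher assertion is held back rather than discarded, so it can still be accepted or rejected in a later review.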
Meaning For RAG
In ordinary text-oriented RAG, text fragments can be searched, but graph-level semantics such as the following are often lost.
- identity
- provenance
- confidence
- graph relationships
- authority links
SIE preserves these structures. Therefore, Book knowledge can be used not merely as searchable text, but as knowledge with the following properties.
retrievable + explainable + linkable + reviewable
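To make the contrast with text-only RAG concrete, the sketch below renders a retrieved Book node as context text that carries its identity, confidence, external links, and sources alongside the title. The shape of the `hit` dictionary and the function name are assumptions for this example and do not describe SIE's retrieval interface.

```python
def to_rag_context(hit: dict) -> str:
    """Render one retrieved Book node as context text that keeps graph-level facts."""
    lines = [
        f"title: {hit['title']}",
        f"id: {hit['id']}",
        f"confidence: {hit['confidence']:.2f}",
    ]
    if hit.get("sameAs"):
        lines.append("sameAs: " + ", ".join(hit["sameAs"]))
    if hit.get("sources"):
        lines.append("sources: " + ", ".join(hit["sources"]))
    return "\n".join(lines)


# Placeholder node ID and confidence for illustration.
hit = {
    "title": "The Hobbit",
    "id": "sie:book/example",
    "confidence": 0.9,
    "sameAs": ["https://dbpedia.org/resource/The_Hobbit"],
    "sources": ["Open Library"],
}
context = to_rag_context(hit)
```

With this kind of context, a generated answer can cite where a statement came from and how reliable it is, instead of relying on an anonymous text fragment.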
References
Glossary
- RDF: A W3C-standardized data model that represents information as subject–predicate–object triples.
- Semantic Integration Engine (SIE): An integration engine that unifies structured knowledge (RDF) and document knowledge (SmartDox) derived from the BoK, making them directly accessible to AI.
- knowledge graph: A semantic graph-based knowledge base where nodes represent entities or concepts and edges represent their relationships.
- Retrieval-Augmented Generation (RAG): A generation technique that supplements a language model’s internal (parametric) knowledge by retrieving relevant external information before generation. RAG systems first search knowledge sources such as databases or knowledge graphs and then use the retrieved context as input for text generation.
- Cloud Native Component Framework (CNCF): A framework for executing cloud application components using a single, consistent execution model. Centered on the structure of Component, Service, and Operation, it enables the same Operation to be reused across different execution forms such as command, server (REST / OpenAPI), client, and script. By centralizing quality attributes required for cloud applications (such as logging, error handling, configuration, and deployment) within the framework, components can focus on implementing domain logic. CNCF is designed as an execution foundation for literate model-driven development and AI-assisted development, separating what is executed from how it is invoked.
- validation: The activity of confirming that a system or product fulfills its intended use and stakeholder requirements.
- verification: The activity of confirming that an implementation conforms to its specified design or requirements.