Book Knowledge Materialization In SIE

Created: 2026-05-24

Semantic Integration Engine (SIE) does not adopt external knowledge as-is. Instead, it constructs knowledge around its own internally defined schema and RDF nodes.

Book is the first practical example of this approach. Starting from an ISBN, SIE retrieves several forms of open knowledge and integrates them into the local KnowledgeSpace.

The important architectural point is that SIE does not directly use external RDF subjects as the primary identity. Instead, SIE maintains its own internally assigned RDF node as the primary subject.
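The identity rule above can be sketched as follows. This is a minimal illustration, not SIE's actual implementation: the class name `LocalNode`, the base IRI `https://example.org/sie/`, and the UUID-based ID scheme are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List
import uuid

# Hypothetical base IRI for locally minted nodes (an assumption, not SIE's real namespace).
SIE_BASE = "https://example.org/sie/"


@dataclass
class LocalNode:
    """A locally minted RDF subject; external subjects are kept only as links."""
    category: str
    local_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    same_as: List[str] = field(default_factory=list)

    @property
    def iri(self) -> str:
        # The primary subject is always built from the local ID.
        return f"{SIE_BASE}{self.category}/{self.local_id}"

    def link_external(self, external_iri: str) -> None:
        # External subjects are recorded as equivalents, never as the identity.
        if external_iri not in self.same_as:
            self.same_as.append(external_iri)


book = LocalNode(category="book")
book.link_external("https://dbpedia.org/resource/The_Hobbit")
book.link_external("https://dbpedia.org/resource/The_Hobbit")  # duplicate is ignored
print(book.iri)
print(book.same_as)
```

The external URI can change or disappear without affecting the local subject, which is the property the design relies on.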

Book Knowledge Materialization

SIE is currently evolving its knowledge design, and Book has been selected as the first target domain.

Books have several characteristics that make them suitable as the first knowledge materialization target.

  • ISBN provides a globally recognized identifier

  • bibliographic metadata is relatively structured

  • many open knowledge providers exist

  • books naturally connect to authors, publishers, subjects, and citations

  • books integrate naturally with RDF graphs

For this reason, Book is suitable as the first domain for validating how open knowledge can be connected into locally curated knowledge.
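Because ISBN is the entry point, a resolver would normally check the identifier before calling any provider. The sketch below uses the standard ISBN-13 check-digit rule (alternating weights 1 and 3) and the usual ISBN-10 to ISBN-13 conversion (prefix 978, recompute the check digit). The function names are illustrative and are not taken from SIE.

```python
def isbn13_check_digit(first12: str) -> str:
    """Return the ISBN-13 check digit for the first 12 digits."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def is_valid_isbn13(isbn: str) -> bool:
    """Check length, digits, and check digit; hyphens and spaces are ignored."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    return isbn13_check_digit(digits[:12]) == digits[12]


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert ISBN-10 to ISBN-13 by prefixing 978 and recomputing the check digit."""
    core = isbn10.replace("-", "").replace(" ", "")[:9]
    first12 = "978" + core
    return first12 + isbn13_check_digit(first12)


print(is_valid_isbn13("978-0-306-40615-7"))
print(isbn10_to_isbn13("0-306-40615-2"))
```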

Knowledge Providers

SIE retrieves Book information from multiple knowledge providers. This section organizes the kinds of knowledge contributed by each provider.

The current major knowledge providers are:

  • openBD

  • Open Library

  • DBpedia

  • Wikidata

These are not merely APIs. Each provider contributes a different kind of knowledge.

openBD

openBD is a Japanese bibliographic provider.

It mainly provides:

  • title

  • publisher

  • publication date

  • authors

  • cover image

In SIE, openBD serves primarily as the source of Japanese bibliographic metadata.

Open Library

Open Library is important as a provider of library-oriented metadata and subject knowledge.

Typical outputs include:

  • title

  • subjects

  • description

  • edition keys

  • work keys

  • source URLs

DBpedia

DBpedia is used as a provider of RDF-oriented graph knowledge.

Typical outputs include:

  • RDF subject URI

  • sameAs links

  • categories

  • linked graph information

It provides graph-oriented structures required for RDF connectivity.

Wikidata

Wikidata is important as a provider of global identifiers and graph connectivity.

Typical outputs include:

  • entity identifiers

  • multilingual labels

  • aliases

  • related entities

  • authority identifiers

  • graph relationships

It plays an especially important role in multilingual knowledge integration and global graph connectivity.
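Since each provider contributes a different kind of knowledge, one way to combine them is to group values per field and remember which providers supplied each value. The sketch below assumes the provider responses have already been normalized into flat dictionaries. The field names and sample values are hypothetical and do not reflect the providers' real response formats.

```python
from collections import defaultdict

# Hypothetical, already-normalized records for one ISBN (illustrative values only).
records = {
    "openBD": {"title": "The Hobbit", "publisher": "Example Press"},
    "Open Library": {"title": "The Hobbit", "subject": "Fantasy"},
    "Wikidata": {"title": "The Hobbit"},
}


def collect_candidates(provider_records):
    """Group values per field, keeping the list of providers behind each value."""
    candidates = defaultdict(lambda: defaultdict(list))
    for provider, fields in provider_records.items():
        for name, value in fields.items():
            candidates[name][value].append(provider)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {name: dict(values) for name, values in candidates.items()}


merged = collect_candidates(records)
print(merged["title"])
print(merged["subject"])
```

Keeping the provider list next to each value means agreement between providers can later be used as evidence during review.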

Types Of RDF Knowledge Around Books

SIE uses multiple RDF vocabularies in order to connect Book knowledge into RDF graphs.

| vocabulary        | purpose                                   |
|-------------------|-------------------------------------------|
| schema.org        | bibliographic and relationship modeling   |
| RDF / RDFS        | basic RDF graph structure                 |
| OWL               | identity linkage and semantic equivalence |
| SKOS              | classification and concept hierarchy      |
| PROV              | provenance and evidence tracking          |
| Dublin Core       | lightweight metadata interoperability     |
| Wikidata ontology | external knowledge graph linkage          |
| FOAF              | person-oriented relationship modeling     |
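For reference, the vocabularies above correspond to the following commonly used namespace IRIs. The prefix names (`schema`, `dcterms`, `wd`, and so on) are conventional choices, and the `expand` helper is a hypothetical utility for this example rather than part of SIE.

```python
# Conventional prefixes mapped to the published namespace IRIs of each vocabulary.
NAMESPACES = {
    "schema": "https://schema.org/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "prov": "http://www.w3.org/ns/prov#",
    "dcterms": "http://purl.org/dc/terms/",
    "wd": "http://www.wikidata.org/entity/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(curie: str) -> str:
    """Expand a compact name such as 'owl:sameAs' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return NAMESPACES[prefix] + local


print(expand("owl:sameAs"))
print(expand("prov:wasDerivedFrom"))
```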

schema.org

schema.org is used as the central vocabulary for Book knowledge.

It mainly represents:

  • Book

  • CreativeWork

  • author

  • publisher

  • citation

  • about

  • keywords

  • datePublished

  • inLanguage

It is used both for describing the Book itself and for expressing relationships between the Book and other knowledge.

RDF / RDFS

RDF and RDFS provide the fundamental graph structure of KnowledgeSpace.

Typical usages include:

  • rdf:type

  • rdfs:label

  • rdfs:subClassOf

  • resource hierarchy

They are used for KnowledgeNode typing and basic RDF node structure.

OWL

OWL is used for identity linkage with external knowledge graphs.

Typical usages include:

  • owl:sameAs

  • equivalent resources

  • semantic identity

It is important when connecting Book KnowledgeNodes to DBpedia and Wikidata.

SKOS

SKOS represents Book classification knowledge and concept hierarchies.

Typical usages include:

  • Concept

  • broader

  • narrower

  • related

  • taxonomy

  • subject hierarchy

It is used for Book subjects and category hierarchy management.
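A subject hierarchy built from skos:broader links can be walked upward to find every more general concept for a Book subject. The concept names below are invented for illustration, and the dictionary stands in for whatever graph store actually holds the links.

```python
# Hypothetical skos:broader links: narrower concept -> broader concept.
broader = {
    "Fantasy fiction": "Fiction",
    "Fiction": "Literature",
}


def ancestors(concept, broader_links):
    """Follow skos:broader links upward, stopping at the top or on a cycle."""
    chain = []
    seen = {concept}
    while concept in broader_links:
        concept = broader_links[concept]
        if concept in seen:  # guard against accidental cycles in the data
            break
        seen.add(concept)
        chain.append(concept)
    return chain


print(ancestors("Fantasy fiction", broader))
```

A search for Books about "Literature" can then include Books whose subject is any narrower concept.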

PROV

PROV represents provenance and knowledge generation processes.

Typical usages include:

  • prov:wasDerivedFrom

  • prov:wasGeneratedBy

  • provider source

  • review history

  • evidence linkage

In SIE, this plays an important role for explainable AI and trust-aware RAG.

Dublin Core

Dublin Core is used for lightweight metadata interoperability.

Typical usages include:

  • title

  • creator

  • subject

  • language

  • identifier

It is used for interoperability with external library-oriented metadata.

Wikidata Ontology

Wikidata ontology is used for global knowledge graph connectivity.

Typical usages include:

  • Wikidata entity

  • multilingual labels

  • authority identifiers

  • graph relationships

It is important for multilingual knowledge integration and global identification.

FOAF

FOAF is used for linkage with person-oriented knowledge.

Typical usages include:

  • Person

  • name

  • homepage

  • organization relationship

It is used for relationships with authors, editors, translators, and similar contributors.

In SIE, a materialized Book is not treated as a mere collection of RDF triples; it is organized into knowledge items that can be handled as a KnowledgeNode.

KnowledgeNode does not place all information into one large attribute bag. Instead, it delegates information to purpose-specific value objects so that search, explanation, review, RAG, and external graph integration can clearly determine which information to use.

Basic KnowledgeNode Items

In SIE, an individual piece of knowledge is represented as a KnowledgeNode.

The base KnowledgeNode shape contains the following knowledge items.

  • id: The stable KnowledgeNode ID used inside SIE.

  • category: The operational coarse category, such as Book, Person, Organization, or Concept.

  • identity: Identity information such as RDF node, canonical ID, sameAs links, ISBN, Wikidata ID, and DBpedia URI.

  • presentation: Human-facing presentation information such as title, label, name, and description.

  • semantics: Semantic and operational properties such as semantic type, role, confidence, lifecycle, and temporal profile.

  • structure: Structures directly used for graph traversal, such as hierarchy, classification, part-whole, and correspondence.

  • sources: Where the knowledge came from, such as provider, source document, evidence, and provenance records.

  • bindings: Bindings to runtime and domain models, such as CNCF Entity, Tag, and external Entity.

  • similarity: Representations for semantic-distance search, such as embeddings, vector search entries, and similarity status.

  • operations: Materialization and operational state such as materializedAt, frame, and validation status.

  • attributes: Domain-specific extension attributes. For Book, this includes bibliographic and edition-related information.

The important point is that these items are not raw storage locations for RDF data. SIE normalizes RDF predicates, external API responses, and Entity-derived information, then projects them into KnowledgeNode items that are easy to operate on.
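The idea of delegating information to purpose-specific value objects can be sketched with small immutable dataclasses. Only a few of the items listed above are shown, and every field name inside them is an assumption made for the example. The `sie:book/EXAMPLE` node is a placeholder, not a real SIE identifier.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class Identity:
    rdf_node: str
    isbn: Optional[str] = None
    same_as: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Presentation:
    title: str
    description: str = ""


@dataclass(frozen=True)
class Sources:
    providers: Tuple[str, ...] = ()


@dataclass(frozen=True)
class KnowledgeNode:
    """Base shape: each concern lives in its own value object, not one attribute bag."""
    id: str
    category: str
    identity: Identity
    presentation: Presentation
    sources: Sources = field(default_factory=Sources)
    attributes: dict = field(default_factory=dict)  # domain-specific extensions


node = KnowledgeNode(
    id="EXAMPLE",
    category="Book",
    identity=Identity(
        rdf_node="sie:book/EXAMPLE",
        same_as=("https://dbpedia.org/resource/The_Hobbit",),
    ),
    presentation=Presentation(title="The Hobbit"),
    sources=Sources(providers=("Open Library",)),
)
print(node.presentation.title, node.identity.same_as)
```

A search component would read only `presentation`, while an explanation or review component would read `sources`, without either needing to know the other's fields.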

Book Extension Items

When a Book is handled as a KnowledgeNode, Book-specific extension items are added on top of the base items. These items make bibliographic data, external identifiers, relational knowledge, and provenance easier to use in RAG and review workflows.

  • bibliographic: Information describing the Book itself, such as title, subtitle, description, publisher, published date, and language.

  • bookIdentity: Identifiers that connect the Book to external knowledge, such as ISBN-10, ISBN-13, OCLC, LCCN, NDL, Open Library key, Wikidata ID, and DBpedia URI.

  • editionAndWork: Information used to separate Work, Edition, and Manifestation. This becomes the basis for handling multiple editions of the same work.

  • contributors: Relationships to people and organizations involved in the Book, such as author, editor, translator, and illustrator.

  • classification: Classification and subject information such as subject, keyword, genre, category, and SKOS concept.

  • relations: Relationships from the Book to other knowledge nodes, such as citation, about, sameAs, hasPart, isPartOf, and relatedWork.

  • media: Media information used for display and verification, such as cover image, thumbnail, and preview URL.

  • candidateAssertions: Candidate knowledge retrieved by resolvers but not yet accepted. It is preserved together with confidence, provider, and evidence.

  • reviewState: Review state used to project Book knowledge into KnowledgeSpace, such as accepted, rejected, or pending.

Book extension items are not merely stored inside the base KnowledgeNode attributes. They are also projected into identity, presentation, structure, and sources. For example, ISBN and Wikidata ID are projected into identity; title and description into presentation; subject and genre into structure.classifications; and Open Library or DBpedia origins into sources.
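The projection described above can be expressed as a routing table from Book fields to base KnowledgeNode items. The table follows the examples given in the text. The field names (`isbn13`, `origin`, and so on) and the fallback to `attributes` for unrouted fields are assumptions of this sketch.

```python
# Routing of Book extension fields into base KnowledgeNode items,
# following the examples in the text; field names are illustrative.
PROJECTION = {
    "isbn13": "identity",
    "wikidata_id": "identity",
    "title": "presentation",
    "description": "presentation",
    "subject": "structure.classifications",
    "genre": "structure.classifications",
    "origin": "sources",
}


def project_book(book_fields):
    """Place each Book field under its target item; unrouted fields go to attributes."""
    node = {}
    for name, value in book_fields.items():
        target = PROJECTION.get(name, "attributes")
        node.setdefault(target, {})[name] = value
    return node


projected = project_book({
    "title": "The Hobbit",
    "subject": "Fantasy",
    "origin": "Open Library",
    "page_count": 300,  # hypothetical field with no routing rule
})
print(projected)
```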

Projection Into KnowledgeSpace

Only reviewed Book information is projected into KnowledgeSpace. Conceptually, it becomes RDF knowledge like the following.

sie:book/01JV...
  rdf:type schema:Book
  schema:name "The Hobbit"
  schema:isbn "978..."
  schema:author sie:person/...
  schema:publisher sie:organization/...
  schema:description "..."
  owl:sameAs <https://dbpedia.org/resource/The_Hobbit>
  owl:sameAs <https://www.wikidata.org/entity/Q74287>
  prov:wasDerivedFrom <https://openlibrary.org/...>
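The rule that only reviewed information reaches KnowledgeSpace can be sketched as a filter over candidate assertions. The assertion layout, the review values, and the placeholder subject `sie:book/EXAMPLE` are assumptions for illustration. Triples are represented as plain tuples rather than through an RDF library.

```python
def project_accepted(subject, assertions):
    """Return (subject, predicate, object) triples for accepted assertions only."""
    triples = []
    for assertion in assertions:
        if assertion["review"] != "accepted":
            continue  # pending and rejected candidates stay out of KnowledgeSpace
        triples.append((subject, assertion["predicate"], assertion["value"]))
    return triples


# Hypothetical candidate assertions with their review state.
assertions = [
    {"predicate": "schema:name", "value": "The Hobbit", "review": "accepted"},
    {"predicate": "schema:inLanguage", "value": "en", "review": "pending"},
    {"predicate": "schema:keywords", "value": "dragons", "review": "rejected"},
]

triples = project_accepted("sie:book/EXAMPLE", assertions)
print(triples)
```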

Values that start out as plain text can later be promoted to KnowledgeNodes through review and projection.

"J.R.R. Tolkien"
  ->
sie:person/...

At this point, a Book becomes a knowledge node with local identity, external identifiers, presentation data, classifications, relationships, provenance, and confidence.
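Promoting a textual value into a node, as in the author example above, amounts to replacing the literal with a node reference and describing the new node. The sketch below works on plain tuples. The identifiers `sie:book/EXAMPLE` and `sie:person/EXAMPLE` are placeholders, and typing the new node as `schema:Person` is an assumption about how SIE would model it.

```python
def promote_literal(triples, predicate, literal, node_iri, node_type):
    """Replace a literal object with a node and add triples describing that node."""
    promoted = []
    for s, p, o in triples:
        if p == predicate and o == literal:
            promoted.append((s, p, node_iri))  # the literal becomes a node reference
        else:
            promoted.append((s, p, o))
    # The original text is preserved as the new node's name.
    promoted.append((node_iri, "rdf:type", node_type))
    promoted.append((node_iri, "schema:name", literal))
    return promoted


before = [
    ("sie:book/EXAMPLE", "schema:name", "The Hobbit"),
    ("sie:book/EXAMPLE", "schema:author", "J.R.R. Tolkien"),
]
after = promote_literal(
    before, "schema:author", "J.R.R. Tolkien", "sie:person/EXAMPLE", "schema:Person"
)
for triple in after:
    print(triple)
```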

Meaning For RAG

In ordinary text-oriented RAG, text fragments can be searched, but graph-level semantics such as the following are often lost.

  • identity

  • provenance

  • confidence

  • graph relationships

  • authority links

SIE preserves these structures. Therefore, Book knowledge can be used not merely as searchable text, but as knowledge with the following properties.

retrievable + explainable + linkable + reviewable

In other words, knowledge management in SIE transforms Book information from acquired data into reviewed, evidence-backed knowledge that can be safely used in RAG.
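One way to carry these properties into a RAG prompt is to keep identity, source, and confidence attached to each retrieved item. The sketch below is an illustration only: the record layout, the 0.7 confidence threshold, and the output format are assumptions, not SIE behavior.

```python
def build_context(nodes, min_confidence=0.7):
    """Render retrieved nodes as context lines that keep id, source, and confidence."""
    lines = []
    for node in nodes:
        if node["confidence"] < min_confidence:
            continue  # low-confidence knowledge is left out of the prompt
        lines.append(
            f'{node["title"]} [id={node["id"]}; source={node["source"]}; '
            f'confidence={node["confidence"]:.2f}]'
        )
    return "\n".join(lines)


# Hypothetical retrieval results.
retrieved = [
    {"id": "sie:book/EXAMPLE", "title": "The Hobbit",
     "source": "Open Library", "confidence": 0.9},
    {"id": "sie:book/OTHER", "title": "Unverified title",
     "source": "openBD", "confidence": 0.4},
]
print(build_context(retrieved))
```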

Conclusion

Book materialization in SIE is not merely bibliographic import.

It is a mechanism for building explainable KnowledgeSpace by attaching open knowledge sources around locally curated RDF identities while preserving provenance, confidence, and graph relationships.

References

Glossary

RDF

A W3C-standardized data model that represents information as subject–predicate–object triples.

Semantic Integration Engine (SIE)

An integration engine that unifies structured knowledge (RDF) and document knowledge (SmartDox) derived from the BoK, making them directly accessible to AI.

knowledge graph

A semantic graph-based knowledge base where nodes represent entities or concepts and edges represent their relationships.

Retrieval-Augmented Generation (RAG)

A generation technique that supplements a language model’s internal (parametric) knowledge by retrieving relevant external information before generation. RAG systems first search knowledge sources such as databases or knowledge graphs and then use the retrieved context as input for text generation.

Cloud Native Component Framework (CNCF)

Cloud Native Component Framework (CNCF) is a framework for executing cloud application components using a single, consistent execution model. Centered on the structure of Component, Service, and Operation, it enables the same Operation to be reused across different execution forms such as command, server (REST / OpenAPI), client, and script. By centralizing quality attributes required for cloud applications—such as logging, error handling, configuration, and deployment—within the framework, components can focus on implementing domain logic. CNCF is designed as an execution foundation for literate model-driven development and AI-assisted development, separating what is executed from how it is invoked.

validation

Validation is the activity of confirming that a system or product fulfills its intended use and stakeholder requirements.

verification

Verification is the activity of confirming that an implementation conforms to its specified design or requirements.