AI Collaboration Architecture

SmartDox Site

The SmartDox Site is composed of a collection of SmartDox documents organized as a structured knowledge space. Each document is systematically arranged within directory hierarchies, forming an interlinked network of knowledge.

Currently, there are three kinds of SmartDox documents:

SmartDox: Describes general articles and explanations.
LoxiDox: Defines terms listed in the glossary.
CML: Defines object information models.

A general SmartDox document describes domain knowledge—concepts, rules, and constraints—related to its subject domain.

SmartDox also serves as a metalanguage for DSLs, enabling consistent expression of conceptual and structural knowledge.

Two literate DSLs are built upon this metalanguage: LoxiDox for terminology definitions, and CML for domain model specifications. Both are themselves SmartDox documents and can be transformed into BoK artifacts using the SmartDox command.

SmartDox automatically generates a glossary from LoxiDox terms and builds cross-references linking articles to glossary entries. Through this mechanism, the site becomes a navigable and semantically connected knowledge system.

A CML document defines a domain model operable by software, using entities, state machines, and behavioral specifications. It serves as the foundation for code generation and model-driven application design.

Type	Content	Output	Remarks
SmartDox	Explanation / Design / Theory	HTML / Web Metadata	Readable and indexable knowledge representation
LoxiDox	Glossary Terms	HTML / Web Metadata	Knowledge representation of domain terms
CML	Object–Functional Model Definition	HTML / MCP / Web Metadata	Foundation for model definition and behavioral specification

Structure of BoK

The SmartDox command analyzes the entire SmartDox site, generates structured HTML and semantic metadata from all documents, and additionally produces MCP definitions from CML documents. The integration of these outputs constitutes the BoK.

Output Layer	Main Format	Content and Role	Primary Consumers
MCP	JSON / YAML	Defines operational and executable interfaces for models and entities, enabling actions by AI and tools.	AI / Agent
HTML	HTML5	Generates a structured, interpretable representation of SmartDox documents for both humans and AI.	Human / AI
Web Metadata	JSON-LD / RDF	Semantically links and classifies SmartDox documents, integrating them into a learnable knowledge graph as the BoK.	AI / Search System
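
The relationship between document types and output layers in the two tables above can be summarized as a small lookup. The following Python sketch is only an illustration: the names `OUTPUTS_BY_TYPE` and `bok_outputs` are hypothetical and are not part of the SmartDox command.

```python
# Hypothetical lookup: which BoK output layers each document type yields,
# following the tables in this article.
OUTPUTS_BY_TYPE = {
    "SmartDox": ["HTML", "Web Metadata"],
    "LoxiDox": ["HTML", "Web Metadata"],
    "CML": ["HTML", "MCP", "Web Metadata"],
}

def bok_outputs(doc_types):
    """Return the sorted set of output layers produced for the given document types."""
    layers = set()
    for doc_type in doc_types:
        layers.update(OUTPUTS_BY_TYPE[doc_type])
    return sorted(layers)

print(bok_outputs(["SmartDox", "LoxiDox"]))  # ['HTML', 'Web Metadata']
print(bok_outputs(["SmartDox", "CML"]))      # ['HTML', 'MCP', 'Web Metadata']
```

Under this reading, the MCP layer appears in the BoK only when the site contains at least one CML document.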

The BoK generated in this way serves as a foundation for AI collaboration and knowledge assimilation, integrating documents, models, and semantic information into a unified knowledge base.

Furthermore, components generated from CML are registered in the BoK as component assets. In this case, the BoK also functions as a component repository.

Figure 1. SmartDox Site and BoK Site

BoK Generation

To generate a BoK from a SmartDox site, each type of resource stored within the SmartDox site is transformed into corresponding BoK components.

The following diagram illustrates how SmartDox resources are converted into BoK resources.

Figure 2. BoK Resource Generation from SmartDox Site

SmartDox documents are converted into HTML documents and Web metadata by the SmartDox command. At present, the Web metadata is generated in JSON-LD format and embedded directly within the HTML documents.
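
Embedding JSON-LD in an HTML document is conventionally done with a `script` element of type `application/ld+json`. The sketch below shows that general technique in Python; it is not the SmartDox command's implementation, and the function name `embed_json_ld` and the sample metadata are assumptions made for illustration.

```python
import json
from html import escape

def embed_json_ld(title, body_html, metadata):
    """Return an HTML5 page with the metadata embedded as a JSON-LD script element."""
    # Escape "</" so the JSON payload cannot terminate the script element early.
    payload = json.dumps(metadata, ensure_ascii=False, indent=2).replace("</", "<\\/")
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        f"<title>{escape(title)}</title>\n"
        f'<script type="application/ld+json">\n{payload}\n</script>\n'
        "</head>\n"
        f"<body>\n{body_html}\n</body>\n</html>\n"
    )

# Hypothetical sample metadata for one article.
page = embed_json_ld(
    "Person",
    "<h1>Person</h1>",
    {"@context": "https://schema.org/", "@type": "Article", "name": "Person"},
)
print(page)
```

Keeping the metadata inside the page means a crawler or an AI system that fetches the HTML receives the human-readable text and the machine-readable description in a single request.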

Since CML documents are also a kind of SmartDox document, they are likewise transformed into HTML documents and Web metadata by the SmartDox command.

In addition, because CML documents possess software-oriented characteristics, they also generate MCP definitions. Through the MCP, behavioral information of the domain knowledge described in the BoK is conveyed to generative AI systems via RAG (Retrieval-Augmented Generation).

Beyond article and term generation handled by the SmartDox command, CML documents are also processed by the Cozy command to automatically generate programs. These generated programs are packaged as components and reused as building blocks in applications.

Terms are defined by merging LexiDox term definitions with object information model definitions written in CML. In each article, the terms it uses are automatically linked to their glossary entries, reinforcing the semantic network within the BoK.
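
The merging and linking described above can be pictured as follows. This is a minimal Python sketch, not the actual SmartDox algorithm: the dictionaries, the precedence rule on conflicts, the `glossary.html#...` anchor scheme, and the function names are all assumptions made for illustration.

```python
import re

def merge_terms(glossary_terms, cml_terms):
    """Merge terms from both sources; glossary definitions win on conflict (an assumption)."""
    merged = dict(cml_terms)
    merged.update(glossary_terms)
    return merged

def link_terms(text, glossary):
    """Wrap each whole-word glossary term in a link to its (hypothetical) glossary anchor."""
    if not glossary:
        return text
    # Longest terms first so that "knowledge graph" is matched before "knowledge".
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")

    def to_link(match):
        term = match.group(1)
        anchor = term.lower().replace(" ", "-")
        return f'<a href="glossary.html#{anchor}">{term}</a>'

    # A single pass avoids re-linking text inside links that were just inserted.
    return pattern.sub(to_link, text)

glossary = merge_terms(
    {"BoK": "Body of Knowledge"},
    {"Person": "An entity representing a person"},
)
print(link_terms("A Person is registered in the BoK.", glossary))
```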

The BoK also provides a component repository function to publish and manage these components. Through the information exposed via MCP and HTML documents, users and AI systems will be able to discover appropriate components and obtain usage instructions directly within the BoK.

Example

This section presents a concrete example of the MCP definition and the Web metadata stored within the BoK.

Here is an entity model described in CML. We will then examine the corresponding MCP and JSON-LD representations derived from this information.

# Entity
## Person
This entity represents a person. It serves as a basic unit for identifying users, operators, or staff members within an application. It holds personal information such as identifiers, names, and ages.
### Attribute
| name    | type       | mul | description |
|---------+------------+-----+--------------------------------|
| id      | identifier | 1   | A unique identifier assigned within the system |
| name    | name       | 1   | The person's full name |
| age     | age        | ?   | Age, an optional attribute |
### Rule
1. The id must be unique.
2. The name must not be empty.
3. If age is specified, it must be a non-negative integer.
### Description
This entity is a fundamental domain object used across the entire application.
The Person entity is related to other entities (such as Order or Account) and serves roles like creator, assignee, or requester.
Additionally, the Person entity may synchronize with external systems.
In such cases, its identifier may be assigned externally—via an external directory service (e.g., LDAP or IDaaS)—rather than generated internally.
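
To make the three rules of the Person entity concrete, the sketch below restates them as validation logic in Python. It only illustrates what the rules mean; it is not code produced by the Cozy command, and the `Person` class, the `PersonRegistry` class, and the error messages are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Person:
    id: str
    name: str
    age: Optional[int] = None  # multiplicity "?" in the attribute table

    def __post_init__(self):
        # Rule 2: the name must not be empty.
        if not self.name.strip():
            raise ValueError("name must not be empty")
        # Rule 3: if age is specified, it must be a non-negative integer.
        if self.age is not None:
            if isinstance(self.age, bool) or not isinstance(self.age, int) or self.age < 0:
                raise ValueError("age must be a non-negative integer")

class PersonRegistry:
    """Hypothetical in-memory store that enforces Rule 1 (the id must be unique)."""

    def __init__(self):
        self._people = {}

    def add(self, person):
        if person.id in self._people:
            raise ValueError(f"duplicate id: {person.id}")
        self._people[person.id] = person
        return person

registry = PersonRegistry()
registry.add(Person(id="p-001", name="Alice", age=30))
registry.add(Person(id="p-002", name="Bob"))  # age omitted, which is allowed
```

Rules 2 and 3 concern a single instance, so they are checked when the object is created; Rule 1 concerns the whole collection, so in this sketch it is checked by the registry.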

Example: MCP

The following is an example of an MCP generated for the entity defined by the previously mentioned CML model.

{
  "name": "person-manager",
  "version": "1.0",
  "entity": "Person",
  "tools": [
    {
      "name": "create-person",
      "description": "Create a new person entity.",
      "input_schema": {
        "type": "object",
        "properties": {
          "id":   { "type": "string", "description": "A unique identifier assigned within the system." },
          "name": { "type": "string", "description": "The person's full name." },
          "age":  { "type": "integer", "minimum": 0, "description": "Age, an optional non-negative integer." }
        },
        "required": ["id", "name"]
      },
      "output_schema": {
        "type": "object",
        "properties": {
          "status": { "type": "string", "enum": ["success", "error"] },
          "person": { "$ref": "#/definitions/Person" }
        }
      }
    },
    {
      "name": "update-person",
      "description": "Update the information of an existing person.",
      "input_schema": {
        "type": "object",
        "properties": {
          "id":   { "type": "string", "description": "The ID of the person to update." },
          "name": { "type": "string" },
          "age":  { "type": "integer", "minimum": 0 }
        },
        "required": ["id"]
      }
    },
    {
      "name": "delete-person",
      "description": "Delete an existing person entity.",
      "input_schema": {
        "type": "object",
        "properties": {
          "id": { "type": "string", "description": "The ID of the person to delete." }
        },
        "required": ["id"]
      }
    }
  ],
  "definitions": {
    "Person": {
      "type": "object",
      "properties": {
        "id":   { "type": "string" },
        "name": { "type": "string" },
        "age":  { "type": ["integer", "null"] }
      },
      "required": ["id", "name"]
    }
  }
}

This MCP is derived from the Person entity definition in CML, and explicitly exposes operation APIs—such as creation, update, and deletion—within the behavior layer of the BoK.
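
One way to picture how such a tool definition could be derived from the CML attribute table is sketched below in Python. The mapping from CML value types to JSON Schema fragments and the function name `create_tool_schema` are assumptions for illustration; the actual generation rules of the SmartDox command are not described in this article.

```python
# Attribute rows of the Person entity: (name, CML type, multiplicity, description).
PERSON_ATTRIBUTES = [
    ("id", "identifier", "1", "A unique identifier assigned within the system."),
    ("name", "name", "1", "The person's full name."),
    ("age", "age", "?", "Age, an optional non-negative integer."),
]

# Assumed mapping from CML value types to JSON Schema fragments.
TYPE_MAP = {
    "identifier": {"type": "string"},
    "name": {"type": "string"},
    "age": {"type": "integer", "minimum": 0},
}

def create_tool_schema(attributes):
    """Build an input schema: multiplicity "1" means required, "?" means optional."""
    properties = {}
    required = []
    for name, cml_type, mul, description in attributes:
        properties[name] = {**TYPE_MAP[cml_type], "description": description}
        if mul == "1":
            required.append(name)
    return {"type": "object", "properties": properties, "required": required}

schema = create_tool_schema(PERSON_ATTRIBUTES)
print(schema["required"])  # ['id', 'name']
```

With these inputs, the resulting dictionary has the same content as the `input_schema` of `create-person` in the example above.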

Example: JSON-LD

The following is an example of JSON-LD generated for the entity defined by the previously mentioned CML model.

{
  "@context": {
    "@vocab": "https://www.simplemodeling.org/vocab#",
    "schema": "https://schema.org/"
  },
  "@type": "schema:Person",
  "@id": "https://www.simplemodeling.org/ja/domain-modeling/person.html",
  "name": "Person",
  "description": "An entity representing a person, used to identify users, operators, or staff members within an application.",
  "schema:identifier": {
    "@type": "schema:PropertyValue",
    "name": "id",
    "description": "A unique identifier assigned within the system."
  },
  "schema:additionalProperty": [
    {
      "@type": "schema:PropertyValue",
      "name": "name",
      "description": "The person's full name."
    },
    {
      "@type": "schema:PropertyValue",
      "name": "age",
      "description": "Age, an optional attribute.",
      "valueReference": {
        "@type": "schema:Integer",
        "minValue": 0
      }
    }
  ],
  "schema:isPartOf": {
    "@id": "https://www.simplemodeling.org/ja/domain-modeling/"
  }
}

This JSON-LD expresses the structure of the CML model at the semantic layer and is integrated into the overall knowledge graph of the SmartDox site.
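
How the `@context` of the example determines the full identifier of each term can be shown with a deliberately simplified expansion routine. The Python sketch below handles only prefixes and `@vocab`; real JSON-LD expansion has many more rules, and the function name `expand_term` is hypothetical.

```python
# The @context from the JSON-LD example above.
CONTEXT = {
    "@vocab": "https://www.simplemodeling.org/vocab#",
    "schema": "https://schema.org/",
}

def expand_term(term, context):
    """Expand a compact IRI or plain term to a full IRI (simplified)."""
    # Keywords such as "@type" and absolute IRIs are left unchanged.
    if term.startswith("@") or "://" in term:
        return term
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context:
        # Compact IRI such as "schema:Person".
        return context[prefix] + suffix
    # Plain terms fall back to the default vocabulary.
    return context["@vocab"] + term

for term in ["schema:Person", "schema:identifier", "name", "valueReference"]:
    print(term, "->", expand_term(term, CONTEXT))
```

Under this context, prefixed terms such as `schema:identifier` resolve to schema.org, while unprefixed terms such as `name`, `description`, and `valueReference` resolve to the site's own vocabulary.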

Knowledge Circulation

In SimpleModeling, the following is defined as the reference architecture for generative AI. The terminology related to generative AI used throughout the framework is based on this definition.

Figure 3. Generative AI Reference Architecture

The Generative AI Reference Architecture defines the AI Collaboration Architecture as a conceptual structure that illustrates how AI activates, assimilates, expresses, and circulates knowledge through collaboration with humans.

Inside AI, the process begins with Knowledge Activation. Here, the AI references external sources such as the BoK as a Retrieval Knowledge Base, as well as its internal Pretrained Parametric Knowledge, to retrieve knowledge relevant to the given prompt. Through mechanisms such as RAG, the AI forms the necessary context for reasoning and generation.

Next, in Knowledge Assimilation, the retrieved knowledge is internally integrated and semantically restructured. At this stage, the AI “understands” the information and aligns it with its internal context.

In the subsequent phase, Knowledge Expression, the assimilated knowledge is expressed in various forms—text, code, or structured data. The result of this process constitutes the AI’s Generated Output.

The generated knowledge then diverges into two paths. The first is Knowledge Promotion, representing AI’s autonomous learning and internal evolution. Useful portions of the generated results are incorporated into the AI’s internal structure as Pretrained Parametric Knowledge, an accumulation of tacit knowledge that strengthens the AI’s cognitive foundation.

The second is Knowledge Circulation, in which humans intervene to evaluate and reorganize AI’s outputs, feeding them back into the BoK. These results are curated and structured as SmartDox documents or CML models, thus reintroducing the AI’s generated knowledge into the system as explicit, shareable knowledge.

Thus, within AI, knowledge is processed in three phases (Activation → Assimilation → Expression), followed by two distinct pathways, Promotion and Circulation, for elevation and sharing. AI not only evolves internally but also collaborates with humans to continuously expand the BoK.
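
The flow of phases can be sketched as a toy pipeline in Python. Everything here is an illustrative assumption: retrieval is naive keyword overlap rather than a real RAG index, assimilation is simple context assembly, expression is a text template instead of a language model, and Promotion is not modeled because it would require retraining a model.

```python
def activate(prompt, bok):
    """Knowledge Activation: pick BoK entries that share a word with the prompt."""
    words = set(prompt.lower().split())
    return [text for text in bok.values() if words & set(text.lower().split())]

def assimilate(prompt, retrieved):
    """Knowledge Assimilation: combine the prompt and retrieved knowledge into one context."""
    return {"prompt": prompt, "context": retrieved}

def express(assimilated):
    """Knowledge Expression: render the assimilated state as a generated output."""
    facts = " ".join(assimilated["context"]) or "(no relevant knowledge found)"
    return f"Q: {assimilated['prompt']}\nA (from BoK): {facts}"

def circulate(bok, key, reviewed_text):
    """Knowledge Circulation: a human-reviewed result is written back into the BoK."""
    updated = dict(bok)
    updated[key] = reviewed_text
    return updated

# Hypothetical BoK content and prompt.
bok = {"person": "A person entity holds id, name, and age."}
prompt = "what is a person"
output = express(assimilate(prompt, activate(prompt, bok)))
print(output)

# After human review, the curated result becomes part of the BoK.
bok = circulate(bok, "person-summary", "Person: an entity with id, name, and optional age.")
```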

This architecture demonstrates that AI is not merely a generative engine, but an entity that internalizes, expresses, elevates, and circulates knowledge, forming a cyclical relationship with human intellectual activity. Through Promotion, AI accumulates tacit knowledge internally, while through Circulation, humans systematize that output into explicit, shareable knowledge. Together, they enable the BoK to evolve continuously and deepen AI’s understanding of knowledge itself.

Perspective

In SimpleModeling, the SmartDox site integrates human-authored documents and CML models, and the SmartDox command transforms them into a unified BoK. Through RAG, AI internalizes this BoK, connecting the structures described by humans with the knowledge it has learned, thereby forming a shared space of understanding.

By repeating cycles of assimilation and circulation, knowledge evolves through human–AI collaboration. The BoK expands, and AI progressively attains deeper understanding.

This collaborative evolution is expected to shape the future of Literate Model–Driven Development in the AI era, where software development and knowledge development are seamlessly integrated.

References

Glossary

BoK (Body of Knowledge): At SimpleModeling, the core knowledge system for contextual sharing is called the BoK (Body of Knowledge). The goal of building a BoK is to enable knowledge sharing, education, AI support, automation, and decision-making assistance.
CML (Cozy Modeling Language): CML is a literate modeling language for describing Cozy models. It is designed as a domain-specific language (DSL) that forms the core of analysis modeling in SimpleModeling. CML allows model elements and their relationships to be described in a narrative style close to natural language, ensuring strong compatibility with AI support and automated generation. Literate models written in CML function as intermediate representations that can be transformed into design models, program code, or technical documentation.
DSL (Domain-Specific Language): A DSL is a language designed for a particular domain, enabling direct and concise expression of the domain’s concepts and structures. Compared to general-purpose programming languages (GPLs), DSLs offer a higher level of abstraction tailored for domain-specific problem solving and automation.
RDF: A W3C-standardized data model that represents information as subject–predicate–object triples.
knowledge graph: A semantic graph-based knowledge base where nodes represent entities or concepts and edges represent their relationships.
knowledge assimilation: A temporary internalization of external knowledge retrieved from the BoK through RAG. It enhances reasoning within a session but does not permanently modify the AI model’s parametric knowledge.
Component: A software construct that encapsulates well-defined responsibilities, contracts, and dependencies as a reusable and replaceable unit. In the logical model, it serves as an abstract structural unit; in the physical model, it corresponds to an implementation or deployment unit.
Retrieval-Augmented Generation (RAG): A generation technique that supplements a language model’s internal (parametric) knowledge by retrieving relevant external information before generation. RAG systems first search knowledge sources such as databases or knowledge graphs and then use the retrieved context as input for text generation.
Generative AI Reference Architecture: An architecture that organizes the internal processes executed by generative AI: knowledge activation, assimilation, expression, promotion, and circulation.
Retrieval Knowledge Base: Retrieval Knowledge Base (RKB) is a structured, retrievable subset of Non-parametric Knowledge optimized for use by RAG (Retrieval-Augmented Generation). It contains indexed SmartDox documents, glossary entries, and semantic metadata, allowing AI models to fetch explicit knowledge as contextual input. Through RAG interaction, the RKB serves as a bridge for transforming external explicit knowledge into intermediate assimilated knowledge within the AI model.
Parametric Knowledge: Parametric knowledge refers to implicit knowledge embedded in the parameters (weights) of a neural network. It represents statistical or distributed information acquired during pretraining, rather than explicit facts stored in an external knowledge base. In RAG (Retrieval-Augmented Generation), it is contrasted with non-parametric knowledge, serving as the model’s internal “implicit value.”
Prompt: A structured instruction or contextual representation that bridges retrieved knowledge (RAG) and the AI model’s reasoning process. It transforms the structured knowledge from the BoK into a narrative or directive form that the model can interpret, act upon, and internalize.
assimilated knowledge: A semantically integrated knowledge state within the AI, constructed from the context produced by knowledge activation. Corresponds to “understanding” within the AI.
Knowledge Promotion: A long-term integration process in which structured knowledge from the BoK is permanently incorporated into the model’s Pretrained Parametric Knowledge (PPK) through retraining or fine-tuning.
Literate Model-Driven Development (LMDD): Literate Model–Driven Development (LMDD) is a software development methodology that integrates natural-language narrative and formal model structure within a unified text-based framework. It extends conventional Model–Driven Development (MDD) by treating documentation and models as a single, consistent source of truth. In LMDD, the descriptive and structural elements of development artifacts are expressed together using the SmartDox language. From this unified representation, ModelDox extracts structural data, CML (Cozy Modeling Language) defines domain-specific models, and Cozy generates executable code, documentation, and configuration artifacts. Artificial intelligence participates in the LMDD process by analyzing the narrative context, validating structural consistency, and supporting the refinement of models and generated artifacts. All artifacts are represented in text form, ensuring traceability, version control, and interoperability within standard development environments. By defining a formally connected and machine-interpretable relationship between documentation, design, and implementation, LMDD provides a foundation for AI-assisted model–driven engineering where human authorship and automated reasoning operate on the same representational layer.