AI-Era Executable Specification

What Is an Executable Specification?

Executable specifications are rooted in the test-case philosophy originating from BDD (Behavior-Driven Development).

The core idea is simple: make test cases readable directly as specifications.

In BDD, behavior is primarily described in the form of scenarios.

In contrast, SimpleModeling does not limit itself to behavior alone. It also covers the following:

How APIs are used
Constraints and boundary conditions
Properties
Requirement-level scenarios

All tests that carry semantic meaning as specifications are positioned as executable specifications.

An executable specification is not merely a test; it is an alternative form of specification that explicitly states, in an executable way, what requirements the program is intended to satisfy.

Programming Environment

Scala 3 is used as the programming language, combined with cats to practice object-functional programming.

Thanks to the powerful static type system and functional programming features provided by Scala 3, many bugs are eliminated before execution, namely at compile time.

From the perspective of the Curry–Howard correspondence, types correspond to propositions and programs to proofs, so successfully type-checking a program in a statically typed functional language can be regarded as checking a proof in intuitionistic propositional logic.
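
As a minimal sketch of this correspondence, a function type `A => B` can be read as "A implies B" and a tuple `(A, B)` as "A and B". The names `modusPonens`, `chain`, and `andComm` are illustrative assumptions and do not come from SimpleModeling.

```scala
// Illustrative names only; these are assumptions made for this sketch.
// Reading: a type is a proposition, and a total function of that type is a proof.

// Modus ponens: from (A implies B) and A, conclude B.
def modusPonens[A, B](f: A => B, a: A): B = f(a)

// Transitivity of implication: (A implies B) and (B implies C) give (A implies C).
def chain[A, B, C](f: A => B, g: B => C): A => C = a => g(f(a))

// Commutativity of conjunction: (A and B) implies (B and A).
def andComm[A, B](p: (A, B)): (B, A) = (p._2, p._1)
```

The reading only holds for total functions that do not throw, return null, or loop forever, so in everyday Scala it works as a design guide rather than as a formal guarantee.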

ScalaTest is adopted as the testing framework, and Property-Based Testing (PBT) is actively utilized as well.

As a programming language for the AI era, Scala is considered a strong candidate.

This is because, under the assumption that AI handles much of the coding, factors other than grammatical simplicity for humans become more important:

That AI is less likely to make mistakes
That intentions and constraints can be expressed as types
That the distance between specification and implementation is small

The combination of strong static typing and functional programming naturally satisfies these requirements, providing a foundation that is highly compatible with AI-driven development.
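
One way to express an intention or constraint as a type can be sketched with a Scala 3 opaque type and a validating constructor. The `Quantity` type, its range of 1 to 99, and `totalPrice` are hypothetical and introduced only for illustration.

```scala
// Hypothetical domain type, introduced only for illustration.
object Domain:
  // Outside this object, Quantity is opaque and cannot be confused with a plain Int.
  opaque type Quantity = Int

  object Quantity:
    // The only way to obtain a Quantity; the constraint 1..99 is checked once, here.
    def from(n: Int): Either[String, Quantity] =
      if n >= 1 && n <= 99 then Right(n)
      else Left(s"quantity must be between 1 and 99: $n")

  extension (q: Quantity) def value: Int = q

import Domain.*

// The signature states the intent: only validated quantities are accepted.
def totalPrice(unitPrice: Int, quantity: Quantity): Int =
  unitPrice * quantity.value
```

With this shape, passing an unvalidated `Int` to `totalPrice` is a compile error, so the constraint does not need to be re-checked or re-tested at every call site.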

Test Philosophy

Having more test cases is not inherently better. What matters is intentionally limiting the number of test cases and avoiding overproduction.

One of the reasons for adopting Scala and functional programming is that they allow the required number of test cases to be drastically reduced.

Through the combination of Scala’s powerful static type system and functional programming, many bugs are eliminated at compile time rather than at runtime.

As a result, a compiled program is already in a fairly “clean” state.

Given this premise, instead of attempting to cover everything with tests, it becomes viable to adopt a policy of covering only the meaningful points with a minimal set of tests.
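
The following sketch suggests how the compiler can take over checks that would otherwise need test cases. The `OrderStatus` enum and the functions `label` and `parse` are hypothetical examples, not part of SimpleModeling.

```scala
// Hypothetical model, for illustration only.
enum OrderStatus:
  case Pending, Shipped, Cancelled

// The match covers every case. If a new case is added to OrderStatus,
// the compiler reports this match as non-exhaustive, so a forgotten branch
// is found at compile time rather than by a test.
def label(status: OrderStatus): String = status match
  case OrderStatus.Pending   => "pending"
  case OrderStatus.Shipped   => "shipped"
  case OrderStatus.Cancelled => "cancelled"

// Option makes absence explicit in the type, so callers must handle the missing case.
def parse(s: String): Option[OrderStatus] =
  OrderStatus.values.find(_.toString.equalsIgnoreCase(s))
```

Under these assumptions, the remaining tests can concentrate on meaning, for example which strings are accepted by `parse`, instead of on structural coverage.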

Types of Tests

In SimpleModeling, to ensure quality under these assumptions, the following tests are prepared and maintained as executable specifications.

First, from a bottom-up perspective, we start from component and class specifications and create executable specifications for verification, confirming that the implementation satisfies those specifications.

Component tests: Explicitly fix API usage, preconditions, return values, and error conditions as specifications, and verify that the implementation conforms to the design specifications.
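
A component test of this kind might look like the following sketch. `Account` and its `withdraw` method are hypothetical and defined inline only so that the example is self-contained; ScalaTest is assumed to be on the classpath.

```scala
import org.scalatest.wordspec.AnyWordSpec
import org.scalatest.matchers.should.Matchers

// Hypothetical component under specification (not part of SimpleModeling).
final case class Account(balance: Long):
  def withdraw(amount: Long): Either[String, Account] =
    if amount <= 0 then Left("amount must be positive")
    else if amount > balance then Left("insufficient balance")
    else Right(Account(balance - amount))

class AccountSpec extends AnyWordSpec with Matchers:
  "Account.withdraw" should {
    // Return value for valid usage
    "return the reduced balance for a valid amount" in {
      Account(100).withdraw(30) shouldBe Right(Account(70))
    }
    // Precondition on the argument
    "reject a non-positive amount" in {
      Account(100).withdraw(0) shouldBe Left("amount must be positive")
    }
    // Error condition
    "reject an amount above the balance" in {
      Account(100).withdraw(101) shouldBe Left("insufficient balance")
    }
  }
```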

On the other hand, from a top-down perspective, we start from requirement specifications and create executable specifications for validation, confirming that the system as a whole behaves correctly in a semantic sense.

Scenario tests: Describe requirements as scenarios and express use-case preconditions, actions, and outcomes directly in executable form, thereby validating that the requirements are satisfied.

These are not merely tests for quality assurance.

Fixing design specifications through bottom-up verification
Grounding requirement specifications through top-down validation

From these two directions, they are positioned as a fundamental structure for maintaining specifications in executable form.

This two-layer structure enables early detection of gaps between implementation, design, and requirements in the form of executable specifications, allowing both humans and AI to share the same criteria for judgment.

Executable Specification

The purpose of creating executable specifications can be summarized into the following three points.

Clearly describe the specification
Verify that the implementation behaves according to the specification
Serve as examples that demonstrate correct usage of the functionality

An executable specification is not merely test code.

It can be read by humans as a specification
Its correctness is determined by execution
The execution result itself serves as an explanation of the specification

More importantly, executable specifications form a specification representation that is easy for AI to understand and reference.

Specifications written in natural language can be flexibly interpreted by humans, but for AI they leave too much room for interpretation, making it difficult to accurately determine which parts are mandatory requirements.

In contrast, in executable specifications:

Inputs and outputs are explicitly defined
Success and failure conditions are fixed as code
Judgment is made based on execution results

As a result, AI can clearly grasp what constitutes correct behavior and where the boundaries of the specification lie.
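
To illustrate how a boundary gets fixed as code, consider a hypothetical rule stated in prose as "large orders ship for free". The threshold of 5000, the fee of 500, and the function `shippingFee` are assumptions made for this sketch.

```scala
import org.scalatest.wordspec.AnyWordSpec
import org.scalatest.matchers.should.Matchers

// Hypothetical rule: orders of 5000 or more ship for free.
// A phrase such as "over 5000" would leave open whether 5000 itself qualifies.
def shippingFee(orderTotal: Int): Int =
  if orderTotal >= 5000 then 0 else 500

class ShippingFeeSpec extends AnyWordSpec with Matchers:
  "Shipping fee" should {
    "be 500 just below the threshold" in {
      shippingFee(4999) shouldBe 500
    }
    "be 0 exactly at the threshold" in {
      shippingFee(5000) shouldBe 0
    }
  }
```

The two cases on either side of the threshold remove the ambiguity that the prose version leaves open.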

An executable specification therefore plays a dual role:

A specification for understanding and verification for humans
A specification representation that AI can use as a criterion for judgment

In this way, executable specifications are a practical and reliable form of expression that allows humans and AI to share the same specifications.

Example of an Executable Specification

A concrete example of an Executable Specification is explained below.

Program Example

The program below represents API inputs, outputs, preconditions, and required specifications directly as executable tests.

With this format, the specification can be read as documentation while its correctness can be verified through execution.

In the code examples that follow, the test code itself functions as a specification that explicitly states what the API must satisfy.

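import org.scalatest.GivenWhenThen
import org.scalatest.matchers.should.Matchers
import org.scalatest.wordspec.AnyWordSpec

class PingSpec extends AnyWordSpec with Matchers with GivenWhenThen {
  "Ping API" should {
    "return pong" in {
      // Precondition: the component under specification
      Given("a ping API")
      val api = PingApi()
      // Action
      When("ping is called")
      val result = api.ping()
      // Expected outcome
      Then("pong is returned")
      result shouldBe "pong"
    }
  }
}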

The test program also uses ScalaTest’s dedicated syntax (leveraging Scala’s DSL capabilities), making the specification description and the test execution parts clearly distinguishable.

Even from this alone, it is easy to understand what is being tested.
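
The specification refers to `PingApi`, whose definition is not part of the text. A minimal implementation that would satisfy it could look like the sketch below; this is an assumption for illustration, not the author's actual code.

```scala
// Hypothetical implementation; the original text does not show PingApi.
class PingApi:
  // Returns the fixed string that the specification expects.
  def ping(): String = "pong"
```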

Execution Example

When the Executable Specification is executed, the output appears as follows.

PingSpec:
Ping API
- should return pong
Given a ping API
When ping is called
Then pong is returned
Run completed in 120 milliseconds.
Total number of tests run: 1
Tests: succeeded 1, failed 0

This execution result does more than indicate that the tests have passed:

The test names serve as summaries of the API’s behavior
Specification sentences in the structured Given–When–Then format are output

As a result, the output itself reads as a form of specification text.

When the descriptions are written in English, the key point is that the output consists of sentences that can be read directly as natural English.

This is precisely what makes it an Executable Specification.

The output format makes it easy for both humans and AI to grasp what the specification actually is.

Key Points of Executable Specifications

Here, from the perspective of how to write Executable Specifications, we summarize how to use each ScalaTest feature appropriately.

The important point is that these are not merely test notations, but expressive techniques for conveying specifications to both humans and AI.

AnyWordSpec

By using AnyWordSpec, specifications can be written in a DSL style that is close to natural language.

Because the part to be read as a specification (what) and the programmatic part that performs verification (how) are clearly separated, the outline of the specification can be grasped quickly.

This structure is not only easy for humans to read, but also makes it easier for AI to judge “which parts are specifications and which parts are implementation logic.”
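
The separation between "what" and "how" can be sketched as follows. `Counter` is a hypothetical component defined only to give the specification something to describe.

```scala
import org.scalatest.wordspec.AnyWordSpec
import org.scalatest.matchers.should.Matchers

// Hypothetical component, used only to have something to specify.
final class Counter(val value: Int = 0):
  def increment: Counter = Counter(value + 1)

class CounterSpec extends AnyWordSpec with Matchers:
  // The strings form the "what": they read as specification sentences.
  "A Counter" when {
    "newly created" should {
      "have value 0" in {
        // The body is the "how": the code that verifies the sentence.
        Counter().value shouldBe 0
      }
    }
    "incremented twice" should {
      "have value 2" in {
        Counter().increment.increment.value shouldBe 2
      }
    }
  }
```

Reading only the string literals from top to bottom gives the outline of the specification without looking at any verification code.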

PBT

By using PBT (Property-Based Testing), it becomes possible to fix as specifications the properties that must always hold, rather than concrete input examples.

import org.scalacheck.{Prop, Properties}

object ListProperties extends Properties("List") {
  property("reverse twice returns original") =
    Prop.forAll { (xs: List[Int]) =>
      xs.reverse.reverse == xs
    }
}

This approach is primarily used for bottom-up verification of component and class specifications.

By describing properties directly rather than enumerating concrete examples, the essence of the specification becomes clearer, and it becomes easier for AI to grasp the invariants that must be preserved.
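
Since ScalaTest is the adopted framework, properties can also be written inside an AnyWordSpec. The sketch below assumes that the scalatestplus ScalaCheck integration, which provides `ScalaCheckPropertyChecks`, is available as a dependency; the chosen properties are examples, not ones taken from SimpleModeling.

```scala
import org.scalatest.wordspec.AnyWordSpec
import org.scalatest.matchers.should.Matchers
import org.scalatestplus.scalacheck.ScalaCheckPropertyChecks

class ListPropertySpec extends AnyWordSpec with Matchers with ScalaCheckPropertyChecks:
  "List.reverse" should {
    // Invariant: reversing never changes the number of elements.
    "preserve length" in {
      forAll { (xs: List[Int]) =>
        xs.reverse.length shouldBe xs.length
      }
    }
    // Algebraic law relating reverse and concatenation.
    "distribute over concatenation in swapped order" in {
      forAll { (xs: List[Int], ys: List[Int]) =>
        (xs ++ ys).reverse shouldBe (ys.reverse ++ xs.reverse)
      }
    }
  }
```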

GivenWhenThen

By using the GivenWhenThen trait, it is possible to describe specifications with a clear causal relationship between preconditions, actions, and results.

In SimpleModeling, Given / When / Then is used in both component tests and scenario tests.

Given("a registered user")
When("the user submits a valid request")
Then("the system returns a success response")

This format serves as a fundamental pattern for grounding scenarios written as requirement specifications directly into the implementation as Executable Specifications.
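
Embedded in a complete scenario test, those three steps might look like the sketch below. `UserService`, `Response`, and the rule that a request is valid when it is non-empty are hypothetical stand-ins for a real system under test.

```scala
import org.scalatest.GivenWhenThen
import org.scalatest.matchers.should.Matchers
import org.scalatest.wordspec.AnyWordSpec

// Hypothetical system under test, defined here only to keep the sketch self-contained.
enum Response:
  case Success, Rejected

final class UserService(registered: Set[String]):
  def submit(user: String, request: String): Response =
    if registered.contains(user) && request.nonEmpty then Response.Success
    else Response.Rejected

class SubmitRequestScenarioSpec extends AnyWordSpec with Matchers with GivenWhenThen:
  "Submitting a request" should {
    "succeed for a registered user with a valid request" in {
      Given("a registered user")
      val service = UserService(Set("alice"))

      When("the user submits a valid request")
      val response = service.submit("alice", "renew subscription")

      Then("the system returns a success response")
      response shouldBe Response.Success
    }
  }
```

Each Given / When / Then sentence sits directly above the code that realizes it, which keeps the requirement text and its executable counterpart side by side.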

Matchers

By leveraging Matchers, verification logic can be expressed using domain-specific terminology.

result should beAccepted
result should haveStatus(Success)

Such expressions do not merely perform true-or-false judgments; they explicitly state what is expected as a specification.

When component developers themselves provide Matchers, Executable Specifications become more specification-like, making the intent easier to understand for both humans and AI.
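
A component developer could provide such matchers roughly as follows. The `Result` and `Status` types and the `ResultMatchers` trait are assumptions, since the text does not define them; `Matcher` and `MatchResult` are ScalaTest's own types.

```scala
import org.scalatest.matchers.{MatchResult, Matcher}
import org.scalatest.matchers.should.Matchers
import org.scalatest.wordspec.AnyWordSpec

// Hypothetical domain model; the original text does not define these types.
enum Status:
  case Success, Failure

final case class Result(status: Status, accepted: Boolean)

// Matchers a component developer might ship alongside the component.
trait ResultMatchers:
  def beAccepted: Matcher[Result] = Matcher { (r: Result) =>
    MatchResult(r.accepted, s"$r was not accepted", s"$r was accepted")
  }

  def haveStatus(expected: Status): Matcher[Result] = Matcher { (r: Result) =>
    MatchResult(
      r.status == expected,
      s"$r did not have status $expected",
      s"$r had status $expected"
    )
  }

class ResultSpec extends AnyWordSpec with Matchers with ResultMatchers:
  "A successful result" should {
    "be accepted with status Success" in {
      val result = Result(Status.Success, accepted = true)
      result should beAccepted
      result should haveStatus(Status.Success)
    }
  }
```

The failure messages are part of the design: when a specification fails, the report states the violated expectation in domain terms rather than as a bare boolean mismatch.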

Trinity

In development that assumes AI assistance, a development style that cultivates the following three simultaneously is effective:

Written specifications
Implementation code
Executable Specifications

What is important here is not to arrange them linearly as phases, but to evolve them through constant cross-reference.

Written specifications serve as an entry point for sharing intent, background, and context between humans and AI.

Implementation code embodies that intent as concrete behavior.

Executable Specifications fix, in an executable form, what requirements that behavior actually satisfies.

These three are not independent. A failure in an Executable Specification may indicate any of the following:

An implementation error
Ambiguity in the design
Flaws in the analysis model itself

As a result, development proceeds as a repeated up-and-down iteration:

Moving from implementation to Executable Specifications
Moving from Executable Specifications back to design and analysis models

Through this cycle, analysis models become grounded in real behavior, and implementations gain semantic justification.

This development style is different from TDD (Test-Driven Development).

The purpose of Executable Specifications is not to drive implementation step by step, but to continuously fix the meaning and boundaries of the specification.

The goal is to maintain a state in which the three do not contradict each other, and to detect discrepancies early and feed them back into analysis models, design, and implementation.

This is precisely the practical quality assurance of the AI era and the essential value of the trinity development style.

Summary

An Executable Specification is not merely a technique for testing.

Rather, it is a shared point of convergence for cultivating all three of the following simultaneously:

Written specifications
Analysis and design models
Implementation code

It grounds analysis models in implementation and feeds the behavior of the implementation back into the models.

By establishing this cycle, humans and AI can share an understanding of what constitutes the correct specification.

A trinity development style centered on Executable Specifications forms the foundation for sustainable and reliable quality assurance in the AI era.

References

Glossary

executable specification: An executable specification is a specification expressed in a form that can be executed to determine correctness.
BDD (Behavior-Driven Development): Behavior-Driven Development (BDD) is a development approach that focuses on specifying system behavior through scenarios written in a shared language.
bug: A colloquial term referring to software problems. It has no strict technical definition and is often used broadly to cover Defects, Faults, or Failures.
Component: A software construct that encapsulates well-defined responsibilities, contracts, and dependencies as a reusable and replaceable unit. In the logical model, it serves as an abstract structural unit; in the physical model, it corresponds to an implementation or deployment unit.
verification: Verification is the activity of confirming that an implementation conforms to its specified design or requirements.
error: A generic term used broadly in practice. In software engineering, it is ambiguous and may denote bugs or failures in general. In SimpleModeling, Error is treated as a broad label, with specifics clarified as Mistake, Defect, Fault, Failure, or Deviation.
validation: Validation is the activity of confirming that a system or product fulfills its intended use and stakeholder requirements.
DSL (Domain-Specific Language): A DSL (Domain-Specific Language) is a language designed for a particular domain, enabling direct and concise expression of the domain’s concepts and structures. Compared to general-purpose programming languages (GPLs), DSLs offer a higher level of abstraction tailored for domain-specific problem solving and automation.
TDD (Test-Driven Development): Test-Driven Development (TDD) is a development practice in which tests are written before implementation, and the code is evolved by repeatedly making tests pass and refactoring.