Why I Built OpenCaw: A Shared Instruction Layer for AI-Assisted Development
AI coding tools are getting better fast, but I kept running into the same problem: every agent session could behave a little differently.
One session would write tests. Another would skip them. One agent would update architecture notes. Another would leave the knowledge buried in chat history. One session would wait before opening a PR. Another would push ahead without the same level of verification.
That inconsistency is what led me to build OpenCaw.
OpenCaw is an open source framework library for AI-assisted development. I designed it to standardize agent instructions, roles, skills, reusable commands, architecture guidance, task tracking, verification, and project memory across tools such as Cursor, Codex, and Claude.
In simpler terms: OpenCaw gives your AI coding agents a consistent way to work. You can use it by speaking naturally about what you want, or you can call specific roles, skills, and commands when you want precision. The explicit names are there for control and repeatability, not because every user should have to memorize them.
What Is OpenCaw?
OpenCaw is not a traditional application framework. It does not dictate whether your app should use .NET, Node, Python, React, Angular, Terraform, Kubernetes, or any other stack.
Instead, OpenCaw is a shared AI development baseline that you mount inside an existing repository.
I built it around a simple idea: the reusable AI operating model should live in one place, while project-specific knowledge should stay inside the project.
OpenCaw gives agents a structured operating model that includes:
- Role definitions
- Skill instructions
- Reusable shell commands
- Architecture templates
- Task tracking conventions
- PR readiness rules
- Verification workflows
- Project-local memory files
The mounted OpenCaw folder acts as the shared configuration layer. The host repository keeps its own memory, rules, debug notes, tasks, goals, and project-specific context under .ai/.
That separation matters. I want teams to be able to improve their shared AI workflow without polluting the baseline with one repository's local quirks.
Why I Built It
I created OpenCaw because I wanted AI-assisted development to feel less like improvisation and more like engineering.
Prompting alone is not enough for long-lived repositories. A good prompt can get a useful answer once, but it does not automatically give you repeatable behavior across future sessions, other tools, or other team members.
Without a shared baseline, AI sessions tend to drift:
- One agent writes tests; another skips them.
- One agent updates architecture notes; another ignores them.
- One agent opens PRs too early; another waits for confirmation.
- One agent stores useful lessons in the wrong place.
- One agent understands the stack; another has to rediscover it.
- One agent verifies the work; another simply says it is done.
OpenCaw is my answer to that drift.
The goal is not to create one magic prompt. The goal is to create a durable operating manual for AI-assisted software work: how agents should plan, implement, verify, document, and hand off changes.
How OpenCaw Is Structured
I designed OpenCaw to be installed into a repository as a tool directory such as .codex, .cursor, or .claude.
A typical install looks like this:
your-repository/
├── .codex/
│ ├── AGENTS.md
│ ├── .architecture/
│ ├── .roles/
│ ├── commands/
│ └── skills/
├── .ai/
│ ├── MEMORY.md
│ ├── RULES.md
│ ├── DEBUG.md
│ └── tasks/
├── AGENTS.md
└── ARCHITECTURE.md
The important separation is:
| Area | Purpose |
|---|---|
| .codex/, .cursor/, or .claude/ | Shared OpenCaw baseline |
| .ai/ | Project-local memory, rules, debug notes, tasks, goals, and reports |
| AGENTS.md | Host repository bootstrap that tells the agent to load OpenCaw |
| ARCHITECTURE.md | Canonical architecture contract for the host repository |
OpenCaw includes architecture templates under .architecture/, role definitions under .roles/, reusable task instructions under skills/, and shell automation under commands/.
The repository includes templates and conventions for many common engineering areas, including .NET, Node, Python, Playwright, React, Angular, Vue, Terraform, Kubernetes, Helm, GitHub Actions, Azure DevOps, Docker, databases, and more.
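Because the same baseline can be mounted as .codex, .cursor, or .claude, scripts that call into it should not hardcode a single mount name, even though the command examples in this article use .codex. Here is a small sketch, assuming the mount sits at the repository root and contains the AGENTS.md shown in the tree above. The function name find_opencaw_mount is hypothetical and is not part of OpenCaw.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with OpenCaw): print the first OpenCaw
# mount found under a repository root, checking the three documented names.
find_opencaw_mount() {
  local root="${1:-.}" mount
  for mount in .codex .cursor .claude; do
    # Require AGENTS.md so a tool's own config folder (for example a plain
    # .cursor directory) is not mistaken for the OpenCaw baseline.
    if [ -f "$root/$mount/AGENTS.md" ]; then
      printf '%s\n' "$mount"
      return 0
    fi
  done
  echo "no OpenCaw mount found under $root" >&2
  return 1
}
```

A script can then resolve the path once, for example: mount=$(find_opencaw_mount) && "./$mount/commands/validate-opencaw.sh".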
How to Install OpenCaw
Before installing OpenCaw into a project, I recommend forking the repository.
You can install directly from the public repository, but using your own fork gives your team control over updates, local customizations, release pinning, and upstream merges.
In the examples below, replace <your-org> with your GitHub user or organization.
Option 1: Install as a Git Submodule
This is the approach I recommend for most teams.
Use .codex if you want OpenCaw available to Codex-style agents:
git submodule add https://github.com/<your-org>/OpenCaw .codex
git submodule update --init --recursive
Use .cursor if you want OpenCaw available to Cursor:
git submodule add https://github.com/<your-org>/OpenCaw .cursor
git submodule update --init --recursive
Use .claude if you want OpenCaw available to Claude:
git submodule add https://github.com/<your-org>/OpenCaw .claude
git submodule update --init --recursive
A submodule is useful because your host repository can pin a known-good OpenCaw revision while still allowing controlled updates later.
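If you install OpenCaw into several repositories, the three submodule command pairs above can be wrapped in one function that refuses unknown mount names. This is a sketch. The names install_opencaw and is_supported_mount are hypothetical, and the function assumes you run it from the root of the host repository.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the submodule commands above (not part of OpenCaw).

# Succeeds only for the mount names this article documents.
is_supported_mount() {
  case "$1" in
    .codex|.cursor|.claude) return 0 ;;
    *) return 1 ;;
  esac
}

# install_opencaw <github-user-or-org> [mount]
# Adds your OpenCaw fork as a submodule at the chosen mount (default .codex).
install_opencaw() {
  if [ $# -lt 1 ]; then
    echo "usage: install_opencaw <org> [mount]" >&2
    return 2
  fi
  local org="$1" mount="${2:-.codex}"
  if ! is_supported_mount "$mount"; then
    echo "unsupported mount '$mount' (use .codex, .cursor, or .claude)" >&2
    return 1
  fi
  git submodule add "https://github.com/${org}/OpenCaw" "$mount" &&
    git submodule update --init --recursive
}
```

For example, install_opencaw my-org .cursor, replacing my-org with your GitHub user or organization.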
Option 2: Install as a Clone
Use a plain clone when the repository needs a heavily customized copy of OpenCaw:
git clone https://github.com/<your-org>/OpenCaw .codex
Or:
git clone https://github.com/<your-org>/OpenCaw .cursor
Or:
git clone https://github.com/<your-org>/OpenCaw .claude
This is easier to modify directly, but it also makes upstream updates less structured than a submodule.
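With either install method, a quick preflight can confirm that both halves of the bootstrap are in place: the host AGENTS.md and the mounted baseline's own AGENTS.md from the tree above. This is a minimal sketch. The function name is hypothetical, and it only checks that the files exist, not what they contain.

```shell
#!/usr/bin/env bash
# Hypothetical preflight (not part of OpenCaw): check that the host bootstrap
# and the mounted baseline are both present. It checks existence only.
opencaw_preflight() {
  local mount="${1:-.codex}" missing=0 file
  for file in AGENTS.md "$mount/AGENTS.md"; do
    if [ ! -f "$file" ]; then
      echo "missing: $file" >&2
      missing=1
    fi
  done
  # A submodule directory can exist but be empty if it was never initialized.
  if [ -d "$mount" ] && [ -z "$(ls -A "$mount")" ]; then
    echo "$mount is empty; run: git submodule update --init --recursive" >&2
    missing=1
  fi
  return "$missing"
}
```

Run it from the host repository root, for example opencaw_preflight .claude.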
How I Recommend Using OpenCaw
Once installed, OpenCaw should load through the repository's AGENTS.md bootstrap.
If your agent session does not appear to pick it up, explicitly ask it to read the instructions:
Read AGENTS.md instructions
From there, the most important thing to know is this: you do not have to memorize every role, skill, command, task convention, or goal-flow phrase.
I include exact role names, skill names, and command paths because they are useful. They make workflows auditable, scriptable, repeatable, and easier to debug. But OpenCaw is also designed to work from normal human intent.
You can say what you want in plain English:
Review this project for security issues, create tasks for anything serious, and run whatever validation makes sense before summarizing the results.
OpenCaw gives the agent enough structure to translate that into the right operating mode: likely a security role, task creation, issue management if relevant, validation commands, and an evidence-based final summary.
That means there are two equally valid ways to use OpenCaw:
| Method | What It Looks Like | Best For |
|---|---|---|
| Plain-language intent | Review this API for security and reliability, create tasks for the high-risk items, and verify your findings. | Everyday usage, onboarding, exploratory work, and users who do not want to memorize syntax |
| Explicit controls | use role security-engineer + sre, use skill create-task-file, then run the relevant validation commands | Repeatable workflows, advanced users, debugging, automation, and team conventions |
My recommendation is to start with plain language. Add explicit roles, skills, or command paths only when you need more control.
1. Start With Intent
The most natural way to use OpenCaw is to describe the outcome you want.
For example:
Help me implement this feature safely. Plan the work, update the code, add tests, run the right checks, and tell me what changed.
That is enough for an OpenCaw-aware agent to choose the right workflow. It can infer that this is implementation work, that task planning may be useful, that test and validation behavior matters, and that the final answer should include evidence.
This is the mode I want most people to use day to day. OpenCaw should make the agent easier to direct, not harder.
2. Use Roles When You Want a Specific Perspective
Roles tell the agent which perspective to use.
OpenCaw includes roles such as:
- backend-architect
- frontend-developer
- fullstack-engineer
- qa-engineer
- project-manager
- security-engineer
- sre
- devops-automator
- data-engineer
- technical-writer
The role index also supports aliases such as security, qa, frontend, backend, devops, sre, and pm.
You can be explicit:
use role security-engineer and review this repository for authentication, authorization, and dependency risks
Or you can say the same thing naturally:
Review this repository like a security engineer. Focus on authentication, authorization, and dependency risks.
Both are valid. The explicit version is useful when you want more predictable, repeatable behavior. The natural-language version is easier to remember.
3. Compose Perspectives Without Memorizing Role Names
OpenCaw supports multi-role sessions. The first role acts as the primary perspective. Additional roles add specialist constraints or review lenses.
Explicit version:
use role backend-architect + security-engineer and review the payment API design before implementation
Plain-language version:
Review the payment API design from both an architecture and security perspective before we implement it.
I use this pattern often when I want implementation work to carry a second concern, such as quality, security, reliability, or documentation.
You do not have to know the exact role names. Describe the perspectives you want, and OpenCaw gives the agent a structure for mapping those perspectives into its role system.
4. Let Skills Stay Behind the Curtain
Skills are reusable instructions for common types of work.
I separate skills from commands on purpose:
- Skills define the reasoning workflow.
- Commands define repeatable execution.
You can call skills explicitly:
use skill create-task-file + manage-task-issues + test-dotnet
But you can also just describe the work:
Turn this GitHub issue into a tracked task, implement the fix, link the task back to the issue, and run the relevant .NET tests.
That plain-language prompt gives the agent enough signal to use the task-file, issue-management, and .NET testing workflows without requiring you to remember the internal skill names.
Useful skills include:
| Skill | Use It For |
|---|---|
| create-task-file | Creating durable task instructions |
| manage-task-issues | Syncing task tracking with GitHub issues |
| orchestrate-subagents | Planning safe parallel work |
| goal-flow | Coordinating multi-task delivery |
| test-dotnet | Running .NET verification |
| playwright-e2e-tests | Browser-level verification |
| clean-context | Compacting long-running project context |
Skills make agent behavior less dependent on one-off prompt wording. The skill names are helpful labels, but they are not the only way to trigger the behavior.
5. Use Commands for Repeatable Execution
Commands are shell scripts inside the OpenCaw baseline.
Examples:
./.codex/commands/validate-opencaw.sh
./.codex/commands/generate-architecture.sh DOTNET NODE GITHUB_ACTIONS
./.codex/commands/dotnet-build.sh
./.codex/commands/dotnet-test.sh
I like showing the full command paths because they make it clear what is actually available in the repository. They are useful when you are running something manually, writing documentation, debugging a workflow, or wiring OpenCaw into automation.
But you do not need to remember those paths during normal agent usage.
Instead of saying this:
Run ./.codex/commands/dotnet-build.sh and ./.codex/commands/dotnet-test.sh
You can say this:
Build the .NET project and run the relevant tests before summarizing the result.
Instead of saying this:
Run ./.codex/commands/generate-architecture.sh DOTNET REACT GITHUB_ACTIONS
You can say this:
Create an architecture document for this .NET and React project that uses GitHub Actions.
OpenCaw includes commands for validation, architecture generation, .NET restore/build/test, Playwright evidence reporting, task files, issue linking, PR readiness, goal files, sub-agent planning, security scans, and database tooling.
The point is not to make users memorize shell scripts. The point is to give the agent reliable tools it can choose when you describe the outcome you want.
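When you do wire these commands into automation yourself, the usual pattern is to run them in order and stop at the first failure, so a failed build is never reported alongside "passing" tests. Here is a sketch, assuming each script signals failure with a non-zero exit status (an assumption worth checking per script). The name run_opencaw_checks is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical runner (not part of OpenCaw): run scripts from one commands
# directory in order, stopping at the first failure.
# Assumes each script exits non-zero when it fails.
run_opencaw_checks() {
  if [ $# -lt 2 ]; then
    echo "usage: run_opencaw_checks <commands-dir> <script>..." >&2
    return 2
  fi
  local dir="$1" script status
  shift
  for script in "$@"; do
    if [ ! -x "$dir/$script" ]; then
      echo "not found or not executable: $dir/$script" >&2
      return 1
    fi
    echo ">> $script"
    "$dir/$script"
    status=$?
    if [ "$status" -ne 0 ]; then
      echo "FAILED: $script (exit $status); later checks skipped" >&2
      return "$status"
    fi
  done
  echo "all checks passed"
}
```

For example: run_opencaw_checks ./.codex/commands dotnet-build.sh dotnet-test.sh.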
6. Let OpenCaw Create Durable Task Artifacts
For substantial work, OpenCaw uses .ai/tasks/ in the host repository.
A typical task structure looks like this:
.ai/tasks/
├── TODO.md
├── OPEN_ISSUES.md
└── checkout-refactor/
├── TASK.md
└── SUBAGENTS.md
I use this structure because chat history is not enough for serious engineering work.
The task directory gives the repository a durable trail:
- What the task is
- What issue it maps to
- What needs to be done
- Which sub-agent lanes were planned
- What remains open
- What validation happened
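To see what such a task file might hold, here is a minimal scaffold whose sections mirror the trail above. OpenCaw has its own task-file skill and commands. This sketch does not reproduce them, and the new_task function, its arguments, and the section headings are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical task scaffold (not OpenCaw's own task-file command).
# new_task <slug> [issue]  creates .ai/tasks/<slug>/TASK.md in the current repo.
new_task() {
  local slug="$1" issue="${2:-none}"
  case "$slug" in
    ""|*/*|.*) echo "invalid task slug: '$slug'" >&2; return 1 ;;
  esac
  local dir=".ai/tasks/$slug"
  if [ -e "$dir/TASK.md" ]; then
    echo "task already exists: $dir/TASK.md" >&2
    return 1
  fi
  mkdir -p "$dir" || return 1
  # One heading per item in the durable trail described above.
  cat > "$dir/TASK.md" <<EOF
# Task: $slug

## Summary
## Issue
$issue
## Work Items
## Sub-agent Lanes
## Open Items
## Validation
EOF
  echo "$dir/TASK.md"
}
```

For example, new_task checkout-refactor '#123' creates .ai/tasks/checkout-refactor/TASK.md and refuses to overwrite an existing task.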
You can request this explicitly:
use skill create-task-file and create a task under .ai/tasks for this refactor
Or you can say it naturally:
This refactor is large enough that I want a durable task plan before implementation.
Both should lead the agent toward task planning. The task system exists so work can be paused, resumed, reviewed, and handed off without relying only on chat history.
7. Use Architecture Templates Early
If your repository does not have an ARCHITECTURE.md, OpenCaw can generate one from selected templates.
Manual command version:
./.codex/commands/generate-architecture.sh DOTNET REACT GITHUB_ACTIONS
Explicit agent version:
use role software-architect and generate ARCHITECTURE.md for this repository using DOTNET, REACT, POSTGRESDB, and GITHUB_ACTIONS templates
Plain-language version:
Create an architecture document for this repo. It is a .NET backend with a React frontend, Postgres storage, and GitHub Actions for CI.
By default, I keep the generated architecture file concise by referencing selected templates instead of inlining all template content. The architecture generation script also supports an --inline option when full embedded template content is explicitly needed.
My recommendation is to create or update architecture documentation early. It gives the agent a contract to preserve instead of forcing every session to rediscover the system from scratch.
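Because ARCHITECTURE.md is meant to be a stable contract, an onboarding script should generate it only when it is missing, not overwrite it. Here is a sketch that wraps the documented generate-architecture.sh command. The name ensure_architecture is hypothetical, and the sketch assumes the script writes ARCHITECTURE.md at the repository root, which the article does not state explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of OpenCaw): generate ARCHITECTURE.md from
# the selected templates only if the repository does not already have one.
# ensure_architecture <mount> TEMPLATE...
ensure_architecture() {
  if [ $# -lt 1 ]; then
    echo "usage: ensure_architecture <mount> TEMPLATE..." >&2
    return 2
  fi
  local mount="$1"
  shift
  if [ -f ARCHITECTURE.md ]; then
    echo "ARCHITECTURE.md exists; leaving it unchanged"
    return 0
  fi
  if [ $# -eq 0 ]; then
    echo "no templates given (e.g. DOTNET REACT GITHUB_ACTIONS)" >&2
    return 1
  fi
  "./$mount/commands/generate-architecture.sh" "$@"
}
```

For example: ensure_architecture .codex DOTNET REACT GITHUB_ACTIONS.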
8. Use Sub-Agent Planning Only When Work Can Be Split Safely
OpenCaw includes project-manager and sub-agent orchestration support.
Sub-agent planning is useful when different lanes can work independently, such as:
- One lane investigates current behavior.
- One lane updates backend code.
- One lane updates frontend code.
- One lane writes tests.
- One lane reviews security or performance.
Explicit version:
use 4 agents with project-manager + fullstack-engineer + qa-engineer to split the checkout refactor into safe parallel lanes, keep write sets separate, then integrate and verify
Plain-language version:
This checkout refactor may be too large for one linear pass. Split it into safe parallel work lanes if that makes sense, keep overlapping file changes to a minimum, then integrate and verify everything together.
I do not recommend forcing parallel work just because multiple agents are available.
Good sub-agent planning should keep write sets narrow, avoid overlapping changes, and prefer fewer high-quality lanes over unnecessary fragmentation.
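One way to make "keep write sets separate" checkable is to have each lane list the files it plans to touch, then look for any path claimed by more than one lane before work starts. This sketch assumes one plain-text file per lane with one path per line. That format is hypothetical and is not necessarily how OpenCaw lays out SUBAGENTS.md.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of OpenCaw): print every path that appears in
# more than one lane file. Each lane file lists one repository path per line.
overlapping_writes() {
  local lane
  for lane in "$@"; do
    # Drop blank lines and de-duplicate within a lane, so a lane that lists
    # the same file twice does not count as overlapping with itself.
    grep -v '^[[:space:]]*$' "$lane" | sort -u
  done | sort | uniq -d
}
```

An empty result means the lanes are disjoint. Any printed path is a file that two lanes would both edit and should be reassigned to one of them.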
9. Use Goal Flow for Multi-Task Delivery, Not Auto-Merge
Goal flow is for explicit, multi-task automation.
It can move from task to task, raise PRs after validation, run post-PR QA, and produce a final approval report.
It does not mean automatic merge approval, and it does not skip QA.
Explicit version:
goal: modernize the reporting module across these five tasks. Automatically raise each task PR after validation, run post-PR QA, then continue to the next task. Do not merge PRs automatically.
Plain-language version:
I want to work through these five reporting-module tasks as one coordinated goal. For each task, validate the work, open a PR when it is ready, run post-PR QA, and continue to the next task only if the checks pass. Do not merge anything automatically.
Use goal flow when you have a clear sequence of tasks and want the agent to manage delivery discipline across them.
Do not use it when requirements are still vague or when each step still needs product approval.
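The sequencing rules of goal flow (validate, open a PR, run post-PR QA, stop on any failure, never merge) fit in a short loop. This sketch covers only that control flow, with the three steps passed in as commands. The run_goal function and the step commands are hypothetical, not OpenCaw's implementation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of goal-flow sequencing (not OpenCaw's implementation).
# run_goal <validate-cmd> <open-pr-cmd> <qa-cmd> <task>...
# Each step command receives the task name and must exit 0 on success.
run_goal() {
  if [ $# -lt 3 ]; then
    echo "usage: run_goal <validate> <open-pr> <qa> <task>..." >&2
    return 2
  fi
  local validate="$1" open_pr="$2" qa="$3" task
  shift 3
  for task in "$@"; do
    "$validate" "$task" || { echo "stop: validation failed for $task" >&2; return 1; }
    "$open_pr" "$task" || { echo "stop: could not open PR for $task" >&2; return 1; }
    "$qa" "$task" || { echo "stop: post-PR QA failed for $task" >&2; return 1; }
    # Deliberately no merge step: merging stays a human decision.
    echo "done: $task (PR open, not merged)"
  done
}
```

The important design choice is the absence of a merge step. The loop can only stop or leave PRs open for review.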
Plain-Language Intent vs Explicit Controls
Here is the simplest way to think about it:
Tell OpenCaw what outcome you want. Use exact names only when exactness helps.
The explicit names are still valuable. I keep them visible because teams need stable vocabulary for docs, automation, task files, pull requests, and repeatable workflows.
But the day-to-day user experience should feel natural.
| You Can Say This | OpenCaw Can Translate It Into |
|---|---|
| Review this like a security engineer. | A security-focused role and review workflow |
| Create a durable plan before coding. | Task-file planning under .ai/tasks/ |
| This is a multi-step initiative. | Goal-oriented planning and delivery flow |
| Split this up if parallel work is safe. | Project-manager and sub-agent orchestration behavior |
| Run the checks that matter for this stack. | Relevant validation, build, test, or evidence commands |
| Open a PR when it is validated, but do not merge it. | PR readiness flow with a human merge gate |
This is the balance I wanted OpenCaw to strike: plain-language control for humans, explicit machinery for agents and teams.
Example Prompts
Here are practical prompts I use or recommend after installing OpenCaw. Each example is written in plain language first, with an explicit version where the extra precision may help.
Repository Onboarding
Plain-language version:
Read the project instructions, inspect this repository, summarize the stack, identify missing architecture documentation, and recommend what should be standardized.
Explicit version:
Read AGENTS.md instructions. Then inspect this repository, summarize the stack, identify missing architecture documentation, and recommend the OpenCaw architecture templates that apply.
Architecture Generation
Plain-language version:
Create an architecture document for this repo. It has a .NET backend, React frontend, Postgres database, and GitHub Actions CI. Keep the document concise and easy for future agents to follow.
Explicit version:
use role software-architect and generate ARCHITECTURE.md using DOTNET, REACT, POSTGRESDB, and GITHUB_ACTIONS. Keep the generated file concise with template read directives.
Security Review
Plain-language version:
Review this repository for security risks. Focus on authentication, authorization, secrets, dependencies, logging, and deployment. Create tracked tasks for high-priority fixes and verify anything you change.
Explicit version:
use role security-engineer + sre and review this repository for authentication, authorization, secrets, dependency, logging, and deployment risks. Create a task file for each high-priority fix.
Test Coverage
Plain-language version:
Improve test coverage for this feature. Include edge cases, regression coverage, and a clear summary of the verification you ran.
Explicit version:
use role qa-engineer and generate full test coverage for the current feature, including edge cases, regression tests, and verification evidence.
Full Feature Delivery
Plain-language version:
Build the user settings page end to end. Plan the work, implement the API integration, add UI validation, update documentation if needed, run the right checks, and summarize the evidence.
Explicit version:
use role fullstack-engineer and build the user settings page with API integration, UI validation, tests, task tracking, architecture updates if needed, and final verification.
Existing GitHub Issue
Plain-language version:
Work on issue #123. Turn it into a tracked OpenCaw task, implement the fix, link the task back to the issue, add tests, run validation, and prepare it for PR review.
Explicit version:
work on #123. Import the issue into OpenCaw task tracking, implement the fix, add tests, run verification, and prepare PR readiness evidence.
Parallel Refactor
Plain-language version:
This checkout refactor is large. Split it into safe parallel work lanes only if that reduces risk. Keep file ownership clear, integrate the result, and verify the whole flow before calling it done.
Explicit version:
use 3 agents with project-manager + backend-architect + qa-engineer to plan and implement the checkout refactor. Split only safe parallel lanes, keep write sets separate, then integrate and verify.
Multi-Task Goal
Plain-language version:
Treat these five reporting tasks as one coordinated goal. Complete them in order, validate each one, open a PR when each task is ready, run post-PR QA, and stop if any check fails. Do not merge automatically.
Explicit version:
goal: complete the reporting modernization across the five tasks listed in TODO.md. Raise each task PR after validation, run post-PR QA, and stop if validation or QA fails. Do not merge PRs automatically.
Keeping OpenCaw Updated
If you installed OpenCaw as a submodule, update it with:
git submodule update --remote .codex
Or, for another mount name:
git submodule update --remote .cursor
git submodule update --remote .claude
For production repositories, I recommend pinning a known release tag rather than automatically following the latest commit.
Example:
cd .codex
git fetch --tags
git checkout 2.0.0
cd ..
git add .codex
git commit -m "chore(opencaw): pin OpenCaw 2.0.0"
Replace 2.0.0 with the release tag you want to pin.
This gives you the best of both worlds: centralized updates when you want them, and controlled changes when stability matters.
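If you pin releases across several repositories, the steps above fit in one function. This version also refuses to switch revisions while the mount has uncommitted edits, because git checkout can carry non-conflicting local changes onto the new revision. The pin_opencaw name is hypothetical, and the function uses only the git commands shown above plus git status.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OpenCaw): pin a mounted OpenCaw submodule
# to a release tag and record the new revision in the host repository.
# pin_opencaw <mount> <tag>
pin_opencaw() {
  local mount="$1" tag="$2"
  if [ -z "$mount" ] || [ -z "$tag" ]; then
    echo "usage: pin_opencaw <mount> <tag>" >&2
    return 2
  fi
  if [ ! -d "$mount" ]; then
    echo "no such mount: $mount" >&2
    return 1
  fi
  # Refuse to move the baseline while it has uncommitted edits.
  if [ -n "$(git -C "$mount" status --porcelain)" ]; then
    echo "$mount has uncommitted changes; commit or stash them first" >&2
    return 1
  fi
  git -C "$mount" fetch --tags &&
    git -C "$mount" checkout "$tag" &&
    git add "$mount" &&
    git commit -m "chore(opencaw): pin OpenCaw $tag"
}
```

For example, run pin_opencaw .codex 2.0.0 from the host repository root.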
Best Practices
Keep OpenCaw Shared, Keep Memory Local
Do not put project-specific lessons inside the OpenCaw baseline unless you are intentionally changing the shared framework.
Store project memory in .ai/.
Good places for project-local knowledge:
.ai/MEMORY.md
.ai/RULES.md
.ai/DEBUG.md
.ai/FRAGMENTS/
.ai/LEARNINGS/
.ai/tasks/
This is one of the most important design choices in OpenCaw. The baseline should be reusable. The host repository should own its local truth.
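A host repository that does not have this layout yet can be scaffolded with a few commands. This is a minimal sketch, not an OpenCaw command. It creates empty placeholders at the paths listed above and never overwrites files that already exist. The name scaffold_ai_memory is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical scaffold (not part of OpenCaw): create the project-local .ai/
# layout under a repository root without touching existing files.
scaffold_ai_memory() {
  local root="${1:-.}" file
  mkdir -p "$root/.ai/FRAGMENTS" "$root/.ai/LEARNINGS" "$root/.ai/tasks" || return 1
  for file in MEMORY.md RULES.md DEBUG.md; do
    # Create the file only if it is missing; existing memory is preserved.
    [ -e "$root/.ai/$file" ] || : > "$root/.ai/$file"
  done
}
```

Git does not track empty directories, so FRAGMENTS/, LEARNINGS/, and tasks/ will not appear in a commit until they contain at least one file.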
Validate After Changing Roles, Skills, or Commands
Run validation after editing OpenCaw internals:
./.codex/commands/validate-opencaw.sh
This is especially important if you customize roles, add skills, or introduce new commands.
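If you maintain a fork, you can make this automatic with a Git pre-commit hook in the fork itself, where the baseline folders sit at the repository root (.roles/, skills/, commands/). Here is a sketch. The function names are hypothetical, and the sketch assumes validate-opencaw.sh can be run from the fork root as ./commands/validate-opencaw.sh and exits non-zero when validation fails.

```shell
#!/usr/bin/env bash
# Hypothetical pre-commit logic for an OpenCaw fork (not part of OpenCaw).

# Reads paths on stdin; succeeds if any path is under roles, skills, or commands.
touches_opencaw_internals() {
  grep -Eq '^(\.roles|skills|commands)/'
}

# Runs validation only when the staged changes touch those folders.
opencaw_pre_commit() {
  if git diff --cached --name-only | touches_opencaw_internals; then
    ./commands/validate-opencaw.sh || {
      echo "OpenCaw validation failed; commit aborted" >&2
      return 1
    }
  fi
}
```

To use it, put both functions in the fork's .git/hooks/pre-commit, make the hook executable, and end the file with opencaw_pre_commit || exit 1.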
Prefer Clear Intent Over Memorized Syntax
Weak prompt:
fix this
Better plain-language prompt:
Fix this bug like a senior developer. Find the root cause, make the smallest safe change, add regression coverage, run the relevant checks, and summarize the evidence.
More explicit version:
use role senior-developer + qa-engineer and fix this bug. Find the root cause, implement the smallest safe change, add regression coverage, run verification, and summarize the evidence.
Both improved prompts give the agent a perspective, a quality bar, and a definition of done. The explicit role name is helpful, but the intent matters more than the syntax.
Ask for Evidence, Not Just Completion
A strong OpenCaw workflow ends with proof.
Good verification requests include:
Run the relevant tests and include command output in the final summary.
Verify the browser flow with Playwright and attach evidence to the PR comment.
Compare behavior before and after the change and document the difference.
I care about evidence because "done" is cheap. Verified is what matters.
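A request to include command output in the final summary is easier to honor when each check is recorded as it runs. Here is a sketch of a helper that runs one command and appends the command, its exit status, and its output to a Markdown evidence file. The record_evidence name and the file layout are hypothetical, not an OpenCaw convention.

```shell
#!/usr/bin/env bash
# Hypothetical evidence recorder (not part of OpenCaw).
# record_evidence <evidence-file> <command> [args...]
# Runs the command, appends a Markdown section with its output and exit
# status, and returns the command's own exit status.
record_evidence() {
  if [ $# -lt 2 ]; then
    echo "usage: record_evidence <file> <command> [args...]" >&2
    return 2
  fi
  local file="$1" output status
  shift
  output=$("$@" 2>&1)
  status=$?
  {
    printf '### `%s`\n\n' "$*"
    printf 'Exit status: %d\n\n' "$status"
    # Tilde fences keep the captured output in a code block.
    printf '~~~\n%s\n~~~\n\n' "$output"
  } >> "$file"
  return "$status"
}
```

For example, record_evidence evidence.md ./.codex/commands/dotnet-test.sh leaves a section that can be pasted into the summary or a PR comment.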
Use Goal Flow Carefully
Goal flow is powerful, but it works best when the task sequence is already clear.
Use it for planned delivery, not open-ended discovery.
Good goal-flow prompt:
goal: complete the migration plan across the five tasks already listed in TODO.md. Raise each PR after local validation, run post-PR QA, and stop if validation or QA fails. Do not merge PRs automatically.
Do Not Skip the PR Readiness Gate
In normal task flow, OpenCaw expects the agent to ask before pushing or opening a PR.
Goal flow is the explicit exception, and even then it still requires validation and post-PR QA.
That gate exists for a reason. AI-assisted development should move quickly, but it should not bypass engineering discipline.
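Outside goal flow, the gate amounts to an explicit yes before anything leaves the machine. Here is a minimal sketch of such a prompt placed in front of a push. The confirm_gate name is hypothetical. The git push and gh pr create commands in the usage note are real Git and GitHub CLI commands, shown only as an example of what might sit behind the gate.

```shell
#!/usr/bin/env bash
# Hypothetical confirmation gate (not part of OpenCaw): succeed only on an
# explicit "y" or "yes" answer; anything else, including no input, declines.
confirm_gate() {
  local prompt="${1:-Proceed?}" answer
  printf '%s [y/N] ' "$prompt" >&2
  if ! read -r answer; then
    answer=""
  fi
  case "$answer" in
    [Yy]|[Yy][Ee][Ss]) return 0 ;;
    *) echo "declined" >&2; return 1 ;;
  esac
}
```

For example: confirm_gate "Push this branch and open a PR?" && git push -u origin HEAD && gh pr create --fill. Defaulting to "no" means an unattended session cannot pass the gate by accident.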
When OpenCaw Is a Good Fit
OpenCaw is especially useful when:
- You use AI coding tools across multiple repositories.
- Your team wants consistent agent behavior.
- You want repeatable architecture and verification standards.
- You want durable task tracking instead of chat-only planning.
- You want role-based AI workflows for architecture, QA, DevOps, security, and full-stack work.
- You want a project memory system that does not pollute the shared baseline.
- You want controlled PR readiness and QA behavior.
It may be more structure than you need for a tiny throwaway project. But for long-lived repositories, especially team-owned codebases, that structure is the point.
Final Thoughts
I built OpenCaw because I wanted AI-assisted development to become more repeatable, more auditable, and more useful across real repositories.
OpenCaw does not replace your coding agent. It gives that agent better rails.
Installed as .codex, .cursor, or .claude, OpenCaw becomes a reusable baseline for how work should be planned, implemented, verified, documented, and handed off.
The result is a more governed AI workflow:
- Plain-language intent tells the agent what outcome you want.
- Roles define perspective when you need precision.
- Skills define reasoning workflows behind the scenes.
- Commands define repeatable execution.
- .ai/ stores project memory.
- Verification evidence keeps the process honest.
For teams trying to move from casual AI prompting to repeatable AI-assisted engineering, OpenCaw is the structure I wish I had earlier. That is why I built it.