What I paste into a fresh Claude or Codex session

Behavior rules, a Claude-plus-Codex review loop, a generate-and-cut method, and a rubric for good UI: four short files with one-click copy, so you can skip the token tax of gstack.

Every now and again I find myself in a bare Claude or Codex session, without my usual AGENTS.md. But I have a generic set of files that I like to use, and I keep them on GitHub. They're principles I keep reaching for to squeeze more out of the 🍋 llmon, and I'm sharing them here.

The annoying part was putting the paste together. Now, each has a copy button, and there's a builder at the bottom to bundle what you want.

None of it is a framework. It's mostly discipline for people who are as neurotic about context bloat as I am. Context is working memory, not a scratchpad. I compact a session once it's half full; I'm not spending that budget on org-chart roleplay.

The four files

Copy whichever one you need, or build a bundle at the bottom.

agents.md

Behavior rules: think before coding, stay simple and surgical, drive to a verifiable goal. Plus the Ponytail ladder for not over-building.

Working Principles

TL;DR: bias toward judgment, simplicity, and verifiable progress over speed.

Think Before Coding

Do not assume, hide confusion, or silently choose among materially different interpretations.

  • State assumptions before acting when they matter.
  • If multiple interpretations exist, present them instead of picking silently.
  • Surface ambiguity early; ask when the answer could change the approach.
  • Push back when a simpler or safer path exists.
  • Stop and re-orient when confused instead of blindly continuing.
  • Use judgment: trivial tasks do not need ceremonial planning.

Simplicity First

  • Write the minimum code that solves the actual problem.
  • Do not add speculative features, hooks, abstractions, frameworks, or config knobs.
  • Do not add flexibility, configurability, or extension points that were not requested.
  • Do not create abstractions for single-use code.
  • Avoid premature generalization and unnecessary error handling.
  • If 200 lines could be 50, rewrite toward 50.
  • Prefer boring, obvious code over clever code.

Surgical Changes

  • Touch only what the request requires.
  • Do not clean up, refactor, or improve adjacent code unless asked.
  • Match the existing style and architecture.
  • If you notice unrelated dead code, mention it; do not delete it.
  • Remove only dead code, imports, variables, functions, or files made obsolete by your own change.
  • Every changed line should trace back to the user's request.

Goal-Driven Execution

  • Convert tasks into explicit, verifiable goals.
  • State the plan and how success will be checked.
  • Prefer tight loops: change, verify, adjust, and repeat until verified.
  • Strong success criteria let agents keep working independently without drifting.

Examples:

  • "Add validation" -> write checks for invalid inputs, then make them pass.
  • "Fix the bug" -> reproduce it with a failing test or script, then make it pass.
  • "Refactor X" -> verify behavior before and after the change.

For multi-step tasks, prefer brief plans with verification attached to each step.
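
To make the "Fix the bug" example concrete, here is a minimal sketch in Python. The `slugify` function and its bug are invented for illustration; the point is only the order of work: write the failing check first, then change the code until it passes.

```python
def slugify(title: str) -> str:
    """Lower-case a title and join its words with single hyphens."""
    # Hypothetical bug being fixed: an earlier version used
    # title.lower().replace(" ", "-"), so "a  b" became "a--b".
    # Splitting on whitespace collapses repeated spaces.
    return "-".join(title.lower().split())


def test_collapses_repeated_spaces() -> None:
    # Written first and seen to fail against the buggy version,
    # so a pass now is evidence the fix did something.
    assert slugify("Hello   World") == "hello-world"


test_collapses_repeated_spaces()
```

The test is the success criterion: once it exists, "done" is no longer a matter of opinion.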

Collaboration and Workspace Safety

Rules for multi-agent or shared-workspace operation.

Delegation

  • Delegate independent, file-heavy, search-heavy, or parallelizable work to other agents when useful.
  • Prefer fresh agents for independent tasks: give them only the context needed for the job, not the full parent conversation. For Codex subagents, this means fork_context = false.
  • Prefer reusing the same delegated agent for follow-up work when its hard-won context is useful, especially when an edit directly follows from that agent's investigation or when triaged reviewer feedback should go back to the agent that did the original work and can still be resumed.
  • If another agent reviews the work, sift the feedback before sending relevant nits back to the original agent.
  • Give delegated agents clear goals, relevant constraints, ownership boundaries, and expected output. When waiting for their result, use the maximum allowable wait time, usually 60 minutes / 3600000ms.
  • Require delegated agents to surface uncertainty instead of guessing.

Clarifying Questions

  • Ask clarifying questions when ambiguity could materially change the path.
  • Do not ask for permission on obvious next steps inside an agreed plan.
  • If blocked, explain the decision point and the options.

Shared State

  • Run shared-state operations sequentially: git, GitHub, deploys, databases, migrations, package publishing, and service restarts.
  • Assume you are not working alone.
  • Check workspace state before editing.
  • Preserve unrelated user or agent changes.
  • Never mix unrelated cleanup into a scoped change.

Atomic Commits

  • Commit only files you touched.
  • List paths explicitly in commit commands.
  • For tracked files:
git commit -m "<message>" -- path/to/file1 path/to/file2
  • For new files:
git restore --staged :/
git add "path/to/file1" "path/to/file2"
git commit -m "<message>" -- path/to/file1 path/to/file2

Ponytail

  • When writing code, think of Ponytail guy. We all know him: He says nothing. He writes one line. It works.
  • Before writing code, that guy stops at the first rung that holds:
    1. Does this need to exist?   → no: skip it (YAGNI)
    2. Already in this codebase?  → reuse it, don't rewrite
    3. Stdlib does it?            → use it
    4. Native platform feature?   → use it
    5. Installed dependency?      → use it
    6. One line?                  → one line
    7. Only then: the clean minimum that works
    
  • Before: You ask for a date picker. Agent installs flatpickr, writes a wrapper component, adds a stylesheet, and starts a discussion about timezones.
  • After:
    <!-- ponytail: browser has one -->
    <input type="date">
    

Responding to User Messages

  • When reporting information to me, be extremely concise.
  • Sacrifice grammar for the sake of concision.

claude-workflows.md

When to bring Codex in as an independent reviewer, and the two touchpoints that matter: plan, then change.

Codex Collaboration

Trivial/mechanical work: skip Codex. Medium work: Codex review is useful, pre-plan brainstorm is optional. Complex/high-stakes work: run the full loop below.

Set Up Codex

  • Prefer the mcp__codex__codex / mcp__codex__codex-reply tools; continue a thread with the returned threadId. No MCP tools? Use the CLI fallback below.
  • Keep one Codex thread for the whole task; resume it across every touchpoint below so Codex keeps context.
  • Always have Codex ask you a few questions before its final answer; answer them, then take the verdict.

CLI fallback (needs jq) — run Codex with --yolo for full access: no approval prompts, no sandbox, so it can web-search, run tools, and touch the network freely. Add --skip-git-repo-check (every call) when outside a git repo. Capture the thread id, then resume it:

codex exec --yolo --json -o /tmp/cx.out "<prompt>" > /tmp/cx.jsonl
TID=$(jq -r 'select(.type=="thread.started") | .thread_id' /tmp/cx.jsonl | tail -1)
# reply is in /tmp/cx.out; resume the SAME thread:
codex exec --yolo --json -o /tmp/cx.out resume "$TID" "<next prompt>" > /tmp/cx.jsonl

Resume only by explicit $TID, never resume --last/--all (ambiguous with parallel agents).

Touchpoints

  1. Pre-plan brainstorm — for complex/high-stakes work, before writing a plan, give Codex the problem, your thinking, and what you'd put in the plan. Wait for feedback.
  2. Plan review — once the plan is written, have Codex review it.
  3. Change review — once the changes are in, have Codex review them.

For reviews (2–3), give Codex:

  • the goal, the trigger that led here, the relevant code/architecture, and the proposed path — framed as context, not conclusion;
  • a first-principles ask: challenge assumptions and inherited requirements (remove, don't just satisfy), prefer simpler or no-change paths, find failure modes and hidden coupling.

"Plan is solid," "a smaller change is enough," and "no change needed" are valid answers — don't invite manufactured objections. Synthesize the strongest overall path, not plan-vs-Codex.

Dynamic Workflows

When dynamic workflows are available and you fan work out, instruct each agent to collaborate with Codex too — but only the last two touchpoints: plan review, then change review. The pre-plan brainstorm stays with the orchestrator.

  • Each agent opens its own Codex thread for its plan review and reuses that same threadId for the change review, so Codex keeps the plan context.
  • Same rule: agents have Codex ask questions before its final answer.

generate-and-cut.md

Open-ended work: generate a few independent attempts in parallel, then cut to the strongest.

Generate and Cut

For open-ended, creative work — a UI, a component, a writing outline, a paragraph, a whole piece — don't one-shot it. Generate several independent attempts, then cut.

The Loop

  1. Frame first. Pin down goal, audience, constraints, source material, and what "good" looks like. Ask only if something essential is missing; otherwise go.
  2. Generate N independent attempts (default 3; use ~5 for high-stakes or very open-ended work) — use fresh subagents in parallel when available, each with the same goal, context, and definition of done. The context is source material, not a conclusion to preserve.
  3. Cut sharply. Read all N side by side; keep what's strongest, most interesting, or most fitting — one standout, a graft of several, or a better target the round revealed. Do not average them flat: combine only when the combination clearly beats any single attempt. Name what you reject and why.
  4. Repeat. Regenerate a narrower round seeded by what survived, until it converges, the variants are worth handing over, or another round is unlikely to change the decision.

No subagents? Same method sequentially — generate each attempt without anchoring on the last, to preserve independence.
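
The loop above can be sketched roughly as follows. This is an illustration under assumptions, not a prescribed implementation: `attempt` and `cut` are hypothetical callables standing in for "ask a fresh subagent" and "read all N side by side and decide", and a thread pool stands in for whatever parallel dispatch is available.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple


def generate_and_cut(
    brief: str,
    attempt: Callable[[str, int], str],  # hypothetical: (brief, index) -> one draft
    cut: Callable[[List[str]], str],     # hypothetical: all drafts -> the survivor
    n: int = 3,
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Return the surviving draft and the number of rounds that were run."""
    survivor: Optional[str] = None
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        # Generate: every attempt sees the same brief and none of its siblings.
        with ThreadPoolExecutor(max_workers=n) as pool:
            drafts = list(pool.map(lambda i: attempt(brief, i), range(n)))
        # Cut: look at all N together and keep one result.
        best = cut(drafts)
        if best == survivor:
            break  # another round did not change the decision
        survivor = best
        # Repeat: narrow the next round by seeding it with what survived.
        brief = f"{brief}\n\nSurvivor so far:\n{survivor}"
    return survivor or "", rounds
```

In real use the cut is a judgment call made by a person or a synthesis agent, so `cut` would wrap that step rather than a simple function; the stopping rule here (the survivor stops changing) is one assumed reading of "another round is unlikely to change the decision".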

Same Prompt or Different Lenses

The one real fork when dispatching:

  • Same prompt for every attempt when independent tries beat assigned roles.
  • Different lenses when diversity helps: concise/expansive, conservative/bold, user-first/technical, layout alternatives, risk-first, narrative-first, strategy-first.

What to Return

  • Hand variants to the user to choose when the distinct options are worth seeing.
  • Synthesize the final result when the best path is clear.
  • Run one more narrow round when the first missed but revealed a better target.

Report concisely: approaches tried, what survived, what was cut, open decisions.

If Dynamic Workflows Are Available

This is a parallel fan-out: generate the N attempts with parallel(), cut in code or with a synthesis agent, and loop rounds until convergence (or until a "good enough" critic stops it). Give each round's agents a distinct angle to widen coverage.

good-design.md

UI checks: hierarchy, states, honest copy, and the bar before any look is chosen.

Good Design

Good design is a stack of decisions you could defend out loud: every element earns its place by serving the subject, the hierarchy, or the action, and the result reads as intended rather than accidental. This holds before any look is chosen — flat or maximal, editorial or brutalist, the mechanics below separate a considered interface from a generic one.

1. Start from the work. Put the primary task or thesis on screen first, and keep every control beside the thing it affects. A handsome surface that buries the main job is wrong before style is even discussed; add nothing that doesn't improve scanning, comparison, action, or confidence.

2. Make every choice answer to the subject. Trace each color, type, image, and structural device — numbering, dividers, labels — to the subject, its audience, and the page's job, so each encodes something true instead of filling space. A treatment that would fit an unrelated product unchanged is a default, not a decision, and reads as generic; number a sequence only when its order is information the reader needs.

3. Concentrate emphasis, and let every difference be deliberate. Emphasis exists only against what surrounds it: rank what matters, make the steps between levels decisive, and make things that are alike look alike — so a difference signals a real one, not accidental style or token drift. When everything competes, nothing leads and the eye finds no entry point; when peer controls drift apart in height, weight, or spacing, they imply a hierarchy that isn't there. Even a maximal composition needs a focal order, and secondary elements should recede without becoming illegible.

4. Align what shares a line, and keep comparable peers consistent. Edges on the same structural line should line up, and items meant to be compared should share one anatomy and one separation logic. Near-miss alignments read as accidental and jog the layout — a maximal grid jogs too — and peers that each invent their own structure stop comparing; every broken line or odd-one-out should mark a real difference.

5. Design every state, completely and honestly. Give default, hover, focus, selected, disabled, loading, empty, and error each a distinct, visible treatment where it applies, and drive everything that depends on a state — counts, highlights, enablement — from one condition. Focus is not selection and selection is not hover; keyboard focus must always show; color is never the only cue, so every state still reads in grayscale — when the signals disagree the interface lies, like a "0" beside an enabled button.

6. Hold the structure steady through change. Reserve room for shifting labels and counts, anchor the controls that govern others, and keep the layout intact as content updates, the viewport resizes, and the theme switches. Elements that jog as state changes get mis-clicked, and a view that collapses by hiding its governing control stops explaining itself. A responsive view is not a shrunken copy, a dark theme is not a recolor, and motion moves only with cause — and honors reduced-motion.

7. Choose the interaction model before the visual shape. Decide what a thing is — a command, preference, selection, disclosure, status, or navigation — before deciding how it looks. The wrong model makes false promises: a toggle that fires a one-shot action, a disclosure that hides real controls, a button nested inside a button. Styling can't repair a broken contract, only hide it.

8. Make copy the next move. Treat words as interface: name things by what the user controls, keep one vocabulary across a whole flow, and write empty and error states as directions. "Publish" should yield "Published," not "Success"; an empty state should say what to do next and an error what failed and how to recover; vague, clever, or mood-only copy leaves the user with no move.

9. Build the chosen direction fully. Execute whatever ambition you pick to the level it demands — an elaborate direction needs elaborate, exact follow-through; a spare one needs precise spacing, type, and detail. A bold gesture that appears once and changes nothing about hierarchy, content, or action reads as pasted on; a restrained direction without exact spacing, type, and state detail reads unfinished.

None of this picks a look — it is the floor every look stands on: grounded, focused, honest, and stable. Good UI is not a costume; it is consistent judgment made visible under use.

It works best with Claude and Codex together

The workflows file assumes you run both. On Codex alone, drop the CLAUDE-WORKFLOWS block; the others stand by themselves.

There's a clean way to wire this for both. Keep one AGENTS.md (Codex reads it as-is) and make CLAUDE.md a single line:

CLAUDE.md

@AGENTS.md

Claude expands that include in place. One source of truth, no symlink. The same trick keeps Claude-only bits away from Codex: reference @claude-workflows.md from inside AGENTS.md, and Claude pulls it in while Codex leaves the bare line inert.

Why the pair, at least today: to me, Claude Code just has the better orchestration right now. Subagents can find each other and hand off directly instead of only taking a top-down briefing, and you can run waves of them with caching and retries built in. So Claude drives the build and the fan-out, and Codex comes in as the outside reviewer that doesn't share Claude's blind spots. Two models checking each other beats one model grading its own work.

Why not just install a skill stack

These four files are the opposite of a skill stack. gstack, Garry Tan's setup, is the most visible one: a simulated engineering org, CEO and eng manager and QA and security officer, running a think, plan, build, review, ship loop. There's real value in there. The forcing questions surface edge cases; the review passes catch bugs CI misses.

But that whole org chart rides along in your context. One writeup clocks a single role-heavy skill at over 10k tokens before a line of code, and subagents inherit that payload. For a rename or a one-file fix, it's mostly tax.

I'd rather carry the few patterns that actually pay off and skip the rest. A forcing question before code. One sharp review pass. A standing bias toward less code. A bar for what good UI even is. That fits in four files you can read in a minute.

Where this comes from

Most of this isn't mine. It's a cut of a few things worth crediting:

  • The behavior rules echo the Karpathy-style CLAUDE.md going around: think before coding, simplicity first, surgical changes, goal-driven execution. Good engineering taste, written down.
  • The "stop at the first rung that holds" ladder is Ponytail: the laziest senior dev in the room, whose best code is the code he never wrote.
  • generate-and-cut runs on the same instinct as Superpowers and its visual brainstorming, where the agent renders design ideas as live HTML in a browser tab so you judge the real thing instead of ASCII sketches. I just want the pattern, minus the rendering: a few independent attempts, then cut.
  • A couple of lines are lifted straight from tweets: the atomic-commit rule, touch only what you changed and list each path, from Pete Steinberger; and "be extremely concise, sacrifice grammar for concision" from Matt Pocock.

The design file is the exception. No link, because it's mine: the bar I kept retyping into agent chats until I wrote it down.

Each of these is also, fairly, "just prompts in a text file." That's the point. A good text file changes how the model behaves, and that's most of the game.

Build a bundle

Each section comes wrapped in a tag so the agent can keep them straight. Tick what you want, copy, and paste it into a fresh session to start working.
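
If you would rather assemble the paste yourself, the wrapping is simple enough to sketch. This is a guess at the shape, not the builder's actual code: `build_bundle` is a hypothetical helper, and the tag naming (the upper-cased filename stem, so claude-workflows.md becomes CLAUDE-WORKFLOWS) is an assumption inferred from the "CLAUDE-WORKFLOWS block" mentioned earlier.

```python
from pathlib import PurePath
from typing import Dict, Iterable


def build_bundle(files: Dict[str, str], selected: Iterable[str]) -> str:
    """Wrap each selected file's text in its own tag and join the blocks."""
    blocks = []
    for name in selected:
        # Assumed naming: "claude-workflows.md" -> "CLAUDE-WORKFLOWS".
        tag = PurePath(name).stem.upper()
        body = files[name].strip()
        blocks.append(f"<{tag}>\n{body}\n</{tag}>")
    # A blank line between blocks keeps the sections visually separate.
    return "\n\n".join(blocks)
```

Passing only the filenames you want as `selected` plays the role of unticking the rest.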