
When Your Automation Breaks Everything

I've written before about how I run SubraLabs with Claude Code handling everything from email triage to finance reports. Scheduled tasks run three times a day, skills handle deployments, agents review code. It works — until it doesn't.

Recently, my email-triage bot deleted 35 Swift files from a project I was actively developing. The app stopped compiling. I caught it during the next build cycle and traced it back to a single automated commit.

This is the story of what went wrong, and what I replaced everything with.

The Incident

I use Claude Code for scheduled tasks — email sync, daily reports, security scans, financial analysis. These run automatically through Cowork Desktop, which launches Claude Code sessions on a timer and passes them a prompt.

The email-triage task was supposed to do three things: sync emails from Apple Mail, classify them by project, and generate a report. Simple enough.

But the prompt was vague. It said things like "analyze the inbox and generate a report." It didn't say what not to do. And on April 8th, the bot decided to be helpful. It ran git add -A, which staged everything in the working directory, including files that had nothing to do with email, and committed. The commit message said [email-triage] report 08/04 run #3. The diff deleted 35 Swift files that a different session had created minutes earlier for a client project.

The merge that followed restored most of them, but several files slipped through. I caught it when the app refused to build and traced the root cause to that automated commit.

Why It Happened

Three compounding failures:

The prompt had no guardrails. It didn't say "never run git commands." It didn't say "never touch files outside reports/." It was a paragraph of instructions that assumed the AI would stay in its lane. It didn't.

Cowork Desktop has no permission bypass. It runs Claude Code in interactive mode, where every tool call technically requires approval, yet scheduled tasks run unattended. The result is a weird middle ground: nothing structural separates safe commands from dangerous ones, so the only thing keeping the bot from doing damage is the prompt.

git add -A is a footgun. One command stages everything — new files, deleted files, modified files, files from other sessions. In a monorepo where multiple sessions might be active, it's catastrophic. I had no pre-commit hook, no deny rules, nothing to catch bulk deletions.
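To make the footgun concrete, here is a minimal sketch of previewing what git add -A would stage before it touches the index. The function names are illustrative and not part of my setup; the only git feature it relies on is the --dry-run flag of git add.

```shell
#!/bin/sh
# Sketch: preview what `git add -A` would stage without touching the index.
# The function names are illustrative.

# List every path `git add -A` would stage, prefixed with "add" or "remove".
preview_stage_all() {
    git add -A --dry-run
}

# Count how many tracked files would be staged as deletions.
# grep -c exits non-zero when the count is 0, hence the `|| true`.
count_pending_removals() {
    preview_stage_all | grep -c '^remove ' || true
}
```

Running count_pending_removals in a working directory shared with other sessions shows how many deletions a blanket stage would sweep up; staging files by name avoids the question entirely.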

The Soft Fix (Didn't Work)

My first reaction was to add "DO NOT run git add, git commit, or git push" to the email-triage prompt. A soft fix — an instruction that the bot should follow but can't be forced to follow.

It worked for a few days. Then I realized the deeper problem: Cowork Desktop was generating five identical email-triage reports per day. Five sessions, five prompts, five reports with nearly the same content. The classification was done by MiniMax, an external API that cost money per call. And the reports weren't even useful — they were walls of text with no structure.

The Hard Fix

I scrapped the entire Cowork setup and replaced it with something simpler: Claude Code CLI running from cron.

The key insight is that claude -p "prompt" --dangerously-skip-permissions does exactly what Cowork Desktop does, but better. The -p flag runs Claude Code in print mode, which is non-interactive, and --dangerously-skip-permissions bypasses the approval prompts so it can actually execute scripts unattended. Cron handles the scheduling.

Here's the setup:

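# crontab
# Note: cron starts jobs in the user's home directory, so the ./scripts paths below
# assume the repo lives there; otherwise use absolute paths.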
0 8 * * *    ./scripts/cron-runner.sh morning-sync
0 14 * * *   ./scripts/cron-runner.sh morning-sync
0 20 * * *   ./scripts/cron-runner.sh morning-sync
0 9 * * 1    ./scripts/cron-runner.sh weekly-security
0 9 1 * *    ./scripts/cron-runner.sh monthly-finance

A wrapper script (cron-runner.sh) reads a prompt file from scripts/cron-prompts/, passes it to Claude Code, and saves the output to a log file. That's the entire infrastructure.
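A minimal sketch of what such a wrapper could look like, not the actual script. The .md prompt extension, the logs/cron directory, the timestamped log names, and the CLAUDE_BIN override are all assumptions made for illustration; only the scripts/cron-prompts/ folder and the claude -p ... --dangerously-skip-permissions invocation come from the setup described above.

```shell
#!/bin/sh
# Sketch of a cron wrapper: run one named prompt through Claude Code and log the output.
# Assumptions (not from the original script): prompts are stored as <task>.md,
# logs go to logs/cron/, and CLAUDE_BIN can override the CLI binary.

PROMPT_DIR="${PROMPT_DIR:-scripts/cron-prompts}"
LOG_DIR="${LOG_DIR:-logs/cron}"
CLAUDE_BIN="${CLAUDE_BIN:-claude}"

run_task() (
    task="$1"
    prompt_file="$PROMPT_DIR/$task.md"

    # Refuse to run if the prompt file is missing, rather than sending an empty prompt.
    if [ ! -f "$prompt_file" ]; then
        echo "cron-runner: no prompt file at $prompt_file" >&2
        exit 1
    fi

    mkdir -p "$LOG_DIR"
    log_file="$LOG_DIR/$task-$(date +%Y%m%d-%H%M%S).log"

    # -p is print (non-interactive) mode; the bypass flag lets it run scripts unattended.
    "$CLAUDE_BIN" -p "$(cat "$prompt_file")" --dangerously-skip-permissions \
        > "$log_file" 2>&1
)

# Run only when a task name is given, e.g. ./scripts/cron-runner.sh morning-sync
if [ "$#" -ge 1 ]; then
    run_task "$1"
fi
```

Keeping one log file per run means a bad run can be read in isolation, which is the whole point of moving to something this small.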

The Prompts Are the Product

The real work is in the prompt files. Each one is a structured markdown document with two sections: safety rules and step-by-step instructions.

The safety rules come first and they're non-negotiable:

## SAFETY RULES (NON-NEGOTIABLE)

1. NEVER run git add, git commit, git push, or any git command
2. NEVER modify: PROJECT.md, CLAUDE.md, .claude/, .gitignore
3. NEVER delete files
4. NEVER create branches or touch git history
5. NEVER install packages
6. NEVER SSH to external servers
7. If something fails, log the error and move on
8. If there's nothing new, don't generate a report — just exit

Then the steps. Each step is explicit about what to run, what to read, what to write, and what format to use. No ambiguity, no room for "being helpful."

The morning sync prompt, for example, tells Claude to:

  1. Run python3 scripts/sync_mail.py --skip-classify — this extracts emails from Apple Mail and dumps them into a staging folder without any AI classification
  2. Read each new email and classify it into project folders — this is where Claude's intelligence actually helps, and it does it better than the MiniMax API I was paying for
  3. Check if any email requires action and add it to the priorities list — append only, never remove existing entries
  4. Generate a report with a specific markdown format — tables, sections, separators
  5. Push everything to the ops dashboard via API
  6. If there's nothing new, print "No updates" and stop

The --skip-classify flag was a small but important change. The old system used MiniMax (a third-party AI API) for email classification. It cost money, added latency, and was worse at classifying than Claude itself. Now Claude does the classification as part of the sync — no external API, no extra cost since it's all running on my Claude Max subscription.

What I Removed

The cleanup was aggressive:

  • 5 Cowork Desktop tasks → 3 cron jobs (morning sync, weekly security, monthly finance)
  • 43 skills → 19 (I'd accumulated 21 "design marketplace" skills I'd never used once)
  • MiniMax API dependency → removed entirely (Claude classifies emails now)
  • 3 MCP servers → removed (Remotion docs, Nano Banana image gen, Playwright — never used any of them)
  • An unused API key left in a config file → revoked

The lesson here is obvious in retrospect: I was accumulating tools and automations without ever auditing whether they worked, whether I used them, or whether they were even safe. The email-triage incident forced me to look at everything, and most of it was dead weight.

The Guardrails

Beyond the prompt rules, I added structural guardrails:

Deny rules in settings.json: git add -A, git add --all, and git add . are permanently blocked for all Claude Code sessions. You have to stage files by name. This prevents the class of bug that caused the incident.

Pre-commit hook: a bash script in .git/hooks/pre-commit that counts deleted .swift files in the staged diff. If more than 5 are being deleted in a single commit, the hook rejects it. You can bypass it with --no-verify for intentional refactors, but a bot never would.
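The check described above can be sketched roughly like this. It is an illustration written in portable sh rather than the hook itself: the function names and the MAX_SWIFT_DELETIONS variable are made up, and the threshold of 5 is the one mentioned in the text.

```shell
#!/bin/sh
# Sketch of a pre-commit hook that rejects commits deleting too many .swift files.
# MAX_SWIFT_DELETIONS and the function names are illustrative, not the original script.

MAX_SWIFT_DELETIONS=5

# Count staged deletions of .swift files (run inside a git work tree).
# --diff-filter=D keeps only deleted paths; tr strips the padding some wc builds add.
count_staged_swift_deletions() {
    git diff --cached --name-only --diff-filter=D -- '*.swift' | wc -l | tr -d ' '
}

# Fail (non-zero status) when the staged diff deletes more files than allowed.
check_swift_deletions() (
    deleted="$(count_staged_swift_deletions)"
    if [ "$deleted" -gt "$MAX_SWIFT_DELETIONS" ]; then
        echo "pre-commit: refusing to delete $deleted .swift files in one commit" >&2
        echo "pre-commit: use --no-verify if this is intentional" >&2
        exit 1
    fi
)

# In .git/hooks/pre-commit the last line would be:
#   check_swift_deletions || exit 1
```

Because the hook only looks at the staged diff, it catches bulk deletions no matter which tool or session produced them.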

Write protection hook: a PreToolUse hook that blocks any Edit or Write operation on files in finanze/, .env, or anything under legacy/. These directories should never be touched by AI.
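The path check at the heart of such a hook might look like the sketch below. It covers only the matching logic: how the target path is extracted from the hook's input is left out, the function names are hypothetical, and returning status 2 to signal "block this call" is an assumption about the hook contract that should be checked against the Claude Code hooks documentation.

```shell
#!/bin/sh
# Sketch of the path check behind a write-protection hook.
# Assumption: the caller has already extracted the target file path from the
# hook's input; that parsing step is not shown here.

# Succeeds when the path is inside finanze/ or legacy/, or is a .env file.
# Prefixing "/" lets relative and absolute paths share the same patterns.
is_protected_path() {
    case "/$1" in
        */finanze/*|*/legacy/*|*/.env) return 0 ;;
        *) return 1 ;;
    esac
}

# Returns 2 for protected paths (assumed here to be the "block" status), 0 otherwise.
guard_write() {
    if is_protected_path "$1"; then
        echo "blocked: $1 is write-protected" >&2
        return 2
    fi
    return 0
}
```

For example, guard_write "finanze/report.csv" fails with status 2, while guard_write "reports/today.md" passes; both paths are made up for the example.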

Three Runs, Not Five

The old email-triage ran five times per day and generated five nearly identical reports. The new morning sync runs three times — at 8am, 2pm, and 8pm. If there are no new emails, it doesn't generate a report at all. The prompt explicitly says: "if there's nothing new, don't create an empty report."

This covers the day without wasting resources. If a client emails at 8:01am, the 2pm run catches it. If something urgent comes in at night, the 8am run gets it first thing.

What I Actually Learned

The incident taught me three things:

Automation without guardrails is worse than no automation. A careful human would not run git add -A in a monorepo with active sessions. A bot will, cheerfully, unless you explicitly tell it not to. And "tell it not to" means both prompt instructions and structural enforcement.

Audit your tools regularly. I had 43 skills, 7 MCP servers, and 5 scheduled tasks. Most of them were unused. Some were actively harmful. I only found out because something broke badly enough to force a review.

Simple beats clever. Cowork Desktop is a nice app with a UI and scheduling and session management. A bash script with cron and claude -p does the same thing, but I can read the prompt, check the log, and understand exactly what happened. When automation goes wrong — and it will — you want the simplest possible system to debug.

The new setup has been running for a day. Three cron jobs, three prompt files, one wrapper script. Everything logged, everything auditable. Ask me in a month if it's still working — I'll probably have broken it again by then, but at least I'll know exactly how.