Garbage Cans, Bitter Lessons, and Your AI Strategy

From the outside, most companies look pristine and organized. Inside, they rarely are. Classic org theory calls this the Garbage Can reality: lots of unwritten rules and ad‑hoc workarounds. The old way to deploy AI was to document and standardize processes first, then automate. A newer path, shaped by the Bitter Lesson from AI research, is to define excellent outputs, assemble examples, and train agents to hit the target without hard‑coding the how. You still need governance, but you may not need to untangle every spaghetti workflow before you see value.

The Map vs. The Mess

Ethnographer Ruthanne Huising studied teams that mapped actual work from raw materials to finished goods. The maps revealed unused outputs, semi‑official shortcuts, and duplicated efforts. In one case, a CEO saw the full picture for the first time and groaned that things were worse than he imagined. The point: reality is more tangled than the slide deck.

Organizational theory has a name for this: the Garbage Can Model. Problems, solutions, and decision‑makers slosh around until decisions happen when the right items collide. Structure still matters, but much of what “works” was negotiated over time and rarely documented.

Meanwhile, employees are already using AI informally. Shadow adoption is real, and it is growing. Scaling that usage across the enterprise is hard because traditional automation needs clear rules and clean handoffs, which Garbage Can organizations often lack.

Bitter Lesson Meets Agents

AI’s Bitter Lesson, articulated by researcher Rich Sutton, says that as problems get harder, handcrafted human knowledge underperforms general methods that scale with compute. Chess did not yield to encoded grandmaster tips; it yielded to massive search, and later to systems that learned through self-play. For leaders, the takeaway is to spend less time teaching the machine each step and more time defining what a great result looks like. Measure and govern the output.

That connects directly to agents. Early agents were carefully crafted with long system prompts, checklists, and to‑do routines. They work, but they are brittle and expensive to maintain. A newer class of generalist agents can plan, browse, type, and produce end‑to‑end deliverables like spreadsheets and slides. The key shift is to train for final output quality, not to imitate the human path. If the Bitter Lesson keeps holding, outcome‑trained agents will scale faster than bespoke prompt engineering.

What This Means for Enterprise AI

Here is the fork in the road for leaders: clean up the maze and then automate it, or define what a great output looks like and train agents to deliver it through the maze. Both strategies are viable, and most companies will blend them, using process where the how matters and outcome where the what is easy to judge.

Here is when to pick each path.

A) Process‑First (Map → Standardize → Automate)
• Best when compliance, traceability, and safety are paramount.
• Requires heavy process discovery and change management.
• Slower to value in Garbage Can environments.
Use when: finance controls, regulated workflows, SOX‑sensitive tasks.

B) Outcome‑First (Define “Great,” Then Train to It)
• Specify the target deliverable and its rubric.
• Provide high‑quality exemplars.
• Let agents find their own path through messy realities.
Use when: outputs are easy to judge and data is abundant, for example sales briefs, support replies, research summaries, or first drafts of RFP responses.

Progress suggests agent reliability on longer tasks is rising, although we are not at fully autonomous week‑long projects. Plan for co‑pilot today and supervised agent tomorrow.

The Outcome‑First Playbook (90 Days)

This is a quick start plan for leaders who want results without mapping every workflow first. Pick one high value output, define what “good” looks like in plain language, and wrap a simple scoring loop around it so the agent improves week over week. It is practical, controlled, and measurable, with humans in the loop and policy on rails.

Step 1: Pick one valuable, judgeable output.
Examples: weekly enterprise sales pipeline brief, Tier‑2 support reply, competitive teardown slide.

Step 2: Define “good.”
Turn tribal knowledge into a 10–12 point rubric: accuracy, coverage, sources, formatting, tone, data freshness, policy safety, time to produce, and a simple pass‑or‑fail gate.
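
If it helps to make that concrete, here is a minimal sketch of what a machine-readable version of the rubric could look like, in Python; the criteria names, weights, and pass threshold are illustrative placeholders, not a standard.

# Minimal sketch of a machine-readable rubric (criteria and weights are illustrative).
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str
    weight: float      # relative importance; weights should sum to 1.0
    must_pass: bool    # hard gate: failing here fails the whole output

RUBRIC = [
    Criterion("accuracy", 0.30, must_pass=True),
    Criterion("coverage", 0.20, must_pass=False),
    Criterion("sources_cited", 0.15, must_pass=True),
    Criterion("formatting", 0.10, must_pass=False),
    Criterion("tone", 0.10, must_pass=False),
    Criterion("data_freshness", 0.15, must_pass=False),
]

def score(ratings):
    """ratings maps each criterion name to a 0-1 score from a human reviewer."""
    total = sum(c.weight * ratings[c.name] for c in RUBRIC)
    passed = all(ratings[c.name] >= 0.8 for c in RUBRIC if c.must_pass)
    return total, passed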

Step 3: Assemble golden records.
Collect 25–200 example outputs that truly meet the bar. Label them against your rubric. Keep 20 percent as a holdout test set.
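
A tiny sketch of the holdout split, assuming each labeled example is simply a Python dict holding the output text and its rubric scores:

# Sketch: split labeled gold examples into a training pool and a 20 percent holdout.
import random

def split_gold_set(examples, holdout_frac=0.2, seed=42):
    rng = random.Random(seed)       # fixed seed so the holdout stays stable week to week
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * holdout_frac)
    return shuffled[cut:], shuffled[:cut]   # (training pool, holdout test set)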

Step 4: Build a scoring loop.
Use humans plus simple automated checks, for example link validity and number consistency, to score candidate outputs. Guard against Goodhart’s Law by keeping the human composite score primary rather than relying on automated metrics alone.
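
Here is a rough sketch of that loop; the link check and the penalty weight are illustrative, and the human rubric score stays primary.

# Sketch: blend cheap automated checks with the human rubric score.
# The human composite score dominates so the agent is not optimized against
# metrics it can game (the Goodhart's Law trap).
import re
import urllib.request

def links_resolve(text, timeout=5):
    """Illustrative check: every URL in the output loads without an error."""
    for url in re.findall(r"https?://\S+", text):
        try:
            urllib.request.urlopen(url, timeout=timeout)
        except Exception:
            return False
    return True

def composite_score(human_score, output_text):
    """Human rubric score (0-1) leads; automated checks can only subtract."""
    penalty = 0.0 if links_resolve(output_text) else 0.2
    return max(0.0, human_score - penalty)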

Step 5: Train or steer the agent to the rubric.
Use few‑shot exemplars, tool access, retrieval, and structured instructions. Where available, use fine‑tuning or reinforcement from human feedback on the final output, not the steps.
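
Even where fine-tuning is not available, prompt assembly can be outcome-shaped. A minimal sketch, assuming the exemplars come from the gold set built in Step 3:

# Sketch: steer a general-purpose model toward the rubric with a few exemplars.
# How the prompt is sent depends on your model provider; this only shows assembly.

RUBRIC_SUMMARY = "Accuracy, coverage, cited sources, formatting, tone, data freshness."

def build_prompt(task, exemplars):
    parts = [
        "Produce the final deliverable only. It will be judged on this rubric:",
        RUBRIC_SUMMARY,
        "These examples meet the bar:",
    ]
    for ex in exemplars[:3]:                 # a few exemplars, not the whole gold set
        parts.append("--- Example ---\n" + ex["output"])
    parts.append("--- Task ---\n" + task)
    return "\n\n".join(parts)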

Step 6: Guardrails.
Set policy (PII handling, sourcing rules, escalation triggers). Add controls (citation requirements, tool whitelists, spend caps, audit logs). Red‑team the common failure modes (hallucinated facts, reward hacking, over‑confident summaries).
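
A sketch of what the hard gate can look like in code; the PII pattern, citation rule, and spend cap are placeholders your risk and compliance teams would set.

# Sketch: guardrail checks applied before any agent output leaves the building.
import re

PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # illustrative: US SSN-shaped strings
MAX_SPEND_USD = 5.00                                  # illustrative per-deliverable spend cap

def passes_guardrails(output, spend_usd):
    failures = []
    if PII_PATTERN.search(output):
        failures.append("possible PII; escalate to a human")
    if "http" not in output:
        failures.append("no sources cited")
    if spend_usd > MAX_SPEND_USD:
        failures.append("spend cap exceeded")
    return len(failures) == 0, failures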

Step 7: Shadow mode to limited production.
Run the agent in parallel. Compare time‑to‑output, revision rate, quality score, error rate, and cost. Promote only if it beats the baseline with acceptable risk.
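
The promotion decision can be a handful of comparisons against the human baseline; the metrics below mirror the ones above, and the thresholds are yours to set.

# Sketch: decide whether shadow-mode results justify limited production.
# Each dict holds averages from the shadow run: quality (0-1), hours per output,
# revision_rate, error_rate, and cost per output.

def promote(agent, baseline):
    return (
        agent["quality"] >= baseline["quality"]            # no quality regression
        and agent["error_rate"] <= baseline["error_rate"]
        and agent["revision_rate"] <= baseline["revision_rate"]
        and agent["hours"] < baseline["hours"]              # actually faster
        and agent["cost"] < baseline["cost"]                 # actually cheaper
    )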

Step 8: Iterate weekly.
Add failed cases to training, tighten the rubric, expand the gold set, and push the agent to handle more edge cases.

The Strategic Bet

If Bitter Lesson dynamics hold in workplace tasks, competitive moats built on “we documented every process” may matter less than moats built on defining quality and owning the examples. You decide the outputs that win. Your agents learn the path. Your governance keeps it safe.

Be realistic, though. Today’s agents still struggle with long, messy, multi‑hour jobs without supervision. Capabilities are improving quickly, so treat agents as supervised colleagues and ratchet up autonomy where the metrics justify it.
