Why you need this guide
You already use AI in your business. You have an account with Claude, ChatGPT, or Gemini. You pay for them every month. But somewhere, deep down, you feel that you're not getting everything you could be getting.
That feeling is not misplaced. The difference between someone who gets mediocre results from AI and someone who gets professional results is not talent, nor technical experience. It is vocabulary. There are 40 to 50 key terms in technical English that, when dropped into your commands, make the AI respond at a completely different level of quality.
This guide gives you those words. No theoretical padding, no useless jargon. With concrete examples and real-life analogies.
Hours rewording the same command
You get a draft, it is close to what you want but not quite, you ask for changes, you get something else, you start over. With the right vocabulary, the first delivery is the good one in 80% of cases.
Double AI costs for the same output
Every "try again" costs tokens. Every revision cycle eats into your subscription. Precise commands cut consumption by 30 to 60%.
Dependency on freelancers you can't audit
You pay someone to do "something technical", you don't understand what they delivered, you can't verify. With the vocabulary in this guide, you can ask questions that show you know what you're asking for.
Who this is for
Founders, managers, marketers, and business operators who already use AI in their work, but feel that the results are below their real potential. You don't need to be a programmer. You just need to be curious and want to take your digital work to the next level.
If you already speak fluent technical English and have built AI systems for clients, this guide is too basic for you. For everyone else, it is exactly what you need.
Three ways to use it
The guide teaches you the words. If you want to use them without building the infrastructure yourself, there are two RoboMarketing products that do exactly that:
RobOS
Memory, workspaces, and anti-improvisation verification on top of Claude Code. "Don't guess, verify" becomes default behavior.
Agent Factory
Dedicated VPS with 4 AI agents publishing daily. Idempotency, rate limiting, secrets management, pre-installed.
★ Recommendation
Whichever mode you pick, do two things in the first 15 minutes: (1) print the cheat sheet and put it next to your monitor, (2) memorize the five words in the Top 5. The rest comes with practice.
How to use this guide
This guide is not meant to be read once, from start to finish. It is built as a working tool: you open it when you have to write a command to an AI agent and you don't know how to phrase it to get what you want.
Each chapter covers a family of "golden words", technical English terms that, when you drop them into your command (even if the rest of the command is in plain English), signal to the AI agent that you expect a certain level of quality, rigor, or type of thinking.
For each word you will find:
- What it means, in a couple of sentences, in plain language
- A real-life analogy, so it sticks in memory
- When to use it, concrete business situations
- Weak command vs. command with golden words, examples side by side
- Traps, mistakes most people make
★ The single most important piece of advice
You don't need to memorize the whole guide. Memorize just the five words in the "Top 5: On the wall" section. Look up the rest when you need them. The guide has a cheat sheet at the end for exactly that purpose.
Why words matter
Imagine you've hired an extremely talented programmer who has read everything ever written about programming, business, design, and marketing. The only problem: they can't read your mind.
If you say "build me a Shopify integration", they will build something, probably what they've seen others build, on the quick side, with no verification, with no thought to the cases where things break. It will work "kind of".
But if you say "build me a Shopify integration, idempotent, with rate limiting, and before you say it's done write evals for it", the same programmer will deliver something substantially better. Not because the words are magic. Because each word activates in their mind a set of standards and checks they would have skipped otherwise.
AI agents work exactly the same way. They were trained on millions of technical documents where, for example, the word "idempotent" always shows up next to safety procedures, retry logic, and protection against duplicates. When you drop it into your command, the agent pulls along all those standards.
The big secret: the agents' language
Here is the good news for you, if you don't speak technical English: you don't need to speak it fluently. You just need to know these 40 to 50 key terms and slip them into your everyday sentences.
It's like going to a mechanic when you know nothing about cars, but saying: "I want you to check the suspension and the brakes before you give me the car back". The fact that you used the words "suspension" and "brakes" tells the mechanic that you are not completely lost and that you expect a serious check, not a quick patch. AI agents work the same way.
This guide gives you the mechanic's vocabulary. You don't need to become a mechanic. You just need to know what to ask for.
Anatomy of a good command
A good command to an AI agent almost always has three parts:
| Part | Role | Example |
|---|---|---|
| 1. WHAT? | The concrete task | "Build the Shopify integration." |
| 2. HOW? | Standards and constraints (this is where the golden words live) | "Make it idempotent, with rate limiting and retry with exponential backoff." |
| 3. DONE WHEN? | Checks and acceptance criteria | "Before you say it's done, write evals and run them." |
Most people write only part 1: "build the Shopify integration". Then they wonder why the result is disappointing. Golden words are the tool you use to fill in parts 2 and 3 effortlessly.
5 rules before you start
1. Use the words in English, even if the rest of your sentence is not.
AI agents understand technical terms best in English. Write "make it idempotent", not "make it so that it gives the same result when run multiple times...". The technical term is more precise than any paraphrase.
2. Don't use words you don't understand.
If you slip in a term just to "sound technical" and the agent asks you a follow-up about it, you will be stuck. Use only words you can explain in two sentences.
3. Start simple, then add rigor.
For a small task, you don't need all 50 words. Use 1 to 3, the relevant ones. For a big project, go up to 5 to 10.
4. Always ask for verification.
The single best rule in the whole guide: whatever you ask, add at the end "don't guess, verify" or "write evals and run them". That alone eliminates 70% of agent mistakes.
5. Re-read the result with a critical eye.
Golden words improve the result, but they don't make it perfect. You are still the one who decides whether what the agent delivered is good enough for your business.
✓ Verdict before Chapter 1
If you stop here and just apply the rules above, you will already get considerably better results. The chapters that follow give you the concrete vocabulary. Read them at your own pace.
Quality and verification
These five words are the most important in the entire guide. If you remember only this much from this document, you already place yourself in the top 10% of people working with AI agents.
They are all about the same thing: how you force the agent to verify its own work before handing it to you. Without them, you will always get results that "look fine" until the moment you put them in production and discover that they are not.
evals
short for "evaluations"
What it means: Sets of automated tests that check whether your agent (or the functionality you built) actually does what it should. They are like a written exam you give the agent, with hundreds of questions, and the agent reports how many it got right.
Think of a baker who wants to buy a new oven. Before paying, they want to test it: bake 5 loaves at different temperatures, see if the crust comes out right, if it bakes evenly, if anything misbehaves. Evals are the 5 trial loaves. Without them, the baker buys on faith, and finds out something is wrong only when 200 customers are waiting.
When to use it
Always, when you build something new. Before you "release" any new functionality, whether it's an agent that answers customers, a script that sends emails, or a Shopify integration, ask for evals. And especially when you modify something that already works: evals tell you whether the change broke something else.
Weak command vs. command with golden words
⚠ Common trap
Many people ask "write evals" and stop there. The agent will write 3 trivial tests that all pass and tell you it's done. Specify how many tests you want and what types of scenarios they should cover (happy path + edge cases, see the words in the next chapters).
ground truth
literally "the truth on the ground", the reference answer
What it means: The correct answer, verified by a human, against which the agent's result is compared. If evals are "the exam", ground truth is "the answer key". Without the key, you don't know whether the answers are good or bad.
In a math exam, the teacher has a sheet with the correct answers. That is ground truth. Without it, they can't grade the students. Likewise, without ground truth, your agent doesn't know whether the answer to "how many orders did we have yesterday?" is correct, it only knows that it gave an answer.
When to use it
Always next to the word "evals". Ask the agent to define ground truth for each test. For example: if it asks "how many orders did we have yesterday?" and the ground truth (manually verified) is 47, the agent knows that an answer of "around 50" is too vague, and "23" is clearly wrong.
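To make "the exam" and "the answer key" concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `count_orders` stands in for whatever your agent built, and the dates and counts in `GROUND_TRUTH` are made-up values you would replace with numbers you checked by hand.

```python
# Hypothetical function under test: counts orders for a given day.
# In a real project this would query your store; here it reads a fake list.
FAKE_ORDERS = [
    {"id": 1, "day": "2024-05-01"},
    {"id": 2, "day": "2024-05-01"},
    {"id": 3, "day": "2024-05-02"},
]

def count_orders(day, orders=FAKE_ORDERS):
    return sum(1 for order in orders if order["day"] == day)

# Ground truth: answers a human verified by hand (illustrative values).
GROUND_TRUTH = {
    "2024-05-01": 2,  # happy path: a normal day
    "2024-05-02": 1,  # happy path: a single order
    "2024-05-03": 0,  # edge case: a day with no orders
}

def run_evals():
    """Compare each answer with the answer key; return (passed, failures)."""
    failures = []
    for day, expected in GROUND_TRUTH.items():
        actual = count_orders(day)
        if actual != expected:
            failures.append((day, expected, actual))
    passed = len(GROUND_TRUTH) - len(failures)
    return passed, failures

passed, failures = run_evals()
print(f"{passed}/{len(GROUND_TRUTH)} evals passed; failures: {failures}")
```

The useful part is not the code itself but the shape: every test has an expected answer written down before the agent runs, so "it gave an answer" can never pass for "it gave the right answer".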
Weak command vs. command with golden words
sanity check
literally "common-sense check"
What it means: A quick, obvious verification that a result makes sense in the real world. Not a deep check, just "could this be true?". Sanity checks catch the big, obvious mistakes that would otherwise slip past.
If a child says they ran 100 meters in 3 seconds, you don't need to be an Olympic coach to say "wait, that's impossible". That is a sanity check. If your agent reports that you had 50,000 Shopify orders in one hour (when the average is 20 per hour), a sanity check would catch it instantly.
When to use it
On any automated report, before it gets forwarded (to a client, on Slack, by email). On any large transaction. On any bulk change (for example, price updates on 500 products).
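A sanity check can be as small as one comparison against what is normal. The sketch below is an illustration only: the function name and the `max_ratio` tolerance (how many times the typical value you accept before a human looks) are assumptions, not a standard.

```python
def sanity_check_hourly_orders(reported, typical_per_hour, max_ratio=10):
    """Return (ok, message). Flags values that are negative or far above normal."""
    if reported < 0:
        return False, f"{reported} orders is impossible (negative)"
    if reported > typical_per_hour * max_ratio:
        return False, (
            f"{reported} orders is more than {max_ratio}x the typical "
            f"{typical_per_hour} per hour; hold the report for review"
        )
    return True, "plausible"

print(sanity_check_hourly_orders(50_000, typical_per_hour=20))
print(sanity_check_hourly_orders(35, typical_per_hour=20))
```

The first call is rejected, the second passes. The check does not prove the number is right, only that it is not absurd.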
Weak command vs. command with golden words
dry run
running through the steps without any real effects
What it means: You run the action, but nothing actually happens. The agent shows you what it would do, so you can verify before you press "do it for real". It is the most important safety net when the action has consequences (sending emails, changing prices, deleting customers).
Before a pilot takes off, they go through an entire checklist without touching the real controls: "check the throttle... ok, but don't move... check flaps... ok, but don't move". That is a dry run. Then, once they are satisfied that everything is ready, they start doing things for real. Likewise, before sending 5,000 emails to your customers, you want to see the list of 5,000 without anything being sent.
When to use it
Mandatory for: mass email, bulk price changes, data deletions, large system syncs, any batch operation. Recommended for: new scripts you're running for the first time.
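In code, a dry run is usually just a flag that defaults to the safe side. Here is a minimal sketch, assuming a hypothetical `send_email` function supplied by the caller; `fake_send` and the addresses are placeholders for a real email provider and a real list.

```python
def send_campaign(recipients, subject, send_email, dry_run=True):
    """Send `subject` to every recipient, or only report what would happen.

    dry_run defaults to True, so nothing is sent unless you ask for it.
    """
    plan = [f"would send '{subject}' to {address}" for address in recipients]
    if dry_run:
        return {"sent": 0, "plan": plan}
    for address in recipients:
        send_email(address, subject)
    return {"sent": len(recipients), "plan": plan}

outbox = []  # stands in for a real email provider

def fake_send(address, subject):
    outbox.append((address, subject))

customers = ["ana@example.com", "dan@example.com"]
preview = send_campaign(customers, "Spring sale", fake_send)  # nothing is sent
print(preview["plan"])
```

Only after you have read the plan do you call it again with `dry_run=False`.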
Weak command vs. command with golden words
★ Tip
Combine dry run with sanity check. Dry run shows you what would happen, sanity check verifies whether it makes sense. Together, they are almost impossible to fool.
smoke test
the term comes from electronics
What it means: The smallest possible test that confirms the system isn't completely broken. The name comes from engineers who, after repairing a device, would power it on for the first time while standing back: if it smoked, something fundamental was wrong.
When you start your car in the morning, before going on a family road trip, you listen to the engine for 5 seconds. If you hear something weird, you stop and check. If it sounds normal, you go. You don't do a full service every morning, just that short smoke test. Likewise, after any change to your system, run 2 or 3 key actions to confirm that nothing fundamental is broken.
When to use it
After every change, however small. After a version update. After rotating a token. After a rebrand. After a server move. Smoke test = the one bit of assurance that the change didn't kill something critical.
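A smoke test can be sketched as a short list of yes/no checks. The three checks below are hypothetical stand-ins for "2 or 3 key actions" in your own system; the last one fails on purpose to show what a failure looks like.

```python
def run_smoke_test(checks):
    """Run each (name, check) pair; a check passes if it returns True.

    An exception counts as a failure, so one broken check cannot hide others.
    """
    results = {}
    for name, check in checks:
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results

def homepage_loads():
    return True

def can_read_products():
    return len(["mug", "t-shirt"]) > 0

def checkout_responds():
    raise ConnectionError("checkout service unreachable")

results = run_smoke_test([
    ("homepage loads", homepage_loads),
    ("products readable", can_read_products),
    ("checkout responds", checkout_responds),
])
print(results)
```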
Weak command vs. command with golden words
Chapter 1 recap
| Word | In a sentence | When |
|---|---|---|
| evals | Automated tests that check the agent | Always, when building or changing |
| ground truth | The correct answer to compare against | In every eval |
| sanity check | Quick "does this make sense?" | Reports, transactions, updates |
| dry run | Run with no real effects | Mass email, deletions, big changes |
| smoke test | Minimal "nothing is broken" check | After any change |
◆ Combined command you can copy
"Build [X]. Before you say it's done: write evals with happy path and edge cases, define ground truth for each test, add sanity checks on the key numbers, do a dry run if the action has real effects, and run a smoke test at the end."
"evals", "ground truth", "sanity check", done automatically
If you already use Claude Code, there is a product that does exactly this: it verifies claims before writing, stops you from closing a session without saving, keeps an audit of decisions. All the words from this chapter, installed with one command.
See RobOS →
How the agent thinks
The words in this chapter don't ask the agent to do anything different; they ask it to think differently before acting. That is the difference between an agent that throws out the first idea that comes to mind and one that actually reasons.
think step by step
literally what it says
What it means: You tell the agent to break its thinking into explicit steps, instead of "jumping" to the answer. It sounds trivial, but it is one of the most studied techniques in AI: agents make significantly fewer mistakes when forced to think step by step.
In school, when the teacher asked you to solve a math problem, they said: "show me the steps, not just the answer". Not because the steps were interesting, but because writing the steps forced you to think correctly. If you only wrote the answer, you were guessing. AI agents have the same weakness and the same fix.
When to use it
On any request that involves a complex decision, a calculation, a comparison, or analyzing a situation. Not on simple tasks ("send email X"), where it just adds noise.
first principles
reasoning from fundamentals, not from analogies
What it means: Start from the fundamental basics of the problem, not from "how it's usually done". It is the opposite of imitation thinking. Ask the agent not to simply copy what it has seen in other projects, but to ask "what is actually needed here?"
Everyone built rockets by reusing NASA's old design. Someone asked from first principles: "why is a rocket expensive? What is it made of? How much do the raw materials cost?" They discovered that materials were 2% of the price, the rest was labor and the loss of the rocket after launch. That is how reusable rockets were born. The same kind of thinking can find simple answers to apparently complex business problems.
When to use it
When you suspect that the agent will give you a "standard" solution that doesn't fit your case. Or when you want to optimize costs / processes and everyone else does "things this way because that's how it's always been done".
chain of thought
the full reasoning chain
What it means: A more detailed version of "think step by step". You ask not just for the steps, but for the complete reasoning: why you picked step A over B, what you considered and rejected, what assumptions you made. Used correctly, it lets you "audit" the agent's thinking and catch flawed reasoning.
Compare a consultant who says "I recommend you invest in Meta ads" with one who says "I analyzed 3 options: Meta, Google, and TikTok. I picked Meta because your target audience is 35 to 55 years old (70% of them are there), the budget is small (Meta works well under $1,000 per month), and you already have quality visual content." The second is chain of thought. You can argue with it if you spot a flaw. With the first, you can't.
When to use it
For strategic decisions. For cases where the agent gives you a recommendation and you want to understand what it is based on, not just what the answer is.
red team
military term: the "red team" that simulates the enemy attack
What it means: You ask the agent to attack its own solution. To put itself in the role of a critic, hacker, or unhappy customer and find every way the solution could break. It is the exact opposite of what an agent does naturally (justify what it built).
In the military, when planning a mission, two teams are formed: one plans the attack ("blue team"), the other plays the enemy trying to stop it ("red team"). That is how they find weak points before they become casualties. Likewise, once the agent builds you a solution, have it take on the role of "a customer trying to break it".
When to use it
On critical solutions (anything involving money, security, customer communication). On new code before it goes live. On campaign plans before you sign off. On anything that, if it breaks, costs you dearly.
steel-man
the opposite of "straw-man", build the strongest version of the opposing argument
What it means: Before you make a decision, you ask the agent to build the strongest argument against what you want to do. Not a caricature of the opposition, but the smartest, most serious version. Only then do you know whether your decision actually holds up, or whether it is just enthusiasm.
Before buying a house you love, ask a smart friend to tell you the best reasons NOT to buy it, not jokes, not surface-level criticism, but the real risks they see. If you survive their steel-man, your decision is solid. If it shakes you, you've learned something important before you wire the money.
When to use it
On big decisions: launching a new product, changing strategy, signing a major contract, hiring. Steel-man protects you from your own enthusiasm.
Chapter 2 recap
| Word | In a sentence | When |
|---|---|---|
| think step by step | Force explicit steps | Complex decisions, calculations |
| first principles | From fundamentals, not copying | Optimizations, non-standard solutions |
| chain of thought | Full, auditable reasoning | Strategic decisions, recommendations |
| red team | Attack your own solution | Before going live in production |
| steel-man | Strongest counter-argument | Big decisions |
Anti-guess: force rigor
The biggest problem with AI agents is not that they don't know things, it's that they don't know what they don't know. The words in this chapter are the tools you use to force them to admit uncertainty and verify before asserting.
don't guess, verify
literally what it says
What it means: Probably the single most powerful instruction in this guide. You tell the agent explicitly: do not invent. If you don't know for sure, search, ask, verify in the docs. Better to admit you don't know than to confidently say something wrong.
The difference between a good doctor and a bad one: the bad doctor gives you a diagnosis on the spot, to look sure of himself. The good doctor says "there are 3 possibilities; we need to run these tests to be sure". You want your AI agent to be the second one, not the first. "Don't guess, verify" is exactly the command that turns one into the other.
When to use it
Almost always. Add it at the end of any command involving facts, concrete data, or actions with consequences. It costs three words and probably eliminates 50% of agent "hallucinations".
⚠ Uncomfortable truth
AI agents are trained to be helpful. That means if they don't know an answer, they will invent a plausible one instead of saying "I don't know". "Don't guess, verify" is the direct antidote.
cite your sources
literally what it says
What it means: For every factual claim, you require the concrete source. A link, a file name, a document, a section in the docs. It forces the agent to ground its answers in something verifiable, not in something that just sounds right.
In school, when you wrote a paper, the teacher required a bibliography. Not because they cared about the list of books, but because the simple obligation to cite forced you not to make up facts. Same with AI agents.
When to use it
On research, technical comparisons, supplier recommendations, statistics, any claim of the form "X% of customers...", "studies show that...", "best practice is to...".
show your work
show the calculations and reasoning
What it means: For any calculation, decision, or result, you want to see how it got there, not just the final answer. It differs from "chain of thought", which asks for the full reasoning behind a choice: "show your work" is more granular and covers the concrete steps, calculations, and formulas.
When the accountant gives you the final profit number, you want to see how they got there: what revenue, what costs, what taxes. If they only give you the number, you can't verify it. Same with the agent: if it says "the campaign generated $47,000", you want to see where the 47,000 comes from, how many sales, through what channels, over what period.
When to use it
On reports with numbers. On decisions based on calculations. On any result that will drive a significant action. Not on creative or conversational tasks.
acceptance criteria
what must be true for the task to count as done
What it means: Before the agent starts, you define together a concrete list of conditions that, if all met, mean the task is done. It is like a contract: "if you do A, B, and C, we are done. If not, we are not."
When you hire a painter, the contract doesn't say "paint well". It says concrete things: "two coats of paint, smooth surfaces, no drips, straight edges". Those are acceptance criteria. Without them, "well" means anything. With them, you have a basis to say "wait, you're not finished yet".
When to use it
At the start of every non-trivial task. Ask the agent to propose acceptance criteria, you confirm them, then let it run. This alone eliminates 80% of "I thought you wanted something else" situations.
definition of done
the definition of "done"
What it means: A stricter cousin of acceptance criteria. A universal quality standard that any deliverable has to hit before being declared "done". Acceptance criteria are specific to a task; definition of done applies to everything you deliver.
In good restaurants, before any plate leaves the kitchen, the chef inspects it: right temperature, clean presentation, fresh ingredients. That is the definition of done for anything leaving the kitchen. It doesn't matter if it is soup or steak, everything goes through that filter.
Example of definition of done
- ✓ All tests (evals) pass
- ✓ Code is commented in complex areas
- ✓ User-facing documentation exists
- ✓ Sanity check has been done on results
- ✓ A smoke test has been run
- ✓ Errors are logged (observability)
- ✓ Tokens are in secrets, not in code
- ✓ Re-running with the same inputs gives the same result, with no duplicate side effects (idempotent)
★ Tip
Print your definition of done and put it somewhere visible. Then, on any major command to an agent, just write: "follow the standard definition of done". The agent will walk through each point.
Chapter 3 recap
| Word | In a sentence | When |
|---|---|---|
| don't guess, verify | Don't invent, search, verify | Almost always |
| cite your sources | Sources for every claim | Research, statistics, "best practices" |
| show your work | Show the calculations, not just the result | Reports with numbers, numeric decisions |
| acceptance criteria | List of conditions for "done" | At the start of every task |
| definition of done | Universal standard for all deliverables | Defined once, applied everywhere |
"Don't guess, verify", applied continuously, without you having to write it
The most powerful instruction in this guide becomes useless if you have to repeat it on every command. There is a product built on exactly this philosophy: it says "I don't know" instead of inventing, keeps a decisions journal you can audit 6 months later, and refuses to improvise when it has no basis.
See RobOS →
Code and architecture
Now we step into "construction" territory, how what the agent delivers is built, so it is solid and doesn't break in 3 months. These words are not just for programmers. With them, even if you don't see the code, you can demand it meet quality standards.
separation of concerns
each piece does one thing
What it means: Your code should not be a "tangle" where everything depends on everything else. Each piece should have one clear responsibility. The piece that sends emails should not be the same one that calculates discounts.
In a good kitchen there are separate stations: one for cutting, one for sauces, one for cooking. If a new chef wants to change the sauce, they only touch the sauce station, they don't touch the cutting knives. Likewise, if you want to change how you send emails, you shouldn't have to dig through the code that calculates discounts.
single source of truth
SSOT for short
What it means: For any important piece of information, there is one single place where it "lives". If you want to change it, you change it in that single place and the rest of the system updates itself. The opposite is chaos: the same price written in 5 different places.
In an office, if every employee keeps their own price list, in two weeks you have 5 different lists and customers get 5 different quotes. If there is one official list, everyone looks at it. That is single source of truth.
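The "one official list" idea looks like this in a minimal Python sketch. The product names and prices are invented for illustration; the point is that the invoice and the website both read from `PRICES` instead of keeping their own copies.

```python
# The one official price list. Everything else reads from here.
PRICES = {"mug": 12.00, "t-shirt": 25.00}

def invoice_line(product, quantity):
    return f"{quantity} x {product}: {PRICES[product] * quantity:.2f}"

def website_label(product):
    return f"{product} ({PRICES[product]:.2f})"

PRICES["mug"] = 14.00  # one change, in one place
print(invoice_line("mug", 2))   # 2 x mug: 28.00
print(website_label("mug"))     # mug (14.00)
```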
idempotent
math term: an operation that, repeated, yields the same result
What it means: An action is idempotent if you can run it 10 times and the result is the same as after the first run. Crucial for any action with consequences (sending emails, payments, stock changes), because in the real world, something will get triggered twice by mistake.
Pressing "Floor 5" in an elevator is idempotent. Press it 10 times, the elevator still takes you to floor 5, not to 50. Pressing "Place order" on a website is not idempotent, if the customer clicks 3 times, they get 3 orders and 3 invoices. You want your system's actions to behave like the elevator button, not the "Place order" button.
When to use it
Always. For any integration. Any script that sends emails. Any sync. Any action triggered by a webhook.
★ Top 3 reasons "idempotent" saves your business
1. Shopify can deliver the same webhook more than once (their documentation warns about duplicate deliveries).
2. If your script crashes halfway, you will want to re-run it, and not have it send things it already sent.
3. Customers click buttons 2 to 3 times. Idempotency saves them from duplicates.
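The first reason in the list above, the same webhook arriving several times, can be sketched like this. The class and the order ID are hypothetical, and the in-memory `set` stands in for what would be a database table in a real system.

```python
class OrderEmailer:
    """Sends one confirmation per order, however many times it is triggered."""

    def __init__(self):
        self.already_confirmed = set()  # in production: a database table
        self.sent = []                  # stands in for a real email provider

    def handle_order_webhook(self, order_id):
        if order_id in self.already_confirmed:
            return "skipped (already confirmed)"
        self.already_confirmed.add(order_id)
        self.sent.append(f"confirmation for order {order_id}")
        return "sent"

emailer = OrderEmailer()
# The same webhook arrives three times.
outcomes = [emailer.handle_order_webhook("A-1001") for _ in range(3)]
print(outcomes, len(emailer.sent))
```

Three triggers, one email. This is the elevator button: pressing again changes nothing.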
observability
you can see what the system is doing while it runs
What it means: Your system "talks" about what it is doing, via logs, metrics, alerts, so that when something breaks you can investigate quickly. Without observability, a system that stops working is a black box.
Your car has observability: tachometer, fuel gauge, "check engine" light, tire pressure sensor. If something breaks, you immediately see what. Now imagine a car with no dashboard, just the steering wheel. That is what systems without observability look like. You drive until it stops suddenly, with no idea why.
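A dashboard for code starts with two things: log lines that say what happened, and counters you can watch. The sketch below uses Python's standard `logging` module; the `sync_order` function, the order data, and the `metrics` dictionary are illustrative assumptions, not a real monitoring setup.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("order_sync")

metrics = {"orders_synced": 0, "orders_failed": 0}  # stand-in for real metrics

def sync_order(order):
    """Hypothetical sync step that logs what it does and counts outcomes."""
    try:
        if order.get("total") is None:
            raise ValueError("order has no total")
        metrics["orders_synced"] += 1
        log.info("synced order %s (total %.2f)", order["id"], order["total"])
        return True
    except ValueError as error:
        metrics["orders_failed"] += 1
        log.error("failed to sync order %s: %s", order["id"], error)
        return False

sync_order({"id": "A-1001", "total": 59.90})
sync_order({"id": "A-1002", "total": None})
print(metrics)
```

When something breaks, the log tells you which order and why, and the failure counter is what an alert would watch.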
fail loudly vs fail gracefully
"crash noisily" vs. "fail with grace"
What it means: Two complementary philosophies. Fail loudly: in development, you want errors to be obvious so you can fix them. Fail gracefully: in production, you want errors to be absorbed smoothly (friendly message, automatic retry, fallback), so the customer experience is not ruined.
When you are learning to cook, you want your partner to tell you straight: "too much salt". That is fail loudly, and it is how you learn. When you are serving guests and one slice of meat turns out over-salted, you want it quietly swapped for a better one, without announcing it at the table. That is fail gracefully. Your systems need both, at different moments.
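One way to get both behaviors from the same code is to switch on the environment. This is a minimal sketch under assumptions: the `environment` setting, the function name, and the "no discount" fallback are all hypothetical choices for illustration.

```python
def get_discount(customer, discounts, environment):
    """Look up a discount; react to a missing entry according to environment.

    `environment` is an assumed setting: "development" or "production".
    """
    try:
        return discounts[customer]
    except KeyError:
        if environment == "development":
            raise  # fail loudly: surface the bug immediately
        # Fail gracefully: fall back to no discount so checkout still works.
        # A real system would also log this, so the error is not lost.
        return 0.0

discounts = {"ana": 0.10}
print(get_discount("dan", discounts, environment="production"))  # 0.0
```

In development the same call raises `KeyError`, which is exactly what you want while you can still fix it.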
DRY
acronym: "Don't Repeat Yourself"
What it means: If you notice the same logic / code / configuration showing up in 3 or 4 places, it is time to centralize it. Repetition leads to bugs: you change it in one place, forget another, and the system becomes inconsistent.
If you write your company address separately on every invoice, contract, brochure, and website, when you move you have 50 places to update and you will surely miss one. Better to have one "official source" and have the rest pull from it.
YAGNI
acronym: "You Aren't Gonna Need It"
What it means: The antidote to the urge to build features "for the future". Build only what you need now. Add the rest when you actually need it.
When you buy a house, you don't build 6 bedrooms because "maybe someday we'll have 4 kids and 2 nannies". You build the bedrooms you need now. Same with AI agents: without YAGNI, you end up with a system with 30 features, of which you use 5.
Chapter 4 recap
| Word | In a sentence | When |
|---|---|---|
| separation of concerns | Each piece, one thing | Larger projects |
| single source of truth | One place per piece of info | Tokens, configurations |
| idempotent | Repeated run = OK | Integrations, webhooks, payments |
| observability | Logs, metrics, alerts | Everything running in production |
| fail loudly / gracefully | Noise in dev, grace in prod | Error handling |
| DRY | Don't repeat yourself | Refactoring, reviews |
| YAGNI | Only what you need now | Anti over-engineering |
Integrations
(Shopify, Klaviyo, WhatsApp)
Almost every modern business depends on "integrations", the bits of code that connect two external systems. Integrations are the most fragile type of code in the world. The words here are the tools you use to build integrations that don't break at the first storm.
rate limiting
you cap how many requests you send per second
What it means: Every external API (Shopify, Klaviyo, WhatsApp) has a clear cap: "you can't send more than X requests per second". If you exceed it, they block you. Rate limiting is the code that makes sure you don't exceed the cap.
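On the highway there is a speed limit. You can go faster, but the police will fine you. APIs publish their limits in the same way. Shopify's REST Admin API, for example, has documented a bucket of 40 requests that refills at about 2 requests per second on standard plans (check the current documentation for your plan). If your script sends 100 in one second, Shopify "fines" you: it rejects the extra requests with a "429 Too Many Requests" error. Rate limiting is like cruise control that always keeps you under the limit.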
When to use it
In every integration. Not knowing this is one of the main causes of "scripts that worked perfectly for 30 days and then suddenly broke".
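The "cruise control" can be sketched as a small class that spaces out calls. This is one simple approach (a fixed minimum interval between calls), not the only one, and the limit of 2 per second is an assumed value you would replace with the official limit of the API you use. `call_api` is a hypothetical placeholder for a real request.

```python
import time

class RateLimiter:
    """Spaces calls so that at most `max_per_second` start each second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = max(self.clock(), self.next_allowed)
        self.next_allowed = now + self.min_interval

limiter = RateLimiter(max_per_second=2)  # assumed limit, for illustration

def call_api(request_number):
    limiter.wait()  # blocks just long enough to stay under the cap
    return f"request {request_number} sent"

print(call_api(1))
```

The `clock` and `sleep` parameters exist only so the limiter can be tested without actually waiting.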
retry with exponential backoff
retry with increasing wait times
What it means: When a request fails, you don't give up immediately. You retry, but with a wait that grows each time. First time you wait 1 second, then 2, then 4, then 8. That gives the other server time to "breathe".
You call a friend. They don't answer. If you call right back, and again right back, you annoy them and they block you. If you call once now, again in 5 minutes, then 15 minutes, then 1 hour, that is civilized, and they will answer. Servers behave the same way with retries.
★ Tip
Set a "ceiling" on the backoff (60 or 120 seconds). Otherwise, with waits doubling from 1 second, the 10th retry would come about 17 minutes after the first failure (1 + 2 + 4 + ... + 512 = 1,023 seconds), which usually no longer makes sense.
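Here is a minimal sketch of the doubling waits with a ceiling. The function names and defaults (`base=1.0`, `cap=60.0`, 5 retries) are illustrative assumptions; production clients often also add a small random delay ("jitter"), which is left out here to keep the arithmetic visible.

```python
import time

def backoff_delays(max_retries, base=1.0, cap=60.0):
    """Waits before each retry: base, 2*base, 4*base, ... never above cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]

def call_with_retry(action, max_retries=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Run `action`; on failure wait and retry, up to `max_retries` times.

    For simplicity this retries on any exception; real code would retry
    only on temporary errors such as timeouts.
    """
    for delay in backoff_delays(max_retries, base, cap):
        try:
            return action()
        except Exception:
            sleep(delay)
    return action()  # final attempt: let any error reach the caller

print(backoff_delays(5))        # [1.0, 2.0, 4.0, 8.0, 16.0]
print(backoff_delays(10)[-3:])  # [60.0, 60.0, 60.0]
# Without a ceiling, ten waits add up to 1023 seconds, about 17 minutes:
print(sum(backoff_delays(10, cap=float("inf"))))
```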
webhook vs polling
"you are notified" vs. "you keep asking"
What it means: Two opposite ways of learning that something happened in an external system. Polling: you ask the system every 5 minutes. 99% of the time the answer is no. Wasted effort. Webhook: the system notifies you automatically when something happens. Efficient, real-time.
Polling = calling the courier once an hour to ask "has the package arrived?". Webhook = letting the courier ring your doorbell when they are at the door. Which is better? Obvious.
idempotency key
a unique code per request
What it means: A unique code you send with every important request (payment, order, email). If you accidentally send the same request twice, the server sees the same idempotency key and understands "duplicate, already processed", and does not process it again.
When you pay with a card, each transaction has a unique code. If you accidentally pay twice with the same code, the bank refuses the second one. That is an idempotency key. Without it, duplicates are inevitable.
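To make this concrete, here is a small Python sketch. `PaymentServer` is a hypothetical stand-in for the other side of the integration, not a real payment API; the point is only to show what "same key, processed once" looks like.

```python
import uuid


class PaymentServer:
    """Hypothetical stand-in for a server that honors idempotency keys."""

    def __init__(self):
        self.processed = {}  # idempotency key -> result of the first request

    def charge(self, idempotency_key, amount):
        if idempotency_key in self.processed:
            # Duplicate: return the original result, charge nothing new.
            return self.processed[idempotency_key]
        result = {"charge_id": str(uuid.uuid4()), "amount": amount}
        self.processed[idempotency_key] = result
        return result


server = PaymentServer()
key = str(uuid.uuid4())  # generated once per payment, reused on every retry
first = server.charge(key, 49)
second = server.charge(key, 49)  # accidental resend with the same key
print(first == second)
```

The key is created once per payment and reused on every retry. A fresh key on each retry would defeat the purpose.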
pagination
you fetch the data in pages, not all at once
What it means: When you have 10,000 orders in Shopify, you can't ask for all 10,000 at once. The server will refuse or, worse, give you only the first 50, and you lose the rest without noticing. Pagination = you take the orders in "pages" and walk through every page until none are left.
Imagine a 1,000-page book. You don't read all of it at once, you go page by page. If someone asks you for the book's contents and you only hand them the first 5 pages with "that's all of it", you lose 995 pages. That is what scripts without pagination do.
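A minimal Python sketch of walking through every page. The fake API below (120 orders, 50 per page, a `next_cursor` field) is invented for the demo; real APIs name and shape their cursors differently, so check the documentation of the one you use.

```python
def fetch_all(fetch_page):
    """Walk every page until the server says there is no next page."""
    items, cursor = [], None
    while True:
        page = fetch_page(cursor)
        items.extend(page["items"])
        cursor = page["next_cursor"]
        if cursor is None:
            return items


# Fake API for the demo: 120 orders, served 50 per page.
ORDERS = list(range(120))


def fake_fetch_page(cursor, page_size=50):
    start = cursor or 0
    end = start + page_size
    next_cursor = end if end < len(ORDERS) else None
    return {"items": ORDERS[start:end], "next_cursor": next_cursor}


all_orders = fetch_all(fake_fetch_page)
print(len(all_orders), "orders fetched")
```

A script that calls `fake_fetch_page` only once gets 50 orders and no error message, which is exactly the silent loss described above.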
Chapter 5 recap
| Word | In a sentence | When |
|---|---|---|
| rate limiting | Stay under the API limit | Any integration |
| retry with exponential backoff | Retry with growing waits | Any call to an external API |
| webhook vs polling | Get notified, don't keep asking | External notifications |
| idempotency key | Unique code to avoid duplication | Payments, orders, side-effect actions |
| pagination | Fetch data in pages | Bulk syncs |
★ Master command for any integration
"Build the integration with [X]. Apply: rate limiting per official limits, retry with exponential backoff (max 5), idempotency key per request, pagination for bulk data, webhooks instead of polling where possible. Add full observability. Write evals."
One sentence = a professional integration.
Rate limiting, retry, observability, pre-installed
All these patterns are already built and running on a VPS at the end of the course. You don't spend a month planning the architecture, hunting libraries, or debugging rate limits. Everything runs from day 1. 4 live AI agents, a visual control panel, 24/7 monitoring. You leave with the system, not a list of tutorials.
See Agent Factory →
Security for business
Security covers everything that can turn from a minor technical detail into a legal, financial, or reputational disaster. These 4 words are the minimum any business handling customer data should demand.
least privilege
each entity has exactly what it needs
What it means: When you grant access to a token or user, you give them only the permissions they need. Nothing more. If your script only reads orders from Shopify, its token must not have permission to delete orders.
When you hire a cleaner, you give them the key to the office. You don't also give them the safe key, bank account access, and your home Wi-Fi password. They have what they need for the job, nothing extra. If someone steals their key, the damage is limited. That is least privilege.
secrets management
handling secrets safely
What it means: API tokens, passwords, private keys, all are "secrets". They are never written directly into code, never sent over email, never committed to Git. They live in a dedicated place (environment variables, vault).
The key to the safe doesn't sit on the reception desk. And you don't print it on the company brochure "for transparency". It lives in a dedicated place, with limited and logged access. Same with API tokens: they are the keys to your digital business.
⚠ Quick test
Ask the agent: "If someone had access to our code repo, could they steal the tokens?". If the answer is yes, you have an urgent problem.
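In practice, "tokens in secrets, not in code" can look like this Python sketch. The function `load_token` and the variable name `SHOP_API_TOKEN` are hypothetical examples.

```python
import os


def load_token(name):
    """Read a secret from the environment; fail loudly if it is missing."""
    token = os.environ.get(name)
    if not token:
        raise RuntimeError(f"Missing secret: set the {name} environment variable")
    return token
```

The code only knows the name of the secret, for example `load_token("SHOP_API_TOKEN")`. The value lives outside the repo, so someone who clones the code gets nothing they can use.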
PII
acronym: "Personally Identifiable Information"
What it means: Any information that lets you identify a real person: name, email, phone, address, national ID, IP address, photo. In the EU (and the UK, with similar rules), PII is protected by GDPR. There are strict legal obligations about how you store it, who has access, and how long you keep it. The US has CCPA and other state-level laws with overlapping requirements.
Your customers' data is like their ID cards. If they handed you their ID at your desk, you wouldn't photocopy it, send it on WhatsApp to friends, or post it in the storefront window. The same rigor applies to digital data.
audit log
who did what, when
What it means: A journal of every important action: who did it, when, on what, what changed. It is different from technical logs, which record what the system did (requests, errors). The audit log shows human and agent activity (who deleted a customer, who changed a price).
In accounting, every change is recorded: who made the correction, when, why. If 6 months later someone asks "why is this number this way?", you can reconstruct the history. Without an audit log, any issue stays unresolved.
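A minimal Python sketch of such a journal, written as one JSON line per action. The field names and the file-based storage are assumptions chosen for illustration; a production system would more likely use a database with restricted write access.

```python
import json
from datetime import datetime, timezone


def audit_entry(actor, action, target, before=None, after=None):
    """Build one audit record: who did what, when, on what, what changed."""
    return {
        "when": datetime.now(timezone.utc).isoformat(),
        "who": actor,
        "action": action,
        "target": target,
        "before": before,
        "after": after,
    }


def append_audit(path, entry):
    """Append-only: entries are added as new lines, never edited or deleted."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```

An entry such as `audit_entry("user_17", "change_price", "product_204", before=19.0, after=24.0)` answers the "why is this number this way?" question six months later. Note that it uses an internal ID for the person, not their name or email.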
Chapter 6 recap
| Word | In a sentence | When |
|---|---|---|
| least privilege | Only strictly necessary permissions | Every new token |
| secrets management | Tokens in secrets, NOT in code | Always, no exceptions |
| PII | Personal data (GDPR / CCPA) | Any project handling customer data |
| audit log | Journal: who, what, when | Irreversible changes |
Secrets management, audit log, VPN, from day 0
The 4 words in this chapter are the difference between an AI system that grows your business and one that puts it at risk. On the VPS built at Agent Factory, the token vault, encrypted VPN access, audit log, and privilege separation are configured before you connect for the first time. Security is not something you "add later".
See Agent Factory →
Product and decisions
The last chapter is different. Here we don't talk about how to build, but what to build. The words help you make better choices and guide the agent away from building things that shouldn't be built.
MVP
acronym: "Minimum Viable Product"
What it means: The simplest possible version that still solves the real problem. Not the best version: the minimum one that still works. You build the MVP in 2 weeks, test it with real users, learn, then improve.
You want to see whether people would buy a truffle tart. Two options: (A) Invest $5,500 in equipment, buy expensive truffles, rent a space, open a bakery in 3 months. (B) Bake 5 tarts tomorrow, take them to a weekend market, charge $7 each. If people buy them, invest. If not, you lost $15. B is the MVP.
happy path first
build the ideal case first
What it means: When you build something, start with the case where everything goes perfectly. Only after the happy path works, start handling edge cases (customer abandons mid-checkout, payment is declined, internet drops).
When you build a house, you first put up straight walls and a roof. Then you think "but what if a magnitude 7 earthquake hits?" and add an anti-seismic system. You don't pour the anti-seismic foundation for a house that doesn't exist yet.
north star metric
the one metric that actually matters
What it means: The single metric your team will not compromise on. If you have 50 metrics on a dashboard, in practice you focus on none. The north star is the one that, if it grows, you know your business is healthy.
For an online store selling household goods, "monthly sales" sounds like the north star. But maybe the real north star is "customers who buy a second time within 90 days", because it reflects quality, not just quantity.
feedback loop
how fast you learn whether something worked
What it means: The time between when you take an action and when you find out the result. Short feedback loops = you learn fast, correct fast. Long feedback loops = you operate blind.
When you learn to cook, you have a 30-minute feedback loop (you cook, taste, adjust). When you learn to make wine, the loop is a year. That is why most people cook well and very few make good wine. You want short feedback loops: launch fast, measure fast, adjust fast.
Chapter 7 recap
| Word | In a sentence | When |
|---|---|---|
| MVP | Minimum version that solves the problem | Start of a new project |
| happy path first | Ideal case first, edge cases later | Start of any build |
| north star metric | The only metric that matters | Dashboards, strategy |
| feedback loop | Speed of learning if it worked | Launches, experiments |
Top 5: put these on the wall
If you take only 5 words from the entire guide, pick these. We selected them by one criterion: impact per word.
Recommendation: print this page and put it next to your monitor. For the first 2 weeks, glance at it before every long command. After 2 weeks, it becomes reflex.
don't guess, verify
Kills guessing
Added at the end of any factual command, it cuts ~50% of agent "hallucinations". Costs 3 words, saves hours of manual checking.
evals
Kills "hope it works"
Forces the agent to verify its own work before declaring "done". With evals, you find bugs at delivery. Without evals, your customers find them.
idempotent
Kills duplicates
Duplicate orders, emails sent twice, double-charged payments, all come from non-idempotent functions. One word saves you a year of complaints.
acceptance criteria
Kills "I thought you wanted something else"
Defined at the start, they save you from 80% of cases where the agent delivers something off-target. They are the written contract with the agent.
observability
Kills blind debugging
When something breaks (and it will), observability tells you why in 5 minutes, not 5 days. The difference between "panic" and "operational calm".
★ How to use them together
One command combining all five:
"Build [X]. First define acceptance criteria, I confirm, then you start. Make it idempotent. Add observability. Write evals. Don't guess, verify, search the official docs if you're not sure."
One sentence = your product, delivered properly.
Command templates
These templates are written in plain English with the golden words highlighted. Copy them directly and replace only the bits [in brackets].
Template 01, Build a new integration
When you connect two systems (Shopify ↔ Klaviyo, etc.)
- separation of concerns: independent components for each system
- idempotent: repeated run = same result
- rate limiting per official documentation of each API
- retry with exponential backoff (max 5 attempts)
- idempotency key per request
- pagination for bulk data
- webhooks instead of polling where possible
- secrets management: tokens in .env, not in code
- observability: log every call and every error
Template 02, Automated reporting
When you want a weekly / monthly business report
- First identify the north star metric and put it large, at the top
- 3 to 5 supporting metrics under it that explain the north star
- For every figure: show your work, data source, period, formula
- Sanity check before sending: if a figure falls out of range, stop
- Don't log PII, use internal IDs
- Idempotent: if it runs twice, don't send twice
Template 03, Marketing campaign
Mass email, mass SMS, push notifications
- Dry run required: show me how many recipients, to what segments, 3 example messages
- Sanity check: if the count is ±30% off from the average, stop
- Idempotency key per recipient: if the script runs twice, each person gets a single message
- Rate limiting per provider limits
- Respect GDPR and PII
- Audit log: who approved the send, when
Template 04, Launch a new feature
New functionality on site / app / internal system
- Evals on the key scenarios
- Red team the feature: what could break, how could it be broken
- Smoke test in staging before production
- Set a short feedback loop: measure daily for the first 2 weeks
- Define the threshold at which we stop if it isn't working
Template 05, Debugging when something breaks
System failing, unknown error, weird behavior
- Think step by step: what data do we receive? what do we process? what do we deliver? where does the chain break?
- Don't guess: don't infer the cause from memory. Check observability (logs)
- Cite your sources: for every hypothesis, point to the log line / network request
- Show your work: walk me through your reasoning
- Once you find the cause, propose a minimal fix + an eval that catches this issue if it returns
Template 06, Modify a running system
Change to code / configuration already in production
- Tell me the chain of thought: what exactly you'll change, what it could affect
- Red team your own change: what could break?
- Ensure it preserves backward compatibility
- Smoke test: run the key actions
- Run existing evals as regression
- Log the change in the audit log
Common mistakes
In the first months of using "golden words", almost everyone makes the same 8 mistakes. We've listed them so you can spot them early.
Over-stuffing with technical terms
You put all 40 words into a single command, thinking "more is better". Result: the agent gets confused, tries to satisfy every criterion, and satisfies none of them well.
Fix: Use 3 to 7 relevant words per command. For simple tasks, 1 to 3. The 5 from the Top 5 + 1 or 2 specific to the context, that's it.
Using words you don't understand
You slip "idempotency key" into a command because it sounds good, but you don't know what it means. The agent asks a clarifying question and you're stuck.
Fix: Use only words you can explain in two sentences. The list you're sure of will grow naturally over time.
Missing acceptance criteria
You rush and skip the "define what 'done' means" step. Then the agent delivers something that is technically correct but doesn't solve your real problem. You start over.
Fix: 5 minutes invested in acceptance criteria at the start save 5 hours of rework later. No non-trivial task without them.
Evals without ground truth
You ask for "evals" and get 10 tests that all pass. In reality, the agent checked its own assumptions against its own assumptions. Useless. Evals without human-verified ground truth are theater.
Fix: For every important eval, you have a correct answer verified manually. That is the ground truth.
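Here is what that can look like as a short Python sketch. The function `order_total` and the three cases are hypothetical; what matters is that each expected value was worked out by a person, not produced by the code being tested.

```python
def run_evals(function, ground_truth):
    """Compare `function` against answers a human verified by hand."""
    failures = []
    for inputs, expected in ground_truth:
        actual = function(*inputs)
        if actual != expected:
            failures.append((inputs, expected, actual))
    return failures


def order_total(unit_price_cents, quantity):
    """Hypothetical function under test."""
    return unit_price_cents * quantity


# Each expected value was worked out manually, not produced by the code.
GROUND_TRUTH = [
    ((1999, 3), 5997),
    ((500, 0), 0),
    ((250, 4), 1000),
]

failures = run_evals(order_total, GROUND_TRUTH)
print(f"{len(GROUND_TRUTH) - len(failures)} of {len(GROUND_TRUTH)} evals pass")
```

If the agent writes both the function and the expected values, a wrong assumption passes its own test. Manually verified answers break that circle.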
Ignoring sanity checks
You let the report send automatically without a sanity check. One day, a bug makes the report show "$0 in sales last week". You forward that to 15 people on Slack.
Fix: On every automated report or action, define 2 or 3 sanity checks. "Numbers must be between X and Y; otherwise, stop."
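A sanity check can be as small as this Python sketch. The function name and the range used in the example are illustrative assumptions; you set the real range from your own history.

```python
def sanity_check(name, value, low, high):
    """Stop the run if a figure falls outside its expected range."""
    if not (low <= value <= high):
        raise ValueError(
            f"Sanity check failed: {name}={value}, expected {low} to {high}"
        )
    return value
```

Calling `sanity_check("weekly_sales", 0, 5000, 50000)` raises an error instead of letting a "$0 in sales" report go out. The report only sends when every figure passes.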
Webhooks without idempotency
You build a system that reacts to Shopify webhooks. Works perfectly for 2 weeks. Then Shopify retries a webhook (perfectly legitimate) and your system processes the same order twice.
Fix: Every webhook handler is idempotent by design. Use event_id as the deduplication key. This is not optional.
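A minimal Python sketch of that deduplication. The `event_id` field name follows the wording above and is an assumption, since each provider names its event identifier differently; the in-memory set is also a simplification.

```python
processed_event_ids = set()  # in production this would live in a database


def handle_webhook(event, process):
    """Process each event once, even if the sender delivers it several times."""
    event_id = event["event_id"]
    if event_id in processed_event_ids:
        return "duplicate ignored"
    process(event)
    processed_event_ids.add(event_id)
    return "processed"
```

When the same event arrives a second time, the handler answers politely and does nothing, so the order is not processed twice.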
Tokens in code or in Git
You put the Shopify token directly in config.py "to move fast". Later, someone clones the repo and you have 200 bots trying to log in. Or you accidentally commit to public Git and a scanner finds it in 30 minutes.
Fix: Never tokens directly in code. Never .env in Git. Use secrets management from day one.
Confusing "evals" with "does it work?"
You ask "does it work?" and the agent says "yes, it works". You assume it has verified. It hasn't; it just estimated that it probably works. "Does it work?" is not a verifiable question; "do the evals pass?" is.
Fix: Don't accept vague answers. Always demand concrete evidence: "how many evals pass? which don't? show me the output."
Final cheat sheet
All the golden words at a glance. Print this section and keep it next to your monitor.
Not sure which solution fits you?
See the side-by-side comparison of Claude Code, Agent Factory, and RobOS. No signup, no email.
Now that you know the words...
Vocabulary is the first brick. The rest depends on where you are and what you want to build.
RobOS
The system that runs by itself: 5-layer memory, anti-improvisation verification, multi-workspace. All the words from chapters 1 to 3, automated.
Agent Factory
Build your own agent: dedicated VPS with 4 AI agents publishing daily articles and video. All the patterns from chapters 4 to 6, pre-installed.
Direct conversation
15 minutes on WhatsApp about your business. We figure out together what fits, no sales pressure.