Integrating AI into Your Development Workflow: Part 1 - Getting Started with Cursor and Agents
Author’s Note: I’m a Director of Engineering who’s spent the past year exploring AI coding tools with my team, and personally on nights and weekends. I’m not an AI researcher or a full-time coder these days, but I’ve logged many hours tinkering with these assistants and discussing their pros and cons with developers on our codebase. This guide shares what I’ve learned - the good, the bad, and the practical - for busy developers who want to get up to speed with AI in their workflow.
Note: If you want to get straight into my agent cheat sheet, jump to Part 1 below. This prologue is for those who enjoy a journey.
Part 0: Prologue - My AI Tooling Journey
When GPT-based coding assistants first emerged, I was intrigued but skeptical. Like many of you, I didn’t have time in the day job to figure out which tools actually helped vs. which were hype. So I treated it as a side project: late nights working on my own personal apps and side experiments, testing AI suggestions, and refining prompts. The learning curve was steep, but it prepared me to guide my team as we drove adoption across Jane and watched our development team ramp up on incorporating AI fully into their workflows.
Early on I asked agents for broad changes and got nonsense or half-baked code in return. Over time, I discovered one golden rule: clarity and constraints are everything. When I started giving AI very specific tasks in small chunks, the results improved dramatically. For the most part, you get out of AI what you put in.
I Started with the Shiny Stuff
When I first opened Cursor, I defaulted to the biggest tools: Sonnet, MAX mode, full context dumps. It felt powerful, but it wasn’t sustainable. I had no mental model for which agent to use when, so I burned a lot of credits on overkill models (looking at you, Sonnet) for trivial tasks. I’d often think, “why did I just spend 5 minutes and a bunch of tokens on something a quick search could answer?”
Responses were slow, costs racked up, and results were often overwritten or misaligned. I wasn’t thinking in systems; I was hoping the agent would sort it out for me.
Useful, but unsustainable. It showed me what was possible, and then I started learning what actually worked.
Then I Got Scrappy
I started experimenting with smaller agents: Grok for quick tests, Haiku for tight loops. I learned that the fastest agent that could mostly do the job often outperformed the smartest one, because I could iterate faster. I wasn’t waiting around. I was doing more passes, catching errors sooner.
I also started using ChatGPT differently. Not to solve the problem, but to help me frame it. I’d dump messy thoughts into a prompt, and let it help me structure a task list or turn raw context into a clean ask for Cursor. It became a great way to fill in templates or suggest which agent I should be delegating to next.
I Started Thinking in Prompts
The breakthrough wasn’t model selection; it was learning to think like the AI. I began to preempt failure modes: giving tighter constraints, adding context explicitly, anticipating what the agent might miss, and highlighting areas that I didn’t want it touching.
I wrote prompt templates. I rewrote them. I tested their failure points. I figured out where being vague would cost me three back-and-forths and 10x the tokens.
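For illustration, here’s a minimal sketch of what I mean by a prompt template, written as a small Python helper. Every name in it (the fields, the example task, the file names) is made up for this example - the point is the shape: explicit context, tight constraints, and a clear list of things the agent must not touch.

```python
# A hypothetical prompt template: the structure matters more than the wording.
# Field names and the example task below are assumptions for illustration,
# not a Cursor or Anthropic API.

PROMPT_TEMPLATE = """\
Task: {task}

Context:
{context}

Constraints:
{constraints}

Do NOT touch:
{do_not_touch}

Output: only the changed code, no commentary."""


def _bullets(items: list[str]) -> str:
    """Render a list of points as markdown-style bullets."""
    return "\n".join(f"- {item}" for item in items)


def build_prompt(task: str, context: list[str],
                 constraints: list[str], do_not_touch: list[str]) -> str:
    """Assemble a tightly scoped prompt from explicit pieces."""
    return PROMPT_TEMPLATE.format(
        task=task,
        context=_bullets(context),
        constraints=_bullets(constraints),
        do_not_touch=_bullets(do_not_touch),
    )


if __name__ == "__main__":
    # Example usage with made-up file names and task.
    print(build_prompt(
        task="Add input validation to the signup form handler",
        context=["Handler lives in signup_handler.py",
                 "Existing tests are in test_signup.py"],
        constraints=["Keep the public function signature unchanged",
                     "No new dependencies"],
        do_not_touch=["The email-sending logic",
                      "Any file outside the signup module"],
    ))
```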
Eventually, I started to feel like I wasn’t just using AI; I was teaching it how to work with me.
And Then I Started Replacing Steps
There were things I always put off - tests, test harnesses, one-off migrations, documentation stubs. AI made those easier. I could describe what I wanted, review the output, and revise instead of starting from scratch. That lowered friction for things I would have deprioritized or skipped.
I also found it useful for planning: when I didn’t know where to start, I could brainstorm with Sonnet or Codex and then pick what to implement myself vs. hand off. It felt like pair programming with a prompt-driven peer.
What It’s Not
This isn’t about automating development. The AI doesn’t own the code. It doesn’t understand context like a human does. It can hallucinate, miss edge cases, or silently break logic - but then, so do I (and every developer I’ve ever worked with).
But as a thinking and building partner, it’s earned its place in my workflow, especially when scoped well and reviewed rigorously.
Final Thought
If you’re new to this, you don’t need to master it all at once. Start with one agent. One task. One prompt. The payoff isn’t in a single interaction; it’s in building a repeatable pattern that saves you time and reduces friction over the long haul.
That’s been the real learning curve, and it likely won’t end any time soon.
Above all though, working with AI is FUN! I haven’t enjoyed developing actual code this much in… maybe ever? It’s really nice to be getting my hands dirty again, focusing on the best parts of the process, and letting AI do the heavy lifting for the rest.
This is also my advice to leaders who are encouraging their teams to use more AI in their approaches - find out what they enjoy the least in their day-to-day work, and start from there.
Part 1: A Guide to Agents - Choosing the Right AI Model in Cursor (and ChatGPT)
One of the first things you’ll encounter in Cursor is a menu of agents to choose from: Grok, Haiku, Sonnet, Composer, Codex, and more on the horizon. Each has strengths, each has costs, and no one model is right for everything.
Think of them like tools in a workshop: your job is to pick the smallest, fastest one that reliably solves the problem, and only reach for the big, expensive gear when it’s warranted.
Here’s a practical breakdown of what each agent does best and when to use it.
🟢 Grok - Quick and Scrappy
What it is:
A fast, super-cheap code model (from xAI), best used for small tasks and rapid iteration. Think junior dev with good instincts and limited attention span. Best of all, right now it’s still free in Cursor.
Best for:
- Quick code lookups or edits
- One-line bug fixes
- Simple refactors
- Test generation
What to watch for:
Can stumble on complexity or lose track of nuance. If it seems confused or goes in circles, step up to a more capable model.
🟡 Claude Haiku 4.5 - The Speedy Specialist
What it is:
A smaller Claude model optimized for cost and latency. Ideal for slightly more involved tasks where speed still matters. It is surprisingly capable, and has been my go-to for most tasks recently.
Best for:
- Moderate to large single-file edits
- Small to moderate utility functions
- Quick prompt-response cycles
- Real-time Q&A about your code
What to watch for:
Struggles with long chains of reasoning. Keep tasks tight and scoped - Haiku tends to overdeliver for the price.
🔵 Claude Sonnet 4.5 - The Deep Thinker
What it is:
A large-context Claude model designed for deep reasoning. More expensive, slower, but capable of understanding architecture-level tasks.
Best for:
- Multi-step problem solving
- Architectural planning
- Debugging across systems
- Code review and QA suggestions
What to watch for:
Latency and cost. Sonnet is 3× the price of Haiku per token and responds more slowly. Use it when you need precision, not iteration speed. It’s also very useful when you’re not exactly sure what you want and need a partner to help diagnose problems. One small note: in my opinion, 3× the price does NOT mean 3× the quality of code - the difference is in handling complexity. Use it for the right things; otherwise use Haiku.
🔸 Composer (Cursor) - Fast & Balanced
What it is:
Cursor’s proprietary agent, tuned for interactive, context-aware development. Trained specifically for code, with strong tool integration. Admittedly, I only started experimenting with it seriously while it was in the free period; I struggled to find a fit for it early on. Now that I see the value and the gap it fills in my thought-process workflow, I suspect I’ll keep it around for some time.
Best for:
- Day-to-day coding inside Cursor
- Reasoning across files in small scopes
- Multi-step edits within a module
- Keeping momentum during development
What to watch for:
Still a frontier model; anecdotally fast and sharp, but not always as explainable or thoughtful as larger Claude or GPT models. Also not the cheapest option (when it’s not free), so best used when speed is more valuable than savings.
🤖 OpenAI Codex (ChatGPT) - Your Autonomous Coder
What it is:
An AI developer in the cloud. Feed Codex a task (e.g. “add a feature”, “refactor X”, or “help me architect an EDA interface”) and it will clone your repo, apply changes across files, run tests, and return a diff or PR. Alternatively, it can give you a thorough, in-depth plan that you can then feed into other agents to implement.
Best for:
- Well-scoped batch changes
- Refactors across multiple files
- Tasks where you want a fresh second opinion
- Hands-off implementation of a known design
- Building plans for other agents to implement
What to watch for:
You lose some control. Codex can take 5-30 minutes to return a solution and may miss subtle business context. You’ll still need to review its work carefully, just like you would from any developer. That said, it can be an enormous time-saver when used well, and it comes bundled with ChatGPT Pro, which makes it cost-effective for heavy users.
Some of the results when I’ve used it for planning have been nothing short of exceptional. It can be slow and feel cumbersome, but for the right task, it’s worth the overhead.
🧱 Agent Selection: A Simple Rule of Thumb
Start small. Escalate only as needed.
- Try Grok for quick tasks and simple edits.
- Use Haiku when Grok gets confused or you want lightweight support.
- Switch to Composer for mid-size, multi-step coding.
- Call on Sonnet for architectural planning or debugging across subsystems.
- Use Codex (ChatGPT) when you want an autonomous end-to-end solution.
- Mix & Match! Use Haiku or Composer to implement a plan from Codex, use Grok to fix the bugs, then use Sonnet to review the PR! If you’re curious, try redoing the same work entirely with Sonnet, then compare quality and cost.
🧠 Thinking vs. Non-Thinking Models
One last note: Cursor lets you toggle between “thinking” and “non-thinking” modes for most Claude models. It’s not just a gimmick; it changes how the model reasons, responds, and structures its output.
TL;DR: unless you’re doing planning, use non-thinking mode. Thinking mode is amazing when you need it, but it can result in scope creep and unexpected results, and (in my experience) it is more prone to hallucinations. You’re essentially giving the agent permission to colour outside of the lines, so use caution.
Non-Thinking Mode:
Fast, surgical, and task-focused.
Use it when:
- You know what you want done
- You’re working on a small number of files
- You don’t need explanations, just execution
Thinking Mode:
Slower, more analytical, more verbose.
Use it when:
- You want the model to help you design a solution
- You’re doing cross-system or architectural planning
- You’re debugging something weird and want a second opinion
💡 Tip: Start with Thinking mode to shape the work, then switch to Non-Thinking mode to apply it quickly and precisely.
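As a made-up example of that two-phase pattern, here’s how the same piece of work might split into a planning prompt (Thinking mode) and an execution prompt (Non-Thinking mode). The feature, stack, and wording are hypothetical - the structure is what matters.

```python
# Hypothetical prompts illustrating the plan-then-apply pattern.
# Phase 1 goes to a Thinking model; phase 2 goes to a Non-Thinking model.

PLANNING_PROMPT = """\
We need to add rate limiting to our public API (FastAPI, Redis already in the stack).
Propose 2-3 approaches, compare their trade-offs, and recommend one.
List the files you expect to change and the tests we should add.
Do not write any code yet."""

EXECUTION_PROMPT = """\
Implement the plan below exactly as written.
Touch only the files listed in the plan. Do not refactor anything else.
Return only the code changes.

Plan:
{plan}"""


def build_execution_prompt(plan: str) -> str:
    """Turn an approved plan into a tightly scoped execution prompt."""
    return EXECUTION_PROMPT.format(plan=plan)
```

The design choice that matters here is that the execution prompt repeats the approved plan verbatim and forbids anything outside it - that constraint is what keeps Non-Thinking mode fast and surgical.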
Part 2: Workflow Recipes and Templates is available here, and drills into some more practical examples to use in your workflow.
Part 3: Managing Cost and Complexity is available here, giving insights into how to get the most value from your agent use.