Orchestrating AI Agents for Automated Team Reporting

How I built a system that generates executive-ready reports from Slack, GitHub, and Jira, without asking anyone to write a status update.

Howdy! My name is Scott, and I'm a Senior Manager, Cloud Engineering here at Jane. I've been in tech for almost 30 years, and AI is the biggest shift I've ever seen, possibly bigger than the Internet itself. It's not just about generating code; it's about automating the tasks we don't enjoy so we can focus on what we actually love.

One thing I love about Jane is that we don't do email. Everything happens in Slack instead. Slack is powerful, and I appreciate that most discussions happen in open channels where the whole team can benefit. But that visibility comes at a cost. How do you keep up with dozens of channels? Unless you spend your whole day reading through them, it's basically impossible.

We all finally have access to a new superpower: AI Assistants. Mine is called "Jarvis" (don't get me started on how hard it was to get Claude to answer to that name...) because I'm unimaginative and I love Iron Man. Let me tell you how my system came to be.

The Status Update Tax

We all know this ritual too well. Each week, I open a blank document and begin the archaeology: digging through Slack threads, cross-referencing GitHub PRs, checking Jira boards, trying to reconstruct what my team accomplished. By the time I've synthesized everything into a coherent narrative, an hour has passed. And the information is already stale.

The irony isn't lost on me. As engineers, we build systems to automate everything else, yet status reporting remains stubbornly manual. The signals are all there (conversations in Slack, commits in GitHub, tickets moving across Jira boards) but they live in silos, disconnected and unstructured.

What if I could teach AI to do this synthesis for me? Not just dump raw data, but actually understand context, correlate activity across platforms, and produce the kind of narrative summary I would write myself?

The Context Window Problem

The obvious first approach: give an AI model access to all your data sources and ask it to generate a report. Modern language models are remarkably capable, and tools like MCP (Model Context Protocol) make it easy to connect them to Slack, GitHub, and Jira.

This works... until it doesn't.

The problem is context windows. Even with models supporting 100K+ tokens, a single day's activity from a mid-sized engineering team can easily exceed those limits. Forty Slack channels, six GitHub repositories, multiple Jira projects, each with threads, comments, and metadata. The data volume grows quickly, and cramming everything into one context leads to degraded output quality, missed details, and eventually, hard failures.
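A rough back-of-envelope sketch makes the scale concrete. The message counts and per-item token averages below are illustrative assumptions, not measurements from my system:

```python
# Rough estimate of one day's raw activity. All counts and token
# averages are illustrative assumptions, not measured values.
slack_tokens = 40 * 50 * 150    # 40 channels x ~50 messages x ~150 tokens per message
github_tokens = 6 * 10 * 800    # 6 repos x ~10 PRs/issues x ~800 tokens (diffs, comments)
jira_tokens = 3 * 20 * 300      # 3 projects x ~20 active tickets x ~300 tokens each

total = slack_tokens + github_tokens + jira_tokens
print(f"~{total:,} tokens of raw activity")   # ~366,000 tokens -- far past one context window
```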

I tried aggressive filtering and summarization to fit everything into a single context. It worked for daily reports with reduced channel coverage, but weekly reports (synthesizing five to seven days of activity) consistently hit limits.

The insight that changed everything: I needed to stop thinking about AI as a single entity processing everything at once, and start thinking about AI as a team of specialists working together.

Multi-Agent Architecture: AI as a Team

The solution was to decompose the problem into specialized agents, each with focused responsibilities and fresh context:

┌─────────────────────────────────────────────────────────────────┐
│                     Orchestrator (Claude Code)                  │
└─────────────────────────────────────────────────────────────────┘
                               │
           ┌───────────────────┼───────────────────┐
           ▼                   ▼                   ▼
    ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
    │    Slack    │     │   GitHub    │     │    Jira     │
    │  Collectors │     │  Collector  │     │  Collector  │
    │   (Haiku)   │     │   (Haiku)   │     │   (Haiku)   │
    └──────┬──────┘     └──────┬──────┘     └──────┬──────┘
           │                   │                   │
           └───────────────────┼───────────────────┘
                               ▼
                    ┌─────────────────────┐
                    │     Correlator      │
                    │  (Cross-reference)  │
                    │       (Haiku)       │
                    └──────────┬──────────┘
                               ▼
                    ┌─────────────────────┐
                    │      Compactor      │
                    │  (Editorial filter) │
                    │      (Sonnet)       │
                    └──────────┬──────────┘
                               ▼
                    ┌─────────────────────┐
                    │     Synthesizer     │
                    │   (Final report)    │
                    │   (Sonnet/Opus)     │
                    └──────────┬──────────┘
                               ▼
                        Final Report

Each agent gets a fresh context window with only the data it needs. The collector for GitHub doesn't need to know about Slack. The synthesis agent doesn't need raw API responses, just the refined summary from compaction.

This architecture solves the context window problem, but it also provides isolation. When GitHub's API rate-limits your collector, Slack and Jira data still flows through. When one agent produces unexpected output, you can inspect its specific inputs and outputs without debugging the entire system.

What Each Phase Does

Collection runs multiple agents in parallel, one for each data source. For Slack, I run four collectors simultaneously (one per channel tier) to further parallelize. Each collector fetches raw data and writes structured JSON to disk. No analysis happens here, just clean data extraction.

Correlation is where the magic starts. This agent reads all collected data and cross-references it by person and by project. It answers questions like: Who authored this PR? Did they discuss it in Slack? Is there a related Jira ticket? The output is a unified view connecting signals across platforms.

Compaction applies editorial judgment. Not all information deserves equal attention. This agent distinguishes between critical channels needing full detail and peripheral channels warranting only a summary. It extracts the essence: decisions made, blockers identified, risks emerging, work completed.

Synthesis transforms the compacted summary into a polished report following a template structure (BLUF, incidents, highlights, risks, per-person activity) with narrative prose that reads like a human wrote it.
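Of the four phases, collection is the most mechanical, and a small sketch makes the fan-out concrete. The fetcher functions and file layout below are placeholders (my actual collectors are Claude Code agents, not plain Python), but the shape is the same: run in parallel, touch only your own source, write JSON to disk.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical layout: one working directory of raw JSON per report day.
OUTPUT_DIR = Path("reports/2026-01-20")
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

def collect(source: str, fetch) -> Path:
    """Run one collector: fetch raw data, write structured JSON, do no analysis."""
    out = OUTPUT_DIR / f"{source}.json"
    out.write_text(json.dumps(fetch(), indent=2))
    return out

# Placeholder fetchers stand in for the real Slack/GitHub/Jira collector agents.
collectors = {
    "slack_critical": lambda: {"channels": []},
    "slack_important": lambda: {"channels": []},
    "github": lambda: {"pull_requests": []},
    "jira": lambda: {"tickets": []},
}

# Fan out: every collector runs in parallel, each seeing only its own source.
with ThreadPoolExecutor() as pool:
    artifacts = list(pool.map(lambda item: collect(*item), collectors.items()))
```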

Key Design Patterns

Building this system taught me several patterns that apply to any complex AI automation. Here's what worked, what I learned the hard way, and how these principles generalize.

Match Model Capability to Task Complexity

When I first built this system, I used Opus for everything. More capability means better results, right?

The daily report was costing $30-40 to generate.

This forced me to actually think about what each phase was doing. Not all tasks require the same level of intelligence. Data collection is straightforward. The model needs to understand the API, make the right calls, and structure output. There's no deep reasoning involved. Synthesis, on the other hand, requires genuine judgment: identifying what's important, crafting narrative, detecting patterns across disparate data points.

| Phase | Model | Rationale |
|-------|-------|-----------|
| Collection | Haiku | Fast, cost-effective API interaction |
| Correlation | Haiku | Structured cross-referencing logic |
| Compaction | Sonnet | Requires judgment about importance |
| Daily Synthesis | Sonnet | Narrative generation with nuance |
| Weekly Synthesis | Opus | Pattern recognition across 5-7 days |

The daily report now costs $2-3. Same quality, 90% cost reduction.

The broader principle: most AI workflows contain a mix of simple and complex subtasks. Identify which tasks actually require sophisticated reasoning, and route accordingly. In practice, 80-90% of work can run on fast, cheap models. The expensive models only appear where they add genuine value.
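In practice the routing can be a plain lookup table. A minimal sketch, using alias names rather than exact model identifiers (which change over time):

```python
# Minimal per-phase model routing. Phase names mirror the table above;
# the values are aliases, not exact model identifiers.
MODEL_FOR_PHASE = {
    "collection": "haiku",
    "correlation": "haiku",
    "compaction": "sonnet",
    "daily_synthesis": "sonnet",
    "weekly_synthesis": "opus",
}

def model_for(phase: str) -> str:
    """Route each phase to the cheapest model that handles it well."""
    return MODEL_FOR_PHASE[phase]
```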

Where else this applies: Customer support (Haiku routes and gathers context, Sonnet handles complex responses), code generation (Haiku for boilerplate, Opus for complex algorithms), research workflows (Haiku for data gathering, Opus for analysis).

File-Based Inter-Agent Communication

How do agents communicate? The tempting answer is to chain them conversationally, having one agent pass output directly to the next. But this reintroduces context accumulation problems.

The better approach: agents communicate through files.

Each agent reads from disk and writes to disk. The filesystem becomes a shared state layer: persistent, inspectable, and decoupled from any single model's context.
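A minimal sketch of that contract, with hypothetical file names and directory layout; each phase only ever sees the artifacts the previous phase left behind:

```python
import json
from pathlib import Path

# Hypothetical per-day working directory shared by all agents.
DAY_DIR = Path("reports/2026-01-20")
DAY_DIR.mkdir(parents=True, exist_ok=True)

def read_inputs(patterns: list[str]) -> dict:
    """Load the previous phase's artifacts -- the only context this agent receives."""
    inputs = {}
    for pattern in patterns:
        for path in DAY_DIR.glob(pattern):
            inputs[path.stem] = json.loads(path.read_text())
    return inputs

def write_output(name: str, payload: dict) -> Path:
    """Persist this phase's result for the next phase (and for human inspection)."""
    out = DAY_DIR / name
    out.write_text(json.dumps(payload, indent=2))
    return out

# Correlation, for example, reads every collector artifact and writes one unified view.
raw = read_inputs(["slack_*.json", "github.json", "jira.json"])
write_output("correlated.json", {"sources": sorted(raw), "people": {}, "projects": {}})
```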

This pattern initially felt primitive, but proved remarkably practical:

Incremental processing. If you've already collected Monday through Thursday, generating a weekly report on Friday only requires collecting Friday's data.

Debugging becomes trivial. When the final report looks wrong, you can inspect each intermediate artifact. Was the problem in collection? Correlation? Compaction? The data is frozen in time.

Recovery from failures. A rate limit or timeout doesn't mean starting over. You resume from the last successful checkpoint.

Unlimited effective context. The synthesis agent might only need 10K tokens of compacted summary, even though it represents 500K tokens of raw data that flowed through the pipeline.

Where else this applies: ETL pipelines, ML workflows with distinct training/evaluation phases, any workflow needing audit trails and reproducibility.

Proactive Data Collection Reveals Absence Patterns

Early iterations only collected data that was explicitly referenced. If a PR was mentioned in Slack, I'd fetch its details. If not, I'd skip it.

This seemed efficient but missed an important class of insights: the absence of discussion.

Some of the most valuable signals are negative space. A PR that merged with no Slack discussion might indicate routine work, or something that should have been discussed but wasn't. A Jira ticket that completed with no associated PR might be non-code work, or a gap in linking practices.

By collecting all data sources proactively and independently, then correlating them, the system can surface patterns like:

  • "3 PRs merged this week with no Jira ticket linked. Possible untracked operational work."
  • "PLAT-234 completed with no associated code changes. Confirming this was process/documentation."
  • "PR #891 was discussed extensively in #infrastructure but authored by someone outside the team."

Reactive collection can only tell you about connections that exist. Proactive collection reveals connections that are missing.
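A sketch of two such "negative space" checks, assuming the correlated data is already loaded as simple lists of dicts; the field names here are illustrative, not a real schema:

```python
def unlinked_prs(merged_prs: list[dict]) -> list[dict]:
    """Merged PRs that reference no Jira ticket -- possible untracked work."""
    return [pr for pr in merged_prs if not pr.get("linked_ticket")]

def silent_ships(merged_prs: list[dict], slack_pr_mentions: set[int]) -> list[dict]:
    """Merged PRs that were never mentioned in any Slack channel."""
    return [pr for pr in merged_prs if pr["number"] not in slack_pr_mentions]
```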

Where else this applies: Anomaly detection, compliance auditing, operational monitoring (absence of expected signals).

Explicit Information Hierarchy

One early mistake was treating all Slack channels equally. Sixty channels, each with up to 50 messages. That's 3,000 messages to process. Most of it was noise, and resulting reports were diluted with irrelevant context.

The solution was explicit hierarchy:

| Tier | Detail Level | Use Case |
|------|--------------|----------|
| Critical | Full quotes, complete threads | Incidents, team channel, leadership |
| Important | Key quotes, decisions | Cross-team coordination, infrastructure |
| Context | Summaries only | Project channels, announcements |
| External | High detail + privacy markers | Vendor communications |

This tiering applies during compaction, not collection. I still collect everything (proactive data gathering) but apply editorial judgment before synthesis. The result reflects actual priorities rather than treating a casual emoji reaction with the same weight as an incident escalation.
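The tiers themselves are just configuration the compaction agent reads alongside the collected data. A sketch with made-up channel names:

```python
# Hypothetical channel-tier configuration consumed by the compaction agent.
# Collection still gathers everything; this only controls how much detail survives.
CHANNEL_TIERS = {
    "critical":  {"channels": ["#incidents", "#team-cloud-eng", "#leadership"],
                  "detail": "full quotes, complete threads"},
    "important": {"channels": ["#infrastructure", "#cross-team-sync"],
                  "detail": "key quotes, decisions"},
    "context":   {"channels": ["#proj-*", "#announcements"],
                  "detail": "summaries only"},
    "external":  {"channels": ["#vendor-*"],
                  "detail": "high detail, privacy markers"},
}
```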

Where else this applies: News aggregation, alert systems (severity determines routing), content recommendation, risk assessment.

Cross-Platform Identity Resolution

An engineer named Jordan might appear as @jordan-martinez in Slack, jordan-m in GitHub, and jordan.martinez@company.com in Jira. Without explicit mapping, these look like three different people.

The goal is unified narrative:

"Jordan had a productive week. Merged 3 PRs, closed 2 tickets, and led the incident response discussion in #infrastructure."

Not fragmented observations:

"GitHub user jordan-m merged 3 PRs. Jira user jordan.martinez closed 2 tickets. Slack user @jordan-martinez participated in incident response."

The correlation agent maintains identity mappings and uses them to build coherent activity profiles per person. This synthesis, combining signals about the same entity from different sources, is where real understanding emerges.
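The mapping itself is mundane; what matters is applying it consistently. A sketch, with an invented entry for Jordan:

```python
# Hypothetical identity map: one canonical person, many platform handles.
IDENTITY_MAP = {
    "Jordan Martinez": {
        "slack": "@jordan-martinez",
        "github": "jordan-m",
        "jira": "jordan.martinez@company.com",
    },
}

# Reverse index so any platform handle resolves to the canonical name.
HANDLE_TO_PERSON = {
    handle: person
    for person, handles in IDENTITY_MAP.items()
    for handle in handles.values()
}

def resolve(handle: str) -> str:
    """Map a platform-specific handle to a canonical person, or leave it unchanged."""
    return HANDLE_TO_PERSON.get(handle, handle)
```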

The Orchestrator as Coordinator

With all these agents and phases, something needs to coordinate them. Importantly, the orchestrator doesn't process data itself.

The orchestrator's responsibilities are narrow:

  1. Spawn agents with appropriate instructions and context
  2. Verify outputs exist and meet basic validity checks
  3. Handle failures by retrying or gracefully degrading
  4. Report progress to the user

The orchestrator never reads the semantic content of messages or PR descriptions. It verifies that files exist and contain expected structure, but actual understanding happens entirely within specialized agents.

This separation keeps the orchestrator simple and reliable. It's essentially a workflow engine that spawns AI agents rather than calling traditional functions. Verification at each phase catches failures early, preventing cascade failures.
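A sketch of that narrow role. Here `spawn_agent` is a stand-in for however you launch an agent (a Claude Code subagent, an API call, a subprocess), not a real API, and the verification is purely structural:

```python
import json
from pathlib import Path

def spawn_agent(phase: str, instructions: str) -> None:
    """Placeholder: launch the agent for this phase (subagent, API call, subprocess...)."""
    raise NotImplementedError

def verify(path: Path, required_keys: set[str]) -> bool:
    """Structural check only: the orchestrator never reads semantic content."""
    if not path.exists():
        return False
    try:
        data = json.loads(path.read_text())
    except json.JSONDecodeError:
        return False
    return required_keys.issubset(data)

def run_phase(phase: str, instructions: str, output: Path,
              required_keys: set[str], retries: int = 2) -> bool:
    """Spawn, verify, retry; leave graceful degradation to the caller."""
    for attempt in range(retries + 1):
        spawn_agent(phase, instructions)
        if verify(output, required_keys):
            return True
        print(f"{phase}: attempt {attempt + 1} failed verification, retrying...")
    return False
```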

Where else this applies: Any multi-stage pipeline, distributed systems with health checks, ETL workflows with data quality gates, CI/CD pipelines.

What a Report Looks Like

After all this processing, here's the structure of a typical daily report:

# Engineering Team Daily Update: Monday, January 20, 2026

## BLUF
**Overall Status:** 🟡 Yellow

One P2 incident (API latency) identified and resolved within SLA.
Webhook rate limiting feature completed and merged.

## Incidents & Issues

### P2: API Latency Spike (Resolved)
- **Detected:** 09:15 PT | **Resolved:** 11:45 PT
- **Root Cause:** Database connection pool exhaustion
- **Owner:** [Team Member]
- **Resolution:** Increased pool size, added alerting

## Highlights & Outcomes

- **Webhook Rate Limiting Shipped** ([Team Member])
  - PR#1234 merged with 2-hour review cycle
  - Closes PLAT-567

## Risks & Watch Items

| Risk | Likelihood | Impact | Owner |
|------|------------|--------|-------|
| Redis EOL deadline approaching | Medium | High | [Team Member] |

## Team Activity

**[Team Member A]**
- Merged PR#1234 (webhook rate limiting)
- Completed PLAT-567
- Active in #infrastructure discussions

**[Team Member B]**
- Led P2 incident response
- Documented timeline and root cause

The BLUF (Bottom Line Up Front) gives executives the summary in 10 seconds. Sections below provide progressive detail. Weekly reports follow a similar structure but add trend analysis: recurring themes, multi-day initiatives, patterns across the week.

The Compounding Value

The system runs daily, generating reports automatically. More importantly, the AI notices things humans miss:

Cross-platform patterns. "PR #1234 closed ticket PLAT-567 but wasn't discussed in any Slack channel. This was a quiet ship."

Activity gaps. "No GitHub activity from Jordan today despite being assigned to PLAT-890. Worth checking if blocked."

Recurring themes. Weekly synthesis identifies multi-day initiatives: "The Redis upgrade work spanned three days with contributions from four team members."

Incident correlation. Automatically links incident discussions to the PRs that resolved them and the tickets that tracked them.

But the real unlock isn't the daily report. It's what happens when you run this system for months.

Each report becomes a data point. Each week adds to the corpus. Over time, you're building a longitudinal dataset about how your team actually works. After a few months, I can ask questions like: "What types of work tend to block the team?" or "Which initiatives historically take longer than estimated?"

When the same friction appears repeatedly (PRs sitting unreviewed, certain tickets causing confusion, infrastructure work constantly deprioritized), that's signal. These aren't random events; they're symptoms of deeper issues. The data shows where team members are stretching into new areas, where they're stuck, where they're thriving. Instead of relying on memory or quarterly reviews, I have continuous visibility into growth trajectories.

The system isn't replacing my judgment as a manager. It's giving me the context I need to apply that judgment more effectively. The first report saves me an hour. The hundredth report reveals insights I couldn't have discovered manually.

What I'd Do Differently

Start much smaller. I tried to boil the ocean on day one: all data sources, full correlation, complete reporting. If I started over, I'd begin with just Slack collection for 3-4 channels and a basic summarizer. Get that rock-solid. Then add GitHub. Then correlation. Then Jira. Each addition would build on a stable foundation.

Build incremental collection from the start. Right now, each run recollects all data for the target day. If I'd built incremental collection early (tracking what was already fetched and only requesting new data), subsequent runs would be nearly instant.
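The check itself is small. A sketch, assuming the per-day directory layout used in the earlier examples:

```python
from datetime import date, timedelta
from pathlib import Path

REPORTS_ROOT = Path("reports")   # hypothetical: one subdirectory of collected data per day

def days_needing_collection(start: date, end: date) -> list[date]:
    """Return only the days in [start, end] whose data isn't already on disk."""
    missing, day = [], start
    while day <= end:
        if not (REPORTS_ROOT / day.isoformat()).exists():
            missing.append(day)
        day += timedelta(days=1)
    return missing

# On a Friday weekly run, this collects only Friday if Mon-Thu are already cached.
```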

Add alerting and routing. Some items need immediate attention: P1 incidents, blocked PRs sitting for days, team members silent across all platforms. Building alert triggers would make the system proactive rather than reactive.

Version control the prompts. I kept agent instructions in various places. I should have committed every significant prompt to version control with clear annotations. Treating prompts as code (with commits, diffs, and history) would have made debugging easier.

The Bigger Picture

We're at an inflection point. Not because AI is new, but because we've crossed a threshold where these tools are accessible enough for individuals to build their own automation without dedicated engineering teams.

Everyone now has access to this superpower.

The traditional manager's dilemma is brutal: as your team grows, your time doesn't. More people means more 1:1s, more status updates to synthesize, more context to track. The natural response is to add management layers or accept less direct visibility.

Jarvis changes this equation. I have continuous, comprehensive visibility into my team's work without adding overhead to them or consuming my days reconstructing context. This is what scaling yourself as a manager actually looks like: building systems that extend your reach without burning out.

Here's what excites me most: I built this myself. I'm not a machine learning researcher. I don't have a team of engineers supporting me. I'm a manager who learned to work with AI agents using freely available tools.

Five years ago, this would have required a dedicated engineering team, months of development, significant infrastructure investment, and specialized ML expertise. Today, I built it in a few weeks of evening and weekend work using Claude Code, some MCP integrations, and basic scripting. Total operational cost: $5/month.

The pattern generalizes to any scenario where you need to process large volumes of unstructured data, synthesize information from multiple sources, apply judgment consistently at scale, or make hidden patterns visible. Customer support teams could synthesize ticket trends. Product managers could automate competitive analysis. Sales leaders could track deal progression.

The common thread: information exists in various systems, but synthesizing it requires human effort that doesn't scale. AI agents can do this synthesis automatically, continuously, and often more comprehensively than manual approaches.

The tools are democratized. The models are capable. The patterns are proven. The question isn't "Can AI do this?" It's "What repetitive knowledge work in your life could you automate if you spent a few weekends learning how?"

The result is automation that would have seemed magical a few years ago: a system that reads my team's Slack conversations, understands the context of our GitHub PRs, connects them to Jira tickets, and produces a summary that sounds like I wrote it myself.

Not because a single superintelligent model understood everything at once. But because I decomposed the problem intelligently, gave each agent a focused task, and let them collaborate through the simplest possible interface: files on disk.

Sometimes the best architectures are the boring ones. And sometimes the most transformative tools are the ones you build for yourself.
