I wanted bugs filed in Jira to turn into draft pull requests on GitHub without anyone needing to shepherd them through the middle.
That’s the one-line version. The actual version took about two weeks and ended up with four moving parts:
- A Lambda that takes a Jira webhook, classifies the ticket, mirrors it as a GitHub issue, and copies attachments to S3.
- A triage workflow that generates a repo map and decides, for every freshly opened GitHub issue, whether to assign Copilot coding agent or just post a diagnosis comment.
- A log analyser in dev-scripts/ for the heavier path, where attached logs need to be turned into a structured root-cause analysis first.
- Copilot coding agent itself, which opens the draft PR.
None of the pieces were especially hard on their own. Each one was some Python, some Terraform, and some agent instructions. The time went into the joins: Jira’s idea of valid JSON, webhook retries, Copilot’s token rules, S3 log links, and a model that decided to ask for more information instead of checking the repo.
So this is the long version. The small annoying bits are most of the story.
The shape

Stage 1: The Lambda

The Lambda is the boring bit you only notice when it gets something wrong.
When a Jira ticket is created or updated, it receives the webhook, decides whether the ticket is actionable, and opens or updates the matching GitHub issue. It also carries over attachments or S3 links so the GitHub side has enough context to do something useful.
The classifier itself is mostly regexes and form fields. Not glamorous. The parts that slowed me down were the places where Jira, AWS, and GitHub all had slightly different ideas of what “simple webhook” meant.
Webhook payloads are user-controlled JSON, sometimes barely

Jira’s automation rules let you POST a custom JSON body to a URL. You write the body as a template and Jira fills in the values from the ticket. In theory, simple. In practice, the validator that decides whether your template is “valid JSON” is brittle in ways nobody documents.
Things I had to discover the slow way:
- Some smart-values aren’t supported on every tenant. The literal {{issue.url}} text was being left in the body on mine, breaking the JSON.
- Array-valued smart-values have to come last in their object, or the validator fails before you can even save.
- Free-text fields like description blew the body up whenever a user pasted text with control characters or unescaped quotes.
I ended up bisecting the body field by field, saving the rule each time, until I found which smart-value was breaking it. The validator’s error message is basically the same regardless of which line is wrong.
What I now do by default: send the smallest possible payload — usually just the ticket key — and have the Lambda fetch everything else via the Jira API. One extra call per webhook is free. Debugging the validator is not.
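A minimal sketch of that pattern, assuming an API Gateway proxy integration in front of the Lambda and hypothetical JIRA_BASE_URL / JIRA_AUTH_HEADER environment variables for the Jira site and credentials; the automation rule sends only the ticket key, and everything else comes from the Jira REST API:

```python
import json
import os
import urllib.request

# Hypothetical names; point these at your own Jira site and credentials.
JIRA_BASE_URL = os.environ["JIRA_BASE_URL"]        # e.g. https://yourcompany.atlassian.net
JIRA_AUTH_HEADER = os.environ["JIRA_AUTH_HEADER"]  # e.g. "Basic <base64 email:api-token>"

def lambda_handler(event, context):
    # The automation rule posts only {"key": "{{issue.key}}"} -- a body so small
    # the template validator has nothing left to break on.
    body = json.loads(event["body"])
    ticket_key = body["key"]

    # Fetch the rest of the ticket from the Jira REST API instead of the webhook payload.
    req = urllib.request.Request(
        f"{JIRA_BASE_URL}/rest/api/2/issue/{ticket_key}",
        headers={"Authorization": JIRA_AUTH_HEADER, "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        issue = json.loads(resp.read())

    summary = issue["fields"]["summary"]
    description = issue["fields"].get("description") or ""
    # ... classify, mirror to GitHub, copy attachments to S3 ...
    return {"statusCode": 200, "body": json.dumps({"processed": ticket_key})}
```

Description text with control characters or stray quotes never transits the template this way; it only ever exists as already-parsed JSON on the Lambda side.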
Make it safe to retry, then assume it will be

Webhooks have at-least-once delivery. The Lambda can see the same event twice, see an update while a previous run is still in flight, or trigger itself by editing the same ticket. None of those should create duplicate GitHub issues or comments.
Three mechanisms, roughly:
- A hash of the classification result, written back to the ticket. If the new hash matches the stored one, skip everything. (A sketch of this check follows the list.)
- A sentinel label that says “the classifier just touched this.” The Jira rule excludes that label so the Lambda’s own writes don’t loop.
- Reading the existing GitHub-issue mapping on every event, not just on updates.
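The first mechanism, the classification hash, might look roughly like this; the function names and the idea of stashing the hash in a ticket property are illustrative, not the exact implementation:

```python
import hashlib
import json

def classification_hash(classification: dict) -> str:
    # Hash a canonical JSON rendering so key order can't produce spurious diffs.
    canonical = json.dumps(classification, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def should_skip(classification: dict, stored_hash: str | None) -> bool:
    # Same hash already on the ticket means this event is a retry or a
    # self-triggered update: do nothing.
    return stored_hash is not None and stored_hash == classification_hash(classification)
```

If the hashes match, the Lambda returns early before touching GitHub; otherwise it does its work and writes the new hash back to the ticket.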
Stage 2: The triage layer

By the time I reached the GitHub-issue side, the Lambda was mirroring tickets reliably enough that the next question was obvious: can Copilot do anything useful with them?
The naive plan was: assign Copilot coding agent to every issue the Lambda creates, let Copilot figure it out.
That plan falls over as soon as the first vague ticket arrives. Copilot coding agent is not a triage tool.
What Copilot coding agent actually does

When you assign Copilot to an issue, it:
- Reads the issue body and existing comments at the moment of assignment.
- Researches the repo in its own GitHub Actions VM.
- Drafts a plan.
- Opens a draft PR — success or otherwise.
- Requests review.
What it does not do:
- Post “I need more info before I try”
- Decide the issue isn’t fixable and abstain
- Use your domain-specific tooling
- Read comments added after assignment
If the issue is vague, you get a low-quality draft PR you’ll close. If the issue is a duplicate, you get a draft PR. If it’s a “the docs don’t make sense” question, you get a draft PR for that too.
Useful tool. Wrong contract for “triage every opened issue.”
The three-way decision

What I actually needed before Copilot ran was a small decision point:
| Classification | Action |
| --- | --- |
| auto_fixable | Assign Copilot, let it open a draft PR |
| needs_info | Comment listing what's missing, don't assign |
| diagnosis_only | Comment with root cause + workaround, don't assign |
Copilot only fires when there is a real fix to make and enough information to make it. Everything else gets a comment and stops there.
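Boiled down, the decision point is tiny. The classification names and the 0.7 confidence gate (described under the two paths below) come from the pipeline; the TriageResult shape and the action names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class TriageResult:
    classification: str   # "auto_fixable" | "needs_info" | "diagnosis_only"
    confidence: float     # model-reported confidence, 0.0 to 1.0
    comment: str          # diagnosis / missing-info text to post on the issue

def decide_action(result: TriageResult, threshold: float = 0.7) -> str:
    # Copilot only fires for a confident, genuinely fixable issue.
    if result.classification == "auto_fixable" and result.confidence >= threshold:
        return "assign_copilot"
    # Everything else gets a comment and stops there.
    return "comment_only"
```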
The triage model is Claude Sonnet 4.6 routed through the Copilot SDK: same billing surface as the coding agent, but chat completions instead of the cloud agent. In practice the pipeline uses two different shapes of agent. Claude does the messy issue reasoning. Copilot coding agent does the repo-aware code edit.
The token maze

This is the part I would shortcut hardest if I started over.
Copilot SDK has its own auth contract, separate from regular GitHub auth. The SDK does not accept:
- GITHUB_TOKEN (the built-in Actions token)
- ghp_* classic PATs
- ghs_* GitHub App installation tokens
It accepts:
- gho_* OAuth user tokens
- ghu_* GitHub App user tokens
- github_pat_* fine-grained PATs with Copilot Requests: Read
The fine-grained PAT path looks easy until you discover that org-owned fine-grained PATs don’t expose the Copilot Requests permission. There’s an open GitHub issue about it. If your repo is in an org, that path is blocked.
The OAuth route works but requires running a device flow, which is annoying when what you want is “give CI a secret and move on”. After two days of permission spelunking, I found the shortcut: the ghu_* token already exists on any machine signed into Copilot. It's sitting in ~/.config/github-copilot/apps.json. Pull it out, drop it into a secret, done.
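Pulling the token out is a few lines of Python. The layout of apps.json is undocumented and may change between Copilot versions, so treat this as a best-effort sketch rather than a supported API:

```python
import json
from pathlib import Path

def local_copilot_token() -> str:
    # apps.json maps an app identifier to an entry that (currently) contains
    # an oauth_token field holding the ghu_* token. Undocumented structure,
    # so fail loudly if it ever moves.
    apps = json.loads((Path.home() / ".config/github-copilot/apps.json").read_text())
    for entry in apps.values():
        token = entry.get("oauth_token", "")
        if token.startswith("ghu_"):
            return token
    raise RuntimeError("No ghu_* token found in apps.json")
```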
That’s the SDK token. Then there’s the assignment token.
The Copilot coding agent assignment goes through a separate GraphQL call (replaceActorsForAssignable), and that one needs a PAT that can see Copilot in suggestedActors. The Actions GITHUB_TOKEN cannot — GitHub explicitly filters Copilot out of suggested actors for the Actions identity. This is by design: the same loop-prevention rule that stops Actions from triggering other Actions.
So I tried to consolidate. Use GITHUB_TOKEN for assignment, simpler workflow, fewer secrets. The error was crisp:
Copilot is not in suggestedActors — coding agent is not enabled
for this repository, or the token lacks the scope to see it.
Coding agent was enabled. The token just couldn’t see it.
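The difference is easy to reproduce with a plain GraphQL call. The sketch below asks the repository for its suggestedActors and looks for the Copilot coding agent; run it once with GITHUB_TOKEN and once with a PAT and the filtering is immediately visible. The "copilot" substring check on the actor login is a heuristic of mine, not a documented contract:

```python
import requests

QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    suggestedActors(capabilities: [CAN_BE_ASSIGNED], first: 100) {
      nodes { login __typename }
    }
  }
}
"""

def copilot_visible(owner: str, name: str, token: str) -> bool:
    resp = requests.post(
        "https://api.github.com/graphql",
        json={"query": QUERY, "variables": {"owner": owner, "name": name}},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    nodes = resp.json()["data"]["repository"]["suggestedActors"]["nodes"]
    # The coding agent shows up as a bot-like actor when the token can see it.
    return any("copilot" in (n.get("login") or "").lower() for n in nodes)
```

If this returns True, the same token can be used for the replaceActorsForAssignable mutation; if it returns False, no amount of workflow plumbing will make the assignment succeed.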
Final shape: three tokens.
| Secret | What it does | Token type |
| --- | --- | --- |
| COPILOT_SDK_TOKEN | Triage + log analysis (Copilot SDK inference) | ghu_* from local Copilot |
| COPILOT_ASSIGN_TOKEN | Assign coding agent to issue | Fine-grained PAT, repo-scoped |
| GITHUB_TOKEN | Comments, labels, gist fetches | Built-in Actions token |
Three tokens for three different jobs. Annoying, but at least explicit.
The two paths

Once auth was out of the way, the workflow branched on a label:
| Trigger | Path |
| --- | --- |
| issues.opened (no label) | Generate repo map → Claude triage → comment → maybe assign |
| labeled: analyze-logs | Download log → run log_analyze.py → log-triage comment → maybe assign |
Path A is cheap. The repo map gives the model project layout, Claude classifies the issue, and assignment is gated on confidence >= 0.7.
Path B is heavy. The Lambda renders log attachments as markdown links to S3 pre-signed URLs. When the analyze-logs label gets added, the workflow downloads the log and runs the multi-agent log analyser from stage 3. That already produces root_cause, possible_fixes, and code references, so there is no point asking a smaller triage prompt to rediscover the same thing.
Most issues take Path A. The expensive path only runs when there is a log worth spending time on.
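The repo map itself doesn't need to be clever. A minimal sketch of the kind of thing I mean, assuming a depth-limited directory listing is enough to orient the model; the function name and the skip list are illustrative:

```python
from pathlib import Path

SKIP = {".git", "node_modules", "__pycache__", ".venv"}

def repo_map(root: str, max_depth: int = 3) -> str:
    # A shallow tree of the repo: enough for the model to know where things
    # live, small enough to fit in the prompt alongside the issue body.
    lines = []
    root_path = Path(root)
    for path in sorted(root_path.rglob("*")):
        rel = path.relative_to(root_path)
        if any(part in SKIP for part in rel.parts):
            continue
        if len(rel.parts) > max_depth:
            continue
        indent = "  " * (len(rel.parts) - 1)
        lines.append(f"{indent}{rel.parts[-1]}{'/' if path.is_dir() else ''}")
    return "\n".join(lines)
```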
Grounding the triage

Early runs kept bouncing obviously fixable issues back as needs_info. The fix was not a smarter model. It was making the procedure less optional.
I’d already given the triage agent the same search_repo and read_repo_file tools that log_analyze.py uses. Tools alone weren't enough. The model treated them as optional. So the prompt got a numbered procedure:
1. Extract every identifier from the issue body
2. search_repo each one
3. Follow the path-chain: registry → template → implementation
4. read_repo_file to confirm the leaf
5. Only then classify
I also added a small set of owner-to-file routing rules that I had internalised but the model had not. Things like “templates owned by namespace A live in config X, namespace B lives in config Y”. Encoding those cut a whole class of “model guessed the wrong file” misses.
Then citation discipline. diagnosis and copilot_instructions must include file:line references with before/after values, not vague paths. Vague paths gave Copilot a worse starting position than no instructions at all.
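Concretely, the instructions handed to Copilot are required to look something like this. The content is invented for illustration; the point is the shape: file, line, and before/after values rather than a vague path:

```python
copilot_instructions = {
    "diagnosis": "Template registry still points at the removed flag.",
    "references": [
        {
            "file": "sdk/config/templates.yaml",   # illustrative path
            "line": 118,
            "before": "flag: enable_legacy_export",
            "after": "flag: enable_export",
        }
    ],
}
```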
And one carve-out. The original needs_info rubric was too bug-report-shaped: repro steps, expected vs actual, environment. That is right for a crash, but wrong for a change request like "bump version to 7" or "rename flag X to Y". Those have no repro steps because they do not need any. The model was pattern-matching on missing bug fields and refusing to classify obvious edits as fixable. The carve-out is simple: when the body names an explicit target value, do not demand a repro before considering auto_fixable.
After all four edits, the same issue went auto_fixable → assign Copilot → draft PR. Copilot still does the work. The triage layer just stops getting in its way.
Single LLM vs orchestrated pipeline

I wrote about this gap before, in Computer Says No. It applies here too.
A vanilla LLM call on the issue body would have classified needs_info and stayed there forever: no tools, no grounding, no way to verify. The orchestrated version reads actual files, traces actual chains, and only then decides. Same model. Different shape.
The annoying part is that Copilot coding agent already does this internally. It researches the repo before drafting. That’s why assigning it directly worked on some issues my own triage was bouncing. The triage layer needed the same kind of grounding before deciding whether to hand off. Otherwise it was just a worse version of Copilot gating a better version of itself.
Once I made the triage agent use its tools the way Copilot uses its own, the pipeline started behaving the way I wanted: most issues either get a useful comment or a draft PR within minutes of opening.
Stage 3: The log analyser

Stage 3 is the heavy path the triage layer hands off to. I built it before the triage layer existed, because the bugs that mattered were arriving as megabyte-sized application logs and reading them by hand was killing my afternoons. By the time I needed a triage agent, this tool was already doing useful work.
The shape:
The split that matters

The line I kept coming back to was: deterministic where it can be, model-driven where it has to be. If you can compute something from the log without judgement, compute it. If it needs judgement, give it to a model with grounded tools. Try not to blur the two.
What that meant in practice:
- Actor detection. Logs contain both the orchestrator side and the worker side, sometimes on the same machine, both logging under the same [orchestrator] tag. A regex over thread-name patterns determines which actors are present and which one to prioritise (worker-side first, because that's where root causes live). No model involved.
- Window selection. Logs are 50–100 MB. Models can’t usefully read the whole thing. The deterministic layer offers anchors such as last_task, last_abort, and last_traceback, then slices the relevant ~500 lines. The model never sees the rest unless it asks for more. (A sketch of this slicing follows below.)
- Evidence ranking. Within the window, traceback frames beat worker-side exceptions beat protocol-level exceptions beat task-abort summaries beat generic warnings. This priority is hard-coded; the model can override it only with explicit reasoning. Without this, models default to “the first ERROR line is the cause” and you get diagnoses that point at the wrapper.
- File reference extraction. If the log mentions sdk/foo/bar.py:247, the deterministic layer captures that and pre-loads the file as context. The model doesn't have to figure out it's relevant.
By the time the scout agent runs, it is looking at a couple hundred lines of high-signal log plus pre-resolved file references. Not the raw log. Not a generic instruction to “find the bug.”
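A minimal sketch of the window-selection idea, assuming a line-oriented log and a fixed window size. The anchor names come from the article; the regexes and the priority order are placeholders:

```python
import re

# Placeholder patterns; the real analyser has one per anchor type, and the
# dict order doubles as the priority order here.
ANCHORS = {
    "last_traceback": re.compile(r"Traceback \(most recent call last\)"),
    "last_abort": re.compile(r"abort", re.IGNORECASE),
    "last_task": re.compile(r"task\s+\S+\s+started", re.IGNORECASE),
}

def select_window(lines: list[str], window: int = 500) -> list[str]:
    # Find the last occurrence of the highest-priority anchor, slice around it.
    for name, pattern in ANCHORS.items():
        hits = [i for i, line in enumerate(lines) if pattern.search(line)]
        if hits:
            center = hits[-1]
            start = max(0, center - window // 2)
            return lines[start : start + window]
    # No anchor at all: fall back to the tail of the log.
    return lines[-window:]
```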
The agent stack

The analyser uses three separate Copilot SDK sessions, with a different model for each role:
| Role | Model | Why |
| --- | --- | --- |
| Scout | gpt-5-mini | Cheap. Plans which files/searches matter. Doesn't need to reason deeply. |
| Analyst | claude-opus-4.6 | Strong. Does the actual root-cause reasoning with grounded repo tools. |
| Reviewer | gpt-5.4 | Strong, different family. Challenges the analyst. Up to three rounds of disagreement. |
The reviewer loop is the part I am most attached to. Without it, the analyst picks an answer and you take it. With it, the reviewer either accepts or sends a structured “no, here’s why I disagree” back to the analyst, which reruns with that as additional context. After three rounds, whatever they converge on is the answer. If they still disagree, an optional orchestrator model reconciles.
This is more expensive than a single-model call. It is also much better on the awkward 15–20% of investigations where the first-pass answer is plausible but wrong.
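The loop itself is small; the value is in what the two calls are allowed to see. A sketch with the model calls stubbed out as hypothetical functions:

```python
MAX_ROUNDS = 3

def analyse(evidence: str, feedback: str | None = None) -> str:
    ...  # stub: strong model with grounded repo tools

def review(evidence: str, diagnosis: str) -> tuple[bool, str]:
    ...  # stub: different-family model; returns (accepted, objection)

def run_review_loop(evidence: str) -> str:
    diagnosis = analyse(evidence)
    for _ in range(MAX_ROUNDS):
        accepted, objection = review(evidence, diagnosis)
        if accepted:
            return diagnosis
        # The objection goes back as extra context, not as a replacement answer.
        diagnosis = analyse(evidence, feedback=objection)
    # Still disagreeing after three rounds: this is where the optional
    # orchestrator model would reconcile.
    return diagnosis
```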
The tools, for real

The agents don’t get “use search_repo” as a hint. They get actual SDK-defined tools backed by Python implementations:
search_tool = define_tool(
    "search_repo",
    description=(
        "Search the monorepo for lines matching regex patterns. "
        "Use this to find relevant code when the supplied evidence is "
        "insufficient to diagnose the issue."
    ),
    handler=_handle_search_repo,
    params_type=SearchRepoParams,
    skip_permission=True,
)
_handle_search_repo does a real ripgrep-style scan over the checked-out repo, returns hits with path, line, text. read_repo_file reads bounded snippets (default 40 lines of context) from a file the model names. Path resolution allows relative paths or unique-filename suffixes — the model can ask for dataframe.py and the tool finds sdk/data/sources/dataframe.py if it's the only match.
The bound repo_root matters. The tool can't escape the checkout (path traversal blocked at the resolver layer), can't read absolute paths, can't see ignored directories. Read-only by construction. The agent has every relevant lookup it needs and zero ability to do anything destructive.
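The resolver is where that containment lives. A simplified sketch of the idea (it skips the ignored-directory filtering, and reduces the unique-suffix lookup to a unique-filename lookup); the function name is mine:

```python
from pathlib import Path

def resolve_in_repo(repo_root: Path, requested: str) -> Path:
    """Resolve a model-supplied path safely inside the checkout."""
    candidate = (repo_root / requested).resolve()
    if candidate.is_file() and candidate.is_relative_to(repo_root.resolve()):
        return candidate
    # Not a direct relative path: try a unique-filename match, so "dataframe.py"
    # can resolve to sdk/data/sources/dataframe.py if that is the only hit.
    matches = [p for p in repo_root.rglob(Path(requested).name) if p.is_file()]
    if len(matches) == 1:
        return matches[0]
    raise FileNotFoundError(f"{requested!r} not found (or ambiguous) inside the repo")
```

Because resolve() normalises "../" hops and absolute paths before the containment check, anything outside the checkout fails closed.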
This is what makes the analyst’s diagnoses grounded. Every file path it cites came from a real read_repo_file result. Every code reference was a real search_repo hit. The output is still model-synthesised, but the raw material is real.
The instruction file

Domain rules about how to read these specific logs aren’t in code; they live in log_analyze_instructions.md, loaded automatically and appended to every agent's system prompt. The file is short, opinionated, and mostly negative — it tells the models what not to do:
- “Treat GenericAbortError as a wrapper unless deeper evidence is missing.”
- “Do not report wrapper messages as the root cause if the selected window contains earlier causal evidence.”
- “Prefer multiple small targeted investigations over one large unfocused pass.”
- “If the model owner is not internal, bias toward the model input path, not opaque model internals.”
These were learnt the expensive way. The first version of the analyser kept reporting “GenericAbortError” as the root cause for every failure. Technically true, completely useless. The wrapper-error rule fixed that. The third-party model rule came after watching the analyst speculate about model internals it could not read, when the actual bug was in the data pipeline feeding the model.
The rule I took from this: domain knowledge belongs in instructions, not code. Encode the rule once in markdown and every agent in the stack inherits it. The --agent-instruction and --agent-instruction-file flags let me steer per-run without editing the repo.
Streaming and timeouts

Each SDK call has a timeout: 180s for scout, 420s for analysis/review. They also use streaming events. Streaming matters for two reasons: progress logs appear in stderr while the model is still thinking, and if a turn times out before completing, the partial content can often be salvaged instead of throwing the whole investigation away.
The fallback chain when a turn times out (sketched in code after the list):
- Did we get a final assistant message before timeout? Use it.
- Did we accumulate any streamed parts? Concatenate and use them.
- Can we read the latest assistant message from session history? Use that.
- None of the above? Raise — the run is genuinely lost.
I built this after the third time a seven-minute analysis call basically succeeded but threw on the timeout boundary. The work was done; the SDK just had not formally closed the turn. The fallbacks recover that work.
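In code, the chain is just a sequence of increasingly desperate sources for the turn's text. The parameter names and the history_lookup callable are hypothetical stand-ins for whatever your SDK session actually exposes:

```python
from typing import Callable, Optional

def salvage_turn(final_message: Optional[str],
                 streamed_parts: list[str],
                 history_lookup: Callable[[], Optional[str]]) -> str:
    # 1. A complete final assistant message beats everything.
    if final_message:
        return final_message
    # 2. Otherwise stitch together whatever streamed before the timeout.
    if streamed_parts:
        return "".join(streamed_parts)
    # 3. Otherwise ask the session history for the latest assistant message.
    recovered = history_lookup()
    if recovered:
        return recovered
    # 4. Nothing salvageable: the run is genuinely lost.
    raise TimeoutError("Turn timed out with no recoverable content")
```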
What came out of building it

log_analyze.py taught me most of what the triage agent in stage 2 needed:
- Tools beat prompts. Give the model real search_repo and read_repo_file, not a description.
- Deterministic preprocessing wins. Don’t make the model read 50 MB; pre-rank evidence and slice the window.
- Domain rules go in instructions, not code.
- Multi-agent isn’t just “more is better” — it’s specifically scout-cheap, analyst-strong, reviewer-different-family.
- Defensive parsing is part of the contract.
- Streaming + timeout-fallback turns flaky into robust.
The triage layer reuses build_repo_tools() directly. It shares the same search_repo / read_repo_file implementations as the analyst. It gets the same grounding for free. That code reuse is why the triage prompt can stay fairly short: the heavy lifting is in tools the analyser already proved out.
Stage 4: Copilot assignment and draft PR

If you made it here, thank you. This is actually the easy part.
Once an issue is deemed auto_fixable, the workflow assigns Copilot coding agent. Copilot researches the issue in its cloud agent environment and opens a draft PR.
The thing I like is that there is still a human review point, just later. The workflow does not merge code. It only spends Copilot/GitHub minutes when the triage layer thinks there is a real edit to make.
Some open questions readers might have

Why not use an off-the-shelf tool? The simple answer is that I didn’t want to. I had fun building this, and I learnt more by sitting in the annoying bits myself.
Could something like n8n have done this instead? Yes and no. It would have saved time on the boring routing parts, and it would have been a great choice if the pipeline were just “Jira event in, GitHub issue out, maybe a Slack ping”. But I would still have had to build the AWS infrastructure myself, the agent grounding needs custom code, and the Copilot auth dance still needs extra hip movement. I preferred the learning curve to be focused on building blocks rather than tools.
Why Jira? It is the workflow tool my company already uses. I wanted to minimise friction for non-engineer colleagues.
Why GitHub Copilot instead of OpenAI or Anthropic directly? Our code already lives in GitHub and we already have Copilot enabled, so it felt natural to try that route first.
Why do the S3 dance for logs? The bug reports already arrive with S3 links pointing to the relevant logs. Whatever orchestration tool I picked, I still had to get the logs out of S3 and into the analysis path.
Where it lands

The end-state is a pipeline that, on every Jira bug:
- Mirrors the ticket to a GitHub issue in the right team’s repo
- Mirrors any attachments to S3 with pre-signed URLs in the GitHub issue body
- Generates a repo map for grounding
- Routes the GitHub issue through Claude triage (cheap path) or log_analyze.py (heavy path)
- Posts a structured diagnosis comment
- Conditionally assigns Copilot coding agent when the issue is auto-fixable with high confidence
- Marks the issue auto-triaged to prevent double-handling
- Re-classifies and moves the issue across repos cleanly when the team label changes
- No-ops idempotently when nothing’s changed
Two LLMs, three tokens, two paths, one Lambda, one workflow. Most of the value isn’t in the model calls — it’s in the gates between them.
If you’re doing something similar: don’t try to make Copilot coding agent a triage tool. It’s a fix tool. Build the triage layer separately, and let it decide whether to hand off.
And if you’re plumbing webhooks into AWS and wondering why your auth isn’t working — curl it directly, layer by layer. The error code you see in the audit log is rarely from the layer you think it is.
Have you wired Copilot agents into a custom workflow? I’d love to hear what auth maze you got stuck in — and whether your triage layer is gating better than mine.