How I Use GitLab Duo CLI for Code Review — A Community Maintainer's Take

Disclosure: I'm a GitLab Community Contributor with a Duo Enterprise seat through the contributor program. Duo CLI requires a GitLab Premium or Ultimate plan with a Duo Pro or Duo Enterprise add-on. I'm not a GitLab employee, so this is my personal experience.

It Started with an Accidental Discovery

For a long time I only used GitLab Duo Chat on the web: the chat box on merge request pages, where I'd ask "what does this code do" or "write me a commit message." Useful enough, but nothing special.

Then one day I needed to review several modules at once, and clicking through the web UI one question at a time was painfully slow. I dug through the glab docs and found glab duo cli run. Tried it. Immediately thought: this is how it's meant to be used.

The biggest difference from the web version? You can script it. AI analysis becomes a repeatable process, so you stop sitting in front of a browser going back and forth.

CLI and web sessions are synced, which I did not expect. You can start a conversation on the web, then pick it up in the terminal with the /sessions command. Or the other way around. IDE extensions too. Three entry points, one conversation.

Your First Duo CLI Command

Once glab is installed (brew install glab, or follow the official docs), Duo CLI offers two modes:

Interactive mode opens an AI chat session in your terminal:

glab duo cli

Once inside, just type your questions. Use /model to switch models, /sessions to list previous conversations (including ones from the web UI or IDE), /new to start fresh.

Headless mode is for scripts and runs to completion:

glab duo cli run -C ~/my-project --goal "Summarize the architecture of this project in three sentences"

-C points to the repo. --goal is your instruction. It runs, prints the result, done.

My first reaction: "Oh, this is what I wanted." Output right in the terminal, ready to pipe, redirect, or feed into other tools. And since sessions sync to the web, I can always continue asking follow-up questions in the GitLab UI afterward without starting over.
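
Because headless output lands on stdout, keeping a record of each run is ordinary shell plumbing. Here is a minimal sketch that uses only the -C and --goal flags shown above. The helper name duo_save and the reports directory are my own placeholders, not part of glab.

```shell
# duo_save is a hypothetical helper, not a glab feature.
# It runs one headless review, stores the output in a timestamped
# file, and prints the path of that file.
duo_save() {
  repo=$1
  goal=$2
  outdir=${3:-reports}          # assumed default output directory
  mkdir -p "$outdir" || return 1
  outfile="$outdir/review-$(date +%Y%m%d-%H%M%S).txt"
  glab duo cli run -C "$repo" --goal "$goal" > "$outfile" || return 1
  printf '%s\n' "$outfile"
}

# Usage:
# duo_save ~/my-project "Summarize the architecture of this project in three sentences"
```

Printing the path instead of the content keeps the function easy to chain: the caller decides whether to open the file, grep it, or attach it somewhere.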

Switching Models Changed Everything

When you use Duo Chat on the web, you don't really think about which model is behind it. But Duo CLI has a --model flag that lets you pick.

This seems minor. It's not.

Out of curiosity, I ran the same code review task against 9 different models. Here are the results that stood out:

Claude Opus 4.6 returned 10.2 KB: the most thorough, covering every angle. Great when you want the full picture.
Claude Opus 4.7 returned 5.9 KB: concise and focused on key points. Good when you roughly know the situation and just want confirmation.
Gemini 3.5 Flash returned 4.4 KB: fastest response, but shallower. Good for a quick scan.
GPT 5.4 returned 0.3 KB and refused the task entirely. Safety guardrails blocked it. (GPT 5.5 handled the same task just fine.)

Same question, and the longest real answer (10.2 KB) was 2.3x the size of the shortest one (4.4 KB), not counting the refusal. Length is not the whole story either. Each model thinks differently.

My habit now: for any review that matters, I run at least two models and cross-compare. The extra cost has been negligible for me (every run draws on the same Credits), and I regularly find one model catching something the other completely missed.

# Same task, different models
glab duo cli run -C ~/project --goal "review error handling" --model claude-opus-4-6
glab duo cli run -C ~/project --goal "review error handling" --model claude-opus-4-7
glab duo cli run -C ~/project --goal "review error handling" --model gemini-3-5-flash
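
To make the comparison repeatable, the three commands above can be folded into a loop that writes one file per model. This is a rough sketch: compare_models and the output directory are names I made up, and the only flags used are -C, --goal, and --model.

```shell
# compare_models is a hypothetical helper, not part of glab.
# Arguments: repo path, goal text, output directory, then model IDs.
compare_models() {
  repo=$1
  goal=$2
  outdir=$3
  shift 3
  mkdir -p "$outdir" || return 1
  for model in "$@"; do
    # One output file per model; a failed run is reported
    # but does not stop the remaining models.
    glab duo cli run -C "$repo" --goal "$goal" --model "$model" \
      > "$outdir/$model.txt" || echo "run failed for $model" >&2
  done
  # Byte counts are a crude first signal of how much each model wrote.
  wc -c "$outdir"/*.txt
}

# Usage:
# compare_models ~/project "review error handling" out \
#   claude-opus-4-6 claude-opus-4-7 gemini-3-5-flash
```

The byte counts only tell you where to look first. The real comparison is still reading the files side by side.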

My Daily Workflow

Here's how this fits into my day. I use several AI CLI tools, each with different strengths:

Duo CLI is my breadth tool. Its biggest advantage is multi-model switching plus cost: it runs on GitLab Credits, which makes it the most economical option for anyone with an Enterprise seat. I break a large repo into modules and run a quick analysis on each, often with different models for cross-validation.

Claude Code is my depth tool. When Duo flags something as "this might be an issue," I hand it to Claude Code for deep analysis. It has cross-session memory, which helps for problems that need several rounds of back-and-forth.

Neither one replaces the other. They cover different ground. Duo CLI scans the landscape and flags suspicious spots. Claude Code digs into those spots. One does breadth, the other does depth.
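
The "break a large repo into modules" step can be as simple as looping over top-level directories. The sketch below assumes exactly that, which may not match how your repo is organized. scan_modules and the prompt wording are hypothetical.

```shell
# scan_modules is a hypothetical helper, not part of glab.
# It treats every top-level directory of the repo as one "module"
# (an assumption) and saves one result file per module.
# Keep the output directory outside the repo so it is not scanned.
scan_modules() {
  repo=$1
  outdir=$2
  model=$3
  mkdir -p "$outdir" || return 1
  for dir in "$repo"/*/; do
    [ -d "$dir" ] || continue   # no subdirectories: the glob stayed literal
    name=$(basename "$dir")
    glab duo cli run -C "$repo" \
      --goal "Review only the $name/ directory. List suspicious spots, one per line." \
      --model "$model" > "$outdir/$name.txt" \
      || echo "scan failed for $name" >&2
  done
}

# Usage:
# scan_modules ~/project /tmp/duo-scan gemini-3-5-flash
```

Hidden directories such as .git are skipped because the glob does not match names starting with a dot.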

A Trick That Made a Real Difference

Asking AI to "review this repo" cold isn't great. It'll wander through the file tree and git history on its own, not knowing what matters and what doesn't.

What I do instead: run git log first to find relevant commits, then feed the results into the prompt:

# Find relevant commits first (in the same repo you pass to -C below)
git -C ~/project log --all --grep='refactor\|fix\|validate' \
  --regexp-ignore-case --name-only \
  --format='%H %cs %s' > anchors.txt

# Feed commit history as context
glab duo cli run -C ~/project \
  --goal "Here's the recent commit history related to this area:
$(cat anchors.txt)

Based on these commits, is the refactoring direction consistent?
Are any modules being left behind?"

Why does this work so well? Because the AI now has concrete anchors: it knows which files changed, when, and why. Analysis stays focused on meaningful scope instead of wandering aimlessly.

This technique works especially well for tech debt assessment. Grep for TODOs, FIXMEs, and HACKs in your commit history, feed it to Duo, and ask it to analyze the distribution and severity. Way more precise than letting the AI hunt on its own.
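
Applied to tech debt, the same anchor trick might look like the sketch below. debt_anchors and debt_prompt are names I invented, the markers are only a starting point, and the prompt wording is just one way to ask.

```shell
# Hypothetical helpers, not part of git or glab.

# Collect commits whose message mentions a debt marker.
# Repeating --grep matches commits that hit any of the patterns.
debt_anchors() {
  git -C "$1" log --all --grep=TODO --grep=FIXME --grep=HACK \
    --format='%h %cs %s'
}

# Wrap the anchors (read from stdin) in a prompt suitable for --goal.
debt_prompt() {
  printf '%s\n' "These commits mention TODO, FIXME or HACK:"
  cat
  printf '%s\n' "" \
    "Group them by module and rate each one's severity as low, medium or high."
}

# Usage:
# glab duo cli run -C ~/project --goal "$(debt_anchors ~/project | debt_prompt)"
```

Splitting collection from prompt building lets you inspect or trim the anchor list before spending a run on it.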

Teaching the AI to Filter Its Own Noise

A common pain point with AI analysis: it reports a ton of findings, most of which are noise. You spend time running the analysis, then spend even more time manually filtering.

A simple fix I've learned: append validation criteria to your prompt so the AI self-screens before reporting.

glab duo cli run -C ~/project \
  --goal "Check the error handling in this module.

Before reporting each finding, self-check:
1. Does this issue exist under default configuration?
2. Is there already an existing mechanism handling this?
3. Does the impact extend beyond normal usage?

Mark anything that fails as 'not applicable' with an explanation."

The results can be dramatic. One task originally returned 21 findings. With validation criteria, the AI filtered out 19 on its own and only flagged 2 that actually mattered. The time I saved on filtering went straight into properly analyzing those 2 real issues.
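
If you save the output to a file, a quick count shows how much the self-check screened out. This sketch assumes the model writes one finding per line and repeats the phrase "not applicable" literally, which it may not do, so treat the number as a rough check. count_filtered is a hypothetical helper.

```shell
# count_filtered is a hypothetical helper.
# Assumption: one finding per line, and screened-out findings
# contain the text "not applicable" (matched case-insensitively).
count_filtered() {
  file=$1
  # grep -c prints 0 and exits non-zero when nothing matches,
  # so "|| true" keeps the function usable under "set -e".
  dropped=$(grep -ci 'not applicable' "$file" || true)
  echo "marked not applicable: $dropped"
}

# Usage:
# count_filtered review-output.txt
```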

Things I've Learned the Hard Way

Headless runs are fresh sessions. Each glab duo cli run starts clean. But you can use --existing-session-id to continue a previous session, or switch to interactive mode and use /sessions to find prior conversations. You can also inject files as context with --ai-context-items.

Credits run out. GitLab AI features are billed via Credits, and different plans have different quotas. Getting carried away with batch runs and forgetting to check your balance is a real thing that happens. Ask me how I know.
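
One cheap safeguard is a hard cap on the number of runs per batch. The sketch below does not read your actual Credit balance (check that in GitLab itself). It only stops a loop from running away. capped_run and DUO_MAX_RUNS are hypothetical names.

```shell
# capped_run is a hypothetical wrapper around "glab duo cli run".
# It refuses to start more than DUO_MAX_RUNS runs in the current shell.
# Call it directly and redirect output to a file: inside $(...) or a
# pipeline it runs in a subshell and the counter would not persist.
DUO_MAX_RUNS=${DUO_MAX_RUNS:-20}
duo_runs=0

capped_run() {
  if [ "$duo_runs" -ge "$DUO_MAX_RUNS" ]; then
    echo "run cap of $DUO_MAX_RUNS reached, skipping" >&2
    return 1
  fi
  duo_runs=$((duo_runs + 1))
  glab duo cli run "$@"
}

# Usage:
# capped_run -C ~/project --goal "review error handling" > out.txt
```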

Model quality varies wildly. Same task, same prompt, and response length can differ by more than 2x, with the occasional outright refusal. Don't draw conclusions from a single model's output on anything important.

Watch out for --dangerously-skip-permissions. You need this flag for batch automation (otherwise it pauses for manual confirmation every time), but as the name suggests, it skips all permission checks. Only use it on repos you trust.
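
To avoid passing that flag by accident, I'd gate it behind an explicit allowlist. A sketch of the idea: trusted_run and the allowlist file location are hypothetical, and the flag itself is the one named above.

```shell
# trusted_run is a hypothetical wrapper, not part of glab.
# It adds --dangerously-skip-permissions only when the repo path
# appears, as an exact line, in an allowlist file you maintain.
DUO_TRUSTED_FILE=${DUO_TRUSTED_FILE:-$HOME/.duo-trusted-repos}

trusted_run() {
  repo=$1
  goal=$2
  # -F: fixed string, -x: whole line must match, -q: no output
  if [ -f "$DUO_TRUSTED_FILE" ] && grep -Fxq -- "$repo" "$DUO_TRUSTED_FILE"; then
    glab duo cli run -C "$repo" --goal "$goal" --dangerously-skip-permissions
  else
    echo "$repo is not in $DUO_TRUSTED_FILE, running with permission prompts" >&2
    glab duo cli run -C "$repo" --goal "$goal"
  fi
}

# Usage:
# echo "$HOME/project" >> ~/.duo-trusted-repos
# trusted_run "$HOME/project" "review error handling"
```

The exact-line match is deliberate: a substring match would also trust any repo whose path merely contains a trusted one.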

Wrapping Up

On the web, GitLab Duo is a chat box. Many people try it, think "okay, it's fine," and move on. But the CLI entry point opens up a completely different way of working: repeatable, automatable, comparable analysis.

As a community maintainer, I use this workflow every day for review work. It won't replace your judgment, but it saves a massive amount of "first pass" time so you can focus your energy on the parts that actually require thinking.

If you're already on GitLab, glab duo cli run is worth a try. You might find, like I did, that once you go CLI, the web chat feels like going back to a flip phone.

Next up, I'll share more advanced patterns: breaking tasks into a matrix, handling timeouts, and automatically organizing results. Stay tuned if that sounds useful.