Automate Python Code Reviews with Free Local LLMs and GitHub Actions

Paying for GPT-4o or Claude API calls every time someone opens a pull request adds up quickly on a busy repo. A self-hosted Ollama instance on a machine you already own — or a GPU-enabled self-hosted GitHub Actions runner — lets you run a capable open-weight model for the cost of electricity. The result is a first-pass automated review that catches common Python issues and leaves a comment on the PR before any human reads the diff.

This is not a replacement for human review. An open-weight 7B model running locally will miss subtle concurrency bugs, architectural problems, and context it has never seen. What it reliably does is reduce the amount of low-signal noise a human reviewer has to wade through: undocumented parameters, obvious type mismatches, functions that shadow builtins, missing error handling in obvious paths. That alone is worth setting up if your team is small and review time is scarce.

The Shape of the Workflow

The basic loop has four parts: a GitHub Actions workflow triggers on pull_request, a Python script fetches the diff via the GitHub REST API, the script sends that diff to a locally running Ollama server, and the model's response is posted back as a PR comment through the same API.

Here is a minimal workflow file for a self-hosted runner that has Ollama already installed and the model pre-pulled:

# .github/workflows/llm-review.yml
name: LLM Code Review

on:
  pull_request:
    types: [opened, synchronize]
    paths:
      - '**.py'

jobs:
  review:
    runs-on: self-hosted   # requires GPU runner with Ollama installed
    permissions:
      pull-requests: write
      contents: read
    steps:
      - uses: actions/checkout@v4

      - name: Wait for Ollama
        run: curl --retry 10 --retry-delay 2 --retry-connrefused http://localhost:11434/api/tags

      - name: Run review script
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          REPO: ${{ github.repository }}
          MODEL: qwen2.5-coder:7b
        run: python scripts/llm_review.py

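The paths filter limits runs to PRs that touch at least one Python file, which avoids burning runner time on documentation-only changes. It only controls whether the job triggers; the diff the script fetches still contains every file changed in the PR. If your runner is not persistent (for example, you spin it up on demand), add steps that install Ollama, start the server, and pull the model before the review step, and keep the wait so the script never runs before the server answers.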

The Python review script does three things: fetch the diff, prompt the model, post the comment. Here is a stripped-down version:

# scripts/llm_review.py
import json
import os
import textwrap
import urllib.request

GITHUB_API = "https://api.github.com"
OLLAMA_URL = "http://localhost:11434/api/generate"

def gh(path, method="GET", body=None):
    token = os.environ["GH_TOKEN"]
    req = urllib.request.Request(
        f"{GITHUB_API}{path}",
        data=json.dumps(body).encode() if body is not None else None,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "X-GitHub-Api-Version": "2022-11-28",
        },
        method=method,
    )
    with urllib.request.urlopen(req) as r:
        return json.loads(r.read())

def get_diff():
    repo = os.environ["REPO"]
    pr = os.environ["PR_NUMBER"]
    req = urllib.request.Request(
        f"{GITHUB_API}/repos/{repo}/pulls/{pr}",
        headers={
            "Authorization": f"Bearer {os.environ['GH_TOKEN']}",
            "Accept": "application/vnd.github.v3.diff",
        },
    )
    with urllib.request.urlopen(req) as r:
        return r.read().decode()

def ask_ollama(diff):
    # Dedent the instructions on their own. Diff lines start at column 0, so
    # dedenting the combined string would leave the instructions indented.
    instructions = textwrap.dedent("""\
        You are a Python code reviewer. Review the following git diff for:
        - Bugs or likely runtime errors
        - Missing or incorrect type annotations
        - Functions that shadow Python builtins
        - Missing error handling in obvious paths
        - Style issues that violate PEP 8

        Be concise. List specific findings only. Do not repeat the diff back.
        If the change looks correct, say so briefly.
    """)
    # Cap the diff so the prompt stays within the model's context window.
    prompt = f"{instructions}\nDIFF:\n{diff[:12000]}"
    payload = {"model": os.environ["MODEL"], "prompt": prompt, "stream": False}
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Generation on modest hardware can be slow, hence the generous timeout.
    with urllib.request.urlopen(req, timeout=300) as r:
        return json.loads(r.read())["response"]

def post_comment(body):
    repo = os.environ["REPO"]
    pr = os.environ["PR_NUMBER"]
    gh(f"/repos/{repo}/issues/{pr}/comments", method="POST", body={"body": body})

if __name__ == "__main__":
    diff = get_diff()
    if not diff.strip():
        print("Empty diff, skipping.")
    else:
        review = ask_ollama(diff)
        post_comment(f"**LLM first-pass review** (model: `{os.environ['MODEL']}`)\n\n{review}\n\n---\n*Automated review. Not a substitute for human review.*")

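The diff is truncated at 12,000 characters before being sent to the model, which is on the order of a few thousand tokens. Ollama applies its own context length (the num_ctx option), which can be smaller than the model's maximum, and a prompt that exceeds it is typically cut off without an error. Sending a 40-file diff wholesale therefore means the model reviews only part of what you sent, or produces incoherent output. The 12,000-character ceiling covers most single-feature PRs, provided the context length is set high enough to hold it. For larger diffs, you can split by file and send one prompt per changed file, then aggregate.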
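The split-by-file approach can be sketched as follows. This is a minimal illustration, not part of the original script: split_diff_by_file and review_per_file are hypothetical helper names, the splitting relies on the "diff --git" header that git writes before each file section, and ask stands in for any function that takes a diff string and returns review text (such as ask_ollama above).

```python
import re

# Each file section in a git diff starts with a "diff --git a/... b/..." header.
# Changed lines inside a section are prefixed with "+", "-" or a space, so they
# never match at the start of a line.
FILE_HEADER = re.compile(r"^diff --git a/(.+?) b/(.+)$", re.MULTILINE)


def split_diff_by_file(diff: str) -> dict:
    """Map each changed file path (the b/ side) to its own diff section."""
    matches = list(FILE_HEADER.finditer(diff))
    sections = {}
    for i, m in enumerate(matches):
        # A section runs until the next header, or to the end of the diff.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(diff)
        sections[m.group(2)] = diff[m.start():end]
    return sections


def review_per_file(diff: str, ask, max_chars: int = 12000) -> str:
    """Call `ask` once per file and join the answers under per-file headings."""
    parts = []
    for path, section in split_diff_by_file(diff).items():
        parts.append(f"### {path}\n\n{ask(section[:max_chars])}")
    return "\n\n".join(parts)
```

In the main block you would call review_per_file(diff, ask_ollama) instead of ask_ollama(diff). The trade-off is one model call per file, so total review time grows with the number of changed files, and the model loses any cross-file context it had when the whole diff fit in one prompt.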

Choosing a Model

Three models are worth considering for this specific task. The tradeoffs map directly to the RAM available on your runner.

qwen2.5-coder:7b is the practical default. It runs in approximately 6–7 GB of VRAM or RAM, fits on a consumer GPU (RTX 3060 or similar), and performs well on Python-focused tasks. Alibaba’s Qwen2.5-Coder series was explicitly trained on code, which matters more for targeted review work than general instruction-following ability.

mistral:7b is an acceptable alternative if you already have it pulled or if you want a model with stronger general-language generation for more verbose review comments. It is not specifically trained on code, so it will miss some language-specific patterns that a coder model catches, but its instruction-following is reliable.

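qwen2.5-coder:32b or similar 30B+ models produce noticeably better reviews (they reason better about interactions between the files in a diff and catch subtler bugs) but require roughly 22–24 GB of VRAM. That is the entire capacity of a 24 GB consumer card, with little headroom left for a longer context, so in practice it pushes you toward a top-end consumer GPU, a data-center card such as an A100, or a multi-GPU setup, which changes the cost calculus significantly.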

For a first deployment, start with qwen2.5-coder:7b. You can upgrade the model string in the workflow env var without touching anything else.
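As a rough illustration of how the memory figures above translate into a choice, here is a hypothetical helper. The pick_model name and the thresholds table are assumptions made for this sketch; the numbers are the upper ends of the approximate ranges quoted in this section, not measured requirements.

```python
from typing import Optional

# (model tag, approximate memory needed in GB), largest model first.
# Figures are the rough upper bounds quoted in the text, not benchmarks.
CANDIDATES = [
    ("qwen2.5-coder:32b", 24.0),
    ("qwen2.5-coder:7b", 7.0),
]


def pick_model(available_gb: float) -> Optional[str]:
    """Return the largest candidate that fits, or None if nothing fits."""
    for tag, needed_gb in CANDIDATES:
        if available_gb >= needed_gb:
            return tag
    return None
```

A 12 GB card would get qwen2.5-coder:7b here. The returned tag is what you would put in the MODEL env var of the workflow.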

Self-Hosted Runners and the Cold-Start Problem

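If you run Ollama on a persistent self-hosted runner (a spare workstation, a homelab server, or a cloud VM you control), the model files stay on disk between runs and nothing has to be downloaded. The weights also stay loaded in memory as long as runs are frequent or you raise Ollama's keep-alive setting; by default Ollama unloads an idle model after a few minutes. With the model already loaded, job startup time drops to a few seconds. You register the runner from the repository's (or organization's) Settings → Actions → Runners page, and it picks up jobs like any other runner.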
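Whether the model is still in memory when a job starts depends on Ollama's keep-alive behaviour. One way to control it is a warm-up request: Ollama's generate endpoint loads a model when it receives a request without a prompt, and the keep_alive field sets how long the model stays loaded afterwards. The sketch below only builds that request. The warmup_request name and the 30-minute default are assumptions for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def warmup_request(model: str, keep_alive: str = "30m") -> urllib.request.Request:
    """Build a request that loads `model` into memory without generating text."""
    # No "prompt" key: Ollama treats this as a load-only request.
    payload = {"model": model, "keep_alive": keep_alive}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Usage, with an Ollama server listening on localhost:
#   urllib.request.urlopen(warmup_request("qwen2.5-coder:7b"), timeout=300)
```

The same keep_alive field can be added to the payload in ask_ollama, so every review request also extends the time the model stays loaded.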

The cold-start problem appears when you do not have a persistent machine. In that case, you have two options. First, install Ollama and pull the model at the start of every job:

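- name: Install Ollama
  run: curl -fsSL https://ollama.com/install.sh | sh

- name: Start Ollama server
  # If the install script already started Ollama as a service, this exits
  # in the background and the wait step below still succeeds.
  run: ollama serve &

- name: Wait for server
  run: curl --retry 15 --retry-delay 3 --retry-connrefused http://localhost:11434/api/tags

- name: Pull model
  # Runs in the foreground and after the server is up: pull talks to the server.
  run: ollama pull qwen2.5-coder:7b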

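This works on any Linux runner but adds several minutes per run. Second, cache the model files. Ollama stores models under ~/.ollama/models by default when the server runs as the job's user; a system service set up by the install script may keep them elsewhere, so check where the files actually land before caching. You can cache that directory with actions/cache keyed on the model name, which turns subsequent pulls into a cache restore, typically far faster than downloading the model again. Make sure the model fits in your repository's Actions cache quota: a 7B model is a few gigabytes, a 32B model is several times larger. Caching is the most practical path for ephemeral runners.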
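To confirm that a cache restore actually produced a usable model, the script can ask the server which models it has before deciding to pull. The sketch below is one way to do that, assuming the /api/tags response lists models under a "models" key with a "name" field per entry. has_model and fetch_tags are hypothetical names, not part of the original script.

```python
import json
import urllib.request

TAGS_URL = "http://localhost:11434/api/tags"


def has_model(tags: dict, model: str) -> bool:
    """Return True if `model` appears in a parsed /api/tags response."""
    # Ollama lists a model pulled without an explicit tag as "<name>:latest".
    wanted = model if ":" in model else f"{model}:latest"
    return any(m.get("name") == wanted for m in tags.get("models", []))


def fetch_tags(url: str = TAGS_URL) -> dict:
    """Fetch and parse the list of locally available models."""
    with urllib.request.urlopen(url, timeout=10) as r:
        return json.loads(r.read())
```

A job could then run the pull only when has_model(fetch_tags(), os.environ["MODEL"]) is false, which keeps a warm cache from triggering a redundant download check.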

For GPU runners on cloud providers, actuated.dev offers GPU-enabled ephemeral runners with the NVIDIA driver pre-installed. That cuts driver setup time to roughly 30 seconds (cached) and keeps the security model of ephemeral environments while giving you access to the hardware Ollama needs for sub-minute inference on 7B models.

Honest Limits

Automated LLM review works best as a first filter, not a gate. A few specific limits to plan around:

A 7B model will miss logic bugs that require understanding the broader codebase context — any bug that requires tracing through three or four files is unlikely to be caught. The model only sees the diff, not the full project.

Hallucinated findings are real. The model will occasionally flag something as a bug that is intentional. Human reviewers need to treat the output as a checklist to consider, not a verdict to accept. Adding the disclaimer line to the posted comment (as in the script above) makes that expectation explicit.

Diff truncation silently degrades quality. If your PR changes 3,000 lines, the model sees only the first portion. You can split by file, raise the truncation limit together with the context length Ollama allocates (which costs memory and tends to reduce accuracy on small models), or move to a larger model that copes better with long inputs.
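If you raise the truncation limit, the context length Ollama allocates has to grow with it. The generate endpoint accepts an options object, and num_ctx inside it sets the context length for that request. The sketch below builds such a payload and refuses prompts that clearly will not fit. The build_payload and estimate_tokens names, the 8192 default, the reply budget, and the four-characters-per-token ratio are all assumptions for illustration; the ratio is a crude heuristic, not a tokenizer.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate. The ratio is an assumed heuristic."""
    return int(len(text) / chars_per_token) + 1


def build_payload(model: str, prompt: str, num_ctx: int = 8192,
                  reply_budget: int = 1024) -> dict:
    """Build a generate payload with an explicit context length.

    Raises ValueError when the estimated prompt size plus room for the
    reply exceeds num_ctx, instead of letting the prompt be cut silently.
    """
    needed = estimate_tokens(prompt) + reply_budget
    if needed > num_ctx:
        raise ValueError(
            f"prompt needs about {needed} tokens but num_ctx is {num_ctx}"
        )
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }
```

Failing loudly here is a deliberate choice: a job that errors on an oversized diff is easier to notice than a review that quietly covers only the first few files. A larger num_ctx increases memory use, so the value has to fit the runner as well as the diff.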

The model has no knowledge of your codebase conventions. It will not flag violations of internal style guides, project-specific API contracts, or patterns that are acceptable in your context but look wrong in isolation. A .github/REVIEW_GUIDELINES.md pasted into the system prompt can help — up to a point.
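One way to supply such guidelines is the system field of Ollama's generate request, which carries a system message alongside the prompt. The sketch below reads the file if it exists and attaches it to an existing payload. load_guidelines and with_guidelines are hypothetical helpers, and the 4,000-character cap is an arbitrary assumption to keep the guidelines from crowding out the diff.

```python
import os


def load_guidelines(path: str = ".github/REVIEW_GUIDELINES.md",
                    max_chars: int = 4000) -> str:
    """Return the guidelines text, or an empty string if the file is absent."""
    if not os.path.exists(path):
        return ""
    with open(path, encoding="utf-8") as f:
        return f.read()[:max_chars]


def with_guidelines(payload: dict, guidelines: str) -> dict:
    """Return a copy of `payload` with the guidelines as a system message."""
    if not guidelines.strip():
        return payload
    system = "Follow these project review guidelines:\n" + guidelines
    # Copy instead of mutating, so the caller's payload is left untouched.
    return {**payload, "system": system}
```

The guidelines consume context too, so their length counts against the same budget as the diff.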

With those limits stated, the setup described here takes an afternoon to wire together and costs nothing ongoing if you have a machine to run it on. For teams where review bandwidth is the bottleneck, filtering out even part of the review noise before a human looks at a PR is a real productivity gain.

FAQ

Can I use GitHub-hosted runners instead of self-hosted?

You can install Ollama from scratch on a GitHub-hosted runner and pull the model during the job, but this adds 5-10 minutes to each run and consumes GitHub Actions minutes. Standard GitHub-hosted runners also have no GPU, so inference runs on CPU and is considerably slower. For low-volume repos it is workable; for busy repos the latency and cost make a persistent self-hosted runner more practical.

What happens if the model produces a false positive and blocks a valid PR?

The workflow as written posts a comment but does not block merging. Treat it as a required-to-read note, not a required-status-check. If you later promote it to a status check, add human override instructions to the workflow so engineers can bypass it with a label or comment command.

How do I keep the model from reviewing auto-generated files or lock files?

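GitHub Actions does not allow paths and paths-ignore on the same event, and the workflow above already uses paths, so add negated patterns to the existing list instead, e.g. paths: ['**.py', '!**/migrations/**']. Keep in mind that the trigger filter only decides whether the job runs: the diff fetched from the API still includes lock files and every other changed file. To keep those away from the model, filter by file path in the Python script before constructing the diff string sent to the model.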
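Filtering inside the script could look like the sketch below, which keeps only the sections of the diff that belong to .py files outside skipped directories. keep_reviewable and SKIP_PARTS are hypothetical names, and the migrations directory is just an example of a path you might exclude.

```python
import re

# Each file section in a git diff starts with a "diff --git a/... b/..." header.
FILE_HEADER = re.compile(r"^diff --git a/(.+?) b/(.+)$", re.MULTILINE)

# Path fragments to exclude even when the file is Python (example value).
SKIP_PARTS = ("/migrations/",)


def keep_reviewable(diff: str) -> str:
    """Return the diff with only .py sections outside skipped directories."""
    matches = list(FILE_HEADER.finditer(diff))
    kept = []
    for i, m in enumerate(matches):
        path = m.group(2)
        end = matches[i + 1].start() if i + 1 < len(matches) else len(diff)
        # Prefix "/" so a top-level "migrations/..." path is matched as well.
        skipped = any(part in "/" + path for part in SKIP_PARTS)
        if path.endswith(".py") and not skipped:
            kept.append(diff[m.start():end])
    return "".join(kept)
```

Calling diff = keep_reviewable(get_diff()) in the main block means the existing empty-diff check also covers PRs whose only Python changes are in excluded paths.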
