GitSage: An AI Agent for Automated Release Notes

Turn messy commit messages into polished release notes automatically

Patrick Kalkman

Jan 14, 2025 — 13 min read

Making sense of countless Git commits for release notes—image generated by Midjourney.

There has to be a better way to write release notes, I thought, staring into the abyss of my Git commit history. I had just wrapped up a big update to an Android TV app, and the familiar grind was upon me: translating commit messages into something users would care about.

Release notes aren’t just a time sink; they’re a bridge between developers and users. They turn technical jargon into updates that inform, engage, and build trust.

But what if they could write themselves?

Having previously developed AI agents for tasks like time registration, I recognized an opportunity to automate this process. This led me to build GitSage, an AI agent designed to make release notes write themselves.

GitSage does not require you to change how you write commits. It combines large language models (LLMs) with code analysis, creating an automated system that complements existing development practices.

In this article, we will explore how GitSage transforms repository data into release notes without imposing constraints on your development workflow. You’ll learn how it analyzes Git commits, interprets code changes, and generates documentation that effectively serves developers and end-users.

Let’s begin with how GitSage’s architecture addresses these challenges, starting with its fundamental approach to automated documentation.

Developers eager to explore GitSage’s capabilities firsthand will find a walkthrough in the “Seeing GitSage in Action” section below. The implementation, including source code and documentation, lives in the GitSage GitHub repository.

The GitSage approach

When I designed GitSage, one of the key architectural decisions I faced was determining its level of agency — the degree of autonomy and independent decision-making it would possess.

Understanding agency in AI systems

In agent-oriented software, like GitSage, agency refers to the degree of autonomy and goal-directed behavior an agent exhibits.

As Lin Padgham and Michael Winikoff explain in Developing Intelligent Agent Systems: A Practical Guide:

An intelligent agent is a software entity that operates autonomously, responds to changes in its environment, and proactively pursues its goals. Its ability to adapt and make decisions independently defines its level of agency.

This concept manifests as a spectrum in modern AI systems. At one end, low-agency systems follow predetermined pathways with minimal deviation. At the other end, high-agency systems can analyze situations, adjust their approach, and make autonomous decisions.

Choosing an agent architecture: workflow vs. LLM-driven

This brings us to the next crucial architectural choice: selecting the right approach to building an agent. Broadly speaking, there are two main paradigms.

Workflow-based systems: These systems rely on a predefined sequence of steps or a workflow. Within this structure, you might use an LLM for specific tasks, such as generating text or classifying information. However, the overall flow and decision points are pre-programmed.
LLM-driven systems (or LLM-as-engine systems): In this approach, the LLM itself acts as the core engine driving the agent’s behavior. You give the LLM access to a set of tools (e.g., APIs, external programs) and instructions, and it determines the best course of action based on the current context and its overall objective. It decides which tools to use, in what order, and how to interpret the results.
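To make the contrast concrete, here is a minimal sketch of both paradigms. The tool functions and the `choose_tool` stand-in are hypothetical and exist only for illustration; a real LLM-driven agent would ask a model to pick the next tool instead of using hard-coded rules.

```python
from typing import Callable, Dict

# Hypothetical tools; each takes and returns a plain dict of working data.
def fetch_commits(data: dict) -> dict:
    return {**data, "commits": ["fix login bug", "wip"]}

def summarize(data: dict) -> dict:
    return {**data, "notes": [c.capitalize() for c in data["commits"]]}

TOOLS: Dict[str, Callable[[dict], dict]] = {
    "fetch_commits": fetch_commits,
    "summarize": summarize,
}

def run_workflow(data: dict) -> dict:
    """Workflow-based: the order of steps is fixed in code."""
    for step in (fetch_commits, summarize):
        data = step(data)
    return data

def choose_tool(data: dict) -> str:
    """Stand-in for an LLM call that decides what to do next."""
    if "commits" not in data:
        return "fetch_commits"
    if "notes" not in data:
        return "summarize"
    return "done"

def run_llm_driven(data: dict, max_steps: int = 5) -> dict:
    """LLM-driven: the 'model' picks the next tool until it says done."""
    for _ in range(max_steps):
        tool = choose_tool(data)
        if tool == "done":
            break
        data = TOOLS[tool](data)
    return data
```

Both functions reach the same result here, but only the second leaves the choice and order of steps to the decision-maker, which is where flexibility and unpredictability both come from.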

GitSage’s architecture

I designed GitSage as a workflow-based agent system using LangChain’s framework, balancing agency with reliability. The system maintains moderate agency. It is free enough to make intelligent decisions about commit analysis and documentation generation, but constrained within a workflow structure.

This balanced approach emerged from several practical considerations.

First, release note generation benefits from clearly defined processing stages. Each commit needs to be discovered, analyzed, and transformed into user-friendly content in a predictable sequence. A workflow architecture naturally maps to these distinct phases while maintaining flexibility within each stage.

Second, LangChain’s framework provides tools for building AI workflows. Having used it in previous projects, I appreciated its ability to maintain state across processing steps while offering control over each phase of the pipeline.

The core workflow

GitSage’s workflow comprises three main stages:

A flowchart showing three connected stages of GitSage’s workflow: “1. Retrieve Git commits” (in green) flows to “2. Analyse commit message & code” (in pink), which leads to “3. Release Notes Generation” (in white), with “Release Notes” as the final output — The core workflow of GitSage, image by the author

Commit Retrieval: The workflow begins by collecting Git commits, establishing a view of changes since the last release. This stage handles various repository states, from tagged releases to ongoing development.
Intelligent Analysis: Rather than enforcing rigid commit conventions, GitSage analyzes both commit messages and code changes. This dual analysis provides context-aware understanding of each update, extracting insights even from informal commit messages.
Release Notes Generation: Using the gathered context and analysis, GitSage generates structured release notes that capture both the technical changes and their significance to users.

With this understanding of GitSage’s high-level approach, let’s dive into the technical architecture that makes it possible.

Technical architecture

At its core, GitSage leverages LangGraph, a library for creating complex, multi-agent workflows, to implement a node-based architecture.

This architecture divides the release note generation process into five interconnected nodes that work together.

Recall the initial three-stage workflow: Commit Retrieval, Intelligent Analysis, and Release Notes Generation. GitSage refines this process by decomposing it into five LangGraph nodes: Commit Discovery, Analysis Planning, Code Context, Analysis, and Release Notes Renderer.

Each node contributes its specific piece of information to a central hub called the “Agent State.” Think of the Agent State as the shared memory of GitSage, holding all the data gathered and processed by each node.

This design offers several advantages. First, it provides modularity. We can develop, test, and improve each node independently, making the system more maintainable and easier to update. Second, the shared Agent State enables easy information sharing between nodes.

Nodes can access and use data generated by other nodes, leading to a more holistic and context-aware understanding of the codebase changes. This allows GitSage to generate more accurate, insightful, and user-friendly release notes than would be possible with a simpler, linear approach.
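The shared-state pattern can be illustrated without any framework. The sketch below is plain Python rather than LangGraph’s API, and the node bodies, sample data, and the two-word heuristic are hypothetical placeholders. Each node reads the state and returns only the keys it contributes.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], State]

def commit_discovery(state: State) -> State:
    # Hypothetical data; a real node would read the Git history.
    return {"commits": ["Add search filter", "fix"]}

def analysis_planning(state: State) -> State:
    # Reads what the previous node wrote and flags terse messages.
    flagged = [m for m in state["commits"] if len(m.split()) < 2]
    return {"needs_code_review": flagged}

def run_pipeline(nodes: List[Node]) -> State:
    state: State = {}
    for node in nodes:
        state.update(node(state))  # merge each node's partial update
    return state
```

Because each node only depends on keys in the state, a node can be tested in isolation by handing it a hand-built dictionary.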

The architecture diagram below illustrates how these five nodes interact and contribute to the central Agent State, creating a dynamic flow of information throughout the release note generation process.

The workflow of nodes of GitSage, image by the author

This node-based architecture forms a progression, with each node enriching the Agent State with increasingly refined information about the repository and its changes.

The vertical flow represents the primary processing sequence, while the diagonal arrows show how each node contributes specialized data to the shared state.

Information flow and state management

Each numbered flow in the diagram represents a distinct contribution to the Agent State:

CommitMetadata: The Commit Discovery node initializes the state with fundamental repository information, including commit history, version tags, and reference points. This forms the foundation for all subsequent analyses.
AnalysisPlan: The Analysis Planning node employs an LLM to evaluate commit message quality. It examines each message’s clarity and completeness, determining whether it provides sufficient context for release note generation. The node enriches the state with these assessments and with decisions about how to process each commit. When a commit message lacks clarity, the node flags the commit for additional code analysis.
CodeInsights: The Code Context node adds technical depth to our understanding, contributing analysis of API changes, dependency updates, and schema modifications. This becomes crucial for commits flagged during the planning phase as needing additional context.
ImpactAssessment: The Analysis node performs an intelligent synthesis of all available information. For commits with coherent messages, it directly transforms these into release note content. However, for commits flagged as needing additional context, it combines the original commit message with code change insights, using another LLM interaction to generate release note descriptions. This dual-path approach ensures quality regardless of the initial commit message clarity.
FormattedContent: Finally, the Release Notes Renderer node transforms all insights into polished documentation, adding structured content ready for different output formats.
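The dual-path idea in the Analysis step can be sketched as a small routing function. The function name is hypothetical; the `commit_info` and `technical_context` keys mirror the placeholders used in the analysis prompt later in this article, and the exact routing logic in GitSage may differ.

```python
from typing import Dict, Optional

def build_llm_input(message: str, needs_code_review: bool,
                    code_insights: Optional[str] = None) -> Dict[str, str]:
    """Assemble the fields handed to the description step for one commit."""
    if needs_code_review and code_insights:
        # Flagged commit: supplement the message with code-level findings.
        return {"commit_info": message, "technical_context": code_insights}
    # Clear message: the commit text alone is enough.
    return {"commit_info": message, "technical_context": ""}
```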

Now that we understand the overall architecture, let’s examine each component to see how they work together.

Node architecture deep dive

In this section, we look at the workflow’s state and its nodes in detail. We start with the state and then walk through each node in order.

AgentState

Effective state management is crucial for GitSage’s ability to generate release notes. The Agent State, a shared memory that acts as the central hub, enables nodes to share information seamlessly.

We structure the state as a TypedDict, a dictionary type from Python's typing module that declares the expected type of each key. Static type checkers such as mypy use these declarations to catch mismatches; the types are not enforced at runtime. The declaration documents, for example, that commits are stored as a list of CommitInfo objects, and generated descriptions as single strings.

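from typing import List, Optional, TypedDict

# CommitInfo, AnalysisPlan, CodeContext, and ImpactAnalysis are
# GitSage's own types, defined elsewhere in the codebase.
class AgentState(TypedDict):
    # Commit Discovery Output
    commits: List[CommitInfo]
    commit_count: int
    context: str
    last_tag: Optional[str]

    # Analysis Planning Output
    analysis_plan: AnalysisPlan

    # Code Context Output
    code_context: CodeContext

    # Analysis Output
    impact_analysis: ImpactAnalysis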
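As a small illustration of how a node might populate a typed state, the sketch below uses a hypothetical, trimmed-down `MiniState` with made-up commit messages. One practical point it shows: TypedDict annotations are read by static checkers such as mypy, while at runtime the value is an ordinary dict.

```python
from typing import List, Optional, TypedDict

class MiniState(TypedDict):
    # Hypothetical, trimmed-down stand-in for AgentState.
    commits: List[str]
    commit_count: int
    last_tag: Optional[str]

def discover() -> MiniState:
    commits = ["Add search filter", "Fix crash on startup"]
    return {"commits": commits, "commit_count": len(commits), "last_tag": None}

state = discover()
print(type(state).__name__)  # prints "dict": a TypedDict value is a plain dict
```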

1. Commit discovery node

The Commit Discovery Node forms our entry point, using GitPython to interface with repositories.

It intelligently handles various repository states, from tagged releases to ongoing development branches. The node’s implementation focuses on reliable commit retrieval and tag management. See the implementation for more details.
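A minimal sketch of this kind of retrieval with GitPython might look as follows. The function names and the "most recent tag by commit date" rule are assumptions for illustration, not GitSage’s actual implementation.

```python
from typing import List, Optional

def revision_range(last_tag: Optional[str]) -> str:
    """Return the Git revision range to scan for a release."""
    # With a tag, scan only commits after it; otherwise scan the whole history.
    return f"{last_tag}..HEAD" if last_tag else "HEAD"

def commits_since_last_tag(repo_path: str) -> List[str]:
    """Collect commit subject lines since the most recent tag (assumed rule)."""
    from git import Repo  # GitPython, imported lazily so revision_range has no dependency

    repo = Repo(repo_path)
    tags = sorted(repo.tags, key=lambda t: t.commit.committed_datetime)
    last_tag = tags[-1].name if tags else None
    return [c.summary for c in repo.iter_commits(revision_range(last_tag))]
```

Keeping the range calculation in its own function makes the tagged and untagged cases easy to test without a real repository.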

2. Planning node

The Planning node is our first integration point with LLMs, serving as the intelligence layer that evaluates commit message quality.

I used Groq’s LLM service through LangChain, with their Mixtral-8x7b-32768 and Llama-3.2-3b-preview models. These models provide an excellent balance of performance and reliability for our commit analysis.

During development, I encountered several challenges with LLM response handling. Initially, the models would occasionally return additional fields or vary their output structure, which could break downstream processing.

More subtly, they sometimes attempted to escape certain characters in the JSON output, leading to parsing errors. For instance, newlines in commit messages might be escaped as ‘\n’, creating invalid JSON structures. The solution emerged through prompt engineering, explicit output constraints, and robust response validation.

Here’s the crafted prompt template that solved this challenge:

MESSAGE_ANALYSIS_TEMPLATE = """ 
You are an expert at analyzing Git commits for clarity and completeness. 
Given a commit message, analyze its effectiveness in communicating changes. 
 
Commit message to analyze: 
{commit_message} 
 
Return a strict JSON response with exactly these fields as shown in this example: 
{ 
    "message_clarity": 0.8, 
    "needs_code_review": false, 
    "suggested_improvements": ["Add more context about the feature",  
                             "Include related ticket numbers"], 
    "is_breaking_change": false 
} 
 
Important JSON formatting rules: 
1. message_clarity must be a float between 0 and 1 
2. needs_code_review must be a boolean 
3. suggested_improvements must be an array of strings 
4. is_breaking_change must be a boolean 
5. Do not add any additional fields 
6. Keep it as a single-line JSON without pretty printing 
"""

For more details, see the implementation of this node on GitHub.
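The response validation mentioned above can be sketched with the standard library alone. This is an illustrative validator for the four fields in the prompt, not the code GitSage uses; the function name and error messages are hypothetical.

```python
import json
from typing import Any, Dict

EXPECTED_TYPES = {
    "message_clarity": (int, float),
    "needs_code_review": bool,
    "suggested_improvements": list,
    "is_breaking_change": bool,
}

def parse_message_analysis(raw: str) -> Dict[str, Any]:
    """Parse the model's reply and reject anything that breaks the contract."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict) or set(data) != set(EXPECTED_TYPES):
        raise ValueError(f"unexpected fields: {data!r}")
    for key, expected in EXPECTED_TYPES.items():
        if not isinstance(data[key], expected):
            raise ValueError(f"{key} has the wrong type")
    # bool is a subclass of int in Python, so exclude it explicitly.
    clarity = data["message_clarity"]
    if isinstance(clarity, bool) or not 0 <= clarity <= 1:
        raise ValueError("message_clarity must be a number between 0 and 1")
    if not all(isinstance(s, str) for s in data["suggested_improvements"]):
        raise ValueError("suggested_improvements must contain only strings")
    return data
```

One related detail worth checking in your own setup: if a template like this is filled with `str.format` or an f-string style prompt template, the literal braces in the JSON example need to be doubled (`{{` and `}}`) so they are not treated as placeholders.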

3. The code context node

The Code Context Node performs an analysis of repository changes to extract meaningful insights about code modifications. Rather than simply tracking line additions and deletions, it examines structural changes to identify significant updates like API modifications, schema alterations, and dependency changes.

This technical analysis becomes valuable when dealing with commits that have minimal or ambiguous descriptions.

The node implements a weighted analysis system that recognizes the varying importance of different changes. For instance, modifications to configuration files or API endpoints carry more weight than updates to test files or documentation. This ensures that the generated release notes focus on changes that most impact users and developers.
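To give a feel for what a weighted analysis can look like, here is a small sketch that scores changed files by path pattern. The patterns, weights, and default are invented for illustration; GitSage’s actual rules live in its source code.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Tuple

# Hypothetical weights; the first matching pattern wins.
PATH_WEIGHTS: List[Tuple[str, float]] = [
    ("*/api/*", 1.0),   # API endpoints
    ("*.yaml", 0.9),    # configuration
    ("*.toml", 0.9),
    ("tests/*", 0.2),   # test files
    ("*.md", 0.1),      # documentation
]
DEFAULT_WEIGHT = 0.5

def file_weight(path: str) -> float:
    for pattern, weight in PATH_WEIGHTS:
        if fnmatchcase(path, pattern):
            return weight
    return DEFAULT_WEIGHT

def change_score(paths: Iterable[str]) -> float:
    """Score a commit by its most significant changed file."""
    return max((file_weight(p) for p in paths), default=0.0)
```

Taking the maximum rather than the average means a single API change is not diluted by a dozen accompanying test updates.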

For a detailed look at the implementation, including the specific algorithms and optimization strategies used, you can explore the Code Context Node source code in our GitHub repository.

4. Analysis node

The Analysis Node is the heart of GitSage’s intelligence: it transforms raw technical data into meaningful release note content.

The node combines commit information with technical context from previous stages, using an LLM to generate clear, user-focused descriptions of changes.

The node again uses Groq’s LLM service, continuing the approach established in the Planning Node. However, the Analysis Node faces a challenge: it must produce consistently structured output while handling varying levels of input quality and technical complexity.

The key to solving this challenge lies in careful prompt engineering. Here’s our prompt template:

CHANGE_ANALYSIS_TEMPLATE = """ 
You are an expert at analyzing software changes for release notes. 
Analyze this change combining commit information and technical context. 
 
Commit Information: 
{commit_info} 
 
Technical Context: 
{technical_context} 
 
Return a strict JSON response with exactly these fields: 
{ 
    "title": "A clear concise title without quotes or special characters", 
    "description": "A clear description without file paths or technical details", 
    "impact": "A user-focused impact description", 
    "breaking": false 
} 
 
IMPORTANT FORMATTING RULES: 
1. Never use nested quotes in strings 
2. Remove any underscores or special characters 
3. Keep all content on a single line 
4. Avoid technical details like file paths 
5. For the title field: 
   - Keep it short and clear 
   - No technical terms 
   - No quotes inside the value 
6. For the description field: 
   - Focus on what changed, not how 
   - Avoid listing files or paths 
7. For the impact field: 
   - Focus on user-visible changes 
   - Keep it simple and clear 
 
Example with good formatting: 
{ 
    "title": "Updated channel logo styling", 
    "description": "Changed the background color of channel logos to improve visibility", 
    "impact": "Channel logos now have a clearer background making them easier to read", 
    "breaking": false 
} 
"""

The development of this prompt template revealed several challenges that might seem minor but proved critical for reliable operation.

Initial versions would occasionally generate malformed JSON when commits contained quotation marks or special characters. We also found that the LLM would sometimes include technical file paths or implementation details in user-facing descriptions.

To address these issues, we implemented formatting rules in our prompt:

Explicitly prohibiting nested quotes in strings
Removing special characters that could break JSON parsing
Keeping all content single-line to prevent formatting issues
Providing an example of the desired output format
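Prompt rules reduce these problems but cannot guarantee compliance, so a cheap check on the parsed output can serve as a second line of defense. The sketch below is a hypothetical helper, not part of GitSage as described here, and the path-detection regex is a rough heuristic.

```python
import re
from typing import Dict, List

# Rough heuristic for path-like tokens such as src/app/main.py (an assumption).
PATH_LIKE = re.compile(r"\b[\w.-]+/[\w./-]+")

def formatting_violations(entry: Dict[str, object]) -> List[str]:
    """List which prompt rules a generated entry appears to break."""
    problems = []
    for field in ("title", "description", "impact"):
        text = str(entry.get(field, ""))
        if "\n" in text:
            problems.append(f"{field}: spans multiple lines")
        if '"' in text:
            problems.append(f"{field}: contains nested quotes")
        if PATH_LIKE.search(text):
            problems.append(f"{field}: mentions a file path")
    return problems
```

An entry that fails the check could be retried or flagged for manual review instead of flowing straight into the release notes.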

The node processes each commit chronologically, balancing technical accuracy with user comprehension. For commits flagged as needing additional context, it incorporates the technical analysis from the Code Context Node, which makes useful descriptions possible even when commit messages are terse.

For the complete implementation, including error handling and state management details, you can refer to the Analysis Node source code in our GitHub repository.

5. Release notes renderer node

The Release Notes Renderer represents the final stage in GitSage’s pipeline. It transforms our structured analysis into polished documentation. While the current implementation generates Markdown, we designed it with extensibility in mind, so other output formats can be added later.

The node organizes content into distinct sections, each serving a specific purpose in the release notes:

# Project Name Release Notes v1.2.3 
Generated on: 2025-01-08 
 
## Summary 
This release includes 15 changes: 
- 2 breaking changes 
- 13 improvements and fixes 
 
## ⚠️ Breaking Changes 
- **Updated Authentication Flow** 
  Changed the OAuth token handling process 
  Impact: Applications need to update their token refresh logic 
 
## Changes & Improvements 
- **Enhanced Search Performance** 
  Optimized database queries for faster search results

This structured approach ensures consistent organization while maintaining readability. Breaking changes receive special attention, marked with a warning emoji and including specific impact statements to help users understand the implications of updates.

While we currently output Markdown for its compatibility, the architecture supports multiple output formats. Adding new formats requires only implementing additional rendering functions, without modifying the core analysis pipeline.
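One way to picture this extensibility is a registry that maps a format name to a rendering function. The sketch below is an assumption about how such a design could look, with hypothetical function names and a simplified change record; it is not GitSage’s renderer.

```python
from typing import Callable, Dict, List

Change = Dict[str, object]  # keys: title, description, impact, breaking

def render_markdown(changes: List[Change]) -> str:
    breaking = [c for c in changes if c["breaking"]]
    regular = [c for c in changes if not c["breaking"]]
    lines: List[str] = []
    if breaking:
        lines.append("## ⚠️ Breaking Changes")
        for c in breaking:
            lines += [f"- **{c['title']}**", f"  {c['description']}",
                      f"  Impact: {c['impact']}"]
    if regular:
        lines.append("## Changes & Improvements")
        for c in regular:
            lines += [f"- **{c['title']}**", f"  {c['description']}"]
    return "\n".join(lines)

def render_plain(changes: List[Change]) -> str:
    return "\n".join(f"{c['title']}: {c['description']}" for c in changes)

# Adding a format means registering one more function here.
RENDERERS: Dict[str, Callable[[List[Change]], str]] = {
    "markdown": render_markdown,
    "plain": render_plain,
}
```

The analysis pipeline only needs to produce the list of changes; which renderer consumes it is a lookup by name.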

For instance, we could easily extend the system to generate HTML, PDF, or even automated release announcements for platforms like Slack or Discord.

For the complete implementation, including template management and metadata handling, you can explore the Release Notes Renderer source code in our GitHub repository.

With the technical foundation established, let’s see how these components come together in practice.

Seeing GitSage in action

Setting up and running GitSage relies on UV, a modern Python package manager. It simplifies setup while keeping dependency resolution consistent across environments.

Environment setup

Start by installing UV, which combines dependency management and execution in a single tool.

curl -LsSf https://astral.sh/uv/install.sh | sh

Next, clone the GitSage repository to your local environment:

git clone https://github.com/PatrickKalkman/gitsage.git 
cd gitsage

Configuration

GitSage uses the Groq API for its language model capabilities. Create a .env file in your project root with your API credentials; a free API key is available from Groq.

echo "GROQ_API_KEY=your-api-key-here" > .env

Running GitSage

UV’s run command handles both dependency installation and execution, streamlining the entire process. To generate release notes for the latest GitSage release, use the following command.

uv run python ./src/gitsage/workflow.py --repo-path .

An animated GIF that shows a terminal running GitSage while creating release notes of itself. — GitSage in action, creating the release notes of itself, image by author

The generated release notes reveal GitSage’s analysis capabilities. For version 0.8.2, the tool identified two significant improvements, automatically categorizing their impact and presenting them in a structured format:

# Gitsage Release Notes v0.8.2 
Generated on: 2025-01-08 
 
## Summary 
 
This release includes 2 changes: 
- 2 improvements and fixes 
 
**Note:** This release has been marked as `high` risk. 
 
## Changes & Improvements 
 
- **Fixed an issue with a test**   
  A fix was implemented to resolve an issue discovered during testing 
 
- **Code Prompt Removed from Planning Node, CLI Arguments Added**   
  The code prompt has been removed from the planning node and command line arguments can now be provided to the tool

Notice how GitSage automatically evaluates the release’s risk level and organizes changes into clear categories. The tool transforms technical commits into user-friendly descriptions, maintaining essential details while eliminating unnecessary complexity.

Each change receives a concise yet informative explanation, helping users understand the impact of updates without drowning in technical specifics.

This output shows GitSage’s core strength: its ability to bridge the gap between developer activities and user documentation, creating release notes that serve both technical and non-technical audiences effectively.

GitSage supports additional parameters for fine-tuned control over the analysis process. Here’s an example that shows these options:

uv run python ./src/gitsage/workflow.py \
  --repo-path /path/to/your/repo \
  --output-dir release_notes \
  --model mixtral-8x7b-32768 \
  --verbose
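For readers curious how flags like these are typically declared, here is a minimal argparse sketch. The flag names come from the command above; the defaults, help texts, and function names are assumptions, and GitSage’s real argument parser may differ.

```python
import argparse
from typing import List, Optional

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Generate release notes")
    # Defaults below are illustrative assumptions.
    parser.add_argument("--repo-path", default=".", help="repository to analyze")
    parser.add_argument("--output-dir", default="release_notes",
                        help="where to write the generated notes")
    parser.add_argument("--model", default="mixtral-8x7b-32768",
                        help="model identifier passed to the LLM provider")
    parser.add_argument("--verbose", action="store_true", help="log more detail")
    return parser

def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)
```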

While GitSage already offers powerful capabilities for automated release notes, there are several exciting opportunities for enhancement.

Future directions and opportunities

Through developing and testing GitSage, I’ve identified several key areas for enhancing automated release note generation. These improvements aim to make GitSage faster, more insightful, and easier to use.

Enhanced code analysis and shift towards an agentic architecture

The rapid evolution of language models presents a major opportunity to enhance GitSage. I plan to transition from the current pattern-based approach with targeted LLM use to a more agentic, LLM-driven architecture.

In this new model, a powerful LLM like Claude, GPT-4, or an advanced open-source alternative will be the central engine. Instead of a fixed workflow, the LLM will leverage a set of tools, such as direct diff analysis, commit message parsing, and potentially codebase interaction tools, to generate release notes.

This shift promises deeper insights into code changes, greater adaptability to different project structures, and potentially a more sophisticated understanding of the intent behind the code.

Performance optimization through intelligent batching

Performance is another area for improvement. The current approach of evaluating each commit individually, while thorough, can be time-consuming.

Implementing intelligent batching, where we group related commits and analyze them together, offers the potential for speed improvements.

This approach would use the large context windows of modern language models to reduce the number of API calls.
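A first approximation of batching could look like the sketch below. It packs commits greedily in order under a character budget, which is a stand-in for a token budget; it does not yet group commits by relatedness, and the function name and default budget are hypothetical.

```python
from typing import Iterable, List

def batch_commits(messages: Iterable[str], max_chars: int = 4000) -> List[List[str]]:
    """Greedily pack commit messages into batches under a character budget."""
    batches: List[List[str]] = []
    current: List[str] = []
    size = 0
    for message in messages:
        # Start a new batch when adding this message would exceed the budget.
        if current and size + len(message) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(message)
        size += len(message)
    if current:
        batches.append(current)
    return batches
```

Each batch would then go to the model in a single request, so the number of API calls drops from one per commit to one per batch.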

Streamlined distribution and CI/CD integration

Making GitSage more accessible is a priority. While the current source-based approach serves developers, packaging GitSage as a command-line interface (CLI) tool would significantly lower the barrier to entry for a wider range of users.

Integrating GitSage with GitHub Actions would enable automated release note generation directly within existing CI/CD pipelines. This would make GitSage a part of the release process, saving developers valuable time and effort.

The future of automated release notes

GitSage’s architecture provides a foundation for these enhancements, all while maintaining its core promise of transforming technical changes into clear, meaningful documentation.

As development practices and language models continue to advance, automated documentation tools like GitSage will play an important role in streamlining the software release process.

I see a future where developers can focus more on building great software, leaving writing release notes to intelligent tools like GitSage.

The complete implementation, including source code and documentation, lives in the GitSage GitHub repository.