Mapping the Future: Using AI to Link Test Cases and Requirements
By Arrhen Knight
As a personal experiment in AI-driven automation, I built a script to tackle a common challenge in software testing: linking test cases to requirements. Done by hand, the task is tedious and error-prone, but with the help of an AI API I’ve turned it into a streamlined, efficient workflow for this hobby project. Let’s dive into how the tool works, the hurdles I faced, and why it’s a significant improvement over manual mapping.
The Old Way: Manual Mapping Challenges
Ensuring that test cases align with requirements is a key step in validating software functionality, even in a purely hypothetical project like this one. Traditionally, that means reviewing each test case by hand (its title, description, and steps) and matching it to a corresponding requirement based on shared intent or functionality. For a sample project with hundreds of test cases and requirements, this could take weeks of effort.
The process was slow, and mistakes were common: I might miss a connection because of subtle differences in wording, or link a test case to the wrong requirement, leaving gaps in validation. Scaling the work was difficult, and collaboration could introduce inconsistencies. I suspected AI could help, so as a personal challenge I set out to build a script that automates the mapping using a conversational AI API.
A Smarter Solution: AI-Powered Mapping
The script I developed takes two synthetic, non-proprietary datasets—one containing sample test cases and another with mock requirements—and uses AI to determine which test cases validate which requirements. Here’s how it works:
- Data Loading: It starts by reading two spreadsheets—one with test cases (including titles, descriptions, and steps) and another with requirements (summaries and descriptions), all created as synthetic examples.
- Pre-Filtering: To cut down on API calls, the script uses a text similarity measure (Jaccard similarity) to pre-filter the requirements that are likely matches for each test case; a minimal sketch of this filter appears after the list.
- AI Analysis: For each test case and its filtered requirements, the script sends a prompt to the AI API, asking it to score the functional similarity (0-1) between the test case and requirement. The prompt focuses on intent and core functionality, ignoring implementation details.
- Fallback Mechanism: If the AI doesn’t find strong matches, the script falls back to text similarity, ensuring no test case goes unmapped.
- Output: The script updates the test case dataset with matched requirement IDs, saving the results incrementally to handle large synthetic datasets without data loss.
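To make the pre-filtering step concrete, here is a minimal sketch of a Jaccard filter along the lines of what the script does. The field names (`id`, `summary`, `description`), the 0.1 threshold, and the top-10 cutoff are illustrative assumptions rather than the exact values from my script.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return {t for t in re.split(r"\W+", text.lower()) if t}

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two strings (0 to 1)."""
    sa, sb = tokenize(a), tokenize(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def prefilter(test_case_text: str, requirements: list[dict],
              threshold: float = 0.1, top_n: int = 10) -> list[dict]:
    """Keep only the requirements whose wording overlaps enough with the test case."""
    scored = [(jaccard(test_case_text, f"{r['summary']} {r['description']}"), r)
              for r in requirements]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [r for score, r in ranked if score >= threshold][:top_n]
```

Only the requirements that survive this filter are sent to the AI for scoring, which is what keeps the number of API calls manageable.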
The script includes robust error handling—retries for API failures, a watchdog timer to prevent infinite loops, and detailed logging to a CSV file for debugging. It even estimates token usage and costs, which is handy for managing API expenses in this experiment.
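To show how the scoring and retry pieces fit together, here is a simplified sketch of the per-pair scoring call with exponential backoff. The prompt wording, retry count, and delays are illustrative, and `call_ai_api` is a placeholder for whatever client your AI provider actually exposes.

```python
import time

SCORING_PROMPT = """You are comparing a software test case to a requirement.
Ignore implementation details and focus on intent and core functionality.
Reply with only a number between 0 and 1 indicating how strongly the
test case validates the requirement.

Test case:
{test_case}

Requirement:
{requirement}
"""

def call_ai_api(prompt: str) -> str:
    """Placeholder: wire this up to your AI provider's client library."""
    raise NotImplementedError

def score_pair(test_case: str, requirement: str,
               max_retries: int = 3, base_delay: float = 2.0) -> float | None:
    """Score one test-case/requirement pair, retrying with exponential backoff."""
    prompt = SCORING_PROMPT.format(test_case=test_case, requirement=requirement)
    for attempt in range(max_retries):
        try:
            reply = call_ai_api(prompt)
            return max(0.0, min(1.0, float(reply.strip())))
        except Exception:
            # A real script would catch the client's specific errors here
            # (rate limits, timeouts, unparsable replies) and log them to CSV.
            time.sleep(base_delay * (2 ** attempt))
    return None  # Signals the caller to fall back to text similarity.
```

In my script, a `None` result like this is what triggers the text-similarity fallback, and every attempt is recorded in the CSV log so there is an audit trail of each decision.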
The Transformation: Speed and Accuracy
Before AI, manually mapping a single test case to its requirements could take 10 to 20 minutes, especially for complex examples. With this script, it’s down to seconds per test case. For a synthetic dataset of 1,000 test cases, what might have taken days now finishes in a few hours. The AI’s scoring also produces more consistent matches, catching connections I might have missed manually, like linking a test case about data input to a requirement about validation despite the different phrasing.
The script also handles edge cases better than manual efforts. It pre-filters unlikely matches to save API calls, and its fallback mechanism ensures every test case gets mapped, even if the API score is low. Plus, the logging gives me a clear audit trail of every decision, which is great for reviewing this hobby project.
Automating this mapping with AI felt like handing off a mountain of paperwork to a super-smart assistant. It’s not just faster—it’s smarter, catching nuances I’d miss after hours of staring at sample spreadsheets.
Challenges and Reflections
Building this script wasn’t without challenges. The API sometimes struggled with vague or overly technical descriptions in the synthetic data, so I had to refine the prompt to focus on functional intent. Rate limits were another hurdle, so I added batch processing and retries with exponential backoff to keep things running smoothly. Handling large mock datasets also required careful design, like the incremental saving sketched below and a watchdog timer that catches runs that hang.
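Tying it together, here is a rough sketch of the batch loop with incremental saving, reusing the `prefilter`, `jaccard`, and `score_pair` helpers from the earlier sketches. The column names (`title`, `description`, `requirement_id`), the batch size, and the output filename are assumptions for illustration.

```python
import pandas as pd

def map_all(test_cases: pd.DataFrame, requirements: list[dict],
            out_path: str = "mapped_test_cases.csv", batch_size: int = 25) -> None:
    """Map every test case to its best requirement, saving progress after each batch."""
    for start in range(0, len(test_cases), batch_size):
        for idx in test_cases.index[start:start + batch_size]:
            text = f"{test_cases.at[idx, 'title']} {test_cases.at[idx, 'description']}"
            candidates = prefilter(text, requirements)
            scored = [(score_pair(text, f"{r['summary']} {r['description']}"), r["id"])
                      for r in candidates]
            scored = [(s, rid) for s, rid in scored if s is not None]
            if scored:
                _, best_id = max(scored)  # highest AI score wins
            else:
                # Fallback: plain text similarity when the API gave no usable score.
                best_id = max(requirements,
                              key=lambda r: jaccard(text, r["description"]))["id"]
            test_cases.at[idx, "requirement_id"] = best_id
        # Incremental save: a crash or timeout loses at most one batch of work.
        test_cases.to_csv(out_path, index=False)
```

The real script layers the watchdog timer, token accounting, and cost estimates on top of a loop like this, but the batching and save-as-you-go pattern is what kept long runs from losing work.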
Looking back, the pre-AI era of manual mapping feels outdated for this kind of experiment. This script has saved me countless hours, improved the accuracy of my hypothetical testing process, and given me confidence that the test cases truly validate the mock requirements. It’s another step in my personal journey to harness AI for software exploration, and I’m excited to see how I can push this further—maybe by adding AI-driven test case generation next. Stay tuned for more updates on my hobby adventures with AI!