The RAG Challenge: A GitHub Copilot Experiment

Posted on Nov 21, 2024, 6 minute read

Tim Kitchens recently released an enlightening video comparing three coding assistants (Aider, Cursor, and Windsurf) at building a RAG (Retrieval-Augmented Generation) application without writing any code by hand: just one detailed prompt followed by simple follow-up prompts. Tim shares the prompt here; it uses a text version of LangChain’s RAG tutorial. However, one notable player was missing from this comparison: GitHub Copilot. As someone who’s been exploring various LLM tools, I couldn’t help but wonder how Copilot would fare in this challenge, especially with its recent improvements and the availability of Claude 3.5 Sonnet. So, I decided to embark on this journey myself, armed with GitHub Copilot and its newly released Edits functionality.

The Challenge

The task seemed straightforward: recreate Tim’s RAG application using GitHub Copilot. But as they say, the devil is in the details. The application needed to process documents, create embeddings, and provide a clean interface for querying - all while following specific constraints about package versions and project structure.
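To make those moving parts concrete, here is a toy sketch of the three jobs (splitting documents into chunks, embedding them, and retrieving the best matches for a query) in plain Python. It is not the LangChain code Copilot generated: the bag-of-words embed function stands in for a real embedding model, TinyVectorStore and build_prompt are hypothetical names used only for illustration, and the final LLM call is left out.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (assumes overlap < chunk_size)."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real app would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store: keeps each chunk next to its vector."""

    def __init__(self) -> None:
        self.chunks: list[str] = []
        self.vectors: list[Counter] = []

    def add_documents(self, docs: list[str]) -> None:
        for doc in docs:
            for chunk in chunk_text(doc):
                self.chunks.append(chunk)
                self.vectors.append(embed(chunk))

    def search(self, query: str, k: int = 2) -> list[str]:
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(range(len(self.chunks)),
                        key=lambda i: cosine(q, self.vectors[i]), reverse=True)
        return [self.chunks[i] for i in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Stuff the retrieved chunks into the prompt that would be sent to the LLM."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


store = TinyVectorStore()
store.add_documents([
    "Task decomposition breaks a complex task into smaller, simpler steps.",
    "Embeddings map text to vectors so that similar passages end up close together.",
])
question = "What does task decomposition do?"
top = store.search(question, k=1)
print(build_prompt(question, top))
```

Swapping embed for a real embedding model and TinyVectorStore for a vector database gives the general shape of the LangChain pipeline without changing the retrieve-then-prompt flow.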

First Attempts: When Things Go South

Remember when Thanos said “Reality is often disappointing”? Well, my first few attempts with Copilot felt exactly like that. Let me break down the journey:

Attempt #1: The UI Nightmare

My first attempt turned into what I’d call “The Great Virtual Environment Debacle.” As someone who isn’t primarily a Python developer, watching Copilot struggle with .venv setup and requirements.txt was like watching a cat try to swim - technically possible, but not pretty.

The problems started stacking up: virtual environment issues, dependency conflicts, and a seemingly endless stream of configuration tweaks. There came a point where my developer instincts screamed “Start over!” And like any good developer who knows when to cut their losses, I listened.

Attempts #2 & #3: The Directory Structure Saga

The next two attempts were what I’d call “The Tale of Two Directories” - or more accurately, too many directories. Copilot seemed determined to create a complex directory structure that would make even the most ardent microservices architect blush. By the third attempt, I realized I needed to be more explicit in my requirements for a flat structure.

The Fourth Time’s the Charm

Finally, on the fourth attempt, everything clicked. The key difference? It took just four messages, and only one of them overlapped with Tim’s original attempts: dealing with BeautifulSoup’s content extension permissions. What made this attempt successful was the combination of:

Using the #terminalLastCommand helper to give Copilot context about terminal operations
Leveraging GitHub Copilot Vision (through the ‘Vision for Copilot Preview’ extension): pasting a screenshot turned out to be the game-changer. Although this requires setting up your API keys, visual context puts Copilot on par with tools like Aider and Cursor when it comes to understanding errors shown in screenshots

The Power of Copilot Edits

Copilot Edits, while still in preview, showed promising capabilities. It’s like having a junior developer who’s eager to learn and quick to adapt. The integration with VSCode’s terminal, allowing direct command insertion, was particularly impressive. While it might not yet match Aider’s prowess in multi-file editing, it’s definitely getting there.

Tools and Models: A Level Playing Field

As in Tim’s original comparison, I used Claude 3.5 Sonnet as the underlying model for this experiment. This created a level playing field for comparing the tools themselves: Aider, Cursor, Windsurf, and in my case, GitHub Copilot. It’s interesting to note that while all these tools can leverage the same powerful model, their approaches to interfacing with it and handling multi-file projects differ significantly.

These differences in approach highlight both the strengths and the areas for improvement in each tool. While Aider excels in file management and Cursor shines in its visual feedback, Copilot brings its own advantages through tight VSCode integration and a familiar interface.

Looking Ahead

GitHub Copilot might still be playing catch-up in some areas, particularly multi-file editing compared to tools like Aider, but its recent improvements are promising. The latest VSCode release focused heavily on enhancing the Copilot experience, and Copilot Edits, though in its early stages, shows potential.

A Developer’s Guide to Copilot RAG Development

After playing around with both Copilot and Aider, I’ve discovered some tricks to make Copilot work more like its command-line cousin. Here’s how you can level up your Copilot game:

Setting Up Read-Only Context

One of Aider’s superpowers is its ability to handle read-only files in a session - something that developers often need but rarely get right. VSCode has recently caught up with this functionality¹, and here’s how you can leverage it:

{
    "github.copilot.chat.codeGeneration.instructions": [
        {
            // Add prompt file that contains RAG implementation details
            "file": ".prompts/rag_app_prompt.md"
        },
        {
            // Add reference documentation for context
            "file": "reference/langchainRag.txt"
        }
        // Instead of passing multiple files like this, we can also pass direct text:
        // {
        //     "text": "Follow these coding conventions: ..."
        // }
        // This is particularly useful when we want to provide read-only examples,
        // coding conventions, or other reference material like DB schemas
        // or API documentation
    ]
}

You can add this either to your global settings.json for all projects, or in .vscode/settings.json for project-specific context. I took Tim’s prompt from his video description and added LangChain’s RAG guide as reference material. This ensures Copilot always has this context available during our chats - like having a senior developer who’s always read the documentation.

Managing File Access with Copilot Edits

While Aider excels at managing file access during sessions, Copilot Edits has its own approach. You can explicitly control which files Copilot can modify - in my case, I limited it to .py files and requirements.txt. Everything else, including the crucial .env file, was off-limits. This is particularly important given Tim’s experience where Cursor accidentally exposed his OpenAI key - a reminder that even AI needs boundaries!
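A related habit limits the damage if an assistant ever touches the wrong file: keep the key only in .env and read it at runtime, so none of the .py files Copilot is allowed to edit ever contains it. Below is a minimal standard-library sketch, assuming simple KEY=VALUE lines and a variable named OPENAI_API_KEY; load_env_file and get_api_key are hypothetical helpers, and in practice python-dotenv’s load_dotenv() does the same job more robustly.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Hypothetical minimal .env reader (python-dotenv's load_dotenv() is the usual choice).

    Reads KEY=VALUE lines, skips blanks and '#' comments, strips matching quotes,
    and never overrides variables already set in the environment.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        os.environ.setdefault(key.strip(), value)


def get_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Fetch the key at runtime so it never appears in source files the assistant edits."""
    load_env_file()
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to .env or export it in your shell")
    return key
```

Application code then only ever calls get_api_key() at startup, and .env itself stays out of both the Copilot Edits working set and version control.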

The Magic of Minimal Prompting

Once you’ve set up the proper guardrails, the actual development becomes surprisingly straightforward. Here’s what worked for me:

Start with a simple “Create RAG app” prompt
Use the #terminalLastCommand helper to feed error messages back to Copilot, saying nothing more than “fix it”
Make use of visual references: while Windsurf currently can’t process images, both Aider and Cursor handle them well. Thanks to the ‘Vision for Copilot Preview’ extension, Copilot has joined this club. After installing the extension and configuring your API keys, you can share screenshots of errors or expected outputs, making debugging sessions much more intuitive.

A funny moment came during the second iteration when the app couldn’t parse HTML pages. Instead of diving into complex debugging, I just pointed out the obvious: “It’s impossible that three different sites can’t be parsed with this logic.” Sometimes, a little common sense goes a long way in AI-assisted development!
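One plausible cause of this kind of failure (an assumption, not a diagnosis from this session) is extraction logic that keeps only elements with site-specific CSS classes. LangChain’s RAG tutorial, for example, filters its example blog page with a bs4.SoupStrainer on the post-content, post-title, and post-header classes, which other sites are unlikely to use. The sketch below uses Python’s built-in html.parser rather than BeautifulSoup: it collects text inside elements carrying the target classes and falls back to all visible text when none are found. TextExtractor and extract_text are hypothetical names.

```python
from html.parser import HTMLParser

# Classes LangChain's tutorial keeps for its example blog; other sites rarely use them.
TARGET_CLASSES = frozenset({"post-content", "post-title", "post-header"})
SKIP_TAGS = frozenset({"script", "style"})
VOID_TAGS = frozenset({"area", "base", "br", "col", "embed", "hr", "img", "input",
                       "link", "meta", "source", "track", "wbr"})


class TextExtractor(HTMLParser):
    """Collects visible text, remembering which pieces sat inside target-class elements."""

    def __init__(self, target_classes: frozenset[str]) -> None:
        super().__init__()
        self.target_classes = target_classes
        self.stack: list[tuple[str, bool]] = []  # (tag, has_target_class) for open elements
        self.targeted: list[str] = []
        self.everything: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get an end tag, so don't track them
        classes = set((dict(attrs).get("class") or "").split())
        self.stack.append((tag, bool(classes & self.target_classes)))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerates unclosed children like <p>.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        text = data.strip()
        if not text or any(tag in SKIP_TAGS for tag, _ in self.stack):
            return
        self.everything.append(text)
        if any(is_target for _, is_target in self.stack):
            self.targeted.append(text)


def extract_text(html: str, target_classes: frozenset[str] = TARGET_CLASSES) -> str:
    parser = TextExtractor(target_classes)
    parser.feed(html)
    parser.close()
    # Fall back to all visible text when the page has none of the expected classes.
    return "\n".join(parser.targeted or parser.everything)
```

On a page that uses the expected classes, only that content is kept; on any other site the function degrades to the full visible text instead of returning nothing.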

Lessons Learned

Tool Interface Matters: How each tool interfaces with the model can significantly impact the development experience
Visual Context Helps: Copilot Vision can significantly improve understanding and solution generation
Keep It Simple: Sometimes, a flat structure is better than an over-engineered solution
Iteration is Key: Don’t be afraid to start over if things aren’t working

Final Thoughts

While GitHub Copilot might not yet be the one-stop solution for complex multi-file projects, it’s evolving rapidly. The combination of Copilot Edits, Vision, and integration with powerful models like Sonnet makes it a formidable tool in the AI-assisted development landscape. We’re witnessing the evolution of coding assistants, each finding their unique strengths and use cases.

May the AI Force be with you…

¹ There is a good guide about providing convention files here.