A Function Index to Improve Gemini CLI's Code Context

July 15, 2025 • Wietse Venema

Gemini CLI explores a new codebase much like you would. It starts with a list of all files, then reads files and searches for keywords to learn more. The file tree acts as a basic map, guiding Gemini CLI as it navigates the project. I explored whether it would help to also provide an outline of all functions and classes in the codebase.

From File Tree to a List of Functions and Classes

I've detailed in a previous post how Gemini CLI starts with a recursive file listing and uses tools to explore the codebase. That file tree already tells you a lot about the codebase, but it has its limitations.

For example, a file listing might show that src/core/engine.py exists, but it offers no clue as to what it does. Is it a game engine, a search engine, or a risk engine?

Add a list of classes and functions, and the picture becomes much clearer. You might discover that engine.py is a recommendation engine for an online store:

--- src/core/engine.py ---
  Classes:
    - RecommendationEngine
  Functions:
    - load_user_purchase_history
    - find_similar_products_by_embedding
    - generate_homepage_recommendations
    - track_recommendation_click

While this example may seem contrived, the core point holds even for descriptively named files: the function names almost always carry more semantic nuance than the file name. Does user.py contain fetch_user_profile and get_account_balance, or rather batch_process_transactions and stream_realtime_data?

Building an Indexer with Pygments

I wrote a small, self-contained Python script that generates a function and class map. I relied on the Pygments library, an established syntax highlighter that supports hundreds of languages.

At its core, Pygments is a lexer: it scans your code and breaks it down into individual tokens, each tagged with a token type such as Token.Keyword or Token.Name.Function. Because it's built for syntax highlighting, I'm assuming it's designed to be very forgiving and can usually figure out what's what, even if the code isn't perfectly valid.
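To see what the lexer actually produces, here's a small sketch that runs Pygments' PythonLexer over a made-up snippet and lists each token type next to its value. Whitespace-only tokens are dropped to keep the list short. The sample code and the show_tokens helper are illustrative, not part of the original script. The class and method names come out tagged as Token.Name.Class and Token.Name.Function, which are the token types the indexer looks for.

```python
from pygments.lexers import PythonLexer
from pygments.token import Token

# Made-up sample code; any Python source works here
SAMPLE = '''
class RecommendationEngine:
    def track_recommendation_click(self, product_id):
        return product_id
'''


def show_tokens(code):
    """Return (token type, value) pairs, skipping whitespace-only tokens."""
    return [
        (ttype, value)
        for ttype, value in PythonLexer().get_tokens(code)
        if value.strip()
    ]


if __name__ == "__main__":
    for ttype, value in show_tokens(SAMPLE):
        # str(Token.Name.Class) renders as "Token.Name.Class"
        print(f"{str(ttype):28} {value!r}")
```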

Here's what the script does in broad strokes:

# Main script execution
def main(directory_to_scan):
    # 1. Find all relevant source code files
    source_files = find_source_files(directory_to_scan)
    
    # 2. Analyze each file to find identifiers
    file_map = analyze_files(source_files)

    # 3. Print the results in a structured format
    print_formatted_map(file_map)
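The main() overview leans on helpers that aren't shown. As a hypothetical sketch of steps 1 and 3, the functions below walk a directory while pruning hidden and vendored folders, then print a map in the same layout as the engine.py example earlier. The extension list, the skip list, and the {path: {"Classes": [...], "Functions": [...]}} shape of file_map are my assumptions, not necessarily what the complete script does.

```python
import os
from pathlib import Path

# Assumption: illustrative choices, not taken from the original script
SOURCE_EXTENSIONS = {".py", ".ts", ".tsx", ".js", ".go", ".java"}
SKIP_DIRS = {"node_modules", "dist", "build", "__pycache__"}


def find_source_files(directory_to_scan):
    """Walk the tree and return sorted paths of likely source files."""
    found = []
    for root, dirs, files in os.walk(directory_to_scan):
        # Prune hidden and vendored directories in place so os.walk skips them
        dirs[:] = [d for d in dirs if not d.startswith(".") and d not in SKIP_DIRS]
        for name in files:
            if Path(name).suffix in SOURCE_EXTENSIONS:
                found.append(os.path.join(root, name))
    return sorted(found)


def format_map(file_map):
    """Render {path: {"Classes": [...], "Functions": [...]}} as an outline."""
    lines = []
    for path in sorted(file_map):
        lines.append(f"--- {path} ---")
        for kind in ("Classes", "Functions"):
            names = file_map[path].get(kind, [])
            if names:
                lines.append(f"  {kind}:")
                lines.extend(f"    - {name}" for name in names)
    return "\n".join(lines)


def print_formatted_map(file_map):
    print(format_map(file_map))
```

Keeping the formatting in a separate function that returns a string makes the output easy to test, or to write to a file instead of stdout.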

The code for processing an individual file looks similar to the following:

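from pathlib import Path

from pygments.lexers import guess_lexer_for_filename
from pygments.token import Token

# Function to analyze a single file
def analyze_file(file_path):
    code = Path(file_path).read_text(encoding="utf-8", errors="replace")
    # Pygments picks a lexer from the file name, using the content to break ties
    lexer = guess_lexer_for_filename(file_path, code)
    # Drop whitespace-only tokens so tokens[i - 1] is the real preceding token
    tokens = [(t, v) for t, v in lexer.get_tokens(code) if v.strip()]

    identifiers = set()
    for i, (ttype, tvalue) in enumerate(tokens):
        # Check if the token is a standard Function or Class name
        if ttype in (Token.Name.Function, Token.Name.Class):
            identifiers.add(tvalue)

        # Handle TypeScript, where `function`, `class` and `interface` are
        # declaration keywords followed by a Token.Name.Other name token
        elif ttype is Token.Name.Other and i > 0:
            prev_type, prev_value = tokens[i - 1]
            if prev_type is Token.Keyword.Declaration and prev_value in (
                "function", "class", "interface"
            ):
                identifiers.add(tvalue)

    return sorted(identifiers)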

Instead of these incomplete snippets, you might want to read the complete script.
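If you'd rather experiment in a Python session, here's a hypothetical, self-contained variant of analyze_file that sorts names into Classes and Functions, matching the layout of the engine.py example. It takes a Pygments language alias and a string rather than a file path, so it's easy to try on small snippets. The classify_identifiers name and the choice to file TypeScript interfaces under Classes are my assumptions.

```python
from pygments.lexers import get_lexer_by_name
from pygments.token import Token


def classify_identifiers(code, language):
    """Split declared names into classes and functions (hypothetical helper).

    The Python lexer tags names as Name.Class / Name.Function. The TypeScript
    lexer tags them as Name.Other, so we look at the preceding keyword.
    """
    lexer = get_lexer_by_name(language)
    # Skip whitespace-only tokens: in TypeScript the token right before a
    # name would otherwise be the space after `function`, not the keyword
    tokens = [(t, v) for t, v in lexer.get_tokens(code) if v.strip()]
    classes, functions = set(), set()
    for i, (ttype, value) in enumerate(tokens):
        if ttype is Token.Name.Class:
            classes.add(value)
        elif ttype is Token.Name.Function:
            functions.add(value)
        elif ttype is Token.Name.Other and i > 0:
            prev_type, prev_value = tokens[i - 1]
            if prev_type is Token.Keyword.Declaration:
                if prev_value == "function":
                    functions.add(value)
                elif prev_value in ("class", "interface"):
                    classes.add(value)
    return {"Classes": sorted(classes), "Functions": sorted(functions)}
```

For example, classify_identifiers(source, "typescript") returns a dict you could drop straight into a file map keyed by path.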

I tested the script with a Python codebase (google/adk-python) and a TypeScript codebase (google-gemini/gemini-cli), but it should work for codebases in many other languages. I should note that I already had to add some specific handling for TypeScript, because its lexer doesn't use Token.Name.Function and Token.Name.Class the way the Python lexer does. I expect other languages will need similar tweaks.
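When adapting the indexer to another language, it helps to see how that language's lexer labels the name after each keyword. The hypothetical keyword_name_pairs helper below lists every keyword token that is immediately followed by a name token, together with the name's token type. If declared names come back as Token.Name.Function or Token.Name.Class, the indexer should work as-is; if they come back as something generic like Token.Name.Other, as with TypeScript, it needs a keyword lookback.

```python
from pygments.lexers import get_lexer_by_name
from pygments.token import Token


def keyword_name_pairs(code, language):
    """Return (keyword, name token type, name) triples for a code snippet.

    A hypothetical diagnostic: it shows how a Pygments lexer labels the
    identifier that directly follows each keyword.
    """
    lexer = get_lexer_by_name(language)
    # Ignore whitespace so "keyword" and "name" end up adjacent
    tokens = [(t, v) for t, v in lexer.get_tokens(code) if v.strip()]
    pairs = []
    for (prev_type, prev_value), (ttype, value) in zip(tokens, tokens[1:]):
        # `in` on token types checks for subtypes, e.g. Keyword.Declaration
        if prev_type in Token.Keyword and ttype in Token.Name:
            pairs.append((prev_value, ttype, value))
    return pairs
```

Pass any alias Pygments knows, such as "go" or "rust", to check a language before relying on the indexer's output for it.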

How to Use the Script with Gemini CLI

I saved the complete script as indexer.py in a ~/tools directory that is on my $PATH. This means I can run the script from any directory.
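To run a script straight from a directory on $PATH, it needs a shebang line and the executable bit (chmod +x ~/tools/indexer.py), and it's convenient if it defaults to indexing the current working directory. Here's a hypothetical entry point along those lines using argparse; the argument name is my own choice, and the call into main() is left as a comment.

```python
#!/usr/bin/env python3
"""Hypothetical command-line entry point for indexer.py."""
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Print an index of functions and classes."
    )
    # Optional positional argument: defaults to the directory the command is
    # run from, so running `indexer.py` alone indexes the current project
    parser.add_argument("directory", nargs="?", default=os.getcwd())
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    # In the full script, this is where main(args.directory) would run
    print(f"Indexing {os.path.abspath(args.directory)}")
```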

In Gemini CLI, I switch to shell mode by typing ! on an empty prompt and then run the script. When it finishes, I press the ESC key to return to prompt mode. The output of the script is now part of the conversation history, so Gemini CLI uses it as context for the rest of the session.

The Impact: Does It Actually Work?

While I don't have a formal benchmark, I do have a few observations to share. When starting a session with the list of functions and classes, Gemini CLI:

Needs fewer searches and locates relevant files more quickly, especially for questions where it's hard to come up with a good search term.
Seems less likely to reimplement logic that already exists elsewhere in the codebase.

I also found a drawback. When Gemini CLI explores with its search tool, it sometimes makes serendipitous discoveries; with fewer searches, I'm more likely to miss out on those.

When Search Alone Isn't Enough

You might argue that Gemini CLI doesn't need a better map to begin with, because it can use its search tool. For example, suppose you ask it the following question:

Can you tell me how the memory feature is implemented?

If there are files in the file listing with 'memory' or a related term in the filename, it'll try to read them. If they're not there, it'll do a text search for the term 'memory' and find the relevant files anyway.

However, for broad, exploratory queries, search is less effective. Let's say you start a new session and ask:

Can you describe what this codebase does?

Now there are no obvious keywords to search for. Gemini CLI generates a response based on the file and directory structure and any GEMINI.md files, without reading any additional files.

What About the `--all-files` Flag?

Another way to provide more context is to use the --all-files flag when starting Gemini CLI. This reads the entire content of every file into the initial context. I cover this in more detail in my previous post. While powerful, this approach can be less efficient for large projects and may introduce noise.
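To get a feel for this trade-off on your own project, you can compare rough context sizes. This hypothetical sketch counts characters for a recursive file listing, for an index you've already generated, and for the full contents of every file. Characters are only a crude stand-in for tokens, and the sketch skips hidden directories but applies no other ignore rules, so treat the numbers as ballpark figures.

```python
import os


def context_sizes(directory, index_text):
    """Rough character counts for three ways of giving an agent context."""
    paths = []
    for root, dirs, files in os.walk(directory):
        # Skip hidden directories such as .git
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        paths.extend(os.path.join(root, name) for name in files)

    # 1. A plain recursive file listing, one relative path per line
    listing = "\n".join(os.path.relpath(p, directory) for p in sorted(paths))

    # 3. The full contents of every file (decoding errors are replaced)
    all_files = 0
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            all_files += len(fh.read())

    # 2. The function and class index sits in between
    return {"file_listing": len(listing), "index": len(index_text), "all_files": all_files}
```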

Conclusion

A function and class index provides a middle ground between the built-in recursive file listing and ingesting the entire codebase. In my exploratory testing, it seems to lead to fewer searches and increased codebase awareness when editing.

* * *
