A Function Index to Improve Gemini CLI's Code Context
Gemini CLI explores a new codebase much like you would. It starts with a list of all files, then reads files and searches for keywords to learn more. The file tree acts as a basic map, guiding Gemini CLI as it navigates the project. I explored whether it would help to also provide an outline of all functions and classes in the codebase.

From File Tree to a List of Functions and Classes
I've detailed in a previous post how Gemini CLI starts with a recursive file listing and uses tools to explore the codebase. That file tree already tells a lot about the codebase, but it has its limitations.
For example, a file listing might show that src/core/engine.py exists, but it offers no clue as to what it does. Is it a game engine, a search engine, or a risk engine?
Once you add a list of classes and functions, it paints a much clearer picture.
You might discover that engine.py is a recommendation engine for an online store:
--- src/core/engine.py ---
Classes:
- RecommendationEngine
Functions:
- load_user_purchase_history
- find_similar_products_by_embedding
- generate_homepage_recommendations
- track_recommendation_click
While this example may seem contrived, the core point holds true even for descriptively-named files. There's always more semantic nuance in the function names. Does the user.py file have fetch_user_profile and get_account_balance, or rather batch_process_transactions and stream_realtime_data?
Building an Indexer with Pygments
I wrote a small, self-contained Python script that generates a function and class map. I relied on the Pygments library, an established syntax highlighter that supports hundreds of languages.
Pygments is built around lexers: a lexer scans your code and breaks it down into individual tokens, each tagged with a type such as keyword, string, or function name. Because it's used for syntax highlighting, I'm assuming it's built to be very forgiving and can usually figure out what's what, even if the code isn't perfectly valid.
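To make the tokenization concrete, here is a minimal sketch that runs Pygments' PythonLexer over a small hard-coded sample and keeps only the tokens tagged as function or class names. The sample code and the definition_names helper are made up for illustration; they are not taken from the actual script.

```python
from pygments.lexers import PythonLexer
from pygments.token import Token

# Hypothetical sample input, loosely based on the engine.py example above.
SAMPLE = '''
class RecommendationEngine:
    def track_recommendation_click(self, product_id):
        pass

def load_user_purchase_history(user_id):
    return []
'''


def definition_names(code):
    """Return the names the Python lexer tags as function or class definitions."""
    names = []
    for ttype, value in PythonLexer().get_tokens(code):
        # Exact match on the two token types; subtypes are not included.
        if ttype in (Token.Name.Function, Token.Name.Class):
            names.append(value)
    return names


if __name__ == "__main__":
    for name in definition_names(SAMPLE):
        print(name)
```

Note that the lexer never parses the code into a syntax tree. It only labels the name that follows def or class, which is all the index needs.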
Here's what the script does in broad strokes:
# Main script execution
def main(directory_to_scan):
    # 1. Find all relevant source code files
    source_files = find_source_files(directory_to_scan)
    # 2. Analyze each file to find identifiers
    file_map = analyze_files(source_files)
    # 3. Print the results in a structured format
    print_formatted_map(file_map)
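Steps 1 and 3 are plain file handling and string formatting. The following is a rough sketch of how they could look, not the code from the actual script: the extension list, the skipped directory names, the format_map helper, and the shape of file_map (a dict with "classes" and "functions" lists per path) are all assumptions made for this example.

```python
import os
from pathlib import Path

# Hypothetical defaults; the actual script may use different lists.
SOURCE_EXTENSIONS = {".py", ".ts", ".tsx", ".js"}
SKIPPED_DIRS = {".git", "node_modules", "__pycache__", "dist"}


def find_source_files(root, extensions=SOURCE_EXTENSIONS):
    """Return sorted paths of source files below root, skipping noisy directories."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIPPED_DIRS]
        for name in filenames:
            if Path(name).suffix in extensions:
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def format_map(file_map):
    """Render {path: {"classes": [...], "functions": [...]}} as plain text."""
    lines = []
    for path in sorted(file_map):
        entry = file_map[path]
        lines.append(f"--- {path} ---")
        for label, key in (("Classes", "classes"), ("Functions", "functions")):
            names = entry.get(key, [])
            if names:
                lines.append(f"{label}:")
                lines.extend(f"- {name}" for name in names)
    return "\n".join(lines)


def print_formatted_map(file_map):
    """Print the map in the same layout as the engine.py example."""
    print(format_map(file_map))
```

Skipping directories like node_modules matters more than it might seem: without it, the index would mostly list third-party code and drown out the project's own functions.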
The code for processing an individual file looks similar to the following:
# Function to analyze a single file
from pygments.lexers import guess_lexer_for_filename
from pygments.token import Token

def analyze_file(file_path):
    code = read_file_content(file_path)
    # Pygments needs the file content as well as the name to pick a lexer
    lexer = guess_lexer_for_filename(file_path, code)
    # Drop whitespace-only tokens so tokens[i-1] is the previous meaningful token
    tokens = [t for t in lexer.get_tokens(code) if t[1].strip()]
    identifiers = set()
    for i, (ttype, tvalue) in enumerate(tokens):
        # Check if the token is a standard Function or Class name
        if ttype in (Token.Name.Function, Token.Name.Class):
            identifiers.add(tvalue)
        # Handle TypeScript where `function` is a keyword
        # followed by the function name token
        elif ttype is Token.Name.Other:
            if i > 0 and tokens[i-1][0] is Token.Keyword.Declaration:
                if tokens[i-1][1] in ("function", "class", "interface"):
                    identifiers.add(tvalue)
    return list(identifiers)
Instead of these incomplete snippets, you might want to read the complete script.
I tested the script with a Python codebase (google/adk-python) and a TypeScript codebase (google-gemini/gemini-cli), but it should work for codebases in many other languages. I should note that I already had to add some specific handling for TypeScript, because its lexer didn't use the Token.Name.Function and Token.Name.Class token types in the same way as the Python lexer did. I expect similar tweaks will be necessary for other languages.
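When adding support for another language, it helps to first look at which token types its lexer actually emits. Below is a small sketch of a helper for that. The dump_tokens name and the sample snippet are invented for this example, and the token types you see may vary between Pygments versions.

```python
from pygments.lexers import get_lexer_by_name


def dump_tokens(code, language):
    """Return (token_type, text) pairs for code, skipping whitespace-only tokens."""
    lexer = get_lexer_by_name(language)
    return [
        (str(ttype), value)
        for ttype, value in lexer.get_tokens(code)
        if value.strip()
    ]


if __name__ == "__main__":
    # Print one token per line to see how a declaration is labeled.
    for ttype, value in dump_tokens("function greet(name: string) {}", "typescript"):
        print(f"{ttype:30} {value!r}")
```

Running this on a one-line declaration in the target language shows whether the name arrives as Token.Name.Function, Token.Name.Class, or something more generic that needs a look at the preceding keyword.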
How to Use the Script with Gemini CLI
I saved the complete script as indexer.py in a ~/tools directory that is on my $PATH. This means I can run the script from any directory.
In Gemini CLI, I switch to shell mode by typing ! on an empty prompt and then run the script. When it finishes, I press the ESC key to return to prompt mode. The output of the script is now part of the conversation history, so Gemini CLI uses it as context for the rest of the session.
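For a script to be runnable by name from any directory on a Unix-like system, it needs a shebang line and the executable bit (chmod +x), in addition to living on the $PATH. Here is a sketch of what such an entry point could look like. The optional directory argument and its default of the current directory are my assumptions for this example, not a description of the actual script's interface.

```python
#!/usr/bin/env python3
import argparse
import os


def parse_args(argv=None):
    """Parse the command line; the directory argument is a hypothetical interface."""
    parser = argparse.ArgumentParser(
        description="Print an outline of functions and classes in a codebase."
    )
    parser.add_argument(
        "directory",
        nargs="?",
        default=".",
        help="directory to scan (defaults to the current directory)",
    )
    args = parser.parse_args(argv)
    # Resolve to an absolute path so the output does not depend on the caller's cwd.
    args.directory = os.path.abspath(args.directory)
    return args


if __name__ == "__main__":
    args = parse_args()
    # Placeholder: the real script would call main(args.directory) here.
    print(f"Would scan: {args.directory}")
```

Defaulting to the current directory fits the shell-mode workflow, since Gemini CLI is typically started from the project root.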
The Impact: Does It Actually Work?
While I don't have a formal benchmark, I do have a few observations to share. When starting a session with the list of functions and classes, Gemini CLI:
- Needs fewer searches and locates relevant files more quickly, especially for questions where it's hard to come up with a good search term.
- Seems less likely to reimplement logic that already exists elsewhere in the codebase.
I also found a drawback: when Gemini CLI uses its search tool to explore, it sometimes leads to serendipitous discoveries. With fewer searches, I'm more likely to miss out on those.
When Search Alone Isn't Enough
You might argue that Gemini CLI doesn't need a better map to begin with, because it can use its search tool. For example, suppose you ask it the following question:
Can you tell me how the memory feature is implemented?
If there are files in the file listing with 'memory' or a related term in the filename, it'll try to read them. If they're not there, it'll do a text search for the term 'memory' and find the relevant files anyway.
However, for broad, exploratory queries, search is less effective. Let's say you start a new session and ask:
Can you describe what this codebase does?
Now, there are no obvious keywords to search for. Gemini CLI generates a response based on the file and directory structure and any GEMINI.md files, without reading additional files.

What About the --all-files Flag?
Another way to provide more context is to use the --all-files flag when starting Gemini CLI. This reads the entire content of every file into the initial context. I cover this in more detail in my previous post. While powerful, this approach can be less efficient for large projects and may introduce noise.
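To get a feel for the difference in size, you could compare the total number of characters in the source files with the length of the generated index. The sketch below does that. Both helper names are hypothetical, and character counts are only a rough proxy for how much context the model actually consumes.

```python
def total_chars(paths):
    """Sum the character counts of the given files, ignoring unreadable ones."""
    total = 0
    for path in paths:
        try:
            with open(path, encoding="utf-8", errors="replace") as handle:
                total += len(handle.read())
        except OSError:
            continue
    return total


def compare_context_sizes(paths, index_text):
    """Return (all_files_chars, index_chars, ratio) for a rough comparison."""
    full = total_chars(paths)
    index = len(index_text)
    ratio = full / index if index else float("inf")
    return full, index, ratio
```

A high ratio suggests the index conveys the shape of the codebase at a fraction of the cost, while a ratio close to 1 suggests the project is small enough that reading every file may be just as practical.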
Conclusion
A function and class index provides a middle ground between the built-in recursive file listing and ingesting the entire codebase. In my exploratory testing, it seems to lead to fewer searches and increased codebase awareness when editing.