Claude plays GeoGuessr
Jerry Wei
January 6, 2025
Constantly seeking out new ways to evaluate and benchmark our AI models helps us understand their current capabilities and drives future development. In this spirit, I recently examined a fun task: How well can Claude play the popular web game GeoGuessr?
What is GeoGuessr?
For those who are unfamiliar, GeoGuessr is an online game that challenges players to guess the location of a randomly selected Google Street View image. Players can pan the camera and navigate along roads to gather more context before placing their guess on a world map. The closer the guess is to the actual location, the higher the score.
Traditionally, GeoGuessr is a test of geographical knowledge, visual perception, and deductive reasoning. But could an AI excel at this task? I decided to find out.
Experimental setup
Preface: the code for these experiments is publicly available on my GitHub.
While the full GeoGuessr game allows for camera movement and map-based guessing, I simplified the task for my experiments. I presented Claude with a single static Google Street View image and asked it to directly output its guess as a latitude–longitude coordinate pair. I used imagery from the OpenStreetView-5M dataset, which contains 5.1 million images spanning 225 countries; each image is annotated with latitude–longitude coordinates representing its geolocation. For these experiments, I used the test set containing 210,122 image–coordinate pairs.
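As a concrete (and hypothetical) sketch of the evaluation inputs, suppose the test images live in an images/ directory alongside a test.csv metadata file with id, latitude, and longitude columns; this layout is an assumption for illustration, not the dataset's actual release format:

```python
import csv
from pathlib import Path
from typing import Iterator

def load_test_pairs(data_dir: str) -> Iterator[tuple[Path, tuple[float, float]]]:
    """Yield (image_path, (latitude, longitude)) pairs for the test split."""
    root = Path(data_dir)
    # Assumed layout: data_dir/test.csv with id/latitude/longitude columns,
    # and data_dir/images/<id>.jpg for each row.
    with open(root / "test.csv", newline="") as f:
        for row in csv.DictReader(f):
            image_path = root / "images" / f"{row['id']}.jpg"
            yield image_path, (float(row["latitude"]), float(row["longitude"]))
```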
I briefly iterated on prompts in the Anthropic workbench and ended up with the following prompt, which instructs Claude to first analyze the image and then use chain-of-thought reasoning to arrive at a set of predicted coordinates for the image.
The assistant is playing GeoGuessr, a game where an image of a random Google Street View location is shown and the player has to guess the location of the image on a world map.
In the following conversation, the assistant will be shown a single image and must make its best guess of the location of the image by providing a latitude and longitude coordinate pair.
Human: [INSERT IMAGE] Here is an image from the Geoguessr game.
* Please reason about where you think this image is in <thinking> tags.
* Next, provide your final answer of your predicted latitude and longitude coordinates in <latitude> and <longitude> tags.
* The latitude and longitude coordinates that you give me should just be the `float` numbers; do not provide anything else.
* You will NOT be penalized on the length of your reasoning, so feel free to think as long as you want.
Assistant: Certainly! I'll analyze the image in <thinking> tags and then provide my reasoning and final estimate of the latitude and longitude.
<thinking>
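For reference, here is a minimal sketch of how a single query might look with the Anthropic Python SDK, including the prefilled assistant turn from the prompt above. The model name and the SYSTEM_PROMPT/USER_PROMPT variables are illustrative assumptions, not the exact code from my repository:

```python
import base64
import re

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def guess_location(image_path: str) -> tuple[float, float]:
    """Send one Street View image to Claude and parse its (lat, lon) guess."""
    with open(image_path, "rb") as f:
        image_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # illustrative; swap in the model under test
        max_tokens=2048,
        system=SYSTEM_PROMPT,  # the "The assistant is playing GeoGuessr..." preamble
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "image",
                     "source": {"type": "base64",
                                "media_type": "image/jpeg",
                                "data": image_b64}},
                    {"type": "text", "text": USER_PROMPT},  # the bulleted instructions
                ],
            },
            # Prefilling the assistant turn makes the reply continue inside <thinking>.
            {"role": "assistant",
             "content": "Certainly! I'll analyze the image in <thinking> tags and then "
                        "provide my reasoning and final estimate of the latitude and "
                        "longitude.\n\n<thinking>"},
        ],
    )
    text = response.content[0].text
    lat = float(re.search(r"<latitude>\s*(-?\d+\.?\d*)\s*</latitude>", text).group(1))
    lon = float(re.search(r"<longitude>\s*(-?\d+\.?\d*)\s*</longitude>", text).group(1))
    return lat, lon
```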
Scoring metric
To evaluate Claude's performance, I used GeoGuessr's in-game scoring function:
score = 5000 * exp(-distance / 1492.7)
Here, distance is the Haversine (great-circle) distance, in kilometers, between Claude's guessed coordinates and the ground-truth location. The exponential decay makes the score most sensitive to errors at close range and more forgiving at large scales: guessing 250 km away from the correct answer instead of 200 km away costs far more points than guessing 1,050 km away instead of 1,000 km away.
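In code, the metric looks something like the sketch below: my own re-implementation of the formula above, paired with a standard Haversine helper (the function names are mine, not the repository's):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (latitude, longitude) points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def geoguessr_score(guess: tuple[float, float], truth: tuple[float, float]) -> float:
    """GeoGuessr-style score: 5000 for a perfect guess, decaying exponentially."""
    distance = haversine_km(*guess, *truth)
    return 5000 * math.exp(-distance / 1492.7)
```

Plugging the example distances into the formula: a 200 km miss scores about 4,373 points versus about 4,229 at 250 km (a 144-point drop), while a 1,000 km miss scores about 2,559 versus about 2,475 at 1,050 km (only an 84-point drop).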
Models used
I used the following models from the Claude-Haiku* and Claude-Sonnet families. I didn't test Claude-Opus models because of their high inference cost.
* I used an internal version of Claude-3.5-Haiku with multimodality enabled; this model was not yet available to the public as of this post's publication.