Using GPT-3 to Write Captions Based on Image Keywords, People, Albums, and Locations

Have you ever wanted to caption your photos automatically? With the GPT-3 Davinci model from OpenAI, you can do just that! Given a photo's keywords, people, location, and album name, the model generates captions that are not only descriptive, but also entertaining (and frequently hilariously wrong).

In this post, I’ll explore the capabilities of GPT-3 for writing captions based on image data, and how it can add a new dimension to your photos.

Extracting Image Keywords and Locations

Before we dive into using GPT-3 to write captions, let's first look at why image keywords and location data matter. They describe what is in a photo and where it was taken, which is exactly the context a caption needs to tell the photo's story. Apple Photos extracts this data automatically and stores it in a SQLite database on your device.
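To make the idea concrete, here is a minimal sketch in Python of pulling that metadata out of SQLite. The table and column names (`photos`, `photo_keywords`, `photo_people`) are hypothetical and much simpler than the real Apple Photos schema, and the sample row is made up for illustration, so treat this as the shape of the step rather than a working query against your own library.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with a simplified, hypothetical schema.

    This is NOT the Apple Photos schema; it only mimics the kind of data
    (album, location, keywords, people) that the post describes.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE photos (id INTEGER PRIMARY KEY, album TEXT, location TEXT);
        CREATE TABLE photo_keywords (photo_id INTEGER, keyword TEXT);
        CREATE TABLE photo_people (photo_id INTEGER, name TEXT);
        INSERT INTO photos VALUES (1, 'Coachella 2010', 'Indio, California');
        INSERT INTO photo_keywords VALUES (1, 'Music'), (1, 'Hat'), (1, 'Guitar');
        INSERT INTO photo_people VALUES (1, 'Les Claypool');
        """
    )
    return conn


def load_photo_metadata(conn: sqlite3.Connection, photo_id: int) -> dict:
    """Collect album, location, keywords, and people for one photo."""
    row = conn.execute(
        "SELECT album, location FROM photos WHERE id = ?", (photo_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no photo with id {photo_id}")
    album, location = row

    # Sort the lists so the result is stable from run to run.
    keywords = [
        r[0]
        for r in conn.execute(
            "SELECT keyword FROM photo_keywords WHERE photo_id = ? ORDER BY keyword",
            (photo_id,),
        )
    ]
    people = [
        r[0]
        for r in conn.execute(
            "SELECT name FROM photo_people WHERE photo_id = ? ORDER BY name",
            (photo_id,),
        )
    ]
    return {
        "album": album,
        "location": location,
        "keywords": keywords,
        "people": people,
    }
```

Calling `load_photo_metadata(make_demo_db(), 1)` returns one dictionary per photo, which is a convenient unit to hand to the caption step.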

Using GPT-3 for Caption Generation

GPT-3 never sees the image itself. It is a text-only model, so it works from the metadata describing the photo and guesses at what a good caption would be.

GPT-3 is built on a neural network architecture called the transformer, which it uses to generate human-like text. When provided with a prompt containing the image's keywords, people, album name, and location, GPT-3 can generate a descriptive and fitting caption for the photo.
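As a rough illustration of what such a prompt might look like, the sketch below assembles one from the metadata fields. The function name `build_caption_prompt`, the instruction wording, and the field labels are all assumptions made for this example, not the prompt actually used for this post.

```python
from typing import Optional, Sequence


def build_caption_prompt(
    keywords: Sequence[str],
    people: Sequence[str],
    album: Optional[str] = None,
    location: Optional[str] = None,
) -> str:
    """Assemble a plain-text prompt from photo metadata.

    Hypothetical helper: the wording and labels are illustrative only.
    Empty fields are skipped so the model is not handed blank labels.
    """
    lines = [
        "Write a short title and a one-paragraph caption for a photo.",
        "Use only the details below.",
        "",
    ]
    if people:
        lines.append("People: " + ", ".join(people))
    if album:
        lines.append("Album: " + album)
    if location:
        lines.append("Location: " + location)
    if keywords:
        lines.append("Keywords: " + ", ".join(keywords))

    # End on an open "Title:" so a completion model continues from there.
    lines.append("")
    lines.append("Title:")
    return "\n".join(lines)


# Illustrative values only, loosely modeled on the example in this post.
example_prompt = build_caption_prompt(
    keywords=["Music", "Hat", "Guitar"],
    people=["Les Claypool"],
    album="Coachella 2010",
    location="Indio, California",
)
```

The resulting string is what would be sent to the model as the prompt. The exact API call depends on which version of the OpenAI client library you use, so it is left out here.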

For example, for a photo of Les Claypool playing Coachella, GPT-3 generated the following title:

"Rocking Out at Coachella"

It then wrote this caption from the keywords, album, location, and person info:

Les Claypool performs at Coachella on April 17, 2010 in front of a crowd of cheering fans. He is wearing a hat and colorful clothing, strumming a guitar and basking in the spotlight. Music is his recreation and passion, and he is proving it with a stunning performance.

As you can see, GPT-3 creates a descriptive and somewhat fitting caption. It’s technically wrong, since Les is a bassist and was wearing all black, but it’s pretty darn close.


In conclusion, using GPT-3 to write captions based on image keywords, locations, people, and album names is a fantastic way to add a new dimension to your photos. Working from nothing more than that metadata, GPT-3 can generate captions that are descriptive, entertaining, and usually close to fitting. So, why not give it a try and see how GPT-3 can make captioning photos a painless process?


  • GPT-3
  • Image Keywords
  • Locations
  • Celebrity Detection
  • Album Names
  • Caption Generation
  • Natural Language Processing
  • Computer Vision
  • Artificial Intelligence



Post date:

Wednesday, February 15th, 2023 at 8:40:16 AM