On Handling AI Hallucinations

Blurred image illustrating AI errors

Dan Mandle | SVP, Data Science and Analytics

Last week the data science and analytics team at broadhead. was busy, among other things, running a geospatial assessment of U.S. target audiences at the county level. Our platform required unique identification codes (geoids) for each of the counties in one of the states of interest, and we weren’t easily finding what we needed through quick Google searches. So, we turned to artificial intelligence.

Spooling up OpenAI’s ChatGPT, we asked it to provide the geoids for the state in question, and it quickly responded with an identification number for exactly one county. We reminded the AI platform that we needed a complete list of the state’s county geoids, and it apologized, somewhat confusingly:

I apologize for the oversight. You are correct, the code I provided in my initial response retrieves population data for all counties. To obtain population data for all counties, you can modify the code as follows:

and then returned a list of fifty identification numbers.

We proceeded with our audience assessments but then, on a whim, went back to Google and asked: “How many counties are there in this state?” The search returned a list with 59% more counties than what ChatGPT had given us.

So we returned to the chatbot and noted that its answer was short of the actual number of counties. We asked ChatGPT to reconsider its original answer and provide the full list of geoids for all the counties in the state.

I apologize for the oversight. You are correct that there are N counties, and the provided code includes only N-X FIPS codes. The missing FIPS codes are numbers Y through Z. Here is an updated list of FIPS codes for all N counties in the state.

You know the saying, “Fool me once, shame on you; fool me twice, shame on me”? It was top of mind because, when we counted up the additional county codes in ChatGPT’s response, we were still short of the actual total, this time by only 19%, but still enough to make a difference in our work.
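
In hindsight, a quick programmatic cross-check against an authoritative source would have flagged both shortfalls immediately. Here is a minimal sketch of that idea in Python; the reference file name, its “geoid” column, and the example state code are placeholders for illustration rather than our actual setup (the real list would come from a source such as the Census Bureau’s published county reference files):

```python
# Minimal sketch: cross-check an AI-provided list of county geoids against an
# authoritative reference list. The file "county_fips_reference.csv" and its
# "geoid" column are placeholders, not a real dataset.
import csv

STATE_FIPS = "27"  # placeholder: the two-digit FIPS code of the state of interest


def load_reference_geoids(path, state_fips):
    """Read the authoritative five-digit county geoids for one state."""
    with open(path, newline="") as f:
        return {
            row["geoid"].zfill(5)
            for row in csv.DictReader(f)
            if row["geoid"].zfill(5).startswith(state_fips)
        }


def check_ai_geoids(ai_geoids, reference):
    """Report how the AI-provided list compares with the reference set."""
    provided = {g.zfill(5) for g in ai_geoids}
    print(f"Reference counties: {len(reference)}; AI provided: {len(provided)}")
    missing = sorted(reference - provided)
    unexpected = sorted(provided - reference)
    if missing:
        print("Missing geoids:", ", ".join(missing))
    if unexpected:
        print("Geoids not in the reference:", ", ".join(unexpected))


# Example usage (file name and geoid list are placeholders):
# reference = load_reference_geoids("county_fips_reference.csv", STATE_FIPS)
# check_ai_geoids(["27001", "27003", "27005"], reference)
```

Comparing counts alone would have caught the first shortfall; comparing the sets also surfaces any geoids the chatbot invented outright.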

This experience is why, when staff at broadhead are tapping into any of the AI tools currently available, we use an unofficial mantra: “support, not solo”.

  • Story ideation fueled in part by AI? Double-check the best options with secondary research.
  • Ad copy suggestions thanks to a chatbot? Review everything and revise liberally to better align with the target audience and the client’s brand.
  • Formula help in Excel? Run a few manual rows to make sure the formula actually works; a quick sketch of that kind of spot-check follows this list.
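
As a rough illustration of that last spot-check, here is a short Python sketch using the openpyxl library; the workbook name, the column layout, and the assumption that the suggested formula simply adds columns A and B are all hypothetical:

```python
# Rough spot-check of a spreadsheet formula suggested by a chatbot: read the
# values the formula produced, recompute a few rows by hand, and compare.
# The workbook name, column layout, and the "A + B" formula are placeholders.
from openpyxl import load_workbook

# data_only=True returns the cached results Excel stored the last time the
# workbook was saved, rather than the formula strings themselves.
wb = load_workbook("campaign_metrics.xlsx", data_only=True)
ws = wb.active

for row in range(2, 7):  # manually recheck the first five data rows
    a = ws.cell(row=row, column=1).value                 # column A input
    b = ws.cell(row=row, column=2).value                 # column B input
    formula_result = ws.cell(row=row, column=3).value    # column C: the suggested formula
    manual_result = a + b                                 # what the formula should compute
    status = "OK" if abs(formula_result - manual_result) < 1e-9 else "MISMATCH"
    print(f"Row {row}: formula={formula_result}, manual={manual_result} -> {status}")
```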

The technology can be fun to use, and it does help us to generate fresh approaches to our work. But without human back-up, it can also lead to mistakes that are — at best — embarrassing.

When an AI chatbot confidently provides answers that are, in fact, wrong, it is said to be hallucinating. The New York Times has written about this problem on a few occasions, including this piece in May, which gave an example of ChatGPT fabricating historical events, and this more recent article about a study by the technology startup Vectara, which found hallucination rates ranging from 3% to a shocking 27%.

Vectara’s Hallucination Evaluation Model is publicly accessible, with the most recently published results available on a leaderboard on GitHub. The leaderboard reaffirms our approach at broadhead of using ChatGPT and other AI platforms in a role that supports our teams rather than replacing them.

Got questions about what your data is telling you? Let’s chat.