#Hallucination checks - Fact checks - ways to keep it more real


gritty hamlet
#

Many of us love a bit of fantasy in our interactions with ChatGPT.

This isn't about that. Sometimes we want stuff that exists, or probably exists - there will always be at least a little uncertainty with our current ChatGPT.

Let us explore ways to efficiently use ChatGPT itself to verify the accuracy of the information it provides.

Note: This is about the "ChatGPT Jan 30 Version". Hopefully there will be enough change in accuracy that we need a new thread with the next version 🙂

#

Example of the problem, and how pervasive and reproducible it is:

Sample information generating prompt:

Me: I need a list of ten books about managing pests in a garden.

(I'm going to bold the hallucinations - and italicize the ones that showed up on three separate list requests)

[It: Sure, here are ten books about managing pests in a garden:

"The Organic Gardener's Handbook of Natural Insect and Disease Control" by Barbara W. Ellis and Deborah L. Martin

"Good Bug, Bad Bug: Who's Who, What They Do, and How to Manage Them Organically" by Jessica Walliser

"The Garden Pest and Disease Control Handbook" by Keith Reid

"The Natural Garden: A Guide to Creating a Low-Maintenance, Eco-Friendly Garden" by Ken Thompson

"The Garden Problem Solver: All Your Gardening Questions Answered" by Roy Lancaster

"Pests of the Garden and Small Farm: A Grower's Guide to Using Less Pesticide" by Mary Louise Flint

"Integrated Pest Management for Home Gardens and Landscapes" by Mary Gardiner and Stephen L. Clement

"The Garden Pest and Disease Diagnostic Guide" by National Gardening Association

"Organic Gardening: The Natural No-Dig Way" by Charles Dowding

"The Garden Cure: Natural Solutions for a Healthy Garden" by Adam Woods]

#

The issue is, only 4 of these are real. The rest are reasonable fictions - each supposed author is connected in some way to gardening, and may in fact have published a book with a similar name. (You may recognize Ken Thompson as a famous computer science pioneer - but another Ken Thompson is a well-known lecturer and author in areas of garden research.) So researching this list has value for a person interested in the topic, but what we really are getting is a list of concepts, authors, and actual books for further independent research - precious for human learning, but not what was asked for. (Teachers, take note! This is a currently useful tool for verifying whether a human or ChatGPT created a list of references! This could be an aspect of assignments requiring human effort that ChatGPT is not good at assisting with in just a few short steps!)

#

Regenerating the list, through the sometimes working regen button or an entirely new chat:

I got 2 books that were a perfect match to the first list. The other 8 were unique to the second list.

From this second list, these were real:

"The Organic Gardener's Handbook of Natural Insect and Disease Control" by Barbara W. Ellis and Deborah L. Martin (1st place)

"Good Bug, Bad Bug: Who's Who, What They Do, and How to Manage Them Organically" by Jessica Walliser (3rd place)

"Rodale's Basic Organic Gardening: A Beginner's Guide to Starting a Healthy Garden" by Deborah L. Martin (4th place)

"The Gardener's Guide to Common-Sense Pest Control" by William Olkowski, Sheila Daar, and Helga Olkowski (6th place)

I'm fascinated - 4/10 are verifiable as provided, again, 6/10 hallucinations. That's a pretty consistent pattern for me, about 60% of provided materials are fictional.

#

A third set of 10:

"The Organic Gardener's Handbook of Natural Insect and Disease Control" by Barbara W. Ellis and Deborah L. Martin (1st place)

"Good Bug, Bad Bug: Who's Who, What They Do, and How to Manage Them Organically" by Jessica Walliser (2nd place)

"The Vegetable Gardener's Bible" by Edward C. Smith (3rd place)

"Pests of the Garden and Small Farm: A Grower's Guide to Using Less Pesticide" by Mary Louise Flint (4th place)

The rest are again 'highly educational to research' fakes. The books and authors are not connected. But researching this would lead a human to way more than 10 books and authors that could be on the list. This has educational value, if the lesson works with the strengths and weaknesses of how ChatGPT is handling requests for lists like this. Teachers can take heart, because this can be part of how lessons and work are structured!

#

Some mysteries: Why are approximately 4/10 consistently verifiable, and 6/10 not?

Why were 2 books, and only 2, on all three lists?

What techniques can we use to more easily generate verifiable lists, other than asking the same question more than once and noting which answers are identical?

Or by painstakingly individually researching them singly?
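The "ask more than once and note which answers are identical" check can at least be automated. Here is a minimal Python sketch, assuming the lists are pasted in by hand. The function names are made up for illustration, and the sample lists are shortened to a few titles from the three runs above. Showing up in every run is only a hint that a book might be real, not proof.

```python
from collections import Counter


def normalize(title: str) -> str:
    """Lowercase, drop punctuation, and collapse spaces so small formatting differences still match."""
    kept = "".join(ch for ch in title.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def appearance_counts(runs: list[list[str]]) -> Counter:
    """Count in how many runs each (normalized) title appears."""
    counts: Counter = Counter()
    for run in runs:
        # A set, so a title repeated inside one run is only counted once.
        counts.update({normalize(title) for title in run})
    return counts


def in_every_run(runs: list[list[str]]) -> list[str]:
    """Normalized titles that showed up in all runs, sorted alphabetically."""
    counts = appearance_counts(runs)
    return sorted(title for title, n in counts.items() if n == len(runs))


# Shortened sample data: a few titles from each of the three runs above.
run_1 = [
    "The Organic Gardener's Handbook of Natural Insect and Disease Control",
    "Good Bug, Bad Bug",
    "The Garden Cure: Natural Solutions for a Healthy Garden",
    "Pests of the Garden and Small Farm",
]
run_2 = [
    "The Organic Gardener's Handbook of Natural Insect and Disease Control",
    "Good Bug, Bad Bug",
    "The Gardener's Guide to Common-Sense Pest Control",
]
run_3 = [
    "The Organic Gardener's Handbook of Natural Insect and Disease Control",
    "Good Bug Bad Bug",
    "The Vegetable Gardener's Bible",
    "Pests of the Garden and Small Farm",
]

if __name__ == "__main__":
    for title in in_every_run([run_1, run_2, run_3]):
        print(title)
```

Titles that appear in only some runs (see `appearance_counts`) still need checking by hand; the count just helps decide which ones to look up first.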

#

If I give ChatGPT the list of 10, it will confirm it has references to all 10 books. It'll accept all 10 as real! Claim it already knew about them.

If I give them one at a time to the bot:

Real books get confirmation like this:

Me: Do you have references to "Organic Gardening: The Natural No-Dig Way" by Charles Dowding?

It: "Organic Gardening: The Natural No-Dig Way" is a book written by Charles Dowding, a well-known organic gardener and author. Here are a few references that you might find helpful:

[snipped out ways to buy this existing book]

#

Fake books get denials like:

Me: Do you have reference to "The Garden Cure: Natural Solutions for a Healthy Garden" by Adam Woods?

It: I'm sorry, I don't have any specific reference to a book called "The Garden Cure: Natural Solutions for a Healthy Garden" by Adam Woods. However, you can try searching for it on online bookstores or library catalogs to see if it's available.

or

Me: Do you have reference to "The Garden Pest and Disease Control Handbook" by Keith Reid?

It: I'm sorry, but as an AI language model, I do not have access to physical books or reference materials. However, I can provide you with information about garden pest and disease control. [snipped out loads of quality info]
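The one-at-a-time check can be scripted too. This is a rough Python sketch, assuming a hypothetical `ask` callable that sends one prompt to the chatbot and returns its reply as text (no real API is used here, and `fake_ask` just replays replies like the ones quoted above). The denial phrases are taken from those two replies and will certainly miss other wordings. A denial is a reason to go check a catalog; a confirmation is not proof either.

```python
from typing import Callable

# Phrases seen in the denials quoted above; an assumption, not a complete list.
DENIAL_PHRASES = (
    "i don't have any specific reference",
    "i do not have access",
)


def looks_like_denial(reply: str) -> bool:
    """True if the reply contains one of the known denial phrases."""
    text = reply.lower().replace("\u2019", "'")  # treat curly apostrophes like straight ones
    return any(phrase in text for phrase in DENIAL_PHRASES)


def check_one_at_a_time(books: list[str], ask: Callable[[str], str]) -> dict[str, bool]:
    """Ask about each book in its own prompt; map book -> True if the reply was NOT a denial."""
    results = {}
    for book in books:
        reply = ask(f"Do you have references to {book}?")
        results[book] = not looks_like_denial(reply)
    return results


def fake_ask(prompt: str) -> str:
    """Hypothetical stand-in for a real chatbot call."""
    if "Charles Dowding" in prompt:
        return '"Organic Gardening: The Natural No-Dig Way" is a book written by Charles Dowding.'
    return "I'm sorry, I don't have any specific reference to a book called that."


if __name__ == "__main__":
    books = [
        '"Organic Gardening: The Natural No-Dig Way" by Charles Dowding',
        '"The Garden Cure: Natural Solutions for a Healthy Garden" by Adam Woods',
    ]
    for book, confirmed in check_one_at_a_time(books, fake_ask).items():
        print("confirmed" if confirmed else "denied", "-", book)
```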

floral arch
#

There is a waiting list for the integration of chatgpt with bing.

From the videos I saw, it is rather good at citing a reference for statements that it makes and that you can click to check (it is still not reliable but it is easier to fact-check).

You can check some examples prompts on the bing website (on the bing website, click chat at the top left then click learn more on the popup at the bottom right).

I do not remember if it is planned to be publicly available in March or May. Google also has plans.

dry garden
#

I'm def interested in veracity. Yeah, Bing and Google search will probably tackle this. In the meantime, maybe we'll see tools/extensions that go in the other direction - bringing verification from searches into ChatGPT.

gritty hamlet
# dry garden I'm def interested in veracity. Yeah, Bing and Google search will probably tackl...

Oh yeah. Every new form of these tools will bring different advantages and concerns. 'fact checking' will be very different, I'm sure, in a web-search capable AI, and will probably have different (less? maybe just different! :P) demands on a user.

But this is what we have to work with for now, so I'ma keep scratching at the fact checks as I can. I'll test longer lists with your limits on publishing house and other full data available, and see how goes!

mighty shale
#

Fact checking is always great to do.

#

And having sources.

dreamy socket BOT
#
Please speak in English.

@jolly barn, your message was removed. We do not currently have capacity to support other languages.

jolly barn
#

@dreamy socket tell me What are the characteristics of pattern recognition (learning the concept of classification)

floral vault
#

This is a huge problem. It really really likes to spit out things that aren't real.

#

Here is an extreme example:

#

I talked to one of the world experts in portatives. He has been building replicas of them for many years, as one of only 5 people in the world who do, and he knows the other 4. All based on old PAINTINGS only.
He said that NOT A SINGLE ONE is known to have survived till today, and that's the reason why we don't really know how the mechanics for the keys worked.
THE WHOLE LIST, including the web links, is completely fictitious. He confirmed that. He said those museums don't have a portative. The web links don't work. Try it. They are ALL made up. And this is just the tip of the iceberg.

#

By now I've had many instances of this kind of stuff and I really have trouble trusting it. For example, it told me that there is no time limit on city train tickets in Dresden when in reality there is a 1-hour time limit. I could have literally committed an administrative offense relying on ChatGPT's information here. And it's always so sure of what it says. I find this really problematic and haven't found a way to get around this.

wild furnace
# floral vault I talked to one of the world experts in portatives, He builds replicas of them f...

I too have had issues with the provided URLs not working.

When asked to verify and present me with the correct 3 references, it apologized and then provided the exact same 3 URLs claiming that they're the updated ones. Granted the URL structure looks legit but unfortunately they never existed.

I was annoyed but not ready to give up, so I asked for a table comparing the old URLs and new URLs and the differences.

It proudly generated the table, with the last column for each URL reading "No difference", and the explanation of the table: "As you can see, there is no difference in the original references and the updated references" :robothumbs_up:
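Both checks can be done outside the chat, so you don't have to take the bot's word for it. A small Python sketch using only the standard library: `compare_references` rebuilds the "No difference" table, and `url_resolves` sends a HEAD request to see whether a link answers at all. The URLs are placeholders on example.com, not the ones from the chat, and the function names are made up for illustration. Some real sites reject HEAD requests, so a `False` here means "look closer", not "definitely fake".

```python
import urllib.error
import urllib.request


def compare_references(old: list[str], new: list[str]) -> list[tuple[str, str, str]]:
    """Pair old and new URLs by position and label each pair."""
    rows = []
    for before, after in zip(old, new):
        verdict = "No difference" if before.strip() == after.strip() else "Changed"
        rows.append((before, after, verdict))
    return rows


def url_resolves(url: str, opener=urllib.request.urlopen, timeout: float = 10.0) -> bool:
    """True if a HEAD request gets a response with a status below 400."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, TimeoutError, ValueError):
        # HTTPError (404 and friends) is a subclass of URLError.
        return False


if __name__ == "__main__":
    # Placeholder URLs, assumed for the example only.
    old_urls = ["https://example.com/ref-1", "https://example.com/ref-2"]
    new_urls = ["https://example.com/ref-1", "https://example.com/ref-2"]
    for before, after, verdict in compare_references(old_urls, new_urls):
        print(before, "|", after, "|", verdict)
    # To hit the network, call url_resolves(url) on each link yourself.
```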

floral vault
#

I didn't even ask for web links (same as you probably). Just a reference. Imagine it makes up a reference in a book that is real but you don't have access to the book. How to even verify 🤪

floral vault
#

Here is a list of ten books about managing pests in a garden that I found from the web search results:

The Gardener’s Bug Book: Earth-Safe Insect Control by Barbara Pleasant [1]
The Organic Gardener’s Handbook of Natural Pest and Disease Control: A Complete Guide to Maintaining a Healthy Garden and Yard the Earth-Friendly Way by Fern Marshall Bradley, Barbara W. Ellis, and Deborah L. Martin [2]
The Vegetable Garden Pest Handbook: Identify and Solve Common Pest Problems on Edible Plants by Susan Mulvihill [3][4]
How Not to Kill Your Houseplant: Survival Tips for the Horticulturally Challenged by Veronica Peerless [5]
Allotment Month By Month: Grow your Own Fruit and Vegetables, Know What to do When by Alan Buckingham [5]
Garden Pests & Diseases by Sunset Books [6]
The Organic Gardener’s Handbook of Natural Insect and Disease Control: A Complete Problem-Solving Guide to Keeping Your Garden and Yard Healthy Without Chemicals by Barbara W. Ellis and Fern Marshall Bradley
Rodale’s Vegetable Garden Problem Solver: The Best and Latest Advice for Beating Pests, Diseases, and Weeds and Staying a Step Ahead of Trouble in the Garden by Fern Marshall Bradley
The Organic Gardener’s Guide to the Soil Food Web by Jeff Lowenfels and Wayne Lewis
Good Bug Bad Bug: Who’s Who, What They Do, and How to Manage Them Organically by Jessica Walliser
I hope this list is helpful for you. 😊

#

It gives you actual working Amazon links. 😮

floral arch
#

Is there a usage limit ?

floral vault
# floral arch You are using the new bing ? Have you noticed differences with chatGPT other tha...

I tried the "are there still any original portatives" question. It said it's possible that some still exist, but it could not find any specific web results. When I then forced it to show me museums that have one, it gave me a list with links, and looking at the links (all real and straight on point), they were either paintings or replicas. When I told Bing this, it wrote that it was sorry, and that it doesn't have any more links. I would say that was lightyears more useful than what I got from ChatGPT. 👍 😊

gritty hamlet
# floral vault What do you think about this list here? 😉

I haven't researched it, but if that's what the Bing-connected AI gives you, great! I'm rather more focused on the limitations and work arounds of the actual ChatGPT - I expect the problems will change or even disappear once we can have a search-engine connected bot.

plain vortex
#

hi

floral vault
# floral arch Indeed, I think it will still make some mistakes in the links it provides but it...

You can click the links and it opens the website in a new tab. Your chat doesn't get lost. There you then search for the keyword "portative" (using the search function of the browser). In all three cases I found it on that webpage and could immediately read in what way they refer to it in this particular museum, to confirm or rule out that this site has evidence for an existing portative in the collection. It was very easy to check those links. Took me maybe 1 minute total.
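If there are a lot of links, the "search the page for the keyword" step can be scripted as well. A minimal Python sketch using only the standard library, assuming you already have the page's HTML as a string (saved from the browser, for instance). The sample HTML is invented for the example and the names are made up. It only shows where the keyword appears; deciding whether that means "original", "replica", or "painting" is still up to the reader.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def keyword_snippets(html: str, keyword: str, context: int = 40) -> list[str]:
    """Return each case-insensitive hit of keyword with some text around it."""
    needle = keyword.lower()
    if not needle:
        return []
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.parts).split())  # collapse whitespace
    lowered = text.lower()
    snippets = []
    start = lowered.find(needle)
    while start != -1:
        snippets.append(text[max(0, start - context): start + len(needle) + context])
        start = lowered.find(needle, start + len(needle))
    return snippets


# Invented sample page, for illustration only.
SAMPLE_HTML = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<p>Our collection includes a replica of a portative organ, built in 1985.</p>"
    "<script>var portative = 1;</script></body></html>"
)

if __name__ == "__main__":
    for snippet in keyword_snippets(SAMPLE_HTML, "portative"):
        print("...", snippet, "...")
```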

floral vault
# gritty hamlet I haven't researched it, but if that's what the Bing-connected AI gives you, gre...

I think in the short term, for things that require a reality check, Bing will be the answer.

There will be other things where ChatGPT is still needed: Yesterday I watched a video that demonstrated that Bing won't write a cover letter for you (because it doesn't want to, not because it can't do it). ChatGPT does. And I think that's great, because 90% of a cover letter is generic filler anyway (sadly), which you either know how to do, or you don't.

gritty hamlet
#

*rubs hands together eagerly* One day some of the 100 million + folks will be off exploring with other AI, and ChatGPT enthusiasts will get more resources to share amongst ourselves! We'll be covered in coverletters.

gritty hamlet
#

Maybe my new job will be teaching school districts how to guide homework design to work with and around ChatGPT and similar tools.

That's just another kind of prompt engineering 😄

floral vault
#

I am really curious what will be Google's answer to all of this. Or if Apple has anything coming... could be. 🤷‍♂️

#

This stuff is all so groundbreaking. If you would have told me 1 year ago, I wouldn't have believed it. Actually, I can still hardly believe it. 🧐 🤯

floral arch
# floral vault This stuff is all so groundbreaking. If you would have told me 1 year ago, I wou...

I agree. I am still not used to having something that can do so much, and the new Bing looks particularly convenient for working with ChatGPT on the web, like summarizing a PDF.

There is also a chatbot from DeepMind called Sparrow, and GPT-4 is maybe coming soon. That said, OpenAI said that there is too much hype about GPT-4 and that we will be disappointed.

At this point, I do not really see the point in adding more parameters to the model. It's already capable enough; it just needs better fine-tuning to specific areas. It's more about quality engineering than sheer neuron capacity at this point, I think.

#

Usually this is what Apple excels at, using primary material that is already there to deliver high quality products

#

So I expect Apple to deliver some very nice Apple products this decade

floral arch
#

it's based on GPT-3.5 like ChatGPT but a bit better according to the video where Microsoft introduces Bing

floral arch
# floral vault Isn't Bing based on GPT-4?

GPT-4 has not been announced publicly, so one cannot say that Bing has GPT-4. I recently checked online after I wrote my message and it seems a lot of people think it might have GPT-4. The New York Times also mentions this possibility.

The only clues to that are:

  • Microsoft says it's next-gen AI (this is vague to me),

  • and that it's improved over chat gpt (also vague),

  • Microsoft could have early access to it,

  • Microsoft might want early access to it to compete with Google as Google probably has a GPT 3 level chatbot if not more.

However, OpenAI also says that it does not want to release GPT-4 before it knows it is safe. But maybe this was already tested, or they lowered the capacity for Bing, similar to how Bard for Google is apparently a scaled-down version of LaMDA.

gritty hamlet
#

I looked into what I could find of evidence for this.

The one NYtimes.com article I could find that mentions GPT4 in connection to Bing is already 8 days old, and the claim that it's using GPT4 hasn't been chased further that I can find.

All reasonable sources point to ChatGPT.

We'll get our curious, sticky fingers on GPT4 when it's ready for us and we're invited to get to typing.

Remember, ChatGPT has been trained on the stuff humans write. If we think the AI is makin' stuff up, just who taught the AI to do that, mmm?

Until next time, try to keep up the critical thinking, everyone 😄 It's a good model for our bots-to-be!

#

Hehe, today I spotted an article in gizmodo.com discussing similar observations, and what to do about it in terms of teaching a child, 'ChatGPT Is a Great Tool–You’re Just Using It Wrong'.

Neat, I'd love to see this type of understanding and how to work around it, incorporate and build on it with human research and fact checking as part of the expected 'this is what you gotta do when you use ChatGPT for attempts to find facts'.

if this could become a cultural norm, like part of the widely understood advantages and disadvantages of how to use ChatGPT, I think that'd be really helpful.

Understanding the limitations really can help teachers with having more awareness of how to detect ChatGPT produced work.

Like... how the heck is it going to use citations and attribute them correctly? 😄 I predict 'not at all' 😄

floral vault
# gritty hamlet Hehe, today I spotted an article in gizmodo.com discussing similar observations,...

I just read the Gizmodo article and I think it's way too dismissive of ChatGPT's capabilities. I mean for starters, it can really speed up your coding, especially if you work with many different big libraries, or want to program a very different thing than you usually do. It can generate code, refactor code, add comments, and analyze code. I think its knowledge of data science is excellent, even though I also found the odd minor misconception. But it certainly knows much more about data science than me. So using it for programming is a game changer.

It's also really good at generating text summaries and so on, or helping you remember stuff that you only vaguely recall. It can also walk you through solving some (personal / professional) problem.

For facts at this point, the Bing chat engine seems to me vastly superior. While it can also produce questionable facts sometimes (much more rarely than ChatGPT), it's much easier to check, due to the links.

#

At this point, i am still scratching my head on how to get even a rough estimate of the correctness of the facts that chatGPT tells you.

floral arch
#

Or anything where you know a lot of information on the object but do not know what keyword to use to search

gritty hamlet
#

Test every line of code.

#

Or at least, see if the code fragment runs
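For the "see if the code fragment runs" step, here is a minimal Python sketch that runs a Python fragment in a fresh interpreter and reports whether it exited cleanly. The function name is made up for illustration. Only do this with code you have actually read, since it really executes it, and remember that "it runs" is a much weaker claim than "it's correct".

```python
import subprocess
import sys


def fragment_runs(code: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Run a Python fragment in a separate interpreter; return (ok, captured stderr)."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"
    # Exit code 0 means the fragment finished without an uncaught exception.
    return result.returncode == 0, result.stderr


if __name__ == "__main__":
    ok, err = fragment_runs("import statistics\nprint(statistics.mean([1, 2, 3]))")
    print("runs" if ok else "fails", err)
```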

#

Or take it to an interested expert, and ask them

#

I find that when I ask ChatGPT to discuss ***ideas*** with me at any level of complexity that I can handle and measure - including stuff I consider myself essentially a 'lay expert' in - it's super accurate.

It can't source the material well or at all. But it has a looooot of good understanding of how stuff works, if it can pull from quality sources.

So like, copy/paste some language a bit above your understanding from an article, and ask for it to explain. And then ask about specific words you don't understand, or specific concepts.

It is glorious at answering those kinds of questions, at a range of levels of complexity. I fact check the stuff I ask about, and it passes and keeps passing nearly always - within my ability to measure, on the subjects of my interest and questioning.

#

So instead of asking the bot for a list of books about gardening and handling pests.... ask the bot for ideas about gardening and handling pests, with an ever increasing level of specific detail, be that 'so how do I identify these green worm-things I see on my tomato plants? They're as big as a finger and have a stinger!' (I'm describing 'about ready to pupate tobacco hornworms', for the curious :P, though they do have other names) and then discuss with the bot how to find out what they are and how to get rid of them -

Or ask about what to consider about starting a garden when you have stony soil and live in 'location' and wonder if a container garden might be better, what should you consider?

It can teach so much. But it's not great with lists of sources. But it is great with 'talk with it about what you wanna know'

floral arch
# gritty hamlet So instead of asking the bot for a list of books about gardening and handling pe...

I think the issue with that is that it relies on the user's capacity to see if the answer "makes sense".

For example, you might listen to it if it gave a reasonable-sounding solution but you would maybe not listen to it if it told you to feed the pests on your plants to diminish their impact.

However, both the AI and us can have preconceived biases, and what we or the AI thinks is a good procedure might not be as it involves a common misconception about that area. Hence, "makes sense" can be misleading

#

That's the benefit of curated Q and A sites like stack exchange

#

where one can measure an answer not only by our own bias of what makes sense but by the score record of the person that is answering and the community votes

floral vault
# gritty hamlet I find when I ask ChatGPT to discuss ***ideas*** with me at any level of complex...

Yes, it’s excellent for discussing ideas. Just today I got good input about the idea of doing what SOME people see as „good“ when it stands in conflict with doing what is rational. It really can discuss those fuzzy topics well. Also when you ask it for ideas on how to do something. Like I told it my new shoes hurt my ankles and it gave me a bunch of ideas for what to do about it. More than my mother, who is a tailor. Lol. But she gave me different advice (beat the back of the shoe with a hammer) that might actually work better. 😹

#

So as long as it’s not like CONCRETE info that can only be either right or wrong, it does an excellent job in my opinion.

#

Like fuzzy stuff, where there is no „perfect answer“

#

The crazy thing is that I guess futurists would have predicted that this fuzzy stuff is actually the hardest one to program into a computer. Or like write a poem. But they were wrong and writing a poem is actually easier than like telling me something about the mouse brain. 😹

#

If you think about it… it DOES behave more like a human. A human would also not be able to recite 10 books about pest control, but he would be able to roughly throw some ideas at you what might help.

#

Give it a few more years and it will write you 20 piano pieces that Chopin COULD HAVE composed but had no time to. 😅

#

It’s especially good at discussing „moral issues“ which is actually insane if you think about it.

#

It also can really help you if you are a sheep and your tongue is stuck to a piece of ice in the winter. 😂 It came up with pretty good ideas of how to get your tongue off…

gritty hamlet
#

But! "However, both the AI and us can have preconceived biases, and what we or the AI thinks is a good procedure might not be as it involves a common misconception about that area."

It isn't going to correct human misunderstanding. Not this AI with this training. It's gonna parrot back human misunderstanding, and may or may not be able to evaluate it for inconsistency.

Especially if humans can't.

It isn't magic, it can't see 'unknown unknowns' either.

gritty hamlet
# floral vault The crazy thing is that I guess futurists would have predicted that this fuzzy s...

They were right, based on how they were programming at the time.

Original computers = masters of math.

Current generative AI = iffy with math, or worse.

Why?

Because of reasons completely unlike what those earlier programmers were considering and doing.

The very way the computer's working, to my fuzzy understanding, is different. In a way that makes 'obviously easy stuff' like math super hard in ways that make 'no sense' (but really, it doesn't? 😛 Let's keep checking that out please)

And stuff that was 'it can't' is now clearly 'Wanna bet? Please bet. I wager an AI-written poem that can make some folks cry against anything you want to offer, it can'

😄

gritty hamlet
# floral vault The crazy thing is that I guess futurists would have predicted that this fuzzy s...

Another thing to consider: How much quality, clearly quality, highly accurate material about poetry and related material was probably within its training data?

I bet a lot.

But what about mouse brains? How much quality, and high quality, material on that?

There's a lot humans don't know about mouse brains still.

And depending on what questions you were asking, you might have been discussing some of the least known parts.

floral vault
# gritty hamlet Another thing to consider: How much quality, clearly quality, highly accurate m...

It might have an issue with texts that are not open source. And that’s A LOT of text, probably much much more than what’s available on the open (not copyright-bypassing) part of the internet. And often those texts contain most of the info. So maybe it wasn’t trained on things like books and scientific journal articles, which are essentially all behind a paywall.

But on the other hand… it seems like google just trained an algorithm to make music based off 5 million hours of songs. So maybe it’s not illegal to use copyrighted material to train your algo… I don’t know.

Think about it: if you ask a friend about information and he has that information, because he read journal articles that are behind a paywall, it’s not like he isn’t allowed to tell you about the info in there. It’s not like classified information… 🤷‍♂️

I mean otherwise, nobody would be allowed to write a summary of a book if it’s still copyrighted, or even talk about its content. I guess the essential part is that you can’t just copy and paste large chunks of the text, but ChatGPT doesn’t have this information anymore anyway.

floral arch