Truth-First Alignment (Beginner Friendly project) | EleutherAI | Page 1

uncut raven Mar 8, 2025, 5:04 AM

#

@tropic thistle I'd love to collaborate with you. I'm also new to the space and trying to learn more

Thoughts

I think some interesting values

First Pass Ideas

Perhaps we could have different AIs with different values and have them operate in an adversarial way. For example, one AI could optimize for values like clarity or truth, while another is hyper-optimized for curiosity. I'd expect the curious one to hallucinate more, but the other LLM could serve as a fact checker
What if we build some sort of abstract syntax language that an LLM can use to describe its "true" statements? Then we can traverse that tree and check each truth, turning the problem of truth into a somewhat recursive problem. This feels more deterministic to me, but I'm not sure how to structure this idea further
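A rough sketch of how the syntax-tree idea might look, just to make it concrete. Everything here is an assumption for illustration: the `Claim` node type, the `and`/`or`/`not` operators, and the `KNOWN_FACTS` lookup, which stands in for whatever fact-checking model or data source would actually be used.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Claim:
    """One node of a hypothetical claim tree emitted by an LLM."""
    op: str                                  # "atom", "and", "or", or "not"
    children: list[Claim] = field(default_factory=list)
    text: str = ""                           # only used by "atom" nodes


def evaluate(claim: Claim, check_atom: Callable[[str], bool]) -> bool:
    """Recursively check a claim tree, delegating leaves to check_atom."""
    if claim.op == "atom":
        return check_atom(claim.text)
    results = [evaluate(child, check_atom) for child in claim.children]
    if claim.op == "and":
        return all(results)
    if claim.op == "or":
        return any(results)
    if claim.op == "not":
        return not results[0]
    raise ValueError(f"unknown operator: {claim.op}")


# Toy stand-in for the fact-checking model (assumption, not a real dataset).
KNOWN_FACTS = {"grass is green": True, "the sky is green": False}


def lookup(text: str) -> bool:
    # Unknown statements are treated as false in this sketch.
    return KNOWN_FACTS.get(text, False)


# "Grass is green AND NOT (the sky is green)"
example = Claim("and", [
    Claim("atom", text="grass is green"),
    Claim("not", [Claim("atom", text="the sky is green")]),
])
example_result = evaluate(example, lookup)
```

The recursion only handles the structure. Each leaf still needs some checker, so the hard part (deciding whether a single atomic statement is true) is isolated in `check_atom` rather than solved.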

Questions

How do we measure alignment? Alignment feels very subjective: one person could view an AI as aligned while someone else does not
How do we work with subjective truths? Are we thinking about truths such as "grass is green", or are we also considering truths that are subjective and differ from person to person?
I would love to further explore some of the tests and metrics we use to work with AI
The data we extract truth from can contain bias. I wonder if the syntax tree method I described above could give us a way to detect contradictions
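One possible way to picture the contradiction check, as a minimal sketch. It assumes claims are written as nested tuples like `("and", a, b)` and `("not", a)` with plain strings as atomic statements, and it only flags the simplest case: the same statement asserted both positively and negatively. The representation and function names are made up for illustration.

```python
def collect_literals(node, negated=False, out=None):
    """Walk a claim tree and record each atom with its polarity."""
    if out is None:
        out = set()
    if isinstance(node, str):
        # (statement, True) means asserted, (statement, False) means denied.
        out.add((node, not negated))
        return out
    op, *args = node
    if op == "not":
        collect_literals(args[0], not negated, out)
    elif op == "and" and not negated:
        for arg in args:
            collect_literals(arg, negated, out)
    else:
        # Disjunctions and negated conjunctions need real logic (e.g. a SAT
        # solver); they are out of scope for this sketch.
        raise ValueError(f"unsupported node: {node!r}")
    return out


def find_contradictions(node):
    """Return the atoms that are both asserted and denied in the tree."""
    literals = collect_literals(node)
    return sorted(
        atom for atom, polarity in literals
        if polarity and (atom, False) in literals
    )


conflicting = ("and", "grass is green", ("not", "grass is green"), "water is wet")
consistent = ("and", "grass is green", ("not", "the sky is green"))
conflicting_result = find_contradictions(conflicting)
consistent_result = find_contradictions(consistent)
```

This only catches contradictions between statements that match exactly. Two differently worded sentences that disagree would need some normalization step first, which is probably where a model would have to come in.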

#

Let me know what you think

#

These are kinda my first pass thoughts/questions

#

I'm super new to the space, have a lot of systems/engineering experience but am exploring ML atm

#

So would love to collab in any way

uncut raven Mar 10, 2025, 9:30 PM

#

@tropic thistle

feral dirge Apr 3, 2025, 12:23 PM

#

@uncut raven and @tropic thistle, I would be keen on exploring this type of research. I've been playing around with building social dilemmas using GDM's Concordia framework, similar work to what you describe shadow. Let me know if you're still interested.

uncut raven Apr 7, 2025, 5:57 AM

#

Hey, I just saw this, but I'm certainly still interested

#

Would be happy to chat/collab

tropic thistle Apr 7, 2025, 7:42 AM

#

Glad that some people found my idea interesting. This is a really late reply, super sorry for that @uncut raven and @feral dirge.
