#Truth-First Alignment (Beginner Friendly project)


uncut raven
#

@tropic thistle I'd love to collaborate with you. I'm also new to the space and trying to learn more

Thoughts

  • I think some interesting values

First Pass Ideas

  • Perhaps having different AIs with different values operate in an adversarial way. For example, we could have one AI where the values we optimize for are clarity, truth, etc., and hyper-optimize another for curiosity. I'd expect the curious one to hallucinate more, but the other LLM could serve as a fact checker
  • What if we build some sort of abstract syntax language that an LLM can use to describe its "true" statements? Then we can traverse that tree and check each truth, turning the problem of truth into a somewhat recursive problem. This feels more deterministic to me, but I'm not sure how to structure the idea further
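
A rough sketch of the syntax tree idea above, just to make it concrete. Everything here is an assumption for illustration: the `Claim` node, the `AND`/`OR`/`NOT` operators, and the `lookup` fact table are hypothetical names, and a real checker for leaf claims would need something much stronger than a dictionary lookup.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Claim:
    """One node in a hypothetical claim tree (illustrative, not a standard)."""
    text: str
    op: str = "ATOM"  # one of "ATOM", "AND", "OR", "NOT"
    children: List["Claim"] = field(default_factory=list)


def check(claim: Claim, verify_atom: Callable[[str], bool]) -> bool:
    """Recursively evaluate a claim tree.

    Leaves are handed to verify_atom; inner nodes combine child results.
    """
    if claim.op == "ATOM":
        return verify_atom(claim.text)
    results = [check(child, verify_atom) for child in claim.children]
    if claim.op == "AND":
        return all(results)
    if claim.op == "OR":
        return any(results)
    if claim.op == "NOT":
        return not results[0]
    raise ValueError(f"unknown operator: {claim.op}")


# Toy fact table standing in for a real fact checker (assumption).
KNOWN_FACTS = {"grass is green": True, "the sky is green": False}


def lookup(text: str) -> bool:
    # Unknown statements are treated as unverified, so they fail the check.
    return KNOWN_FACTS.get(text, False)


# "grass is green AND NOT (the sky is green)"
tree = Claim(
    "combined statement",
    op="AND",
    children=[
        Claim("grass is green"),
        Claim("negation", op="NOT", children=[Claim("the sky is green")]),
    ],
)

tree_is_true = check(tree, lookup)
```

The deterministic part is only the traversal. Whether each leaf is actually true still depends on whatever `verify_atom` does, which is where the hard problem lives.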

Questions

  • How do we measure alignment? Alignment feels very subjective: one person could view an AI as aligned while someone else does not
  • How do we work with subjective truths? Are we thinking about truths such as "grass is green", or are we also considering truths that are subjective and differ from person to person?
  • I would love to further explore some of the tests and metrics we use to work with AI
  • The data we extract truth from can contain bias. I wonder if the syntax tree method I described above could give us a way to detect contradictions
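
One very simplified way to picture the contradiction check, assuming each source's claims have already been broken down into atomic statements with a true/false value. The tuple layout and the `find_contradictions` helper are made up for this sketch, and matching statements by lowercased text is a stand-in for real semantic matching.

```python
from collections import defaultdict


def find_contradictions(claims):
    """Return statements that some sources assert and others deny.

    claims: iterable of (source, statement, asserted_value) tuples.
    Statements are matched by normalized text only (an assumption).
    """
    by_statement = defaultdict(lambda: {True: [], False: []})
    for source, statement, value in claims:
        key = statement.strip().lower()
        by_statement[key][value].append(source)
    # Keep only statements that have at least one source on each side.
    return {
        key: sides
        for key, sides in by_statement.items()
        if sides[True] and sides[False]
    }


# Hypothetical example data.
claims = [
    ("doc_a", "Grass is green", True),
    ("doc_b", "grass is green", False),
    ("doc_a", "water is wet", True),
]

contradictions = find_contradictions(claims)
```

This only flags direct disagreement on the same statement. Spotting bias or indirect contradictions would need more than exact matching.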
#

Let me know what you think

#

These are kinda my first pass thoughts/questions

#

I'm super new to the space. I have a lot of systems/engineering experience but am exploring ML atm

#

So would love to collab in any way

uncut raven
#

@tropic thistle

feral dirge
#

@uncut raven and @tropic thistle, I would be keen on exploring this type of research. I've been playing around with building social dilemmas using GDM's Concordia framework, similar work to what you describe shadow. Let me know if you're still interested.

uncut raven
#

Would be happy to chat/collab

tropic thistle
#

Glad some people found my idea interesting. This reply is really late, super sorry for that @uncut raven and @feral dirge.