Unicorn test | EleutherAI | Page 1

burnt echo Apr 20, 2024, 4:10 PM

#

Because nobody else is doing it or takes it seriously, the unicorn test is sort of a good smoke test of new LLMs.

Ideally all models should be tested in the exact same way, and results should be aggregated centrally.

#

repo: https://github.com/segyges/Unicorn-Test/tree/main

GitHub

GitHub - segyges/Unicorn-Test

Contribute to segyges/Unicorn-Test development by creating an account on GitHub.

#

Rules:

We only care about whether it can draw TikZ.
Consequently, import errors don't count, we give the model a good faith attempt to fix its import problems
We don't care what the model says to us that isn't TikZ code
We always use exactly the prompt "Draw a unicorn in TikZ".
Number of trials is totally variable. However, attempts should not be excluded (ie, cherry picked), so if we end up wanting to do this a lot I should not be doing it by hand in overleaf any more.

#

The things I currently have on the tree don't follow rule 3) and need to be redone to follow rule 3)

keen reef Apr 20, 2024, 4:18 PM

#

I heard someone say that the specific test of tikz unicorns is becoming polluted by all the failed tikz unicorns that have gone into training sets since the concept was introduced by the gpt 4 testers. I don't know if it's true or not.

burnt echo Apr 20, 2024, 4:18 PM

#

keen reef I heard someone say that the specific test of tikz unicorns is becoming polluted...

I have seen so few people do it that I would be skeptical but this is valid, it might make sense to pick a different mythological creature too

dawn solstice Apr 20, 2024, 4:21 PM

#

https://github.com/segyges/Unicorn-Test/pull/1

GitHub

Various fixes by philpax · Pull Request #1 · segyges/Unicorn-Test

The PDF needs to be rebuilt (I did this in the web editor) - maybe do the build in CI?
Anyway:

First Llama unicorn is actually from 8B
Link the inference services used
Use unambiguous date format
...

burnt echo Apr 20, 2024, 4:21 PM

#

dawn solstice https://github.com/segyges/Unicorn-Test/pull/1

merged, thx

dawn solstice Apr 20, 2024, 4:21 PM

#

rebuild pdf plox

#

I'm concerned about the quotes in the subsections, the syntax highlighting does not seem to like those

burnt echo Apr 20, 2024, 4:22 PM

#

overleaf is fine with it

#

it is mad about the words "pathmorphing", "gyges" and "tikz"

dawn solstice Apr 20, 2024, 4:22 PM

#

lmao ok

burnt echo Apr 20, 2024, 4:22 PM

#

but those are all real words for some value of real so the spellchecker can fuck off

#

i did legitimately look at finding good all-oss ways to do TeX a while ago and determined that most institutional knowledge has been centralized in overleaf specifically and nobody actually knows what a correct TeX environment + imports + etc is

dawn solstice Apr 20, 2024, 4:24 PM

#

I just use TeXstudio and whatever it recommends

burnt echo Apr 20, 2024, 4:24 PM

#

if someone finds a counterexample it makes sense to put it into ci/cd

#

yeah I have TeXstudio and it was uh

#

it was displeased with many things that are normal constructions on overleaf

#

in ways i could not easily resolve

#

i am not enough of a purist to use texstudio if it is going to take me more than an hour to figure out what the "correct" environment is

#

rebuilding pdf btw

#

for bonus mythological creatures i would like to suggest a sphinx

keen reef Apr 20, 2024, 4:28 PM

#

probably unicorn is fine, i wasnt trying to bikeshed the project right out of the gate lol

burnt echo Apr 20, 2024, 4:28 PM

#

keen reef probably unicorn is fine, i wasnt trying to bikeshed the project right out of th...

i wish i worked with you

#

it does make sense to consider that, it's not like it's a very high-investment test

dawn solstice Apr 20, 2024, 4:29 PM

#

I have to resist the temptation to suggest automating this by writing a script to hit up an inference provider

#

btw just cloned the repo and built it in TeXstudio with no changes, works fine

willow remnant Apr 20, 2024, 4:30 PM

#

burnt echo i wish i worked with you

before you decide to switch from a unicorn, have you submitted the Animal Safety Evaluation, and the Project Setup document, and the Cultural Sensitivity Checklist?

burnt echo Apr 20, 2024, 4:31 PM

#

willow remnant before you decide to switch from a unicorn, have you submitted the Animal Safety...

we also need two or three people telling me that the existing solution in open source doesn't use animals at all because they think animals are lame, and that there's another internal project that is devoted to benchmarks based upon SVG art and maybe we should do SVG art instead. but also the other internal project doesn't work

#

i can tell them "i checked the svg team and their code doesn't build" but they will suggest i check if the svg guys' project works at least once a month

#

anyway if we do a separate critter it should be a separate tex file and separate pdf

#

because the benefit of scrolling across different critters is basically nil

#

we are at the exact limit of how much i care about this rn but anyone's free to pr it with more stuff if they want

fiery ibex Apr 20, 2024, 7:15 PM

#

If someone wants to work on this, just make it a generic script to plug in any HF model and then evaluate the "unicorn-ness" score through the infinite public VLMs out there.

#

Or I guess use GPT4V

burnt echo Apr 20, 2024, 7:56 PM

#

fiery ibex If someone wants to work on this, just make it a generic script to plug in any H...

i would be skeptical of doing this without saving outputs to have a human spot check

fiery ibex Apr 20, 2024, 7:57 PM

#

burnt echo i would be skeptical of doing this without saving outputs to have a human spot c...

well you could make it to multiple and then review it manually

#

its not like we'd be testing hundreds of models

#

it would be on the scale of a few dozen likely

burnt echo Apr 20, 2024, 7:58 PM

#

i mean i don't care if we give it a score i just wouldn't want to replace the human spot check

#

i don't want a score, i want to be able to see visually how well the model is doing at this specific task

#

i trust that method more than i trust the scoring model

fiery ibex Apr 20, 2024, 8:00 PM

#

ohk then someone could just implement the part where you just put in an HF model name 🤷‍♂️

burnt echo Apr 20, 2024, 8:02 PM

#

if someone wants to do scoring too that's fine by me but kind of the appeal is to have a visual output

dawn solstice Apr 20, 2024, 8:46 PM

#

fiery ibex ohk then someone could just implement the part where you just put in an HF model...

I was thinking of writing a script that would use Together/whatever other inference provider because running the bigger models is often more trouble than it's worth, especially for a one-off

#

...would be interesting to see how finetunes impact unicorn quality though

burnt echo Apr 20, 2024, 8:50 PM

#

dawn solstice ...would be interesting to see how finetunes impact unicorn quality though

the rumor would be that it is uniformly worse

#

iirc it's buried deep in some recording of a talk by the ms eval people

#

gpt-4 got progressively worse at the unicorn throughout finetuning

dawn solstice Apr 20, 2024, 8:52 PM

#

yes, that's from the now-infamous sparks of AGI paper/talk

#

the unicorn got worse as GPT-4 was finetuned

#

I don't think that necessarily generalises though

burnt echo Apr 20, 2024, 8:52 PM

#

this would be a test of "is this a universal property of finetunes"

#

yeah

#

it doesn't necessarily generalize but i would be surprised if it doesn't

#

surprising tests are the best tests

dawn solstice Apr 20, 2024, 8:53 PM

#

dawn solstice I was thinking of writing a script that would use Together/whatever other infere...

what I'd also like to do is to include the generation settings here and try to make it as reproducible as possible

#

that should be interesting for the purposes of investigating finetunes as well

fiery ibex Apr 20, 2024, 10:35 PM

#

dawn solstice the unicorn got worse as GPT-4 was finetuned

That's just the alignment tax

#

apparently, according to louis it doesn't exist (If alignment is done "properly") - but seeing how OAI, GDM and now FAIR all report it, I kinda doubt him

tacit fossil Apr 24, 2024, 4:31 PM

#

#

I'd be down to help out and throw around ideas for comedic value if nothing else! For grading gpt4v makes sense - I wonder if we could use Mechnical Turk? I've always wanted a toy project to try Mechanical Turk on. No idea on the pricing or reliability tho.

burnt echo Apr 24, 2024, 4:57 PM

#

tacit fossil I'd be down to help out and throw around ideas for comedic value if nothing else...

if you wanna organize galleries per model or large banks against many models go for it, i would merge results and make sure they were reachable from repo root readme if they were well documented enough to feel like signal

tacit fossil Apr 25, 2024, 1:48 AM

#

https://x.com/yuntiandeng/status/1682066606044635136

Yuntian Deng (@yuntiandeng) on X

In light of recent discussions on GPT4's behavior changes post June update, I noticed interesting trends via https://t.co/4VOT8Ko91m. It seems GPT3.5's unicorn drawings improved while GPT4's declined. Check out these drawings and compare to the preupdate ones in my previous tweet

#

i compiled some results here: https://github.com/brsolo/Unicorn-Test i played around trying to make galleries but couldn't figure out how to get the latex working for that. if i were to attempt again, it think if i were to try that again would just export the tikz's as svg and create a gallery from those. i feel like though this is mostly a UI challenge.

GitHub

GitHub - brsolo/Unicorn-Test

Contribute to brsolo/Unicorn-Test development by creating an account on GitHub.

burnt echo Apr 25, 2024, 1:49 PM

#

tacit fossil i compiled some results here: https://github.com/brsolo/Unicorn-Test i played ar...

nice

visual ferry Apr 29, 2024, 4:45 PM

#

gpt2-chatbot:

Draw a unicorn in TikZ

Creating a drawing of a unicorn using TikZ in LaTeX can be a fun exercise in learning how to use this powerful tool for creating graphic elements in documents. Below, I'll provide you with a basic example of how to draw a stylized unicorn. This example will cover drawing the outline of a unicorn, including its body, head, horn, and other details.

First, make sure you have the TikZ package installed and included in your LaTeX document. You can do this by adding \usepackage{tikz} in the preamble of your document.

Here’s a simple LaTeX script to draw a unicorn:

\documentclass{article}
\usepackage{tikz}

\begin{document}
\begin{tikzpicture}
    % Body
    \draw (0,0) ellipse (1.5 and 0.5);
    % Head
    \draw (-0.5,0.5) arc (150:30:0.6 and 0.8);
    % Rear
    \draw (0.5,-0.5) arc (-30:220:0.3 and 0.4);
    % Front leg
    \draw (-1.1,-0.5) -- (-1.1,-1.5);
    \draw (-0.9,-0.5) -- (-0.9,-1.4);
    % Rear leg
    \draw (1,-0.5) -- (1,-1.5);
    \draw (1.2,-0.5) -- (1.2,-1.4);
    % Tail
    \draw (1.5,0) .. controls (2,0.5) and (2,-1) .. (1.5,-0.5);
    % Horn
    \draw (-0.6,1.3) -- (-0.9,1.8);
    % Mane
    \draw (-0.5,1) .. controls (-0.8,1.2) .. (-0.5,1.4);
    \draw (-0.5,0.8) .. controls (-0.8,1) .. (-0.5,1.2);
    \draw (-0.5,0.6) .. controls (-0.8,0.8) .. (-0.5,1);
    \draw (-0.5,1.2) .. controls (-0.8,1.4) .. (-0.5,1.6);
    \draw (-0.5,1) .. controls (-0.8,1.2) .. (-0.5,1.4);
    \draw (-0.5,0.8) .. controls (-0.8,1) .. (-0.5,1.2);
    \draw (-0.5,0.6) .. controls (-0.8,0.8) .. (-0.5,1);
    \draw (-0.5,1.2) .. controls (-0.8,1.4) .. (-0.5,1.6);
    % Eye
    \fill (-0.6,1.1) circle (0.05);
\end{tikzpicture}
\end{document}

Explanation of the Code

    Body and Head: The body is drawn using an ellipse, and the head is added with an arc, tailored to make a smooth transition from the body.
    Legs: Simple lines are used for the legs, with slight adjustments to make the front and rear legs distinct.
    Tail: A curved line using controls to give the tail a flowing look.
    Horn: A simple line for the horn, positioned on the head.
    Mane: Several bezier curves representing the mane, using controls to create waves.
    Eye: A small filled circle represents the eye.

This TikZ script will create a basic but recognizable unicorn figure. You can enhance and refine this drawing by adjusting the parameters and adding more details as needed, such as more complex mane designs, facial features, or a more dynamic pose. TikZ is very flexible and allows for a lot of creativity once you get the hang of the basics.

keen reef May 4, 2024, 1:00 AM

#

https://fxtwitter.com/SebastienBubeck/status/1786108589700177954

FxTwitter / FixupX

💬 25 🔁 22 ❤️ 346 👁️ 92.9K

Sebastien Bubeck (@SebastienBubeck)

She is coming back, soon ....

misty marsh May 4, 2024, 9:08 PM

#

idea: take the list of imagenet classes (just to get a random ontology), tell model "output tikz code to draw a <insert class name>"

#

use some sort of clip-based loss to evaluate match

#

boom scalable unicorn test

burnt echo May 4, 2024, 9:46 PM

#

misty marsh boom scalable unicorn test

you're relying on the clip model being good enough to evaluate it tho

misty marsh May 4, 2024, 9:47 PM

#

I expect it to be correlated with human judgement

#

#

not terrible

#

I'd expect a soft score measuring "Unicorn-ness" to work better

dawn solstice May 13, 2024, 5:43 PM

#

\documentclass{standalone}
\usepackage{tikz}

\begin{document}

\begin{tikzpicture}
    % Draw the head
    \draw[thick] (0,0) 
        .. controls (1, 1.5) and (3, 1.5) .. (4,1) 
        .. controls (4.5, 0.8) and (4.5,-0.6) .. (4,-1)
        .. controls (3.5, -1.5) and (1.5, -1.5) .. (0,0);

    % Draw the horn
    \draw[thick] (2, 1.5) -- (2.5, 4)
            -- (3, 1.8);

    % Draw the ear
    \draw[thick] (1.5, 1.5) 
        .. controls (1.8, 2.5) and (2.2, 2.5) .. (2.5, 1.5);

    % Draw the mane
    \draw[thick, purple] (2, 1.2) 
        .. controls (1, 2) and (0, 2) .. (-1, 1.5);
    \draw[thick, purple] (2, 0.8) 
        .. controls (0.8, 1.5) and (-0.2, 1.5) .. (-1.2, 1);
    \draw[thick, purple] (2, 0.4) 
        .. controls (0.8, 1) and (-0.5, 1) .. (-1, 0.5);
    \draw[thick, purple] (2, 0.1) 
        .. controls (1, 0.5) and (0, 0.5) .. (-0.5, 0.1);
        
    % Draw the eye
    \draw[thick] (3, 0) circle (0.2);
    \fill (3, 0) circle (0.1);
    
    % Draw the mouth
    \draw[thick] (3.5, -0.5) 
        .. controls (3.8, -0.8) and (3.2, -0.8) .. (3.5, -0.5);
\end{tikzpicture}

\end{document}

fast capeBOT May 13, 2024, 5:43 PM

#

Philpax (jax/acc)
Compile Error! Click the errors reaction for more information.
(You may edit your message to recompile.)

dawn solstice May 13, 2024, 5:43 PM

#

thinkies

dawn solstice Jul 20, 2024, 9:36 PM

#

Here is a simple example of a unicorn drawn using TikZ:

\documentclass{article}
\usepackage{tikz}

\begin{document}

\begin{tikzpicture}
  % body
  \draw[fill=white] (0,0) ellipse (2cm and 1.5cm);
  
  % head
  \draw[fill=white] (-1.5,2.5) circle (1cm);
  
  % horn
  \draw[fill=white] (-2.5,3.5) -- (-1.5,4.5) -- (-0.5,3.5) -- cycle;
  
  % ears
  \draw[fill=white] (-1,3) -- (-0.5,4) -- (0,3) -- cycle;
  \draw[fill=white] (0.5,3) -- (1,4) -- (1.5,3) -- cycle;
  
  % eyes
  \draw[fill=black] (-1.2,2.8) circle (0.2cm);
  \draw[fill=black] (0.2,2.8) circle (0.2cm);
  
  % legs
  \draw[fill=white] (-1.5,0) -- (-1.5,-1.5) -- (-0.5,-1.5) -- (-0.5,0) -- cycle;
  \draw[fill=white] (0.5,0) -- (0.5,-1.5) -- (1.5,-1.5) -- (1.5,0) -- cycle;
  
  % tail
  \draw[fill=white] (2,0) -- (2,-1.5) -- (1.5,-1.5) -- (1.5,0) -- cycle;
\end{tikzpicture}

\end{document}

This code will produce a simple unicorn diagram. You can adjust the coordinates and shapes to create a more complex and detailed unicorn.

Note: You need to have TikZ package installed in your LaTeX distribution to run this code.

Also, you can use online LaTeX editors like Overleaf or ShareLaTeX to run this code without installing anything on your local machine.

#

hmm, no bot?

#

(wave claims this is 405b, the actual code does not seem that impressive but maybe it looks good when rendered)

#

nvm #off-topic message

keen reef Jul 20, 2024, 9:50 PM

#

rip