#Empty/Newline output from the model


dense mortar
#

I am running inference on unsloth/ models (codellama-7b and mistral-7b) for code completion, and quite regularly I get no output from the model, or output that just fills the max_new_tokens limit with newlines.

I am wondering if anyone here has any pointers on what might be the cause of that or how to investigate it better.

hidden flare
#

Hey! Oh, this is not on a finetuned model, right? On Codellama, hmm - maybe it's the prompt format

#

Oh wait, Codellama doesn't have a prompt format, my bad

#

Could it be possible for you to give an example? 🙂 Maybe a screenshot?

lethal thorn
#

@dense mortar has the issue been solved yet? 🙂

dense mortar
#

@lethal thorn Hello! Sorry, I was trying to reproduce the results and got a different thing now)

So here is an example prompt:

#include <iostream>
#include <string>
#include <vector>
#include <cstring>
#include <climits>
#include <algorithm>
#include <map>
using namespace std;

int main() {
int a, b;
while (cin >> a >> b) {
int c = a + b;
int res = 0;
if (c == 0) {
cout << 1 << endl;
continue;
}
while (c) {
c /= 10;
res ++;
}

And here is an answer (new tokens):

uminate

#

Or for example:

int main(void)
{
std::cin.tie(0);
std::ios::sync_with_stdio(false);

double a, b, c, d, e, f, x, y;
while (std::cin >> a >> b >> c >> d >> e >> f) {
x = (b * f - e * c) / (b * d - a * e);
y = (c - a * x) / b;
if (x == 0) x = 0;
if (y == 0) y = 0;
std::cout << std::fixed << std::setprecision(

Output:

uminate:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: (Discord character limit reached at this point, but the output goes on like this)

#

And even when it kinda finishes the thing correctly I still get the "uminate" thing:

#include<iostream>

using namespace std;

int main()
{
int h[10];
int i,j;
int temp=0;

for(i=0; i<10; i++)
{
    cin>>h[i];
}

for(i=0; i<9; i++)
{
   for(j=i+1; j<10; j++)
   {
         if(h[j]>h[i])
         {
              temp=h[j];
              h[j]=h[i];
              h[i]=temp;    
          }
   } 
    }

    for(i=0; i<3; i++)
#

uminate
cout<<h[i]<<" ";
}

cout<<endl;

for(i=0; i<10; i++)
{
    cout<<h[i]<<" ";
}

cout<<endl;

for(i=0; i<10; i++)
{
    cout<<h[i]<<" ";
}

<this block of code repeats until generation limit>
#

As a prompt I am just supplying a string with code, nothing else

hidden flare
#

Do you know if the original HF example works?

#

import socket\n\ndef ping_exponential_backoff(host: str):

dense mortar
#

This is the output with max_new_tokens=128:

import socket

def ping_exponential_backoff(host: str):
"""
Ping a host with exponential backoff.

:param host: The host to ping.
:return: True if the host is reachable, False otherwise.
"""
for i in range(1, 10):
    try:
        socket.create_connection((host, 80), 1).close()
        return True
    except OSError:
        pass
    time.sleep(2 ** i)
return False

def ping_exponential_backoff_with_timeout(host: str

Giving it more new tokens makes it generate more exponential_backoff_with_<something> functions until it hits the limit. Still, this is much more sensible output than what I get for the C++ code.

dense mortar
#

I have found the issue. The problem was in the way I was forming prompt batches and padding each one.

Processing each prompt individually works fine; I am working on the fix for batching.
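
For anyone hitting the same symptom: the thread does not say exactly what was wrong with the padding, so the following is only a sketch of one common cause, not the fix used here. With decoder-only models, padding on the right puts pad tokens between the prompt and the first generated token, which can produce garbage or empty continuations. Left padding with a matching attention mask avoids that. The helper name `left_pad_batch` and the pad id `0` are hypothetical and chosen for illustration.

```python
from typing import List, Tuple


def left_pad_batch(
    sequences: List[List[int]], pad_id: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad token-id sequences to a common length.

    Returns (input_ids, attention_mask). Mask entries are 0 for padding
    and 1 for real tokens, so every prompt ends at the last position.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        # Padding goes in front so generation continues right after the prompt.
        input_ids.append([pad_id] * n_pad + list(seq))
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


# Hypothetical token ids standing in for two tokenized prompts.
batch_ids, batch_mask = left_pad_batch([[5, 6, 7], [8, 9]], pad_id=0)
```

With Hugging Face tokenizers the same effect usually comes from setting `padding_side="left"`, making sure a `pad_token` is defined, and passing the returned `attention_mask` to `generate` together with `input_ids`. Comparing batched output against single-prompt output, as done above, is a quick way to confirm whether padding is the culprit.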