#Seeking Advice: Finding the right column and cell where to fill in the text using an LLM

44 messages · Page 1 of 1 (latest)

latent hedge
#

I'm encountering a problem here. I have, let's say, an Excel file like the one in the attached image. I can't find a way to make the AI find the right columns and cells to fill in the text. I tried converting it to a CSV and analyzing it, but it always says some BS even though I tell it what the column contains (e.g. "search for a column where exact descriptions are written"). My files don't contain the column names in the first row.

finite sedge
#

hey
do this:

#

Detect where the real headers start (row 9 in your case).

Slice the sheet to only that part.

Rename headers → normalize.
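
The three steps above can be sketched roughly as follows. This is only a minimal illustration using the standard library, not the script anyone in the thread actually used. The heuristic (the header is the first row with at least `min_filled` non-empty, non-numeric cells), the function names, and the sample data are all assumptions made for the example.

```python
import csv
import io
import re


def is_number(text):
    """True if the cell text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def find_header_row(rows, min_filled=3):
    """Index of the first row with >= min_filled non-empty, non-numeric cells (assumed heuristic)."""
    for i, row in enumerate(rows):
        filled = [c.strip() for c in row if c.strip()]
        if len(filled) >= min_filled and not any(is_number(c) for c in filled):
            return i
    return None


def normalize(name):
    """Lower-case a header and replace runs of other characters with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def slice_table(rows):
    """Return (header_row_index, normalized_header, body_rows)."""
    start = find_header_row(rows)
    if start is None:
        raise ValueError("no header row found")
    return start, [normalize(c) for c in rows[start]], rows[start + 1:]


# Hypothetical sheet: title and metadata rows first, real headers on row 4.
SAMPLE = """Project X,,,
,,,
Created by,Someone,,
ID,Requirement,Comment,Response
1,It should be yellow,,
2,It should be fast,,
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))
start, header, body = slice_table(rows)
print(start, header, len(body))
```

A heuristic like this will misfire on sheets whose metadata rows are wide, so the threshold would likely need tuning per file family.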

timid flume
#

I've had my own set of problems with getting an LLM to help me with spreadsheets. I never did get a solution to that...
Maybe we can work together to create a prompt that works?

finite sedge
#

@timid flume elaborate on your issue

timid flume
#

Sometimes, when I tried getting Gemini to help with a spreadsheet problem, it assumed I had CSV inputs instead of working with data directly in a spreadsheet, even though I had already said that I was working in Google Sheets.

Other times, it misinterprets copy/pasted data or gets the cell references wrong even though I told it enough that it should have been able to figure it out correctly.

latent hedge
# finite sedge Detect where the real headers start (row 9 in your case). Slice the sheet to on...

hi, thanks for your answer, but I don't think it's what I meant. Sorry if I explained it wrong, I will try again:

  1. I'm converting my Excel file to CSV. The Excel file looks different each time, but the main idea of the columns is the same, so for example there will always be a column with requirements and a comment column.
  2. I tried splitting it into slices, but in the beginning the content of the comment column, for example, is empty, so if I have more columns the LLM won't be able to identify which of them represents what.
  3. Once it finds the header of each column, I need the exact position like in Excel, e.g. A2 or B6.
  4. Once I've got the position, I can use each cell below the column header for the content that is filled in by the LLM based on what is needed.
#

idk if you got the idea
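
For the "exact position like in Excel" part, a small sketch of turning zero-based row and column indexes from a parsed CSV into an A1-style address. It assumes the CSV export kept any leading empty rows and columns, so that CSV indexes still line up with the sheet; the function names are made up for the example.

```python
def column_letter(index):
    """Zero-based column index to Excel letters: 0 -> A, 25 -> Z, 26 -> AA."""
    letters = ""
    n = index + 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def cell_address(row_index, col_index):
    """Zero-based (row, column) to an A1-style address such as D6."""
    return f"{column_letter(col_index)}{row_index + 1}"


# Row index 5, column index 3 is the sixth row of the fourth column.
print(cell_address(5, 3))
```

With that, the model (or a script) only has to report a row and column index, and the address is computed rather than guessed.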

timid flume
latent hedge
#

I have such a script already. I need to show the LLM how to find the header columns

#

when the form of the csv changes

glad sentinel
#

Are you looking to fine-tune an LLM to do this? If so, do you have enough example data? Then it's just a matter of training. Or do you mean something else?

latent hedge
#

I don't really know how to prepare the data for that kind of training

glad sentinel
#

Awesome, happy to help someone interested in fine-tuning models. Fine-tuning depends on two things: is it a common/easy pattern to learn? If yes, you need less data; if no, you need a lot of data. By "less", for an easy pattern I'd say around 500 samples of Q/A-type data to train the model on. CSV is easier for an LLM to learn from, so what LLMs want is input CSV and output CSV to train on. Do you have at least around 500 examples? Do you know how to do fine-tuning (transformers/weights/LoRA etc.)? If not, this is probably beyond your ability and you need to learn that first.

timid flume
#

is csv better than json?

glad sentinel
# timid flume is csv better than json?

Either is OK. It just needs clearly structured data in plain-text format to tokenize. As you probably want the answer in CSV, it's probably easier to just leave it in CSV. Excel files etc. will dirty the data with their internal formatting, so don't use them. What you want is your input CSV and an output CSV of what you want the answer to look like. Then feed that into a Q/A JSON format, e.g.

{
"instruction": "Give three tips for staying healthy.",
"input": "",
"output": "1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule."
},

Replace the instruction with whatever instruction you are training on, use your CSV file as plain text for the input, and use your answer CSV file as plain text for the output.
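
A minimal sketch of building one such record from a pair of CSV strings and serializing it as a single JSON line. The field names follow the example above; the instruction wording, the tiny sample CSV, and the `column,cell` answer layout are hypothetical choices for illustration, not a required format.

```python
import json


def make_record(instruction, input_csv, output_csv):
    """Build one instruction/input/output training record."""
    return {"instruction": instruction, "input": input_csv, "output": output_csv}


# Hypothetical sample: headers sit in row 2, columns B and C.
record = make_record(
    "Locate the header cells of the Requirement and Response columns.",
    ",,\n,Requirement,Response\n,It should be yellow,\n",
    "column,cell\nRequirement,B2\nResponse,C2\n",
)

# One JSON object per line (JSONL) is a common layout for training files.
line = json.dumps(record, ensure_ascii=False)
print(line)
```

Whether the trainer expects a JSON array or JSONL depends on the tooling, so check the guide you are following before generating hundreds of these.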

If you've not done fine tuning yet give this a go first - https://github.com/aatri2021/qwen-lora-windows-guide

latent hedge
#

Like, that's the first problem I'm encountering, before the LLM even answers any question

glad sentinel
#

Maybe an example of what you are trying will help me understand, as it's not clear what you mean. Are you saying you have a spreadsheet with a specific type of column name that is randomly located in the spreadsheet, with data under its column that you want to take in as input and have the LLM produce an output and put it somewhere else? Can you give a simple example? The snippet above in the table you posted doesn't explain that.

latent hedge
latent hedge
#

I want the LLM to find the data, so it knows D6 is the Requirement column header, with the data in D7-D21 and D23-D41

#

i just pasted random things into this column but normally there will be stuff like "It should do this and this"

#

and then, using the LLM, I need to find the Response column header F6 and then put the answers to the requirements there

#

so when D7 says "It should be yellow", then F7 would be the answer to that: "The system is yellow"
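
Once the two header cells are known, pairing each requirement cell with its response cell can be done without a model. A rough sketch under stated assumptions: the grid is a list of rows from the CSV, indexes are zero-based, blank cells separate blocks (like the gap between D21 and D23), and all names plus the scaled-down sample grid are invented for the example.

```python
def column_letter(index):
    """Zero-based column index to Excel letters: 0 -> A, 26 -> AA."""
    letters = ""
    n = index + 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def data_runs(rows, header_row, col):
    """(first, last) row-index pairs of contiguous non-empty cells below the header."""
    runs, start = [], None
    for r in range(header_row + 1, len(rows)):
        cell = rows[r][col].strip() if col < len(rows[r]) else ""
        if cell and start is None:
            start = r
        elif not cell and start is not None:
            runs.append((start, r - 1))
            start = None
    if start is not None:
        runs.append((start, len(rows) - 1))
    return runs


def pair_cells(rows, header_row, req_col, resp_col):
    """List of (requirement_address, response_address) pairs, one per filled requirement."""
    pairs = []
    for first, last in data_runs(rows, header_row, req_col):
        for r in range(first, last + 1):
            pairs.append((f"{column_letter(req_col)}{r + 1}",
                          f"{column_letter(resp_col)}{r + 1}"))
    return pairs


# Scaled-down stand-in for the sheet: headers in D2 and F2, one blank row in the data.
grid = [
    ["", "", "", "", "", ""],
    ["", "", "", "Requirement", "", "Response"],
    ["", "", "", "It should be yellow", "", ""],
    ["", "", "", "It should be fast", "", ""],
    ["", "", "", "", "", ""],
    ["", "", "", "It should be quiet", "", ""],
]
pairs = pair_cells(grid, header_row=1, req_col=3, resp_col=5)
print(pairs)
```

The LLM then only needs to produce the answer text for each requirement, and the script decides which cell it goes into.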

#

I understand the question-answer fine-tuning part, so I need a large dataset, okay. But I have no clue how to make the LLM understand the structure

timid flume
#

you have merged cells? That could complicate things

latent hedge
#

thats the problem yes

timid flume
#

how did you even convert it to a csv then?

latent hedge
#

using pandas

timid flume
#

I mean, how are merged cells represented there?

latent hedge
#

in that example it's just the first cell, so A is the title and B-G are just empty cells

timid flume
#

so it doesn't preserve it, instead just taking the first cell as the cell that it's in

latent hedge
#

it takes the first merged cell

#

so if i merge D-F then its in D

timid flume
#

are you sure this is what you want to happen for your merged cells?

latent hedge
#

yes

timid flume
#

So just make more of those and train the AI with it

latent hedge
#

so the input is the CSV file, with the question "where is the requirement data?"

timid flume
#

sure

glad sentinel
#

As Dude said above, that will work: question "where is x", input your CSV, output "it's at column, row". But I question why you even need to do that. If the heading is always marked "Requirement" etc., you can find it programmatically much more easily than training an LLM to find the location. Just ask an LLM how to create a script to align the data to a set starting point. Give it that script as a tool to call, or just have the script shift it all to set columns/rows.
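
A rough sketch of that programmatic route, assuming the headers are spelled predictably and all sit on the same row. The alias lists, function names, and sample grid are hypothetical; the point is only that a plain keyword scan can locate the header cells and shift the data to fixed columns.

```python
# Hypothetical spellings each header might appear under.
ALIASES = {
    "requirement": ("requirement", "requirements"),
    "response": ("response", "answer"),
}


def find_headers(rows, aliases):
    """Map each canonical name to the (row, col) of the first cell matching an alias."""
    found = {}
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            text = cell.strip().lower()
            for name, words in aliases.items():
                if name not in found and text in words:
                    found[name] = (r, c)
    return found


def realign(rows, found, order):
    """Rebuild the table with headers in the first row and columns in a fixed order."""
    missing = [name for name in order if name not in found]
    if missing:
        raise KeyError(f"headers not found: {missing}")
    header_row = found[order[0]][0]  # assumes all headers share this row
    out = [list(order)]
    for row in rows[header_row + 1:]:
        out.append([row[found[n][1]] if found[n][1] < len(row) else "" for n in order])
    return out


grid = [
    ["", "", "", "", "", ""],
    ["", "", "", "Requirement", "", "Response"],
    ["", "", "", "It should be yellow", "", ""],
    ["", "", "", "It should be fast", "", ""],
]
found = find_headers(grid, ALIASES)
table = realign(grid, found, ["requirement", "response"])
print(found)
print(table)
```

If the header wording varies too much for an alias list, that is the point where asking a model to classify just the candidate header row becomes more reasonable than training it on whole files.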

latent hedge