#LLM Extract Does Not Do Whole Page?


static blaze
#

@clever steppe I'm trying to extract structured data from a website, but not all of the data is being scraped. Only the first entries at the top of the page come through. Any suggestions?

full cradle
#

Hey, could you share your request URL and schema so we can replicate the issue?

#

My guess is that it has to do with the page loading content on scroll.

static blaze
#

Thank you very much for helping! Just a caveat: I am not a coder or developer, so there's a likelihood I am missing something in the request. Here you go:

#
clever steppe
#

Where is your extraction schema? That's the most important parameter to pass because it tells the model exactly what format it should return the data in.

static blaze
#

Caleb! Great to hear from you. In my ignorance, I put the schema in the prompt. The extraction produced the desired result as far as structuring the output correctly. However, it stopped after 10 extractions when there were over 500 more to do. Should I explicitly state the extraction schema?

clever steppe
#

Yes, explicitly state the extraction schema!

static blaze
#

Caleb, understood, and thank you for your time. I declared the schema to scrape data off a simpler website. Here is the updated code I used:

#

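from pydantic import BaseModel


class ExtractSchema(BaseModel):
    # Fields for a single property listing
    Address: str
    Location: str
    Price: int
    Beds: int
    Baths: float  # float allows half baths such as 2.5
    SqFt: int
    Px_SqFt: int  # price per square foot
    Time_On_Redfin: str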
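
For reference, here is a minimal sketch of the same fields written out as a plain JSON Schema dictionary, using only the Python standard library so it runs without Pydantic installed. It assumes the extraction endpoint accepts a JSON Schema object, which the thread does not confirm. The `listings` wrapper key and the `missing_fields` helper are hypothetical names added for illustration. The wrapper makes the schema describe a list of listings rather than a single one; whether that resolves the early cutoff described above is not established in this thread.

```python
import json

# Per-listing fields, mirroring the ExtractSchema model shared in the thread.
LISTING_PROPERTIES = {
    "Address": {"type": "string"},
    "Location": {"type": "string"},
    "Price": {"type": "integer"},
    "Beds": {"type": "integer"},
    "Baths": {"type": "number"},
    "SqFt": {"type": "integer"},
    "Px_SqFt": {"type": "integer"},
    "Time_On_Redfin": {"type": "string"},
}

# Every field is required, matching a Pydantic model with no default values.
LISTING_SCHEMA = {
    "type": "object",
    "properties": LISTING_PROPERTIES,
    "required": list(LISTING_PROPERTIES),
}

# Hypothetical wrapper: "listings" is an assumed key name, not from the thread.
# It tells the model that the expected result is an array of listing objects.
PAGE_SCHEMA = {
    "type": "object",
    "properties": {
        "listings": {"type": "array", "items": LISTING_SCHEMA},
    },
    "required": ["listings"],
}


def missing_fields(listing: dict) -> list:
    """Return the required field names that are absent from one extracted listing."""
    return [name for name in LISTING_SCHEMA["required"] if name not in listing]


if __name__ == "__main__":
    # Print the schema so it can be pasted into a request body.
    print(json.dumps(PAGE_SCHEMA, indent=2))
```

A helper like `missing_fields` can be run over each returned record to spot incomplete extractions before trusting the output.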