Hey everyone, just finished a short mid-week hack for a website that allows you to talk to GPT-3.5 about one or more URLs and files (of nearly any format and size!): https://cachechat.pagekite.me/. The best part: everything's built simply with numpy and OpenAI's API, no vector DBs or anything overcomplicated. If you like what you see, please star the repo: https://github.com/andrewhinh/CacheChat
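For readers wondering what "numpy instead of a vector DB" can look like, here is a minimal sketch. It assumes the chunk and query embeddings have already been computed (for example with OpenAI's embeddings API); the function name `top_k_chunks` and the toy 3-dimensional vectors are hypothetical and not taken from the CacheChat repo.

```python
import numpy as np

def top_k_chunks(query_vec, chunk_vecs, k=2):
    """Return (indices, scores) of the k chunks most similar to the query."""
    q = np.asarray(query_vec, dtype=float)
    m = np.asarray(chunk_vecs, dtype=float)
    # Normalise so that a dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    scores = m @ q
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order.tolist(), scores[order].tolist()

# Toy stand-ins for real embeddings (assumption: real ones are much longer).
chunks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
query = [1.0, 0.0, 0.0]
indices, scores = top_k_chunks(query, chunks, k=2)
print(indices, scores)
```

With a few thousand chunks, a single matrix-vector product like this is usually fast enough that a dedicated vector store is not needed.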
#CacheChat
Thanks for pointing it out. It’s a currently known issue, and will be fixed soon
Alright website fixed, thanks
Can it work with the contents of websites, not just article URLs?
Because my output looked generic instead of content-specific to the URL
Yeah that seems to be the work of a minimum similarity requirement between the prompt and sources, guess I should remove it for the sake of general questions like this. Again, thanks for pointing this out!
Issue fixed, thanks!
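The minimum-similarity requirement discussed above can be sketched roughly like this. The thread only says the requirement would be removed; the fallback shown here (keep the best match when nothing clears the bar) is one possible alternative, and `select_sources`, the 0.8 threshold, and `fallback_k` are all assumptions made for illustration.

```python
def select_sources(scores, min_similarity=0.8, fallback_k=1):
    """Pick source indices by similarity, best first.

    If no source reaches min_similarity (typical for general questions
    such as "summarise this page"), fall back to the top fallback_k
    sources instead of returning nothing.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = [i for i in ranked if scores[i] >= min_similarity]
    return kept if kept else ranked[:fallback_k]

specific = select_sources([0.90, 0.50, 0.85])  # two sources clear the bar
general = select_sources([0.20, 0.50, 0.30])   # none do, so use the best one
print(specific, general)
```

Without a fallback, a general prompt leaves the model with no sources at all, which would explain an answer that reads as generic rather than specific to the URL.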
@snow garnet your app didn't parse a URL I inputted ||https://www.kbp.org.ph/accreditation/|| for the question.
The prompt was "what requirements are needed"
Let me check it out
Well that's an invalid URL:
Though yes it should've failed earlier
Thank you for pointing it out. It's a server issue 😦
I tried it on another one -- https://www.liquor.com/brands/tequila-rose/
How'd it go?
Oh interesting, will look into it, thanks
Btw, it's hard to fail early for a URL since web parsers get easily confused even with bad web pages. Will look into how to make this better
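One way to fail earlier on bad URLs, sketched under assumptions: check the URL's shape first, then the HTTP status, then whether the parser found a reasonable amount of text. The function `url_problem` and the `MIN_CHARS` cutoff are hypothetical; the status code and extracted text are passed in so the sketch stays independent of whichever HTTP client and parser the app uses.

```python
from urllib.parse import urlparse

MIN_CHARS = 200  # assumed cutoff for "too little readable text"

def url_problem(url, status_code=None, extracted_text=""):
    """Return a short reason the URL is unusable, or None if it looks fine."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return "not an http(s) URL"
    if status_code is not None and status_code >= 400:
        return f"server returned HTTP {status_code}"
    if status_code is not None and len(extracted_text.strip()) < MIN_CHARS:
        return "page had too little readable text"
    return None

print(url_problem("ftp://example.com/file"))
print(url_problem("https://example.com/missing", status_code=404))
print(url_problem("https://example.com/", status_code=200, extracted_text="x" * 500))
```

A shape check alone would not have caught the server issue mentioned earlier, which is why the status code and the amount of extracted text are checked as well.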
Ah, seems I forgot to do a unit test case. Fixing it now
Literally ah
Turns out that website had a TON of junk, which made the context window overflow for the first question. Those BeautifulSoup web parsers are nice, but they get confused sometimes 🙈
Yes, sometimes some engineering is needed to get the gist
Yup, fixed now
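A rough sketch of the "junk overflows the context window" problem and one way to contain it: drop non-content tags and cap the text length before building the prompt. To stay dependency-free it uses the standard library's `html.parser` rather than BeautifulSoup, and `SKIP_TAGS`, `page_text`, and the `max_chars` budget are assumptions, not the fix that was actually shipped.

```python
from html.parser import HTMLParser

# Tags whose contents are rarely useful as question-answering context.
SKIP_TAGS = {"script", "style", "nav", "footer", "header", "noscript"}

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped tag
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def page_text(html, max_chars=8000):
    """Extract visible text and cap it at max_chars characters."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)[:max_chars]

sample = (
    "<html><head><style>p{}</style><script>var x = 1;</script></head>"
    "<body><nav>Menu</nav><p>Hello world</p><footer>Copyright</footer></body></html>"
)
cleaned = page_text(sample)
print(cleaned)
```

A character cap is only a crude stand-in for a token budget, but it keeps a single noisy page from crowding out everything else in the prompt.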
Tried also a jpg file with text and a docx file. But there is another error
Ah yes the issue of no text being found, on it
Fixed it
Nvm, now it's fixed 😎 Apologies for the confusion 🙈 Had a logic error that wasn't complicated matrix math!
Thanks for the tests!
You could have provided acceptable file formats as a list so people would be oriented about which to upload.
My pleasure.
Yep you’re right, thanks
Really love this idea but having trouble ingesting documents when running my own instance. Your hosted website had no problem ingesting this PDF and other documents but I couldn't upload after running streamlit
Thanks for trying it out! That’s weird since I have the current main branch code deployed to the website. Mind sharing more of your error logs?
@snow garnet Does Beautiful Soup parse Scribd documents? I have a Scribd document link but it shows a paywall. Trying Cache Chat for the document it won't access the part I was looking for
Yeah any url that isn't public, free, and valid won't work
A workaround for Scribd documents should exist. I found a downloader but it already broke.
The problem is they state: "To access this document, upload one of yours, or subscribe with a 30 day free trial." Trying to get around this security measure doesn't seem right to me.
Of course I could implement logging in with Google, etc. to get this to work, but that isn't really the point of the project
However, I do feel a bit inspired to build a production-grade version of this tool.
Login-to-use-this-project shouldn't be necessary
Ahhhh my log would only output the "package punkt is already up-to-date!" message and it looks like I get that same output even when file ingestion is successful. I am quite a 'code illiterate PM' so turning on better Streamlit logging proved.....difficult for my small brain. Problem is certainly PEBKAC here though because I re-did the d/l and set-up on a different machine than my primary and it worked just fine
Also, again, code illiterate product manager here - I couldn't get "export PYTHONPATH=." to work on my Windows system so I replaced it with "set PYTHONPATH=.", but couldn't figure out how to modify "echo "set PYTHONPATH=.:$PYTHONPATH" >> ~/.bashrc" to work for me since my system couldn't find a ".bashrc" file, so I wouldn't have to set the path each time I am starting this up - NOT expecting you to solve this for me but just sharing so you know how dumb some users are 😃
No that’s a fair point, I should make it inclusive for Windows users. Thanks for bringing it up.
Actually, does replacing set with setx make the path change persistent? If so, happy to add this to the repo README
I'll check
So I couldn't just use "setx PYTHONPATH=." but tried to use "setx machine PYTHONPATH=." and "setx mypath PYTHONPATH=.", however, neither seemed to make that path persist once starting up another conda env.
Yeah that’s the issue with Windows, it’s never consistent even between Windows machines. I’ll add a link to a website I found for info around this in the README.
Comment can be found in step 2 of the setup
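For the PYTHONPATH issue above, one OS-independent workaround is to set the variable from a small Python launcher instead of the shell, so the same steps work with or without a `.bashrc`. This is a sketch under assumptions: `env_with_pythonpath` is a hypothetical helper and `app.py` in the comment is a placeholder, not the repo's actual entry point. As a side note, `setx` takes the name and value as separate arguments (`setx PYTHONPATH .`, with no `=`) and only affects command windows opened afterwards.

```python
import os

def env_with_pythonpath(root=".", base_env=None):
    """Return a copy of the environment with root prepended to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    root = os.path.abspath(root)
    existing = env.get("PYTHONPATH")
    # os.pathsep is ";" on Windows and ":" on Linux/macOS.
    env["PYTHONPATH"] = root if not existing else os.pathsep.join([root, existing])
    return env

env = env_with_pythonpath(".", base_env={})
print(env["PYTHONPATH"])

# Hypothetical usage (the entry-point name is a placeholder):
# import subprocess, sys
# subprocess.run([sys.executable, "-m", "streamlit", "run", "app.py"],
#                env=env_with_pythonpath("."))
```

Because the path is made absolute and applied on every launch, nothing has to persist between terminal sessions or conda environments.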
Thank you for the help friend
Not sure, though it seems to still be able to give an answer
@snow garnet Did your page get removed?