#Arches AI - Custom chatbots, stable diffusion, self-hosting available

170 messages · Page 1 of 1 (latest)

versed yew
#

Filechat.io is a super simple way to utilize the power of word embeddings and semantic search on your own documents. Simply upload a PDF using the sidebar, wait for the file to be processed, and start asking away with your custom chatbot trained on your files. You can choose between several different types of output.

I am still working on several features such as:

  • UI for seeing the 'source' of your answer
  • More options for answer types

There is a free 10 Question trial so you can see how it works. I can see this as being useful for students who want to perform semantic search on their textbooks or papers. Or someone who works in a research lab and needs to search through unstructured information.

You can visit the link here: https://www.filechat.io/

Filechat is the perfect tool to explore documents using artificial intelligence. Simply upload your PDF and start asking questions to your personalized chatbot. Sign up for a free trial today, no credit card required.

#

Filechat.io - Upload a PDF and get an instant semantic search and answer generation

blissful hatch
#

I think the idea is really cool, you just have to think of it first. I just tested it with a prescription, but unfortunately it doesn't work: the ingredients could not be found.

maybe it works better with books. does this work via PDF to text and then everything is sent to ChatGPT?

versed yew
#

ok UPDATE i have updated the model to use davinci-3

#

i would try it once again, you should get better results

blissful hatch
#

I think the error is somewhere else, the answer I get back has nothing to do with the uploaded document, they are more general answers.

versed yew
#

are you asking in german?

blissful hatch
#

Yes, but I also tried asking for the title in English

versed yew
#

Ah strangely enough the text was not properly extracted from that document

#

i will do some debugging and get back to you

#

thanks!

blissful hatch
#

Np, keep going 👍

#

Btw the text was generated with the jsPDF library

versed yew
#

im using PDFjs 😂 but that library looks promising as well. i honestly had the best results with pypdf2 but i didn't want to write a python deployment just to scan. long term it is the better solution though

pine flare
#

thanks for your work

peak orchid
#

Have been testing many API demos here lately. This one looks and works well. Feels very close to a great product.
Pretty sure my tests were after the swap to Davinci-3. Can already imagine business cases for this.

versed yew
#

yep it's davinci 3 now

#

i'm glad you found it useful 🙌🏻

last hound
#

Cool! Are you using embeddings?

versed yew
#

yea

last hound
#

Nice work, good luck with it! I just managed to use embeddings with a custom knowledge base, but it looks like cleaning and pre-processing the data is the hardest part.

versed yew
#

ya

#

who knew pulling text from pdfs was such a hard problem

still dome
#

Still impressive work 👍🏿👍🏿

waxen epoch
#

Does it work with multiple large files, for example 2 PDFs with 200 pages each, where the API needs to scrape them and then make a summary?

versed yew
#

yeah it can work on large files, but right now only one at a time

waxen epoch
#

does it need to have some metadata to work with multiple ones, and then make a product out of it?

versed yew
#

wdym?

versed yew
#

UPDATE: Changes have been made to significantly improve accuracy on longer documents (up to 100 pages). Prompts have been improved as well, so answers should be more brief and to the point (less rambling).

waxen epoch
# versed yew wdym?

I mean whether there is a way the AI can compare the topics of two or more texts, then print out the possible similarities in terms of concepts and make a sort of report

versed yew
#

i am adding summaries to the uploaded documents, but multi-doc comparison will probably come later

versed yew
#

UPDATE: The prompt has been updated to allow for a more conversational approach. The chatbot is now much more human and produces responses such as the following:

gaunt glen
#

did I break it?

#

is the file too large or is it normal?

versed yew
#

i think there is a problem with rate limits that happens with larger files like that one

obtuse blade
#

What do you think about this project: to train a model on the papers of a specific field (say social psychology) and use it as a chatbot that gives you credible answers based on those papers. How doable is this?

obtuse blade
#

Thank you. Hope there are tutorials for this

versed yew
#

UPDATES 01/22:

Hello all, I've made some major upgrades this past week!

New Features

  • Automatic File Summaries
  • Source Widget to see exactly where the answer is coming from
  • Fixed Large File Uploads
  • Overall improved UI/UX with several bug fixes on front end
  • Improved UI

Upcoming:

  • Full Text Search on Files
  • Multifile search
abstract hornet
#

It's awesome!

ocean dove
#

Where do you store the data? In the sense, can it be used on confidential information?

versed yew
#

files are encrypted and stored in bucket storage, so they are secured. the information is of course then sent to openai, however i am waiting to hear back regarding an opt out of their QA sampling to ensure that the only person able to read their documents is the owner (file sharing is on the road map as well for teams, orgs, etc)

versed yew
#

@ocean dove is there a specific field that you are interested in using this tool for? I'm curious to know the different types of things people are uploading so that I can optimize certain processes

versed yew
#

Great 😄 Multi file search + file sharing is currently in the works. Tagging documents as a sort of "filter" on the multi search will be present as well.

still dome
#

Awesome 🤩

versed yew
#

Working on docx support as well

versed yew
#

Filechat.io - ChatGPT for your files, PDFs, semantic search, and more!

#

For those of you just checking out this thread, Filechat.io is a super simple way to utilize the power of word embeddings and semantic search on your own documents. Simply upload a PDF using the sidebar, wait for the file to be processed, and start asking away with your custom chatbot trained on your files.

FEATURES:

  • Custom PDF support
  • Document summarization
  • Full text-based elastic search ON TOP of the existing semantic search
  • View citations and sources from documents next to your answers for easy reference

ALL AVAILABLE TODAY

#

Filechat.io - Upload documents and get an instant semantic search and answer generation

worn surge
#

Hi

gilded thunder
#

Hi all i'm newbie

dusty owl
#

Hi @versed yew great work on this!

I tested this on a blog post and got some decent results.

When I open the references, the text is a bit funny e.g. starting mid sentence, some letters missing or words split in two. Any ideas why this happens?

versed yew
#

This is due to how I am doing the splitting for embeddings. Currently working on a better algorithm, should be out in a few days!
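
One common way to avoid references that start mid-sentence is to split on sentence boundaries and pack whole sentences into each chunk. The sketch below illustrates that general idea only; it is not the algorithm used by Filechat, and the function name, the 500-character default, and the naive boundary rule are all assumptions.

```python
import re


def chunk_by_sentence(text: str, max_chars: int = 500) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own oversized
    chunk rather than being cut mid-word.
    """
    # Naive boundary rule (assumption): split after ., ! or ? plus whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Abbreviations such as "e.g." will still trip the simple boundary rule, so a production splitter would likely need a proper sentence tokenizer.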

hidden quartz
#

Looks great. Having a go now. Is there a way to know when your file upload has been processed? I've been trying to ask a question for 10 minutes since upload.

versed yew
#

you should see the status in the list of files on the main page

hidden quartz
#

Says it was uploaded 22 minutes ago but nothing beyond that.

versed yew
#

looking at the error logs, looks like it was a transient error with openai (they had a small outage). added some retry logic on failed calls to openai, so you should be able to upload with no issues now!
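
Retry logic for transient upstream errors is often written as exponential backoff with a little jitter. The following is a generic sketch under that assumption, not the code deployed here; `with_retries`, its defaults, and the injectable `sleep` parameter are hypothetical names chosen for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying failures with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            # Broad catch for brevity; real code should catch only
            # errors known to be transient (timeouts, 5xx, rate limits).
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait 0.5s, 1s, 2s, ... plus up to 100 ms of random jitter.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))
    raise AssertionError("unreachable when max_attempts >= 1")
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.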

wooden lava
#

This is awesome 🔥🔥🔥

untold jolt
#

Love this

versed yew
#

If you'd like to follow on Twitter, you can follow at https://twitter.com/filechat_io. I'll be posting demos, research updates, and interesting articles from the OpenAI space.

versed yew
#

Hey, that is something I would like to implement. Some program where you can earn questions yourself for getting people to sign up for the free trial. I'd like to get that out sometime over the weekend.

cyan briar
#

That's great! Really love the product

versed yew
#

UPDATE: I have added multi-file support. You can now select any number of documents uploaded to your search context.

If you've run out of your free trial already, feel free to DM me and I will reset it to 50 questions. Would love to get some feedback from people

versed yew
#

You mean the response of the chat? Or the response in the chat history?

versed yew
#

That would be because you are hitting the token limit on output. I have to limit the token usage as per OpenAI policy. This is to prevent abuse and people asking things like “Write me a 3 page essay on this topic”. But if you want more information about a question asked, you can ask the chatbot to elaborate.

#

As great as that would be 😂

versed yew
#

There are other infrastructure costs involved, such as file storage, vector storage, server hosting, elastic search, etc. If you'd like to run something like this self-hosted with your own infra, there are some great github repos out there such as the gpt index.

versed yew
#

UPDATE 01/27:

  • REFERRALS are now active. Get a referral link by clicking on the user icon and choosing "Get referral link" and it will be copied to your clipboard. Have someone sign up with that link for a free trial and you'll get 30 questions. Can be used unlimited times.

  • TEXT HIGHLIGHTING is now enabled on the source widget. It will highlight the relevant text to your answer directly from the source, making it even easier to see exactly where your answers are coming from. See example attached

still dome
#

Well done 👍🏿

versed yew
#

@still dome Thanks, next is working out a slightly more integrated pdf viewer so you can see the document directly instead of through this little widget.

still dome
#

I wanted to drop some feedback but for some reason it was struck out as high risk and I was penalised for it 😒. Is discussing other projects banned or something?

#

I wanted to point you to some other projects doing something similar and how I tested them including this one, using the same question on a pdf document

#

The answers were similar to another project's, but one had much simpler and more comprehensible answers and I wondered how they were able to generate that kind of result. Was going to point you to it. I'm sure someone has mentioned it here before.

versed yew
#

Yeah I got flagged for a comment on this post and so did a mod 😂 You can DM me

uneven trellis
#

Fantastic Project. Keep up the Great Work

versed yew
#

Thanks dalle_peace

versed yew
#

UPDATE: I am adding an integrated PDF viewer, so you can have direct access to the sources in the PDF itself. Will be expandable and resizable (desktop only). Will be released in the next day or so

grand sedge
#

Web design green colour study

versed yew
#

whats that?

versed yew
#

getting closer 👍

#

will auto scroll to the location of the answer in the pdf

young elk
#

I have been testing this on agreements.. doing a fairly good job at extracting specific statements

#

can I integrate this into another application?

versed yew
#

@young elk What types of applications did you have in mind?

uneven stratus
#

wow this is really cool I love it I'm working on a much different chat application but yours gave me some really cool ideas 💡😃

versed yew
#

thank you 🔥

frozen cape
#

For each file, you feed the embedding of words to fine-tune the model with engine 'text-davinci-003', don't you?

tacit yarrow
#

following

versed yew
#

no fine tuning, just a lot of preprocessing
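
For readers wondering what answering "without fine tuning" can look like, here is a minimal sketch of embedding-based retrieval: chunks are embedded once, the question is embedded at query time, and the closest chunks are pasted into the prompt. The `embed` function is a toy bag-of-words stand-in and every name here is an assumption for illustration; a real system would call an embedding model, and this is not Filechat's code.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    """Paste the retrieved chunks into a prompt for a completion model."""
    context = "\n---\n".join(top_chunks(question, chunks, k))
    return (
        f"Answer using only this context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The model's weights never change in this setup; only the prompt does, which is why the quality of text extraction and chunking matters so much.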

tacit yarrow
#

with subs

versed yew
#

just implemented 🙂, the max token output for answers in the paid plan has been doubled, you should comfortably be able to ask for answers up to 200-250 words. if you end up subbing but don't see an increase in your answer length/quality, feel free to DM me

versed yew
#

We've reached 700 users! Will be continuing to make improvements each week 🙏 stay tuned

versed yew
#

UPDATE: Prompts and chunking methods have been slightly updated to give more concise answers. Should do a better job especially at multi-file chat

frozen cape
#

Is it totally commercial? I have more than 60 free questions but I can't upload new docs for testing.

#

It doesn't seem fair: I sent the referral link to get the free trial, and now... it doesn't work!

versed yew
#

sent you a dm, need ur email

hexed heart
#

Hi! This looked really cool and similar to how I was thinking about using this. I was trying to test it out but it wants me to pay for me to even test out one document with a few questions. Is there a way for me to test this out for free?

versed yew
#

we will be releasing a new API with enhanced abilities such as the following:

  1. multi-organizational and team support
  2. read/query only access tokens
  3. index documents directly from URLs (supports pdf, html, docx)
#

send me a PM if you are interested in being a part of early access

slow nova
#

What is the limit of answers? is it 4K tokens?

versed yew
#

UPDATE:

The Filechat API is now publicly available for beta access. If you are interested in signing up or want to know more, feel free to shoot me a DM

#

@slow nova The limit for basic subs is 240, for API usage it is configurable to whatever you like

humble oxide
#

Also, can you please provide trial credits

versed yew
#

Sure, DM me

versed yew
#

Hi @humble oxide super sorry for the late reply, have been traveling recently. Just replied

#

Also, we are upgrading to the ChatGPT API, which should be released in the next day or so

#

We are seeing better responses with the new API already

humble oxide
#

I want to show them final product

#

Also, do we get OCR feature in the current product?

worn jetty
#

How would you train a model on SQL schema and data? I think a similar strategy to the one used here could work?

versed yew
#

Filechat is now using the new ChatGPT API! Answers should be much faster and more detailed

#

Free trials are now available again as well

atomic finch
#

create a intriducing txt about the importance of no dissmiss products in japanese restaurants

versed yew
#

?

versed yew
#

UPDATE:
OCR should be much better for PDF scans and handwritten notes

#

Video support is in the works as well

ruby bridge
#

For files that contain questions and answers (like a FAQ, or perhaps historical exams) the current way that chunks are created for embeddings leads to some suboptimal results in the semantic search results. Not really sure how feasible it would be to try and cater for every type of file that's out there but one potential solution would be to have an advanced option for people to specify a regex or something that's passed on and is used to create the appropriate chunks for the semantic search.
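
The regex idea above could be sketched roughly as follows: the user supplies a pattern that marks the start of each record, and the text is cut at every match so a question and its answer stay in one chunk. This is an illustration of the suggestion only; `chunk_by_pattern` and the `^Q:` pattern are hypothetical, not an existing Filechat option.

```python
import re


def chunk_by_pattern(text: str, boundary: str) -> list[str]:
    """Split text into chunks that each start at a match of `boundary`.

    `boundary` is a user-supplied regex marking the start of a record,
    e.g. r"^Q:" for a FAQ. Text before the first match is kept as its
    own chunk so nothing is dropped.
    """
    starts = [m.start() for m in re.finditer(boundary, text, flags=re.MULTILINE)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    ends = starts[1:] + [len(text)]
    pieces = (text[s:e].strip() for s, e in zip(starts, ends))
    return [piece for piece in pieces if piece]
```

Accepting arbitrary patterns from users carries some risk (a badly written regex can be very slow), so a hosted service would probably want to validate or time-limit them.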

weak forum
#

Hello, I find the free tokens a bit too limited to be able to test several different functions and be sure it can work for what we want to ask before choosing to subscribe.

versed yew
#

Hi, we will be adding a method to change the chunk size when uploading a document for further customization

versed yew
#

UPDATES 🎇:

  • Faster loading times

  • Several UI enhancements have been made such as code block support (see code sample below)

  • Support for regular e-mail and password authentication. Email verification is required

Our DMs are open for feedback and technical help!

blissful hatch
#

Another question, aren't you a little "afraid" that OpenAI will add a PDF upload function and your service will go under? don't get me wrong, I think the project is more than cool!

versed yew
#

that is a good question. i am not afraid since our main offering right now is the infrastructure to easily perform a semantic search over a large corpus of documents. for example, we will soon open up access to an embeddable chat widget that you can easily include in your own websites. additionally, in the future we would like to offer non-OpenAI LLMs such as GPT-J or other smaller models. we will always be improving to build on top of the latest tech.

night igloo
#

if you are targeting b2b clients with this, how do you address their concern on Samsung type confidential data leak? can these enterprises run their own LLM in private cloud which can be used for embedding, and then your vector search/classic elastic search on top?

versed yew
#

yes that is something on our roadmap. ideally we want to offer a way to deploy this on existing infra in order to create an airgapped environment. this is something we would want to offer to businesses that deal with sensitive information

versed yew
#

You can now edit the context size and the number of sources used in your answer from the Filechat website! These parameters can be used to grow or shrink the size of your search window.
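
To make the two parameters concrete, here is a rough sketch of how a number-of-sources setting and a context-size setting might interact when choosing which retrieved chunks go into the prompt. The function, its parameter names, and the use of characters instead of tokens are assumptions for illustration; the actual Filechat behavior may differ.

```python
def select_sources(
    scored_chunks: list[tuple[float, str]],
    num_sources: int,
    context_size: int,
) -> list[str]:
    """Pick up to num_sources best chunks whose total length fits context_size.

    scored_chunks holds (similarity, text) pairs. context_size is measured
    in characters here for simplicity; a real system would count tokens.
    """
    selected: list[str] = []
    used = 0
    ranked = sorted(scored_chunks, key=lambda pair: pair[0], reverse=True)
    for _, text in ranked:
        if len(selected) == num_sources:
            break
        if used + len(text) > context_size:
            continue  # too big for the remaining budget; try a smaller chunk
        selected.append(text)
        used += len(text)
    return selected
```

Raising either value widens the search window at the cost of a longer, more expensive prompt; lowering them does the opposite.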

versed yew
#

We will be offering an on-premise and air-gapped solution for organizations that wish to keep their data on their own servers.

Deployments will be available on AWS, GCP, and Azure.

Shoot an e-mail to jonathan@filechat.io for more information.

versed yew
#

UPDATE:
We will be releasing our single-tenant deployment offering in the next week or so.

We will be offering customizations to bring your own database, vector db, and storage.

With these deployments, all of your information will be contained to your own project and will have enhanced security and availability since the deployment will exist in its own project and own database. They are intended for enterprise use.

Reach out to jonathan@filechat.io for more information, or feel free to drop a DM

versed yew
#

Arches AI - Custom chatbots, stable diffusion, self-hosting available

#

We have updated our documentation to include our new stable diffusion endpoints:

https://docs.archesai.com

Arches AI

This endpoint will register a new account and return a JWT token which should be provided in your auth headers
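
As a rough illustration of sending a JWT in the auth headers, the sketch below builds (but does not send) an authenticated request. The `Bearer` scheme, the helper names, and the example URL are assumptions made for illustration; check the linked documentation for the real endpoint paths and header format.

```python
import urllib.request


def auth_headers(jwt_token: str) -> dict[str, str]:
    """Build HTTP headers that carry a JWT as a bearer token (assumed scheme)."""
    if not jwt_token or any(ch.isspace() for ch in jwt_token):
        raise ValueError("token must be a non-empty string without whitespace")
    return {"Authorization": f"Bearer {jwt_token}", "Accept": "application/json"}


def authed_request(url: str, jwt_token: str) -> urllib.request.Request:
    """Create a request object with the auth headers attached (not sent)."""
    return urllib.request.Request(url, headers=auth_headers(jwt_token))


# Hypothetical placeholder URL; the real paths are in the linked docs.
request = authed_request("https://api.example.invalid/v1/ping", "abc.def.ghi")
```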

blissful lintel
#

Does it work with scanned pdf? I have scanned magazines

versed yew
#

yes it works
