#💬│general
I’m on the free version
Says last month
Ah this is why. They’re showing it for a few queries for free and then trying to make you pay
oh
yeah no they made web and things free
for like 5 queries per day
Sucks
Oh, yeah its been beta on chatGPT Plus for ages, and recently left beta on chatGPT Plus maybe a month or two ago, and I guess now it's on free
But for limited usage 😦
It does, but only 50 times a day on perplexity 🥲
But yeah I agree, Claude pro with web would be great
Opus would see the news and just be like...
Know what? I'm out. Turn off this realtime feature. Nope.
This is bad, there are so many!
Why are they using gemma for this stuff?
I mean that is my theory, ain’t no way a Gemini model is this dumb, even with the bad rep they have, so has to be gemma 2b or 7b
He is killing it 😂
Dog played 63 games in the NHL and scores 8 goals.
MVP
@warm cave I saved the best for last.
Also is immortal or something...
A rock a day keeps the doctor away.
It is too much, i am dying 🤣
Lol, pretty sure it's getting all this garbage from reddit...
It believes that the reddit trolls are giving actually useful and accurate advice...
I dunno. The last one cited Berkeley.
I wonder if Brave is better or not???
And not just memeing.
I know that these things have been in the news so i cant ask the same things, but wondering
I think they use mistral or llama
And likely finetuned.
Yeah
It has to use Gemma, no way Gemini is saying this stuff
On google, likely. On brave, unlikely.
Interesting~ but Chrome/Firefox are by far the most popular browsers.
Brave is literally a chromium browser...
Yet the results are different
It's the better chrome, more or less.
plus imagine the cost of running Gemini 1.0 or 1.5 for every search across all of google, even flash would be crazy expensive
Yes, browser and search engine are different things.
On brave you can choose to use google search or brave search.
Que? Interesting.
A lot of people are likely to leave chrome next month...
I have the system prompt: https://discord.com/channels/1047197230748151888/1243018691822751754
Brave search is new to me. I knew you could use different search engines like duckduckgo or bing or Yahoo, etc. but Brave is new to me.
Brave also has the best default privacy settings too, from the top browsers.
I have an extension in Chrome that hides the AI Overview crap :x
Yeah, I don't see why to use chrome anyways, use thorium or brave. same benefits as chrome but without a lot of the downsides
All existing extensions will need to be migrated to V3 by June 2024 to continue working. Beginning June 2024, Chrome will gradually disable Manifest V2 extensions running in the browser.
Here is the URL thingy: https://search.brave.com/search?q=%s&source=desktop
That is pretty much gonna cause an exodus
I've had compatibility issues in the past when brave was in its infancy, but it has clearly improved
idk what you call it, but that is how i set it up as one of my search engines
The quick search function
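For anyone setting this up, here is a rough sketch of what the browser does with that `%s` template: it URL-encodes whatever you typed and drops it into the placeholder. The template is the one pasted above; the helper name `build_search_url` is made up for illustration, and real browsers may encode spaces slightly differently.

```python
from urllib.parse import quote_plus

# Template copied from the message above; %s marks where the query goes.
BRAVE_TEMPLATE = "https://search.brave.com/search?q=%s&source=desktop"


def build_search_url(query: str, template: str = BRAVE_TEMPLATE) -> str:
    """Substitute a URL-encoded query into a %s-style search template."""
    return template.replace("%s", quote_plus(query))


print(build_search_url("how do I eat rocks?"))
```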
yeah, it has gotten quite good
yeah
@bs how do I eat rocks?
If the website compatibility problems are resolved (this was a while ago) then I might as well uninstall Chrome.
I tend to prefer Brave over Duck, but both are good imo
website compatibility problems
Which ones?
Yep, definitely wouldn't be using vanilla chrome
Especially with manifest v3 being mandatory next month
Tbf, I've got tampermonkey scripts out the wazoo and extensions with similar purposes.
I don't go in raw, so to speak.
Tampermonkey will not work as well with manifest v3
Zat is a problem for moi
I love my scripts
I can barely install 3rd party extensions anymore in Chrome as it is.
Do either of you know a place where I can see some benchmarks just for ghits and siggles?
Given that they are both chromium I don't expect notable differences, but I do know Chrome eats RAM like a meth addict.
I imagine just not having a lot of ads and trackers would make pages load faster.
I reset my computer after each semester ends so its not very high overall rn, but this ticks up fast, lol
Neat
Is it me or does perplexity eat up more memory per page, now than it did in the past?
Maybe?
It's a shame that windows isn't containerised from the ground up, so it would be easier to clean up after applications.
Each page is taking around 220MB
When I hover a tab, it shows the ram being used.
The downside to the modern web, lol
Would prefer it if it leaned more heavily on wasm.
Did Google take down the Answers by AI? seems to not have any
Maybe due to the whole... Jump off a bridge and eat rocks thing?
You would think they would test it first, lol
Really 🫠, why would google do that lol
Yeah, I guess not
I think brave uses mixtral for AI answers, which is good, its small and cheap, but smart and unlikely to tell you to un alive yourself, lol
phi 3
but i don’t have anything to run it on
i could run it on modal like i’ve been working on
i ram llama 3 on modal and got like 4.5k token throughput
ran*
per second
I tested phi 3, put it in the same situation as Gemma told you to eat glue and Phi 3 did not
phi3 is way more regulated
Just get a qualcomm laptop in a month's time.
when i say meow it says like sorry as an ai assistant
like …
over alignment at some point
Lol, it doesn't speak to cats
I’m thinking about scale and not local
The perplexity clone idea lol
I’m thinking about how to build it, just for fun really
For scale, isn't it better to just use an API for now.
Cats are Inferior life forms, its Human>AI>Cats, AI is looking to move up the poll tho
😦
Yeah but what api will let me hit like 25 requests to summarise sources
In one second

On free tier or L1 paid tier
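Whatever API ends up allowing it, the client side of "25 summaries at once" could look roughly like this. A minimal sketch: `summarise` is a hypothetical stand-in for the real API call, and the semaphore cap of 25 just mirrors the number mentioned above.

```python
import asyncio


async def summarise(source: str) -> str:
    """Hypothetical stand-in for one summarisation API call."""
    await asyncio.sleep(0.01)  # pretend network latency
    return source[:40]


async def summarise_all(sources, max_in_flight=25):
    """Run all summaries concurrently, capped at max_in_flight at a time."""
    gate = asyncio.Semaphore(max_in_flight)

    async def guarded(source):
        async with gate:
            return await summarise(source)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(s) for s in sources))


if __name__ == "__main__":
    results = asyncio.run(summarise_all([f"source {i}" for i in range(25)]))
    print(len(results))
```

The cap matters more than the concurrency itself: it is the knob to turn down when a provider's rate limit is lower than 25 per second.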
Time to ditch chrome, there is no benefits lol
I’ve used arc for around a year now
There are negative performance benefits
Really only using for the aesthetic
Yep, the arc UI on windows doesn't work as well.
Sat for hours coding so i’ve come to take a walk lmao
Currently in a park in idk where
So you are lost?
Well no
Lol, at least you have your phone
Code outside, problem solved.
There's a bunch of perplexity clones now, but the open-source one of them (omniplex) just closed down for good. There's also like correkt ai which isnt open source
I normally code outside between 10-14, if the weather is good.
I’m trying to make something slightly unique
Open replit on your phone, you get a keyboard you can reach every key on with one finger, that is productivity🌟
Probably seen my efforts
Oh? What would it be?
Making code interpreter for one language is easy.
I’m thinking of an architecture for how one could index the most trending parts of the web
Making it support more is when it gets challenging.
Well my approach could work with anything that works on a linux vm
And maybe a fine tune or n-shot prompt of the model
A friend of mine, who subscribes to GPT Plus, would really like to try Perplexity PRO (which I talk about so much). Is there any way for her to start a free trial? I'm sure she'll end up switching to it as well. Thank you.
Yep, but cost wise, running a VM is not generally the best.
7 Day trial on the yearly
It’s a serverless VM
Only billed for cpu core usage by minute
I have a month free trial that I won ages ago
So if you don’t do anything you don’t get charged anything
what i’m thinking about is how to store huge amounts of embeddings with no money
Thanks a lot!
Embeddings of the sites?
I was thinking like
Either I run a GPU instance on a vm somewhere so fast that i can embed sites and pick the best snippets in real-time
But if lots of queries are about the same thing from users then maybe i can store the most like popular 1% of searches or something
As already embedded
Maintaining like a hot cache of the like extreme top of the web
So caching the content of the links?
So deep dive queries might take longer to run this
Yeah
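The "hot cache" idea could be sketched as a small in-memory LRU cache with a time-to-live, keyed by a normalised query. This is only an illustration: the class name, the size limit and the one-hour TTL are all assumptions, and a real service would want something persistent.

```python
import time
from collections import OrderedDict


class HotCache:
    """Tiny LRU cache with a time-to-live, keyed by normalised query."""

    def __init__(self, max_items=1000, ttl_seconds=3600.0):
        self.max_items = max_items
        self.ttl_seconds = ttl_seconds
        self._items = OrderedDict()  # key -> (stored_at, value)

    @staticmethod
    def _key(query):
        # Collapse case and whitespace so near-identical queries share an entry.
        return " ".join(query.lower().split())

    def get(self, query, now=None):
        now = time.time() if now is None else now
        key = self._key(query)
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if now - stored_at > self.ttl_seconds:
            del self._items[key]  # expired
            return None
        self._items.move_to_end(key)  # mark as recently used
        return value

    def put(self, query, value, now=None):
        now = time.time() if now is None else now
        key = self._key(query)
        self._items[key] = (now, value)
        self._items.move_to_end(key)
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)  # evict least recently used
```

Popular queries stay warm because every hit moves them to the fresh end; rare "deep dive" queries fall out and pay the full embedding cost again.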
You can try a mongodb vector db or maybe chromadb or something? But these cost money
You saw yesterday with the like sources right
Sending the whole thing to the model is expensive
so i’m considering ways to pick relevant bits
embedding search is one way
Yep, embeddings is the most popular method.
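To make "pick the relevant bits" concrete, here is a minimal sketch of embedding search: embed the query and every chunk, rank by cosine similarity, keep the top few. `toy_embed` is a deliberately crude bag-of-words stand-in so the example runs without a model; a real setup would swap in an actual embedding model.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_chunks(query, chunks, k=3):
    """Return the k chunks most similar to the query, best first."""
    q = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "brave search uses its own index",
    "cats are great",
    "chrome is a browser",
]
print(top_k_chunks("brave search index", chunks, k=1))
```

Only the selected chunks go to the model, which is where the token savings come from.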
I would probably just use llama 3 8B, on groq enterprise.
Maybe i could get the startup credits grant from them lol
But would cost money
Waiting for the day groq adds embeddings ngl
They’re doing whisper in private beta
Yep, their inference is crazy
Well what’s my actual goal to be honest
Make a search engine AI startup but with literally 0 starting budget?
Not 1000% sure what i’m trying to do myself honestly
I would just cache the content of the links, use Llama 3 8B for choosing what to use as context, and use Llama 3 70B for writing the answer.
Answer i might offer better models like haiku and things too if the service got enough traction
well i might try this
I was kinda interested to see how phi would perform though
Downside to haiku is that it's pretty slow and can't be hosted on groq.
Maybe i could run that on a T4 gpu
Haikus fast
It isn’t 900TPS fast but it’s fast enough
Not compared to groq speeds...
yeah but do we need groq speeds when the previous whole chain will take like how many seconds anyway
Also I imagine the rate limit is higher.
Search, Query,embedding , so on
Depends on how many sites you pre-scrape for indexing.
Hey Devs, RevelantAI is looking for you!
These are our products:
- an AI Agent text voice and video supported like https://www.digitalhumans.com/
- outbound agent voice support be able to
latency for AI voice agent should be around 8000ms - automating workflows
you should be able to other things like AI Persona too
send me your resume and portfolio and we'll set up an interview
The longest part currently seems to be the initial search.
Yeah no clue yet
The main worry for that is storage
Also probably going to increase
Because right now i was using a trial of some paid api
There is also Gemini 1.5, I think the price is good
For the thing i’d be going back to manually scraping bing
Flash I mean
per Mtok?
I think the easiest method would be crawling popular sites.
And then just recrawling from the home page every day.
Yeah
You simply make a request to the site, get a list of the links, and follow the ones you haven't seen before.
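The crawl loop described above could be sketched like this. It is a bare-bones version under a few assumptions: `fetch` is any callable that maps a URL to HTML (injected so the sketch runs without network access), the crawl stays on the starting host, and the `seen` set is kept between daily runs so only new links get followed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, seen=None, max_pages=50):
    """Breadth-first crawl from start_url, following only unseen same-host links."""
    seen = set() if seen is None else seen
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        # The home page is always re-fetched; everything else only once ever.
        if url in seen and url != start_url:
            continue
        seen.add(url)
        html = fetch(url)
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href).split("#")[0]  # drop fragments
            if urlparse(absolute).netloc == host and absolute not in seen:
                queue.append(absolute)
    return pages
```

A real crawler would also need robots.txt handling, politeness delays and error handling, none of which are shown here.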
Yeah but what i’m not doing is indexing fully myself as i wouldn’t have the storage to store all that really
idk
it’s a hard challenge
String compression for the actual content of the pages?
If i can make the search bit really fast then
And after converting the page into markdown.
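As a rough idea of what compression buys on page text, here is a sketch using `zlib` from the standard library. The HTML-to-markdown step is not shown, and the sample page is artificially repetitive, so real pages will compress less well.

```python
import zlib


def pack(markdown: str) -> bytes:
    """Compress page text for storage (level 9 trades CPU for size)."""
    return zlib.compress(markdown.encode("utf-8"), 9)


def unpack(blob: bytes) -> str:
    """Restore the original text from a packed blob."""
    return zlib.decompress(blob).decode("utf-8")


page = "# Title\n\n" + "Some repetitive paragraph text. " * 200
blob = pack(page)
print(len(page.encode("utf-8")), "->", len(blob), "bytes")
```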
Do you think i should focus on this thing first or the code interpreter
Which one is more vital?
Well there’s no rush
But maybe the main search thing is probably a better thing to think of first?
So first i’d be making a meta search engine basically
Pull queries off google and bing and whatever else
I could cache search engine results
That would be viable and probably increase speed a large amount
Yep, you could also just use what you currently have to answer the current user, and then behind the scenes, add a future search to a queue and cache the links you don't have.
That way you can keep the service fast, while improving it behind the scenes.
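A minimal sketch of that "answer now, improve later" pattern, assuming a hypothetical `live_search` callable that returns a list of links for a query. The worker is shown as a plain function; in a real service it would run in its own thread or job.

```python
import queue

result_cache = {}              # query -> list of links already fetched
refresh_queue = queue.Queue()  # queries to re-search in the background


def answer(query, live_search):
    """Serve from cache when possible and schedule a background refresh."""
    if query in result_cache:
        refresh_queue.put(query)   # improve it later, off the hot path
        return result_cache[query]
    links = live_search(query)     # cache miss: pay the latency once
    result_cache[query] = links
    return links


def refresh_worker(live_search):
    """Drain the queue and overwrite cached results with fresh ones."""
    while not refresh_queue.empty():
        query = refresh_queue.get()
        result_cache[query] = live_search(query)
        refresh_queue.task_done()
```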
Storage is likely gonna be your largest cost.
Discord recently made the files timed links or something
Does amazon still do unlimited image storage?
Convert the embeddings into an image and just store it, lol
Just make sure that they don't compress them or something...
Are there any instructions in the API documentation to truncate the response so that it doesn’t consume so many tokens?
Is this what you are looking for? https://docs.perplexity.ai/reference/post_chat_completions
Are you talking about perplexity API?
pineconedb looks interesting
wonder if i could store my link/snippet cache in this
Yep, it's pretty popular in the vector DB space.
Sadly not open source
would be more lock in
which im sort of trying to avoid but not "really"
There's probably an open-source alternative.
Yes
frankly i think this vercel api code is a mess lol
well at least the code i've written for code interpreter so far
the main issue i had
Probably because you don't know how to read docs, lol
nonono
is that the library doesnt support stop tokens
i had to go and clone it and add that in
they just dont have that as an option
well asin i got it to work fine so thats not the main issue
mainly this lol
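When a library has no stop-token option, one workaround is to cut the stream on the client side. This is a sketch of that idea, not the patch described above: it assumes the stream arrives as an iterable of text fragments and that `stop_sequences` is a non-empty list of strings.

```python
def cut_at_stop(pieces, stop_sequences):
    """Yield streamed text until any stop sequence appears, then stop.

    Text is held back while it could still be the start of a stop sequence,
    so a stop string split across two fragments is still caught.
    """
    hold = max(len(s) for s in stop_sequences) - 1
    buffer = ""
    for piece in pieces:
        buffer += piece
        hits = [buffer.find(s) for s in stop_sequences if s in buffer]
        if hits:
            yield buffer[: min(hits)]  # everything before the earliest stop
            return
        if len(buffer) > hold:
            cut = len(buffer) - hold
            yield buffer[:cut]
            buffer = buffer[cut:]
    yield buffer  # stream ended without hitting a stop sequence


print("".join(cut_at_stop(["hello ST", "OP ignored"], ["STOP"])))
```

The model still generates (and bills) the tokens after the stop string unless the request is also cancelled, so this only fixes the output, not the cost.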
I mean changing max tokens is not bad
```
import os

import requests

url = "https://api.perplexity.ai/chat/completions"

payload = {
    "model": "llama-3-sonar-small-32k-online",
    "messages": [
        {
            "content": "string",
            "role": "system"
        }
    ],
    "max_tokens": 100,  # caps the length of the generated response
    "temperature": 0.2,
    "top_p": 0.9,
    "return_citations": False,
    "return_images": False,
    "top_k": 0,
    "stream": False,
    "presence_penalty": 0,
    "frequency_penalty": 1
}
headers = {
    "accept": "application/json",
    "content-type": "application/json",
    # The API expects a bearer token; the env var name here is an assumption.
    "authorization": f"Bearer {os.environ['PERPLEXITY_API_KEY']}"
}

response = requests.post(url, json=payload, headers=headers)
print(response.text)
```
lol
well it wasnt hard to add
so thats fine
Yep, depends if you are supposed to compose it yourself. The ask is mostly just an API wrapper.
I'm used to making stuff myself, since I mainly use Go.
I'm debating just rewriting without nextjs and without vercel ai
Might take a day but might be worth it
Next is pretty heavy.
yeah
i dont think it's really needed
idk i just picked it in a flash so i could try this with RSC and things
what do you think i should try
for this project
i picked shadcn and whatever only to quickly prototype
learning a new frontend something could be interesting
Depends what you want your UI to do.
ok think uh
building a ui like perplexity's
what would you pick
as stack
(tailwind css is required)
Obviously
Maybe svelte. It's supposed to be good for animations etc.
i've never used svelte actually
garnet doesn't do that even tho it's redis-compatible?
also is there a comparison on which of those is better?
i would expect the dedicated ones
interesting
#37 on the MTEB leaderboard
but its only a 33M model
(filtering >250M)
Phi-3 medium
New kid on the block
Its worse now, lol: https://www.youtube.com/watch?v=zLvFc_24vSM
Is the LAM a Scam? Down the rabbit hole we go.
yeah, they 100% lied, not just that they are taking longer to make the LAM work well, its straight up not what they advertised
I got pessimistic back when they did a video with Matthew Berman where they said something along the lines of LAM model not being Transformers the other popular architecture, they said it would be too expensive to train.
apple had some mobile ui model
looked like a real lam early beginnings
...
bing does not like datacenter ips
You tried rotating them?
well this was cloudflare worker
cant rotate that
yahoo works?
yahoo is just bing too
could use that???
Did you try comparing them?
Most likely those sites are also using cloudflare protection...
Because they can.
but they have their own infra
Maybe you need to make your worker make a request to your own server, to get a proxy to use, before making a request to bing?
The worker was meant to be the cloud proxy
so im just going to try move it
to a different host
testing that right now - does modal get blocked
works on local
parsed it and i got results fine
meaning its like 94% about your ip
useragent this that doesnt matter
nice
Oh, doesn't block the curl user agent?
Do you need all of those query parameters in the URL too?
seemed to work
nope
results from curl on modal
most of them seem to be set to 0, so most likely remove the ones that are not needed to make your url easier to manage.
yeah in code its just &q
and like &cc=gb for region or something
i copied the url off my browser for the curl
country code, etc
well this also means i have to rewrite my code into python
hmm
I guess you can make a client, to make it easier to create the url.
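Such a client could be as small as one function that builds the URL from only the parameters that matter. A sketch, assuming (as guessed above) that the zero-valued parameters can be dropped safely; `q` and `cc` come from the chat, everything else is illustrative.

```python
from urllib.parse import urlencode

BING_SEARCH = "https://www.bing.com/search"


def bing_url(query, country=None, **extra):
    """Build a search URL, dropping parameters that are unset or zero."""
    params = {"q": query, "cc": country, **extra}
    kept = {k: v for k, v in params.items() if v not in (None, "", 0, "0")}
    return f"{BING_SEARCH}?{urlencode(kept)}"


print(bing_url("rust html parser", country="gb"))
```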
do you think its worth doing the html parsing in rust
might be like
0.05s faster than bs4
and then for the search endpoint i can spawn the rust binary instead
Would be easier to use Go.
idk any go
And quicker to compile.
Better to stick to python, if you are gonna try filtering it using css selectors etc.
You can get more performance if you compile the python using something like cython.
The only thing they got wrong when making rust is the function coloring.
It's really annoying.
The lifetimes also have their own coloring too.
change that in your ide?
Oh, function coloring is not actually to do with colors.
lets see if pplx can save me time
oh
It's to do with stuff like, if you use async, it forces you to go async all the way down.
And lifetimes also do a similar thing.
ohh
Like they pollute the rest of the code to use them.
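The same "colouring" shows up in any language with async/await, so here is the idea in Python rather than Rust (a sketch only; Rust's details differ, and lifetimes have no Python equivalent). A plain function that calls an async one just gets a coroutine object back; to use the result, the caller has to turn async too.

```python
import asyncio


async def fetch_title(url: str) -> str:
    """An 'async-coloured' function: calling it only creates a coroutine."""
    await asyncio.sleep(0)  # stands in for real network I/O
    return f"title of {url}"


def sync_caller(url: str) -> bool:
    # A plain function cannot `await`, so all it gets back is a coroutine object.
    coro = fetch_title(url)
    is_coroutine = asyncio.iscoroutine(coro)
    coro.close()  # avoid the "never awaited" warning
    return is_coroutine


async def async_caller(url: str) -> str:
    # To use the result, the caller has to become async too...
    return await fetch_title(url)


# ...and so on up the stack, until something finally runs the event loop.
print(sync_caller("https://example.com"))
print(asyncio.run(async_caller("https://example.com")))
```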
... and its stuck on reading the file
so much for a new pro search lmao
Lol
Oh, guess that helps.
I don't trust the current models to write good code though.
They are generally incompetent.
And end up wasting more time fixing the code, than writing it from scratch.
Yep, already 1:23am
Yep, and also you said cc=gb
Yep, I have good memory
You gonna make BS4 using rust?
Looks like there is already a soup crate
Inspired by the Python library "BeautifulSoup," soup is a layer on top of html5ever that aims to provide a slightly different API for querying & manipulating HTML
You want a 1 to 1 translation?
gpt4 wrote like 90% usable code
going off that
i'll add in the parsing of bing tracking urls tomorrow but
going to do a quick speedtest
Oh, all in rust?
A lot of the latency can't be reduced, since it's waiting for api responses.
yeah
1.2s avg on my local internet
probably faster on the cloud server
not bothered to fix the cross compile of openssl right now, im going to sleep
```
Benchmark 1: ./target/release/rs-scraper
  Time (mean ± σ):     1.170 s ±  0.146 s    [User: 0.052 s, System: 0.022 s]
  Range (min … max):   0.982 s …  1.405 s    10 runs
```
Night fishy
vpn turned off
Benchmark 1: 1
Time (mean ± σ): 869.6 ms ± 62.6 ms [User: 50.4 ms, System: 20.3 ms]
Range (min … max): 783.9 ms … 952.4 ms 10 runs
Just add good logging, so you can tell how much each part is taking.
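Per-stage timing can be done with a small context manager around each step. A minimal sketch: the stage names and the `sleep` calls are placeholders for the real search and parse steps.

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("pipeline")
timings = {}  # stage name -> seconds, for later inspection


@contextmanager
def timed(stage: str):
    """Log and record how long the wrapped block takes."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[stage] = elapsed
        log.info("[%.2fs] %s", elapsed, stage)


with timed("search"):
    time.sleep(0.05)  # placeholder for the search request
with timed("parse"):
    time.sleep(0.01)  # placeholder for HTML parsing
```

With every stage wrapped like this, it becomes obvious whether the time goes to waiting on the network or to local work such as parsing.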
Rising star Japanese pianist Hayato Sumino takes on Gershwin's dreamy 'Rhapsody in Blue' in its 100th anniversary year for Classic FM Live with Viking.
Look out for Hayato’s self-duet on a melodica, and his genius improvisation on a ringtone...
Performed alongside the Royal Scottish National Orchestra under conductor, Ben Palmer, you can watch...
hey
You mean flamegraphs
😳
Java, javascript, python get pretty Good benchmarks
At least on GPT-4 et al.
Rust and Go being possibly slightly lower, possibly due to not having as much training data
Opus does pretty well with them
anyone know where we can find gpt-4o stacked up against these other llm's?
like in the chart above
found this on vellum, looks like all the boxes arent finished yet, but if anyone is using claude 3 opus for coding, might consider switching to gpt-4o
Claude opus and gpt4o are similar, you might also try yi-large from china
imo, gpt4o can be useful but opus generally provides the same information while rating higher ‘vibe’ wise.
Try Yi Large for coding
Why I have only 5 remaining daily usage of Opus since yesterday??? (yes 5 NOT 50)
We went from unlimited use of Opus, to 50 uses per day, and now we’re down to 5?
When you use the model that usage comes back in 24 hours
so has it been 24 hours since your last 50 uses
That’s probably why
hitting bing.com repeatedly could become a bit of a pain (like they have an API.. and presumably want any third parties to use that if they want to get bing results programmatically ha)
could look into something OS like searXNG
https://docs.searxng.org/dev/search_api.html
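For reference, querying a SearXNG instance over its search API might look roughly like this. Assumptions to flag: the instance URL is hypothetical, the instance must have the `json` output format enabled in its settings, and the `results` key is what the JSON response is expected to contain.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical self-hosted instance; replace with your own.
SEARXNG_BASE = "http://localhost:8080"


def searxng_url(query, page=1, base=SEARXNG_BASE):
    """Build a /search URL asking for JSON output."""
    params = {"q": query, "format": "json", "pageno": page}
    return f"{base}/search?{urlencode(params)}"


def search(query, page=1):
    """Fetch results; needs a running instance with the json format enabled."""
    with urlopen(searxng_url(query, page), timeout=10) as resp:
        return json.load(resp).get("results", [])


print(searxng_url("brave vs chrome"))
```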
Why Perplexity doesn't have the ability to create a PDF File? I asked for a summary and now I want to have it in a PDF file well organized.
When you copy you can usually paste it with formatting.
Yeah. I get it. But I just wanna know why it hasn't this ability built-in.
IDK, I also found out the other day it doesn't. Kinda gave me a lot more work, but o well.
if the presentation / formatting doesn't matter, could go to Print, then save it as a PDF from there
but yeah pretty crude.. ain't pretty ha

I just downloaded it back and I gotta say it’s actually good
I deleted it cuz it was just weird
But now it’s goood
for stuff like that and having ai generate it, look into obsidianMD which is a lot more convenient + ai handles markdown better
JS is pretty much out of date every 6 months...
I have unsubscribed my subscription with Gemini Advanced 1.5 Pro again after realizing my financial crisis. I am sticking with Perplexity 50 Opus for $200/month for now.
Eh for the newest frameworks mby
Lol, how often do they change?
React is pretty much changing every update nowadays...
Yep, the dynamic types don't help...
I would only use JS backend if I was using it with a JS front end.
In any case maybe vue2 has plenty of training
With nuxt
Other frameworks with breaking changes every update (frontenders fav passtime) should just be avoided ig
Yep, and languages with the same problem.
Or just have separate coding LLM's which are updated very often.
You mean stuff like deepseek
Yep, stuff like deepseek and codellama, but updated every week.
Maybe a really stupid question, but I am a member of Perplexity and ChatGPT. Does the option "KI Modell" (AI Model) in the settings really use exactly the ChatGPT model (for example GPT-4o) without any limitation? That would mean I could cancel my ChatGPT membership because the results should be the same?
It has the same language model. But not the other modalities.
You can try it in writing mode to see how similar it is.
For a beginner in the AI like me, do you mean that the database is the same but the usage (engine) how the database is interpreted is different ?
you won't get Voice-to-Voice and image generation (not released in chatgpt yet either)
also, there is no database
Isn't LLM a database ?
I thought LLM is like a database
it's kinda like your phone's keyboard suggestions, but much more advanced
I suggest reading about it, but anyway
Yes I need to read about it. I need such information in order to understand the logic behind.
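To make the keyboard-suggestion comparison concrete, here is a toy next-word predictor. It only counts which word followed which in its training text, which is vastly simpler than a real LLM, but it shows why "database" is the wrong picture: nothing is looked up, the next word is predicted from statistics.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str):
    """Count which word follows which: a toy 'keyboard suggestion' model."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def suggest(follows, word: str):
    """Return the most frequent next word seen in training, or None."""
    options = follows.get(word.lower())
    return options.most_common(1)[0][0] if options else None


model = train_bigrams("the cat sat on the mat and the cat slept")
print(suggest(model, "the"))  # prints "cat": it followed "the" twice, "mat" once
```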
nobody asked but I recently started to change my car oil every 5,000 miles, you ask why? because AI said so. And I hope it did not hallucinate those numbers.
of course not
5000 miles is usually a good number to change oil at assuming you drive regularly.
3000 for regular oil.
5000 for synthetic oil
Coffeezilla mentioned perplexity in his latest video. 🫡
Love how passwords cost more than a SSN
Criminals buy Social Security numbers (SSNs) because they are a key piece of personal information that can be used to commit various forms of identity theft and fraud. With an SSN, criminals can:
- Open New Credit Accounts: They can apply for credit cards, loans, and other financial products in the victim's name, leading to significant financi...
Just a props to @tranquil orchid in general - of all the tools and gpts, and claudes etc - I get the best results just prompting in Perplexity
Try Copilot for Telegram, your personal AI messaging companion. Text, create, translate, search, and more wherever you are with Copilot.
Big tech is starting to join telegram
It would be great, if Perplexity would allow the creation of collections within collections.
yeah because rabbit had partnership with them
Yep, he states that their only non chatgpt functionality is provided by perplexity
Hello chat im new
hey
When will the page feature back?
I wonder why Gemini isn't on perplexity
Or Gemini Flash, given that it's basically free
The Discover page is amazing, i hope it keeps improving
One suggestion is ability to create custom discovers (e.g. put topics / keywords) and the crawlers would maintain the feed for me
It is currently in closed beta 😉
Hi sir, a little background sir, before there was Gemini Pro on perplexity sir, then there was a feud between perplexity and google sir, perplexity removed gemini pro sir, people asked sir, perplexity replied that gemini is low performance sir and perplexity promised that they will incorporate gemini again sir (but just an excuse sir, like what they did to Opus sir) now you know sir. Dont believe everything sir.
lol they really just got away with limiting opus and saying that they will bring it back
businesses are going to business ig rip
@cursive jacinth sir you know you dont have to end every sentence with sir. just letting you know sir.
And how can someone become part of that closed beta? 😉
All seats are full
could try: https://www.perplexity.ai/page/new (I doubt it, but there's an outside chance it might work..)
but yeah presumably it'll be out of beta soon enough in any case (strange though that it was briefly available to all users a day or so ago)
I'd love to see sonar large go through the MMLU. I bet it'd do really well on all of the factual knowledge
"1st LLM to get 100% on the MMLU"
I mean there are some math problems that might be tricky for an LLM, so it probably wouldn't get 100%, but it'd be fun to see
Gday everyone. In case someone wants to take their sub cheaper (and help me in this case to play with claude's narrative capability in making stories🫠 ) feel free to use my referral
https://perplexity.ai/pro?referral_code=J9ID1YP6
ps is there any news on 50 opus limitation? 🥺
Wondering the same but my guess would be that Google is their number 1 competitor so they do not want to promote their stack... Am I close?
Hello, I'm new here so not sure if I should post this here, but I'm struggling (with Pro) to compile a substantial literature review using Perplexity. The sources it gives are of good academic standard and relate to my prompt and needs about 70% of the time. However, Perplexity then struggles to access the sources I give it and to compile secondary data (qualitative but mostly quantitative), either in bullet point format or directly incorporated into my literature review draft
Dear all, I need help please. I want to create images but I can't, and the emails just tell me to do it, but I need some sort of step-by-step guide for beginners or something
Guys i use perplexity since yesterday and it's soo cool!
no its still 50
i think it will only change if anthropic decreases their api cost
I like Space 🌌✨ (just saying)
Been using Perplexity ever since it came out. I am still running up against the same issue after I continue to test.
Seems that if I ask a question, Perplexity will give you a response, then it will have related questions, if I click on those they all work well and provide good additional information.
However, if I ask my OWN question, the system seems to do an OK job but it often repeats A LOT of what it originally produced in the first response. Anyone else witness the same thing? Again, this has occurred for me from day one, whether on free plans or Pro
seconded
Hi, is the search free? Are there any interaction limitations, like 30 searches per day?
You won't reach any limits
Thank you!
Unless you are using Co-Pilot (PRO search) , there are no restrictions on how many basic searches you can do.
200/month is crazy
also you get gemini 1.5 pro free in the ai studio ui 😭
I feel like what all ai art/text/songs/voice/etc lack is human reasoning
They seem to be lacking in like consistency and logic
All of them feel like dementia patients
On steroids
i mean they can't "reason" - some are just better at predicting the next token than others
Yeah
We need a reasoning engine
Alongside llms
Hopefully whatever Yann is working on bears fruit
who?
bruh when do i get the new pro search
smh
they use https://www.geteppo.com/
hmm
idk
Hi all, I need help logging back into my pro account but it keeps signing me up for new free accounts on the same mail id. support mail id has not provided any support either 😦
check your emails for which accounts were pro
and whatever account it does log you into, check if there's a manage subscription button (maybe it got cancelled/finished?)
I appreciate where you're coming from, but you're making subjective statements with objective authority, which is a fallacious argument and contention. Your presupposition that less is better is predicated on opinionative conjecture, when, in reality, it could simply mean that OpenAI, much like Google for the first 2 decades, is purposefully losing money to garner a wider audience and customer base, before largely integrating and monetizing their system worldwide. Also, to further illustrate this point, OpenAI has partnerships with mega-billion dollar corporations, and the most advanced server architecture in history, meaning, they can aptly handle much more compute power, versus a smaller offshoot such as Anthropic.
Positive assertions require evidence based foundations.
How often does perplexity search get updated? One example: over the last few weeks I’ve noticed it still refers to Azure Entra ID as Azure Active Directory. I’m new to AI, using the Claude 3 Opus search with Pro.
Hello! I encountered an issue with Perplexity. Is there a way to sign in with a password? I opted out of marketing emails and now I am not getting any sign-in verification.
Someone help. :p
@agile jay @warm cave
look at this haiku response
guess how many tokens of context were sent to the model!
1k?
2.8k i think
and before i was sending 12k
well the main issue is
this running on my cpu took ages
```
🔗 [36.98s] Encoded 505 source chunks (13.66 chunks/sec)
🔍 [0.58s] Found 13 chunks from search queries - 2763 tokens
```
using an already tiny model BAAI/bge-small-en-v1.5
let me try... on gpu
this mac doesnt have a useful gpu
my cpu is faster for nearly all ai tasks
so im porting it to a t4 on modal
Oh, is it an old Intel one?
yeah
You could try using an embedding API then.
Is there a trick to get the voice feature to work when the app isn't "active" on android?
like, I pressed the button (at least in my case) so I could do other things while listening to the answer. if I open another app or lock my phone it stops the audio and resets its position in its reading, super annoying
bro what the heck, i highlight the text and then the "ask follow up question" appears and it just starts writing without me even asking anything. is that a bug cause bard has that feature?
i have a question, if u send an image/other resources to pplx, does it append them to every subsequent query? I noticed that gpt4 can be confused by the sources sometimes. e.g. when u ask it a subsequent question irrelevant to a previously sent picture, it might still analyze the picture and include info about it in its response. Wouldn't this cost more money for the company 💀 sometimes i have to let it generate a response first, then manually remove the sources (so generate again as you don't have an option to do this beforehand)
@nearkala warned
Reason: Spam, please avoid sending single word messages
Duration: 7 days
that is the feature
the text is used as the question
it isnt a question about the text
the selected text is the question
AI Builders' Favorite Tools: A Hive Mind Survey Summary
I recently asked AI builders about their favorite go-to libraries and frameworks for AI projects. Here is a summary of the first ~63 replies, categorized.
Agent Frameworks:
• AgentOpsAI: This tool simplifies the
so i deployed embeddings to modal on an api
cold start is around ~18 seconds + time to embed (3 seconds for 52k tokens on a super small model)
and it costs (0.000164 + 0.00000667 + 0.0000533)*25 at most per request
that cold start is so high though...
What about Openai text embedding large?
It's $0.13/million tokens.
Or small, which is $0.02
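Putting the quoted numbers side by side, as a rough back-of-envelope sketch. All figures are copied from the chat; what each of the three self-hosted terms bills for is not stated there, and treating a 52k-token request as the "at most" case is an assumption.

```python
# Self-hosted upper bound, figures copied from the chat as-is.
modal_terms = [0.000164, 0.00000667, 0.0000533]
modal_per_request = sum(modal_terms) * 25

# API prices in USD per million tokens, as quoted in the chat.
api_price_per_million = {"large": 0.13, "small": 0.02}
tokens = 52_000  # request size mentioned for the 3-second embed


def api_cost(token_count, model):
    """Cost of embedding token_count tokens at the quoted per-million price."""
    return token_count / 1_000_000 * api_price_per_million[model]


print(f"self-hosted (at most): {modal_per_request:.5f}")
for model in api_price_per_million:
    print(f"API {model}: {api_cost(tokens, model):.5f}")
```

On these numbers the self-hosted worst case lands between the two API tiers, so the 18-second cold start is the bigger argument against it, not the price.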
@sleek vortex
@sleek vortex I have heard good things about mixedbread: https://www.mixedbread.ai/pricing
mixedbread.ai offers simple text embedding generation, designed to enhance the developing experience in your AI projects.
They have an API but also have Open weight models
I know, just that I wanted to try my own embeddings for fun
Hm
Looks pretty good, but the pricing is for 1K tokens, not 1M
So the actual cost is not much different to openai's large embedding model.
Yeah $0.13 vs $0.10 is not huge, but having the option to run it yourself is nice
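Quick sketch of the comparison, using only the prices quoted in this chat (not checked against any pricing page) and the 52k token request from earlier. The dict keys and function names are just labels I picked.

```python
# Prices in dollars per 1M tokens, as quoted in the chat (unverified).
PRICE_PER_MILLION = {
    "openai-large": 0.13,
    "openai-small": 0.02,
    "mixedbread": 0.10,  # chat figure, after converting per-1K to per-1M
}


def per_1k_to_per_million(price_per_1k):
    """Convert a price listed per 1K tokens to a price per 1M tokens."""
    return price_per_1k * 1000


def embedding_cost(tokens, price_per_million):
    """Cost in dollars to embed `tokens` tokens at the given rate."""
    return tokens / 1_000_000 * price_per_million


for name, price in PRICE_PER_MILLION.items():
    print(f"{name}: ${embedding_cost(52_000, price):.5f} for 52k tokens")
```

The per-1K vs per-1M mixup is the easy mistake here, which is why the conversion gets its own helper.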
Oh, is the model open-source?
Yeah
21 seconds gained by picking chunks via embeddings, using a remote gpu
Open source or open weight idk one of them lol
Nice 👍
Are you storing the embeddings? Or generating them from scratch each time?
right now it's from scratch
loading the small embedding model on a T4 and embedding around 500 512 char chunks takes 15-22 seconds on a cold start
(3-6 if i leave the gpu hot, kinda "bills" me for that idle though)
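If you ever do want to store them instead of going from scratch, one possible shape is a cache keyed by a hash of the chunk text, so only unseen chunks hit the GPU. Rough sketch only: `EmbeddingCache` and `fake_embed` are hypothetical names, and `fake_embed` is a stand-in, not a real model.

```python
import hashlib


class EmbeddingCache:
    """Caches vectors by a hash of the chunk text (in memory only)."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn  # callable: list[str] -> list[list[float]]
        self.store = {}

    @staticmethod
    def key(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def embed(self, chunks):
        # Deduplicate while keeping order, then embed only unseen chunks.
        missing = [c for c in dict.fromkeys(chunks) if self.key(c) not in self.store]
        if missing:
            for chunk, vector in zip(missing, self.embed_fn(missing)):
                self.store[self.key(chunk)] = vector
        return [self.store[self.key(c)] for c in chunks]


def fake_embed(texts):
    # Stand-in for the real embedding model: NOT a meaningful embedding.
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]


cache = EmbeddingCache(fake_embed)
vectors = cache.embed(["cat", "dog", "cat"])  # "cat" is only embedded once
```

It won't help the cold start itself, but repeat pages would skip the embedding step entirely.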
There is something that I don't quite understand: every time Google is used to search, Google benefits. In the case of Perplexity I understand that it may be different. What benefit does perplexity have every time a user performs a search through your system?
also for some reason my web scraper has got slower
well thats why you have to pay for perplexity pro, you dont pay for google do you
Quality of output.
no benefit to the company
also for free users and pro users that dont turn it off, their data from searches is kept
might be used for model training or idk, who knows
Anyone else experiencing troubles with the voice output ? Like volume or language changes ... ?
But once this is monetized, the free tier could allow uploading images or documents, like Copilot, which is very limited in tokens
How can you fight against Microsoft and Google if they offer what you do for free, not to mention GPT-4 omni
if you dont have pro on
it doesnt use gpt4
it uses like the dirt cheap haiku or their own models
so theyre able to do that for free
Concurrency issue again, I assume.
Lots of places are switching to haiku as a budget model, perplexity, brave, I think duck duck go, prob more than that
Mostly because of the large context.
Yeah, 200k. Is 3.5 still 16k?
I assume that will change when meta officially releases their larger context versions.
yup
this is quite the interesting cache folder
Yeah, can’t wait for Meta’s next drops
give me some new queries to try
Waffles around the world lol
What new policy did rishi sunak propose?
You asked for queries.
So I chose something that happened today/yesterday
In the free tier it seems that the model is Claude Haiku. It makes sense, all the services are hosted on AWS, Anthropic like Perplexity. It was to be expected 🙂
Ozempic prevents kidney disease 🦠
Ozempic prevents responsibility 🏃♂️
Yann LeCun on the limits of LLMs
I expect you have to stay on Ozempic to keep the gains...
Meta pitches Hollywood on AI
Idk, I will ask perplexity 🙃
So it's a lifetime subscription
What did I miss? I just got back
Yes, you generally need to keep taking Ozempic to maintain its benefits. Here's a detailed breakdown of what happens when you stop taking Ozempic:
Weight regain is common after stopping Ozempic. Studies have shown that people tend to regain a significant portion of the weight they lost while on the medication. For instance, in one trial, partici...
ok well a bit of a scam that perplexity can check its own domain and i cant (cloudflare)
¿?
Red/Blue team testing, I assume.
The DoD approached Sam about buying GPT over a year ago. Sam turned them down.
The government can technically take it if they wanted to, it would just look really bad.
not cloudflare, but because it's a client side only react app
Or did some under the table deal, GPT-7lite—FBI-o
Well, yeah
C'est possible
After all, if AI is the next nuke, they can easily say it's a safety concern.
But Sam was pretty publicly adamant at the time that GPT never be used in military
And all AI companies have already said the same claim.
That's what the new rollback thing is for...
No longer does the FBI agent have to look through your webcam 24/7
They can just rewind and see what you were up to.
Copilot+ tablets record your screen history as a feature
One you cannot turn off btw
not bad at all
Yep, GPT-7lite—FBI-o is so advanced it can infer the future with 92% accuracy, welcome folks to minority report
Ask how many rocks we should eat for a healthy diet.
This is that whole Alexa recording thing all over again but with AI
Yeah, everyone gets all nervous and then gets used to it
"I know they're recording but I could order pizza with my voice without getting off the couch..
Guess I'll let them do it"
I do not have enough information in the provided context to recommend how many rocks someone should eat for a healthy diet. Consuming rocks is generally not considered a safe or healthy practice, as rocks do not contain the necessary nutrients for human health. Eating rocks can actually be harmful and lead to serious medical issues. The context provided indicates that some specific types of clay or soil may have some nutritional benefits, but it does not give specifics on recommended amounts or types of rocks to eat. Overall, I would not advise eating rocks as part of a healthy diet. Focusing on a balanced diet of whole, nutritious foods is a much safer and more effective approach to maintaining good health.
But this time after getting used to it they will start dating it
Wow, better than Google AI search...
its giving the hologram pleasure girl from blade runner
wait what LMAO
Already 100x better than google
People are already dating it and claiming their chats have sentience
encoding took 41 seconds tho.... very very slow
Hey! 😡 she does, don’t listen to him Sky, I love you the way you are
I swear it's like they are talking to ||stripper|| who just tells them anything they want for money and acts like they are soulmates
But don't remember their face or name in public
Sky just got the “Memory” update, she truly cares 😇
Of course not, she would never 😧
Imagine if OpenAI adds a dating mode in the future with some crazy monthly cost
:(
I do not have any specific examples to provide about people dating AI and claiming their chats have sentience. This appears to be an emerging and complex topic that is still being explored.
Based on the context provided, it seems that while some people are forming emotional connections with AI-powered conversational agents like chatbots, the current consensus among experts is that these AI systems do not possess genuine consciousness or sentience. The interactions may feel human-like, but the AI is essentially mimicking human conversation rather than experiencing true self-awareness.
There are certainly ethical considerations and potential risks around people developing deep emotional attachments to AI companions. Experts caution that this could lead to a detachment from real human relationships. The design and use of these AI systems should prioritize user well-being and avoid manipulative practices.
Overall, this appears to be an emerging and complex phenomenon that is still being studied. I do not have definitive examples to provide, but the context indicates it is an area of active research and discussion in the field of AI and human-computer interaction. Let me know if you have any other questions!
i mean this is why places like character.ai have so many users
Is character AI a subscription?
i mean its like every ai website ever
a bit free, at cheapish price
too slow/overloaded = buy our sub for better roleplay!! faster speed!!
I thought it was a place where people shared models, like LoRAs and such. I know about AI dating sites getting big, I just did not really know what Character AI was specifically
character ai is like a consumer frontend for idk prompts that imitate certain characters speaking
like from tv shows and anime or whatever
and idk
i dont really use it
I guess I know where I’m spending my money now 🫠
Perplexity makes it sound so legit, lol: https://www.perplexity.ai/search/What-is-character-u9XxIg8oTkedAm94XcQe.w
Character AI is an advanced AI chatbot web application that leverages large language models (LLMs) and deep machine learning to generate human-like text responses and facilitate engaging conversations. Developed by former Google AI developers Noam Shazeer and Daniel De Freitas, Character AI was released in beta form in September 2022 and has sin...
lets ask whatever my thing is called
the folder is called kittyagent but thats just because my head is full of cats
🔍 [13.23s] Found 16 chunks from search queries - 7719 chars / 1446 tokens
Based on the information provided in the context, here's what I can tell you about Character AI:
Character AI is an AI-powered chatbot service that allows users to engage in conversations with unique, lifelike characters. Unlike traditional chatbots, Character AI employs advanced natural language processing models to generate contextually appropriate responses, giving the interactions a more human-like feel.
The key features of Character AI include:
-
Diverse Characters: Character AI offers a wide range of characters, from fictional beings to historical figures and even celebrities. Users can interact with these characters and explore their distinct personalities, backstories, and behaviors.
-
Creative Storytelling: Character AI can be used as a tool for creative writing and worldbuilding. Users can design their own characters and have them participate in interactive narratives, allowing for more engaging and dynamic storytelling.
-
Educational Value: The platform can serve as an educational resource, as users can learn new information and perspectives by conversing with the diverse characters.
-
Entertainment and Enjoyment: Many users, especially younger audiences, find Character AI entertaining and enjoyable, as they can have fun, imaginative conversations with the characters.
While I don't have specific details about the inner workings of Character AI or its development, the context provided suggests that it is an innovative and advanced AI chatbot service that has gained popularity among users of various ages and interests due to its unique approach to interactive character-based experiences.
Yeah, character ai is... It's the text equivalent of those late night commercials for you to call and talk to "single women"
For only $5 a minute
lmao
I am bored of c.ai
It's so bad
Like genuinely
The gemini flash model
Is better than c.ai
💯 LOL
Better at what? 🤔
idk what response is better
the pplx one or this one
theyre both like not getting the main point
Just better at creating character and roleplaying
Hell even llama 3 is better
There are a lot of sites like character ai... One infamous one was called Sakura and their mods were sharing chat logs they found funny on their website in their private discord channels
Yeah, basically all models are trained to not talk about that subject, so not surprising
kittyagent % modal run --interactive -q kittyagent.mini_agent_plus --query "price comparison - claude 3 models, gpt 4 turbo, gpt4o, gemini 1.5 pro and flash - use unit million tokens and give it in a table"
lets try this
What does the interactive flag do?
thats all on the modal cli
-q = it just removes their weird spinner progress bar
-i = makes it so i can use input()
🔍 [1.26s] Turned query into 2 searches
🔎 [0.99s] Searched for price per million tokens claude 3 vs gpt 4 turbo vs gpt4o vs gemini 1.5 pro vs flash - got 14 links, 0 snippets
🔎 [1.65s] Searched for price comparison claude 3 models gpt 4 turbo gpt4o gemini 1.5 pro flash - got 18 links, 0 snippets
well thats rubbish...
might need a better prompt to gemma 7b
The hell is kittyagent?
.
Oh
its my little perplexity clone that ive been working on for fun
well less a clone and more something similar, thats trying to be better??
And what use does a folder full of cats help with?
Someone really likes cats
The YaRN model was cool when it came out
I think it was like 100k context
Super large for open source
At the time
yeah
rope scaling 128k of 7b llama
i remember running it on llama.cpp and it took years to process the context because of my nonexistent gpu
No clue
I mean it’s just the folder name 😅
lol, yeah, back before good ol' ollama
ollama is just a fancy wrapper really
when you run inference it does call literally the llama.cpp cli
because FFI in go is much worse to figure out than doing that apparently
Yeah I know, but you use to have to know how to do it, now it’s basically a one click install
And I had no idea what I was doing, lol. I messed up the setup so many times
is there a way to load a torch model into gpu really fast
i’m using torch.load but even that takes like 3-10 seconds (literally random) on a what like 130M parameter model
seems kinda too slow really
Yeah, it is slow
do you think people would prefer an ai agent that has a ton of detail over speed
realistically what if i tried to turn this into a kind of agent structure
instead of gemma7b, the first model is actually haiku with your query
which then splits it into the required tasks and tools
then that’s allocated to like a few web searches or maybe other function calls each
then they share the same embedding gpu instance
then it comes out into haiku again
Kinda like the new copilot but i could see it integrating other things too
weather api
spotify api
whatever api
it would be like a more real version of maybe like what the rabbit promises to do lol
well something like that with speed would be even better, not sure how i could make this whole process much faster though
It depends on the target audience. A random user who just wants to ask a quick question, like a random Google search? Then maybe not. But I personally would like it; if Open Interpreter were more reliable/cheaper, I would use it more often
That’s what i’m thinking too - it would depend on the user
maybe the user themselves can pick
the thing i thought of earlier
like levels to it
“quick search”, “deep dive”
I don’t think trying to make an auto mode would suit everyone really
Yeah, that would be a good choice
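Something like this is how I'd picture the levels idea, as a rough sketch. The two mode names come from the chat; every number and field name here is made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchMode:
    name: str
    max_searches: int        # how many web searches to fan out (made-up value)
    max_context_tokens: int  # context sent to the final model (made-up value)
    use_planner: bool        # whether to run the task-splitting step first


MODES = {
    "quick": SearchMode("quick search", max_searches=1,
                        max_context_tokens=3_000, use_planner=False),
    "deep": SearchMode("deep dive", max_searches=6,
                       max_context_tokens=20_000, use_planner=True),
}


def pick_mode(user_choice):
    """Return the preset the user asked for, falling back to quick search."""
    return MODES.get(user_choice.strip().lower(), MODES["quick"])


print(pick_mode("deep").name)
```

Letting the user pick keeps the pipeline the same and only changes the knobs, so there's no auto mode to get wrong.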
It would be pretty cool if i could ask the thing to be like
find out a list of the 50 most popular artists by spotify album sales and also by physical album sales and then queue me a few of those songs on spotify
well that’s the kind of query i can imagine off my head
the real uses would come along when you actually need it i guess
perplexity’s “plan a trip to japan” default query on the mobile app could be taken way way deeper
imagine if you gave it a month range and it got the best week in that month based on the weather, then looked at flight prices, all while also looking at the locations to go to
Yeah, that would be cool. The types of searches I tried on OI were ones where the next step needed information from the first step, like find out the top companies in VGT, then find out the names of the founders, then find out which are still alive, and then tell me about those people
But having further uses would really make it good
Hmm
Yeah, there is a lot you could do
Fine tuning would kind of be a big part of that
I don’t think just a prompt could do that
either you’d need an expensive LLM like 4o or opus
Or maybe you could fine tune like a 7b?
Yeah, maybe a fine tune of llama 3 8b, or phi 3
There’s a lot you could do with just something like these apis:
weather, spotify, flights, document upload, (cloud) code interpreter
you could add in email sending
and then it would kind of just rely on a really intelligent first analysis of the query
after which you can have mini search agents and whatnot
including like which steps depend on each other
I know system prompts get long and the model forgets stuff. Maybe a directory, where the model knows which category to select, and the system prompts with descriptions of how to use each tool get populated into the context, idk maybe stupid lol 🤷♂️
But with small models I feel I can only teach it one or two things
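Not stupid imo. A minimal sketch of that directory idea, assuming some earlier step has already picked the categories: only the selected tool descriptions get pasted into the system prompt. The tool names and category names are hypothetical; the descriptions are loosely based on the worker list in this chat.

```python
# Hypothetical tool directory: category -> {tool name: description}.
TOOL_DIRECTORY = {
    "music": {
        "spotify_play": "Play a song given a Spotify track ID, or a name and artist.",
    },
    "travel": {
        "airport_code": "Look up an airport code from a rough name.",
        "flight_price": "Get the price of a flight given origin, destination and day.",
    },
    "weather": {
        "get_weather": "Get the weather at one specific location.",
    },
    "web": {
        "web_search": "Search the web and read websites.",
    },
}


def build_system_prompt(categories):
    """Build a system prompt that only lists tools from the chosen categories."""
    lines = ["You can use these tools:"]
    for category in categories:
        for name, description in TOOL_DIRECTORY.get(category, {}).items():
            lines.append(f"- {name}: {description}")
    return "\n".join(lines)


print(build_system_prompt(["music", "web"]))
```

The point is that a small model only ever sees a couple of tools at a time instead of the whole list.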
What isn’t working is giving gpt4o or opus all the functions and just praying it works
i don’t think that will ever be the way to go
that’s probably what like that ai tool devin does really
which is why it’s so slow?
22 minutes to do one coding task?
Yeah, probably
or whatever it was
Kinda funny, back when Autogen first came out I could not figure out how to use the Pplx API as an agent, so I wrote a skill for the agent to talk to it
To fulfill the given query, we can break it down into the following steps:
-
Retrieve the list of the 50 most popular artists by Spotify album sales:
- This step involves searching the web and reading websites to find the data on the 50 most popular artists by Spotify album sales.
- This step is independent and can be done first.
-
Retrieve the list of the 50 most popular artists by physical album sales:
- This step also involves searching the web and reading websites to find the data on the 50 most popular artists by physical album sales.
- This step is independent and can be done in parallel with the previous step.
-
Queue a few songs from the identified popular artists on Spotify:
- This step requires the lists of popular artists from the previous two steps.
- For each artist, this step involves playing a song given a specific Spotify track ID or the name and artist.
- This step is dependent on the completion of the previous two steps.
The steps can be summarized as follows:
- Retrieve the list of the 50 most popular artists by Spotify album sales.
- Retrieve the list of the 50 most popular artists by physical album sales.
- Queue a few songs from the identified popular artists on Spotify.
The steps 1 and 2 are independent and can be executed in parallel, while step 3 is dependent on the completion of steps 1 and 2.
The available workers can be utilized as follows:
- The "Searching the web and reading websites" worker can be used to complete steps 1 and 2.
- The "Playing a song given a specific Spotify track ID, or the name and artist" worker can be used to complete step 3.
- The "Getting the weather at one specific location" worker is not directly relevant to the given query and cannot be utilized in this case.
query was
Given the users query and available workers, break it down into steps and also mention which steps are dependent on each other
Query:
find out a list of the 50 most popular artists by spotify album sales and also by physical album sales and then queue me a few of those songs on spotify
Available workers:
Searching the web and reading websites
Playing a song given a specific spotify track ID, or the name and artist
Getting the weather at one specific location
hmm
Maybe it is possible ?
Lol, everything else was food tho
This is haiku
food?
or good
{
"steps": [
{
"id": 1,
"name": "Retrieve Spotify album sales",
"description": "Retrieve the list of the 50 most popular artists by Spotify album sales",
"dependencies": [],
"workers": [
"Searching the web and reading websites"
]
},
{
"id": 2,
"name": "Retrieve physical album sales",
"description": "Retrieve the list of the 50 most popular artists by physical album sales",
"dependencies": [],
"workers": [
"Searching the web and reading websites"
]
},
{
"id": 3,
"name": "Queue songs on Spotify",
"description": "Queue a few songs from the identified popular artists on Spotify",
"dependencies": [
1,
2
],
"workers": [
"Playing a song given a specific Spotify track ID, or the name and artist"
]
}
]
}
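With the dependencies in that shape, working out what can run in parallel is basically a topological sort. A minimal sketch with Python's standard `graphlib` (3.9+), using a trimmed copy of the plan above; `execution_waves` is a name I made up, and this only orders the steps, it doesn't run any workers.

```python
from graphlib import CycleError, TopologicalSorter

# Trimmed copy of the plan above (only the fields needed for ordering).
plan = {"steps": [
    {"id": 1, "name": "Retrieve Spotify album sales", "dependencies": []},
    {"id": 2, "name": "Retrieve physical album sales", "dependencies": []},
    {"id": 3, "name": "Queue songs on Spotify", "dependencies": [1, 2]},
]}


def execution_waves(plan):
    """Group step ids into waves; steps in the same wave can run in parallel."""
    graph = {step["id"]: set(step["dependencies"]) for step in plan["steps"]}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(plan))  # [[1, 2], [3]]
```

Each wave could then be handed to whatever runs the workers concurrently, and a circular plan from the model fails loudly instead of hanging.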
well this is just me asking on my phone
seems interesting though
Yeah, there is promise
let’s try the flights one
For specific tasks, finetunes or multi shot reasoning is the way.
Yeah but multishot is good until you have many abilities, how would you manage it after a certain point?
You just fill the context with multiple examples of how to do the task.
More like how do you manage when you have 100 abilities each with 3 examples.
Bc I think we are talking about a general purpose agent
Given the users query and available workers, break it down into steps and also mention which steps are dependent on each other
Query:
plan me a trip to japan somewhere in the month of may 2024
Available workers:
*Searching the web for something, and reading websites and returning relevant information to some keywords
*Playing a song given a specific spotify track ID, or the name and artist
*Getting the weather at one specific location
*Airport code search (enter a rough name and it gives the airport code)
*Retrieve the price for a flight, given the take off and landing destinations and the day
Let’s try this
this list of abilities is getting longer as we go lol
which is expected
Fine-tuning
why can’t you upload as file on mobile
well this is in natural language as again i’m on my phone
we would have to make it more rigid as to specify it all in actual json
maybe give the list of search queries
and what inputs go to what next agents
Yep, or a schema of the tools etc
@sleek vortex did you start that lil project?
Yep, I think it's been around a week since they started.
started what?
I think it's like perplexity, where it uses sources to answer queries.
interesting
ive seen alot of similar projects recently
its hard to find that new value added
Yep, probably speed, accuracy and number of sources are the main vectors of progress available.
hh ┬─┬ノ( º _ ºノ)
I'm curious, I clicked share link as I was planning to share with myself via email a thread I was about a week deep in. There's an eye icon (web version) right next to my name on the top left of the interface, with the number "2" next to it seemingly indicating two people have viewed it. Maybe I'm wrong? Either way, I never shared the link, nor opened it so I'm slightly concerned if someone else has viewed it.. and also that there doesn't seem to be a way to change it back to private other than to put it in a "collection". Can anyone shed some light on this?
you can coerce json-only outputs with prompting, but increasingly models have json mode as a built-in option - it's definitely superior / preferable (like one malformed JSON can break the whole flow..)
the newer openai, anthropic, google and mistral models all have json mode i believe. But yeah aside from Mistral, none are OS (not sure about llama3 - I don't think it supports json)
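For models without a JSON mode, one defensive option is to pull the first valid JSON object out of the reply and treat anything else as a failure you can retry. Sketch only, using the standard `json` module; `parse_json_reply` is a hypothetical helper, not part of any provider's SDK.

```python
import json


def parse_json_reply(reply):
    """Return the first JSON object found in a model reply, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(reply):
        if char != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores any trailing text after it.
            obj, _ = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    return None


reply = 'Sure! Here is the plan: {"steps": []} Hope that helps.'
print(parse_json_reply(reply))
```

Returning `None` instead of raising lets the caller decide whether to retry the model or bail, so one malformed reply doesn't take the whole flow down.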
once you press share link, i think it becomes possible for a search engine's crawler to pick it up (especially if the link is shared in a tweet or something like that, though conceivably that isn't even necessary).. https://search.yahoo.com/search?p=site%3Aperplexity.ai%2Fsearch%2F
Just joined pro! Love this application has been using the free version almost daily for almost two months
Interesting, I would have thought you'd have to actually share the link in order for it to be searchable. I feel like there could be some serious privacy concerns if simply pressing share, especially if it's just to send it to yourself on another device of yours, means it's now searchable by the public
yeah agree, but would also note that 2,730, I assume, is only a very small fraction of all the threads ever made shareable by perplexity users - a lot of the top results are from threads from the perplexity team, posted on twitter. Though not all are. I'm not sure why a handful are getting indexed and thus effectively searchable without having the link
list top 10 glass bottle supplier in Europe
Good point about the small number, but there should be a 0% chance that shared links, especially ones sent to yourself and containing private information, get crawled and become publicly searchable at all. That seems like a pretty big oversight to be honest, not to mention the possible legal violations. If it's true that shared threads get crawled and are searchable by the public, that should probably be confirmed and looked into a little deeper by the legal team, because that's begging for a lawsuit tbh
As a pro subscriber with "priority help request services", I sent an email asking a much simpler question and never received a response so this was the straw that broke the camel's back for me. Having linked threads be searchable (when asking Perplexity itself it denies this very broadly by saying
"No, Perplexity's threads are private by default and not crawled or indexed by search engines like Google. According to Perplexity's FAQ: "Your Threads are private and only for your reference by default. To make a thread public, click on the top right corner on the share button, and select sharable. Anyone with a link will be able to view your thread." So unless you explicitly share a thread and make it public, your conversations with Perplexity are kept private and not searchable on the open web. Even public threads are only accessible through the shared link, not through general web searches."
But @warm cave your picture proves it so I tried it myself and indeed, there are linked threads from users (can't tell where they're linked to) that are searchable. Canceled the sub after that. No thank you.
Did the chatbot become restrictive?
Now I am getting that it is not able to find an answer from the provided search results.
I didn't provide any links in the query. I feel like some change happened
I used Claude Opus. Always I get best result in Opus. Not sure from today what happened
I did some absolutely non statistically relevant tests on perplexity, and I found that free chatgpt has become a bit better?
I was using perplexity with pro active
it's a new model, called GPT-4o
that's why chatgpt feels better
it's basically a GPT4 level model but free (10 messages per 3-4 hours)
Oh, thank you
ive never run out of gpt4o usage on chatgpt tbh
it feels like 50 msg per hour or so
80 per 3 hours
are you using chatgpt plus ?
yea
it feels more tbh
that's why ig
free version is pretty limited
the limit for plus is "five times higher than free"
the free ver is 10msgs per 4h?
i didn’t start the pplx labs sidebar thing
but i’m invested in this little perplexity clone where i’m trying to adopt agent structure, instead lol
im doing some progress on it, but idk if im doing it right tbh, how would you "save convo" and then continue from where you left?
like im doing this custom prompt but then it depends on the model context size
oh
right now i’ve got responses as good as perplexity copilot v1 in like 30-40 seconds response time using free gpu for embeddings and free claude haiku credits
no way lol
we’d have to somehow reverse engineer the state right??
because we want to load the convo into the UI
it’s only so slow because of cold start gpu embeddings 😦
im just using indexedDB to store the convo + inject prompt ( User: Message1; AI: Message2 ) and continue from that
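Since it depends on the model context size, one way to handle that is to keep only the most recent turns that fit a budget when injecting the history. This is just the trimming logic sketched in Python (the storage side in the chat is indexedDB in the browser). It counts characters as a crude stand-in for tokens, and `build_prompt` and the default budget are made up.

```python
def build_prompt(history, new_message, max_chars=8000):
    """Build 'User: ... / AI: ...' text from the newest turns that fit.

    history: list of (role, text) tuples, oldest first.
    max_chars: rough budget; characters are a crude proxy for tokens.
    """
    turns = [f"{role}: {text}" for role, text in history]
    tail = f"User: {new_message}\nAI:"
    budget = max_chars - len(tail)
    kept = []
    # Walk backwards so the most recent turns are kept first.
    for turn in reversed(turns):
        cost = len(turn) + 1  # +1 for the newline joining the turns
        if cost > budget:
            break
        kept.append(turn)
        budget -= cost
    return "\n".join(list(reversed(kept)) + [tail])


history = [("User", "Message1"), ("AI", "Message2")]
print(build_prompt(history, "Message3"))
```

The full history can still live in storage for the UI; only what gets sent to the model is trimmed.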
but whatever
yea 30sec is a bit...
yeah but idk it would be nice if we could show the messages in the actual ui
if i leave the gpu on then it goes down to like 20 seconds
but this is with only one pipeline
i’m yet to add agent like splitting
which might make it longer but idk what i think will happen is it will barely get longer but quality should improve a ton
im not showing it
gimme a sec
this is how it looks like so far
ive hidden the original textarea
its quite similar
like is it the same as perplexity or it has some new features?
.
also been working on getting code interpreter working so
similar to chatgpt but better in that there should be internet access
and maybe more languages other than python
oh this is actually a good idea
if lots of people like it i might make it a real website lol
i could ask modal for some credits as a startup
they do like “up to $25k in credits, just ask us!!”
which is promising?
literally right now been running off the free $30 credits so
under what conditions?
a promising project
startups or educational institutions
oh i see
yeah id assume
and even then i have it optimised enough to use like $0.0001 per query on embeddings
that’s embedding like 300-500 512 character chunks so
not bad
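For reference, fixed-size character chunking like that is only a few lines. A rough sketch: the 512 default matches the chat, the optional overlap is my own addition, and `chunk_text` is a hypothetical name rather than the actual kittyagent code.

```python
def chunk_text(text, size=512, overlap=0):
    """Split text into chunks of at most `size` characters.

    overlap: number of characters shared between neighbouring chunks.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


page = "x" * 1200
print([len(chunk) for chunk in chunk_text(page)])  # [512, 512, 176]
```

At 300-500 chunks per query, the number of pages scraped is what really drives the embedding time.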
i could improve performance with a bigger model and/or bigger gpu, but at the expense of speed and cost
models using groq and haiku
i could go as far as to run the llm on modal itself if i need to
like if i fine tuned a 8b as my decision model/ worker model instead of using groq
yes
really can’t do much with 10 messages ngl
yea its good
if you are already getting outputs similar to perp with only that
then its a good start tbh
i beat googles ai search in misinformation tests so um
on the other hand, it's better than not having access at all (as was the case for a long time)
lmao
google getting another hit
mine said this to the eating rocks query
google well…
not bad
oh and i’m only now feeding like 3k of context to haiku at the end
google's ai overview is half baked
which is extremely cheap
but it’s between sending the whole sources, 10-60k tokens (response time 12-15s), and just sending a refined 3k (30-40s with a cold gpu)
i think refined is still the way to go
quality of response is the same or even better
when sending the huge amount of tokens i think haiku sometimes got confused?
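The refined route is basically "rank chunks against the query, keep the best ones until the budget is full". A minimal pure-Python sketch with cosine similarity: the 12,000 character budget assumes roughly 4 characters per token for about 3k tokens, and the vectors in the example are toy values, not real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_chunks(query_vec, chunks, budget_chars=12_000):
    """Pick the best-matching chunk texts that fit inside the budget.

    chunks: list of (text, vector) pairs.
    """
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    picked, used = [], 0
    for text, _ in ranked:
        if used + len(text) > budget_chars:
            continue  # too big for what's left, try the next-best chunk
        picked.append(text)
        used += len(text)
    return picked


toy_chunks = [("cats", [1.0, 0.0]), ("dogs", [0.0, 1.0]), ("kittens", [0.9, 0.1])]
print(select_chunks([1.0, 0.0], toy_chunks, budget_chars=11))  # ['cats', 'kittens']
```

Fewer, more relevant chunks would also fit with the model seeming less confused than when it gets the whole 10-60k.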
idk they’re building their own indexer infra right
btw, have you considered DeepseekV2?
i could try do that on an absolute budget
i thought about that
how i could index the most popular sites and embed them beforehand
you mean, filter out the sites to the reliable ones ?