#💬│general


sleek vortex
#

before it got like none of them right

#

this is a good test query to test lots of parts of the system

#

ive changed my searxng instance to have way better and updated source rankings

#

now i need to see why its reporting the wrong info for 1.5 pro

#
```
🔗 [SearchAgent] [0.00s] Picked 6 links total
🚀 [SearchAgent] [1.98s] Finished pulling 3 sources - 8874 max chars/source
    * 🔗 https://ai.google.dev/pricing - 3699 chars
    * 🔗 https://www.cnet.com/tech/services-and-software/google-gemini-pricing-1-5-pro-and-1-5-flash-compared/ - 2526 chars
    * 🔗 https://beebom.com/how-use-gemini-1-5-flash/ - 7446 chars
```
#

hmm

#

this has the info, just not organised well?

devout geyser
#

you have got me curious, so I'm trying a similar query with my own project just to see how it handles it. I don't expect it to work as I never tested this aspect 😛

#

well mine missed some models unfortunately (such as GPT-4o). it did a lot of search queries though, what's in the table is somewhat correct, a few anomalies from what I can see

I'll try your exact query instead and see what that results in

#

if you're wondering what my current result is with my personal project 😂, as you can see, it's got a few problems in the result

sleek vortex
#

im getting really annoyed at this single query

devout geyser
#

just that alone?

sleek vortex
#

since my one can never seem to get it perplexity-like correct

#

yeah just that

#
```
* gemini 1.5 pro price in million tokens
* gemini 1.5 pro price per mtok
```
devout geyser
#

trying both variants now

sleek vortex
#

when i did mtok

#

it thought mtok was a crypto

#

....

#

and when i did the first one half the time it doesnt take in the information from the main site

devout geyser
#

🤣

sleek vortex
#

ai.google.dev/pricing

#

half the time it gives me vertex ai pricing

#

this is so infuriating

devout geyser
#

should know what mtok says in a moment

#

for mtok it said:

The pricing for Gemini 1.5 Pro is as follows:

- For input prompts up to 128K tokens, the cost is $3.50 per 1 million tokens.
- For input prompts longer than 128K tokens, the cost is $7.00 per 1 million tokens.
- For output, the cost is $10.50 per 1 million tokens for prompts up to 128K tokens and $21.00 per 1 million tokens for prompts longer than 128K tokens.
sleek vortex
#

asked it for the comparison and i got this

#

come on omfg

#

why has it given so many variants

devout geyser
#

lol

sleek vortex
#

trying once more...

devout geyser
#

trying the million one now

sleek vortex
#

so close yet SO FAR

#

why is it only this stupid gemini one thats only right like 10% of the time

devout geyser
#

well million said this:

The pricing for Gemini 1.5 Pro in terms of million tokens is as follows:

- **Input Tokens:**
  - $3.50 per million tokens for prompts up to 128K tokens.
  - $7.00 per million tokens for prompts longer than 128K tokens.

- **Output Tokens:**
  - $10.50 per million tokens for prompts up to 128K tokens.
  - $21.00 per million tokens for prompts longer than 128K tokens.

This means the cost will vary depending on the length and complexity of the input and output text.
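
As a rough worked example of the tiered rates quoted above: this is only a sketch, the function name is made up for illustration, and reading "128K" as 128,000 tokens is an assumption rather than something the quoted answer confirms.

```python
# Rates quoted in the chat for Gemini 1.5 Pro, in USD per 1 million tokens.
TIER_LIMIT = 128_000  # assumption: "128K" read as 128,000 tokens

RATES = {
    "short": {"input": 3.50, "output": 10.50},  # prompts up to 128K tokens
    "long": {"input": 7.00, "output": 21.00},   # prompts longer than 128K tokens
}


def gemini_15_pro_cost(prompt_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request; the tier depends on prompt length."""
    tier = "short" if prompt_tokens <= TIER_LIMIT else "long"
    rate = RATES[tier]
    return (prompt_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000


print(round(gemini_15_pro_cost(100_000, 1_000), 4))  # 0.3605
print(round(gemini_15_pro_cost(200_000, 1_000), 4))  # 1.421
```

The output rate follows the prompt tier, so a long prompt doubles the price of the reply as well as the input.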
sleek vortex
#

yeah spot on

#

but again im budget limited on context

#

which might be the main issue

#

see the issue is it also doesnt know when it's wrong

#

so my feedback loop idea from before didnt really work...

devout geyser
#

yeah probably, I'm using Gemini 1.5 Flash to extract relevant content from the webpages scraped, to reduce wasted input context on GPT-4o. This however does slow things down a bit, although I get an accurate answer at the end of the day so it's worth waiting about half a minute 🙂

sleek vortex
#

right now using llama3-8b-8192 for the per-intent source handling

devout geyser
#

I see

sleek vortex
#

which works like

#

80% of the time

#

but some queries like this

#

it decides to not

devout geyser
#

yeah I had a similar problem with some models, where it would either make something up or mix up details on the webpage

sleek vortex
#

its mainly because it keeps thinking this vertex page is the main source

#
```
    * 🔗 https://ai.google.dev/pricing - 3699 chars
    * 🔗 https://cloud.google.com/vertex-ai/generative-ai/pricing - 11353 chars
    * 🔗 https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/ - 11572 chars
```
devout geyser
#

I see

sleek vortex
#

Here is a detailed summary of the pricing information:

**Gemini Models Pricing**

* Model          | Price (input) | Price (output) | Notes
* Gemini 1.5 Flash | $0.0001315 / image | $0.000125 / 1k characters | 
* Gemini 1.5 Pro    | $0.001315 / image | $0.00125 / 1k characters |
* Gemini 1.0 Pro   | $0.0025 / image | $0.000375 / 1k characters | 

**Context Caching for Gemini**

* Model          | Price (input) | Price (output) | 
* Gemini 1.5 Pro | $0.0006575 / image | $0.000625 / 1k characters | 

**Imagen Pricing**

* Model          | Price (input) | Price (output) | 
* Imagen         | $0.020 per image | $0.003 per image | 

**Multimodal Embeddings Pricing**

* Model          | Price (input) | Price (output) | 
* Multimodal Embeddings | $0.0002 / 1k characters | No charge for output | 

**PaLM 2 for Text Pricing**

* Model          | Price (input) | Price (output) | 
* PaLM 2 for Text  | $0.00025 per 1,000 characters | $0.0005 per 1,000 characters | 

**Partner Models Pricing**

* Model          | Pricing | 
* Claude 3 Opus   | Input: $15 / million tokens Output: $75 / million tokens |
* Claude 3 Sonnet | Input: $3 / million tokens Output: $15 / million tokens |
* Claude 3 Haiku | Input: $0.25 / million tokens Output: $1.25 / million tokens |

**Gemini 1.5 Pricing**

* Model          | Price (input) | Price (output) | 
* Gemini 1.5 Pro | varies depending on context window size (starts at $0.0006575 / image) | varies depending on context window size |

It's important to note that the prices listed are subject to change and may vary depending on the specific use case and context window size. Additionally, there may be additional costs associated with using these models, such as latency and computational requirements.

It's also important to note that the prices listed are in USD and that prices may vary depending on the user's location. 
#

so thats what this "worker" kinda outputted

#

which might be right but its in terms of images??? and idfk what

devout geyser
#

😂

sleek vortex
#

well just not what im looking for

devout geyser
#

yeah understandable

sleek vortex
#

now theres only so many options i have here

#

i could use haiku instead of 8b

#

but that would increase the price per query quite a bit

#

lemme run that calculation

#

right now average

(0.59/1000000)*1500 + (0.05/1000000)*350 # groq/llama-3-70b router
+ ((0.05/1000000)*6656 + (0.05/1000000)*768)*4 # ~4x groq/llama-3-8b search summarisers
+ (0.25/1000000)*1750 + (1.25/1000000)*400 # claude-3-haiku final
#

and with haiku

(0.59/1000000)*1500 + (0.05/1000000)*350 # groq/llama-3-70b router
+ ((0.25/1000000)*6656 + (1.25/1000000)*768)*4 # ~4x claude-3-haiku search summarisers
+ (0.25/1000000)*1750 + (1.25/1000000)*400 # claude-3-haiku final
#

$0.012336 per query

#

which is more than double
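
The two estimates above can be reproduced with a small script. This is a sketch only: the helper names are invented here, while the prices and token counts are copied as-is from the arithmetic in the chat (including the 0.05 output price used for the llama models).

```python
# Prices are USD per 1 million tokens, exactly as used in the chat's arithmetic.
PER_MILLION = 1_000_000


def call_cost(in_price, out_price, in_tokens, out_tokens):
    """Cost in USD of a single model call."""
    return (in_price * in_tokens + out_price * out_tokens) / PER_MILLION


def query_cost(summariser_in_price, summariser_out_price, n_summarisers=4):
    """Router call + N source summarisers + final answer call."""
    router = call_cost(0.59, 0.05, 1500, 350)  # llama-3-70b router
    summarisers = n_summarisers * call_cost(
        summariser_in_price, summariser_out_price, 6656, 768
    )
    final = call_cost(0.25, 1.25, 1750, 400)  # claude-3-haiku final answer
    return router + summarisers + final


llama_8b = query_cost(0.05, 0.05)  # llama-3-8b summarisers
haiku = query_cost(0.25, 1.25)     # claude-3-haiku summarisers
print(f"llama-3-8b summarisers:     ${llama_8b:.7f}")
print(f"claude-3-haiku summarisers: ${haiku:.6f}")
print(f"ratio: {haiku / llama_8b:.2f}x")
```

With these inputs the haiku variant comes out at the $0.012336 quoted above, and the llama-3-8b variant at roughly $0.0033, so the gap is closer to 3.7x than 2x.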

#

😦

devout geyser
#

ah so I see. I'd like to drop gemini for something else primarily because of the moderation sometimes falsely flagging, but I've yet to find a viable alternative. I'll have to try tweaking the prompt I use for extracting the relevant content with another model and see if I can get it to eventually give me similar results on some specific tests

#

ideally the model needs to be fast

sleek vortex
#

i mean i could use gemini flash but again im trying to minmax costs

#

so that i could maybe turn this into a real service lol

devout geyser
#

yeah of course, that's understandable

sleek vortex
#

well the question here might be

#

whats actually at fault

#

whats wrong in searchworker?

  • websearch rankings
  • the scraping output
  • the 8b model
devout geyser
#

a mystery to solve

sleek vortex
#
```
    * 🔗 https://ai.google.dev/pricing - 3699 chars
    * 🔗 https://cloud.google.com/vertex-ai/generative-ai/pricing - 11353 chars
    * 🔗 https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/ - 11572 chars
⏰ [SearchAgent] [2.60s] Model response time (llama8b)
```
#

so its shipped the context of the main source which is ai.google.dev/pricing

#

but the thing is my extraction for that page also isnt the best due to google's rubbish layout

devout geyser
#

yeah that won't help for sure

sleek vortex
#

can you show me what sources yours used

#

to answer it

devout geyser
#

sure let me see

sleek vortex
#

what im trying to do is like a needle in my head

#

few accurate sources per element of the query

#

what you and pplx are doing is more like search and verification by the masses?

devout geyser
#

at least on the first one

sleek vortex
#

hmm

#

do you have a debugger to see how it was extracted?

#

curious to see the differences in our impl

devout geyser
#

yeah I do have such output, let me see what it did for that specific page

sleek vortex
#

idk there really isnt any other good models

#

deepseek v2???

devout geyser
#
<webpage url="https://ai.google.dev/pricing" search_query="gemini 1.5 pro price in tokens" publication_date="">
<result>
The Gemini 1.5 Pro model is priced at $3.50 per million tokens for input prompts up to 128K tokens and $7.00 per million tokens for input prompts longer than 128K tokens. For output, the price is $10.50 per million tokens for prompts up to 128K tokens and $21.00 per million tokens for prompts longer than 128K tokens.
</result>
</webpage>

<webpage url="https://ai.google.dev/pricing" search_query="gemini 1.5 pro price in millions" publication_date="">
<result>
Gemini 1.5 Pro is currently in preview and is free of charge. Starting May 30, 2024, it will be priced on a pay-as-you-go basis. The pricing for Gemini 1.5 Pro is as follows:

* **Input:** $3.50 / 1 million tokens (for prompts up to 128K tokens) and $7.00 / 1 million tokens (for prompts longer than 128K tokens).
* **Output:** $10.50 / 1 million tokens (for prompts up to 128K tokens) and $21.00 / 1 million tokens (for prompts longer than 128K tokens).
* **Context Caching:** $1.75 / 1 million tokens (for prompts up to 128K tokens) and $3.50 / 1 million tokens (for prompts longer than 128K tokens).
* **Storage:** $4.50 / 1 million tokens per hour.

The pricing is in USD.
</result>
</webpage>

There are a few more but I think two of them are adequate examples. What it extracts as 'relevant content' depends solely on the search query used basically.
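
For anyone wanting to produce the same kind of wrapper, here is a minimal sketch. The function name is hypothetical, and whether the original project escapes the text this way is not known; the only thing taken from the chat is the tag and attribute layout.

```python
from xml.sax.saxutils import escape, quoteattr


def wrap_result(url, search_query, extracted_text, publication_date=""):
    """Wrap one page's extracted content in the <webpage>/<result> layout shown above."""
    return (
        f"<webpage url={quoteattr(url)} search_query={quoteattr(search_query)} "
        f"publication_date={quoteattr(publication_date)}>\n"
        f"<result>\n{escape(extracted_text)}\n</result>\n"
        f"</webpage>"
    )


block = wrap_result(
    "https://ai.google.dev/pricing",
    "gemini 1.5 pro price in tokens",
    "Input: $3.50 per million tokens for prompts up to 128K tokens.",
)
print(block)
```

Keeping the search query in the wrapper lets the final model see which query each extract was answering.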

sleek vortex
#

are you using scraping like me or a headless web browser?

#

like do you know what text content came extracted out of that page

devout geyser
#

requests in Python, so not a headless browser

sleek vortex
#

thats what im curious about

devout geyser
# sleek vortex ?

to answer that question, I don't explicitly output that in the console but I can certainly add that output

sleek vortex
sleek vortex
#

just would like to see :d

devout geyser
#

sure, one moment

#

I think I'll save the output to file instead, as it will be messy to find it in the console

sleek vortex
#

avg ~$0.00642752 per query with deepseek-v2-32k (in theory)
vs
avg ~$0.0034784 per query with llama3-8b-8192 (in theory)

#

interesting

cinder comet
#

Is it available yet through api?

#

It seems slow on lmsys

sleek vortex
#

on a waitlist...

devout geyser
sleek vortex
#

ok well in the end nobody is beating the price of groq

tame current
sleek vortex
#

Hmm, your parsing is very similar to mine

devout geyser
#

That's what I send to Gemini 1.5 Flash to extract relevant content from that

sleek vortex
#

so it must be down to a case of the model just being confused due to the extra sources and not being able to pick up on the first source

#

man

#

idek how to fix this without changing model

#

changing model isn’t really an option i’ll be entirely honest

devout geyser
#

possibly a combination of a good prompt and right temperature/top_p, but if the model just isn't good enough to do it then rip 😦

sleek vortex
#

WAIT you’re right i haven’t set temperature

#

WAAAIT

devout geyser
#

so far the only model I've had actual success with is flash, haiku worked but it wasn't as detailed with the docker compose release notes

sleek vortex
#

you’re onto something

#

did i forget temperature setting

#

am checking...

devout geyser
#

I found temperature alone didn't work for me, I had to also adjust top_p to stop Flash randomly getting the docker compose release notes "test" wrong every so often

sleek vortex
#

I’ve never really played with top_p

#

yeah… i don’t think i’ve set a temperature

#

what’s the default

#

1.0???

devout geyser
#

nope, mine is 0.3 actually

sleek vortex
#

but what’s the default temperature for groq api

devout geyser
#

not sure on that one

agile jay
devout geyser
#

I tried asking about that in my personal project, it didn't find the answer

sleek vortex
#

what’s your top_p set to?

sleek vortex
devout geyser
#

0.9

sleek vortex
#

nowhere on the groq docs

agile jay
agile jay
#

1 I would imagine

devout geyser
#

yeah I'd probably say it's 1 by default, most APIs usually go with 1 by default

sleek vortex
# agile jay

i know what this means due to that 3blue1brown video

#

lol

#

have you seen it?

#

the ones on transformers

agile jay
#

Yep, good for removing unlikely tokens.

sleek vortex
#

oh my god

#

it worked???

agile jay
#

Looks like the default is 1, so all probabilities are considered.

agile jay
sleek vortex
#

just the temperature

sleek vortex
agile jay
#

Lol, to make it more deterministic?

sleek vortex
#

oh my god youre telling me i had such good results already with 1 temp

#

?????

devout geyser
#

setting the right parameters helps

sleek vortex
#

silliest oversight from me but

agile jay
#

Lowering it should reduce hallucinations.

sleek vortex
#

yeah

#

worked well

agile jay
#

And lowering the Top P should help too.
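
To make the temperature and top_p effect concrete, here is a toy sketch of how the two settings reshape a next-token distribution. It is not how any particular provider implements sampling, and the tokens and scores are made up for illustration.

```python
import math


def next_token_distribution(logits, temperature=1.0, top_p=1.0):
    """Return the renormalised sampling distribution after temperature and top_p."""
    # Temperature: divide logits before softmax; lower values sharpen the peak.
    # (temperature must be > 0 in this sketch.)
    scaled = [score / temperature for score in logits.values()]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    probs = {tok: w / total for tok, w in zip(logits, weights)}

    # top_p (nucleus): keep the most likely tokens until their mass reaches top_p.
    kept, mass = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        mass += p
        if mass >= top_p:
            break
    return {tok: p / mass for tok, p in kept.items()}


logits = {"$3.50": 2.0, "$0.35": 1.0, "crypto": 0.0}  # made-up scores
print(next_token_distribution(logits, temperature=1.0, top_p=1.0))
print(next_token_distribution(logits, temperature=0.2, top_p=0.9))
```

At temperature 1.0 all three candidates keep a real chance of being sampled; at 0.2 with top_p 0.9 only the top-scoring token survives, which is the "more deterministic" behaviour discussed above.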

sleek vortex
#

yeah

#
```
llama70b = Groq(model="llama3-70b-8192", temperature=0.6, top_p=0.9)
llama8b = Groq(model="llama3-8b-8192", temperature=0.2, top_p=0.9)

claude = Claude(model="claude-3-haiku-20240307",
                api_key="",
                temperature=0.3, top_p=0.9
                )
```
#

ive gone with this configuration

agile jay
#

It's probably set like that since the default is best for general tasks.

sleek vortex
#

70b is the initial model, 8b is the source summarisers, and claude is the final response

devout geyser
#

llama70b is for what? generating the search queries?

sleek vortex
#

(api key redacted)

agile jay
#

Such as writing etc, where randomness is seen as a feature at times.

sleek vortex
agile jay
devout geyser
#

ah ok I see

#

sorry, nearly 1am here so I'm half sleepy 😛

agile jay
sleek vortex
#

yeah

#
```
[
  {
    "intent": "Get the prices of the AI models",
    "steps": [
      {
        "tool": "search",
        "query": "Claude 3 models pricing"
      },
      {
        "tool": "search",
        "query": "GPT-4 Turbo pricing"
      },
      {
        "tool": "search",
        "query": "GPT4O pricing"
      },
      {
        "tool": "search",
        "query": "Gemini 1.5 Pro+Flash pricing"
      }
    ]
  },
  {
    "intent": "Convert prices to million tokens",
    "steps": [
      {
        "tool": "calculator",
        "query": "convert prices to million tokens"
      }
    ]
  },
  {
    "intent": "Create a price comparison table",
    "steps": [
      {
        "tool": "write_answer",
        "query": "create table with prices in million tokens"
      }
    ]
  }
]
```
agile jay
#

Intent is what I use, since it's the models guess of what you wanted.

sleek vortex
#

maybe i ought to lower temp more

#

right now it ignored calculator since i havent impl'd it
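
A plan in that shape can be walked with a small dispatcher that skips tools which are not implemented yet. This is a hedged sketch: the handler functions and the `TOOLS` registry are placeholders invented here, not the project's actual code, and the plan is a shortened copy of the one above.

```python
import json

plan_json = """
[
  {"intent": "Get the prices of the AI models",
   "steps": [{"tool": "search", "query": "Gemini 1.5 Pro+Flash pricing"}]},
  {"intent": "Convert prices to million tokens",
   "steps": [{"tool": "calculator", "query": "convert prices to million tokens"}]},
  {"intent": "Create a price comparison table",
   "steps": [{"tool": "write_answer", "query": "create table with prices in million tokens"}]}
]
"""


def fake_search(query):
    # Placeholder standing in for a real search call.
    return f"<results for {query!r}>"


def fake_write_answer(query):
    # Placeholder standing in for the final answer model.
    return f"<answer: {query}>"


# Only tools that actually exist are registered; "calculator" is deliberately absent.
TOOLS = {"search": fake_search, "write_answer": fake_write_answer}


def run_plan(plan):
    """Run each step in order, skipping tools that are not implemented."""
    results, skipped = [], []
    for intent in plan:
        for step in intent["steps"]:
            handler = TOOLS.get(step["tool"])
            if handler is None:
                skipped.append(step["tool"])
                continue
            # Some steps (e.g. write_answer) may carry no query at all.
            results.append((intent["intent"], handler(step.get("query", ""))))
    return results, skipped


results, skipped = run_plan(json.loads(plan_json))
print(skipped)  # ['calculator']
```

Recording skipped tools rather than silently dropping them makes it easier to see when the router keeps planning around a tool that only exists in the few-shot prompts.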

agile jay
#

Yep

#

0.1 is probably good.

#

You want it to use sources, and not make stuff up.

sleek vortex
#

yeah but then im thinking

devout geyser
#

yeah play around with the temperature and see what results you get basically

sleek vortex
#

will it just go really bad on the queries like the japan one

agile jay
sleek vortex
#

so when i did it before and got a crazy good response, it was on temp 1

#

so idk

#

it did like 7 searches on different aspects

#

it was really good lol

devout geyser
#

for curiosity I'll try that japan query with mine 😛

sleek vortex
#

plan me a trip to japan somewhere in the month of may 2024 from london

devout geyser
#

thanks

sleek vortex
#

raycast clipboard is godly

agile jay
#

Also, you can change the temperature for the different steps, if needed.

sleek vortex
#

yeah

#

i did

#

oh like

#

search vs calc

#

yeah fair enough

agile jay
#

Yep

sleek vortex
#

for now ive hidden the nonexistent tools that we mocked up though

#

calculator doesnt actually work rn, but the thing is i have it in the shot prompts so to avoid confusion/keep consistency its there

agile jay
# sleek vortex

Yep, my infrastructure is an intent based model, and I also think intent just sounds cool too.

agile jay
sleek vortex
#

intent makes me think of like

#

google assistant

sleek vortex
agile jay
#

For the calc

sleek vortex
#

searxng has wolfram

agile jay
#

Nice

sleek vortex
#

i can use that

#

well ive hosted my own searxng

agile jay
#

searxng is pretty nice.

sleek vortex
#

since the public ones have mostly outdated/cached responses

#

and arent as reliable

agile jay
#

Yep, also better for latency long term.

#

I am hooking up tools using rpc, so they are pretty seamless.

#

And not coupled.

devout geyser
#

ok I have no idea how much of this is correct but this is what the japan query said for me on my personal project. Mine isn't as comprehensive in that it won't search for plane ticket availability or such

sleek vortex
#

ooh but it provided budget

#

whats your end model?

#

gpt4o?

devout geyser
#

gpt-4o yes

sleek vortex
#

fair enough then

#

haiku aint telling me about a simcard any time soon lmao

devout geyser
#

😂

sleek vortex
#

gemini 1.5 pro price in million tokens

#

back to this query

#

sometimes it works

#

sometimes it doesnt

devout geyser
#

for the sake of curiosity, I could try a different end model if you want

sleek vortex
#

im using haiku

#

mainly since i do so much testing id go broke in gpt4 credit

agile jay
#

Yep, and the speed too...

devout geyser
#

yeah understandable, I'll try Haiku with the same query on my setup

sleek vortex
#
```
🔎 [0.00s] Searched for "Gemini 1.5 Pro price"@SearXNG - got 5 links, 0 snippets
🔗 [SearchAgent] [0.00s] Picked 4 links total
🚀 [SearchAgent] [7.63s] Finished pulling 3 sources
    * 🔗 https://ai.google.dev/pricing - 3699 chars
    * 🔗 https://cloud.google.com/vertex-ai/generative-ai/pricing - 11353 chars
    * 🔗 https://artificialanalysis.ai/models/gemini-1-5-pro - 11572 chars
⏰ [SearchAgent] [2.41s] Model response time (llama8b)
Based on the provided context, the Gemini 1.5 Pro model is a multimodal model that can be used for various tasks such as text generation, image generation, and multimodal fusion. The pricing for the Gemini 1.5 Pro model is as follows:

* Input token price: $0.001315 per image, $0.001315 per second, $0.00125 per 1k characters
* Output token price: $0.00375 per 1k characters
* Context caching: 0.0006575 per image, 0.0006575 per second, 0.000625 per 1k characters
* Context cache storage: 0.0011835 per image per hour, 0.0011835 per second per hour, 0.001125 per 1k characters per hour

The pricing for the Gemini 1.5 Pro model is based on the number of input and output tokens, as well as the context caching and storage. The prices are listed in US Dollars (USD) and are subject to change.

It's worth noting that the pricing for the Gemini 1.5 Pro model is different from the pricing for other models, such as the Gemini 1.5 Flash model, which has a different pricing structure.
```
#

really...

#

come on

#

๐Ÿ˜

#

back to square one 😭

devout geyser
#

ah crud, I can't actually do that easily as I forgot I'm using OpenAI directly, not using OR for the final model. Well I'll try it tomorrow when I'm more awake 😛

sleek vortex
#

fair enough

#

lol yeah 1am so

#

well idk

#

even pplx pro does the crypto thing

agile jay
#

...

#

Confused I guess

sleek vortex
#

yeah but mine takes it too seriously

#

how do i stop it from thinking million tokens is a crypto

#

ok for the sake of testing

#

let me try sonnet's output

devout geyser
#

it wasn't as difficult as I thought it would be to change it to haiku, here's what Haiku said, which isn't as detailed:

Here is a suggested 10-day Japan itinerary for a trip in May 2024 from London:

Day 1: Arrive in Tokyo
- Check into your hotel
- Explore the Asakusa district, including the Sensoji Temple and Nakamise shopping street
- Visit the Imperial Palace East Gardens

Day 2: Tokyo
- Visit the Meiji Shrine and Yoyogi Park
- Explore the Shibuya Crossing and Harajuku district
- Attend the Sanja Matsuri festival in Asakusa (mid-May)

Day 3: Tokyo to Kyoto
- Take the Shinkansen bullet train to Kyoto (approx. 2.5 hours)
- Visit the Kinkakuji (Golden Pavilion) and Arashiyama Bamboo Grove
- Explore the Gion district and watch a traditional geisha performance

Day 4: Kyoto
- Visit the Kiyomizudera Temple and Fushimi Inari Shrine
- Explore the Nijo Castle and Nishiki Market

Day 5: Kyoto to Nara
- Take a day trip to Nara
- See the friendly deer in Nara Park
- Visit the Todaiji Temple and Kasuga Taisha Shrine

Day 6: Kyoto to Hiroshima
- Take the Shinkansen to Hiroshima (approx. 2 hours)
- Visit the Hiroshima Peace Memorial Park and Museum
- See the iconic Itsukushima Shrine on Miyajima Island

Day 7: Hiroshima to Osaka
- Travel to Osaka (approx. 1.5 hours)
- Explore the Dotonbori district and try the local cuisine
- Visit the Osaka Castle

Day 8: Osaka
- Take a day trip to Himeji Castle
- Explore the Kobe Harborland and Kitano Ijinkan district

Day 9: Osaka to Hakone
- Travel to Hakone (approx. 2 hours)
- Ride the Hakone Ropeway and enjoy the views of Mount Fuji
- Relax in an onsen (hot spring)

Day 10: Hakone to Tokyo, depart
- Return to Tokyo (approx. 1.5 hours)
- Explore any remaining sights in Tokyo
- Depart for London

This itinerary allows you to experience the highlights of Tokyo, Kyoto, Nara, Hiroshima, Osaka, and Hakone, with a focus on cultural attractions, festivals, and natural scenery. Let me know if you would like me to modify or expand on this suggested Japan trip plan for May 2024.
agile jay
#

So just a simpler and less accurate response.

devout geyser
#

yeah

sleek vortex
#

yeah

agile jay
#

It's probably still affordable, as long as there is more filtering before passing into 4o.

sleek vortex
#

wha

#

no like he replaced the 4o at the end with haiku (like i have)

sleek vortex
#

oh

agile jay
#

Some finetuning of llama 70B could also do the trick.

sleek vortex
#

ok im moving on from this one gemini query

agile jay
#

Can't wait for groq to add it.

sleek vortex
#

i cant get it to work

#

but whatever ๐Ÿ˜

agile jay
#

Yep, asking for the cost of all the different models has always been hard.

sleek vortex
#

no

devout geyser
#

although slower to reply, here's what wizardlm-2-8x22b also said for anyone curious 🙂

sleek vortex
#

this is different

sleek vortex
#

this is gemini 1.5 pro price in million tokens
problem 1: keeps quoting https://cloud.google.com/vertex-ai/generative-ai/pricing (this is completely unrelated and has prices in per character)
problem 2: thinks million tokens is some crypto

agile jay
#

I think making a knowledge graph would probably be useful.

#

And then injecting the knowledge if it's relevant to the query.

sleek vortex
#

```
For prompts up to 128K tokens:
- $0.35 per 1 million input tokens
- $1.05 per 1 million output tokens

For prompts longer than 128K tokens:
- $0.70 per 1 million input tokens
- $2.10 per 1 million output tokens
```
sleek vortex
#

well its like 50% right...

#

it didnt do the crypto thing

#

but thats the flash prices, not pro

agile jay
#

Maybe ask it to repeat what it found before giving the answer?

#

But as a summary, not the whole number of sources.

sleek vortex
#
```
🔎 [0.91s] Searched for "Gemini 1.5 Pro price in Million Tokens"@SearXNG - got 22 links, 0 snippets
🔗 [SearchAgent] [0.01s] Picked 6 links total
🚀 [SearchAgent] [7.17s] Finished pulling 4 sources
    * 🔗 https://artificialanalysis.ai/models/gemini-1-5-pro - 11600 chars
    * 🔗 https://indianexpress.com/article/explained/explained-sci-tech/google-gemini-pro-1-5-1-million-tokens-9166398/ - 11600 chars
    * 🔗 https://ai.google.dev/pricing - 3699 chars
    * 🔗 https://www.cnet.com/tech/services-and-software/googles-gemini-1-5-pro-will-have-2-million-tokens-heres-what-that-means/ - 2797 chars
⏰ [SearchAgent] [2.83s] Model response time (llama8b)
"""Based on the provided context, I understand that you are looking for the price of Gemini 1.5 Pro in Million Tokens. According to the text, the pricing for Gemini 1.5 Pro is as follows:

* For prompts up to 128K tokens: $0.35 / 1 million tokens (input) and $1.05 / 1 million tokens (output)
* For prompts longer than 128K tokens: $0.70 / 1 million tokens (input) and $2.10 / 1 million tokens (output)

Please note that these prices are subject to change and may vary depending on the specific use case and requirements."""
```
#

time to look into these sources

#

well first, both these sources have some parsing issue...

agile jay
#

Yep, must be really confusing for the model.

#

What about using 70B and asking it to remove any errors from the scraping? Maybe it's already knowledgeable enough to do it.

sleek vortex
#

it would slow the query too much tbf

agile jay
#

Just as a test.

sleek vortex
#

if this was to become a site where i can get active like rlhf feedback from users then maybe i could see when these sites make certain issues

agile jay
#

Since if 8B can do it too, then it should be pretty fast with groq.

sleek vortex
#

```
$10.50 per 1M Tokens for prompts up to 128K tokens
$21.00 per 1M Tokens for prompts longer than 128K tokens

The key details are:

- Gemini 1.5 Pro has a pay-as-you-go pricing model
- For prompts up to 128K tokens, the price is $0.35 per 1M Tokens
- For prompts longer than 128K tokens, the price is $0.70 per 1M Tokens
- There are also additional charges for output prompts
- Billing for Gemini 1.5 Pro starts on May 30, 2024

Please note that these prices are subject to change and may vary based on the specific use case and requirements.
⏱️ 12.199028968811035 seconds
```
#

oh my god FINALLY it works

#

ok well again its only 75% correct

indigo plank
#

yo sneakyfishy

agile jay
#

Yep, I'm just working on an easier way to clean up the input data.

sleek vortex
sleek vortex
#

markdown

#

im trying that rn

indigo plank
#

i dont want to sound like dumb or anything but once you use perplexity what like website do you use to bypass like the ai detection

sleek vortex
#

perplexity since it cites the web usually doesnt sound like ai much anyway

#

if you want you can click copy, remove the sources at the end and ask ai to rephrase it, or if youre super paranoid something like quillbot

indigo plank
#

its for presentation

sleek vortex
#

but if its for important research, or maybe an assignment youre turning in, id recommend you rephrase anything manually really

indigo plank
#

and i have to include all like the sources and everything

sleek vortex
#

you can copy it over and change a few words

#

then add sources as seems fit

agile jay
#

Or you can just use perplexity to quickly find sources you can use.

sleek vortex
#

yeah

patent rapids
#

Can't you summarize YouTube videos?

sleek vortex
#

gemini is decent at that

#

you can but idk if it works well

agile jay
#

Yep, also thinking how to handle graphs and svg's.
I'm guessing removing the actual svg content and only leaving the class and stroke/fill will be enough for those.

#

For graphs, probably just leaving them as plain html would work.

#

Think I'm gonna make a quick interface to measure how many tokens are saved from each method, compared to the average end result of the query output.

sleek vortex
#

i just stripped images and svg entirely
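
A standard-library sketch of that kind of stripping follows. It is an illustration only, not the scraper used in the chat: the class and function names are invented, and dropping `<script>` and `<style>` alongside `<svg>` is an extra assumption.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, dropping <svg>, <script> and <style> subtrees.

    <img> tags carry no text content, so they disappear on their own.
    """

    SKIP = {"svg", "script", "style"}

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while inside a skipped subtree
        self.chunks = []  # visible text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample = (
    '<div><h2>Gemini 1.5 Pro</h2>'
    '<svg><path d="M0 0"/><text>icon</text></svg>'
    '<img src="x.png" alt="logo">'
    '<p>$3.50 / 1 million tokens</p></div>'
)
print(html_to_text(sample))
```

Stripping whole subtrees keeps icon labels and inline path data out of the summariser's limited context.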

#

```
$10.50 per 1M Tokens for prompts up to 128K tokens
$21.00 per 1M Tokens for prompts longer than 128K tokens

The pricing details are:

- Input: $7.00 per 1M Tokens (for prompts up to 128K tokens) and $21.00 per 1M Tokens (for prompts longer than 128K tokens)
- Output: $10.50 per 1M Tokens (for prompts up to 128K tokens) and $21.00 per 1M Tokens (for prompts longer than 128K tokens)

Please note that these prices are subject to change and may vary depending on the context window and other factors.
```
#

i will take that

#
```
[
  {
    "intent": "Plan a trip to Japan in May 2024 from London",
    "steps": [
      {
        "tool": "search",
        "query": "flights from London to Japan in May 2024"
      },
      {
        "tool": "search",
        "query": "best places to visit in Japan in May"
      },
      {
        "tool": "search",
        "query": "Japan weather in May"
      },
      {
        "tool": "search",
        "query": "Japan travel guide"
      }
    ]
  },
  {
    "intent": "Create an itinerary",
    "steps": [
      {
        "tool": "search",
        "query": "7-day Japan itinerary"
      },
      {
        "tool": "search",
        "query": "things to do in Tokyo in May"
      },
      {
        "tool": "search",
        "query": "things to do in Kyoto in May"
      }
    ]
  },
  {
    "intent": "Book accommodations and flights",
    "steps": [
      {
        "tool": "search",
        "query": "book flights from London to Japan in May 2024"
      },
      {
        "tool": "search",
        "query": "book hotel in Tokyo"
      },
      {
        "tool": "search",
        "query": "book hotel in Kyoto"
      }
    ]
  },
  {
    "intent": "Return the final answer",
    "steps": [
      {
        "tool": "write_answer"
      }
    ]
  }
]
```
#

interestingly does a huge levelled response for japan query

#

like before

#

even on low temp

agile jay
#

Yep, because lowering the temp just reduces the randomness

#

Which is good when using sources.

sleek vortex
#
```
[
  {
    "intent": "Plan a trip to Japan in May 2024 from London",
    "steps": [
      {
        "tool": "search",
        "query": "flights from London to Japan in May 2024"
      },
      {
        "tool": "search",
        "query": "best places to visit in Japan in May"
      },
      {
        "tool": "search",
        "query": "Japan weather in May"
      },
      {
        "tool": "search",
        "query": "Japan itinerary for 7-10 days"
      }
    ]
  },
  {
    "intent": "Get accommodation options",
    "steps": [
      {
        "tool": "search",
        "query": "hotels in Tokyo"
      },
      {
        "tool": "search",
        "query": "best areas to stay in Japan"
      }
    ]
  },
  {
    "intent": "Plan transportation and activities",
    "steps": [
      {
        "tool": "search",
        "query": "Japan train tickets"
      },
      {
        "tool": "search",
        "query": "things to do in Tokyo in May"
      }
    ]
  },
  {
    "intent": "Return the final trip plan",
    "steps": [
      {
        "tool": "write_answer"
      }
    ]
  }
]
```
#

ran it again

#

even better???

#

groq.BadRequestError: Error code: 400 - {'error': {'message': 'Please reduce the length of the messages or completion.', 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}

#

uhh

#
    * 🔗 https://www.selectiveasia.com/japan-holidays/weather/may - 366 chars
    * 🔗 https://top.his-usa.com/destination-japan/blog/a_guide_to_japan_-_may_and_june.html - 9822 chars
    * 🔗 https://www.japan-guide.com/e/e2273.html - 10000 chars
    * 🔗 https://www.holiday-weather.com/tokyo/averages/may/ - 7460 chars```
agile jay
#

Rip, too long
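
The 400 above is what happens when the scraped sources add up to more text than the model's context allows. A minimal sketch of one way to guard against it, assuming a hypothetical `fit_sources` helper and using characters as a rough stand-in for tokens (the actual trimming logic in the project is not shown in the chat):

```python
def fit_sources(sources, total_budget, per_source_cap):
    """Trim scraped sources so their combined length stays within total_budget.

    sources: list of (url, text) pairs, best-ranked first.
    Both limits are in characters, a rough stand-in for tokens.
    """
    fitted = []
    remaining = total_budget
    for url, text in sources:
        if remaining <= 0:
            break  # budget used up: lower-ranked sources are dropped
        allowed = min(per_source_cap, remaining)
        trimmed = text[:allowed]
        fitted.append((url, trimmed))
        remaining -= len(trimmed)
    return fitted
```

Because the list is walked in ranking order, the best sources keep the most text and the tail is cut first.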

sleek vortex
#

Flights:
- Depart London on May 1, 2024 and return on May 15, 2024.
- Based on the search results, the cheapest flights from London to Tokyo during this time period are around £402 roundtrip. The flights will take approximately 13 hours and 49 minutes each way.
- The most popular airline for this route is Iberia.

Accommodations:
- For your 2-week trip, I would recommend staying in a mix of hotels and traditional ryokans (Japanese inns) to experience both modern and cultural aspects of Japan.
- In Tokyo, consider staying at the Hotel Gracery Shinjuku, which has an 8.3 rating and rates starting around £128 per night.
- In Kyoto, you could stay at a ryokan like Yoshida-sanso, which offers a more authentic Japanese experience.
- In Hakone, the Park Hotel Tokyo is a luxury option with stunning views of Mount Fuji, starting around £165 per night.

Itinerary Suggestions:
- Spend 4-5 days in Tokyo to see the top sights like the Imperial Palace, Sensoji Temple, and explore the diverse neighborhoods.
- Take a day trip to Kamakura to visit the famous Daibutsu (Great Buddha) statue and historic temples.
- Spend 3-4 days in Kyoto to see the Kinkakuji, Kiyomizudera, and Arashiyama bamboo forest.
- Visit Hakone for 2-3 days to ride the Hakone Ropeway, see Lake Ashi, and try to catch a glimpse of Mount Fuji.
- Consider a day trip to Nara to see the friendly deer and historic temples.
- Spend the remaining days exploring other areas of interest, such as Hiroshima, Miyajima, or Kanazawa.

Let me know if you need any other details or have additional requests for your Japan trip planning!
⏱️ 42.565316915512085 seconds```
#

the hotel pricing is decently accurate

#

the flight not so much?

#

well its literally may 29

#

how is it showing info for a flight in the past

agile jay
#

Looks like they can save a lot. Now I wanna find out which filters have the most gain, and for what sites. So I can create embeddings to choose which filters to apply.

sleek vortex
agile jay
sleek vortex
#

whys your cleaned html so high

#

my cleaned text was like only 1.2x markdown size

#

markdown just added like the bolding and title separation

agile jay
#

Your cleaned html is the text content right?

sleek vortex
#

yeah

#

well i developed it decently well

agile jay
#

Mine is actual html

sleek vortex
#

oh lol

agile jay
#

And tailwind css adds a lot to the amount of characters lol.

#

The initial cleanup is stuff like removing tags like script, and element with classes which include nav, sidebar, header, footer etc.

#

But I'm gonna make more advanced ones to remove stuff like all the tailwindcss classes.

#

But the aim is to remove any extra clutter which isn't part of the main content.
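
A rough sketch of the kind of cleanup described above, using only Python's standard `html.parser`. The names `Cleaner` and `clean_html` and the exact tag and class lists are assumptions for illustration, not the actual filters. It drops `script`-like tags, drops any element whose class contains a clutter word, and discards every attribute (which is where the Tailwind classes live). It assumes reasonably well-nested HTML.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
SKIP_CLASS_WORDS = ("nav", "sidebar", "header", "footer")
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Cleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a removed element

    def _is_clutter(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").lower()
        # Substring match, so a class like "navy" is also caught.
        return tag in SKIP_TAGS or any(w in classes for w in SKIP_CLASS_WORDS)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:  # void tags never get an end tag
            if self.skip_depth == 0:
                self.out.append(f"<{tag}>")
            return
        if self.skip_depth or self._is_clutter(tag, attrs):
            self.skip_depth += 1
            return
        self.out.append(f"<{tag}>")  # attributes are dropped on purpose

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(data)


def clean_html(html):
    parser = Cleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Dropping all attributes also loses `href` and `src`, so a real filter would probably keep a small allowlist.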

#

So stuff like the apple webpage would be doable after cleanup.

#

There, the cleaned html is getting smaller.

agile jay
# sleek vortex

After the cleanup, and then passed back to llama 3 70B with the initial prompt, I get this as the source:

### Gemini 1.5 Pro: Quality, Performance & Price Analysis

**Quality:** Gemini 1.5 Pro is of higher quality compared to average, with a MMLU score of 0.819 and a Quality Index across evaluations of 88.

**Price:** Gemini 1.5 Pro is more expensive compared to average with a price of **$10.50 per 1M Tokens** (blended 3:1). Gemini 1.5 Pro Input token price: $7.00, Output token price: $21.00 per 1M Tokens.

**Speed:** Gemini 1.5 Pro is slower compared to average, with a throughput of 56.2 tokens per second.

**Latency:** Gemini 1.5 Pro has a higher latency compared to average, taking 0.95 s to receive the first token (TTFT).

**Context Window:** Gemini 1.5 Pro has a larger context window than average, with a context window of 1.0M tokens.

I think making the input super small like this is probably the way to go for price.
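
For reference, the "blended 3:1" figure in that summary is consistent with a weighted average of three parts input price to one part output price. A small check, assuming that reading of the ratio (`blended_price` is a hypothetical helper name):

```python
def blended_price(input_price, output_price, input_parts=3, output_parts=1):
    """Weighted average price per 1M tokens for a given input:output mix."""
    total_parts = input_parts + output_parts
    return (input_price * input_parts + output_price * output_parts) / total_parts


print(blended_price(7.00, 21.00))  # 10.5
```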

wise edge
harsh stag
warm cave
mighty gale
wide mesa
#

Really only writing mode doesn't have "search"

plush cairn
#

I got an issue: for the last two days Perplexity has been repeating its previous answer even when I give it a different prompt

austere kestrel
# agile jay Yep, because lowering the temp just reduces the randomness

yeah I think this is part of the issue.. and for all the models involved - like if one is told to find links relevant to X, and another is tasked with producing a response about X, the smaller models are prone to just making it up if the information (about X) isn't actually there (or if it's there but is embedded within a bunch of unstructured and non-relevant information)

#

lowering temp would presumably help

#

but it's also kinda just a limitation with the smaller models imo.. they just can't parse lots of information and stay focussed on multiple requirements effectively.. they get lost in a way that models like GPT-4 and Opus don't (or at least are less likely to)

#

btw came across this yesterday. haven't tried it out, but looks interesting (/potentially relevant @sleek vortex ) https://www.firecrawl.dev/

Firecrawl

Turn any website into LLM-ready data.

stable radish
devout geyser
#

I recall seeing that website, not cheap

sleek vortex
#

i mean i think its open source

#

well ive done similar things

#

just isnt organised lol

devout geyser
#

yeah looks like you can self host it for free

sleek vortex
#

there was this very fast scraper

#

it doesnt do any of the parsing

#

this is just a crawler

#

but its pretty fast

#

written in go

devout geyser
#

I see

sleek vortex
#

so what if we kept a function warm on like aws lambda

#

with a chromium browser backed by s3 scraping cache

#

then maybe when one query hits a site, we can go in the background and scrape the whole site into s3 cache

#

idk

#

but is there really much benefit from that
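
To make the caching idea concrete, here is a minimal sketch of the read path only. It is an assumption-heavy illustration: `ScrapeCache` is a made-up name, a plain dict stands in for the S3 bucket, and `fetch` stands in for the warm headless-browser function. The background whole-site crawl is not modelled.

```python
import hashlib
import time


class ScrapeCache:
    def __init__(self, fetch, ttl_seconds=3600, clock=time.time):
        self.fetch = fetch          # callable: url -> page text
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}             # stand-in for an S3 bucket

    @staticmethod
    def key_for(url):
        # Hash the URL so the key is safe to use as an object name.
        return hashlib.sha256(url.encode("utf-8")).hexdigest()

    def get(self, url):
        key = self.key_for(url)
        entry = self.store.get(key)
        now = self.clock()
        if entry is not None and now - entry["fetched_at"] < self.ttl:
            return entry["text"]     # cache hit, no browser needed
        text = self.fetch(url)       # cache miss or stale entry
        self.store[key] = {"text": text, "fetched_at": now}
        return text
```

Whether this pays off depends on how often different queries land on the same pages within the TTL, which is the open question raised above.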

fleet sandal
#

I subscribed to Perplexity a day ago, but I'm having a problem that I haven't had with others. It responds quite slowly and, worse, it repeats itself. For example, if I have an error in my code, it gives me a solution. I then ask something else, and it repeats the same thing I asked in previous questions. Is there any solution? Am I using the chat incorrectly? I don't know. These things never happen to me with the official GPT page or even with Phind.

south kindle
proper sage
#

I subscribed to a pro plan today, and it seemed like we had a limit of 600 requests per day. Where can I see the remaining credits? It's not displayed anywhere in my account?

sleek vortex
#

if youre enterprise yeah it just doesnt say

proper sage
#

Over Pro button i just see CTRL .

sleek vortex
#

they hide it for enterprise pro, idk why

fleet sandal
sleek vortex
#

well i dont think ive ever hit 600 anyway

proper sage
#

ok it was just to check but i'm not enterprise

sleek vortex
#

oh

#

then they hide for all

#

not sure

proper sage
#

yes just for curiosity 😄

sleek vortex
#

they used to show it when it was 300

#

¯\_(ツ)_/¯

proper sage
#

Because i played so much today haha

sleek vortex
#

idk

livid mantle
devout geyser
#

yeah the counter is now hidden until you get 'low'

#

in Opus's case it's a bit misleading as you don't get the same amount, you get nearer to 1/10th compared to other models such as Sonnet or GPT-4o

agile jay
agile jay
#

Looks like the advantage of katana is being able to easily switch between http and headless requests.

hollow sorrel
#

I just don't understand something:

  1. I upload a PDF into Perplexity,
  2. I ask a question or two about the contents of the pdf,
  3. I create a Collection and in the "AI Prompt (optional)" section I indicate that I don't want Perplexity to use any outside sources when it responds to questions,
  4. I come back the next day, open this collection and ask a question and it consults outside sources! I ask it in the prompt NOT to do this and it still does it!

What am I doing wrong? I appreciate any help someone can provide. 😬

sleek vortex
#

turn off pro mode

#

if you have pplx pro then select a model in settings like gpt4o or opus

hollow sorrel
#

I do have the Pro slider set to off....

austere kestrel
#

it doesn't have any kind of persistent memory. each thread starts fresh. your previous messages and uploads don't carry over

sleek vortex
#

upload carry over in follow up

#

i’ve had good success with that

austere kestrel
#

yeah but i think it's for a collection. as in you need to reupload the files each time. The prompt for the Collection is the only part that is "stored"

#

though i may have misunderstood

hollow sorrel
hollow sorrel
austere kestrel
#

they're very convincing, always obliging - but not always accurate ha

hollow sorrel
#

The problem is that it told me that certain concepts were contained in the uploaded document, but when I searched for those terms in Adobe Acrobat it - correctly - told me that those terms were NOT mentioned in the pdf!

austere kestrel
hollow sorrel
#

I guess I should just use Adobe Acrobat's AI Chat feature.... Or is there another AI tool I should use when I only want to search within a pdf?

austere kestrel
gentle dirge
#

yo guys whats the best way to scrape web for giving llm latest context? im doing brave search api + cheerio currently

austere kestrel
#

but fwiw for what you were describing earlier - wanting to upload a repository of files - you might want to look into chatgpt's Custom gpts

hollow sorrel
austere kestrel
#

Press the 'focus' tab then select it from there

hollow sorrel
austere kestrel
#

google's Notebook LM is also good for working with documents

tidal sail
tidal sail
fading moth
#

Well, they've got more funding, but there is such a thing as wearing too many hats

#

I'd rather have 4.5/5 fix 4/4o's weak points than stretch its capabilities to other things that might suffer from those weak points

mystic basin
#

Hello!
Can anyone tell me what is the limit of Claude 3 Opus and when does attempts renew

fading moth
#

Given that Google's main game is data and being synonymous with Internet searching, and how they have been struggling with Gemini that has 10x the context windows of GPT while having Google's 80% of all user data; plus, experience as the world's biggest search engine...

#

It shows that OpenAI would need to do a lot of work in areas that currently have better solutions, even if that search functionality was added.

#

It would be better for OpenAI to be innovative, improve their weak links, and make their foundation unshakable before trying to branch out into areas GPT wasn't built to handle optimally without fixing the weak spots.

#

Gemini was far less known compared to GPT to the general public and GPT still hallucinates without pulling fringe/troll search data from sites like reddit.

It would be a large blow to their credibility; just as Google's AI currently is losing credibility by suggesting people eat rocks or jump off bridges.

fading moth
sleek vortex
#

I was able to make a better search summary than google with like a few days of work

#

@agile jay what do you think i should make the backend in

agile jay
#

I would have probably made it with Go.

sleek vortex
#

true but i know 0 go

agile jay
sleek vortex
#

true

agile jay
#

And the simple binaries as output is nice.

warm cave
agile jay
#

And the build times too.

sleek vortex
#

hmm, okay

#

maybe i will learn go

near imp
#

I just switched back to Android from iOS, and perplexity app seems to not support the same 'voice conversation' features on Android as on iOs? is that correct?

agile jay
near imp
#

.... wow.....

#

that is #sadge

agile jay
#

Yep, that's what happens when the devs are from SF...

near imp
#

especially since Apple and iPhone are literally the worst platform for AI and will see significant dropoff in the next few years, due to their lack thereof, which they won't be able to make up for anytime soon.

#

sad.

agile jay
#

Yep, in the US it's like 70% apple users, and since SF is likely richer, their percentage is likely even higher.

fading moth
#

Yeh. The GPT assistant is macOS only too.

near imp
#

yeah, but that's OpenAI, but indeed, I was surprised considering they literally have microsoft as their top investor.

#

no idea why anyone wants to be inside the digital prison that is AppleVerse.

#

I tried an iPhone for 2 years, and yes it has some nice features, but it was a prison.

agile jay
#

They like spending money to feel relevant.

fading moth
#

It's also weird since Apple is so restrictive with what they approve of and disapprove of in hardware/software.

#

So much more red tape.

agile jay
#

Yep, and no side loading.

#

But in the EU there is supposed to be sideloading this year.

fading moth
#

EU is doing a lot to help everyone fight Apple's "isolationism" tech

#

In the US we wouldn't be able to hold any ground against Apple

agile jay
#

Yep, otherwise you have to pay $99/year as a dev, just to sign your apps...

fading moth
#

So, it makes no sense. OpenAI putting out an Apple only thing that I would love to try.

#

Given Microsoft backing.

#

The assistant was like my most interested thing during their announcement.

tidal sail
#

I wonder how windows 12 will work with all this ai hallucination. Will that thing break their system?

fading moth
#

Well, MS has said W11 is the "last" iteration and it will instead just be patched and improved as an ongoing product

tidal sail
#

Chatgpt alone uses 6GB of my ram space currently lol

fading moth
#

Chrome and Firefox eating up mah RAM too

fading moth
#

Ye, that's why I said said :p

#

Instead of W11 is the last

#

The only people who know are above my paygrade.

sleek vortex
sleek vortex
#

then the ceo changed

fading moth
#

Copilot+ spooks me

tidal sail
#

I remember once when microsoft said windows 10 would be the last windows version lol

fading moth
#

And I would need to buy an entire laptop to use it

sleek vortex
#

recall is a cool rag idea until they start selling my data

#

why does it take screenshots

#

surely better way exists

fading moth
#

Ye

warm cave
#

Maybe the images is for the users benefit, and what is stored is the output from Phi 3 vision

fading moth
#

GPT assist doesn't do that stuff like copilot+ ... The memory for the assistant may be less "accurate" in the long term but I really just need it to help with my current questions instead of asking if it remembers what I was doing three months ago.

#

Data is taken either way, but it's the amount that is so vastly different

tidal sail
agile jay
north magnet
#

why is the api so bad it doesnt even seem like its online

fading moth
#

Well, the new laptops have it integrated, I believe?

#

Like you can't turn it off

tame current
north magnet
#

the api isnt using the same model as the webchat clearly

tidal sail
north magnet
#

i want my money back

agile jay
#

Not Apple where adding 8GB more ram increases the price by a few hundred dollars...

north magnet
#

anyone else here using the api?

tame current
#

the sonar one? with -online suffix?

#

it's unusable, i've built my own api with other model

#

i can provide you source code

north magnet
#

its literally unusable

tidal sail
north magnet
sleek vortex
fading moth
#

32GB of ram is pretty standard for even prebuilt these days. The RAM speed might be crap, but it's still 32.

sleek vortex
#

or cloud?

tame current
#

i'm using it on openrouter currently, but i will later adjust it to be able to run locally

north magnet
#

i just dont understand why it gives such bad responses compared to the web app? it doesnt say anywhere when signing up for the api that its using an unusable model

tame current
#

it's work-in-progress but searches well and is better than perplexity api already

fading moth
#

Make sure you have a spicy GPU if you want to run locally

agile jay
tidal sail
agile jay
north magnet
#

the thing im confused about is i was under the impression perplexity was just a wrapper, so how come the api doesnt give the same responses

agile jay
north magnet
#

the api model doesnt even seem online...

fading moth
north magnet
#

yeah but they dont state that anywhere @agile jay

#

i want my money back

#

@signal hamlet

mystic ivy
#

Hey sorry if this is wrong channel but who can I contact to get higher api limits?

fading moth
#

There is sort of a soft cap with speed at a universal level when we start dealing with quantum computation at a consumer level.

sleek vortex
#

open router has phi3 for free??

#

wait what

sleek vortex
#

i could try this instead of llama 8b for sources...hmm

fading moth
mystic ivy
tame current
mystic ivy
#

I want to be part of the happy few

fading moth
north magnet
#

there is something shady going on here ngl

mystic ivy
sleek vortex
#

pplx api maybe only got released to look good

#

for vc

mystic ivy
sleek vortex
#

but they really might only have like 2 employees on it

#

idk

north magnet
#

they make it seem like its using the same model

#

legit a scam

agile jay
sleek vortex
#

yeah its nothing like the pplx frontend

fading moth
sleek vortex
#

at most it might be the same as what free search gets...?

north magnet
#

im going to bring some friends here and dig into their business model

agile jay
#

/chat/completions

north magnet
#

something seems wrong

agile jay
tame current
north magnet
#

i was under the impression perp was a wrapper for other models, yet they cant make that model api accessible?

#

seems like they are stealing and using something they dont want us to find

sleek vortex
#

yeah no pplx api is only for their own models

#

idk

north magnet
#

so confusing man

sleek vortex
#

i dont think they have the infra to actually host api scale other models

north magnet
#

whats the best api you guys are using

sleek vortex
#

i think they just have some good code and a few cloud gpus on it

sleek vortex
#

idk

#

about it really

north magnet
#

really

sleek vortex
#

well

#

it has llama 8b and 70b

north magnet
#

i never used grog

sleek vortex
#

fastest and cheapest out there

north magnet
#

groq

sleek vortex
#

they dont use gpu

tame current
#

openrouter is the best, has the most models for cheap

agile jay
#

Groq for speed and price.

sleek vortex
#

they use their own type of chip (LPU)

sleek vortex
# tame current openrouter is the best, has the most models for cheap

Surge limit: By default, all users are subject to a maximum rate limit of 200 requests per second to defend against denial-of-service attacks. Contact us in Discord or using our support@ email address if you need a higher limit.

Free limit: If you are using a free model variant (with an ID ending in :free), then you will be limited to 20 requests per minute and 200 requests per day.
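
If the free variants are used for source processing, those two limits (20 per minute, 200 per day) are worth enforcing on the client before requests go out. A minimal sketch, assuming a simple sliding window; `SlidingWindowLimiter` and `FREE_TIER` are hypothetical names, and how the provider actually counts its windows is not stated in the quoted text.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    def __init__(self, limits, clock=time.monotonic):
        # limits: list of (max_requests, window_seconds) pairs
        self.limits = limits
        self.clock = clock
        self.history = [deque() for _ in limits]

    def try_acquire(self):
        """Return True and record the request if every limit allows it."""
        now = self.clock()
        # Forget timestamps that have slid out of each window.
        for (_, window), stamps in zip(self.limits, self.history):
            while stamps and now - stamps[0] >= window:
                stamps.popleft()
        if any(len(stamps) >= max_req
               for (max_req, _), stamps in zip(self.limits, self.history)):
            return False
        for stamps in self.history:
            stamps.append(now)
        return True


# 20 requests per minute and 200 per day, as quoted above.
FREE_TIER = [(20, 60), (200, 24 * 60 * 60)]
```

A caller would check `try_acquire()` before each request and back off or fall back to a paid model when it returns `False`.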

#

hmm

fading moth
#

Oh. Speaking of Groq, what y'all think of the xAI thing musky was talking about?

agile jay
#

Open router and vercel for other stuff.

sleek vortex
#

thats grok right?

north magnet
#

i need accurate up-to-date responses for my app thats why i was planning to use perp till i realised how dogshit the api is

agile jay
#

There is Groq the infrastructure company, and Grok the x.ai model.

north magnet
#

got it

sleek vortex
agile jay
fading moth
north magnet
#

im guessing yes

agile jay
#

Since people always want up-to-date info

sleek vortex
agile jay
fading moth
#

OH

sleek vortex
#

not groq LPU inference

fading moth
#

Gotcha

north magnet
#

perp was giving perfect responses in the web app, im so heartbroken the api isnt the same

fading moth
#

Well, Groq Grok was crap on Twitter, I don't see why anyone would want to run it locally

sleek vortex
#

nothing prod ready tho

#

idk how i would scale it to a full api

north magnet
#

yeah man keep me updated 100%

fading moth
#

My bad

north magnet
#

i have 1,000 of customers and would pay top dollar for this

#

s

sleek vortex
#

right now i have working web search kinda model as decent as perplexity frontend

north magnet
#

i must be missing something

#

how come we cant achieve the same results as them if they aren't using their own model, i get it has hardcoded prompts but how is it getting accurate information via web search? couldn't we just build the same thing?

#

or is that what you are doing @sleek vortex

sleek vortex
#

yeah im doing that

#

doing it better (i think)

fading moth
#

A secret sauce that perplexity has on their web app?

sleek vortex
#

well ive built that in less than a week

north magnet
#

something shady man

sleek vortex
#

something equivalent to pplx web app

north magnet
#

i dont think its using LLM

#

ngl

sleek vortex
#

i can explain what i think pplx has

#

if you want me to

north magnet
#

yes please

agile jay
#

Yep, it's not hard to make a perplexity like app. The harder part is getting VC money.

sleek vortex
#

yeah lmao

agile jay
#

To scale to the moon.

north magnet
#

i have funding

#

personal funding

#

i dont need vc's

agile jay
#

So do I, but most of the competition will likely get nuked by openai when they release their search.

#

The path to AGI is full of dead startups...

fading moth
#

Wasn't Gemini 1.5 Pro with the 1M window also supposed to be a nuke?

agile jay
#

Who actually needs 1M context?

sleek vortex
# north magnet yes please

my theory is something like this:

  • they take the user's query
  • in copilot mode, they send this to a small finetuned llm which returns a bunch of searches to make on google/bing/their own indexer
  • in non-copilot mode, they just use keyword extraction or search your query as is on google/bing/their own indexer

they may have layers in the backend that summarises the sources or uses embeddings, but im not entirely sure - if they have their own indexer then they may be running this in the background but i really doubt pplx is doing this

  • they then take the top N results and fit as much as they can into the LLM's context and make a response
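
The steps in that theory can be sketched as one function. This is only an illustration of the guess above, not Perplexity's actual pipeline: `answer_query` and every callable passed into it (`plan_searches`, `search`, `fetch`, `llm`) are hypothetical stand-ins.

```python
def answer_query(query, plan_searches, search, fetch, llm,
                 top_n=5, context_chars=8000):
    """Hypothetical search-then-answer pipeline following the steps above."""
    # Copilot-style: a small model proposes searches. If it returns
    # nothing (non-copilot mode), search the query as is.
    searches = plan_searches(query) or [query]

    # Collect result URLs from every search, keeping first-seen order.
    urls = []
    for s in searches:
        for url in search(s):
            if url not in urls:
                urls.append(url)

    # Take the top N results and pack as much text as fits in the context.
    context, used = [], 0
    for i, url in enumerate(urls[:top_n], start=1):
        room = context_chars - used
        if room <= 0:
            break
        chunk = fetch(url)[:room]
        context.append(f"[{i}] {url}\n{chunk}")
        used += len(chunk)

    prompt = ("Answer using only these sources:\n\n"
              + "\n\n".join(context)
              + f"\n\nQuestion: {query}")
    return llm(prompt)
```

The numbered `[i]` prefixes give the answering model something to cite, and a summarising or embedding layer would slot in between `fetch` and the packing loop.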
north magnet
#

i have users without web search, web search will only make the responses in my app 100x better which should = more growth

sleek vortex
#

ive been trying to make something similar that can also do multistep reasoning

north magnet
#

maybe im missing something

agile jay
#

The longer the context the slower the response, and the more entropy to the output.

sleek vortex
sleek vortex
agile jay
#

Yep, the more difficult part is making the model answer the way the user wants.

#

Which is what multi step reasoning is useful for.

sleek vortex
#

yeah

#

but right now my issues/todos are

  • first moving the codebase out of python local
  • somehow scaling it
  • and then i need to maybe build my own indexer/cache layer backed on s3 or some form of cheap storage
  • and then frontend, ofc
north magnet
#

perp skipped half of this for me

agile jay
sleek vortex
#

i mean yeah half of ai is just making it accessible and useful to each user's own circumstances

north magnet
#

providing up to date responses

#

but the api is useless

#

so i cant use it inside my apps

sleek vortex
#

why did i get flagged bruh

#

who deleted my message

fading moth
#

Try sending as txt

agile jay
sleek vortex
agile jay
#

The ipad pro m4 pricing one

sleek vortex
#

this the sort of thing ive been building so far

north magnet
#

ooo

sleek vortex
#

accurate up-to date information using multiple llms and custom searching pipeline

north magnet
#

for e-com?

sleek vortex
#

and it can get faster than 21 seconds

#

my parallelism needs a rewrite

agile jay
#

Yep, python can be janky when trying to make it concurrent.
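
For I/O-bound work like pulling sources, `asyncio` is usually the least janky route in Python. A minimal sketch, assuming a hypothetical async `fetch_one(url)` coroutine supplied by the caller; `fetch_all` is an illustrative name, not the project's code.

```python
import asyncio


async def fetch_all(urls, fetch_one, max_concurrent=5):
    """Fetch every URL concurrently, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(url):
        async with semaphore:
            try:
                return url, await fetch_one(url)
            except Exception as exc:  # one bad site should not sink the batch
                return url, exc

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(guarded(u) for u in urls))
```

The total wait is then roughly that of the slowest source rather than the sum of all of them.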

sleek vortex
north magnet
#

you should try and self fund this, i would be interested

sleek vortex
#

thats what i am sorta trying to do :d

#

i have no real money of my own so

agile jay
#

You don't need much funding for it.

sleek vortex
#

using what i can get

north magnet
#

dm me details

sleek vortex
#

for free

agile jay
sleek vortex
#

nobodys going onto gofundme for an ai project

#

if i built out the whole project and platform i could probably get users from just promoting it

agile jay
#

Or start a patreon for people to support you if they want.

sleek vortex
#

then make a consumer subscription

#

like pplx

#

cheaper since i dont have opus

#

have every model except opus

agile jay
north magnet
#

you never know man

#

gofund me might actually help xd

sleek vortex
#

i plan to add code interpreter and other tools too idk

agile jay
#

Such as making it more agentic, since you have steps and intent prediction.

sleek vortex
#

right now dont have dependencies really working tbh

agile jay
#

Yep, since you don't have a clear schema for it.
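
One way to pin the schema down is to validate the planner output before running it. The sketch below checks the shape seen in the plans earlier in this log (a list of intents, each with `steps` that name a `tool`). The `depends_on` field is a hypothetical extension for expressing dependencies between intents, and `validate_plan` is an illustrative name.

```python
import json

KNOWN_TOOLS = {"search", "write_answer"}


def validate_plan(raw):
    """Parse a planner response and check it against the expected shape.

    `depends_on` (a list of earlier intent indexes) is a hypothetical
    field, not something the planner output in the log contains.
    """
    plan = json.loads(raw)
    for i, intent in enumerate(plan):
        if not isinstance(intent.get("intent"), str):
            raise ValueError(f"intent {i}: missing 'intent' text")
        for dep in intent.get("depends_on", []):
            if not 0 <= dep < i:  # may only depend on earlier intents
                raise ValueError(f"intent {i}: bad dependency {dep}")
        for step in intent.get("steps", []):
            tool = step.get("tool")
            if tool not in KNOWN_TOOLS:
                raise ValueError(f"intent {i}: unknown tool {tool!r}")
            if tool == "search" and not step.get("query"):
                raise ValueError(f"intent {i}: search step needs a query")
    return plan
```

Restricting dependencies to earlier indexes keeps the plan acyclic, so intents can simply be run in list order.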

north magnet
#

are you working on it alone?

agile jay
#

In my case, I just use gRPC to combine them, and have them as separate services.

#

Yep, and by sharing progress so I and a few others can help with dev suggestions.

agile jay
# sleek vortex yeah

You could probably make more by just focusing on making an API for other devs to use, lol.

sleek vortex
agile jay
#

And then increasing the margins.

sleek vortex
#

from scratch

north magnet
#

lmao the fact we are talking about building a competitor in their discord cracks me up

sleek vortex
#

yeah lmao ive had that thought at the back of my mind

tame current
sleek vortex
agile jay
sleek vortex
#

like thats what just a project turned into a nice credits/api

#

and then people use it

#

because convenience, right?

agile jay
sleek vortex
sleek vortex
north magnet
#

i wonder if the vc's know how much money they are missing out on bc the api doesnt work

#

might have to let them know

sleek vortex
#

perplexitys product is

#

the frontend facing product

agile jay
#

But probably pplx also doesn't want an easy API, since then someone can easily just make a mirror site that just uses the API...

sleek vortex
#

why do you think theyve got so much funding from telecoms

agile jay
#

And make it cheaper than the subscription...

sleek vortex
#

integration with korea telecom this that

#

its all a consumer focus

agile jay
#

Yep, SKT and Softbank partnerships.

sleek vortex
#

ill wait for the vodafone partnership so i dont have to pay for pplx pro...

#

¯\_(ツ)_/¯

#

until then!

agile jay
#

Yep, the question now is how to make citations a lot better in search.

sleek vortex
#

citations...hm

agile jay
#

Since currently just adding a super long list of source numbers is probably not the way to go...

meager sparrow
sleek vortex
#

we would need the mini models to push forward the used sources

meager sparrow
#

What is going on here??

sleek vortex
#

wha

#

this was ages ago

agile jay
#

Just pages.

meager sparrow
#

They are telling me “please react to the channel” with that and then they showed me I have to then access this channel

#

But I have no idea what channel it is

sleek vortex
meager sparrow
#

It is not my fault

sleek vortex
#

but yeah its a new feature they were testing

sleek vortex
#

it isnt the best - i think theres still a decent amount of issues with it

meager sparrow
sleek vortex
#

but its not bad either

meager sparrow
#

I love experiments

agile jay
#

@sleek vortex katana is pretty good for crawling sites btw.

sleek vortex
#

yeah it is

meager sparrow
sleek vortex
#

not sure if theyre still checking it though

meager sparrow
sleek vortex
#

i think they were beta testing it

#

so they made it like gated

#

but then did they give up or something

#

as there hasnt been feedback for a few weeks

sleek vortex
sleek vortex
#

what i have right now is pretty good

#

but yeah as you said id like to be able to deal with sites like apple too

agile jay
agile jay
agile jay
#

Basically compare all the pages of a site and remove the duplicate elements.
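
A small sketch of that idea, assuming each page has already been split into text blocks (for example one per element). The name `strip_shared_blocks` and the 80% threshold are illustrative choices, not the actual filter.

```python
from collections import Counter


def strip_shared_blocks(pages, min_share=0.8):
    """pages: dict of url -> list of text blocks from that page.

    Blocks that appear on at least min_share of the pages are treated as
    site chrome (menus, footers) and removed from every page.
    """
    if len(pages) < 2:
        return dict(pages)  # nothing to compare against

    seen_on = Counter()
    for blocks in pages.values():
        seen_on.update(set(blocks))  # count each block once per page

    threshold = min_share * len(pages)
    shared = {block for block, n in seen_on.items() if n >= threshold}
    return {url: [b for b in blocks if b not in shared]
            for url, blocks in pages.items()}
```

This needs several pages from the same site to work, which fits with crawling the site in the background first.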

sleek vortex
#

what on earth is this price

agile jay
#

There are multiple layers in place.

meager sparrow
#

@sleek vortex I clicked on the link you sent and it told me to insert an email and it said “we will send you a message shortly”

agile jay
#

Which is probably why.

meager sparrow
#

So alright

sleek vortex
agile jay
#

And work for nearly every site.

sleek vortex
#

yeah then we could combine with convert to markdown or something

#

markdown is quite good because it preserves title weights/significance from articles

agile jay
#

Yep, makes life so much easier.

sleek vortex
#

where to start with go

agile jay
sleek vortex
#

oh i was asking you a q before

#

this probably is basics but

agile jay
#

Yep, you didn't say it though.

#

Go by example?

sleek vortex
#

yeah

#

is s an array, or a slice referencing the array?

agile jay
#

Basically you can think of slices as the default arrays. Since it's pretty uncommon to use an array, which has a fixed length.

#

But they are using a slice in that one, since they didn't specify the length of it.

sleek vortex
#

But howcome when they do

#

s = s[:0]

meager sparrow
#

@sleek vortex what can it do anyways???

#

What is it even good at doing?

#

Pages

sleek vortex
meager sparrow
#

I have experimented with various language models and some AI-powered search engines

sleek vortex
#

so is the slice like an underlying reference to the array
which is why its able to be expanded all of a sudden and re-adapt the elements

agile jay
sleek vortex
#

its really not the best imo

meager sparrow
#

So I already have a lot of knowledge and understanding in these models

agile jay
#

Slicing is pretty useful, do you not use it when limiting the model input?

sleek vortex
#

no i do but

#

go is like a bit different in how you can re-expand the slice

#

thats my main question

meager sparrow
#

Not sure what you mean

sleek vortex
#

they do s = s[:0] but then it's re-expanded with the same s?
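
The short answer is that a Go slice is a small header (pointer, length, capacity) over a backing array. `s = s[:0]` only sets the length to 0, and the capacity and the array itself are untouched, so a later `s[:4]` can extend the view back over the same elements. Python lists do not behave this way (slicing copies), so the sketch below is a toy Python model of the Go behaviour. `GoSlice` is a made-up class for illustration only.

```python
class GoSlice:
    """Toy model of a Go slice header: (backing array, offset, len, cap)."""

    def __init__(self, backing, offset=0, length=None):
        self.backing = backing            # shared, never copied
        self.offset = offset
        self.cap = len(backing) - offset  # room left in the backing array
        self.len = self.cap if length is None else length

    def reslice(self, low=0, high=None):
        """Model of s[low:high]: high may exceed len, but not cap."""
        high = self.len if high is None else high
        if not 0 <= low <= high <= self.cap:
            raise IndexError("slice bounds out of range")
        return GoSlice(self.backing, self.offset + low, high - low)

    def items(self):
        return self.backing[self.offset:self.offset + self.len]


s = GoSlice([2, 3, 5, 7, 11, 13])
s = s.reslice(0, 0)   # like s = s[:0]: len 0, cap still 6
s = s.reslice(0, 4)   # like s = s[:4]: the first four elements are back
s = s.reslice(2)      # like s = s[2:]: dropping the front shrinks cap too
```

The last step is the one that cannot be undone: slicing off the front moves the start of the view, so those elements are no longer reachable through `s`.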

sleek vortex
agile jay
meager sparrow
sleek vortex
#

once youve sliced it you lose the rest

#

same with like js

agile jay
#
some_list = [1, 2, 3, 4, 5]

some_list = some_list[:0] # gets rid of all items in the list
some_list[25] = 25 # now i've added it to the 25th index, even though there are no values between 0 and 24
#

So it's something you can do, but you rarely ever see it in code.

meager sparrow
#

@sleek vortex you said you were going to screen record a demo

#

💀

agile jay
#

He's busy learning some Golang

#

To become a Gopher/Goblin

meager sparrow
#

Ok..

sleek vortex
#

compressing it

#

im not lying

#

macos screen recorder outputs huge files

#

so im running ffmpeg on it (slowly)

#

frame= 3033 fps= 73 q=31.0 size= 8960kB time=00:00:50.51 bitrate=1453.0kbits/s dup=27 drop=0 speed=1.21x

#

nearly done

#

slightly long but yeah there you go!

agile jay
#

Yep, ffmpeg can take a while, if you're doing CPU encoding.

meager sparrow
agile jay
#

But can you be sure it didn't hallucinate?

meager sparrow
#

Looks like something you can use to get 100% on a whole assignment

agile jay
meager sparrow
north magnet
#

how do you have the exact same UI as perp?

#

did you rebuild it

#

or get access to the source code

agile jay
north magnet
#

im confused

#

can someone explain please

halcyon coral
sleek vortex
half venture
#

Nvidia single handedly is doing the heavy lifting for the entire US economy at this point

sleek vortex
#

2.82T now

#

going to surpass apple so soon wth

#

apple is 2.92T

#

what the hell

half venture
#

Yeah

#

Went from the 94th position

#

To 2nd

#

Within a year

#

When this bubble pops

#

Oh boi

sleek vortex
#

i mean

#

will it pop

#

i didnt see any insane company growth like this in failed hypetrains like crypto/web3

half venture
#

It definitely will unless

#

Openai comes out

sleek vortex
#

gpt5

half venture
#

Yeah maybe that

#

But more like

#

Massive rollout worldwide free gpt 4 voice for the normies

sleek vortex
#

yeah

#

idk

half venture
#

I don't think you understand but

#

Most people only use chatgpt 3.5

#

And that's it

sleek vortex
#

did they go up after copilot+pcs

#

most people use that

#

some might be touching google gemini and others

half venture
#

And now gpt 4o is free, including vision, browsing etc., with a small limit

sleek vortex
#

but id assume the paid model population is really low in actual consumer adoption

agile jay
#

Yep, but now they all have access to 4o with all its features.

half venture
#

Definitely going to hype the markets

sleek vortex
#

the average consumer doesnt know what they could do, or at least thats what i think

agile jay
#

But it also means that the next model should come out soon, for the plus users.

#

Otherwise, what's the point.

half venture
#

Now the revolutionary thing will be if.....

#

They roll out voice mode

#

For free

#

As well

agile jay
#

Yep

half venture
#

If some boomer in middle of nowhere

#

Gets to use the voice

#

I bet he will take this ai stuff more seriously

agile jay
#

Yep, all those retired boomers with no social life will likely use it a lot.

sleek vortex
#

i wonder how google would change

#

if they released project astra tomorrow

#

but after their already bad situation...

#

no clue

agile jay
#

Guess AGI for president will be more realistic since it will have the retired voters' votes.

half venture
#

Yeah definitely

agile jay
#

And by far the most AI devs

half venture
#

Didn't google raise the price for gemini flash

sleek vortex
#

bruh their teams literally invented transformers

half venture
#

Right after bragging that it's cheap

sleek vortex
#

they have their own insane tpus

sleek vortex
#

i honestly dont know why they arent first...

#

so stupid

#

they have the whole internet

#

they have all the compute ever

#

what are they missing?????

agile jay
#

Because they are bad at making new products.

half venture
#

Or any organization

#

The answer is

agile jay
#

They are only good at going into a current field and improving it.

#

Can't think of a field made by google.

half venture
#

They're bureaucratically stuck in limbo

sleek vortex
#

they didnt invent search

#

but they won in the end

#

or they have at least

#

the future, maybe not

half venture
#

They didn't invent YouTube

sleek vortex
#

yeah

#

bought it

half venture
#

Actually I am more pissed off at Google for that

#

They have fuc k i ng YouTube

agile jay
#

Yep, it's probably the reason why they have such a large graveyard compared to other companies.

half venture
#

Make something better than Sora

#

Like come on

sleek vortex
#

yeah...

#

they have the whole of the internet on google images

#

why is their latest model not dalle 9 level

half venture
#

They are just being a wuss

agile jay
#

Because they have too many devs.

sleek vortex
#

why is SGE powered by gemma 0.1b

#

like ...

sleek vortex
#

no wonder youre being flamed about eating rocks

agile jay
#

Doesn't matter if you have the most compute, if it's shared with a large dev team.

sleek vortex
#

just throw some godamn compute at it

#

then finetune a model later

#

like bruh

half venture
#

The issue is

#

Search is so profitable for them

#

And so cheap

#

They want genai to be just as cheap

#

But it's not

sleek vortex
#

bruh then finetune a model that incorporates ads

#

genai expensive entry

agile jay
#

Maybe they are making AI summary sh*t on purpose.

sleek vortex
#

but then make it cheap

#

then what

agile jay
#

To make people less likely to use it.

sleek vortex
#

release ai summary 2.0

half venture
#

LLMs, even a 1-billion-parameter one, are actually very expensive when deployed at a scale of billions

half venture
#

Google is at a scale of billions

#

Microsoft in millions

#

That's the difference

half venture
#

Also it's been almost 3 weeks

#

No voice

#

And they are treating ChatGPT free users better

agile jay
#

Yep, likely because of sky drama

half venture
#

And not even mentioned us plus ones