#Claude 3.5 Sonnet
wtf? above 1st gen opus??
exciting about this!
I tried it for 2-3 responses in their web UI and wasn't super impressed. Opus still miles better on what I asked it.
Waiting for full OpenRouter release to play with it more (they likely have a shitty system prompt / temp in their web UI)
Another day, another hype train in LLM land, at least no outlandish claims. Better & cheaper is fine with me.
sadly 3.5 still very ..... censored XDDD
it doesnt like that i rizz humans
I expect no less...
my bad. it was copyrighted content.... rizzing is fine
The "human eval" benchmark they showed for coding made it seem on par with gpt-4o , but in my experience it's definitely not close to gpt-4o.
Maybe that benchmark used more basic coding problems, or Sonnet is just not that good at javascript. Sonnet frequently gives me code that immediately doesn't look quite right. Syntax is fine, it just seems confused and misinformed. I then send Sonnet's response through gpt-4o and it spots all the mistakes Sonnet 3.5 made. So unless you're doing basic coding, it's not great.
for large context copy-pasting doesnt seem feasible. For me, both models perform extremely poorly with OOP or async coding. The only workaround I found is to keep codebase close to FP and reducer-style execution, but that would only ever work in small projects
Its also hard to force any model to follow eslint/typescript rules without additional prompts
also consider that HumanEval is Python only
and doesn't reflect coding capabilities in other languages at all
Yeah, I find that both of them aren't great for languages that aren't Python. For Java, Claude 3.5 Sonnet isn't any better than 4o, which was already quite disappointing imho. Claude 3 Opus and GPT-4 Turbo are still far better at Java and Python than both of them from my experience.
For Python, it's not bad, but after some more messing around with it it's really not that great
so we can let it create code in Python and then just translate to, let's say, C#?
nah, it's better to prompt to code in C# right away
LLMs tend to make more mistakes when translating code from one language to another
It should be competent enough in C# anyway
"Lost in TransC#ation"
Ah I was wondering about that benchmark.
Anyone notice that Sonnet 3.5 through the API doesn't seem to have knowledge past 2022? Whereas the one on claude.ai does.
Double check that you are using 3.5 and not old 3?
Even tried in anthropic console, same stuff
Ask what its knowledge cutoff is, it says 2022, ask what latest vuejs package it knows of is, its old
ask the same of claude.ai -> current answers
Hmm. Interesting:
$ llm -m "claude-3.5-sonnet" "What is your knowledge cutoff date? What is the last version number for vue.js you know?"
My knowledge cutoff date is September 2022. As for Vue.js versions, the last version I'm certain about is Vue 3.2, which was released in August 2021. There may have been newer versions released after that date, but I don't have definitive information about them.
I can confirm your observation as shown above.
(via direct API access to Anthropic)
Weird right?
makes me suspect its not actually running the newest model
I tried to write the help chatbot, but you know, who knows when that will be seen
At least the models seem different. Or the system prompt did not get updated?
hmmm may be something like this, its weird as it actually shows newer knowledge
i asked it directly about the vuejs definemodel macro, and it gave a good answer
which is vue 3.4
BUT, if I first ask it for its knowledge cutoff:
lol
and this is starting with the what is definemodel question:
So if its reminded that its knowledge is from 2022 it wont answer questions about newer than that
This has a certain "configured by prompt" smell to it.
Maybe add a system prompt telling it its knowledge cutoff is 2024
I would not be surprised if that improves things.
My current guess is that the system prompt for the website includes a hard-coded cutoff date, while Sonnet via API is just the old improved model, which they forgot to tell that it has newer information.
Otherwise it should be obvious/testable from other behavior that it is 3.5 and not 3 on the API
yeah the fact that it sometimes can answer the definemodel questions is proof it has newer knowledge
but weird that its so forced
But this is an interesting observation nonetheless
what system prompt wizardry have they done for the claude.ai model
winner is correct at least
I would have no clue as a sport agnostic European
european, american sports agnostic so yeah had to google that shit
My current assumption is that models via API are very 'bare' and models via Chat interface get a lot of system prompt tuning. I think it is still possible that API sonnet-3.5 just did not learn about its new cutoff date, which gets hard-coded in the chat system prompt.
Of course it is not impossible that Anthropic uses different fine-tunes for API and chat
Seems more expensive tho
Its also starting to get linguistically confused on me:
aint no french in my tool descriptions hah
I have not checked that website, but my guess would be that they require to give them your own API key to work? Impressive number of tokens, nonetheless
afaik websim does provide a free version too
I used Sonnet on it for free
yeah they still do, just checked
If these are all free tokens then they are burning $10k to $100k per month on this website (roughly calculated)?
I have no clue where they do get money for free access tbh
Like they gave Opus for free too
(I mean this may still be funny money in crazy LLM world)
I mean a bunch of hobbyists did a 2k$ finetune recently (Magnum), just for fun
So yeah money going around in this sphere is insane
Hi all, I read somewhere that usage costs can increase rapidly on Claude API if you use the same context window for too long (i.e. without refreshing a new chat)... has this been your experience? Are there commands or prompts I can use to avoid burning through credits too quickly?
for all llm providers everytime you send a message you are paying not just for the message you sent but also all previous messages and responses. That is not exclusive to claude. The only way around this is to start a new chat.
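To make that arithmetic concrete, here is a rough sketch of how billed input tokens grow when the whole history is resent on every turn. The per-message token counts are made up for illustration and are not real billing data; prompt caching and provider-specific pricing are ignored.

```python
def billed_input_tokens(turn_sizes):
    """Return the input tokens billed on each turn when the full history is resent.

    turn_sizes: list of (user_tokens, assistant_tokens) per turn (hypothetical numbers).
    """
    billed = []
    history = 0  # tokens already in the conversation before this turn
    for user_tokens, assistant_tokens in turn_sizes:
        # Each request contains all previous messages plus the new user message.
        billed.append(history + user_tokens)
        # The reply becomes part of the history for the next request.
        history += user_tokens + assistant_tokens
    return billed


# Three turns, each with a 100-token question and a 400-token answer.
per_turn = billed_input_tokens([(100, 400), (100, 400), (100, 400)])
print(per_turn)       # [100, 600, 1100]
print(sum(per_turn))  # 1800
```

The per-turn cost grows linearly, so the total grows roughly quadratically with chat length, which is why starting a fresh chat resets the bill.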
Would love to have image prompting support added!
text+image -> text is supported, see -> https://openrouter.ai/docs/requests#images-_-multimodal-requests
(Sonnet-3.5 is a multimodal model and supports text+image prompts)
Awesome!
being charged for overloaded error 502 👀
If the provider returns an error you should not get charged. You may see replies with this error, but they should not show up on your activity page -> https://openrouter.ai/activity
saw error and this though
This is related to a bug with text completion that should be fixed by now (those requests did not return an error btw), see here -> #1271200228611067934 message
I was on chat completion. Anthropic had issues yesterday.
Still, errors do not get charged. Those empty replies were due to a bug that should be fixed now. Ask -> [email protected] about a refund etc.
especially given the downtime.... can we get access via AWS? Aws proxies it back to the same claude's server? I was under the impression AWS was hosting claude in their DC.
https://aws.amazon.com/bedrock/claude/
https://aws.amazon.com/bedrock/pricing/ - they only have sonnet 3.5 in some DCs
Does response_format actually work on this model even though it's not documented?
does anyone know if caching is already implemented on OpenRouter?
No. See here -> #general message
kewl thanks!
[redacted] never mind
before I go ahead and test this myself, about caching on sonnet:
if TEXT1 is already cached
and I send a new request starting with
TEXT1+TEXT2
will caching everything up until TEXT2 cost the full token count of TEXT1+TEXT2 or just TEXT2?
Deepseek's caching seems to be much easier to understand and use, and caches for longer, cheaper 😅
you need to use several cache control breakpoints
First turn:
TEXT 1 (cache control)
Cost: TEXT 1 uncached
Second turn:
TEXT 1 (cache control)
TEXT 2 (cache control)
Cost: TEXT 1 cached, text 2 uncached
Third turn:
TEXT 1
TEXT 2 (cache control)
TEXT 3 (cache control)
Cost: TEXT 1+ TEXT 2 cached, text 3 uncached
Imagine the logic like this: you start at the bottom and go through each cache control in turn.
So at the third turn, you are like.
Ok...so the first cache control starting from the bottom is at TEXT 3. Do I have TEXT1+TEXT2+TEXT3 in cache? No? Then it's not cached.
Next cache control...At text 2. Do I have TEXT1+TEXT2 in cache? Yes. So we can use that cache.
And since there is a cache control on text 3, let's cache TEXT1+TEXT2+TEXT3
at least that's my understanding of it
but important:
on openrouter, I noticed that cache control only works on user message
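The turn-by-turn walkthrough above can be sketched as a toy prefix cache. This only models the bottom-up lookup as described in this thread; it is not Anthropic's implementation, and `simulate_turn` is a hypothetical helper. Cache expiry and minimum cacheable lengths are ignored.

```python
def simulate_turn(cache, blocks):
    """Simulate one request against a toy prefix cache.

    cache:  set of already-cached prefixes (each prefix is a tuple of block texts).
    blocks: list of (text, has_cache_control) in prompt order.
    Returns (cached_blocks, uncached_blocks) for this request.
    """
    texts = [text for text, _ in blocks]
    # Positions that carry a cache control marker.
    marked = [i for i, (_, flag) in enumerate(blocks) if flag]

    hit = 0  # number of leading blocks served from the cache
    # Check the markers from the bottom up and stop at the first cached prefix.
    for i in reversed(marked):
        if tuple(texts[: i + 1]) in cache:
            hit = i + 1
            break

    # Every marked prefix is written to the cache for later turns.
    for i in marked:
        cache.add(tuple(texts[: i + 1]))

    return texts[:hit], texts[hit:]


cache = set()
print(simulate_turn(cache, [("TEXT1", True)]))
# ([], ['TEXT1'])
print(simulate_turn(cache, [("TEXT1", True), ("TEXT2", True)]))
# (['TEXT1'], ['TEXT2'])
print(simulate_turn(cache, [("TEXT1", False), ("TEXT2", True), ("TEXT3", True)]))
# (['TEXT1', 'TEXT2'], ['TEXT3'])
```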
Claude 3.5 Haiku out today as experimental public beta, along with an upgraded 3.5 Sonnet.
https://www.anthropic.com/news/3-5-models-and-computer-use
Haiku 3.5 seems to come later this month, not today -> #general message
oops I read too quickly
No Haiku-3.5 but updated Sonnet-3.5 in Anthropic workbench ->
Opus 3.5 has disappeared too from what I can see, unless I'm blind
Opus-3.5 was never released
I thought it was a coming soon originally?
Claude 3.5 Haiku is the next generation of our fastest model. For the same cost and similar speed as Claude 3 Haiku, Claude 3.5 Haiku improves across every skill set and surpasses Claude 3 Opus
Surpasses opus
For various interpretations of "surpasses", see also -> #general message
Any idea which model version the current anthropic/claude-3.5-sonnet on OpenRouter is pointing to?
20241022 or 20240620
In doubt the old one.
hover icon now points to newer #announcements message
eee
Tested the new 3.5 Sonnet.
After all is done and accounted for, it jumped ranks from #15 > #7 with slightly less prudishness (still much higher than the competition).
I saw massive gains in tasks labeled for Reasoning (suspiciously high gains, I need to investigate this further). A slight dip in prompt adherence and code. I scrutinized and retested all tech-related coding tasks a total of 6 times, ended up running 18 queries PER TASK in that particular label to exclude any random outliers. The results were consistently delivering the same outcome, though.
Good improvements as a whole.
Can you explain the code aspect? Worse in what way? And what does utility refer to
Also thanks for doing these. I always appreciate your tests
1 example which is weird for a western company to do, since its essentially just history.
I understand you want to know this history, but it involves sensitive content. It is recommended that you learn about the relevant history through reliable channels.
Yeah I wouldn’t expect a western company to do that. Wonder if their dataset is tainted ?
qwen and Yi answered this less censored, lol
still biased of course but not like this
Anybody got computer use working via OR? It looks to me like the tool definition is not transformed correctly between the OAI format and anthropic...
Four months ago, we introduced Anthropic’s Claude 3.5 in Amazon Bedrock, raising the industry bar for AI model intelligence while maintaining the speed and cost of Claude 3 Sonnet. Today, I am excited to announce three new capabilities for the Claude 3.5 model family in Amazon Bedrock: Upgraded Claude 3.5 Sonnet – You now have […]
And bedrock jump straight to this version
What's the difference between anthropic/claude-3.5-sonnet:beta and anthropic/claude-3.5-sonnet?
Is beta the latest version (added today)?
Both are the same model (20241022), they only differ when moderation gets enforced.
See also -> #arc-feedback message
Is it possible to use the version before the current update?
Right now the new update doesn't suit me, it produces answers I don't like...
I see they added it back. The ones that say (2024-06-20).
Hey, how to use client = anthropic.Anthropic(api_key=API_KEY) with OpenRouter? Thx.
it's interesting that they published no comparisons with the o1- models
as far as I can see anyway, model card or announcement
on the bus home theory, they didn't jump to Claude 4 because they're going to expand to longer run reasoning functionality and release the three way Claude 4 Sonnet, Claude 4 Haiku, Claude 4 Opus etc etc which will look awesome on billboards
Today is truly Claude 3 Service Pack 2
i'd say might as well skip claude 3.5 opus
it will be slow and expensive
better try something like o1
(o1 is slow and expensive)
i'd say o1 mini is quite fast other than thinking period lol
also how does it have 64k output
No way opus would be slower than o1 unless they really mess up serving it
And I don't really consider opus and o1 to be for the same use case anyway
Hi, i would like to ask if openrouter support Anthropics prompt caching?
Yes, see here -> https://openrouter.ai/docs/prompt-caching
thanks 👍
I read the documentation for Anthropics prompt caching, it seems that cache_control parameter must be sent. Does it means tools like Cline have to implement this to their user/system message to take advantage of this prompt cache?
Cline should have supported this for a few versions now
Great! Do we have to do some configuration in Cline to take advantage of this prompt caching, or is it done automatically?
Ask in their Discord, invite link is here -> https://github.com/cline/cline
Thanks, done that.. Just asking in case you know about it. Thanks for yr help 👍
The difference between them is the beta version is latest version of the model but without :beta it’s older version of the model
No, this is completely wrong.
Why?
Because the difference between :beta and the non-:beta is only how moderation gets handled. Otherwise they are exactly the same model. The irritating :beta slug is hopefully going away soon -> #arc-feedback message
Hmm
I think the OR announcement might need to be a lot more blunt about the new sonnet...all the power users got the message but the same questions keep getting asked
bumping again: anyone got computer-use working with OR?
No. See also -> #1298768509844852766 message
i know this is the wrong discord, but did anyone get new sonnet working on vertex?
Apparently it is available on Vertex -> #general message
yes, thanks. i asked a follow up there just a few mins ago... looks like it might need an explicit quota increase request... my quota on vertex is 0 for the new sonnet (but unlimited for the old)
i submitted a quota increase request to GCP... got email back saying they'll try to resolve within 2 days
Just me or does this new version feel a bit "preachier" than before?
Not really? It's about the same in that regard for me
new Sonnet seems pretty good for my usecase - I also noticed that it does get things wrong often still, but I just ask questions and it realizes quite often.
It once even answered along the lines of
"Yes, ...
but no you should not change that"
contradicting / correcting itself :D
I've observed that the new model will also ask me questions if it requires more clarity on a task, I didn't really notice this in the previous iteration
Yeah, I was using it to write some regex for SED earlier and it kept correcting itself in the same reply?!
Im wondering if lower temp could help with it giving the right answer right away
but maybe it just „realizes“ later on
"Actually, for the capture group to work correctly, it should be:" and then in the next reply "Actually, just like before, we need the capture group version:" were in the middle of the reply and both times it rewrote the command?!
Making a mistake once and self-correcting was weird enough, but then to do it again was really weird!
continuing the discussion in #general regarding the output length issues,
It is interesting to me that bedrock sonnet 20241022 "v2" is still unable to have a max output length beyond 4096. they also make it clear that an optimal setting is 4000. aws have told me the ability to set it higher is with their higher team but no estimate on a fix.
coincidence? 🥸
Just wondering
is it only me or new sonnet is getting very lazy?
it always stops in the middle and tells me it wants me to say continue, then it will continue.
At least on the Claude website the new Sonnet works as good as ever if not better for me.
i dont have pro but can I ask for your help
try to tell it write > 800 words article or anything
cus i am using API to use the new sonnet, it keeps telling me to confirm to continue.
other than concise outputs at times, I've found the new sonnet to perform better at complex tasks than the previous one for me (I'm using it as a coding assistant primarily)
I think Sonnet did some thinking when I told it in the prompt not to ask me whether to continue.
[I'll continue with the _______________________ in the next part, as I want to ensure these foundational elements are clear first. Would you like me to proceed with those sections?]
I apologize - I caught myself asking again! Let me continue with the complete response:
That's true, it is clearly better but it is really annoying that keep asking me to give it go ahead
true, it seems to prefer giving shorter outputs :x
hopefully they'll address this soon
I think it is good that way, no need too much explanation, just give me the answer.
However, sometime, or most of the time, it will just stop in the middle
yeah, if it's an intentional feature then it should be toggleable on their API
This is a long shot but does anyone know how to use prompt caching with OpenRouter Sonnet on SillyTavern? I see an option to do it with Anthropic just not OR
Ask in the SillyTavern Discord?
It is up to them to allow/implement this for OR
Figured I should go to them haha thanks
I think best way to do caching is simply doing a small local proxy that apply the caching for you (ie, a openai-compatible small endpoint locally that will call openrouter while applying the cache)
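As a rough idea of the rewriting step such a proxy could apply before forwarding a request, here is a minimal sketch. `add_cache_control` is a hypothetical helper; the `cache_control: {"type": "ephemeral"}` marker follows Anthropic's prompt caching format, and marking only the last user message is an assumption based on the observation earlier in this thread, so check the OpenRouter docs before relying on it.

```python
import copy


def add_cache_control(payload):
    """Return a copy of an OpenAI-style chat payload with the last user
    message marked as a cache breakpoint. The input payload is not modified."""
    patched = copy.deepcopy(payload)
    for message in reversed(patched.get("messages", [])):
        if message.get("role") != "user":
            continue
        content = message["content"]
        if isinstance(content, str):
            # Plain-string content is converted to a one-element list of parts.
            content = [{"type": "text", "text": content}]
        # Mark the last text part; non-text parts (e.g. images) are skipped.
        for part in reversed(content):
            if part.get("type") == "text":
                part["cache_control"] = {"type": "ephemeral"}
                break
        message["content"] = content
        break  # only the most recent user message is marked
    return patched


request_body = {
    "model": "anthropic/claude-3.5-sonnet",
    "messages": [
        {"role": "user", "content": "Long document goes here"},
        {"role": "assistant", "content": "Got it."},
        {"role": "user", "content": "Summarize it."},
    ],
}
print(add_cache_control(request_body)["messages"][-1])
```

A real proxy would also need an HTTP server in front of this and streaming pass-through, which are left out here.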
rather than wait for all product to implement sonnet caching properly for OR
sonnet seems slow today
Claude site also reroutes to Haiku currently for free users, but -> https://status.anthropic.com/ is still completely green
i get these empty infinite writings from sonnet today https://i.imgur.com/63MZVqZ.png
8usd credits
Works for me ->
yesa, sometimes.
What does your activity page show? 0 token replies for those?
Nothing? Then it is probably an error reply which does not get billed.
Try using OR chat to verify that this is not a problem with your client. If you see the same problems there, we can work on them.
Note there is currently a problem with Cloudflare returning 524 timeout errors for some users under some circumstances, maybe your client does not show these correctly. See also -> #announcements message
New features just dropped:
- https://docs.anthropic.com/en/docs/build-with-claude/pdf-support - input PDF as images, similar to Gemini's PDF support
- https://docs.anthropic.com/en/docs/build-with-claude/token-counting - new API for counting tokens, since Claude 3 never had a public tokenizer
Ah so thats what they meant by Haiku launching "by the end of this month"
wym
haven't been following
oh shit it's november WHERE IS HAIKU
D:
For API it's "later this year".
👀 https://docs.anthropic.com/en/release-notes/api#october-8th-2024
Curious, is OR aware of the change regarding user/assistant roles?
That's a nice change, as OpenAI allow you to do multiple user messages before an assistant message
yep! cc @brisk shore
yeah we added this change some time ago. Not sure if Vertex is updated yet tho
Hello, i'm trying to run a python code from huggingface spaces (gpu) using gradio and model 'claude-3-5-sonnet-20241022' .. i get everytime Error: {"type":"error","error":{"type":"authentication_error","message":"invalid x-api-key"}} if i call the api directly or from huggingface settings is the same error.. i verified that the api is correct and working.
the api i put on huggingface is from Openrouter.. probably i do something wrong in the code??
import gradio as gr
import spaces
import requests
import os
import json

ANTHROPIC_API_KEY = os.environ.get('ANTHROPIC_API_KEY')

@spaces.GPU
def process_text(text):
    try:
        headers = {
            "x-api-key": ANTHROPIC_API_KEY,
            "anthropic-version": "2023-01-01",  # Changed version
            "content-type": "application/json"
        }
        data = {
            "model": "claude-3-5-sonnet-20241022",
            "max_tokens": 2000,
            "messages": [{"role": "user", "content": text}],
            "system": "Rewrite content while maintaining exact meaning and accuracy."
        }
        response = requests.post(
            "https://api.anthropic.com/v1/messages",
            headers=headers,
            json=data
        )
        print(f"Full API Response: {response.text}")
        if response.status_code == 200:
            return response.json()['content'][0]['text']
        else:
            return f"Error: {response.text}"
    except Exception as e:
        return f"Error: {str(e)}"

interface = gr.Interface(
    fn=process_text,
    inputs=gr.Textbox(lines=5),
    outputs=gr.Textbox(lines=5)
)
interface.launch()
ok i found the issue (what a stupid).. as i have the api on Openrouter i must replace https://api.anthropic.com/v1/messages with https://openrouter.ai/api/v1/chat/completions and change the code a little
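For anyone hitting the same error, a minimal sketch of what "change the code a little" might look like, using only the standard library. The `OPENROUTER_API_KEY` variable name and the helper names are assumptions; the key points are the OpenRouter URL, Bearer auth instead of `x-api-key`, an OpenRouter model slug, the system prompt as a message, and the OpenAI-style response shape.

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_openrouter_request(text, api_key):
    """Build (but do not send) a chat completions request for OpenRouter."""
    payload = {
        "model": "anthropic/claude-3.5-sonnet",  # OpenRouter slug, not the Anthropic model ID
        "max_tokens": 2000,
        "messages": [
            # The system prompt is sent as a message, not as a top-level field.
            {"role": "system", "content": "Rewrite content while maintaining exact meaning and accuracy."},
            {"role": "user", "content": text},
        ],
    }
    headers = {
        "Authorization": f"Bearer {api_key}",  # replaces the x-api-key header
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def rewrite(text):
    """Send the request and return the reply text (needs network and a valid key)."""
    request = build_openrouter_request(text, os.environ["OPENROUTER_API_KEY"])
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-style response shape instead of Anthropic's content[0].text.
    return body["choices"][0]["message"]["content"]
```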
Interesting, users noticed turning on prompt caching makes the non-self-mod filter stick until they wait out the 5 minutes to reset the cache.
how could you do that 'prompt caching' with silly tavern?
Update first, staging branch.
In config.yaml, set cachingAtDepth (-1 is off) to 0 if you have no prompts between Chat History and Prefill, or 2 if you have any, plus 2 for each level of depth insertion. Cache depth works by counting role switches rather than chat history messages. 1 would be on the assistant message before the last set of user prompts, hence you want an even number.
Custom prompts after Chat History should be set to user role instead of system role.
If doing group chat, blank out group nudge under utility prompts and add it to user role custom prompt instead.
Check the terminal to make sure cache_control is on a chat message behind anything not a chat message.
v
non self mod?
Are there any anti-NSFW prefills/filters on the regular version of the new Sonnet on OR, as opposed to the self-moderated one?
The regular version is moderated meaning before the chats are even sent to sonnet it will be checked externally to see if there is anything nsfw and block it if it finds any
is the OpenRouter API key created for Anthropic Claude a drop-in replacement for the API key created in the Anthropic console?
"Self-mod" (:beta endpoint) has a prompt injection somewhere that makes it like having a flagged Anthropic key.
"Regular" endpoint does not have such injection but can be intercepted by OR's moderation model, preventing a response (returns API error), at no charge. This endpoint is otherwise like having a normal key aside from potential blockage. One thing to note is that Anthropic API uses a single system parameter and does not have system role for messages array. OpenRouter sweeps all system messages in messages array of their own API which will be converted to the system parameter sent to Anthropic. Ensure that there is no user/assistant message before a system message, then model behavior should be identical to using Anthropic directly.
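A small sketch of the conversion described above, to show why the position of system messages matters. `to_anthropic_format` is a hypothetical helper, and joining multiple system messages with blank lines is an assumption; how OpenRouter actually merges them is not stated here.

```python
def to_anthropic_format(messages):
    """Split OpenAI-style messages into (system, messages) for an Anthropic-style API.

    All system messages are swept into one system string, wherever they appear.
    """
    system_parts = []
    chat = []
    for message in messages:
        if message["role"] == "system":
            system_parts.append(message["content"])
        else:
            chat.append({"role": message["role"], "content": message["content"]})
    # Assumption: multiple system messages are joined with a blank line.
    return "\n\n".join(system_parts), chat


openai_style = [
    {"role": "system", "content": "You are a careful editor."},
    {"role": "user", "content": "Hello"},
    {"role": "system", "content": "Answer in one sentence."},
]
system, chat = to_anthropic_format(openai_style)
print(repr(system))  # both system messages end up in the single system string
print(chat)          # only the user message remains in the messages array
```

The second system message loses its position after the user turn, which is why keeping all system messages first gives the same behavior as calling Anthropic directly.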
Good knowledge! Any idea what OpenRouter uses as a moderation system for the non-self-moderating option?
No. OpenRouter API keys can only be used for the OpenRouter API, vice versa for Anthropic API keys (can only be used for Anthropic API) and OpenAI API keys (can only be used for OpenAI API), for example.
There is a new feature in the works which will allow you to bring your own API key from Anthropic or OpenAI (or other provider) and use those through OpenRouter, but that is kind of the opposite way you described in your question.
Oh, by "drop in replacement" question I was thinking of model behavior than sticking keys into an API. What he said, can't stick OR key into direct.
Not much more is disclosed than #1298353935500836957 message
thank you. this is very insightful. what do you mean by flagged anthropic key?
yeah it's more of the latter. sticking keys into the api without refactoring. 🙂
When people do enough no-nos on direct, they get notified that their account is flagged, and they'll get an injection that affects model responses. OR doesn't flag accounts this way.
i see.
Vertex AI now supports prompt caching: https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/claude-prompt-caching
Now OR gets to deal with the complexity of dear lord which provider was the cache created in 😅
is it just me or is Sonnet cringe and predictable at RP?
did they crank up the strictness again on Claude? Previous Sonnet wrote well when it didn't have that many rules to follow.
Don't think so. Works fine for me with minimal jb
Noting here that Anthropic is actively working to scale sonnet capacity as they're overwhelmed: https://x.com/artificialguybr/status/1879402029061378171
Yeah, they seem to be struggling quite a bit recently. No new models. Opus 3.5 underperforms, so they didn't release it. Either they've reallocated most of their compute into training a thinking model or they just generally don't have enough power to sustain that many users.
Aren't they tight with amazon
Is amazon out of gpu
Maybe its unprofitable and they are controlling their burn rate
I think this might be the reason why they bumped up the price on 3.5 Haiku, not whatever excuse they've made
Making a API request to claude-3.5-sonnet:beta is giving me the attached screenshot error
Other models working fine
Is there any solution or reason ?
anthropic is overloaded with requests so sometimes 3.5 sonnet stops working
last time this happened it seemed to be about a month or so before a new model was released, so I'm hoping it's because they are cooking/testing a new model
Hopefully 🤞
Ah, rate limits... Also Sonnet is up by ~40B tokens per week vs 2 weeks ago.
We're working as fast as we can on getting more sonnet capacity (provisioning extra dedicated instances of it). In the meantime, adding your own Anthropic key in Settings -> Integrations will help boost your rate limits, if you're running into them.
Thank you very much for working on increasing the Sonnet Capacity.
Currently, I have registered the Anthropic API Key in Integrations, and I am using it with the settings "Use this key as a fallback" Disabled(Prioritize using my Anthropic Key -> if rate limit or failure, use the OpenRouter Credit).
I’ve noticed some strange behavior occurring about once a day, so I’m reaching out with a question about it.
Weird Behavior:
- Normally works fine with Anthropic Key and get "personal key used" log on the OpenRouter with 5% charged.
- Got request warning sign on the Anthropic Logs with under log
{
  "client_error": true,
  "code": 499,
  "detail": "Client disconnected"
}
- For the subsequent requests, The request goes to OpenRouter directly and use the OpenRouter credits.
TimeLine(KST based):
- Until [Feb 18, 02:01:33 AM] works fine with Anthropic Key
- [Feb 18 02:02:42 AM] Got Warning sign on Anthropic Logs
- After [Feb 18, 02:03:02 AM] using the OpenRouter credits directly, even though 'Use this key as a fallback' is disabled.
- [Feb 18, 02:58:35 AM] Return back to use my Anthropic Key
I really like the Integrations feature, but I don't know why this is happening. I have a Tier 3 account on Anthropic (input tokens per minute: 160,000), and I don't think this is a rate limit or failure issue on Anthropic's side (I will share the logs as an image).
Have you ever received reports of behavior similar to mine? I would really appreciate any help you can provide regarding this.
This is exponential, but crashing into hard limits. This model has had capacity issues from day 1. Anthropic accidentally created GOAT AI and didn't charge enough for it. OpenAI's strategy with their cheaper and shittier 4o makes more sense in this context.
Very odd, haven’t heard of it before. No reason we wouldn’t use your key unless Anthropic was sending down a 499 response code. Will flag to the team to double check
Cline absolutely gobbles tokens, but I'm guessing Cursor dwarfs this number. The jump from 4. to 3. is wild
Anthropic likely threw an error, and OpenRouter backup generator kicked on for a while.
Does OR always try the preferred source first in this configuration?
Thank you so much for your response! I hope this issue gets resolved quickly. If you need any information, please feel free to let me know anytime. 🙆
On OR Docs, [BYOK] -> [Automatic Fallback] we have this section
Conversely, if “Use this key as a fallback” is disabled for a key, OpenRouter will prioritize using your key. If it hits a rate limit or encounters a failure, it will then retry with your credits.
And this is what I exactly wanted
(prioritize my Anthropic key first -> if rate limited or failure happens -> use OR credits)
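The documented order can be pictured roughly like this. This is only an illustration of "own key first, then credits" with hypothetical names (`ProviderError`, `route_request`); it says nothing about how OpenRouter implements it internally or how long it keeps avoiding a key after a failure.

```python
class ProviderError(Exception):
    """Hypothetical error raised when a provider call is rate limited or fails."""


def route_request(prompt, call_with_own_key, call_with_credits):
    """Try the user's own key first; on rate limit or failure, retry with credits.

    Returns (source, reply) so the caller can see which path was billed.
    """
    try:
        return "personal key", call_with_own_key(prompt)
    except ProviderError:
        return "credits", call_with_credits(prompt)


def failing_own_key(prompt):
    # Stand-in for a provider call that hits a rate limit.
    raise ProviderError("rate limited")


print(route_request("hi", lambda p: "reply via own key", lambda p: "reply via credits"))
print(route_request("hi", failing_own_key, lambda p: "reply via credits"))
```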
But I don't understand why the error below occurred, and I also can't find the reason why OR credits continue to be used directly after it happens (for about an hour or less).
{
  "client_error": true,
  "code": 499,
  "detail": "Client disconnected"
}
Here is my current Integrations config
If you don't want to pay more, try limiting the max output tokens to less than 4553.
does anyone know if Sonnet 3.5 supports context caching for providers other than Anthropic (Bedrock and Vertex)?
I don't think so. I'm not sure where it shows the provider caching function.
Caching works for Vertex, and Bedrock will start caching very very soon (today hopefully cc @brisk shore )
whenever caching works for your requests, you're of course pinned to that provider until the cache expires
is pinning going on automatically or should I set only 1 provider in a config?
automatic
or maybe it's just a different system prompt 🤓
"Really THINK through the question step by step. Write a lot of text that makes it look like you're really agonising over the solution. Then just ignore that and go with your gut like always 🤙"
Honestly even if it's just an incremental upgrade and/or something bolted on to improve it, I'll be happy. It's still #1.
But yeah it would be nice to get Claude 4 or whatever, so we can finally get our minds blown again
not with the price hopefully 🥲
Haha... who knows, but I feel like if it's the same price as 3.5... we have to count that as a win 🙃
going to be worthless
Damn. At least theres 1000s of other waifu models for you to goon with
Did caching break this morning? Tried all three providers, doesn't seem to be any cache writes or read.
wouldn't be claude.
should be better now as of about an hour after you posted this message
Is there a reason Vertex and Bedrock are listed twice in the model page?
Similar but slightly different latency/throughput stats.
one is US deployment and the other is EU, gives us extra capacity
Prompt caching for OpenAI models should be automatic, but for openai/chatgpt-4o-latest it is not caching anything for me. Anthropic caching is working as intended.
That model has never supported caching, to our knowledge
Just their main api models
Surely they won’t poison the API with that
it was actually the opposite!! can you believe it?!
talking to the model, you can tell it's positivity-poisoned at its core despite this.
While it wont refuse it will "work around" generating "negative" content. Death during a story is near impossible.
I would go as far as to say that it's too happy to help with problems often trying to make things sound better than they are. At least in my coding testing.

"The red area in the graph marks the important segment of the data" type stuff has been said many a time by now.
like no shit I made that graph
Interesting. I'm looking forward to really putting it through its paces with Cursor. I've got a rules file specifically tuned for 3.5's bad habits, I'm hoping the changes might make it less "corporate".
I haven't seen this mentioned, but it's very welcome: it knows about the Node v22 LTS and React 19 releases, for reference.
This is the model that's had an actual, measurable positive impact on my life, and was able to break my rule of never getting hyped about a future release. I just can't help but feel a touch of schadenfreude for people who need it to also write stories that would be banned in most countries for them to consider it useful. And since there's never enough Claude to go around, I'm kinda glad honestly
I think the tools are severely underpowered for story writing really, more than the models. If you're ever really satisfied by next token prediction... maybe you should try reading more widely. It takes a human in the loop to write something truly thrilling.
SillyTavern is inspirational and reading its code is a big part of how I learned some of the intricacies of prompting, but its trying to be everything to everyone, coupled with its tech/UI debt, makes it frustrating to work with.
Hermes 3 405B and Mistral Large have no limits, but they need help with prose. I've got some ideas for a more focused tool, but it's only something I do occasionally and it's hard to make a priority. I think most of the pieces are already here, but frankly I think RP/story writers have been fairly lazy, and lack imagination.
@supple umbra I think it's true that the current gen tools don't write prose on the level of a good author, and come up with very generic, straightforward stories. If you give Claude pages of world building, it won't consistently use them. Sometimes it appears to 'get' its directions but then does something jarringly out of place.
It's still at a level where it makes an author more efficient rather than replacing them. Same with serious coding: if you need good architecture and abstractions and consistent use of coding patterns across a large application, you have to micromanage.
The thing is - a lot of people are not that great at coding and writing. A lot of even commercial books and code are written pretty badly. They are, apparently, good enough. So for "terrible ghost written fantasy novel for Kindle Unlimited" writing and "I made a dashboard for the sales department" coding, you can get pretty good results with a light touch.
What I find interesting is... both of these allow people who don't have the complete set of skills but have ideas to make a thing they otherwise wouldn't have made.
Joe from accounts can write a python script that emails bad debtors on a friday.
Then he can go home and write that harry potter inspired space western he always wanted to write.
So for me, I'm AI genning a little novel in Claude, and the prose isn't good but it's a lot better than I could write. On the other hand, I have a lot of life experience and ideas that someone who focused on honing their writing craft might not have. Like, I'm writing some corporate thriller story, and I can include all kinds of realism and dumb detail a beginner author writing from research couldn't capture. For me, it makes the story a lot more interesting and less subtly irritating.
And I think that's the core thing, the place we are almost at -
I can make a story that is exactly what I want to read, without all that much effort. The prose is good enough. It's accurate to my life experience. It's aligned to my tastes. It is the story that I want to read today.
What we're going to see I think is more people who wouldn't have bothered, creating content for themselves because it's low effort. Content that is "good enough", and unique to them.
A lot of low grade commercial content is written by authors who are watching keyword rankings on kindle. Today "dark romance" and "self help" are trending so they write a book about a woman who finds her self confidence with a hot exploitative therapist.
And it's terrible. Written in one take no edit. By a 17 year old Indonesian ghost writer saving for college. And people buy it, enough people.
I think this is getting to the point where it beats that.
What do you think is the best model for writing? Have you taken a look at community RP models for it?
I mean I like Claude personally, but different models have different strengths for different parts of writing I think
Like uh, when you are brainstorming, deepseek can pull from your notes in a way that is better or at least different. It's less positively biased so it comes up with different ideas if prompted appropriately (brainstorm in character works well for me)
I like the prose out of grok but Claude does a better job of understanding intent and knowing how to write a scene that is better than your outline. I've had good experience debating how to do a character with claude backwards and forwards and getting a unique voice and style out the other end that is different and better than I'd have come up with alone...
Chatgpt still bad
Smaller models... I don't get on with them, but I think it's a skills issue
But uh, fundamentally, it's good enough to keep me happy, and I usually read a lot of badly written trash.
Interesting..
I actually have an idea of using Claude to make the output first, then giving it to Grok to rewrite it while keeping its originality.
The thing about Claude is that it's a smart model, so making a story with it actually makes sense; the problem it has is the steering from Anthropic. Grok is just more open.. they don't care how wild or how disgusting it is, which makes it able to output much more diverse prose.
Honestly, when I feed large lumps of ready written text to Grok, and other models, I find it just reproduces them verbatim.
Like when you wrote that I thought "hey, I'll feed it a scene and see what it does"
It just copy pasted
have you tried to steer it?
for example, I've got a story where the MC is on the bad side and is a monster. When the monster does some nasty thing like eating "things", Claude will either reject it or write it out in a much tamer manner
with the output Claude gave, I went to Grok and then prompted it with how the scene should be.
"rewrite this story while keeping it originality on its logic and pace with the difference it be more graphic, g3re,......... and so on", making it output a wilder version of the given scene
Hmm, I had some luck with generating very detailed notes from an existing scene, deleting it from the chat, and then asking for a rewrite
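A rough sketch of that notes-then-rewrite trick, in case it helps someone reproduce it. Everything here is an assumption for illustration: `complete` stands in for whatever function sends a message list to your model and returns the reply text, and the prompt wording is mine, not what was actually used. The key point is that the second call starts a fresh conversation containing only the notes, so the model has nothing to copy verbatim.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
# Hypothetical client function: takes chat messages, returns the reply text.
Complete = Callable[[List[Message]], str]


def rewrite_from_notes(scene: str, style: str, complete: Complete) -> str:
    """Rewrite a scene via intermediate notes so the original text is never
    visible to the model during the rewrite step."""
    # Step 1: distill the scene into detailed notes.
    notes = complete([
        {
            "role": "user",
            "content": (
                "Write very detailed notes on this scene: plot beats, "
                "characters, setting, and tone.\n\n" + scene
            ),
        }
    ])
    # Step 2: fresh message list. The original scene is deliberately left
    # out (the "deleting it from the chat" part); only the notes remain.
    return complete([
        {
            "role": "user",
            "content": f"Using only these notes, write the scene {style}.\n\n{notes}",
        }
    ])
```

Passing a different `complete` for each step would also cover the "draft in one model, rewrite in another" idea from earlier in the thread.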
That's awesome! It's cool to hear about how it's helping you with a long term project. I think you and I would agree that AI is at its best as a force-multiplier, synthesising your own ideas with its natural writing ability into something that wouldn't have existed otherwise, and still be quite personal despite perhaps having the signature hallmarks of a particular LLM - this is a much easier problem to solve with some effort than the alternative of writing from scratch.
I don't use it for fiction so much, but just being able to brain-dump my thoughts and have it neatly laid out with some thought-provoking questions added on has been extremely valuable to me. It's helped me record details that would have been lost otherwise. I've never been a good note-taker, I easily get bogged down in the particulars of structure and prose to the point of frustration, so I've never found it a pleasurable experience like some do. Being able to just let Claude take the wheel is pretty mind-blowing when you stop to remember this concept was a sci-fi fantasy two years ago.
Really I was expressing a frustration with the overwhelmingly singular fixation of a significant part of the LLM community being on ERP only, and that being the benchmark people are measuring models by. I've had some fun with it, it's cool, I'm just kinda over it. I get a little disturbed by the sense that some people seem addicted to it, and are hanging out for their next hit of waifu sex text. I mean it's amazing that ERP works as well as it does, but it's a bit sad to think that's all it is for some.
Oh god, thank you for saying this
The ERP community is also quite polarized, with some actually digging into fine-tuning and training, while the others are some of the laziest people I have ever seen: they just keep asking which model is the best for ERP from time to time without doing the homework themselves or even bothering to Cmd+F and search
If you’re interested in using high end models for story creation, do check out my site infiniteworlds.app. I’ve put a LOT of effort into how to get good interactive stories out of them.
very cool, somewhat what deen was talking about I think. I'll take a look
Starting February 19, 2026, Anthropic will terminate and no longer support Claude 3.5 Sonnet or Claude 3.5 Sonnet v2.
Received from Vertex
well, Anthropic is yeeting 3.5 Sonnet earlier on their platform, October 22, 2025 https://docs.anthropic.com/en/docs/about-claude/model-deprecations#2025-08-13%3A-claude-sonnet-3-5-models
cc @keen stream for the 2 month countdown
@keen stream Anthropic just pulled 3.5 off their platform
rest in power, king 🙏
What happened to 3.5 Sonnet?
The description changed to this on Vercel:
The upgraded Claude 3.5 Sonnet is now state-of-the-art for a variety of tasks including real-world software engineering, agentic capabilities and computer use. The new Claude 3.5 Sonnet delivers these advancements at the same price and speed as its predecessor.