#SWE Bench Verified


radiant arrow
smoky flume (BOT)

Hi @radiant arrow. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:


Warp achieved a top score on SWE-bench Verified (71% in June, and later 75.8%) and published detailed blog posts about their results and methodology. However, the knowledge sources do not provide a specific explanation for why Warp does not appear on the public SWE-bench Verified leaderboard. The blog posts confirm that Warp’s results were achieved using their production agent and a custom evaluation harness, and that their results are representative of real-world usage, but there is no mention of official leaderboard submission status or reasons for absence from the leaderboard itself.

It’s possible that Warp’s results were self-reported or not submitted for public listing, but the knowledge sources do not explicitly state this. If you need a definitive answer, you may want to reach out to the Warp team or the SWE-bench maintainers directly for clarification.

Source: Warp scores 71% on SWE-bench Verified

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:


@radiant arrow Kapa.ai is still learning and improving, please let me know how I did by reacting below.

radiant arrow

This does not answer the question; it just speculates.

dusk phoenix

@radiant arrow I don't work at Warp, but I am somewhat familiar with the SWE-bench submission process. You can find their submission here: https://github.com/SWE-bench/experiments/pull/278 (a PR to the SWE-bench experiments repo). They don't show up under the leaderboard's default filter, but if you modify the filter (just select all), you will see the Warp submission on the leaderboard.