Hacker News | banjoe's comments

Wow, crushing 2.5 Flash on every benchmark is huge. Time to move all of my LLM workloads to a local GPU rig.


Just remember to benchmark it yourself first with your private task collection, so you can actually measure the models against each other. Pretty much every public benchmark is unreliable right now, and making model choices based on others' benchmarks is bound to leave you disappointed.
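The "private task collection" approach above can be sketched as a tiny harness that runs each model over your own prompt/expected-answer pairs and records accuracy and latency. Everything here is hypothetical (`run_model` is a stand-in for your real API or local-inference call, and the tasks are toy examples), just to show the shape of the comparison:

```python
# Minimal sketch of a private benchmark harness. All names here are
# hypothetical placeholders, not from any specific library.
import time

def run_model(model_name, prompt):
    # Stand-in for a real model call; swap in your own client.
    # It simply echoes the prompt so the sketch is runnable.
    return f"{model_name}: {prompt}"

def evaluate(model_name, tasks):
    """Score a model on a private task set: the fraction of tasks whose
    expected answer appears in the output, plus mean latency."""
    correct, latencies = 0, []
    for prompt, expected in tasks:
        start = time.perf_counter()
        output = run_model(model_name, prompt)
        latencies.append(time.perf_counter() - start)
        if expected in output:
            correct += 1
    return correct / len(tasks), sum(latencies) / len(latencies)

# Toy private task set: (prompt, substring expected in a good answer).
tasks = [("What is 2+2?", "4"), ("Capital of France?", "France")]
acc, lat = evaluate("model-a", tasks)
print(f"accuracy={acc:.2f} mean_latency={lat * 1000:.2f}ms")
```

Running the same `evaluate` over each candidate model gives you numbers that actually reflect your workload, which is the whole point of the comment above.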


This. The last benchmarks of DSv3.2spe hinted at it beating basically everything, yet in my testing even Sonnet is miles ahead in both speed and accuracy.


Why would you use an Omni model for a text-only workload? There is Qwen3-30B-A3B.


Except the image benchmarks are compared against 2.0, and it seems suspicious that they would quietly fall back to an older model for those.


This is a clone of https://autoshorts.ai/


I still need to talk very fast to actually chat with ChatGPT, which is annoying. You can tell they didn't fix this based on how fast they're talking in the demo.


"If opportunities for role fulfillment fall far short of the demand by those capable of fulfilling roles, and having expectancies to do so, only violence and disruption of social organization can follow."

People need a purpose.


A VS Code Extension to help you write code in every language.

