Hacker News
EagnaIonat 2 days ago | on: New benchmark shows top LLMs struggle in real ment...
Models have different nuances, though. With Llama4, for example, you have to explicitly ask it not to output its CoT, whereas with GPT you don't.
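For what it's worth, here is a rough sketch of how you might handle that kind of per-model quirk in code. Everything in it is an assumption for illustration: the `EXTRA_INSTRUCTIONS` table, the `build_system_prompt` helper, the model family keys, and the exact wording of the extra instruction are all made up, not anything from a vendor's docs.

```python
# Hypothetical per-model prompt tweaks. The keys and the instruction
# wording are illustrative assumptions, not official guidance.
EXTRA_INSTRUCTIONS = {
    "llama4": "Do not output your chain of thought; reply with the final answer only.",
    "gpt": "",  # assumed to need no extra instruction, per the comment above
}


def build_system_prompt(base: str, model_family: str) -> str:
    """Append any model-specific instruction to a shared base prompt."""
    extra = EXTRA_INSTRUCTIONS.get(model_family.lower(), "")
    return f"{base}\n{extra}" if extra else base


if __name__ == "__main__":
    print(build_system_prompt("You are a helpful assistant.", "llama4"))
```

Keeping the quirks in one table means the base prompt stays shared and only the model-specific line changes.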