Hacker News | throwaway314155's comments

Do skills get access to the current context or are they a blank slate?

They execute within the current context - it's more that the content of the skill gets added to that context when it is needed.

> which presumably hasn't done a fresh pre-training over the web

What makes you think that?

> Did they figure out how to do more incremental knowledge updates somehow?

It's simple. You take the existing model and continue pretraining with newly collected data.
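In toy form, "continue pretraining" really is just resuming causal-LM training from the released checkpoint on a newer corpus. A minimal sketch with Hugging Face transformers (every name here is a placeholder, and the real thing at lab scale involves far more data curation, scheduling, and compute):

    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    checkpoint = "your-existing-model"          # placeholder name
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    # Freshly collected text, tokenized the same way as the original pretraining data.
    new_data = load_dataset("text", data_files={"train": "freshly_crawled.txt"})["train"]
    tokenized = new_data.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
        batched=True, remove_columns=["text"],
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="continued-pretrain",
            per_device_train_batch_size=1,
            learning_rate=1e-5,                 # small LR to limit forgetting
            num_train_epochs=1,
        ),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()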


A leak reported on by SemiAnalysis stated that they hadn't pre-trained a new model since GPT-4o due to compute constraints.

GPT 4o was an MoE model as well.

> it is really hard to look at for me.

What were you expecting?


Affine texture mapping is kinda jarring to look at, especially in this GBA port, since there is no fixup and the textures on huge ground polygons drift around.

One of the listed features in the PS1 port in the OP article is tessellation to reduce the issues of the PS1 hardware's affine texture mapper. On the GBA you have some base cost of doing manual software texture mapping, but also opportunities to do some minor perspective correction to lessen the worst effects (such as doing perspective correction during the clipping process).
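For anyone wondering why affine mapping "swims": the whole difference is whether texture coordinates get divided by depth during interpolation. A tiny illustrative sketch of the math (not the port's actual code):

    def lerp(a, b, t):
        return a + (b - a) * t

    def affine_uv(uv0, uv1, t):
        # PS1/affine style: interpolate u and v linearly in screen space (cheap, but warps).
        return lerp(uv0[0], uv1[0], t), lerp(uv0[1], uv1[1], t)

    def perspective_uv(uv0, z0, uv1, z1, t):
        # Perspective-correct: interpolate u/z, v/z and 1/z, then divide back out.
        inv_z = lerp(1.0 / z0, 1.0 / z1, t)
        u = lerp(uv0[0] / z0, uv1[0] / z1, t) / inv_z
        v = lerp(uv0[1] / z0, uv1[1] / z1, t) / inv_z
        return u, v

Doing that division at a few extra points (e.g. at clip-generated vertices) is the cheap "minor perspective correction" mentioned above.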


The GBA version does actually leverage dynamic polygon splitting in direct reference to how PS1 games used this approach https://www.youtube.com/watch?v=1Oo2CZWbHXw&t=271s

I think the resolution makes it particularly rough though.


Almost felt like the second video (despite being older) looked better in terms of texture jumping. Looking closer now, the wandering textures actually seem to be more of a clipping issue than directly related to perspective correction.

I am probably misremembering, but wasn't Super Mario 64 "flat shaded", i.e. no textures, just colors?

You’re misremembering. SM64 was fully textured, outside of specific models.

Also, flat shading (vs., say, Gouraud shading) is isomorphic to the question of texture mapping; it concerns how lighting is calculated across the surface of the polygon. A polygon can be flat shaded and textured, flat shaded and untextured, smoothly shaded and textured, or smoothly shaded and untextured.
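A tiny sketch of that independence (all names hypothetical, not any real engine's API) - the lighting term and the texture lookup are separate factors, so any of the four combinations is possible:

    def shade_pixel(face_light, vertex_lights, bary, texture, uv, flat=True, textured=True):
        # Shading choice: one value per polygon (flat) or interpolated per pixel (Gouraud-style).
        light = face_light if flat else sum(l * w for l, w in zip(vertex_lights, bary))
        # Texturing choice: sample a texel or use a solid colour; independent of the shading choice.
        base = texture.sample(*uv) if textured else (1.0, 1.0, 1.0)
        return tuple(light * c for c in base)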


(Too late to edit but did not mean “isomorphic”, meant “orthogonal”. Wrong smart person word trying to look smart, how embarrassing, sigh.)

Like a lot of N64 titles, it had many solid colour objects to save on RAM, but lots of things, especially in the environment, were textured too.

Nothing. I have zero expectations. Giving an honest take on what I saw is all.

Somehow I suspect Claude Code (in an interactive session with trial, error, probing, critiquing, perusing, and all the other benefits you get) would do better. This example seems to assume Claude can do things in "one shot" (even the later attempts all seem to conceal information like it's a homework assignment).

That's not how to successfully use LLMs for coding, in my experience. It is, however, perhaps a good demonstration of Claude's poor spatial reasoning skills. Another good demonstration of this is the twitch.tv/ClaudePlaysPokemon stream, where Claude has been failing to beat Pokémon for months now.


Using https://github.com/anthropics/claude-code/tree/main/plugins/... with style-supporting instructions and context would've improved the outcome as well.

Is the skill effectively just adding a little extra context here though? Doesn’t strike me as the sort of context that would improve the outcome.

https://github.com/anthropics/claude-code/blob/main/plugins/...


thank you! I'll try this

Not a homework assignment, and no deliberate attempt to conceal information, just very long and repetitive logs. A lot of the same "insights" so I just didn't provide them here.

> That's not how to successfully use LLMs for coding, in my experience.

Yeah agree. I think I was just a little surprised it couldn't one-shot given the simplicity.


Realistically there has to be _some_ time limit. No one is going to sit in a room for 10 hours while you finish your test.

Sure. I doubt that if a test currently takes an hour, you're getting much extra benefit at the five-hour mark. The whole point of the time compression is to spread the grades out - along an axis different from "competence".

>whole point of the time compression is to spread the grades out

I suspect that is true for standardized tests like the SAT, ACT, or GRE.

I suspect in classroom environments that there isn't any intent at all on test timing other than most kids will be able to attempt most problems in the test time window. As far as I can tell, nobody cares much about spreading grades out at any level these days.


Why?

How strong is the argument that a student who completes a test in 1 hour with the same score as a student who took 10 hours has performed "better" or has a greater understanding of the material?


> Why?

Teachers have lives, including needing to eat and sleep.


Sure, but that answer doesn't address the questions of the value of time limits on assessment.

What if instead we are talking about a paper or project? Why isn't time-to-complete part of the grading rubric?

If the rubric gives a better grade to the student who took 10 hours on a project, do we then penalize them for taking longer than the student who took 1 hour?

Or assume teacher time isn't a factor - put two kids in a room with no devices to take an SAT test on paper. Both kids make perfect scores. You have no information on which student took longer. How are the two test takers different?


Not arguing with any of that, just stating plainly that there are practical reasons for time limits and one of the many reasons is that tests are done supervised and thus must have _some_ sort of time limit. Everything else is you projecting an argument onto me that I didn't make.

It has no idea what its own knowledge cutoff is.

Knowledge cutoff date is usually part of the system prompt.

It helps you get useful answers like "I don't know, that's too recent" when you ask questions like "who won the basketball game last night".
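Roughly what that looks like if you wire it up yourself at the API level (example model name, hypothetical prompt text):

    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",  # example model name
        messages=[
            # Real provider system prompts are much longer; the cutoff/date lines are the relevant bit.
            {"role": "system",
             "content": "Knowledge cutoff: 2024-06. Current date: 2025-12-02. "
                        "If asked about events after the cutoff, say your information may be out of date."},
            {"role": "user", "content": "Who won the basketball game last night?"},
        ],
    )
    print(resp.choices[0].message.content)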


I opened up the Gemini app and asked:

Who are the Dallas Stars playing tonight? Is there anyone that will be out of the lineup for the Stars? What have their recent injuries been like? Can you get the current score?

It's response:

The Dallas Stars are playing the New York Rangers tonight, Tuesday, December 2, 2025, at Madison Square Garden.

Here is the information available regarding the rest of your questions:

Out of the Lineup & Recent Injuries

* Lian Bichsel (Defenseman): He suffered a lower-body injury (ankle) in the recent game against the Senators and is expected to miss 6-8 weeks and require surgery. His absence will be felt on the blueline.

* Jake Oettinger (Goaltender): He is dealing with an illness and is not expected to start tonight; Casey DeSmith is projected to start in his place. Oettinger is expected to play tomorrow night in New Jersey.

* Thomas Harley (Defenseman): He remains week-to-week with a lower-body injury and is not with the team on their current road trip.

Other injured players who will likely be out of the lineup include Matt Duchene (Upper Body - on IR-LT, but joining the road trip), Adam Erne (Lower Body - on IR-LT), and Nils Lundkvist (Lower Body - on IR-LT).

Current Score

The game is in progress, and the score is currently New York Rangers 1 - 1 Dallas Stars.

Would you like an update on the final score later, or news on a specific player's status?

I ask it things like updates on teams all the time, and it usually pulls the right information.


The consumer apps use RAG and traditional search to give the LLM recent information in the prompt when it answers your query. This basically bridges over the knowledge gap between the end of training and today.
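In pseudocode, the bridge is just "search first, stuff the results into the prompt" (web_search and llm are hypothetical stand-ins for whatever search API and model client the app actually uses):

    def answer_with_search(question: str) -> str:
        # Pull fresh information from the web (hypothetical search helper).
        results = web_search(question, num_results=5)
        context = "\n\n".join(f"{r['title']}\n{r['snippet']}" for r in results)
        # Hand the fresh context to the model alongside the question.
        prompt = (
            "Answer using the search results below. "
            "If they don't contain the answer, say so.\n\n"
            f"Search results:\n{context}\n\n"
            f"Question: {question}"
        )
        return llm(prompt)  # hypothetical model call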

I'm fully aware; I just want to point out to people that the actual AI apps they'll use can and do return recent information thanks to integrations like that. Lots of people think AI can only answer from its training set, but it can answer from whatever data you hand it, including anything on the internet.

Lots of AI tools can easily answer "who won the basketball game last night".


GCP is sort of blending this into their Gemini APIs.

https://ai.google.dev/gemini-api/docs/google-search

You don't have to do RAG or use a SERP scraper MCP; just add the Search Grounding tool to the API request and it does the rest, at the model's discretion, for $0.014 / search.

I think that’s generally a fair price for my time vs doing my own search queries at 1/100th the speed. It could get expensive for deep research type queries.
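For reference, this is roughly what it looks like with the google-genai Python SDK per the linked docs (model name is just an example):

    from google import genai
    from google.genai import types

    client = genai.Client()  # picks up GEMINI_API_KEY from the environment
    resp = client.models.generate_content(
        model="gemini-2.5-flash",  # example model name
        contents="Who are the Dallas Stars playing tonight?",
        config=types.GenerateContentConfig(
            # The grounding tool; the model decides when to actually search, billed per search.
            tools=[types.Tool(google_search=types.GoogleSearch())],
        ),
    )
    print(resp.text)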


The pricing may be reasonable, but Grounding with Google Search has extremely restrictive terms of use.

https://ai.google.dev/gemini-api/terms#grounding-with-google...


Yup, I looked at GwGS and it's not interesting. I want actual results, not a summary and some links to sites (not pages) that I then have to go crawl.

Don't want Google as the middleman between the internet and my agent. Their search has become so bad I don't even use it any more


What?

It helps when questions intended to resolve ambiguity are not themselves hopelessly ambiguous.

See also: "Help me help you" - https://en.wikipedia.org/wiki/Jerry_Maguire


I'm confused. This just seems like feel-good bullshit advice that only works for people in extremely good circumstances.

There's a false equivalence between -

“All it takes is for one to work out.”

and the following:

- "You don’t need every job to choose you. You just need the one that’s the right fit."

- "You don’t need every house to accept your offer. You just need the one that feels like home. "

The latter assumes that _every_ attempt you make has a chance at being "the right fit", "the one that feels like home". That is not the way things work for 99% of us.


If you interpret 'working out' as being 'the right fit', then it comes together pretty nicely.


I’d love to follow your work. Got a GitHub?


I do (same username), but I haven't published any of this (and in fact my GitHub has sadly languished lately); I keep working on it with the intent to publish eventually. The big problem with models like this is that the training dynamics have so many degrees of freedom that every time I get close to something I want to publish, I end up chasing down another set of rabbit holes.

https://gist.github.com/cheald/7d9a436b3f23f27b8d543d805b77f... - here's a quick dump of my SVDLora module though. I wrote it for use in OneTrainer though it should be adaptable to other frameworks easily enough. If you want to try it out, I'd love to hear what you find.


This is super cool work. I’ve built some new sampling techniques for flow matching models that encourage the model to take a “second look” by rewinding sampling to a midpoint and then running the clock forward again. This worked really well with diffusion models (pre-DiT models like SDXL) and I was curious whether it would work with flow matching models like Qwen Image. Yes, it does, but the design is different because flow matching models aren’t de-noising pixels so much as they are simply following a vector field at each step like a ship being pushed by the wind.
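(For the curious, a very rough sketch of what a midpoint "rewind" can look like for a plain Euler flow sampler - velocity_model is hypothetical, the convention assumed is data at t=1 with straight-line paths, and the actual design differs from this in important ways:)

    import torch

    def euler_flow_sample(velocity_model, x, t_start, t_end, steps=30):
        # Plain Euler integration of dx/dt = v(x, t); data lives at t = 1 in this convention.
        ts = torch.linspace(t_start, t_end, steps + 1)
        for i in range(steps):
            t, t_next = ts[i], ts[i + 1]
            v = velocity_model(x, t)      # predicted velocity field at (x, t)
            x = x + (t_next - t) * v
        return x

    def second_look_sample(velocity_model, noise, t_mid=0.5, steps=30):
        # First pass: integrate all the way from noise (t=0) to a sample (t=1).
        x1 = euler_flow_sample(velocity_model, noise, 0.0, 1.0, steps)
        # Rewind: jump back to a midpoint with fresh noise along a straight-line path,
        # then run the clock forward again so the model gets its "second look".
        fresh = torch.randn_like(x1)
        x_mid = (1.0 - t_mid) * fresh + t_mid * x1
        return euler_flow_sample(velocity_model, x_mid, t_mid, 1.0, steps)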


Neat! Is that published anywhere?

It seems conceptually related to DDPM/ancestral sampling, no? Except they're just adding noise to the intermediate latent to simulate a "trajectory jump". How does your method compare?

