That’s what this is. It’s caching the model’s state after the prompt tokens have been processed, which reduces latency and cost dramatically. The cache usually has a ~5 minute TTL.
Interesting! I’m wondering, does caching the model state mean the tokens are no longer directly visible to the model? i.e. if you asked it to print out the input tokens perfectly (assuming there’s no security layer blocking this, and assuming it has no ‘tool’ available to pull in the input tokens), could it do it?
The model state encodes the past tokens (in some lossy way that the model has chosen for itself). You can ask it to try and, assuming its attention is well-trained, it will probably do a pretty good job. Being able to refer to what is in its context window is an important part of being able to predict the next token, after all.
There’s no difference between feeding an LLM a prompt and feeding it half the prompt, saving the state, restoring the state, and feeding it the other half of the prompt.
i.e., the data processed by the LLM is prompt P.
P can be composed of any number of segments.
Any number of segments can be cached, as long as all preceding segments are cached.
The final input is P, regardless.
So, tl;dr: yes. Anything you can do with a prompt you can do, because it’s just a prompt.
When the prompt is processed, there is an internal key-value cache that gets updated with each token processed and is ultimately used for inference of the next token. If you process the prompt first and then dump that internal cache, you can effectively resume prompt processing (and thus inference) from that point more or less for free.
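The equivalence can be sketched with a toy "model" whose per-token state depends only on the tokens seen so far, the way a transformer's KV cache does. Everything here (the `process` function, the state representation) is made up for illustration; a real cache stores per-layer key/value tensors, not hashes.

```python
# Toy illustration of prefix caching: the "state" after processing some
# tokens depends only on those tokens, so it can be saved and resumed.

def process(tokens, cache=None):
    """Feed tokens through the toy model, extending a saved state."""
    state = list(cache) if cache else []
    for t in tokens:
        # A real model appends per-layer key/value tensors here; we just
        # record a value derived from the token and its position.
        state.append(hash((t, len(state))))
    return state

prompt = ["system:", "you", "are", "helpful", "user:", "hi"]

full = process(prompt)                     # one pass over the whole prompt
prefix = process(prompt[:4])               # process the shared prefix once...
resumed = process(prompt[4:], cache=prefix)  # ...then resume from the saved state

assert full == resumed  # same final state either way
```

This is why caching only works for a *prefix*: each saved entry encodes its position relative to everything before it, so you can't splice in a cached middle segment without its predecessors.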
Not just out; both directions can be tricky to measure. It is hard to say for certain how many of the potential kcal you consume are actually absorbed by the body. If you see whole corn kernels in the toilet, those kcal didn't count :)
But yes. CICO is and always has been absolutely true. People are just overly reductive in how they measure both sides, and then claim that CICO is garbage.
In my experience, the most reliable way to understand your body's calorie needs is through consistent measurement:
1. Log everything you eat each day.
2. Weigh yourself first thing the next morning, before eating.
3. Track the trend (did you gain, lose, or maintain?)
Over time, clear patterns emerge. You start to see exactly how your intake maps to weight changes, and you can fine-tune accordingly. It’s not guesswork, it’s feedback.
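The feedback loop above can be reduced to simple arithmetic. This is a hedged sketch, not a prescription: the ~7700 kcal per kg figure is a common rule of thumb for body-fat energy density, and all the numbers in the example are invented.

```python
# Estimate maintenance calories from logged intake and the weight trend,
# using the rough ~7700 kcal per kg of body mass rule of thumb.

KCAL_PER_KG = 7700  # rough conversion; individual results vary

def estimated_maintenance(daily_intakes_kcal, start_kg, end_kg):
    """Average intake adjusted by the energy implied by the weight change."""
    days = len(daily_intakes_kcal)
    avg_intake = sum(daily_intakes_kcal) / days
    surplus_per_day = (end_kg - start_kg) * KCAL_PER_KG / days
    return avg_intake - surplus_per_day

# Two weeks of logging: ate ~2500 kcal/day, gained 0.4 kg.
intakes = [2500] * 14
maintenance = estimated_maintenance(intakes, 80.0, 80.4)
print(round(maintenance))  # 2280 kcal/day
```

The longer you log, the less daily water-weight noise matters and the tighter this estimate gets.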
What surprised me most was how little food I actually needed. Even with regular strength training, a modest surplus was enough to support muscle growth.
Aren't calorie numbers on foods just made-up numbers anyway? I'm no expert, but I'm pretty sure the body's method of metabolizing food is not the same as burning it with oxygen. They might offer a standardized number and a basis for comparison, but beyond that it's not reflective of anybody's reality.
You need to get the orders of magnitude right at first. I find keeping a running tab with any AI works pretty well. Drop 5 words with every meal or snack and that's it.
It’s ok to assume that you absorb 100% of what you eat, unless you see evidence to the contrary; and no, corn kernels in your poop don’t count. Frequent diarrhea, weight loss, skin rash, and basically any symptom of vitamin or mineral deficiency do.
>It’s ok to assume that you absorb 100% of what you eat
That's not really true. If you've ever done the keto diet, you know that your body expels unburned ketones through your breath, sweat, and urine. Protein can be used to repair structures rather than burned or stored for energy.
There's also something called the "thermic effect of food". Your body requires more energy to process protein (20-30% of the calories consumed) than carbs (5-10%), and more for carbs than for fats (0-3%).
There are many ways for food to not be 100% absorbed, which I think can most easily be demonstrated by eating a bag of nuts and waiting a day or two
I don't think it's unreasonable to think that different bodies absorb food in different ways (or proportions), particularly given what we've seen about the gut microbiome
In response to "but those are rats": I think it's a lot easier to cast doubt on "100% of food is always absorbed" than on "I don't think that always holds true".
I mean, heck: if there are no residual calories in human waste, how can it burn?
My original point: it's ok to assume you absorb 100%.
About the rat thing: the CICO-hypothesis point of view might look first at whether meal timing affects energy expenditure, rather than assuming meal timing changes digestive absorption.
There is not much point in getting in the weeds about how much you absorb, unless you're running trials on yourself like changing when you eat, or what you eat, and leaving all other things equal like calorie intake and expenditure.
The best dieting strategies I've seen track calories in and weight change. From there you derive calorie expenditure, and it really doesn't matter if you burned it or pooped it out, does it?
It’s not like almonds have x calories for a certain group and y calories for another.
Being wrong about the number of calories in almonds doesn’t count as evidence that skinny people are skinny because they poop out undigested calories.
Also, I’m not saying digestive malabsorption is impossible, just that you shouldn’t assume it unless you have strong evidence to the contrary that doesn’t have another simpler explanation.
CICO is more of an upper bound, but people like to incorrectly use it as a lower bound. Meaning that it's true that you can't burn more calories than you eat, but you can certainly eat many more calories than you store as fat.
(Hell, CICO isn't even valid for something as "simple" as an electric vehicle. My EV's end-to-end efficiency is quite a bit different depending on whether I'm charging from 120V or 240V, the outside temperature at charging time, the outside temperature at driving time, and a handful other factors like state-of-charge. The human body is even more complicated.)
He also brought us IC-Light! I wonder why he's still contributing to open source... Surely all the big companies have made him huge offers. He's so talented
I think he is working on his Ph.D. at Stanford. I assume whatever offers he has haven't been attractive enough to abandon that. Whether he'll still be doing open work afterwards or get sucked into the bowels of some proprietary corporate behemoth remains to be seen, but I suspect he won't have trouble monetizing his skills either way.
Wan 2.1 (and Hunyuan and LTXV, in descending order of overall video quality, though each has unique strengths) works well, if slowly (except LTXV), for short videos (single-digit seconds at their usual frame rates: 16 fps for Wan, 24 for LTXV, I forget for Hunyuan) on consumer hardware. But this blows them entirely out of the water on the length it can handle, so if it does so with coherence and quality across general prompts (especially if it is competitive with Wan and Hunyuan on trainability for concepts it may not handle normally), it is potentially a radical game changer.
For completeness, I should note I'm talking about the 14B i2v and t2v Wan 2.1 models; there are others in the family, notably a set of 1.3B models that are presumably much faster, but I haven't worked with them as much.
Wan 2.1 is solid but you start to get pretty bad continuity / drift issues when genning more than 81 frames (approx 5 seconds of video) whereas FramePack lets you generate 1+ minute.
I don’t totally understand the point of this. Why would you want to use a Canvas renderer for this use case? If you want to render a massive table, apps can render just a visible subset of it with regular HTML elements, like EveryUUID [1] does.
uWrap exists to efficiently predict varying row heights for list and grid virtualization[1], a technique for UI performance optimization when rendering large, scrollable datasets.
EveryUUID's virtual grid can assume every cell is the same height, but it's much more difficult if you assume cells have wrapped text. This is further complicated if you allow grid resizing.
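The core idea behind that kind of height prediction can be sketched language-agnostically (uWrap itself is JavaScript; this Python sketch and its constants are invented for illustration): estimate how many lines each cell's text will wrap to from an average character width, instead of measuring every row in the DOM.

```python
# Sketch of uWrap-style row-height prediction for virtualization:
# estimate wrapped line counts from text length and column width,
# rather than laying out every row. All pixel values are made up.

import math

AVG_CHAR_PX = 8      # assumed average glyph width
LINE_HEIGHT_PX = 20  # assumed line height

def predict_row_height(cells, col_widths_px):
    """Estimated pixel height of a row = its tallest wrapped cell."""
    lines_per_cell = (
        max(1, math.ceil(len(text) * AVG_CHAR_PX / width))
        for text, width in zip(cells, col_widths_px)
    )
    return max(lines_per_cell) * LINE_HEIGHT_PX

row = ["short", "a much longer description that will wrap several times"]
print(predict_row_height(row, [120, 160]))  # 60, i.e. a 3-line cell
```

With per-row height estimates you can compute scroll offsets and decide which rows to materialize without touching the DOM, which is exactly what fixed-height virtualizers get for free and wrapped text takes away.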
> a static files pipeline for Django with whitenoise, how is that not included by default?
It is. They have a file server in debug mode and recommend something like nginx for serving files in production (and provide a collectstatic command to make that easy).
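For anyone unfamiliar with the flow, a hedged sketch of the usual setup (paths and server config are examples, not the only way to do it):

```python
# settings.py (fragment)
STATIC_URL = "/static/"
STATIC_ROOT = BASE_DIR / "staticfiles"  # where collectstatic copies files

# At deploy time:
#   python manage.py collectstatic
#
# Then point nginx (or similar) at STATIC_ROOT, e.g.:
#   location /static/ { alias /srv/app/staticfiles/; }
```

In debug mode Django serves these itself; in production the web server handles `/static/` and the WSGI app never sees those requests.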
People shouldn’t be using a WSGI server to serve static media. Whitenoise shouldn’t exist.
I came back to this thread after realizing whitenoise would solve my current problem...
I'm working on a small internal tool using Django. When I turned debug off, my files stopped serving. And for this small deployment, I really don't want to have to require a separate nginx server. I get it now.
I have been guilty of this. I will sometimes use MST when I should use MDT due to muscle memory. And if I say MT, it could be ambiguous when you consider Arizona (which doesn’t observe daylight saving time).
I will not write “X city local time” though, I will take the extra time to make sure my timezone is correct.
Often "X city local time" is what you want, though. If I schedule a recurring meeting at 1 PM, most people will expect it to recur at 1 PM local time. A 1 PM EDT meeting would become noon EST when the clocks switch, and no one wants a lunchtime meeting.
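The distinction is easy to demonstrate with Python's `zoneinfo` (the dates here are arbitrary examples): a time pinned to a city keeps its wall-clock hour across the DST switch, while a time pinned to a fixed offset like EDT drifts.

```python
# "1 PM New York local time" vs "1 PM EDT" across the DST switch.

from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

ny = ZoneInfo("America/New_York")

summer = datetime(2024, 7, 10, 13, 0, tzinfo=ny)   # EDT in effect
winter = datetime(2024, 12, 11, 13, 0, tzinfo=ny)  # EST in effect

print(summer.utcoffset())  # -1 day, 20:00:00  (UTC-4, EDT)
print(winter.utcoffset())  # -1 day, 19:00:00  (UTC-5, EST)

# A meeting pinned to a fixed "1 PM EDT" (UTC-4) lands at noon local
# time once EST starts:
edt = timezone(timedelta(hours=-4))
pinned = datetime(2024, 12, 11, 13, 0, tzinfo=edt)
print(pinned.astimezone(ny).hour)  # 12 — your lunchtime meeting
```

This is why calendar systems store recurring events against an IANA zone name like `America/New_York` rather than an abbreviation or a fixed offset.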