Are we sure that unrestricted free-form Markdown content is the best configuration format for this kind of thing? I know there is a YAML frontmatter component to this, but doesn't the free-form nature of the "body" part of these configuration files lead to an inevitably unverifiable process?
I would like my agents to be inherently evaluable, and free-text instructions do not lend themselves easily to systematic evaluation.
>doesn't the free-form nature of the "body" part of these configuration files lead to an inevitably unverifiable process?
The non-deterministic, statistical nature of LLMs means it's an "inevitably unverifiable process" to begin with, even if you pass it some type-checked, linted skills file or prompt format.
Besides, YAML or JSON or XML or free-form text, for the LLM it's just tokens.
At best, the more structured formats are easier to parse with external tools, but that's about it; it makes little difference to how the LLM consumes them.
The modern state of the art is inherently not verifiable. How you format the input is secondary to that fact. When you can't see the weights or know anything else about the system, any notion of verifiability is an illusion.
Sure. Verifiability is far-fetched. But say I want to produce a statistically significant evaluation result from this – essentially testing a piece of prose. How do I go about this, short of relying on a vague LLM-as-a-judge metric? What are the parameters?
You 100% need to test work done by AI. If it's code, it needs to pass extensive tests; if it's just a question answered, it needs to be the common conclusion of multiple independent agents. You can trust a single AI about as much as an HN or Reddit comment, but you can trust a committee of four like a real expert.
More generally, I think testing AI by using its web search, code execution and ensembling is the missing ingredient for increased usage. We need to define the opposite of AI work: what validates it. This is hard, but once done you can trust the system, and it becomes cheaper to change.
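To make the "committee of four" concrete, here's a minimal sketch of majority-vote ensembling. `ask_model` is a placeholder for whatever client you use, and the quorum threshold is an arbitrary choice:

```python
# Hedged sketch: ask several independent agents the same question and only
# trust an answer a majority agrees on. `ask_model` is a placeholder.
from collections import Counter

def ask_model(model: str, question: str) -> str:
    """Placeholder: call one model/agent and return its answer."""
    raise NotImplementedError

def committee_answer(question: str, models: list[str], quorum: int = 3) -> str | None:
    answers = [ask_model(m, question) for m in models]
    # Normalise lightly so trivially different phrasings still match.
    tally = Counter(a.strip().lower() for a in answers)
    answer, votes = tally.most_common(1)[0]
    # No consensus -> return None and escalate to a human or more agents.
    return answer if votes >= quorum else None
```

This only works for questions whose answers can be compared after normalisation; for open-ended prose you'd still need a verifier of some kind.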
How would you evaluate it if the agent were not a fuzzy logic machine?
The issue isn't the LLM; it's that verification is actually the hard part. In any case, it's typically called “evals”, and you can probably craft a test harness to evaluate these if you think about it hard enough.
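As a sketch of what such a harness might look like for a skill file (the `run_agent` call and the cases are hypothetical): pair each prompt with a deterministic check, run the stochastic agent many times per case, and compare pass rates between skill variants.

```python
# Hedged sketch of an eval harness: deterministic checks, repeated trials.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # deterministic verifier, not a judge LLM

def run_agent(prompt: str, skill_text: str) -> str:
    """Placeholder: invoke the agent with the skill file under test."""
    raise NotImplementedError

def pass_rate(cases: list[EvalCase], skill_text: str, trials: int = 20) -> float:
    # The agent is stochastic, so one run per case tells you little;
    # repeated trials turn "does it work?" into a measurable rate.
    passes = sum(
        case.check(run_agent(case.prompt, skill_text))
        for case in cases
        for _ in range(trials)
    )
    return passes / (len(cases) * trials)
```

Comparing `pass_rate(cases, skill_a)` against `pass_rate(cases, skill_b)` with enough trials gives you the statistically significant result asked about above, via a simple two-proportion test.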
> if the input contents were parameterized and normalized to some agreed-upon structure
Just the format would be. There's no rigid structure that gets any preferential treatment by the LLM, even if it did accept one. In the end it's just instructions, no different in any way from the prompt text.
And nothing stops you from defining something "parameterized and normalized to some agreed-upon structure" yourself and passing it directly to the LLM as the skill content, or parsing it and dumping it into the skill as regular text.
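For example, here's a toy version of that second option, with an invented schema; the structured form can be linted and validated with ordinary tools before being rendered into the free-form body the LLM actually sees:

```python
# Sketch with a made-up schema: validate the structured form, then
# render it to plain skill text. To the model it's tokens either way.
skill = {
    "name": "jj-vcs",
    "when_to_use": "Repository uses jj instead of git.",
    "rules": [
        "Never run destructive commands without confirmation.",
        "Prefer `jj split` over manual file juggling.",
    ],
}

def render_skill(s: dict) -> str:
    rules = "\n".join(f"- {r}" for r in s["rules"])
    return f"# {s['name']}\n\nUse when: {s['when_to_use']}\n\n## Rules\n{rules}\n"

print(render_skill(skill))
```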
The DSPy + GEPA idea for this mentioned above[1] seems like it could be a reasonable approach for systematic evaluation of skills (not agents as a whole though). I'm going to give this a bit of a play over the holiday break to sort out a really good jj-vcs skill.
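For anyone curious, the rough shape of that approach is below. Treat the API details as assumptions and check the DSPy docs; the signature, metric, and examples here are invented for illustration.

```python
# Hedged sketch of optimizing a skill's prompt text with DSPy + GEPA.
# API details are from memory and may differ across DSPy versions.
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any supported model id

# Stand-in program for the skill: task description in, jj commands out.
program = dspy.ChainOfThought("task -> jj_commands")

def metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    # Deterministic check; GEPA can also use textual feedback here.
    return float(gold.jj_commands.strip() in pred.jj_commands)

trainset = [
    dspy.Example(task="undo the last change",
                 jj_commands="jj undo").with_inputs("task"),
    # ...more labelled examples...
]

optimizer = dspy.GEPA(metric=metric, auto="light",
                      reflection_lm=dspy.LM("openai/gpt-4o"))
# valset should really be a held-out split; reused here only for brevity.
optimized = optimizer.compile(program, trainset=trainset, valset=trainset)
```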
I love this.
I tried doing sub-pixel simulation for a tool I created (screenstab.com if anyone’s interested – yeah I know, shameless plug, etc.). I ended up abandoning the sub-pixel aspect of my shader because of the distracting moiré patterns it caused.
Neat. Would be nice if there were some examples of what a beads rendition looks like. Maybe it's obvious for people in the game. I assume they are hexagonal?
While Figma can be a useful tool for aligning design with code, I think it's unrealistic to expect it to accommodate all the constraints of the web platform. Relying solely on a 1-to-1 mapping between Figma components and code components can be problematic and may not accurately reflect the complexities and nuances of web development.
> The designers should be working with the developers to implement their vision
I agree with the importance of this. I guess my gripe is with the fact that at the end of the day, the burden of formalizing anything that gets put on the web is on the shoulders of developers, even in the case of expressing design language, as this usually isn’t discernibly structured until the developer starts typing out code.
> I guess my gripe is with the fact that at the end of the day, the burden of formalizing anything that gets put on the web is on the shoulders of developers
Welcome to the world of system development. This has always been the case unless your customer is operating at a similar technical level and can formalize the requirements in your own language (or near enough). Your designers are able to formalize their requirements, but using a domain of discourse that your developers are unfamiliar with, and probably missing details your developers need because they, the designers, are unfamiliar with the domain of discourse your developers use. This always happens, no matter the field. Each group has their own domain language with its own notion and degree of formalization. It falls on the developers to ensure their understanding is correct. The same could be said for non-software development efforts: an architect has the same problem with their customers, and a builder has the same problem with the architect.
It is certainly frustrating, but that frustration has to be overcome. Unless your customers (the designers, in your case) are intransigent and refuse to communicate when asked for clarification, refinement of details, or feedback on a partial implementation, this is a surmountable problem.
Formalization does typically occur during development, but it's usually the developer who initiates and enforces it. It would be beneficial for designers to see formalization as an integral part of the design process itself, which could lead to more efficient collaboration between designers and developers and streamline the entire development process.
OP here. I'm proposing a tool similar to Figma, but focused on designing screen reader experiences. The goal is to encourage designers to formalize and structure their work in a way that accounts for accessibility and the experience of users who rely on screen readers.
This new tool would require designers to think about the semantics and hierarchy of the content, forcing them to consider not just the visual presentation, but also the underlying structure that screen readers rely on. By doing so, designers would have to make their design intentions more explicit and less open to interpretation by front-end developers.
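As a purely illustrative sketch (the schema is hypothetical, not from any real tool), the artifact such a tool produces might look less like a canvas and more like an accessibility tree:

```python
# Invented schema: roles, accessible names, and reading order instead of pixels.
from dataclasses import dataclass, field

@dataclass
class A11yNode:
    role: str                 # e.g. "navigation", "heading", "button"
    name: str                 # what the screen reader announces
    level: int | None = None  # heading level, if applicable
    children: list["A11yNode"] = field(default_factory=list)

checkout = A11yNode("main", "Checkout", children=[
    A11yNode("heading", "Your order", level=1),
    A11yNode("list", "3 items in your cart"),
    A11yNode("button", "Pay now"),
])
```

A designer committing to a tree like this commits to semantics a developer can implement and test directly, rather than leaving them to be inferred from a visual mockup.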
I shared this idea with my coworkers and the reception was lukewarm.
It got a lukewarm reception because it doesn’t take into account how design works and how it adds value. Designers are not engineers. Forcing them to formalize early in the process makes them less efficient and hinders their ability to explore widely.
Figma has a lot of features, like symbols, auto layout, and design system support, that can be used to introduce formality and structure.
Thanks for your comment! I'm glad it caught your attention and stood out as unique. Developing this has taken an inordinate amount of effort, so it's rewarding to see it recognised. I set out to create something different, and you made me feel like I accomplished that. Thanks for your support!
OP here. This really blew up. I actually made this back in 2021 (doesn't seem long ago), and probably tried posting it to HN back then, to no avail. I just posted it again on a whim, because I felt I was on a roll with my previous post on here about the metal skeuomorphism thingamajig (https://www.metalmorphism.com). If you liked this project, and want to see what I've been up to in my spare time lately, feel free to check out that discussion: https://news.ycombinator.com/item?id=34707160