Hacker News | plagiarist's comments

IMO it should need a third party running the LLM anyway. Otherwise the evaluated company could notice they're receiving the same requests daily and discover benchmarking that way.
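A minimal sketch of how a provider could spot that kind of recurring traffic, assuming it keeps a log of (day, prompt) pairs. The function names and the `min_days` threshold are made up for illustration, not taken from any real system:

```python
import hashlib
from collections import defaultdict


def fingerprint(prompt):
    """Hash a prompt after normalizing case and whitespace."""
    # Normalizing first means trivial edits do not change the fingerprint.
    normalized = " ".join(prompt.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def recurring_prompts(log, min_days=3):
    """Return fingerprints of prompts seen on at least `min_days` distinct days.

    `log` is an iterable of (day, prompt) pairs; `day` can be any hashable label.
    """
    days_seen = defaultdict(set)
    for day, prompt in log:
        days_seen[fingerprint(prompt)].add(day)
    return {fp for fp, days in days_seen.items() if len(days) >= min_days}
```

Anything this cheap to detect on the provider's side is a reason to route the benchmark through a third party, or at least to vary the prompts.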

With the insane valuations and actual revenue at stake, benchmarkers should assume they're assessing in an adversarial environment. Whether from intentional gaming, training to the test, or simply from prioritizing things likely to make results look better, targeting benchmarks will almost certainly happen.

We already know large graphics card manufacturers tuned their drivers to recognize specific gaming benchmarks. Then, when that was busted, they moved on to detecting benchmark-like behavior. And the money at stake in consumer gaming was tiny compared to current AI valuations. The cat-and-mouse cycle of measure vs counter-measure won't stop and should be a standard part of developing and administering benchmark services.

Beyond hardening against adversarial gaming, benchmarkers bear a longer term burden too. Per Goodhart's Law, it's inevitable good benchmarks will become targets. The challenge is that the industry will increasingly target performing well on leading benchmarks, both because it drives revenue and because it's far clearer than trying to glean what helps average users most from imprecise surveys and fuzzy metrics. To the extent benchmarks become a proxy for reality, they'll bear the burden of continuously re-calibrating their workloads to accurately reflect reality as users' needs evolve.


But that's removing a component that's critical for the test. We as users/benchmark consumers care that the service as provided by Anthropic/OpenAI/Google is consistent over time given the same model/prompt/context.

Might as well have the free tokens, then, especially if it is an open benchmark they are already aware of. If they want to game it they cannot be stopped from doing so when it's on their infra.

I was looking into what local inference software to use and also found this behavior with models to be onerous.

What I want is to have a directory with models and bind mount that readonly into inference containers. But Ollama would force me to either prime the pump by importing with Modelfiles (where do I even get these?) every time I start the container, or store their specific version of files?

I had planned on trying out vLLM and llama.cpp as my next step in this, so I'm glad to hear you are able to share a directory between them.


> What I want is to have a directory with models and bind mount that readonly into inference containers.

Yeah, that's basically what I'm doing, plus over the network (via Samba). My weights all live on a separate host, which has two Samba shares, one with write access and one read-only. The write one is mounted on my host, and the container where I run the agent mounts the read-only one (and has the source code it works on copied over to the container on boot).

The directory that LM Studio ends up creating and maintaining for the weights works with most of the tooling I come across, except of course Ollama.


I wish a fitting comeuppance upon all the grifters taking up a seat which could've been filled by someone actually interested in governance.

these are exactly the kinds of people interested in governance. That's the problem

whew, good thing Nigel Farage is a straight shooter interested strictly in good governance.

Risky life close to the edge you're sailing there...

"If the wind changes, you'll get stuck that way."



For your comment or mine? I'm happy to share.

All the same, you can have a long successful career, but you say nice things about Nigel the one time and forever after they'll call you a goat f*r and throw milkshakes on you :/


My comment was sarcastic

That seemed clear when I first replied ...

There are a lot of Android devices that look tempting until one discovers how out-of-date the firmware is.

With no option to install your own, of course. Boot loaders should be exclusively for running the manufacturer's lone security update from 5 years ago.


With Linux I think you do have the option of encrypting with your own cert using the PKCS#11 module on the Yubikey.

That's interesting, thank you, I will definitely look into this.

He really will just close a ticket because he disagrees with how Linux works. I read about systemd sysusers and thought they would be neat for running containerized services. But Poettering doesn't like the /etc/subuid files and refuses to work with them.

Well, he specifically doesn't like the static allocation of subuids. There is a reason `systemd-nsresourced` exists.

How do I have nsresourced work in a regular systemd service or quadlet so that I can have an ephemeral user run a container? I am trying to find information and just seeing it as part of nspawn, which seems to require a container specifically built around a root filesystem.

I am not going to struggle with systemd if I have to build containers specifically for it. If I have to rearrange everything I am doing I would just learn to do it on a minimal Kubernetes install instead.


nspawn containers aren't really any different to regular system images/archives other than they don't need a kernel.

I don't think the setting is exposed to regular service units (it might be in the future, I don't know) and I don't think podman has any integration with it.

What kinda service do you have where you need a full range of UIDs?


I don't need a full range. I would just like to run podman under a non-root user using regular system services. Especially where a persistent volume or bind mount is involved.

Let's say Home Assistant. It would be nice to have some system user "homeassistant" with no home directory that owns the process and owns its /var/wherever/config.conf. It would be nice to have the isolation on host in addition to the isolation via container. But I don't want to be rebuilding any containers to get that, unless I am misunderstanding something on nsresourced.

I'd be really pleased with that setup. MQTT could be its own system user. And HA could depend on MQTT so I have nice startup behavior. Etc.

IDK how to have system users like this run a container without the subuid range. Even when I create the users with ranges in the file, there seems to be problems with informing systemd (as a non-root user) that the running process is different from the one it started.


podman quadlet doesn't seem to support running at a "system level" as a non-root user, at least according to their docs[0]. I assume they make some assumptions which wouldn't hold up if the user actually changed when running at a system level, dunno.

> But I don't want to be rebuilding any containers to get that, unless I am misunderstanding something on nsresourced.

Setting up the user namespace would be part of the container manager and not the containers themselves, so they shouldn't need any rebuilding or special handling (possibly the files might need to be shifted into the "foreign ID" range[1, 2], but I might be wrong about this and it may not be necessary for this use case), but the container manager needs to specifically make use of nsresourced.

I really think currently the best option is to go with either systemd as your "container manager" (e.g. just regular system files with sandboxing or nspawn images or maybe systemd-portabled[3]) or podman as your container manager. As much as I too would love to mix them, I don't think it's the best idea (at least in the current state), and I would just go with whichever is more suited for the task (in your case it sounds like podman would be the most suited option).

> there seems to be problems with informing systemd (as a non-root user) that the running process is different from the one it started.

Yea, I don't think systemd likes double forking. The best option would be to keep the process that spawned your actual process alive until the child exits and just bubble up the exit code. There is the `PIDFile=` option with `Type=forking`, but I haven't used it, nor looked much into it.
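A rough sketch of the "stay alive and bubble up the exit code" idea in Python. `run_and_bubble_up` is a made-up name, and mapping a signal death to 128 + signal number is just the usual shell convention, not something systemd requires:

```python
import subprocess
import sys


def run_and_bubble_up(argv):
    """Run `argv` as a child, block until it exits, and return its exit code."""
    # subprocess.run waits for the child, so this process stays alive as the
    # main PID for as long as the real workload is running.
    completed = subprocess.run(argv)
    code = completed.returncode
    # On POSIX a negative return code -N means the child was killed by
    # signal N; report that as 128 + N like a shell would.
    return 128 - code if code < 0 else code
```

A wrapper script would end with something like `sys.exit(run_and_bubble_up(sys.argv[1:]))` so the service manager sees the child's status as the wrapper's own.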

[0]: https://docs.podman.io/en/v5.7.1/markdown/podman-systemd.uni...

[1]: https://www.freedesktop.org/software/systemd/man/latest/syst...

[2]: https://systemd.io/UIDS-GIDS/#special-systemd-uid-ranges

[3]: https://systemd.io/PORTABLE_SERVICES/


And you cannot remove it on every motherboard because some of the firmware blobs are signed. You cannot remove their keys and leave only your own.

I see the problem here. So, actually, the ones in masks who are randomly assaulting (sometimes murdering) nonviolent bystanders are ICE, not the protestors. Hope that helps.

I am talking about protestors who obstruct law enforcement and their operations. Protestors who threaten regular people and law enforcement. Protestors who damage other people's property. Protestors who violate noise ordinances. Protestors who are trespassing.

I am not referring to actual bystanders. Implying that I am is purposefully being ignorant of what I am talking about.


The FBI should investigate the first item in the Bill of Rights.

The problem is not systemd vs SysV et al, the problem is systemd spreading like a cancer throughout the entire operating system.

Also trying to use systemd with podman is frustrating as hell. You just cannot run a system service using podman as a non-root user and have it work correctly.


Quadlet actually solves this. It's the newer way to define containers for systemd and handles the rootless user case properly. I migrated my services to it recently and it's much more robust than the old generate scripts.

Could you give an example system-level quadlet that accepts connections on a low port, like 80, but runs the actual container as a non-root user (and plays nice with systemd, no force kill after timeout to stop, no reporting as failed for a successful stop)?

My understanding is quadlet does not solve this, and my options are calling "systemctl --user" or "--userns auto". I would love to be wrong here.


As an alternative solution to the sibling comment, I do run everything rootless in systemd --user, so my services don't have access to privileged ports, and use firewall rules to redirect the external interface low ports to the local high ports (that sounds annoying, but in practice I only redirect a single port - 443 - to traefik and then use it to route to the right container service depending on domain).

I solved the port 80 issue by adding AmbientCapabilities=CAP_NET_BIND_SERVICE to the Service section of the unit file. That lets you bind privileged ports while still defining a User= line to run non-root. The lifecycle management seems solid in my experience, no force kills required.
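If you want to confirm the capability actually reached the service process, here is a small Linux-only sketch that reads the effective capability mask from /proc. The helper names are made up; bit 10 is CAP_NET_BIND_SERVICE in linux/capability.h:

```python
CAP_NET_BIND_SERVICE = 10  # bit index from linux/capability.h


def has_capability(status_text, cap=CAP_NET_BIND_SERVICE):
    """Return True if the CapEff mask in /proc/<pid>/status text has bit `cap` set."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The mask is printed as a hexadecimal number.
            mask = int(line.split()[1], 16)
            return bool(mask & (1 << cap))
    raise ValueError("no CapEff line found")


def current_process_can_bind_low_ports():
    """Check the calling process itself (Linux only)."""
    with open("/proc/self/status") as f:
        return has_capability(f.read())
```

Calling `current_process_can_bind_low_ports()` from inside the service should return True if the unit setting took effect for that process.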

Well, thank you, I will give it a try

Quadlets are great, but running podman via systemd as a non-root user worked perfectly well before quadlets, and I have no idea what your parent is talking about (I'm currently in the process of converting my home services from rootless podman over systemd to quadlet).

Fair, it worked, but podman generate systemd is deprecated now. I found the generated unit files pretty brittle to maintain compared to just having a declarative config that handles the lifecycle.

I agree 100%. I was stuck without quadlet in previous Debian stable so I had to work with podman generate systemd, but quadlets are undoubtedly better, and I was looking forward to upgrading Debian just for that, and now that I did, I'm really happy to migrate. Especially custom container image management is so much smoother.

> You just cannot run a system service using podman as a non-root user and have it work correctly.

Err... You just need to run `podman-compose systemd`?

I have my entire self-hosted stack running with systemd-controlled Podman, in regular user accounts.

