bigodanktime's comments

bigodanktime · on April 5, 2023

What do you mean by work. The underlying page cache will keep much of the data actual cached if it's recent. Even databases like PostGreSQL use this to their advantage (https://github.com/postgres/postgres/blob/master/src/backend...).

astrange · on April 5, 2023

Copying the file backed pages to heap memory and possibly having to swap them out.

bigodanktime · on April 5, 2023

I may have parsed your statement incorrectly, but I'm assuming you are talking about the copy of data when using either mmap or File IO (memcpy versus write) Whether you do File IO versus mmap, there's going to be copy. With files, the copy occurs within kernel space with data being copied into the pages in the buffer cache, with mmap the copy occurs in userspace with data being copied into the address space. Swapping can occur in the buffer cache or mmap, this is why so many databases implement their own buffer cache to ensure specific data isn't flushed, leaving them in an inconsistent state.

An advantage of copying in userspace is the ability to use more performant instructions to perform the memcopy, which the kernel does not typically have access to (https://www.mongodb.com/blog/post/getting-storage-engines-re...)

astrange · on April 5, 2023

> With files, the copy occurs within kernel space with data being copied into the pages in the buffer cache, with mmap the copy occurs in userspace with data being copied into the address space.

There is no copy with mmap, the page is either unwritable or CoW. There's always a copy with read(). (But read() can still be faster and more memory efficient nevertheless.)

> An advantage of copying in userspace is the ability to use more performant instructions to perform the memcopy, which the kernel does not typically have access to (https://www.mongodb.com/blog/post/getting-storage-engines-re...)

Darwin kernel does though.

I believe Linux uses the builtin old memcpy instructions on Intel, just to force CPU vendors to keep them usable.

bigodanktime · on April 5, 2023

> There is no copy with mmap

You are right, if you are directly modifying the mmaped region. I always internally model my data as staging my changes to be synchronized to the mmaped region, so thats my mistake there.

> the page is either unwritable or CoW.

This is not universally true, or maybe I'm confused on this statement. MAP_SHARED exists, but maybe you are referencing a specific kernels' implementation on how they achieve coherence between file backed shared memory regions in two processes? Im not sure.

> Darwin kernel does though.

Sure we can always point to a kernel that has has implemented some feature or another, which is why I said typically you don't see it.

saagarjha · on April 5, 2023

> Darwin kernel does though.

It does not. Compare the implementation of _bcopyout against _platform_memmove, you'll see the difference :)

astrange · on April 5, 2023

Huh, maybe I was thinking of "you can use floating point in the kernel".

That doesn't work in every kernel because they don't want to bother saving/restoring the extra registers.

saagarjha · on April 6, 2023

To be entirely honest I'm not sure why the kernel doesn't use better routines here, I think on ARM at least it saves the entire NEON state on context switch…

bigodanktime · on April 5, 2023

I can understand these folks struggling with what mmap is actually doing. But this isn't a new discussion about the qualities of MMAP versus file based IO etc. Although, many of the comments stated are quite wrong.

Related Work on this problem: 1. https://www.mongodb.com/blog/post/getting-storage-engines-re... - talks about developments on MongoDB's backend to use mmap. 2. https://www.pdl.cmu.edu/PDL-FTP/Database/p13-crotty.pdf - Talks about some of the cons of mmap, some I think are not as prevalent due to the existence of low latency, high throughput storage devices. 3. https://www.cs.cit.tum.de/fileadmin/w00cfj/dis/_my_direct_up... - less relevant but related.

bigodanktime · on Feb 10, 2022

I'm going to self plug here on work I'm apart of that does persistent processes, although the motivation is different. I do think this paper does a good job of informing the reader on why these sort of features would be really cool that this project does not necessarily dive into. But it requires widening the API and allowing for developers to choose how they persist.

https://www.rcs.uwaterloo.ca/pubs/sosp21-aurora.pdf

bigodanktime · on Jan 14, 2022

Many conferences are starting to adopt a badge system and will evaluate your artifact. And this is becoming more and more popular, and I know many researchers that will keep these badges in mind when reading the evaluation in the paper. For example here is the artifact evaluation that was done at SOSP 2021 https://sysartifacts.github.io/sosp2021/results.html.

hannob · on Jan 14, 2022

These badges are kinda controversial. The message they send out is "we give you an extra goodie if you do proper science, because we don't expect that to be the default".

Thus badges can become a kinda excuse for not fixing stuff by default.

bigodanktime · on Jan 9, 2022

Professors are rewarded for publications and almost nothing else (at least from what I've seen at the school I'm apart of), so because of it all effort goes towards publications, and this passes down to students and the folks they hope to hire. Its also unfortunate any work that is not done towards a publication is essentially invisible to everyone within that system. Even things like teaching are considered second class to publications (very unfortunate consequence).

That being said this is from my experience and am unsure how it is at other schools.

kurthr · on Jan 9, 2022

Well, early in their carriers they are rewarded for grants and supporting many grad students. It's problematic to not award tenure to a professor who is supporting 30 students. If they leave with their grants and only take a few (of the best) students, leaving the rest of the students in the lurch and very (rightfully?!) angry at the department dean/provost. Usually, some other means of supporting them is found since it's a little embarrassing otherwise.

After tenure, it seems like profs drop down to a manageable 4-7 students in the disciplines I studied in. You can do that on 1-2 grants a year.

As others have previously mentioned it's a bit weird, because actually teaching is neither taught nor really supported by most research universities. It's often required, but not all that advantageous to do well.

bigodanktime · on Jan 9, 2022

Its unfortunate with teaching as I see it as an opportunity to extend the good values of research and problem solving to students (at least in the later years of a degree).

I guess it depends on how you view an academic institution, is it to produce good thinkers or produce good work, although similar seems like you approach it in different ways when approaching it from one way or the other.

I'm not even upset with professors, I think Assistant Profs do the most work I've ever seen anyone ever do, its absolutely insane the crazy workload they take on to get tenure.

robwwilliams · on Jan 9, 2022

I think this demand for focus is more pervasive at top tier institutions. I intentionally chose a second tier research institution with great colleagues in my area, then neuroscience, so that I could indulge my whims and T-type style of science. I “wasted” two years playing with databases for genetics and then published the first paper in biomedical science with a URL in 1994 (Portable Dictionary of the Mouse Genome). That service is still running as wwww.genenetwork.org and has been a terrific catalyst for much high impact work.

The advantage now of T type approaches is that, as the source points out so clearly, it gives you flexibility to grow, shift fields, and collaborate efficiently.

throwawaygh · on Jan 9, 2022

But that's a good example of a high-risk/high-reward investment in research infrastructure.

The article mentions, for example, maintaining the group's build system and making project logos. Different in kind, no?

PhD students should not be recruited with the expectation that they will do devops, software engineering, and graphic design work. Or, if that's the labor they're doing, universities should maybe start paying them a fair wage for their labor.

Building something like genenetwork.org in 1994 was exciting R&D for the time. Setting up a CI/CD pipeline or making a project logo in 2022 isn't.

Pseudomanifold · on Jan 9, 2022

> PhD students should not be recruited with the expectation that they will do devops, software engineering, and graphic design work. Or, if that's the labor they're doing, universities should maybe start paying them a fair wage for their labor.

Fully agree with you there---and for me, that was never the point of the article.

> Building something like genenetwork.org in 1994 was exciting R&D for the time. Setting up a CI/CD pipeline or making a project logo in 2022 isn't.

Yes, at the same time, maybe setting up CI/CD is needed nonetheless in order to better manage a shared codebase. My point is that PIs should be aware of these things and also recognise this sort of 'foundational' work whenever it happens. Of course, if a lab needs complex DevOps, then of course they should hire someone _dedicated_ for the role.

My observations pertain to the kind of work that happens behind the scenes and is often unrewarded and unrecognised. As I said: academia is doing a disservice to the people that are willing to (to re-use the example you supplied) set up a CI pipeline for their project or teach others how to write code cleanly, etc. Of course, the best approach would be to have dedicated roles for dedicated tasks in research (and beyond).

bachmeier · on Jan 9, 2022

> Professors are rewarded for publications and almost nothing else

This is true for professors at research universities in grant fields. Their job is to bring in grants. Teaching is of minor importance in those fields, to the point that they don't need you to teach at all if you're that bad at it. Other fields don't have much in the way of grants, so teaching is given considerably more weight at all but the wealthiest institutions.

periheli0n · on Jan 9, 2022

This depends heavily on the university employing the professor. At least in the UK there are universities where one can progress to the highest academic rank by excellence in teaching. Given that such universities draw the majority of their income from students, this makes economical sense.

Whether the economisation of higher education is a good thing is a different question, of course. But in the UK there seems to be a strong drive towards separating teching and research. And to be honest, it becomes increasingly difficult to be good at teaching and research at the same time. If done properly, either is a full time job.

g_p · on Jan 9, 2022

The UK's "multiple tracks" leading to professor are indeed attractive, however at some institutions it's not really a "true" professorship - some might be fixed-term contracts, and (at least in another track) it isn't recognised by outsider funders as a full professorship, or indeed as a true academic post.

Teaching is generally under-valued across the board - graduate students are expected to teach as part of their PhD work very often, sometimes paid extra for this. I've seen some unscrupulous departments making no sincere attempts to help students finish their PhD on time, then turn around and offer up scraps like hourly teaching "contracts" to teach their classes, after their PhD funding has run out.

Separating teaching and research, if it leads to both being equal peers, can make sense. I fear, however, that teaching will continue to be the un-favoured step-sibling in institutions which have research income. Research income and grants let you build a fiefdom of underlings, but teaching generally doesn't - one professor can teach a class of 200, and run a couple of tutorial sessions with a team of 8 post-grad students.

The scaling of teaching is very attractive to the university, but it doesn't get the professor people under them, to make them more important.

I've seen people be hired into "active teaching" academic roles who are manifestly incapable of teaching. Their "research track record" seemingly made up for their car-crash presentation to the department during the interview process. After the pile of (entirely foreseeable, by anyone at the presentation) student complaints flooded in, they ended up not needing to teach. There's definitely an implicit assumption that "teaching is easy, and anyone can do it" lingering around in a lot of departments.

I don't think we should entirely split research and teaching - it's very much possible to be T-shaped and good at both. Indeed, being able to engage a room full of tired students first thing on a Monday morning is a skill many academic presenters would benefit from having, purely in learning how to better communicate their research. Unless departments take a much wider, more holistic view of what is expected though, this won't be valued or change, as far as I can see at least.

periheli0n · on Jan 9, 2022

> I've seen some unscrupulous departments making no sincere attempts to help students finish their PhD on time, then turn around and offer up scraps like hourly teaching "contracts" to teach their classes, after their PhD funding has run out.

That is getting increasingly difficult to pull off, since students (and attorneys) have figured out thatched can sue the university of the don‘t fulfil their part of the PhD agreement. Universities are totally paranoid about being sued, and all sorts of administrative arse-covering is the result.

bigodanktime · on Dec 22, 2021

Counts what you are afraid against. There's always some side channel attack that could possibly used to gain information, even on VM's this is true. Off the top of my head there could be some timing attack to gain information on which libraries others are using by reading in libraries and seeing if they are warm in the buffer cache, counts if you care about sharing the same kernel. I generally find them secure enough considering how fast they can be brought up and down.

antihero · on Dec 22, 2021

Can a process in them exec on the rest of your system?

marcosdumay · on Dec 22, 2021

The fundamental thing about those features (and the equivalent on every system except Windows) is that you can never get more capabilities, only less. Once you are in a jail, there is no API for getting out of it.

You can't even see a binary from the rest of your system, and exec won't get you out.

bigodanktime · on Dec 22, 2021

A great wrapper UI I have used for FreeBSD Jails is iocage (https://iocage.readthedocs.io/en/latest/). Its a great project.

Timothycquinn · on Dec 22, 2021

I was about to say the same. iocage is now the default jails wrapper on FreeNAS, which means that there is good documentation and support. The previous jails wrapper used by FreeNAS was not very well documented, but was written by some smart folks.

One thing I like about iocage is how easy it is to grant the jail access to the host ZFS datasets.

On a Jails note, I have had issues creating a jail that can do network inspection. I believe this is an issue with network restrictions of the jails subsystem itself. Eg, I could never run nmap or get mac addresses of remote hosts from within a jail.

bigodanktime · on Nov 24, 2021

I think you are assuming all these things are independent of one another. I think deep engagement with customers directs your problem statement. Build the smallest possible thing to test the idea you are proposing. If you are trying to publish than its a requirement to be reading papers in general, you need to be able to argue why this idea you are testing is orthogonal to other ideas being proposed. As time goes on you just scan important conferences in your field and get better at knowing which papers you need to skim versus deeply understand.

Edit: Also intelligence is an over-rated metric, persistence is far more important when it comes to these hard problems.

rmbyrro · on Nov 24, 2021

I agree, persistence, discipline and ordering of daily activities.

It's very easy to get lost in day to day mess.

bigodanktime · on July 11, 2021

Lots of interesting work is being done in this area (Currently doing research around serverless at the moment). Cold start up times still remain a pretty large issue (125ms start up for VM is still quite large) but some interesting papers trying to attack this, through strategies like snapshotting!

https://arxiv.org/pdf/2101.09355.pdf

Also predicting function calls to properly schedule and reduce cold start latency

https://www.usenix.org/system/files/atc20-shahrad.pdf

Dunedan · on July 11, 2021

These 125ms are only the startup time of the MVM and don't include additional latency introduced by optimizing the code package and the involvement of the placement service.

You can also avoid the cold start penalties entirely, if you're willing to pay extra for provisioned concurrency [1].

[1]: https://docs.aws.amazon.com/lambda/latest/dg/configuration-c...

bigodanktime · on July 11, 2021

This seems to be a solution that comes at the cost of the consumer which is fine if they want to pay for it, and seems to be an option provided for more latency sensitive applications.

Obviously, one could eliminate the cold start issue in general by just constantly paying for a running EC2 instance.

But cold start is still an issue for the provider as cold start is a cost to them, even reducing cold start of internal runtimes would be a massive benefit (For example pre-warmed JITs). Better cold start times means better bin packing for their services, and overall less cost to everyone.

bigodanktime · on July 11, 2021

Pretty sure its just gRPC calls, and firecracker passes events along to its firecracker-containerd services.

My guess is java is slow as it works really well when the JIT can optimize your code. Longer running functions will very likely outperform python.