We may be talking past each other. I am on the research side of academic computing/informatics and have faced these struggles my whole career, encountering some very reluctant IT divisions.
We have had to bite the bullet and use colo facilities to self-host internet-facing deployments that the overhead-funded IT groups would not touch with a ten-foot pole. From these experiences, I also acquired a more nuanced view of the IT division's perspective and constraints, and how they derive from overall organizational policy and economics. We also had funny situations where we tried to help other PIs benefit from our newfound independence, and immediately regretted it. They did not understand what self-hosting means. I think anybody trying to toss integrations over the fence to an ops team needs to have an extended tour of duty trying to operate their own solutions in production WITHOUT assistance before they form bold opinions about operations constraints.
When there are strong time-to-market constraints (which includes publishing papers in academia), you are forced to find solution points that are different from those you would choose if you were planning to run something for long periods at low overhead and low cumulative risk. These solution points also have to take into account the staffing and resources for that ongoing production.
Things like bleeding-edge libraries and assumption-breaking deployment constraints are the headache for ongoing operations and maintenance. It's not enough to have an existence proof that some complex integration can be built and passes its tests. You need a plan for how all the components will be maintained, patched, and upgraded. You need contingency planning when some of those bleeding edge components are going to become deprecated. You need to consider what staff capabilities are assigned to do that maintenance work or what will happen when the institutional knowledge used to form the original integration is not on-call to reintegrate it in the face of unexpected events.
> “I think anybody trying to toss integrations over the fence to an ops team needs to have an extended tour of duty trying to operate their own solutions in production WITHOUT assistance before they form bold opinions about operations constraints.”
I think this is one of the worst possible attitudes to have. It’s a petty way to feel, desiring some type of “I’ve seen some shit” tough guy credential more than supporting the stuff needed to actually solve business problems.
If you hire people whose value-add to your company is inventing completely new things, including the deployment, ops, scaling, etc. that go along with them, then it is the job of infrastructure on the other side of that fence to happily and eagerly accept whatever is tossed over the fence, to understand why developer teams made the choices they made, and to take an attitude of supporting as much as possible.
> “You need a plan for how all the components will be maintained, patched, and upgraded. You need contingency planning when some of those bleeding edge components are going to become deprecated. You need to consider what staff capabilities are assigned to do that maintenance work or what will happen when the institutional knowledge used to form the original integration is not on-call to reintegrate it in the face of unexpected events.”
Yes, of course. But all this is already what dev teams are doing. Ops / infra is not taking a hare-brained plan and adding these robustness aspects into it. Not at all. Instead they take plans from application teams and try to use policy to minimize their own maintenance burden, even when that optimization is antithetical to what the company requires at a more fundamental level.
A lot of companies languish and die because of sociological dysfunction in the policy interface between dev teams and infrastructure. The more political control infrastructure has over that interface, the closer that company is to death.
It’s like a body that is disallowed from generating white blood cells in response to a new immune challenge. Even if the bleeding edge integrations are really hard, the alternative world where you slow them down with policy is death and attrition.