I'm not a hardware guy, so I don't know if this is feasible or has ever been implemented, but:
I could imagine "polite" interrupts—where instead of the processor immediately jumping into the ISR's code, it simply places the address of the ISR that "wants to" run into an in-memory ring-buffer via a system register, and then the OS can handle things from there (by e.g. dedicating a core to interrupt-handling by reading the ring-buffer, or just having all cores poll the ring-buffer and atomically update its pointer, etc.)
The major difference with this approach is that pushing the interrupt onto the ring-buffer wouldn't steal cycles from any of the cores; it would be handled by its own dedicated DMA-like logic that either has its own L1 cache lines, or is associated with a particular core's L1 cache (making that core into a conventional interrupt-handling core). Therefore, you could run hard-real-time code on any cores you like, without needing to disable/mask interrupts; delivering interrupts would become the job of the OS, which could do so any way it liked (e.g. as a POSIX signal, a Mach message, a UDP datagram over an OS-provided domain socket, etc.) Most such mechanisms would come down to "shared memory that the process's runtime is expected to read from eventually."
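On the receiving end, the "shared memory the runtime reads eventually" part could be as simple as a bitmask in a page the kernel shares with the process. A minimal sketch, with hypothetical event names and a plain static variable standing in for the shared page:

    /* Sketch of a runtime that picks up kernel-posted events "eventually",
     * e.g. once per scheduler loop, instead of being interrupted. The event
     * bits and the shared word are hypothetical. */
    #include <stdatomic.h>
    #include <stdio.h>

    enum {
        EV_TIMER  = 1u << 0,   /* timer expiry */
        EV_IO     = 1u << 1,   /* I/O completion */
        EV_SIGNAL = 1u << 2,   /* something signal-like from the kernel */
    };

    /* In the real scheme this word would live in a page the kernel maps into
     * the process; a static variable stands in for it here. */
    static _Atomic unsigned pending_events;

    static void runtime_poll_events(void)
    {
        unsigned ev = atomic_exchange_explicit(&pending_events, 0,
                                               memory_order_acq_rel);
        if (ev & EV_TIMER)  puts("timer fired");
        if (ev & EV_IO)     puts("I/O completed");
        if (ev & EV_SIGNAL) puts("signal delivered");
    }

    int main(void)
    {
        /* the kernel (here, the program itself) posts an event... */
        atomic_fetch_or_explicit(&pending_events, EV_IO, memory_order_release);
        /* ...and the runtime notices it the next time it looks */
        runtime_poll_events();
        return 0;
    }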
There would still be one "impolite" hardware interrupt, of course: a pre-emption interrupt, so that the OS can de-schedule a process, or cause a process to jump to something like a POSIX signal handler. However, these "interrupts" would be entirely internal to the CPU—it'd always be one core [running in kernel code] interrupting another [running in userland code]. So this mechanism could be completely divorced from the PIC, which would only deliver "polite" interrupts. (And you could do away with even this single "impolite" interrupt, if the OS's userland processes aren't running on the metal, but rather running off an abstract machine with a reduction-based scheduler, like that of Erlang.)
Schemes like that are in fact implemented by some devices on top of the existing PCIe interrupt mechanism. For example, GPUs have many different interrupt sources, so a common technique is to have an interrupt ring buffer that the GPU writes to, which contains all the information about the interrupt source and additional payload data.
An actual PCIe interrupt is sent to the CPU only when that interrupt ring buffer goes from empty to non-empty, and the driver's interrupt handler simply reads the whole ring buffer contents.
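Roughly the driver-side shape of that pattern (an illustrative sketch only, not the code of any particular GPU driver; the record layout and names are assumptions):

    /* The device appends fixed-size records to a ring in host memory and
     * raises one interrupt when the ring goes from empty to non-empty; the
     * handler then drains everything that has accumulated. */
    #include <stdint.h>
    #include <stdio.h>

    #define RING_ENTRIES 64                     /* power of two */

    struct gpu_irq_record {
        uint32_t source_id;                     /* which block on the device raised it */
        uint32_t payload;                       /* source-specific data */
    };

    struct gpu_irq_ring {
        struct gpu_irq_record entries[RING_ENTRIES];
        uint32_t wptr;                          /* advanced by the device (simulated below) */
        uint32_t rptr;                          /* advanced by the driver */
    };

    static void dispatch(const struct gpu_irq_record *rec)
    {
        /* fan out to per-source handlers: fences, page faults, display, ... */
        printf("irq from source %u, payload %u\n",
               (unsigned)rec->source_id, (unsigned)rec->payload);
    }

    /* Runs once per PCIe/MSI interrupt and processes everything pending; in a
     * real driver wptr would be a DMA-updated location that gets re-read, and
     * the final rptr would be written back to a device register so the device
     * knows when a fresh interrupt is needed. */
    static void gpu_irq_handler(struct gpu_irq_ring *r)
    {
        while (r->rptr != r->wptr) {
            dispatch(&r->entries[r->rptr % RING_ENTRIES]);
            r->rptr++;
        }
    }

    int main(void)
    {
        struct gpu_irq_ring ring = {0};

        /* simulate the device posting three records, then one interrupt */
        for (uint32_t i = 0; i < 3; i++) {
            ring.entries[ring.wptr % RING_ENTRIES] =
                (struct gpu_irq_record){ .source_id = i, .payload = 100 + i };
            ring.wptr++;
        }
        gpu_irq_handler(&ring);                 /* a single interrupt drains all three */
        return 0;
    }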
It seems like your scheme would require dedicating an entire core to kernel interrupt handling, all the time (because if you let every core run userspace, and then a network packet arrived, it wouldn't be handled until some core went back into the kernel for another reason).
That seems strictly worse than the current design.
I could imagine "polite" interrupts—where instead of the processor immediately jumping into the ISR's code, it simply places the address of the ISR that "wants to" run into an in-memory ring-buffer via a system register, and then the OS can handle things from there (by e.g. dedicating a core to interrupt-handling by reading the ring-buffer, or just having all cores poll the ring-buffer and atomically update its pointer, etc.)
The major difference with this approach is that pushing the interrupt onto the ring-buffer wouldn't steal cycles from any of the cores; it would be handled by its own dedicated DMA-like logic that either has its own L1 cache lines, or is associated to a particular core's L1 cache (making that core into a conventional interrupt-handling core.) Therefore, you could run hard-real-time code on any cores you like, without needing to disable/mask interrupts; delivering interrupts would become the job of the OS, which could do so any way it liked (e.g. as a POSIX signal, a Mach message, a UDP datagram over an OS-provided domain socket, etc.) Most such mechanisms would come down to "shared memory that the process's runtime is expected to read from eventually."
There would still be one "impolite" hardware interrupt, of course: a pre-emption interrupt, so that the OS can de-schedule a process, or cause a process to jump to something like a POSIX signal handler. However, these "interrupts" would be entirely internal to the CPU—it'd always be one core [running in kernel code] interrupting another [running in userland code.] So this mechanism could be completely divorced from the PIC, which would only deliver "polite" interrupts. (And even this single "impolite" interrupt you could get away from, if the OS's userland processes aren't running on the metal, but rather running off an abstract machine with a reduction-based scheduler, like that of Erlang.)