Using https://github.com/kvcache-ai/ktransformers/, an Intel/AMD laptop with 128GB RAM and 16GB VRAM can run the IQ4_XS quant and decode at about 4-7 tokens/s, depending on RAM speed and context size.
Using llama.cpp, the decoding speed is about half of that.
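If you want a concrete starting point for the llama.cpp route, here's a rough sketch using the llama-cpp-python bindings (my choice for illustration; the numbers above are for plain llama.cpp). The GGUF filename and layer count are placeholders you'd tune to your own hardware:

```python
# Sketch only: assumes the llama-cpp-python bindings and a local GGUF file.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-V2.5-IQ4_XS.gguf",  # placeholder path to your local quant
    n_gpu_layers=10,   # offload as many layers as fit in 16GB VRAM, keep the rest in RAM
    n_ctx=4096,        # decoding slows down as the context grows
)

out = llm("Write a haiku about KV caches.", max_tokens=64)
print(out["choices"][0]["text"])
```

The idea is simply to offload whatever fits into VRAM and leave the rest in system RAM, which is why RAM speed ends up dominating decode speed on these setups.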
A Mac with 128GB RAM should be able to run the Q3 quant, with faster decoding but slower prefilling.
Assuming you already know what context means for LLMs: prefilling is the stage where the model runs all the tokens of the current conversation through in a single pass to build its KV cache, before it starts decoding new tokens one at a time.
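To make that concrete, here's a toy sketch of the two phases; the model/tokenizer interface is made up for illustration, not any particular library's API:

```python
# Toy illustration of prefill vs. decode; `model` and its methods are hypothetical.
def generate(model, tokenizer, conversation, max_new_tokens=128):
    # Prefill: process all prompt tokens in one forward pass,
    # filling the KV cache (compute-bound, so the GPU helps here).
    prompt_ids = tokenizer.encode(conversation)
    kv_cache, next_id = model.prefill(prompt_ids)

    # Decode: one token per step, reusing the KV cache
    # (memory-bandwidth-bound, which is why RAM speed matters).
    output_ids = [next_id]
    for _ in range(max_new_tokens - 1):
        kv_cache, next_id = model.decode_step(next_id, kv_cache)
        output_ids.append(next_id)
    return tokenizer.decode(output_ids)
```

That split also lines up with the Mac tradeoff above: plenty of memory bandwidth helps decode, while less raw compute hurts prefill.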
DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct.
From: https://huggingface.co/deepseek-ai/DeepSeek-V2.5