For the physical hardware I use the esp32-s3-box[1]. The esphome[2] suite has firmware you can flash so that the device works with Home Assistant automatically. I have an esphome profile[3] I use, but I'm considering switching to this[4] profile instead.
For the actual AI, I basically set up three Docker containers: one for speech-to-text[5], one for text-to-speech[6], and then ollama[7] to run the model itself. After that it's just a matter of pointing Home Assistant at the various services, as it has built-in support for all of these things.
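If you want to sanity-check that the three containers are up before pointing Home Assistant at them, something like the following would do. It's only a rough sketch: the host and the speech-to-text / text-to-speech ports are placeholders I made up for illustration (use whatever your containers actually expose), and 11434 is ollama's default API port. The `SERVICES`, `is_reachable` and `check_services` names are hypothetical helpers, not part of any of these projects.

```python
import socket

# Assumed endpoints: the host and the first two ports are placeholders, not
# taken from any particular setup. 11434 is ollama's default API port.
SERVICES = {
    "speech-to-text": ("127.0.0.1", 10300),
    "text-to-speech": ("127.0.0.1", 10200),
    "ollama": ("127.0.0.1", 11434),
}


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


def check_services(services, timeout=2.0):
    """Map each service name to whether its port accepts TCP connections."""
    return {
        name: is_reachable(host, port, timeout)
        for name, (host, port) in services.items()
    }


if __name__ == "__main__":
    for name, ok in check_services(SERVICES).items():
        print(f"{name}: {'reachable' if ok else 'NOT reachable'}")
```

This only confirms that something is listening on each port, not that the service behind it is healthy, but it catches the usual "container didn't start" or "wrong port mapping" problems.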
I assume it's very similar to what Home Assistant's backing commercial entity Nabu Casa sells with the "Home Assistant Voice PE" device, which is also esp32-based. The code is open and uses the esphome framework, so it's fairly easy to recreate on custom hardware you have lying around.