Ask HN: Creating an image recognition model with my own data set?

I have a fairly unusual set of images and accompanying data that I'd like to use to train an image recognition system.

There are 100k+ images categorized with generic tags (such as "conference room" or "reception area") as well as specific tags identifying the particular manufacturer and product visible in the image (e.g. Herman Miller Aeron Chair). The product-specific tags also include X and Y coordinates marking where the products appear in the photo (there are roughly 50k of these).
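
To make the data layout concrete, here is a rough Python sketch of how one record might look. The class and field names, the file path, and the pixel values are all made up for illustration and are not my actual schema. Since each product tag is a single X/Y point rather than a bounding box, the sketch also shows one possible way to pad a point out to a fixed-size box clamped to the image edges; the 100-pixel box size is an arbitrary assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ProductTag:
    # Hypothetical field names; a real export may be shaped differently.
    manufacturer: str
    product: str
    x: float  # pixel X coordinate of the tagged point
    y: float  # pixel Y coordinate of the tagged point


@dataclass
class TaggedImage:
    path: str
    width: int
    height: int
    scene_tags: List[str] = field(default_factory=list)
    product_tags: List[ProductTag] = field(default_factory=list)


def point_to_box(
    tag: ProductTag, width: int, height: int, box_size: float = 100.0
) -> Tuple[float, float, float, float]:
    """Pad a point tag into an (x_min, y_min, x_max, y_max) box inside the image.

    box_size is an arbitrary placeholder, not a measured product size.
    """
    half = box_size / 2
    x_min = max(0.0, tag.x - half)
    y_min = max(0.0, tag.y - half)
    x_max = min(float(width), tag.x + half)
    y_max = min(float(height), tag.y + half)
    return (x_min, y_min, x_max, y_max)


# One made-up record: a scene-level tag plus one product tag near the left edge.
example = TaggedImage(
    path="images/office_001.jpg",
    width=1024,
    height=768,
    scene_tags=["conference room"],
    product_tags=[ProductTag("Herman Miller", "Aeron Chair", x=30.0, y=400.0)],
)

boxes = [
    point_to_box(tag, example.width, example.height)
    for tag in example.product_tags
]
```

The generic tags would map to whole-image classification labels, while the padded boxes are only a crude stand-in for real bounding boxes, so a point-based approach may suit this data better.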

If you were going to create your own detection model, what would you use?