I've managed to get it running in koboldcpp, had to add --forceversion 405 because it wasn't being detected properly, even with q5.1 I was getting an impressive 15 T/s and the code actually seemed decent, this might be a really good candidate for fine-tuning on large datasets and passing massive contexts of basically entire small repos or at least several full files
Odd they chose neox as their model, I think only ctranslate2 can offload those? I had trouble getting the GPTQ running in autogptq.. maybe the huggingface TGI would work better