Training your own model from scratch will be very difficult. You would need to gather an enormous amount of text just to get a model with basic language understanding.
What I would do (and am doing) is take an existing model like llama3 or mistral and add your own content using RAG (retrieval-augmented generation) techniques.
But fair play if you do manage to train a real model!
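
To give a rough idea of what the RAG approach looks like, here is a minimal sketch: pick the most relevant piece of your own content for a question, then paste it into the prompt you send to the model. Everything in it is an assumption for illustration. The `DOCS` list, the function names, and the prompt wording are made up, and the word-overlap scoring is a simple stand-in for the embedding search a real setup would normally use. The actual call to llama3 or mistral is left out, since that depends on how you run the model.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for "your own content".
DOCS = [
    "Our office is closed on public holidays and weekends.",
    "Refunds are processed within 14 days of receiving the returned item.",
    "The API rate limit is 100 requests per minute per key.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, docs, k=1):
    """Return the k documents whose words best match the question.

    Word-count similarity is a simplification; a real system would
    usually compare embedding vectors instead.
    """
    q = Counter(tokenize(question))
    ranked = sorted(
        docs,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, docs, k=1):
    """Put the retrieved text in front of the question for the model."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs, k))
    return (
        "Answer using only this context:\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    # This string is what you would send to the pretrained model.
    print(build_prompt("How long do refunds take?", DOCS))
```

The point is that the model never has to be retrained: your content lives outside it and only the relevant bits are handed over at question time.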