Being on Windows might make things tougher, simply because most of the fast language-processing libraries ship with some C code, and it's not unusual for that code to be unfriendly to Windows. You'll probably want to look into what the Mycroft project is doing, since they've tackled a lot of this stuff. Bear in mind that their project isn't offline, and that voice recognition and natural language processing are complicated and tend to be CPU intensive.
Though my response might be late and perhaps only half the solution, I highly recommend Python's SpeechRecognition library together with pocketsphinx. As for text to speech, a simple way is to use win32com.client; for more info: https://stackoverflow.com/questions/1614059/how-to-make-python-speak
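To make that a bit more concrete, here is a rough sketch of how the two pieces could fit together. It assumes you have installed SpeechRecognition, pocketsphinx and pywin32 with pip and that you have a WAV recording to test with. The function names (`transcribe_offline`, `build_reply`, `speak`) are just ones I made up for illustration, so treat this as a starting point rather than a finished solution.

```python
def transcribe_offline(wav_path):
    """Transcribe a WAV file without a network connection.

    Assumes the SpeechRecognition and pocketsphinx packages are installed.
    Returns the recognized text, or None if nothing intelligible was heard.
    """
    import speech_recognition as sr  # imported here so the rest loads without it

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    try:
        # recognize_sphinx runs the CMU Sphinx engine locally
        return recognizer.recognize_sphinx(audio)
    except sr.UnknownValueError:
        return None


def build_reply(heard):
    """Turn the recognizer output into a sentence to speak back."""
    if not heard:
        return "Sorry, I did not catch that."
    return "You said: " + heard


def speak(text):
    """Speak text through the Windows SAPI voice (Windows only, needs pywin32)."""
    import win32com.client

    voice = win32com.client.Dispatch("SAPI.SpVoice")
    voice.Speak(text)
```

You would then call something like `speak(build_reply(transcribe_offline("command.wav")))`, where `command.wav` is a placeholder for your own recording. For live input, SpeechRecognition also has `sr.Microphone`, but that needs PyAudio installed as well.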