“The output of the network is a matrix of character probabilities over time. In other words, for each time step the network outputs one probability for each character in the alphabet, which represents the likelihood of that character corresponding to what’s being said in the audio at that time.”
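
To make the shape of that output concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the four-character `ALPHABET`, the made-up `raw_scores`, and the helper names `softmax` and `most_likely_chars` do not come from any particular model or library. It builds a small matrix with one row per time step and one column per character, then reads off the most likely character at each step.

```python
import math

# Hypothetical toy alphabet; a real model's alphabet would be larger.
ALPHABET = ["a", "b", "c", " "]


def softmax(scores):
    """Turn one time step's raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw network scores: one row per time step, one column per character.
raw_scores = [
    [2.0, 0.1, 0.3, 0.0],
    [0.2, 1.5, 0.1, 0.0],
    [0.1, 0.2, 0.1, 2.5],
    [0.3, 0.1, 3.0, 0.2],
]

# The "matrix of character probabilities over time": shape (time steps, alphabet size).
prob_matrix = [softmax(row) for row in raw_scores]


def most_likely_chars(matrix, alphabet):
    """Pick the highest-probability character independently at each time step."""
    return [alphabet[max(range(len(row)), key=row.__getitem__)] for row in matrix]


best = most_likely_chars(prob_matrix, ALPHABET)
print(best)
```

Taking the top character at every step is only the simplest way to read the matrix. It ignores how neighboring time steps relate to each other, so a real recognizer would likely use a more careful decoding step on top of these probabilities.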