Synth Speech

edited August 2012 in Development
Hi All,

Can anyone recommend an easy way to get a *subroutine that I can call in asm that will give a decent sounding phrase?

I've had a look at a few speech programs on WOS (eg ones where you type Y1E1 for a given similar sound), which are quite good but I'd have to hack the code to get the phrase I want (which I'm not good at), as I don't want the entire program.

Basically I'm after a gravelly throaty sounding "YEAHHH!!" a bit like Heihachi would say in Tekken but a bit more protracted, maybe as if said by a WWF wrestler.

Would I need to find a kindly soul with a Currah µSpeech and a theatrical nature?

Any thoughts?

RT

*not too long

Comments

  • edited July 2012
    Went to play with the AY one, unfortunately it seems to use a DOS program for converting to TAP which won't run under Win7 64bit. :/ Tbh I think that rules it out of anything after XP.

    Shame, wouldn't have minded trying it.
  • edited July 2012
    If you need just one word, and have a bit of free memory, use a sample. It will sound much better than a speech synthesizer, and probably won't even be larger. Old speech synthesizers can't put any expression into the voice.
  • edited July 2012
    RobeeeJay wrote: »
    Went to play with the AY one, unfortunately it seems to use a DOS program for converting to TAP which won't run under Win7 64bit. :/ Tbh I think that rules it out of anything after XP.

    Shame, wouldn't have minded trying it.

    DOSBox works quite happily with Win 7 x64 for all DOS needs
  • edited July 2012
    zerohour wrote: »
    DOSBox works quite happily with Win 7 x64 for all DOS needs

    Or you could recompile the source to run under CMD if you're feeling brave.
  • edited July 2012
    zerohour wrote: »
    DOSBox works quite happily with Win 7 x64 for all DOS needs

    It's a Windows app that has an embedded DOS application in it. I'm not sure DosBox is going to run that. :p

    aowen wrote: »
    Or you could recompile the source to run under CMD if you're feeling brave.

    Got a link to it? :)

    Probably easy to fix, it appears to extract an internal copy of the DOS program to the temp directory, so I'm sure it would be easy to change that so it uses an external one.
  • edited July 2012
    How about this? It's a simple 3-bit PCM sample player using PWM, with a "yeaaaah" sample. Samples have to be digitized at 18kHz.
    http://www.atc.us.es/~rodriguj/yeaaaah.tap

    EDIT: Just added this version, that uses Sigma-Delta modulation to perform DtoA conversion. Samples are 4-bit signed. Sounds louder than the PWM version.
    http://www.atc.us.es/~rodriguj/yeaaah_sigma_delta.tap
  • edited July 2012
    aowen wrote: »

    Ta Andrew, it's Beeper I'm after so have been looking at the second one. The 'amusing' instructions are a bit light but I should be able to suss it. Depending how theatrical I'm feeling I might try to use a wav file of my own voice.

    mcleod_ideafix wrote: »
    How about this? It's a simple 3-bit PCM sample player using PWM, with a "yeaaaah" sample. Samples have to be digitized at 18kHz.
    http://www.atc.us.es/~rodriguj/yeaaaah.tap

    EDIT: Just added this version, that uses Sigma-Delta modulation to perform DtoA conversion. Samples are 4-bit signed. Sounds louder than the PWM version.
    http://www.atc.us.es/~rodriguj/yeaaah_sigma_delta.tap

    Excellent thanks! A bit more sinister than I had in mind but certainly captures the feeling I was after :-). Could I get a bit more detail on how you made this, I'm not familiar with the things you mention?

    I see there's 10k of code above 49152, more than I was hoping. Is that the amount of memory I'll need to leave for a phrase like this?
  • edited July 2012
    The amount of memory depends on the required quality. You can get a 'yeah' sample in 1-2K at low quality.
  • edited July 2012
    Shiru wrote: »
    The amount of memory depends on the required quality. You can get a 'yeah' sample in 1-2K at low quality.

    2k would be good. The use of it will be tongue in cheek anyway, so quality can suffer quite a lot.
  • edited July 2012
    R-Tape wrote: »
    Excellent thanks! A bit more sinister than I had in mind but certainly captures the feeling I was after :-)
    Thanks! I tried to sound like Hulk Hogan :D , as you requested.
    R-Tape wrote: »
    Could I get a bit more detail on how you made this, I'm not familiar with the things you mention?

    PWM and Sigma-Delta are two techniques for generating an analog signal from a stream of pulses that take only two values. You can look both up on Wikipedia, for instance, which explains them in detail.

    So, what I've done for PWM is take one sample at a time and generate a wave with a duty cycle proportional to the value of the sample. For Sigma-Delta, I've implemented a 4-bit SD modulator and used its output to drive bit 4 of port 254.
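
To make the two techniques concrete, here is a minimal Python sketch. It is not the Z80 code used in the .tap files (that code is not shown in the thread). It turns unsigned samples into a list of 0/1 speaker levels, which on the Spectrum would be written to bit 4 of port 254. The function names, the use of unsigned rather than signed samples, and the first-order accumulator form of the modulator are assumptions made for illustration.

```python
def pwm_bits(samples, bits=3):
    """Encode each unsigned sample as one PWM period of 2**bits slots.

    The output is held high for `sample` slots and low for the rest,
    so the duty cycle is proportional to the sample value.
    """
    period = 1 << bits
    out = []
    for s in samples:
        if not 0 <= s < period:
            raise ValueError("sample out of range")
        out.extend([1] * s + [0] * (period - s))
    return out


def sigma_delta_bits(samples, bits=4, oversample=2):
    """First-order sigma-delta modulator (accumulator-overflow form).

    Each sample is added to a `bits`-wide accumulator `oversample` times.
    The carry out of the accumulator is the 1-bit output, so the density
    of 1s in the stream tracks the sample value.
    """
    full_scale = 1 << bits
    acc = 0
    out = []
    for s in samples:
        if not 0 <= s < full_scale:
            raise ValueError("sample out of range")
        for _ in range(oversample):
            acc += s
            if acc >= full_scale:
                acc -= full_scale  # carry out: emit a 1
                out.append(1)
            else:
                out.append(0)
    return out
```

With `oversample=2` the modulator emits two output bits per 4-bit sample, which matches the ratio mentioned later in the thread. A constant mid-scale input such as `sigma_delta_bits([8] * 8)` alternates between 0 and 1, while `pwm_bits([3])` gives three high slots followed by five low ones.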
    R-Tape wrote: »
    I see there's 10k of code above 49152, more than I was hoping. Is that the amount of memory I'll need to leave for a phrase like this?

    It's a sound sample of about 1s, recorded at 18kHz, using one byte to store two samples (so we have 4 bits of resolution; actually 3 for PWM, to keep the PWM frequency, the high pitch you can hear along with the yeahhhh! sound, as high as possible). At 8 bits per sample, 1 second needs 18000 bytes, so with 4-bit samples we need 9000 bytes (actually a bit more, because the audio sample is slightly longer than 1 second).

    You can shorten the amount of memory needed by recording a shorter sample, or by halving the sampling frequency.
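
The memory arithmetic in this post can be checked with a few lines of Python. This is only a sketch: the helper name `sample_bytes` is made up for illustration, and it ignores the size of the player routine itself.

```python
import math


def sample_bytes(seconds, rate_hz, bits_per_sample):
    """Bytes needed to store a mono sample, rounded up to whole bytes."""
    return math.ceil(seconds * rate_hz * bits_per_sample / 8)


print(sample_bytes(1, 18000, 8))    # 18000: one byte per sample
print(sample_bytes(1, 18000, 4))    # 9000: two samples per byte
print(sample_bytes(1, 9000, 4))     # 4500: half the sampling frequency
print(sample_bytes(0.5, 18000, 4))  # 4500: half the duration
```

Halving either the duration or the sampling frequency halves the storage, which is the trade-off described above.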
  • edited July 2012
    mcleod_ideafix suggests advanced methods that use a large amount of memory but provide high quality not seen in games of the past.

    For comparison, here is what you could get with the most basic method, under 2K.
  • edited July 2012
    I think I can shorten the amount of memory needed if I perform the SD modulation offline. I output two bits for each 4 bit sample, so storing the actual SD outputs will need half the memory.
  • edited July 2012
    Shiru wrote: »
    mcleod_ideafix suggests advanced methods that use a large amount of memory but provide high quality not seen in games of the past.

    PWM has been used in the past for sure. It's a very straightforward way to do analog. I even figured it out myself without even knowing that it already existed :D

    As for Sigma-Delta, I learned of it while looking for a way to output analog signals from an FPGA. I used it in my clone to generate both sound output (ULA + AY) and video output (3-bit resolution per colour component). It's a kind of "intelligent PWM" and besides, it's way faster than conventional PWM.

    For "advanced methods" I'd rather think of Viterbi coding, which seems to have been used to optimize a sequence of samples so they can be played using the AY chip with an equivalent resolution of 8 bits.
  • edited July 2012
    Shiru wrote: »
    mcleod_ideafix suggests advanced methods that use a large amount of memory but provide high quality not seen in games of the past.

    For comparison, here is what you could get with the most basic method, under 2K.

    That's perfect quality! Especially as it's only 2k. If I could get something close to mcleod_ideafix's example in the same space it would be perfect.

    mcleod_ideafix wrote: »
    I think I can shorten the amount of memory needed if I perform the SD modulation offline. I output two bits for each 4 bit sample, so storing the actual SD outputs will need half the memory.

    Sounds great, something around the 2-5k mark would work for me.

    Your "YEAHH!" was SO close to what I was after (Shiru's example is a bit too clean). The only difference is that I was hoping for something deeper & more gravelly, as if Tom Waits shouted it, and a bit shorter/punchier. I'll need to do a web search for a prime example.

    I'm still thoroughly amused by your Hulk Hogan impression :-D !

    YEAHH!!
  • edited July 2012
    I think I can shorten the amount of memory needed if I perform the SD modulation offline. I output two bits for each 4 bit sample, so storing the actual SD outputs will need half the memory.

    That said: this version uses the same Sigma-Delta coding, but the coding has already been performed on the loaded sample, so that each bit in each byte of the loaded sample is the current output from the SD coder. The player is very basic: it outputs each bit of each byte in the sample block in succession.

    http://www.atc.us.es/~rodriguj/yeaaah_sigma_delta_offline_version.tap
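
The offline approach described in this post amounts to packing the modulator's 1-bit output into bytes ahead of time, so the player only has to shift bits out. Below is a minimal Python sketch of that idea. The function names and the most-significant-bit-first order are assumptions; the thread does not say which bit order the actual player uses.

```python
def pack_bits(bits):
    """Pack a 0/1 sequence into bytes, most significant bit first.

    The last byte is padded with zeros if the length is not a multiple of 8.
    """
    padded = list(bits) + [0] * (-len(bits) % 8)
    data = bytearray()
    for i in range(0, len(padded), 8):
        byte = 0
        for b in padded[i:i + 8]:
            byte = (byte << 1) | (b & 1)
        data.append(byte)
    return bytes(data)


def play_order(data):
    """Yield speaker levels in the order a basic player would emit them."""
    for byte in data:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1
```

Storage then depends only on the number of output bits: 18000 four-bit samples at two output bits each come to 36000 bits, or 4500 bytes, half of the 9000 bytes needed to store the 4-bit samples themselves.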
  • edited July 2012
    R-Tape wrote: »
    The only difference is that I was hoping for something deeper & more gravelly, as if Tom Waits shouted it, and a bit shorter/punchier. I'll need to do a web search for a prime example.

    Here you have some "yeahhs" :)
    http://www.freesound.org/search/?q=yeah&page=1#sound

    PS: I don't know who Tom Waits is. I'll search and see if I can model my voice after him...

    PPS: Oh oh. This is going to be difficult...
    Wikipedia wrote:
    Waits has a distinctive voice, described by critic Daniel Durchholz as sounding "like it was soaked in a vat of bourbon, left hanging in the smokehouse for a few months, and then taken outside and run over with a car."
  • edited July 2012
    This is the closest one I found in terms of length and feel. The only thing would be it needs to be deeper and rougher.

    If you're still willing and can produce something <5k I'd love to include (and credit) your work :-) (your first example was very close re roughness by the way).

    Haha, love the description of Tom Waits, one of few men who sing like they're gargling gravel, don't worry about matching him ;-).

    YEAHH!
  • edited July 2012
    PWM has been used in the past for sure. It's a very straightforward way to do analog. I even figured it out myself without even knowing that it already existed :D
    I don't know of any ZX game of the past that uses PWM for sampled sound on the beeper. They all use either the normal '1 bit - one output' approach, or the weird SpeakEasy coding that encodes the time between changes (which often results in larger data, and produces a carrier during silence). If you know games that do use PWM, would be nice to know the titles.
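
The two classic beeper encodings contrasted in this post can be sketched in Python as follows. This is a generic illustration only: `toggle_intervals` is a plain run-length encoder and is not claimed to match SpeakEasy's actual data format, and the threshold of 128 assumes unsigned 8-bit input samples.

```python
def one_bit(samples, threshold=128):
    """One output bit per sample: 1 if the sample is at or above threshold."""
    return [1 if s >= threshold else 0 for s in samples]


def toggle_intervals(bits):
    """Encode a 0/1 stream as the run lengths between level changes."""
    runs = []
    count = 0
    prev = None
    for b in bits:
        if b == prev:
            count += 1
        else:
            if prev is not None:
                runs.append(count)
            prev = b
            count = 1
    if prev is not None:
        runs.append(count)
    return runs
```

A rapidly changing stream shows why interval coding can end up larger: `[0, 1] * 8` is 16 bits, or 2 bytes when packed one bit per output, but it encodes as 16 separate run lengths, which would take 16 bytes if each run is stored in a byte.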
  • edited July 2012
    Shiru wrote: »
    If you know games that do use PWM, would be nice to know the titles.

    I was going to say "Cobra's Arc" because I remember that in the review Microhobby did about this game, they mentioned that the game featured speech, and they described it as "a whisper coming out the speaker". So in my memory I had associated that to a PWM-coding style. Turned out that the speech is synthetic, not sampled, and they probably used the very same speech routine Microhobby featured in its own magazine.

    That said: no. I don't know of a game that uses PWM coding for playing digitized sounds (in fact, I only know some Spectrum games... I've never been a gamer). It would seem strange to me if not even a single game used it. It's a very popular coding scheme.
  • edited July 2012
    R-Tape wrote: »
    This is the closest one I found in terms of length and feel. The only thing would be it needs to be deeper and rougher.

    This is how that sample sounds after offline Sigma-Delta coding. It's about 1.8K of code+data.
    http://www.atc.us.es/~rodriguj/yeaaah_rtape.tap

    It sounds a lot better if you can turn the "treble" knob of your amplifier all the way off, leaving only "bass". I'm sure I can improve it though, as I'm using full 8-bit samples. That in theory is better than using 4 bits, but for SD coding, the frequency at which samples have to be output to the speaker has to be 2^(n+1)*Fmax, where n is the number of bits used (8) and Fmax is the highest frequency present in your sample (10kHz, as the sample rate for this sample has been set to 20kHz). If I lower "n", then the output sampling frequency can be lower, thus reducing the amount of "hissss" at the output...

    ... or at least, that's what I think :D

    At this moment I cannot record live sounds (it's 1:24 AM here in Spain and my wife is sleeping. A loud "yeaaaah" at this time, and I will have to sleep on the couch). Tomorrow I will try some yeaaaahs using that sample as guide.
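
Taking the rule of thumb from this post at face value (the poster himself is not sure of it), the required output rate can be worked out for different bit depths. The helper name `sd_output_rate_hz` is made up for this sketch.

```python
def sd_output_rate_hz(bits, f_max_hz):
    """Output rate from the post's rule of thumb: 2**(n + 1) * Fmax."""
    return (1 << (bits + 1)) * f_max_hz


for n in (8, 4):
    print(n, sd_output_rate_hz(n, 10_000))
# 8 5120000
# 4 320000
```

For scale, the 48K Spectrum's Z80 is clocked at 3.5 MHz, so by this formula 8-bit samples would call for an output rate above the CPU clock, whereas 4-bit samples would need 320 kHz. That fits the post's reasoning that lowering n brings the required rate down.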
  • edited July 2012
    Love these type of topics. :)
  • edited July 2012
    So what sampling did Odin use for their games? Nodes, Arc and Robin used some sample technique. The 128 versions obviously used the AY but the others didn't.
  • edited July 2012

    At this moment I cannot record live sounds (it's 1:24 AM here in Spain and my wife is sleeping. A loud "yeaaaah" at this time, and I will have to sleep on the couch). Tomorrow I will try some yeaaaahs using that sample as guide.

    Sounded good to me, and 2k! Indeed the hiss is more pronounced.

    One cheeky request if you find time to have a go at this: could it be compiled at 56064 or above? I have a buffer at 49152.

    Apologies to your wife in advance who may think you have gone crazy.

    YEAHH!
  • edited July 2012
    Shiru wrote: »
    For comparison, here is what you could get with the most basic method, under 2K.


    Wow, very nice.
    ZX81/ZX Spectrum/Amiga/Atari music: http://yerzmyey.i-demo.pl/
  • edited July 2012
    R-Tape wrote: »
    One cheeky request if you find time to have a go at this: could it be compiled at 56064 or above? I have a buffer at 49152.

    Here you have all the source code and auxiliary files :)
    http://www.atc.us.es/~rodriguj/sigma_delta_encoder_and_player.zip
  • edited July 2012
    mcleod_ideafix, do you really want to make R-Tape release his project under GPL?
  • fogfog
    edited July 2012
    zerohour wrote: »
    So what sampling did Odin use for their games? Nodes, Arc and Robin used some sample technique. The 128 versions obviously used the AY but the others didn't.

    http://www.worldofspectrum.org/infoseekid.cgi?id=0008714

    could be used.. as mentioned already.. decent affordable samplers didn't come around till later.. (akai 950 etc)

    but Synclavier has been used on some c64 things.. can't speak for speccy

    it's far better to do it pc to spec now for sure, as obv. you can totally mute the volume in pauses with a wav editor (Wavosaur etc)
  • edited July 2012
    Shiru wrote: »
    mcleod_ideafix, do you really want to make R-Tape release his project under GPL?

    By default, I make my contributions GPL licensed, but I have no problem changing that. Let R-Tape tell us how he'd prefer to do it. Would a BSD license suit better?
  • edited July 2012
    The speccy is crying out for an up to date speech package!

    Don't suppose anyone fancies writing 'Speechola'? ;-)

    PM sent re previous post