Juliette Foucaut - 05 Apr 2019
I decided to write this post after reading about a very cool procedural NPC name generator and thinking that it might be of interest to show a much more basic example. This post is intended for people who have never used procedural generation and know very little programming. The examples are written in Python. I'll do my best to keep things simple and introduce the complexities progressively.
The algorithm is basic: names are generated by randomly assembling syllables picked from a set of four. First I'll explain how it's built, then the features I added to make sure the names fall within an arbitrary length range and, more importantly, are unique.
A couple of years ago, for fun, I wrote a simple procedural username generator in the Shadok language for our player registration system (more about 'Les Shadoks' below). I wanted default names to be generated automatically at registration to make signing up fast and painless. I also wanted the default names to be amusing yet neutral, to encourage people to personalise them. I'd originally intended the Shadok name generator to be a temporary solution and a bit of a joke, but ended up keeping it.
But before we start, let me tell you about 'Les Shadoks'. It's an animated series that first aired on French television 50 years ago. I remember watching it as a child. Most people in France have heard of them. The Shadoks are aliens that look a little like birds. They don't have much memory so their language is made of only four syllables: 'Ga', 'Bu', 'Zo' and 'Meu'.
skip to 1'42'' to listen to the Shadok syllables 'Ga', 'Bu', 'Zo', 'Meu'.
Using a small number of chosen syllables guarantees they'll be readable when concatenated (attached one after the other) at random. With 'Ga', 'Bu', 'Zo' and 'Meu', we can create many combinations resulting in words. For instance:
'Ga'
'BuGa'
'MeuGaBuZo'
'ZoMeuMeuBuZoGa'
Let's start with an empty word, called 'word':
word = ''
Next, we put the syllables in a list (an indexed array) called 'syllables':
syllables = [ 'Ga', 'Bu', 'Zo', 'Meu' ]
In the syllables array, 'Ga' is at index 0, 'Bu' at index 1, 'Zo' at index 2 and 'Meu' at index position 3. Hence as an example, retrieving the value stored at index 2 in the syllables array returns 'Zo':
syllables[ 2 ]   # evaluates to 'Zo'
To pick a syllable at random, all we have to do is generate a random number between 0 and 3 (random.randint() includes both ends of the range), and use it as an index into the syllables array:
import random
random_index = random.randint( 0, 3 )
syllable = syllables[ random_index ]
Now all we have to do is add the syllable at the end of the word:
word = word + syllable
Writing this in one line by nesting the commands we get:
word = word + syllables[ random.randint( 0, 3 )]
This adds one random syllable to the word. For example, assuming the random index is 2:
word = ''
word = word + syllables[ 2 ]
is equivalent to
word = '' + 'Zo'   # empty string with 'Zo' added at the end
hence
word = 'Zo'
Note - in Python, instead of random.randint() you can use random.choice() to the same effect:
word = word + random.choice( syllables )
I've chosen to use indexes as it's a more common programming concept.
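The upper bound 3 used so far only works because the list has exactly four entries. Here's a minimal sketch, assuming we'd rather derive the bound from the list itself with len(). The helper name pick_syllable is made up for this illustration.

```python
import random

syllables = ['Ga', 'Bu', 'Zo', 'Meu']

def pick_syllable(syllables):
    # Hypothetical helper: the last valid index is always len(list) - 1,
    # so this keeps working if syllables are added or removed.
    random_index = random.randint(0, len(syllables) - 1)
    return syllables[random_index]

word = ''
word = word + pick_syllable(syllables)
print(word)
```

With this version the index range never has to be edited by hand when the list changes.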
What about adding several syllables to a word?
Let's make a word with five syllables. For this we can either write the command five times as such:
word = syllables[ random.randint( 0, 3 )] + syllables[ random.randint( 0, 3 )] + syllables[ random.randint( 0, 3 )] + syllables[ random.randint( 0, 3 )] + syllables[ random.randint( 0, 3 )]
Or write a loop:
import random

syllables = [ 'Ga', 'Bu', 'Zo', 'Meu' ]
word = ''
MAX_SYLLABLES = 5   # maximum number of syllables allowed. It's written in uppercase
                    # to remind us it's not supposed to change when we run the program.
count_syllables = 0 # count how many syllables are in the word
while count_syllables < MAX_SYLLABLES:               # as long as there are syllables to add
    word = word + syllables[ random.randint( 0, 3 )] # add a random syllable to the word
    count_syllables = count_syllables + 1            # increment the syllables counter by 1
print( word )
The loop runs 5 times (count_syllables = 0 to 4). At the end of the fifth run, count_syllables = 5. In other words count_syllables = MAX_SYLLABLES hence the loop start condition (count_syllables < MAX_SYLLABLES) is no longer valid and the loop ends.
Every time we run the program, we get a different five-syllable word:
'ZoBuZoMeuBu'
'GaGaZoBuZo'
'BuMeuGaMeuGa'
'MeuMeuMeuMeuMeu' (this one fortuitously means 'the end' in Shadok)
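Python also has a 'for' loop that does the counting for us. The sketch below is an alternative way to write the same five-syllable loop, under the assumption that we don't need the counter for anything else. It isn't meant to replace the 'while' version, which we'll keep building on.

```python
import random

syllables = ['Ga', 'Bu', 'Zo', 'Meu']
MAX_SYLLABLES = 5

word = ''
for _ in range(MAX_SYLLABLES):    # repeat the indented line five times
    word = word + syllables[random.randint(0, 3)]
print(word)
```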
What if we wanted specific syllables to appear more often than others in the final word? For instance, 'MeuMeuMeuMeuMeu' currently has a 1/1024 chance of occurring: 'Meu' has one chance out of four of being picked each time we add a syllable, and we add a syllable five times, so the chance is 1/4 x 1/4 x 1/4 x 1/4 x 1/4 = 1/1024. An easy way to increase the probability is to add duplicates of 'Meu' to the syllables array:
syllables = [ 'Ga', 'Bu', 'Zo', 'Meu', 'Meu', 'Meu' ]
'Meu' now has three chances out of six (or one out of two) of being selected each time a syllable is added to the word, hence 'MeuMeuMeuMeuMeu' now has a whopping 1/32 chance of occurring. The other words will also tend to include a lot more 'Meu' syllables:
'MeuBuGaMeuMeu'
'GaMeuMeuZoGa'
'MeuGaZoGaMeu'
'GaMeuMeuMeuMeu'
import random

syllables = [ 'Ga', 'Bu', 'Zo', 'Meu', 'Meu', 'Meu' ]
word = ''
MAX_SYLLABLES = 5
count_syllables = 0
while count_syllables < MAX_SYLLABLES:
    word = word + syllables[ random.randint( 0, 5 )] # note there are now 6 indexes to pick from
    count_syllables = count_syllables + 1
print( word )
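To double-check the two probabilities quoted for 'MeuMeuMeuMeuMeu', here's a small sketch using exact fractions from Python's standard library. The function name chance_of_all_meu is made up for this illustration, and it assumes every pick is independent of the previous ones.

```python
from fractions import Fraction

def chance_of_all_meu(syllables, num_syllables):
    # Probability of drawing 'Meu' on a single pick, kept as an exact fraction.
    p_meu = Fraction(syllables.count('Meu'), len(syllables))
    # Each pick is independent, so the probabilities multiply.
    return p_meu ** num_syllables

equal = chance_of_all_meu(['Ga', 'Bu', 'Zo', 'Meu'], 5)
weighted = chance_of_all_meu(['Ga', 'Bu', 'Zo', 'Meu', 'Meu', 'Meu'], 5)
print(equal)      # 1/1024
print(weighted)   # 1/32
```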
Before we continue let's undo this change and go back to equal probability for all syllables.
We've procedurally generated our first Shadok words. Now let's make them more interesting by varying the number of syllables in each word, say between one and five.
MIN_SYLLABLES = 1
MAX_SYLLABLES = 5
Let's use random again to pick a number between one and five. This number will be the number of syllables in the word we'll generate. We'll call it num_syllables.
num_syllables = random.randint( MIN_SYLLABLES, MAX_SYLLABLES )
The modified algorithm looks like this:
import random

syllables = [ 'Ga', 'Bu', 'Zo', 'Meu' ]
word = ''
MIN_SYLLABLES = 1
MAX_SYLLABLES = 5
num_syllables = random.randint( MIN_SYLLABLES, MAX_SYLLABLES )
count_syllables = 0
while count_syllables < num_syllables:
    word = word + syllables[ random.randint( 0, 3 )]
    count_syllables = count_syllables + 1
print( word )
We run it several times and we get words of varying lengths:
'GaBuZoGaMeu'
'BuGaMeu'
'BuZoGaBu'
'Zo'
'BuMeuZo'
We've just seen we could make long and short words by randomising the number of syllables they're made of. But there is a problem. The length of a word is usually expressed in the number of letters it contains.
For instance I want to generate player names and I want them to be neither too short, nor too long. Hence I arbitrarily decide I want them to be between 3 and 12 characters long. In these conditions:
'Zo' - 1 syllable - 2 characters - too short
'Meu' - 1 syllable - 3 characters - ok lengthwise
'ZoBuGaGaZoZo' - 6 syllables - 12 characters - ok lengthwise
'MeuBuGaMeuMeu' - 5 syllables - 13 characters - too long
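The four cases above can be checked with len(), which returns the number of characters in a string. This is only a sketch: the function name check_length is made up for this illustration.

```python
MIN_LENGTH = 3
MAX_LENGTH = 12

def check_length(word):
    # Compare the number of characters against the allowed range.
    if len(word) < MIN_LENGTH:
        return 'too short'
    if len(word) > MAX_LENGTH:
        return 'too long'
    return 'ok lengthwise'

for word in ['Zo', 'Meu', 'ZoBuGaGaZoZo', 'MeuBuGaMeuMeu']:
    print(word, len(word), check_length(word))
```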
So we can't rely on the number of syllables to control the length of our words. To work around this let's calculate the min and max number of syllables.
Shadok syllables are 2 or 3 characters long. Since we don't know whether a randomly picked syllable will be 2 or 3 characters long, we have to plan for the worst case at both ends: we need at least 2 syllables to guarantee a minimum of 3 characters (2 x 2 = 4), and at most 4 syllables to guarantee we never exceed 12 characters (4 x 3 = 12). This restricts the number of words we can generate to around 330.
'GaZoGaBu'
'MeuGa'
'BuBu'
'GaMeuBuMeu'
'MeuZoZo'
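If you're curious where the figure of around 330 comes from, here's a sketch that simply lists every possible word made of 2, 3 or 4 syllables and counts them. It uses itertools.product from the standard library, which is my choice for this illustration rather than part of the generator itself.

```python
from itertools import product

syllables = ['Ga', 'Bu', 'Zo', 'Meu']

words = set()
for num_syllables in range(2, 5):                           # 2, 3 and 4 syllables
    for combo in product(syllables, repeat=num_syllables):  # every ordered pick
        words.add(''.join(combo))

lengths = [len(w) for w in words]
print(len(words))                    # 336
print(min(lengths), max(lengths))    # 4 12
```

That's 16 + 64 + 256 = 336 words, all between 4 and 12 characters long.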
To get around this restriction and make sure 'Meu' and 'ZoBuGaGaZoZo' can still be included, let's make the minimum syllables 1, the max 6. We'll then check the length of the word as we generate it. If the word is too short, we override the requested number of syllables to add more. If it risks becoming too long we stop adding any.
Let's change our program to consider min and max length and add tests to the loop:
import random

syllables = ['Ga', 'Bu', 'Zo', 'Meu']
word = ''
MIN_LENGTH = 3
MAX_LENGTH = 12
MIN_SYLLABLES = 1
MAX_SYLLABLES = 6
num_syllables = random.randint( MIN_SYLLABLES, MAX_SYLLABLES )
count_syllables = 0
# Run the loop to add syllables while:
# (we still have syllables to add) OR (the word is shorter than the minimum required)
while ( count_syllables < num_syllables ) or ( len( word ) < MIN_LENGTH ):
    syllable = syllables[ random.randint( 0, 3 )]
    if len( word + syllable ) > MAX_LENGTH:  # test if the predicted word is too long
        break                                # Word would be too long if we added the syllable. Stop the loop.
    word = word + syllable                   # Length will be ok. We can add the syllable to the word.
    count_syllables = count_syllables + 1
print( word )
We can now generate approximately 2000 different words filling the full range of 3 to 12 characters.
'ZoZoMeuMeuBu'
'BuBuGa'
'Meu'
'GaZoBuGaGaZo'
'GaBuZoMeu'
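The figure of approximately 2000 can be checked by brute force. The sketch below lists every combination of 1 to 6 syllables and keeps those that are 3 to 12 characters long. It counts the words that fit the length range, and assumes the loop above is able to produce each of them.

```python
from itertools import product

syllables = ['Ga', 'Bu', 'Zo', 'Meu']
MIN_LENGTH = 3
MAX_LENGTH = 12
MAX_SYLLABLES = 6

words = set()
for num_syllables in range(1, MAX_SYLLABLES + 1):
    for combo in product(syllables, repeat=num_syllables):
        word = ''.join(combo)
        if MIN_LENGTH <= len(word) <= MAX_LENGTH:   # keep only words in range
            words.add(word)

print(len(words))   # 1984
```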
Leaving procedural generation aside for a moment, let's improve the algorithm. We can calculate the minimum and maximum number of syllables from our list of syllables and the desired word lengths, so we can easily change our length requirements or use a different set of syllables. We'll also move all the code into a function called 'generate()'.
import random

# Generate a word from syllables
def generate():
    syllables = ['Ga', 'Bu', 'Zo', 'Meu']
    word = ''
    MIN_LENGTH = 3
    MAX_LENGTH = 12
    # Calculate the number of syllables
    max_length_syllable = 0
    min_length_syllable = 100
    for syllable in syllables:
        if len(syllable) > max_length_syllable:
            max_length_syllable = len(syllable)
        if len(syllable) < min_length_syllable:
            min_length_syllable = len(syllable)
    min_syllables = MIN_LENGTH // max_length_syllable   # // keeps the result a whole number
    max_syllables = MAX_LENGTH // min_length_syllable
    num_syllables = random.randint( min_syllables, max_syllables )
    # Generate the word
    count_syllables = 0
    while ( count_syllables < num_syllables or len( word ) < MIN_LENGTH ):
        syllable = syllables[ random.randint( 0, len( syllables ) - 1 )]
        if ( len( word + syllable ) > MAX_LENGTH ):
            break
        word = word + syllable
        count_syllables = count_syllables + 1
    return word

print( generate() )
Feel free to skip this section and go straight to the practical solution.
We're now able to generate around 2000 different Shadok words of varying lengths. But how do we deal with duplicates? The easiest way is to use a set (an unordered collection of unique elements in Python). When we need a new name, we generate it and check it against the set. If it's not in the set, we use it and add it to the set, so that we know it's taken the next time we need a new name. If it's already in the set, we discard it and repeat the process until we find a name that isn't.
import random

words = set()   # empty set
word = ''       # word to be generated

# generate a unique word (and build a set of words)
def generate_unique():
    word = generate()
    if word in words:               # the word is already in the set
        word = generate_unique()    # rerun this generator
    else:
        words.add( word )           # the word hasn't been generated before. Add it to the set and return it
    return word

print( generate_unique() )
This algorithm (it's a recursion by the way) works very nicely when the set is empty or almost empty. But the more we use it, the more the set fills up and the harder the program has to work: most of the words it randomly generates are already in the set, so it has to try again and again. Checking whether a word is in a Python set stays fast even when the set is large; it's the growing number of retries that slows things down. This is a bad situation. It causes the program to get slower and slower as it gets harder and harder to find the remaining unique words. Eventually, the program generates and stores all possible words. The next time it gets called it can never succeed: it keeps calling itself until Python stops it with a 'maximum recursion depth exceeded' error.
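To see what happens once every word has been used, here's a toy sketch. It assumes a deliberately tiny generator that can only produce two different words, so the set fills up after two calls. In Python a function that keeps calling itself doesn't run forever: the interpreter stops it with an error, which we catch here.

```python
import random

TINY_SYLLABLES = ['Ga', 'Bu']     # assumption: only two possible "words"
words = set()

def generate():
    return random.choice(TINY_SYLLABLES)

def generate_unique():
    word = generate()
    if word in words:
        return generate_unique()  # already used: try again, one level deeper
    words.add(word)
    return word

print(generate_unique())
print(generate_unique())
try:
    generate_unique()             # nothing unique is left to find
except RuntimeError as error:     # a recursion depth error is a kind of RuntimeError
    print('Gave up:', type(error).__name__)
```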
We could program clever things to avoid this situation, but let's be pragmatic. In our case we just want a set of funny-looking words, and we want to be fast and efficient about it (i.e. not use the computer as a space-heater). Let's get rid of the recursive function (where a function calls itself), and go back to simple loops. We'll also introduce an arbitrary boundary to how many attempts we make.
This is the method I've used for generating default usernames in our enkiWS system (simplified, I'll explain the other trick I used further down). We still use a set (why a set?) but we count each attempt we've made at generating a unique word. We cap the number of attempts at an arbitrary 99.
import random

words = set()   # empty set
word = ''       # word to be generated
MAX_ATTEMPTS = 99
attempt = 0
# attempt to generate a unique word until successful or reach the max number of attempts
while attempt < MAX_ATTEMPTS:
    word = generate()
    attempt = attempt + 1
    if word not in words:   # the word hasn't been generated before. Add it to the set and exit the loop
        words.add( word )
        break
    else:                   # the word is already in the set. Reset it.
        word = ''
if word:
    print( word )
else:
    print( 'Could not generate unique word' )
As I've mentioned above, with our small set of Shadok syllables and word length restrictions, we can generate around 2000 different words. Clearly, that's not enough usernames for a registration system. However, enkiWS supports non-unique display names by appending a number between 1000 and 9999 to the end of every name. We didn't invent this: it's the same system Blizzard uses for their BattleTags.
So we get, for example:
GaBu#2365
GaBu is the prefix, 2365 is the suffix
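As a rough back-of-the-envelope check of how many names the prefix and suffix give us together, here's a sketch that takes the figure of around 2000 prefixes at face value (it's an approximation, not an exact count).

```python
APPROX_PREFIXES = 2000              # "around 2000" Shadok words, an approximation
NUM_SUFFIXES = 9999 - 1000 + 1      # every whole number from 1000 to 9999 inclusive

print(NUM_SUFFIXES)                     # 9000
print(APPROX_PREFIXES * NUM_SUFFIXES)   # 18000000
```

So the combined names number roughly 18 million, compared with about 2000 for the prefix alone.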
This has several advantages:
The final username generator looks like this:
import random

def generate_prefix():
    syllables = ['Ga', 'Bu', 'Zo', 'Meu']
    word = ''
    MIN_LENGTH = 3
    MAX_LENGTH = 12
    max_length_syllable = 0
    min_length_syllable = 100
    for syllable in syllables:
        if len(syllable) > max_length_syllable:
            max_length_syllable = len(syllable)
        if len(syllable) < min_length_syllable:
            min_length_syllable = len(syllable)
    min_syllables = MIN_LENGTH // max_length_syllable
    max_syllables = MAX_LENGTH // min_length_syllable
    num_syllables = random.randint( min_syllables, max_syllables )
    count_syllables = 0
    while ( count_syllables < num_syllables or len( word ) < MIN_LENGTH ):
        syllable = syllables[ random.randint( 0, len( syllables ) - 1 )]
        if ( len( word + syllable ) > MAX_LENGTH ):  # abort if the predicted word is too long
            break
        word = word + syllable
        count_syllables = count_syllables + 1
    return word

def generate_suffix():
    return str( random.randint( 1000, 9999 ))  # cast the random integer to a string so it can be added to the word

words = set()
word = ''
MAX_ATTEMPTS = 99
attempt = 0
while attempt < MAX_ATTEMPTS:
    prefix = generate_prefix()
    suffix = generate_suffix()
    word = prefix + '#' + suffix
    attempt = attempt + 1
    if word not in words:
        words.add( word )
        break
    else:
        word = ''
if word:
    print( word )
else:
    print( 'Could not generate unique word' )
We can generate a set of 20 different names by adding a loop around the attempts:
words = set()
MAX_ATTEMPTS = 99
for i in range( 20 ):
    word = ''
    attempt = 0
    while attempt < MAX_ATTEMPTS:
        prefix = generate_prefix()
        suffix = generate_suffix()
        word = prefix + '#' + suffix
        attempt = attempt + 1
        if word not in words:
            words.add( word )
            break
        else:
            word = ''
print( words )
And the result is this:
set(['MeuBu#3824', 'GaGaZo#6199', 'BuGa#1055', 'GaMeu#6168', 'ZoZo#1324', 'ZoGa#9628', 'GaBuGaZoMeu#4396', 'Meu#3456', 'MeuZoBuBuBu#2038', 'BuGaGa#8077', 'GaBuBuBuZoBu#8035', 'BuMeuZo#9541', 'ZoZoMeuMeuGa#8483', 'MeuZoMeuZoBu#1650', 'ZoZo#9598', 'GaBuGaMeuGa#4454', 'BuGa#4129', 'MeuBuGaZoZo#2980', 'BuZo#8500', 'MeuGaGa#1796'])
(Note we could make the generator more efficient, for instance by taking the min_syllables and max_syllables calculation out of generate_prefix() as they only need to be calculated once.)
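Here's one possible way to do that, as a sketch rather than the version used in enkiWS. The syllable counts are worked out once, outside the function, using Python's built-in min() and max(). The uppercase names SYLLABLES, MIN_SYLLABLES and MAX_SYLLABLES are my own choice for this illustration.

```python
import random

SYLLABLES = ['Ga', 'Bu', 'Zo', 'Meu']
MIN_LENGTH = 3
MAX_LENGTH = 12

# Worked out once, when the program starts, instead of on every call.
MIN_SYLLABLES = MIN_LENGTH // max(len(s) for s in SYLLABLES)   # 3 // 3 = 1
MAX_SYLLABLES = MAX_LENGTH // min(len(s) for s in SYLLABLES)   # 12 // 2 = 6

def generate_prefix():
    word = ''
    num_syllables = random.randint(MIN_SYLLABLES, MAX_SYLLABLES)
    count_syllables = 0
    while count_syllables < num_syllables or len(word) < MIN_LENGTH:
        syllable = random.choice(SYLLABLES)
        if len(word + syllable) > MAX_LENGTH:   # stop before the word gets too long
            break
        word = word + syllable
        count_syllables = count_syllables + 1
    return word

print(generate_prefix())
```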
And we're done. Congratulations, you now have a basic name generator.
To use the generator as part of a registration system, you will need to store the list of unique generated words / usernames in a permanent repository, for instance a table in a database. The list will also need to include the names the users have chosen for themselves.
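As an illustration only, here's a sketch of what that storage could look like with SQLite, which ships with Python. This is not how enkiWS does it: the table name display_names and the function reserve_name are made up, and an in-memory database stands in for a permanent one. The idea is to let the database refuse duplicates, so a name is reserved only if nobody has it yet.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real, permanent one.
connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE display_names (name TEXT PRIMARY KEY)')

def reserve_name(name):
    # Returns True if the name was free and is now stored, False if it was taken.
    try:
        with connection:   # commits on success, rolls back on error
            connection.execute('INSERT INTO display_names (name) VALUES (?)', (name,))
        return True
    except sqlite3.IntegrityError:   # raised when the name is already in the table
        return False

print(reserve_name('GaBu#2365'))   # True
print(reserve_name('GaBu#2365'))   # False
```

Names chosen by users would go through the same function as the generated ones.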
If you want to see my Shadok display name generator in action you can play with the online demo of enkiWS:
If you're interested, this is the source-code for enkiWS and a shortcut to the Shadok name generator.
Demo: enkiws.appspot.com
Example: we use enkiWS to power our website www.enkisoftware.com
A great side-effect of writing this detailed tutorial is that I found out my Shadok name generator didn't work properly and I could also simplify it.
This post was inspired by Yevhen Loza's Seeds 3 contribution describing his NPC name generator: What's your NAME?
A simple word generation tutorial in python by Brett Witty using Tracery: www.brettwitty.net/tracery-in-python
The author of Tracery, Kate Compton, has many procedural generation resources on her website: www.galaxykate.com
Seeds is the PROCJAM Zine "about all the awesome ideas you're having, experiments you're making and things you're doing with generative software": www.procjam.com/seeds
If you're interested in the procedural generation in our game Avoyd, this article in Seeds gives an overview of how we built the 3D 'boxes in space'. I plan to write a more detailed devlog post about the procedural 3D environment soon.
If you want to generate 3D procgen shapes in Avoyd, download Avoyd and start a game. If you want to go further, in the Voxel Editor open 'Tools' > 'Edit Tool'. Choose 'Set' and under 'Shapes' you'll find procgen tree and linked boxes. Select one of them, click in space and see what happens!
'Les Shadoks' full video archive (in French) starting with season 1 (1968)
And here's an episode in English: