Independent game developers - making Avoyd


Support our work and posts like these through our Patreon

Procedural python word generator: making user names out of 4 syllables

Juliette Foucaut - 05 Apr 2019


I decided to write this post after reading about a very cool procedural NPC name generator and thinking that it might be of interest to show a much more basic example. This post is intended for people who have never used procedural generation and know very little programming. The examples are written in Python. I'll do my best to keep things simple and introduce the complexities progressively.

The algorithm is basic: names are generated by randomly assembling four syllables. First I'll explain how it's built, then the features I added to it to make sure the names are within an arbitrary size range, and more importantly, unique.

A couple of years ago, for fun, I wrote a simple procedural username generator in the Shadok language (more about 'Les Shadoks' below) for our player registration system. I wanted default names to be generated automatically at registration to make signing up fast and painless. I also wanted the default names to be amusing yet neutral, to encourage people to personalise them. I'd originally intended the Shadok name generator to be a temporary solution and a bit of a joke, but ended up keeping it.

But before we start, let me tell you about 'Les Shadoks'. It's an animated series that first aired on French television 50 years ago. I remember watching it as a child, and most people in France have heard of them. The Shadoks are aliens that look a little like birds. They don't have much memory, so their language is made of only four syllables: 'Ga', 'Bu', 'Zo' and 'Meu'.

Skip to 1'42'' to hear the Shadok syllables 'Ga', 'Bu', 'Zo', 'Meu'.

Making Shadok words: assembling syllables at random

Using a small number of carefully chosen syllables guarantees the result will be readable when they're concatenated (attached one after the other) at random. With 'Ga', 'Bu', 'Zo' and 'Meu', we can create many words. For instance:

'Ga'
'BuGa'
'MeuGaBuZo'
'ZoMeuMeuBuZoGa'

Picking a syllable at random

Let's start with an empty word, called 'word':

word = ''

Next, we put the syllables in an indexed array (a list, in Python) called 'syllables':

syllables = [ 'Ga',  'Bu', 'Zo', 'Meu' ]

In the syllables array, 'Ga' is at index 0, 'Bu' at index 1, 'Zo' at index 2 and 'Meu' at index 3. For example, retrieving the value stored at index 2 in the syllables array returns 'Zo':

syllables[ 2 ]   # returns 'Zo'

To pick a syllable at random, all we have to do is generate a random number between 0 and 3 and use it as an index into the syllables array:

import random
random_index = random.randint( 0, 3 )
syllable = syllables[ random_index ]
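One detail worth checking: random.randint(a, b) includes both ends, so randint(0, 3) can return 0, 1, 2 or 3, exactly the four valid indexes. The quick sketch below draws many indexes and records which ones appear. Computing the last index as len(syllables) - 1 instead of typing 3 is my own variation, so the code keeps working if the list changes:

```python
import random

syllables = ['Ga', 'Bu', 'Zo', 'Meu']
last_index = len(syllables) - 1  # 3 for four syllables

seen = set()
for _ in range(1000):
    # both 0 and last_index can come up: randint includes both ends
    seen.add(random.randint(0, last_index))

print(sorted(seen))  # with 1000 draws, almost certainly [0, 1, 2, 3]
```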

Now all we have to do is add the syllable at the end of the word:

word = word + syllable

Nesting the commands, we can write this in one line:

word = word + syllables[ random.randint( 0, 3 )]

This adds one random syllable to the word. For example, assuming the random index is 2:

word = ''
word = word + syllables[ 2 ]

is equivalent to

word = '' + 'Zo'   # empty string with 'Zo' added at the end

hence

word = 'Zo'

Note: in Python, instead of random.randint() you can use random.choice() to the same effect:
word = word + random.choice( syllables )
I've chosen to use indexes as it's a more common programming concept.

Assembling several syllables picked at random

What about adding several syllables to a word?

Let's make a word with five syllables. We can either write the command five times, like this:

word = ( syllables[ random.randint( 0, 3 )]
    + syllables[ random.randint( 0, 3 )] + syllables[ random.randint( 0, 3 )]
    + syllables[ random.randint( 0, 3 )] + syllables[ random.randint( 0, 3 )] )

Or write a loop:

import random
syllables = [ 'Ga', 'Bu', 'Zo', 'Meu' ]
word = ''
MAX_SYLLABLES = 5   # maximum number of syllables allowed. It's written in uppercase
                    # to remind us it's not supposed to change when we run the program.

count_syllables = 0 # count how many syllables are in the word
while count_syllables < MAX_SYLLABLES:  # as long as there are syllables to add
    word = word + syllables[ random.randint( 0, 3 )]  # add a random syllable to the word
    count_syllables = count_syllables + 1   # increment the syllables counter by 1

print( word )

The loop runs 5 times (count_syllables = 0 to 4). At the end of the fifth run, count_syllables = 5. In other words, count_syllables = MAX_SYLLABLES, hence the loop condition (count_syllables < MAX_SYLLABLES) is no longer true and the loop ends.

Every time we run the program, we get a different five-syllable word:

'ZoBuZoMeuBu'
'GaGaZoBuZo'
'BuMeuGaMeuGa'
'MeuMeuMeuMeuMeu' (this one fortuitously means 'the end' in Shadok)
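Python also has a for loop that does the counting for us. As a sketch (not the code used in enkiWS), the same five-syllable word can be built with range() and random.choice(), collecting the syllables in a list and joining them at the end:

```python
import random

syllables = ['Ga', 'Bu', 'Zo', 'Meu']
MAX_SYLLABLES = 5

parts = []
for _ in range(MAX_SYLLABLES):  # runs exactly MAX_SYLLABLES times
    parts.append(random.choice(syllables))
word = ''.join(parts)  # glue the syllables together into one string

print(word)
```

Joining a list once at the end avoids building a new string at every step, although for five syllables the difference is negligible.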

Influencing the randomness

What if we wanted specific syllables to appear more often than others in the final word? For instance, 'MeuMeuMeuMeuMeu' currently has a 1/1024 chance of occurring: 'Meu' has one chance out of four of being picked each time we add a syllable, and we add five, so the odds are 1/4 × 1/4 × 1/4 × 1/4 × 1/4 = 1/1024. An easy way to increase the probability is to add duplicates of 'Meu' to the syllables array:

syllables =  [ 'Ga',  'Bu', 'Zo', 'Meu', 'Meu', 'Meu' ]

'Meu' now has three chances out of six (or one out of two) of being selected each time a syllable is added to the word, hence 'MeuMeuMeuMeuMeu' now has a whopping 1/32 chance of occurring. All the other words will also include a lot more 'Meu' syllables:

'MeuBuGaMeuMeu'
'GaMeuMeuZoGa'
'MeuGaZoGaMeu'
'GaMeuMeuMeuMeu'
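The odds quoted above can be checked with Python's fractions module, which keeps exact ratios instead of rounded decimals. A quick sketch, not part of the generator:

```python
from fractions import Fraction

MAX_SYLLABLES = 5

# Four syllables, one of them 'Meu': chance of picking 'Meu' each time
p_meu_equal = Fraction(1, 4)
# Six entries, three of them 'Meu' (Fraction reduces 3/6 to 1/2)
p_meu_weighted = Fraction(3, 6)

# Five independent picks must all be 'Meu'
print(p_meu_equal ** MAX_SYLLABLES)     # 1/1024
print(p_meu_weighted ** MAX_SYLLABLES)  # 1/32
```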

import random
syllables = [ 'Ga', 'Bu', 'Zo', 'Meu', 'Meu', 'Meu' ]
word = ''
MAX_SYLLABLES = 5

count_syllables = 0
while count_syllables < MAX_SYLLABLES:
    word = word + syllables[ random.randint( 0, 5 )]  # note there are now 6 indexes to pick from
    count_syllables = count_syllables + 1

print( word )

Before we continue, let's undo this change and go back to equal probability for all syllables.
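Duplicating entries works in any language, but Python 3.6 and later also offer random.choices(), which takes a weights argument. This is a different technique from the one used in the post: a minimal sketch giving 'Meu' a weight of 3 against 1 for each other syllable, which matches the odds of the duplicated list above (3 chances out of 6):

```python
import random

syllables = ['Ga', 'Bu', 'Zo', 'Meu']
weights = [1, 1, 1, 3]  # 'Meu' is three times as likely as each other syllable
MAX_SYLLABLES = 5

# k draws with replacement, each one using the weights
word = ''.join(random.choices(syllables, weights=weights, k=MAX_SYLLABLES))
print(word)
```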

Changing the word length

Varying the number of syllables in a word

We've procedurally generated our first Shadok words. Now let's make them more interesting by varying the number of syllables in each word, say between one and five.

MIN_SYLLABLES = 1
MAX_SYLLABLES = 5

Let's use random again to pick a number between one and five. This number will be the number of syllables in the word we'll generate. We'll call it num_syllables.

num_syllables = random.randint( MIN_SYLLABLES, MAX_SYLLABLES )

The modified algorithm looks like this:

import random
syllables = [ 'Ga', 'Bu', 'Zo', 'Meu' ]
word = ''
MIN_SYLLABLES = 1
MAX_SYLLABLES = 5
num_syllables = random.randint( MIN_SYLLABLES, MAX_SYLLABLES )

count_syllables = 0
while count_syllables < num_syllables :
    word = word + syllables[ random.randint( 0, 3 )]
    count_syllables = count_syllables + 1

print( word )

We run it several times and we get words of varying lengths:

'GaBuZoGaMeu'
'BuGaMeu'
'BuZoGaBu'
'Zo'
'BuMeuZo'

Words of a predefined length range

We've just seen we can make long and short words by randomising the number of syllables they're made of. But there is a problem: the length of a word is usually measured in letters, not syllables.

For instance, I want to generate player names that are neither too short nor too long, so I arbitrarily decide they must be between 3 and 12 characters long. Under these conditions:

'Zo' - 1 syllable - 2 characters - too short
'Meu' - 1 syllable - 3 characters - ok lengthwise
'ZoBuGaGaZoZo' - 6 syllables - 12 characters - ok lengthwise
'MeuBuGaMeuMeu' - 5 syllables - 13 characters - too long

So we can't rely on the number of syllables alone to control the length of our words. To work around this, let's calculate the minimum and maximum number of syllables.

Shadok syllables are 2 or 3 characters long. Since we don't know whether each randomly picked syllable will be 2 or 3 characters long, we need at least 2 syllables (2 × 2 = 4 characters, enough to pass the 3-character minimum) and at most 4 syllables (4 × 3 = 12 characters, never over the maximum) to be sure every word is between 3 and 12 characters long. This restricts the number of words we can generate to 336 (4² + 4³ + 4⁴), and none of them can be shorter than 4 characters.

'GaZoGaBu'
'MeuGa'
'BuBu'
'GaMeuBuMeu'
'MeuZoZo'
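To check that count, here's a sketch that lists every word of 2 to 4 syllables with itertools.product, which yields every ordered combination with repeats allowed. It also confirms that this rule can never produce a 3-character word such as 'Meu':

```python
from itertools import product

syllables = ['Ga', 'Bu', 'Zo', 'Meu']

words = []
for num_syllables in range(2, 5):  # 2, 3 or 4 syllables
    for combination in product(syllables, repeat=num_syllables):
        words.append(''.join(combination))

print(len(words))  # 4**2 + 4**3 + 4**4 = 336
print(min(len(w) for w in words), max(len(w) for w in words))  # 4 12
```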

To get around this restriction and make sure words like 'Meu' and 'ZoBuGaGaZoZo' can still be generated, let's set the minimum number of syllables to 1 and the maximum to 6. We'll then check the length of the word as we generate it: if the word is too short, we override the requested number of syllables and add more; if adding a syllable would make it too long, we stop.

Let's change our program to consider min and max length and add tests to the loop:

import random
syllables = ['Ga', 'Bu', 'Zo', 'Meu']
word = ''
MIN_LENGTH = 3
MAX_LENGTH = 12
MIN_SYLLABLES = 1
MAX_SYLLABLES = 6
num_syllables = random.randint( MIN_SYLLABLES, MAX_SYLLABLES )

count_syllables = 0
# Run the loop to add syllables while:
# (we still have syllables to add) OR (the word is shorter than the minimum required)
while ( count_syllables < num_syllables ) or ( len( word ) < MIN_LENGTH ):
    syllable = syllables[ random.randint( 0, 3 )]
    if len( word + syllable ) > MAX_LENGTH: # test if the predicted word is too long
        break  # Word would be too long if we added the syllable. Stop the loop.
    word = word + syllable # Length will be ok. We can add the syllable to the word.
    count_syllables = count_syllables + 1

print( word )

We can now generate approximately 2000 different words filling the full range of 3 to 12 characters.

'ZoZoMeuMeuBu'
'BuBuGa'
'Meu'
'GaZoBuGaGaZo'
'GaBuZoMeu'
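The figure of 'approximately 2000' can be checked by brute force: build every combination of 1 to 6 syllables with itertools.product and keep those between 3 and 12 characters. A set would remove any duplicate strings, though there are none here since each syllable starts with a different letter. This is a verification sketch, not part of the generator:

```python
from itertools import product

syllables = ['Ga', 'Bu', 'Zo', 'Meu']
MIN_LENGTH = 3
MAX_LENGTH = 12
MAX_SYLLABLES = 6  # no word of 12 characters or fewer has more than 6 syllables

possible = set()
for num_syllables in range(1, MAX_SYLLABLES + 1):
    for combination in product(syllables, repeat=num_syllables):
        word = ''.join(combination)
        if MIN_LENGTH <= len(word) <= MAX_LENGTH:
            possible.add(word)

print(len(possible))  # 1984
```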

Reformulation

Let's leave procedural generation aside for a moment and improve the code. Instead of hard-coding the minimum and maximum number of syllables, we can calculate them from the list of syllables and the desired word lengths, so we can easily change the length requirements or use a different set of syllables. We'll also move all the code into a function called 'generate()'.

import random

# Generate a word from syllables
def generate():
    syllables = ['Ga', 'Bu', 'Zo', 'Meu']
    word = ''
    MIN_LENGTH = 3
    MAX_LENGTH = 12

    # Calculate the number of syllables
    max_length_syllable = 0
    min_length_syllable = 100
    for syllable in syllables:
        if len(syllable) > max_length_syllable:
            max_length_syllable = len(syllable)
        if len(syllable) < min_length_syllable:
            min_length_syllable = len(syllable)
    # '//' is integer division: 3 // 3 = 1 and 12 // 2 = 6
    min_syllables = MIN_LENGTH // max_length_syllable
    max_syllables = MAX_LENGTH // min_length_syllable
    num_syllables = random.randint( min_syllables, max_syllables )

    # Generate the word
    count_syllables = 0
    while ( count_syllables < num_syllables or len( word ) < MIN_LENGTH ):
        syllable = syllables[ random.randint( 0, len( syllables ) - 1 )]
        if ( len( word + syllable ) > MAX_LENGTH ):
            break
        word = word + syllable
        count_syllables = count_syllables + 1

    return word

print( generate() )

Unique words

The doomed way

Feel free to skip this section and go straight to the practical solution.

We're now able to generate around 2000 different Shadok words of varying lengths. But how do we deal with duplicates? The easiest way is to use a set (an unordered collection of unique elements in Python). When we need a new name, we generate it and check it against the set. If it's not in the set, we use it and add it to the set, so we'll know it's taken the next time we need a name. If it's already in the set, we discard it and repeat the process until we find a name that isn't.
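If sets are new to you, here is how they behave on their own: adding a word twice keeps a single copy, and the in operator tells us whether a word has already been used.

```python
words = set()           # an empty set
words.add('GaBu')
words.add('MeuZo')
words.add('GaBu')       # adding a duplicate has no effect

print(len(words))       # 2
print('GaBu' in words)  # True: already used
print('ZoZo' in words)  # False: still available
```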

import random
words = set()   # empty set

# generate a unique word (and build a set of words)
# generate() is the function defined in the previous section
def generate_unique():
    word = generate()
    if word in words:   # the word is already in the set
        return generate_unique()    # try again by rerunning this generator
    else:
        words.add(word) # the word hasn't been generated before. Add it to the set and return it
        return word

print( generate_unique() )

This algorithm (it's recursive, by the way) works very nicely when the set is empty or almost empty. But the more we use it, the more the set fills up and the harder the program has to work, randomly generating word after word that's already in the set. Checking whether a word is in a Python set stays fast however large the set gets, but the number of wasted attempts keeps growing: when nine words out of ten are taken, roughly nine attempts out of ten are thrown away. This is a bad situation. The program gets slower and slower as it gets harder and harder to find the remaining unique words, and the set keeps growing in memory. Eventually, the program generates and stores all possible words. The next time it gets called, it recurses forever (in practice Python stops it with a 'maximum recursion depth exceeded' error).

We could program clever things to avoid this situation, but let's be pragmatic. In our case we just want a set of funny-looking words, and we want to be fast and efficient about it (i.e. not use the computer as a space-heater). Let's get rid of the recursive function (where a function calls itself), and go back to simple loops. We'll also introduce an arbitrary boundary to how many attempts we make.
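A rough way to see the slowdown described above: if k of the N possible words are already taken and each new word is equally likely (an assumption; the real generator favours some words over others), each attempt succeeds with probability (N - k) / N, so on average it takes N / (N - k) attempts. A small sketch using N = 1984, the number of Shadok words of 3 to 12 characters:

```python
N = 1984  # possible Shadok words of 3 to 12 characters

def expected_attempts(taken, total=N):
    # Each attempt succeeds with probability (total - taken) / total,
    # so the average number of attempts is total / (total - taken).
    # Assumes every word is equally likely to be generated.
    # When taken == total there is no free word left: the division fails,
    # just as the recursive version never finishes.
    return total / (total - taken)

for taken in (0, 992, 1786, 1974, 1983):
    print(taken, round(expected_attempts(taken), 1))
```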

The practical solution

This is the method I've used for generating default usernames in our enkiWS system (simplified: I'll explain the other trick I used further down). We still use a set, since checking whether a word is already in it is fast, but we count each attempt we've made at generating a unique word. We cap the number of attempts at an arbitrary 99.

import random
words = set()   # empty set
word = ''       # word to be generated

MAX_ATTEMPTS = 99
attempt = 0

# attempt to generate a unique word until successful or reach the max number of attempts
while attempt < MAX_ATTEMPTS:   # generate() is the function defined earlier
    word = generate()
    attempt = attempt + 1
    if word not in words: # the word hasn't been generated before. Add it to the set and exit the loop
        words.add(word)
        break
    else:   # the word is already in the set. Reset it.
        word = ''

if word:
    print( word )
else:
    print( 'Could not generate unique word' )
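The same loop can be wrapped in a reusable function. In this sketch, generate_unique_word and its max_attempts parameter are hypothetical names, and the word generator is passed in as an argument so the function can be tried with a deliberately bad generator that always returns the same word:

```python
def generate_unique_word(make_word, words, max_attempts=99):
    # Try up to max_attempts times; return None if every attempt was a duplicate
    for _ in range(max_attempts):
        word = make_word()
        if word not in words:
            words.add(word)
            return word
    return None

def always_gabu():
    # Hypothetical stand-in for generate() that never varies
    return 'GaBu'

words = set()
print(generate_unique_word(always_gabu, words))  # GaBu
print(generate_unique_word(always_gabu, words))  # None: 'GaBu' is taken
```

Returning None instead of an empty string makes the failure case explicit for the caller to check.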

More unique words

As I mentioned above, with our small set of Shadok syllables and word length restrictions, we can generate around 2000 different words. Clearly, that's not enough usernames for a registration system. However, enkiWS supports non-unique display names by appending 4 digits between 1000 and 9999 to the end of every name. We didn't invent it: it's the same system Blizzard uses for its BattleTags.

So we get, for example:

GaBu#2365
GaBu is the prefix, 2365 is the suffix
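A full name can be split back into its prefix and suffix with str.rpartition(), which splits at the last '#', so it still works if a prefix ever contains a '#'. A minimal sketch; split_name is a hypothetical helper, not part of enkiWS:

```python
def split_name(name):
    # rpartition splits at the last '#': ('GaBu', '#', '2365')
    prefix, separator, suffix = name.rpartition('#')
    return prefix, suffix

print(split_name('GaBu#2365'))  # ('GaBu', '2365')
```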

This has several advantages:

  • The combination prefix + suffix means we can automatically generate roughly 18 million unique names (around 2000 prefixes × 9000 suffixes).
  • The prefix is cheap to compute.
  • If we ever need to generate more words, we can easily add a syllable to the syllables list (conveniently, the Shadoks suggest 'Ni' might be available) or make the suffix hexadecimal, etc.
  • People who want to personalise their name can replace the prefix with a string of their choice (between 3 and 12 characters). This vastly expands the number of possible names, well into the billions. Thanks to the suffix, several people can use the same prefix: Ada#9087 and Ada#2766 can safely coexist. It also means we can allow more short, 3-letter prefixes.
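The capacity of prefix + suffix is a simple multiplication: the number of possible prefixes times the number of possible suffixes. Counting every combination of Shadok syllables between 3 and 12 characters gives 1984 prefixes. The hexadecimal line assumes a 4-digit hex suffix from 1000 to FFFF, which is only one possible way to do it:

```python
NUM_PREFIXES = 1984              # Shadok words of 3 to 12 characters
NUM_SUFFIXES = 9999 - 1000 + 1   # 1000 to 9999 inclusive: 9000 suffixes

print(NUM_PREFIXES * NUM_SUFFIXES)  # 17856000

# Assumption: a 4-digit hexadecimal suffix from 0x1000 to 0xFFFF
HEX_SUFFIXES = 0xFFFF - 0x1000 + 1  # 61440 suffixes
print(NUM_PREFIXES * HEX_SUFFIXES)  # 121896960
```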

The final username generator looks like this:

import random

def generate_prefix():
    syllables = ['Ga', 'Bu', 'Zo', 'Meu']
    word = ''
    MIN_LENGTH = 3
    MAX_LENGTH = 12
    max_length_syllable = 0
    min_length_syllable = 100
    for syllable in syllables:
        if len(syllable) > max_length_syllable:
            max_length_syllable = len(syllable)
        if len(syllable) < min_length_syllable:
            min_length_syllable = len(syllable)
    min_syllables = MIN_LENGTH // max_length_syllable   # integer division
    max_syllables = MAX_LENGTH // min_length_syllable
    num_syllables = random.randint( min_syllables, max_syllables )
    count_syllables = 0
    while ( count_syllables < num_syllables or len( word ) < MIN_LENGTH ):
        syllable = syllables[ random.randint( 0, len( syllables ) - 1 )]
        if ( len( word + syllable ) > MAX_LENGTH ): # abort if the predicted word is too long
            break
        word = word + syllable
        count_syllables = count_syllables + 1
    return word

def generate_suffix():
    return str( random.randint( 1000, 9999 )) # cast the random integer to a string so it can be added to the word

words = set()
word = ''
MAX_ATTEMPTS = 99
attempt = 0

while attempt < MAX_ATTEMPTS:
    prefix = generate_prefix()
    suffix = generate_suffix()
    word = prefix + '#' + suffix
    attempt = attempt + 1
    if word not in words:
        words.add( word )
        break
    else:
        word = ''

if word:
    print( word )
else:
    print( 'Could not generate unique word' )

We can generate a set of 20 different names by adding a loop around the attempts:

words = set()
MAX_ATTEMPTS = 99

for i in range( 20 ):

    word = ''
    attempt = 0

    while attempt < MAX_ATTEMPTS:
        prefix = generate_prefix()
        suffix = generate_suffix()
        word = prefix + '#' + suffix
        attempt = attempt + 1
        if word not in words:
            words.add( word )
            break
        else:
            word = ''

print( words )

And the result is this:

set(['MeuBu#3824', 'GaGaZo#6199', 'BuGa#1055', 'GaMeu#6168', 'ZoZo#1324', 'ZoGa#9628', 'GaBuGaZoMeu#4396', 'Meu#3456', 'MeuZoBuBuBu#2038', 'BuGaGa#8077', 'GaBuBuBuZoBu#8035', 'BuMeuZo#9541', 'ZoZoMeuMeuGa#8483', 'MeuZoMeuZoBu#1650', 'ZoZo#9598', 'GaBuGaMeuGa#4454', 'BuGa#4129', 'MeuBuGaZoZo#2980', 'BuZo#8500', 'MeuGaGa#1796'])

(Note: we could make the generator more efficient, for instance by moving the min_syllables and max_syllables calculation out of generate_prefix(), as it only needs to be done once.)
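A sketch of that optimisation: compute the syllable limits once, when the module loads, and let generate_prefix() reuse them. The uppercase constant names are my own; the behaviour is meant to match the generate_prefix() above:

```python
import random

SYLLABLES = ['Ga', 'Bu', 'Zo', 'Meu']
MIN_LENGTH = 3
MAX_LENGTH = 12

# Computed once at module level, not on every call
MIN_SYLLABLES = MIN_LENGTH // max(len(s) for s in SYLLABLES)  # 3 // 3 = 1
MAX_SYLLABLES = MAX_LENGTH // min(len(s) for s in SYLLABLES)  # 12 // 2 = 6

def generate_prefix():
    num_syllables = random.randint(MIN_SYLLABLES, MAX_SYLLABLES)
    word = ''
    count_syllables = 0
    while count_syllables < num_syllables or len(word) < MIN_LENGTH:
        syllable = random.choice(SYLLABLES)
        if len(word + syllable) > MAX_LENGTH:  # stop before the word gets too long
            break
        word = word + syllable
        count_syllables = count_syllables + 1
    return word

print(generate_prefix())
```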

And we're done. Congratulations, you now have a basic name generator.

Demo and example

To use the generator as part of a registration system, you will need to store the list of unique generated words / usernames in a permanent repository, for instance a table in a database. The list will also need to include the names the users have chosen for themselves.
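As an illustration only (not the code enkiWS uses), here is how the 'already used?' check could move into a database using Python's built-in sqlite3 module. A UNIQUE constraint makes the database reject duplicates, so the insert itself doubles as the uniqueness test, and user-chosen names go in the same table. The table and function names are hypothetical:

```python
import sqlite3

connection = sqlite3.connect(':memory:')  # use a file path for a permanent store
connection.execute('CREATE TABLE usernames (name TEXT NOT NULL UNIQUE)')

def claim_name(name):
    # Returns True if the name was free and is now stored, False if it was taken
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute('INSERT INTO usernames (name) VALUES (?)', (name,))
        return True
    except sqlite3.IntegrityError:  # raised when the UNIQUE constraint fails
        return False

print(claim_name('GaBu#2365'))  # True
print(claim_name('GaBu#2365'))  # False: already taken
print(claim_name('Ada#9087'))   # True: a name the user chose
```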

If you want to see my Shadok display name generator in action you can play with the online demo of enkiWS:

  • Go to the enkiWS demo site: enkiws.appspot.com
  • Sign up (You'll need a valid email. You can delete your account any time from the profile page)
  • When you set your display name, you'll see the Shadok name generated for you (the suffix isn't displayed).
  • Refreshing the page generates a new name.

If you're interested, here's the source code for enkiWS and a shortcut to the Shadok name generator.
Demo: enkiws.appspot.com
Example: we use enkiWS to power our website www.enkisoftware.com

A great side effect of writing this detailed tutorial is that I found out my Shadok name generator didn't work properly, and that I could also simplify it. I'm still learning, so thank you! If you want to comment or ask any questions, I'm @juulcat on Twitter.

Going further

This post was inspired by Yevhen Loza's Seeds 3 contribution describing his NPC name generator: What's your NAME?

A simple word generation tutorial in python by Brett Witty using Tracery: www.brettwitty.net/tracery-in-python

The author of Tracery, Kate Compton, has many procedural generation resources on her website: www.galaxykate.com

Seeds is the PROCJAM Zine "about all the awesome ideas you're having, experiments you're making and things you're doing with generative software": www.procjam.com/seeds

If you're interested in the procedural generation in our game Avoyd, this article in Seeds gives an overview of how we built the 3D 'boxes in space'. I plan to write a more detailed devlog post about the procedural 3D environment soon.

If you want to generate 3D procgen shapes in Avoyd, download Avoyd and start a game. If you want to go further, in the Voxel Editor open 'Tools' > 'Edit Tool'. Choose 'Set' and under 'Shapes' you'll find procgen tree and linked boxes. Select one of them, click in space and see what happens!

'Les Shadoks' full video archive (in French) starting with season 1 (1968)

And here's an episode in English:

