Friday, May 23, 2014

Procedural Language Generator

As of late, I've been spending far more time than I'd like tinkering with linguistics. For instance, I took a three-day diversion from a paragraph to work out a linguistic rule set just to establish a two-word phrase. The language is from a far-off country called Eshet, whose citizens wear a particular style of ankle-length toga. The fashionable people of Anchorest love the style, but in order to keep sounding fashionable they took the name straight from the Eshan language, lor nobre, never mind that in the original Eshan it just means 'the long cut'.

Granted, that far-off land is pretty important, and the language will end up being used for a lot more than the two words it initially spawned, so the time isn't necessarily wasted. It could just be better spent doing some actual writing.



So, I'm programming a procedural language generator, to be able to create random, serviceable languages almost from whole cloth. I've already done some preliminary building on it, and it's going to be able to handle some complicated randomness, as well as fine-tuning from the operator.

Each phoneme is getting a guttural/sibilance value assigned to it, so you can make the language aspirated or deep in the throat (from Elvish to Orcish, with English sitting in the middle). There is also a 'Foreign Index' value. While all the starting phonemes will have a Foreign Index of 0 (relative to English), ultimately I want to include non-English phonemes, so I can make a generated language sound a little off from English, or totally and completely alien and near-unpronounceable by the human tongue. The generator will pick randomly from the phonemes that fit the criteria, or, if the criteria leave only a handful available, it'll use them all.
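
Just to sketch how that selection might go (this is illustrative Python rather than the eventual code, and the phoneme inventory, field names, and thresholds are all invented):

```python
import random

# A toy phoneme inventory. 'guttural' runs from sibilant/front (negative)
# to deep-in-the-throat (positive); 'foreign' is the Foreign Index,
# with 0 meaning comfortably English.
PHONEMES = [
    {"sound": "s",  "guttural": -3, "foreign": 0},
    {"sound": "th", "guttural": -2, "foreign": 0},
    {"sound": "m",  "guttural":  0, "foreign": 0},
    {"sound": "k",  "guttural":  2, "foreign": 0},
    {"sound": "kh", "guttural":  4, "foreign": 1},
    {"sound": "q'", "guttural":  5, "foreign": 3},
]

def pick_phoneme(low, high, max_foreign):
    """Pick a phoneme that fits the language's guttural range and its
    tolerance for foreignness; if nothing fits, fall back to everything."""
    candidates = [p for p in PHONEMES
                  if low <= p["guttural"] <= high and p["foreign"] <= max_foreign]
    if not candidates:
        candidates = PHONEMES  # criteria too tight, so use the whole list
    return random.choice(candidates)["sound"]

# Something Orcish: guttural, a touch alien.
print(pick_phoneme(low=1, high=5, max_foreign=2))
```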

Conjugations can take many forms, driven by time, gender, action, intensity, and much more, and they can happen to verbs, nouns, pronouns, adjectives, whatever. If these aren't configured by the user, they'll be chosen randomly. My conjugation routines will end up being the most complex part of the system, I imagine.
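
As a minimal sketch of 'random unless the operator says otherwise' (the dimensions and affixes below are made up for illustration):

```python
import random

# Invented conjugation dimensions and affixes, just for illustration.
DIMENSIONS = {
    "time":      {"past": "-ek", "present": "", "future": "-ora"},
    "gender":    {"neutral": "", "feminine": "-ia"},
    "intensity": {"mild": "", "strong": "du-"},
}

def conjugation_scheme(configured=None):
    """Honor whatever the operator configured; pick the rest at random,
    including the chance that a dimension isn't marked at all."""
    configured = configured or {}
    scheme = {}
    for name, affixes in DIMENSIONS.items():
        scheme[name] = configured.get(name, random.choice([None, affixes]))
    return scheme

# Pin down how 'time' works, let gender and intensity fall where they may.
print(conjugation_scheme({"time": DIMENSIONS["time"]}))
```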

Words can have structural rule sets (the way -ir and -er verbs work in Spanish, for instance), and these will be randomly picked if not configured by hand. These rule sets can apply to any type of word.
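
A structural rule set might reduce to something like this (again, the endings and affixes are invented):

```python
# Each class of word is identified by its ending and carries its own
# affixes, the way -ir and -er verbs behave differently in Spanish.
RULE_SETS = {
    "ar": {"past": "em", "plural": "ari"},
    "un": {"past": "ost", "plural": "ei"},
}

def apply_form(word, form):
    """Find the class whose ending matches the word and add its affix."""
    for ending, affixes in RULE_SETS.items():
        if word.endswith(ending):
            return word + affixes[form]
    return word  # unclassified words are left alone (the irregulars)

print(apply_form("belvar", "past"))    # belvarem
print(apply_form("tashun", "plural"))  # tashunei
```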

The English dictionary will have a 'tech level' associated with each word. If the word is primordial in origin, it's a 0; if it's a word regarding technology that transcends magic, it's a 9, with about a 6 being 'modern day'. That way I'll be able to build instant dictionaries for any time period, and won't have to weed out words like 'telephone' from my Anchorest dictionaries.
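
Filtering by tech level is the simple part, something like this (with a made-up scale and sample entries):

```python
# Source words tagged with a rough tech level: 0 is primordial,
# about 6 is modern day, 9 is technology that transcends magic.
TECH_LEVELS = {
    "fire": 0,
    "dog": 0,
    "plough": 3,
    "telephone": 6,
    "gravity engine": 9,
}

def dictionary_for_era(max_level):
    """Keep only the words a setting of this era would plausibly need."""
    return [word for word, level in TECH_LEVELS.items() if level <= max_level]

# A medieval-flavored Anchorest dictionary never even sees 'telephone'.
print(dictionary_for_era(4))  # ['fire', 'dog', 'plough']
```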

When it builds the dictionary, it will take the syllable length of the source word into account. The idea here is that word length says several things about a word. Shorter words were either created in the primordial time (fire, dog, tree, leg) or, if they were introduced more recently, they either carry cultural importance (carriage to car, or telephone to phone) or are discussed so much that they warrant foreshortening in speech (linguistic economy). So when the generator reaches 'mom' in the English dictionary, it won't try to build 'burfarginsplag' as the corresponding new word; the word's primordial origin suggests the new word should be relatively short too.
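
The length constraint could be as simple as counting vowel groups in the source word and allowing only a small wobble around that; a rough sketch, with a deliberately crude syllable counter:

```python
import random

def rough_syllables(word):
    """Crude syllable estimate: count groups of adjacent vowels."""
    vowels = "aeiouy"
    count, in_vowel = 0, False
    for ch in word.lower():
        if ch in vowels and not in_vowel:
            count += 1
        in_vowel = ch in vowels
    return max(count, 1)

def target_length(source_word):
    """Generated words stay roughly as long as their source words."""
    return max(1, rough_syllables(source_word) + random.choice([-1, 0, 0, 1]))

print(target_length("mom"))     # 1 or 2, nothing like 'burfarginsplag'
print(target_length("dragon"))  # hovers around 2
```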

Eventually, I'll be able to weather and age a language as it drifts away from the source language. I'd also like to be able to incorporate multiple sources, producing an amalgam that still retains fingerprints of each of its originals.
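
Weathering could then just be a pile of small substitution rules run over the whole lexicon, one batch per step of drift (the sound shifts below are invented):

```python
import random

# Invented sound shifts; each generation of drift applies one at random.
SOUND_SHIFTS = [("th", "d"), ("kh", "h"), ("ae", "e")]

def weather(lexicon, generations=3):
    """Drift a lexicon away from its source, one sound shift at a time."""
    for _ in range(generations):
        old, new = random.choice(SOUND_SHIFTS)
        lexicon = {meaning: word.replace(old, new)
                   for meaning, word in lexicon.items()}
    return lexicon

print(weather({"fire": "khaeth", "dog": "braek"}))
```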

In terms of the back end, I was initially going to write this using the program Tablesmith, where I have a lot of other Anchorest procedural stuff built. Though Tablesmith can do some amazing things, there are some things that just aren't going to be convenient within that framework. So I decided to go with an object-oriented database scripting language from the old MUSH days. Ultimately, this program isn't for any kind of public distribution; it's for my own linguistic sanity, so the platform it runs on isn't terribly important.

When it spits out the results, I'll go over them with my discerning eye (it's the one on the right) and weed out any problems. The results will be saved, so any changes I make will stick. Once I'm satisfied with how it looks, I'll write out a test paragraph or two using the rule set and see if anything else needs tweaking.

There are probably other solutions out there for randomizing languages. Anyone know of any? The key here is 'serviceable'. Any program can spit out random phonemes at random syllable lengths, but it takes some serious effort to make the language serviceable.
