Minecraftian Narrative: Part 4

Table of Contents:

  1. What is “Minecraftian Narrative”?
  2. Is “Toki Pona” Suitable for Narrative Scripting?
  3. Interface and Gameplay Possibilities
  4. Toki Sona Implementation Quandaries
  5. Dramatica and Narrative AI
  6. Relationship and Perception Modeling
  7. Evolution of Toki Sona to “tokawaje”

Introduction

At this point, I’ve communicated the basics of the Toki Sona language (a “story-focused” Toki Pona), its potential for simply communicating narrative concepts, and the types of interfaces and games that could exploit such a language.

This time, we’ll be diving into some of the nuts and bolts of actually interpreting Toki Sona and how it might tie into code. An intriguing array of questions comes into play due to Toki Sona’s highly interpretive semantics. The end result is a sort of exaggerated problem domain taken from Natural Language Processing. How much information should we infer from what we are given? How do we handle vague interpretations in code? And what do we do when the language itself changes through usage over time? Let’s start thinking…

Variant Details In Interpretation

What we ultimately want in a narrative engine is to be able to craft a computer system that can dynamically generate the same content that a human author would be able to create. To accomplish this, we must leverage our main tool: reducing the complexity of language to such an extent that the computer doesn’t have to compete with the linguistic nuances and artistic value that an author can imbue within their own work. Managing the degree to which we include these nuances requires a careful balancing act though.

For example, “It was a dark and stormy night…” draws into your mind many images beyond simply the setting. It evokes memories filled with emotions which an author may use to great effect in their manipulation of the audience’s emotional experience. Toki Sona’s focus on vague interpretation leaves many different ways of conveying the same concept, depending on one’s intent. Here are some English literal translations:

  • Version A: “When a black time of monstrous/fearful energy existed…”
    • tenpo-pimeja pi wawa-monsuta lon la, …
  • Version B: “This is the going time: The time is the black time. The air water travels below. As light of huge sound cuts the air above…”
    • ni li tenpo-kama: tenpo li tenpo-pimeja. telo-kon li tawa anpa. suno pi kalama-suli li kipisi e kon-sewi la, …

You’ll notice that version A jumps directly into communicating the tone that the audience should understand. As a result, it is far less particular about the scene’s physical characteristics, such as the weather.

Version B on the other hand takes the time to establish scene details with particulars (as specific as it can get, anyway). Although it takes several more statements to present the idea, it eventually equates itself loosely with the original English phrase. In this way, it manages to conjure emotions in the audience through imagery the same way the original does, but you can also tell that the impact isn’t quite as nuanced.

One of the key aspects of Toki Sona is that it is unable to include two independent phrases in a single statement. It is also unable to include anything beyond a single, adverbial dependent clause in addition to the core independent clause. These restrictions help ensure that each individual statement has a clear effect on interpretation. Only one core set of subjects and one core set of verbs may be present. Everything else is simply details for the singularly described content. As a result, a computer should be able to extract these singular concepts from Toki Sona more easily than it would a more complex language.
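To make that restriction concrete, here is a minimal sketch of how an interpreter might enforce it, assuming the Toki Pona convention that “la” marks the boundary between an adverbial dependent clause and the core clause. The function name and error handling are my own invention, not part of any existing Wyrd code.

```python
def parse_statement(statement: str):
    """Split a statement into (dependent_clause, core_clause).

    Toki Sona allows at most one adverbial dependent clause, so a
    statement containing more than one "la" boundary is rejected.
    """
    parts = statement.split(" la ")
    if len(parts) > 2:
        raise ValueError("only one adverbial dependent clause is allowed")
    if len(parts) == 2:
        return parts[0], parts[1]  # (dependent clause, core clause)
    return None, parts[0]          # no dependent clause at all

# Version A's opening line splits cleanly into its two clauses:
dep, core = parse_statement("tenpo-pimeja pi wawa-monsuta lon la ...")
```

A statement that fails this check would simply be rejected before any semantic interpretation is attempted, which is exactly what keeps each statement’s effect on the context singular.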

So while both database queries and statistical probability calculations are factors in interpreting the text, the algorithms will rely more on the probabilities due to the diminished size of the database contents (there aren’t as many vocabulary terms to track). This is also because words frequently have several divergent meanings that could be relevant to a given context. As such, algorithms will often need to re-identify meanings after the fact, once successive statements have been interpreted.

Our difficulty comes in when we must identify how interpreted statements are to be translated into understood data. Version B is far more explicit about how things are to be added, while version A relies far more heavily on the interpreter to sort things out. How many narrative elements should the interpreter assume based on the statistical chances of their relevance? The more questionable elements are added, the more items we’ll need to revisit for every subsequent statement. After all, future statements could add information that grants us new insight into the meaning of already stated terms.

To illustrate this, let’s break down in pseudocode how the interpreter might compose a scene based on these statements, starting with version B. We’ll leave English literal translations in and identify them as if they were Toki Sona terms.

Version B
contextFrames[cxt_index = 0] = cxt = new Context(); //establish 1st context

"This is the going time:" =>
contextFrames[++cxt_index] = new Context(); //':' signifies new context
cxt = contextFrames[cxt_index]; //future ideas added to new context
cxt += Timeline(Past); //Add the "time that has gone" to the context

"The time is the black time." =>
cxt += TimeOfDay(Night) //Add the "time of darkness" to the context

"The air water travels below." =>
cxt += Audio(Rain) + Visual(Rain) // Add "water of the air" visuals. Audio auto-added.

"As light of huge sound cuts the air above..." =>
cxt += {Object|Visual}(Light+(Sound+Huge)) >> Action(Cut) >> Visual(Sky+Air);
cxt += Mood(Ominous)?
...
// The scene includes a light that is often associated with loud noises. These lights (an object? A visual? Is it interactive?) are cutting across the "airs in the sky", likely clouds. All together, this combination of elements might imply an ominous mood.

Version A
contextFrames[cxt_index = 0] = cxt = new Context(); //establish 1st context

"When a black time of monstrous/fearful energy existed..." =>
cxt += TimeOfDay(Night)? + Energy(Terrifying)? + Mood(Terrifying) + Mood(Ominous)?
...
// Establish night time and presence of a terrifying form of energy in the scene. Based on these, establish that the mood is terrifying in some way with the possibility of more negatively toned content to follow soon. Possible that "monstrous energy" may imply a general feel rather than a thing, in which case "black time" may reference an impression of past events as opposed to the time of day.

To emphasize ease of use and make a powerful assistance tool, it’s best to let the interpreter do as much work as possible and then just update previous assumptions as new information is introduced. That way, even if the user inputs a small amount of information, it will feel as if the system is anticipating their meaning and understanding them effectively. To do otherwise would save significantly on processing time, but would result in far too many assumptions being made that don’t account for the full context. This would in turn result in terrible errors in interpretation. Figuring out exactly how the data is organized and how the interpreter will make assumptions will be its own can of worms that I’ll get to some other day.
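One way to imagine that “update previous assumptions” loop is to store every inferred element with a confidence value that later statements can revise. This is purely a sketch: the class names, confidence numbers, and revision rule are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Assumption:
    element: str       # e.g. "Mood(Ominous)" from the pseudocode above
    confidence: float  # 0.0 (pure guess) .. 1.0 (stated outright)

class Context:
    def __init__(self):
        self.assumptions: list = []

    def add(self, element: str, confidence: float = 1.0):
        self.assumptions.append(Assumption(element, confidence))

    def revise(self, element: str, new_confidence: float):
        # Later statements can strengthen or weaken earlier guesses
        # instead of forcing the interpreter to commit up front.
        for a in self.assumptions:
            if a.element == element:
                a.confidence = new_confidence

cxt = Context()
cxt.add("TimeOfDay(Night)", 0.6)  # tentative: "black time" might be figurative
cxt.add("Mood(Ominous)", 0.5)     # the '?' entries from the pseudocode
cxt.revise("Mood(Ominous)", 0.9)  # a later statement confirms the tone
```

The tentative entries marked with “?” in the earlier pseudocode map naturally onto low-confidence assumptions here.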

Data Representation

An additional concern is identifying the various ways that words will be understood logically as classes or typenames, hereafter “types” (for the non-programmers out there, this is the organization the computer uses to better identify the relationships and behaviors between terms). Examples in the above pseudocode include TimeOfDay, Visual and Audio elements, etc. Ideally, each of these definitions would alter the context in which characters exist, informing their decision-making and impacting the kinds of events that might trigger in the world (if anything like that should exist).

One option would be to create a data structure type for each Toki Sona word (there’d certainly be few enough of them memory-wise, so long as a short-cut script were written to auto-generate the code). Having types represent the terms themselves, however, is quite unreliable as we don’t want to have to alter the application code in response to changes in the language. Furthermore, any given word can occupy several syntactic roles depending on its positioning within a sentence, and each Toki Sona word in a syntactic role comes with a variety of semantic roles based on context.
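An alternative to a class per word is a single generic term type whose syntactic and semantic roles are plain data, loaded from an external dictionary rather than hard-coded. The field names below are assumptions of mine, but they show how a language change becomes a data edit rather than an application-code edit.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    lemma: str
    # Candidate senses keyed by syntactic role,
    # e.g. {"noun": ["air", "spirit"], "verb": ["breathe"]}.
    senses: dict = field(default_factory=dict)

# The lexicon lives in data (eventually an external dictionary file),
# so adding or re-defining a word never touches application code.
LEXICON = {
    "kon": Term("kon", {
        "noun": ["air", "wind", "breath", "atmosphere", "spirit"],
        "verb": ["breathe", "blow past"],
    }),
}
```

With this shape, the engine asks “what can ‘kon’ mean as a noun?” at runtime instead of compiling a `Kon` class whose meaning is frozen into the binary.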

For example, “kon”, the word for air, occupies a variety of meanings. As a noun, it can mean “air”, “wind”, “breath”, “atmosphere”, and even “aether”, “spirit”, or “soul” (literally, “the unseen existence”). These noun meanings are then re-purposed as other parts of speech. The verb “kon” means to “breathe” or, if being creative, “to pass by/through as if gas” / “to blow past as if the wind”. To clarify, when one says “she ‘kon’s” or “she ‘kon’ed”, one is literally saying “she ‘air’ed”, “she ‘wind’-ed”, “she ‘soul’-ed”, etc. The nouns themselves are used AS verbs, which in turn results in language conventions for interpreted meaning. You can therefore understand the interpretive variations involved, and that’s not even moving on to adjectives and adverbs! Through developing conventions, we could figure out that when a person “airs”, the semantic role is usually that the person breathes, sighs, or similar, not that they spirit away or become one with the atmosphere (those meanings are far less likely to use “kon” as a verb in the first place – probably as an adverb, if anything).
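That “usually breathes, rarely spirits away” intuition is exactly what a frequency table can encode. Here is a toy sketch of sense disambiguation by usage counts; the numbers are invented stand-ins for the kind of statistics a machine-learned dictionary would supply.

```python
# Invented usage counts: how often each sense of (word, syntactic role)
# has actually appeared in observed Toki Sona text.
SENSE_COUNTS = {
    ("kon", "verb"): {"breathe": 120, "sigh": 40, "spirit away": 2},
}

def most_likely_sense(word: str, role: str) -> str:
    """Pick the highest-frequency sense for a word in a syntactic role."""
    counts = SENSE_COUNTS[(word, role)]
    return max(counts, key=counts.get)

# "She kon'ed" resolves to "she breathed", not "she spirited away".
print(most_likely_sense("kon", "verb"))  # breathe
```

Surrounding context would then re-weight these counts (as discussed above, meanings may need re-identifying after later statements arrive), but frequency gives the interpreter a sane default.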

In the end, a computer needs to understand a definitive behavior that is to occur with a given type name. However, since the nature of this behavior is dictated by the combination of terms involved, we can understand that Toki Sona terms are meant to serve as interpreted inputs to the types. Furthermore, it seems most appropriate for types to serve two purposes: they must indicate the syntactic role the word has in a sentence, and they must indicate the functional role the word has in a context.

In the pseudocode excerpt I came up with, we chose to highlight the latter route, defining described content based on how it impacted the narrative context: is this an Audio or Visual element that will affect perception or is this a detail concerning the setting’s external details such as the TimeOfDay, etc.? In addition to this, we’ll also need to incorporate syntactic analysis to better identify what the described content will actually be (is it a noun, verb, adjective, etc.?). As mentioned before, the way a word is used will greatly affect the type of meaning it has, so the function should be built on the syntax which is in turn built on the vocabulary.

Language Evolution

In addition, a system that implements this sort of code integration should be built around the assumption that the core vocabulary and semantics will change. As it stands, we already want to give users the power to add their own custom terms to the language for a particular application. These custom terms are always re-defined using a combination of sentences made of core terms and pre-existing custom terms.
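A minimal sketch of that custom-term mechanism might look like the following: every new word must be defined entirely in terms of core words or previously defined custom words, so the application itself never needs new code to accept one. The registry shape and function name are hypothetical.

```python
# A tiny slice of the core vocabulary, for illustration only.
CORE_TERMS = {"tenpo", "pimeja", "wawa", "monsuta", "kon", "telo"}

custom_terms: dict = {}

def define(term: str, definition_sentences: list):
    """Register a custom term, defined only via known words."""
    for sentence in definition_sentences:
        for word in sentence.split():
            if word not in CORE_TERMS and word not in custom_terms:
                raise ValueError(f"undefined word in definition: {word}")
    custom_terms[term] = definition_sentences

# OK: defined purely from core terms.
define("nightstorm", ["tenpo pimeja wawa"])
# Also OK: builds on the custom term we just registered.
define("dreadnight", ["nightstorm monsuta"])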

However, because the integration of a living, breathing, and spoken language into a code base is a drastic measure, it is vital that the code be designed around the capacity for the core language to change. After all, languages are not unlike living creatures that adapt to environments, evolve to meet their needs, and strive to achieve their goals in the midst of it. In this sense, we can rest assured that players and developers alike will look forward to experimenting with and transforming this technology. This transformation will assuredly extend to the core terms, so not even the language should be tightly bound to them.

Given the lack of assurances in regard to the core terms over an extended period of time, it would behoove us to incorporate an external dictionary. It should most likely be pre-baked with statistical semantic associations derived from machine learning NLP algorithms and then fed into runtime calculations that combine with the context to narrow down the interpretation most likely to meet users’ expectations.

In simple terms, Wyrd should be given a massive list of Toki Pona (or Toki Sona, later on as it becomes available) statements periodically, perhaps with a monthly update. It should then scan through them, learn the words, and figure out what they likely mean: How frequently is “kon” used as a noun? What verbs and adjectives is it often paired with? What words is it NEVER associated with? What sorts of emotions have been associated with the various term-pairings and which are most frequent? These statistical inputs will assist the system in determining the functional and syntactic role(s) words possess. Combining this data with the actual surrounding words in context will let the application have a keen understanding of how to use them AND grant it the ability to reload this information when necessary.
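The statistics described above reduce to simple counting over a tagged corpus. Here is a rough sketch with a tiny invented corpus of (word, part-of-speech) pairs; a real pipeline would tag millions of statements, but the bookkeeping is the same.

```python
from collections import Counter
from itertools import combinations

# Invented mini-corpus: each sentence is a list of (word, pos) pairs.
corpus = [
    [("kon", "noun"), ("li", "particle"), ("tawa", "verb")],
    [("jan", "noun"), ("li", "particle"), ("kon", "verb")],
    [("kon", "noun"), ("li", "particle"), ("sewi", "adjective")],
]

pos_counts = Counter()   # how often is "kon" used as a noun vs. a verb?
pair_counts = Counter()  # which words co-occur, and how often?

for sentence in corpus:
    for word, pos in sentence:
        pos_counts[(word, pos)] += 1
    words = sorted({w for w, _ in sentence})
    for a, b in combinations(words, 2):
        pair_counts[(a, b)] += 1

print(pos_counts[("kon", "noun")])   # 2
print(pos_counts[("kon", "verb")])   # 1
print(pair_counts[("kon", "li")])    # 3
```

A word that NEVER co-occurs with another simply has a zero count, and the pre-baked dictionary is just these tables serialized, ready to be reloaded whenever the monthly update lands.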

Wyrd applications should also keep track of all Toki Sona input (if the user has volunteered it) so that it can be used as new machine learning test material. If people start using a word in a new way, and that trend develops, then the engine should respond by learning to adapt to that new usage and incorporate it into characters’ speech and applications’ descriptions. To do this, the centralized library of core terms must be updated by scanning through more recent Toki Sona literature. Ideally, we would pull this from update-electing users, generate new word data, and then broadcast this update to those same Wyrd users.

Conclusion

Well, we’ve explored some of the more in-depth programming difficulties that reside in using Toki Sona. There’ll likely be more updates in the future, but for now, this has all just been a brainstorming and analysis activity. I apologize to those of you who aren’t as tech-savvy (I tried to make things a little simpler outside of the pseudocode). From here on out, it’s likely we’ll end up dealing with things that are a bit more technical than the previous fare, but there will also be plenty of high level discussion, so worry not!

For next time, I’ll be diving into the particulars of Agents, Characters, and the StoryMind: the fundamental tools for manipulating and understanding narrative concepts!

Next Article: Dramatica and Narrative AI
Previous Article: Interface and Gameplay Possibilities
