Godot: the Game-Changer for GameDevs

Edit 1: Updated Graphics Comparison section with before/after shots of Godot and accuracy corrections.

Edit 2: There was some confusion over whether the Deponia devs used Godot for their PS4 port. I had removed that content, but now they have officially confirmed that it WAS in fact a PS4 port. Links in the Publishing section.

Edit 3: I’ve received questions regarding the preference for a 2D renderer option, so I’ve added a paragraph explaining my motivations in the Graphics section.

Edit 4: Godot lead developer Juan Linietsky mentioned in a comment a few points of advancement present in the new 3D renderer. The Graphics section has been updated with a link to his comment.

Edit 5: I have personally confirmed that the GDNative C++ scripting on Windows 10 x64 is now fully functional in a VS2015 project. Updating the Scripting section.

Edit 6: I have received new information regarding the relative performance of future Godot scripting options. I have therefore updated the scripting section.

Edit 7: Unity Technologies announced that they are phasing out UnityScript. Updating the Scripting section. Also, making a correction to Unreal’s 2D renderer description in Graphics.

Edit 8: Unreal recently added C# as a scripting language. I’ve updated the Scripting section accordingly.

Edit 9: More “official” blog posts / tutorials have been made explaining other aspects of the Godot Engine, so I am including links to those where appropriate.


I’ve been tinkering with game development for around 5 years now. I’ve worked with GameMaker: Studio (1.x, though I’ve seen 2.0), Unity, and Unreal Engine 4 along with some exploration of Phaser, Construct 2, and custom C++ engine development (SFML, SFGUI, EntityX). Through my journey, I’ve struggled to find an engine that has just the right qualities, including…

  • Graphics:
    • Dedicated 2D renderer
    • Dedicated 3D renderer
  • Accessibility:
    • Powerful scripting capabilities
    • Strong community
    • Good documentation
    • Preferred: visual scripting (for simpler designer/artist/writer tooling)
    • Preferred: Simple and intuitive scripting architecture
  • Publishing:
    • A large variety of cross-platform support (non-negotiable)
    • Continuously expanding/improving cross-platform support (non-negotiable)
  • Cost:
    • Low monetary cost
    • Indie-friendly licensing options
  • Customization:
    • Ease of extending the editor for custom tool creation
    • Capacity for powerful optimizations (likely via C++)
    • Preferred: open source
    • Preferred: ease of editing the engine itself for community bug fixes / enhancements

For each associated field, I’ll examine various features in the top engines and include a comparison with the new contender on the block: Godot Engine, a community-driven, free and open source engine that is beginning to expand its graphics rendering capabilities. Note that my comments on the Godot Engine will be in reference to the newest Godot 3.0 pre-alpha currently in development and nearing completion.

Initial Filtering Caveats

Outside of the big 3, i.e. GM:S, Unity and UE4, everything fails on the publishing criteria alone since any custom or web-based engine isn’t going to be easily or efficiently optimized for publishing to other platforms.

The new GitHub for Desktop app is an example of Electron (web browser dev-tools on right).

It is true that technologies such as Electron have been developed that ensure that it is possible to port HTML5 projects into desktop and mobile applications, but those will inherently suffer limitations in ways natively low-level engines will not. HTML5 games will always need to rely on 3rd party conversion tools in order to become available for more and more platforms. If we wish to avoid that limitation, then that leaves us with engines written in C++ that allow for scripting languages and optimizations of some sort.

GM:S’s scripting system is more oriented towards beginners and doesn’t have quite the same flexibility that C# (Unity) or Blueprint (UE4) has. In addition, GM:S has no capacity for extending the functionality of the engine or optimizing code. The latest version has a $100 buy-in (or a severely handicapped free version). Without a reasonable free-to-use option available in addition to all of the other issues, GM:S fails to meet our constraints. That leaves only Unity, Unreal, and Godot.

Graphics Comparisons

There is no doubt that Unreal Engine 4 is currently the reigning champion when it comes to graphical power with Unity coming in at a close second. In regards to 2D renderers, it should be noted that…

  1. Unity has no dedicated 2D renderer as selecting a 2D environment just locks the z-axis / orthographic view of the 3D cameras in the 3D environment.
  2. Unreal’s dedicated 2D renderer, Slate, can be leveraged through the UMG widget framework to be used in the 3D environment. In a project consisting solely of UMG content therefore, you can more or less use Unreal to get the benefits of operating solely within a 2D renderer. However, all of Unreal’s official tooling and features related to 2D are confined within the non-UMG content (collision, physics, etc.), so it’s not exactly a legitimate “support” for 2D rendering.
  3. Godot has a dedicated 2D renderer (what it started with in fact) and a dedicated 3D renderer with similar APIs and functionality for each.

For those who might wonder why a dedicated 2D renderer is even significant, the main reason is the ease of position computation. If you were to shift an object’s position in 2D, the positions (at least in Godot) are described purely in terms of pixels, so it’s a simple pair of addition operations (one for each axis). An analogous operation in a 3D renderer requires one to map from pixels to world units (a multiplication), calculate the new position in world coordinates (3 additions) and then convert from world space to screen space (a matrix multiplication, so several multiplications and additions). Things are just a lot more efficient if you have the option of working directly with a 2D renderer.
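
To put rough numbers on that, here is a small Python sketch of the two paths. It is not taken from Godot or any other engine: the function names, the `UNITS_PER_PIXEL` scale, and the simple orthographic `WORLD_TO_SCREEN` matrix are all assumptions I picked for illustration.

```python
# Illustrative only: none of these names belong to a real engine API.

def shift_2d(pos, delta):
    """Move a 2D position expressed directly in pixels: two additions."""
    return (pos[0] + delta[0], pos[1] + delta[1])


def mat_vec4(m, v):
    """Multiply a 4x4 matrix (a list of four rows) by a 4-component vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def shift_3d_to_screen(world_pos, pixel_delta, units_per_pixel, world_to_screen):
    """Move a 3D position by a pixel offset, then project it to the screen."""
    # 1. Map the pixel offset into world units (a multiplication per axis).
    world_delta = tuple(d * units_per_pixel for d in pixel_delta)
    # 2. Compute the new position in world coordinates (three additions).
    moved = tuple(p + d for p, d in zip(world_pos, world_delta))
    # 3. Convert world space to screen space (a 4x4 matrix multiplication).
    x, y, _z, w = mat_vec4(world_to_screen, moved + (1.0,))
    return (x / w, y / w)


# Assumed scale: 100 pixels per world unit, with a plain orthographic mapping.
UNITS_PER_PIXEL = 0.01
WORLD_TO_SCREEN = [
    [100.0, 0.0, 0.0, 0.0],
    [0.0, 100.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

print(shift_2d((10, 20), (5, -3)))
print(shift_3d_to_screen((0.1, 0.2, 0.0), (5, -3, 0), UNITS_PER_PIXEL, WORLD_TO_SCREEN))
```

Both calls describe the same on-screen move; the 3D path simply does far more arithmetic to arrive there.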

For the 3D rendering side, Unity and Unreal are top dogs in the industry, no doubt about it.

This is a few iterations old, but you get the idea of where they stand (very impressive)

Godot 2.x’s 3D renderer on the other hand left something to be desired. The showcased marketing materials on their website look like this:

Godot 2.x 3D demonstration, from the godotengine.org 3D marketing content

With Godot 3.0, steps are being taken to bring Godot closer to the fold. Pretty soon, we may start to see marketing materials like this:

A test demonstration of the 3.0 pre-alpha’s power, shared in the Godot Facebook group.

I’d say that’s a grand improvement, and this graphical foundation lays the groundwork for an impressive future if the words of Godot’s lead developer, Juan Linietsky, are to be taken to heart. (Edit: Juan recently published an article that goes into MUCH further detail regarding how the 3D renderer works. He also recently updated the docs for the Godot shading language)

It remains to be seen exactly how far this advancement will go, but if this jump in progress is any indication, I see potential for Godot 3.0 to enter the same domain of quality as Unity and Unreal.

Publishing Comparisons

Unity is by far the leader in publishing platform diversity with Unreal coming in second and Godot coming in last.

Unity’s cross-platform support as of July 2017


Unreal Engine 4’s cross-platform support as of July 2017

Godot’s publicly disclosable cross-platform support as of July 2017

Note that for Godot specifically, it actually has the capacity to be integrated with console platforms since it is natively written in C++; all you need is the development kit. For legal reasons, however, Godot’s free and open source GitHub cannot include (and thereby publicize freely) the integrated source code for these proprietary kits. Members of the community have proposed investigating what can be done to circumvent this issue for now. Despite this setback, the PS4 port of the game Deponia was implemented in Godot.

In addition, Godot 3.0 has recently conformed to the OpenHMD API, integrating functionality for all VR platforms that rely on that standard (so that would include HTC Vive, Oculus Rift, and PSVR).


All in all, Unity is still the clear leader here, but both Unreal and Godot provide a wealth of options for prospective developers to publish to the most notable and widespread platforms related to game development. As such, this factor tends to be somewhat irrelevant unless one is targeting release on one of the engine-specific platforms.

Licensing Comparisons

Unity’s free license permits the developer to craft projects with little-to-no feature limitations (only Unity-powered services are restricted, not the engine itself), so long as the user’s total revenue from a single game title does not exceed $100,000 (as of the time of writing). The license does limit the number of installations you can have, however, as they are linked to a “Personal Edition” of the software. If you end up exceeding the usage limits, then you must pay for a premium license that involves a $35 (to double the monetary limit) or $125 (to remove the monetary limit) monthly subscription rate.

Unreal Engine 4 likewise has a free license; however, its license has no restrictions whatsoever on the size of your team or the number of instances of the engine you are using (distinct from Unity). On the other hand, it is a revenue-sharing license in which 5% of all income over $3,000 per quarter is contributed to Epic Games.


The licensing between these two platforms therefore can be more or less beneficial depending on…

  1. How long you plan to spend developing (if using Unity professionally).
  2. How quickly you expect the revenue from your game to roll in (if it dips into UE4’s 5% cut trigger).
  3. How much total revenue you expect to make from a single game (Unity’s revenue cap per title).
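
The three factors above can be turned into a back-of-the-envelope calculation. The Python sketch below uses only the figures quoted in this section; the function names are mine, and it makes two simplifying assumptions: the Unity tier is chosen from the title's total revenue, and the subscription is paid for every month you count.

```python
def unreal_royalty(quarterly_revenue, threshold=3_000.0, rate=0.05):
    """Royalty for one quarter: 5% of the revenue above $3,000."""
    return max(0.0, quarterly_revenue - threshold) * rate


def unity_monthly_fee(title_revenue):
    """Monthly subscription implied by the tiers described above."""
    if title_revenue <= 100_000:
        return 0.0    # free license
    if title_revenue <= 200_000:
        return 35.0   # tier that doubles the revenue limit
    return 125.0      # tier that removes the revenue limit


def compare(quarterly_revenues, months_on_unity):
    """Return (unreal_total, unity_total) for one title."""
    unreal_total = sum(unreal_royalty(q) for q in quarterly_revenues)
    # Assumption: the subscription is paid for every month counted.
    unity_total = unity_monthly_fee(sum(quarterly_revenues)) * months_on_unity
    return unreal_total, unity_total


# Hypothetical title: $50,000 per quarter for a year, 24 months on Unity.
print(compare([50_000.0] * 4, months_on_unity=24))
```

Plugging in your own revenue curve and development time shows quickly which side of the trade-off you land on.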

Godot, as you might expect, has no restrictions in any capacity since it is a free and open source engine. It uses the MIT license, effectively stating that you may use it for whatever purposes you wish, personal or commercial, and have no obligation to share any resources or accumulated revenue with the engine developers in any way. You can create as many copies of the engine as you like for as many people as you like. The engine itself is developed through the support of its contributors’ generous free work and through a Patreon that is filtered by the Software Freedom Conservancy.

In this domain, Godot is the obvious winner. The trade-off therefore comes in the form of the additional tooling and effort you as a developer have to invest to develop and publish your game with Godot. This and more we shall cover in the below examinations.

Scripting Comparisons

Unity officially supports Mono-powered C#. With some tweaking, you could potentially use other .NET languages too (like F#). If you end up needing optimizations, you are restricted to the high level language’s typical methods of speeding things up. It would be more convenient and vastly more efficient if one could just directly develop C++ code that can be called from the engine, but alas, this is not the case. Unity also doesn’t have any native visual scripting tools, although there are several paid-for extensions to the engine that people have developed.

Unreal Engine 4 is gaining a stronger and stronger presence due to its tight integration of C++, the powerful, native Blueprint visual scripting language, and its recent addition of Mono C#. Blueprint is flexible, effective, and can be compiled down into somewhat optimized C++ code. Unreal C++ is an impressive concoction of its own that adds reflection and garbage collection features commonly associated with high level languages like C#.

Unreal’s Blueprint visual scripting language

It is in this area that Godot especially stands out from the others. Previous iterations of Godot have had directly implemented C++ and an in-house Python-like scripting language called GDScript. GDScript was created after the developers had already tried Python, Lua, and other scripting languages and found all of them lacking in efficiency when dealing with the unique architectural designs that the Godot Engine implements. As such, GDScript is uniquely tailored for Godot usability in the same way that Blueprints are for UE4.

Later on, a visual scripting system called VisualScript was implemented that could function equivalently to GDScript. Godot 3.0 is also including native support for Mono C# to cater to Unity enthusiasts.

Godot’s VisualScript visual scripting language

The power that truly sets Godot 3.0 apart however is its inclusion of a new C API for binding scripted properties, methods, and classes to code implemented in other languages. This API allows any native or bound language’s capabilities to be automatically integrated with every other native or bound language’s dynamically linked functionality. The dynamically linked libraries are registered as “GDNative” code that points to the bound languages’ code rather than as an in-engine script, effectively creating a foreign interface to Godot’s API. This means that properties, methods, and classes declared and implemented in one language can be used by any other language that has also been bound. Bindings of this sort have already been implemented for C++ (Windows, Mac, and Linux). Developers are also testing bindings for Python (already in beta), Nim, and D. Rust and JavaScript bindings are in the works as well, if I understand correctly.
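
As a loose mental model of that idea, and emphatically not Godot's real C API, here is a toy registry in Python. The class `NativeRegistry`, its methods, and the `Pathfinder` / `Dialogue` names are all hypothetical; the lambdas merely stand in for code that would come from different bound languages.

```python
class NativeRegistry:
    """Toy stand-in for a shared binding table (not the GDNative API)."""

    def __init__(self):
        self._methods = {}

    def register_method(self, class_name, method_name, func, language):
        # Each binding registers its implementation under a shared name.
        self._methods[(class_name, method_name)] = (func, language)

    def call(self, class_name, method_name, *args):
        # Callers look methods up by name and never see the source language.
        func, _language = self._methods[(class_name, method_name)]
        return func(*args)

    def language_of(self, class_name, method_name):
        """Report which binding supplied a given method."""
        return self._methods[(class_name, method_name)][1]


registry = NativeRegistry()
# Pretend these two callables were supplied by different language bindings.
registry.register_method("Pathfinder", "distance", lambda a, b: abs(a - b), "C++")
registry.register_method("Dialogue", "greet", lambda name: "Hello, " + name, "Python")

print(registry.call("Pathfinder", "distance", 3, 10))
print(registry.call("Dialogue", "greet", "Godot"))
```

The point is only that every caller goes through the same table, so it does not matter which language supplied the implementation.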

In comparing these various scripting options, C# will likely have better performance than GDScript, but GDScript is more tightly integrated and easier to use. VisualScript will be the least performant of these, but arguably the easiest for non-programmers to use. If raw performance is the goal, then GDNative will be the most effective (since it is literally native code), but it is the least easy to use out of these as you have to create different builds of the dynamic library for each target platform.


The “loose integration” this enables will empower any Godot developer to leverage pre-existing libraries associated with any of the bound languages such as C++’s enhanced optimizations/data structures, any C# Unity plugins that are ported to Godot, pre-existing GDScript plugins, and the massive library of powerful statistical analysis and machine learning algorithms already implemented by data research scientists in Python. With every newly added language, users of Godot will not have to resign themselves to the daunting “language barrier” that haunts game development today. Instead, they’ll be able to create applications that take advantage of every conceivable library from every language they like.

Edit: C# was recently merged in, and someone ran a comparison of the performance between GDScript, C#/Mono, and GDNative C++. In addition, here is a post I made on Reddit that goes more in-depth into the relationship between the engine’s scripting languages.

Framework Comparisons

Unity and Unreal have very similar and highly analogous APIs when it comes to the basic assets developers work with. There are the loadable spaces in the game world (the Scene in Unity or Level in Unreal). They then have component systems and a discrete root entity that is used to handle logic referring to a collection of components (the GameObject in Unity or Actor in Unreal). Loadable spaces are organized as tree hierarchies of the discrete entities (Scene Hierarchy in Unity or World Outliner in Unreal) and the discrete entities can be saved into a generalizable format that can be duplicated or inherited from (the Prefab in Unity or Blueprint in Unreal).

If you want to extend the functionality of these discrete entities, you then must create scripts for them. In Unity this is done by adding a new MonoBehaviour component within the 1-dimensional list of components associated with a game object. You can add multiple scripts and each script can have its own properties that are exported to the editor’s property viewer (the “Inspector”).

Multiples of these scripts can be added to a single GameObject if desired, but there is also no relationship defined between components.

In Unreal, a discrete entity has an Actor-level tree-hierarchy showing its components. Scripts, however, are not components themselves (although scripts can extend components too), but rather things directly added to the Actor Blueprint as a whole. An individual function may be created from scratch or extending/overloading an existing function. One can also create Blueprint scripts disassociated from any entity as an engine asset (called a Blueprint Function Library). The bad news is that Blueprints aren’t just a file you point to, i.e. you can’t just add the same script file to different Blueprints like you can with Unity’s C# files.

Components have their own hierarchy, but are merely variables in the scripting organized by context (event/function/macro) in Actors.

In Godot, things are simplified a great deal. Components, called “Nodes” are similarly organized into discrete entities that can be saved, duplicated, inherited and instanced; however, Godot sees no difference between the way a Prefab/Blueprint would organize their components and the way a Scene/Level would organize the entities. Instead, it unifies these concepts into a “scene” in its entirety, i.e. a Prefab/Blueprint is a GameObject/Actor is a Scene/Level; everything is just a gigantic set of instanceable and inheritable relationships between nodes. Scenes can be instanced within other scenes, so you might have one scene each for your bullet, your gun, your character, and your level (using them as you would a Prefab/Blueprint). Scripts to extend node functionality are attached 1-to-1 with nodes, and nodes can be cheaply added with the attached script being built-in (Saved into the scene file) or externally linked from a saved script file.
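
To show what "everything is a scene" means in practice, here is a minimal Python sketch of the idea. It is my own model, not Godot's implementation: the `Node` class, its `instance()` method, and the `bullet.gd` script name are assumptions made up for the example.

```python
import copy


class Node:
    """A node with at most one attached script and any number of children."""

    def __init__(self, name, script=None):
        self.name = name
        self.script = script   # scripts attach 1-to-1 with nodes
        self.children = []

    def add_child(self, node):
        self.children.append(node)
        return node

    def instance(self):
        """Return an independent copy of this node and its whole subtree."""
        return copy.deepcopy(self)

    def tree(self, depth=0):
        """Return the subtree as a list of indented node names."""
        lines = ["  " * depth + self.name]
        for child in self.children:
            lines.extend(child.tree(depth + 1))
        return lines


# Each "scene" is just a root node; scenes are instanced inside other scenes.
bullet = Node("Bullet", script="bullet.gd")
gun = Node("Gun")
gun.add_child(bullet.instance())
character = Node("Character")
character.add_child(gun.instance())
level = Node("Level")
level.add_child(character.instance())

print("\n".join(level.tree()))
```

Notice that there is no separate Prefab, GameObject, or Level type anywhere: the bullet, the gun, the character, and the level are all the same kind of thing.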

A screenshot of Godot 3.0 and its node system from a WIP game jam project I worked on.

This section is more or less just to demonstrate how each engine has their own way of organizing the game data and highlighting the relationships between elements of functionality. In my personal experience, I find Godot’s model to be much more intuitive to reason about and work with once preconceptions from other engines’ tropes are discarded, but to be honest, this is really just a matter of personal taste.

(Edit: for lack of another place to put this, I’m inserting here; Godot will soon be integrating the Bullet physics engine as an option you can toggle in the editor settings.)

Community and Documentation Comparisons

The most glaringly divergent quality between the engines is the documentation. Unreal’s Blueprint and C++ documentation pale in comparison to the breadth and depth of Unity’s massive array of concepts, examples, and tutorials, built both by Unity Technologies and the large community. This is a damaging blow, but wouldn’t be so bad if Unreal’s documentation were at least adequate. Unfortunately, this is not the case: Blueprints have some diversity of tutorials and documentation (nothing like Unity’s though), especially from the user base, but Unreal C++’s documentation is woefully lacking. In-house tutorials will often be several versions behind and the Q&A forums can take anywhere from a few days to weeks, months, or even over a year to get a proper response (several engine iterations later when the same issue is popping up still).

The ironic curve-ball in the situation is that Unreal Engine 4 publishes its own source code to its licensed users. One could arguably reference the source code itself in order to teach themselves UE4’s API and best practices. Unfortunately, Unreal C++ tends to be a huge, intimidating beast with custom compilation rules that are not well documented, even in code comments, and very difficult-to-follow code due simply to the complexity of the content. A typical advantage of source code-publishing projects is the capacity to spot a problem with the application, identify a fix, implement it, and submit a pull request, but the aforementioned complexity makes taking full advantage of UE4’s visible source code much more difficult for the average programmer (at least, in my experience and that of other programmers I’ve discussed it with).


Godot Engine’s documentation is stronger than Unreal’s, but still a bit weaker than Unity’s. A public document generator is used for the Godot website documentation while an in-engine XML file is used to generate the contents of the Godot API documentation. As such, anybody can easily open up the associated files and add whatever information may be helpful to users (although they are approved by the community through pull requests).

On the downside this means that it is the developers’ responsibility to learn how to use tools. On the upside, the engine’s source code is beautifully written (and therefore very easy to understand), so teaching yourself isn’t really difficult when you really have to do it; however, that is often unnecessary as the already small community is filled with developers who have created very in-depth YouTube tutorials for many major tasks and elements of the engine.

You can receive fully informative answers to questions within a few hours on the Q&A website, Reddit page, or Facebook group (so it’s much more responsive than Unreal’s community). In this sense, the community is already active enough to start approaching the breadth and depth of Unity’s documentation and this level of detail is achieved with a minute fraction of the user base. If given the opportunity, a fully grown, matured, and active Godot community could easily create documentation approaching the likes of Unity’s professional work.


So, while Unity is currently still the winner here, it is also clear from Godot’s accessibility and elegance that, given a larger community, Godot could easily expand its documentation and tutorials to meet that community’s needs.

(Edit: note that the community has been doing weekly documentation sprints in anticipation of the 3.0 release. Even with API revisions between 2.1 and 3.0, the docs have already improved by roughly 20% over the previous version’s content in the past 5 weeks alone. If you are interested in assisting, please visit the Class API contribution guide to get involved and discuss your plans / progress with the #documentation Discord channel (Discord link).)

Extension Comparisons

Due to UE4’s code complexity and Unity’s closed-source nature, both engines suffer from the disease of needing to wait for the development teams to implement new feature requests, bug fixes, and enhancements.

UE4 supposedly exposes itself for editor extensions with their Slate UI that can be coded in C++, but 1) Slate is incredibly hard to read and interpret and 2) it relies on the C++ code just to extend the editor as opposed to a simple scripting solution.

Unity does supply a strong API for creating editor extensions though. Creating certain types of C# scripts with Attributes above methods and properties can allow one to somewhat easily (with a little bit of learning) develop an understanding of how to create new tools and windows for the Unity engine. The relative simplicity of developing extensions for the editor is a prime reason why the Unity Asset Store is so replete with available options for high quality editor extensions.

Godot has an even easier interface for creating editor extensions: adding the ‘tool’ keyword to the top line of a script simply tells the script to run at design time in addition to run-time, instantly empowering the developer to understand how to manipulate their scripts for tool development: they need only apply their existing script understanding to the design-time state of the scene hierarchy.


EditorPlugin scripts can also be written to edit the engine UI and create new in-engine types. The greatest boon is that all of the logic and UI of the scripting API is the exact same API that allows them to control the logic and UI of the engine itself, allowing these EditorPlugin scripts to operate using the same knowledge already accumulated during one’s ordinary development. These qualities together make creating tools in Godot unbelievably accessible.

In a completely unexpected, but bewilderingly helpful feature, Godot also helps to simplify the process of team-based / communal extension development: all engine assets can be saved with binary files (the standard option for Unreal and Unity) OR with text-based, VCS-friendly files (.scn and .tscn, respectively). Using the latter makes pull requests and git diffs trivially simple to analyze, so it comes highly recommended.
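
To see why text-based assets diff so cleanly, here is a short Python sketch using the standard library's `difflib`. The two scene snippets are simplified, hand-written stand-ins for a text scene file (not verbatim .tscn output), and the `player.tscn` file name is hypothetical.

```python
import difflib

# Simplified stand-ins for two revisions of a text-based scene file.
before = [
    '[node name="Player" type="Node2D"]\n',
    "position = Vector2( 10, 20 )\n",
]
after = [
    '[node name="Player" type="Node2D"]\n',
    "position = Vector2( 64, 20 )\n",
]

diff = list(
    difflib.unified_diff(
        before, after, fromfile="player.tscn (old)", tofile="player.tscn (new)"
    )
)

# Keep only the added/removed lines, skipping the "---" / "+++" file headers.
changed = [
    line
    for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

print("".join(diff))
```

A reviewer sees exactly one property change on one node, which is the kind of diff a binary asset can never give you.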

Another significant difference between Unity and Godot’s extension development is the cultural shift: when looking up something in the Unity Asset Store, you’ll often times find a half-dozen or more plugins for the same feature with different APIs, feature-depth/breadth, and price points.

Godot’s culture on the other hand is one of “free and open source tools, proprietary games”. Plugins on the Godot Asset Library must be published with an open license (most of them use MIT), readily available for community enhancements and bug fixes. There is usually only 1 plugin for any given feature with a common, community-debated implementation that results in a common toolset for ALL developers working with the feature in the engine. This common foundation of developer knowledge and lack of any cost makes integrating and learning Godot plugins a joy.



Given a desire for high accessibility, a strong publishing and community foundation, minimal cost, powerful optimizations, and enhanced extensibility, I believe I’ve made the potential of Godot 3.0’s effect on the game industry quite clear. If offered a chance, it could become a new super-power in the world of top-tier game engines.

This article is the result of my working with the Godot 3.0 pre-alpha for approximately 3 months. I had never investigated it before, but was blown away by the engine when I first started working with it. I simply wished to convey my experience as a C++ programmer and my insight into what the future of Godot might hold. Hopefully you too will be willing to at least give it a try.

Who knows? You might find yourself falling in love all over again. I know I did.


Minecraftian Narrative: Part 7

Table of Contents

  1. What is “Minecraftian Narrative”?
  2. Is “Toki Pona” Suitable for Narrative Scripting?
  3. Interface and Gameplay Possibilities
  4. Toki Sona Implementation Quandries
  5. Dramatica and Narrative AI
  6. Relationship and Perception Modeling
  7. Evolution of Toki Sona to “tokawaje”


Unlike previous iterations of this series, today we’ll be diving into the field of linguistics a bit more intensely. The reason for my lack of any new posts in a month and a half has been the result of my work on a completely new language which is now approaching an alpha state (at which point, I will theoretically be able to build a full parser for it). Today, I’ll be covering why I decided to invent a language, where it came from, how it is different, and how it all ties into the overall goal of narrative scripting.

Future posts will most certainly reference this language, so if you aren’t interested in the background and just want the TL;DR of the language features and relevance to narrative scripting, then feel free to skip on down to the conclusion where I will review everything.

Without further ado, let’s begin!

Issues With “Toki Sona”

Prior to this post, I had puffed up the possibilities of using a toki pona-derived language, heretofore referred to as toki sona. While I was quite excited about tp’s potential to combine concepts together and support a minimal vocabulary with simple pronunciation and an intuitive second-hand vocabulary (e.g. “water-enclosedConstruction” = bathroom), there were also a variety of issues that forced me to reconsider its adaptation towards narrative scripting.


First and foremost is the ambiguity within the language. Syntactic ambiguity makes it nearly impossible for an algorithm to easily understand what is being stated, and tp has several instances of this lack of clarity. For example, “mi moku moku” could mean “the hungry me eats” or “I hungrily eat” or even some new double-word emphasis that someone is experimenting with: “I really dove into eating [something]” or “I am really hungry”. With the language unable to clearly distinguish modifiers and verbs from each other, identifying parts of speech and therefore the semantic intent of a word and its relationship to other words is needlessly complicated.

The one remedy I thought of for this would be to add hyphens in between nouns/verbs and their associated modifiers, not only allowing us to instantly disambiguate this syntactic confusion, but also to simplify and accelerate the computer’s parsing with a clear delimiter (a special symbol to separate two ideas). However, this solution is not audibly communicable during speech and therefore retains all of these issues in spoken dialogue, violating our needs. Using an audible delimiter would of course be completely impractical.
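
For the written form at least, the hyphen idea is easy to sketch. The Python below is my own toy parser, not an existing toki pona tool; the `parse` function and the convention that the head word comes first with its modifiers hyphenated after it are assumptions for the example.

```python
def parse(sentence):
    """Split a sentence into (head, modifiers) groups.

    Spaces separate groups; hyphens attach modifiers to the head before them.
    """
    groups = []
    for chunk in sentence.split():
        head, *modifiers = chunk.split("-")
        groups.append((head, modifiers))
    return groups


# "the hungry me eats": the first "moku" modifies "mi".
print(parse("mi-moku moku"))
# "I hungrily eat": the second "moku" modifies the verb "moku".
print(parse("mi moku-moku"))
```

The two readings of “mi moku moku” now produce different structures, which is all the parser needs; the trouble, as noted, is that none of this survives being spoken aloud.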

The other problem of ambiguity with the language is the intense level of semantic ambiguity present due to the restricted nature of the language’s vocabulary. The previously mentioned “bathroom” (“tomo telo”) could also be a bathhouse, a pool, an outhouse, a shower stall, or any number of other related things. Sometimes, the distinction is minor and unimportant (bathroom vs. outhouse), but other times that distinction may be the exact thing you wish to convey. What happens then if we specify “bathroom of outside”?


One possibility is the use of “ma” meaning “the land, the region, the outdoors, the earth”, but then we don’t know if this is an outdoor bathroom or if it is the only indoor bathroom in the region, or if it is just a giant pit in the earth that people use. The other possibility could be “poka” meaning “nearby” or “around”, but that is even more unclear as it specifies a request purely for an indoor bathroom of a given proximity.

As you can see, communicating specific concepts is not at all tp’s specialty. In fact, it goes against the very philosophy of the language: the culture supported by its creators and speakers is one that stresses the UNimportance of knowing such details.

If you ever happen to speak with a writer, however, they will tell you the importance of words, word choice, and the evocative nature of speech. They can make you feel different emotions and manipulate the thoughts of the reader purely through the style of speech and the nuanced meanings of the terminology they have used. If we are to support this capacity in a narrative scripting language, we cannot be allowed to build its foundation on a philosophy prejudiced against good writing.

Lost and Confused Signpost

The final issue, related to the philosophy, is the grammatical limitations imposed by its syntax.

  1. Inter-sentence conjunctions like English’s FANBOYS (“I was tired, yet he droned on.”) are not entirely absent thankfully: they have an “also” (“kin”) and a “but” (“taso”) that can be used to start the following sentence and relate two ideas. One can even use adverbial phrases (the only types of dependent clauses allowed) to assist in relating ideas. However, limitations are still present, and the reason for that is a mix of the philosophy and the (admirable) goal of keeping the vocabulary size compact.
  2. You cannot satisfactorily describe a single noun with adjectives of multiple, tiered details (“I like houses with toilet seats of gold and a green doorway”). This is a problem many have attempted to deal with revolving around the “noun1 pi multi-word modifier” technique that converts a set of words into an adjective describing noun1. Users of tp have debated on ways of combating this. One that I had considered, and which is mildly popular, was re-appropriating the “and” conjunction for nouns (“en”) as a way of connecting pi’s, but because you effectively need open and closed parentheses to accomplish the more complex forms of description, there isn’t really a clean way of handling this.

Through all of the limitations, prejudice against writing, and ambiguity, toki pona, and any language closely related to it, is inevitably going to find itself wanting in viability for narrative scripting. Time to move on.

Lojban: Let’s Speak Logically!

In an attempt to solve the problems of toki pona, a friend recommended to me that I check out the “logical language”, Lojban (that ‘j’ is soft, as in “beige”).

Lojban is unlike any spoken language you have ever learned: it borrows much of its syntax from functional programming languages like Haskell. Every phrase/clause is made up of a single word indicating a set of relationships and the other words are all things that act as “parameters” by plugging in concepts for the relations.


For example, “cukta” is the word for book. If you simply use it on its own, it plugs “book” into the parameter of another word. However, that’s not all it means. In full, “cukta” means…

x1 is a book containing work x2 by author x3 for audience x4 preserved in medium x5

If you were to have tons of cukta’s following each other (with the proper particles separating them), having a full version would mean…

A book is a book about books that is written by a book for books by means of a book.

You can also specifically mark which “x” a given word is supposed to plug in as, without needing to use any of the other x’s, so the word cukta can also function as the word for “topic”, “author”, “audience”, and “medium” (all in relation to books). If that’s not conservation of vocabulary, I don’t know what is.

It should be noted that 5-parameter words in Lojban are far more rare than simpler ones with only 2 or 3 parameters. Still, it’s impressive that, using this technique, Lojban is able to communicate a large number of topics using a compressed vocabulary, and yet remain extremely explicit about the meaning of its words.
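To make the place-structure idea concrete, here is a minimal Python sketch. The `Predicate` class and its `fill` method are hypothetical names invented for illustration, not part of any Lojban tooling; the only data taken from the text is the five-place definition of “cukta”.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """A Lojban-style relation word with numbered places (x1, x2, ...)."""
    word: str
    template: str  # uses {x1}, {x2}, ... as placeholders

    def fill(self, **places: str) -> str:
        # Any place left unfilled falls back to a vague "something".
        values = {f"x{i}": "something" for i in range(1, 6)}
        values.update(places)
        return self.template.format(**values)


# Place structure of "cukta" as quoted in the text.
cukta = Predicate(
    word="cukta",
    template=(
        "{x1} is a book containing work {x2} by author {x3} "
        "for audience {x4} preserved in medium {x5}"
    ),
)

print(cukta.fill(x1="this", x3="Alice"))
```

One word carries all five slots, and the caller decides which of them to fill. That is the sense in which a single learned word can stand in for “topic”, “author”, “audience”, and “medium”.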

Just as important to notice is how Lojban completely does away with the concept of a “noun”, “verb”, “object”, “preposition”, or anything of the sort. Concepts are simply reduced to a basic entity-relation-entity form: entity A has some relationship x? to entity B. This certainly makes things easier for the computer. In addition, while on the one hand one might think this would make things easier to understand when learning (since it is a much simpler system), the fact that it is so vastly different from the norm means that people coming from more traditional languages will have a more difficult time understanding this system, especially given the plurality of relationships that are possible with a single word.


Another strong advantage of Lojban is that it is structured to provide perfect syntactic clarity to a computer program and can be completely parsed by a computer in a single pass. In layman’s terms, it means that the computer only needs to “read” the text one time to understand with 100% accuracy the “parts of speech” of every word in a sentence. There is no need for it to guess how a word is going to be syntactically interpreted.

In addition, Lojban employs a strict morphological structure on its words to indicate their meaning. For example, each of these “root set” words like “cukta” has one of the two following patterns: CVCCV or CCVCV (C being “consonant” and V being “vowel”). This makes it much easier for the computer to pick out these words in contrast to other words such as particles, foreign words, etc. Every type of word in the language conforms to morphological standards of a similar sort. The end result is that Lojban parsers, i.e. “text readers”, are very fast in comparison to those for other languages.
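As a rough illustration of why such shapes are cheap for a computer to check, here is a small Python sketch using a regular expression. It only tests the two letter patterns named above; real Lojban also restricts which consonant pairs may sit next to each other, and this sketch deliberately ignores that. The function name `looks_like_root` is an assumption made for the example.

```python
import re

VOWELS = "aeiou"
CONSONANTS = "bcdfgjklmnprstvxz"  # Lojban's consonant letters

# The two root-word shapes named above: CVCCV and CCVCV.
ROOT_SHAPE = re.compile(
    rf"^(?:[{CONSONANTS}][{VOWELS}][{CONSONANTS}]{{2}}[{VOWELS}]"
    rf"|[{CONSONANTS}]{{2}}[{VOWELS}][{CONSONANTS}][{VOWELS}])$"
)


def looks_like_root(word: str) -> bool:
    """True if the word has one of the two five-letter root shapes."""
    return ROOT_SHAPE.match(word) is not None


for candidate in ("cukta", "skami", "pilno", "iu"):
    print(candidate, looks_like_root(candidate))
```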

One more great advantage of Lojban is that it has these terrifically powerful words called “attitudinal indicators” that allow one to communicate a complex emotion using words on a spectrum. For example, “iu” is “love”, but alternative suffixes give you “iucu’i” (“lack of love”, a neutral state) and “iunai” (“hate/fear”, the opposite state). You can even combine these terms to compose new emotions like “iu.iunai” (literally “love-hate”).


For all of these great elements though, Lojban has two aspects that make it abhorrent to use for the simple narrative scripting we are aiming for. The first is that it is too large of a language: 1,350 words just for the “core” set that allows you to say reasonable sentences. While this is spectacularly small for a traditional language, in comparison to toki pona’s nicely compact 120, it is unacceptably massive. As game designers, we simply can’t expect people to devote the time needed to learn such a huge language within a reasonable play time.

The other damaging aspect is the sheer complexity of the language’s phonology and morphology. When someone wishes to invent a new word using the root terms, they essentially mash them together end-to-end. While this would be fine alone, switching letters around and having part of the latter consumed by the end of the former is unfortunately very difficult to follow. For example…

skami = “x1 is a computer used for purpose x2”
pilno = “x1 uses/employs x2 [tool, apparatus, machine, agent, acting entity, material] for purpose x3.”
skami pilno => sampli = “computer user”

Because “skami pilno” was a commonly occurring phrase in Lojban’s usage, a new word with the “root word” morphology can be invented by combining the letters. Obviously, this appears very difficult to do on the fly and effectively involves people learning an entirely new word for the concept.

All that to say that Lojban brings some spectacularly innovative concepts to the table, but due to its complex nature, fails to inspire any hope for an accessible scripting language for players.

tokawaje: The Spectral Language

We need some way of combining the computer-compatibility of Lojban with the elegance and simplicity of toki pona that omits as much ambiguity as possible, yet also allows the user to communicate as broadly and as specifically as needed using a minimal vocabulary.

Over the past month and a half, I’ve been developing just such a language, and it is called “tokawaje”. An overview of the language’s phonology, morphology, grammar, and vocabulary, along with some English and toki pona translations, can be found on my commentable Google Sheets page here (concepts on the Dictionary tab can be searched for with “abc:” where “abc” is the 3-letter root of the word). With grammar and morphology concepts derived from both Lojban and toki pona, and with a minimal vocabulary sized at 150 words, it approximates a toki pona-like simplicity with the potential depth of Lojban. While it is still in its early form, allow me to walk through the elements of tokawaje that capture the strengths of the other two while avoiding their pitfalls.

Lojban has three advantages that improve its computer accessibility:

  1. The entity-relation-entity syntax for simpler parsing and syntactic analysis.
  2. Morphological and grammatical constraints: the word and grammar structure is directly linked to its meaning.
  3. The flexibility of meaning for every individual learned word: “cukta” means up to 5 different things.

toki pona has two advantages that improve its human accessibility:

  1. Words that are not present in the language can be estimated by combining existing words together and using composition to construct new words. This makes words much more intuitive.
  2. It is extremely easy to pronounce words due to its mouth-friendly word construction (every consonant must be followed by a single vowel).


“tokawaje” accomplishes this by…

  1. Using a similar, albeit heavily modified entity-relation-entity syntax.
  2. Having its own set of morphological constraints to indicate syntax.
  3. Using words that represent several things that are associated with one another on a spectrum.
  4. Relying on toki pona-like combinatoric techniques to compose new words as needed.
  5. Using a phonology and morphology focused on simple sound combinations that are easily pronounced. Must match the pattern: VCV(CV)*(CVCV)*.

Now, once more, but with much more detail:

1) Entity-Relation-Entity Syntax

Sentences are broken up into simple 1-to-1 relations that are established in a context. These contexts contain words that each require a grammar prefix to indicate their role in that context. After the prefix, each word then has some combination of concepts to make a full word. Concepts are each composed of some particle/tag or root (some spectrum of topic/s) followed by a precision marker that indicates the exact meaning on that spectrum.

The existing roles are… (pronounced like in Spanish):

  1. prefix ‘u’: a left-hand-side entity (lhs) similar to a subject.
  2. prefix ‘a’: a relation similar to a verb or preposition.
  3. prefix ‘e’: a right-hand-side entity (rhs) similar to an object.
  4. prefix ‘i’: a modifier for another word, similar to an adjective or adverb.
  5. prefix ‘o’: a vocative marker, i.e. an interjection meant to direct attention.

Sentences are composed of contexts. For example, “I am real to you,” is technically two contexts. One asserts that “I am real” while the other asserts that “my being real” is in “your” perspective. This nested-context syntax is at the heart of tokawaje.

These contexts are connected with each other using context particles:

  1. ‘xa’ (pronounced “cha”) meaning opening a new context (every sentence silently starts with one of these).
  2. ‘xo’ meaning close the current context.
  3. ‘xi’ meaning close all open contexts back to the original layer.

(These also must each be prefixed with a corresponding grammar prefix)

Examples of Concept Composition:

  1. ‘xa’ = an incomplete word composed of only a particle+precision.
  2. “uxa” = a full word with a concept composed of a grammar prefix and a particle+precision.
  3. “min” = root for pronouns, “mina” = “self”, full “umina” = “I”.
  4. “vel” = root for “veracity”, “vela”= “truth”, full “avela” = “is/are”.
  5. “sap” = root for object-aspects, “sapi” = “perspective”, full “asapi” = “from X’s perspective”.

Sample Breakdown:

“I am real to you” => “umina avela evela uxo asapi emino.”

  1. “umina” {u: subject, min/a: “pronoun=self”}
  2. “avela” {a: relation, vel/a: “veracity=true”}
  3. “evela” {e: object, vel/a: “veracity=true”}
  4. “uxo” {u: subject, xo: “context close”} // indicating the previous content was all a left-hand-side entity for an external context.
  5. “asapi” {a: relation, sap/i: “aspect=perspective”}
  6. “emino” {e: object, min/o: “pronoun=you”}


It’s no coincidence that the natural grammatical breakdown of a sentence looks very much like JSON data (web API anyone?). In reality, it would be closer to…

{ prefix: ‘u’, concepts: [ [“min”, “a”] ] }

…since the meanings would be stored locally between client and server devices.
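To show how the sample sentence could end up as nested JSON, here is a minimal Python sketch. It rests on one reading of the breakdown above: a word whose body is “xo” closes everything said so far and hands it to the enclosing context in the role given by its prefix. The function name `fold` and the `prefix`/`body`/`context` keys are assumptions for illustration, the sketch does not handle “xa” or “xi”, and each word body is left unsplit.

```python
import json


def fold(sentence):
    """Fold a flat sentence into nested contexts, handling only 'xo'."""
    current = []
    for word in sentence.rstrip(".").split():
        prefix, body = word[0], word[1:]
        if body == "xo":
            # Everything so far becomes one entity of the enclosing context.
            current = [{"prefix": prefix, "context": current}]
        else:
            current.append({"prefix": prefix, "body": body})
    return current


print(json.dumps(fold("umina avela evela uxo asapi emino."), indent=2))
```

The outer list then holds three entries: the closed “I am real” context as the left-hand side, the “asapi” relation, and “emino” as the right-hand side.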

This is DIFFERENT from Lojban in the sense that no single concept will encompass a variety of relations to other words, but it is SIMILAR in that the concept of a “subject”/”verb”/”object” structure isn’t technically there in reality. For example:

“umina anisa evelo” => “I -inside-> lie” => “I am inside a lie.”

In this case, “am inside” isn’t even a verb, but purely a relation simulating an English prepositional phrase where no “is” verb is technically present.

These contexts can be used without a complete context to create gerunds, adjective phrases, etc. For example, to create a gerund left-hand-side entity of “existing”, I might say

“avela uxo avela evelo.” => “Existing is (itself) a falsehood.”


You might ask, “How do we tell the difference with something like [uxavela]? Might it be {u: subject, xav/e: something, la: something}?” Actually, no. The reason the computer can immediately understand the proper interpretation is tokawaje’s second Lojban incorporation:

2) Strict Morphological Constraints for Syntactic Roles

Consonants are split up into two groups: those reserved for particles, such as ‘x’ and those reserved for roots, such as ‘v’. The computer will always know the underlying structure of a word’s morphology and consequent syntax. Therefore, given the word “uxavela” we will know with 100% certainty that the division is u (has the V-form common to all prefixes), xa (CV-form of all particles), and vela (CVCV-form of all roots).

Particles can be split up into two categories based on their usual placement in a word.

  1. Those that are usually the first concept in a word.
    1. ‘x’ = relating to contexts (as you have already seen previously)
      1. ‘xa’ = open
      2. ‘xo’ = close
      3. ‘xi’ = cascading close
      4. ‘xe’ = a literal grammar context (to talk about tokawaje IN tokawaje)
    2. ‘f’ = relating to irrelevant and/or non-tokawaje content
      1. ‘fa’ = name/foreign word with non-tokawaje morphology constraints
      2. ‘fo’ = name/foreign word with tokawaje morphology constraints
      3. ‘fe’ = filler word for something irrelevant
  2. Those that are usually AFTER a concept as a suffix (could be mid-word).
    1. ‘z’ = concept manipulation
      1. ‘za’ (zah) = shift meaning more towards the ‘a’ end of the spectrum
      2. ‘zo’ (zoh) = shift meaning more towards the ‘o’ end of the spectrum
      3. ‘zi’ (zee) = the source THING that assumes the left-hand-side of this relation.
        1. Ex. “uvelazi” => that which is something
        2. Shorthand for “ufe avela uxo”.
      4. ‘ze’ (zeh) = the object THING that assumes the right-hand-side of this relation.
        1. Ex. “uvelaze” => that which something is.
        2. Shorthand for “avela efe uxo”.
      5. ‘zu’ (as in “food”) = questioning suffix
      6. ‘zy’ (zai) = commanding suffix
      7. ‘zq’ (zow) = requesting suffix
    2. ‘c’ = tensing, pronounced “sh”
      1. ‘ca’ = future tense
      2. ‘ci’ = progressive tense
      3. ‘co’ = past tense
    3. ‘b’ = logical manipulation
      1. ‘be’ = not
      2. ‘ba’ = and
      3. ‘bi’ = to (actually, it is the “piping” functionality in programming, if you know about that)
      4. ‘bo’ = or
      5. ‘bu’ = xor

All other consonants in the language fall into the “root word” set. With these clear divisions, tokawaje will always know what role a concept has in manipulating the meaning of that word.
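The division rule described in this section can be sketched as a short Python function. This is an illustrative sketch rather than the actual tokawaje parser: the name `segment` is hypothetical, and treating ‘y’ and ‘q’ as vowel letters is an assumption inferred from “zy”, “zq”, and “sysa”.

```python
PREFIXES = set("uaeio")
VOWELS = set("aeiouyq")            # assumption: 'y' and 'q' act as vowels
PARTICLE_CONSONANTS = set("xfzcb")  # consonants reserved for particles


def _alternates(chunk):
    # Consonant at even positions, vowel at odd positions.
    return all((ch in VOWELS) == (i % 2 == 1) for i, ch in enumerate(chunk))


def segment(word):
    """Split a word into (prefix, [(tag_or_root, precision), ...])."""
    if not word or word[0] not in PREFIXES:
        raise ValueError(f"missing grammar prefix: {word!r}")
    prefix, rest, concepts = word[0], word[1:], []
    while rest:
        # Particles are C+V; roots are CVC plus a precision vowel.
        size = 2 if rest[0] in PARTICLE_CONSONANTS else 4
        chunk = rest[:size]
        if len(chunk) < size or not _alternates(chunk):
            raise ValueError(f"malformed concept {chunk!r} in {word!r}")
        concepts.append((chunk[:-1], chunk[-1]))
        rest = rest[size:]
    return prefix, concepts


print(segment("uxavela"))
print(segment("uvelominacoze"))
```

Because the first consonant of each concept already says whether two or four letters follow, the function never backtracks: it reads each word once, left to right.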

I’d also like to point out that informal, conversational uses of these two groups of particles may completely remove the distinction between them. For example, someone may simply say:

“uzq” => “Please.”

This would not actually impact the computer’s capacity to distinguish terms though. I even plan to make my own parser assume that lack of a grammar prefix implies an intended ‘u’ prefix (not that that’s encouraged).

3) Concepts in Tokawaje Exist on Spectra

Almost every word in the language has exactly 4 meanings. Only 3 kinds of non-root concepts use more than that: the grammar prefixes and ‘z’-based word manipulators (as you’ve already seen), and general expressive noises / sound effects, which are vowel-only. This technique allows for vocabulary that is flexible, yet intuitive, despite its initial appearance of complexity.

4) Sounds and Structure are Designed for Clear, Flowing Speech

Every concept is restricted to a form that facilitates clear pronunciation and a consistent rhythm. Together, these elements ensure that the language is simple to learn phonetically.

Concepts have the form C (particles/tags) or CVC (roots) along with a vowel grammar prefix and a vowel precision suffix, resulting in a minimum word of VCV or VCVCV.

The rhythm to concepts emphasizes the middle CV: u-MI-na, a-VE-la, etc. Even with suffixes applied to words, this pattern never becomes unmanageable. The result is a nice, flowy-feeling language:

  1. uvelominacoze / avelominaco (“velomina” => a personal falsehood)
    1. u-VE-lo-MI-na-CO-ze (that which one lied to oneself about)
    2. a-VE-lo-MI-na-co (to lie to oneself in the past)


5) Tokawaje Employs Tiered Combinatorics to Invent New Concepts

The first concept always communicates the root “what” of a thing while the subsequent concepts add further description of the thing. This structure emulates toki pona’s noun-combining mechanics.

‘u’, ‘a’, and other non-‘i’ terms are primary descriptors and more closely adhere to WHAT a thing is. ‘i’ terms are secondary descriptors and approximate the additional properties of a thing BEYOND simply WHAT it is. Fundamentally, every concept follows these simple rules:

  1. Non-‘i’ words are more relevant to describing their role’s reality than ‘i’ words.
  2. However, individual words are described more strongly by their subsequent ‘i’ words than they are by other terms.
  3. Multiple non-‘i’ words will further describe that non-‘i’ term such that later non-‘i’ words act as descriptors for the next-left non-‘i’ word and its associated ‘i’ words.

Let’s say I have the following sentence (I’ll be using the filler particle “fe” with an artificially inserted number to reference more easily. Think of each of these as a root+precision CVCV form):

“ufe1fe2 ife3fe4 ife5 ufe6fe7 ife8 avela uxofe9 ife10 afe11”

This can be broken down in the following way:

  1. Any pairing of adjacent fe’s form a compound word in which the second fe is an adjective for the previous fe, but the two of them together form a single concept. For example, “ife3fe4”: fe4 is modifying fe3, but the two together form an adjective modifying the noun “ufe1fe2”.
  2. The subject is primarily described by “ufe1fe2” and secondarily by “ufe6fe7” since they are both prefixed with ‘u’, but one comes later. “ufe6fe7” is technically modifying “ufe1fe2”, even if “ufe1fe2” is also being more directly modified by the ‘i’-terms following it.
  3. Each of those ‘u’ terms is additionally modified by its adjacent ‘i’-term adjective modifiers.
  4. “ife5” is an adverb modifying “ife3fe4”.
  5. The “existence of ” the “ufe1-8” entity is the u-term of the “afe11” relation.
  6. The entirety of that u-term has a primary adjective descriptor of “-fe9” and a secondary adjective descriptor of “ife10”.
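The first step of that breakdown, deciding which ‘i’ words travel with which head word, can be sketched in a few lines of Python. This is only a grouping pass under a simplifying assumption: every run of ‘i’-prefixed words is attached to the nearest preceding non-‘i’ word, and the sketch does not decide whether a given ‘i’ word modifies the head or the ‘i’ word before it. The name `group_modifiers` is hypothetical.

```python
def group_modifiers(sentence):
    """Group each run of 'i'-prefixed words under the word they follow."""
    groups = []
    for word in sentence.split():
        if word.startswith("i") and groups:
            # Secondary descriptor: attach to the most recent head word.
            groups[-1][1].append(word)
        else:
            # Primary term (or a leading 'i' word with nothing to attach to).
            groups.append((word, []))
    return groups


sample = "ufe1fe2 ife3fe4 ife5 ufe6fe7 ife8 avela uxofe9 ife10 afe11"
for head, modifiers in group_modifiers(sample):
    print(head, modifiers)
```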


Suppose the word for “dog” were “uhumosoviloja” (u/lhs,humo/beast,sovi/land,loja/loyal = “loyal land-beast”). How might you describe a disloyal dog then? You would use an ‘u’ for stating it is a dog (that identifying aspect) and an ‘i’ for the disloyalty (the added on description). The spectrum of loyalty (“loj”) would therefore show up twice.

“uhumosoviloja ilojo” => “disloyal dog”

For clarity purposes, you may even split up the “loja”, but that wouldn’t impact the meaning since “uloja” still has a higher priority than “ilojo”.

“uhumosovi uloja ilojo” => “disloyal dog” (equivalent)

Let’s say there were actual distinctions between words though. How about we take the noun phrase “big wood box room”? Here’s the necessary vocabulary:

“sysa” => “big/large/to be big/amount”
“lijavena” => “rigid thing of plants” => “wood/wooden/to be wood”
“tema” => “of or relating to cubes”
“kita” => “of or relating to rooms and/or enclosed spaces”

Now let’s see some adaptations:

  1. ukitasysa => an “atrium”, a “gym”, some space that, by definition, is large.
  2. ukita usysa => same thing.
  3. ukita isysa => a room that happens to be relatively big.
  4. ukita utema ulijavena usysa => cube room of large-wood.
  5. ukita utema ulijavena isysa => cube room of large-wood.
  6. ukita utema ilijavena usysa => room of wooden large-boxes.
  7. ukita itema ulijavena usysa => a cube-shaped room of large-wood.
  8. ikita utema ulijavena usysa => [something] related to rooms that is cube-shaped, wooden, and large.
  9. ukita utema ilijavena isysa => a room of large-wood cubes.
  10. ukita itema ilijavena usysa => the naturally large room associated with wooden cubes.
  11. ikita itema ulijavena usysa => [something] related to cube-shaped rooms that is a large-plant.
  12. ukita itema ilijavena isysa => the room of large-wood boxes.
  13. ikita itema ilijavena usysa => [something] related to plant-box rooms that is an amount. (an inventory of greenhouses or something?)
  14. ikita itema ilijavena isysa => [something] related to rooms of large-wood boxes.
  15. ukita usysa utema ilijavena => large room of wooden boxes.
  16. ukita utema usysa ilijavena => room of wood-amount boxes.
  17. ukita utemasysa ilijavena => room of wooden big-boxes.
  18. ukita usysa iba ulijavena itema => A big-room and a cube-related plant.


Some of these are a little crazy and some of them are amazingly precise. The point is, we are achieving this level of precision using a vocabulary closer to the scope of toki pona. I can guarantee you that you would never have been able to say any of this in a language as vague as TP, nor will TP ever try to approximate this level of clarity. I can likewise guarantee that Lojban will never have a minified version of itself available for video games. Good thing we don’t need one: we have an alternative.


As you can see, tokawaje combines the breadth, depth and computer-accessibility of Lojban with the simplicity, intuitiveness, and human-accessibility of toki pona.

For those of you wanting the TL;DR:

The invented language, tokawaje, is a spectrum-based language. Clarity of pronunciation, compactness of vocabulary (150 words), and combinatoric techniques to invent concepts all lend the language to great accessibility for new users of the language. On the other hand, a sophisticated morphology and grammar with clear constraints on word formations, sentence structure, and their associated syntax and semantics result in a language that is well-primed for speedy parsing in software applications.

More information on the language can be found on my commentable Google Sheets page here (concepts on the Dictionary tab can be searched for with “abc:” where “abc” is the 3-letter root of the word).

This is definitely the longest article I’ve written thus far, but it properly illuminates the faults with pre-existing languages and addresses the potential tokawaje has to change things for the better. Please also note that tokawaje is still in an early alpha stage and some of its details are liable to change at this time.

If you have any comments or suggestions, please let me know in the comments below or, if you have specific thoughts that come up while perusing the Google Sheet, feel free to comment on it directly.

Next time, I’ll likely be diving into the topic of writing a parser. Hope you’ve enjoyed it.


Next Article: Coming Soon!
Previous Article: Relationship and Perception Modeling

Minecraftian Narrative: Part 6

Table of Contents

  1. What is “Minecraftian Narrative”?
  2. Is “Toki Pona” Suitable for Narrative Scripting?
  3. Interface and Gameplay Possibilities
  4. Toki Sona Implementation Quandries
  5. Dramatica and Narrative AI
  6. Relationship and Perception Modeling
  7. Evolution of Toki Sona to “tokawaje”


Previously, we identified two narrative AIs: the StoryMind that manages story development and content generation behind the scenes, and the Agent that simulates the behaviors of a character. The Agent consults with a Character while interpreting narrative scripting input. It then relays instructions to the Vessel that executes those instructions in the virtual world on behalf of the Character. Today, we’ll explore how an Agent could model socio-cultural constructs, account for multiple layers of interactive perceptions, and integrate narrative scripting into each of these.

Vessel = gameplay logic, Character = personnel record, Agent = interpretation AI logic

Amorphic Relationship Abstractions

In games, programmers will often construct objects in the game world based on a flexible design called the “Component” design pattern. This technique builds game objects less by focusing on a hierarchy (a Dragon is a Beast is a Physical is a Renderable is an Object), and more by attributing generic qualities to them which can be added or removed as needed. The objects then simply function as amorphous containers for these “components” of behavior. You don’t have a “dragon”, you have a “fire-breathing”, “flying”, “intelligent” “serpentine” and “animalistic” object that “occasionally attacks cities” which we simply label as a dragon. A player could then talk with the dragon and convince it to become more peaceful mid-game. The “Component” system is what allows us to dynamically change the dragon Character by simply removing the city-attacking behavior.
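Here is a minimal Python sketch of that idea. The `GameObject` class and the component names are invented for this example and are not taken from any particular engine; real engines attach richer component objects than the plain callables used here.

```python
class GameObject:
    """An amorphous container: behavior comes only from attached components."""

    def __init__(self, label):
        self.label = label
        self.components = {}

    def add(self, name, behavior):
        self.components[name] = behavior

    def remove(self, name):
        self.components.pop(name, None)

    def act(self):
        # Run every behavior currently attached, in the order it was added.
        return [behavior() for behavior in self.components.values()]


dragon = GameObject("dragon")
dragon.add("fire-breathing", lambda: "breathes fire")
dragon.add("flying", lambda: "flies")
dragon.add("city-attacking", lambda: "attacks a city")

# The player talks the dragon into peace: drop one component, keep the rest.
dragon.remove("city-attacking")
print(dragon.act())
```

Nothing about the “dragon” label changes when the component is removed. Only the set of behaviors does, which is the property the relationship model below borrows.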


This same model would seem to be extremely effective at describing our relationships in life. Relationships are amorphous and are often interpreted by context: which behaviors actually exist between two entities, and which behaviors are expected. Let’s say you are trying to understand whether you are in a “friendship” relationship with someone. If the other person isn’t doing what you are expecting a friend to do, then the likelihood that your unknown, actual relationship is the suspected one decreases. On the flip side, if you expect your friends to quack like a duck, and this random person does quack like a duck at you, then you have found someone who is likely to become your friend (though, you and she would be a bit weird). Critical to this is how each Character may have its own definition of what behaviors constitute “friendship.”

In addition, when one evaluates the satisfaction of a relationship, one typically focuses on the behaviors one wishes to engage in with others. However, the person doesn’t then start engaging in new behaviors immediately in the context of their old relationship; they first prioritize changing the relationship itself, so as to make the sought-after behavior more acceptable to the other party.


For example, if a boy likes a girl, he shouldn’t (necessarily) immediately go to her home and declare his love, but perhaps first build a “familiar”, then an “associate”, and then a “friend” relationship with her (though the way a relational path from relationship A to B is calculated for any given Character would be a function of the Character’s personality).

The implication is that these kinds of procedurally generated relational pathways can lead to characters that naturally develop a variety of human-like behaviors as they decide on a goal, calculate a possible social path towards that goal, and then further break down ways in which to change the situation they are in to meet their goals. This is related to the concept of hope. When you hope for someone to engage in a given behavior, then you are really stating that you will be more satisfied if you are in a relationship where those kinds of behaviors are expected (and where the person actually does those behaviors, indicating that they actually are in that relationship with you).


For example, a “daughter” entity D and a “father” entity F exist in the following way:

  • D & F both have the same expectations of F such that they both agree F is a “biological father” with D (he is responsible for impregnating her mother).
  • D & F both have the same expectations of each other such that they both agree F is a “guardian” with D (he houses/feeds/protects her, & pays for her healthcare/schooling, etc.).
  • D & F have different expectations of each other such that F believes he has a “fatherhood” relationship with D, but D does not believe this. F is always working, and D wishes he would play with her more often and come to her public achievements in school. As such, D is not satisfied since her conception of the “fatherhood” relationship is not the same.
  • Because D and F can both have different variations of the same relationship expectations, an AI will be able to support a system in which D and F may talk to each other “about” the same topic, but be thinking of totally different things (simulating the naturally confusing elements of the human condition). This is because the label for the relationship is equivalent, but the definition of each person’s idea of the relationship consists of different behaviors.
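The same-label, different-definition situation above can be sketched with plain sets in Python. The behavior names and the all-or-nothing `believes` rule are assumptions chosen to mirror the D and F example; they are not a finished design.

```python
# Each Character keeps its own definition of a relationship label:
# the set of behaviors it expects from the other party.
definitions = {
    "F": {"fatherhood": {"houses", "feeds", "protects", "pays for schooling"}},
    "D": {"fatherhood": {"houses", "feeds", "protects", "pays for schooling",
                         "plays with her", "attends school events"}},
}

# Behaviors F has actually performed toward D.
observed = {"houses", "feeds", "protects", "pays for schooling"}


def believes(character, label):
    """True if every behavior this character expects has actually occurred."""
    return definitions[character][label] <= observed


def unmet(character, label):
    """Expected behaviors that have not occurred, i.e. the source of dissatisfaction."""
    return definitions[character][label] - observed


print(believes("F", "fatherhood"), believes("D", "fatherhood"))
print(sorted(unmet("D", "fatherhood")))
```

Both characters use the key “fatherhood”, so a conversation “about fatherhood” type-checks for both of them, yet only F considers the relationship satisfied.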

Tiered Relationship Expectations

Furthermore, the variations in relationship expectations may diverge at the individual level or group level. We may be able to assume that the vast majority of people within a given “group” have similar beliefs regarding one topic or another. However, we must also consolidate a hierarchy of relational priorities: for any given person, individual expectations override any group-level ones, and different groups will have various degrees to which they influence the individual’s social expectations of others. Let’s take The Legend of Korra as an example.


In this world, some people, called “benders,” can control a certain element (fire, earth, water, or air). Previously, the differences between types of benders and the cultures they came from led to conflict in the world. In “Republic City” however, those people can now live in peace with one another. This represents a “national” group with cultural expectations of uniting people despite their different cultures.


But those who do not have powers at all are subject to the prejudice and general economic superiority of the “benders”. From there spawns the political activist and terrorist group: the Equalists. This group adds another layer of people on top of the “national” group layer. An Equalist who still believes in the capacity for people to unite is simply someone who is not as loyal to the Equalist cause. Whether the person still has this hope for a positive relationship between the Republic City entity and themselves is something that others may notice and consider when evaluating the actions of this person. The Equalist leader will see this person as someone who must be further manipulated to the cause whereas peacekeepers will seek to redirect this person’s efforts of reform wrought from their emotions.

Finally, we have the individual level, which supersedes all group-level social expectations. Say one of these questionably-loyal Equalists also has another expectation that relates to equality: they believe a city should always be concerned about the well-being of the diversity of animals in the Avatar world. Animal-care efforts by the city therefore play a larger role in currying favor with this particular person, even if their Equalist position still puts them in a climate of distaste for the city. This person, like many who might join that organization, likely has a variety of internal conflicts that they are managing, expectations and needs that are battling for dominance of the mind. This is how our Agents should approach decision-making: through a diverse conflict of interests.
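One possible way to sketch this hierarchy in Python is shown below. The individual-overrides-group rule comes from the text; blending the groups with a weighted average, the score range of -1 to 1, and every number in the example are assumptions made purely for illustration.

```python
def expectation(topic, individual, groups):
    """Score how favorably a topic is viewed, from -1 (hostile) to 1 (favorable).

    individual: dict of topic -> score; always wins when the topic is present.
    groups: list of (weight, dict of topic -> score) pairs.
    """
    if topic in individual:
        return individual[topic]
    relevant = [(weight, scores[topic]) for weight, scores in groups if topic in scores]
    total = sum(weight for weight, _ in relevant)
    if total == 0:
        return 0.0  # nobody this person listens to has an opinion
    return sum(weight * score for weight, score in relevant) / total


# Hypothetical numbers for a questionably-loyal Equalist.
republic_city = {"bender unity": 1.0}
equalists = {"bender unity": -1.0}
groups = [(1.0, republic_city), (3.0, equalists)]
individual = {"animal welfare": 1.0}

print(expectation("bender unity", individual, groups))
print(expectation("animal welfare", individual, groups))
```

With these weights the Equalist layer dominates the national one on “bender unity”, while the personal conviction about animals is returned untouched by either group.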


Relationship Modeling

So, how to actually take this relational concept and model it in a way the computer can understand? Well, let’s first define our terms:

  • Narrative Entity: A basic “thing” in the narrative that has narrative relevance. Can be a form of Life, a non-living Object, a Place, or an abstract idea or piece of Lore. This is “what” the thing is and implies various sorts of properties. It also places default limits on things (for example, a Lore cannot be interacted with physically).
  • Character: A Narrative Entity that has a ‘will’, i.e. can desire for itself a behavior. This is “who” the thing is (ergo, it has a personality) and implies what sorts of behaviors it would naturally engage in.
  • Role: The name a Narrative Entity assumes under the context of its behaviors in a Relationship.
  • Relationship: The set of behaviors that have occurred between two Narrative Entities. They will always be binary links between NEs and may or may not be bidirectional, i.e. an Entity may not even have to do anything to be in a Relationship. It may not even be aware that it is in a Relationship with another entity.

A behavior as we will see it is defined by some action or state change. For actions, there is a source for this behavior and an object. As such, we can graphically portray transitive behaviors as directed lines running from one Narrative Entity to another. For intransitive verbs, we simply have a directed line pointing to a Null Entity that represents nothingness.

Rather than simply have these be straight lines however, it is easier to think of them as longitudinal lines between points on a globe.

Renderings may not necessarily place them at equidistant positions. This is merely the simplest rendering.
At each pole is a Narrative Entity and the entirety of the globe encompasses the actual relationship between the two. We can then define a set of “ideal” relationships that have their own globes of interactions. By checking the degree to which the ideal is a subset of the actual, we can calculate the likelihood that the actual includes the idealized relationship. This is an example of set logic in mathematics and its applications in identifying and relating relationships.
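To make the subset idea concrete, here is a minimal Python sketch. It assumes a relationship can be flattened into a plain set of directed behaviors and scores an "ideal" relationship by the fraction of its behaviors found in the actual one. The names `Behavior` and `ideal_coverage`, the use of `None` for the Null Entity, and the sample entities are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Behavior:
    """A directed line from a source entity to a target entity."""
    source: str
    verb: str
    target: Optional[str] = None  # None stands in for the Null Entity (intransitive)


def ideal_coverage(ideal: FrozenSet[Behavior], actual: FrozenSet[Behavior]) -> float:
    """Fraction of the ideal relationship's behaviors present in the actual one."""
    if not ideal:
        return 1.0  # an empty ideal is trivially contained in anything
    return len(ideal & actual) / len(ideal)


# The "globe" of everything that has happened between Ann and Bob (made-up data).
actual = frozenset({
    Behavior("Ann", "greets", "Bob"),
    Behavior("Ann", "helps", "Bob"),
    Behavior("Bob", "thanks", "Ann"),
    Behavior("Bob", "sleeps"),  # intransitive: points at the Null Entity
})

# A made-up ideal "mutual help" relationship.
mutual_help = frozenset({
    Behavior("Ann", "helps", "Bob"),
    Behavior("Bob", "helps", "Ann"),
})

coverage = ideal_coverage(mutual_help, actual)
print(coverage)
```

A fuller version would presumably describe ideals in terms of Roles rather than named entities, so that one ideal could be tested against any pair.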

I further propose that these globular relationships have a sequence of layers: a core globe summarizing the history of behaviors that have occurred between the two entities, and an intermediate layer composed of hoped-for behaviors for any given Character.

The intermediate layer is far more complex since it is both hypothetical and subjective between any 2 Characters (visualized as two clearly-divided hemispheres) or a Character and a Narrative Entity (a globe).  The intermediate (hemi)sphere(s) would be calculated from an algorithm that takes into account the historical core of the relationship and the associated Character’s personality. Given Character goals X and past interactions Y, what type of relationship, i.e. what collection of behaviors does the Character wish to have with the target of the relationship?

Picture each division of this orange as the source hemisphere of two respective Characters: clearly divided, yet maintaining the same directed-lines-as-globe structure.

Perception Modeling

Furthermore, we must ensure that we can simulate the accumulation of knowledge and the questionable nature of it. How are we to model perceptions of knowledge, e.g. “I suspect that you ate my cookies”?

In this scenario, Person 1 is fairly certain Person 2 stole their cookies, but Person 2 has not yet even realized that Person 1’s cookies are missing. Person 2 also does not know how Person 1 obtained the cookies.
For this, we must allow even behaviors themselves to be abstracted into Narrative Entities that can be known, suspected, or unknown. Without this recursiveness, without the ability to form interactions between Characters and knowledge of interactions, you cannot replicate more complicated scenarios such as…

  • A actually knows a secret S1.
  • B hopes to know S2.
  • A suspects B wants to know S1 and therefore attempts to hide their knowledge of S1 from B.
  • B has reason to believe that A knows S2, so B pays more attention to A, but tries to avoid revealing this suspicion to A.
  • A has noticed B’s abnormal attention directed at him/her, so A surreptitiously engages in a behavior X to help hide the “way” of learning about S1.
  • C witnesses X and tells B about it, so B is now more confident that A knows about S2.
  • (We don’t even necessarily know if S1 and S2 are the same secret).
  • etc.

With this quick example, you can see how perceptions need to be able to have various degrees of confidence in behaviors (actions and state changes) to help inform the mentalities of Agents.
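As a rough sketch of that recursiveness, the Python below treats a behavior as a value that can itself be the object of another behavior, and lets each mind hold a confidence for every such fact. The class names `Fact` and `Mind`, the 0-to-1 confidence scale, and the keep-the-strongest-evidence rule are assumptions made for illustration, not part of the model described above.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass(frozen=True)
class Fact:
    """A behavior or state that can itself be known, suspected, or unknown."""
    subject: str
    verb: str
    obj: Union[str, "Fact"]  # the object may be another Fact (recursion)


@dataclass
class Mind:
    name: str
    beliefs: Dict[Fact, float] = field(default_factory=dict)

    def observe(self, fact: Fact, confidence: float) -> None:
        # Assumption: new evidence never lowers confidence; keep the strongest.
        self.beliefs[fact] = max(confidence, self.beliefs.get(fact, 0.0))

    def confidence(self, fact: Fact) -> float:
        return self.beliefs.get(fact, 0.0)  # 0.0 means "unknown"


a, b = Mind("A"), Mind("B")

a_knows_s2 = Fact("A", "knows", "S2")
b.observe(a_knows_s2, 0.6)  # B has reason to believe that A knows S2
b.observe(a_knows_s2, 0.8)  # C reports behavior X, raising B's confidence

# A suspects that B wants to know S1: a fact about a fact.
b_wants_s1 = Fact("B", "wants", Fact("B", "knows", "S1"))
a.observe(b_wants_s1, 0.7)

print(b.confidence(a_knows_s2), a.confidence(b_wants_s1))
```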

Narrative Scripting Integration

As far as codifying these Entities and Behaviors goes, that is where the narrative scripting comes into play. Every Entity, every Behavior, and therefore every Relationship is described solely in terms of narrative scripting statements. This is to prevent situations where the technology must do an intermediate translation into another language during interpretation of scripted content into logical meaning.

So, for example, Cookies might have the following abstract description:

Cookies…

  1. are a bread-based production.
  2. have a flavor: (usually) sweet.
  3. have a shape: (usually) small, circular, and nearly flat.
  4. have a source material: bread-based semi-solid.
  5. have a creation method: (usually) heated in a box-heat-outside (i.e. “oven”, distinct from the box-heat-inside, i.e. “microwave”).

These properties are defined in an order of priority such that if something were to refer to an entity that is a bread-based treat that is small and circular, the computer would have a higher percentage confidence in evaluating that statement as the entity that shares the other qualities “sweet”, “made in an oven”, “nearly flat”, etc. vs another entity described as having a different shape or a different taste.
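One possible reading of this priority ordering, sketched in Python: each entity lists its properties from most to least important, and a description is scored by the weighted share of its properties that the entity satisfies. The linear weighting, the `match_confidence` name, and the cracker entity are assumptions invented for the example.

```python
from typing import Dict, List, Tuple

# Properties as (name, value) pairs, highest priority first.
Entity = List[Tuple[str, str]]

COOKIE: Entity = [
    ("kind", "bread-based"),
    ("flavor", "sweet"),
    ("shape", "small-circular-flat"),
    ("material", "bread-based semi-solid"),
    ("method", "oven"),
]

# A hypothetical competing entity with a different flavor and shape.
CRACKER: Entity = [
    ("kind", "bread-based"),
    ("flavor", "salty"),
    ("shape", "small-square-flat"),
    ("material", "bread-based semi-solid"),
    ("method", "oven"),
]


def match_confidence(entity: Entity, description: Dict[str, str]) -> float:
    """Weighted share of the described properties that the entity satisfies."""
    count = len(entity)
    # Assumed weighting: first property weighs `count`, the last weighs 1.
    weights = {name: count - rank for rank, (name, _) in enumerate(entity)}
    values = dict(entity)
    total = sum(weights.get(name, 0) for name in description)
    if total == 0:
        return 0.0
    matched = sum(
        weights[name]
        for name, value in description.items()
        if values.get(name) == value
    )
    return matched / total


# "A bread-based treat that is small and circular."
description = {"kind": "bread-based", "shape": "small-circular-flat"}
cookie_score = match_confidence(COOKIE, description)
cracker_score = match_confidence(CRACKER, description)
print(cookie_score, cracker_score)
```

The interpreter would then prefer the higher-scoring entity while keeping the runner-up available in case later statements contradict the choice.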


With innumerable globes of interactions and perception lines linking everything together, a fully-rendered model might look something like this:

Some of you may recognize this from the article on “Modeling Human Behavior and Awareness”
This concept has really grown out of a pre-existing theory on how to model these same kinds of behaviors that I developed. No doubt it will receive revisions as an actual implementation is underway, but before we get to that, we’ll have to dive once more into the field of linguistics.

The great break in content between articles here is because I’ve been hard at work on developing my own constructed language that is quite distinct from Toki Pona/Sona. To hear about the reasons why, and what form this new language will take, please look forward to the next article.

As always, comments and criticisms are welcome in the comments below. Cheers!

Next Article: Evolution of Toki Sona to “tokawaje”
Previous Article: Dramatica and Narrative AI

Minecraftian Narrative: Part 5

Table of Contents

  1. What is “Minecraftian Narrative”?
  2. Is “Toki Pona” Suitable for Narrative Scripting?
  3. Interface and Gameplay Possibilities
  4. Toki Sona Implementation Quandries
  5. Dramatica and Narrative AI
  6. Relationship and Perception Modeling
  7. Evolution of Toki Sona to “tokawaje”


Today’s the day! We’ve gotten an idea of what form Toki Sona-based narrative scripting will take, and we’ve examined some of the concerns regarding its integration and maintenance with code. Now we’re finally going to dive into my favorite part: theorizing the behavior of classes that would actually use Toki Sona and react.

The most brilliant illustrations of media, in my opinion, are those which exhibit the Grand Argument Story. These stories have an overarching narrative with a particular argument embedded within, advanced throughout the experience by the main character and those he or she meets as they personify competing, adjacent, or parallel ways of thinking.

But how are we to teach a computer the narrative and character relationships as they appear to us? Thankfully, a well-fleshed out narrative framework already exists to help us as we figure it out. Its name is Dramatica, and from it, we shall design the computer types responsible for managing a dynamic narrative: the Character, Agent, and StoryMind.

Brief Dramatica Overview

The Dramatica Theory of Story is a framework for identifying the functional components of a narrative. In its 350-page introductory book which is available for free on their website (the advanced book can be found over here too), it defines a set of story concepts that must exist within a Grand Argument Story in order for it to be fully fleshed out. If anything is missing, then the story will be lacking. To be honest, the level of detail it gets into is rather jaw-dropping as a writer. Its creators even had to create a software application just to help writers manage the information from the framework! How detailed is it? Check this out…

Dramatica defines four categories of Character, Plot, Theme, and Genre.

It also defines 4 “Throughlines” which are perspectives on the Themes.

  • Overall Story (OS) = The story summarized as everyone experiences it. A dispassionate, objective view.
  • Main Character Story (MC) = The story as the main character experiences it. The character we relate to, experiencing inside-out.
  • Influence Character Story (IC) = The story as the influential character experiences it. The character we sympathize/empathize with, experiencing from the outside-in.
  • Relationship Story (SS) = The story viewed as the interactions between the MC & IC. An extremely passionate, objective view.

Within Theme, there are 4 “Classes” that have several subdivisions within them.

  • Universe: External/State => A Situation
  • Physics: External/Activity => An Activity
  • Psychology: Internal/Activity => A Manner of Thinking
  • Mind: Internal/State => A State of Mind

One Throughline is matched to each of the Classes so that, for example, the MC is mainly concerned about dealing with a state of mind, the IC is trying to avoid a situation related to his/her past, the community at large is freaking out about the ongoing activity of preparing for and running a local tournament, and there is an ongoing difference in methodologies between the MC and IC that draws tension between them.

Each Class can be broken down into 64 elements. Highlighted: Universe.Future.Choice.Temptation Element.

For each Class, you select 1 Variation of a Concern per story. The 4 Plot Acts (traditionally exposition, rising action, falling action, and denouement) each then shift between the 4-Element quad within the chosen Variation. Since Variations each have a diagonal opposite, diagonal movements (a “slide”) don’t change the topic Variation as intensely as shifting Variations horizontally or vertically (a “bump”).

For example the Universe.Future.Choice variation has the two opposing Elements, “Temptation” and “Self-Control” plus the other two “Logic” and “Feeling”. Notice these are two distinct, albeit related spectra of the human experience that come into play when making decisions about the future regarding an external situation that must be dealt with. Shifting topics from Temptation to Self-Control wouldn’t be as big of a change as going to Logic or Feeling since the former deals with the same conflicting pair of Elements.

Each of those Elements can be organized with the Acts in a number of permutations. 3 patterns arise, each of which has 4 orientations and can be run forwards or backwards (2). That gives 3 * 4 * 2 = 24 possible permutations for each Variation. With 16 Variations per Class, 4 Classes per story, and then times 4 again since each of the 4 Throughlines can be paired with a Class, that comes out to 24 * 16 * 4 * 4 = 6,144 possible Plot-Theme permutations.

The Theme Classes are also matched up with Genre categories which can help the engine identify what sort of content to create at a given point of the story (doesn’t increase multiplicity).

The merging of Plot with Genre

On top of that, there are the Characters to consider. There are 8 general Archetypes, each of them composed by combining a Decision Characteristic and an Action Characteristic for each of 4 aspects of character: their reason for toiling, the way they do things, how they determine success, and what they are ultimately trying to do.


You can make any character by combining 2 Characteristics from 2 unopposed Archetypes. So, (7!) permutations of any given characteristic within an aspect (not matching up with an opposite for each of them). 5,040 * 4 aspects * 2 characteristics = 40,320 permutations of Characters, optimally.

Finally, there’s the number of Themes that can be delivered by the external/internal successes and failures of the MC…

4 Possibilities

…and whether the MC and IC remained steadfast in their Class or changed (e.g. did they stay/change their state of mind?) and the success/failure thereof.

That makes 4 * 4 possible endings: 16.

PHEW! Okay, now, altogether that’s 16 endings * 40,320 characters * 6,144 plots…

Carry the 3…there we go:

3.96 billion. Stories.

And that’s without even “skinning” them as pirate, sci-fi, fantasy, take your pick.

Needless to say, these kinds of possibilities are EXACTLY the sort of variation we should be looking for in procedural narrative generation. Even if you knocked out the Informational genre in the interest of counting only the non-edutainment games, that still leaves about 2.97 billion possibilities. Good odds, I say.

Also, keep in mind, any given video game will often times have several sub-stories within the overarching story, ones where minor characters have their own stories to explore and see themselves as the Main Character and Protagonist of their own conflict. In these stories, you, the original main character, may play the role of Influence Character (think Mass Effect 2 loyalty missions if you’ve ever played that: every character’s unique storyline is critically affected by the decisions you make while accompanying them for a personal, yet vital journey). Assuming any given story has, say, 9 essential characters (pretty small number by procedural generation standards, but pretty normal for children’s books), that would imply any single gameplay experience may involve 26.73 billion story arrangements.
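For anyone who wants to check the arithmetic, the counts above can be reproduced in a few lines of Python. The variable names are mine, and the sketch assumes that dropping the Informational genre removes exactly one quarter of the stories, which is how the 2.97 billion figure appears to have been reached.

```python
from math import factorial

plot_orders = 3 * 4 * 2                      # patterns * orientations * directions
plot_theme = plot_orders * 16 * 4 * 4        # * Variations * Classes * Throughlines
characters = factorial(7) * 4 * 2            # 7! * aspects * characteristics
endings = 4 * 4                              # MC outcomes * steadfast/change outcomes
stories = endings * characters * plot_theme

# Assumption: Informational is one of 4 equally weighted genre categories.
without_informational = stories * 3 // 4
with_substories = 9 * without_informational  # 9 essential characters

print(f"{plot_theme:,} plots, {characters:,} characters, {endings} endings")
print(f"{stories:,} stories")
print(f"{without_informational:,} without Informational")
print(f"{with_substories:,} across 9 characters")
```

The exact product for the last line is about 26.75 billion; the 26.73 billion quoted above comes from multiplying the already-rounded 2.97 billion by 9.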

It isn’t just Dramatica’s variability that makes it so appealing though. Each of these details is designed to be clearly identified and catalogued. This has two important consequences. The first is that the engine will know what goes into making a good story and will therefore know how to create a good story structure from scratch. The second, and far more important to us, is that the engine will know when and how any of these qualities are not present or properly aligned. It will therefore understand what has happened to the story when the player changes things and how to fix them. Even better, because of its understanding of related story structures, it will even be able to adapt with completely new story forms should it wish to.

Head hurting yet? Fantastic! Let’s dig into characters as computer entities.

Characters & Agents

While Dramatica gives us the functional role of Characters, it doesn’t really flesh them out properly. Unfortunately, writers don’t really maintain a consolidated list of brainstorming material, but you can find several odds and ends around the Internet (list of character needs, list of unique qualities for realistic characters, and a character background sheet, for example). Any and all of these can be used to help flesh out and define the particular aspects of our Characters, beyond just their functional role.

The main interest we have with these brainstorming materials is to define a set of fields that an AI can connect Toki Sona inputs to. Given some Toki Sona instruction A, a definition of Character B, and a certain Context C, what course of action D should I take? Answering this question is the job of the Agent.

What exactly does an Agent entail? They would be the singular existence that represents the computer logic for the entirety of an assigned Character. In our case, we’re going to define a Character as ANY Narrative Entity that has (or could resume having) a will of its own. A Narrative Entity would simply be anything that requires a history of interactions with it to be recorded such as a Life, an Object, a Place, or a piece of Lore.


Notice that characters don’t have to be living beings specifically. For example, an enchanted swamp may have an intelligence living amongst the trees. It would most certainly be a Character; however, it would also definitely be a Place that people can enter, exit, and reside in. As a swamp entity would be the embodiment of both the land, the plants, and the animals within, one could also extend its attributes to Life as well. As a result, we’d have the swamp Agent that accesses the Character which in turn maintains properties of both the Life and Place for the swamp Narrative Entity.

Sample low-effort UML Class Diagram for the Agent Subsystem (made with UMLet)

In the diagram above, we specify that a single Agent is responsible for something called a Vessel rather than for a Character directly. What’s more, the Vessel can “wear” several Characters! What is the meaning of this?

Let’s say we wished to create a Jekyll & Hyde story. Although Jekyll and Hyde have different personalities, they also share a body. Whatever one is doing, wherever one is, the other will also be doing the moment they switch identities. This relates back to assets too. Whatever one sprite/model animation will be doing, the other will also be doing when those assets are switched to another set. In this way, Characters and Vessels are fully changeable without affecting the other. A multiple personality character might change Characters while not changing Vessels. A shapeshifting character might change Vessels while not changing Characters. In the case of Jekyll and Hyde, it would be a swap for both Character and Vessel as their personalities and bodies are BOTH different, but it will always be tied to the same location and activity at the time of switching.


So, the Agent is just an AI that doesn’t care what it’s controlling or to what ends. It looks to the Character to figure out what it narratively should and can do, and it issues instructions based on that to the Vessel. It doesn’t care whether the Vessel knows how to do it. It simply assumes the Vessel will know what the instructions mean. In the process, we’ve divorced the concept of a Character 1) from the in-story and in-engine thing that they are embodied as and 2) from the logic that figures out what a given Character should do given a set of Toki Sona inputs from the interpreter.
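A minimal Python sketch of that three-way separation might look like the following. The method names (`decide`, `perform`, `handle`), the `active` index, and the sample two-personality character are hypothetical; the UML diagram above does not specify them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Character:
    """Who the entity is: personality, knowledge, goals."""
    name: str

    def decide(self, instruction: str) -> str:
        # Placeholder for personality-driven reasoning.
        return f"{self.name} responds to '{instruction}'"


@dataclass
class Vessel:
    """What the entity is embodied as: body, location, assets."""
    body: str
    location: str
    characters: List[Character] = field(default_factory=list)
    active: int = 0  # index of the Character currently being "worn"
    log: List[str] = field(default_factory=list)

    @property
    def character(self) -> Character:
        return self.characters[self.active]

    def perform(self, action: str) -> None:
        # The Vessel alone knows how an action plays out in the engine.
        self.log.append(f"[{self.body} @ {self.location}] {action}")


@dataclass
class Agent:
    """The logic that turns instructions into behavior for one Vessel."""
    vessel: Vessel

    def handle(self, instruction: str) -> None:
        action = self.vessel.character.decide(instruction)
        self.vessel.perform(action)


# A multiple-personality character: one Vessel, two Characters.
vessel = Vessel("body-1", "tavern", [Character("Host"), Character("Alter")])
agent = Agent(vessel)
agent.handle("greet")
vessel.active = 1  # personality switch; body and location are untouched
agent.handle("greet")
print("\n".join(vessel.log))
```

Swapping `agent.vessel` instead of `vessel.active` would model the shapeshifter case, and doing both at once would cover Jekyll and Hyde.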

The last important thing to note about the Characters and Agents here is that the Agents are informed, context-wise, by their associated Characters. As such, an Agent’s decisions are constrained by their Vessel’s current Character; only its acquired knowledge, background of experience and skills, and personality will invoke behavior. An Agent will therefore factor into its decision-making the Character’s history of perceptions, likes and dislikes, attitude, goals, and everything else that constitutes the Character. It then translates incoming Toki Sona instructions into gameplay behavior. For example, what might a Character do if asked, “What do you know about the aliens?”


Maybe they don’t know much about the aliens. Or maybe they do, but it’s in their best interest to only reveal X information and not Y. But maybe they also really suck at lying, so you can see through it anyway. How will they know exactly what to say? How will they say it? Does the personality invite a curt, direct response, or do they swathe the invading aliens with adoration and delight in a giddy, I’m-too-obsessed-with-science kinda way?

The StoryMind

Finally, we address the overall story controls: the StoryMind. In Dramatica, the StoryMind is the fully formed mental argument and thought-process that the story communicates. In our context, the StoryMind is the computer type responsible for delivering the Dramatica framework’s StoryMind. It understands the possible story structures and makes decisions regarding whether the story can reasonably deliver the same themes with the existing Characters and Plot or whether it will need to adjust.

The StoryMind will have full and total control over that which has yet to be exposed to a human player within the story. Its job is to generate and edit content to deliver a Grand Argument Story of some kind to the player. What might this look like?

Story time:


Typical Fantasy RPG world/game. You’re a strength-and-dexterity-focused mercenary and you’ve developed a bit of a reputation for taking on groups of enemies solo and winning with vicious direct onslaughts. You’re walking through town and come across a flyer about a duke’s kidnapped heir (one of a few pre-generated premises made by the StoryMind). You ask a barkeep about it (and it alone), so the StoryMind begins to suspect that you may be interested in pursuing this storyline further (rather than whatever other premises it had prepared for you). It therefore begins to develop more content for this premise, inferring that it will need that story information soon. In fact, it takes the initiative.


You are blocked in the road by a woman named Steph who overheard you outside the bar and wishes to accompany you on your journey to rescue the heir. She says that she’s a sorceress with some dodgy business concerning the duke and she needs a bargaining chip. Let’s say you respond with “Sure. I only want the duke’s money,” (in Toki Sona of course). All of a sudden, the StoryMind knows a couple of things:

  1. You care more about the reward money than pleasing the duke.
  2. Because you have already invited risk into your relationship with the yet-to-be-met, quest-giving duke, you are even more likely to behave negatively towards this particular duke and his associates in the future. You also might have a natural bias against those of a higher social status (something it will test later perhaps).
  3. You have some level of trust towards Steph, though it’s not defined.
  4. You are not a definitive loner. You accepted a partnership, despite your past as a solo mercenary. But how deep does this willingness extend? It’s possible it might be worth testing this too.

Since you may have related goals, the StoryMind sets her up as the Influence Character. It randomly decides to attempt a “friendly rivals / romance?” relationship (partnership of convenience), modifying her Character properties behind the scenes so that she is similar to you (based on your actions and speech).


Along the way, a group of goblins ambush and surround you both, so you dash in to slaughter the beasts. The StoryMind may have been designing Steph to support you, but unbeknownst to you, in the interest of generating conflict, it changes some of Steph’s settings! Steph yells for you to stop, but you ignore her and slash through one of them to make an opening. In response, Steph sparks a blinding light, grabs your hand, and runs away in the ensuing chaos.


As soon as you’re clear, she starts yelling at you, asking why you wouldn’t wait. After you get her to calm down a bit and explain things, she confides that she is hemophobic and can’t stand to see, smell, or be anywhere near blood. She’d prefer to stealthily knock out, sneak past, trick, or bloodlessly maim those who stand in her way. How will you react? Astonishment? Scorn? Sympathy? Is this a deal breaker for your temporary partnership? Remember, she’s always paying attention to you, and so is the StoryMind. This difference in desired methodologies is but a small part of the narrative the StoryMind is crafting.

  • Throughline Type: Class.Concern.Variation.Element, Act I
    • Description
    • Genre
  • Overall Story Throughline: Physics.Obtaining.SelfInterest.Pursuit
    • A dukedom heir has been kidnapped.
    • Entertainment Through Thrills: Pursuit of an endangered royalty.
  • Influence Character Throughline: Mind.Preconscious.Worry.Result
    • Steph is worried about how to deal with her hemophobia. (The StoryMind generates this shortly afterward:) She can’t find work beast-slaying or healing because of it, and is now low on money. The duke is evicting her, despite her frequent requests for bloodless work as payment. Everything’s so stressful, and it’s all because of that stupid blood!
    • Comedy of Manners: the almighty sorceress, the bane of beasts and harbinger of health, brought down by the mere sight of blood.
  • Relationship Throughline:
    • You’d prefer to hack away at enemies, but she can’t stand blood and prefers alternative approaches to removing obstacles. As such, you each have different manners of thinking about how you feel obstacles should be dealt with.
    • Growth Drama


  • Main Character Throughline: Universe.?
    • If nothing interrupts the progress of the other 3, the StoryMind is fit to throw external state-related problems your way, and these problems will necessarily dig into deeper, thematic issues. For example…
    •  Universe.Progress.Threat.Hunch
      • You eventually learn that those who took the heir have loose ties to the duke himself. Since you’re in pursuit to rescue him/her, you have a hunch that you may be a target soon as well. You need to unearth the mystery surrounding this. Does this impact your ability to trust the various characters you come across?
      • Entertainment Through Atmosphere: You’re experiencing a fantasy world!



And to think, if you’d just said, “No, thanks. I’m more of a loner,” at the beginning, Steph might instead have developed as a hindering rival Influence Character who tries to steal the heir for herself, popping in and out of the story when you least expect it! Does she even know about the duke’s possible relation to the kidnapping? Too bad we’ll never find out. After all, you didn’t say that. The characters and experiences in this world are real and permanent. You live with your choices, build relationships, and engage with a game world that truly listens to you, more intimately than any other game has before.


I have found that Dramatica is an excellent starting point from which to build story structures and inform our StoryMind narrative AI of how to craft and manipulate storylines and characters. I hope you too are interested in the potential of this sort of system so that one day we might see it in action.

Also, it’s entirely possible I might have slightly messed up some calculations concerning the Dramatica system as the book doesn’t do a great job of clearly defining the relationships in one place (it’s sort of scattered about in the chapters). As far as I can tell, I’ve got them right, but I don’t terribly favor my math skills. I’d be happy to correct any mistakes someone notices.

In the future, expect to find an article diving into the hypothetical technical representation of Agents: their relationships, perceptions, and decision-making. Again, I’d love to hear from you below with comments, criticisms, and/or questions. Cheers!

Next Article: Relationship and Perception Modeling
Previous Article: Toki Sona Implementation Quandries

Minecraftian Narrative: Part 4

Table of Contents:

  1. What is “Minecraftian Narrative”?
  2. Is “Toki Pona” Suitable for Narrative Scripting?
  3. Interface and Gameplay Possibilities
  4. Toki Sona Implementation Quandries
  5. Dramatica and Narrative AI
  6. Relationship and Perception Modeling
  7. Evolution of Toki Sona to “tokawaje”


At this point, I’ve communicated the basics of the Toki Sona language (a “story-focused” Toki Pona), its potential for simply communicating narrative concepts, and the types of interfaces and games that could exploit such a language.

This time, we’ll be diving into some of the nuts and bolts that might revolve around the actual interpretation of Toki Sona and how it might tie into code. An intriguing array of questions comes into play due to Toki Sona’s highly interpretive semantics. The end result is a sort of exaggerated problem domain taken from Natural Language Processing. How much information should we infer from what we are given? How do we handle vague interpretations in code? And what do we do when the language itself changes through usage over time? Let’s start thinking…

Variant Details In Interpretation

What we ultimately want in a narrative engine is to be able to craft a computer system that can dynamically generate the same content that a human author would be able to create. To accomplish this, we must leverage our main tool: reducing the complexity of language to such an extent that the computer doesn’t have to compete with the linguistic nuances and artistic value that an author can imbue within their own work. Managing the degree to which we include these nuances requires a careful balancing act though.

For example, “It was a dark and stormy night…” draws into your mind many images beyond simply the setting. It evokes memories filled with emotions which an author may use to great effect in their manipulation of the audience’s emotional experience. Toki Sona’s focus on vague interpretation leaves many different ways of conveying the same concept, depending on one’s intent. Here are some English literal translations:

  • Version A: “When a black time of monstrous/fearful energy existed…”
    • tenpo-pimeja pi wawa-monsuta lon la, …
  • Version B: “This is the going time: The time is the black time. The air water travels below. As light of huge sound cuts the air above…”
    • ni li tenpo-kama: tenpo li tenpo-pimeja. telo-kon li tawa anpa. suno pi kalama-suli li kipisi e kon-sewi la, …

You’ll notice that version A jumps directly into communicating the tone that the audience should understand. As a result, it is far less particular in setting the scene’s physical characteristics about the weather.

Version B on the other hand takes the time to establish scene details with particulars (as specific as it can get, anyway). Although it takes several more statements to present the idea, it eventually equates itself loosely with the original English phrase. In this way, it manages to conjure emotions in the audience through imagery the same way the original does, but you can also tell that the impact isn’t quite as nuanced.


One of the key aspects of Toki Sona is that it is unable to include two independent phrases in a single statement. It is also unable to include anything beyond a single, adverbial dependent clause in addition to the core independent clause. These restrictions help ensure that each individual statement has a clear effect on interpretation. Only one core set of subjects and one core set of verbs may be present. Everything else is simply details for the singularly described content. As a result, a computer should be able to extract these singular concepts from Toki Sona more easily than it would a more complex language.

So while both database queries and statistical probability calculations are factors in interpreting the text, the algorithms will rely more on the probabilities due to the diminished size of database contents (not as many vocabulary terms to track). This is also likely because words frequently have several, divergent meanings that are relevant to a given context. As such, algorithms will often need to re-identify meanings after-the-fact once successive statements have been interpreted.

Our difficulty comes in when we must identify how interpreted statements are to be translated into understood data. Version B is far more explicit about how things are to be added, while version A relies far more heavily on the interpreter to sort things out. How many narrative elements should the interpreter assume based on the statistical chances of their relevance? The more questionable elements are added, the more items we’ll need to revisit for every subsequent statement. After all, future statements could add information that grants us new insight into the meaning of already stated terms.

To illustrate this, let’s break down how the interpreter might compose a scene based on these statements into pseudocode, starting with version B. We’ll leave English literal translations in and identify them as if they were Toki Sona terms.


Version B
contextFrames[cxt_index = 0] = cxt = new Context(); //establish 1st context

"This is the going time:" =>
contextFrames[++cxt_index] = new Context(); //':' signifies new context
cxt = contextFrames[cxt_index]; //future ideas added to new context
cxt += Timeline(Past); //Add the "time that has gone" to the context

"The time is the black time." =>
cxt += TimeOfDay(Night); //Add the "time of darkness" to the context

"The air water travels below." =>
cxt += Audio(Rain) + Visual(Rain); //Add "water of the air" visuals. Audio auto-added.

"As light of huge sound cuts the air above..." =>
cxt += {Object|Visual}(Light+(Sound+Huge)) >> Action(Cut) >> Visual(Sky+Air);
cxt += Mood(Ominous)?
// The scene includes a light that is often associated with loud noises. These lights (an object? A visual? Is it interactive?) are cutting across the "airs in the sky", likely clouds. All together, this combination of elements might imply an ominous mood.

Version A
contextFrames[cxt_index = 0] = cxt = new Context(); //establish 1st context

"When a black time of monstrous/fearful energy existed..." =>
cxt += TimeOfDay(Night)? + Energy(Terrifying)? + Mood(Terrifying) + Mood(Ominous)?
// Establish night time and presence of a terrifying form of energy in the scene. Based on these, establish that the mood is terrifying in some way with the possibility of more negatively toned content to follow soon. Possible that "monstrous energy" may imply a general feel rather than a thing, in which case "black time" may reference an impression of past events as opposed to the time of day.

To emphasize ease of use and make a powerful assistance tool, it’s best to let the interpreter do as much work as possible and then just update previous assumptions as new information is introduced. That way, even if users input only a small amount of information, it will feel as if the system is anticipating their meaning and understanding them effectively. To do otherwise (locking in assumptions once and never revisiting them) would save significantly on processing time, but would leave far too many assumptions in place that don’t account for the full context. This would in turn result in terrible errors in interpretation. Figuring out exactly how the data is organized and how the interpreter will make assumptions will be its own can of worms that I’ll get to some other day.
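To make the "assume now, revise later" idea a little more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Assumption` and `Context` classes, the numeric confidence scores, and the 0.5 threshold are hypothetical and not part of any existing Wyrd code. The '?' entries from the Version A pseudocode are modeled as low-confidence guesses that later statements can strengthen or weaken.

```python
from dataclasses import dataclass, field


@dataclass
class Assumption:
    """One narrative element the interpreter has added to the context."""
    kind: str          # e.g. "TimeOfDay" or "Mood" (names borrowed from the pseudocode)
    value: str         # e.g. "Night" or "Ominous"
    confidence: float  # hypothetical score in [0, 1]; 1.0 means stated outright


@dataclass
class Context:
    elements: list = field(default_factory=list)

    def add(self, kind, value, confidence=1.0):
        self.elements.append(Assumption(kind, value, confidence))

    def revise(self, kind, value, delta):
        """Raise or lower confidence in an earlier guess, clamped to [0, 1]."""
        for item in self.elements:
            if item.kind == kind and item.value == value:
                item.confidence = min(1.0, max(0.0, item.confidence + delta))

    def accepted(self, threshold=0.5):
        """Elements the scene would currently be built from."""
        return [(a.kind, a.value) for a in self.elements if a.confidence >= threshold]


cxt = Context()
# "When a black time of monstrous/fearful energy existed..."
cxt.add("TimeOfDay", "Night", 0.6)   # a '?' entry: plausible but not certain
cxt.add("Mood", "Terrifying", 1.0)
cxt.add("Mood", "Ominous", 0.4)      # a '?' entry: held back for now

# A later statement about rain and lightning supports both earlier guesses.
cxt.revise("TimeOfDay", "Night", +0.3)
cxt.revise("Mood", "Ominous", +0.3)
```

After the two revisions, `cxt.accepted()` includes the ominous mood that was initially held back. The cost mentioned above shows up in `revise`: every new statement may mean walking back over earlier elements.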

Data Representation

An additional concern is to identify the various ways that words will be understood logically as classes or typenames, hereafter “types” (for the non-programmers out there, this would be the organization the computer uses to better identify the relationships and behaviors between terms). Examples in the above pseudocode include TimeOfDay, Visual, and Audio. Ideally, each of these definitions would alter the context in which characters exist. It would inform their decision-making and impact the kinds of events that might trigger in the world (if anything like that should exist).

One option would be to create a data structure type for each Toki Sona word (there’d certainly be few enough of them memory-wise, so long as a short-cut script were written to auto-generate the code). Having types represent the terms themselves, however, is quite unreliable as we don’t want to have to alter the application code in response to changes in the language. Furthermore, any given word can occupy several syntactic roles depending on its positioning within a sentence, and each Toki Sona word in a syntactic role comes with a variety of semantic roles based on context.


For example, “kon”, the word for air, occupies a variety of meanings. As a noun, it can mean “air”, “wind”, “breath”, “atmosphere”, and even “aether”, “spirit” or “soul” (literally, “the unseen existence”). These noun meanings are then re-purposed as other parts of speech. The verb to “kon” means to “breathe” or, if being creative, it could mean “to pass by/through as if gas” / “to blow past as if the wind”. To clarify, when one says, “She ‘kon’s” or “She ‘kon’ed”, one is literally saying “she ‘air’ed”, “she ‘wind’-ed”, “she ‘soul’-ed”, etc. The nouns themselves are used AS verbs, which in turn results in language conventions for interpreted meaning. You can therefore understand the interpretive variations involved, and that’s not even moving on to adjectives and adverbs! Through developing conventions, we could figure out that when a person “airs”, its semantic role is usually that the person breathes, sighs, or similar, not that they spirit away or become one with the atmosphere or something (which are far less likely to use “kon” as a verb in the first place – probably an adverb if anything).

In the end, a computer needs to understand a definitive behavior that is to occur with a given type name. However, since the nature of this behavior is dictated by the combination of terms involved, we can understand that Toki Sona terms are meant to serve as interpreted inputs to the types. Furthermore, it seems most appropriate for types to serve two purposes: they must indicate the syntactic role the word has in a sentence, and they must indicate the functional role the word has in a context.

In the pseudocode excerpt I came up with, we chose to highlight the latter route, defining described content based on how it impacted the narrative context: is this an Audio or Visual element that will affect perception or is this a detail concerning the setting’s external details such as the TimeOfDay, etc.? In addition to this, we’ll also need to incorporate syntactic analysis to better identify what the described content will actually be (is it a noun, verb, adjective, etc.?). As mentioned before, the way a word is used will greatly affect the type of meaning it has, so the function should be built on the syntax which is in turn built on the vocabulary.
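As a rough illustration of types that carry both roles while keeping the vocabulary itself as plain data, here is a small Python sketch. The enum members, the `Interpretation` class, and the `MEANINGS` table are all hypothetical names I am assuming for the example; the two "kon" entries are just the noun and verb readings discussed above.

```python
from dataclasses import dataclass
from enum import Enum


class SyntacticRole(Enum):
    NOUN = "noun"
    VERB = "verb"
    MODIFIER = "modifier"


class FunctionalRole(Enum):
    VISUAL = "Visual"
    AUDIO = "Audio"
    TIME_OF_DAY = "TimeOfDay"
    ACTION = "Action"


@dataclass(frozen=True)
class Interpretation:
    word: str                   # the Toki Sona term, kept as plain data
    syntactic: SyntacticRole    # how the word is used in the sentence
    functional: FunctionalRole  # how it affects the narrative context
    meaning: str


# Hypothetical table keyed by (word, syntactic role). In practice this would
# be loaded from an external dictionary rather than written into the code.
MEANINGS = {
    ("kon", SyntacticRole.NOUN): (FunctionalRole.VISUAL, "air"),
    ("kon", SyntacticRole.VERB): (FunctionalRole.ACTION, "breathe"),
}


def interpret(word, syntactic):
    """Build the functional reading on top of the syntactic one."""
    functional, meaning = MEANINGS[(word, syntactic)]
    return Interpretation(word, syntactic, functional, meaning)
```

Calling `interpret("kon", SyntacticRole.VERB)` gives an `Action` with the meaning "breathe". Because the words are dictionary keys rather than class names, changing the language means editing data, not application code.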


Language Evolution

In addition, a system that implements this sort of code integration should be built around the assumption that the core vocabulary and semantics will change. As it stands, we already want to give users the power to add their own custom terms to the language for a particular application. These custom terms are always re-defined using a combination of sentences made of core terms and pre-existing custom terms.

However, because the integration of a living, breathing, and spoken language into a code base is a drastic measure, it is vital that the code be designed around the capacity for the core language to change. After all, languages are not unlike living creatures that adapt to environments, evolve to meet their needs, and strive to achieve their goals in the midst of it. In this sense, we can rest assured that players and developers alike will look forward to experimenting with and transforming this technology. This transformation will assuredly extend to the core terms, so not even the language should be tightly bound to them.

Given the lack of assurances in regard to the core terms over an extended period of time, it would behoove us to incorporate an external dictionary. It should most likely be pre-baked with statistical semantic associations derived from machine learning NLP algorithms and then fed into runtime calculations that combine with the context to narrow down the interpretation most likely to meet users’ expectations.


In simple terms, Wyrd should be given a massive list of Toki Pona (or Toki Sona, later on as it becomes available) statements periodically, perhaps with a monthly update. It should then scan through them, learn the words, and figure out what they likely mean: How frequently is “kon” used as a noun? What verbs and adjectives is it often paired with? What words is it NEVER associated with? What sorts of emotions have been associated with the various term-pairings and which are most frequent? These statistical inputs will assist the system in determining the functional and syntactic role(s) words possess. Combining this data with the actual surrounding words in context will let the application have a keen understanding of how to use them AND grant it the ability to reload this information when necessary.
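The counting part of this can be sketched in a few lines of Python. This is only a toy under stated assumptions: the three-sentence corpus is made up, the role tags are assumed to come from some earlier parsing step, and `build_stats` and `role_probability` are hypothetical helper names. A real system would use proper NLP tooling rather than raw counts.

```python
from collections import Counter, defaultdict


def build_stats(tagged_sentences):
    """Count role usage and word pairings from sentences of (word, role) pairs."""
    role_counts = defaultdict(Counter)   # word -> how often it fills each role
    pair_counts = defaultdict(Counter)   # word -> how often each neighbor appears
    for sentence in tagged_sentences:
        for word, role in sentence:
            role_counts[word][role] += 1
        words = [w for w, _ in sentence]
        for left, right in zip(words, words[1:]):
            pair_counts[left][right] += 1
            pair_counts[right][left] += 1
    return role_counts, pair_counts


def role_probability(role_counts, word, role):
    """Fraction of a word's occurrences that fill the given role."""
    total = sum(role_counts[word].values())
    return role_counts[word][role] / total if total else 0.0


# Tiny made-up corpus; the tags are assumed to be supplied by a parser.
corpus = [
    [("kon", "noun"), ("li", "particle"), ("tawa", "verb")],
    [("ona", "noun"), ("li", "particle"), ("kon", "verb")],
    [("kon", "noun"), ("li", "particle"), ("lete", "modifier")],
]
role_counts, pair_counts = build_stats(corpus)
```

Here `role_probability(role_counts, "kon", "noun")` answers "how frequently is 'kon' used as a noun?" for this corpus, and a zero in `pair_counts["kon"]` marks a word it was never seen next to. Re-running `build_stats` on a newer corpus is the "reload" step.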


Wyrd applications should also keep track of all Toki Sona input (if the user has volunteered it) so that it can be used as new machine learning test material. If people start using a word in a new way, and that trend develops, then the engine should respond by learning to adapt to that new usage and incorporate it into characters’ speech and applications’ descriptions. To do this, the centralized library of core terms must be updated by scanning through more recent Toki Sona literature. Ideally, we would pull this from update-electing users, generate new word data, and then broadcast this update to those same Wyrd users.


Well, we’ve explored some of the more in-depth programming difficulties that reside in using Toki Sona. There’ll likely be more updates in the future, but for now, this has all just been a brainstorming and analysis activity. I apologize to those of you who aren’t as tech-savvy (I tried to make things a little simpler outside of the pseudocode). From here on out, it’s likely we’ll end up dealing with things that are a bit more technical than the previous fare, but there will also be plenty of high level discussion, so worry not!

For next time, I’ll be diving into the particulars of Agents, Characters, and the StoryMind: the fundamental tools for manipulating and understanding narrative concepts!

Next Article: Dramatica and Narrative AI
Previous Article: Interface and Gameplay Possibilities

Minecraftian Narrative: Part 3


Table of Contents:

  1. What is “Minecraftian Narrative”?
  2. Is “Toki Pona” Suitable for Narrative Scripting?
  3. Interface and Gameplay Possibilities
  4. Toki Sona Implementation Quandries
  5. Dramatica and Narrative AI
  6. Relationship and Perception Modeling
  7. Evolution of Toki Sona to “tokawaje”


Last time, we discussed the concept of a narrative scripting language that could revolutionize the way players interact with a game world. We considered the possibility of using Toki Pona, an artificial 120-word language with 10 syntax rules, as a starting point for creating a custom language to be used for scripting purposes. In this article, we’ll be focusing a bit more on the ways in which the language might be used and what form its interface might take.

To begin with, I would like to clarify both the licensing plans I have for this concept as well as what terms are involved:

  • Toki Pona: (“The Simple Language”) The original artificial language. This is already free to be used for any purpose.
  • Toki Sona: (“The Story Language”) The modified language that I will be deriving from Toki Pona (Toki Pona’s word for knowledge/story is “sona”). This too will be free to use for any purpose (as it should be). Free C++, C#, and JavaScript APIs would probably be made available for engine/application integration.
  • Wyrd: a paid-for plugin for various popular engines that will include an AI system for interpreting and responding to Toki Sona dynamically for spontaneous dialogue, game events, and character behavior.

Now that we’ve defined things, I’ll be exploring how the heck Wyrd might show up in a game!

Interface Possibilities: Suggested GUI Input

Imagine you’re playing a game and you are given the chance to say something to another character. Rather than being given a succinct list of possible responses, you could simply be given a Minecraft-crafting style of word-composition system.

  1. A character is encountered in the world.
  2. An input field and a word bank are made available to the user (players could summon it at will).
  3. Players can click on an image from the word bank. Think of this as picking a “block” in Minecraft.
  4. As players select statements, the system visually hints at how things are being interpreted, for example: {Myself} {Want} {Go}. These would be the things that are “crafted” from putting terms together.
  5. This hinting informs players of what concepts are ACTUALLY being communicated. For example: tomo a.k.a. {enclosed space} => “home” or “house”.
  6. When players need to combine concepts together, for example: {enclosed space} and {water}, it can show them that it is understood as “bathroom”.
  7. Players could then check what other interpretations are available for that combination (perhaps by clicking on it).
  8. If they wished to communicate the desire to bathe/shower instead, they could select that option.
  9. Obviously, it would be the responsibility of the Toki Sona engine to ensure there is a standardized image available for all of the desirable concepts, but limits will be necessary.
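The hinting flow in the steps above boils down to a ranked lookup: each combination of concepts maps to a default interpretation plus alternatives the player can switch to. Here is a minimal Python sketch of that idea. The `COMPOUNDS` table and the function names are hypothetical, and the entries are only the examples already mentioned (home/house, bathroom, bathe/shower).

```python
# Hypothetical ranked interpretations for concept combinations. The first
# entry is the default hint; the rest are alternatives the player can pick.
COMPOUNDS = {
    ("enclosed space",): ["home", "house"],
    ("enclosed space", "water"): ["bathroom", "bathe/shower"],
}


def hints(concepts):
    """Return every known interpretation for the selected concepts, best first."""
    return COMPOUNDS.get(tuple(concepts), [])


def default_hint(concepts):
    """The interpretation shown automatically, or None if nothing is known."""
    options = hints(concepts)
    return options[0] if options else None


def choose(concepts, wanted):
    """Pick a non-default interpretation, as when the player clicks the hint."""
    options = hints(concepts)
    if wanted not in options:
        raise ValueError(f"{wanted!r} is not a known reading of {concepts}")
    return wanted
```

With this table, `default_hint(["enclosed space", "water"])` is "bathroom", and `choose(["enclosed space", "water"], "bathe/shower")` models the player overriding it. The size of the table is exactly where the "limits will be necessary" concern comes from.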

Another possibility that may be more realistic is to generate the hinted images based on the full content of the statements made. For example, the {enclosed space} {water} combo may be assumed to be “bathroom”, but when the player follows up that statement with {myself} {feel} {dirty} (mi pilin jaki), it might show the bathroom image modified after-the-fact to one of bathing. In this way, users wouldn’t be responsible for the interpretation (it can all be automated), which would keep the work of mapping images to concepts from creeping out of scope. It also means the player has to do less in order to fully interact with the system. Users would also be able to see how their statements impact the interpretation of previous statements.

Interface Possibilities: Suggested Text Input

The text concept is very much like that of the GUI input, however all that would be displayed to the user instead is a text field. Typing into that text field would display a filtered subset of the word bank below the typed text. Things typed would be assumed to be Toki Sona words (for example, “tomo”, meaning “enclosed space”, i.e. “home”, “house”, “construction”). Players would be able to hit [Tab] to move down the hint list and hit [Enter] to auto-complete the selected word and have the image and hinted interpretation pop up. This would allow for MUCH more fluid communication once a player has pieced together the actual vocabulary of the language (you’d eventually get to the point where you wouldn’t even want/need the suggested text).
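For the programmers, the filtered word bank with [Tab] and [Enter] handling might look something like the following Python sketch. The `WORD_BANK` glosses are shortened for the example, and `filter_words` and `HintList` are hypothetical names; a real version would be wired into whatever UI toolkit the game uses.

```python
# A small sample word bank: Toki Pona words with abbreviated glosses.
WORD_BANK = {
    "tawa": "to go / moving",
    "telo": "water",
    "tomo": "enclosed space",
    "toki": "speech / language",
    "kon": "air",
}


def filter_words(typed, bank=WORD_BANK):
    """Words from the bank that start with what the player has typed so far."""
    prefix = typed.lower()
    return sorted(word for word in bank if word.startswith(prefix))


class HintList:
    """Tracks the highlighted suggestion: Tab moves down, Enter accepts."""

    def __init__(self, typed):
        self.options = filter_words(typed)
        self.index = 0

    def tab(self):
        """Move the highlight down one entry, wrapping at the end."""
        if self.options:
            self.index = (self.index + 1) % len(self.options)

    def enter(self):
        """Auto-complete the highlighted word and return it with its hint."""
        if not self.options:
            return None
        word = self.options[self.index]
        return word, WORD_BANK[word]
```

Typing "to" narrows the list to "toki" and "tomo". Starting from "t", one press of Tab followed by Enter completes "telo" and returns its hinted interpretation.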

We would also likely need both input types to have expected grammar displayed, i.e. having a big N underneath the beginning to show you need a noun for a subject. If you have a noun typed, it might suggest a compound noun or a verb, etc. All versions would also auto-edit what you have typed so that it is grammatically correct in Toki Sona (things like, [auto-inserting forgotten grammar particle here], etc.).
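Here is a deliberately tiny Python sketch of both ideas: showing the expected part of speech, and auto-inserting a forgotten particle. Since Toki Sona's grammar isn't defined yet, I'm assuming Toki Pona's rule as a stand-in: the particle "li" separates subject and predicate, except after a bare "mi" or "sina". The function names are hypothetical, and the sketch only handles one-word subjects.

```python
NO_LI_SUBJECTS = {"mi", "sina"}  # Toki Pona omits "li" after these bare subjects


def expected_slots(words):
    """Labels to display under the next empty slot of the input field."""
    if not words:
        return ["N"]        # a noun is needed for the subject
    if words[-1] == "li":
        return ["V"]        # the predicate marker must be followed by a verb
    return ["N", "V"]       # a compound noun or a verb may follow


def auto_insert_li(words):
    """Insert "li" after a one-word subject if the player left it out."""
    if len(words) < 2:
        return list(words)
    subject, rest = words[0], list(words[1:])
    if subject in NO_LI_SUBJECTS or rest[0] == "li":
        return [subject] + rest
    return [subject, "li"] + rest
```

For example, `auto_insert_li(["ona", "moku"])` becomes `["ona", "li", "moku"]`, while `["mi", "moku"]` is left alone. Multi-word subjects would need an actual parser to find where the subject ends.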

Gameplay Possibilities

Creation: One could easily envision a game where the player is capable of supplying WHAT they want to make in narrative terms and then clicking on the environment to place that thing. One could then edit the behavior of anything placed in the scene by selecting it, etc.

Simulation/Visual Novel: Something more like the Sims where the player is given a character and must direct them to do and say things to proceed. What they do and say, and to whom / where they do it may trigger changes in the other characters, the story, the environment, etc. This could naturally progress things.

Puzzle: A game where the player is given a certain number of resources (limited points with which to spend for creation, a limited vocabulary, etc.) and must solve a problem. This would be something more like a Don’t Starve or Scribblenauts variety.

Generic RPG: A regular RPG game that allows the player to pop-up a custom dialogue window for speaking purposes, but which would otherwise not rely on directing player controls through the Wyrd system.

Roleplay-Simulation: A game that directly attempts to simulate the experience of a live role-playing game. The system acts as a Game/Dungeon Master and various players can connect to the game to participate in the journey together. A top-down grid environment may show up during enemy encounters of some kind, but players would completely interact with and understand the environment based on the text/graphic output of the system. More like an old school text adventure, but hyped up to the next level.


These are just some of the ideas I’ve had for interface and gameplay using the Wyrd system. Obviously this system still needs a lot of work, but I feel there is clearly a niche market that would long for experiences like this. If you have any comments or suggestions please let me know!

I know this article was a little bit shorter / lighter than my usual fare, but I promise I will develop some more detailed content for you next time. Cheers!

Next Article: Toki Sona Implementation Quandries
Previous Article: Is “Toki Pona” Suitable for Narrative Scripting?

Minecraftian Narrative: Part 2


In the previous article, I explored the necessary elements of a “Minecraftian” game mechanic: one tailored for accessible and steady skill development, one that is equal parts editable and adaptable, visual and simple, granular and tabular.

I then addressed many issues with leveraging common languages to describe abstract concepts in this kind of mechanic. They are frequently hard to master. The Latin-based ones focus more on sounds than they do meanings. Their complexity warrants excessive processing for computer algorithms that are impractical for any imminent use on the scale with which we intend to use them. Relying on existing language saves learning time, but only for a subset of the intended audience; for others, it is an ostracizing element that comes with the expectation of translating into other existing languages to provide the same privileges to alternative audiences. It would also bias any software made against younger players with underdeveloped language skills.

Because of these considerations, we began to consider the language Toki Pona as a possible tool to adapt for narrative scripting. What are the advantages of this Simple Language? Are there any problems with it? Let’s dive in and find out.


Ideal Narrative Scripting

Let’s first review what exactly we mean by “narrative scripting”. What sorts of tasks are we actually wanting to perform with this language? We’ve already established many of the characteristics we are looking for from our Minecraft analysis, and while Toki Pona meets many of these criteria, we must also consider the actual usage environment of our target language before we can significantly evaluate the utility of Toki Pona.

Before continuing, I would also like to point out that this sort of narrative scripting is entirely distinct from the “narrative scripting” language known as Ink. Scripting languages in general are just languages that are more user-friendly and provide a more intuitive, simple interface for computer-tasks that would otherwise be fairly complex. With Ink, the goal is to inform the computer of the relationships between lines of dialogue in branching story lines. In our case, the goal is to inform the computer of the narrative concepts associated with game world objects, actions, and places so that it can 1) interpret meaning based on those associations and 2) trigger events that can be leveraged by AI characters, world controls, and human players/modders to create behavior and change the game world. We want to put this kind of control in the hands of players.

Skyrim-creator Bethesda’s “creation kit” modding tool has its own scripting language as well, to edit objects/events in the game world. It is a bit technical though.

If we truly had a narrative scripting language, then we would be able to craft, with as little vocabulary and syntactic structure as possible, a description of any narrative content. More specifically, we should be able to describe with some measure of accuracy…

  • places’ geography, geometry (both its form and absolute and relative locations), and thematic atmosphere.
  • objects’ nomenclature, physical and functional characteristics, relative purpose, effects, history, ownership, and value.
  • humans’ (and, as a categorical subset, animals’ and human-like creatures’) nomenclature, physical and emotional characteristics, relationships, state of mind, responsibilities, history and scars, beliefs, tastes, hopes and dreams, fears, biases, allegiances, knowledge, awareness, senses and observations, skills and powers.
  • concepts’ and ideologies’ subject domain, relationships, and significant details.

These qualities will allow the user to competently describe an environment and the items, creatures, and people in it. In addition, for accessibility and then functional purposes, we need the language to be useful for the following tasks: 1) scenario descriptions, 2) dialogue, and 3) computer processing. The attributes above cover the first case.

Ideally, dialogue “options” wouldn’t exist, and we would be able to directly input a writing system that the non-player characters would be able to understand without some preset arrangement of scripted responses.

As for dialogue, that means it must also be able to model questions, interjections, quotations, prepositions, nouns, adjectives, and adverbs (common syntactic structures). It must accommodate the linguistic relationships between terms and their relative priority, e.g. is X AND/OR’d with Y, are these words a noun/adjective/adverb, are they subject/object, which adjectives describe the noun more clearly, etc. We must also have some means of singling out identifiers, i.e. terms that refer to things that are not themselves a part of the language (a player’s name, for example).

Finally, we must also ensure that the language’s structural simplicity is reinforced so that its consequent processing is more easily conducted by computer algorithms. This primarily involves restricting the number of syntactic rules and the lexicon size, but also includes more subtle nuances like the number of interpreted meanings for a given set of words and the number of compound words.

To maximize the utility of the language itself, the “root” words that are individually present in the language must have the following characteristics:

  • The words must be, as much as possible, “perpenyms” of one another; that is, they must be perpendicular in meaning to their counterparts, neither synonyms – for obvious reasons – nor antonyms, to prevent you from simply saying, “not [the word]” to get another word in the language.
  • The words must have a high number of literal or metaphorical interpretive meanings laced within them to ensure a maximum number of functionally usable words per each learned word. Keep in mind, these interpretations must also be strongly linked by theme so that the words’ definitions will be easy to remember. If possible however, these multiple meanings should be individually interpretive based on context, so that one can assume in any given context one interpreted meaning vs. another somewhat clearly. The more this is supported, the less work the computer will be doing.
    • For example, in Toki Pona, the word “tawa” can mean “to go” as a verb, “mobile/moving/traveling” as an adjective, “to, until, towards, in order to” as a preposition (notice how each of those is generally applied to different prepositional objects, which helps pick out which one is being used), or even “journey/transportation/experience”, literally “the going”, as a noun. Each usage can be easily identified based on the positioning of the word in a sentence. The meanings all run along a common theme, but each also has a unique connotation that can be interpreted rather well in context.
  • The words must be highly reactive with their fellow terms so that a high number of reasonable compound words can be made. Again, the goal is to maximize the number of unique meanings we can derive from the minimum set of words to learn. This will increase the number of vocabulary terms, but in a less defined way as the minimalist nature of the language will make it favor interpretation of clear-cut meanings anyway. Therefore, alternative constructions for the same compound word concept won’t be too big of a deal.
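To show how positioning alone can pick out one of the “tawa” meanings above, here is a simplified Python sketch. The position rules are my own rough heuristic, assuming the Toki Pona particles “li” (before the predicate) and “e” (before a direct object). The function names are hypothetical, and the prepositional use is left out because it needs more than a one-word lookbehind.

```python
TAWA_MEANINGS = {
    "noun": "journey/transportation/experience",
    "verb": "to go",
    "modifier": "mobile/moving/traveling",
}


def role_at(words, index):
    """Guess a word's role from its position (a simplified heuristic)."""
    if index == 0 or words[index - 1] == "e":
        return "noun"       # head of the subject or of a direct object
    if words[index - 1] == "li":
        return "verb"       # head of the predicate
    return "modifier"       # follows another content word


def meaning_of_tawa(sentence):
    """Look up the reading of the first "tawa" in the sentence by its position."""
    words = sentence.split()
    index = words.index("tawa")
    return TAWA_MEANINGS[role_at(words, index)]
```

So `meaning_of_tawa("ona li tawa")` gives the verb reading, while in “tomo tawa li pona” the same word follows “tomo” and is read as a modifier. The computer only has to look at one neighboring word to decide, which is the kind of cheap disambiguation we want the root words to support.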

Toki Pona’s Potential

As previously mentioned, Toki Pona has many desirable characteristics that we seek in developing a narrative scripting language. Limited syntax and vocabulary clears the user-accessible and computationally efficient requirements (3). Toki Pona also does an excellent job of functioning as dialogue since it was designed from the ground up to be used conversationally (2).

The language has an exceptional potential to detail a wide variety of topics, and although it tends to be extremely vague, enough detail can be given to elucidate the general meaning of an idea. However, there are some details about the language that have the effect of restricting its potential, namely the fact that the language’s creator, Sonja Lang, designed it to address the linguistic needs of a hypothetical, simplistic, aboriginal people on an island. As such, the language is not completely designed from the ground up to account for a maximization of functional vocabulary and in fact caters to the range of topics and activities that such a people would participate in.

The 2014 guide to Toki Pona, occupying an entire word in the language.

For example, the language includes words like “pu” (meaning “the book of Toki Pona”), which is utterly meaningless for our purposes, and “alasa” (meaning “to hunt, to forage”), which fails on perpendicularity since you can easily create the same meaning with phrases like “tawa oko tan moku” (meaning “to go eyeing/looking for food”).

Also, despite how much the language does to accommodate different modes of verbiage (including past, present, future, progressive variations of each, etc.), it can be troublesome to express some necessary concepts since “wile” (the word for “want to”) itself encompasses “want to”, “must” or “have/need/required to”, and “should/ought to”, each of which is a highly distinct and significant nuance.

That said, some inventive uses of verbiage have been adopted to compensate for the lacking vocabulary. For example, a person can convey that he or she should do something by using the imperative on themselves as a statement, e.g. “mi o moku” => “Me, eat” => “I should eat”, distinct from “mi moku” => “I eat.” These intricacies must be learned on top of the vocabulary and syntax rules, though, as they are built upon usage conventions, and through their obscurity they inevitably hinder the accessibility of the language.

As such, it would appear that the most effective strategy would be to develop a derivation of Toki Pona, built on the same principles and leveraging much of the same language, but stripping it down to only the elements that are most critical for communication and plugging up gaps in linguistic coverage as much as possible.

Narrative Language: Core Concepts

While we won’t iron out the entirety of a language in one sitting, we can likely get a sense for what sorts of concepts must be included as core elements. If we limit ourselves to include 150 words (and even that is really stretching it, if we want to keep the language as effective as possible), then let’s see what ideas are really needed.

  • Pronouns
  • Parts of the body
  • Spatial reasoning, i.e. directions, orientation, positioning
  • Counting, simple math
  • Colors
  • Temporal reasoning, i.e. time (and tense) references
  • Types of living things (including references to types of people, e.g. male/female)
  • Common behaviors (supports occupations, most helping verbs, basic tasks)
  • Common prepositions
  • Elements of nature (earth, wind, water, fire/heat, light, darkness)
  • Forms of matter
  • Grammar particles (obviously taken directly from Toki Pona more or less)

We must also then include elements that are uniquely necessary to fulfill our needs of describing the nature of people and relationships.

  • Elements of perception (able to describe the “feel”, “connotation”, or “theme” of an experience)
  • Elements of relationships (same for relationships, but also able to describe the expected responsibilities. Need basic words to help illuminate expected responsibilities, e.g. Toki Pona’s “lawa” for “leader”)
  • Elements of personality

Ideally, we would be able to get many of these meanings using the same words, but just applying them to a different object. For example, if we had the word “bitter”, we could use it to describe a perception, the overall nature of a relationship (or perhaps even the feelings experienced by one party in the relationship), or someone’s personality, e.g. they are a bitter and resentful person, etc.


In our analysis of Toki Pona, we covered the fact that it has many advantages with regard to fulfilling our requirements as a narrative scripting language; namely that it can be used as dialogue and has a high number of meanings per learned vocabulary term, and therefore it is able to cover a lot of topics with little base learning time involved. However, we have also determined that the language has many flaws due to the original design purpose: meeting the linguistic needs of an isolated and tribal hunter-gatherer people. As such, there are many unnecessary terms and some concepts which simply cannot be conveyed adequately, if at all.

We can therefore state that the best course of action would be to derive a new language from the structure and vocabulary of Toki Pona, shaving away “the fat” as it were, and editing the language to bolster its linguistic breadth and depth while still staying true to its minimalist nature. To that end, we outlined several topics of vocabulary that would be essential for outlining a narrative scripting language.

In future articles, I’ll begin to address the interface we may see in a narrative scripting editor and identify actual gameplay mechanics we could see with narrative scripting put into practice. Please feel free to leave comments if you have any further insights or criticisms and stay tuned for more!

Next Article: Interface and Gameplay Possibilities
Previous Article: What is “Minecraftian Narrative”?

Minecraftian Narrative: Part 1


Minecraft’s capacity for simple, direct, consumer-level editing of the 3D world has brought about revolutions in game design and has joyously permeated industries across the world: education, design, architecture, and a whole host of other fields have felt its influence. Unfortunately, Minecraft’s revolution stops at tangible representations; while it is excellent at modeling geometry and therefore the creation of concrete forms, it is not so great at enabling the same for ideological or abstract creations. That would be a task for some sort of new narrative scripting language.

What if we could take those same properties that made Minecraft so successful, its simplicity, its capacity to empower laymen for creation, and bring it into the realm of narrative development? What if the complex realm of natural language processing could be simplified to an extreme and made interactive such that even children could create characters, worlds, stories, and watch a computer bring them to life, nurture their growth, and develop them in real time? This series aims to suggest possibilities for just such a future and the remaining hurdles that must be dealt with.

Virtualization of Character

Emerging on the horizon is our imminent mastering of the “Turing” test, devised by Alan Turing. In it, one has a programmed machine anonymously speak with a series of judges. The judges vote on whether they are conversing with a human or a machine. One passes the test if the large majority of the judges are mistaken and are unable to tell the difference between a regular human and the designed machine. A chief example of this test in action is in the creation of software applications designed to communicate online via chat rooms and social media.

A community’s tribute to the Chinese chatbot Xiaoice

The development of these “chatbots” has advanced considerably. A teenage girl variety known as Xiaoice (Shao-ice) garnered over 663,000 live conversations in merely 72 hours. People were relying on her for support, companionship, and advice. She in turn responds dynamically, realistically, yet with her own personality, moving beyond the one-way static communication of traditional media.

Children of the 90’s and later are learning to bond emotionally with virtual characters in unprecedented ways with things like virtual pets, the evolving Toys-to-Life industry, and the Japanese personified, performing voice programs called “Vocaloids” with massively popular YouTube music videos, canonical relationships, and live concerts in New York with cheering fans.

A few of the Vocaloid performers growing famous around the world.

As the popularity of virtual characters increases and their commercial appeal grows, consumers’ desire to share memories, experiences, and relationships with them will also grow. As things stand, the creation of these experiences will remain restricted to the educated and practiced experts of writing and design historically responsible for such performances.

But what if this need not be the case, for games or any other sort of media? What if characters, and the stories surrounding them were just as editable as the blocks of Minecraft? What if they were “Minecraftian”?

What Does “Minecraftian” Actually Mean?

Minecraft: a video game that revolutionized the way people interact with 3D design and architecture. Fundamentally, it’s a video game that acts as a liberating force to unleash people’s creativity and bring to life the architectural and artistic wonders locked within one’s mind. And the medium of this transformation? Blocks. A limitless, vast world whose most basic element is a cube of space that can be directly interacted with by any average player.

A sampling of the blocks players may encounter.

There are a limited number of types of blocks. Some are for grass. Some are for wood. Others represent various types of stone or fluid. Some are static and some are animated. And as one becomes familiar with the basic types of blocks, they can then “mine” blocks for materials to be used in “crafting” new blocks. They can consume them, transform them, fuse them, add them back into the world, and in so doing directly edit every detail of their environment.

Take a second to imagine the power that this game presents. It isn’t very complicated; there simply aren’t too many types of blocks to remember. You never start out dealing with things you can’t handle. You steadily advance your knowledge of blocks as you play, learning to become proficient in your renderings. Your main activities involve mining or placing objects in the world and combining materials in menus. The more you interact with the world, the more you discover how to alter things and the greater your understanding of the relationships between blocks becomes. You soon get to a point where the path to realizing your vision forms in your mind effortlessly because you’ve mastered the simple mechanics ever so quickly. Soon a masterpiece stands before you and the journey there was far easier than you ever could have imagined.

“King’s Landing”, the capital city from the TV series Game of Thrones, built with blocks by a regular Minecraft player.

Some important things to consider about Minecraft’s block-building mechanic:

  1. It’s easy to learn and requires little overall knowledge. There aren’t so many different types of materials that you couldn’t memorize a reasonable set in a few dedicated hours of play.
  2. The basic editable element of the world is highly visual and interactive. The “resolution” of things is drastically reduced, facilitating comprehension and precision with a tangible granularity.
  3. Obscurity and concepts are your friends, not fine details. The limited degree of detail permitted ensures that players need not become experts in their creative portrayals as everyone is on an equal footing, complexity-wise.
  4. The “language” of creation is fully computerized. The medium is just as prone to manipulation by algorithms as it is to the manual fine-tuning of a patient and determined player.
  5. The data is easily hack-able, mod-friendly, and adaptable. Everything exists on a simple grid, for which we have accumulated many algorithms already, i.e. there is already a vast array of knowledge for how to manipulate these data structures. Tinkerers and entrepreneurs everywhere can easily experiment and devise new ways of interacting with the system.

The Difficulties of Language

So how can we take the mechanics of Minecraft and the transformations it made to 3D design and bring them to narrative, character creation, and world-building? First step: identify our most fundamental element, our “block.” Perhaps words? After all, words – and the concepts associated with them – are what make up thoughts and ideas, right? We should just have everyone who plays our game learn the words that we use in our game/tool. Well, that certainly appears to be the solution, but unfortunately it isn’t quite so simple.

If you’ve ever tried to learn a second language, then you know it can be an arduous task, especially the further you go from your native tongue. Taxing hurdles build up one after another such as the varieties of grammar and syntax or the vast amounts of vocabulary. It can take years before one is competent enough to use a new language with any sort of astute precision. Compounding the issue is the practical element of whether or not the language can easily be comprehended, interpreted, and responded to by a computer system in real-time from multiple sources.


If our goal is to use a language that all of our players can be expected to engage with, then relying on any language as complex as English is still a disservice to the players of other cultures. It also loses much of its potential to appeal to younger audiences whose language skills may still be developing. What’s more, if we wish to make it as simple as possible for an average player to edit language contents to communicate with narrative tools and mechanics in-game, English and other common languages are too convoluted.

What we really need if we intend to apply a Minecraftian design to our linguistic mechanics is a revolutionary scripting language: something with a limited vocabulary that’s easy to learn, a simple granular syntax where you can easily pick out the parts of a sentence and the meaning therein, a language where a computer could easily understand the full breadth of its grammar at extremely efficient speeds to account for vast amounts of real-time processing. It wouldn’t have to be terribly detailed; just enough information for us to get a vague idea of what was meant. Ideally, the language would be highly visual and easily editable to appeal to children on the consumer end and hackers on the entrepreneur end.

In comes Toki Pona.

The entire Toki Pona language, minus compound words.

Toki Pona is an artificially designed language with a total of 120 basic words and particles and a syntax of a measly 10 rules. It can be learned to the point of fluency in a matter of weeks. Words can be clearly depicted as combinations of hieroglyphs that visually indicate the meaning of the word. The visual, minimalist approach of the language makes it highly accessible, adaptable, computational, universal, and overall plausible as a candidate for narrative editing, though further examination will be necessary to truly determine its utility for such a task.
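To give a feel for how little machinery such a grammar demands, here is a minimal sketch in Python. It relies on only two real Toki Pona particles: “li”, which separates the subject from the predicate, and “e”, which introduces the direct object. The function name `parse` and the dictionary layout are my own assumptions for illustration, and the sketch deliberately ignores everything else in the language (sentences without “li”, repeated particles, prepositions, and so on).

```python
def parse(sentence):
    """Split a simple 'SUBJECT li PREDICATE [e OBJECT]' sentence into parts.

    Hypothetical helper for illustration only: it handles a single "li"
    and a single "e" and nothing else.
    """
    words = sentence.lower().rstrip(".").split()
    if "li" not in words:
        raise ValueError("this sketch only handles sentences containing 'li'")

    li_index = words.index("li")
    subject = words[:li_index]
    rest = words[li_index + 1:]

    if "e" in rest:
        e_index = rest.index("e")
        predicate, obj = rest[:e_index], rest[e_index + 1:]
    else:
        predicate, obj = rest, []

    return {"subject": subject, "predicate": predicate, "object": obj}


# "jan li moku e kili": roughly "a person eats fruit".
print(parse("jan li moku e kili"))
```

Two particle lookups are enough to pick out who did what to what, which is the kind of granularity a block-like narrative editor would need.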

If we simply 1) teach a computer to understand a Toki Pona-inspired language, 2) make interactions with such a system simple, intuitive, and visual, and 3) properly introduce the use of this visual language to players, we can devise gameplay mechanics that allow players to easily interact with and/or (re)define the narrative details of their world. In the next article, I’ll go into detail on how such a system might work and how it could be used for revolutionizing game design and narrative on a fundamental scale.

Please let me know what you think in the comments. I’m eager to hear people’s thoughts on the topic!

Next Article: Is “Toki Pona” Suitable for Narrative Scripting?

Game Visions: Modeling Human Behavior and Awareness

Edit: This is now part of a series of posts exclusively about the development of a procedural narrative system.
Part 1: Game Visions: A Roleplay-Inspired Procedural Narrative System
Part 2: Game Visions: Modeling Human Behavior and Awareness

This article is part two of a series covering interactive procedural narrative. Previously, we covered some elements that could be involved in the development of a robust, open-ended middleware application that can procedurally generate the building blocks of a narrative: sequences of associations between narrative entities. You can read the original article here.

As a quick review, here are the relevant terms…

  • Narrative Entity: a Character, Item, Place, piece of Lore, AKA “thing” that is relevant to the narrative in some way.
  • Storyteller facade: decides how content is generated (including the preparation of in-game events). Acts like a dungeon master.
  • Agent facade: responsible for the decision-making of a willed Narrative Entity. Acts like a role-player.

My first thought was the simplest: a model of relationships using a graph, nodes as narrative entities, the lines between them as the relationships. However, I saw several issues with this. It’s missing information about interactions, how narrative entities engage in interactions, and how those interactions affect the entities’ relationships.

In search of a more informative model, I found myself inspired by game engines such as Unity and Unreal Engine 4. Each of them sports an entity-component system whereby all game objects are composed of concrete behaviors that define them and the “type” of an object is interpreted logically based on the combination of its behaviors.

Under this model, I can’t create a “dragon” directly. I can create a “scale-armored”, “flying”, “fire-breathing”, “animalistic” entity that “periodically attacks villages” which I simply label as a dragon. Each of those attributes can be added or removed whenever the system needs, allowing for highly fluid and flexible actors. Relationship modeling could be improved by incorporating this design.
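As a rough sketch of that idea (in Python, with every class, function, and table name invented here for illustration), an entity is nothing more than a bag of traits, and a label such as “dragon” is inferred whenever the required traits are all present:

```python
from dataclasses import dataclass, field


@dataclass
class NarrativeEntity:
    """A narrative entity defined only by the traits currently attached to it."""
    name: str
    traits: set = field(default_factory=set)

    def add(self, trait):
        self.traits.add(trait)

    def remove(self, trait):
        self.traits.discard(trait)


# Hypothetical archetype table: a label applies when all of its traits are present.
ARCHETYPES = {
    "dragon": {"scale-armored", "flying", "fire-breathing", "animalistic"},
    "bird": {"flying", "animalistic"},
}


def labels_for(entity):
    """Return every archetype label whose required traits the entity has."""
    return sorted(
        label for label, required in ARCHETYPES.items()
        if required <= entity.traits
    )


beast = NarrativeEntity(
    "village menace",
    {"scale-armored", "flying", "fire-breathing", "animalistic"},
)
print(labels_for(beast))  # ['bird', 'dragon']

beast.remove("fire-breathing")
print(labels_for(beast))  # ['bird']
```

Removing a single trait changes what the entity “is” without touching any class hierarchy, which is the fluidity I am after.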

What We Need To Model

Assuming interactions involve an owning “source” entity and a targeted “destination” entity, we must model…

  • the entities that exist in a narrative.
  • the relationships they have to each other.
  • the role a given entity has in a given relationship.
  • the possible interactions that entities in a given relationship could engage in.
  • the probable interactions a given entity in a relationship would engage in.
  • the degree to which an owning entity’s interaction meets, exceeds, or fails to meet the target entity’s expectations of the relationship (which in turn implies the effect an interaction will have on the relationship).

In addition, we want the same model to be a viable method of simulating non-interactive information such as an Agent’s awareness of its relationships to other entities. This will allow for a full social simulation, complete with the need for information-gathering and the presence of limited information, misinformation, and deception. As such, we must also model…

  • an Agent’s awareness of a narrative entity.
  • an Agent’s awareness of another Agent’s interactions.
  • an Agent’s awareness of others’ perceptions, i.e. “Do I know that you know?”

Finally, it is possible that entities can be associated with one another indirectly via space or time. It will be important for an Agent to be able to identify associations of this sort if it is to draw accurate conclusions from perceptions.

Proposed Model: Interactions

In order to handle the additional complexity of these connections, let’s imagine a new logical representation. Suppose a graph models interactive connections not as lines, but as spheres where the poles of each sphere exist at each pair of nodes’ locations. Each node represents a narrative entity. Interactions between narrative entities involve a directed connection from a source node (the owner) to a destination node (the target) that runs along the surface of a given sphere layer. Interactions can be logically associated with a relationship where each endpoint of the connection is logically associated with a role that matches the relationship (such as “father”-“son” or “master”-“apprentice”). Each role has expectations (anticipated interactions) associated with it. To develop the quality of the relationship, expected actions must be executed. The combination of all minor relationships between two narrative entities establishes the major relationship, represented by the sphere. You can picture an application that can group or color-code the arcs running along the sphere based on a suspected role or relationship perception.

Each sphere would have three layers and a core.

  • Core: the data permanently associated with the relationship. Includes a history of interactions.
  • Outer Layer: “Possible Interactions”
    • The Storyteller pulls from a database of crowd-supplied interactions (assumed to be large), filters them based on dramatic relevance to the relationship using the story’s history and the relationship’s Core, and populates this layer with a highly narrowed subset of interactions from all of the possibilities.
    • This layer ensures that the only decisions that will be available to an Agent are decisions that lead to an interaction with something dramatically relevant.
  • Middle Layer: “Expected Interactions”
    • The Storyteller populates this layer by filtering interactions in the Outer Layer based on level of expectation in the relationship.
    • Each interaction in this layer is associated with a float value from 0 to 1 indicating the degree to which it is expected.
  • Inner Layer: “Probable Interactions”
    • The Agent pulls from the Outer Layer, factors in the relationship’s Core and the traits/personality/goals/attributes of the Agent’s associated Narrative Entity (usually a Character) to determine how it wants to update the status of the relationship. The more its own goals can be furthered by maintaining a good relationship, the more likely it is to engage in actions associated with the relationship’s expectations (to preserve a good relationship). Likewise, if it does not care for the relationship, it may decide those interactions are a lower priority and act against them or disregard them entirely.
    • The “quality” of a relationship can be calculated as how closely the history of interactions matches each party’s expectations of the other (the Core mapped against the Middle Layer).
    • The Inner Layer is used exclusively as a decision-making pool for future actions, and has no direct bearing on the Outer or Middle Layers.
    • The Agent will choose from this layer a randomly selected interaction, promoting the variability of the system whilst still retaining the logical and relational consistency of the narrative.
    • The Agent’s selected interactions from this layer are recorded in the Core to be used in filtering future possible and expected interactions.
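The layers above could be sketched roughly as follows. This is only one possible reading, in Python, and several pieces are assumptions of mine rather than part of the model: the tag-overlap filter standing in for “dramatic relevance”, the 0.5 expectation threshold, and the use of the mean expectation of past interactions as the “quality” score. All names are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Interaction:
    name: str
    tags: frozenset


@dataclass
class RelationshipSphere:
    source: str
    target: str
    core: list = field(default_factory=list)      # Core: history of chosen interactions
    possible: list = field(default_factory=list)  # Outer Layer
    expected: dict = field(default_factory=dict)  # Middle Layer: name -> value in [0, 1]
    probable: list = field(default_factory=list)  # Inner Layer

    def storyteller_fill(self, database, relevant_tags, expectation):
        """Storyteller step: narrow the database, then attach expectations."""
        # Assumption: "dramatic relevance" is approximated by sharing a tag.
        self.possible = [i for i in database if i.tags & relevant_tags]
        self.expected = {
            i.name: expectation[i.name]
            for i in self.possible if i.name in expectation
        }

    def agent_fill(self, cares_about_relationship):
        """Agent step: build the decision pool from the Outer Layer."""
        if cares_about_relationship:
            # Assumption: a caring Agent keeps interactions expected at 0.5 or more.
            self.probable = [
                i for i in self.possible
                if self.expected.get(i.name, 0.0) >= 0.5
            ]
        else:
            self.probable = list(self.possible)

    def agent_act(self, rng):
        """Pick a random interaction from the Inner Layer and record it in the Core."""
        choice = rng.choice(self.probable)
        self.core.append(choice)
        return choice

    def quality(self):
        """Assumed metric: mean expectation value of the recorded history."""
        if not self.core:
            return 0.0
        return sum(self.expected.get(i.name, 0.0) for i in self.core) / len(self.core)


database = [
    Interaction("teach", frozenset({"mentor"})),
    Interaction("scold", frozenset({"mentor", "conflict"})),
    Interaction("trade", frozenset({"commerce"})),
]
sphere = RelationshipSphere("master", "apprentice")
sphere.storyteller_fill(database, {"mentor"}, {"teach": 0.9, "scold": 0.2})
sphere.agent_fill(cares_about_relationship=True)
print(sphere.agent_act(random.Random(0)).name)  # teach (the only probable interaction)
print(sphere.quality())
```

Note that the Inner Layer never writes back to the Outer or Middle Layers; only the Agent’s final choice reaches the Core.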

Interactions may also depend on other interactions. Dependent interactions can be illustrated in this model as pole-to-pole segments of the sphere that are stacked on top of each other. In this arrangement, lower-level interactions may need to be completed before higher-level ones; however, the degree of necessity may also be variable. In one case, it may be literally impossible for an interaction to take place prior to another interaction (can’t happen), whereas in other cases it may simply be that an Agent is disinclined to engage in an interaction without another interaction occurring first (not likely to happen).
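One way to capture both the “can’t happen” and the “not likely to happen” cases is a single necessity value per dependency. The sketch below is an assumption on my part, in Python with hypothetical names: a necessity of 1.0 blocks the interaction outright, while anything lower only scales down the Agent’s willingness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    prerequisite: str
    necessity: float  # 1.0 = impossible without it; below 1.0 = merely disinclined


def willingness(interaction, dependencies, history):
    """Return a multiplier in [0, 1] for how willing an Agent is to act now."""
    score = 1.0
    for dep in dependencies.get(interaction, []):
        if dep.prerequisite not in history:
            # Each unmet prerequisite shrinks willingness by its necessity.
            score *= 1.0 - dep.necessity
    return score


dependencies = {
    "propose": [Dependency("meet", 1.0), Dependency("court", 0.75)],
}

print(willingness("propose", dependencies, set()))              # 0.0
print(willingness("propose", dependencies, {"meet"}))           # 0.25
print(willingness("propose", dependencies, {"meet", "court"}))  # 1.0
```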

Proposed Model: Predictions

Because different types of interactions may be capable of fulfilling an expectation for a relationship, we can interpret an interaction as something involving inputs, outputs, and tags that help connect it to an expectation. Interactions would (perhaps only optionally) require entities with a given state as inputs and would produce as outputs a resulting (usually changed) state in one or more other entities. Relationships, and the roles composing them, are therefore functionally just an aggregation of the historical and expected interactions associated with them.

The logical assignment of roles and relationships assists in the modeling of Agent predictions. If a given Agent is to appear intelligent, it must be able to try and predict the actions of others. It would do so by analyzing perceived interactions and learning to associate them with given roles and relationships. The Agent would then use its native knowledge to examine its understanding of the expected interactions between entities with the given roles. Hopefully, it would be able to accurately identify the subsequent interactions of another Agent. These perceptions would additionally be colored by the Agent’s knowledge of the target Agent’s personality / traits / past, etc.

For example, suppose Agent A executes the interaction of pushing Agent B out of the way of an oncoming car. This interaction could be tagged by users of our application with words like “protect”, “safeguard”, and “selfless”. One relationship archetype that someone could logically assign to this kind of interaction could be a Parent-Child relationship where Agent A has the role of Parent and Agent B that of Child. Assuming people have also matched tags to the Parent-Child relationship already, the Storyteller and other Agents may begin to predict the history and expectations of A’s and B’s relationship. Those predictions may prove wrong as additional information comes to light that conflicts with that logically-assigned relationship, or future information could enhance the probability that it is an accurate assessment. Agent C, observing the interaction, could then form predictions of A’s future interactions based on the probability that it will engage in behavior that falls in line with the supposed role’s expected future actions.
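Here is a small Python sketch of the car example, under assumptions of my own: states are plain strings keyed by entity name, archetypes are just sets of crowd-supplied tags, and an archetype’s score is the fraction of observed tags it shares. None of these names or scoring rules come from an existing system.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    name: str
    inputs: dict   # entity -> state required before the interaction
    outputs: dict  # entity -> state resulting from the interaction
    tags: frozenset


def can_run(interaction, states):
    """True when every required input state is currently satisfied."""
    return all(states.get(e) == s for e, s in interaction.inputs.items())


def run(interaction, states):
    """Return a new state mapping with the interaction's outputs applied."""
    updated = dict(states)
    updated.update(interaction.outputs)
    return updated


# Hypothetical crowd-supplied tags for two relationship archetypes.
ARCHETYPE_TAGS = {
    ("Parent", "Child"): {"protect", "safeguard", "nurture", "selfless"},
    ("Rival", "Rival"): {"compete", "sabotage", "taunt"},
}


def predict_roles(observed):
    """Rank archetypes by the fraction of observed tags each one shares."""
    tags = set().union(*(i.tags for i in observed))
    if not tags:
        return []
    scores = {
        roles: len(tags & archetype) / len(tags)
        for roles, archetype in ARCHETYPE_TAGS.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


push = Interaction(
    name="push out of the way",
    inputs={"B": "in danger"},
    outputs={"B": "safe"},
    tags=frozenset({"protect", "safeguard", "selfless"}),
)

states = {"A": "alert", "B": "in danger"}
if can_run(push, states):
    states = run(push, states)
print(states["B"])            # safe
print(predict_roles([push]))  # Parent-Child ranks first with a score of 1.0
```

As Agent C observes more interactions, the list passed to `predict_roles` grows and the ranking can shift, which mirrors predictions being strengthened or overturned by new information.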

Proposed Model: Perceptions

Perceptions can be included in the model as straight-line connections from a narrative entity to either another narrative entity, an interaction between two narrative entities, or another perception line. In this way, we can indicate a narrative entity’s awareness of other entities, their interactions/potential relationships, and an observed entity’s perception of others.

It is also important that we allow inter-perception connections to stack, up to around 5 layers deep. Mind games are an important aspect of simulating realistic human interactions. I’ll demonstrate this point with an example:

  1. There exists a secret, a type of Lore seeing as how it’s factual information for the narrative universe. I know that secret (a perception line from me, the Agent, to the other narrative entity, the secret). I may now take action that others wouldn’t since they aren’t privy to the information I have.
  2. My enemy, Agent B, is aware that I know the secret. This may prompt him to try and pry the information from me.
  3. My reaction to B’s attempts may be different based on whether or not I am aware he knows I hold the secret. If I know he knows I know, then I may be more cautious of handing out any related information if I don’t want to risk him learning the secret.
  4. If B in turn is aware that I’m on to him, he may change his tactics in attempting to acquire the information he seeks.
  5. I may also be self-aware enough that I realize my awareness of him could spook him, leading me to take actions that assume he might change tactics or come at me more directly, etc.

The degree to which an Agent plays these mind games could just be a function of the Agent’s insight (accuracy of predictions) and perceptiveness (breadth/depth of environmental understanding), attributes that are variable from Agent to Agent.
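The stacking of perception lines can be sketched as a recursive structure. In the Python below, all names are hypothetical, perceptions of interactions are left out for brevity, and capping an Agent by an integer `insight_depth` is my own simplification of the insight attribute.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Entity:
    name: str


@dataclass(frozen=True)
class Perception:
    observer: Entity
    target: Union[Entity, "Perception"]  # an entity, or someone else's perception


MAX_DEPTH = 5  # the "around 5 layers" cap on nested perceptions


def depth(perception):
    """Count how many perception lines are stacked, starting at 1."""
    levels = 1
    while isinstance(perception.target, Perception):
        perception = perception.target
        levels += 1
    return levels


def describe(perception):
    """Render a nested perception as a 'knows that ...' sentence."""
    if isinstance(perception.target, Perception):
        return f"{perception.observer.name} knows that {describe(perception.target)}"
    return f"{perception.observer.name} knows {perception.target.name}"


def perceive(observer, target, insight_depth):
    """Form a perception only if it fits the global cap and the Agent's insight."""
    candidate = Perception(observer, target)
    limit = min(MAX_DEPTH, insight_depth)
    return candidate if depth(candidate) <= limit else None


a, b, secret = Entity("A"), Entity("B"), Entity("the secret")
step1 = Perception(a, secret)  # A knows the secret
step2 = Perception(b, step1)   # B knows that A knows the secret
step3 = perceive(a, step2, insight_depth=3)
print(describe(step3))  # A knows that B knows that A knows the secret
print(perceive(a, step2, insight_depth=2))  # None: too deep for this Agent
```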

For the last category of perceptions, the indirect associations between objects, we must ensure that any perception of a narrative entity or interaction is mapped within a temporal-spatial rendition of reality unique to each Agent. That is, an individual Agent must be aware of where and when perceptions were encountered in order to be able to tie together connections between seemingly unrelated elements of their reality.

For Agents to feel realistic, each one will need to have a “recall” coefficient associated with its perceptions that is a function of…

  • how long it has been since the Agent perceived it.
  • how strongly associated the Agent believes the perceived entity is to the Agent’s interests.
    • If the Agent’s simulation of others’ interactions leads it to believe another Agent is highly relevant to an object of its concern (regardless of whether the target Agent is or is not, in fact, relevant), that should affect how accurately the Agent perceives and/or recalls details associated with that target Agent. Consider the example of an obsessed detective tracking the subtle details of an old cold case suspect.
  • a generalized recognition attribute associated with the Agent directly (how good are they at remembering things in general?).
  • (optionally) how close the perceived entity was to an Agent’s focal point of attention.
  • (optionally) what method the Agent used to perceive it (sight, hearing, etc.)
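As one possible shape for such a coefficient, here is a Python sketch covering the first three factors. The formula is purely an assumption for illustration: exponential decay over time, a half-life that stretches as perceived relevance rises, and the Agent’s general memory attribute as a ceiling. The function and parameter names are hypothetical, and the two optional factors are omitted.

```python
def recall(elapsed, relevance, memory, half_life=10.0):
    """Return a recall coefficient in [0, 1].

    elapsed:   time since the Agent perceived the entity (same unit as half_life)
    relevance: how relevant the Agent believes the entity is, in [0, 1]
    memory:    the Agent's general ability to remember things, in [0, 1]
    """
    # Assumption: a fully relevant perception fades half as fast.
    effective_half_life = half_life * (1.0 + relevance)
    decay = 0.5 ** (elapsed / effective_half_life)
    return memory * decay


print(recall(elapsed=0, relevance=0.0, memory=1.0))   # 1.0
print(recall(elapsed=10, relevance=0.0, memory=1.0))  # 0.5
print(recall(elapsed=20, relevance=1.0, memory=1.0))  # 0.5
```

The obsessed detective corresponds to a relevance near 1.0: the same details that a bystander has half forgotten after 10 time units take 20 to fade for him.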


Any model we use to simulate human relationships will best be served by analyzing the types of interactions that exist between humans and the potential effects of those interactions on relationships. It will also need to accurately simulate an Agent’s perception of its environment and the Agent’s formulation of predictions both regarding static surroundings and the behavior of other Agents.

The model I have proposed incorporates a combination of tri-layered sphere-connections to model interactions and traditional line-connections to model perceptions. Relationships and roles are assigned purely through logical interpretations of interactions, providing a highly robust and flexible presentation of the data linking narrative entities.

Hope you found this model useful and/or illuminating. Comments and suggestions are, as always, gladly accepted.

Game Visions: Ultrahaptics & the AR/VR Sensory Evolution

Developers will soon be able to write applications that enable a brain to learn custom information about any exposed perceptions, both real and virtual. Powering this functionality is a combination of technologies centered around the advent of Ultrahaptics and David Eagleman’s VEST concept. Ultrahaptics is a UK-based startup focused on using ultrasound technology to generate in-air tactile sensations. The VEST is a wearable technology that uses vibrations to physically feed digital information to one’s brain. Together, these technologies will revolutionize human-computer interactions completely.

We humans digitized physical data. We then integrated that digital data into almost every type of physical device. Now we are starting to introduce our digital products as physical elements of the world using holograms with technology such as Microsoft’s Hololens.

The next step is for information itself to become a physical entity by applying the motivations behind Eagleman’s VEST: feeding real-time data streams into interpreted sensory streams. Machine learning and data mining algorithms nowadays can derive valuable information from massive amounts of aggregate data. Computing these loads rapidly requires massive computing power, something not readily available on today’s smartphones. Many are shifting this load onto the Cloud, but another novel platform is available. After all, every one of us travels around with a free supercomputer in our head. Our brain can easily learn to use any information we feed it. All it takes is the right delivery system.

The concept:

Eagleman’s prototype device:

What this really means is that any kind of blanket sensory information that could otherwise be interpreted by a machine learning algorithm can in fact be “learned” by our own bodies. All that is needed are custom data feeds to pour into the sensory stream, and the sensory information itself need not only be vibrations.

Here’s an example of how people can gather data from video.

The sensory “streams” can be converted into other types of “streams” as needed, just like other types of computer data.

All manner of mixed reality technologies could facilitate the provision of custom sensory information. If people intend to “train” themselves in how to use a custom sense, it is likely that future applications will involve simulated environments that people interact with to practice a given sense, i.e. gamified training software.

The applications of this technology are more “limitless” than even the AR/VR excitement in 2016. While the technology is still down the road a ways, people should be keeping an eye out for the rise of tactile-feedback hardware, if only for the ramifications it has for bringing about a sensory evolution.