Godot’s Node System, Part 2: Framework Concepts

This article is part of a series.
Part 1: An OOP Overview
Part 2: Framework Concepts
Part 3: Engine Comparisons


While the last article grounded us in basic OOP concepts, this one dives into common features of frameworks, i.e. collections of object relationships. A game framework does not require these ideas in order to qualify as a framework. Instead, they paint a picture of what one can expect from the game frameworks of today’s most popular game engines.



Some engines use Entity-Component (EC) systems in their frameworks. Because EC systems are becoming more prevalent, we will assume their presence in our overview of game framework concepts. If an engine doesn’t have an EC system, then chances are these concepts still apply, but the exact terminology and application will differ slightly.

Pretty much all engines have users create some “space” in which to create content. While the name used for these spaces may vary, we’ll simply say “level”. Users create a level with an environment paired with a list of entities. Users can edit the environment or place entities to customize their game content.

But, these game engines don’t need any entities in a level to run it. An empty level will execute just as cleanly as a populated one. You usually need a Camera entity to see or hear anything, but the editors will still run the level without it.

The entities can own other entities as children (tree structures), so users can organize things. Each entity also supports ownership of components, but the components don’t own things. The entity maintains all ownership as the user organizes its internal components.



But what if a user wanted to define an entity with components that they will use many times? What if they could describe an entity with components, save it to a file, and then load it up whenever they wanted?

Well, first they’d need some automated way of converting the entity and all its components into a file format (usually binary, i.e. 0s and 1s). And thus the engines added Serialization for their entities. With a simple operation, users can take a built entity and save it for later use.

Devs do this by storing information about each property, usually its name and value. They first specify some means of formatting that information (text data? binary data?) and then write it directly to a file. They do this for the entity, all of its components, and then each of its sub-entities and their components.

When devs need to create the entity hierarchy again, they load up the file. They determine how each entity is related, which components they have, and which properties should have what values. Then, they create those entities and components with the associated properties, effectively reproducing the stored entity.

In no particular framework, here’s an example of how this might be done in C++.

int main() {
    // We will record this integer value under the name "count"
    int count = 250;
    File f;
    // Create and open a file called "stats.txt"
    bool is_open = f.open("stats.txt", File::WRITE);
    if (!is_open) {
        return ERR_COULD_NOT_OPEN_FILE; //report failure
    }
    f.write_str("count"); // record the name to the File's buffer
    f.write_int(count);   // followed by the value.
    //the file will record how many bytes the integer takes up as well
    f.save(); // the data from the buffer has now been written to disk
    f.seek(0); // move back to the beginning of the file
    String name = f.read_str(); //fetch the name of the data
    if (name == "count") {      //in this case...
        count = f.read_int(); //load the value into our variable
    }
    //we have now de-serialized the "count" variable
    f.close(); //terminate the file connection to release OS resources
    return 0;
}



Users can now create many copies of this entity all throughout their project. But what if they need to make a change to each one? They’d need some way of editing one and having that change applied to all related entities.

Well, developers found a solution for that too. When users create the entity, the entity remembers which file the user deserialized it from. If all the entities use the file as a guide for how to exist, then editing the file can change all the entities at once.

Creating these entities from the serialized type is called “instancing”. Here’s another example of how this works in C++, using no specific framework.

int main() {
    //setup types and data
    Entity e1;
    Entity e2;
    CountComponent cc;
    cc.count = 250;

    //give each Entity a copy of the component
    e1.add_component(cc);
    e2.add_component(cc);
    //create a hierarchy of entities
    e1.add_child(e2);

    //save the hierarchy to a file.
    Serializer::save(e1, "entity.dat");
    //load the file
    SerializedEntity se;
    se.load("entity.dat");

    //Create a few instances.
    //Each is a copy of the e1 hierarchy: e1 and e2, each with a CountComponent.
    Entity e3 = se.instance(); //remembers "entity.dat" created it
    Entity e4 = se.instance(); //same

    //make sure one of them is overriding the default value
    e4.get_component<CountComponent>().count = 400;
    //update the original data
    e1.get_component<CountComponent>().count = 300;

    //re-serialize it
    Serializer::save(e1, "entity.dat");
    //de-serialize the data again.
    se.load("entity.dat");
    //another possibility is that the framework detects that
    //"entity.dat" has changed and updates se automatically.

    //propagate the data to all instances.
    //Will create an instance of the data and then
    //diff all components and fields against each instance.
    //If the value is not overridden, go ahead and reset it.
    //Again, the framework may automatically do this.
    se.propagate(); //via Observer pattern, explained later.

    print(e3.get_component<CountComponent>().count); //prints reloaded 300
    print(e4.get_component<CountComponent>().count); //prints individually overridden 400

    e4.reset(); //framework may provide a means of resetting.
    print(e4.get_component<CountComponent>().count); //prints the reset value of 300
    return 0;
}

So now we can make edits to one thing and have those changes propagate to other instances. Sounds a little like inheritance, right? Well, it is an extension of the inheritance concept, but this time, it’s a little different.



Developers have a couple ways to design inheritance. They can build objects from an idea, i.e. “class” (which we covered earlier), or they can create an object and have the instanced copy look to it for guidance on how to exist. This difference is one of class vs. prototypal inheritance.

The key distinction is in what the objects look to for guidance. If a user requests a property from an object, the object has to check whether it has the property.

Class-based objects don’t define properties. Instead, they check their class and its inherited classes (which again, are just ideas). If any of those classes have the property, then the object will too. Users usually define classes in the beginning and can’t edit them as the program executes.

A Prototype-based object can define its own properties. But it also has an inherited prototype: another object. If the current object can’t find a requested property, it asks its prototype. If the prototype object has that property, then so does the current one. Continued failures move the lookup up the prototype chain until it reaches the base prototype (Delegation).

The trick here is that prototypes aren’t ideas. They are fully-fledged objects. If the prototypes gain new properties during execution, all their derived types do too. What’s more, a prototype can change at run-time, meaning inheritance is much more fluid. An object may be a drop-in replacement for one type one moment and another type the next moment.

I won’t be doing a code example of Prototype logic. Ample examples of that can be found everywhere since JavaScript, an extremely popular language, is built upon prototypal inheritance.



Devs often need to send data and execution instructions over a network. Most frameworks give people low-level controls for reading and writing network data. Writing involves sending network data to particular IP addresses and ports. Reading likewise involves listening for network data on particular ports.

Developers have several options when it comes to how to use or implement networking.

  • They can send network data on reliable, slow connections with TCP or on unreliable, fast connections with UDP (TCP vs UDP).
  • They can visualize the network data as individual messages (packets) or as a stream of data (WebSockets) (Packets vs. WebSockets).
  • They can have each application fully aware of the game state (peer-to-peer) or choose a single entity to be the authority on the game state (client-server) (P2P vs. client-server). Note that when the authority is not one of the players, devs call it a “dedicated server.”

Managing these relationships can be incredibly complex. Low-level controls like that are cumbersome for simpler, more common behaviors, such as…

  • Updating a local variable and synchronizing that update on the remote computers.
  • Calling a function locally and replicating that call on the remote computers.
  • Notifying the game state of a change, but not needing to notify other players, e.g. pausing.

Tasks such as these should be more accessible to developers. To simplify things, these frameworks can also define a high-level networking API (HLAPI). It generally creates utilities for client-server designs:

  • It assists in creating, managing, and destroying player connections.
  • It verifies which connected player is the authority.
  • It can test whether the currently executing context is the authority (to create client- or server-specific methods).
  • It enables users to mark which properties/methods the framework should replicate over the network automatically, to whom, and how.

A game engine’s editor usually has settings in its GUI that enable one to trigger different aspects of the HLAPI with regard to particular variables or methods.

As you can tell, networking is a deep topic with many associated technologies. As such, a concrete example of its usage is difficult. But, you might see something like this at the low level…

int main() {
    Socket s;
    int count = 250;
    String ip_address = ""; //the target machine's IP address
    int port = 9000;
    //send the "count" value to the machine at IP address "ip_address"
    //and send the data to the "port" port on that machine.
    s.send_int(ip_address, port, count);
    //Now listen on the same port for a confirmation of arrival from
    //the other machine.
    while (s.listen(port)) {
        //maybe s.listen() has logic for timing out & returning false
    }
    return 0;
}

…and something like this at the high level (but still in engine code, sort of; it’s not a great example, but it’s short and requires little explanation to understand).

int main() {
    Integer count;
    Game game;
    //declares that "count" is a variable in the game w/ value 5
    //Also declares that it should synchronize the value to all players
    //All players now know that "count" exists on each machine.
    game.declare_int("count", &count, 5, REPLICATE_MODE::SYNC);
    //The Integer class is handling logic to propagate the new
    //value to other machines automatically through Game's HLAPI
    count.value = 6;
    return 0;
}


So, users want to maintain a loose coupling, but they also want to connect behaviors. By separation of concerns, an owned object should never know the details of its parent or siblings. But what if the child does something, and the parent or siblings need to react? What’s the best way for it to notify the others?

A poor implementation might be for it to directly access the parent, and through it, the siblings, to call a method on them. That creates a tight coupling though, as the notifier now has to know the type that it notifies. It also forms a dependency whereby it can only exist in a parent that also has the required sibling.

To prevent this, devs usually make an event system available to the framework’s users. The parent is already aware of both siblings’ existence and types. It also knows that the sibling will need to react.

Instead of the notifier taking matters into its own hands, the parent simply tells the sibling to listen for the event on the notifier. The notifier can then emit the event. Because the sibling was listening for it, it will react. Neither the notifier nor the sibling knows about the other. The parent orchestrates everything from behind the scenes. Devs call this design pattern the Observer pattern.

Almost every gaming framework has an event system of this nature. Some even have more than one for different purposes (input, collision detection, user-defined, etc.).

JavaScript has several examples of the Observer pattern since it is also built upon it, via EventListeners. This pattern can be found in virtually every language. We’ll cover its usage in modern game engines and their languages later.



Now users may start creating a huge number of entities in their games. What if they just need a certain group of them? What if they just need one?

It would be inefficient to cycle through all existing entities and compare their IDs to a list. Instead, it might be a good idea to set up a cache of entities that fit into particular categories. Looking through a list of 1 or 10 entities is a lot faster than looking through 1,000 or 100,000.

Well, devs decided to allow users to create these caches at will. To create one, users give it a name and add entities to the cache. Users can then perform lookups for these names, a.k.a. “tags,” to find entities, or they can check which tags an entity has.

Here’s another simple example in non-specific C++ code.

int main() {
    Game game;
    PlayerEntity pe;
    game.add_entity(pe); //add the player
    pe.add_tag("player"); //register the player under the "player" tag
    //add 1000 miscellaneous entities
    for (int i = 0; i < 1000; i++) {
        Entity e;
        game.add_entity(e);
    }
    EntityArray arr = game.get_entities_by_tag("player");
    print(arr.size()); //prints 1
    TagArray arr2 = pe.get_tags();
    print(arr2); //prints ["player"]
    return 0;
}



Many more features are shared between engines, but these are the ones highlighted here. You should by now have an idea of what to expect in today’s popular game engines, as well as why those features are available in the first place.

Again, we’re only touching the surface here, and we could never go into enough detail on even one of these topics (or sub-topics) in a single article. Feel free to look further into these concepts (especially other “design patterns” like the Observer concept above). There’s a nearly endless amount of material ripe for learning.

If you feel like you have a general grasp of these concepts, then you should be ready for us to talk about how some of today’s engines actually incorporate them. The next article will explain how Unity, Unreal, and Godot enable these designs for their users.

If you liked the article, or if something was confusing or off-sounding, please let me know! I’d love to hear from you in the comments. I do my best to ensure each article is updated as conversations progress. Cheers!


Godot’s Node System, Part 1: An OOP Overview

This article is part of a series.
Part 1: An OOP Overview
Part 2: Framework Concepts
Part 3: Engine Comparisons


My last post reviewed Godot Engine. It’s a strong open source competitor to Unity and Unreal Engine 4, among others. While I covered each engine’s scripting frameworks already, I wish to do that analysis more justice.

Newcomers to Godot are often confused about how to approach Godot’s nodes and scenes. This is usually because they are so accustomed to the way objects in the other engines work.

This article is the first in a 3-part series in which we examine the frameworks of Unity, Unreal Engine 4, and Godot Engine. By examining how Unreal’s and Unity’s frameworks compare, readers may be able to ease into Godot better. Readers should come away understanding Godot’s feature parity with the other engines’ frameworks.

Discussing the engine comparisons requires prior understanding of basic concepts, first with programming and then with game frameworks. The first two articles each respectively cover those topics. If you feel you are already familiar with a given topic, feel free to skip to the next article. Let’s go!


What is OOP?

Game engines tend to define a particular “framework”: a collection of objects with relationships to each other. Each framework empowers developers to structure their projects how they want. The more a framework enables the developers to organize things as desired, the better it is.

To do so, they use Object-Oriented Programming (OOP) whereby users define groups of related variables and functions. Each group is usually called a “class”, an abstract concept. In the context of a class, the variables are “properties” and the functions are “methods”. Creating a manifestation of a class gives you an object, a.k.a. an “instance” of the class.

OOP has several significant elements to it. The comparisons in the later articles draw on each of them. Therefore, it is imperative that we dive into them now.


The Big 3 of OOP

Devs want users to know how to use a class, but not how it operates. For example, if someone drives a car, they don’t need to know the complexity of how the car drives. Only how to instruct it to drive. Driving might involve several other parts. It might involve data and behaviors the user knows nothing about. The car abstracts away the complexity of the task (Abstraction).

Below is a simple GDScript example of Abstraction. The user is really dealing with a complex collection of data: length, width, and area. But certain rules apply. The area is not editable, and its value changes as length and width change. Also, the user needs to refer to the length and width together. The dev has made it possible for the user to organize a complex concept into a single logical unit.

# rectangle.gd
extends Reference
var length = 1
var width = 1
func get_area():
    return length * width

Users want to edit data or “state”. Devs don’t want users to access the data though. They instead opt to enable users to issue instructions on how to change a class’s data. Devs can then change the data’s structure without affecting the program’s logic. For example, the user can explain that they want to “add an Apple” to the Basket, but how the Apple is stored relative to the Basket is the devs’ concern, not the user’s. Devs should also be able to control the visibility of data to other devs (Encapsulation – Definition 2).

This type of Encapsulation guarantees flexibility of data structure. For example, what if the user attempts to set the length directly?

# main.gd
extends Node
func _ready():
    var r = preload("rectangle.gd").new()
    r.length = 5

Well then, what if we later decide to store the data as a Vector2? That is, a struct containing two floats together? Each Vector2 has an “x” and a “y”.

# rectangle.gd
extends Reference
var vec = Vector2(1, 1)
func get_area():
    return vec.x * vec.y

Well with the change, “length” is no longer a property. The user now has to search their entire code base for every reference to r.length and change it to r.vec.x. What if instead, we encapsulated the data behind instructions to change the data, as methods? Then…

# rectangle.gd
extends Reference
var _length = 1 setget set_length, get_length
var _width = 1 setget set_width, get_width
func get_length():
    return _length
func get_width():
    return _width
func set_length(p_value):
    _length = p_value
func set_width(p_value):
    _width = p_value
func get_area():
    return _length * _width

# main.gd
extends Node
func _ready():
    var r = preload("rectangle.gd").new()
    r.set_length(5) # data change is encapsulated by class's method.
    r._length = 5 # in GDScript, the setget keyword forces external access to call the method rather than touching the variable directly.

Now, if we want to change the data, we don’t need to modify the program. Either way, we are only calling set_length(5). Only the object, a self-contained area, must be modified.

# rectangle.gd
extends Reference
var _vec = Vector2(1, 1) setget set_vec, get_vec
func set_vec(p_value):
    pass # block people from setting it directly (if desired)
func get_vec():
    pass # block people from accessing it (if desired)
func get_length():
    return _vec.x
func get_width():
    return _vec.y
func set_length(p_value):
    _vec.x = p_value
func set_width(p_value):
    _vec.y = p_value
func get_area():
    return _vec.x * _vec.y

Users want to refer to a specialized class and a basic class with a common vocabulary. Let’s say you create a Square, which is a Rectangle. You should be able to use a Square anywhere you can use a Rectangle. It has the same properties and methods as a Rectangle. But its specialized behavior sets it apart (Inheritance).

The rule of a Square is that it is a Rectangle, but its sides are equal at all times. For this reason, the length and width are both referred to as its “extent.” Let’s change its instructions to maintain that rule.

# square.gd
extends "rectangle.gd"

# first, we setup the new rule
func set_extent(p_value):
    _vec = Vector2(p_value, p_value)

# Then, we override the original setter instructions.
# Now, only the Square's actions will execute
func set_length(p_value):
    set_extent(p_value)
func set_width(p_value):
    set_extent(p_value)

# main.gd
extends Node
func _ready():
    var s = preload("square.gd").new()
    s.set_length(5) # sets both length and width to 5
    print(s.get_area()) # prints 25

Note how Square explicitly creates a specialized version of the set_length() method. It keeps the width equal to the length. The Square can also still use get_area(), even though it doesn’t define it. That is because it is a Rectangle, and Rectangle does define it.

Now, specialized classes may or may not have read or write access to all the base properties and methods. GDScript doesn’t support this type of Encapsulation, but C++ does. C++ has what it calls “access modifiers” that determine what can only be seen by the current class (“private”), what specialized types can also see (“protected”), and what can be seen by all objects (“public”). This controlled access within a class allows for content to be encapsulated within particular layers of inheritance hierarchies (Encapsulation – Definition 1).

class Rectangle {
private:
    int _area;
    bool _area_dirty;
    void _update_area() {
        _area = _length * _width;
        _area_dirty = false;
    }
protected:
    int _length;
    int _width;
public:
    int get_area() {
        if (_area_dirty) {
            _update_area();
        }
        return _area;
    }
};

class Square : public Rectangle {
public:
    void set_extent(int p_value) {
        _length = p_value;
        _width = p_value;
    }
};

While the area content is locked to the Rectangle class (Square has no need for changing how area works), the Square still needs access to _length and _width in order to assert equivalent values for them. Square can only reference “protected” data in its set_extent() method, so area is inaccessible. This encapsulates the “area” concept safely within the Rectangle type.

In cases where you don’t want even the specialized classes to see or access data, access modifiers of this kind can be very useful in assisting the encapsulation, i.e. “hiding” of data from other parties.


Advanced Concerns

Developers want to minimize how much they need to change the code. How? Replace instructional changes with execution changes where possible. The instructions on how to use an object are its “interface.” The way an object performs an instruction is its “implementation.” Devs refer to a program or library’s public-facing interface as its Application Programming Interface (API) (Interface vs Implementation).

You saw an example of the need for maintaining an interface with the first Encapsulation: if the user calls set_length() in all cases, then they don’t need to concern themselves with how it is performing its operations to change the data. An API is more generic though. If a user creates another object that also has a set_length() method, then the objects match the same API.

Some languages, like GDScript, use duck typing, and will allow the user to call the method simply because it exists when requested. Statically typed languages like C++ and C# instead require the user to convert the object into a type that strictly defines the needed function (C# supports interfaces directly; C++ approximates them).

A specialized class should execute its behavior in place of the base class’s where applicable. For example, let’s say Animal has specialized classes Dog, Monkey, and Human. If Animal has an “isHumanoid” method, each of them should have a specialized response to it (false, true, true). Accessing the “isHumanoid” method should always call the specialized response if possible (Polymorphism).

Below is an example of Polymorphism in GDScript. It outlines how the user can request the same information from different objects, but each one, depending on its type, provides a specialized response. There is no risk of the created WindowsOS somehow printing an empty line. Because Mac and Linux inherit from Unix, they inherit a polymorphic response, i.e. a response unique from the Windows response.

# main.gd
extends Node

class OpSystem:
    func get_filepath_delimiter():
        return ""
class WindowsOS:
    extends OpSystem
    func get_filepath_delimiter():
        return "\\" # the first backslash is "escaping" the second
class UnixOS:
    extends OpSystem
    func get_filepath_delimiter():
        return "/"
class MacOS:
    extends UnixOS
class LinuxOS:
    extends UnixOS

func _ready():
    print(WindowsOS.new().get_filepath_delimiter()) # prints "\"
    print(UnixOS.new().get_filepath_delimiter())    # prints "/"
    print(MacOS.new().get_filepath_delimiter())     # prints "/" (inherited polymorphic response)

To avoid over-complicating specializations, it can be useful to establish ownership between classes. If one class owns another class, then an instance of the second may only belong to a single instance of the first. Using ownership in place of specialization provides many advantages (Aggregation):

  • It is easy to change a class’s ownership from a second class to a third class. It is difficult to change a single class’s specialization to another specialization (Loose Coupling).
  • Favoring Aggregation guarantees protection of the owned class’s data. This supports Encapsulation by ensuring that the class only has to worry about its own data and implementation (Separation of Concerns).
  • Aggregation abstracts away the complexity of using several classes at once. Inheritance abstracts away only the usage details of a single class. Inheritance has a deeper effectiveness for abstracting a single class’s complexity. But the tight coupling inheritance generates is often not beneficial in the long run when Aggregation is an option (stronger Abstraction).

In some cases, developers may want a class to rely on another class for its very existence. One class owns the other, but it is stronger than Aggregation. The owner creates the owned object directly, and if the owner dies, so does the owned object (Composition).

For example, computers and software have this relationship. Turning on the computer starts the software. Shutting it off kills the software. The software has a strong dependency on the computer. It “composes” the computer.

Below is a GDScript example of these concepts.

# main.gd
# Assuming a scene structure like so...
# - Node2D "main"
# - - Node2D "child1"
# - - - Sprite
# - - Node2D "child2"
# - - - Sprite
extends Node2D
func _ready():
    var texture = preload("icon.png")
    $child1/Sprite.texture = texture
    $child2/Sprite.texture = texture

Now, a key feature here is that each “child” isn’t a single object that has a texture property with logic to manipulate images. Instead, all of that is self-contained within a Sprite. The “child” nodes are free to have a variety of other features, but the Sprite tidbit is narrowly confined to that one node. This illustrates separation of concerns.

Because a node can be added and removed easily, it also shows loose coupling. Technically speaking, we could place another node of a different type where the sprites are, give them the name “Sprite”, and make sure they have a property called “texture”. If that were the case, the program would execute the same way. It isn’t tied strictly to the use of the Sprite node exactly.

In this example, the texture asset is aggregated while each Sprite node that uses it composes its parent “child” node. If the user deletes child1, its child Sprite will die with it. This is because of the Sprite’s compositional relationship to the parent node. But that doesn’t mean the texture is unloaded from memory, i.e. the image is not deleted. It is still in use by child2’s Sprite.

In this way, each Sprite “owns” a reference to the texture, but the texture doesn’t depend on any particular Sprite to exist. The Sprites do depend on a particular node (their parent) to continue existing though. That is composition.


The Birth of Entity-Component Systems

In game development, relying on inheritance alone is messy. Combining inheritance with aggregation and composition gives developers great flexibility.

Developers often want to describe game concepts as a set of attributes and behaviors. In the early days, devs took a naive approach to defining these qualities. They might define a basic object and then ever more specialized objects for all purposes of game development.

That quickly became too difficult to manage though. If two specialized classes need common behavior, devs had to give it to a basic class. The “basic” classes soon became too complex to maintain.

# main.gd
extends Node
class Bird:
    func fly():
        # REALLY complex flying logic here. Expensive to reproduce and maintain
        pass
class Camera:
    extends Bird
    func record():
        # REALLY complex recording logic here. Same story.
        pass

Let’s say there is functionality in a class somewhere that could be really useful somewhere else. The Camera needs the movement logic to fly smoothly through the environment. Well, if that logic has already been written in the Bird class, then it can make the job a lot easier to just have the camera extend the Bird class and get the flying logic for free. Then the user can just add the camera logic.

But that poses a significant problem. If the camera is a bird, it suddenly also has a lot of unrelated bird functionality too: feathers, a bone structure, animations, chirping logic, wings, etc. And what if the user creates different types of cameras? What if the user makes a TrackCamera that moves on a track that doesn’t even require free movement like the bird provides? In order to get the Camera logic, the TrackCamera now has to carry all of this unnecessary Bird logic!

To solve this, developers created a new technique: create an empty container of attributes and behaviors. Then add only what you need! Devs call these empty containers “entities”. The attributes and behaviors are then grouped into “components” and added to an entity. Destroying the entity also destroys the components (Entity-Component technique).

Through composition, devs could finally design their objects with only what they need. Nothing more. If the demands of an entity change, you add and/or remove a component. If the nature of a particular component needs to change, no problem! Swap out the component with another one that meets the same interface, but provides a new implementation.

Note that devs often refer to the EC technique as ECS, which adds “System” to it. Systems simply allow one to perform an operation on all entities with a given set of components. However, that detail is not relevant to this discussion.



Hopefully that wasn’t too much for your brain! By now you should have an introductory understanding of Object-Oriented Programming and its applications in developing software. There are many more, far more detailed explanations for each of the reviewed concepts, and the links provided are simply a way to get you started.

The keywords mentioned should give you the means to find more information if you need a better understanding. Once you feel comfortable with the concepts of inheritance, ownership, and other programming principles, dive into the next topic: today’s gaming framework designs.

As always, please comment with any feedback you have, especially if something is in error or confusing. I frequently update my work over time to try to maintain its integrity.

Godot: the Game-Changer for GameDevs

Edit 1: Updated Graphics Comparison section with before/after shots of Godot and accuracy corrections.

Edit 2: There was some confusion over whether the Deponia devs used Godot for their PS4 port. I had removed that content, but they have now officially confirmed that it WAS in fact a Godot-powered PS4 port. Links in the Publishing section.

Edit 3: I’ve received questions regarding the preference for a 2D renderer option, so I’ve added a paragraph explaining my motivations in the Graphics section.

Edit 4: Godot lead developer Juan Linietsky mentioned in a comment a few points of advancement present in the new 3D renderer. The Graphics section has been updated with a link to his comment.

Edit 5: I have personally confirmed that the GDNative C++ scripting on Windows 10 x64 is now fully functional in a VS2015 project. Updating the Scripting section.

Edit 6: I have received new information regarding the relative performance of future Godot scripting options. I have therefore updated the scripting section.

Edit 7: Unity Technologies announced that they are phasing out UnityScript. Updating the Scripting section. Also, making a correction to Unreal’s 2D renderer description in Graphics.

Edit 8: Unreal recently added C# as a scripting language. I’ve updated the Scripting section accordingly.

Edit 9: More “official” blog posts / tutorials have been made explaining other aspects of the Godot Engine, so I am including links to those where appropriate.

Edit 10: Adding a GM:S API diff link.

Edit 11: Including addendums to each section for GameMaker: Studio, because I frequently see people despairing that this article doesn’t include it in its comparisons.


I’ve been tinkering with game development for around 5 years now. I’ve worked with GameMaker: Studio (1.x, though I’ve seen 2.0), Unity, and Unreal Engine 4 along with some exploration of Phaser, Construct 2, and custom C++ engine development (SFML, SFGUI, EntityX). Through my journey, I’ve struggled to find an engine that has just the right qualities, including…

  • Graphics:
    • Dedicated 2D renderer
    • Dedicated 3D renderer
  • Accessibility:
    • Powerful scripting capabilities
    • Strong community
    • Good documentation
    • Preferred: visual scripting (for simpler designer/artist/writer tooling)
    • Preferred: Simple and intuitive scripting architecture
  • Publishing:
    • A large variety of cross-platform support (non-negotiable)
    • Continuously expanding/improving cross-platform support (non-negotiable)
  • Cost:
    • Low monetary cost
    • Indie-friendly licensing options
  • Customization:
    • Ease of extending the editor for custom tool creation
    • Capacity for powerful optimizations (likely via C++)
    • Preferred: open source
    • Preferred: ease of editing the engine itself for community bug fixes / enhancements

For each associated field, I’ll examine various features in the top engines and include a comparison with the new contender on the block: Godot Engine, a community-driven, free and open source engine that is beginning to expand its graphics rendering capabilities. Note that my comments on the Godot Engine will be in reference to the newest Godot 3.0 pre-alpha currently in development and nearing completion.

Initial Filtering Caveats

Outside of the big 3, i.e. GM:S, Unity and UE4, everything fails on the publishing criteria alone since any custom or web-based engine isn’t going to be easily or efficiently optimized for publishing to other platforms.

The new GitHub for Desktop app is an example of Electron (web browser dev-tools on right).

It is true that technologies such as Electron make it possible to port HTML5 projects to desktop and mobile applications, but those will inherently suffer limitations in ways natively low-level engines will not. HTML5 games will always need to rely on 3rd-party conversion tools in order to reach additional platforms. If we wish to avoid that limitation, then that leaves us with engines written in C++ that allow for scripting languages and optimizations of some sort.

GM:S’s scripting system is more oriented towards beginners and doesn’t have quite the same flexibility that C# (Unity) or Blueprint (UE4) has. In addition, GM:S has no capacity for extending the functionality of the engine or optimizing code. The latest version has a $100 buy-in (or a severely handicapped free version). Without a reasonable free-to-use option available in addition to all of the other issues, GM:S fails to meet our constraints. That leaves only Unity, Unreal, and Godot.

Edit: for people who decide to try Godot, someone has started a repository for collecting API differences between Godot and GM:S.

Edit: Due to comments I’ve received over the past several months, I will add sections covering GameMaker: Studio. Note that while I have first-hand experience with version 1, I’ve only watched some comparison videos for version 2, so it is possible that I may have missed something.

Graphics Comparisons

There is no doubt that Unreal Engine 4 is currently the reigning champion when it comes to graphical power with Unity coming in at a close second. In regards to 2D renderers, it should be noted that…

  1. Unity has no dedicated 2D renderer; selecting a 2D environment just locks the z-axis and uses an orthographic view of the 3D cameras in the 3D environment.
  2. Unreal’s dedicated 2D renderer, Slate, can be leveraged through the UMG widget framework to be used in the 3D environment. Therefore, in a project consisting solely of UMG content, you can more or less use Unreal to get the benefits of operating solely within a 2D renderer. However, all of Unreal’s official tooling and features related to 2D (collision, physics, etc.) are confined to the non-UMG content, so it’s not exactly legitimate “support” for 2D rendering.
  3. GameMaker: Studio has a dedicated 2D renderer. From what I understand, they have begun to simplify the workflow for 3D rendering as well, although, it personally sounds like it’s still a laborious process.
  4. Godot has a dedicated 2D renderer (what it started with in fact) and a dedicated 3D renderer with similar APIs and functionality for each.

For those who might wonder why a dedicated 2D renderer is even significant, the main reason is the ease of position computation. If you were to shift an object’s position in 2D, the positions (at least in Godot) are described purely in terms of pixels, so it’s a simple pair of addition operations (one for each axis). An analogous operation in a 3D renderer requires one to map from pixels to world units (a multiplication), calculate the new position in world coordinates (3 additions) and then convert from world space to screen space (a matrix multiplication, so several multiplications and additions). Things are just a lot more efficient if you have the option of working directly with a 2D renderer.
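As a rough sketch of the arithmetic described above (the pixels-per-unit scale is an invented assumption, and the projection back to screen space is reduced to the simplest orthographic, axis-aligned case; a real 3D renderer applies a full view-projection matrix):

```python
# Moving an object in a pure 2D renderer vs. a 3D renderer.

# 2D: positions are already in pixels, so a shift is two additions.
pos_2d = (100.0, 50.0)
shift = (10.0, -5.0)
new_pos_2d = (pos_2d[0] + shift[0], pos_2d[1] + shift[1])

# 3D: map pixels to world units, add in world space, then convert back
# to screen space.
PIXELS_PER_UNIT = 64.0  # hypothetical scale factor
world = [p / PIXELS_PER_UNIT for p in (100.0, 50.0, 0.0)]
world_shift = [s / PIXELS_PER_UNIT for s in (10.0, -5.0, 0.0)]
world = [w + s for w, s in zip(world, world_shift)]
# Fake the world-to-screen step with a bare scale (orthographic shortcut);
# the general case is a matrix multiplication.
new_pos_3d = tuple(w * PIXELS_PER_UNIT for w in world[:2])

print(new_pos_2d)  # (110.0, 45.0)
print(new_pos_3d)  # (110.0, 45.0) -- same answer, several extra operations
```

Even in this deliberately simplified form, the 3D path performs divisions, additions, and multiplications where the 2D path needed only two additions.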

For the 3D rendering side, Unity and Unreal are top dogs in the industry, no doubt about it.

This is a few iterations old, but you get the idea of where they stand (very impressive)

I had some trouble finding sensible 3D examples for GameMaker Studio 2. This was one of 3 images I discovered. It appears to be decent, but not quite of the same caliber as the others.


Godot 2.x’s 3D renderer left something to be desired compared with Unity/UE4 (similar to GMS, although the workflow for the same quality of work in Godot 2.x appears to be much simpler than in GMS). The showcased marketing materials on their website look like this:

Godot 2.x 3D demonstration, from the godotengine.org 3D marketing content

With Godot 3.0, steps are being taken to bring Godot closer to the fold. Pretty soon, we may start to see marketing materials like this:

A test demonstration of the 3.0 pre-alpha’s power, shared in the Godot Facebook group.

I’d say that’s a grand improvement, and this graphical foundation sets the stage for an impressive future if the words of Godot’s lead developer, Juan Linietsky, are to be taken to heart. (Edit: Juan recently published an article that goes into MUCH further detail regarding how the 3D renderer works. He also recently updated the docs for the Godot shading language)

It remains to be seen exactly how far this advancement will go, but if this jump in progress is any indication, I see potential for Godot 3.0 to enter the same domain of quality as Unity and Unreal.

Publishing Comparisons

Unity is by far the leader in publishing platform diversity with Unreal coming in second and Godot coming in last.

Unity’s cross-platform support as of July 2017


Unreal Engine 4’s cross-platform support as of July 2017
Game Maker’s platforms as of March 2018

(Note that Switch support for GMS is on its way soon.)

Godot’s publicly disclosable cross-platform support as of July 2017

Note that for Godot specifically, it also has the capacity to be integrated with console platforms since it is natively written in C++; all you need is the development kit. For legal reasons, however, Godot’s free and open source GitHub cannot include (and thereby publicize freely) the integrated source code for these proprietary kits. Developers who already own a devkit can provide ports, but legally, the non-profit that manages Godot Engine (more on that later) cannot be involved. Despite this setback, the PS4 port of the game Deponia was implemented in Godot.

In addition, Godot 3.0 has recently conformed to the OpenHMD API, integrating functionality for all VR platforms that rely on that standard (so that would include HTC Vive, Oculus Rift, and PSVR). The community is gradually adding VR support, documentation, demonstrations, and tutorials.


All in all, Unity is still the clear leader here, but both Unreal and Godot provide a wealth of options for prospective developers to publish to the most notable and widespread platforms related to game development. As such, this factor tends to be somewhat irrelevant unless one is targeting release on one of the engine-specific platforms.

Licensing Comparisons

Unity’s free license permits the developer to craft projects with little-to-no feature limitations (only Unity-powered services are restricted, not the engine itself), so long as the user’s total revenue from a singular game title does not exceed $100,000 (as of the time of writing). The license does limit the number of installations you can have, however, as they are linked to a “Personal Edition” of the software. If you end up exceeding the usage limits, then you must pay for a premium license via a $35 (to double the monetary limit) or $125 (to remove the monetary limit) monthly subscription.

Unreal Engine 4 likewise has a free license; however, its license has no restrictions whatsoever on the size of your team or the number of instances of the engine you are using (distinct from Unity). On the other hand, it has a revenue-sharing license in which 5% of all income over $3,000 per quarter is contributed to Epic Games.


The licensing between these two platforms therefore can be more or less beneficial depending on…

  1. How long you plan to spend developing (if using Unity professionally).
  2. How quickly you expect the revenue from your game to roll in (if it dips into UE4’s 5% cut trigger).
  3. How much total revenue you expect to make from a single game (Unity’s revenue cap per title).
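To make the trade-off concrete, here is a hypothetical back-of-the-envelope comparison. The revenue figures are invented, and Unity’s tiered $35/$125 options are simplified down to the $35 subscription; the thresholds are the ones described above.

```python
# Rough comparison of the two licensing models (illustrative only).

def unreal_royalty(quarterly_revenues):
    """Unreal: 5% of each quarter's income above the $3,000 threshold."""
    return sum(max(0.0, q - 3000.0) * 0.05 for q in quarterly_revenues)

def unity_cost(total_revenue, months, monthly_fee=35.0, free_cap=100000.0):
    """Unity (simplified): subscription owed once a title's total revenue
    exceeds the Personal Edition cap; free otherwise."""
    return monthly_fee * months if total_revenue > free_cap else 0.0

quarters = [2000.0, 10000.0, 50000.0, 60000.0]  # a hypothetical year
print(unreal_royalty(quarters))  # 5550.0  (0 + 350 + 2350 + 2850)
print(unity_cost(sum(quarters), months=12))  # 420.0
```

Under these made-up numbers Unity’s subscription is far cheaper, but flip the assumptions (a long development period with little revenue) and Unreal’s pay-only-on-income model wins instead, which is exactly the point of the three questions above.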

GameMaker: Studio is priced from the very beginning. It supplies a free trial with limited asset usage (a bit of an extreme nerf in my opinion) and then ever-increasing purchase rates to get licenses that can publish to more platforms, ranging from $39 (desktop only) all the way to $399 (desktop, mobile, web, and console).

Godot, as you might expect, has no restrictions in any capacity since it is a free and open source engine. It uses the MIT license, effectively stating that you may use it for whatever purposes you wish, personal or commercial, and have no obligation to share any resources or accumulated revenue with the engine developers in any way. You can create as many copies of the engine as you like for as many people as you like. The engine itself is developed through the support of its contributors’ generous free work and through a Patreon administered by the Software Freedom Conservancy.

In this domain, Godot is the obvious winner. The trade-off therefore comes in the form of the additional tooling and effort you as a developer have to invest to develop and publish your game with Godot. This and more we shall cover in the examinations below.

Scripting Comparisons

Unity officially supports Mono-powered C#. With some tweaking, you could potentially use other .NET languages too (like F#). If you end up needing optimizations, you are restricted to the high-level language’s typical methods of speeding things up. It would be more convenient and vastly more efficient if one could directly develop C++ code that can be called from the engine, but alas, this is not the case. Unity also doesn’t have any native visual scripting tools, although there are several paid extensions to the engine that people have developed.

Unreal Engine 4 is gaining a stronger and stronger presence due to its tight integration of C++, the powerful, native Blueprint visual scripting language, and its recent addition of Mono C#. Blueprint is flexible, effective, and can be compiled down into somewhat optimized C++ code. Unreal C++ is an impressive concoction of its own that adds reflection and garbage collection features commonly associated with high level languages like C#.

Unreal’s Blueprint visual scripting language

GameMaker: Studio has its own scripting language, GameMaker Language (GML). It’s simple enough for beginners to use. What’s also cool is an even more beginner-friendly drag-and-drop programming system that can auto-translate to GML (at least in version 2). All of this gets mixed into a visual scripting framework that connects things together (only in version 2; these are all disparate windows in version 1).

GameMaker Studio’s GML

It is in this area that Godot especially shines compared with the others. Previous iterations of Godot have had directly implemented C++ support and an in-house Python-like scripting language called GDScript. GDScript was adopted after the developers had already tried Python, Lua, and other scripting languages and found all of them lacking in efficiency when dealing with the unique architectural designs that the Godot Engine implements. As such, GDScript is uniquely tailored for Godot usability in the same way that Blueprints are for UE4.

Later on, a visual scripting system called VisualScript was implemented that could function equivalently to GDScript. Godot 3.0 is also including native support for Mono C# to cater to Unity enthusiasts.

Godot’s VisualScript visual scripting language

The power that truly sets Godot 3.0 apart however is its inclusion of a new C API for binding scripted properties, methods, and classes to code implemented in other languages. This API allows any native or bound language’s capabilities to be automatically integrated with every other native or bound language’s dynamically linked functionality. The dynamically linked libraries are registered as “GDNative” code that points to the bound languages’ code rather than as an in-engine script, effectively creating a foreign interface to Godot’s API. This means that properties, methods, and classes declared and implemented in one language can be used by any other language that has also been bound. Bindings of this sort have already been implemented for C++ (Windows, Mac, and Linux). Developers are also testing bindings for Python (already in beta), Nim, and D. Rust and JavaScript bindings are in the works as well, if I understand correctly.
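The core GDNative idea (script-level code calling into natively compiled code through a C interface in a dynamic library) can be miniaturized with Python’s ctypes, with the C standard library standing in for an engine binding. This is an analogy to illustrate the mechanism, not Godot’s actual API:

```python
# Script-side code driving a natively compiled dynamic library through
# a C interface, in miniature: load the library, declare the C function's
# signature, and call it as if it were a native part of the language.

import ctypes
import ctypes.util

# Locate and load the C standard library (our stand-in "GDNative" library).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the foreign function's signature so calls marshal correctly.
libc.abs.restype = ctypes.c_int
libc.abs.argtypes = [ctypes.c_int]

print(libc.abs(-42))  # 42 -- computed by native code, called from a script
```

The catch mirrors the one in Godot: the dynamic library itself is platform-specific, so a separate native build is needed for each target platform even though the script-side call looks identical everywhere.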

In comparing these various scripting options, C# will likely have better performance than GDScript, but GDScript is more tightly integrated and easier to use. VisualScript will be the least performant of these, but arguably the easiest for non-programmers to use. If raw performance is the goal, then GDNative will be the most effective (since it is literally native code), but it is the least easy to use out of these, as you have to create different builds of the dynamic library for each target platform.


The “loose integration” this enables will empower any Godot developer to leverage pre-existing libraries associated with any of the bound languages such as C++’s enhanced optimizations/data structures, any C# Unity plugins that are ported to Godot, pre-existing GDScript plugins, and the massive library of powerful statistical analysis and machine learning algorithms already implemented by data research scientists in Python. With every newly added language, users of Godot will not have to resign themselves to the daunting “language barrier” that haunts game development today. Instead, they’ll be able to create applications that take advantage of every conceivable library from every language they like.

Edit: C# was recently merged in, and someone ran a comparison of the performance between GDScript, C#/Mono, and GDNative C++. In addition, here is a post I made on Reddit that goes more in-depth into the relationship between the engine’s scripting languages.

Framework Comparisons

Unity and Unreal have very similar and highly analogous APIs when it comes to the basic assets developers work with. There are the loadable spaces in the game world (the Scene in Unity or Level in Unreal). They then have component systems and a discrete root entity that is used to handle logic referring to a collection of components (the GameObject in Unity or Actor in Unreal). Loadable spaces are organized as tree hierarchies of the discrete entities (Scene Hierarchy in Unity or World Outliner in Unreal) and the discrete entities can be saved into a generalizable format that can be duplicated or inherited from (the Prefab in Unity or Blueprint in Unreal).

If you want to extend the functionality of these discrete entities, you then must create scripts for them. In Unity this is done by adding a new MonoBehaviour component within the 1-dimensional list of components associated with a game object. You can add multiple scripts and each script can have its own properties that are exported to the editor’s property viewer (the “Inspector”).

Multiples of these scripts can be added to a single GameObject if desired, but there is also no relationship defined between components.

In Unreal, a discrete entity has an Actor-level tree-hierarchy showing its components. Scripts, however, are not components themselves (although scripts can extend components too), but rather things directly added to the Actor Blueprint as a whole. An individual function may be created from scratch or may extend/overload an existing function. One can also create Blueprint scripts disassociated from any entity as an engine asset (called a Blueprint Function Library). The bad news is that Blueprints aren’t just a file you point to, i.e. you can’t just add the same script file to different Blueprints like you can with Unity’s C# files.

Components have their own hierarchy, but are merely variables in the scripting organized by context (event/function/macro) in Actors.

In GameMaker: Studio, you create “rooms” in which to place your objects, sprites, tiles, backgrounds, etc. Version 2 has added the ability for rooms to inherit from one another, which is an interesting nuance compared to Unity/UE4’s methods, allowing you to sort of “Blueprint” your room layout and initialized properties. The objects you place in these rooms can then have scripted responses to in-game events. Objects can inherit from one another, but there is no notion of a component system. In order to construct any sort of composition, the “has” relationships need to be declared overtly by searching the room for the object you wish to own and then manually assigning that object id to a variable. It feels clunky to me personally, but its simplicity can make things easier for beginners who don’t want to concern themselves with the cleanliness of their code.

Objects in GM:S have a sprite, physics, a parent, and events to which they can attach scripts.

In Godot, things are simplified a great deal. Components, called “Nodes”, are similarly organized into discrete entities that can be saved, duplicated, inherited, and instanced; however, Godot sees no difference between the way a Prefab/Blueprint would organize its components and the way a Scene/Level would organize its entities. Instead, it unifies these concepts into a “scene” in its entirety, i.e. a Prefab/Blueprint is a GameObject/Actor is a Scene/Level; everything is just a gigantic set of instanceable and inheritable relationships between nodes. Scenes can be instanced within other scenes, so you might have one scene each for your bullet, your gun, your character, and your level (using them as you would a Prefab/Blueprint). Scripts to extend node functionality are attached 1-to-1 with nodes, and nodes can be cheaply added with the attached script either built-in (saved into the scene file) or externally linked from a saved script file.
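A toy sketch of this unification: if a scene is just a tree of nodes, then instancing one scene inside another is nothing more than grafting a copy of its node tree onto a parent node. The class and scene names below are illustrative, not Godot’s real API:

```python
# "Everything is a scene": one tree structure serves as Prefab, Blueprint,
# GameObject, Actor, Scene, and Level all at once.

import copy

class Node:
    def __init__(self, name):
        self.name = name
        self.children = []

    def add_child(self, node):
        self.children.append(node)
        return node

    def instance(self):
        """Instancing a scene = deep-copying its node subtree."""
        return copy.deepcopy(self)

# A "Bullet" scene and a "Gun" scene that instances it.
bullet_scene = Node("Bullet")
bullet_scene.add_child(Node("Sprite"))

gun_scene = Node("Gun")
gun_scene.add_child(bullet_scene.instance())

# A "Level" scene instances the gun; no separate Prefab concept is needed.
level = Node("Level")
level.add_child(gun_scene.instance())

print(level.children[0].name)              # Gun
print(level.children[0].children[0].name)  # Bullet
```

Because the copy is deep, editing the original bullet scene never mutates the instances already placed in the level, which is the behavior you would expect when instancing a saved scene.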


This section is more or less just to demonstrate how each engine has their own way of organizing the game data and highlighting the relationships between elements of functionality. In my personal experience, I find Godot’s model to be much more intuitive to reason about and work with once preconceptions from other engines’ tropes are discarded, but to be honest, this is really just a matter of personal taste.

(Edit: for lack of another place to put this, I’m inserting here; Godot will soon be integrating the Bullet physics engine as an option you can toggle in the editor settings.)

Community and Documentation Comparisons

The most glaringly divergent quality between the engines is the documentation. Unreal’s Blueprint and C++ documentation pale in comparison to the breadth and depth of Unity’s massive array of concepts, examples, and tutorials, built both by Unity Technologies and the large community. This is a damaging blow, but wouldn’t be so bad if Unreal’s documentation were at least adequate. Unfortunately, this is not the case: Blueprints have some diversity of tutorials and documentation (nothing like Unity’s though), especially from the user base, but Unreal C++’s documentation is abhorrently lacking. In-house tutorials will oftentimes be several versions behind, and the Q&A forums can take anywhere from a few days to weeks, months, or even over a year to yield a proper response (several engine iterations later, when the same issue is still popping up).

The ironic curve-ball in the situation is that Unreal Engine 4 publishes its own source code to its licensed users. One could arguably reference the source code itself to teach oneself UE4’s API and best practices. Unfortunately, Unreal C++ tends to be a huge, intimidating beast with custom compilation rules that are not well documented, even in code comments, and very difficult-to-follow code due simply to the complexity of the content. A typical advantage of source code-publishing projects is the capacity to spot a problem with the application, identify a fix, implement it, and submit a pull request, but the aforementioned complexity makes taking full advantage of UE4’s visible source code much more difficult for the average programmer (at least, in my experience and that of other programmers I’ve discussed it with).

GameMaker: Studio actually has very nice documentation for its usage, which is a testament to its high usability for beginners. You can easily jump between functions and topics, as most topics provide a list of related functions. Therefore, learning HOW to use a topic is often intertwined with the documentation on what the topic is (very beginner friendly). This is probably one of the highlights of using GM:S in my eyes.


Godot Engine’s documentation is stronger than Unreal’s, but still a bit weaker than Unity’s. I would also say there are some ways in which it is both better and worse than GameMaker: Studio’s (community-managed means things can change very quickly, but it also means things need to be reported / discussed and someone has to actually do it; proactivity is key). A public document generator is used for the Godot website documentation while an in-engine XML file is used to generate the contents of the Godot API documentation. As such, anybody can easily open up the associated files and add whatever information may be helpful to users (although changes are approved by the community through pull requests).

On the downside, this means that it is the developers’ responsibility to learn how to use the tools. On the upside, the engine’s source code is beautifully written (and therefore very easy to understand), so teaching yourself isn’t really difficult when you have to do it; however, that is often unnecessary, as the already small community is filled with developers who have created very in-depth YouTube tutorials for many major tasks and elements of the engine.

You can receive fully informative answers to questions within a few hours on the Q&A website, Reddit page, or Facebook group (so it’s much more responsive than Unreal’s community). In this sense, the community is already active enough to start approaching the breadth and depth of Unity’s documentation and this level of detail is achieved with a minute fraction of the user base. If given the opportunity, a fully grown, matured, and active Godot community could easily create documentation approaching the likes of Unity’s professional work.


So, while Unity is currently still the winner here, it is also clear from Godot’s accessibility and elegance that even with a larger community, Godot could easily enhance the dimensions of its documentation and tutorials to compensate for the community’s needs.

(Edit: note, that the community has been doing weekly documentation sprints in anticipation of the 3.0 release. Even with API revisions between 2.1 and 3.0, the docs have already improved by roughly 20% over the previous version’s content in the past 5 weeks alone. If you are interested in assisting, please visit the Class API contribution guide to get involved and discuss your plans / progress with the #documentation Discord channel (Discord link).)

Extension Comparisons

Due to UE4’s code complexity and Unity’s closed-source nature, both engines suffer from the disease of needing to wait for the development teams to implement new feature requests, bug fixes, and enhancements.

UE4 supposedly exposes itself for editor extensions with its Slate UI that can be coded in C++, but 1) Slate is incredibly hard to read and interpret, and 2) it relies on C++ code just to extend the editor as opposed to a simple scripting solution.

Unity does supply a strong API for creating editor extensions though. Creating certain types of C# scripts with Attributes above methods and properties allows one to somewhat easily (with a little bit of learning) develop new tools and windows for the Unity engine. The relative simplicity of developing extensions for the editor is a prime reason why the Unity Asset Store is so replete with high quality editor extensions.

As far as I’m aware, GM:S provides some customization options for the UI itself, but doesn’t really provide any means of extending the editor (correct me if I’m wrong). They provide a means of making “extension packages” that let you bundle in scripting functionality and assets, but they don’t give users the ability to modify how the editor works very effectively. It has an extension marketplace similar to other high-profile engines.

Godot has an even easier interface for creating editor extensions than Unity: adding the ‘tool’ keyword to the top line of a script tells the script to run at design time in addition to run-time, instantly empowering developers to adapt their scripts for tool development: they need only apply their existing script understanding to the design-time state of the scene hierarchy.


EditorPlugin scripts can also be written to edit the engine UI and create new in-engine types. The greatest boon is that all of the logic and UI of the scripting API is the exact same API that allows them to control the logic and UI of the engine itself, allowing these EditorPlugin scripts to operate using the same knowledge already accumulated during one’s ordinary development. These qualities together make creating tools in Godot unbelievably accessible.

In a completely unexpected but bewilderingly helpful feature, Godot also helps to simplify the process of team-based / communal extension development: all engine assets can be saved as binary files (the standard option for Unreal and Unity) OR as text-based, VCS-friendly files (.scn and .tscn, respectively). Using the latter makes pull requests and git diffs trivially simple to analyze, so it comes highly recommended.

Another significant difference between Unity and Godot’s extension development is the cultural shift: when looking up something in the Unity Asset Store, you’ll oftentimes find a half-dozen or more plugins for the same feature with different APIs, feature-depth/breadth, and price points.

Godot’s culture on the other hand is one of “free and open source tools, proprietary games”. Plugins on the Godot Asset Library must be published with an open license (most of them use MIT), readily available for community enhancements and bug fixes. There is usually only 1 plugin for any given feature with a common, community-debated implementation that results in a common toolset for ALL developers working with the feature in the engine. This common foundation of developer knowledge and lack of any cost makes integrating and learning Godot plugins a joy.



Given a desire for high accessibility, a strong publishing and community foundation, minimal cost, powerful optimizations, and enhanced extensibility, I believe I’ve made the potential of Godot 3.0’s effect on the game industry quite clear. If offered a chance, it could become a new super-power in the world of top-tier game engines.

This article is the result of my working with the Godot 3.0 pre-alpha for approximately 3 months. I had never investigated it before, but was blown away by the engine when I first started working with it. I simply wished to convey my experience as a C++ programmer and my insight into what the future of Godot might hold. Hopefully you too will be willing to at least give it a try.

Who knows? You might find yourself falling in love all over again. I know I did.

Minecraftian Narrative: Part 7

Table of Contents

  1. What is “Minecraftian Narrative”?
  2. Is “Toki Pona” Suitable for Narrative Scripting?
  3. Interface and Gameplay Possibilities
  4. Toki Sona Implementation Quandries
  5. Dramatica and Narrative AI
  6. Relationship and Perception Modeling
  7. Evolution of Toki Sona to “tokawaje”


Unlike previous iterations of this series, today we’ll be diving into the field of linguistics a bit more intensely. The reason for my lack of any new posts in a month and a half has been the result of my work on a completely new language which is now approaching an alpha state (at which point, I will theoretically be able to build a full parser for it). Today, I’ll be covering why I decided to invent a language, where it came from, how it is different, and how it all ties into the overall goal of narrative scripting.

Future posts will most certainly reference this language, so if you aren’t interested in the background and just want the TL;DR of the language features and relevance to narrative scripting, then feel free to skip on down to the conclusion where I will review everything.

Without further ado, let’s begin!

Issues With “Toki Sona”

Prior to this post, I had puffed up the possibilities of using a toki pona-derived language, hereafter referred to as toki sona. While I was quite excited about tp’s potential to combine concepts together and support a minimal vocabulary with simple pronunciation and an intuitive second-hand vocabulary (e.g. “water-enclosedConstruction” = bathroom), a variety of issues forced me to reconsider its adaptation for narrative scripting.


First and foremost is the ambiguity within the language. Syntactic ambiguity makes it nearly impossible for an algorithm to easily understand what is being stated, and tp has several instances of this lack of clarity. For example, “mi moku moku” could mean “the hungry me eats” or “I hungrily eat” or even some new double-word emphasis that someone is experimenting with: “I really dove into eating [something]” or “I am really hungry”.  With the language unable to clearly distinguish modifiers and verbs from each other, identifying parts of speech and therefore the semantic intent of a word and its relationship to other words is needlessly complicated.

The one remedy I thought of for this would be to add hyphens between nouns/verbs and their associated modifiers, not only allowing us to instantly disambiguate this syntactic confusion, but also simplifying and accelerating the computer’s parsing with a clear delimiter (a special symbol that separates two ideas). However, this solution is not audibly communicable and therefore retains all of these issues in spoken dialogue, violating our needs; an audible delimiter would of course be completely impractical.

The other problem is the intense level of semantic ambiguity present due to the restricted nature of the language’s vocabulary. The previously mentioned “bathroom” (“tomo telo”) could also be a bathhouse, a pool, an outhouse, a shower stall, or any number of other related things. Sometimes the distinction is minor and unimportant (bathroom vs. outhouse), but other times that distinction may be the exact thing you wish to convey. What happens then if we specify “bathroom of outside”?


One possibility is the use of “ma” meaning “the land, the region, the outdoors, the earth”, but then we don’t know if this is an outdoor bathroom or if it is the only indoor bathroom in the region, or if it is just a giant pit in the earth that people use. The other possibility could be “poka” meaning “nearby” or “around”, but that is even more unclear as it specifies a request purely for an indoor bathroom of a given proximity.

As you can see, communicating specific concepts is not at all tp’s specialty. In fact, it goes against the very philosophy of the language: the culture supported by its creators and speakers is one that stresses the UNimportance of knowing such details.

If you ever happen to speak with a writer, however, they will tell you the importance of words, word choice, and the evocative nature of speech. They can make you feel different emotions and manipulate the thoughts of the reader purely through the style of speech and the nuanced meanings of their terminology. If we are to support this capacity in a narrative scripting language, we cannot afford to build its foundation on a philosophy prejudiced against good writing.


The final issue, related to the philosophy, is the grammatical limitations imposed by its syntax.

  1. Inter-sentence conjunctions like English’s FANBOYS (“I was tired, yet he droned on.”) are thankfully not entirely absent: there is an “also” (“kin”) and a “but” (“taso”) that can be used to start the following sentence and relate two ideas. One can even use adverbial phrases (the only type of dependent clause allowed) to assist in relating ideas. However, limitations are still present, and the reason for that is a mix of the philosophy and the (admirable) goal of keeping the vocabulary size compact.
  2. You cannot satisfactorily describe a single noun with adjectives of multiple, tiered details (“I like houses with toilet seats of gold and a green doorway”). Users of tp have long debated ways of combating this, mostly revolving around the “noun1 pi multi-word modifier” technique that converts a set of words into an adjective describing noun1. One option I had considered, and which is mildly popular, was re-appropriating the “and” conjunction for nouns (“en”) as a way of connecting pi’s, but because you effectively need open and closed parentheses to accomplish the more complex forms of description, there isn’t really a clean way of handling this.

Between the grammatical limitations, the prejudice against writing, and the ambiguity, toki pona, and any language closely related to it, will inevitably find itself wanting as a narrative scripting language. Time to move on.

Lojban: Let’s Speak Logically!

In an attempt to solve the problems of toki pona, a friend recommended to me that I check out the “logical language”, Lojban (that ‘j’ is soft, as in “beige”).

Lojban is unlike any spoken language you have ever learned: it borrows much of its syntax from functional programming languages like Haskell. Every phrase/clause is made up of a single word indicating a set of relationships and the other words are all things that act as “parameters” by plugging in concepts for the relations.


For example, “cukta” is the word for book. If you simply use it on its own, it plugs “book” into the parameter of another word. However, that’s not all it means. In full, “cukta” means…

x1 is a book containing work x2 by author x3 for audience x4 preserved in medium x5

If you were to have tons of cukta’s following each other (with the proper particles separating them), the full sentence would mean…

A book is a book about books that is written by a book for books by means of a book.

You can also specifically mark which “x” a given word is supposed to plug in as, without needing to use any of the other x’s, so the word cukta can also function as the word for “topic”, “author”, “audience”, and “medium” (all in relation to books). If that’s not conservation of vocabulary, I don’t know what is.

It should be noted that 5-parameter words in Lojban are far more rare than simpler ones with only 2 or 3 parameters. Still, it’s impressive that, using this technique, Lojban is able to communicate a large number of topics using a compressed vocabulary, and yet remain extremely explicit about the meaning of its words.

Just as important to notice is how Lojban completely does away with the concept of a “noun”, “verb”, “object”, “preposition”, or anything of the sort. Concepts are simply reduced to a basic entity-relation-entity form: entity A has some relationship x? to entity B. This certainly makes things easier for the computer. One might think it would also make the language easier to learn, since it is a much simpler system. In practice, however, it is so vastly different from the norm that people coming from more traditional languages will have a harder time understanding it, especially given the plurality of relationships possible with a single word.


Another strong advantage of Lojban is that it is structured to provide perfect syntactic clarity to a computer program and can be completely parsed by a computer in a single pass. In layman’s terms, the computer only needs to “read” the text one time to understand with 100% accuracy the “part of speech” of every word in a sentence. There is no need for it to guess how a word will be syntactically interpreted.

In addition, Lojban imposes a strict morphological structure on its words to indicate their meaning. For example, each of the “root set” words like “cukta” has one of the two following patterns: CVCCV or CCVCV (C being “consonant” and V being “vowel”). This makes it much easier for the computer to pick out these words in contrast to other words such as particles, foreign words, etc. Every type of word in the language conforms to morphological standards of a similar sort. The end result is that Lojban parsers, i.e. “text readers”, are very fast in comparison to those for other languages.
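To make the shape rule concrete, here is a minimal sketch of how a program could recognize a Lojban root-word (gismu) candidate by its CVCCV/CCVCV shape alone. This only checks the letter pattern; real Lojban morphology has additional rules (e.g. which consonant pairs are permissible) that I am deliberately skipping, and the function name is my own:

```python
import re

# Vowels and consonants of the Lojban root-word alphabet.
VOWEL = "[aeiou]"
CONS = "[bcdfgjklmnprstvxz]"

# A gismu candidate is exactly CVCCV or CCVCV.
GISMU = re.compile(
    f"^({CONS}{VOWEL}{CONS}{CONS}{VOWEL}|{CONS}{CONS}{VOWEL}{CONS}{VOWEL})$"
)

def looks_like_gismu(word: str) -> bool:
    """Shape check only: does this word fit the root-word pattern?"""
    return bool(GISMU.match(word))

print(looks_like_gismu("cukta"))  # True  (CVCCV)
print(looks_like_gismu("skami"))  # True  (CCVCV)
print(looks_like_gismu("iu"))     # False (an attitudinal, not a root)
```

Because every word class in Lojban has its own distinctive shape like this, a parser can classify words before it ever consults a dictionary, which is a large part of why single-pass parsing works.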

One more great advantage of Lojban is that it has these terrifically powerful words called “attitudinal indicators” that allow one to communicate a complex emotion using words on a spectrum. For example, “iu” is “love”, but alternative suffixes give you “iucu’i” (“lack of love”, a neutral state) and “iunai” (“hate/fear”, the opposite state). You can even combine these terms to compose new emotions like “iu.iunai” (literally “love-hate”).


For all of these great elements though, Lojban has two aspects that make it abhorrent for the simple narrative scripting we are aiming for. First, it is too large a language: 1,350 words just for the “core” set that allows you to say reasonable sentences. While this is spectacularly small for a traditional language, in comparison to toki pona’s nicely compact 120 it is unacceptably massive. As game designers, we simply can’t expect people to devote the time needed to learn such a huge language within a reasonable play time.

The other damaging aspect is the sheer complexity of the language’s phonology and morphology. When someone wishes to invent a new word using the root terms, they essentially mash them together end-to-end. While this would be fine alone, switching letters around and having part of the latter consumed by the end of the former is unfortunately very difficult to follow. For example…

skami = “x1 is a computer used for purpose x2”
pilno = “x1 uses/employs x2 [tool, apparatus, machine, agent, acting entity, material] for purpose x3.”
skami pilno => sampli = “computer user”

Because “skami pilno” was a commonly occurring phrase in Lojban usage, a new word with the “root word” morphology can be invented by combining the letters. Obviously, this is very difficult to do on the fly and effectively involves people learning an entirely new word for the concept.

All that to say that Lojban brings some spectacularly innovative concepts to the table, but due to its complex nature, fails to inspire any hope for an accessible scripting language for players.

tokawaje: The Spectral Language

We need some way of combining the computer-compatibility of Lojban with the elegance and simplicity of toki pona that omits as much ambiguity as possible, yet also allows the user to communicate as broadly and as specifically as needed using a minimal vocabulary.

Over the past month and a half, I’ve been developing just such a language, and it is called “tokawaje”. An overview of the language’s phonology, morphology, grammar, and vocabulary, along with some English and toki pona translations, can be found on my commentable Google Sheets page here (concepts on the Dictionary tab can be searched for with “abc:” where “abc” is the 3-letter root of the word). With grammar and morphology concepts derived from both Lojban and toki pona, and with a minimal vocabulary sized at 150 words, it approximates a toki pona-like simplicity with the potential depth of Lojban. While it is still in its early form, allow me to walk through the elements of tokawaje that capture the strengths of the other two while avoiding their pitfalls.

Lojban has three advantages that improve its computer accessibility:

  1. The entity-relation-entity syntax for simpler parsing and syntactic analysis.
  2. Morphological and grammatical constraints: the word and grammar structure is directly linked to its meaning.
  3. The flexibility of meaning for every individual learned word: “cukta” means up to 5 different things.

toki pona has two advantages that improve its human accessibility:

  1. Words that are not present in the language can be estimated by combining existing words together and using composition to construct new words. This makes words much more intuitive.
  2. It is extremely easy to pronounce words due to its mouth-friendly word construction (every consonant must be followed by a single vowel).


“tokawaje” accomplishes this by…

  1. Using a similar, albeit heavily modified entity-relation-entity syntax.
  2. Having its own set of morphological constraints to indicate syntax.
  3. Using words that represent several things that are associated with one another on a spectrum.
  4. Relying on toki pona-like combinatoric techniques to compose new words as needed.
  5. Using a phonology and morphology focused on simple sound combinations that are easily pronounced: words must match the pattern VCV(CV)*(CVCV)*.

Now, once more, but with much more detail:

1) Entity-Relation-Entity Syntax

Sentences are broken up into simple 1-to-1 relations that are established in a context. These contexts contain words that each require a grammar prefix to indicate their role in that context. After the prefix, each word then has some combination of concepts to make a full word. Concepts are each composed of some particle, tag, or root (some spectrum of topics), followed by a precision marker that indicates the exact meaning on that spectrum.

The existing roles are… (vowels pronounced as in Spanish):

  1. prefix ‘u’: a left-hand-side entity (lhs) similar to a subject.
  2. prefix ‘a’: a relation similar to a verb or preposition.
  3. prefix ‘e’: a right-hand-side entity (rhs) similar to an object.
  4. prefix ‘i’: a modifier for another word, similar to an adjective or adverb.
  5. prefix ‘o’: a vocative marker, i.e. an interjection meant to direct attention.

Sentences are composed of contexts. For example, “I am real to you,” is technically two contexts. One asserts that “I am real” while the other asserts that “my being real” is in “your” perspective. This nested-context syntax is at the heart of tokawaje.

These contexts are connected with each other using context particles:

  1. ‘xa’ (pronounced “cha”) meaning open a new context (every sentence silently starts with one of these).
  2. ‘xo’ meaning close the current context.
  3. ‘xi’ meaning close all open contexts back to the original layer.

(These also must each be prefixed with a corresponding grammar prefix)
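The xa/xo/xi particles above amount to a stack discipline over nested contexts, which can be sketched in a few lines. This is my own illustration (the function name and list-of-particles input are not part of the language spec); it only tracks nesting depth, not the content of each context:

```python
# Track how deeply nested the current context is as particles stream by:
# 'xa' opens a nested context, 'xo' closes the current one, and 'xi'
# cascades all the way back to the top-level context.
def context_depth(particles: list[str]) -> int:
    depth = 1  # every sentence silently opens one context
    for p in particles:
        if p == "xa":
            depth += 1
        elif p == "xo":
            depth = max(1, depth - 1)  # never close the sentence's own context
        elif p == "xi":
            depth = 1
    return depth

print(context_depth(["xa", "xa", "xo"]))  # 2: opened two, closed one
print(context_depth(["xa", "xa", "xi"]))  # 1: cascading close
```

A full parser would push and pop actual context objects rather than a counter, but the open/close/cascade behavior is the same.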

Examples of Concept Composition:

  1. ‘xa’ = an incomplete word composed of only a particle+precision.
  2. “uxa” = a full word with a concept composed of a grammar prefix and a particle+precision.
  3. “min” = root for pronouns, “mina” = “self”, full “umina” = “I”.
  4. “vel” = root for “veracity”, “vela” = “truth”, full “avela” = “is/are”.
  5. “sap” = root for object-aspects, “sapi” = “perspective”, full “asapi” = “from X’s perspective”.

Sample Breakdown:

“I am real to you” => “umina avela evela uxo asapi emino.”

  1. “umina” {u: subject, min/a: “pronoun=self”}
  2. “avela” {a: relation, vel/a: “veracity=true”}
  3. “evela” {e: object, vel/a: “veracity=true”}
  4. “uxo” {u: subject, xo: “context close”} // indicating the previous content was all a left-hand-side entity for an external context.
  5. “asapi” {a: relation, sap/i: “aspect=perspective”}
  6. “emino” {e: object, min/o: pronoun=you}


It’s no coincidence that the natural grammatical breakdown of a sentence looks very much like JSON data (web API anyone?). In reality, it would be closer to…

{ "prefix": "u", "concepts": [ ["min", "a"] ] }

…since the meanings would be stored locally between client and server devices.

This is DIFFERENT from Lojban in the sense that no single concept will encompass a variety of relations to other words, but it is SIMILAR in that the concept of a “subject”/”verb”/”object” structure isn’t technically there in reality. For example:

“umina anisa evelo” => “I -inside-> lie” => “I am inside a lie.”

In this case, “am inside” isn’t even a verb, but purely a relation simulating an English prepositional phrase where no “is” verb is technically present.

These contexts can be used without a complete context to create gerunds, adjective phrases, etc. For example, to create a gerund left-hand-side entity of “existing”, I might say

“avela uxo avela evelo.” => “Existing is (itself) a falsehood.”


You might ask, “How do we tell the difference with something like [uxavela]? Might it be {u: subject, xav/e: something, la: something}?” Actually, no. The reason the computer can immediately understand the proper interpretation is tokawaje’s second Lojban incorporation:

2) Strict Morphological Constraints for Syntactic Roles

Consonants are split up into two groups: those reserved for particles, such as ‘x’ and those reserved for roots, such as ‘v’. The computer will always know the underlying structure of a word’s morphology and consequent syntax. Therefore, given the word “uxavela” we will know with 100% certainty that the division is u (has the V-form common to all prefixes), xa (CV-form of all particles), and vela (CVCV-form of all roots).
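Because the particle and root consonant sets never overlap, a parser can segment any word greedily from left to right. Here is a minimal sketch of that rule, assuming the CV-particle / CVCV-root chunking described above; the function name, the assertion, and the exact membership of the vowel set (I include ‘y’ and ‘q’, which appear as precision vowels in suffixes like “zy” and “zq”) are my own illustration:

```python
# Particle consonants named in this article; all other consonants are
# root consonants. The first character of a chunk decides its length:
# a particle consonant starts a CV chunk, a root consonant a CVCV chunk.
PARTICLE_CONSONANTS = set("xfzcb")
VOWELS = set("aeiouyq")

def segment(word: str) -> list[str]:
    prefix, rest = word[0], word[1:]
    assert prefix in VOWELS, "every word starts with a vowel grammar prefix"
    parts = [prefix]
    while rest:
        if rest[0] in PARTICLE_CONSONANTS:
            parts.append(rest[:2])   # CV particle + precision
            rest = rest[2:]
        else:
            parts.append(rest[:4])   # CVC root + precision
            rest = rest[4:]
    return parts

print(segment("uxavela"))      # ['u', 'xa', 'vela']
print(segment("umina"))        # ['u', 'mina']
print(segment("avelominaco"))  # ['a', 'velo', 'mina', 'co']
```

Note there is never any backtracking: the consonant class alone tells the parser how many letters to consume, which is exactly the 100%-certainty claim above.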

Particles can be split up into two categories based on their usual placement in a word.

  1. Those that are usually the first concept in a word.
    1. ‘x’ = relating to contexts (as you have already seen previously)
      1. ‘xa’ = open
      2. ‘xo’ = close
      3. ‘xi’ = cascading close
      4. ‘xe’ = a literal grammar context (to talk about tokawaje IN tokawaje)
    2. ‘f’ = relating to irrelevant and/or non-tokawaje content
      1. ‘fa’ = name/foreign word with non-tokawaje morphology constraints
      2. ‘fo’ = name/foreign word with tokawaje morphology constraints
      3. ‘fe’ = filler word for something irrelevant
  2. Those that are usually AFTER a concept as a suffix (could be mid-word).
    1. ‘z’ = concept manipulation
      1. ‘za’ (zah) = shift meaning more towards the ‘a’ end of the spectrum
      2. ‘zo’ (zoh) = shift meaning more towards the ‘o’ end of the spectrum
      3. ‘zi’ (zee) = the source THING that assumes the left-hand-side of this relation.
        1. Ex. “uvelazi” => that which is something
        2. Shorthand for “ufe avela uxo”.
      4. ‘ze’ (zeh) = the object THING that assumes the right-hand-side of this relation.
        1. Ex. “uvelaze” => that which something is.
        2. Shorthand for “avela efe uxo”.
      5. ‘zu’ (as in “food”) = questioning suffix
      6. ‘zy’ (zai) = commanding suffix
      7. ‘zq’ (zow) = requesting suffix
    2. ‘c’ = tensing, pronounced “sh”
      1. ‘ca’ = future tense
      2. ‘ci’ = progressive tense
      3. ‘co’ = past tense
    3. ‘b’ = logical manipulation
      1. ‘be’ = not
      2. ‘ba’ = and
      3. ‘bi’ = to (effectively the “piping” functionality from programming, if you’re familiar with that)
      4. ‘bo’ = or
      5. ‘bu’ = xor

All other consonants in the language fall into the “root word” set. With these clear divisions, tokawaje will always know what role a concept has in manipulating the meaning of that word.

I’d also like to point out that informal, conversational uses of these two groups of particles may completely remove the distinction between them. For example, someone may simply say:

“uzq” => “Please.”

This would not actually impact the computer’s capacity to distinguish terms though. I even plan to make my own parser assume that lack of a grammar prefix implies an intended ‘u’ prefix (not that that’s encouraged)

3) Concepts in Tokawaje Exist on Spectra

Almost every word in the language has exactly 4 meanings, with 3 non-root concept types using more than that: the grammar prefixes, the ‘z’-based word manipulators (as you’ve already seen), and the general expressive noises / sound effects, which are vowel-only. This technique allows for vocabulary that is flexible, yet intuitive, despite its initial appearance of complexity.

4) Sounds and Structure are Designed for Clear, Flowing Speech

Every concept is restricted to a form that facilitates clear pronunciation and a consistent rhythm. Together, these elements ensure that the language is simple to learn phonetically.

Concepts have the form C (particles/tags) or CVC (roots) along with a vowel grammar prefix and a vowel precision suffix, resulting in a minimum word of VCV or VCVCV.

The rhythm to concepts emphasizes the middle CV: u-MI-na, a-VE-la, etc. Even with suffixes applied to words, this pattern never becomes unmanageable. The result is a nice, flowy-feeling language:

  1. uvelominacoze / avelominaco (“velomina” => a personal falsehood)
    1. u-VE-lo-MI-na-CO-ze (that which one lied to oneself about)
    2. a-VE-lo-MI-na-co (to lie to oneself in the past)


5) Tokawaje Employs Tiered Combinatorics to Invent New Concepts

The first concept always communicates the root “what” of a thing while the subsequent concepts add further description of the thing. This structure emulates toki pona’s noun-combining mechanics.

‘u’, ‘a’, and other non-‘i’ terms are primary descriptors and more closely adhere to WHAT a thing is. ‘i’ terms are secondary descriptors and approximate the additional properties of a thing BEYOND simply WHAT it is. Fundamentally, every concept follows these simple rules:

  1. Non-‘i’ words are more relevant to describing their role’s reality than ‘i’ words.
  2. However, individual words are described more strongly by their subsequent ‘i’ words than they are by other terms.
  3. Multiple non-‘i’ words will further describe that non-‘i’ term such that later non-‘i’ words act as descriptors for the next-left non-‘i’ word and its associated ‘i’ words.

Let’s say I have the following sentence (I’ll be using the filler particle “fe” with an artificially inserted number to reference more easily. Think of each of these as a root+precision CVCV form):

“ufe1fe2 ife3fe4 ife5 ufe6fe7 ife8 avela uxofe9 ife10 afe11”

This can be broken down in the following way:

  1. Any pairing of adjacent fe’s forms a compound word in which the second fe is an adjective for the previous fe, but the two of them together form a single concept. For example, in “ife3fe4”, fe4 is modifying fe3, but the two together form an adjective modifying the noun “ufe1fe2”.
  2. The subject is primarily described by “ufe1fe2” and secondarily by “ufe6fe7” since they are both prefixed with ‘u’, but one comes later. “ufe6fe7” is technically modifying “ufe1fe2”, even if “ufe1fe2” is also being more directly modified by the ‘i’-terms following it.
  3. Each of those ‘u’ terms is additionally modified by its adjacent ‘i’-term adjective modifiers.
  4. “ife5” is an adverb modifying “ife3fe4”.
  5. The “existence of” the “ufe1-8” entity is the u-term of the “afe11” relation.
  6. The entirety of that u-term has a primary adjective descriptor of “-fe9” and a secondary adjective descriptor of “ife10”.
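Rules 1–3 above can be sketched as a simple one-pass attachment algorithm. This is my own simplified illustration: it attaches each ‘i’ word to the nearest preceding non-‘i’ head, flattens the adverb-on-adjective nesting (rule 4), and leaves the secondary head-to-head description (rule 2) implicit in word order:

```python
# Attach 'i'-prefixed modifier words to the nearest preceding
# non-'i' head word; every non-'i' word starts a new head.
def attach(words: list[str]) -> list[dict]:
    heads = []
    for w in words:
        if w[0] == "i" and heads:
            heads[-1]["modifiers"].append(w)
        else:
            heads.append({"head": w, "modifiers": []})
    return heads

# The artificially numbered filler words from the example above:
parsed = attach(["ufe1fe2", "ife3fe4", "ife5", "ufe6fe7", "ife8"])
for h in parsed:
    print(h)
# {'head': 'ufe1fe2', 'modifiers': ['ife3fe4', 'ife5']}
# {'head': 'ufe6fe7', 'modifiers': ['ife8']}
```

A fuller implementation would additionally record that “ufe6fe7” is itself a secondary descriptor of “ufe1fe2”, per rule 3.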


Suppose the word for “dog” were “uhumosoviloja” (u/lhs,humo/beast,sovi/land,loja/loyal = “loyal land-beast”). How might you describe a disloyal dog then? You would use an ‘u’ for stating it is a dog (that identifying aspect) and an ‘i’ for the disloyalty (the added on description). The spectrum of loyalty (“loj”) would therefore show up twice.

“uhumosoviloja ilojo” => “disloyal dog”

For clarity purposes, you may even split up the “loja”, but that wouldn’t impact the meaning since “uloja” still has a higher priority than “ilojo”.

“uhumosovi uloja ilojo” => “disloyal dog” (equivalent)

Let’s say there were actual distinctions between words though. How about we take the noun phrase “big wood box room”? Here’s the necessary vocabulary:

“sysa” => “big/large/to be big/amount”
“lijavena” => “rigid thing of plants” => “wood/wooden/to be wood”
“tema” => “of or relating to cubes”
“kita” => “of or relating to rooms and/or enclosed spaces”

Now let’s see some adaptations:

  1. ukitasysa => an “atrium”, a “gym”, some space that, by definition, is large.
  2. ukita usysa => same thing.
  3. ukita isysa => a room that happens to be relatively big.
  4. ukita utema ulijavena usysa => cube room of large-wood.
  5. ukita utema ulijavena isysa => cube room of large-wood.
  6. ukita utema ilijavena usysa => room of wooden large-boxes.
  7. ukita itema ulijavena usysa => a cube-shaped room of large-wood.
  8. ikita utema ulijavena usysa => [something] related to rooms that is cube-shaped, wooden, and large.
  9. ukita utema ilijavena isysa => a room of large-wood cubes.
  10. ukita itema ilijavena usysa => the naturally large room associated with wooden cubes.
  11. ikita itema ulijavena usysa => [something] related to cube-shaped rooms that is a large-plant.
  12. ukita itema ilijavena isysa => the room of large-wood boxes.
  13. ikita itema ilijavena usysa => [something] related to plant-box rooms that is an amount. (an inventory of greenhouses or something?)
  14. ikita itema ilijavena isysa => [something] related to rooms of large-wood boxes.
  15. ukita usysa utema ilijavena => large room of wooden boxes.
  16. ukita utema usysa ilijavena => room of wood-amount boxes.
  17. ukita utemasysa ilijavena => room of wooden big-boxes.
  18. ukita usysa iba ulijavena itema => A big-room and a cube-related plant.


Some of these are a little crazy and some of them are amazingly precise. The point is, we are achieving this level of precision using a vocabulary closer to the scope of toki pona. I can guarantee you that you would never have been able to say any of this in a language as vague as tp, nor will it ever try to approximate this level of clarity. I can likewise guarantee that Lojban will never have a minified version of itself available for video games. Good thing we don’t need one; we have an alternative.


As you can see, tokawaje combines the breadth, depth and computer-accessibility of Lojban with the simplicity, intuitiveness, and human-accessibility of toki pona.

For those of you wanting the TL;DR:

The invented language, tokawaje, is a spectrum-based language. Clarity of pronunciation, compactness of vocabulary (150 words), and combinatoric techniques to invent concepts all lend the language to great accessibility for new users of the language. On the other hand, a sophisticated morphology and grammar with clear constraints on word formations, sentence structure, and their associated syntax and semantics result in a language that is well-primed for speedy parsing in software applications.

More information on the language can be found on my commentable Google Sheets page here (concepts on the Dictionary tab can be searched for with “abc:” where “abc” is the 3-letter root of the word).

This is definitely the longest article I’ve written thus far, but it properly illuminates the faults with pre-existing languages and addresses the potential tokawaje has to change things for the better. Please also note that tokawaje is still in an early alpha stage and some of its details are liable to change at this time.

If you have any comments or suggestions, please let me know in the comments below or, if you have specific thoughts that come up while perusing the Google Sheet, feel free to comment on it directly.

Next time, I’ll likely be diving into the topic of writing a parser. Hope you’ve enjoyed it.


Next Article: Coming Soon!
Previous Article: Relationship and Perception Modeling

Minecraftian Narrative: Part 6

Table of Contents

  1. What is “Minecraftian Narrative”?
  2. Is “Toki Pona” Suitable for Narrative Scripting?
  3. Interface and Gameplay Possibilities
  4. Toki Sona Implementation Quandaries
  5. Dramatica and Narrative AI
  6. Relationship and Perception Modeling
  7. Evolution of Toki Sona to “tokawaje”


Previously, we identified two narrative AIs: the StoryMind that manages story development and content generation behind the scenes, and the Agent that simulates the behaviors of a character. The Agent consults with a Character while interpreting narrative scripting input. It then relays instructions to the Vessel that executes those instructions in the virtual world on behalf of the Character. Today, we’ll explore how an Agent could model socio-cultural constructs, account for multiple layers of interactive perceptions, and integrate narrative scripting into each of these.

Vessel = gameplay logic, Character = personnel record, Agent = interpretation AI logic

Amorphic Relationship Abstractions

In games, programmers will often construct objects in the game world based on a flexible design called the “Component” design pattern. This technique builds game objects less by focusing on a hierarchy (a Dragon is a Beast is a Physical is a Renderable is an Object), and more by attributing generic qualities to them which can be added or removed as needed. The objects then simply function as amorphous containers for these “components” of behavior. You don’t have a “dragon”, you have a “fire-breathing”, “flying”, “intelligent” “serpentine” and “animalistic” object that “occasionally attacks cities” which we simply label as a dragon. A player could then talk with the dragon and convince it to become more peaceful mid-game. The “Component” system is what allows us to dynamically change the dragon Character by simply removing the city-attacking behavior.
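The dragon example can be sketched with a minimal Component-style container. This is a toy illustration (class name and string labels are my own; real engines attach full behavior objects with their own update logic rather than strings):

```python
# A game object as an amorphous bag of behavior components that can be
# added or removed at runtime, instead of a fixed class hierarchy.
class GameObject:
    def __init__(self, label: str, *components: str):
        self.label = label
        self.components = set(components)

    def add(self, component: str) -> None:
        self.components.add(component)

    def remove(self, component: str) -> None:
        self.components.discard(component)

    def has(self, component: str) -> bool:
        return component in self.components

dragon = GameObject("dragon", "fire-breathing", "flying",
                    "intelligent", "serpentine", "attacks-cities")

# The player talks the dragon into peace mid-game: no subclass swap,
# just drop the one behavior component.
dragon.remove("attacks-cities")
print(dragon.has("attacks-cities"))  # False
print(dragon.has("flying"))          # True
```

The point is that “dragon” is only a label; what the object *is* at any moment is defined entirely by the components currently attached to it.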


This same model would seem to be extremely effective at describing our relationships in life. Relationships are amorphous and are often interpreted by context: which behaviors actually exist between two entities, and which behaviors are expected. Let’s say you are trying to understand whether you are in a “friendship” relationship with someone. If the other person isn’t doing what you are expecting a friend to do, then the likelihood that your unknown, actual relationship is the suspected one decreases. On the flip side, if you expect your friends to quack like a duck, and this random person does quack like a duck at you, then you have found someone who is likely to become your friend (though, you and she would be a bit weird). Critical to this is how each Character may have its own definition of what behaviors constitute “friendship.”

In addition, when one evaluates the satisfaction of a relationship, one typically focuses on the behaviors one wishes to engage in with others. However, the person doesn’t then start engaging in new behaviors immediately in the context of their old relationship; they first prioritize changing the relationship itself, so as to make the sought-after behavior more acceptable to the other party.


For example, if a boy likes a girl, he shouldn’t (necessarily) immediately go to her home and declare his love, but should perhaps build up a “familiar”, then “associate”, and then “friend” relationship first (though the way a relational path from relationship A to B is calculated for any given Character would be a function of the Character’s personality).

The implication is that these kinds of procedurally generated relational pathways can lead to characters that naturally develop a variety of human-like behaviors as they decide on a goal, calculate a possible social path towards that goal, and then further break down ways in which to change the situation they are in to meet their goals. This is related to the concept of hope. When you hope for someone to engage in a given behavior, then you are really stating that you will be more satisfied if you are in a relationship where those kinds of behaviors are expected (and where the person actually does those behaviors, indicating that they actually are in that relationship with you).


For example, a “daughter” entity D and a “father” entity F exist in the following way:

  • D & F both have the same expectations of F such that they both agree F is a “biological father” with D (he is responsible for impregnating her mother).
  • D & F both have the same expectations of each other such that they both agree F is a “guardian” with D (he houses/feeds/protects her, & pays for her healthcare/schooling, etc.).
  • D & F have different expectations of each other such that F believes he has a “fatherhood” relationship with D, but D does not believe this. F is always working, and D wishes he would play with her more often and come to her public achievements in school. As such, D is not satisfied since her conception of the “fatherhood” relationship is not the same.
  • Because D and F can both have different variations of the same relationship expectations, an AI will be able to support a system in which D and F may talk to each other “about” the same topic, but be thinking of totally different things (simulating the naturally confusing elements of the human condition). This is because the label for the relationship is equivalent, but the definition of each person’s idea of the relationship consists of different behaviors.
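A minimal sketch of the idea above (all behavior names and the satisfaction formula are my own illustrative stand-ins): each party defines the same relationship label as its own set of expected behaviors, so “fatherhood” can mean different things to D and F, and satisfaction is just how much of each party’s definition the shared history covers.

```python
def satisfaction(expected, history):
    """Fraction of a party's expected behaviors present in the shared history."""
    if not expected:
        return 1.0
    return len(expected & history) / len(expected)

# Behaviors F and D have actually shared.
history = {"houses", "feeds", "protects", "pays_schooling"}

# Same label, different definitions of "fatherhood".
f_fatherhood = {"houses", "feeds", "protects"}
d_fatherhood = {"houses", "feeds", "protects", "plays_with", "attends_events"}

print(satisfaction(f_fatherhood, history))  # 1.0 -> F believes the relationship holds
print(satisfaction(d_fatherhood, history))  # 0.6 -> D does not
```

Both entities can talk “about” fatherhood while evaluating entirely different behavior sets, which is exactly the confusion the bullet describes.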

Tiered Relationship Expectations

Furthermore, the variations in relationship expectations may diverge at the individual level or group level. We may be able to assume that the vast majority of people within a given “group” have similar beliefs regarding one topic or another. However, we must also consolidate a hierarchy of relational priorities: for any given person, individual expectations override any group-level ones, and different groups will have various degrees to which they influence the individual’s social expectations of others. Let’s take The Legend of Korra as an example.


In this world, some people, called “benders,” can control a certain element (fire, earth, water, or air). Previously, the differences between types of benders and the cultures they came from led to conflict in the world. In “Republic City” however, those people can now live in peace with one another. This represents a “national” group with cultural expectations of uniting people despite their different cultures.


But those who have no powers at all are subject to the prejudice and general economic superiority of the “benders”. From this tension spawns a group of political activists and terrorists: the Equalists. This group adds another layer of people on top of the “national” group layer. An Equalist who still believes in the capacity for people to unite is simply someone who is not as loyal to the Equalist cause. Whether the person still holds this hope for a positive relationship between the Republic City entity and themselves is something that others may notice and consider when evaluating the person’s actions. The Equalist leader will see this person as someone who must be further manipulated toward the cause, whereas peacekeepers will seek to redirect the reform efforts born of this person’s emotions.

Finally, we have the individual level, which supersedes all group-level social expectations. Say one of these questionably loyal Equalists also has another expectation that relates to equality: they believe a city should always be concerned about the well-being of the diversity of animals in the Avatar world. Animal-care efforts by the city therefore play a larger role in currying favor with this particular person, even if their Equalist position still puts them in a climate of distaste for the city. This person, like many who might join that organization, likely has a variety of internal conflicts to manage, expectations and needs battling for dominance of the mind. This is how our Agents should approach decision-making: through a diverse conflict of interests.
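The tiered hierarchy above could be resolved as in the following sketch (the topics, weights, and resolution rule are invented for illustration): an individual belief overrides everything, and otherwise the groups’ beliefs are blended by their influence weights.

```python
def expectation(topic, individual, groups):
    """Resolve a belief score in [-1, 1]: an individual belief wins outright;
    otherwise take the influence-weighted average of group beliefs."""
    if topic in individual:
        return individual[topic]
    votes = [(w, beliefs[topic]) for w, beliefs in groups if topic in beliefs]
    if not votes:
        return 0.0  # no opinion either way
    total = sum(w for w, _ in votes)
    return sum(w * b for w, b in votes) / total

# (influence weight, beliefs) pairs: Republic City vs. the Equalists.
groups = [
    (0.3, {"unity_across_cultures": +1.0}),
    (0.7, {"unity_across_cultures": -1.0, "bender_privilege_ok": -1.0}),
]
# A personal conviction that overrides any group stance on its topic.
individual = {"city_cares_for_animals": +1.0}

print(expectation("unity_across_cultures", individual, groups))   # -0.4 (Equalist-leaning)
print(expectation("city_cares_for_animals", individual, groups))  # 1.0 (individual override)
```

The weights are where “how loyal is this Equalist, really?” would live: a less devoted member simply carries a smaller weight for that group.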


Relationship Modeling

So, how do we actually take this relational concept and model it in a way the computer can understand? Well, let’s first define our terms:

  • Narrative Entity: A basic “thing” in the narrative that has narrative relevance. Can be a form of Life, a non-living Object, a Place, or an abstract idea or piece of Lore. This is “what” the thing is and implies various sorts of properties. It also places default limits on things (for example, a Lore cannot be interacted with physically).
  • Character: A Narrative Entity that has a ‘will’, i.e. can desire for itself a behavior. This is “who” the thing is (ergo, it has a personality) and implies what sorts of behaviors it would naturally engage in.
  • Role: The name a Narrative Entity assumes under the context of its behaviors in a Relationship.
  • Relationship: The set of behaviors that have occurred between two Narrative Entities. They will always be binary links between NEs and may or may not be bidirectional, i.e. an Entity may not even have to do anything to be in a Relationship. It may not even be aware that it is in a Relationship with another entity.

A behavior, as we define it here, is some action or state change. An action has a source and an object, so we can graphically portray transitive behaviors as directed lines running from one Narrative Entity to another. For intransitive verbs, the directed line simply points to a Null Entity that represents nothingness.

Rather than simply have these be straight lines however, it is easier to think of them as longitudinal lines between points on a globe.

Renderings may not necessarily place them at equidistant positions; this is merely the simplest rendering.

At each pole sits a Narrative Entity, and the globe as a whole encompasses the actual relationship between the two. We can then define a set of “ideal” relationships that have their own globes of interactions. By checking the degree to which the ideal is a subset of the actual, we can calculate the likelihood that the actual relationship includes the idealized one. This is a simple application of set logic to identifying and relating relationships.
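A sketch of that set-logic check (behavior names and relationship labels are invented examples): an “ideal” relationship is a set of directed behavior edges, and we estimate how likely the actual relationship includes an ideal by measuring subset overlap.

```python
def match_confidence(ideal, actual):
    """Degree to which the ideal behavior set is a subset of the actual one."""
    if not ideal:
        return 1.0
    return len(ideal & actual) / len(ideal)

def best_label(actual, ideals):
    """Label the actual relationship with the best-matching ideal."""
    return max(ideals, key=lambda name: match_confidence(ideals[name], actual))

# Directed behavior edges: (source, behavior, target).
actual = {("F", "houses", "D"), ("F", "feeds", "D"), ("F", "protects", "D")}
ideals = {
    "guardian": {("F", "houses", "D"), ("F", "feeds", "D"), ("F", "protects", "D")},
    "playmate": {("F", "plays_with", "D"), ("F", "attends_events", "D")},
}

print(best_label(actual, ideals))  # guardian
```

A fuller system would weight edges by recency or intensity, but the subset test alone already lets the engine ask “does this pair of entities plausibly have relationship X?”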

I further propose that these globular relationships have a sequence of layers: a core globe summarizing the history of behaviors that have occurred between the two entities, and an intermediate layer composed of hoped-for behaviors for any given Character.

The intermediate layer is far more complex since it is both hypothetical and subjective, whether between two Characters (visualized as two clearly divided hemispheres) or between a Character and a Narrative Entity (a full globe). The intermediate (hemi)sphere(s) would be calculated by an algorithm that takes into account the historical core of the relationship and the associated Character’s personality: given Character goals X and past interactions Y, what type of relationship, i.e. what collection of behaviors, does the Character wish to have with the target of the relationship?

Picture each division of this orange as the source hemisphere of two respective Characters: clearly divided, yet maintaining the same directed-lines-as-globe structure.

Perception Modeling

Furthermore, we must ensure that we can simulate the accumulation of knowledge and its questionable nature: how are we to model perceptions of knowledge, e.g. “I suspect that you ate my cookies”?

In this scenario, Person 1 is fairly certain Person 2 stole their cookies, but Person 2 has not yet even realized that Person 1’s cookies are missing. Person 2 also does not know how Person 1 obtained the cookies.
For this, we must allow even behaviors themselves to be abstracted into Narrative Entities that can be known, suspected, or unknown. Without this recursiveness, without the ability to form interactions between Characters and knowledge of interactions, you cannot replicate more complicated scenarios such as…

  • A actually knows a secret S1.
  • B hopes to know S2.
  • A suspects B wants to know S1 and therefore attempts to hide their knowledge of S1 from B.
  • B has reason to believe that A knows S2, so B pays more attention to A, but tries to avoid revealing this suspicion to A.
  • A has noticed B’s abnormal attention directed at him/her, so A surreptitiously engages in a behavior X to help hide the “way” of learning about S1.
  • C witnesses X and tells B about it, so B is now more confident that A knows about S2.
  • (We don’t even necessarily know if S1 and S2 are the same secret).
  • etc.

With this quick example, you can see how perceptions need to be able to have various degrees of confidence in behaviors (actions and state changes) to help inform the mentalities of Agents.
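The secret-keeping scenario above could be sketched like this (class name, propositions, and the confidence-update rule are all my own illustrative choices): propositions, including propositions *about* another entity’s knowledge, are reified so an Agent can hold beliefs about beliefs, each with a confidence in [0, 1].

```python
class Percipient:
    """An entity that holds graded beliefs about propositions."""
    def __init__(self, name):
        self.name = name
        self.beliefs = {}  # proposition -> confidence in [0, 1]

    def believe(self, proposition, confidence):
        self.beliefs[proposition] = min(1.0, max(0.0, confidence))

    def confidence_in(self, proposition):
        return self.beliefs.get(proposition, 0.0)

a, b = Percipient("A"), Percipient("B")

a.believe("S1", 1.0)                  # A actually knows secret S1
a.believe(("B", "wants", "S1"), 0.7)  # A suspects B wants to know S1
b.believe(("A", "knows", "S2"), 0.5)  # B has reason to believe A knows S2

# C reports witnessing behavior X, so B grows more confident that A knows S2.
b.believe(("A", "knows", "S2"), b.confidence_in(("A", "knows", "S2")) + 0.25)

print(b.confidence_in(("A", "knows", "S2")))  # 0.75
```

Because propositions are ordinary values, a belief can nest another belief (B’s belief about A’s knowledge), which is the recursiveness the scenario requires.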

Narrative Scripting Integration

As far as codifying these Entities and Behaviors goes, that is where the narrative scripting comes into play. Every Entity, every Behavior, and therefore every Relationship is described solely in terms of narrative scripting statements. This prevents situations where the technology must perform an intermediate translation into another language while interpreting scripted content into logical meaning.

So, for example, Cookies might have the following abstract description:

Cookies…

  1. are a bread-based production.
  2. have a flavor: (usually) sweet.
  3. have a shape: (usually) small, circular, and nearly flat.
  4. have a source material: bread-based semi-solid.
  5. have a creation method: (usually) heated in a box-heat-outside (i.e. “oven”, distinct from the box-heat-inside, i.e. “microwave”).

These properties are defined in priority order: if something refers to an entity that is a bread-based treat, small and circular, the computer will evaluate that statement with higher confidence as the entity sharing the remaining qualities (“sweet”, “made in an oven”, “nearly flat”, etc.) than as another entity described with a different shape or a different taste.
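One hedged way to realize that priority ordering (the property names and the linear weighting are invented for the example): earlier properties contribute larger weights, so a partial description resolves to the entity with the highest score.

```python
# Priority-ordered property lists: earlier entries matter more.
COOKIE = ["bread_based", "sweet", "small_flat_circle", "semi_solid_dough", "oven_made"]
CRACKER = ["bread_based", "salty", "small_flat_circle", "semi_solid_dough", "oven_made"]

def match_score(entity_props, observed):
    """Score an observed description against an entity; earlier (higher-priority)
    properties carry larger weights, normalized to [0, 1]."""
    n = len(entity_props)
    weights = {p: n - i for i, p in enumerate(entity_props)}
    return sum(weights.get(p, 0) for p in observed) / sum(weights.values())

observed = {"bread_based", "sweet", "small_flat_circle"}
print(match_score(COOKIE, observed))                               # 0.8
print(match_score(COOKIE, observed) > match_score(CRACKER, observed))  # True
```

Any monotonically decreasing weighting would do; the point is only that “sweet” missing costs more than “oven-made” missing when disambiguating a vague reference.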


With innumerable globes of interactions and perception lines linking everything together, a fully rendered model might look something like this:

Some of you may recognize this from the article on “Modeling Human Behavior and Awareness”
This concept has really grown out of a pre-existing theory on how to model these same kinds of behaviors that I developed. No doubt it will receive revisions as an actual implementation is underway, but before we get to that, we’ll have to dive once more into the field of linguistics.

The great break in content between articles here is because I’ve been hard at work on developing my own constructed language that is quite distinct from Toki Pona/Sona. To hear about the reasons why, and what form this new language will take, please look forward to the next article.

As always, comments and criticisms are welcome in the comments below. Cheers!

Next Article: Evolution of Toki Sona to “tokawaje”
Previous Article: Dramatica and Narrative AI

Minecraftian Narrative: Part 5

Table of Contents

  1. What is “Minecraftian Narrative”?
  2. Is “Toki Pona” Suitable for Narrative Scripting?
  3. Interface and Gameplay Possibilities
  4. Toki Sona Implementation Quandries
  5. Dramatica and Narrative AI
  6. Relationship and Perception Modeling
  7. Evolution of Toki Sona to “tokawaje”


Today’s the day! We’ve gotten an idea of what form Toki Sona-based narrative scripting will take, and we’ve examined some of the concerns regarding its integration and maintenance with code. Now we’re finally going to dive into my favorite part: theorizing the behavior of classes that would actually use Toki Sona and react.

The most brilliant illustrations of media, in my opinion, are those which exhibit the Grand Argument Story. These stories have an overarching narrative with a particular argument embedded within, advanced throughout the experience by the main character and those he or she meets as they personify competing, adjacent, or parallel ways of thinking.

But how are we to teach a computer the narrative and character relationships as they appear to us? Thankfully, a well-fleshed out narrative framework already exists to help us as we figure it out. Its name is Dramatica, and from it, we shall design the computer types responsible for managing a dynamic narrative: the Character, Agent, and StoryMind.

Brief Dramatica Overview

The Dramatica Theory of Story is a framework for identifying the functional components of a narrative. Its 350-page introductory book, available for free on their website (the advanced book can be found over here too), defines a set of story concepts that must exist within a Grand Argument Story in order for it to be fully fleshed out. If anything is missing, then the story will be lacking. To be honest, as a writer, I find the level of detail it gets into rather jaw-dropping. Its creators even had to create a software application just to help writers manage the information in the framework! How detailed is it? Check this out…

Dramatica defines four categories: Character, Plot, Theme, and Genre.

It also defines 4 “Throughlines” which are perspectives on the Themes.

  • Overall Story (OS) = The story summarized as everyone experiences it. A dispassionate, objective view.
  • Main Character Story (MC) = The story as the main character experiences it. The character we relate to, experiencing inside-out.
  • Influence Character Story (IC) = The story as the influential character experiences it. The character we sympathize/empathize with, experiencing from the outside-in.
  • Relationship Story (SS) = The story viewed as the interactions between the MC & IC. An extremely passionate, subjective view.

Within Theme, there are 4 “Classes” that have several subdivisions within them.

  • Universe: External/State => A Situation
  • Physics: External/Activity => An Activity
  • Psychology: Internal/Activity => A Manner of Thinking
  • Mind: Internal/State => A State of Mind

One Throughline is matched to each of the Classes so that, for example, the MC is mainly concerned about dealing with a state of mind, the IC is trying to avoid a situation related to his/her past, the community at large is freaking out about the ongoing activity of preparing for and running a local tournament, and there is an ongoing difference in methodologies between the MC and IC that draws tension between them.

Each Class can be broken down into 64 elements. Highlighted: Universe.Future.Choice.Temptation Element.

For each Class, you select 1 Variation of a Concern per story. The 4 Plot Acts (traditionally exposition, rising action, falling action, and denouement) each then shift between the 4-Element quad within the chosen Variation. Since Variations each have a diagonal opposite, diagonal movements (a “slide”) don’t change the topic Variation as intensely as shifting Variations horizontally or vertically (a “bump”).


For example, the Universe.Future.Choice Variation has the two opposing Elements “Temptation” and “Self-Control”, plus another pair, “Logic” and “Feeling”. Notice these are two distinct, albeit related, spectra of the human experience that come into play when making decisions about the future regarding an external situation that must be dealt with. Shifting topics from Temptation to Self-Control wouldn’t be as big a change as going to Logic or Feeling, since the former deals with the same conflicting pair of Elements.

Each of those Elements can be paired with the Acts in a number of permutations. 3 patterns arise, each of which has 4 orientations and can be run forwards or backwards (2). That gives 24 possible permutations per Variation. Multiply by 16 Variations per Class, 4 Classes per story, and then by 4 again since each of the 4 Throughlines can be paired with a Class. Altogether, that comes out to 6,144 possible Plot-Theme permutations.

The Theme Classes are also matched up with Genre categories which can help the engine identify what sort of content to create at a given point of the story (doesn’t increase multiplicity).

The merging of Plot with Genre

On top of that, there are the Characters to consider. There are 8 general Archetypes, each of them composed by combining a Decision Characteristic and an Action Characteristic for each of 4 aspects of character: their reason for toiling, the way they do things, how they determine success, and what they are ultimately trying to do.


You can make any character by combining 2 Characteristics from 2 unopposed Archetypes. That gives 7! = 5,040 permutations of any given characteristic within an aspect (since a characteristic is never matched with its opposite). 5,040 * 4 aspects * 2 characteristics = 40,320 permutations of Characters, optimally.

Finally, there’s the number of Themes that can be delivered by the external/internal successes and failures of the MC…

4 Possibilities

…and whether the MC and IC remained steadfast in their Class or changed (e.g. did they stay/change their state of mind?) and the success/failure thereof.


That makes 4 * 4 possible endings: 16.

PHEW! Okay, now, altogether that’s 16 endings * 40,320 characters * 6,144 plots…

Carry the 3…there we go:

3.96 billion. Stories.

And that’s without even “skinning” them as pirate, sci-fi, fantasy, take your pick.

Needless to say, these kinds of possibilities are EXACTLY the sort of variation we should be looking for in procedural narrative generation. Even if you knocked out the Informational genre in the interest of counting only the non-edutainment games, that still leaves about 2.97 billion possibilities. Good odds, I say.

Also, keep in mind, any given video game will oftentimes have several sub-stories within the overarching story, ones where minor characters have their own stories to explore and see themselves as the Main Character and Protagonist of their own conflict. In these stories you, the original main character, may play the role of Influence Character (think Mass Effect 2 loyalty missions, if you’ve ever played it: every character’s unique storyline is critically affected by the decisions you make while accompanying them on a personal, yet vital journey). Assuming any given story has, say, 9 essential characters (a pretty small number by procedural generation standards, but pretty normal for children’s books), any single gameplay experience may involve 26.73 billion story arrangements.
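For the skeptical, the combinatorics above can be re-run as plain arithmetic (the individual factors come straight from the article; this only checks the multiplication):

```python
# Plot-Theme permutations: 24 act patterns * 16 Variations * 4 Classes
# * 4 Throughline-Class pairings.
plot_theme = 24 * 16 * 4 * 4

# Characters: 7! = 5,040 characteristic permutations * 4 aspects * 2 characteristics.
characters = 5040 * 4 * 2

# Endings: 4 theme outcomes * 4 steadfast/change-success/failure outcomes.
endings = 4 * 4

stories = endings * characters * plot_theme
print(plot_theme)                        # 6144
print(characters)                        # 40320
print(f"{stories / 1e9:.2f} billion")    # 3.96 billion

# Dropping the Informational genre removes one of the four Genre pairings.
non_edutainment = stories * 3 // 4
print(f"{non_edutainment / 1e9:.2f} billion")  # 2.97 billion

# With 9 essential characters, each the MC of their own sub-story.
print(f"{2.97 * 9:.2f} billion")         # 26.73 billion
```

(Strictly, 2,972,712,960 * 9 is closer to 26.75 billion; the 26.73 figure comes from multiplying the rounded 2.97.)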

It isn’t just Dramatica’s variability that makes it so appealing, though. Each of these details is designed to be clearly identified and catalogued. This has two important consequences. The first is that the engine will know what goes into making a good story and will therefore know how to create a good story structure from scratch. The second, and far more important to us, is that the engine will know when and how any of these qualities are not present or properly aligned. It will therefore understand what has happened to the story when the player changes things and how to fix it. Even better, because of its understanding of related story structures, it will even be able to adapt with completely new story forms should it wish to.

Head hurting yet? Fantastic! Let’s dig into characters as computer entities.

Characters & Agents

While Dramatica gives us the functional role of Characters, it doesn’t really flesh them out properly. Unfortunately, writers don’t really maintain a consolidated list of brainstorming material, but you can find several odds and ends around the Internet (list of character needs, list of unique qualities for realistic characters, and a character background sheet, for example). Any and all of these can be used to help flesh out and define the particular aspects of our Characters, beyond just their functional role.

The main interest we have with these brainstorming materials is to define a set of fields that an AI can connect Toki Sona inputs to. Given some Toki Sona instruction A, a definition of Character B, and a certain Context C, what course of action D should I take? Answering this question is the job of the Agent.

What exactly does an Agent entail? They would be the singular existence that represents the computer logic for the entirety of an assigned Character. In our case, we’re going to define a Character as ANY Narrative Entity that has (or could resume having) a will of its own. A Narrative Entity would simply be anything that requires a history of interactions with it to be recorded such as a Life, an Object, a Place, or a piece of Lore.


Notice that Characters don’t have to be living beings specifically. For example, an enchanted swamp may have an intelligence living amongst the trees. It would most certainly be a Character; however, it would also definitely be a Place that people can enter, exit, and reside in. And as a swamp entity would be the embodiment of the land, the plants, and the animals within, one could extend its attributes to Life as well. As a result, we’d have the swamp Agent that accesses the Character, which in turn maintains properties of both the Life and Place for the swamp Narrative Entity.

Sample low-effort UML Class Diagram for the Agent Subsystem (made with UMLet)

In the diagram above, we specify that a single Agent is responsible for something called a Vessel rather than for a Character directly. What’s more, the Vessel can “wear” several Characters! What is the meaning of this?

Let’s say we wished to create a Jekyll & Hyde story. Although Jekyll and Hyde have different personalities, they share a body. Whatever one is doing, wherever one is, the other will also be doing the moment they switch identities. This relates back to assets too: whatever one sprite/model animation is doing, the other will also be doing when those assets are swapped to another set. In this way, Characters and Vessels are fully interchangeable without affecting each other. A multiple-personality character might change Characters while not changing Vessels. A shapeshifting character might change Vessels while not changing Characters. In the case of Jekyll and Hyde, both Character and Vessel would be swapped, as their personalities and bodies are BOTH different, but the pair will always be tied to the same location and activity at the time of switching.
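The Agent/Vessel/Character split might be sketched as follows (class and method names are my own stand-ins for the diagram’s types, not any engine API): the Vessel owns the embodiment and can wear several Characters, while the Agent only ever consults the active Character.

```python
class Character:
    """Narrative identity: personality, knowledge, goals."""
    def __init__(self, name, personality):
        self.name = name
        self.personality = personality

class Vessel:
    """In-engine embodiment; can 'wear' several Characters and swap between them."""
    def __init__(self, characters):
        self.characters = characters
        self.active = characters[0]

    def switch_to(self, name):
        # Location and current activity persist; only the identity changes.
        self.active = next(c for c in self.characters if c.name == name)

class Agent:
    """Decision logic: consults the active Character, instructs the Vessel."""
    def __init__(self, vessel):
        self.vessel = vessel

    def decide(self, stimulus):
        c = self.vessel.active
        return f"{c.name} ({c.personality}) reacts to {stimulus!r}"

body = Vessel([Character("Jekyll", "gentle"), Character("Hyde", "violent")])
agent = Agent(body)
print(agent.decide("insult"))  # Jekyll responds
body.switch_to("Hyde")         # same Vessel, same place and activity, new Character
print(agent.decide("insult"))  # Hyde responds
```

A shapeshifter would instead swap the Vessel under the same Character, and a full Jekyll-and-Hyde transformation would swap both.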


So, the Agent is just an AI that doesn’t care what it’s controlling or to what ends. It looks to the Character to figure out what it narratively should and can do, and it issues instructions based on that to the Vessel. It doesn’t care whether the Vessel knows how to do it. It simply assumes the Vessel will know what the instructions mean. In the process, we’ve divorced the concept of a Character 1) from the in-story and in-engine thing that they are embodied as and 2) from the logic that figures out what a given Character should do given a set of Toki Sona inputs from the interpreter.

The last important thing to note about the Characters and Agents here is that the Agents are informed, context-wise, by their associated Characters. As such, an Agent’s decisions are constrained by their Vessel’s current Character; only its acquired knowledge, background of experience and skills, and personality will invoke behavior. An Agent will therefore factor into its decision-making the Character’s history of perceptions, likes and dislikes, attitude, goals, and everything else that constitutes the Character. It then translates incoming Toki Sona instructions into gameplay behavior. For example, what might a Character do if asked, “What do you know about the aliens?”


Maybe they don’t know much about the aliens. Or maybe they do, but it’s in their best interest to only reveal X information and not Y. But maybe they also really suck at lying, so you can see through it anyway. How will they know exactly what to say? How will they say it? Does the personality invite a curt, direct response, or do they swathe the invading aliens with adoration and delight in a giddy, I’m-too-obsessed-with-science kinda way?

The StoryMind

Finally, we address the overall story controls: the StoryMind. In Dramatica, the StoryMind is the fully formed mental argument and thought-process that the story communicates. In our context, the StoryMind is the computer type responsible for delivering the Dramatica framework’s StoryMind. It understands the possible story structures and makes decisions regarding whether the story can reasonably deliver the same themes with the existing Characters and Plot or whether it will need to adjust.

The StoryMind will have full and total control over that which has yet to be exposed to a human player within the story. Its job is to generate and edit content to deliver a Grand Argument Story of some kind to the player. What might this look like?

Story time:


Typical Fantasy RPG world/game. You’re a strength-and-dexterity-focused mercenary and you’ve developed a bit of a reputation for taking on groups of enemies solo and winning with vicious direct onslaughts. You’re walking through town and come across a flyer about a duke’s kidnapped heir (one of a few pre-generated premises made by the StoryMind). You ask a barkeep about it (and it alone), so the StoryMind begins to suspect that you may be interested in pursuing this storyline further (rather than whatever other premises it had prepared for you). It therefore begins to develop more content for this premise, inferring that it will need that story information soon. In fact, it takes the initiative.


You are blocked in the road by a woman named Steph who overheard you outside the bar and wishes to accompany you on your journey to rescue the heir. She says that she’s a sorceress with some dodgy business concerning the duke and she needs a bargaining chip. Let’s say you respond with “Sure. I only want the duke’s money,” (in Toki Sona of course). All of a sudden, the StoryMind knows a couple of things:

  1. You care more about the reward money than pleasing the duke.
  2. Because you have already invited risk into your relationship with the yet-to-be-met, quest-giving duke, you are even more likely to behave negatively towards this particular duke and his associates in the future. You also might have a natural bias against those of a higher social status (something it will test later perhaps).
  3. You have some level of trust towards Steph, though it’s not defined.
  4. You are not a definitive loner. You accepted a partnership, despite your past as a solo mercenary. But how deep does this willingness extend? It’s possible it might be worth testing this too.

Since you may have related goals, the StoryMind sets her up as the Influence Character. It randomly decides to attempt a “friendly rivals / romance?” relationship (partnership of convenience), modifying her Character properties behind the scenes so that she is similar to you (based on your actions and speech).


Along the way, a group of goblins ambush and surround you both, so you dash in to slaughter the beasts. The StoryMind may have been designing Steph to support you, but unbeknownst to you, in the interest of generating conflict, it changes some of Steph’s settings! Steph yells for you to stop, but you ignore her and slash through one of them to make an opening. In response, Steph sparks a blinding light, grabs your hand, and runs away in the ensuing chaos.


As soon as you’re clear, she starts yelling at you, asking why you wouldn’t wait. After you get her to calm down a bit and explain things, she confides that she is hemophobic and can’t stand to see, smell, or be anywhere near blood. She’d prefer to stealthily knock out, sneak past, trick, or bloodlessly maim those who stand in her way. How will you react? Astonishment? Scorn? Sympathy? Is this a deal breaker for your temporary partnership? Remember, she’s always paying attention to you, and so is the StoryMind. This difference in desired methodologies is but a small part of the narrative the StoryMind is crafting.

  • Throughline Type: Class.Concern.Variation.Element, Act I
    • Description
    • Genre
  • Overall Story Throughline: Physics.Obtaining.SelfInterest.Pursuit
    • A dukedom heir has been kidnapped.
    • Entertainment Through Thrills: Pursuit of an endangered royalty.
  • Influence Character Throughline: Mind.Preconscious.Worry.Result
    • Steph is worried about how to deal with her hemophobia. (The StoryMind generates this shortly afterward:) She can’t find work beast-slaying or healing because of it, and is now low on money. The duke is evicting her, despite her frequent requests for bloodless work as payment. Everything’s so stressful, and it’s all because of that stupid blood!
    • Comedy of Manners: the almighty sorceress, the bane of beasts and harbinger of health, brought down by the mere sight of blood.
  • Relationship Throughline:
    • You’d prefer to hack away at enemies, but she can’t stand blood and prefers alternative approaches to removing obstacles. As such, you each have different manners of thinking about how you feel obstacles should be dealt with.
    • Growth Drama


  • Main Character Throughline: Universe.?
    • If nothing interrupts the progress of the other 3, the StoryMind is fit to throw external state-related problems your way, and these problems will necessarily dig into deeper, thematic issues. For example…
    •  Universe.Progress.Threat.Hunch
      • You eventually learn that those who took the heir have loose ties to the duke himself. Since you’re in pursuit to rescue him/her, you have a hunch that you may be a target soon as well. You need to unearth the mystery surrounding this. Does this impact your ability to trust the various characters you come across?
      • Entertainment Through Atmosphere: You’re experiencing a fantasy world!



And to think, if you’d just said, “No, thanks. I’m more of a loner,” at the beginning, Steph might instead have developed as a hindering rival Influence Character who tries to steal the heir for herself, popping in and out of the story when you least expect it! Does she even know about the duke’s possible relation to the kidnapping? Too bad we’ll never find out. After all, you didn’t say that. The characters and experiences in this world are real and permanent. You live with your choices, build relationships, and engage with a game world that truly listens to you, more intimately than any other game has before.


I have found that Dramatica is an excellent starting point from which to build story structures and inform our StoryMind narrative AI of how to craft and manipulate storylines and characters. I hope you too are interested in the potential of this sort of system so that one day we might see it in action.

Also, it’s entirely possible I might have slightly messed up some calculations concerning the Dramatica system as the book doesn’t do a great job of clearly defining the relationships in one place (it’s sort of scattered about in the chapters). As far as I can tell, I’ve got them right, but I don’t terribly favor my math skills. I’d be happy to correct any mistakes someone notices.

In the future, expect to find an article diving into the hypothetical technical representation of Agents: their relationships, perceptions, and decision-making. Again, I’d love to hear from you below with comments, criticisms, and/or questions. Cheers!

Next Article: Relationship and Perception Modeling
Previous Article: Toki Sona Implementation Quandries

Minecraftian Narrative: Part 4

Table of Contents:

  1. What is “Minecraftian Narrative”?
  2. Is “Toki Pona” Suitable for Narrative Scripting?
  3. Interface and Gameplay Possibilities
  4. Toki Sona Implementation Quandries
  5. Dramatica and Narrative AI
  6. Relationship and Perception Modeling
  7. Evolution of Toki Sona to “tokawaje”


At this point, I’ve communicated the basics of the Toki Sona language (a “story-focused” Toki Pona), its potential for simply communicating narrative concepts, and the types of interfaces and games that could exploit such a language.

This time, we’ll be diving into some of the nuts and bolts of actually interpreting Toki Sona and tying it into code. An intriguing array of questions comes into play due to Toki Sona’s highly interpretive semantics. The end result is a sort of exaggerated problem domain taken from Natural Language Processing. How much information should we infer from what we are given? How do we handle vague interpretations in code? And what do we do when the language itself changes through usage over time? Let’s start thinking…

Variant Details In Interpretation

What we ultimately want in a narrative engine is to be able to craft a computer system that can dynamically generate the same content that a human author would be able to create. To accomplish this, we must leverage our main tool: reducing the complexity of language to such an extent that the computer doesn’t have to compete with the linguistic nuances and artistic value that an author can imbue within their own work. Managing the degree to which we include these nuances requires a careful balancing act though.

For example, “It was a dark and stormy night…” draws into your mind many images beyond simply the setting. It evokes memories filled with emotions which an author may use to great effect in their manipulation of the audience’s emotional experience. Toki Sona’s focus on vague interpretation leaves many different ways of conveying the same concept, depending on one’s intent. Here are some English literal translations:

  • Version A: “When a black time of monstrous/fearful energy existed…”
    • tenpo-pimeja pi wawa-monsuta lon la, …
  • Version B: “This is the going time: The time is the black time. The air water travels below. As light of huge sound cuts the air above…”
    • ni li tenpo-kama: tenpo li tenpo-pimeja. telo-kon li tawa anpa. suno pi kalama-suli li kipisi e kon-sewi la, …

You’ll notice that version A jumps directly into communicating the tone that the audience should understand. As a result, it is far less particular in setting the scene’s physical characteristics about the weather.

Version B on the other hand takes the time to establish scene details with particulars (as specific as it can get, anyway). Although it takes several more statements to present the idea, it eventually equates itself loosely with the original English phrase. In this way, it manages to conjure emotions in the audience through imagery the same way the original does, but you can also tell that the impact isn’t quite as nuanced.


One of the key aspects of Toki Sona is that it is unable to include two independent phrases in a single statement. It is also unable to include anything beyond a single, adverbial dependent clause in addition to the core independent clause. These restrictions help ensure that each individual statement has a clear effect on interpretation. Only one core set of subjects and one core set of verbs may be present. Everything else is simply details for the singularly described content. As a result, a computer should be able to extract these singular concepts from Toki Sona more easily than it would a more complex language.

So while both database queries and statistical probability calculations play a part in interpreting the text, the algorithms will lean more heavily on the probabilities, both because the database contents are small (there aren’t many vocabulary terms to track) and because words frequently have several divergent meanings relevant to a given context. As such, algorithms will often need to re-identify meanings after the fact, once successive statements have been interpreted.

Our difficulty comes in when we must identify how interpreted statements are to be translated into understood data. Version B is far more explicit about how things are to be added, while version A relies far more heavily on the interpreter to sort things out. How many narrative elements should the interpreter assume based on the statistical chances of their relevance? The more questionable elements are added, the more items we’ll need to revisit for every subsequent statement. After all, future statements could add information that grants us new insight into the meaning of already stated terms.

To illustrate this, let’s break down how the interpreter might compose a scene based on these statements into pseudocode, starting with version B. We’ll leave English literal translations in and identify them as if they were Toki Sona terms.


Version B
contextFrames[cxt_index = 0] = cxt = new Context(); //establish 1st context

"This is the going time:" =>
contextFrames[++cxt_index] = new Context(); //':' signifies new context
cxt = contextFrames[cxt_index]; //future ideas added to new context
cxt += Timeline(Past); //Add the "time that has gone" to the context

"The time is the black time." =>
cxt += TimeOfDay(Night) //Add the "time of darkness" to the context

"The air water travels below." =>
cxt += Audio(Rain) + Visual(Rain) // Add "water of the air" visuals. Audio auto-added.

"As light of huge sound cuts the air above..." =>
cxt += {Object|Visual}(Light+(Sound+Huge)) >> Action(Cut) >> Visual(Sky+Air);
cxt += Mood(Ominous)?
// The scene includes a light that is often associated with loud noises. These lights (an object? A visual? Is it interactive?) are cutting across the "airs in the sky", likely clouds. All together, this combination of elements might imply an ominous mood.

Version A
contextFrames[cxt_index = 0] = cxt = new Context(); //establish 1st context

"When a black time of monstrous/fearful energy existed..." =>
cxt += TimeOfDay(Night)? + Energy(Terrifying)? + Mood(Terrifying) + Mood(Ominous)?
// Establish night time and presence of a terrifying form of energy in the scene. Based on these, establish that the mood is terrifying in some way with the possibility of more negatively toned content to follow soon. Possible that "monstrous energy" may imply a general feel rather than a thing, in which case "black time" may reference an impression of past events as opposed to the time of day.
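The accumulation-and-revision model in the pseudocode above can be sketched in runnable form. Everything here (ContextFrame, the certain/tentative split) is a hypothetical stand-in for whatever structures Wyrd would actually use; the "?" entries from the pseudocode become the tentative list.

```python
# A minimal sketch of the context-frame model from the pseudocode above.
# All names here (ContextFrame, add, revise) are invented for illustration.

class ContextFrame:
    """Holds narrative elements; tentative ones (the '?' entries above)
    are kept separate so later statements can confirm or retract them."""

    def __init__(self):
        self.certain = []    # e.g. ("TimeOfDay", "Night")
        self.tentative = []  # elements added on statistical guesswork

    def add(self, element, confident=True):
        (self.certain if confident else self.tentative).append(element)

    def revise(self, element, now_confident):
        # A later statement can promote or retract an earlier guess.
        if element in self.tentative:
            self.tentative.remove(element)
            if now_confident:
                self.certain.append(element)

frames = [ContextFrame()]  # establish 1st context
cxt = frames[0]

# Version A: "When a black time of monstrous/fearful energy existed..."
cxt.add(("Mood", "Terrifying"))                   # stated outright
cxt.add(("TimeOfDay", "Night"), confident=False)  # inferred, may be revised
cxt.add(("Mood", "Ominous"), confident=False)

# A follow-up statement confirms that it really is night:
cxt.revise(("TimeOfDay", "Night"), now_confident=True)
```

The key design point is that tentative elements stay cheap to retract, so the interpreter can guess aggressively and clean up later.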

To emphasize ease of use and make a powerful assistance tool, it’s best to let the interpreter do as much work as possible and then update previous assumptions as new information is introduced. That way, even if the user inputs only a small amount of information, it will feel as if the system is anticipating their meaning and understanding them effectively. Doing otherwise would save significantly on processing time, but would result in far too many assumptions that don’t account for the full context, which would in turn produce terrible errors in interpretation. Figuring out exactly how the data is organized and how the interpreter will make assumptions will be its own can of worms that I’ll get to some other day.

Data Representation

An additional concern is to identify the various ways that words will be understood logically as classes or typenames, hereafter “types” (for the non-programmers out there, this would be the organization the computer uses to better identify the relationships and behaviors between terms). Examples in the above pseudocode include TimeOfDay, Visuals and Audio elements, etc. Ideally, each of these definitions would alter the context in which characters exist. It would inform their decision-making and impact the kinds of events that might trigger in the world (if anything like that should exist).

One option would be to create a data structure type for each Toki Sona word (there’d certainly be few enough of them memory-wise, so long as a short-cut script were written to auto-generate the code). Having types represent the terms themselves, however, is quite unreliable as we don’t want to have to alter the application code in response to changes in the language. Furthermore, any given word can occupy several syntactic roles depending on its positioning within a sentence, and each Toki Sona word in a syntactic role comes with a variety of semantic roles based on context.


For example, “kon”, the word for air, occupies a variety of meanings. As a noun, it can mean “air”, “wind”, “breath”, “atmosphere”, and even “aether”, “spirit”, or “soul” (literally, “the unseen existence”). These noun meanings are then re-purposed as other parts of speech. The verb to “kon” means to “breathe” or, if being creative, it could mean “to pass by/through as if gas” / “to blow past as if the wind”. To clarify, when one says, “She ‘kon’s” or “She ‘kon’ed”, one is literally saying “she ‘air’ed”, “she ‘wind’-ed”, “she ‘soul’-ed”, etc. The nouns themselves are used AS verbs, which in turn results in language conventions for interpreted meaning. You can therefore understand the interpretive variations involved, and that’s not even moving on to adjectives and adverbs! Through developing conventions, we could figure out that when a person “airs”, the semantic role is usually that the person breathes, sighs, or similar, not that they spirit away or become one with the atmosphere (meanings which are far less likely to use “kon” as a verb in the first place – probably an adverb, if anything).
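One alternative to hard-coding a type per word is an external dictionary entry that stores, for each syntactic role, a weighted list of senses. The structure below is a hypothetical sketch; the weights are invented placeholders and would in practice be pre-baked from corpus statistics rather than typed by hand.

```python
# Hypothetical external-dictionary entry for "kon". Weights are made up
# for illustration; a real lexicon would derive them from usage data.

LEXICON = {
    "kon": {
        "noun": {"air": 0.45, "wind": 0.20, "breath": 0.15,
                 "spirit": 0.10, "atmosphere": 0.10},
        "verb": {"breathe": 0.80, "blow past": 0.15, "spirit away": 0.05},
    }
}

def most_likely_sense(word, role):
    """Pick the highest-weighted sense of `word` for a syntactic role."""
    senses = LEXICON.get(word, {}).get(role, {})
    return max(senses, key=senses.get) if senses else None

most_likely_sense("kon", "verb")  # -> "breathe"
most_likely_sense("kon", "noun")  # -> "air"
```

Because the data lives outside the application code, the language can change (new senses, shifted weights) without recompiling anything, which is exactly the decoupling argued for above.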

In the end, a computer needs to understand a definitive behavior that is to occur with a given type name. However, since the nature of this behavior is dictated by the combination of terms involved, we can understand that Toki Sona terms are meant to serve as interpreted inputs to the types. Furthermore, it seems most appropriate for types to serve two purposes: they must indicate the syntactic role the word has in a sentence, and they must indicate the functional role the word has in a context.

In the pseudocode excerpt above, I chose to highlight the latter route, defining described content based on how it impacted the narrative context: is this an Audio or Visual element that will affect perception, or is this a detail concerning the setting’s external details such as the TimeOfDay? In addition to this, we’ll also need to incorporate syntactic analysis to better identify what the described content will actually be (is it a noun, verb, adjective, etc.?). As mentioned before, the way a word is used will greatly affect the type of meaning it has, so the function should be built on the syntax, which is in turn built on the vocabulary.
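As a sketch of that two-stage idea – syntax first, then function – here is a toy positional tagger for the “subject li verb” sentence shape, run on the Version B sentence “telo kon li tawa anpa” (“the air water travels below”). Both lookup tables are illustrative assumptions, not a real Wyrd API.

```python
# Stage 1: syntax. A very rough positional tagger that uses Toki Pona's
# "li" (predicate marker) and "e" (direct-object marker) particles to
# decide which zone of the sentence a word sits in.

def syntactic_roles(words):
    roles, zone = {}, "subject"
    for i, w in enumerate(words):
        if w == "li":
            zone = "verb"      # everything after "li" is the predicate
        elif w == "e":
            zone = "object"    # everything after "e" is the direct object
        else:
            roles[i] = (w, zone)
    return roles

# Stage 2: function. A hypothetical (word, syntactic zone) -> narrative
# element table, in the spirit of the Visual/Action types above.
FUNCTIONAL = {
    ("telo", "subject"): ("Visual", "Water"),
    ("kon", "subject"):  ("Modifier", "Air"),
    ("tawa", "verb"):    ("Action", "Move"),
    ("anpa", "verb"):    ("Direction", "Down"),
}

def functional_elements(sentence):
    words = sentence.split()
    return [FUNCTIONAL[t] for t in syntactic_roles(words).values()
            if t in FUNCTIONAL]
```

Feeding it “telo kon li tawa anpa” yields the rain-like cluster of elements: a Water visual modified by Air, moving downward – which the interpreter could then collapse into the Audio(Rain)/Visual(Rain) additions from the pseudocode.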


Language Evolution

In addition, a system that implements this sort of code integration should be built around the assumption that the core vocabulary and semantics will change. As it stands, we already want to give users the power to add their own custom terms to the language for a particular application. These custom terms are always re-defined using a combination of sentences made of core terms and pre-existing custom terms.

However, because the integration of a living, breathing, and spoken language into a code base is a drastic measure, it is vital that the code be designed around the capacity for the core language to change. After all, languages are not unlike living creatures that adapt to environments, evolve to meet their needs, and strive to achieve their goals in the midst of it. In this sense, we can rest assured that players and developers alike will look forward to experimenting with and transforming this technology. This transformation will assuredly extend to the core terms, so not even the language should be tightly bound to them.

Given the lack of assurances in regard to the core terms over an extended period of time, it would behoove us to incorporate an external dictionary. It should most likely be pre-baked with statistical semantic associations derived from machine learning NLP algorithms and then fed into runtime calculations that combine with the context to narrow down the interpretation most likely to meet users’ expectations.


In simple terms, Wyrd should be given a massive list of Toki Pona (or Toki Sona, later on as it becomes available) statements periodically, perhaps with a monthly update. It should then scan through them, learn the words, and figure out what they likely mean: How frequently is “kon” used as a noun? What verbs and adjectives is it often paired with? What words is it NEVER associated with? What sorts of emotions have been associated with the various term-pairings and which are most frequent? These statistical inputs will assist the system in determining the functional and syntactic role(s) words possess. Combining this data with the actual surrounding words in context will let the application have a keen understanding of how to use them AND grant it the ability to reload this information when necessary.
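The pre-baking step described above largely amounts to counting. A minimal sketch, assuming a tiny hand-tagged corpus stands in for the real periodic batch of Toki Pona statements:

```python
# Sketch of the pre-baking step: scan a (toy) role-tagged corpus and
# count how often each word appears in each syntactic role and which
# words it sits next to. Real input would be thousands of statements.

from collections import Counter, defaultdict

corpus = [  # (word, role) pairs per statement -- illustrative only
    [("mi", "noun"), ("kon", "verb")],
    [("kon", "noun"), ("tawa", "verb")],
    [("kon", "noun"), ("sewi", "adjective")],
]

role_counts = defaultdict(Counter)  # word -> Counter of roles
pair_counts = Counter()             # adjacent word pairings

for statement in corpus:
    for word, role in statement:
        role_counts[word][role] += 1
    for (w1, _), (w2, _) in zip(statement, statement[1:]):
        pair_counts[(w1, w2)] += 1

# "How frequently is 'kon' used as a noun?"
total = sum(role_counts["kon"].values())
p_noun = role_counts["kon"]["noun"] / total  # 2 of 3 uses in this toy corpus
```

These tables are exactly the kind of data that could be shipped as the external dictionary and regenerated whenever the corpus is refreshed, rather than baked into application code.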


Wyrd applications should also keep track of all Toki Sona input (if the user has volunteered it) so that they can be used as new machine learning test material. If people start using a word in a new way, and that trend develops, then the engine should respond by learning to adapt to that new usage and incorporate it into characters’ speech and applications’ descriptions. To do this, the centralized library of core terms must be updated by scanning through more recent Toki Sona literature. Ideally, we would pull this from update-electing users, generate new word data, and then broadcast this update to those same Wyrd users.


Well, we’ve explored some of the more in-depth programming difficulties involved in using Toki Sona. There’ll likely be more updates in the future, but for now, this has all just been a brainstorming and analysis activity. I apologize to those of you who aren’t as tech-savvy (I tried to make things a little simpler outside of the pseudocode). From here on out, it’s likely we’ll end up dealing with things that are a bit more technical than the previous fare, but there will also be plenty of high-level discussion, so worry not!

For next time, I’ll be diving into the particulars of Agents, Characters, and the StoryMind: the fundamental tools for manipulating and understanding narrative concepts!

Next Article: Dramatica and Narrative AI
Previous Article: Interface and Gameplay Possibilities

Minecraftian Narrative: Part 3


Table of Contents:

  1. What is “Minecraftian Narrative”?
  2. Is “Toki Pona” Suitable for Narrative Scripting?
  3. Interface and Gameplay Possibilities
  4. Toki Sona Implementation Quandaries
  5. Dramatica and Narrative AI
  6. Relationship and Perception Modeling
  7. Evolution of Toki Sona to “tokawaje”


The last time, we discussed the concept of a narrative scripting language that could revolutionize the way players interact with a game world. We considered the possibility of using Toki Pona, an artificial 120-word language with 10 syntax rules, as a starting point for creating a custom language to be used for scripting purposes. In this article, we’ll be focusing a bit more on the ways in which the language might be used and what form its interface might take.

To begin with, I would like to clarify both the licensing plans I have for this concept as well as what terms are involved:

  • Toki Pona: (“The Simple Language”) The original artificial language. This is already free to be used for any purpose.
  • Toki Sona: (“The Story Language”) The modified language that I will be deriving from Toki Pona (Toki Pona’s word for knowledge/story is “sona”). This too will be free to use for any purpose (as it should be). Free C++, C#, and JavaScript APIs would probably be made available for engine/application integration.
  • Wyrd: a paid-for plugin for various popular engines that will include an AI system for interpreting and responding to Toki Sona dynamically for spontaneous dialogue, game events, and character behavior.

Now that we’ve defined things, I’ll be exploring how the heck Wyrd might show up in a game!

Interface Possibilities: Suggested GUI Input

Imagine you’re playing a game and you are given the chance to say something to another character. Rather than being given a succinct list of possible responses, you could simply be given a Minecraft-crafting style of word-composition system.

  1. A character is encountered in the world.
  2. An input field and a word bank are made available to the user (players could summon it at will).
  3. Players can click on an image from the word bank. Think of this as picking a “block” in Minecraft.
  4. As players select statements, the system visually hints at how things are being interpreted, for example: {Myself} {Want} {Go}. These are the things that are “crafted” from putting terms together.
  5. This hinting informs players of what concepts are ACTUALLY being communicated. For example: tomo, a.k.a. {enclosed space} => “home” or “house”.
  6. When players need to combine concepts, for example {enclosed space} and {water}, it can show them that the pair is understood as “bathroom”.
  7. Players could then check what other interpretations are available for that combination (perhaps by clicking on it).
  8. If they wished to communicate the desire to bathe/shower instead, they could select that option.

Obviously, it would be the responsibility of the Toki Sona engine to ensure there is a standardized image available for all of the desirable concepts, but limits will be necessary.

Another possibility that may be more realistic is to generate the hinted images based on the full content of the statements made. For example, the {enclosed space} {water} combo may be assumed to be “bathroom”, but when the player follows up that statement with {myself} {feel} {dirty} (mi pilin jaki), the bathroom image might be revised after-the-fact to one of bathing. In this way, users wouldn’t be responsible for the interpretation (it can all be automated), which keeps the scope of mapping images to concepts from creeping out of control. It also means the player has to do less in order to fully interact with the system. Users would also be able to see how their statements impact the interpretation of previous statements.
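This retroactive hinting could be modeled as a compound table with per-context refinements. The table below is an invented stand-in; only the tomo/telo/jaki example itself comes from the text.

```python
# Sketch of retroactive image hinting: a compound gets a default
# reading, and a later statement can shift that reading. The tables are
# hypothetical stand-ins for whatever mapping Wyrd would actually ship.

COMPOUND_HINTS = {
    frozenset({"tomo", "telo"}): {          # "enclosed space" + "water"
        "default": "bathroom",
        "refinements": {"jaki": "bathing"}, # "dirty" shifts the reading
    }
}

def hint(selected, later_words=()):
    """Return the hinted concept for a set of selected words, revised
    by any follow-up statement words that refine the compound."""
    entry = COMPOUND_HINTS.get(frozenset(selected))
    if entry is None:
        return None
    for w in later_words:
        if w in entry["refinements"]:
            return entry["refinements"][w]
    return entry["default"]

hint({"tomo", "telo"})                                   # -> "bathroom"
hint({"tomo", "telo"}, later_words=["mi", "pilin", "jaki"])  # -> "bathing"
```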

Interface Possibilities: Suggested Text Input

The text concept is very much like that of the GUI input, however all that would be displayed to the user instead is a text field. Typing into that text field would display a filtered subset of the word bank below the typed text. Things typed would be assumed to be Toki Sona words (for example, “tomo”, meaning “enclosed space”, i.e. “home”, “house”, “construction”). Players would be able to hit [Tab] to move down the hint list and hit [Enter] to auto-complete the selected word and have the image and hinted interpretation pop up. This would allow for MUCH more fluid communication once a player has pieced together the actual vocabulary of the language (you’d eventually get to the point where you wouldn’t even want/need the suggested text).
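The filter/Tab/Enter loop might be modeled like this; SuggestionBox and its methods are hypothetical UI plumbing, and the glosses are the rough Toki Pona meanings used above.

```python
# Sketch of the suggested-text input: typing filters the word bank by
# prefix, Tab cycles the hint list, Enter commits the selection along
# with its hinted interpretation.

WORD_BANK = {
    "tomo": "enclosed space",
    "toki": "language / speech",
    "telo": "water",
    "tawa": "to go / towards",
}

class SuggestionBox:
    def __init__(self, bank):
        self.bank = bank
        self.hints = []
        self.index = 0

    def filter(self, typed):
        """Narrow the word bank to words starting with the typed text."""
        self.hints = sorted(w for w in self.bank if w.startswith(typed))
        self.index = 0
        return self.hints

    def tab(self):
        """Move down the hint list, wrapping around at the end."""
        self.index = (self.index + 1) % len(self.hints)
        return self.hints[self.index]

    def enter(self):
        """Auto-complete the selected word and return its hint."""
        word = self.hints[self.index]
        return word, self.bank[word]

box = SuggestionBox(WORD_BANK)
box.filter("to")   # -> ["toki", "tomo"]
box.tab()          # -> "tomo"
box.enter()        # -> ("tomo", "enclosed space")
```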

We would also likely need both input types to have expected grammar displayed, i.e. having a big N underneath the beginning to show you need a noun for a subject. If you have a noun typed, it might suggest a compound noun or a verb, etc. All versions would also auto-edit what you have typed so that it is grammatically correct in Toki Sona (things like, [auto-inserting forgotten grammar particle here], etc.).

Gameplay Possibilities

Creation: One could easily envision a game where the player is capable of supplying WHAT they want to make in narrative terms and then clicking on the environment to place that thing. One could then edit the behavior of anything placed in the scene by selecting it, etc.

Simulation/Visual Novel: Something more like the Sims where the player is given a character and must direct them to do and say things to proceed. What they do and say, and to whom / where they do it may trigger changes in the other characters, the story, the environment, etc. This could naturally progress things.

Puzzle: A game where the player is given a certain number of resources (limited points with which to spend for creation, a limited vocabulary, etc.) and must solve a problem. This would be something more in the vein of Don’t Starve or Scribblenauts.

Generic RPG: A regular RPG game that allows the player to pop-up a custom dialogue window for speaking purposes, but which would otherwise not rely on directing player controls through the Wyrd system.

Roleplay-Simulation: A game that directly attempts to simulate the experience of a live role-playing game. The system acts as a Game/Dungeon Master and various players can connect to the game to participate in the journey together. A top-down grid environment may show up during enemy encounters of some kind, but players would completely interact with and understand the environment based on the text/graphic output of the system. More like an old school text adventure, but hyped up to the next level.


These are just some of the ideas I’ve had for interface and gameplay using the Wyrd system. Obviously this system still needs a lot of work, but I feel there is clearly a niche market that would long for experiences like this. If you have any comments or suggestions please let me know!

I know this article was a little bit shorter / lighter than my usual fare, but I promise I will develop some more detailed content for you next time. Cheers!

Next Article: Toki Sona Implementation Quandaries
Previous Article: Is “Toki Pona” Suitable for Narrative Scripting?

Minecraftian Narrative: Part 2

Table of Contents:

  1. What is “Minecraftian Narrative”?
  2. Is “Toki Pona” Suitable for Narrative Scripting?
  3. Interface and Gameplay Possibilities
  4. Toki Sona Implementation Quandaries
  5. Dramatica and Narrative AI
  6. Relationship and Perception Modeling
  7. Evolution of Toki Sona to “tokawaje”


In the previous article, I explored the necessary elements of a “Minecraftian” game mechanic: one tailored for accessible and steady skill development, one that is equal parts editable and adaptable, visual and simple, granular and tabular.

I then addressed many issues with leveraging common languages to describe abstract concepts in this kind of mechanic. They are frequently hard to master. The Latin-based ones focus more on sounds than they do meanings. Their complexity demands so much processing that computer algorithms are impractical for any imminent use at the scale with which we intend to use them. Relying on an existing language saves learning time, but only for a subset of the intended audience; for others, it is an ostracizing element that comes with the expectation of translating into other existing languages to provide the same privileges to alternative audiences. It would also bias any software made against younger players with underdeveloped language skills.

Because of these considerations, we began to consider the language Toki Pona as a possible tool to adapt for narrative scripting. What are the advantages of this Simple Language? Are there any problems with it? Let’s dive in and find out.


Ideal Narrative Scripting

Let’s first review what exactly we mean by “narrative scripting”. What sorts of tasks are we actually wanting to perform with this language? We’ve already established many of the characteristics we are looking for from our Minecraft analysis, and while Toki Pona meets many of these criteria, we must also consider the actual usage environment of our target language before we can significantly evaluate the utility of Toki Pona.


Before continuing, I would also like to point out that this sort of narrative scripting is entirely distinct from the “narrative scripting” language known as Ink. Scripting languages in general are just languages that are more user-friendly and provide a more intuitive, simple interface for computer tasks that would otherwise be fairly complex. With Ink, the goal is to inform the computer of the relationships between lines of dialogue in branching story lines. In our case, the goal is to inform the computer of the narrative concepts associated with game world objects, actions, and places so that it can 1) interpret meaning based on those associations and 2) trigger events that can be leveraged by AI characters, world controls, and human players/modders to create behavior and change the game world. We want to put this kind of control in the hands of players.

Skyrim-creator Bethesda’s “creation kit” modding tool has its own scripting language as well, to edit objects/events in the game world. It is a bit technical though.

If we truly had a narrative scripting language, then we would be able to craft, with as little vocabulary and syntactic structure as possible, a description of any narrative content. More specifically, we should be able to describe with some measure of accuracy…

  • places’ geography, geometry (both its form and absolute and relative locations), and thematic atmosphere.
  • objects’ nomenclature, physical and functional characteristics, relative purpose, effects, history, ownership, and value.
  • humans’ (and, as a categorical subset, animals’ and human-like creatures’) nomenclature, physical and emotional characteristics, relationships, state of mind, responsibilities, history and scars, beliefs, tastes, hopes and dreams, fears, biases, allegiances, knowledge, awareness, senses and observations, skills and powers.
  • concepts’ and ideologies’ subject domain, relationships, and significant details.

These qualities will allow the user to competently describe an environment and the items, creatures, and people in it. In addition, for accessibility and then functional purposes, we need the language to be useful for the following tasks: 1) scenario descriptions, 2) dialogue, and 3) computer processing. The attributes above cover the first case.

Ideally, dialogue “options” wouldn’t exist, and we would be able to directly input a writing system that the non-player characters would be able to understand without some preset arrangement of scripted responses.

As for dialogue, that means it must also be able to model questions, interjections, quotations, prepositions, nouns, adjectives, and adverbs (common syntactic structures). It must accommodate the linguistic relationships between terms and their relative priority, e.g. is X AND/OR’d with Y, are these words a noun/adjective/adverb, are they subject/object, which adjectives describe the noun more clearly, etc. We must also have some means of singling out identifiers, i.e. terms that refer to things that are not themselves a part of the language (a player’s name, for example).

Finally, we must also ensure that the language’s structural simplicity is reinforced so that its consequent processing is more easily conducted by computer algorithms. This primarily involves restricting the number of syntactic rules and the size of the lexicon, but also includes more subtle nuances like the number of interpreted meanings for a given set of words and the number of compound words.

To maximize the utility of the language itself, the “root” words that are individually present in the language must have the following characteristics:

  • The words must be, as much as possible, “perpenyms” of one another; that is, they must be perpendicular in meaning to their counterparts, neither synonyms – for obvious reasons – nor antonyms, to prevent you from simply saying, “not [the word]” to get another word in the language.
  • The words must have a high number of literal or metaphorical interpretive meanings laced within them to ensure a maximum number of functionally usable words per each learned word. Keep in mind, these interpretations must also be strongly linked by theme so that the words’ definitions will be easy to remember. If possible however, these multiple meanings should be individually interpretive based on context, so that one can assume in any given context one interpreted meaning vs. another somewhat clearly. The more this is supported, the less work the computer will be doing.
    • For example, in Toki Pona, the word “tawa” can mean “to go” as a verb, “mobile/moving/traveling” as an adjective, “to, until, towards, in order to” as a preposition (notice how each of those is generally applied to different preposition objects, which helps pick out which one is being used), or even “journey/transportation/experience”, literally “the going”, as a noun. Each context can be easily identified based on the positioning of the word in a sentence; each runs along a common theme, but each also has a unique connotation that can be interpreted rather well in context.
  • The words must be highly reactive with their fellow terms so that a high number of reasonable compound words can be made. Again, the goal is to maximize the number of unique meanings we can derive from the minimum set of words to learn. This will increase the number of vocabulary terms, but in a less defined way as the minimalist nature of the language will make it favor interpretation of clear-cut meanings anyway. Therefore, alternative constructions for the same compound word concept won’t be too big of a deal.

Toki Pona’s Potential

As previously mentioned, Toki Pona has many desirable characteristics that we seek in developing a narrative scripting language. Limited syntax and vocabulary clears the user-accessible and computationally efficient requirements (3). Toki Pona also does an excellent job of functioning as dialogue since it was designed from the ground up to be used conversationally (2).

The language has an exceptional potential to detail a wide variety of topics, and although it tends to be extremely vague, enough detail can be supplied to elucidate the general meaning of an idea. However, some details about the language restrict its potential, namely the fact that the language’s creator, Sonja Lang, designed it to address the linguistic needs of a hypothetical, simplistic, aboriginal people on an island. As such, the language is not designed from the ground up to maximize functional vocabulary, and in fact caters to the range of topics and activities that such a people would participate in.

The 2014 guide to Toki Pona, occupying an entire word in the language.

For example, the language includes words like “pu” (meaning “the book of Toki Pona”), which is utterly meaningless for our purposes, and “alasa” (meaning “to hunt, to forage”), which fails on perpendicularity since you can easily create the same meaning with phrases like “tawa oko tan moku” (meaning “to go eyeing/looking for food”).

Also, despite how much the language does to accommodate different modes of verbiage (including past, present, future, and progressive variations of each, etc.), it can be troublesome to express some necessary concepts, since “wile” (the word for “want to”) encompasses want to, must/have/need to, and should/ought to, each of which is a highly distinct and significant nuance.

That said, some inventive uses of verbiage have been adopted to compensate for the lacking vocabulary. For example, a person can convey that he or she should do something by using the command imperative on themselves as a statement, e.g. “mi o moku” => “Me, eat” => “I should eat”, distinct from “mi moku” => “I eat.” These intricacies must be learned on top of the vocabulary and syntax rules, though, as they are built on usage conventions and, through their obscurity, inevitably hinder the accessibility of the language.
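The “mi o moku” convention is mechanical enough to sketch: a clause whose subject is followed by “o” gets a “should” modality. The modality labels and the tiny clause shapes handled below are assumptions for illustration only.

```python
# Sketch of decoding the self-imperative convention described above:
# "mi o moku" -> "I should eat", versus plain "mi moku" -> "I eat".
# Modality labels ("should", "plain") are invented for this sketch.

def modality(words):
    """Return (subject, modality, verb) for a tiny two- or three-word
    Toki Pona clause, honouring the 'o' self-imperative convention."""
    if len(words) == 3 and words[1] == "o":
        return (words[0], "should", words[2])
    if len(words) == 2:
        return (words[0], "plain", words[1])
    return None  # anything longer needs the full parser

modality(["mi", "o", "moku"])  # -> ("mi", "should", "moku")
modality(["mi", "moku"])       # -> ("mi", "plain", "moku")
```

A real interpreter would fold this into the syntactic tagging stage rather than special-casing clause lengths, but it shows how a usage convention becomes one explicit rule the computer can apply.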

As such, it would appear that the most effective strategy would be to develop a derivation of Toki Pona, built on the same principles and leveraging much of the same language, but stripping it down to only the elements that are most critical for communication and plugging up gaps in linguistic coverage as much as possible.

Narrative Language: Core Concepts

While we won’t iron out the entirety of a language in one sitting, we can likely get a sense for what sorts of concepts must be included as core elements. If we limit ourselves to include 150 words (and even that is really stretching it, if we want to keep the language as effective as possible), then let’s see what ideas are really needed.

  • Pronouns
  • Parts of the body
  • Spatial reasoning, i.e. directions, orientation, positioning
  • Counting, simple math
  • Colors
  • Temporal reasoning, i.e. time (and tense) references
  • Types of living things (including references to types of people, e.g. male/female)
  • Common behaviors (supports occupations, most helping verbs, basic tasks)
  • Common prepositions
  • Elements of nature (earth, wind, water, fire/heat, light, darkness)
  • Forms of matter
  • Grammar particles (obviously taken directly from Toki Pona more or less)

We must also then include elements that are uniquely necessary to fulfill our needs of describing the nature of people and relationships.

  • Elements of perception (able to describe the “feel”, “connotation”, or “theme” of an experience)
  • Elements of relationships (same for relationships, but also able to describe the expected responsibilities. Need basic words to help illuminate expected responsibilities, e.g. Toki Pona’s “lawa” for “leader”)
  • Elements of personality

Ideally, we would be able to get many of these meanings using the same words, but just applying them to a different object. For example, if we had the word “bitter”, we could use it to describe a perception, the overall nature of a relationship (or perhaps even the feelings experienced by one party in the relationship), or someone’s personality, e.g. they are a bitter and resentful person, etc.


In our analysis of Toki Pona, we covered its many advantages with regard to fulfilling our requirements for a narrative scripting language: namely, it can be used as dialogue and has a high number of meanings per learned vocabulary term, so it can cover a lot of topics with little base learning time involved. However, we have also determined that the language has many flaws owing to its original design purpose: meeting the linguistic needs of an isolated, tribal hunter-gatherer people. As such, there are many unnecessary terms, and some concepts simply cannot be conveyed adequately, if at all.

We can therefore state that the best course of action would be to derive a new language from the structure and vocabulary of Toki Pona, shaving away “the fat” as it were, and editing the language to bolster its linguistic breadth and depth while still staying true to its minimalist nature. To that end, we outlined several topics of vocabulary that would be essential for outlining a narrative scripting language.

In future articles, I’ll begin to address the interface we may see in a narrative scripting editor and identify actual gameplay mechanics we could see with narrative scripting put into practice. Please feel free to leave comments if you have any further insights or criticisms and stay tuned for more!

Next Article: Interface and Gameplay Possibilities
Previous Article: What is “Minecraftian Narrative”?

Minecraftian Narrative: Part 1

Table of Contents:

  1. What is “Minecraftian Narrative”?
  2. Is “Toki Pona” Suitable for Narrative Scripting?
  3. Interface and Gameplay Possibilities
  4. Toki Sona Implementation Quandaries
  5. Dramatica and Narrative AI
  6. Relationship and Perception Modeling
  7. Evolution of Toki Sona to “tokawaje”


Minecraft’s capacity for simple, direct, consumer-level editing of the 3D world has brought about revolutions in game design and has joyously permeated industries across the world: education, design, architecture, and a whole host of other fields have felt its influence. Unfortunately, Minecraft’s revolution stops short of intangible representations; while it is excellent at modeling geometry, and therefore the creation of concrete forms, it is not so great at enabling the same for ideological or abstract creations. That would be a task for some sort of new narrative scripting language.

What if we could take those same properties that made Minecraft so successful, its simplicity, its capacity to empower laymen for creation, and bring it into the realm of narrative development? What if the complex realm of natural language processing could be simplified to an extreme and made interactive such that even children could create characters, worlds, stories, and watch a computer bring them to life, nurture their growth, and develop them in real time? This series aims to suggest possibilities for just such a future and the remaining hurdles that must be dealt with.

Virtualization of Character

Emerging on the horizon is our imminent mastery of the Turing test, devised by Alan Turing. In it, a programmed machine anonymously converses with a series of judges, who vote on whether they are speaking with a human or a machine. The machine passes the test if the large majority of the judges are mistaken, unable to tell the difference between a regular human and the designed machine. A chief example of this test in action is the creation of software applications designed to communicate online via chat rooms and social media.

A community’s tribute to the Chinese chatbot Xiaoice

The development of these “chatbots” has advanced considerably. A teenage-girl persona known as Xiaoice (Shao-ice) garnered over 663,000 live conversations in merely 72 hours. People relied on her for support, companionship, and advice, and she in turn responded dynamically and realistically, yet with her own personality, moving beyond the one-way static communication of traditional media.

Children of the ’90s and later are learning to bond emotionally with virtual characters in unprecedented ways, through things like virtual pets, the evolving Toys-to-Life industry, and Japan’s personified singing-voice programs called “Vocaloids,” complete with massively popular YouTube music videos, canonical relationships, and live concerts in New York with cheering fans.

A few of the Vocaloid performers growing famous around the world.

As the popularity of virtual characters increases and their commercial appeal grows, consumers’ desire to share memories, experiences, and relationships with them will grow as well. Historically, though, crafting these experiences has been restricted to the educated and practiced experts of writing and design responsible for creating such performances.

But what if this need not be the case, for games or any other sort of media? What if characters, and the stories surrounding them were just as editable as the blocks of Minecraft? What if they were “Minecraftian”?

What Does “Minecraftian” Actually Mean?

Minecraft: a video game that revolutionized the way people interact with 3D design and architecture. Fundamentally, it’s a video game that acts as a liberating force to unleash people’s creativity and bring to life the architectural and artistic wonders locked within one’s mind. And the medium of this transformation? Blocks. A limitless, vast world whose most basic element is a square of space that any average player can directly interact with.

A sampling of the blocks players may encounter.

There are a limited number of types of blocks. Some are for grass. Some are for wood. Others represent various types of stone or fluid. Some are static and some are animated. And as one becomes familiar with the basic types of blocks, they can then “mine” blocks for materials to be used in “crafting” new blocks. They can consume them, transform them, fuse them, add them back into the world, and in so doing directly edit every detail of their environment.

Take a second to imagine the power that this game presents. It isn’t very complicated; there simply aren’t too many types of blocks to remember. You never start out dealing with things you can’t handle. You steadily advance your knowledge of blocks as you play, learning to become proficient in your renderings. Your main activities involve mining or placing objects in the world and combining materials in menus. The more you interact with the world, the more you discover how to alter things and the greater your understanding of the relationships between blocks becomes. You soon get to a point where the path to realizing your vision forms in your mind effortlessly because you’ve mastered the simple mechanics ever so quickly. Soon a masterpiece stands before you and the journey there was far easier than you ever could have imagined.

“King’s Landing”, the capital city from the TV series Game of Thrones, built with blocks by a regular Minecraft player.

Some important things to consider about Minecraft’s block-building mechanic:

  1. It’s easy to learn and requires little overall knowledge. There aren’t so many different types of materials that you couldn’t memorize a reasonable set in a few dedicated hours of play.
  2. The basic editable element of the world is highly visual and interactive. The “resolution” of things is drastically reduced, facilitating comprehension and precision with a tangible granularity.
  3. Abstraction and broad concepts are your friends, not fine details. The limited degree of detail permitted ensures that players need not become experts in their creative portrayals, as everyone is on an equal footing, complexity-wise.
  4. The “language” of creation is fully computerized. The medium is just as prone to manipulation by algorithms as it is to the manual fine-tuning of a patient and determined player.
  5. The data is easily hackable, mod-friendly, and adaptable. Everything exists on a simple grid, for which we have already accumulated a vast body of algorithms and knowledge for manipulating such data structures. Tinkerers and entrepreneurs everywhere can easily experiment and devise new ways of interacting with the system.
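Point 5 is worth making concrete: a Minecraft-style world reduces to a grid of small integers, which is exactly why it is so amenable to algorithmic manipulation. A minimal sketch (the block names and IDs here are invented for illustration, not Minecraft’s actual format):

```python
# A Minecraft-like world is just a 3D grid of block IDs: trivially
# storable, diffable, and editable by simple algorithms.
AIR, GRASS, STONE, WATER = 0, 1, 2, 3  # invented IDs for illustration

WIDTH, HEIGHT, DEPTH = 4, 3, 4
world = [[[AIR for _ in range(DEPTH)] for _ in range(HEIGHT)]
         for _ in range(WIDTH)]

# "Placing" blocks is just writing IDs into the grid: here, a grass
# floor at y = 0, exactly as a player (or a script) might build one.
for x in range(WIDTH):
    for z in range(DEPTH):
        world[x][0][z] = GRASS

# "Mining" a block is just writing AIR back into the grid.
world[2][0][2] = AIR

# Any standard array algorithm applies directly, e.g. counting blocks.
grass_count = sum(
    1
    for x in range(WIDTH)
    for y in range(HEIGHT)
    for z in range(DEPTH)
    if world[x][y][z] == GRASS
)
print(grass_count)  # 15: the 4x4 grass floor minus the one mined block
```

Because the entire world is this one uniform structure, modders and tool authors can read, transform, and regenerate it without any special knowledge of the game itself.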

The Difficulties of Language

So how can we take the mechanics of Minecraft and the transformations it made to 3D design and bring them to narrative, character creation, and world-building? First step: identify our most fundamental element, our “block.” Perhaps words? After all, words – and the concepts associated with them – are what make up thoughts and ideas, right? We should just have everyone who plays our game learn the words that we use in our game/tool. Well, that certainly appears to be the solution, but unfortunately it isn’t quite so simple.

If you’ve ever tried to learn a second language, then you know it can be an arduous task, especially the further you stray from your native tongue. Taxing hurdles pile up one after another: unfamiliar grammar and syntax, vast amounts of vocabulary, and more. It can take years before one is competent enough to use a new language with any sort of astute precision. Compounding the issue is the practical question of whether the language can be comprehended, interpreted, and responded to by a computer system in real time, from multiple sources.


If our goal is to use a language that all of our players can be expected to engage with, then relying on any language as complex as English is still a disservice to the players of other cultures. It also loses much of its potential to appeal to younger audiences whose language skills may still be developing. What’s more, if we wish to make it as simple as possible for an average player to edit language contents to communicate with narrative tools and mechanics in-game, English and other common languages are too convoluted.

What we really need, if we intend to apply a Minecraftian design to our linguistic mechanics, is a revolutionary scripting language: a limited vocabulary that’s easy to learn; a simple, granular syntax where you can easily pick out the parts of a sentence and the meaning therein; and a grammar a computer could understand in its full breadth, at speeds efficient enough for vast amounts of real-time processing. It wouldn’t have to be terribly detailed; just enough information to get a vague idea of what was meant. Ideally, the language would be highly visual and easily editable, to appeal to children on the consumer end and hackers on the entrepreneur end.

In comes Toki Pona.

The entire Toki Pona language, minus compound words.

Toki Pona is an artificially designed language with a total of 120 basic words and particles and a syntax of a measly 10 rules. It can be learned to fluency in a matter of weeks. Words can be clearly depicted as hieroglyph-like symbols that visually indicate their meaning. The visual, minimalist approach of the language makes it highly accessible, adaptable, computable, and universal, and overall plausible as a candidate for narrative editing, though further examination will be necessary to truly determine its utility for such a task.

If we simply 1) teach a computer to understand a Toki Pona-inspired language, 2) make interactions with such a system simple, intuitive, and visual, and 3) properly introduce the use of this visual language to players, we can devise gameplay mechanics that allow players to easily interact with and/or (re)define the narrative details of their world. In the next article, I’ll go into detail on how such a system might work and how it could be used for revolutionizing game design and narrative on a fundamental scale.
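As a sense of how step 1 might begin, Toki Pona’s particle-driven grammar makes even a naive parser viable: the particle “li” separates subject from predicate, except after the bare pronouns “mi” and “sina,” where it is dropped. A hypothetical sketch covering just that one rule (a tiny illustrative subset, nowhere near a full grammar):

```python
# Toy subject/predicate splitter for a Toki Pona-like grammar.
# Simplified rule: the particle "li" marks the start of the predicate,
# except after the bare pronouns "mi" / "sina", where "li" is dropped.
def split_clause(sentence: str) -> tuple[list[str], list[str]]:
    words = sentence.lower().split()
    if words and words[0] in ("mi", "sina"):
        return [words[0]], words[1:]
    if "li" in words:
        i = words.index("li")
        return words[:i], words[i + 1:]
    raise ValueError("no predicate marker found")

print(split_clause("mi moku"))           # (['mi'], ['moku'])
print(split_clause("jan pona li moku"))  # (['jan', 'pona'], ['moku'])
```

With only 120 words and 10 rules, the whole grammar stays within reach of this kind of direct, exhaustive handling, which is precisely what makes real-time processing from many simultaneous players plausible.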

Please let me know what you think in the comments. I’m eager to hear people’s thoughts on the topic!

Next Article: Is “Toki Pona” Suitable for Narrative Scripting?