Emergent Structure


The documentation system tenet of emergent structure states that the categorization and organization of documentation should be capable of continuous overhaul. This tenet is the most fundamental in this text because it supports all four DocOps principles:

  • It supports the principle of generative content because newly extracted properties may require new categories, or provide categorical information in and of themselves. 
  • It supports the principle of shared responsibility because co-creators need the ability to evolve categories organically based on the emerging body of documentation.
  • It supports the principle of truth proximity because the emerging body of documentation may start drifting away from the categories that were defined in the past.
  • It supports the principle of low cognitive load because fitting content into categories that may be incorrect, ambiguous, too fine-grained, or too coarse-grained produces cognitive stress.
The tenet of emergent structure observes that as documentation evolves, associated categorization schemes must evolve as well.

Bush (1945) noted that, “A record if it is to be useful to science, must be continuously extended”. Nelson (1987) also observed that, 

“I mention my mistrust of categories and hierarchies, not for its metaphysical value (if any), but because it provides a fine orientation for building information systems. Because if you are not falsely expecting a permanent system of categories or a permanent stable hierarchy, you realize your information system must then deal with an ever-changing flux of new categories, hierarchies and other arrangements which all have to coexist, it must be a tolerant system which allows them to cohabit comfortably, helps track their variations and disparities, and is forever ready to accommodate new arrangements on top of those already present.” [emphasis added]

Likewise, Berners-Lee (1999), when at CERN, was also particularly suspicious about the value of documentation systems based on the premise of shoehorning information into preconceived categories:

“What I was looking for fell under the general category of documentation systems—software that allows documents to be stored and later retrieved. This was a dubious arena, however. I had seen numerous developers arrive at CERN to tout systems that ‘helped’ people organize information. They’d say, ‘To use this system all you have to do is divide all your documents into four categories’ or ‘You just have to save your data as a WordWonderful document flames’ or whatever.”

Berners-Lee concluded that “A computer typically keeps information in rigid hierarchies and matrices, whereas the human mind has the special ability to link random bits of data.”

It is counterproductive to define an inflexible classification system in advance—a label system, a taxonomy, a prescribed table of contents, etc.—because not all of the relevant users may be involved in the process simultaneously, and those who are involved may not have all the facts in hand. As Rosenfeld et al. (2015) noted, “organization systems are intensely affected by their creators’ perspectives” and “this challenge is complicated by the fact that most information environments are designed for multiple users, and all users will have different ways of understanding the information.”

Moreover, the users in charge of defining the first categories may not have witnessed sufficient content instances that prove the fitness of their chosen preconceived classification scheme. 

The problems that may arise when using a prescriptive classification scheme are manifold:

  1. Categories may easily become unbalanced; some may contain hundreds of content instances whereas others may contain just a few.
  2. Content that is frequently consulted may be trapped underneath a deeply nested structure, resulting in a classification hierarchy which is ‘taxonomically sound’ but broken from an information seeking workflow perspective.
  3. A priori classification schemes are typically unidimensional; they usually organize content by one criterion only (e.g., business domain), making life difficult for users who follow alternative information seeking workflows or who hold mental models other than those defined by the original authors.

Embracing the tenet of emergent structure is ultimately about humility. We simply can’t anticipate the shape of our continuously evolving documentation base. Pinning down the structure prematurely is therefore like taking a wooden box and cutting out square holes on its side, in the expectation that we will only get square pegs.

Choo (2002) observed that “people in organizations are not content with structured transactional data, they also want information technology to simplify the use of the informal, unstructured information”, and that “users want a seamless web of formal and informal data, and internal and external data, represented in structures and models that are meaningful to them for cultivating insight and developing choices.”

In Haeckel & Nolan’s (1993) three-prong Corporate IQ model, structuring, which refers to the ability to extract meaning from data, is essential. In their study, they noted that “when information from previously unrelated sources is structured in a meaningful way, human beings become capable of thinking thoughts that were previously unthinkable.” [emphasis added]

For the aforementioned reasons, this tenet demands the ability to refine and refactor our documentation base’s categorization scheme continuously, which in turn hinges on the tenets of uniform addressability and flat namespace. 

Document views may be classified in a number of ways. The most common classification schemes are labels, taxonomies, and properties:

  • Labels: they associate a document view with multiple tags.
  • Taxonomies: they allow placing a document view in a hierarchy.
  • Properties: they allow defining key/value pairs.

Let’s now look at each of them in detail.

Labels

Given that many DocOps automation workflows rely on labels to select and filter content, we should start with them.

Labels, also called tags, or hashtags, help place a content instance in multiple categories. Labels are useful to find related content—from a given content instance’s perspective—and to implement selections and filters when embedding content. 

If we take the saying “the hardest thing in software engineering is naming things and cache invalidation” to heart, it is not far-fetched to conclude that one of the hardest things in documentation systems is labeling content. We usually struggle to come up with sensible label names, to avoid labels that are too general or too specific, and to select which labels to apply.

Let us start with the first problem: what to call labels. We will most certainly pick a name that we will eventually regret. The documentation platform should allow the continuous renaming of labels without breaking links along the way. Likewise, document views should create an implicit label associated with them. For example, the label #customer not only allows discovering other content instances that carry the same label, but also leads to the definition of what a customer is. In other words, the creation of docs.allscuba.co.uk/Customer implies the presence of #customer and vice versa.
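One way to make renaming safe is to have document views reference a stable label identifier while every name ever used remains resolvable. The sketch below is only an illustration of that idea; the class `LabelRegistry` and its methods are hypothetical names, not part of any particular documentation platform.

```python
class LabelRegistry:
    """Hypothetical label store: names may change, label IDs never do."""

    def __init__(self):
        self._names = {}    # label id -> current display name
        self._aliases = {}  # every name ever used -> label id
        self._next_id = 1

    def create(self, name):
        label_id = self._next_id
        self._next_id += 1
        self._names[label_id] = name
        self._aliases[name] = label_id
        return label_id

    def rename(self, old_name, new_name):
        label_id = self._aliases[old_name]
        self._names[label_id] = new_name
        # The old name is kept as an alias, so existing links keep resolving.
        self._aliases[new_name] = label_id

    def resolve(self, name):
        """Return the current display name for any past or present name."""
        return self._names[self._aliases[name]]


registry = LabelRegistry()
registry.create("customer")
registry.rename("customer", "client")
current = registry.resolve("customer")  # the old link still resolves
```

Because the old name stays registered as an alias, a link that was written against #customer continues to work after the rename.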

Second, we have the problem of granularity. If labels are too fine-grained, we end up having to add quite a few of them to every content instance. If they are too coarse-grained, instead, we need fewer labels per document view, but the labels become less helpful when finding related content. For example, labels such as #business and #technical are perhaps too general, while labels such as #java and #python could be too specific.

In this example, a web-based editor allows managing labels in bulk as opposed to forcing users to perform such operations at the document level

The point is that we will never get the granularity right upfront. Whatever choices we make today are unlikely to be appropriate tomorrow. Therefore, what we need is the ability to reorganize our labels quickly and cheaply. In practical terms, this means the ability to perform bulk merge and split operations without having to edit each document view one by one—and without breaking links.

In a bulk merge operation we take two labels, say, #java and #python, and combine them into one label, for example, #code. In a bulk split operation, instead, we take one label, say, #business, and split it into two labels, for example, #b2c and #b2b. After a split, some content instances may apply to both new labels and keep both, whereas others apply to only one of them, in which case we need to drop either #b2c or #b2b from those instances.
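The merge and split operations described above can be sketched as follows. This is a minimal illustration that assumes the documentation base is a dictionary mapping document names to sets of labels; the function names and the sample documents are hypothetical.

```python
def merge_labels(docs, sources, target):
    """Replace every label in `sources` with `target` across all documents."""
    sources = set(sources)
    for labels in docs.values():
        if labels & sources:
            labels -= sources   # in-place: removes the old labels
            labels.add(target)


def split_label(docs, source, targets, choose=None):
    """Replace `source` with `targets`.

    `choose(doc_name)` may return a narrower set of labels for documents
    that apply to only one of the new labels.
    """
    for doc_name, labels in docs.items():
        if source in labels:
            labels.discard(source)
            labels.update(choose(doc_name) if choose else targets)


docs = {
    "pricing": {"business"},
    "wholesale": {"business"},
    "jvm-tuning": {"java"},
    "scripts": {"python", "java"},
}

merge_labels(docs, ["java", "python"], "code")
split_label(
    docs, "business", {"b2c", "b2b"},
    choose=lambda name: {"b2b"} if name == "wholesale" else {"b2c", "b2b"},
)
```

Neither operation touches document names or addresses, which is why links are unaffected.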

Needless to say, the adding and removing of labels should also work in a bulk fashion. 

Tags should not be considered only an author-centric mechanism. Rosenfeld et al. (2015) mention the notion of free tagging, in which users define and tag content for which they aren’t necessarily the authors. While this system may be seen as a form of crowdsourcing the task of organizing content, it may also prove to be a more effective means of organizing the content so that it best aligns with the mental model of the majority of users.

Taxonomies

A taxonomy, despite its academic-sounding name, is nothing more than a hierarchy, or a tree-like structure. Whereas labels are flat, taxonomies define a parent-child relationship between the categories defined in them. Given how central taxonomies are in documentation systems, we expand on this topic when discussing the tenet of floating taxonomies.

From an emergent structure perspective, similarly to labels, we need the capability to evolve taxonomies continuously:

  • Creating and deleting taxonomies
  • Merging two or more taxonomies into one taxonomy
  • Splitting a taxonomy into two separate taxonomies
  • Renaming a taxonomy or a category within a taxonomy
  • Creating and deleting categories within a taxonomy
  • Reordering categories within a taxonomy
  • Changing parent-child relationships within a taxonomy

Performing any of these actions should not only be easy and quick, but also feasible on a bulk basis without editing one document view at a time and, naturally, without breaking links as a result of these activities. This process is much easier if taxonomies piggyback on the label system, which is what I suggest when I introduce the notion of floating taxonomies further on.
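To illustrate why a taxonomy that sits on top of labels is cheap to restructure, the sketch below stores the hierarchy as a separate child-to-parent mapping over label names. It is an assumption made for illustration only and not the floating taxonomies design itself; the function names and categories are hypothetical.

```python
def path_to_root(parents, category):
    """Return the chain of categories from the root down to `category`."""
    path = [category]
    while parents.get(category) is not None:
        category = parents[category]
        path.append(category)
    return list(reversed(path))


def reparent(parents, category, new_parent):
    """Move `category` (and implicitly its subtree) under `new_parent`."""
    # Refuse moves that would make a category its own ancestor.
    if category in path_to_root(parents, new_parent):
        raise ValueError(f"{category!r} cannot be placed under its own subtree")
    parents[category] = new_parent


# Hypothetical taxonomy: each label name maps to its parent (None = root).
parents = {
    "diving": None,
    "equipment": "diving",
    "regulators": "equipment",
    "courses": None,
}

reparent(parents, "regulators", "diving")
new_path = path_to_root(parents, "regulators")
```

Changing a parent-child relationship updates a single mapping entry; the document views, which only carry labels, are not edited at all.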

Properties

Document views may be classified not only using labels and taxonomies, but also properties. Some properties are typically intrinsic such as the document’s title, the author’s name, the date of publication, and so on, while others are explicit. Explicit properties are user-defined.

Explicit properties are usually implemented using the labeling mechanism, whereby the labels become the keys in key/value pairs. Furthermore, properties may be associated with a schema, for example, an enumeration of possible values. As far as the tenet of emergent structure is concerned, we need the same degree of flexibility for both intrinsic and explicit properties.

Given that a property key is effectively a label, we require the same flexibility we have already discussed in terms of continuous splitting, merging, renaming, and so on. Properties, however, also have values, and bulk operations on values require further capabilities:

  • Filtering document views before editing to avoid iterative editing
  • Updating multiple values side by side (normally using a tabular view)
  • Performing bulk operations relevant to the value type:
    • Regular expressions on strings
    • Arithmetic on numeric values
    • Referential operations (e.g., using the values from other labels)

In short, the properties applicable to collections of document views should offer a user experience similar to that of a spreadsheet.
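A spreadsheet-like bulk update over properties could look roughly like the sketch below. It assumes each document view exposes its properties as a dictionary; `bulk_update`, the property keys, and the sample values are hypothetical.

```python
import re


def bulk_update(docs, key, transform, where=lambda props: True):
    """Apply `transform(value, props)` to `key` on every document matching `where`.

    Returns the number of documents that were updated.
    """
    changed = 0
    for props in docs:
        if key in props and where(props):
            props[key] = transform(props[key], props)
            changed += 1
    return changed


docs = [
    {"title": "Regulator Guide", "version": "v1.2", "revision": 3, "author": "ana", "owner": ""},
    {"title": "Course Catalog", "version": "v2.0", "revision": 7, "author": "raj", "owner": ""},
]

# Regular expression on strings: strip the leading "v" from every version.
bulk_update(docs, "version", lambda value, props: re.sub(r"^v", "", value))

# Arithmetic on numeric values, filtered first to a subset of documents.
bulk_update(
    docs, "revision", lambda value, props: value + 1,
    where=lambda props: props["title"].startswith("Regulator"),
)

# Referential operation: copy the value of another property.
bulk_update(docs, "owner", lambda value, props: props["author"])
```

The `where` filter plays the role of selecting rows before editing, and each `transform` corresponds to one of the value-type operations listed above.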

Headings

Headings may not be intuitively seen as a classification scheme but they play a significant structural role in modern documentation systems. 

First of all, a document view’s table of contents is in most cases derived from headings: they are normally displayed like a taxonomy and, depending on the documentation platform of choice and its configuration, such a taxonomy may also be attached to the navigation system in the form of an interactive outline.

Second, headings act like labels in the sense that they tag compartments of a document so that said compartments can quickly be identified across different documents. For example, whenever two or more document views present headings called ‘Introduction’, ‘Business Context’, or ‘Closing’, users come to expect the same kind of content under such headings.

Third, headings play a role in composite document views because smaller document parts typically have ‘child’ headings, say, level 2, under the expectation that they will be inserted under a ‘parent’ heading, say, level 1. 

Last, headings can also be used to demarcate content blocks and to structure information in DocOps automation workflows. For example, in the figure below, the heading ‘Info-Box:Personas’ can be used to generate a floating table with information about personas rather than being displayed as a regular heading.

Headings can be used to structure data in a document. In this example, headings with the prefix ‘Info-Box:’ are processed to be rendered as a sidebar
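The ‘Info-Box:’ convention could be processed roughly as in the sketch below. It assumes, purely for illustration, that documents are plain text with Markdown-style `#` headings and that an info box ends at the next heading of any level; the function name and sample text are hypothetical.

```python
PREFIX = "Info-Box:"


def extract_info_boxes(text):
    """Split a document into its regular body and its 'Info-Box:' blocks.

    Returns (body_text, boxes), where boxes maps a box name to its lines.
    """
    body, boxes, current = [], {}, None
    for line in text.splitlines():
        if line.startswith("#"):
            title = line.lstrip("#").strip()
            if title.startswith(PREFIX):
                # Start collecting a sidebar instead of emitting a heading.
                current = title[len(PREFIX):].strip()
                boxes[current] = []
                continue
            current = None  # a regular heading closes any open info box
        (boxes[current] if current is not None else body).append(line)
    return "\n".join(body), boxes


sample = "\n".join([
    "# Overview",
    "Intro text.",
    "## Info-Box:Personas",
    "Diver",
    "Instructor",
    "# Next",
    "More.",
])

body, boxes = extract_info_boxes(sample)
```

A rendering step could then place the contents of `boxes` in a sidebar while the remaining body is displayed as usual.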

In short, given that headings are universal and exist in all document formats, they can be relied upon to create taxonomies, labels, metadata, compositional aids, and much more.

Given that headings are the most coarse-grained building block in a document, they should be treated as fungible, easy-to-manipulate objects; we require ample flexibility in terms of selection, cross-document view bulk operations, and in-document operations.

Selecting a heading, before we can move it, delete it, and so on, is a little more nuanced than it initially appears. A heading may be followed by non-heading content, by a child heading (e.g., a heading level 2 after a heading level 1), or by a combination of both. Any heading-wise operation thus requires the following selection capabilities:

  • Selecting discrete headings (e.g., ‘Introduction’, but not ‘Business Context’)
  • Selecting heading ranges (e.g., all headings between ‘Business Context’, and ‘Appendix’)
  • For a given heading:
    • Choosing whether to include the heading title itself 
    • Choosing whether to include the body of text following the heading
    • Choosing whether to include the child headings
    • Depth limit (e.g., do not include headings below level 5)
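The selection options listed above can be sketched as parameters of a single function. The example assumes a document has already been parsed into a flat, ordered list of sections; `Section`, `select`, and the parameter names are hypothetical, and the depth limit is expressed as the deepest heading level to include.

```python
from typing import NamedTuple


class Section(NamedTuple):
    level: int  # heading level, 1 = top level
    title: str
    body: str   # text that directly follows the heading


def select(sections, title, include_title=True, include_body=True,
           include_children=True, max_level=6):
    """Select one heading and, optionally, its body and child headings."""
    start = next(i for i, s in enumerate(sections) if s.title == title)
    top = sections[start]
    picked = [top]
    if include_children:
        for s in sections[start + 1:]:
            if s.level <= top.level:
                break  # a sibling or ancestor heading ends the subtree
            if s.level <= max_level:
                picked.append(s)
    first = Section(top.level,
                    top.title if include_title else "",
                    top.body if include_body else "")
    return [first] + picked[1:]


doc = [
    Section(1, "Introduction", "Hello"),
    Section(2, "Scope", "What is covered"),
    Section(3, "Detail", "Fine print"),
    Section(1, "Business Context", "Why it matters"),
]

shallow = select(doc, "Introduction", max_level=2)
title_only = select(doc, "Introduction", include_body=False, include_children=False)
```

Selecting heading ranges would follow the same pattern, with a start and an end title instead of a single one.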

For the selected headings, we then need the ability to reorder them and to promote and demote them (i.e., decrease and increase the heading level) without requiring tedious copy-paste workflows.
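Reordering, promoting, and demoting become simple list operations once a heading is treated together with its child headings. The sketch below assumes an outline represented as a list of `(level, title)` tuples; the function names are hypothetical, and clamping at level 1 is a simplification.

```python
def subtree_end(headings, start):
    """Index just past the heading at `start` and all of its children."""
    level = headings[start][0]
    end = start + 1
    while end < len(headings) and headings[end][0] > level:
        end += 1
    return end


def shift(headings, start, delta):
    """Demote (delta > 0) or promote (delta < 0) a heading with its children."""
    end = subtree_end(headings, start)
    moved = [(max(1, level + delta), title) for level, title in headings[start:end]]
    return headings[:start] + moved + headings[end:]


def move(headings, start, dest):
    """Move a heading with its children to index `dest` of the remaining outline."""
    end = subtree_end(headings, start)
    block = headings[start:end]
    rest = headings[:start] + headings[end:]
    return rest[:dest] + block + rest[dest:]


outline = [
    (1, "Introduction"),
    (1, "Business Context"),
    (2, "Market"),
    (1, "Closing"),
]

demoted = shift(outline, 1, 1)   # 'Business Context' and 'Market' go one level down
reordered = move(outline, 1, 0)  # the same block moves to the top of the outline
```

Both functions return a new outline and leave the original list untouched.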

What is key, though, is the ability to perform cross-document operations, for example, selecting headings from one document and moving them (or copying them) to another document. As headings often act as demarcated components, it is also important to treat them as ‘columns’ so that they can be compared and edited on a side-by-side basis. Suppose that the established convention is the inclusion of a heading called ‘Summary’ in every document. In this case, it would be unacceptable to force users to edit this text by tediously opening every document one by one.
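Treating a heading as a ‘column’ can be sketched as follows, assuming each document is available as a mapping from heading title to the text under it. The function names and sample documents are hypothetical.

```python
def heading_column(docs, heading):
    """Collect the text under `heading` from every document, side by side."""
    return {name: sections.get(heading, "") for name, sections in docs.items()}


def apply_column(docs, heading, edits):
    """Write edited texts back without opening each document individually."""
    for name, text in edits.items():
        docs[name][heading] = text


docs = {
    "regulators": {"Summary": "Old summary", "Details": "..."},
    "courses": {"Details": "..."},
}

column = heading_column(docs, "Summary")  # reveals that 'courses' has no summary
apply_column(docs, "Summary", {"courses": "Overview of available courses"})
```

The column view makes missing or inconsistent ‘Summary’ sections visible at a glance, and the edits are written back in one pass.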

Other Categorization Schemes

Labels, taxonomies, properties, and headings are the most hands-on (i.e., labor-intensive) mechanisms to organize documentation. Others, like alphabetical indexes and sitemaps, can be generated automatically; in this case, they abide by the tenet of emergent structure without any effort.
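As a small illustration of why generated schemes need no maintenance, an alphabetical index can be derived from the current document titles on demand. The sketch assumes non-empty titles; the function name and sample titles are hypothetical.

```python
from collections import defaultdict


def alphabetical_index(titles):
    """Group titles under their first letter, sorted case-insensitively."""
    index = defaultdict(list)
    for title in sorted(titles, key=str.casefold):
        index[title[0].upper()].append(title)
    return dict(index)


index = alphabetical_index(["wetsuits", "Customer", "courses", "Regulators"])
```

Because the index is recomputed from the titles, it follows the documentation base as it changes.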

We still need to be mindful of other manual classification systems such as keyword lists, curated related content links, and so on, that may need the same kind of flexibility and automation applicable to labels and taxonomies discussed here.


© 2022-2024 Ernesto Garbarino | Contact me at ernesto@garba.org