The documentation system tenet of emergent structure states that the categorization and organization of documentation should be capable of continuous overhaul. This tenet is the most fundamental in this text because it supports all four DocOps principles:
Bush (1945) noted that "A record if it is to be useful to science, must be continuously extended". Nelson (1987) also observed that,
"I mention my mistrust of categories and hierarchies, not for its metaphysical value (if any), but because it provides a fine orientation for building information systems. Because if you are not falsely expecting a permanent system of categories or a permanent stable hierarchy, you realize your information system must then deal with an ever-changing flux of new categories, hierarchies and other arrangements which all have to coexist, it must be a tolerant system which allows them to cohabit comfortably, helps track their variations and disparities, and is forever ready to accommodate new arrangements on top of those already present." [emphasis added]
Likewise, Berners-Lee (1999), when at CERN, was also particularly suspicious about the value of documentation systems based on the premise of shoehorning information into preconceived categories:
"What I was looking for fell under the general category of documentation systems - software that allows documents to be stored and later retrieved. This was a dubious arena, however. I had seen numerous developers arrive at CERN to tout systems that 'helped' people organize information. They'd say, 'To use this system all you have to do is divide all your documents into four categories' or 'You just have to save your data as a WordWonderful document flames' or whatever."
Berners-Lee concluded that "A computer typically keeps information in rigid hierarchies and matrices, whereas the human mind has the special ability to link random bits of data."
It is counterproductive to define an inflexible classification system in advance (a label system, a taxonomy, a prescribed table of contents, etc.) because not all of the relevant users may be involved in the process simultaneously, and those who are involved may not have all the facts in hand. As Rosenfeld et al. (2015) noted, "organization systems are intensely affected by their creators' perspectives" and "this challenge is complicated by the fact that most information environments are designed for multiple users, and all users will have different ways of understanding the information."
Moreover, the users in charge of defining the first categories may not have seen enough content instances to prove the fitness of their preconceived classification scheme.
Several problems may arise when using a prescriptive classification scheme:
Embracing the tenet of emergent structure is ultimately about humility. We simply can't anticipate the shape of our continuously evolving documentation base, so pinning down the structure prematurely is like taking a wooden box and cutting square holes in its side in the expectation that we will only ever get square pegs.
Choo (2002) observed that "people in organizations are not content with structured transactional data, they also want information technology to simplify the use of the informal, unstructured information", and that "users want a seamless web of formal and informal data, and internal and external data, represented in structures and models that are meaningful to them for cultivating insight and developing choices."
In Haeckel & Nolan's (1993) three-prong Corporate IQ model, structuring, which refers to the ability to extract meaning from data, is essential. In their study, they noted that "when information from previously unrelated sources is structured in a meaningful way, human beings become capable of thinking thoughts that were previously unthinkable" (emphasis added).
For the aforementioned reasons, this tenet demands the ability to refine and refactor our documentation base's categorization scheme continuously, which in turn hinges on the tenets of uniform addressability and flat namespace.
Document views may be classified in a number of ways. The most common classification schemes are labels, taxonomies, and properties:
Let's now look at each of them in detail.
Given that many DocOps automation workflows rely on labels to select and filter content, we should start with them.
Labels, also called tags or hashtags, help place a content instance in multiple categories. Labels are useful for finding related content from a given content instance's perspective, and for implementing selections and filters when embedding content.
If we take the saying "the hardest thing in software engineering is naming things and cache invalidation" to heart, it is not far-fetched to conclude that one of the hardest things in documentation systems is labeling content. We usually struggle to come up with sensible label names, to avoid labels that are too general or too specific, and to decide which labels to apply.
Let us start with the first problem: what to call labels. We will most certainly pick a name that we will eventually regret. The documentation platform should allow the continuous renaming of labels without breaking links along the way. Likewise, documentation views should create an implicit label associated with them. For example, the label #customer not only allows discovering other content instances that carry the same label, but also leads to the definition of what a customer is. In other words, the creation of docs.allscuba.co.uk/Customer implies the presence of #customer and vice versa.
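One way to picture renaming without breaking links is to let document views refer to a stable label identifier rather than to the label's name. The following is only a minimal sketch under that assumption; the `LabelRegistry` class and its methods are hypothetical names introduced for illustration and do not describe any particular documentation platform.

```python
class LabelRegistry:
    """Hypothetical registry: document views store label ids, never names."""

    def __init__(self):
        self._names = {}    # label id -> current display name
        self._aliases = {}  # every name ever used -> label id
        self._next_id = 1

    def create(self, name):
        label_id = self._next_id
        self._next_id += 1
        self._names[label_id] = name
        self._aliases[name] = label_id
        return label_id

    def rename(self, old_name, new_name):
        label_id = self._aliases[old_name]
        self._names[label_id] = new_name
        # The old alias is kept on purpose so that existing links still resolve.
        self._aliases[new_name] = label_id

    def resolve(self, name):
        """Return the current name for any current or former label name."""
        return self._names[self._aliases[name]]


registry = LabelRegistry()
customer = registry.create("customer")

# Document views reference the id, so a rename requires no document edits.
tagged = {"Invoice Process": {customer}}

registry.rename("customer", "client")
print(registry.resolve("customer"))  # a link to the old name still resolves
```

The design choice here is that a name is treated as presentation while the identifier carries the relationship, which is why the `tagged` mapping is untouched by the rename. A real system would also need to handle name collisions, which this sketch ignores.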
Second, we have the problem of granularity. If labels are too fine-grained, we end up having to add quite a few of them to every content instance. If they are too coarse-grained, instead, we need fewer labels per documentation view, but the labels become less helpful when finding related content. For example, labels such as #business and #technical are perhaps too general, while labels such as #java and #python could be too specific.
The point is that we will never get the granularity right up front. Whatever choices we make today are unlikely to be appropriate tomorrow. Therefore, what we need is the ability to reorganize our labels quickly and cheaply. In practical terms, this means the ability to perform bulk merge and split operations without having to edit each document view one by one, and without breaking links.
In a bulk merge operation we take two labels, say, #java and #python, and combine them into one label, for example, #code. In a bulk split operation, instead, we take one label, say, #business, and split it into two labels, for example, #b2c and #b2b. In a split, some content instances may apply to both new labels, whereas for those that apply to only one of them we need to drop either #b2c or #b2b.
Needless to say, the adding and removing of labels should also work in a bulk fashion.
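The merge and split operations described above can be sketched over a simple mapping from document view names to sets of labels. This is an illustrative sketch only: the function names, the sample document names, and the `assign` callback are assumptions made for the example, not features of a specific tool.

```python
def merge_labels(docs, sources, target):
    """Replace any label in `sources` with `target` across all document views."""
    for labels in docs.values():
        if labels & sources:
            labels -= sources
            labels.add(target)


def split_label(docs, source, assign):
    """Replace `source` with the labels returned by `assign(doc_name)`."""
    for name, labels in docs.items():
        if source in labels:
            labels.discard(source)
            labels |= assign(name)


docs = {
    "jvm-tuning": {"java"},
    "pandas-intro": {"python"},
    "pricing": {"business"},
    "reseller-terms": {"business"},
}

# Bulk merge: #java and #python become #code.
merge_labels(docs, {"java", "python"}, "code")

# Bulk split: #business becomes #b2b and/or #b2c. Views default to both
# labels, and only the exceptions are listed explicitly.
exceptions = {"reseller-terms": {"b2b"}}
split_label(docs, "business", lambda name: exceptions.get(name, {"b2b", "b2c"}))

for name in sorted(docs):
    print(name, sorted(docs[name]))
```

Neither operation touches the document view names, so any link that points at a view keeps working. Defaulting to both labels during a split mirrors the idea that we only need to drop a label from the content instances that apply to just one of them.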
Tags should not only be considered an author-centric mechanism. Rosenfeld et al. (2015) mention the notion of free tagging, in which users define and tag content for which they aren't necessarily the authors. While this system may be seen as a form of crowdsourcing the task of organizing content, it may also prove to be a more effective means of organizing the content so that it best aligns with the majority of users' mental model.
A taxonomy, despite its academic-sounding name, is nothing more than a hierarchy, or a tree-like structure. Whereas labels are flat, taxonomies define a parent-child relationship between the categories defined in them. Given how central taxonomies are in documentation systems, we expand on this topic when discussing the tenet of floating taxonomies.
From an emergent structure perspective, as with labels, we need the capability to keep evolving taxonomies continuously:
Performing any of these actions should not only be easy and quick, but also feasible on a bulk basis without having to edit one document view at a time and, naturally, without breaking links as a result of these activities. This process is much easier if taxonomies piggyback on the label system, which is what I suggest when I introduce the notion of floating taxonomies further on.
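To illustrate why piggybacking on labels makes taxonomy changes cheap, here is a hedged sketch in which the taxonomy is nothing more than a child-to-parent mapping between labels. The `parents` mapping, the helper names, and the sample labels are assumptions for illustration; floating taxonomies as discussed later in the text may differ in detail.

```python
def ancestors(parents, label):
    """Walk from a label up to the root of the taxonomy (assumes no cycles)."""
    path = []
    while label in parents:
        label = parents[label]
        path.append(label)
    return path


def docs_under(parents, docs, category):
    """Names of document views labeled with `category` or any descendant."""
    return sorted(
        name
        for name, labels in docs.items()
        if any(l == category or category in ancestors(parents, l) for l in labels)
    )


# The taxonomy lives outside the documents: child label -> parent label.
parents = {"java": "code", "python": "code", "code": "technical"}
docs = {"jvm-tuning": {"java"}, "pandas-intro": {"python"}}

before = docs_under(parents, docs, "technical")

# Moving the whole #code subtree is a single change; no document is edited.
parents["code"] = "engineering"

after_old = docs_under(parents, docs, "technical")
after_new = docs_under(parents, docs, "engineering")
print(before, after_old, after_new)
```

Because document views only carry flat labels, reparenting a branch changes one entry in the mapping rather than every affected view.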
Document views may be classified not only using labels and taxonomies, but also properties. Some properties are typically intrinsic, such as the document's title, the author's name, the date of publication, and so on, while others are explicit. Explicit properties are user-defined.
Explicit properties are usually implemented using the labeling mechanism, whereby the labels become the key in key/value pairs. Furthermore, properties may be associated with a schema, for example, an enumeration of possible values. As far as the tenet of emergent structure is concerned, we need the same degree of flexibility for both intrinsic and explicit properties.
Given that a property key is effectively a label, we require the same flexibility we have already discussed in terms of continuous splitting, merging, renaming, and so on, but we also have values to consider. Bulk operations on values require further flexibility to be effective.
In short, the properties applicable to collections of document views should offer a user experience similar to that of a spreadsheet.
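As a rough illustration of the spreadsheet analogy, the sketch below treats each document view as a row and each property key as a column, with an optional enumeration acting as the schema. The `SCHEMA` dictionary, the function names, and the sample rows are hypothetical and only meant to show the shape of bulk value operations.

```python
# Hypothetical schema: allowed values per property key (an enumeration).
SCHEMA = {"status": {"draft", "review", "published"}}


def set_property(rows, key, value, where=lambda name, props: True):
    """Set `key` to `value` on every matching row, like filling a column."""
    allowed = SCHEMA.get(key)
    if allowed is not None and value not in allowed:
        raise ValueError(f"{value!r} is not allowed for {key!r}")
    changed = 0
    for name, props in rows.items():
        if where(name, props):
            props[key] = value
            changed += 1
    return changed


def rename_value(rows, key, old, new):
    """Replace one value with another across a whole column."""
    return set_property(rows, key, new, lambda _name, props: props.get(key) == old)


rows = {
    "Customer": {"status": "draft", "owner": "sales"},
    "Invoice": {"status": "draft"},
    "Refund": {"status": "published"},
}

changed = rename_value(rows, "status", "draft", "review")
print(changed, rows["Customer"]["status"], rows["Refund"]["status"])
```

Keys without a schema entry, such as `owner` in this example, accept any value, which keeps the schema itself something that can emerge over time rather than being fixed up front.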
Headings may not be intuitively seen as a classification scheme, but they play a significant structural role in modern documentation systems.
First of all, a document view's table of contents is in most cases derived from headings: they are normally displayed like a taxonomy and, depending on the documentation platform of choice and its configuration, such a taxonomy may also be attached to the navigation system in the form of an interactive outline.
Second, headings act like labels in the sense that they tag compartments of a document so that said compartments can quickly be identified across different documents. For example, whenever two or more document views present headings called "Introduction", "Business Context", and "Closing", users come to expect the same kind of content under such headings.
Third, headings play a role in composite document views because smaller document parts typically have "child" headings, say, level 2, under the expectation that they will be inserted under a "parent" heading, say, level 1.
Last, headings can also be used to demarcate content blocks and to structure information in DocOps automation workflows. For example, in the figure below, the heading "Info-Box:Personas" can be used to generate a floating table with information about personas rather than being displayed as a regular heading.
In short, given that headings exist in all document formats (they are universal), they can be relied upon to create taxonomies, labels, metadata, compositional aids, and much more.
Given that headings are the most coarse-grained building block in a document, they should be treated as fungible, easy-to-manipulate objects; we require ample flexibility in terms of selection, cross-document view bulk operations, and in-document operations.
Selecting a heading, before we can move it, delete it, and so on, is a little more nuanced than it initially appears. A heading may be followed by non-heading content, by a child heading (e.g., a heading level 2 after a heading level 1), or a combination of both. Any heading-wise operation thus requires
For the selected heading(s), we then need the ability to reorder, promote, and demote them (e.g., decrease and increase the heading level) without requiring tedious copy-paste workflows.
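The nuance of heading selection can be made concrete with a small sketch. It assumes a document is a flat list of `(level, text)` blocks where level 0 means non-heading content, and it assumes that selecting a heading also selects its body and its child headings; both the representation and that selection rule are assumptions for illustration rather than a prescribed behavior.

```python
def section_end(blocks, start):
    """Index just past the section that begins at heading `start`.

    The section covers the heading, its body, and all deeper child headings.
    """
    level = blocks[start][0]
    end = start + 1
    while end < len(blocks) and not (0 < blocks[end][0] <= level):
        end += 1
    return end


def shift_section(blocks, start, delta):
    """Demote (delta > 0) or promote (delta < 0) a heading and its children."""
    end = section_end(blocks, start)
    for i in range(start, end):
        level, text = blocks[i]
        if level > 0:  # body content (level 0) is left untouched
            blocks[i] = (max(1, level + delta), text)


def move_section(blocks, start, dest):
    """Remove a whole section and reinsert it at index `dest` of the remainder."""
    end = section_end(blocks, start)
    section = blocks[start:end]
    del blocks[start:end]
    blocks[dest:dest] = section


blocks = [
    (1, "Introduction"),
    (0, "Some introductory text."),
    (1, "Business Context"),
    (0, "Context text."),
    (2, "Personas"),
    (0, "Persona details."),
    (1, "Closing"),
]

move_section(blocks, 2, 0)   # reorder: "Business Context" moves to the top
shift_section(blocks, 0, 1)  # demote it, together with its child heading

for level, text in blocks:
    print(level, text)
```

Treating the heading, its body, and its children as one unit is what removes the need for copy-paste: reordering and level changes become operations on a range rather than on individual lines.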
What is key, though, is the ability to perform cross-document operations, for example, selecting headings from one document and moving them (or copying them) to another document. As headings often act as demarcated components, it is also important to treat them as "columns" so that they can be compared and edited on a side-by-side basis. Suppose that the established convention is the inclusion of a heading called "Summary" in every document. In this case, it would be unacceptable to force users to edit this text by tediously opening every document one by one.
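The idea of treating a heading as a "column" across documents can be sketched as follows. The example assumes, purely for illustration, that documents are stored as text with Markdown-style `#` headings; the function names and sample documents are likewise hypothetical.

```python
import re

# Assumption: headings are lines of one or more '#' followed by whitespace.
HEADING = re.compile(r"^(#+)\s+(.*)$")


def split_sections(text):
    """Split text into [heading line, title, body lines] entries, in order."""
    sections = [[None, None, []]]  # content that precedes any heading
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append([line, match.group(2).strip(), []])
        else:
            sections[-1][2].append(line)
    return sections


def column(docs, title):
    """Return {document name: body text} for the section called `title`."""
    result = {}
    for name, text in docs.items():
        for _heading, section_title, body in split_sections(text):
            if section_title == title:
                result[name] = "\n".join(body).strip()
    return result


def update_column(docs, title, new_bodies):
    """Write edited section bodies back, leaving other sections untouched."""
    for name, new_body in new_bodies.items():
        lines = []
        for heading, section_title, body in split_sections(docs[name]):
            if heading is not None:
                lines.append(heading)
            lines.extend([new_body] if section_title == title else body)
        docs[name] = "\n".join(lines)


docs = {
    "Customer": "# Summary\nOld customer summary.\n# Details\nCustomer details.",
    "Invoice": "# Summary\nOld invoice summary.\n# Details\nInvoice details.",
}

summaries = column(docs, "Summary")  # side-by-side view of every Summary
edited = {name: text.replace("Old", "Updated") for name, text in summaries.items()}
update_column(docs, "Summary", edited)
print(docs["Customer"])
```

The `summaries` dictionary plays the role of the spreadsheet-like column: every "Summary" can be reviewed and edited in one place and then written back in a single pass, without opening each document.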
Labels, taxonomies, properties, and headings are the most hands-on (i.e., labor-intensive) mechanisms to organize documentation. Others, like alphabetical indexes and sitemaps, can be generated automatically; in this case, they would abide by the tenet of emergent structure without any effort.
We still need to be mindful of other manual classification systems such as keyword lists, curated related content links, and so on, that may need the same kind of flexibility and automation applicable to labels and taxonomies discussed here.
© 2022-2024 Ernesto Garbarino | Contact me at ernesto@garba.org