My New Web

Requirements for Transclusion
Managing the data
Objects and History
Implementation Notes
Using the Site
Features
Non-features

Requirements for Transclusion

I would like to support transclusion. For example, I would like to include a previous article as a section of a new article. Transclusion cannot be done satisfactorily in HTML or Markdown, because a document in those formats is syntactically a flat list of block-level elements. An include in this situation would introduce an unknown number of elements that are not separated from the surrounding content. I need a markup standard whose syntax tree is isomorphic to the outline tree, so that included material is nested one level deeper in the hierarchy. DocBook satisfies this requirement, and I can use XInclude for inclusion. Everything else is left to the page compiler.
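
As a rough illustration of the nesting property, here is a minimal sketch using Python's standard library XInclude support. The in-memory `OBJECTS` store, the IDs `old` and `new`, and the helper names are made up for this example, and the assumption that `href` holds a bare object ID is mine; this is not the site's actual page compiler.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

DOCBOOK = "http://docbook.org/ns/docbook"

# Hypothetical object store: ID -> DocBook source (assumed for illustration).
OBJECTS = {
    "old": (
        '<section xmlns="http://docbook.org/ns/docbook" xml:id="old">'
        "<title>Old article</title><para>Earlier text.</para></section>"
    ),
    "new": (
        '<article xmlns="http://docbook.org/ns/docbook" '
        'xmlns:xi="http://www.w3.org/2001/XInclude" xml:id="new">'
        '<title>New article</title><xi:include href="old"/></article>'
    ),
}


def load_by_id(href, parse, encoding=None):
    """Loader for ElementInclude: resolve an href as a bare object ID."""
    if parse == "xml":
        return ET.fromstring(OBJECTS[href])
    return OBJECTS[href]


def expand(object_id):
    """Parse one object and replace each xi:include with the referenced tree."""
    root = ET.fromstring(OBJECTS[object_id])
    ElementInclude.include(root, loader=load_by_id)
    return root


article = expand("new")
included = article[1]  # the element that replaced <xi:include>
print(included.tag)    # the transcluded <section>, one level below <article>
```

The included article arrives as a single `section` element under the new `article`, so its title and paragraphs stay grouped instead of spilling into the surrounding content.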

Managing the data

I propose the following dichotomy: an item is either content-addressed and immutable, or name-addressed, mutable, owned, and versioned.

Immutable

content-addressed (hash-addressed)
name makes no sense
no central authority
versioning makes no sense
no lifecycle

Named

name-addressed (the name does not have to be human-readable)
will go on to have a human-readable title, but not as the address
meaningful
warrants versioning
can refer to other resources independently in each version
versioning only applies to writes
usually requires central authority, or owner
can have namespaces
global namespace must have global version ordering

This has been implemented in the following way: all meaningful objects are written in XML, and the root element of each carries an xml:id.

Both

can be referred to, resulting in coreferences, for each version of the complete site

History only makes sense for mutable objects. In the past, I have seen history-enabled systems built by wrapping a global history around the business semantics. A database with a transaction log is one such system. Systems of this kind have a problem: it is not trivial to answer questions about the lifecycle of objects inside the system. An object in such a system can be deleted while its identifier remains available for future use. In contrast, on my new Web, every identifier has exactly one life. Reusing the identifier of a dead object is prohibited.

Can we theorize this? There seems to be a mixture of linearity and nonlinearity here. The linear part is every identifier having exactly one life. The nonlinear part is the forced creation of new identifiers for new objects.

Implementation Notes

Each object has a 256-bit ID, serialized as a hexadecimal string. This applies to named objects, immutable objects, and Git hashes. The length was chosen to match that of the ubiquitous SHA-256. Note that named objects are still not content-addressed, even though their names have the same format as hashes. For named objects, each ID lives at most once: an object with a given ID can be created only if that ID has never been used before. In addition, IDs must be unique across named objects, immutable objects, and commit hashes.
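
The ID rules above could be sketched roughly as follows. How named IDs are actually minted is not stated here, so drawing them at random is my assumption, and `IdRegistry` with its methods is a hypothetical helper rather than part of the site.

```python
import hashlib
import secrets


def immutable_id(content: bytes) -> str:
    """Content address of an immutable object: SHA-256 as 64 hex characters."""
    return hashlib.sha256(content).hexdigest()


def fresh_named_id() -> str:
    """Assumed minting scheme for named objects: 32 random bytes as hex."""
    return secrets.token_hex(32)


class IdRegistry:
    """Hypothetical bookkeeping that gives every ID exactly one life."""

    def __init__(self):
        self._ever_used = set()  # IDs that have existed at any point
        self._alive = set()      # IDs of objects that currently exist

    def create(self, object_id: str) -> None:
        if object_id in self._ever_used:
            raise ValueError("ID was already used: " + object_id)
        self._ever_used.add(object_id)
        self._alive.add(object_id)

    def delete(self, object_id: str) -> None:
        # The ID stays in _ever_used, so it can never be created again.
        self._alive.remove(object_id)

    def is_alive(self, object_id: str) -> bool:
        return object_id in self._alive
```

Deleting an object removes it only from the set of live IDs, which is what makes a later `create` with the same ID fail.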

Immutables are not stored in the Git repository. One day they might live on IPFS, but today they sit in a plain directory on my server. One nuance I had to handle was serving each file with the correct MIME type automatically, since I wanted to query with nothing but a 256-bit ID.

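A question for the group about nginx configuration. I have a bunch of files without extensions, but I know their MIME types. How do I get nginx to serve them with the correct MIME type? I can add extensions to the files on the server, but clients must be able to send requests without an extension. Every file has a different MIME type and there is no pattern. I cannot put this mapping in the nginx configuration; it has to live on the file system, and I have several hundred of these mappings.

ln each file to a name with the correct extension. Then, in nginx, use try_files and list every extension.

As of publication, this has received 2 🎉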
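
The linking step from that reply might look like the sketch below. The `EXTENSION_FOR` table, the function name, and the idea that the ID-to-MIME mapping is available to a script are all assumptions made for illustration; the actual setup on the server is not described here.

```python
from pathlib import Path

# Assumed lookup from MIME type to an extension that nginx's mime.types
# would map back to the same type. Extend as needed.
EXTENSION_FOR = {
    "image/png": ".png",
    "application/pdf": ".pdf",
    "text/html": ".html",
}


def link_with_extension(store: Path, object_id: str, mime_type: str) -> Path:
    """Create `<id><ext>` next to the extensionless `<id>` as a relative symlink."""
    link = store / (object_id + EXTENSION_FOR[mime_type])
    if not link.is_symlink():
        # Relative target, so the directory can be moved as a whole.
        link.symlink_to(object_id)
    return link
```

On the nginx side, the reply amounts to a `try_files` directive that lists every extension in use, something along the lines of `try_files $uri.png $uri.pdf $uri.html =404;`. I have not verified that exact line, so treat it as a starting point.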

All named objects are written in DocBook XML. XInclude is supported. The single database for named objects is a Git repository. It contains a flat set of files; there is no hierarchy. Each file is named after the xml:id of its XML root element. There is no file name extension. Multiple users can work on the same named object in the Git way, meaning each change has a provenance.

During compilation, a coreferences table is constructed for each Git commit. Naively, this computation takes $O(n)$ time per commit, where $n$ is the number of objects in that snapshot. This naive algorithm is what is currently deployed. References need no such precomputation, because listing the references of an object takes $O(1)$ time.
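
The naive pass can be sketched as a single inversion over one snapshot. This assumes that a reference is an `xi:include` whose `href` is the bare ID of the target, and that a snapshot is available as a dictionary from ID to XML source. Both are simplifications of mine, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def references(source):
    """IDs that one object includes, read directly from its own XML."""
    root = ET.fromstring(source)
    return [e.get("href") for e in root.iter(XI_INCLUDE) if e.get("href")]


def coreference_table(snapshot):
    """Invert references over a whole snapshot: target ID -> set of referrers.

    Visits every object once, hence linear in the number of objects.
    """
    table = defaultdict(set)
    for object_id, source in snapshot.items():
        for target in references(source):
            table[target].add(object_id)
    return dict(table)
```

Looking up the backlinks of an object is then a dictionary access, while its outgoing references come from parsing that one object alone.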

Using the Site

A user can choose to write articles in another markup language, then use Pandoc or a comparable tool to convert them to DocBook XML. The user needs to configure a Git client with a good username. The user can create, modify, and delete named objects. However, a set of changes should only be committed to Git if it is consistent. For example, if an object is deleted, then all references to it must be removed as well.
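
A consistency check of this kind could be run before committing. The sketch below assumes that every `xi:include` `href` names another named object in the same snapshot (so references to immutables are out of scope), and `dangling_references` is a hypothetical helper, not a tool the site ships.

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def dangling_references(snapshot):
    """Return (referrer, missing target) pairs for a snapshot of ID -> XML source."""
    missing = []
    for object_id, source in sorted(snapshot.items()):
        for include in ET.fromstring(source).iter(XI_INCLUDE):
            target = include.get("href")
            if target not in snapshot:
                missing.append((object_id, target))
    return missing
```

An empty result means that no object includes something that has been deleted; any other result lists what still has to be cleaned up.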

Features

Server-side transclusion with XInclude (FYI, transclusion ~= include + hyperlink)
- listing references
- listing coreferences (backlinks)
- jumping to a coreference right at the transclusion site
Collaborative, versioned editing à la Git
- Git versioning semantics
- commit log of each object
- commit log of each user
- blaming each object
- viewing commit diff
Orthogonal integration
- All features above are supported, even if they are combined.

Non-features

No dependent history: It is not possible to transclude a past revision of any named object.