Archaeology

Tuesday, 1 September 2009.


I learned something this last week that is somewhat counterintuitive, and therefore interesting: it is easier to discern history indirectly, rather than directly. (In the terms of a computer scientist, I would say that it is more efficient to search history breadth-first, rather than depth-first.)

Allow me to illustrate. I've recently decided to acquire several knives (of various sorts), and as those who are close to me are aware, that makes for several hours of entertaining research for me, as I try to sort out what’s worth my time, what isn’t, and (most importantly for the purposes of the current discussion) what the history of it all is.

Since I knew rather little about knives other than basic use and maintenance (from Boy Scout training all those years ago), I more or less started scouring the Internet at random for information, not starting from any source in particular, but achieving a shallow depth of understanding at great breadth, across the entire spectrum of knives. One of the knives I settled on ordering (an Okapi 907e) had very little in the way of historical knowledge associated directly with it (as this terse Wikipedia article illustrates): essentially, that the knife was originally produced around 1902 in Germany for export to German Africa, and has been produced in South Africa since 1988. All well and good, but knives (like everything else) do not simply appear out of nowhere, so from where did the Okapi originate?

Well, at this point, if I had continued to dig deeper into the history of the knife, I'd have found nothing; in fact, there is little other information on the knife except that it is a common working knife in Africa and that it’s been favored as a cheap weapon in Jamaica by rude boys. This is where the depth-first search fails.

But since I had already searched a wide range of knives by this point, I had seen a few that looked similar, even if they weren’t directly related: the Spanish navaja and the French laguiole. The former was popular in the 1600s and the latter in the 1800s; what’s more, the laguiole is itself a French evolution of the Spanish design.

While I can’t be certain of any connections (such is the lack of information at my disposal), it seems pretty likely that the lovely Spanish design migrated north and east, first to France for domestic use, and slowly to Germany, which appropriated the design for foreign use (at the same time making it cheaper to produce, and less elegant). The design continued to evolve slowly into the “three-star” design still produced today. And again, while I can’t be certain of the connection, it still seems pretty compelling; that is where the breadth-first search succeeds.

Thus, I find that doing a wide search (with some mild note-taking) on seemingly-unrelated information results in a much greater depth of knowledge on each particular item; while this makes sense, it’s not something I'd have immediately recognized.


On a technical tangent, it thus strikes me that a good learning agent might be developed by searching the web breadth-first (as any web spider might), and by including an inference engine of some kind (such as a neural network or (impractically large) SDM) to pick out similar or closely related information by content, rather than hypertext links (like Google or Yahoo! would). The result would be an information classification system. How well it performs would probably depend heavily on how “smart” the process that extracts information from a web page is; ideally it would be something that can understand written language, but those sorts of parsers aren’t yet at the level of sophistication necessary. Still, you can probably fake intelligence pretty well by cheating…
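
To make that tangent a little more concrete, here is a minimal sketch of the idea, and very much the "cheating" version of it. Everything in it is an assumption made for illustration: the toy in-memory "web" (`TOY_WEB`), the function names, and the 0.5 threshold are all hypothetical, and a plain bag-of-words cosine similarity stands in for the inference engine (it is not a neural network or an SDM). The pages are visited breadth-first by following links, and then pairs of pages are flagged as related purely by their content.

```python
from collections import Counter, deque
from math import sqrt

# Hypothetical toy "web": page name -> (page text, outgoing links).
TOY_WEB = {
    "knives": ("overview of folding knife designs", ["okapi", "cooking"]),
    "okapi": ("okapi folding knife wooden handle ring lock", ["knives"]),
    "cooking": ("recipes for soup and bread", ["navaja"]),
    "navaja": ("navaja spanish folding knife wooden handle ratchet lock", []),
}


def crawl_breadth_first(web, start):
    """Visit pages level by level from `start`; return names in visit order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in web[page][1]:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return order


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = sqrt(sum(count * count for count in a.values()))
    norm_b = sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related_by_content(web, start, threshold=0.5):
    """Pair up crawled pages whose text is similar, ignoring the link graph."""
    order = crawl_breadth_first(web, start)
    vectors = {page: Counter(web[page][0].split()) for page in order}
    pairs = []
    for i, first in enumerate(order):
        for second in order[i + 1:]:
            if cosine(vectors[first], vectors[second]) >= threshold:
                pairs.append((first, second))
    return pairs


if __name__ == "__main__":
    print(crawl_breadth_first(TOY_WEB, "knives"))
    print(related_by_content(TOY_WEB, "knives"))
```

In this toy data the "okapi" and "navaja" pages never link to one another, and the only path between them runs through an unrelated page, yet they are the one pair that gets matched, because they share most of their vocabulary. A real version would need a far smarter extraction step than splitting text on whitespace.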
