Open Note Databases & the Promise of the Memex

Vannevar Bush’s Memex

In 1945, Vannevar Bush wrote a now-famous piece for The Atlantic called “As We May Think.” In it, he proposed the development of a machine called the Memex. This desk-shaped machine would display printed and handwritten texts and record notes made with a special stylus. It would also record metadata noting the connections between various sources, and all of this information could be stored on removable cards.

The Memex served as inspiration in the development of the modern personal computer. Hundreds of articles in the 1990s pointed to the invention of the World Wide Web as the culmination of Bush’s vision, and those comparisons have continued into the new millennium with the modern Internet. But while the stylus-touch interfaces of modern tablets and the proliferation of online media do fulfill much of Bush’s technological vision, the key underlying epistemic concept put forth by Bush has been largely neglected.

The rapid sharing of ideas was an obvious use of the Memex, but Bush proposed that the educational value of the tool would lie in the ability to retrace the thought processes and connective strands that others had made through the ever-growing sea of data. Bush said, “The inheritance from the master becomes, not only his additions to the world’s record, but for his disciples the entire scaffolding by which they were erected.” While academia is slowly moving towards open access models of publication, this deeper level of sharing the cognitive “scaffolding” of our theories has received far less consideration.

On Wednesday, Jeremy Dean and Jon Udell of Hypothes.is joined Gardner Campbell as part of #openlearning17 to discuss how web annotation allows us to record and retrace our thoughts as we move from one website to the next. The collective pooling of reading notes, along with the ability to retrace an individual’s steps, goes a long way toward fulfilling the epistemological vision of Bush’s Memex.

Here, though, I want to propose a different kind of database as an alternative and complementary implementation of Bush’s vision. I have been working on an open note database called Situating Chemistry for a couple of years now. The concept is that researchers interested in the history of chemistry (don’t laugh, we exist) can come together to share a wide variety of notes. In addition to reading notes, the system can be used to create a profile of a historical person and record biographical notes compiled from both archival and secondary sources. These people can then be linked to each other to show familial, business, educational, or any other type of relationship. Users can also record notes on the places where chemistry was done. A record can be created for a particular factory site, a university lab or lecture hall, or the site of an important conference. These sites can be linked back to the people who used them and mapped. Here we see an 18th-century map of Paris with clustered pins representing the sites in our database:

A screenshot of the world map in the Situating Chemistry database featuring data about 18th-century Paris

In addition to people and places, users can record notes on organizations, events, courses, sources, objects, collections, processes, and theories. The key is that researchers can use the database to take notes on any facet of their studies and then connect that facet both to its particular context and to the broader population of notes within the system. Users can choose to put a password on their records, but the default and usual practice is to leave the note sets open to all users so that we can extend each other’s individual records and pool our collective research.

Designing an Open-Notes Database

Web-based databases of historical people like the Prosopography of Anglo-Saxon England (PASE) and the Prosopography of the Byzantine World were produced as conclusive publications of completed research. As such, the developers could accommodate, truncate, or omit problematic pieces of data. Published databases can also choose data visualizations that best highlight key records or insights. Researchers with Stanford’s Mapping the Republic of Letters project put together this beautiful dashboard to explore the correspondence of Enlightenment thinkers.

Screenshot of the Republic of Letters Visualization dashboard

Unlike these published databases, an open note system is designed to accommodate active note taking. Rather than structuring database fields to best present the data already collected, Situating Chemistry was built to be flexible. Because a researcher will more often than not have incomplete data, the only required field is the title of the record. The fields to record dates of birth and death can be partially filled out when only a year or year and month are known. They can also be marked approximate to indicate ambiguity in the historical record. Similarly, visualizations are employed to bring the researcher’s attention to interesting data and suggest new pathways for research.
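To make the partial-date idea concrete, here is a minimal Python sketch of how a date known only to the year, or to the year and month, might be stored together with an "approximate" flag. The class name `PartialDate`, its fields, and the "c." rendering are hypothetical illustrations, not the actual Situating Chemistry schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartialDate:
    """Hypothetical historical date: only the year is required."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None
    approximate: bool = False  # flags ambiguity in the historical record

    def __post_init__(self) -> None:
        # A day is meaningless without a month.
        if self.day is not None and self.month is None:
            raise ValueError("day given without month")
        if self.month is not None and not 1 <= self.month <= 12:
            raise ValueError("month out of range")
        if self.day is not None and not 1 <= self.day <= 31:
            raise ValueError("day out of range")

    def __str__(self) -> str:
        # Render only the precision we actually have, ISO-style.
        text = f"{self.year:04d}"
        if self.month is not None:
            text += f"-{self.month:02d}"
        if self.day is not None:
            text += f"-{self.day:02d}"
        return f"c. {text}" if self.approximate else text


print(PartialDate(1736))                        # 1736
print(PartialDate(1756, 11, approximate=True))  # c. 1756-11
```

Keeping month and day optional, rather than defaulting them to January 1st, avoids inventing precision that the sources do not support.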

Within any prosopographical project, the amount of biographical information available for the research subjects can vary widely. One project in Situating Chemistry focuses on the students of the noted Scottish professor William Cullen (1710-1790). In 1756, fifty-nine students attended Cullen’s chemistry course at the University of Edinburgh. Amongst them was George Fordyce (1736-1802) of Aberdeen, who would go on to earn his medical degree from Edinburgh and become a lecturer of chemistry and medicine in London. Although there is no monograph-length biography of Fordyce, he has entries in the Dictionary of National Biography and the Dictionary of Scientific Biography and is a relatively well-documented individual.

The other fifty-eight attendees of the course are less known. We know from Cullen’s notes that Robert Cumming was from Edinburgh, that John Richardson was from Northumberland, and that Henry Dunston was from some unspecified part of England. More surprisingly, at least two of the attendees were from Virginia—Thomas Clayton and James Taylor—and one was from Antigua—Christopher Hodge. Clayton, Fordyce, and eight other students would go on to earn their MDs from Edinburgh.

In designing the Situating Chemistry database, we wanted to ensure that we could capture structured, machine-readable data on someone like George Fordyce, or for that matter William Cullen. We also wanted to be able to create records for people like Henry Dunston, for whom we had only a name and relationships of interest: in this case, that he was a student in Cullen’s chemistry course, in Edinburgh, in 1756, with these other people, and that he was from England.

Screenshot of the Situating Chemistry database depicting the fields available for recording data about a person

The record for any individual can be linked to other individuals in several different ways. In addition to familial relations, we also have structured fields to collect information on instructor-student relationships and correspondents. There is also a somewhat generic person-to-person connection field that offers a list of relationships that can be expanded when needed. We designed the database such that every individual is the subject of their own record. A field denoting that a person was active in chemistry is automatically checked for every new record, but can be deactivated for familial relations, business partners, and others who are of interest but were not actively ‘doing’ chemistry even in the broadest definition.
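A hypothetical sketch of a person record with these defaults: only the title is mandatory, the "active in chemistry" flag starts checked, and every other field may stay empty. The field names are assumptions for illustration, and the sample records use only details mentioned above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonRecord:
    """Hypothetical person record: only `title` is required."""
    title: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    origin: Optional[str] = None
    active_in_chemistry: bool = True  # checked by default on every new record
    notes: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.title.strip():
            raise ValueError("a record needs a title")


# A well-documented individual with structured dates.
fordyce = PersonRecord("George Fordyce", birth_year=1736, death_year=1802, origin="Aberdeen")

# A sparse record: little more than a name, an origin, and a note.
dunston = PersonRecord(
    "Henry Dunston",
    origin="England",
    notes=["Attended Cullen's chemistry course, Edinburgh, 1756"],
)

# A hypothetical relative of interest who was not 'doing' chemistry.
relative = PersonRecord("Hypothetical relative", active_in_chemistry=False)
```

Both the rich and the sparse record fit the same structure; the sparse one simply leaves fields empty rather than forcing placeholder values.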

In addition to linking a person to other individuals within the system, a person can also be linked to many other kinds of data. The database was initially conceived of as a way to catalogue sites of chemistry, so we started it with a table to collect information on apothecary shops, lecture halls, pharmaceutical manufactories, bleach fields, labs, etc. For a given site, we can record its latitude and longitude and a modern address, as well as information about the ownership and financial history of the site, the chemical activities associated with it, its organizational history, and related images, documents, sources, etc. For each individual in the system, we display the sites that they owned and operated as well as any additional sites with which they were associated.

After developing tables for the sites and people involved in chemistry, we developed further tables for chemical substances, collections, courses, documents, events, images, letters, objects, organizations, primary and secondary sources, processes and techniques of chemistry, and archival and museum repositories. Any two records can be connected with an extensible series of subject-predicate-object relationships. For example, a given individual could be a member of an organization, might have studied a particular chemical substance, or might have been a practitioner of a particular process or technique of chemistry. Every record, whether for a person or any other type of data, can and should be sourced by linking it to primary and/or secondary sources. For the system as a whole, then, we have tables for more than a dozen types of information and hundreds of structured data fields, all strung together into a relational web of information.
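The subject-predicate-object idea can be sketched as a tiny in-memory triple store. This is an illustrative assumption, not the project's implementation: the record keys such as `person:fordyce`, the predicate names, and the `LinkStore` class are all hypothetical. The point is that the relationship vocabulary is open-ended while every link has the same three-part shape.

```python
from typing import List, NamedTuple, Set


class Link(NamedTuple):
    subject: str    # key of the record the statement is about
    predicate: str  # relationship label from an extensible vocabulary
    obj: str        # key of the linked record


class LinkStore:
    """Minimal in-memory subject-predicate-object store (illustrative only)."""

    def __init__(self) -> None:
        self.predicates: Set[str] = {"student_of", "member_of", "studied_substance"}
        self.links: List[Link] = []

    def add_predicate(self, predicate: str) -> None:
        # Users can extend the relationship vocabulary when needed.
        self.predicates.add(predicate)

    def link(self, subject: str, predicate: str, obj: str) -> None:
        # Reject unknown labels so typos don't silently create new relationships.
        if predicate not in self.predicates:
            raise KeyError(f"unknown predicate: {predicate}")
        self.links.append(Link(subject, predicate, obj))

    def objects(self, subject: str, predicate: str) -> List[str]:
        """Everything `subject` is linked to via `predicate`."""
        return [t.obj for t in self.links if t.subject == subject and t.predicate == predicate]

    def subjects(self, predicate: str, obj: str) -> List[str]:
        """Everything linked to `obj` via `predicate`."""
        return [t.subject for t in self.links if t.predicate == predicate and t.obj == obj]


store = LinkStore()
store.add_predicate("attended")
store.link("person:fordyce", "attended", "course:cullen-chemistry-1756")
store.link("person:dunston", "attended", "course:cullen-chemistry-1756")
store.link("person:dunston", "student_of", "person:cullen")

print(store.subjects("attended", "course:cullen-chemistry-1756"))
# ['person:fordyce', 'person:dunston']
print(store.objects("person:dunston", "student_of"))
# ['person:cullen']
```

Because every link shares one shape, the same two queries answer both "who attended this course?" and "whom did Dunston study with?". In a relational database this would typically be a single link table with subject, predicate, and object columns.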

Interoperability & Extensibility

In its first conception, the Situating Chemistry database was thought of as a single table for sites with about a dozen fields. However, this variety of tables and fields grew organically through discussion of the research questions we pursue and the practices we follow as historians of chemistry. The goal for the project was not to publish a completed set of sites or records, but rather to facilitate active research. A researcher could enter the data that they were collecting for a research project to organize and analyze the information, and they could take the database with them into the archives to continue to collect information. Researchers can access their records and add new records to Situating Chemistry from a laptop, tablet, or even a phone.

To accommodate both offline note-taking and the rapid upload of external data sets, the database has also been designed so that users can upload CSV files (Excel-style tables). Any data in the system can in turn be downloaded as CSV or in other structured formats, including XML, RDF, and JSON. Because Situating Chemistry was designed as a research tool rather than a data publication, the goal of the database is to allow users both to enter and to access whatever fields and record sets they consider interesting. Several visualizations, including tables, graphs, and a timeline, are built into the system. The user can also extract whatever structured data she wants to pull from the system, so that she can generate her own visualizations using tools like Tableau or programming languages like Python and R.
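A minimal sketch of the CSV round trip using only Python's standard `csv` and `json` modules. The column names, and the rule that only a `title` column is required, are assumptions chosen to mirror the description above; XML and RDF export are left out for brevity.

```python
import csv
import io
import json
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def import_csv(text: str) -> List[Row]:
    """Parse uploaded CSV text; only a 'title' column is required (assumed name)."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None or "title" not in reader.fieldnames:
        raise ValueError("CSV must have a 'title' column")
    rows: List[Row] = []
    for raw in reader:
        # Blank or missing cells become None so incomplete notes stay incomplete;
        # surplus cells (stored under the key None by DictReader) are dropped.
        row = {key: ((value or "").strip() or None) for key, value in raw.items() if key is not None}
        if not row["title"]:
            raise ValueError("every row needs a title")
        rows.append(row)
    return rows


def export_json(rows: List[Row]) -> str:
    return json.dumps(rows, indent=2)


def export_csv(rows: List[Row]) -> str:
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    for row in rows:
        # Write None back out as an empty cell.
        writer.writerow({k: "" if v is None else v for k, v in row.items()})
    return out.getvalue()


sample = "title,origin,birth_year\nGeorge Fordyce,Aberdeen,1736\nHenry Dunston,England,\n"
rows = import_csv(sample)
print(rows[1])  # {'title': 'Henry Dunston', 'origin': 'England', 'birth_year': None}
print(export_json(rows))
```

Mapping empty cells to `None` (JSON `null`) keeps "unknown" distinct from an empty string, which matters once the exported data is loaded into Tableau, Python, or R.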

The schema of the database is not specific to our current project, nor to the period 1760-1840. It could be readily adapted for use by other historians of chemistry (and alchemy) and historians of other sciences. If you were to dump out the 5,000 records that we have entered into the system, you could convert the project into a Sites of Archeology database and record the digs of the 19th and 20th centuries. You could just as easily record the observatories, telescopes, and astronomers of the 16th and 17th centuries or plot the biological specimens collected by Linnaeus’s correspondents.

While we certainly hope that our database will be used by and be useful to historians of chemistry, the real point of the project is to enable the collaborative epistemology proposed by Vannevar Bush. History, like the humanities more generally, is dominated by the single-author article and monograph, so a system built to pool research notes may seem counterintuitive. However, we need to remember that the point of these publications is to share our knowledge. If we all share our coffee-stained notebooks, idiosyncratic Excel files, and shoeboxes full of notecards, we can engage in deeper and more nuanced studies in the history of chemistry and of science more broadly. Without sacrificing traditional academic products, we can collectively populate searchable, interlinked reference guides that will accelerate research and model our methodologies for generations to come.

Please visit our site to learn more about the project and let me know if you would like to set up an account or get a copy of the code.
