Project Andvari Workshop: constructing a thesaurus

The Project Andvari team met on November 7 and 8, 2014. The workshop had three tasks: to create a basic thesaurus, to fulfill a ‘proof of concept’ requirement (i.e., create a pilot project), and to discuss future steps in the evolution of the project.

When completed, Project Andvari will be an online database for scholars and the public to search for pre-Christian images from the medieval Norse, Nordic, Anglo-Saxon, and northern European traditions, covering roughly 400 to 1200 AD. For more background information regarding the history and parameters of the project, grants received and submitted, individuals involved, and other delightful information, go to the blog.

 Friday, November 7th, 2014

Lilla Kopár, co-project director, convened the meeting and gave an itinerary for the next two days. Her first big question was: can we do this project at all? At the 2013 workshop we discussed the digital tool we wanted to build: a portal for art and artifacts in medieval northern Europe. We want to go beyond the traditional boundaries. Terminology and chronological boundaries within the 4th to 12th centuries will be based on the geographical context.

Constructing the thesaurus (l to r): Samuel Russell, Daniel Pett, Danielle Joyner, and Nancy Wicker.

The aims, scope, data plan, and future work were discussed at last year’s meeting. The NEH grant we received paid for this workshop. A new grant proposal was submitted recently; if it is accepted, a pilot project will be created. The original goal was to create an iconographical database for Norse mythology. This proved too large and difficult, so the scope was scaled down to artifacts of the early medieval period.

The beauty, originality, and significance of this project lie in the fact that no controlled vocabulary exists for an iconographical database of this kind. Nancy Wicker mentioned that there are models for Christian databases but none for a Norse iconographical database. A number of students in my LSC634 Digital Humanities course researched whether a thesaurus existed that would support such a database, but to no avail. However, they did find some information about constructing thesauri for image databases.

Friday’s meeting determined the best way to approach this part of the project. On Saturday, we would construct the thesaurus template.

Things to consider

Joseph stated that we have broad concepts and specific questions that must be answered. Do we want a general database for the public or something advanced for scholars? The vocabulary should be broad enough to cover the content conceptually, but not so granular that it becomes a Herculean task: broad applicability with as little loss of granularity in describing the icons as possible. Lilla would like the project to be a discovery tool, something to lead the public and scholars to other resources.

Daniel Pitti mentioned that a high level of granularity may be unnecessary. Chris mentioned that Cambridge had databases that were ‘dirty’ and had to be recreated every few years; we want to avoid this scenario. Further requests to Cambridge for information on particular items led to ‘I don’t know’ responses. Daniel Pitti mentioned that a standard vocabulary would be a solid basis. Will the quality of the descriptions vary? Yes, but standardization is a good thing. Cross-referencing the concepts can be done later.

Joseph mentioned a gap in the resources. We are trying to create an access point for research and discovery. We will be using SKOS (Simple Knowledge Organization System), with its inherent hierarchy and support for multilingual labels. For example, do we label something a serpent or a dragon (or both)? Another example: an arm or a limb? Marcus Smith mentioned that if an image is tagged as a ship, we could incorporate a ship ontology into the system, which would save time and effort. One question: how broad does this need to be? We have an idea of how useful it will be (the ‘overfishing’ approach), but we would like feedback on how broad we have to go. Ideally, we want to link into existing iconography without covering everything. The goal is to have something functional for a pilot: big enough to be functional but not so big that it drives us insane. The Franks Casket is the paradigm case of how quickly description can become complicated: it has many images, stories, and symbols that can be tagged.
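To make the SKOS idea more concrete, here is a minimal Python sketch of how concepts, multilingual labels, and broader, narrower, and related links might fit together. It does not use Pool Party, Protégé, or any SKOS library; the `Concept` and `Thesaurus` classes, the plain string ids (standing in for SKOS concept URIs), the alternative label "snake", and the German and Swedish labels are all illustrative assumptions. Modeling serpent and dragon as two separate but related concepts is only one of the options the group discussed.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical stand-in for a skos:Concept; not a real SKOS library."""
    concept_id: str                                   # stands in for a concept URI
    pref_labels: dict[str, str]                       # language tag -> skos:prefLabel
    alt_labels: dict[str, list[str]] = field(default_factory=dict)  # skos:altLabel
    broader: set[str] = field(default_factory=set)    # ids of skos:broader concepts
    related: set[str] = field(default_factory=set)    # ids of skos:related concepts


class Thesaurus:
    def __init__(self) -> None:
        self.concepts: dict[str, Concept] = {}

    def add(self, concept: Concept) -> None:
        self.concepts[concept.concept_id] = concept

    def relate(self, a: str, b: str) -> None:
        # skos:related is symmetric, so record the link in both directions
        self.concepts[a].related.add(b)
        self.concepts[b].related.add(a)

    def narrower(self, concept_id: str) -> list[str]:
        # skos:narrower is the inverse of skos:broader; derive it instead of storing it twice
        return sorted(c.concept_id for c in self.concepts.values()
                      if concept_id in c.broader)

    def label(self, concept_id: str, lang: str) -> str:
        # fall back to the English preferred label if the language is missing
        labels = self.concepts[concept_id].pref_labels
        return labels.get(lang, labels["en"])

    def lookup(self, term: str, lang: str) -> list[str]:
        # match a search term against preferred and alternative labels in one language
        term = term.casefold()
        hits = []
        for c in self.concepts.values():
            names = [c.pref_labels.get(lang, "")] + c.alt_labels.get(lang, [])
            if term in (n.casefold() for n in names):
                hits.append(c.concept_id)
        return sorted(hits)


# Hypothetical sample entries in the three languages listed for Pool Party
t = Thesaurus()
t.add(Concept("creature", {"en": "creature", "de": "Geschöpf", "sv": "varelse"}))
t.add(Concept("serpent", {"en": "serpent", "de": "Schlange", "sv": "orm"},
              {"en": ["snake"]}, broader={"creature"}))
t.add(Concept("dragon", {"en": "dragon", "de": "Drache", "sv": "drake"},
              broader={"creature"}))
t.relate("serpent", "dragon")

print(t.narrower("creature"))  # ['dragon', 'serpent']
print(t.label("dragon", "sv"))  # drake
print(t.lookup("snake", "en"))  # ['serpent']
```

Keeping one concept per idea and hanging the language-specific labels off it is what lets a single set of tags serve English, German, and Swedish users at once.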

Daniel Pitti mentioned that we should define scope and decide how far we have to ‘drill down.’ Daniel asked: how important are tags?

Lilla kept the meeting moving forward by mentioning that we had other questions to answer. We need to follow existing standards rather than duplicate them, and we need interoperability: a codified vocabulary that is interchangeable and interoperable. We do not want to reinvent the wheel. We looked at the head categories of Iconclass as an example, but it has the same pitfalls that we have seen in our own project. The Iconclass outline demonstrates the complexity of assigning subjects and tags. Some subjects and tags will overlap; there is no way around it. Joseph mentioned that relics from later periods will require a lot more work in terms of detailed description. We want our project to have a structure and rules independent of the content.

Creating a classification can go two ways:

  1. cataloging an identifiable object
  2. browsing when the researcher doesn’t know exactly what she wants.

Do we have a list of object types? Yes, from the British Museum’s Portable Antiquities Scheme.

What would you use it for?
Lilla stated that we will create our scheme and test it against approximately 80 photographs to see what works, where the roadblocks are, etc.

Maintenance and Improvement Policies
The Institute for Advanced Technology in the Humanities will maintain the database. One question is who has the right or privilege to add content and make the database grow. We also need a method for resolving cases where authorities disagree. Another issue is ensuring that we have persistent identifiers and that they follow industry standards.

Tools discussed

  • Pool Party will be used to manage our thesaurus terms (broader, narrower, and related concepts). The thesaurus will be in English, German, and Swedish. However, Pool Party is very expensive.
  • Protégé, from Stanford, is a cheaper alternative.
  • The CIDOC CRM (Conceptual Reference Model) was also discussed. It describes objects and models the real world, and the parts of a description, in a broad way, expressing a description as a series of statements.

Daniel Pett mentioned that they used crowdsourcing at the British Museum (their Bronze Age Index cards could be useful). We could use a similar approach for translating terms into different languages. Two suggestions for crowdsourcing:

Daniel Pitti mentioned that we should carve out a small part of the world and rule it! Our project should be unique and complement other sources (e.g. the Deutsche Digitale Bibliothek, Europeana Library, etc.).

Conclusions for Day One
In wrapping up, Lilla asked each participant to take one concept from the proto-thesaurus list devised earlier in the day to work on for Saturday. Karen stated that the scheme has three levels, in increasing order of abstraction: object representation, pattern, and religion. Erwin Panofsky’s Perspective as Symbolic Form, a seminal text on perspective in intellectual history, was mentioned as a template for thinking about the images. It was noted that motif and iconography are difficult to distinguish.

Saturday, November 8th, 2014

Several issues were raised in Friday’s meeting about the structure of the project:

  1. The technological aspect has to be addressed: feed Protégé into Pool Party and move on. We will start feeding Pool Party today, and it will be hosted by Daniel Pitti at his institute’s website (Institute for Advanced Technology in the Humanities).
  2. What do we do with Christian iconography, and how are decisions to be made?
  3. The content crowd should come up with a mission statement for our project. What is included and what is excluded?
  4. How shall we describe patterns?
  5. What shall we do with the mythology and individuals? What exactly do we want to restructure (i.e., the top categories)?
  6. Do we need an advisory board? Who will be involved, how do we divide up the work, and who will do the tagging?

Collecting the concepts for the thesaurus: Joseph Koivisto (standing) and Daniel Pitti

Where will many of our images come from? Daniel Pett stated that we have access to 44,000 images from the British Museum and 281 object types for medieval iconography; this can be pared down to what will be useful for our project. We also have access to 140,000 images from Kringla. Norwich Castle Museum has useful icons (an unknown number), especially in its Anglo-Saxon and Viking Gallery. Do we add the Medieval Coin database at the Fitzwilliam Museum? Chris also asked whether we will link to the museums’ collections or aggregate the data; Worthy stated that we will do both. Daniel Pitti stated that we will keep identifiers in a table and decide which ones to apply. For our pilot project we should have 1,500 to 2,000 images for demo purposes. The Museum of Ireland has possible sources, but there are too many practical and administrative problems. The National Museum of Scotland is more receptive.

What will be the geographical and chronological scope? The answer is northern Europe: Scandinavia, the British Isles, Denmark, and Germany. Scotland and Ireland can be included as well. Ideally, we want to expand eastward to the whole Baltic rim. Finally, how far south shall we go: to southern Germany, for example?

Panofsky’s work had been discussed the previous day. Karen and Lilla provided handouts explaining how Panofsky’s concepts apply to the categories, his geographical limitations (for Panofsky, ‘northern’ meant the Netherlands and northern France, which would not work for our project), and his focus on Christian iconography.

Next we discussed the main categories (top-down approach) before working on the descriptions of the icons (bottom-up approach). Worthy mentioned that we want to do ‘iterative refinement,’ moving back and forth between the two approaches. One can interpret an image with multiple terms. Daniel Petti asked: what are the possible ways of depicting a figure? As a category, named identities can sit at an abstract level, and they can be tied to literary traditions. Chris Roberts used the board to explore whether composition affects iconography. Generally speaking, it does not.

The general pattern was as follows:

General or Specific
Natural world
Built Environment

First, you ask what general category the image falls under. Second, you go to the specific category. For example, if animal is selected, then what types of specific animals are listed?
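The general-to-specific drill-down can be sketched as a small category tree. In this minimal Python example, only "Natural world", "Built Environment", and "animal" come from the workshop notes; every other category name and the `children` and `path_to` helpers are hypothetical illustrations, not the project’s actual scheme.

```python
from typing import Optional

# Hypothetical category tree: each key is a category, each value holds its more specific terms.
CATEGORIES: dict = {
    "Natural world": {
        "animal": {"bird": {}, "horse": {}, "serpent": {}},
        "plant": {"tree": {}},
    },
    "Built Environment": {
        "building": {"hall": {}},
    },
}


def children(path: list) -> list:
    """Return the more specific options under a path, e.g. ['Natural world', 'animal']."""
    node = CATEGORIES
    for step in path:
        node = node[step]  # raises KeyError if the path is not in the tree
    return sorted(node)


def path_to(term: str, node: Optional[dict] = None,
            prefix: Optional[list] = None) -> Optional[list]:
    """Depth-first search for the general-to-specific path that leads to a term."""
    node = CATEGORIES if node is None else node
    prefix = [] if prefix is None else prefix
    for name, sub in node.items():
        here = prefix + [name]
        if name == term:
            return here
        found = path_to(term, sub, here)
        if found:
            return found
    return None


print(children([]))                          # ['Built Environment', 'Natural world']
print(children(["Natural world", "animal"]))  # ['bird', 'horse', 'serpent']
print(path_to("horse"))                      # ['Natural world', 'animal', 'horse']
```

`children` mirrors the browsing step (pick a general category, then see its specific terms), while `path_to` runs the other way, recovering the broader categories of a specific tag so that, for instance, a search for "animal" could also surface images tagged "horse".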

What are the top categories? Descriptives and Interpretation.
Images can be assigned to more than one top category.

Figure human
Built [Environment]

We broke into groups and analyzed about 80 images of Viking and Norse iconography. Each group described each image as best it could using the schema decided upon above.
In the afternoon, we listed the main descriptors on the board. Chris, Nancy, and Karen expanded and refined the 129 categories originally gathered by the groups, reducing the list to 111 categories. This list was entered into an Excel spreadsheet, which will be the basis for further development.

Gathering the descriptors: Joseph Koivisto records the descriptors from the members

The categories will be entered into Protégé, and Daniel Pitti and Worthy will host it at their institute’s site. Daniel Pett will upload the database to the British Museum website as a crowdsourcing project. Danielle provided the mission statement.

The raw thesaurus: the basic descriptors for the thesaurus

Time Line and Next Steps
The second stage of the NEH DH grant proposal was submitted in September. We will know whether we are successful in March 2015. If successful, the grant will begin in May and run for 18 months. Joseph will be the official Project Assistant.

We will have access to Protégé beginning in December.

The crowdsourcing project will be run through the British Museum platform. For crowdsourcing, we will need the images and a robust thesaurus before it can be put on the BM web site. The crowdsourcing project will run from early January and we hope it will provide valuable feedback. We should have a project put together by late December.

We will need a sample of objects, ideally from the British Museum, Kringla, and the Norwich Castle collection’s Anglo-Saxon and Viking Gallery. The images should be broadly representative of the objects. Tagging and language transcription will be done by Daniel Pett at the British Museum. Crowdsourcing is meant to be provocative rather than definitive. Karen will have her students code 10 objects in her spring 2015 course. Daniel Pett demonstrated the crowdsourcing project using the MicroPasts platform.

More photographs are archived on Flickr.


