I recently attended a workshop on machine learning in libraries, Libraries Facilitating Cross-Disciplinary Research, sponsored by the University of Notre Dame's Hesburgh Libraries and held on May 31, 2019, in Washington, DC. The workshop brought together computer scientists, literary scholars, librarians, and others. The organizers received a planning grant from the Institute of Museum and Library Services (IMLS) to "assess the need for library-based machine learning and natural language processing tools to facilitate automated metadata creation and classification in support of cross-disciplinary discovery and research."
For library applications, take a look at chapter 16, by Alan Darnell, in Ken Varnum's New Top Technologies Every Librarian Needs to Know (Chicago: ALA/Neal-Schuman, 2019). Darnell gives examples of library applications, including collection management. Natural language processing can make books that are hard to catalog more discoverable. Categorizing collections that are difficult to catalog (music, perhaps?) is another possibility. Speech-to-text tools can assist with video captioning, and image recognition tools can help extract images from films.
The workshop had two primary objectives: first, to understand and document the current practices of each group; second, to examine topic modeling and other techniques that could facilitate discovery in library systems and collections.
Several projects were presented during the morning session. Here is a quick overview of one: "AMP: The Audiovisual Metadata Platform," presented by Jon Dunn of Indiana University as part of the Media Digitization and Preservation Initiative. Dunn's proposal is to use machine learning techniques to automate metadata creation for digitized and born-digital audiovisual materials. The slides below cover the sources used, the overall process, the archivists' workflow, and the benefits for scholars, staff, and students.
Challenges and Strategies
Some of the challenges and strategies discussed in the afternoon session included:
The IMLS award, full proposal, and bibliography are here. The final white paper will be available here as well.
I gave a short presentation on the workshop to my CUA colleagues in August. Again, take a look at chapter 16, by Alan Darnell, in Ken Varnum's New Top Technologies Every Librarian Needs to Know (Chicago: ALA/Neal-Schuman, 2019). Its bibliography is especially informative.