Open data: promise, but not enough progress from G20 countries – Transparency International

Transparency International is an organization devoted to stopping the abuse of power, bribery and secret deals perpetrated by government, individuals, and businesses:

“Open data is a pretty simple concept: governments should publish information about what they do – data that can be freely used, modified and shared by anyone for any purpose.

“This is particularly important in the fight against corruption. In 2015 the Group of 20 (G20) governments agreed on a set of G20 Anti-Corruption Open Data Principles. These principles aim to make crucial data public specifically because they can help stop corruption. Publishing this data would allow civil society to monitor things like the use of public resources and taxes, the awarding of public contracts, and the sources of political party finance. It would make it easier to hold governments to account and deter criminal activities like bribery and nepotism.”

Transparency International has 6 Principles of Open Data:

(from their website)

For more information on the challenges of getting open data, check out their most recent post:

Bibliographic Description & DH in Libraries


Jean Bauer (Associate Director of the Center for Digital Humanities at Princeton University) gave an excellent presentation titled This Just Got Meta: Bibliographic Description, Data Visualization + Digital Humanities in Libraries at American University, Bender Library on April 21st. The talk was part of their Colloquium on Scholarly Communication series. Bauer talked about the relationship between library catalog data, archival data and digital humanities projects. How do DH scholars use library metadata for their research and DH projects? How can these projects be used to showcase library material? How can librarians help DH scholars using Linked Open Data methods?

Bauer’s background is in American History, with hands-on experience building databases. Early in her career, while pursuing her Ph.D., Bauer worked on projects such as Project Quincy, Documents Compass, and the Dolley Madison Social Events Database. She came to realize that there are many ways a person interacts with the world, and that the bibliographical information documenting this could be shared through linked open data. She has created several annotated diagrams reflecting these complexities.


Here we have the realization that this data has been here for a long time but we just didn’t notice. Library catalog records predate relational databases, XML, and other formats.

Bauer echoed the point made by Deb Verhoeven in her paper Doing the Sheep Good: Facilitating Engagement in Digital Humanities and Creative Arts Research: “…we don’t just learn from the data itself but also from the way that data is used and reused.”

Bauer talked about DPLA and the many apps created to use and manipulate its metadata.

Yet all data and metadata are theory-laden. Bourg and Sadler, in their article Feminism and the Future of Library Discovery, write about the myriad ways “the practices of libraries and librarians influence the diversity (or lack thereof) of scholarship and information access” (worth the read!).

Bauer used the example of the Mapping Colonial Americas Publishing Project (created by Jim Egan and Jean Bauer) that uses the American Antiquarian Society database (downloaded in a CSV file, 40,000 records) and the Brown University Library catalog.
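As a rough illustration of what working with such a catalog export can look like, here is a minimal Python sketch that tallies subject headings by place of publication. It is not the project's actual code: the column names (title, place, year, subjects), the semicolon separator, and the four sample rows are all invented for illustration.

```python
import csv
import io
from collections import Counter

# Made-up sample rows standing in for a catalog CSV export.
# The column names and the rows themselves are assumptions, not real records.
SAMPLE_CSV = """title,place,year,subjects
A Sermon Preached at Boston,Boston,1701,Sermons; Theology
An Almanack for the Year 1702,Boston,1702,Almanacs
Poems on Several Occasions,Philadelphia,1744,Poetry
A Discourse on Government,Philadelphia,1749,Political science; Sermons
"""


def count_subjects_by_place(csv_text):
    """Return {place: Counter of subject headings} for the catalog rows."""
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        place = row["place"].strip()
        # Subject headings are assumed to be separated by semicolons.
        subjects = [s.strip() for s in row["subjects"].split(";") if s.strip()]
        counts.setdefault(place, Counter()).update(subjects)
    return counts


by_place = count_subjects_by_place(SAMPLE_CSV)
for place, subjects in sorted(by_place.items()):
    print(place, dict(subjects))
```

A tally like this is one simple way to start asking what subjects can tell us about genres in a given printing center; a real export would need its own column names and cleanup rules.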

Once this open data was linked, the project posed a series of research questions: What can subjects tell us about genres? How do library catalogs shape literary fields? How do collections reframe book history? What should be the [eye]space of colonial publishing? Can literary history be told visually? Can we investigate zones of printing? Can we build visual portals for students and scholars?

Bauer views DH projects as specialized libraries designed to answer specific humanities research questions, and library catalogs can in turn be used by humanities scholars to answer research questions. Book history requires specialized catalog records, whereas the move to simplify library catalogs for public consumption (and you don’t want to anyway: too time consuming, too expensive, and not needed) may impact humanities scholarship in the future. Data can be publicly shared in repositories like

Bauer’s talk was especially insightful in that she showed how her ideas evolved over the years and how they took form in the various web sites she created. It is rare to create something and have it retain its original form over the years. The manifestation of a project evolves over time as new skills are acquired, new people are brought on board, and the ad hoc method of trial and error gives way to a more mature process.

UPDATE: The entire talk is available on YouTube:


Scholarly Publishing and the Open Access Ecosystem

As part of the Catholic University of America Libraries speaker series on Open Access for International Open Access Week, our final presentation took place on October 28th, 2015 at the Busboys and Poets across the street from CUA. The panel discussion was devoted to answering the question, “What do scholarly authors and researchers need to know about Open Access?”

Kim Hoffman, Coordinator of Scholarly Communications at Catholic University, introduces the panel.

Moderator: Dr. Rikk Mulligan, ACLS Public Fellow and Program Officer for Scholarly Publishing, Association of Research Libraries.

Faculty Panel:

  • Dr. Trevor Lipscombe, Director of the Catholic University of America Press
  • Dr. James Greene, Vice Provost and Dean of Graduate Studies, The Catholic University of America
  • Dr. Jennifer Paxton, Assistant Clinical Professor, Department of History, and Assistant Director, Honors Program, The Catholic University of America

Rikk provided a historical overview of Open Access (OA). Many of his points can be found in the excellent overview Transformation of Scholarly Communications, Research Library Issues, no. 287 (2015), published by the Association of Research Libraries.

Scholarly communication began in 1665 with the Royal Society of London collecting notes and letters from members. These items were published in the Philosophical Transactions of the Royal Society. As scientific research increased, more journals began to sprout up. Peer review and editorial work were specialized tasks done by a few individuals on a non-profit basis. Given the small size of the audience, sales of the journals were modest and generally not sufficient to cover production and labor costs. Consequently, much of the labor became essentially an act of reputation and prestige (this process later evolved into assessment, promotion, and tenure for faculty). Universities became non-commercial publishers as scholarship and research expanded and increased in output.

Public funds are given to libraries to purchase journals. When budget cuts took place in the 1970s, commercial interests bought up small publishers, which has led to the situation we have today: most scholarship is owned by commercial publishers. The publishers created bundled packages of journals, which has not satisfied anyone: a respected journal is bundled with ‘lesser’ journals, and librarians are forced either to buy the entire bundle or to decline it, a decision that frustrates scholars. The consequence of all this is that prices continue to increase. Hence the rise of the Open Access movement.

By the 1990s, scholarly publishing was in a state of flux. Print copies were declining while electronic formats were rising. arXiv, a repository of scientific preprints, was an early adopter of OA. Humanities projects like Valley of the Shadow were early accomplishments in the Digital Humanities realm. While the HTML coding on this project has become obsolete, the data is still useful.

l to r: Rikk Mulligan, Trevor Lipscombe, James Greene, and Jennifer Paxton

The discussion focused on where we are today. Dr. Paxton talked about the concept of simony in the medieval era, the sin of paying for the gifts of the Holy Spirit. If the masters could not take payments, they were allowed to take gifts, and this is how they survived. The analogy applies to the situation with faculty and modern scholarship: scholars don’t need to be compensated because they are being taken care of by their universities. The limitations of this process can be seen with the idea of subvention fees: a) some universities can pay the fees, others cannot; b) adjunct faculty, who make up an increasingly large part of the teaching load, are often shut out of the entire scholarly process (and can’t afford to pay the fees either).

Dr. Greene asked a pertinent question: is OA a good thing? Gutenberg was the OA of his day. Writings were highly restricted before the printing press came along (assuming one could read). Once printing became prevalent, the dissemination of knowledge increased significantly. OA should be discussed in the context of the dissemination of knowledge. Dr. Lipscombe mentioned that complaints about the quality of scholarly work were present even in medieval times. The philosopher Roger Bacon (1214-1294), a noted publisher, lamented the quality of output by his minions. It was a common notion of the time that men who were married wouldn’t be able to devote themselves exclusively to manuscript production since their minds would be divided between their work and home life. Lipscombe noted that there have always been competing interests among publishers, editors, and scholars, as well as the notion of scientists versus humanists.

Question: Is there a hierarchy of the access to knowledge?

Paxton mentioned that we have a specialized body of knowledge. Most people have access to a library, and cannot consume knowledge without one (to a certain extent). Greene stated that most people have access to the internet. He used the example of Gopher, a system from the early 1990s for storing and displaying files on a server that predates the WWW. It was immensely popular and an early example of OA until the University of Minnesota started charging for it. Consequently, the volume plummeted and the WWW took off. Lipscombe mentioned that independent scholars and adjunct faculty don’t have access to money to publish. Mulligan added that not only independent scholars but also scholars with PhDs who go the Alt-Ac career route don’t have access to subvention fees. Could we not have a fee waiver, he asked?

Lipscombe remarked that some publishers have acknowledged the problem. For example, Project Muse has given access to scholars in developing nations if access fees are an institutional hardship. Greene mentioned that Catholic University does have a fund for subvention fees but noted that this doesn’t guarantee OA. Another issue is citation relevancy. Greene noted that 18% of physics articles are behind paywalls (82% are not), yet OA articles receive far more citations than closed articles. Last, he noted that OA speeds up innovation. For example, the speed of innovation in Android is greater than in the iPhone because anyone can gain access to the Android operating system and improve it. Paxton made the point that many different versions of a manuscript can exist, yet there is no substitute for quality control; who, in the end, pays for it?

Question: How do you increase the positive perception by university administration of OA?

Greene stated that producing tangible results in scholarship would impress administrators. Greene took an example from the medical sciences. Francis Collins, former director of NIH, wrote an article detailing how a chromosome of a particular gene could be cloned. Another researcher saw the article and contacted him to collaborate on this project. Within a year they had solved the problem and made an advancement to fighting a disease. Paxton mentioned that publishers should be more transparent in costs.

The session wrapped up with some comments and questions from the audience:

OA looks different for humanities vs the sciences. How exactly? Scientists are more interested in publishing articles, as the speed of scientific innovation is much greater than in the humanities. Preprints (arXiv) are especially valuable. Scientists are interested in grants, charges, and fees as they impact the scholarly process. In the humanities, the dominant mode of scholarly transmission is the monograph, which takes longer to produce.

Someone mentioned that OA allows scholars to translate works into other languages.

Open Access and Institutional Repositories: the DRUM Experience

As part of the Catholic University of America Libraries speaker series on Open Access for International Open Access Week, Terry Owen, Digital Scholarship Librarian from the University of Maryland, College Park, gave a talk on October 20, 2015 on institutional repositories (IRs). While he focused on the Digital Repository at the University of Maryland (DRUM), he also discussed the general issues and problems faced by any academic institution that wants to set up an IR. His comments are summarized below.
OA literature can be defined as “…digital, online, free of charge, and free of most copyright and licensing restrictions.” (Peter Suber, 2004)
SPARC has their definition: “Open Access is the free, immediate, online availability of research articles, coupled with the rights to use these articles fully in the digital environment.”
Mr. Owen defined an institutional repository as “a digital collection capturing and preserving the intellectual output of a single or multi-university community.”
The Directory of Open Access Repositories (OpenDOAR) lists over 3,000 academic IRs.
Terry Owen describes how DRUM came about

Digital Repository at the University of Maryland (DRUM)

  • Initial proposal to Provost from ULC May 2003
  • Mission: to store, index, distribute, and preserve the research works of UMD faculty
  • Developed using DSpace
  • Open source
  • Active user community
  • Out-of-the-box implementation
DRUM was launched in August 2004 with more than 1,100 documents. As of October 2015, the repository contains more than 16,500 documents: 10,744 theses and dissertations and 5,897 faculty and student papers and projects. The documents have been auto-added for many years.
Having good policies on managing an IR can save time and effort.

DRUM policies:

  • Depositor must be a UMD faculty member
  • Depositor must have the ability to assign needed rights to UMD
  • Deposits must be substantial works of scholarship
  • Deposits must be complete and able to stand alone as a work or collection (no notes, etc.)
  • No restrictions on the formats

Faculty outreach is the key to getting faculty involved. Terry offered the following suggestions:

  • Campaign to faculty offering to deposit papers on their behalf and check copyright permissions
  • Participate in new faculty orientation
  • Open access week
  • Work extensively with library liaisons

Faculty need to know their rights and obligations. The SHERPA/RoMEO website satisfies this need by cataloging publisher copyright policies and self-archiving limits.

Faculty needed to be convinced to deposit in DRUM. Faculty had many concerns:

  • Redundancy
  • No time
  • Copyright confusion
  • Fear of plagiarism
  • Work associated with inconsistent quality
  • Publishers might not accept articles for publication if freely available

However, there are advantages:

  • Research widely available
  • Greatly increases the chances of the research being cited
  • Access is maintained with a permanent URL
  • No need to maintain files or changing URLs on personal websites
  • Easy to deposit works along with associated content
  • Research is quickly and easily available from any computer at any location
  • Number of downloads shows impact of research
DRUM and other IRs have focused on adding grey literature from:
  • Research centers/institutes
  • Undergraduate research
  • Research data
  • Electronic theses and dissertations (ETDs)

For DRUM, depositing ETDs has been mandatory since September 2003. The documents are submitted via the web in PDF form, are automatically deposited in DRUM, and are still available in the ProQuest Dissertations and Theses Global database.

Students have embargo options:

  • Restrict access for one year
  • Seek patent protection for material in the thesis
  • Restrict access for six years if the work is published in a journal that has restrictions on depositing in an open access repository
  • Publish a book based on the research
  • Restrict access indefinitely (requires written approval by the Dean of the graduate school)
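To make the embargo lengths concrete, here is a minimal Python sketch of how a release date could be computed from a deposit date. It is an illustration only, not how DRUM or DSpace implements embargoes: the option labels and the function name are hypothetical, and only the three lengths named in the list (one year, six years, indefinite) are taken from the talk.

```python
from datetime import date

# Embargo lengths in years, based on the options listed above.
# None marks an indefinite embargo. The option labels are hypothetical.
EMBARGO_YEARS = {"none": 0, "one_year": 1, "six_years": 6, "indefinite": None}


def release_date(deposit_date, option):
    """Return the date a thesis becomes openly available, or None if never."""
    years = EMBARGO_YEARS[option]
    if years is None:
        return None
    try:
        return deposit_date.replace(year=deposit_date.year + years)
    except ValueError:
        # A Feb 29 deposit landing in a non-leap year falls back to Feb 28.
        return deposit_date.replace(year=deposit_date.year + years, day=28)


print(release_date(date(2015, 10, 20), "one_year"))
print(release_date(date(2015, 10, 20), "six_years"))
print(release_date(date(2015, 10, 20), "indefinite"))
```

Returning None for the indefinite case keeps "never released" distinct from any real date, so a caller has to handle it explicitly.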

Terry has found that student embargo requests in DRUM average 39% per semester, which seems high (Terry does not know why).

An Open Access proposal was submitted to the UMD Academic Senate in 2009:

The Resolution

  • The president should collaborate with other universities
  • Library should keep faculty informed and assist in negotiating copyright arrangements
  • Researchers are encouraged to publish in OA journals, negotiate to retain the right to post, and consider journal price
  • Researchers are encouraged to deposit in DRUM

However, the senate voted in April, 2009 and turned it down!

What did we learn?

  • Need a clear message
  • Focus on one aspect: self archiving
  • Don’t assume faculty know about OA
  • Senate not the best place to start
  • Build support from the ground up
  • Customize message to fit needs and interests of each department

It should be noted that Open Access is different for the sciences and the arts/humanities.

Open Access for the arts and humanities:

  • Books play a larger role than journal articles
  • Demand for articles declines slowly and sometimes grows
  • The period of use and citation for journal articles is typically much longer
  • Timely access to journal research not as critical as in the sciences
  • Little government funding

What’s happened since at the UMD?

  • Invited to faculty meetings
  • University Library Council spent majority of 2009 to 2011 discussing OA
  • Established joint Provost/Senate open access task force in 2012
  • UMD signed the Berlin Declaration February, 2013
  • Established OA publishing fund September, 2013

The session ended with questions from the audience.

Update: December 8th, 2015: Video of the lecture is on YouTube: