XML oXygen Editor Users Meetup

I was in Rockville, Maryland on May 5th for an all-day meeting of the DC-area oXygen XML Editor users. It was hosted by George Bina and Radu Coravu, who outlined the new features in oXygen version 19, released last month. This meeting was special in that it doesn’t happen every year, so I availed myself of the opportunity.

Many Improvements to DITA

DITA reusable components view, insert DITA Key References


Improvements included converting DITA to PDF using CSS; DocBook 5.1 schema and stylesheet updates; Markdown support; XSLT improvements with Saxon; converting between multiple xsl:if and xsl:choose; using partial XPath paths to create new code and templates; and other refactoring features. Many of the improvements are designed to speed up workflow. The TEI schemas were updated to version 3.1.0. Web Author improvements include CMS connectivity.

User Assistance with Schematron

Schematron is an ISO standard (ISO/IEC 19757), part of DSDL (Document Schema Definition Languages). It is a very simple schema language, with fewer than 10 main elements and 20 elements in total. It is also a different kind of schema: it defines business rules, not the document structure, and the error messages are specified inside the schema. Schematron uses XPath to match and assert; XSLT can be used to extend XSLT-based Schematron implementations; and SQF provides quick fixes for identified issues, defined as small scripts annotating the Schematron assertions. Schematron can help you in the authoring phase of a project instead of at review time, publishing time, or production time. Identify and fix the problems as early as possible, says George! There is also an intelligent style guide project for the Dynamic Information Model: https://github.com/dim
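To make the "business rules, not structure" idea more concrete, here is a rough Python analogue of a single Schematron-style rule: a context that selects elements, a test that must hold, and an error message stored alongside the rule. This is only a sketch of the concept, not Schematron itself and not how oXygen implements it; the element names (`topic`, `section`, `title`) and the rule are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical business rule: every <section> must have a non-empty <title>.
# Like a Schematron assert, the rule carries its own error message.
RULES = [
    {
        "context": ".//section",  # which elements the rule applies to
        "test": lambda el: (el.findtext("title") or "").strip() != "",
        "message": "A section must have a non-empty title.",
    },
]


def validate(xml_text):
    """Return the messages of all rules that fail somewhere in the document."""
    root = ET.fromstring(xml_text)
    errors = []
    for rule in RULES:
        for element in root.findall(rule["context"]):
            if not rule["test"](element):
                errors.append(rule["message"])
    return errors


doc = (
    "<topic>"
    "<section><title>Intro</title></section>"
    "<section><title> </title></section>"
    "</topic>"
)
print(validate(doc))
```

A real Schematron schema would express the context and the test as XPath expressions in XML, and SQF would attach a fix to the failed assertion.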

Some rules:


Learn DITA from a Markdown perspective
Markdown is a text-to-HTML conversion tool for web writers. The tool presented recognizes Markdown fragments in DITA topics and converts them automatically to DITA markup. The code is on GitHub: https://github.com/oxygenxml/ditamark. Recognized Markdown patterns include lists, quotes, links, images, tables, and titles.
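As a small illustration of what recognizing a Markdown pattern and converting it to DITA might involve, the sketch below turns a Markdown bullet list into a DITA `<ul>`. It is not taken from the ditamark code; the function name and the regular expression are my own assumptions, and only the simplest list case is handled.

```python
import re
from xml.sax.saxutils import escape


def markdown_list_to_dita(text):
    """Convert lines like '* item' or '- item' into a DITA <ul> element."""
    items = []
    for line in text.splitlines():
        match = re.match(r"^\s*[-*+]\s+(.*)$", line)
        if match:
            items.append(match.group(1).strip())
    if not items:
        return ""  # nothing that looks like a Markdown list
    list_items = "".join("<li>{}</li>".format(escape(item)) for item in items)
    return "<ul>{}</ul>".format(list_items)


print(markdown_list_to_dita("* apples\n* pears & plums"))
```

The other patterns mentioned (quotes, links, images, tables, titles) would each need their own matcher and their own DITA target element.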

Schematron QuickFix (SQF) can be found here: https://github.com/schematron-quickfix/sqf. SQF is a simple language to use (only 4 actions!).

Changing  XML with XSLT and XQuery

Implement Author actions and refactoring options

XML refactoring actions allow making repetitive structural changes in XML documents based on specific use cases.

Lightning Talks

There were some short presentations on oXygen XML Editor features and applications.

oXygen for training sessions

JATS Support in Oxygen XML Editor

Wendell Piez from Piez Consulting Services provided an overview of JATS support. A wiki is available: http://jatswiki.org/wiki/Tools#JATS_Framework_for_oXygen_XML_Editor .

Testing XSLT

XSpec is a unit test and Behaviour Driven Development (BDD) framework for XSLT and XQuery. https://github.com/xspec/xspec/wiki

Code is available at GitHub: https://github.com/xspec/xspec

Three common ways of testing XSLT: matching scenarios, named scenarios, and function scenarios.

Discover the Author mode

For developing XML-based languages like:

Making XML Editable on the Web

XAAS: XML Authoring as a Service

oXygen XML Web Author = A REST service to interact with XML content!


  • URL: pointing to the file to edit
  • ditamap: pointing to a DITA map for editing context
  • …: more parameters are available
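A link that opens a file in the web editor could be assembled along these lines. This is a sketch only: the base address is a placeholder, the lowercase `url` spelling of the first parameter is my assumption from the list above, and the file locations are invented.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Placeholder address of a Web Author deployment (an assumption, not a real server).
BASE = "https://example.com/oxygen-xml-web-author/app/oxygen.html"


def editor_link(file_url, ditamap_url=None):
    """Build a link that asks the editor to open file_url, optionally in a DITA map context."""
    params = {"url": file_url}
    if ditamap_url:
        params["ditamap"] = ditamap_url
    # urlencode percent-encodes the values so they survive inside a query string
    return BASE + "?" + urlencode(params)


link = editor_link(
    "https://example.com/docs/topic.dita",
    "https://example.com/docs/guide.ditamap",
)
print(link)
print(parse_qs(urlparse(link).query))
```

Because the editor is addressed by a plain link, a CMS or review tool can generate such links for any file it manages.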

Integration is key

A web XML editor is NOT simply a way of opening XML for editing in a browser. The real power of a web editor can be seen when it is integrated into your workflow.


xproc.org and exproc.org are similar services (GitHub, Travis).

Integration is the key for web XML authoring!

Discover the oXygen Collaboration Platform

Why oXygen Content Fusion?

You need to:

  • collaborate
  • have access to specific tools
  • work within a specific workflow
  • access a repository
  • approve changes before reaching the repository
  • have a less formal way of receiving feedback.

Content Fusion allows you to create a review/collaboration task, share the task’s URL with your contributors, and get their changes back. There is a Content Fusion Connector plugin for oXygen as well.

Using Saxon JS

Wendell Piez gave a presentation on using Saxon JS.

Future Plans

George Bina outlined future plans for oXygen. The summary can be found here: https://www.oxygenxml.com/events/2017/futurePlans.pdf


Science & Math in the Humanities

On Wednesday, October 19, at The Catholic University of America, our new Dean of the School of Arts and Sciences, Dr. Aaron Dominguez, gave a talk titled “Science & Math in the Humanities” to the Department of Library and Information Science. Dean Dominguez was previously the Associate Dean for Research and Global Engagement and a Full Professor of Physics and Astronomy at the University of Nebraska-Lincoln (UNL). Dr. Dominguez, whose area of research is experimental high energy physics, has a strong history of research and grant activity.

What connections can we make in the liberal arts? Specifically, can we connect creative mathematical and physical thinking with creative thinking in the humanities? If so, what is to be gained by this? He provided two examples: text analysis and network analysis.

Dean Dominguez started from his background in particle physics and how he thinks as a physicist.



Dean Dominguez has worked at the CERN Large Hadron Collider in Switzerland for many years. As physicists, he and his colleagues had to distinguish and separate the various types of particles in order to extract meaning from the noise. He noted that this strategy is also used in text analysis and other digital humanities areas.

How does this apply to digital humanities? Dean Dominguez used the humorous video ‘Kurt Vonnegut on the Shapes of Stories’ to illustrate that stories and narratives have structure that can be quantified and graphed.

Professor Matthew Jockers has tried to do just that: see if there are hidden structures in large bodies of text. Dean Dominguez has been friends with Jockers since they were colleagues at the University of Nebraska-Lincoln. Dean Dominguez mentioned that most of what followed in his lecture is from Jockers’s work, which can be found here:

Jockers’s blog: http://www.matthewjockers.net

Jockers’s book: http://www.matthewjockers.net/text-analysis-with-r-for-students-of-literature

Access to R package (“Syuzhet”) for text analysis: https://github.com/mjockers/syuzhet

I highly recommend Jockers’s book for folks interested in learning text analysis (and R!). I use the book in the digital humanities course that I teach in the Department of Library and Information Science at CUA.

Opinion and sentiment analysis is a hard problem. You will need to go through a body of text and ‘tag’ each word as positive, negative, or neutral. Lexicons of positive and negative words have been developed over the years and a researcher interested in doing this type of analysis should seek them out as it would be a real time saver. Dean Dominguez uses the following example:
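Separately from the talk's own example, the tagging idea can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions: the six-word lexicon is made up (real lexicons contain thousands of entries), and the sentence splitting is deliberately naive. It is not the Syuzhet implementation.

```python
import re

# Tiny made-up lexicon: +1 for positive words, -1 for negative words.
# Any word not listed is treated as neutral (0).
LEXICON = {"good": 1, "happy": 1, "love": 1, "bad": -1, "sad": -1, "hate": -1}


def sentence_scores(text):
    """Split text into sentences and sum the word sentiment values in each one."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    scores = []
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        scores.append(sum(LEXICON.get(word, 0) for word in words))
    return scores


print(sentence_scores("I love this happy day. The weather is bad. It is Tuesday."))
```

Run over a whole novel, the resulting list of per-sentence scores is the raw signal that gets plotted against narrative time.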


You can try this method with whole novels. Jockers used Portrait of the Artist as a Young Man as a test case:


The plot looks indistinguishable from the ‘noise’, so we use a low-pass filter (similar to the method outlined in the CERN slides above):


Try a Discrete Cosine Transform as a low-pass filter (a close relative of the Fourier transform):


The plot begins to stand out from the noise.
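The filtering step can be sketched in plain Python: transform the per-sentence scores with a discrete cosine transform, keep only the lowest-frequency coefficients, and transform back. This is a generic textbook DCT written for illustration, not the code from the Syuzhet package; the function names, the `keep` parameter, and the sample numbers are my own.

```python
import math


def dct(x):
    """Unnormalized DCT-II of a sequence of numbers."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct() above (a scaled DCT-III)."""
    n = len(coeffs)
    return [
        coeffs[0] / n
        + (2.0 / n)
        * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5) * k) for k in range(1, n))
        for i in range(n)
    ]


def low_pass(x, keep):
    """Zero every DCT coefficient except the first `keep`, then transform back."""
    coeffs = dct(x)
    filtered = [c if k < keep else 0.0 for k, c in enumerate(coeffs)]
    return idct(filtered)


noisy = [1.0, 3.0, 2.0, 5.0, 4.0]  # made-up sentiment values
print(low_pass(noisy, 2))          # a smoothed version of the same curve
```

Keeping only the first coefficient returns the overall mean; keeping all of them returns the original signal. The interesting plot arcs live in between, with just a handful of low-frequency components retained.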

Dean Dominguez asks: what about human readers?

  • How well does this simple approach correspond to real human readers’ interpretation of sentiment?
  • To debug your computer code or understand your physics equipment, you must calibrate it, i.e., it must reproduce the known correct answer.
  • Pay students to read novels and score the overall sentiment of each sentence as negative, neutral, or positive.
  • Compare with the computer-generated response.
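One simple way to make such a comparison is a correlation coefficient between the human scores and the computed scores. The talk did not say which measure was actually used, so the Pearson correlation below is my own choice, and the numbers are invented purely to show the mechanics.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Invented scores for six sentences: -1 negative, 0 neutral, 1 positive.
human_scores = [1, 0, -1, -1, 0, 1]
computer_scores = [1, 1, -1, 0, 0, 1]
print(round(pearson(human_scores, computer_scores), 3))
```

A value near 1 would suggest the algorithm tracks the human readers closely; a value near 0 would suggest it does not.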



“Proof of Principle is demonstrated,” says Dominguez:

  • “The fairly simple algorithm seems to capture the basic plot arc of the novel
  • Without filtering, it is lost in the high-frequency noise
  • Calibration signal exists (human coders)
  • Jockers has a 40,000 corpus of novels which he is running this on. Looking for archetypal plot arcs is doable
  • Similar technique could be used for related analysis on video and audio; suspense; threat”

How can we encourage such connections between scholars in the digital humanities?

While Dean Dominguez met Jockers personally through their work at the University of Nebraska-Lincoln, he noted that there are many such potential collaborations across disciplines.

The second example deals with network analysis. The question that needs to be asked: “How can we recognize existing and potential possible collaborations?” As a starting point, Dean Dominguez used the College of Arts and Sciences at the University of Nebraska, Lincoln, to see what types of relationships existed between the departments, the scholars, and the administration.

The College has 18 departments in the humanities, social sciences, and sciences. Dominguez wanted to know:

  1. “What does our externally funded research portfolio and engagement look like?
  2. Can we gain with further interdisciplinary research activities?
  3. What role do the existing interdisciplinary research centers play?
  4. What do the current research collaborations look like in terms of departments and individual PIs?
  5. What potential collaborations seem to be missing?”

To answer these questions, he turned to network analysis using Gephi, first outlining the methodology (what nodes, edges, closeness centrality, and so on are).
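For readers new to these terms: nodes are the things being connected (departments or people), edges are the links between them (for example, a shared grant), and closeness centrality measures how near a node is to all the others. The sketch below computes one common definition of closeness centrality with a breadth-first search. The department names and links are invented, and Gephi's own computation may be normalized differently.

```python
from collections import deque


def closeness_centrality(graph, node):
    """(number of reachable nodes) / (sum of shortest-path distances to them)."""
    distance = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    total = sum(distance.values())
    if total == 0:
        return 0.0  # isolated node: no other node is reachable
    return (len(distance) - 1) / total


# Invented collaboration network: an edge means two departments share a project.
graph = {
    "Physics": ["Math", "Chemistry"],
    "Math": ["Physics", "English"],
    "Chemistry": ["Physics"],
    "English": ["Math"],
}
for department in graph:
    print(department, closeness_centrality(graph, department))
```

In this toy network English is linked to the sciences only by way of Math, so it scores lower than Physics or Math, which is the kind of indirect link the talk goes on to discuss.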



The results on the next couple of photographs show relationships between departments. While some departments are linked ‘naturally,’ other departments are linked by way of another department. Are there opportunities available for collaboration between departments that are not directly linked but seem to have some commonalities?

The photo below shows the relationships between faculty. Notice the circular set of separate dots in the center of the photograph: these denote humanities faculty members. The lone humanities scholar is not a cliche!


Dean Dominguez hopes to apply this network analysis to the departments and faculty at Catholic University to see what underlying relationships exist.

In summary, Dean Dominguez demonstrated with two specific examples how creative thinking can take place between the sciences and the humanities and how hidden relationships can be discovered. CUA is the perfect place to do this given the liberal arts mandate of the University and the inherent interdisciplinarity of many of CUA’s programs. Dean Dominguez noted that since we are a Catholic school, ‘there is an orthogonal dimension as well: we ask what the meaning of things is, what is true, what is the right way to do it to help people.’

Forty people showed up for the colloquium, which suggests an increasing interest in DH research at CUA.

Bibliographic Description & DH in Libraries


Jean Bauer (Associate Director of the Center for Digital Humanities at Princeton University) gave an excellent presentation titled “This Just Got Meta: Bibliographic Description, Data Visualization + Digital Humanities in Libraries” at American University’s Bender Library on April 21st. The talk was part of their Colloquium on Scholarly Communication series. Bauer talked about the relationship between library catalog data, archival data, and digital humanities projects. How do DH scholars use library metadata for their research and DH projects? How can these projects be used to showcase library material? How can librarians help DH scholars using Linked Open Data methods?

Bauer’s background is in American history, with hands-on experience building databases. Early in her career, while pursuing her Ph.D., Bauer worked on projects such as Project Quincy, Documents Compass, and the Dolley Madison Social Events Database. She came to realize that there are many ways a person interacts with the world, and that the bibliographical information documenting this could be shared as linked open data. She has created several annotated diagrams reflecting these complexities.


Here we have the realization that this data has been here for a long time but we just didn’t notice. Library catalog records predate relational databases, XML, and other formats.

Bauer echoed the point made by Deb Verhoeven in her paper Doing the Sheep Good: Facilitating Engagement in Digital Humanities and Creative Arts Research: “…we don’t just learn from the data itself but also from the way that data is used and reused.”

Bauer talked about DPLA and the many apps created to use and manipulate its metadata.

Yet all data and metadata are theory-laden. Bourg and Sadler, in their article Feminism and the Future of Library Discovery, write about the myriad ways “the practices of libraries and librarians influence the diversity (or lack thereof) of scholarship and information access” (worth the read!).

Bauer used the example of the Mapping Colonial Americas Publishing Project (created by Jim Egan and Jean Bauer) that uses the American Antiquarian Society database (downloaded in a CSV file, 40,000 records) and the Brown University Library catalog.

Once this open data was linked, the project posed a number of research questions: What can subjects tell us about genres? How do library catalogs shape literary fields? How do collections reframe book history? What should be the [eye]space of colonial publishing? Can literary history be told visually? Can we investigate zones of printing? Can we build visual portals for students and scholars?

Bauer views DH projects as specialized libraries, designed to answer specific humanities research questions, and library catalogs can likewise be used by humanities scholars to answer research questions. Book history requires specialized catalog records, whereas the move to simplify library catalogs for public consumption (and you don’t want to anyway: too time-consuming, too expensive, and not needed) may impact humanities scholarship in the future. Data can be publicly shared in repositories like Opencontext.org.

Bauer’s talk was especially insightful in that she showed how her ideas evolved over the years and how they took form in the various web sites she created. It is rare to create something and have it retain its original form over the years. The manifestation of a project evolves over time as new skills are acquired, new people are brought on board, and the ad hoc method of trial and error ceases to be the heuristic as the process matures.

UPDATE: The entire talk is available on YouTube:


Poem Viewer: a DH tool for close reading

Poem Viewer is a data visualization tool for close reading of poems that you can upload.

“Poem Viewer was created as part of the project Imagery Lens for Visualizing Text Corpora. The project explored new visualization techniques for use in large scale linguistic and literary corpora using the collections of the British National Corpus and various smaller collections of poetry. The project was funded as part of the international ‘Digging Into Data Challenge’, funded by the JISC in the UK, and the National Endowment of the Humanities in the USA.”

Using W.B. Yeats’s poem ‘A Drinking Song’ as an example, the following snapshot shows the first two lines.


The researcher has the option of manipulating several features: layout and overview, phonetic units and attributes, phonetic features like vowel position and consonant features (voiceless and voiced), phonetic relations (e.g. rhyme, end rhyme, alliteration frequency), word units and attributes, and finally, semantic relations. The orange, blue, and grey boxes above represent vowels, consonants, and punctuation respectively. Here is a partial image of the panel of choices available:


The Oxford e-Research Centre at the University of Oxford created Poem Viewer as an experiment in analyzing poetry through data visualization.

If you are interested in more information about this tool, there is an open access article:

A. Abdul-Rahman, J. Lein, K. Coles, E. Maguire, M. Meyer, M. Wynne, C. R. Johnson, A. Trefethen, and M. Chen. Rule-based Visual Mappings – with a Case Study on Poetry Visualization. Computer Graphics Forum, 32(3):381-390, 2013. A video describing the poetry visualization tool is also available.

Thank you to my buddy, Kim Hoffman, for bringing this to my attention.