The Toolchain: First Pass

Today I’ve been kicking around the ICT office with Alex, figuring out how to make Jenkins (our wonderful CI server) build and publish the latest version of the CWD with all the bells and whistles like compilation of CSS using LESS, minification, validation of code and so-on. As part of this we managed to fix a couple of bits and pieces which had been bugging me for a while, namely the fact that GitHub commit notifications weren’t working properly (fixed by changing the repository URI in the configuration) and the fact that Campfire integration wasn’t working (fixed by hitting it repeatedly with a hammer).

This brought me to thinking about how our various things tie in together, so I set about charting a few of them up. After a while I realised the chart had basically expanded into a complete flowchart of the various tools and processes that hang together to keep the code flowing in a steady stream from my brain – via my fingers – into an actual deployment on the development server. Since it may be of interest to some of you, here’s a pretty picture:

This is (approximately) the toolchain I currently use for Orbital, including rough details of what is being passed around

The beauty of this is that the vast majority of the lines happen completely by themselves — I get to spend my days living in the small bubble of my local development server and dipping in and out of Pivotal Tracker to update stories. The rest is magically happening as I work, and the constant feedback through all our monitoring and planning systems (take a look at SplendidBacon for an epic high-level overview) means that the rest of the project team and any project clients can see what’s going on at any time.

Orbital and the OAIS reference model

In the Orbital project bid, I wrote,

We intend to re-use and develop some of the underlying tools we have built to provide an institution-wide service for the ingest, description, preservation and dissemination of research data, which is informed by the OAIS reference model.

My first encounter with OAIS was about seven years ago, when I was designing a digital archive for Amnesty International’s image, film and video archives. If you begin to do any design work in the digital archiving domain, you come across the OAIS model very quickly. It is the standard and though somewhat daunting, when you get your teeth into it, you realise that it does an excellent job of describing what any decent digital archive should be doing anyway.

The mistake to make with OAIS is looking at the model and thinking that you have to create a system that is designed in such a way, rather than functions in such a way. The OAIS standard is a tool that allows Archivists, Designers and Developers to share a common language when discussing and planning the implementation of a digital archive and what that archive should do, not how it should be designed.

Here is the high-level OAIS model. (Here is the full, composite model).

OAIS Functional Entities
OAIS Functional Entities

Here is a high-level model of Orbital.

Orbital design
Orbital design

They’re very different because they should be. Remember, the first is a functional model, the second is a logical model of Orbital’s server requirements.

Here’s another model I published recently relating to building staff profiles.

Building staff profiles
Building staff profiles

 

The task ahead of the Orbital Team is to consider how our on-going designs like these two above, relate to the OAIS functional model. For example, the staff profile diagram (which is not simply an abstract model, but a retrospective design document) tells us where some of the OAIS Submission Information Package (SIP) information will be derived from. When a ‘Producer’ (i.e. a Researcher) signs in to Orbital and uploads a dataset, that content and the information we know from their university profile, as well as further information that they provide, constitutes the SIP.

As I was reading around the OAIS standard recently, I came across a nice piece of work done by a collaborative project between Cornell and Göttingen State universities. Their MathArc project was run using the XP agile project management methodology and as part of their development process, they broke down the OAIS reference model into a deck of 33 cards and tackled each one as a specific iteration. You can download the cards, here [.doc].

Although we’re not planning on 100% OAIS ‘compliance’ during this pilot stage of the Orbital project, it is our intention that Orbital is informed by the OAIS standard and these cards provide a useful and provocative set of functional requirements that we have added to our project tracker. Nick’s currently working on authentication and security for Orbital, or rather, card #3 ‘Provide Security Services’:

Protect sensitive information in the system, including authentication, access control, data integrity, data confidentiality, and non-repudiation services.

Cards #1 and #2 (‘Provide O/S Services’ & ‘Provide Network Services’) are requirements that go beyond the Orbital project itself and are largely met by the wider IT infrastructure that we’re working in at Lincoln. However, our work on ‘piloting the cloud‘ also addresses issues relating to the operating environment and networking of our future RDM infrastructure.

Nick and I met a couple of weeks ago to look at the OAIS model in detail and consider it in light of what we’re beginning to implement. As we’d hope, the high-level stuff you see in the diagram at the top of this post is easy to ‘tick off’. You couldn’t really say you were building an infrastructure to manage research data if you weren’t clear on the basic functional entities of OAIS. The lower-level components of the OAIS standard are where the work gets interesting and more demanding, and where the deck of cards is useful. Along the way, we’ll be creating diagrams like those above to show how we’re iteratively addressing the OAIS standard so that by the end of the project, we should have a model of our own that maps reasonably well onto the OAIS composite model.

I’d be very interested to hear from other MRD projects that are looking at the OAIS standard in detail, as well as people from earlier projects that have been through this process before. I know there have been a number of such efforts in the JISC community. A couple of the documents I found useful were Alex Ball’s Briefing Paper and Julie Allinsons’s OAIS as a reference model for repositories. The paper I remembered from my time at Amnesty is Brian Lavoie’s Introductory Guide. If you’re new to OAIS, I’d recommend that you at least read Lavoie’s report before tackling the full OAIS standard document (PDF).

Tracking progress

Did you know that you can watch our user requirements gathering and see how Orbital development is progressing by following our Github and Pivotal Tracker activity? Here are the key links:

Orbital Manager (the front end) (RSS)

Orbital Core (the back end) (RSS)

Pivotal Tracker (RSS)

Updates are also merged in a single stream of activity on Splendid Bacon.

Internally, we watch all of this activity through Campfire, thanks to Hubot and a bit of plumbing. Commits to Github, new stories and other activity on Pivotal Tracker, fire off API notifications which Hubot (‘Zakia’), delivers to Campfire. Here’s what this afternoon’s activity looked like.

Campfire
Watching Orbital progress on Campfire, using Hubot (Zakia)

Using a mixture of friendly APIs, asynchronous messaging and a chat bot provides us with a handy method of keeping track of what’s going on when we can’t all be in the same room.

Gluing people together

In December, colleagues in the Web Team (who manage the corporate web site in the Department of Marketing and Communications) approached a few of us about building a tool to allow staff to edit their profile for the new version of the lincoln.ac.uk website. We suggested that much of the work was already done and it just needed gluing together. Yesterday we met with the Web Team again to tell them that our part of the work is pretty much complete. Here’s how it works.

Quick sketch of profile building at Lincoln
Quick sketch of profile building at Lincoln

This requires a bit of explanation, but let me tell you, it’s the holy grail as far as I’m concerned and having this in place brings benefits to Orbital and any other new application we might develop. Here’s a clearer rendering.

 

Building staff profiles
Building staff profiles

The chart above strips out the stuff around authentication that you see in the bottom right of the whiteboard photo. That’s for another post – something Alex is better placed to write.

Information about staff at the university starts with the HR database. This feeds the Active Directory, which authenticates people against different web services. Last year, Nick and Alex pulled this data into Nucleus, our MongoDB datastore, and with it built a new, slick staff directory. Then they started bolting things on to it, like research outputs from the repository and blog posts from our WordPress/BuddyPress platform. To illustrate what was possible, they started pulling information from my BuddyPress profile, which I could edit anytime I wanted to. It got to the point where I started using my staff directory link in my email signature because it offered the most comprehensive profile of me anywhere on a Lincoln website.

By the time we first met with the Web Team about the possibility of helping them with staff profiles, Alex and Nick had 80% of the work already done. What remained was to create a richer number of required fields in BuddyPress for staff to edit about themselves and a scheduled XML dump for the Web Team to wrangle into their new templates on www.lincoln.ac.uk.

So the work is nearly done. The XML file is RDF Linked Data, which means that we have a rich aggregation of staff information and some simple relationships, feeding the Staff Directory, being refreshed every three hours and then being output either as HTML, JSON or RDF/XML.

For the Orbital project, all this glue is invaluable. When staff login to Orbital (Nick’s working on this part right now), we’ll already know who they are, which department they work in, what research outputs they’ve deposited in the institutional repository, what their research interests are, what projects they’re working on, the research groups they’re members of, their recent awards and grants, and the keywords they’ve chosen to tag their profile with. It’s our intention that with some simple AI, we’ll be able to make Orbital a space where Researchers find themselves in an environment which already knows quite a bit about their work and the context of the research they’re undertaking. Once Orbital starts collecting specific staff data of its own, it can feed that back into Nucleus, too.

This reminds me of our discussion last month with Mansur Darlington of the ERIM/REDm-MED project. Mansur stressed the importance of gathering data about the context of the research itself, emphasising that without context, research data becomes increasingly meaningless over time. Having rich user profiles in Orbital and ensuring that we record data about the Researcher’s activity while using Orbital, should help provide that context to the research data itself.

Orbital, therefore, becomes an infrastructure not only for storing and managing research data, but also a system for storing and managing data about the research itself.

Building on the ERIM and REDm-MED projects

On January 20th, Dr. Mansur Darlington from the ERIM & REDm-Med projects came to Lincoln to discuss his work in relation to the Orbital project. Mansur has a consultancy role on the Orbital project and will be joining us again later on in the year, to help us evaluate our progress. It was a very useful and interesting meeting for all of the Orbital Team and the Engineering Researchers working with us. What became clear to us is that while ERIM offers the Orbital project a great deal of the underlying research and analysis of how Engineers work with data, Orbital can reciprocally feed back observations and issues arising from ERIM’s recommendations, which are theoretically robust but have not yet been tested in implementation. Similarly, with the REDmMEd project, which finishes in May/June, I hope that we can take the outputs of that prototyping work and build on them in the development of Orbital.

Here are Mansur’s slides from the meeting and below that, my notes.

  1. Purpose of the meeting
  2. Introductions: Bev, Annalisa, Bingo, Chunmei, Joss, Stuart, Lee, Mark, Nick, Paul, Mansur. Apologies, Chris Leach.
  3. Engineers: Bingo, Chunmei, Stuart
  4. See slides. ERIM research offers good spread of Engineering research data.Industry collaboration is vitally important.
  5. MRD in general:

* Need to find out which RC (%), the funding into Engineering School comes from.
* All institutions have to put together a roadmap for RDM by May 2012 for EPSRC.
* Siemens/Lincoln spend a lot of effort in discovery of existing data to base investigations on.
* No national, dedicated Engineering data archive
* Need to look at API integration with DPMOnline (DCC)
* Orbital as tool for managing research projects?
* Ask DCC to visit Lincoln for Policy development and training.
* Reporting to DCC is a formal requirement.
* Include costs of MRD in the university overhead when bidding for funds.
* Datasets as an outcome of research projects. More ‘efficient’ to deal with RDM as part of project.
* ‘Market’ for data. Expectation of costs and benefits of MRD

6. The Nature of Engineering Research Data:

* ERIM: Engineering Research Information Management: Research activity data as well
* Problems with terminology. Need for definition. Both theoretical and practical/empirical outputs from the project.
* Good slides for terminology and understanding domain
* How does Orbital fit into the VRE puzzle?
* Transparent logging and capture of as much activity data as possible.
* Knowing the context is vital for understanding data. Orbital needs to concentrate on contextual data as much as ‘research data’.
* Orbital supports research lifecycle from bidding to completion?
* ‘Engineering research data’ covers pretty much all types of data.
* Need to identify other types of Engineering users to broaden scope of ‘Engineering data’
* Look outside Engineering for variety of data types/activity. Look beyond Engineering. Generalisable.
* Data types is one thing; methodologies and the data they produce are another.
* We manage data so that it can be RE-USED (by someone)
* Must not add to bureaucracy of research