Drafting a Research Data Management Policy

This morning, four of us (Bev Jones and Paul Stainthorp, Library; Annalisa Jones, Research Office; and Joss Winn, Centre for Educational Research and Development) met for three hours to draft a Research Data Management Policy for the University.

We began with Paul and Bev summarising their experience of attending the RDM Policy workshop in Leeds last month, and then went on to look at the requirements of UK funding bodies, as summarised by the DCC. We then reviewed the four university RDM policies linked to from the DCC’s institutional policy page and set about creating a draft policy for Lincoln. The draft will first be reviewed by the Orbital Steering Group later this week and then referred to the Academic Board and Research, Innovation and Enterprise Committee for approval.

Our draft policy is modelled on the Cross Council Policy Overview by the DCC, which broke down funders’ policies as follows:

  • Published outputs: a policy on published outputs e.g. journal articles and conference papers
  • Data: a datasets policy or statement on access to and maintenance of electronic resources
  • Time limits: set timeframes for making content accessible or preserving research outputs
  • Data plan: requirement to consider data creation, management or sharing in the grant application
  • Access/sharing: promotion of OA journals, deposit in repositories, data sharing or reuse
  • Long-term curation: stipulations on long-term maintenance and preservation of research outputs
  • Monitoring: whether compliance is monitored or action taken such as withholding funds
  • Guidance: provision of FAQs, best practice guides, toolkits, and support staff
  • Repository: provision of a repository to make published research outputs accessible
  • Data centre: provision of a data centre to curate unpublished electronic resources or data
  • Costs: a willingness to meet publication fees and data management / sharing costs

We then drew from Edinburgh’s policy to look at how it meets each of these points. Then, we began merging points and writing a policy response, again borrowing from Edinburgh at times.

You can read our draft policy online. If you’re interested in seeing in detail how it was written, go to the File menu, click on See revision history and then at the bottom of the page, click Show more detailed revisions. Amendments to the policy will continue to be made at that location, so we should see the full history of the policy development over time. Sorry, it appears that in read-only mode, Google Docs doesn’t allow access to the document revision history. UPDATE: See the link to a version maintained on GitHub in the comments below.

I should note that this is intended to be a pithy policy statement, similar to what other institutions have written, and it will be supported by more detailed written guidance, which we’ll develop over the course of the Orbital project.

It’s like having a whole new research partner…

Orbital, as you know, has many cool features in the pipeline which are designed to make research easier. From keeping tabs on your data and helping you find what you’re looking for through to helping you build your data management plan, Orbital is going to be an essential tool in your daily research work. Today we’re pleased to announce that it’s being made even better, making use of some really rather clever machine learning and artificial intelligence to actually do parts of the research for you.

To start with, all you need to do is to upload your research data in whatever format you’ve got it in. If you don’t have research data then just give Orbital a few keywords and it’ll generate research data for you based on over 200 individual variables gathered from (amongst other things) the news, weather, stock markets and punctuality of the rail network. Once your data is loaded Orbital will begin to sift through it, sorting it into a more easily understood form which can be searched and queried with ease. From there Orbital will begin to look for statistically significant patterns of data, pick them out for further analysis and finally output a conclusion for you – complete with any necessary citations – ready for inclusion in your paper.

Yet another example of how Orbital isn’t just a place to keep your data, but an active part of your day-to-day work.

Data, Data Everywhere…

For a project which is essentially about storing data, we’ve not actually done that much talking about it. This may seem sensible to some — after all, everybody knows what data is, don’t they?

It turns out that what people define as ‘data’ is a hugely wide-ranging topic (you can find a myriad of research on how different people define it), and what we’re basically trying to do is fit misshapen data into a one-size-fits-nothing storage system. Allow me to elaborate.

First of all we had to look at what data was currently available to us. Fortunately we have some awesome project partners in the School of Engineering who provided us with some of what they’re researching, and that presented the first problem: the data doesn’t exist in any kind of standardised format. We’ve got to contend with flat text database formats, weird (often invalid) XML, Excel spreadsheets, CSV files (again often invalid), folders of images or audio files, proprietary binary formats, non-binary flat files which nonetheless need parsing to be made understandable, plain strings of data, and the occasional random file format which even the source of the data can’t explain.

The solution to this problem is fairly simple in principle, yet complex in practice. First of all, when it comes to archive storage of files (i.e. without any pre-processing), Orbital is designed to be file type agnostic: if you give it a random stream of bytes and say it’s a file then Orbital will duly store the file as provided, with no further work needed. It doesn’t care if your XML file has no DTD and has unclosed tags, since it doesn’t do any work inside the stream. You will later be able to retrieve the file exactly as it was first loaded into the system without any changes or alterations. It’s worth pointing out, however, that this does mean that if Orbital is given a corrupt file to store then it will do so blindly without any attempt at validation.
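To make the idea concrete, here is a minimal sketch in Python of what file type agnostic storage might look like. It is not Orbital's actual code: the `OpaqueStore` class, its `put` and `get` methods, and the choice of a SHA-256 digest as the storage key are all assumptions made purely for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


class OpaqueStore:
    """Hypothetical archive store (not Orbital's real API) that treats
    every upload as an opaque stream of bytes."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, payload: bytes) -> str:
        # No parsing and no validation: the bytes are written as received.
        # Using the SHA-256 digest as the key is an assumption of this sketch.
        key = hashlib.sha256(payload).hexdigest()
        (self.root / key).write_bytes(payload)
        return key

    def get(self, key: str) -> bytes:
        # The stored bytes come back unchanged.
        return (self.root / key).read_bytes()


# Invalid XML (no DTD, unclosed tags) is stored just like anything else.
broken_xml = b"<experiment><reading>42</experiment"

with tempfile.TemporaryDirectory() as tmp:
    store = OpaqueStore(Path(tmp))
    key = store.put(broken_xml)
    retrieved = store.get(key)

print(retrieved == broken_xml)
```

The store never looks inside the payload, so a corrupt file goes in and comes out exactly as corrupt as it arrived. A content-derived key like the one assumed here would at least let you confirm later that the bytes have not changed since upload.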


Keeping Research Data Safe and the benefits of Orbital

Note of apology: early in December 2011 we attended the launch event of the JISC Managing Research Data programme at the National College for School Leadership in Nottingham. I managed to blog day 1 of the event there and then. Unfortunately my notes on day 2 fell into an abyss. Here they are: late, but unscathed.

Aspire!

The first exercise (on this second day of the programme launch event) was to examine the benefits and metrics checklists provided by the KRDS frameworks project, and to identify the benefits that Orbital will provide & that we can measure. Then to blog a first statement of the benefits we expect Orbital will generate.

KRDS = Keeping Research Data Safe

Notes from Neil Beagrie’s presentation on the benefits analysis toolkit (which I have already blogged about at the RDMF7 event, but noted here in more detail).

  • There are two strands to the KRDS toolkit. These tools can be combined for maximum effect (and to reduce wasted effort); tools can also be customised to specific project needs:
    1. The KRDS Benefits Framework (guide + worksheet)
    2. The KRDS/I2S2 value chain and benefit impact tool (guide + impact statement + impact analysis worksheet)
  • Designed for use by a wide audience over the full RDM project lifecycle.
  • Consider the KRDS Benefits Framework ‘triangle’
    • What is outcome? direct/indirect
    • When is it received? near-term/long-term
    • Who benefits? internal/external
  • Tips: quantitative benefits must be measurable (“cashable”) – if not within the project lifecycle then through longer-term benchmarking… qualitative benefits could take the form of case studies (working in a team can help to tone down the subjectivity of benefit assessments. Don’t go it alone!)
  • More information at: http://beagrie.com/krds-i2s2.php
  • The previous MRD programme produced a benefits report & case studies which can be useful reference points.

Practical workshop

The KRDS benefits and metrics handouts provided here were extremely useful in developing this first statement of benefits for the Orbital project.

Points from the round-table discussion:

  • Checklist v useful brainstorming exercise – not a to do list!
  • Want to do everything and world peace too
  • But how make relevant to project? Target useful examples of top-level things
  • How evidence?
  • Lack of evidence/measurement not a reason not to do it – think of a way of measuring!
  • Don’t rely on q’aires 🙂
  • Think of benefits from the programme as a whole, into which Orbital can feed
  • Practical time & efficiency savings for researcher – e.g. not having to go to London with a USB in pocket
  • Similarities engineering with other applied – e.g. NHS
  • Case studies/user story – iterative method  – as user requirements change (become more mature) – that’s a way of measuring benefit!
  • Set actions for the steering group / RIEC

Benefits of Orbital

This is the list of benefits we came up with. Bear in mind that some of them are benefits specific to an MRD project such as Orbital, but some are benefits of any large project where the institution has a vested interest. Note that some of these can also be found in the ‘Anticipated Outputs and Outcomes’ section of our Project Plan. As Joss mentions in the post on awareness of open source, not all benefits can be anticipated and there may be outcomes of the project which are quite tangential to the original objectives. We especially look forward to those!

  • The very mention of Orbital is attracting expressions of interest from research staff applying for funding. Researchers have to consider RDM when writing bids. We’re doing their work for them!
  • Knock on effect on other university services: authentication, repository, staff profiles, cloud computing, software development environment and methodology, open source awareness and guidance.
  • Supports the development of RDM plans and policies.
  • MRD programme activity is akin to staff training and development of a community of practice.
  • Combines and improves our understanding about research administration, research methods, research data and research outputs.
  • Changes to researcher practices. Improves RDM practices.
  • Should reduce institutional risk (legal liabilities of commercial contracts)
  • Simplifies collaboration among researchers
  • Produces open source software for re-use
  • Provides rapid access to results and derived data
  • Increases awareness of support among researchers. e.g. Aids grant writing.
  • Produces reliable citations of research data
  • Embeds institutional support and training
  • No recreation of existing data. Better security, greater efficiency.
  • Improved version control and transparency.
  • Improved understanding of research methods.
  • Further thinking about and planning for the sustainability of institution-wide services. Who pays?

Understanding and participating in open source culture

There are direct and anticipated outcomes of running relatively big projects like Orbital – outcomes which are integral to the success of the project, such as those listed in our project plan: a technical infrastructure for research data, support and training, an institutional data management policy and a business plan for sustaining the work of the project. There are also outcomes which, to be honest, I didn’t entirely anticipate, such as Orbital becoming the pilot project for how the university tackles integration with the cloud; or the implementation of a new development tool-chain and associated working practices.

Yesterday wasn’t originally anticipated either, as the Orbital project hosted a meeting to raise awareness of ‘open source’ among staff at the university. It’s a term that we hear quite often these days and increasingly it’s being applied to non-software domains, such as hardware, data and education. In effect, it’s being used to refer to a method of participation and collaboration, as much as a legal statement about the ownership of property. In my day-to-day experience, more often than not, it’s a term that’s poorly understood and misused, so an open source software development project like Orbital seemed like a good opportunity to ask the question, “what is open source?” and see if anyone else was interested in learning more about what it means and how it relates to the work of a university. With that in mind, I arranged for Sander van der Waal, from OSSWatch, to lead a meeting where we discussed open source in general, but also began to address some specific issues that I think we need to work on as we continue to both re-use and produce more open source software.

The meeting ran all morning, from 9.30-12, and could have gone on for longer. I kicked things off with the slides below, which were intended to provide a brief overview of the work we’ve been doing over the last four years where the use of open licenses was central, and in particular, give a brief summary of why we undertake the work we do and some of the benefits of ‘openness’. I finished up with a list of things I think we need to address and take forward for further discussion. I was pleased that Dr. James Murray, the IP Manager for the university was attending and keen to engage in this discussion, too.

Having set the scene, I handed over to Sander, who led the rest of the meeting. As you can see from his slides below, he covered a lot of ground, which we were grateful for, and we intend to draw from them in our next follow-on meeting. I hope that the Orbital project will now act as a catalyst to the development of guidelines on the use and creation of open source software, as well as a clearer understanding of the business case and business models for open source.

On a more personal note, having joined the university in 2007 as Project Officer on the JISC-funded LIROLEM institutional repository project, yesterday felt like a bit of a milestone, when I was able to draw together a lot of our work under the banner of ‘open’, and impress upon colleagues what we’ve learned and achieved and the direction I think we need to go in.

Before too long, I’d like there to be a greater appreciation across the institution of how the open source movement is changing the way some of us think about (intellectual) property and the nature of work and how this is reflected by the environment we work in. Open source (and its open * derivatives) is not a panacea to society’s problems, by any means, but its impact on our lives in just twenty years or so has been quite profound and its impact on the nature of research, teaching and learning is increasingly apparent. Since the development of time-sharing systems fifty years ago, programmers have been building tools with each other that allow them to share their knowledge and their productive capacity across divides in space and time that once presented significant barriers to collaboration. Variations on these tools (hardware, software, legal) are now available to researchers, teachers and students outside Computer Science programmes and present challenges as well as new ways to conceive the organising principles of property and work.

In the future, I’d like institutional projects (not just discrete research projects), such as Orbital, to somehow be tied into curricula for courses where we turn classrooms into hackerspaces, project work into apprenticeships, award degrees on the basis of participation in and learning from open source projects, and help students form start-ups by creating an intensive but supportive learning environment along the same lines that Y Combinator has done. None of this is beyond the capability of our institutions, nor in conflict with the idea of the university. From where I stand, it is the only direction available to us if we wish to remain relevant to young people’s lives and aspirations: on an everyday level, technology is a determining force in society and is determining how we undertake research, teaching and learning, but in response, it’s the hackers who are changing technology and therefore have a role in the future of the university.

If you’re interested in further reading about open source, I recommend the following books:

Benkler, Y. (2006) The wealth of networks: how social production transforms markets and freedom.

Fogel, K. (2006) Producing open source software: how to run a successful free software project.

Lindberg, V. (2008) Intellectual property and open source.

Weber, S. (2005) The success of open source.