'Dataverse makes it easy for me to store, archive and share my data'
Sander Groffen about how and why he shares his research data.
03/11/2019 | 3:31 PM
'When we publish something about our research, we feel it's important because of academic integrity to make the data available as well.' Other people should be able to view and reuse the data, according to lecturer Sander Groffen, who works in the Functional Genomics department (VU, Science/VU University Medical Center). 'The online platform DataverseNL is ideal for sharing your data.'
How do nerve cells emit signals?
'Our department studies the function of genes. We are primarily interested in those genes that have to do with neuroscience. Our goal is to learn how nerve cells emit the signals picked up by other nerve cells. Together, they create a network that allows them to pass on information. All of this happens in a flash, in mere milliseconds.'
Live imaging using fluorescent cells
To study this through live imaging, Sander and his colleagues culture special nerve cells that they turn fluorescent, for example by inserting fluorescent proteins. The researchers then record time-lapse footage of the cells, some three minutes in length and with a great many frames per second.
Making data available upon publication
'This generates an enormous amount of data', Groffen explains. 'For example, our PhD candidate Roberta Mancini collected five terabytes of data in the course of her research. When she publishes that research, we want to make the data available to all. More and more journals are adopting this requirement, which I see as a positive development.'
'The downside of working with such large volumes of data is that things can get really expensive', Groffen says. 'The statutory retention period is 10 years, starting from the moment of publication. However, it can take a few years to get your data published in the first place, which means you must store – and back up – those data for a period of some 10 to 15 years, all told. To do that yourself, you'd need a really, really big server.'
Persistent identifier URL
'When I first encountered DataverseNL, I was looking for a place to deposit the huge amount of data I had used for my paper, somewhere they would be accessible to all. When you import your dataset into Dataverse, you are assigned a persistent identifier URL, which you can note in your publication. This makes your data easy to find and ensures they remain accessible, so that other researchers can reuse them.'
Researchers decide who has access
When you store your data in Dataverse, you have the option to share those data with others. As the researcher who gathered them, you can decide for yourself who has access to which material and what kind of rights they will be granted (user, employee or curator). You can select a type of licensing as well (within the limits of your institution's policy). Groffen chose Creative Commons Zero (CC0, no rights reserved). 'I want everyone to be able to reuse my data from the moment they are published. This licence allows me to see how often my data are downloaded, but not by whom. My first dataset has been downloaded 60 times so far.'
Groffen has already stored a second dataset in Dataverse. 'It's user-friendly and is managed by people who really know what they're doing. I'm not an ICT professional. This is a way to totally outsource the data storage aspect. There's no longer any need to send your data to anyone: they can just download them for themselves.'
DataverseNL is a network of data repositories that relies on the Dataverse software developed by Harvard University. It is offered as a joint initiative of the participating institutions and Data Archiving and Networked Services (DANS). DANS has served as the network administrator since 2014. The institutions themselves are responsible for managing the data in their respective local repositories. Dataverse is not suitable for storing sensitive or confidential information.
Costs of using DataverseNL
Institutions that make use of data storage provided by DataverseNL must pay DANS an annual fee for this service. At VU Amsterdam, this fee is covered by central budget resources. This means that researchers can make use of Dataverse for datasets up to 50 GB in size at no cost to themselves. For datasets larger than 50 GB, researchers will need to obtain permission from their departmental managers.
In order to keep the costs of data storage and archiving services to a manageable level for VU Amsterdam and its faculties, it is important that you critically consider which data truly need to be retained before storing a dataset. It is also vital that those who submit grant applications request funding for data archiving purposes in the RDM section of their requests whenever possible. The fee VU Amsterdam currently pays for storage in Dataverse is €3.50 per GB per year.
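To get a feel for what these figures mean when budgeting a grant application, the short Python sketch below estimates the storage fee for a dataset over its retention period. It is only an illustration: the rate of €3.50 per GB per year and the 50 GB threshold come from the text above, while the function names, the assumption that the rate stays flat for the whole period, and the conversion of 1 TB to 1,000 GB are ours.

```python
# Rough estimate of Dataverse storage fees, based on the figures in this article.
# Assumptions (not from VU Amsterdam or DANS): the rate stays constant over
# the whole period, and 1 TB is counted as 1,000 GB.

RATE_EUR_PER_GB_YEAR = 3.50   # fee VU Amsterdam currently pays (from the article)
PERMISSION_LIMIT_GB = 50      # above this size, departmental permission is needed


def storage_cost_eur(size_gb: float, years: int,
                     rate: float = RATE_EUR_PER_GB_YEAR) -> float:
    """Return the estimated total fee for storing size_gb for a number of years."""
    if size_gb < 0 or years < 0:
        raise ValueError("size_gb and years must be non-negative")
    return size_gb * years * rate


def needs_permission(size_gb: float) -> bool:
    """Return True if the dataset exceeds the 50 GB threshold."""
    return size_gb > PERMISSION_LIMIT_GB


if __name__ == "__main__":
    # A 50 GB dataset kept for the 10-year retention period.
    print(storage_cost_eur(50, 10))      # 1750.0
    # A hypothetical 5 TB dataset (5,000 GB) kept for 10 years.
    print(storage_cost_eur(5000, 10))    # 175000.0
    print(needs_permission(5000))        # True
```

Under these assumptions, even a dataset at the 50 GB threshold costs the institution about €1,750 over ten years, which shows why selecting only the data that truly need to be retained matters.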
DANS fact sheet: learn more about Dataverse
Dataverse Vrije Universiteit Amsterdam
VU researchers can choose to store their data in the following repositories:
2. ArchStor, a research data archive with a 10-year retention period. Data stored in ArchStor can only be accessed for verification purposes.
3. DarkStor, an offline archive for storing sensitive information/data. Once archived, access to the data can only be requested by authorised individuals, i.e. the original researcher or a research coordinator.
VUNet: Research Data Archiving