University of Oregon

Research Data Management

Best Practices

Data Storage and Backup

Data storage and backup are important because:

  • Properly storing data is a way to safeguard your research investment.
  • Data may need to be accessed in the future to explain or augment subsequent research.
  • Other researchers might wish to evaluate or use the results of your research.
  • Stored data can establish precedence in the event that similar research is published.

See also Archiving & Preservation, Intellectual Property, and how to Share Data.

Data Protection

In order to maintain the integrity of stored data, project data should be protected from physical damage as well as from tampering, loss, or theft. This is best done by limiting access to the data. PIs should decide which project members are authorized to access and manage the stored data. Notebooks or paper questionnaires should be kept together in a safe, secure location away from public access, e.g., a locked file cabinet. Privacy and anonymity can be increased by replacing names and other information with encoded identifiers, with the encoding key kept in a different secure location. Ultimately, the best way to protect data may be to fully educate all members of the research team about data protection procedures.
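The encoded-identifier approach described above can be sketched in a few lines of Python. This is a minimal illustration, not a vetted de-identification procedure: the function name `pseudonymize`, the `name` field, and the `P-` code prefix are all hypothetical, and real human subjects data may need stronger measures agreed with your compliance office.

```python
import secrets


def pseudonymize(records, id_field="name"):
    """Replace each distinct value of id_field with a random code.

    Returns (coded_records, key). The key maps each code back to the
    original value and should be stored in a different secure location
    from the coded records. The input records are not modified.
    """
    code_for = {}      # original value -> code
    used_codes = set()
    coded_records = []
    for record in records:
        original = record[id_field]
        if original not in code_for:
            # Draw random codes until we get one that is not in use.
            code = "P-" + secrets.token_hex(4)
            while code in used_codes:
                code = "P-" + secrets.token_hex(4)
            used_codes.add(code)
            code_for[original] = code
        coded = dict(record)             # copy, so the input is untouched
        coded[id_field] = code_for[original]
        coded_records.append(coded)
    key = {code: original for original, code in code_for.items()}
    return coded_records, key
```

The coded records can then be shared within the project, while the returned key is written to separate, access-restricted storage.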

How Can Data Be Protected?

Theft and hacking are particular concerns with electronic data. Many research projects involve the collection and maintenance of human subjects data and other confidential records that could become the target of hackers. The costs of reproducing, restoring, or replacing stolen data and the length of recovery time in the event of a theft highlight the need for protecting the computer system and the integrity of the data.

Electronic data can be protected by taking the following precautions:

Protecting access to data:

  • Data protection should be a part of every project's plan for data storage.
  • The best way to protect data, whether in written or electronic form, is by limiting access to the data.
  • Electronic data storage offers many benefits but requires additional consideration and safeguards.

Protecting your system:

  • Keep updated anti-virus protection on every computer.
  • Maintain up-to-date versions of all software and media storage devices.
  • If your system is connected to the Internet, use a firewall.

Protecting data integrity:

  • Record the original creation date and time for files on your systems.
  • Use encryption, electronic signatures, or watermarking to keep track of authorship and changes made to data files.
  • Regularly back up electronic data files (both on and offsite) and create both hard and soft copies.
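One simple way to keep track of file dates and detect changes is a manifest of checksums. The sketch below uses SHA-256 hashes, which is a swapped-in technique rather than the encryption, signature, or watermarking options listed above, and the function names are hypothetical. Note that it records the last-modified time, because a true creation time is not available portably from Python.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file's relative path to its hash, size, and modified time."""
    root = Path(data_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            stat = path.stat()
            modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
            manifest[path.relative_to(root).as_posix()] = {
                "sha256": sha256_of(path),
                "bytes": stat.st_size,
                "modified_utc": modified.isoformat(),
            }
    return manifest


def changed_files(old_manifest, new_manifest):
    """Return sorted names of files that were added, removed, or altered."""
    names = set(old_manifest) | set(new_manifest)
    return sorted(
        name for name in names
        if old_manifest.get(name, {}).get("sha256")
        != new_manifest.get(name, {}).get("sha256")
    )
```

Saving a manifest alongside each backup and comparing it with a fresh one later shows which files have changed since the backup was made.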

In addition:

  • Lab notebooks should be stored in a safe place.
  • Computer files should be backed up and the backup data stored in a secure place physically removed from the original data.
  • Samples should be appropriately saved so they will not degrade over time.

Data Access and Sharing

Understanding the flow of the data will assist in determining appropriate access controls and security measures as the data moves around. The growth of interdisciplinary and cross-institutional research has increased the need to share data, and sharing has become much easier, faster, and more reliable than ever before. Consider how you will share the research data ahead of time and determine if there will be cost implications.

How much data will there be? How long do I need to keep it? How will I access it in the future?

One of the most costly considerations in any IT implementation is the ability to store and restore data. It is important to weigh the cost of the storage against the risk of data loss. Using cheap desktop storage without any redundancy or backups could end up being the most expensive choice you make.

When planning storage and backup architecture, take into account the following points:

  • Calculate storage volumes by projecting a baseline and rate of growth for the duration of the project.
  • Determine if the data is static with only periodic needs to access and/or update, or if the data is dynamic with frequent changes and updates.
  • Determine backup and retention policies. If data were lost, is it possible to re-create the present state and how far back would you need to go to recover the present state?
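The first point above, projecting a baseline and a rate of growth, comes down to simple arithmetic. The sketch below assumes linear growth and a fixed number of stored copies; both are simplifying assumptions, and the function and parameter names are hypothetical.

```python
def project_storage_gb(baseline_gb, growth_gb_per_month, months, copies=1):
    """Estimate total storage needed at the end of a project.

    Assumes data grows by a constant amount each month (linear growth).
    `copies` counts every stored copy, e.g. 3 for one working copy plus
    an on-site and an off-site backup.
    """
    if months < 0:
        raise ValueError("months must not be negative")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    primary_gb = baseline_gb + growth_gb_per_month * months
    return primary_gb * copies
```

For example, a project that starts with 500 GB and adds 50 GB per month for 36 months would need 500 + 50 × 36 = 2300 GB for one copy, or 6900 GB if three copies are kept.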

For more information on storage infrastructure offerings at Oregon, and how to plan and implement storage for your research data, see: http://it.uoregon.edu/systems/services/storage.

Campus Based Storage

University of Oregon Information Technology offers limited file storage to students, faculty, and staff. While this storage is backed up regularly, it may not provide the security or protection that your research work requires. Additional network-based storage may be available via UO Information Technology Storage Services.

Cloud Based Storage

Cloud based storage holds data on remote servers, reducing the burden of access and management issues. However, protected and sensitive data should not be stored on third-party servers. Costs related to data transfer can add considerably to the budget required for cloud-based storage.

XSEDE
The Extreme Science and Engineering Discovery Environment (XSEDE, formerly TeraGrid) works by combining the computing power of eleven providers into a large, powerful network of supercomputers. Integrating high-performance computers, data resources, and high-end experimental facilities around the country, the TeraGrid is the largest, most widely distributed, and most robust open cyberinfrastructure available to researchers. Currently, TeraGrid has more than 2 petaflops of computing capability and more than 50 petabytes of online and archival data storage. Researchers can also access more than 100 discipline-specific databases. See also UO IT TeraGrid information.

Repositories and Data Centers

Repositories and data centers are options for published/publicly available datasets, but should not be viewed as primary storage during the research project.

Version Control

Version control software is useful for tracking updates to files as you work with them.

Backing Up Data

Data should be backed up on a regular basis. Scheduled backups to an off-site location (i.e., a different building or a different geographic area) will help protect against catastrophic data loss.
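A backup is only useful if the copy is complete and readable, so it helps to verify each one. The following is a minimal sketch, assuming the off-site location is reachable as an ordinary directory path (for example, a mounted network share); the function names are hypothetical. It copies a project folder into a timestamped backup folder and then compares every file byte for byte.

```python
import filecmp
import shutil
from datetime import datetime
from pathlib import Path


def find_mismatches(source, target):
    """Return relative paths of source files missing or different in target."""
    source, target = Path(source), Path(target)
    mismatched = []
    for path in sorted(source.rglob("*")):
        if path.is_file():
            relative = path.relative_to(source)
            copy = target / relative
            # shallow=False forces a comparison of the file contents.
            if not copy.is_file() or not filecmp.cmp(path, copy, shallow=False):
                mismatched.append(relative.as_posix())
    return mismatched


def back_up(source_dir, backup_root):
    """Copy source_dir into a new timestamped folder under backup_root."""
    source = Path(source_dir).resolve()
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S")
    target = Path(backup_root) / f"{source.name}-{stamp}"
    shutil.copytree(source, target)   # raises an error if target exists
    mismatched = find_mismatches(source, target)
    if mismatched:
        raise RuntimeError(f"backup verification failed for: {mismatched}")
    return target
```

Running a script like this from the operating system's scheduler (such as cron or Windows Task Scheduler) is one way to make backups regular; your unit's IT group may already offer a managed alternative.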

Contact the IT group for your unit

Other Computing Resources Questions

Below are specific questions that grant applicants should consider to ensure that appropriate IT resources are available or requested in the grant application.

  1. What computing resources are needed for this research project?
  2. Servers or compute cycles: I have heard a bit about virtual machines -- what role could they play in my research effort?
  3. Storage
    1. How will my data be accessed and shared?
    2. How much data will there be? How long do I wish to keep it? How will I access it in the future?
    3. What are my sponsor's storage, retention, and archive requirements? Do I need to keep my data available to them after my grant is finished?
  4. Security: Can the security requirements of your data be identified? Will protected data (HIPAA, FERPA, PHI, etc.) make up my research data?
  5. Applications: What applications does my project require?
  6. Support: How is all of this going to stay running? Does my research group possess the appropriate level of expertise to support my computing resources? Do I want my researchers spending their time patching servers? Should I really use my research assistants to run this critical environment?

1. What computing resources are needed for this research project?

Regardless of what type of research you are undertaking, you may stand to benefit from expanding your computing resources beyond a simple computer-per-researcher setup. Consider the following objectives and how additional lab resources, or leveraging central resources, could streamline your effort:

  • Collaboration and Data Sharing: For some forms of collaboration, sharing the data via email or visiting a colleague is adequate or even preferable. However, using access-controlled central file storage or other collaborative tools may improve your ability to extend your research effort beyond the boundaries of your group or extend your boundaries beyond the lab.
  • Software Licensing: Leveraging Oregon's campus-wide licensing agreements can reduce software licensing costs.
  • High Performance Computing: Some research will require a higher level of computing. Researchers should consider utilizing a central cluster, cloud resources or other unused cycles. For more information about high performance computing, contact Sean Sharp in Information Services (ssharp@uoregon.edu) or see: http://it.uoregon.edu/systems/doc/hpc.
  • Data Security: Some projects involve dealing with data that is more sensitive than others, or have particular compliance requirements regarding data handling. Please see the Security section below.
  • Support: In some cases, computer support for your lab may be addressed by arrangements already made by your School. More frequently, you will have to provide your own support.

2. Servers and Compute Cycles

I have heard a bit about virtual machines… what role could they play in my research effort?

Many groups at Oregon have implemented partial or complete virtual infrastructures as part of their computing strategy. Virtual server infrastructures offer several advantages over physical infrastructures. Although they require specific expertise to plan and implement, a virtual server service is already offered by Information Services. Leveraging virtualization may help your research project meet sustainability, cost containment, efficiency, and scalability goals. Contact your local IT support staff, or see UO Information Services VMware for further information.

3. Storage

a. How will my data be accessed and shared?

Understanding the flow of the data will assist in determining appropriate access controls and security measures as the data moves around. The growth of interdisciplinary and cross-institutional research has increased the need to share data, and sharing has become much easier, faster, and more reliable than ever before. Consider how you will share the research data ahead of time and determine if there will be cost implications.

b. How much data will there be? How long do I need to keep it? How will I access it in the future?

One of the most costly considerations in any IT implementation is the ability to store and restore data. It is important to weigh the cost of the storage against the risk of data loss. Using cheap desktop storage without any redundancy or backups could end up being the most expensive choice you make.

When planning storage architecture, take into account the following points:

  1. Calculate storage volumes by projecting a baseline and rate of growth for the duration of the project.
  2. Determine if the data is static with only periodic needs to access and/or update, or if the data is dynamic with frequent changes and updates.
  3. Determine backup and retention policies. If data were lost, is it possible to re-create the present state and how far back would you need to go to recover the present state?
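The third point, deciding how far back you would need to go, can be written down as an explicit retention rule. The sketch below shows one possible policy, chosen purely for illustration: keep every backup from the last few days, plus the newest backup from each earlier week for a few weeks. The function name and default values are hypothetical and should be set from your sponsor's and unit's requirements.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, daily_days=7, weekly_weeks=4):
    """Select which dated backups to retain under a daily-plus-weekly policy.

    Keeps every backup from the last `daily_days` days, and the newest
    backup in each ISO week for older backups made within the last
    `weekly_weeks` weeks. Everything else may be deleted.
    """
    daily_cutoff = today - timedelta(days=daily_days)
    weekly_cutoff = today - timedelta(weeks=weekly_weeks)
    keep = set()
    newest_in_week = {}
    for day in sorted(backup_dates):
        if day > today:
            continue                      # ignore dates in the future
        if day > daily_cutoff:
            keep.add(day)
        elif day > weekly_cutoff:
            week = tuple(day.isocalendar()[:2])   # (ISO year, ISO week)
            newest_in_week[week] = day    # dates are sorted, so newest wins
    keep.update(newest_in_week.values())
    return sorted(keep)
```

A rule like this makes the trade-off between storage cost and recovery depth visible: raising either default keeps more history at the price of more backup storage.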

For more information on storage infrastructure offerings at Oregon, and how to plan and implement storage for your research data, contact your local IT support staff or see UO Information Services Storage information.

c. What are my sponsor's storage, retention, and archive requirements? Do I need to keep my data available to them after my grant is finished?

See Archiving & Preservation and consult with NSF guidance or other specific funded research guidance for resources and information.

4. Security

Can the security requirements of your data be identified? Will protected data (HIPAA, FERPA, PHI, etc.) make up my research data?

Both the University and many sponsors have specific requirements for handling, transmitting, and storing certain types of data used in research. Familiarize yourself with those details ahead of time: it can be costly if you do not anticipate your data protection needs until after the fact. See the Intellectual Property and Sharing Data pages, and Research Compliance Services if you are dealing with human subjects data.

5. Applications

What applications does my project require?

You need licensed software for all your current systems and any additional computers you may purchase. You might also need additional user licenses for your project applications. The applications you use may undergo several updates or revisions over the duration of your project, so ensure that you have an application maintenance contract or a budget for upgrades. Contact your local IT support staff and see UO Information Services Software Licensing.

6. Support

How is all of this going to stay running? Does my research group possess the appropriate level of expertise to support my computing resources? Do I want my researchers spending their time patching servers? Should I really use my research assistants to run this critical environment?

As computing needs become more fundamental to your research, ensure that you have support options covered. You don't want to suspend research to address repairs or problems. Knowing who the local network administrators and systems/storage administrators are and how to contact them in the event of problems is important. They can help to schedule ongoing maintenance to ensure interruptions to your research are minimized. The cost to have your environment managed by IT professionals may be less than you expect. Contact your local IT support services and/or Sean Sharp in Information Services.

Maintained by: Brian Westra, bwestra@uoregon.edu
Created by bwestra on Jul 24, 2012 Last updated Sep 17, 2015
University of Oregon Libraries
1501 Kincaid Street Eugene, OR
97403-1299
T: (541) 346-3053
F: (541) 346-3485