University of Oregon

Data Management Best Practices

Preservation planning

Before deciding on a local solution for archiving and preservation, we recommend investigating subject- or discipline-based repositories for archiving your data. We have compiled a list of subject repositories and data centers. Please contact ResearchDataMgmt@uoregon.edu if you are unsure whether there is a repository for your discipline.

The UO Libraries supports an institutional repository, Scholars' Bank, which can also be used for data preservation. In using this repository, your data will be preserved according to the digital preservation standards enacted by the Libraries for all digital collections. Scholars' Bank will allow multiple types of files to be uploaded, but they will have to be downloaded for re-use. If you plan on preserving multiple versions of your data, please make sure to coordinate with the Scholars' Bank administrators prior to deposit. Scholars' Bank was originally designed for research papers and PDF files. If you have a data set over 50 MB, please contact the Scholars' Bank administrators to make sure the files can be downloaded efficiently and there is room on the servers. The Scholars' Bank administrators can also help with wording to put in your data management plan specifying software versions and back-up procedures.

Preservation Best Practices and Things to Mention in your Plan

  • Back-up intervals

  • Data loss strategies

  • File format migration plans and schedules (including software required to view files)

  • Bit-integrity checks / check-sums

  • Multiple copies

  • Multiple storage locations

  • Storage media (e.g. tape, online/local, and online/cloud)

  • Data security and access issues

  • Version control
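
As an illustration of the bit-integrity item above, here is a minimal Python sketch that records a SHA-256 checksum for every file in a folder and later reports files whose contents have changed or gone missing. The function names (sha256_of, build_manifest, changed_files) are our own for this example and do not describe how Scholars' Bank or any other repository performs its checks.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file's relative path (POSIX style) to its checksum."""
    root = Path(data_dir)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(data_dir, manifest):
    """Return relative paths that are missing or whose checksum differs."""
    current = build_manifest(data_dir)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

A typical use would be to save the manifest alongside the data at deposit time and re-run changed_files on a schedule. Note that this sketch only detects changes; recovering from them still depends on having another good copy.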

Fragility of data

Digital data - made up of bits and bytes - are in many ways more fragile than paper records for a number of reasons. Depending on the type of media on which the data are stored (magnetic, optical, and so forth), over time they are subject to different forms of 'bit rot' or decay, in which the electrical charge representing a bit disperses.

This gradually introduces minor or major errors into the data and reduces the ability of computer software to read them.

Strategies

  • Refreshment - move data files onto new storage media well within the projected lifespan of the media.

  • Replication - by keeping more than one copy of a data file, the risk of losing a readable copy over time is reduced.

These strategies apply to both online and offline storage media. Where data are kept on a server, backup procedures and disaster recovery planning may take into account the necessary procedures. Ask your system administrator about their procedures and tests.
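
The replication strategy can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the target directories stand in for separate storage locations, and the function name replicate is hypothetical. Each copy is compared with the original byte for byte before it is trusted.

```python
import filecmp
import shutil
from pathlib import Path


def replicate(source, target_dirs):
    """Copy source into each target directory and verify each copy byte for byte."""
    source = Path(source)
    copies = []
    for target in target_dirs:
        target = Path(target)
        target.mkdir(parents=True, exist_ok=True)
        copy_path = Path(shutil.copy2(source, target / source.name))
        # shallow=False forces a content comparison rather than a metadata check.
        if not filecmp.cmp(source, copy_path, shallow=False):
            raise IOError(f"Copy at {copy_path} does not match {source}")
        copies.append(copy_path)
    return copies
```

The same routine could serve for refreshment, with the new storage medium as the only target. In practice the targets should sit on different devices or sites; copies on the same disk do not reduce the risk much.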

Offline storage media include optical discs such as compact discs (CDs) and digital video discs (DVDs). Depending on the quality, these may need to be refreshed every ten years or less. Portable flash drives can be useful for short-term backup and portability but are not reliable for preservation purposes.

Software obsolescence

Another threat to long-term accessibility of datasets is software obsolescence. When a new version of a software product is unable to render a file created in an older version, or when a software company retires a product, goes bankrupt, etc., there may be no available version of the software to be used on newer operating system platforms.

Strategies

  • Migration - when a new software version has become established, the data file is converted or 'migrated' to the new software version or package.

  • Emulation - a specialised strategy to recreate the functionality of the obsolete software package on a new operating system, or, for example, on a Java Virtual Machine system.

  • Format conversion - the most pro-active method is to select a format that is most easily imported into a number of suitable software programs, or that is based on a universal standard.

Data archives and repositories

Services may exist that could relieve you as a researcher of taking on long-term preservation of data yourself.

Digital preservation and data curation are represented by emerging professional fields that are increasingly specialised. Specialists are knowledgeable about preservation planning and procedures, as well as standards, informatics, and discipline-specific knowledge and norms.

A big advantage of depositing your data in an archive or repository is that it will be preserved - even for your own future use!

Source: University of Edinburgh

In order to maintain the integrity of stored data, project data should be protected from physical damage as well as from tampering, loss, or theft. This is best done by limiting access to the data. Researchers should decide which project members are authorized to access and manage the stored data.

Notebooks or paper questionnaires should be kept together in a safe, secure location away from public access, e.g., a locked file cabinet. Make digital copies of notebooks and back up the digital scans with the rest of your data.

Version control software is useful for tracking updates to files as you work with them.

Data should be backed up on a regular basis. Scheduled backups to an off-site location (i.e., a different building or a different geographic area) will help protect against catastrophic data loss.

Consider:

  • How much storage will be needed for the project?
  • How often should the data be backed up (or: how much can you afford to lose if it has not been backed up for a day/week/month)?
  • Can you automate the backups (scheduled, or synchronized whenever you connect to the network)?
  • What backup and retention policies apply?
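
To make the automation and retention questions concrete, here is a small Python sketch that writes a timestamped .zip archive of a project folder and keeps only the most recent few. The function name make_backup, the archive naming scheme, and the default of seven retained archives are assumptions for illustration, not UO policy. The backup folder must not sit inside the data folder.

```python
import shutil
from datetime import datetime
from pathlib import Path


def make_backup(data_dir, backup_dir, keep=7, now=None):
    """Write a timestamped .zip of data_dir into backup_dir and prune old archives."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    data_dir, backup_dir = Path(data_dir).resolve(), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%dT%H%M%S")
    base = backup_dir / f"{data_dir.name}_{stamp}"
    archive = Path(shutil.make_archive(str(base), "zip", root_dir=data_dir))
    # Timestamped names sort chronologically, so the oldest archives come first.
    archives = sorted(backup_dir.glob(f"{data_dir.name}_*.zip"))
    for old in archives[:-keep]:
        old.unlink()
    return archive
```

A script like this could be run by the operating system's task scheduler. It does not replace an off-site copy; the backup folder would still need to live on, or be synchronized to, storage in a different location.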

For more information on backup and storage options contact your local IT support staff or UO Campus Information Services.

If you will be dealing with sensitive data, or Intellectual Property (IP), you may need to follow certain requirements for managing that data. See the Sensitive Data page and Innovation Partner Services for more information.

Other things you can do:

Data can be protected by limiting access to data during the project, through accounts and account permissions.

Protect your system:

  • Keep updated anti-virus protection on every computer.
  • Maintain up-to-date versions of all software and storage devices.
  • If your system is connected to the Internet, use a firewall.

Protect data integrity:

  • Record the original creation date and time for files on your systems.
  • Use encryption, electronic signatures, or watermarking to keep track of authorship and changes made to data files.

NOTE: Repositories and data centers are for preserving access to shared data, not for backup/storage during a research project.

When writing a paper or doing a presentation, it is important to cite not only the literature consulted but also the data files used, even if they are data files that you have produced. Citing data allows easier access to datasets, increases acceptance and use of data and incorporates it into the scholarly record, provides verification of research, encourages future study, and gives the data producer appropriate credit.

Citation Elements

A data citation should include at least the following elements. The utility of these elements will depend on the research discipline, source data center/repository, and data format.

  • Responsible party (e.g., study PI, sample collector, government agency)
  • Title of table, map, or dataset with any applicable unique IDs
  • Version
  • Name of data center, repository, and/or publication
  • Analysis software, if required
  • Date accessed
  • URL and/or DOI/DOI link or other persistent link
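
The elements above can be assembled mechanically. The Python sketch below joins them in the order listed and skips any that are absent. It is not an official citation style; the function name and the punctuation are our own choices, and your journal or repository may prescribe a different format.

```python
def format_data_citation(responsible_party, title, repository, identifier,
                         version=None, software=None, accessed=None):
    """Join the citation elements in the order listed above, skipping absent ones."""
    parts = [responsible_party, title]
    if version:
        parts.append(f"Version {version}")
    parts.append(repository)
    if software:
        parts.append(f"Analysis software: {software}")
    if accessed:
        parts.append(f"Accessed {accessed}")
    parts.append(identifier)
    # Strip stray trailing periods so elements are separated by exactly one.
    return ". ".join(part.strip().rstrip(".") for part in parts)


# All values below are made up for illustration, including the DOI.
citation = format_data_citation(
    "Example Lab",
    "Stream temperature readings",
    "Example Repository",
    "https://doi.org/10.1234/example",
    version="2",
    accessed="2024-01-15",
)
```

Here citation is "Example Lab. Stream temperature readings. Version 2. Example Repository. Accessed 2024-01-15. https://doi.org/10.1234/example".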

You can create a data citation by entering the DOI on this site.

Data Citation Guidelines and Examples

Style guides/manuals typically do not include data as a resource type. However, some journals, data centers/repositories and societies may provide more specific guidance on how to cite data.

Here are some examples:

Data Citation Tools

Most bibliographic management software programs do not provide templates for citing data sets. However, they can be used to store citations to data sets.

Depending on the research discipline, data can often be deposited in one or more data centers (or repositories) that will provide access to the data. These repositories may have specific requirements for:

  • subject/research domain
  • data re-use and access
  • file format and data structure, and
  • metadata.

See sharing data for guidelines on what to consider as you select a place for your data.

You may want to use the University of Oregon Libraries' institutional repository, called Scholars' Bank, for your data.

Some journals and societies have also published basic criteria for repositories. For instance, see the Earth System Science Data (journal) repository criteria.

Discipline-related Repositories

See also: browse or search the re3Data.org list of repositories, or the DataBib list of repositories and other directories of repositories.

Chemistry

  • Cambridge Structural Database - small molecule crystal structures
  • ChemSpider - free-to-access collection of chemical structures and their associated information
  • eCrystals - x-ray crystallographic data
  • PubChem - NCBI's repository of bioactivity/bioassay data and information for "small" molecules (i.e. not macromolecular). Both text-based and structure-based search tools are provided

Computer Science

Environmental and Geosciences

GIS and Geography

Life and Biological Sciences

Physics

Social Sciences

  • ICPSR (Inter-university Consortium for Political and Social Research) A non-profit, membership-based data archive located at the University of Michigan. The UO is a member of ICPSR, which allows students, staff, and faculty to access ICPSR data files and documentation for research.

    You can also deposit your data with ICPSR
     
  • Dataverse Network is a collection of social science research data contained in virtual data archives called "dataverses". It is maintained by the IQSS (Institute for Quantitative Social Sciences at Harvard). You can create your own "dataverse" and upload your data, subject to certain terms.

Directories of Repositories

re3data ("REgistry of REsearch REpositories") List of repositories

DataBib List of repositories

DataCite List of Repositories Compiled by the British Library, BioMed Central, and the UK's Digital Curation Centre.

Distributed Data Curation Center: Other Data Repositories Managed by Purdue University Libraries, the Distributed Data Curation Center lists more than 50 open data repositories from a range of science disciplines.

Gene Expression Omnibus The Gene Expression Omnibus (GEO) is an open data repository which provides access to microarray, next-generation sequencing, and other forms of functional genomic data submitted by the scientific community.

Global Change Master Directory The Global Change Master Directory, maintained by the Earth Sciences Directorate at the National Aeronautics and Space Administration (NASA), provides access to more than 25,000 earth and environmental science data sets, relevant to global change and Earth science research.

MIT Data Management and Publishing: Sharing Your Data The MIT Libraries' subject guide on data management and publishing includes a list of open data repositories spanning the disciplines of astronomy, atmospheric science, biology, chemistry, earth science, oceanography and space science.

Oceanographic Data Repositories Funded by the National Science Foundation, the Biological and Chemical Oceanography Data Management Office (BCO-DMO) provides access to several oceanographic data repositories created by the US Joint Global Ocean Flux Study and US Global Ocean Ecosystem Dynamic programs.

Open Access Directory: Data Repositories Launched in 2008 and hosted by the Graduate School of Library and Information Science at Simmons College, the Open Access Directory is a wiki that lists links to over 50 open data repositories in the disciplines of archaeology, biology, chemistry, environmental sciences, geology, geosciences and geospatial data, marine sciences, medicine and physics, as well as multidisciplinary open data repositories.

Public Data Sets on Amazon Web Services Amazon Web Services provides a centralized place to download public domain and non-proprietary astronomy, biology, chemistry and climatology data sets.

Research data can be defined as: "the recorded factual material commonly accepted in the scientific community as necessary to validate research findings." (OMB Circular A-110). Research data covers a broad range of types of information, and digital data can be structured and stored in a variety of file formats.

Research data does not include:

  • Trade secrets, commercial information, materials necessary to be held confidential by a researcher until they are published, or similar information which is protected under law; and
  • Personnel and medical information and similar information the disclosure of which would constitute a clearly unwarranted invasion of personal privacy, such as information that could be used to identify a particular person in a research study.

Records Management

 Although they might not be addressed in a data management plan, non-research records (e.g. consent forms, correspondence, and other project files) should also be managed. See University Libraries Records Management for more details about how to manage these kinds of records.

File formats and file naming according to standards are necessary to ensure that your data can be uniquely identified and made accessible for future uses. When selecting tools for storing your data, pay special attention to the output formats of your data.

For preservation purposes, whenever possible use data formats that are:

  • Open standard
  • In an easily re-usable format (e.g. .txt as opposed to .pdf)

When listing out the data format you will be using, make sure to include:

  • Software necessary to view the data (e.g. SPSS v.3; Microsoft Excel 97-2003)
  • Information about version control
  • If data will be stored in one format during collection and analysis and then transferred to another format for preservation: List out features that may be lost in data conversion such as system specific labels.

Example Preferred File Formats

  • Containers: TAR, GZIP, ZIP
  • Databases: XML, CSV
  • Geospatial: SHP, DBF, GeoTIFF, NetCDF
  • Moving images: MOV, MPEG, AVI, MXF
  • Sounds: WAVE, AIFF, MP3, MXF
  • Statistics: ASCII, DTA, POR, SAS, SAV
  • Still images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP
  • Tabular data: CSV
  • Text: XML, PDF/A, HTML, ASCII, UTF-8
  • Web archive: WARC

 https://library.stanford.edu/research/data-management-services/data-best-practices/best-practices-file-formats

UK Data Archive recommendations for file formats: Managing and Sharing Data

File Naming

1. Be consistent.

  • Have conventions for naming (1) Directory structure, (2) Folder names, (3) File names
  • Always include the same information (e.g. date and time)
  • Retain the order of information (e.g. YYYYMMDD, not MMDDYYYY)

2. Be descriptive so others can understand your meaning.

  • Try to keep file and folder names under 32 characters
  • Within reason, include relevant information such as:
    • Unique identifier (e.g. Project Name or Grant # in folder name)
    • Project or research data name
    • Conditions (Lab instrument, Solvent, Temperature, etc.)
    • Run of experiment (sequential)
    • Date (in file properties too)
  • Use application-specific codes in a 3-letter, lowercase file extension: mov, tif, wrl
  • When using sequential numbering, make sure to use leading zeros to allow for multi-digit versions. For example, a sequence of 1-10 should be numbered 01-10; a sequence of 1-100 should be numbered 001-100 (e.g. 001, 010, 100).
  • No special characters: & , * % # ; ( ) ! @ $ ^ ~ ' { } [ ] ? < > -
  • Use only one period, placed before the file extension (e.g. name_paper.doc, NOT name.paper.doc OR name_paper..doc)

Example: Project_instrument_location_YYYYMMDD[hh][mm][ss][_extra].ext
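
As a sketch of how the example pattern could be applied consistently, the Python helpers below build a file name from its parts and zero-pad run numbers. The function names, and the choice to replace every character other than letters and digits with an underscore, are assumptions made for this example; adapt them to your own conventions.

```python
import re
from datetime import datetime

# Letters and digits only, in line with the "no special characters" rule.
_UNSAFE = re.compile(r"[^A-Za-z0-9]+")


def clean(part):
    """Replace runs of disallowed characters with a single underscore."""
    return _UNSAFE.sub("_", part).strip("_")


def build_name(project, instrument, location, when, ext, extra=None, with_time=False):
    """Build Project_instrument_location_YYYYMMDD[hhmmss][_extra].ext."""
    stamp = when.strftime("%Y%m%d%H%M%S" if with_time else "%Y%m%d")
    parts = [clean(project), clean(instrument), clean(location), stamp]
    if extra:
        parts.append(clean(extra))
    # Lowercase extension with exactly one period before it.
    return "_".join(parts) + "." + ext.lower().lstrip(".")


def run_label(number, total):
    """Zero-pad a run number so that names sort correctly up to `total`."""
    return str(number).zfill(len(str(total)))
```

For example, build_name("WRS", "pH meter", "Eugene", datetime(2024, 3, 5), "CSV") gives "WRS_pH_meter_Eugene_20240305.csv" (the project, instrument, and location are made up), and run_label(7, 100) gives "007".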

File Renaming Applications: Bulk Rename Utility, Renamer, PSRenamer, WildRename

Version Control

Keep track of versions of files.

Manually: Use a sequential numbered system: e.g. v01, v02

OR:

Use version control software (e.g. Bazaar, TortoiseSVN, Mercurial, Git)
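
If you number versions manually, a small helper can keep the numbering consistent. The Python sketch below assumes the version is written as a _vNN suffix at the end of the file name, which is our own convention for this example. It does not track what changed between versions, which is what version control software is for.

```python
import re
from pathlib import Path

# Matches a trailing version suffix such as _v01 or _v12.
_VERSION = re.compile(r"_v(\d+)$")


def next_version_path(path):
    """Return the path with its _vNN suffix incremented, or _v01 added if absent."""
    path = Path(path)
    match = _VERSION.search(path.stem)
    if match:
        digits = match.group(1)
        # Keep the existing zero-padding width when incrementing.
        number = str(int(digits) + 1).zfill(len(digits))
        stem = path.stem[:match.start()] + f"_v{number}"
    else:
        stem = path.stem + "_v01"
    return path.with_name(stem + path.suffix)
```

For example, next_version_path("results_v01.csv") returns a path named "results_v02.csv", and a file with no suffix yet gets "_v01".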

National Science Foundation

NSF guidance is included in the revised NSF Proposal & Award Policies & Procedures Guide. The change upholds the existing guidelines advocating open data, "[NSF] expects PIs to share with other researchers, at no more than incremental cost and within a reasonable time, the data, samples, physical collections and other supporting materials created or gathered in the course of the work."
 

“... Proposals must include a supplementary document of no more than two pages labeled “Data Management Plan”. This supplement should describe how the proposal will conform to NSF policy on the dissemination and sharing of research results (see AAG Chapter VI.D.4), and may include:

  1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project;
  2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies);
  3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements;
  4. policies and provisions for re-use, re-distribution, and the production of derivatives; and
  5. plans for archiving data, samples, and other research products, and for preservation of access to them. Data management requirements and plans specific to the Directorate, Office, Division, Program, or other NSF unit, relevant to a proposal are available at: http://www.nsf.gov/bfa/dias/policy/dmp.jsp. If guidance specific to the program is not available, then the requirements established in this section apply.”

Levels of NSF Policies

The NSF has published its policies for Data Management Plans in several documents, each of which is more specific than, and adds to, the last.

In order to find the NSF policies for your discipline, we recommend going to the NSF's Dissemination and Sharing of Research Results page and looking from the bottom-up:

  1. First, look on the page for the "Requirements by Directorate, Office, Division, Program, or other NSF Unit" section to see if there are specific guidelines for your field.
  2. If there's no specific document for your field, look to the more general "NSF Data Management Plan Requirements" section, which applies to all fields that don't have individualized instructions.
  3. It can also be helpful to read over the "NSF Data Sharing Policy" section of the page, which explains NSF expectations at a more general level than the other sections, or the Frequently Asked Questions page.

National Institutes of Health

See the NIH Statement on Sharing Research Data, which includes the following:

All investigator-initiated applications with direct costs greater than $500,000 in any single year will be expected to address data sharing in their application...

In some cases, Program Announcements (PA) may request data sharing plans for applications that are less than $500,000 direct costs in any single year...

The rights and privacy of people who participate in NIH-sponsored research must be protected at all times. Thus, data intended for broader use should be free of identifiers that would permit linkages to individual research participants and variables that could lead to deductive disclosure of the identity of individual subjects. When data sharing is limited, applicants should explain such limitations in their data sharing plans.

NIH has also outlined Key Elements to Consider in Preparing a Data Sharing Plan

Other Agency Guidelines and Policies

This is not a comprehensive list, but provides examples of other agency guidelines.

Intellectual Property Rights

Intellectual property (IP) is a societal innovation to manage relationships among competing groups by defining a role for creators, enforceable by statute and contract. The NSF Data Management Plan does not change the way intellectual property has been handled under federal awards. Universities will still be able to hold copyright in works created under the award and obtain title to patents conceived or reduced to practice under the award. What research projects will need to do is to articulate how they are providing permissions/licenses to the data and this may or may not involve intellectual property rights depending on the type of data.

Bringing Data Into Your Research Project

It is possible that your project may need to arrange for access to third party data or associated research artifacts that may have specific limitations in how they can be distributed (based on IP or the agreement by which your project obtained the data or artifact). The Office of Innovation Partnership Services (IPS, formerly Technology Transfer Services) can help your project obtain permissions. Your research project may also have received data under confidentiality or other restrictions that will need to be identified and explained in your data management plan. IPS is also happy to assist you in handling these issues.

Facts alone are not copyrighted, but their arrangement may be sufficiently original expression to merit copyright. For databases, there may be a mix of copyright and data for your project to consider. Some countries recognize certain rights in databases (e.g. Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases). If you receive data that holds no copyright, the agreement or permissions you obtain the data under may be a simple donation or a bailment, conditioned on your respecting certain rules called out in the permission.

Using Intellectual Property Rights To Make Your Data Available

Your research project should consider what permissions are appropriate for users when you make the data and/or copyrighted works from your research project available. You should consider potential users other than the federal government because it already receives a non-exclusive, royalty free license for government purposes to copyrighted works and data created under federal awards (2 CFR 15). There are a number of factors for you to consider (e.g., attribution, notification of use, redistribution, quality control/standards, and risk).

In some instances, your research project will not be concerned about any of these factors and may effectively donate the data to the public. In other instances, you will want to create some type of "commons" with respect to sharing and use of the data that sets expectations for what community members agree to for sharing. You may also need to restrict certain portions of the data (or types of data) based on restrictions you agreed to in receiving data from a third party. Intellectual property rights such as a copyright are simply tools that are used as part of the permissions (or license) your research project provides access to the data under. You can find more information on how to create a permissions statement in the Constructing Data Permissions section of this website.

Inventions

In the event that your project recognizes that an invention has resulted from federally funded research, contact the University of Oregon's Office of Innovation Partnership Services to complete an invention disclosure and discuss your goals for the work. IPS works with research projects at any stage to help them think creatively about dissemination and knowledge transfer strategies. Ideally, this discussion precedes the dissemination of data under the data management plan and allows your research project to execute on your plan for distribution of research artifacts created under your award. The university has an obligation to disclose each new invention created under the grant to the federal funding agency within two months after the inventor discloses it in writing to the university. It is not always clear when an "invention" has arisen, so feel free to contact IPS anytime you have a question.

Lab notebooks, whether in print or electronic form, are a critical component of tracking and recording research. Consistent documentation of your research methods, calculations, and results is important not only for your personal use, but will help when you publish or otherwise share research, and when others want to reproduce what you have done.

Listed below are links to several guidelines. Please let us know if there are other guidelines that are used in your lab, institute, or department.

How to Start -- and Keep -- a Laboratory Notebook: Policy and Practical Guidelines (ipHandbook of Best Practices)

See also:

What is Metadata?

Metadata is a term that has primarily been used by library and archives communities to describe standards used to aid the discovery of objects. Metadata standards are composed of metadata elements, sometimes called metadata fields. Metadata standards are created to facilitate searching similar items by using similar terms and constructs to describe them. A metadata record consists of all the metadata elements describing an object. Metadata records are often expressed in XML or other machine-readable formats for easy integration within systems.

There are three basic categories of metadata elements: descriptive, technical/structural, and administrative. All objects also have a unique identifier metadata element.

  1. Descriptive metadata elements consist of information about the content and context of an object. For example, descriptive metadata for an image may include: title, creator, subject (tags), and description (abstract).
  2. Technical/structural metadata elements describe the format, process, and inter-relatedness of objects. For example, technical/structural metadata for an image may include: camera, aperture, exposure, file format, and set (if in a series).
  3. Administrative metadata elements describe information needed to manage or use the object. For example, administrative metadata for an image may include: creation date, copyright permissions, required software, provenance (history), and file integrity checks.
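
To show what a machine-readable metadata record can look like, the Python sketch below builds a small XML record using element names from the Dublin Core element set. The wrapping metadata element, the function name, and all values are made up for illustration, and the comments mapping each element to a category follow the three categories above rather than any official Dublin Core classification. A repository will usually define its own required fields.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a <metadata> element with one dc:* child per (name, value) pair."""
    record = ET.Element("metadata")
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Hypothetical values; the category comments are our own mapping.
record = dublin_core_record([
    ("identifier", "example-0001"),              # unique identifier
    ("title", "Stream temperature readings"),    # descriptive
    ("creator", "Example Lab"),                  # descriptive
    ("format", "text/csv"),                      # technical/structural
    ("date", "2024-03-05"),                      # administrative
    ("rights", "CC BY 4.0"),                     # administrative
])
xml_text = ET.tostring(record, encoding="unicode")
```

The resulting xml_text holds one dc: element per field, which can be saved next to the data file or pasted into a deposit form that accepts XML.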

Metadata Guidelines

Data centers and repositories may require specific metadata standards in order to deposit data. Check with any repositories before you begin outlining the metadata plan for your data. If you are unaware of what metadata fields are required for your repository, contact ResearchDataMgmt@uoregon.edu.

A good starting place for a metadata plan if a standard has not been defined for your discipline is Dublin Core or DataCite's recommendations. The UO Libraries Digital Library Initiatives group is happy to help with the instructions and/or application of these standards. You may also want to look at various metadata fields used in Dryad or other data repositories to see how other researchers are describing their data.

If your discipline or repository does not require a specific metadata standard, the UO Libraries Digital Library Initiatives group can help advise. Depending on the complexity of description, the number of hours required to create a metadata plan can vary. Please make sure to meet with Metadata Services and Digital Projects (MSDP) to budget for developing a metadata plan before submitting your grant.

Metadata Best Practices

Good data documentation includes information on:

  • the context of data collection: project history, aims, objectives and hypotheses
  • data collection methods: data collection protocol, sampling design, instruments, hardware and software used, data scale and resolution, temporal coverage and geographic coverage
  • dataset structure of data files, cases, relationships between files
  • data sources used
  • data validation, checking, proofing, cleaning and other quality assurance procedures carried out
  • modifications made to data over time since their original creation and identification of different versions of datasets
  • information on data confidentiality, access and use conditions, where applicable

At data-level, datasets should also be documented with:

  • names, labels and descriptions for variables, records and their values
  • explanation of codes and classification schemes used
  • codes of, and reasons for, missing values
  • derived data created after collection, with code, algorithm or command file used to create them
  • weighting and grossing variables created
  • data listing with descriptions for cases, individuals or items studied

Variable-level descriptions may be embedded within a dataset itself as metadata. Other documentation may be contained in user guides, reports, publications, working papers and laboratory books. (from UK Data Archive)
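
One lightweight way to keep data-level documentation with the data is a codebook file with one row per variable. The Python sketch below writes such a codebook as CSV and reports data columns that have no entry. The column layout, function names, and example variables are hypothetical and are not a UK Data Archive format.

```python
import csv
import io

FIELDS = ["variable", "label", "type", "codes", "missing"]

# Hypothetical variables for illustration only.
CODEBOOK = [
    {"variable": "site_id", "label": "Sampling site", "type": "string",
     "codes": "", "missing": ""},
    {"variable": "temp_c", "label": "Water temperature (degrees Celsius)",
     "type": "float", "codes": "", "missing": "-999 = sensor failure"},
    {"variable": "land_use", "label": "Dominant land use", "type": "integer",
     "codes": "1 = forest; 2 = agriculture; 3 = urban",
     "missing": "9 = not recorded"},
]


def write_codebook(entries, handle):
    """Write codebook entries as CSV so the documentation travels with the data."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)


def undocumented(columns, entries):
    """Return data columns that have no codebook entry."""
    documented = {entry["variable"] for entry in entries}
    return [name for name in columns if name not in documented]
```

Running undocumented against the header row of each data file before deposit is a quick check that every variable has a label, its codes explained, and its missing-value codes recorded.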

Additional Information

If possible, identify authors/contributors with unique identifiers such as the Open Researcher & Contributor ID (ORCID).

Register public data sets with DataCite (this may be done automatically by some repositories, so confirm with them).

These are recent recommendations by JISC Managing Research Data Program. Contact the Data Services Librarian, ResearchDataMgmt@uoregon.edu for more information.

Metadata in Action - Examples

The following are examples of items with metadata highlighted in purple.

Flickr Metadata Example

Fig. 1. Image metadata in Flickr with title, user (creator), creation date, camera used, photostream (group or relation), tags, copyright information, and privacy setting. See item in Flickr, and additional metadata.

Dryad Metadata Example

Fig. 2. Data set in Dryad with title, bibliographic citation of published work, identifier, description, data package identifier, keywords, date deposited, file name, file size, file format, file type, and copyright information. See item in Dryad and full metadata view.

Other places to find metadata in action

  • If datasets contain identifiable information on human subjects, a document outlining the terms and conditions of future access should be created by the research unit and deposited with the data.
  • De-identification or anonymization of data can sometimes be used to prepare data so that data can be shared without including information that might identify the participants in a study.
    • Examples of Identifiers in data that will need to be removed [Note: this is not an exhaustive list]:
      • Direct identifiers: data elements gathered that directly identify a respondent, like a complete name or address
      • Indirect identifiers: variables that when aggregated can identify a person
      • Geographic location-embedded data (like geo-referenced information or street addresses).
  • As an alternative, datasets can become restricted use collections in a repository if de-identification impedes the research value of the data.
    • Restrict access to a list of approved researchers so that mediated access is available.
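
As a partial illustration of removing direct identifiers, the Python sketch below drops name and address fields and replaces a participant ID with a keyed pseudonym, so that records from the same person can still be linked. The field names, the function names, and the 12-character code length are assumptions for this example. It does nothing about indirect identifiers or geographic detail, so it should not be treated as sufficient de-identification on its own; consult UO Research Compliance Services about what your protocol requires.

```python
import hashlib
import hmac

DIRECT_IDENTIFIERS = {"name", "address"}  # assumed column names


def pseudonym(value, secret_key):
    """Derive a stable code from an identifier using a keyed hash (HMAC-SHA256)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def deidentify(records, secret_key, id_field="participant_id"):
    """Drop direct identifiers and replace the ID field with a pseudonym."""
    cleaned = []
    for record in records:
        row = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        row[id_field] = pseudonym(str(record[id_field]), secret_key)
        cleaned.append(row)
    return cleaned
```

The secret key must be stored separately from the shared data and never deposited with it; anyone holding the key could recompute the pseudonym for a known ID.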

UO Specific Resources

UO Research Compliance Services provides detailed information on protocols for protecting respondent confidentiality.

General Resources

Sharing data can be as simple as sending a file to a colleague. There are other ways to make your data available, and while they may require more work up front, they can increase access to (and citation of) research data.

Why share your data?

How to share your data

Preserving the data in data centers or repositories which are managed by trusted entities for long-term access is the most common way to share data. Other options are to share directly with colleagues via email, or collaborative networks.

In many cases, repositories and data centers will have their own policies regarding access permissions. If you are going to use a repository/data center, check their policies before constructing your own access permissions or including them in a data management plan.

There is a growing body of guidance on how to cite data sets, and groups such as Dryad and DataCite are working on ways of tracking the use of data sets.

Registering published data sets with DataCite will facilitate finding and citing the dataset. A DOI or other persistent identifier makes it that much easier for others to cite your data. Contact ResearchDataMgmt@uoregon.edu for help with registering your data sets.
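
As a small illustration of how a persistent identifier supports citation, the Python sketch below assembles a data citation from a handful of fields. The creator, year, title, publisher, identifier ordering is assumed here as one common pattern for data citations; the function name and the example values, including the DOI, are hypothetical placeholders. Check the citation style your repository or publisher expects.

```python
def format_data_citation(creators, year, title, publisher, doi):
    """Build a citation string of the form: Creators (Year). Title. Publisher. DOI link."""
    authors = "; ".join(creators)
    return f"{authors} ({year}). {title}. {publisher}. https://doi.org/{doi}"


if __name__ == "__main__":
    # All values below are placeholders, not a real dataset.
    print(format_data_citation(
        ["Doe, J.", "Roe, R."],
        2012,
        "Stream temperature survey",
        "Example Repository",
        "10.1234/example",
    ))
```

Because the DOI is written as a link, a reader of the citation can follow it directly to the dataset's landing page.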

Constructing access permissions

Types of permissions
Your research group should consider the permissions you wish to use for making data available under your Data Management Plan. There are a number of important factors to consider, and a particular repository or other distributor may enforce constraints or specific rules on what you can and cannot require when making your project's data available through them.

This section describes the options that may be available and is meant to stimulate your thinking about what sharing data means given the type of data you are making available, the culture and expectations of the field in which the data is shared, and any limitations that arise because you include data that carries its own permissions or use a resource with its own particular rules. Funders such as NSF will also have expectations about which types of data need to be shared. While they give researchers significant discretion over how data is shared, they expect a clear rationale, particularly for data with significant restrictions.

Permission checklist

  1. Does your research project have the permissions necessary to disseminate the project data?
    • Did any of the data come from a third party source?
    • If so, did the project obtain permission to disseminate?
    • Are there any restrictions you need to include in your permission statement?
  2. Do you need to provide access to all the data produced under a grant?
    For NSF purposes, data required to be shared will be determined by the community of interest through the process of peer review and program management. The federal government in Circular A-110 (2 CFR 215) defines the default terms and conditions for recipients of federal funding with respect to data rights and provides specific guidelines on what research data is not required to be shared or archived. These include:
    • preliminary analyses
    • drafts of scientific papers
    • plans for future research
    • peer reviews, or communications with colleagues
    • physical objects (e.g., laboratory samples)
    • trade secrets
    • commercial information
    • materials necessary to be held confidential by a researcher until they are published, or similar information which is protected under law
  3. Does your data include any private information, medical information, or other information with possible confidentiality concerns? 
    The Federal government also defers data sharing compliance for data including "personnel and medical information and similar information the disclosure of which would constitute a clearly unwarranted invasion of personal privacy, such as information that could be used to identify a particular person in a research study."
  4. Would the project like Attribution/Acknowledgment to be required or requested?
    If you require attribution, you could:
    • provide a specific citation in your permission statement; or
    • provide a link to a URL on your website so that users can find the most current citation; or
    • ask the user to contact you in person for the appropriate information
  5. Do you want to indicate that you expect attribution in a certain form without requiring compliance?
    If you are requesting attribution/acknowledgment, you could use the same mechanisms as above but would only make the request based on the research user's good will. It may help to explain why you are making the request. Attribution is very important in the academic environment, both for building the reputation of a research team and for understanding the provenance of the results. It is also important to consider whether, for certain types of data, attribution requirements may present difficulties and negatively impact utilization of your data.
  6. Would the project like to receive information regarding the use of the project data by users?
    Information demonstrating use is often very important in illustrating uptake and validation of your research. Showing broad adoption and utilization can be a factor to feature in future grant applications and is also an opportunity for your research group to grow its network and engage in relationships that might otherwise not occur. You should balance any gating of access against your research group's ability to respond promptly and reasonably to the research community. In some cases you may only wish to request that data users provide you information regarding proposed use and that they send a copy of any publication that results from use of the data.
    If you require the reporting of use, do you want:
    • the user to request access in order to receive the data?
    • a copy of the publication to be sent to you?
  7. Does the project need to be the source of the project data for quality control or other research integrity reasons (such as privacy) or would the project like to provide permission for users to redistribute project data under certain conditions?
    • Not allowed
    • Allowed but no sale or commercial use
    • Allowed but limited in scope

In some cases, certain types of data are not appropriate for all audiences. For example, drug interaction data may be complex and require users to have healthcare professional credentials in order to be responsibly utilized. This may require your research team or institution to remain the source of data distribution and the mechanisms used to distribute data may need to be more formal to protect confidentiality or other elements of the data. The Federal government understands that in some cases there may be incremental costs associated with making data available, and your research group may charge fees to recover these costs. The Office of Innovation Partnership Services (formerly Technology Transfer Services) can help you with data dissemination that charges for incremental expenses.

Examples of Permissions Language
The following examples are provided for you to think about the aspects of data exchange and sharing that are important to your research project and how you might actually articulate your expectations.  Please note that your optimal choice of permissions may conflict with the rules of a repository or other preferred access mechanism for your data.  You may then need to balance whether it is better to comply with the repository rules or find alternatives, including the possibility of managing access through UO.

1. Unrestricted Donation
You may copy, modify, distribute and perform the work(s) or data, even for commercial purposes, all without asking permission provided that you: a) agree that we make no warranties about the work(s) or data, and disclaim liability for all uses of the work(s) or data, to the fullest extent permitted by applicable law; and b) when using or citing the work(s) or data, you should not imply endorsement by us.

1(a). Addition of Request for Attribution/Acknowledgment
We request that you cite our research project and applicable publications if you use our work(s) or data in your publications or presentations.

1(b). Addition of Request for a copy of Publications that used your work(s) or data
We request that if you use the work(s) or data in your publication that you provide us a copy of the publication.

2. Attribution Permission
You may copy, modify, distribute and perform the work(s) including data, even for commercial purposes, all without asking permission provided that you: a) cite our research project and publications as follows <<enter citation information here>> ; b) agree that we make no warranties about the work(s) or data, and disclaim liability for all uses of the work(s) or data, to the fullest extent permitted by applicable law; and c) when using or citing the work(s) or data, you should not imply endorsement by us.

3. Noncommercial Permission
You may copy, modify, distribute and perform the work(s) including data, solely for non-commercial purposes (no sales), all without asking permission provided that you: a) cite our research project and publications as follows <<enter citation information here>> ; b) agree that we make no warranties about the work(s) or data, and disclaim liability for all uses of the work(s) or data, to the fullest extent permitted by applicable law; and c) when using or citing the work(s) or data, you should not imply endorsement by us.

4. No Redistribution Permission
You may copy, modify, and perform the work(s) including data, solely for non-commercial purposes, all without asking permission provided that you: a) cite our research project and publications as follows <<enter citation information here>> ; b) agree that we make no warranties about the work(s) or data, and disclaim liability for all uses of the work(s) or data, to the fullest extent permitted by applicable law; and c) when using or citing the work(s) or data, you should not imply endorsement by us. Please contact <<research project contact here>> if you would like to request permission to redistribute the work(s) or data.

5. Notification Required for Permission
Please contact <<research project contact here>> to request permission to use the work(s) or data. Include your proposed use of the data to assist us in determining your eligibility and to help us navigate possible conflicts between research projects. We will provide you with a short data sharing agreement for you and your authorized institutional official to sign prior to your receiving the data.
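
One way to connect the permission checklist to the five examples above is sketched below in Python. The mapping is an illustrative assumption, not an official UO decision procedure: the field names are hypothetical, and the order of the checks simply picks the most restrictive example that matches the answers given.

```python
from dataclasses import dataclass


@dataclass
class ChecklistAnswers:
    # Hypothetical field names summarizing answers to the permission checklist.
    must_approve_each_request: bool  # project must remain the gatekeeper for the data
    allow_redistribution: bool       # users may pass the data on to others
    allow_commercial_use: bool       # sale or other commercial use is acceptable
    require_attribution: bool        # citation is required, not merely requested


def suggest_permission(answers):
    """Return the heading of the example permission that best fits the answers."""
    if answers.must_approve_each_request:
        return "5. Notification Required for Permission"
    if not answers.allow_redistribution:
        return "4. No Redistribution Permission"
    if not answers.allow_commercial_use:
        return "3. Noncommercial Permission"
    if answers.require_attribution:
        return "2. Attribution Permission"
    return "1. Unrestricted Donation"
```

For instance, a project that allows redistribution and commercial use but requires citation would be pointed to example 2. Note that examples 3 and 4 as worded above also require a citation, so a project choosing them without wanting mandatory attribution would need to adapt the language.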

Other examples of licenses and permission
There are also licenses and permissions that are provided by organizations interested in standardizing and streamlining the exchange of information, including works that are copyrighted. Creative Commons has a number of licenses/permissions that researchers have found very useful. These include:

  • Attribution
  • Attribution-ShareAlike
  • Attribution-NoDerivatives
  • Attribution-NonCommercial
  • Attribution-NonCommercial-ShareAlike
  • Attribution-NonCommercial-NoDerivatives
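
The six licenses listed above are combinations of a few elements: attribution is always present, and NonCommercial, ShareAlike, and NoDerivatives are optional. The short Python sketch below shows how the commonly used short codes (such as CC BY-NC-SA) follow from those choices. The function name and parameters are hypothetical, and the sketch does not cover license version numbers.

```python
def cc_license_code(non_commercial=False, share_alike=False, no_derivatives=False):
    """Compose the short code for a Creative Commons license from its elements."""
    if share_alike and no_derivatives:
        # ShareAlike governs derivative works, so it cannot be combined with NoDerivatives.
        raise ValueError("ShareAlike and NoDerivatives cannot be combined")
    parts = ["BY"]  # every license in the list above includes attribution
    if non_commercial:
        parts.append("NC")
    if share_alike:
        parts.append("SA")
    if no_derivatives:
        parts.append("ND")
    return "CC " + "-".join(parts)
```

With no options set, the function returns the code for the plain Attribution license; setting non_commercial and no_derivatives together gives the code for the most restrictive license in the list.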

One of CC's projects directed in part at data access is called CC0 "No Rights Reserved". In certain instances, this may be an option for making data available. You should only apply CC0 to your own work unless you have the necessary rights to apply CC0 to another person's work. For example, UO as an institution or a research sponsor may have an interest in certain types of data, in which case a waiver mechanism such as CC0 made by the research project would be ineffective.

The Open Source Initiative provides a list of licenses for software distribution that are used in countless projects. The Office of Innovation Partnership Services (IPS, formerly the Office of Technology Transfer Services) can provide more information on choosing the appropriate license/permission to meet your research project's goals or on assessing the impact of a funder's mandatory requirement of a specific license. IPS can also construct custom licenses/permissions for a project.

Open Science

Open Science (and a subset, Open Notebook Science) is gaining traction in certain disciplines. There is not a commonly held definition for open science, but broadly speaking it is a movement to promote greater sharing and transparency in science.

Open Notebook Science is: "a URL to a laboratory notebook that is freely available and indexed on common search engines. It does not necessarily have to look like a paper notebook but it is essential that all of the information available to the researchers to make their conclusions is equally available to the rest of the world. Basically, no insider information."

Case studies and examples

Science Commons projects:
