
Bite-Sized Research Data Management

01: Naming Conventions

 

[Image: NamingConventions]

Have a look at the filenames in the above attachment. How useful will they be in three years’ time, when those files are sitting in a folder called ‘For filing’ along with several other desktop dumps and hundreds of other, almost identical filenames? You shouldn’t rely on metadata like creation or modification dates for help - they’re often altered or lost when files move between operating systems, platforms and media.

Adopting a naming convention needn’t be a huge burden - the following suggestion might help you to plan a convention that would work for you.

 

   GRPH_GlacierRetreat50yrs_V01_20170127.pzfx

 

GRPH - Invent a personal (or lab) 4-character code for every type of file you create, e.g. ‘GRPH’ for graphs, ‘PRTN’ for presentations, ‘WBLT’ for Western blots etc. Avoid more common-sounding codes like ‘BLOT’ or ‘PRES’ because they may return false positives in a search.

GlacierRetreat50yrs - Add a meaningful and easy-to-read ‘keyword’ title, without punctuation or special characters. Most filesystems permit around 255 characters in a filename so there’s plenty of space to include good keywords in this element, but it’s better to be economical - use just enough to help you to find the file later.

V01 - A version number using 2 digits. Much better than ‘Thesis-final-final-final’.

20170127 - Date in YYYYMMDD format. This format is unambiguous, is readable by most computing systems, and puts files in chronological order when they are sorted by name.

PZFX - An intact, valid file extension is essential.

A convention such as the above will save a LOT of time in future, and make searching and sorting much easier, not only for you but also for colleagues or group leaders who might have to find crucial data on a disk that you leave behind at the end of your project.
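
If you create files from scripts, the convention above can also be generated and checked automatically. The following is only a sketch in Python, assuming the exact layout of the example filename; the function names and the regular expression are illustrative and not part of any standard tool.

```python
import re
from datetime import date

# Assumed layout: TYPE_Keywords_Vnn_YYYYMMDD.ext (as in the example filename above)
FILENAME_PATTERN = re.compile(
    r"^(?P<type>[A-Z]{4})"          # 4-character file-type code, e.g. GRPH
    r"_(?P<keywords>[A-Za-z0-9]+)"  # keyword title, no punctuation or special characters
    r"_V(?P<version>\d{2})"         # 2-digit version number
    r"_(?P<date>\d{8})"             # date as YYYYMMDD
    r"\.(?P<ext>[A-Za-z0-9]+)$"     # file extension
)


def build_filename(type_code, keywords, version, day, ext):
    """Assemble a filename and refuse anything that breaks the convention."""
    name = f"{type_code}_{keywords}_V{version:02d}_{day:%Y%m%d}.{ext}"
    if FILENAME_PATTERN.match(name) is None:
        raise ValueError(f"Filename does not follow the convention: {name}")
    return name


def parse_filename(name):
    """Return the elements of a conforming filename as a dict, or None."""
    match = FILENAME_PATTERN.match(name)
    return match.groupdict() if match else None


print(build_filename("GRPH", "GlacierRetreat50yrs", 1, date(2017, 1, 27), "pzfx"))
print(parse_filename("Thesis-final-final-final.docx"))
```

The first call reproduces the example filename; the second returns None because that name does not follow the convention.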

(ad327, February 2017)

 

 

 

02: Directory Structure

 

Example 1 - not great:

[Image: DirectoryStructureBad]

Example 2 - better:

[Image: DirectoryStructureBetter]

Devising and implementing a directory structure that suits your work is very easy - you could make a dramatic improvement in just one lunchtime! Consider sorting by category, or by project name - whatever suits the way you work.

MANIPULATING ALPHABETICAL ORDER:

A common trick for people who have a large number of top-level folders is to insert spaces/periods/underscores etc as prefixes in folder names, to force those directories to the top or bottom of an otherwise alphabetical order. Be aware, however, that these special characters may cause problems when describing a path to a directory in a terminal window, for example. Also consider any other people who may have to browse through your data - if they don’t see a ‘Protocols’ directory between ‘Projects’ and ‘Reagents’, they may assume it doesn’t exist.

DIRECTORY STRUCTURE TEMPLATES:

If your work requires a large list of headings which all have common subfolders (such as the list of projects above), it might be helpful to create a ‘template’ structure of empty folders/subfolders which you can copy/paste/rename as required, rather than building every project directory from scratch.
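
The copy/paste/rename step can also be scripted. Here is a minimal sketch in Python; the subfolder names and the function names are hypothetical examples, so substitute whatever structure suits your own work.

```python
import shutil
from pathlib import Path

# Hypothetical subfolders that every project should contain
TEMPLATE_SUBFOLDERS = ["Data/Raw", "Data/Processed", "Figures", "Protocols"]


def create_template(template_root):
    """Build the empty template structure once."""
    template_root = Path(template_root)
    for subfolder in TEMPLATE_SUBFOLDERS:
        (template_root / subfolder).mkdir(parents=True, exist_ok=True)
    return template_root


def new_project(template_root, projects_root, project_name):
    """Copy the template to start a new project directory."""
    target = Path(projects_root) / project_name
    # copytree raises FileExistsError if the project folder already exists,
    # so an existing project is never overwritten.
    shutil.copytree(template_root, target)
    return target
```

For example, `new_project("ProjectTemplate", "Projects", "GlacierRetreat")` would create `Projects/GlacierRetreat` with the same empty subfolders as the template.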

(ad327, March 2017)

 

 

03: Discoverability

We expend a lot of energy, and money, to make sure that our data remains safe and secure. But how much time is wasted by double-clicking through those very high-quality hard disks, when you don’t know the name or location of the file you’re looking for? Discoverability is important!

How will you find your data in 5 years’ time?

  •  Good directory structure;
  •  Good naming conventions.
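
As an illustration of how a naming convention pays off, here is a short Python sketch that finds every file of one type anywhere below a top-level folder. It assumes filenames begin with a 4-character type code followed by an underscore, as in the convention from note 01; the function name is hypothetical.

```python
from pathlib import Path


def find_by_type(root, type_code):
    """Return all files under root whose names start with the given type code."""
    matches = [p for p in Path(root).rglob(f"{type_code}_*") if p.is_file()]
    # Sorting by filename groups files with the same keywords together,
    # with version numbers in ascending order.
    return sorted(matches, key=lambda p: p.name)
```

Calling `find_by_type("Projects", "GRPH")` would list every graph file in every project, however deeply it is nested.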

How will your PI find your data after you’ve left the Institute?

  •   Good directory structure;
  •   Good naming conventions;
  •   Screenshots - can be a useful visual ‘map’ for someone unfamiliar with your complex directory structure.
  •   Read-me files - in certain directories, can provide explanatory information about what is nested within.

How will the broader scientific community find your data?

  •   Community repositories (e.g. FlyBase, Gene Expression Omnibus);
  •   General repositories (e.g. Apollo, Zenodo, GitHub);
  •   Extensive metadata - think of the search terms people will use to find your work, and enter as many keywords as you can when uploading your files.

Spend a little time just now and save yourself, and others, a lot of time in the future. Easy.

(ad327, April 2017)

 

 

04: Data Management Plans

It’s a requirement of all funding bodies that researchers describe what data will be produced in a project, how it will be managed during the project, and how it will become/remain accessible to the scientific community after the project. The normal method for compliance with these policies is to write a Data Management Plan (DMP).
 
Many funding bodies provide forms or templates for your DMP and ask very specific questions, so your first step should be to find out if your chosen organisation provides such a resource. If not, a template is available to download, or you can use http://www.dmponline.dcc.ac.uk. If you prefer to write your own from scratch, the key points for inclusion are:
 
  • What types of data will be created?
  • How will these data be processed?
  • How will they be stored and backed up? [1]
  • How will they be documented (inc. naming conventions, directory structures etc)? 
  • How will these data be of benefit to the broader scientific community?
  • How will they be archived and will they comply with any data/metadata standards?
  • How will they be made available and discoverable to the broader community?
  • What are the policies for sharing, re-use etc?
 
The primary aim is to convince the funding body that you will do good, reproducible work with their funds, and that this good work will be of benefit to the whole community. Beyond satisfying funders, we would also recommend the above approach as a useful organisational tool at the start of any postgraduate or postdoctoral research project.
 
 
Footnotes:
[1] For researchers in the Gurdon Institute, research data is stored in a large, secure, enterprise-class filestore with offsite replica and internal backups.

(ad327, May 2017)

 

 

05: The README file

How many of you have noticed files entitled 'README' as you’ve been browsing through directories or ftp sites, but haven’t ever opened them to see the contents?

As with many aspects of computing, including a README file has become an accepted good practice. README files store information about the other files in a directory or, most commonly, about the particulars of a computer programme or process. In other words, they tell you what the folder is about and how to find what you are looking for.

They can also be seen as a set of instructions for users, or as notes to self. For example, when doing online research and downloading a batch of articles into a folder, the README file could list the keywords used and the websites visited.

README files can hold ANY text-based information and instructions. They are usually plain text files, which you can create in an editor such as Notepad or TextEdit.

A README file is particularly useful:

  • when working on collaborative projects, whether the team is stable or changing;
  • when data are intended to be used by several people;
  • or simply as a “note to self” or record of your own work.


When making your own README file, make sure to:

  • Always include the date of each entry or modification.
  • Avoid editing previous text; add “amendments” instead.
  • Include a contact email address, preferably two in case one stops working.
  • Acknowledge the contribution of anyone who helped you to develop the contents of the folder/software.
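
The checklist above can be followed by hand in any text editor, but it is also easy to script. The sketch below, in Python, appends a dated entry to a README without touching earlier text. The file name README.txt, the entry layout and the function name are assumptions made for illustration, not a standard.

```python
from datetime import date
from pathlib import Path


def append_readme_entry(folder, text, contacts, acknowledgements="", today=None):
    """Append a dated entry to README.txt in folder, leaving earlier entries intact."""
    today = today or date.today()
    lines = [
        f"Entry date: {today:%Y%m%d}",       # date of the entry, as YYYYMMDD
        text,
        "Contact: " + ", ".join(contacts),   # ideally two addresses
    ]
    if acknowledgements:
        lines.append("Acknowledgements: " + acknowledgements)
    readme = Path(folder) / "README.txt"
    # Mode "a" only ever adds to the end of the file, so previous text
    # is never edited; corrections go in as new "amendment" entries.
    with readme.open("a", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n\n")
    return readme
```

Each call adds one block of text to the end of the file, so the README builds up as a chronological record.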

(avp25, June 2017)

 

 

 

Studying development to understand disease

The Gurdon Institute is funded by Wellcome and Cancer Research UK to study the biology of development, and how normal growth and maintenance go wrong in cancer and other diseases.

 
