What is data and why should you care? Dr. Kalpana Shankar

School of Information and Library Studies, UCD
5 November 2012
What do Apollo 11, the Domesday Project, and award-winning scientists from the US National Science Foundation have in common?
What is research data?
“The data, records, files or other evidence, irrespective of their content or form (e.g. in print, digital, physical or other forms), that comprise research observations, findings or outcomes, including primary materials and analysed data.” – Australian National Data Service
Examples:
•Statistics and measurements
•Results of experiments or simulations
•Observations e.g. fieldwork
•Survey results – print or online
•Interview recordings and transcripts
•Images, from cameras and scientific equipment
What is ‘data’?
Any information you use in your research
“The whole thing is incredibly dull.”
Why are we talking about data management?
“PhD students lose material all the time…and they are exactly the people who want to be backing up. These are people who are creating data which are life and death important to them”
Rising volume and complexity of research data
• According to the European Bioinformatics Institute, the volume of new biological data is doubling every 5 months
• For example, in genomics:
– We can now analyse the equivalent of a human genome every 14 minutes at a cost of $5,000, which is 400 times quicker than when the draft human genome was first published in 2000.
– 1,000 Genomes Project: 200 terabytes, the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs
A hard drive after 6 years’ research
Image by Lindsay Lloyd-Smith
113 GB
42,699 Files
3,466 Folders
So, why is data management important for research?
• It is increasingly integral to all areas of research
• It is a rapidly escalating issue
• It is important to research funders – likely to be increased follow-up in the future
• It has major resource implications – which need to be planned for carefully
• In short, it creates major challenges which aren’t going to go away!
Why data management is important to YOU (II)
What would happen to your data if there was a fire or theft in your office, department or home?
“Fire” by andrewmalone via Flickr: http://www.flickr.com/photos/andrewmalone/2032844649/
Writing a Data Management Plan
1. Formalises the definition of your research data
2. Documents the contextual and technical details of your data
3. Checks your file structure and naming
4. Plans for data sharing, access, and archiving
Getting started
• Your Data Management Plan won’t be perfect
• It is not a static document
– Change and update it as your research progresses and you understand more about your data
• Think about key issues that might affect your data…
o …while you work on them
o …in the future
• It’s better to have a plan that covers some aspects than no plan at all
• Ask for advice if you’re uncertain
Questions to ask yourself
• Platform: Windows, Macintosh and/or Unix?
• Objective: Store? Manage? Share? Publish?
• Extent of collaboration
– Your research group/lab only
– Your group + externals
– Cast of thousands?
• Nature of data?
– Level of security?
– Human records (de-identified)?
– Intellectual Property?
• Amount of data? MB? GB? TB?
– Rate of accumulation of data?
– How much needed online to do useful work?
– Period of preservation?
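The questions about amount, rate of accumulation and period of preservation can be turned into a rough storage estimate. The following is a minimal sketch in Python, assuming data accumulates at a constant monthly rate; the function name and the figures in the example are hypothetical and do not come from the talk.

```python
def projected_storage_gb(current_gb, added_per_month_gb, months):
    """Estimate total storage needed, assuming a constant rate of accumulation."""
    if months < 0:
        raise ValueError("months must be non-negative")
    return current_gb + added_per_month_gb * months


# Hypothetical example: 20 GB today, 5 GB of new data a month, 3-year project.
estimate = projected_storage_gb(current_gb=20, added_per_month_gb=5, months=36)
print(f"Plan for roughly {estimate} GB per copy")
```

Real projects rarely grow at a steady rate, so treat the result as a lower bound and multiply it by the number of back-up copies you intend to keep.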
By twechy (Flickr ID): “Library Bookshelf”
http://www.flickr.com/photos/twechy/6829994084/
CC BY 2.0
By Anne (Flickr ID: I like): “Voltaire & Rousseau”
http://www.flickr.com/photos/ilike/2616342739/
CC BY-NC-ND 2.0
Give your data a structure…
…it makes it easier to find things
Something to try:
Use post-it notes to create a map of your file structure
• Write each existing file and folder name onto a post-it
• Arrange folders on your desk in a sensible hierarchy
• Put your ‘files’ into ‘folders’
• Do you need new folders? Do you have too many?
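If you would rather start from a printout than from memory, a short script can list what is already on disk. This is a minimal sketch in Python using only the standard library; the function names are illustrative, and the folder you pass in is whatever directory holds your own research files.

```python
import os


def summarise_tree(root):
    """Return (file_count, folder_count) for everything below root."""
    file_count = 0
    folder_count = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        folder_count += len(dirnames)
        file_count += len(filenames)
    return file_count, folder_count


def print_tree(root):
    """Print the folders and files below root, indented by depth."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit sub-folders in alphabetical order
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        label = os.path.basename(os.path.abspath(dirpath))
        print("  " * depth + label + "/")
        for name in sorted(filenames):
            print("  " * (depth + 1) + name)
```

Calling `print_tree("my_project")` (a placeholder path) prints the hierarchy, and `summarise_tree("my_project")` gives the file and folder counts, which is a quick way to see whether you have too many folders or too few.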
What’s in a name?
• Names tell us what a file is (contextual information)
• Use a combination of different types of information to make context and content clear, e.g.
– Author (or Initials)
– Date
– Data source
– Theme
– Experiment
– Sample
• …But try not to let file names get too long
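One way to apply a naming convention consistently is to build file names from their parts instead of typing them by hand. The sketch below is one possible convention, not a standard: the order of the parts, the underscore separator and the 60-character limit are all assumptions chosen for illustration.

```python
import re
from datetime import date


def build_file_name(initials, when, source, theme, extension, max_length=60):
    """Join the parts with underscores, e.g. KS_2012-11-05_survey_backups.csv."""
    parts = [initials, when.isoformat(), source, theme]
    # Keep letters, digits and hyphens only, so names are safe on any platform.
    cleaned = [re.sub(r"[^A-Za-z0-9-]+", "-", p).strip("-") for p in parts]
    name = "_".join(p for p in cleaned if p) + "." + extension.lstrip(".")
    if len(name) > max_length:
        raise ValueError(f"file name is {len(name)} characters; limit is {max_length}")
    return name


print(build_file_name("KS", date(2012, 11, 5), "survey", "backups", "csv"))
```

Putting the date in year-month-day order means files sort chronologically when listed by name, and the length check is a reminder not to let names get too long.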
Why create documentation?
• Creating documentation might seem like a waste of time
• Good documentation will include a lot of information that might seem obvious
www.flickr.com/photos/smutjespickles/2434418686/
Document your data as you go
If you don’t, it may become impossible for you – or someone else – to understand and re-use data later on
Question Mark Sign by Colin_K on Flickr: http://www.flickr.com/photos/colinkinner/2200500024/
Make research material understandable
What’s obvious now might not be in a few months, years, decades…
Image: http://www.flickr.com/photos/archer10/5692813531/
MAKE SURE YOU CAN UNDERSTAND IT LATER
Make research reproducible
• Detailing your methodology helps people understand your research better
• Explaining your algorithms, search methods, etc. makes your work reproducible
• Conclusions can be verified
Image by woodleywonderworks on flickr:
http://www.flickr.com/photos/wwworks/4588700881/
Make material reusable
• Material may be reused by someone in a different discipline
• Provide context to minimise the risk of it being misunderstood or misused
Backing up
• Lots Of Copies Keeps Stuff Safe (LOCKSS): make multiple back-ups
• Keep back-ups in a separate place from the original
• Use different types of storage media, e.g. CDs, pen drives, networked storage, external hard drive
From: “Copy Copy Copy” by David Goehring (CarbonNYC) via Flickr
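The back-up advice above can be partly automated. The following is a minimal sketch in Python, using only the standard library, that copies one file to several destinations and checks each copy against the original with a SHA-256 checksum. The function names are illustrative, and the destination folders are whatever separate locations you have available, such as an external drive and a network share.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy source into each destination folder and verify every copy."""
    source = Path(source)
    original = sha256_of(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, folder / source.name))
        if sha256_of(copy) != original:
            raise IOError(f"copy at {copy} does not match the original")
        copies.append(copy)
    return copies
```

A script like this only helps if the destinations really are on different media and in different places; two folders on the same laptop do not count as multiple copies.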
For everything you keep….
Make sure you can:
• find it again later
• understand it later
Where to get help
• The Earth Institute will be putting up links on its website
• Your supervisor
• Library
• Funding agencies
Oh yes… what do Apollo 11, the Domesday Project, and award-winning scientists from the US National Science Foundation have in common?
Questions?
• My contact information:
– Kalpana Shankar (kalpana.shankar@ucd.ie)