Collecting & Organizing Data

  • You can usually recognize a file's data format by its filename extension.
  • When creating data, choose a format that is:
    • Self-describing, i.e., contains embedded metadata that help interpret the context and structure of the data file.
    • Lossless, i.e., retains data in its original state or preserves as much of the original information as possible.
    • Non-proprietary and compatible with different versions of software.
    • An open standard, with specifications that are well documented and accessible to the public.
    • Common, i.e., in widespread use.
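
As a small illustration of reading the format from the extension, the sketch below looks up a file's suffix in a table. The table `KNOWN_FORMATS` and the function `describe_format` are hypothetical names made up for this example, and the table covers only a handful of extensions; an extension is a hint, not a guarantee of what the file contains.

```python
from pathlib import Path

# Hypothetical lookup table (an assumption for this sketch);
# extend it with the formats you actually work with.
KNOWN_FORMATS = {
    ".csv": "comma-separated values",
    ".xls": "Microsoft Excel workbook",
    ".tiff": "TIFF image",
    ".r": "R script",
}

def describe_format(filename):
    """Return a short description based only on the filename extension."""
    suffix = Path(filename).suffix.lower()  # e.g. ".xls"
    return KNOWN_FORMATS.get(suffix, "unknown format")

print(describe_format("Eaffinis_nanaimo_20100901_FieldCounts.xls"))
```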
  • Choose a file name that is concise and meaningful.
  • Consider including dates in yyyy-mm-dd format. When referring to the year only,
    always use four digits.
  • Separate terms with a dash ( - ).
  • Avoid punctuation.
  • Use the same file structure everywhere: on all of your computers, in
    your Dropbox, and in your Google Docs.
  • Example: Species_Site_Date_FileType.FileExtension
    might be the base structure, and files might be:
    • Eaffinis_nanaimo_20100901_FieldCounts.xls
    • Eaffinis_nanaimo_20100901_ANOVAcode.R
    • Eaffinis_nanaimo_20100901_adult232.tiff
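
The base structure above can be turned into a small helper so that every file name is assembled the same way. This is a minimal sketch, not a prescribed tool: the function name `build_filename` and its parameters are assumptions for illustration, and the date is written as yyyymmdd to match the example files.

```python
from datetime import date

def build_filename(species, site, when, file_type, extension):
    """Join the parts as Species_Site_Date_FileType.FileExtension."""
    stamp = when.strftime("%Y%m%d")  # four-digit year, then month, then day
    return f"{species}_{site}_{stamp}_{file_type}.{extension}"

name = build_filename("Eaffinis", "nanaimo", date(2010, 9, 1), "FieldCounts", "xls")
print(name)  # Eaffinis_nanaimo_20100901_FieldCounts.xls
```

Building names through one function, rather than typing them by hand, helps avoid small inconsistencies such as a misspelled species code.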
  • Keeping track of changes to, or revisions of, data files is referred to as
    version control.
  • Use consistent file names, e.g., 2010-03-05 Female Health Survey Results; 2011-04-15 Female Health Survey Results, etc., where results are collated annually.
  • File hierarchy refers to the number of levels, or sub-folders, in the directory.
  • Folder direction determines how folders are ordered, e.g., Results > 2012 or 2012 > Results.
  • Ambiguous naming or overlapping categories, especially at the top level, can cause confusion.
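
Two of the points above can be sketched in a few lines: names that start with a yyyy-mm-dd date sort chronologically with an ordinary text sort, and folder direction is simply the order in which the path parts are joined. The helper `results_path` and its `year_first` flag are hypothetical names introduced for this example.

```python
from pathlib import PurePosixPath

# Date-prefixed names (as in the survey example) sort by date as plain text.
names = [
    "2011-04-15 Female Health Survey Results",
    "2010-03-05 Female Health Survey Results",
]
print(sorted(names))

def results_path(year, topic="Results", year_first=False):
    """Build either topic/year or year/topic, depending on the chosen direction."""
    parts = (str(year), topic) if year_first else (topic, str(year))
    return PurePosixPath(*parts)

print(results_path(2012))                   # Results/2012
print(results_path(2012, year_first=True))  # 2012/Results
```

Whichever direction is chosen, fixing it in one place keeps every folder ordered the same way.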