Backups and archives

Backups are not archives. We back up the system in order to recover from fires, earthquakes, disk failures and the stupid things that we all do from time to time. Archives are what you should create when you are finished with a project, so that you or your biographer or a history graduate student in the year 2435 will be able to reproduce your work. Creating an archive requires thought, organization and lots of notes. Backups are just snapshots of how the filesystem looked at a particular instant -- most likely just before you hit save on the file containing that cure for cancer.

Backup frequency

We can't back up everything all the time or we would run out of molecules in the universe, so we have to be a bit selective. That said, we really aren't all that selective. With a few exceptions, noted below, we back up every user file every day to a dedicated server, and we generally retain copies for 30 days (six months for home directories).

Every week we send a copy of the most recent backup to the cloud, where it is retained for two weeks.

Consequently, the only ways you can get in trouble are:

  • By breaking the law less than one month (or in some cases six months) before a subpoena arrives.
  • By deleting something important and forgetting to ask for help recovering it before a month goes by.
  • By deleting something important before it has been on the system long enough to get backed up (one full day to be safe).
  • By storing hard-to-replace files in places where you're supposed to store easy-to-replace files.

/72hours is not backed up

It is useful to have disk space allocated for "temporary" storage of large, intermediate and easily reproduced files: 10GB of IPUMS data that you can download again with one click, for example, or Stata datasets that you only write out in order to merge or collapse them. Even JSTOR articles for which you have stored the permanent link don't really need to be stored and backed up endlessly.

For this purpose we have directories called /72hours. Despite the name, stuff that you put in /72hours will be there for at least 60 days -- unless, of course, the disk fails, in which case it is gone, because we do not back up /72hours. Files also disappear from /72hours after 60 days without modification. That's a good thing. All that stuff that you download and lose interest in after 10 minutes? Put it in /72hours.
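If you want to see which of your files are getting close to the 60-day limit, something like the following Python sketch may help. It assumes that "without modification" refers to the file's modification time as reported by the filesystem; the actual cleanup mechanism is not described here, so treat the result as a rough guide. The function name stale_files and the 50-day threshold are made up for illustration.

```python
import os
import time


def stale_files(root, days, now=None):
    """Return a sorted list of files under root not modified in `days` days.

    Assumption: age is judged by modification time (st_mtime). The real
    cleanup job may use a different rule.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime < cutoff:
                stale.append(path)
    return sorted(stale)


if __name__ == "__main__":
    # List files untouched for 50 days, i.e. about 10 days from removal.
    for path in stale_files("/72hours", 50):
        print(path)
```

Anything on that list that you still care about probably belongs somewhere that is backed up.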

Two more features of /72hours:

  • Every server has its own /72hours directory. There is NOT just one that is shared across the network (like COMMONS). The good thing about this is that local disk is quicker to read from and write to, which can save you minutes when you are dealing with huge files.
  • It is good practice to store large, important (hard to reproduce) data files in COMMONS in compressed tar files and simply unpack them onto /72hours when you need them.
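The pack-then-unpack habit in the last point can be sketched in Python with the standard tarfile module. This is only an illustration: the function names pack and unpack are made up, and the paths in the usage comments are placeholders, since the actual location of COMMONS and your own directory layout will differ.

```python
import os
import tarfile


def pack(src_dir, archive_path):
    """Write src_dir into a gzip-compressed tar file at archive_path."""
    # Store paths relative to the directory's own name, not the full path.
    top = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src_dir, arcname=top)


def unpack(archive_path, dest_dir):
    """Extract archive_path under dest_dir, creating dest_dir if needed."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Only do this with archives you created yourself.
        tar.extractall(dest_dir)


# Hypothetical usage (placeholder paths):
#   pack("/home/me/census_extract", "/COMMONS/myproject/census_extract.tar.gz")
#   unpack("/COMMONS/myproject/census_extract.tar.gz", "/72hours/me")
```

The compressed copy in COMMONS is the one that gets backed up; the unpacked copy on /72hours is disposable and can be recreated whenever it expires.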