
Electronic Data Storage and Archive?



Yeah, I'm sitting on about 500 GB and growing at about 250 GB a year now that we both have dSLRs. If we ever start doing any video it's gonna get really bad... :eek: My back up (BU) drives are full, and my primary drives are getting close.
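To put rough numbers on that, here is a minimal sketch that estimates how long a drive of a given size would last. It assumes the growth stays linear at the figures above (500 GB now, 250 GB a year); the drive capacities in the loop are only illustrative, and the function name is made up for this example.

```python
def years_until_full(current_gb: float, growth_gb_per_year: float, capacity_gb: float) -> float:
    """Years until the data outgrows a drive, assuming linear growth."""
    if growth_gb_per_year <= 0:
        raise ValueError("growth_gb_per_year must be positive")
    remaining_gb = capacity_gb - current_gb
    # A drive that is already too small lasts zero years.
    return max(remaining_gb, 0.0) / growth_gb_per_year


if __name__ == "__main__":
    # Capacities below are hypothetical examples, not recommendations.
    for capacity_gb in (1000, 2000, 3000):
        years = years_until_full(500, 250, capacity_gb)
        print(f"{capacity_gb} GB drive: about {years:.1f} years of headroom")
```

If video enters the picture the linear assumption breaks down, so treat the result as an upper bound.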


So, there is a truly amazing array of options out there, and I've been thinking them over all weekend.


Here are some thoughts:


1. RAID: Configured correctly, it provides disk redundancy, but not a true backup. Files may still be lost to corruption, OS failure, viruses, accidental deletion, fire, flood, electrical storm, theft, etc. However, a mirrored array does automatically create a copy of everything in real time, which protects against hard drive failure. Depending on your machine you may need a RAID card, disk drives, or other things. Setting one up seems to involve an operating system reinstall, so I think all data must be backed up to something else before the install. Setting up the RAID itself is pretty easy, but it's then a pretty big job to reconfigure your computer after the clean install: restoring BU'ed data, reinstalling software, and getting all your software back to updated status.


2. External hard drives: Not automatic unless attached to the system, and then subject to all of the problems above. To be done well you really need to keep a set of them and shuttle them to and from an off-site location on a regular basis. Data created between shuttle cycles is at risk, as above. Likely there is some BU software that will manage shuttled disks, but I haven't found it yet. Some drives come with BU software, some are bare, and some PCs can use internal-type drives in front bays. Drive quality levels and speeds differ, and drive connectivity differs. Lots to figure out. Requires discipline and work. Tends to fail for those reasons.


3. Network Attached Storage (NAS) or old PC converted to server: Assuming you have several computers in the house, these devices can act as storage for all your machines. They can also act as media servers throughout the house, and even across the internet. They attach to your router (if you have one) by various means, usually Ethernet. They may or may not be able to host and network your peripherals, they may or may not be able to host BU drive(s), and they may or may not come with usable BU software. Prices vary widely. Essentially, however, they are no more secure than a RAID. They may, depending on the software, protect against accidental deletions, but they are certainly subject to theft, virus, corruption, fire and flood. However, like a RAID, once configured they are generally automatic and work in real time. They are likely easier to install than a RAID. One will work for all your machines, assuming you picked the right one.


A bewildering array of offerings from simple to complex, single drive (essentially an external hard drive with some network software) to multiple bay boxes where you buy hard drives separately, configure RAID (or not) arrays to your desires, and set up BU and sharing in a customizable way. A whole set of jargon and specs to decipher.


Or you can set up an old PC similarly, but you'll need network software, BU software, and even more expertise. Be aware that the native Windows 7 BU function does not work on networked drives unless you have the Professional, Ultimate, or Enterprise edition.


4. Accidentally deleted this option. :)


5. You could store with a service in "the cloud," like Carbonite, CrashPlan, Dropbox, SmugMug, etc.


Upload speeds are reputed to be horrifically slow, and getting 500 GB uploaded might take months, as well as put me on the wrong side of my ISP's data quota. Some services allow you to send in a "seed disc" to get around the upload speed restrictions. Although download speeds are better than uploads, you need to choose a service that will overnight you a replacement disc if you have any significant amount of data. Some of the services are very pricey, others are more reasonable. Services can go out of business without warning, you don't control access to your data, and terms, conditions and prices can all change while you may be locked in. Most photographers seem to feel the upload speed restrictions, even after an initial upload with a seed disc, are a show stopper for this type of service. We can come back from a shoot with 50 GB easy, and it could take weeks to get it uploaded.
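For a feel of the upload problem, here is a back-of-the-envelope sketch. The upload speeds in the loop are assumptions picked for illustration (I have not measured any particular ISP or service), and the calculation assumes the link runs flat out around the clock with no overhead, so real uploads will take longer.

```python
def upload_days(size_gb: float, upload_mbps: float) -> float:
    """Days needed to upload size_gb at a sustained rate of upload_mbps (megabits/s)."""
    if upload_mbps <= 0:
        raise ValueError("upload_mbps must be positive")
    total_bits = size_gb * 1e9 * 8          # decimal gigabytes to bits
    seconds = total_bits / (upload_mbps * 1e6)
    return seconds / 86_400                 # seconds in a day


if __name__ == "__main__":
    # Hypothetical sustained upload speeds in megabits per second.
    for mbps in (0.5, 1.0, 5.0):
        print(f"{mbps} Mbps: 500 GB in {upload_days(500, mbps):.0f} days, "
              f"50 GB in {upload_days(50, mbps):.1f} days")
```

At an assumed 1 Mbps, the 500 GB initial upload works out to roughly 46 days of continuous transfer, which is why the seed disc option matters.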


6. A new twist on the cloud, but I'm not sure how it works: CrashPlan will let you use their system to back up to a drive you have on an off-site PC, for instance at a friend's place. Some think this will work faster than cloud storage; it is automatic and off-site, and seems to protect against accidental deletes. I haven't heard from anyone who has actually done it. It's not clear if it really is any faster. It's not clear if you can "seed" the BU disk. To be real time, the other PC must be on when you are working, although evidently the program will do the BU whenever both PCs are on. There may be issues with firewalls and security software. There can be issues with ISPs and maintaining the connection across the net.


7. CDs and DVDs. Generally viewed as unacceptable due to size limitations and "disc rot." Also dependent on a still rapidly changing technology. Will you have a DVD reader in 5 years? Most likely, but I couldn't believe how fast floppy drives went by the wayside.


Ok, those are all the basic tools. To get a good secure system you need to design a system that likely combines several of the tools. It's complicated, a lot of work, some real money, and we haven't even talked about file formats yet!


You need to recognize that your PCs or Macs or what have you are going to change over time. You need to see your plan as having a potentially finite lifetime.


So, this is meant to be the start of some discussion. What are you doing for BU? What have you tried? Where are you headed? Why? I figure this is a problem for lots of folk with more than a few GB of data, and a growing problem for all. So I figured it was of general enough interest to post on BMWST, although you can certainly find discussions elsewhere. Good to get folks thinking about it.



Link to comment
Joe Frickin' Friday
7. CDs and DVDs. Generally viewed as unacceptable due to size limitations and "disc rot." Also dependent on a still rapidly changing technology. Will you have a DVD reader in 5 years? Most likely, but I couldn't believe how fast floppy drives went by the wayside.


Howzabout a Blu-Ray burner? 25 GB per disc, the price is approaching $1 apiece. SATA burners can be had for about $100 now. This will also let you burn HD videos and slide shows that can be presented on a big screen HDTV.

Link to comment

I'm afraid I don't understand how any individual's data needs can grow by 250gb/year. I've got a 5-6 year old iMac G5 with 250gb HDD, no more than half full. I've been running Linux on a netbook for a year, and I have 21gb free space on a 32gb SSD.


That said, libraries have been dealing with these issues for a long time -- and will be doing so for an even longer time, since "preservation" to a librarian may mean centuries. Emory University acquired Salman Rushdie's "papers" several years ago. Rushdie uses Macintosh computers, and when he filled up a drive, he just bought another computer, so over a period of time, he used a large number of different programs for writing. In the long run, changing data formats is probably a bigger issue than storage media, as it's relatively easy to migrate from 3.5" floppy to Zip drive to optical, etc. but if there is no software to read the data, all you have is a blob of bits.


The U.S. Government Printing Office standardized on PDF many years ago, which tends to make PDF an enduring standard, regardless of its merits -- much like the QWERTY keyboard layout. And of course, PDF is useless when you are talking about preserving something like a database, or multi-media.


I attended an archives workshop once in which the speaker (without tongue in cheek) recommended printing documents as optical images as the best long-term preservation option available at this time. Yeah, I'm talking microforms. Silver emulsion (properly processed and stored), has a predicted lifetime of centuries, and all you need to read it is an optical magnifier. Once I got over the shock, I realized that this makes a great deal of sense for text, but not so much for other forms of data.


IF one accepts that much of what we are creating is worth preserving (a highly debatable proposition), then, as a society, we are probably losing a larger percentage of what we are creating than ever before in our history. I have a ledger book from a country store (1789-1791) that is perfectly usable. I have magnetic disks in 8", 5.25", and 3.5" formats, but no hardware to read them.


Collection of Last Resort, U.S. Government Printing Office.

Link to comment

I have a 3.0 TB RAID (mirrored) that runs Time Machine (hourly backups).


I have a secure (locked and inaccessible by thieves) 1.0 TB RAID that does a bootable backup (i.e., disk image) of my 500GB hard drive every day (it takes 7-8 minutes).


I have off-site storage which I rotate every two weeks.

Link to comment

1. How critical is the data?


Will any loss be expensive? How much can be recreated? Over what period of time?


2. How critical is it that the data be restorable immediately?


Some backup methods are easier to recover from than others.


3. How long could you go before discovering potential data loss?


This will determine your data archive period. One of my old customers generated as much as 10 GB per day and around 2 TB per year, all of which needed to be kept in perpetuity. (It was scientific imagery that supported papers, experiments, grant proposals, etc.) It could take years before any local data loss was noticed.


Other data will be noticed right away, and you don't need perpetual archives, but merely what existed last week.




If the data is super critical, it probably makes sense to invest in high quality storage equipment. Note that high quality storage equipment is very expensive.


Most data will be fine with commodity RAID. Setup shouldn't be difficult, and maintenance should be rare. Key to any of it working is that you monitor it and always have spare disks on-hand.


Alternatively, external hard drives may be perfectly fine. With eSATA, speed is there. Commodity capacity is up to 3 TB and growing. Mirror between external drives regularly and you've got a reasonable solution. Rotate external drives out regularly and you've got a pretty redundant solution.
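As a rough illustration of what "mirror between external drives" means in practice, here is a minimal one-way mirror sketch. It is a simplification of what real sync tools do: it copies files that are missing or look changed (by size and modification time), never deletes anything on the destination, and does not verify contents. The function name and the example paths in the note are hypothetical.

```python
import shutil
from pathlib import Path


def mirror(src: Path, dst: Path) -> list[Path]:
    """Copy files from src to dst when missing or changed; return the relative paths copied.

    "Changed" here means a different size or a newer modification time.
    Nothing is ever deleted from dst.
    """
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        target = dst / rel
        src_stat = path.stat()
        if target.exists():
            dst_stat = target.stat()
            same_size = dst_stat.st_size == src_stat.st_size
            # Compare whole seconds; some filesystems store coarse timestamps.
            not_older = int(dst_stat.st_mtime) >= int(src_stat.st_mtime)
            if same_size and not_older:
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves the modification time
        copied.append(rel)
    return copied
```

A call might look like `mirror(Path("/Volumes/Photos"), Path("/Volumes/PhotosMirror"))`, with both paths standing in for wherever the drives are mounted. Because nothing is deleted on the destination, an accidental delete on the source is not propagated, at the cost of the mirror growing over time.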


Most commodity NAS, on the other hand, is pretty bad. I'd avoid it. Something like Windows Home Server might not be bad, but its future seems rocky (with some declaring the recent removal of the "Drive Extender" functionality to be its death). I don't have a lot of first-hand experience with it. I do have some first-hand experience with several commodity NAS products, all of which have been trouble prone.


I'm with you on optical media.


My personal solution is pretty simple. Non-critical data can go to the cloud for backup. If I lose all of my video files and it takes a month to recover, no biggie; I'll survive. More critical data is mirrored nightly, which I've decided is often enough for my needs. That includes a bootable mirror.


I should probably rotate an archive off-site here and there, but I haven't. But then, I don't have a lot of data that couldn't be replaced.

Link to comment

I've got a Netgear ReadyNAS that I've been pretty darn happy with, but I don't generate anything like the amount of data you do. One nice feature ReadyNAS has is their X-RAID format, which allows you to increase storage space by simply hot-swapping in new drives one at a time (mine is a 4-bay unit). I have the NAS back up the data that isn't already a backup to an external hard drive for redundancy. If I were smart and conscientious I'd swap that drive with an off-site one once a week.


Our computers all back up to the NAS, plus we use it for shared file storage. I also back my photos up to MobileMe, so at least they're offsite. But I don't think I even have 10 GB of photos, total.

Link to comment
I've got a Netgear ReadyNAS that I've been pretty darn happy with, but I don't generate anything like the amount of data you do. One nice feature ReadyNAS has is their X-RAID format, that allows you to increase storage space by simply hot-swapping in new drives one at a time (mine is a 4-bay unit).


The ReadyNAS is one of those I'd chalk up as questionable. I found it just slow and a bit annoying. One coworker had enough issues that he forced a group he was supporting to chuck it and buy something else. And another coworker who's had one at home has had a litany of small issues.

Link to comment

I'm sure we all could go a long way toward reducing the problem if we would do a better job of filtering what we keep. I try but am not too successful. I'll take 20 shots of my dog, sometimes in the 6 fps mode, to get one really good one, but I don't spend the time to save only the ones that are good; it's easier to save them all. I'm sure a full 30% or so of my photos could be deleted without losing much. My wife would say I could delete 95% but I won't go there!!!

Link to comment

I shoot only in digital RAW and I've scanned all my medium- and large-format negatives as high quality TIFFs. And shooting for 30 years, my Aperture library is only 69GB.


Jan, there's no way a lot of the stuff you're keeping is worth keeping.

Link to comment
Jan, there's no way a lot of the stuff you're keeping is worth keeping.
I'm sure that's true, but is the time to sort out the bad stuff worth the cost of just saving it all? It isn't for me and I don't have nearly as much as Jan.
Link to comment

It's not just about sorting out the keepers. How do you catalog the 250+ GB so you can find what you want when you want it?
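Dedicated photo software does this far better, but as a bare-bones sketch of what a catalog could be, the snippet below walks a folder tree and writes a CSV index of image files with their size and modification date. The extension list, function names, and column names are all assumptions for illustration; it does not read EXIF data or keywords.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical set of extensions to index; adjust for your own cameras.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".tif", ".tiff", ".nef", ".cr2", ".dng")


def build_catalog(root: Path, extensions=IMAGE_EXTENSIONS) -> list[dict]:
    """Return one row per image file under root: relative path, size, modified date (UTC)."""
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            stat = path.stat()
            modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
            rows.append({
                "path": path.relative_to(root).as_posix(),
                "bytes": stat.st_size,
                "modified": modified.strftime("%Y-%m-%d"),
            })
    return rows


def write_catalog(rows: list[dict], out_file: Path) -> None:
    """Write the catalog rows to a CSV file that a spreadsheet can search and sort."""
    with out_file.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["path", "bytes", "modified"])
        writer.writeheader()
        writer.writerows(rows)
```

The resulting file can be sorted by date or searched by folder name, which only helps if the folders were named sensibly to begin with.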

Link to comment
It's not just about sorting out the keepers. How do you catalog the 250+ GB so you can find what you want when you want it?
That could be a problem, if you tried to. I just dump everything onto the backup ASAP and then do whatever I want with the new material. Anything that gets changed gets saved again as a new file. The backup is only intended as catastrophic loss protection for me.
Link to comment

And you think managing 250 GB of growth annually is a problem? We just (yesterday) introduced a new product capable of archiving an exabyte of data (and my challenge last year was petabytes of data). I don't see any end to this growth. There are market projections of 35 zettabytes in the next 9 years.


Having said that, Greg is asking all the right questions. I tell my customers that they shouldn't concentrate on a 'backup' or 'archive' strategy, and instead focus on a 'restore' strategy and work back from there. I can tell you from MANY more years of experience than I can count, that backup is easy. Restore...that's an entirely different animal.
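One small, concrete piece of a "restore" strategy is checking that the backup copy can actually be read back and matches the original. Here is a minimal sketch that compares SHA-256 checksums between a source folder and its backup. It is only an illustration under simple assumptions (the backup mirrors the source's folder layout, and the function names are made up); it is not how any particular backup product verifies data.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large image files never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(src: Path, backup: Path) -> dict[str, list[Path]]:
    """Report source files that are missing from the backup or whose contents differ."""
    missing, mismatched = [], []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        copy = backup / rel
        if not copy.is_file():
            missing.append(rel)
        elif sha256_of(copy) != sha256_of(path):
            mismatched.append(rel)
    return {"missing": missing, "mismatched": mismatched}
```

Reading every byte back is slow on hundreds of gigabytes, but it exercises the same path a real restore would, which is the point.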


FWIW, for my home systems, inexpensive USB-attached drives work fine (albeit, I attempt to make intelligent decisions about what I absolutely can't lose, and that doesn't amount to 250 GB of annual growth). And yes, RAID (that's another can of worms depending on how 'exposed' you want to be during a RAID rebuild).


Mike O

Link to comment

Lots of great thoughts!


So, if you were going to set up some sort of home server would you buy one of the NAS's or convert an old unused pc? What are the advantages of each route? If you would convert a pc, what software and OS would you use for serving files and back-up utilities?


The candidate PC has two 160 GB SATA HDDs with 3.0 Gb/s interfaces in a RAID 1 configuration, and room for two more HDDs. It's running XP, which I just updated. It has USB 2.0 and FireWire ports, but no eSATA. Ethernet, of course. It's an AMD X2 4600 processor (dual 2300 MHz cores, sort of) with 1 GB of RAM (it was a lot at the time!). It seems to be running well. I don't think it matters much for the purpose, but it has an EVGA NVIDIA 7600 GS graphics card with 256 MB of RAM. The motherboard will take up to 4 GB of DDR2 (pc3200) RAM.


I ran the MS upgrade advisor and found that it would support W7 32-bit or W7 64-bit with more RAM (minimum 2 GB).


Here's sort of what I'm thinking:


The dedicated NAS server boxes come with server and BU software and are intended for the purpose. It's not so clear that they are easy to run and configure, and most will not support a lot of peripherals. Price-wise, I'd probably be around $600 for a decent box and an HDD. This choice would likely be more energy efficient. I guess these are addressed through one of the networked PCs and have no keyboard and monitor of their own, which saves space.


The converted PC can obviously handle all the peripherals I want (presently two printers, a scanner, and I would add an external HDD for BU to off-site storage). On the down side, I don't really see a good server/BU software package out there. I think I would add a large drive, maybe a 2 TB WD Black, and kill the internal RAID. Putting the peripherals on the server is not mandatory, since they are already networked through the W7 homegroup, but it gets them off of my main desktop, which allows me to reorganize the room better and improves the space around the working desktop. Potentially they could be in different rooms.


Can I just put W7 on it and use that for my network software or would I be missing functionality? If I do this, maybe I will use some bu software bundled with the external hdd. Seagate or WD?


Just thinking out loud here... but I would appreciate any thoughts on the matter.

Link to comment


This topic is now archived and is closed to further replies.
