    Entries in Mac (5)

    Saturday
    Oct 24, 2009

    The loss of ZFS

    Well, in case you haven't read any of the myriad stories about it, it appears that Apple has decided not to use ZFS on Mac OS X. Gruber has sources that say it was primarily licensing concerns, which is consistent with what people have implied to me, both recently, and around WWDC (although at that time I think there was probably still hope of resolving the issues).

    Now, some people may jump in to comment that it couldn't be licensing issues, since ZFS is open source (under the CDDL), and that Apple already uses CDDL software (DTrace). That may be true, but often in deals that involve large companies there is more to it than that. Apple may have wanted guarantees of indemnification in the NetApp lawsuit. Maybe it wanted guarantees that certain modifications it wanted to make would be accepted upstream, or even to get Sun to make certain changes. It also might have wanted additional distribution rights that were not granted under the CDDL. It is typical for companies to negotiate custom agreements in such cases (and for some money to change hands), so the idea that licensing issues are why it fell through is entirely reasonable, even though it is an open source product. Obviously Sun's steady decline in the marketplace, and the uncertainty caused by the Oracle acquisition, may have greatly complicated any such negotiations.

    Why not do a new filesystem?

    Apple has a lot of talented filesystem engineers. They are certainly capable of doing something comparable to ZFS, at least for their target market. The problem with developing a new modern filesystem is that it generally takes longer than a single OS release cycle. Most companies are really bad at having large teams focused on projects that will not ship in the next version of the project they are working on.

    This is a particularly acute problem at Apple, which traditionally has done things with very few engineers. I don't want to get into exact numbers, but I recall having a discussion with the head of a university FS team who was discussing the FS he was working on. He was pitching it to a group of Apple engineers. It was some interesting work, but there were some unsolved problems. When he was asked about them he commented that they didn't have enough people to deal with them, but he had some ideas and it shouldn't be an issue for a company with a real FS team. It turned out his research team had about the same number of people working on their FS as Apple had working on HFS, HFS+, UFS, NFS, WebDAV, FAT, and NTFS combined. I think people don't appreciate how productive Apple is on a per-engineer basis. The downside of that is that sometimes it is hard to find the resources to do something large and time consuming, particularly when it is not something that most users will notice in a direct sense. That is especially true if senior management is not excited about the idea.

    Because of that, I was fairly convinced ZFS was a credible future primary FS for Apple. Not because it was an optimal design for them (it isn't), but because it was a lot less work than doing a new design from scratch. The fact its fundamental architecture is 20 years newer than HFS meant it would still be better than HFS+ in almost all respects even if it was not designed for Apple's exact needs. Clearly I was wrong, since Apple has stopped the ZFS project.

    What changed?

    Well, a couple of things have happened. The first is that Mac OS X has gotten more mature. They no longer need to port all of those FSes, they already have them working, and in most cases they work fairly well. That frees up some engineers. Apple has also greatly expanded the number of people working on their kernel since it is amortized over many different products (Mac OS X, iPhone, AppleTV, etc).

    Suddenly the notion of doing a new filesystem seems doable, so long as it is a real priority and the FS team doesn't get pulled to keep adding features or doing major work to legacy FSes. That is still a lot of work when Apple had ZFS approaching production quality on OS X.

    Apple can do better than ZFS

    Sun calls ZFS "The Last Word in Filesystems", but that is hyperbole. ZFS is one of the first widely deployed copy on write FSes. That certainly makes it a tremendous improvement over existing FSes, but pioneers are the ones with arrows in their back. By looking at ZFS's development it is certainly possible to identify mistakes that they made, and ways to do things better if one were to start from scratch. From where I sit, there are 3 obvious ways doing a new FS will be better for Apple than ZFS:

    1. There has been new fundamental research since ZFS was designed that simplifies many of the issues involved with it, in particular the paper "B-trees, Shadowing, and Clones" (PDF). That paper is the basis for the design of BtrFS, which has a very similar feature set to ZFS, but internally is entirely different. LWN has an article about BtrFS that explains the significance in some detail (it is written by Valerie Aurora, who worked on ZFS at Sun).

    2. ZFS was designed for the storage interfaces available a decade ago. Spinning disks are going to be with us for a long time, especially for bulk storage in data centers and on backup devices, but the future is all about solid state. Flash SSDs have significantly different performance characteristics than spinning media, and there may be FS design decisions one could make that would benefit from that. Now, any FS Apple designs will have to work acceptably on traditional drives, but if they are designing for the future then flash is what to target.

      ZFS has had some optimization work for flash, but it is all in terms of using flash as part of a storage hierarchy. That makes complete sense, since ZFS's primary deployment targets are high-end systems and data center storage. Those systems have multiple drives, so the idea of separate flash drives for a ZIL and L2ARC is completely reasonable. Most consumers have one drive in their system, and maybe an external drive for bulk data, data exchange, and backup.

    3. That brings up the last point. ZFS is designed for big systems. It works on small systems, but most of the tradeoffs favor very large computers with lots of drives. This shows up in a number of ways. The first is that ZFS is not currently capable of adding single drives to an existing vdev or migrating vdevs between various types (mirror, raidz, raidz2). This is a major feature for smaller users who might want to add a single drive, but is a non-issue for data center users who tend to add large numbers of drives all at once, since they will add whole vdevs. Another issue is that ZFS assumes you have a lot of RAM. NEC has been doing a port of OpenSolaris to ARM, and they determined they could not get ZFS to use less than 8 megabytes of RAM without making incompatible format changes (Compacted ZFS). With those changes they could squeeze it into a more reasonable 2 megabytes. On a desktop that doesn't seem like a big deal, but on an iPhone 3G or a Time Capsule 8MB of wired memory is an enormous issue.

    The only major downside is that if Apple is just starting on a next generation FS now it could be a long time before we get our hands on it.

    But now we are going to have another incompatible next generation filesystem

    Wolf brought this point up during some of the ZFS talk on twitter yesterday. My general opinion is that it doesn't matter. People use drives for two largely unrelated tasks. One is running their computers. This is fixed storage. The other is for data exchange. In the old days people used floppies for their sneakernet media, which made the situation much simpler to understand. In recent years the market realities have caused people to move to using SD cards, thumbdrives, and hard drives as the exchange medium of sneakernet.

    The important point to understand is that while the physical devices may be the same, the use model is different, just as using a floppy disk and using an internal hard drive were different. Nobody would balk at the notion that floppies should use different FSes than internal drives. Likewise, most people shouldn't care that their external drives are formatted differently than their internal drives.

    There are complicated features you want for your boot drives and system disks. Ideally you could have them on your interchange disks, but there are other features that are more important, particularly interoperability and simplicity. ZFS didn't bring either of those. There might have been a few people who were psyched to be able to use ZFS to share disks between a Mac and a Solaris or FreeBSD box, but honestly those people are few and far between. Whether Apple used ZFS or something else it is just as interoperable with Linux and Windows (which is to say, not at all). So the fact that Apple looks to be doing a new FS does not impact interoperability in any real sense.

    The other feature you really want for an interchange FS is simplicity. There are a lot of devices out there that use an FS to communicate with a computer. The simplest example is a digital camera via its media cards, but there are many others. Something like ZFS is way too complex for those devices, and honestly most of the features of ZFS like multiple drive support and snapshots are useless since the devices don't have the physical interconnects or user interfaces to expose those features. There is certainly an argument to be made that we could use something a bit better than FAT32 or exFAT as that format, but ZFS was not the right solution for that.

    In other words, for that disk you want to use as an external drive to drag between computers you don't want something like ZFS, you want something that is simple enough that a firmware engineer can write a read-only implementation from the specs in less than a week. For the disk embedded in your computer (operationally or literally) you want something like ZFS, but it doesn't matter if it is interoperable with anything else because you won't be moving it between systems.

    This is basically how Windows works. Microsoft generally uses NTFS for internal drives, but FAT for external drives. Ultimately somebody should design a filesystem explicitly for use as an interchange format and license it for free, then everyone can deal with their internal FSes and do what makes the most sense for their OSes and markets.

    Monday
    Apr 27, 2009

    Maybe doomed was too strong a word

    When I wrote my first post about Time Capsule it was mostly a rant so I could point friends to it instead of going over it again. I am not naive; I know once you put something out on the Internet it is there for everyone to see. I didn't mind people reading it, I tweeted that I had written it, but I didn't expect to get fireballed. The honest truth is I was still fiddling around with blog templates and doing changes that were temporarily making the blog unreadable when a friend texted me asking "Are you the one writing /dev/why!?!" My response was "Yes, and how did you hear about it?"

    In my first post I might have been overly harsh, and I probably focused on one particular issue too much. To be fair, I have had multiple corrupted backups, and I feel that does entitle me to be a bit harsh. On the other hand some very talented people have spent a lot of time trying to make Time Capsule into a product that finally makes it pleasant for users to back up their data. Given that most users are unwilling to expend any effort backing things up it is probably worth cutting Apple some slack, even if it has been a bit of a bumpy ride getting there.

    So what about the issues?

    I spent a lot of time focused on the issue of data reliability. As Drew and Dominic pointed out, that is a bit of a red herring. I acknowledged in the comments that it was more an issue of stacked complexity, that the deeper things get the more complicated and harder to trace they are, and that Apple chose a relatively deep stack (HFS+ on Mac OS X -> a disk image on Mac OS X -> AFP client on Mac OS X -> AFP server on NetBSD -> HFS+ filesystem on NetBSD). In general doubling the number of components in a stack like that more than doubles the complexity, at least in terms of catching all the weird edge cases you need to make it reliable. After the number of corrupted systems I had over the course of a year I just assumed it was too complicated a setup to make work. This was reinforced by several Software Updates that had the note "Improves Time Machine reliability with Time Capsule," but didn't solve my problems.

    As it turns out there were some fairly significant issues, but they were not insurmountable, just difficult to track down. After talking with several contacts I have a fairly good understanding of what was going on, and there was a bug that was fixed in 10.5.6 and the Time Capsule 7.4.1 firmware. If you are using a Time Capsule you should make sure you update to those if you have not already.

    So problem solved?

    Well, problem mostly solved. First off, until I have been running it for a while I can't be 100% confident about the fix, but some very smart people have told me it is fixed, and I am confident enough in their judgement to use my Time Capsule as my primary backup device. I have also been doing some particularly evil things (purposefully cutting the networking, pulling power on the Time Capsule and/or the Mac, etc). So far the backup integrity has withstood all of this, so I am mostly satisfied.

    Mostly?

    While the integrity of the data (which is what I focused on in the last post) seems to be assured now, there were some other issues with Time Capsule that I had neglected to mention in my previous post, because I got sidetracked on that one issue and the post was really long. Compared to the integrity issues they are minor, but they are worth mentioning, if only to help out other people.

    The first one is that while interrupting backups does not have the potential to corrupt backups any more, it can greatly increase the time the next backup takes. A full explanation of when and why that happens would take an entire blog post, but the short version is that if the disk image is properly unmounted it can correctly track what directories have had changes done to them, and if it is not unmounted it needs to scan substantial portions of the backup to figure out where it was when the volume was unmounted. For the most part that is not a big deal, but it does suck a bit when a backup takes 30 minutes to scan things instead of 5. Strictly speaking this is a limitation of HFS+, not Time Machine or Time Capsule.

    Scanning large directories is just painful. You will notice in the above paragraph I said that when HFS+ is correctly unmounted the system maintains a list of directories that were modified, not files. For various reasons it is way too expensive to keep a list of all files that were modified on HFS+, so it tracks the directories that are modified, and Time Machine takes that list of directories and scans the files in those directories for changes. It is a very reasonable compromise that works very well unless you have directories with thousands of files that frequently change. Scanning those folders can be very slow, and if you have to do it every backup it becomes an issue.
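
    To make the cost of directory-level tracking concrete, here is a minimal sketch in plain C (which also compiles as Objective-C). It is not how Time Machine is actually implemented. The names scan_flagged_dirs and scan_demo, the use of modification times to detect a changed file, and the /tmp scratch directory are all assumptions made for illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>
#include <utime.h>

/* What one pass over the flagged directories cost and found. */
struct scan_result {
    long scanned; /* regular files that had to be stat()ed */
    long changed; /* files modified after the last backup */
};

/* Hypothetical scan step: we only know WHICH directories changed, so every
 * file in each of them must be examined to find the ones that changed.
 * Assumption: "changed" means mtime newer than the last backup. */
struct scan_result scan_flagged_dirs(const char *const *dirs, size_t ndirs,
                                     time_t last_backup)
{
    struct scan_result r = {0, 0};

    for (size_t i = 0; i < ndirs; i++) {
        DIR *d = opendir(dirs[i]);
        if (d == NULL)
            continue; /* directory disappeared after it was flagged */

        struct dirent *e;
        while ((e = readdir(d)) != NULL) {
            char path[4096];
            struct stat st;
            int n = snprintf(path, sizeof path, "%s/%s", dirs[i], e->d_name);

            if (n < 0 || (size_t)n >= sizeof path)
                continue; /* path too long for this sketch */
            if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
                continue; /* skips ".", "..", and subdirectories */

            r.scanned++;
            if (st.st_mtime > last_backup)
                r.changed++;
        }
        closedir(d);
    }
    return r;
}

/* Demo: a mailbox-like directory of nfiles files where only one file is
 * newer than the last backup (time 2000, in made-up units). */
struct scan_result scan_demo(int nfiles)
{
    char dir[] = "/tmp/tmscanXXXXXX";
    char path[64];
    struct utimbuf before = {1000, 1000}; /* access and modification time */
    struct utimbuf after = {3000, 3000};
    char *made = mkdtemp(dir);

    assert(made != NULL);
    for (int i = 0; i < nfiles; i++) {
        snprintf(path, sizeof path, "%s/%d.emlx", dir, i);
        FILE *f = fopen(path, "w");
        assert(f != NULL);
        fclose(f);
        utime(path, i == 0 ? &after : &before);
    }

    const char *dirs[] = {dir};
    struct scan_result r = scan_flagged_dirs(dirs, 1, (time_t)2000);

    /* Remove the scratch files again. */
    for (int i = 0; i < nfiles; i++) {
        snprintf(path, sizeof path, "%s/%d.emlx", dir, i);
        remove(path);
    }
    rmdir(dir);
    return r;
}
```

    Calling scan_demo(50) has to stat all 50 files to find the single one that changed. The work grows with the size of the flagged directory, not with the number of changed files.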

    My problem is that I use Gmail. Gmail's IMAP bridge is kind of weird, but the big issue is that it doesn't really have mailboxes, but it emulates them by putting everything with a particular tag into an IMAP mailbox. That means your inbox contains every mail you have ever received, and the folder changes every time you receive a mail. It also means that most of your mails are duplicated at least once if you tag them. Since Mail stores an ".emlx" file for every email in a directory that corresponds to the IMAP folder, that means I have several folders with thousands to tens of thousands of emails, and an INBOX that is immense, all of which frequently have new files added.

    As a result Time Machine and Time Capsule take 3+ hours scanning for every backup, more if the previous backup was interrupted. Once the backups get that long the odds of interrupting them are pretty high, so I got into a state of perpetually scanning and never completing a backup. To be fair, this is a combination of limitations HFS+ imposes on Time Machine, Gmail making some poor choices in their IMAP bridge, and Mail making some poor choices in their file storage, all three of which conspire to make for a bad experience. Fortunately it was simple enough to fix by excluding ~/Library/Mail/ from the backup. Since my mail is stored on a server, backing it up locally is not strictly necessary.

    So, I have gone from doom and gloom to functional but with some problems. My initial backup took ~35 hours, and my Time Capsule's AFP server has wedged 3 times since I started using it again (but the Time Capsule itself kept running and it was possible to reset just the AFP server by hitting "Disconnect All Users" in the admin tool).

    I am told my blog posts get kind of long-winded, so I am also posting a Networked Time Machine Best Practices post that just gives simple advice without all the wandering analysis and speculation. I plan to post another update after about a month of using it. Additionally, I imagine I will post another update sometime after Snow Leopard ships just to look at what has changed. While no new features have been disclosed, Snow Leopard is supposed to be all about cleanup, reliability, and performance, so I am eager to see if there are any improvements over the current experience.

    Monday
    Apr 27, 2009

    Networked Time Machine Best Practices

    Here is a short best practices post for Time Machine on network storage.

    Make sure your Macs are running Mac OS X 10.5.6, and your Time Capsules are running firmware 7.4.1

    Prior versions had some issues that could reduce the reliability of your backups; you really should upgrade.

    Use a Time Capsule or Mac OS X Leopard as your AFP server

    Time Machine depends on at least two undocumented extensions to AFP 3.2. While the netatalk guys appear to have reverse engineered them, they are not in a stable netatalk branch yet. Even when they put them in a release they may not be 100% reliable because the OSes netatalk is running on may not allow some of the functionality those AFP commands require. Finally, netatalk 2.0.3 was released in 2005, 2.0.4 has been in beta for months, and these patches are not planned until 2.1, which may be quite a while away.

    Apple sells 500GB and 1TB Time Capsules. If you need more space than that your options are a bit limited. Ideally you would buy a Mac Pro or an XServe, but that is probably cost prohibitive for a personal backup server. My best advice would be to get a Mac Mini and an external drive. As I mentioned in several of my other posts the bridge chips for external drives are often a problem. Realistically, for that to be an issue you would need to lose power in the middle of a backup and be a bit unlucky, but your backups are your last resort, so taking chances with them is not a good idea. The sync issue can be somewhat mitigated by placing the Mac Mini and the drive on a UPS. If the computer is set to shut down immediately then the drive's track cache should be flushed long before the battery runs out. It is less than ideal, but short of buying a Mac Pro there are not many options.

    If someone actually knows of a bridge chip that pushes syncs (and any enclosure vendors using it) please let me know and I will update this post.

    Don't interrupt backups if you can avoid it

    While interrupting a backup should not cause data loss, it can substantially increase the amount of time your next backup will take.

    Exclude directories with lots of little files if they change frequently

    Scanning a directory with lots of little files to determine what to back up can be very slow. Directories that have lots of files but never change are fine since directories don't generally need to be scanned unless something changes. In my experience the most common culprits are IMAP mailboxes in ~/Library/Mail.

    Thursday
    Apr 23, 2009

    Good programs are lazy

    Once upon a time my job title was "Performance Engineer." Back then my job was to make things go faster. Now for every piece of code the particulars of how you do that vary, but at the end of the day almost all code optimization is fundamentally about figuring out how to do less.

    It seems fairly basic: things run faster when you are doing less, when you need to send less data, when you need to walk through a smaller structure. Except for hiding latency (making a long process appear shorter without actually making it shorter, incrementally displaying results for instance) almost all optimization is about figuring out how to do less.

    That is why code being lazy tends to be a big win. By lazy I mean procrastinating, putting off anything it can do until the last moment. If you need an array for some operation why should you ever initialize it until the first time you try to do that operation? Of course that is just a rule of thumb, in some cases there may be responsiveness or latency concerns that require you to build some structures ahead of time, but people tend to default to precalculating things, and special casing lazy initialization when their profiling tells them something.

    The problem with that is that individually most of the time non-lazy initialization is not a huge deal, but as Jamie points out, small performance issues everywhere do add up. Not only do you burn fewer cycles, but I have found that if I am aggressively making my code use lazy initialization techniques it tends to reduce RAM footprint and heap fragmentation, because any time some precalculated object turns out not to be used it is never allocated in the first place. On small devices with no paging (like an iPhone) that can be a significant improvement.

    So why don't people write lazy code? Well, it is harder. That's right, being lazy can be more work. The reason it is harder is more of a mindset and tool issue than a fundamental one. It is a systemic problem, and moving to consistently lazy code requires a systemic solution. After I found myself doing the same pattern repeatedly I decided to codify it.

    Imagine you had an Objective C class with an init like this:

    - (id) initWithString:(NSString *)string_ {
      if ((self = [super init])) {
        cache = [[NSMutableDictionary alloc] init];
        derivedStringValue = [[self expensiveCalculation: string_] retain];
      }
    
      return self;
    }
    

    In that code we are allocating a cache dictionary and doing some expensive calculation at object creation, but we don't know for sure the user is going to require us to use either of those. So, how would we rewrite that lazily? Well, here is what I used to do:

    - (id) initWithString:(NSString *)string_ {
      if ((self = [super init])) {
        string = [string_ retain];
      }
    
      return self;
    }
    
    @synthesize derivedStringValue;
    
    - (NSString *) derivedStringValue {
      if (!derivedStringValue) {
        derivedStringValue = [[self expensiveCalculation: string] retain];
      }
    
      return derivedStringValue;
    }
    
    @synthesize cache;
    
    - (NSMutableDictionary *)cache {
      if (!cache) {
        cache = [[NSMutableDictionary alloc] init];
      }
    
      return cache;
    }
    

    Okay, that code doesn't allocate anything until we need it. In exchange it is no longer safe to directly access the ivar, you always need to go through the getters (i.e. [self derivedStringValue] or self.derivedStringValue). There are also added complications if you have some dependent values, you need to be careful their initializers don't form a loop. We don't have to worry about initializing the ivars to nil since the Objective C runtime zeros out newly allocated objects. Generally this works well, but it is more code and it is no longer clear what is initialization code. Since we use synthesized properties, if we call the generated setter it will bypass the initializer, which is equivalent to manually setting the ivar. So the question is, can we do better? Since I am writing a blog post about it, I must think the answer is yes.

    First, let's look at what is common between the two initializers, and try to isolate it:

    #define LG_SYNTHESIZE(name, type, initializer) \
    @synthesize name;                              \
                                                   \
    - (type) name {                                \
      if (!name) {                                 \
        name = (initializer);                      \
      }                                            \
                                                   \
      return name;                                 \
    }
    

    Now, using that macro let's rewrite the lazy version:

    - (id) initWithString:(NSString *)string_ {
      if ((self = [super init])) {
        string = [string_ retain];
      }
    
      return self;
    }
    
    LG_SYNTHESIZE(derivedStringValue, NSString *, [[self expensiveCalculation: string] retain]);
    LG_SYNTHESIZE(cache, NSMutableDictionary *, [[NSMutableDictionary alloc] init]);
    

    Now that is much better. Ignoring the macro, it is about the same length as the original, and it is clear what is initialization code. One interesting thing to point out is that in many cases doing this allows you to eliminate the -init method. This example requires it because we need to stash a parameter, but in general any class that can be initialized with -init (as opposed to -initWithFoo:) can be written with lazy initialization so that you don't have an explicit -init (of course some superclass will implement -init).

    So, by coming up with a few macros, working out a few rules, and always using accessors over ivars there is a lazy init pattern for Cocoa that works fairly well. I have found that there is a need for several different synthesis macros, basically corresponding to Objective C property attributes (LG_SYNTHESIZE_ATOMIC, etc) as well as different macros if you need to initialize a value that is defined in a superclass via chained accessors. In some cases it is also necessary to implement the setter as part of the macro. Personally I have only bothered to implement this and refactor my code for nonatomic retained synthesis, since that hit ~95% of the cases in my code.

    This could be made a lot nicer by implementing a source->source transform in clang and extending the existing synthesizer syntax, but I have found that adding custom extensions to your compiler to make your source code nicer tends to carry a very high cost in the long run.

    Update: changed source code highlighting to Syntax Highlighter 2.0.

    Saturday
    Apr 18, 2009

    Why Time Capsule is doomed to suck

    Update: There is some new information and some good news at the bottom of the article.

    One of the features that was introduced with Mac OS X Leopard was Time Machine. I use Time Machine constantly. I make sure my laptop (which is my primary machine) is backed up before I do anything particularly risky, like running tools that modify my drive, or taking my machine out of the house. That way I know that no matter what happens there is a safe copy of my data waiting at home.

    The problem is that Time Machine is not automatic if you are a laptop user. I need to walk over, plug my laptop into a drive, and then wait while it runs. On my system it usually runs quickly, but it still requires me to get involved with the backup. It would be better if it could automatically back up across my wifi network. Apple supports network Time Machine backups between Leopard machines as well as selling a backup NAS product, Time Capsule. I have a Time Capsule and a Leopard desktop machine that I use as an AFP server, but I have given up on using either of them for Time Machine backups, since they have corrupted my backups multiple times. Unfortunately the current Time Machine over network implementation is fundamentally flawed and will never work correctly.

    How Time Machine Works

    Time Machine works by literally cloning your drive into a subdirectory of another drive. If you search for it on Google you will find references to HFS+ hardlinks and metadata, but those are all internal implementation details to make it run with acceptable performance. If you drill down into a .backupdb bundle you will see several folders, and each one of them is a complete clone of your system at a specific point in time, minus any folders you have chosen to omit.

    This is great in many ways. In particular, it means that all the applications that use Time Machine don't have to pull the files out of some archive format to work on them. That lets Finder navigate through them quickly, and hand them to third party QuickLook filters. It also means that any filesystem that those files are stored on must support all of HFS+/HFSX's features or there will be a loss of fidelity. By fidelity I mean precise accuracy of all details of the file data and metadata, including full name (in whatever encoding your volume was using), extended attributes, permissions, ACLs, forks, etc.

    Historically most filesystems have not been able to store a file originated on an HFS volume with full fidelity (that is why Apple used to tuck data in ._ files, they were used to stash all the data that would be lost), though that has been getting better in recent years. While losing some info might be okay when transferring a file to a foreign computer, it is never okay for a backup system to lose that kind of information. Because fidelity is such an issue and Apple has to use a filesystem that supports all of HFS+ and HFSX's semantics, Apple generally creates HFSX volumes for Time Machine volumes, since they can store content of both HFS+ and other HFSX volumes with no loss.
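
    The clone-per-backup layout described in this section can be sketched with ordinary POSIX hard links. This is a rough illustration in plain C, not Time Machine's actual code. It only links individual files (Time Machine also relies on directory hard links, which link() generally does not allow), and the names write_text, snapshot_file, and snapshot_demo, along with the /tmp scratch directory, are assumptions made for the example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create (or overwrite) a small text file. Returns 0 on success. */
int write_text(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    fputs(text, f);
    return fclose(f);
}

/* Hypothetical per-file snapshot step. new_text == NULL means the file is
 * unchanged since the previous snapshot, so the new snapshot just hard links
 * to the existing copy; otherwise a fresh copy is written. */
int snapshot_file(const char *prev_copy, const char *new_copy,
                  const char *new_text)
{
    if (new_text == NULL)
        return link(prev_copy, new_copy);
    return write_text(new_copy, new_text);
}

/* Demo: two snapshots of a two-file "system". Returns 1 when the unchanged
 * file shares one inode across both snapshots and the changed file does not. */
int snapshot_demo(void)
{
    char root[] = "/tmp/tmsnapXXXXXX";
    char s1[64], s2[64], a1[64], a2[64], b1[64], b2[64];
    struct stat sa1, sa2, sb1, sb2;
    char *made = mkdtemp(root);

    assert(made != NULL);
    snprintf(s1, sizeof s1, "%s/snap1", root);
    snprintf(s2, sizeof s2, "%s/snap2", root);
    snprintf(a1, sizeof a1, "%s/snap1/a.txt", root);
    snprintf(b1, sizeof b1, "%s/snap1/b.txt", root);
    snprintf(a2, sizeof a2, "%s/snap2/a.txt", root);
    snprintf(b2, sizeof b2, "%s/snap2/b.txt", root);

    int ok = mkdir(s1, 0700) == 0 && mkdir(s2, 0700) == 0;
    /* First backup: everything is copied. */
    ok = ok && write_text(a1, "unchanged") == 0 && write_text(b1, "old") == 0;
    /* Second backup: a.txt is unchanged, b.txt was edited. */
    ok = ok && snapshot_file(a1, a2, NULL) == 0
            && snapshot_file(b1, b2, "new") == 0;
    ok = ok && stat(a1, &sa1) == 0 && stat(a2, &sa2) == 0
            && stat(b1, &sb1) == 0 && stat(b2, &sb2) == 0;

    /* Both snapshot folders list a.txt, but it is stored only once. */
    int shared = ok && sa1.st_ino == sa2.st_ino && sa2.st_nlink == 2;
    int separate = ok && sb1.st_ino != sb2.st_ino;

    remove(a1); remove(a2); remove(b1); remove(b2);
    rmdir(s1); rmdir(s2); rmdir(root);
    return shared && separate;
}
```

    Each snapshot folder looks like a complete copy to anything that browses it, which is why Finder and QuickLook can work on backups directly. It also shows why the backup volume must itself support everything the source filesystem does.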

    Backing up to a network

    Okay, so in the local case Apple copies files between two drives, and it works great. Once you move to networks things get a lot more complicated. Aside from the reduced speed, most people are using laptops via wireless. Between the increased length of the backups and the transient nature of the connections, it is much more likely that you will have an interrupted backup (though that can also happen with a local disk based backup, people love to just unplug drives...). Also, unless you are using something like iSCSI you can't directly use HFS+ on a remote disk, so something has to change. There are a couple of obvious solutions, all of which have drawbacks.

    1) Use a network filesystem

    This would be an ideal solution, if not for the filesystem fidelity issue. There are currently no network filesystems in wide usage that preserve all HFS+/HFSX semantics (particularly if you include the directory hardlink "implementation detail" of Time Machine). Of course Apple has its own network filesystem, AFP, which it could rev to support features it needs. There are two major problems with that. The first is that most network filesystems leak the semantics of the underlying filesystem. For instance, some SMB volumes preserve case and some don't, and that is a side effect of whether or not the filesystem of the server preserves case.

    So even if Apple revved AFP, the best they could do is guarantee that AFP served from HFSX using their server software would have HFSX semantics. Second, a large number of devices use embedded AFP servers on completely different OSes and FSes. There is no way Apple can know how netatalk on a consumer NAS serving files off of ext3 will handle things, but it is a good bet it will not match the semantics they depend on. So Apple would need to either block all 3rd party devices, or implement some sort of mangling in Time Machine to try to preserve all attributes in a way that would be durable. Since everyone hated ._ files the first time, that seems like a bad idea.

    2) Use iSCSI/nbd/AoE

    Time Machine already works with an HFSX backup disk connected via USB, so why not just connect the disk over the network? That would certainly solve any potential fidelity issues. The problem is that it introduces a completely separate set of issues. When you lose a network connection while doing a file transfer via a network filesystem the behavior is deterministic. The last files you sent over got there, the next ones you were planning to send didn't, and the one you were in the middle of might be there or not depending on exactly what happened, but you can pick up where you left off once you check that one file.
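    The file-level recovery described here can be sketched in a few lines. This is only an illustration under some assumptions: the names `resume_copy` and `file_digest` are invented, the journal of completed files is assumed to live on the local machine (so a network disconnect cannot damage it), and a SHA-256 comparison stands in for whatever check a real tool would use on the one file in doubt.

```python
import hashlib
import os
import shutil


def file_digest(path):
    """SHA-256 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def resume_copy(src_dir, dst_dir, journal_path):
    """Copy files in sorted order, recording each finished name in a journal.

    Returns the names that had to be (re)copied on this run.
    """
    done = set()
    if os.path.exists(journal_path):
        with open(journal_path) as j:
            done = set(j.read().splitlines())
    names = sorted(
        n for n in os.listdir(src_dir)
        if os.path.isfile(os.path.join(src_dir, n))
    )
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    with open(journal_path, "a") as j:
        for name in names:
            if name in done:
                continue  # finished before the interruption
            src = os.path.join(src_dir, name)
            dst = os.path.join(dst_dir, name)
            # A file at the destination that never reached the journal is the
            # one "in doubt": keep it only if its contents match the source.
            if not (os.path.exists(dst) and file_digest(src) == file_digest(dst)):
                shutil.copyfile(src, dst)
                copied.append(name)
            j.write(name + "\n")
            j.flush()
    return copied
```

    After an interruption you simply call `resume_copy` again with the same arguments. Everything already journaled is skipped, the single in-flight file is verified or redone, and the rest proceeds as if nothing happened.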

    Disk drives aren't that simple. Since your machine is directly responsible for the block allocation, it goes through the entire driver stack, just as if it were a local disk. It does I/O scheduling, block layout, etc. When you cut a network connection it is the equivalent of pulling out a USB cable without unmounting the drive. Mac OS X complains when you do that, because it can lead to data corruption. Most of the time it doesn't, but it is much more likely to if you are in the middle of writing stuff. Now take a situation where the cable is ethereal, it gets cut every time your computer is put to sleep, and it is only connected when it is actively backing up files (doing lots of writes). It is a recipe for unrecoverable filesystem corruption on your backup drive.
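    A toy model shows why a cut connection is so much worse at the block level. This is deliberately simplistic and is not how HFS+ or any real filesystem lays out data; `ToyDisk`, `create_file`, and `check` are invented names. The point is only that one logical operation becomes several block writes, and the disk is consistent only if all of them land.

```python
class Disconnected(Exception):
    """Raised when the toy link to the disk drops."""


class ToyDisk:
    """A block device that fails after a fixed number of block writes."""

    def __init__(self, writes_before_disconnect):
        self.blocks = {}
        self.remaining = writes_before_disconnect

    def write_block(self, number, value):
        if self.remaining == 0:
            raise Disconnected()
        self.remaining -= 1
        self.blocks[number] = value


def create_file(disk, name, data_block, payload):
    # Three separate block writes; the structure is only consistent
    # once all three have reached the disk.
    disk.write_block("bitmap", {data_block})           # mark block allocated
    disk.write_block(data_block, payload)              # write the contents
    disk.write_block("directory", {name: data_block})  # point a name at it


def check(disk):
    """A one-rule fsck: every allocated block must be referenced by a name."""
    allocated = disk.blocks.get("bitmap", set())
    referenced = set(disk.blocks.get("directory", {}).values())
    return allocated == referenced
```

    With `ToyDisk(3)` the file creation completes and `check` passes. With `ToyDisk(2)` the connection drops before the directory write, leaving a block that is marked allocated but belongs to no file. The client cannot fix this by "checking one file" because, as far as the directory is concerned, there is no file.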

    The fact that Apple does not include support for any of these technologies in OS X or its embedded storage products certainly does not improve the case for using them.

    3) Use a custom protocol

    This is what commercial network backup systems do. It lets them deal with disconnects in a sensible way, and they don't care about filesystem fidelity because instead of storing files 1 to 1 they store each backed-up file as a blob in a database somewhere, and can store all of the attributes about it in their database. This is a lot more work to implement because now everything in Time Machine is no longer accessible through the normal filesystem interface. Depending on exactly how they implemented this they might be able to do it on a network filesystem, a raw network block store, or they might need a custom server.
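    As a rough sketch of the blob-plus-attributes idea, here is a tiny store built on SQLite. SQLite is just a convenient stand-in, not what any particular backup product uses, and the table layout, function names, and attribute keys are all hypothetical.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the backup database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS backups ("
        " path TEXT NOT NULL,"
        " snapshot TEXT NOT NULL,"
        " attrs TEXT NOT NULL,"   # every attribute, serialized as JSON
        " data BLOB NOT NULL,"    # file contents as an opaque blob
        " PRIMARY KEY (path, snapshot))"
    )
    return db


def store_file(db, snapshot, path, data, attrs):
    # One transaction per file: if the store fails partway, the transaction
    # rolls back and the database never holds a half-written file.
    with db:
        db.execute(
            "INSERT OR REPLACE INTO backups VALUES (?, ?, ?, ?)",
            (path, snapshot, json.dumps(attrs), data),
        )


def restore_file(db, snapshot, path):
    row = db.execute(
        "SELECT data, attrs FROM backups WHERE path = ? AND snapshot = ?",
        (path, snapshot),
    ).fetchone()
    if row is None:
        raise KeyError((snapshot, path))
    data, attrs = row
    return bytes(data), json.loads(attrs)
```

    Because the attributes live in the database rather than on the server's filesystem, it no longer matters whether that filesystem understands case, resource forks, or hardlinked directories. The cost is the one mentioned above: you can no longer browse the backup as ordinary files.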

    What Apple actually did…

    Okay, so those are the 3 obvious options. I left out things like "Design a whole new local and new network filesystem from scratch" as pie in the sky and not doable in the short term, though those are certainly options. Apple did not take any of the 3 obvious choices. Instead it did something that allowed it to approximate solution number 2 using its existing technology stack. In short, it used HFSX disk images stored on AFP volumes.

    The problem is that doing that has all the downsides of solution number 2. Every time you put your computer to sleep mid-backup it is like pulling the plug of a HD mid-backup. Except that the drive is connected over a slow connection, and is thin provisioned (which makes it seem larger than it is), which makes actually performing fscks on it completely impractical, so they have to be omitted or reduced. And disconnects happen quite frequently, so the OS does not pester you about disconnecting the drive. It is even worse because it is doing it over a network filesystem, which adds a whole extra layer of indirection and other issues.

    If there was some way to make this solution work it would also mean there is a way to make it safe to randomly unplug hard drives. Trust me, if Apple knew how to do that it would be done, and the OS would not chastise you for doing something stupid when you unplug your USB pendrive without telling it first. Since they haven't figured out how to let you safely unplug USB drives unannounced it seems like a bad idea to base a backup solution on what is in essence a wireless USB cable that is phasing in and out of existence.

    Update:

    There have been a bunch of great comments, but I want to call attention to one from Dominic. While my recent lost backup occurred even with all the newest updates, the backup was created before the latest software update or Time Capsule firmware. It is entirely possible the original corruption happened a while ago, but only led to data loss recently. It sounds like if everything you are using is up to date and your backups are not already corrupted then everything should work. I am creating a fresh backup right now in order to test it out.

    If you have not updated you should make sure you are using at least:

    Time Capsule 7.4.1 (thanks to gerritvanaaken for pointing out I had the wrong version listed) AND Mac OS X 10.5.6 (10.5.0-10.5.6 Combo Update)