Twiiter

Twitter Updates

    follow me on Twitter
    Loading..
    Loading..
    Saturday
    Jun062009

    Off to WWDC

    It is that time of the year again, and I am off to WWDC all the fun new technologies I will be able to use in a year or two once I can require Snow Leopard ;-)

    I also wanted to give a brief update on the Time Capsule situation. I have been running it since my original post, and I have not seen any corruption. I have purposefully caused numerous disconnects and other failure cases, and while it has been possible to get the system into a state where it takes several hours to do deep scans, I have not been able to reproduce the data corruption. I still think there are a lot of improvements that can be done in order to make Time Machine over a network a better experience, but I no longer worry about the data integrity.

     

    Wednesday
    May202009

    What do you get when you mix threads and red black trees?

    I was reading through Robert Sedgewick's 2008 paper on left leaning red black trees last week, and I had a sudden flash of inspiration. Of course, the flash of inspiration had nothing to do with my current project, so I was going to set it aside. Unfortunately I mentioned it to Rob, who convinced me I should actually implement it. Sometimes I wonder what happens to my weekends...

    This post is pretty tech heavy and I am going to assume people reading this know a binary search tree is, and have heard of a red black tree. If you haven't this may be pretty harsh read.

    Anyway, I was thinking to myself about how when would implement LLRB trees in Lisp. When I looked at must of the implementations out there they used commands that broke referential transparency, so I spent a bit of time thinking about how to implement the trees using immutable objects, and I came to a realization, a copy on write LLRB is a substantially more powerful data structure that would solve a lot of problems. I just had to implement one.

    Consider the below red black tree:

    before_insert.png

    In the above image the top number in each node is the nodes key, the bottom number is a reference count for how many other nodes point to that node. So long as the root node is held it will stabilize the entire subtree of nodes underneath it, and since the structure is immutable it is safe to walk through them from multiple threads without taking locks so long as we know we hold a reference to the root node.

    Now imagine we want to do an insert. Since we cannot mutate the objects since other threads may be walking around in them, we instead figure out what changes would need to happen to the tree during the insert, and clone the appropriate nodes, creating an alternate root for the tree:

    during_insert.png

    You will note that most of the nodes remain exactly the same as before, they are the same color, have the same value, and the same children. The new root node keeps the entire left subtree, but it does have to increase the reference count on the 2 node. Similiar relations happen through out the whole tree. Once we complete building this new root we can substitute it for the old root atomically (it actually requires a lock to bump the ref on the extant root node because of some arcane issues with atomic CPU operations), and all new accesses will go through the old tree. Once all extant users of the old tree release their references to the old root it is deallocated and cascades down the releases, leaving us with the final, updated tree:

    after_insert.png

    Okay, so we have this fancy new data structure, so lets do something fun with it:

    #include <pthread.h>
    #include <libkern/OSAtomic.h>
    
    #import <Foundation/Foundation.h>
    #import "LGTSMutableDictionary.h"
    
    LGTSMutableDictionary *dict = nil;
    
    void *dumper(__unused void *unused) {
      while(1) {
        NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
        uint32_t x = rand();
        [dict setObject:[NSNumber numberWithInteger:x] forKey:[NSNumber numberWithInteger:x]];
        [pool drain];
      }
    }
    
    void *sweeper(__unused void *unused) {
      while(1) {
        NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
    
        for (id key in dict) {
          [dict removeObjectForKey:key];
        }
        [pool drain];
      }
    }
    
    int main(int argc, const char * argv[]) {
      NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
      dict = [[LGTSMutableDictionary alloc] init];
      pthread_t dumper_thread;
      pthread_t sweeper_thread;
    
      (void)pthread_create(&dumper_thread, (const pthread_attr_t *)NULL, &dumper, (void *)NULL);
      (void)pthread_create(&sweeper_thread, (const pthread_attr_t *)NULL, &sweeper, (void *)NULL);
      [pool drain];
      sleep(120);
    
      return 0;
    }
    

    Conventional wisdom says the above code is really, really bad.

    1. We have a thread mutating collection while another is reading it
    2. That mutation going on while we are in the middle of a foreach style enumeration
    3. And we are mutating the underlying collection inside of the iterator, that isn't safe even single thread

    Guess what, LGTSMutableDictionary handles all of those things correctly, it just works, and it is based on the data structure above. So let me try to explain what is going on. The way the thread safety works should be pretty obvious, it does mutations as described above, then swaps the root node of the tree in that instance of LGTSMutableDictionary. If multiple conflicting mutations happen whichever one is committed first happens and the other are rolled back and then redone against the new tree.

    The mutation safe iterators are possible because they take advantage of the copy on write nature of the data structure. When an LGTSEnumerator is created it stashes a reference to the root node of the tree it was created against. That means so long as it uses that root node it has a stable view of the data set, no matter what mutations are happening. Once the collector is released it drops it reference, and which point that view of the world disappears, but in the mean time all the changes it was making have been live to all new accessors of the data structure. That is also why we can safely remove elements from inside the for loop itself.

    So, here are some of the other facts:

    1. It supports NSFastEnumertion, but because of some limitations calling break inside the loop can cause a leak. As a work around you can create a keyEnumerator and pass that into the for loop.
    2. Lookups are pretty fast, and on MP systems it handily beats locking and looking up things a hot NSMutableDictionary
    3. Inserts are about 10x slower than NSMutableDictionary on a single thread, that number gets better the more CPUs and threads there are contesting the dictionary
    4. -copy and -mutableCopy or O(1) operations, they just copy the root node of the tree
    5. The code doesn't have unit tests, though it does have internal functions for validating trees
    6. The code also can generate Graphviz commands to visually debug trees (the above pictures are generated from debug code)
    7. The code has never been used for anything, it is a weekend hack job, so it does have bugs
    8. There are some obvious cleanups and refactors that should be done
    9. I am not guaranteeing anything I put up at the moment is totally working, but I thought it was interesting enough that I should post it and see what people thing

    10. It is up on my github if you want to take a look

    Monday
    May112009

    In Defense of Data Islands

    Every so often various smart people ponder why are still using the desktop metaphor, a paradigm that is almost 30 years old. It seems that we should have figured out how to do things better. I am sure everyone has stories about how their family or friends still don't understand how files and folders work.

    Recently Steven Frank made a post discussing this issue, and mentioning a number of systems that had attempted to use alternate interfaces, the big commonality being that they had they generally abandoned hierarchical filesystem. I actually agree with this point, I think the reason the desktop metaphor is so hard to move on from is that it is an incredible way to model an hierarchical filesystems, and as long as we design systems that expect the users to interact directly with the filesystem we are going to be using a desktop metaphor for that interaction.

    Where I think I disagree with Steven is that it appears he thinks there is some implicit problem with devices that are "data islands." Data islands are totally fine. The problem with the Newton wasn't that it was an island, the problem was that the interconnect tools for it sucked. Living on an island works great when there are ferries, barges, and ship routes. When you have to move an SUV between two islands and all you have is a canoe then living an an island can seem like a burden.

    The important thing to understand is how the methods we use to exchange data has changed. When the Newton was released not every computer was networked, and most Newtons were not. In order to exchange data between computers you generally would use a floppy disk. That meant that the computer reading the data had to know the filesystem on the floppy disk, and how to read the data in those files. Transferring data that way is problematic because it means the data may not have been generated with intention for other software to read it. It makes the entire process more complicated, especially when two similiar pieces of software store slightly different pieces of data. These impedance issues can cause some amount of data loss, and if you talk to people about exchanging files you will hear many stories about losing periphery information (such as text formatting or some obscure fields in address records), so anecdotally this appears to be true.

    The main difference is that we don't exchange data in that way nearly as much any more. Nowadays the vast majority of data exchange is done over the network, and in that case there are two involved parties that can negotiate about how to transfer their data based on their capabilities. The other difference is there are a lot more formats designed explicitly for interchange. Things like vcard and ical are not designed to be efficient ways to store thousands of addresses, they are designed as a common interchange format. No serious contact manager internally stores their data as a vcard, almost all of them store it in an embedded database that is proprietary to that application. When you design formats in that way it is also possible to design them so that apps that don't understand some pieces of the data can at least preserve it with full fidelity. In other words, AddressBook is already a data island, the rest of Mac OS X just has phenomenal transit service to and from it.

    So I think we can get past the desktop metaphor, and I think it will be done by designing systems that do not have a filesystem as an exposed concept for the user. Personally I would love to see a modern desktop framework designed without any filesystem at all. I think the biggest issue to doing it is no longer any technical issue (though there are still some major technical issues), but rather the latent fear of data islands the industry has developed due to older, failed systems.

    Monday
    May042009

    Information durability

    I grew up in the 80s, which probably colors my views on many things. It was early enough in the "digital age" that some of today’s ubiquitous technologies had not been invented, or at least they were not common. When I think about the past I don't just think about how things have changed, but I also think about how starting before that technological cut off has created some persistent differences between myself and people 5-10 years younger than myself.

    A few years ago I was sitting with a friend while she was going through old files on her laptop. She is a year older than I am, but she had been meticulous about preserving and migrating her data as she bought new computers. She had essays she had written for grade school in the late 80s or early 90s. While I might have some backup tapes (that I no longer have hardware to read) of that age, I certainly would never be able to find any of my middle school essays.

    Now, the value of my middle school essays is almost nothing, beyond a bit of humor. I certainly don't miss them. Sometimes I wonder about people I knew back then, or the exact details of some incident my memory has muddled over time. I didn't have a home email account until 1994, and I didn't have a digital camera, webcam, or any of the plethora of information recording devices that are common place today. I often wonder what sort of treasure trove of information that might be, if I had well preserved data from those years. A few emails from back then would probably include names of long forgotten friends, and enough information to find them on sites like FaceBook and LinkedIn.

    Lets have some stories about your childhood

    Why, my childhood was boring... you want proof?

    When I was in high school I had friend who was a year or two younger than me. We weren't especially close, but I would sometimes drive her home from school and we would hang out. Recently I was driving down a road close to the house she lived in, and it got me thinking about her and some other high school friends. I am certain I never emailed her (and any email that old is sitting on tapes or is gone), we didn't really have many mutual friends, and I couldn't remember her last name. If it had been really important I could have gone scrounging through some old year books, but those sorts of faint recollections happen a lot, so I just let it pass. If I had had some sort of easily searchable (electronic) records that I thought would yield a hit I would have done a quick search, but it wasn't worth coming through physical unstructured data to satisfy some mild curiosity. A few weeks later I had a pleasant surprise when she sent me a friend request via FaceBook.

    On the other hand, there were two children who lived a block away from me when I was in elementary school, Melissa and Bennett. I don't remember their last name, but I walked to school with them everyday, and probably spent 10-20 minutes talking with Bennet most days. I still recall his vivid descriptions of Zelda to me. I didn't have an NES, and one week he described various elements of the game to me. Somehow his descriptions of the game really brought it to life for me although I had never heard of it until he mentioned it. Arguably his descriptions of Zelda are the reason I bugged my parents to buy me an NES. Bennett and Melissa moved to California when I was in second or third grade. They gave me their new address on one of their last days at school, but I lost it almost immediately. Every year or two I wonder whatever happened to them, but I don't recall their last name (do we see a theme here...), and the information I have is certainly not enough to find them through casual means.

    The fact that they moved away in the eighties means any information they may have entered into FaceBook was long after I last saw them. I have no emails or data from back then, nor do any of my friends. Most people don't list their elementary school and years of attendance on FaceBook, especially for an elementary school their didn't complete because they moved 3,000 miles away! While I could certainly track down their last name if I really wanted (I suspect my elementary school probably has old class photos, or I could look at public records for the house they lived in and find their parents names) but it just isn't worth that kind of effort. Maybe they would have a blog that has interesting stories or they would be a great business contact, but if I put that sort of effort into every person I lost contact with I would never have time to get anything done.

    What is the point?

    The point is that I, like many people, save almost everything digital. Not explicitly mind you, I just have so much space on my computer and my email accounts that it is not worth the effort to look back through mail to determine what is worth keeping once it has passed out of my initial scrutiny. I eventually move it to archives, but I do that in bulk (everything for 2007, for instance), and it is just organizational, not actual offline archival media, searching it is trivial. Thanks to Time Machine the same is true of backups, and given the size of my data compared to the size of storage media I expect that to continue into the future. I think we are reaching an age were won’t tend to throw out data, and that data gets tagged or sorted as we use it. All those annotations will be preserved as long as long as we keep the data, a life time of semantic information.

    When I look at sites like FaceBook, LinkedIn, Flicker, or any other sharing and social networking sites the accuracy of people's data does not appear to be correlated to how recent it is, it appears to be correlated to whether they entered it close to when the events were happening, or long after the fact. All of my childhood predates this sort of semantic tagging, and even if I were to expend an enormous effort to try to fill in a ton of old information it would still be quite limited because many of the people in my past social circles would not. People in middle school, high school, and even elementary school are creating durable, semantically annotated records of their lives, both privately and through social sites. If such resources had existed both of the above stories would have been "I was curious about how this person I knew a long time ago was, so I typed in a few searches and found out."

    The fact that people are entering data as they go and can easily look at their history changes things. I am not sure if that is good or bad, but I think it is significant, and I am certain it is different.

    Monday
    Apr272009

    Maybe doomed was too strong a word

    When I wrote my first post about Time Capsule it was mostly a rant so I could point friends to it instead of going over it again. I am naive, I know once you put something out on the Internet it is there for everyone to see. I didn't mind people reading it, I tweeted that I had written it, but I didn't expect to get fireballed. The honest truth is I was still fiddling around with blog templates an doing changes that were temporarily making the blog unreadable when a friend texted me asking "Are you the one writing /dev/why!?!" My response was "Yes, and how did you hear about it?"

    In my first post I might have been overly harsh, and I probably focused on one particular issue too much. To be fair, I have had multiple corrupted backups, and I feel that does entitle me to be a bit harsh. On the other hand some very talented people have spent a lot of time trying to make Time Capsule into a product that finally makes it pleasant for users to backup their data. Given that most users are unwilling to expend any effort backing things up it is probably worth cutting Apple some slack, even if it has been a bit of bumpy ride getting there.

    So what about the issues?

    I spent a lot of time focused on the issue of data reliability. As Drew and Dominic pointed out that is a bit of redherring. I acknowledged in the comments that it was more an issue of stacked complexity, that the deeper things get the more complicated and harder to trace they are, and that Apple chose a relatively deep stack (HFS+ on Mac OS X -> a disk image on Mac OS X -> AFP client on Mac OS X -> AFP server on NetBSD -> HFS+ filesystem on NetBSD). In general doubling the number of components in a stack like that more than doubles the complexity, at least in terms of catching all the weird edge cases you need to make it reliable. After the number of corrupted systems I had over the course of a year I just assumed it was too complicated a setup to make work. This was reinforced by several Software Updates that had the note "Improves Time Machine reliability with Time Capsule," but didn't solve my problems.

    As it turns out there were some fairly significant issues, but they were not insurmountable, just difficult to track down. After talking with several contacts I have a fairly good understanding of what was going on, and there was a bug that was fixed in 10.5.6 and the Time Capsule 7.4.1 firmware. If you are using a Time Capsule you should make sure you update to those if you have not already.

    So problem solved?

    Well, problem mostly solved. First off, until I have been running it for a while I can't be 100% confident about the fix, but some very smart people have told me it is fixed, and I am confident enough in their judgement to use my Time Capsule as my primary backup device. I have also been doing some particularly evil things (purposefully cutting the networking, pulling power on the Time Capsule and/or the Mac, etc). So far the backup integrity has withstood all of this, so I am mostly statisfied?

    Mostly?

    While the integrity of the data (which is what I focused on in the last post) seems to be assured now, there were some other issues with Time Capsule that I had neglected to mention in my previous post, because I got sidetracked on that one issue and the post was really long. Compared to the integrity issues they are minor, but they are worth mentioning, if only to help out other people.

    The first one is the while interrupting backups does not have the potential to corrupt backups any more, it can greatly increase the time the next backup takes. A full explanation of when and why that happens would take an entire blog post, but the short version is that if the disk image is properly unmounted it can correctly track what directories have had changes done to them, and if it is not unmounted it needs to scan substantial portions of the backup to figure out where it was when the volume was unmounted. For the most part that is not a big deal, but it does suck a bit when a backup takes 30 minutes to scan things instead of 5. Strictly speaking this a limitation of HFS+, not Time Machine or Time Capsule.

    Scanning large directories is just painful. You will notice in the above paragraph I said that when HFS+ is correctly unmounted the system maintains a list of directories that were modified, not files. For various reasons it is way too expensive to keep a list of all files that were modified on HFS+, so it tracks the directories that are modified, and Time Machine takes that last of directories and scans the files in those directories for changes. It is a very reasonable compromise that works very well unless you have directories with thousands of files that frequently change. Scanning those folders can be very slow, and if you have to do it every backup it becomes an issue.

    My problem is that I use gmail. Gmail's IMAP bridge is kind of weird, but the big issue is that it doesn't really have mailboxes, but it emulates them by putting everything with a particular tag into an IMAP mailbox. That means your inbox contains every mail you have ever received, and the folder changes every time you receive a mail. It also means that most of your mails are duplicated at least once if you tag them. Since Mail stores every a ."emlx" file for every email in a directory that corresponds to the IMAP folder that means I have several folders with thousands to tens of thousands of emails, and INBOX that is immense, all of which frequently have new files added.

    As a result Time Machine and Time Capsule takes 3+ hours canning for every backup, more if the previous backup was interrupted. Once the backups get that long the odds of interrupting them are pretty high, so I got into a state of perpetually scanning and never completing a backup. To be fair, this is a combination of limitations HFS+ imposes on Time Machine, Gmail making some poor choices in their IMAP bridge, and Mail making some poor choices in their file storage, all three of which conspire to make for a bad experience. Fortunately it was simple enough to fix by excluding ~/Library/Mail/ from the backup. Since my mail is stored on a server backing it up locally is not strictly necessary.

    So, I have gone from doom and gloom to functional but with some problems. My initial backup took ~35 hours, and my Time Capsule's AFP server has wedged 3 times (but the Time Capsule itself kept running and it was possible to reset the just the AFP server by hitting "Disconnect All Users" in the admin tool) since I started using it again.

    I am told my blog posts get kind of long winded, so I am also posting a Networked Time Machine Best Practices post that just gives simple advice without all the wandering analysis and speculation. I plan to post another update after about a month of using it. Additionally, I imagine I will post another update sometimes after Snow Leopard ships just to look at what has changed. While no new features have been disclosed, Snow Leopard is supposed to be all about cleanup, reliability, and performance, so I am eager to see if there are any improvements over the current experience.