Twiiter

Twitter Updates

    follow me on Twitter
    Search
    Powered by Squarespace

    Entries in Programming (4)

    Thursday
    08Oct2009

    Flash on the iPhone

    Update: One of the developers of Trading Stuff posted a comment below, and has a great blog post about his experience getting it running on the iPhone.

    There has been a lot of discussion about running Flash apps on the iPhone over
    the last few days. It was precipitated by Adobe's announcement that Flash
    Professional CS5
    would have support for publishing apps as iPhone native
    executables. They went into a little more detail, saying that they were going to
    use an Ahead of Time (AOT) compiler backend based on LLVM, and that
    there are already several apps on the store using it. This generated a large
    number of responses from various people, some knee-jerk, some well
    reasoned out. Of course, the fact that there are samples we can dissect means
    that it is possible to make some informed analysis about them.

     

    Personal views

     

    Before we get into the technical details of this, let me go into my background a
    bit, lest I be accused of being biased or having an agenda. I have around 10
    years of Objective C development experience, and almost no experience writing
    anything substantial with Flash or ActionScript. I am also primarily a Macintosh
    user, where the Flash experience (even in the browser) is often less than ideal.
    I am told it is a better experience on Windows. I do use the Hulu Desktop app, which is written in Flash, and think it is pretty nice (though
    they should make cut and paste work in their text fields, grrr). Of course,
    something like Hulu is an immersive app that has no need to integrate with the
    native OS experience, but neither do most games.

    There are a lot of neat little web games and what not that are written in Flash,
    that I would like to run. If that was the tool the author felt it best to
    express their ideas and it worked for them then great. In particular, for
    software that I don't feel needs interface with OS and which can use a
    completely custom UI (in other words, games) I think there is no difference to
    the user. So long as the environment generates good code that can run at full
    frame rate without killing the battery I am in favor of getting Flash apps to
    run on the iPhone.

     

    Warm up

     

    Okay, so first off lets look at what we know. Adobe is using the LLVM arm
    backend to generate code. Right off the bat that gets me bit worried. Don't get
    me wrong, I love LLVM. Hell, I was one of the people who wrote
    the LLVM ppc backend. Having said that, I wouldn't use the LLVM arm backend at
    this time. The reason I wouldn't is because every time I have asked the
    people who are actively working on it they tell me it is not ready for prime
    time. That is the reason why LLVM-gcc and clang are not supported compiler
    targets for iPhone, despite them being supported (and encouraged) for development
    on Snow Leopard. Apple has a lot of compiler engineers and has basically stated
    that LLVM is their future compiler direction, so if those guys are telling us to
    hold back, then how is it good enough for AOT Flash?

    Now, Adobe could potentially have another arm backend they developed, or maybe
    they branched off a particular build and have fixed whatever bugs would impact
    them while ignoring stuff they didn't need. It is entirely possible that they
    could have reasonable code coming out of this thing, and I don't have access to
    the toolchain itself to inspect it, but it definitely gets me nervous. This may
    also be one of the reasons why they are not ready to widely release the tools
    yet.

     

    Lets get to it

     

    Okay, so I downloaded one of the games that was available, Trading
    Stuff
    , decompressed its IPA and had a look inside. At first
    glance it looked like a pretty normal iPhone app. Then I noticed there were no
    resources besides a basic MainWindow.nib. No images, no sounds, no
    localizations. The next thing I noticed was that the binary was ~13 megabytes,
    or approximately ~95% the size of the entire app. That is enormous for a binary.
    For reference, compare that to a normal iPhone game, like The Oregon
    Trail
    , which is ~106 megabyte game has ~1 megabyte executable, or
    about 1% the size of the app.

    What is going on is that the Flash build environment is not using any of the
    standard Mac OS X/iPhone OS bundling or localization mechanisms. Instead they
    are transforming all their assets into embeddable objects and shoving them
    directly into their application's TEXT section. At first glance that might not
    seem so bad, but it has a bunch of consequences. It defeats almost any sort of
    caching or prefetch logic the OS has for specific data types (like images), and
    instead places all of the pressure directly on the VM and paging subsystems.

    To be clear this is no way violation of the SDK agreement, and embedding objects
    into an app in this way is occasionally appropriate, but the degree to which it
    is happening with these cross compiled apps is different, and likely will have a
    number of significant (negative) performance implications.

    Now, moving on from the obvious, its time to actually start poking at the
    generated code. If we just take a cursory look at the linkage, we can see some
    bad stuff going on here. Why are they calling dlopen, dlclose, and dlsym? The
    only reason to use those is load in frameworks and resolve symbols after launch,
    something that is strictly verboten. In the best case that might be some dead
    code they use from debugging that should have been stripped that should have
    out. In the worst case, they depend on them to get access to symbols they are
    not supposed to use. I want to be clear about this: There is no legitimate
    reason why any app that follows the terms of the SDK agreement should use these
    functions
    , and I find it shocking that Apple lets apps that link against them
    on the store at all. I should also note this is not exhaustive, those are just the most obvious things, but in my cursory inspection I saw a dozen or so objective C selectors that I believe are private.

    Actually that brings up another point. Despite the app developer doing nothing
    wrong, one of their toolchain or middleware vendors is doing something that could
    be an issue. When I write apps for the store I might choose to play fast and
    loose with something if there is a compelling reason, but if I am providing a
    library for someone else I never do. Using private APIs is putting your
    customer's apps at risk. Not only do I find that unacceptable,
    but I think any vendor who does that is generally irresponsible and makes it me
    hesitate to use any of their other products because I feel it shows a certain
    casualness about how you treat their customers. If there is a legitimate reason
    you need to do something that risks your customer's products, then as a company
    you need to disclose it so your customers can make an informed decision.

    I should note this is not an intrinsic issue with Flash, I know for a fact
    certain major vendors ship iPhone libraries that call APIs that can get your app
    rejected from the store without informing developers. For instance, various analytics companies
    really shouldn't be poking at private APIs to try to find cached location
    framework data. It isn't just a privacy breach, it places your clients apps at risk.

     

    So what. How does it run?

     

    On my iPhone 3G it runs really choppy, on my 3GS it runs acceptably, but it
    still isn't smooth. Given the OpenGL performance people have seen on the 3GS
    that is still pretty bad. I have not done any invasive tests by instrumenting
    the binary, that is just what I can get via basic usage. The sad thing is that
    there is no reason it has to have performance like this. This is not an inherent
    issue with the ActionScript used in this app (though that may have issues), it
    is that what is coming out of the toolchain is a huge, monstrous binary that stresses
    the runtime and has performance characteristics completely different than
    anything the iPhone is currently setup for.

    Also, remember, the slower the frame rate the more work the phone is doing per frame, and the
    more battery the app is using. When you see an app that can do 120 FPS in its
    demo loop, that means that when it is running at 30 FPS it is using ~25% of the
    CPU/GPU assets. When you see one that can only get 20 FPS that means it cannot
    hit 30 FPS to clamp at despite maxing some or the costly (in terms of battery)
    system assets.

     

    Punchline

     

    Technically speaking, these do appear to basically be within letter of the SDK
    agreement, modulo the fact that Adobe appears to making private API calls. They
    should be able to do what they need to without making those calls, so ultimately
    that should be a non-issue.

    Now, the notion that what this thing emits is indistinguishable from
    something Xcode emits is laughable. They are very different, and not in a good
    way. While the apps may get acceptable frame rates on an iPhone 3GS, they don't
    on earlier hardware, and they almost certainly use substantially more
    battery power than native games.

    I want to be excited about this thing, both because it is a seriously cool
    piece of tech, and because there are Flash games I would like to run on my
    phone, but looking at what this thing is spitting out I think the apps it will
    generate will perpetuate the stereotypes about Flash (especially on cell
    phones), and give Objective C programmers a (somewhat misplaced) sense of
    vindication about their views on Flash.

    This is all still in beta, it could end up a lot better than it currently is. It
    could be something that can make some great games available on the iPhone.
    Unfortunately looking it right now I am very skeptical, and I think that is the
    right position to have given Flash's performance elsewhere. Yes, this is
    entirely new technology, but it comes from the same company with the same
    priorities. Given the product they have delivered to me on my desktop for the
    last 5 years they don't get benefit the doubt, they have to pull themselves out
    of the doghouse as far as I am concerned. Come on Adobe, prove me wrong!

    Thursday
    10Sep2009

    Apple has opensourced libdispatch

    Apple has opensourced the libdispatch Mac OS X implementation. This is excellent news, and hopefully we will quickly see ports to other platforms. This is a the last core piece of GCD that was proprietary, so it is a very exciting piece of news.

    Wednesday
    20May2009

    What do you get when you mix threads and red black trees?

    I was reading through Robert Sedgewick's 2008 paper on left leaning red black trees last week, and I had a sudden flash of inspiration. Of course, the flash of inspiration had nothing to do with my current project, so I was going to set it aside. Unfortunately I mentioned it to Rob, who convinced me I should actually implement it. Sometimes I wonder what happens to my weekends...

    This post is pretty tech heavy and I am going to assume people reading this know a binary search tree is, and have heard of a red black tree. If you haven't this may be pretty harsh read.

    Anyway, I was thinking to myself about how when would implement LLRB trees in Lisp. When I looked at must of the implementations out there they used commands that broke referential transparency, so I spent a bit of time thinking about how to implement the trees using immutable objects, and I came to a realization, a copy on write LLRB is a substantially more powerful data structure that would solve a lot of problems. I just had to implement one.

    Consider the below red black tree:

    before_insert.png

    In the above image the top number in each node is the nodes key, the bottom number is a reference count for how many other nodes point to that node. So long as the root node is held it will stabilize the entire subtree of nodes underneath it, and since the structure is immutable it is safe to walk through them from multiple threads without taking locks so long as we know we hold a reference to the root node.

    Now imagine we want to do an insert. Since we cannot mutate the objects since other threads may be walking around in them, we instead figure out what changes would need to happen to the tree during the insert, and clone the appropriate nodes, creating an alternate root for the tree:

    during_insert.png

    You will note that most of the nodes remain exactly the same as before, they are the same color, have the same value, and the same children. The new root node keeps the entire left subtree, but it does have to increase the reference count on the 2 node. Similiar relations happen through out the whole tree. Once we complete building this new root we can substitute it for the old root atomically (it actually requires a lock to bump the ref on the extant root node because of some arcane issues with atomic CPU operations), and all new accesses will go through the old tree. Once all extant users of the old tree release their references to the old root it is deallocated and cascades down the releases, leaving us with the final, updated tree:

    after_insert.png

    Okay, so we have this fancy new data structure, so lets do something fun with it:

    #include <pthread.h>
    #include <libkern/OSAtomic.h>
    
    #import <Foundation/Foundation.h>
    #import "LGTSMutableDictionary.h"
    
    LGTSMutableDictionary *dict = nil;
    
    void *dumper(__unused void *unused) {
      while(1) {
        NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
        uint32_t x = rand();
        [dict setObject:[NSNumber numberWithInteger:x] forKey:[NSNumber numberWithInteger:x]];
        [pool drain];
      }
    }
    
    void *sweeper(__unused void *unused) {
      while(1) {
        NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
    
        for (id key in dict) {
          [dict removeObjectForKey:key];
        }
        [pool drain];
      }
    }
    
    int main(int argc, const char * argv[]) {
      NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
      dict = [[LGTSMutableDictionary alloc] init];
      pthread_t dumper_thread;
      pthread_t sweeper_thread;
    
      (void)pthread_create(&dumper_thread, (const pthread_attr_t *)NULL, &dumper, (void *)NULL);
      (void)pthread_create(&sweeper_thread, (const pthread_attr_t *)NULL, &sweeper, (void *)NULL);
      [pool drain];
      sleep(120);
    
      return 0;
    }
    

    Conventional wisdom says the above code is really, really bad.

    1. We have a thread mutating collection while another is reading it
    2. That mutation going on while we are in the middle of a foreach style enumeration
    3. And we are mutating the underlying collection inside of the iterator, that isn't safe even single thread

    Guess what, LGTSMutableDictionary handles all of those things correctly, it just works, and it is based on the data structure above. So let me try to explain what is going on. The way the thread safety works should be pretty obvious, it does mutations as described above, then swaps the root node of the tree in that instance of LGTSMutableDictionary. If multiple conflicting mutations happen whichever one is committed first happens and the other are rolled back and then redone against the new tree.

    The mutation safe iterators are possible because they take advantage of the copy on write nature of the data structure. When an LGTSEnumerator is created it stashes a reference to the root node of the tree it was created against. That means so long as it uses that root node it has a stable view of the data set, no matter what mutations are happening. Once the collector is released it drops it reference, and which point that view of the world disappears, but in the mean time all the changes it was making have been live to all new accessors of the data structure. That is also why we can safely remove elements from inside the for loop itself.

    So, here are some of the other facts:

    1. It supports NSFastEnumertion, but because of some limitations calling break inside the loop can cause a leak. As a work around you can create a keyEnumerator and pass that into the for loop.
    2. Lookups are pretty fast, and on MP systems it handily beats locking and looking up things a hot NSMutableDictionary
    3. Inserts are about 10x slower than NSMutableDictionary on a single thread, that number gets better the more CPUs and threads there are contesting the dictionary
    4. -copy and -mutableCopy or O(1) operations, they just copy the root node of the tree
    5. The code doesn't have unit tests, though it does have internal functions for validating trees
    6. The code also can generate Graphviz commands to visually debug trees (the above pictures are generated from debug code)
    7. The code has never been used for anything, it is a weekend hack job, so it does have bugs
    8. There are some obvious cleanups and refactors that should be done
    9. I am not guaranteeing anything I put up at the moment is totally working, but I thought it was interesting enough that I should post it and see what people thing

    10. It is up on my github if you want to take a look

    Thursday
    23Apr2009

    Good programs are lazy

    Once upon a time my job title was "Performance Engineer." Back then my job was to make things go faster. Now for every piece of code the particulars of how you do that varies, but at the end of the day almost all code optimization is fundamentally about figuring out how to do less.

    It seems fairly basic, things run faster when you are doing less, when you need to send less data, when you need to walk through smaller structure. Except for hiding latency (making a long process appear look shorter actually without making it shorter, incrementally displaying results for instance) almost all optimization is about figuring out how to do less.

    That is why code being lazy tends to be a big win. By lazy I mean procrastinating, putting off anything it can do until the last moment. If you need an array for some operation why should you ever initialize it until the first time you try to do that operation. Of course that is just a rule of thumb, in some cases there may be responsiveness or latency concerns that require you to build some structures ahead of time, but people tend to default to precalculating things, and special casing lazy initialization when their profiling tells them something.

    The problem with that is that individually most of the time non-lazy initialization is not a huge a deal, but as Jamie points out, small performance issues everywhere do add up. Not only do you burn fewer cycles, but I have found that if I am aggressively making my code use lazy initialization techniques it tends to reduce ram footprint and heap fragmentation, because any time some precalculated object turns out not to be used it is never allocated in the first place. On small devices with no paging (like an iPhone) that can be be a significant improvement

    So why don't people write lazy code? Well, it is harder. That's right, being being lazy can be more work. The reason it is harder is more of a mindset and tool issue than a fundamental one. It is a systemic problem, and moving to being consistently lazy code requires a systemic solution. After I found myself doing the same pattern repeatedly I decided to codify it.

    Imagine you had an Objective C class with an init like this:

     


    - (id) initWithString:(NSString *)string_ {
    if ((self = [super init])) {
    cache = [[NSMutableDictionary alloc] init];
    derivedStringValue = [[self expensiveCalculation: string_] retain];
    }

     

    return self;
    }

     

    In that code we are allocating a cache array and doing some expensive calculation at object creation, but we don't know for sure the user is going to require us to use either of those. So, how would we rewrite that lazily? Well, here is what I used to do:

     


    - (id) initWithString:(NSString *)string_ {
    if ((self = [super init])) {
    string = [string_ retain];
    }

     

    return self;
    }

    @synthesize derivedStringValue;

    - (NSString *) derivedStringValue {
    if (!derivedStringValue) {
    derivedStringValue = [[self expensiveCalculation: string_] retain];
    }
    return derivedStringValue;
    }

    @synthesize cache;

    - (NSMutableDictionary *)cache {
    if (!cache) {
    cache = [[NSMutableDictionary alloc] init];
    }
    return cache;
    }

     

    Okay, that code doesn't allocate anything until we need it. In exchange it is no longer safe to directly access the ivar, you always need to go through the getters (i.e. [self derivedStringValue] or self.derivedStringValue). There are also added complications if you have some dependent values, you need to be careful their initializers don't form a loop. We don't have to worry about initializing the ivars to nil since the Objective C runtime zeros out newly allocated objects. Generally this works well, but it is more code and it is no longer clear what is initialization code. Since we use synthesized properties, if we call the generated setter it will bypass the initializer, which is equivalent to manually setting the ivar. So the question is, can we do better? Since I am writing a blog post about it, I must think the answer is yes.

    First, lets look at what is common between the two initializers, and try to isolate it:

     


    #define LG_SYNTHESIZE(name, type, initializer) \
    @synthesize name; \
    \
    - (type) name { \
    if (!name) { \
    name = (initializer); \
    } \
    \
    return name; \
    }

     

    Now, using that macro lets rewrite the lazy version:

     


    - (id) initWithString:(NSString *)string_ {
    if ((self = [super init])) {
    string = [string_ retain];
    }

     

    return self;
    }

    LG_SYNTHESIZE(derivedStringValue, NSString *, [self expensiveCalculation: string_]);
    LG_SYNTHESIZE(cache, NSMutableDictionary *, [[NSMutableDictionary alloc] init]);

     

    Now that is much better. Ignoring the macro, it is about the same length as the original, and it is clear what is initialization code. One interesting thing to point out is that in many cases doing this allows you to eliminate the -init method. This example requires it because we need to stash an a parameter, but in general any class that can be initialized with -init (as opposed to -initWithFoo:) can be written with lazy initialization so that you don't have an explicit -init (of course some super class will implement -init).

    So, by coming up with a few macros, working out a few rules, and always using accessors over ivars there is a lazy init pattern for Cocoa that works fairly well. I have found that there is a need for several different synthesis macros, basicly corresponding to Objective C property attributes (LG_SYNTHESIZE_ATOMIC, etc) as well as different macros if you to initializer a value that defined in a super via chained accessors. In some cases it is also necessary to implement the setter as part of macro. Personally I have only bothered to implement this and refactor my code for nonatomic retained synthesis, since that hit ~95% of the cases in my code.

    This could be made a lot nicer by implementing a source->source transform in clang and extending the existing synthesizer syntax, but I have found that adding custom extensions to your compiler to make your source code nicer tends to carry a very high cost in the long run.

    Update: changed source code highlighting to Syntax Highlighter 2.0