Friday, May 15, 2009

On Performance and Scaling

Finally, here's what you've been waiting for...

Performance is about getting down to the bone with a butcher's knife for that last bit of flesh (sure, it takes me great effort to say that as a vegetarian...so savor and ruminate on this key point). It is about getting the last drop of milch from that udder. Think instruction counts. Think about the shortest path between two points. Think about latency and real-time systems.

Scalability, interestingly, lands you on a different planet. If you had 256 cores underneath, can you get 25,600% utilization? If this is your principal interest, then don't worry too much about performance as defined above. Use coarse-grained locks, look at the wait time of locks (how long you have to wait to get inside mutually exclusive code), and worry about pruning your critical sections (mutex-protected code) later. Scalability is mainly about throughput. Performance is, roughly, about latency. Eventually we will combine both and have some real fun.
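To make the coarse-grained-lock advice concrete, here is a minimal sketch (my own toy example, not anything from PixBlitz, and all the names are made up): a single coarse mutex guards all shared state, and each thread records how long it waits to get inside the mutually exclusive code - the lock wait time mentioned above.

#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

std::mutex big_lock;                              // one coarse-grained lock for all shared state
long long shared_total = 0;                       // state protected by big_lock
std::chrono::steady_clock::duration wait_time[4]; // per-thread time spent waiting on the lock

void worker(int id, int iterations) {
    using clock = std::chrono::steady_clock;
    for (int i = 0; i < iterations; ++i) {
        auto t0 = clock::now();
        big_lock.lock();                          // how long do we wait to get in here?
        wait_time[id] += clock::now() - t0;
        shared_total += i;                        // the critical section (mutex-protected code)
        big_lock.unlock();
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int id = 0; id < 4; ++id)
        threads.emplace_back(worker, id, 1000000);
    for (auto& t : threads) t.join();
    for (int id = 0; id < 4; ++id)
        std::printf("thread %d waited %lld ms for the lock\n", id,
                    (long long)std::chrono::duration_cast<std::chrono::milliseconds>(wait_time[id]).count());
    return 0;
}

If the reported wait times dominate the run, the critical section is where the pruning shears go later; if they are negligible, the coarse lock is not what is holding your scalability back.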

To get the jargon right - we will use the term performance for a single thread of execution, and scalability for the "parallel performance" of multiple threads of execution.

Designing MT code from scratch can be a bitch if you haven't trained your mind to do so. In my company, PixBlitz Studios, we have our vision engineers (not hard-core programmers but good scientists and mathematicians) solve the algorithm correctly using simple single-threaded logic. Once the algorithm has been nailed, it is pseudo-coded and then parallelized. Sure, there are trade-offs in taking this route, the positive one being that the vision engineers can produce some kick-ass algorithms without worrying about the systems side of life. Not all algorithms are naturally parallelizable. Knowledge of this fact can greatly reduce risk and agony. There are cases, however, where such a non-parallelizable problem may start looking interesting (say, to someone like my colleagues Arvind Nithrakayshap or Tudor Bosman from my past lives, or even me for that matter), especially when we know that solving it has much value. Exceptions apart, it is important to solve the problem for correctness before multi-threading it. Before you kill me...

Even in the general case of a solvable MT problem, just because you have nailed the single-threaded algorithm does not mean you can readily multi-thread it. A good approach is to start from a clean sheet of paper and write the parallel algorithm using the single-threaded algorithm as pseudo-code. A sound understanding of the algorithm and the underlying problem domain can quickly give a good sense of which parts can be written as concurrent code and what is intrinsically non-parallelizable. In the difficult case of non-parallelizable logic, it is helpful to understand the effect of sync points (points in the algorithm where some/all threads must synchronize) on the parallel logic. When writing parallel (or scalable) code, the sync points play a key role in determining the scalability of the system.
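As a toy illustration of a sync point (again my own sketch, not one of our vision algorithms): a parallel reduction in which each thread sums a disjoint range with no locking at all - the naturally parallelizable part - while the join plus the final combine form the sync point where every thread's progress is gated on the slowest one.

#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    const int num_threads = 4;
    const long long n = 100000000;               // sum the integers 0..n-1
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> threads;

    // Parallel phase: each thread owns a disjoint range, so no locks are needed.
    for (int t = 0; t < num_threads; ++t) {
        long long begin = t * (n / num_threads);
        long long end = (t == num_threads - 1) ? n : begin + n / num_threads;
        threads.emplace_back([&partial, t, begin, end]() {
            long long sum = 0;
            for (long long i = begin; i < end; ++i) sum += i;
            partial[t] = sum;
        });
    }

    // Sync point: nothing below can run until every thread has arrived here.
    for (auto& th : threads) th.join();

    // Serial phase after the sync point; this part does not speed up with more cores.
    long long total = std::accumulate(partial.begin(), partial.end(), 0LL);
    std::printf("total = %lld\n", total);
    return 0;
}

The more of the algorithm that sits between sync points (the parallel phase), and the less that sits at or after them (the serial combine), the better the system will scale.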

You'll often hear me say - what is the effect of the slow path on the fast path? If the slow path has a systemic feedback loop into the fast path of the system, you have a pathological problem to deal with and should go out fishing instead.

Phew, was that too much to digest? Let's focus on the easy case...the code is naturally parallelizable with no frickin' gotchas. Should you be worrying about the execution time of a single thread of execution? Or go after the lock wait times? Put coarse locks in your code and go for 25,600% CPU utilization first? Let's nail these questions first. Haha. We can then come back with the pruning shears and start shaving off instructions!!

Without being too cynical, we must wait to milch the instruction-count udder until we actually need to. In all likelihood your code is running on today's latest multi-core CPUs and GPUs; contrary to the urban legend - where the Gods of Performance will write you a eulogy for a lifetime of instruction pruning - your time is best spent on ensuring that your algorithm is parallel before going after the instruction counts of a single thread of execution.

Let's leave it at this for now.

To be continued in many blog posts to come, with real examples, code, etc.

The Philosophy of Performance and Scaling

For those of you who have worked with me (or currently do), it may be better to read through my non-technical blogs (business, CEO type stuff), mainly because there is nothing here beyond what you may already know through our technical conversations.

We will get a lot into the meaning of the terms - Performance and Scaling. This blog is the start. When I have time on hand I'll try to compile some of this, and hopefully much more, into a book. I believe there are two ways to write a book: keep taking detailed notes as you work through the development of some hard-assed problem in computer science, or teach a class in systems design and programming. The latter suits me better as I'm too selfish with my time when I'm writing code. Coding is very beautiful, and to get distracted is to give an ounce less to the thing at hand (which is what all these blogs are about :)).

I'm excited to see The Art of Multiprocessor Programming by Maurice Herlihy and Nir Shavit. I'll share my thoughts when I have read the book. But flipping quickly through it, I see plenty of places where we can add value by giving those great algorithms a run for their money on some of the coolest h/w coming out of industry.

What a shame seeing some good computer science talent coming out of school wanting to manage people or do business development. Embarrassing as that statement may seem given my life as a programmer today (Not!), I ask you - the brilliant CS student - what do you go to school for if you don't want to step out and give what you've learnt a road-test? If the CS degree was just a stepping stone for landing a well-paying job in a completely different domain, then you are doing the right thing getting into business; rest assured, the world of building "beautiful" will not miss you. The economics of being a programmer v/s a manager or biz person are similar to those of a masonry brick layer v/s the contractor; sure, there is a scalar multiplier involved in terms of the actual $$s involved :). But some of my most successful engineer friends have done pretty damn well financially compared to their management counterparts, and have contributed immensely to academia and industry. I'm excited to say that the $$s business has never gotten in the way (haha) of my love for programming in any significant way.

To conclude, performance and scaling is not a job, IMHO. It is a way of life. Marrying s/w to the underlying h/w to get the most out of performance and scaling is a natural by-product of well-written code. It does not require a "performance engineer" to run through your dirty laundry after you have written the code, to "speed" things up. We will focus on becoming well-rounded engineers who do not have a surrogate relationship with processors, compilers, Instruction Set Architectures, machine architecture, etc. As we go higher up the s/w stack and away from silicon, we will use the same building blocks that we used when close to bare metal. The problem is not very different when building a distributed system or a loosely coupled cluster than when designing for a symmetric shared-memory MP system with snoopy or directory-based caches. Neither is it different when writing a compiler for a processor pipeline. The key is to distill the mathematical basis - once you have that down you are in a powerful position to build any vector. It may take me years to discover something fundamental that can become a part of my synchronization toolchest, thereby adding to my basis.

Concurrency and synchronization are an intrinsic part of any large real-life system. It is good to train ourselves on home turf, take baby steps, and set ourselves for bigger challenges over time.

Watch Over Your Angels!

Sure, as much as you'd like your heavenly angels to shower you with divinity, as an entrepreneur you may be blessed to be seed-funded by angels - who are your fans, believe in you, and cheer you on by motivating you to strive for the best. A good angel will get you off the ground, be there for you when you need them the most, and continue to be there for you till the very end.

If your company needs big money to grow and scale (remember that a company can be a healthy angel-funded company that never requires VC money), then VC money is what you need next. The angel-to-VC transition follows the line of money; it is a business, like banking and the stock market. Pampered as you may be by your angels, consider the honeymoon to be over when you seek institutional investment in your startup - you will need to get more serious in all aspects of business and execution.

Raising VC money will require more rigour. You will need to create a business model and slide decks that may need to show hundreds of millions of dollars in value that you can create. Look closely at your VC term sheets and seek help in trying to understand the terms. There can be a lot of subtle stuff in there. This is the time to watch over your angels. Your angels will love you no matter what, but in the no-matter-what case, if they are "wiped out," the angels (the ones with gossamer wings) shall weep.

Tuesday, May 12, 2009

Time To Market

The importance of getting the product to market on time cannot be overstated. There are very few technologies that can stand the test of time in this fast-moving market. Assume that there are others out there who are after the same business opportunity using similar technology (even a 1/2-star entrepreneurship book on amazon.com will say that). The difference between the first mover and the laggard (missing the first-mover advantage when that was the original goal may politely be referred to as business validation, if that helps ease the pain) can cost you the business opportunity itself. Validation, however, has its advantages and may, in fact, be good for a new business model that needs to be proven. If the market is large, there may be room for other players after the first mover has demonstrated that the model is for real. A large market dominated by a few players (ever play Monopoly?) is not the same as a large market with hundreds of customers. This is a key factor that needs to be incorporated into the risk equation (OK, if you are looking for the answer, it is 1.0) stated below.

The challenge with a new business model is getting investors to believe that it is even possible. Those words are like music to the ears of a VC. Not. Haha. Add to this the difficulty of proving that the technology works...something that becomes extremely critical when the technology itself is non-trivial. Waiting longer for the technical milestones can greatly reduce investment risk. But when the level of difficulty in building the technology is high (read as - money is needed to fund development and prove or disprove that the technology works), a slowdown in technical execution that is under-fueled financially can have serious business consequences.

So, while the math appears to be a zero-sum game - the sum of technology and business risk is 1.0, in real-number notation - the savvy business mathematician knows where to draw the line (or place the fulcrum). For example, a technical team that has been winning for a dozen years is likely to continue to be a winning team - a Warren Buffett Wall Street stock-market investment principle (or so I'd like to believe), or analogous to Newton's First Law if you wish to be scientific...don't mess with it and it will stay its course! More investment fuel for developing/validating technology may prove or disprove faster whether the technology works, while at the same time giving the upside of being a first mover, which may be essential to the success of the company.

So, go ahead, milch that udder, and get the last drop of albino martini out of your business execution.

Wednesday, May 6, 2009

On the Business of Technology

Aside from my innate love for technology, the main reason for being a programmer has been to have the ability to say - Let's send man to Pluto, and then go write the code. Exciting as this technology trip has been, the last couple of years as a CEO have been an awakening of sorts.

I have now augmented the dream-and-build axiom with a corollary - What is the business of sending man to Pluto? Cool as the rocket science may seem, there is no money in this space (never mind the pun). The algorithm for an entrepreneur is about creating value for the customer, building a team of highly motivated people, and giving a return on the money to the investors (without the investment we would not have the opportunity to build). The business of technology appears to be a lot harder than the technology itself.

The truth of the matter is that technology and the business of technology go hand in hand. Operating in a vacuum is probably the worst mistake an entrepreneur can make. After having built "beautiful" in a vacuum (which has a Silicon Valley shelf-life), we are now building the second application of our technology to specifications from a major broadcast customer. How cool is that! Now add to it the deep science and challenging engineering. "Cool" is a natural magnet for the brightest minds in academia and industry. High-quality work motivates the best. Channeling my team's energy and effort into something with significant business value is my challenge, and I'm beginning to love it.

Finding a real-world application for cool is the single most important step in the direction of success for an entrepreneur. Let's call this my secret to creating a healthy company of happy people. It has been good learning.