Post-Mortem of the TiVo Crack Distributed Computing Project

Edwin Olson, December 2002

1. Introduction

In October 2002, TiVo released the version 3.2 software patch for their personal video recorder. Previous versions of the TiVo software included a backdoor password that enabled special features. For some reason, TiVo has not generally made these passwords available; rather, TiVo users have worked to discover the backdoor password themselves.

Early on, the backdoor password was stored in plaintext on the TiVo's hard drive; the only effort required was knowing where to look. Starting with version 3.0, however, TiVo began protecting the password with a one-way hash.

I was able to find the password used in version 3.0 with a brute force search. In general, such a brute force search would be doomed to failure, but TiVo had a tendency to use weak, short passwords. In fact, the original code I hacked together could find the correct backdoor password in less than a minute. (I've since been told that other TiVo hacking groups beat me to it.)
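
To give a flavor of the approach, here is a minimal sketch of a brute force search over short passwords, written in C against OpenSSL's SHA-1. The hash function, alphabet, and target digest are my assumptions for illustration; the real code worked against the digest extracted from the TiVo software.

    /* Sketch only: assumes the password hash is SHA-1 and that keys are
     * drawn from a lowercase-alphanumeric alphabet. Build with -lcrypto. */
    #include <stdio.h>
    #include <string.h>
    #include <openssl/sha.h>

    static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz0123456789";
    static const int nsyms = sizeof(alphabet) - 1;

    /* Hypothetical target digest; the real one came from the TiVo image. */
    static unsigned char target[SHA_DIGEST_LENGTH];

    /* Enumerate every password of length 'len', hashing each candidate. */
    static int search(char *buf, int pos, int len)
    {
        unsigned char digest[SHA_DIGEST_LENGTH];
        int i;

        if (pos == len) {
            SHA1((unsigned char *) buf, len, digest);
            if (memcmp(digest, target, SHA_DIGEST_LENGTH) == 0) {
                printf("found: %.*s\n", len, buf);
                return 1;
            }
            return 0;
        }

        for (i = 0; i < nsyms; i++) {
            buf[pos] = alphabet[i];
            if (search(buf, pos + 1, len))
                return 1;
        }
        return 0;
    }

    int main(void)
    {
        char buf[16];
        int len;

        for (len = 1; len <= 6; len++)   /* weak, short passwords fall fast */
            if (search(buf, 0, len))
                return 0;
        return 1;
    }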

When version 3.2 came out, hopes were high that the password would be found quickly and easily. Unfortunately, this did not happen. For about a week, I tried using my personal machines to search the keyspace, but never found the solution. After discussion with other AVS forum participants, we decided a distributed approach was our best hope.

This article is about the distributed system I built, with a focus on what I did right, what I did wrong, and some observations about the participants and attempts to abuse the system. Please forgive any grammatical or organizational failures in this document. My goal was to record my thoughts before I forgot them, but I'm too busy with MASLab to put this document into more polished condition.

2. Server Architecture

2.1 Server Architecture version 1

The server design was motivated by a desire to have a working system as soon as possible, both because others were waiting for it and because this was only a side project. I expected that we would get a handful of participants from the AVS forum, perhaps totalling 100. Thus, I wasn't very worried about scalability.

I knew there would be "work units" which would be handed out to different users. I decided that I would just use a big mysql database, with a row for each workunit. The easiest way to get data in and out of mysql over the network was to use a bunch of php scripts.

There were two basic scripts: getwork and putwork. Getwork grabbed the unfinished workunit that had been assigned least recently and handed it to the client. Putwork marked the workunit as finished. There were also a number of administrative scripts used to populate the database with workunits.

I made a decision at this point: there would be no 'user' table. Each row in the workunits table would have a string for the user that completed it. To tally the number of workunits done by a given user would require crawling over much of the table. The goal of this was to make getwork and putwork very simple; they would only need to operate on the workunits table, rather than having to update some user.workunitscompleted field. Consequently, there was also no user authentication; the worst one user could do would be to give credit for their workunits to someone else. I logged IP addresses for accountability.
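
The sketch below shows roughly what the version 1 getwork and putwork operations amounted to. The real scripts were PHP; here the same SQL is issued through the MySQL C API, and the column names (id, assigned_at, done, user) are my guesses rather than the originals.

    #include <stdio.h>
    #include <stdlib.h>
    #include <mysql/mysql.h>

    /* Hand out the unfinished workunit that was assigned least recently. */
    static long getwork(MYSQL *db)
    {
        long id = -1;

        mysql_query(db, "SELECT id FROM workunits "
                        "WHERE done = 0 ORDER BY assigned_at ASC LIMIT 1");
        MYSQL_RES *res = mysql_store_result(db);
        MYSQL_ROW row = mysql_fetch_row(res);
        if (row != NULL) {
            char q[256];
            id = atol(row[0]);
            snprintf(q, sizeof(q),
                     "UPDATE workunits SET assigned_at = NOW() WHERE id = %ld",
                     id);
            mysql_query(db, q);
        }
        mysql_free_result(res);
        return id;
    }

    /* Mark a workunit finished, crediting whatever name the client sent.
     * There is no users table; the name is just a string on the row. */
    static void putwork(MYSQL *db, long id, const char *user)
    {
        char q[512];
        snprintf(q, sizeof(q),
                 "UPDATE workunits SET done = 1, user = '%s' WHERE id = %ld",
                 user, id);   /* a real server must escape user input */
        mysql_query(db, q);
    }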

I underestimated how important statistics would be for users. I viewed this as a double-edged sword-- users might get excited about their statistics and hence devote more computing power to the project, but for the same reason, they might be tempted to abuse the system to falsely inflate their ranking. In this first version, there were almost no safeguards against cheating. Ultimately, I provided some statistics capabilities, though users always wanted more!

The client software was written in C. I wanted to distribute the source code, rather than a binary, so that 1) users could inspect my code to assure themselves that it was not a trojan, 2) users could find errors possibly affecting correctness (none were ever found), and 3) I would not have to maintain a large collection of binaries. I expected only a few devoted TiVo enthusiasts to participate, so I was not terribly concerned about malicious modifications to the client code.

Still, I started to think about how to improve the system-- mostly to make it more secure. I also started thinking about finding a better machine for the server software; I didn't think my ISP would be terribly crazy about me hosting the server over my residential cable modem (which was unreliable anyway!). The server didn't use much bandwidth, but it did push the server load up high enough to be an inconvenience, since the machine also handled my email.

Then, we got slashdotted.

Overnight, we reached 2000 machines and the server stayed up... mostly. It took a bit of a beating. But most of all, now that the project was visible to "the general public", and not just the TiVo enthusiasts on the AVS forum, I was much more concerned about abuse.

Yet, I didn't want to change the server software too quickly; the changes I had in mind for version 2 would not allow older clients to run on the newer server. This was a problem since, thanks to Slashdot, there were now 2000 installations of the version 1 code! I expected that the average Slashdot reader might be curious enough to install one version of the client, but would probably say "to hell with it" if there was a mandatory upgrade. I didn't want to lose the computing power! So, version 1 lived on for some time before I made version 2 public, and even then, I left the version 1 server running until the number of clients on the version 1 server had dwindled down.

A lesson for the future: I think I would preemptively ask people NOT to post on Slashdot when a "new and improved" version is just around the corner.

2.2 Server Architecture version 2

The first server was holding up pretty well, so I didn't want to do anything terribly dramatic. My main goal was to improve the abuse resistance of the system. (I also rewrote the client to make it easier to use and more portable.) There were a few basic changes:

Reissuing Workunits

I wanted the new server to support issuing workunits more than once so that results could be double checked.

Nonces

Each workunit was assigned a nonce (random number). The getwork script would hand out this nonce with a block and putwork would check that the client had the correct nonce. This prevented clients from easily submitting results for blocks they had not been issued.

I had originally wanted to hide from the client the row id of the workunit they were working on, since a client could conceivably guess what ids correspond to what workunits by examining the pattern. (Workunits are created in batches, thus similar workunits have similar row ids.) The risk is that a malicious user could "smell out" a regression test or a reissued workunit in order to know whether to do the work or not. Or worse, if TiVo could guess which row id corresponded to the correct solution, they could report the workunit as finished without reporting the key; we'd never discover this!

A malicious client could evade detection longer by properly computing the proof for blocks whose row id is quite old and therefore likely to be a "double check" of a previously completed workunit. Introducing a new random id would solve this problem, but it proved difficult to do a batch-import of workunits while ensuring the uniqueness of their randomized ids. In the end, hiding the row ids did not seem worth the trouble, since exposing them would only slightly lengthen the time it would take to detect a modified client.

In summary, the nonces basically ensure that clients can't freely pick which units they work on/submit results for; they have to be issued a workunit in order to report a result for it.
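
Here is a rough sketch of how the nonce handling might look, again expressed in C around the MySQL C API rather than the original PHP; the nonce column and the exact queries are assumptions.

    #include <stdio.h>
    #include <stdlib.h>
    #include <mysql/mysql.h>

    /* Issue a workunit (new or reissued): attach a fresh random nonce and
     * hand both the id and the nonce to the client. */
    static void issue_workunit(MYSQL *db, long id, long *nonce_out)
    {
        char q[256];
        long nonce = (long) random();

        snprintf(q, sizeof(q),
                 "UPDATE workunits SET nonce = %ld, assigned_at = NOW() "
                 "WHERE id = %ld", nonce, id);
        mysql_query(db, q);
        *nonce_out = nonce;
    }

    /* Accept a result only if the client echoes the nonce it was issued,
     * so it can't submit results for workunits it was never handed. */
    static int accept_result(MYSQL *db, long id, long nonce)
    {
        char q[256];
        snprintf(q, sizeof(q),
                 "SELECT nonce FROM workunits WHERE id = %ld", id);
        mysql_query(db, q);

        MYSQL_RES *res = mysql_store_result(db);
        MYSQL_ROW row = mysql_fetch_row(res);
        int ok = (row != NULL && atol(row[0]) == nonce);
        mysql_free_result(res);
        return ok;
    }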

Proofs

The client computed a "proof" that it had actually completed the workunit, rather than just reporting back "nope, we didn't find the answer" immediately. Each workunit involved computing millions of SHA-1 (160-bit) hashes. The proof was defined to be the XOR of the first 32 bits of ALL the SHA-1 hashes generated in the workunit. This means that each workunit had a different proof (with high probability), but that the proof for each workunit was also deterministic.
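
A minimal sketch of the client-side proof computation, assuming OpenSSL's SHA-1; the candidate iterator (next_candidate) is a placeholder, not the real client code.

    #include <stdint.h>
    #include <string.h>
    #include <openssl/sha.h>

    /* Hypothetical iterator over the candidate keys in one workunit. */
    extern int next_candidate(char *buf, size_t bufsize);

    uint32_t compute_proof(const unsigned char target[SHA_DIGEST_LENGTH],
                           int *found)
    {
        unsigned char digest[SHA_DIGEST_LENGTH];
        uint32_t proof = 0;
        char candidate[32];
        int len;

        *found = 0;
        while ((len = next_candidate(candidate, sizeof(candidate))) > 0) {
            SHA1((unsigned char *) candidate, len, digest);

            /* Fold the first 32 bits of this digest into the running proof. */
            uint32_t first32;
            memcpy(&first32, digest, sizeof(first32));
            proof ^= first32;

            if (memcmp(digest, target, SHA_DIGEST_LENGTH) == 0)
                *found = 1;   /* the backdoor password is in this block */
        }
        return proof;
    }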

The getwork script handed out workunits more than once, meaning that different clients would compute the proof for the same workunit. If the proofs didn't match, one of the clients was up to something! The getwork script was configured to hand out a previously completed workunit about 5% of the time. In other words, about 1 in 20 blocks got double checked. Any user who was trying to inflate their statistics by cheating would have to "complete" a lot more than 20 blocks in order to be successful.

The proof mechanism doesn't protect against someone computing the correct proof but still lying about the answer. If the correct key was in a workunit completed by a client that actually did all the work but always reported 'false' anyway, then the distributed effort would be unlikely to notice. I believe it is unlikely that someone would be malicious in a way that requires as much CPU time as being honest. Of course, if I were to find out that a particular user was cheating (they confessed, for example), I could simply mark the workunits they completed as uncompleted and the server would reissue them.

Client Self Tests

I was worried about clients being modified by well-meaning users who unintentionally broke the client. Even something as innocuous as a compiler optimization flag can cause incorrect code to be generated on some platforms. Thus, the client included a basic self test. The self test eventually grew into a benchmarking option, and users could report their client's speed via a web form.
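
A self test of this sort can be as simple as hashing a known input and comparing against the published test vector; the sketch below does exactly that (the refusal behavior is my own illustration).

    #include <stdio.h>
    #include <string.h>
    #include <openssl/sha.h>

    /* Return nonzero if the client's SHA-1 produces the FIPS 180-1 test
     * vector for "abc"; a client miscompiled by an aggressive optimization
     * flag should fail here instead of returning bad proofs. */
    int self_test(void)
    {
        static const unsigned char expected[SHA_DIGEST_LENGTH] = {
            0xa9, 0x99, 0x3e, 0x36, 0x47, 0x06, 0x81, 0x6a, 0xba, 0x3e,
            0x25, 0x71, 0x78, 0x50, 0xc2, 0x6c, 0x9c, 0xd0, 0xd8, 0x9d
        };
        unsigned char digest[SHA_DIGEST_LENGTH];

        SHA1((const unsigned char *) "abc", 3, digest);
        if (memcmp(digest, expected, SHA_DIGEST_LENGTH) != 0) {
            fprintf(stderr, "self test FAILED; refusing to do work\n");
            return 0;
        }
        return 1;
    }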

Cached Statistics

Generating statistics was a pretty slow process, since data for any particular user was spread throughout the table. Dynamically computing the statistics on-demand was infeasible, so the statistics were periodically computed and cached.
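
The cached statistics amounted to a periodic aggregate scan over the whole table; the sketch below shows the kind of query involved (the column names and cache file are assumptions).

    #include <stdio.h>
    #include <mysql/mysql.h>

    /* Recompute the per-user tallies and dump them to a cache file that the
     * statistics pages read, instead of scanning the table on every hit. */
    static void refresh_stats(MYSQL *db)
    {
        mysql_query(db,
            "SELECT user, COUNT(*) AS completed "
            "FROM workunits WHERE done = 1 "
            "GROUP BY user ORDER BY completed DESC");

        MYSQL_RES *res = mysql_store_result(db);
        MYSQL_ROW row;
        FILE *cache = fopen("stats_cache.txt", "w");  /* hypothetical path */

        while ((row = mysql_fetch_row(res)) != NULL)
            fprintf(cache, "%s %s\n", row[0], row[1]);

        fclose(cache);
        mysql_free_result(res);
    }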

It's also interesting to consider what didn't change. The basic "one really big table" design remained, with one row per workunit. Despite its failings (fairly slow statistics gathering, redundancy of data in the table), it was a simple design that didn't take much effort to make work.

2.3 Version 2 Scalability

The version 2 server handled the load of about 1500 clients (peak) fairly well for quite a while. The server had a table with one row for every workunit. The search space consisting of 9 unknown characters contained a little over 50,000 workunits, and with our tremendous computing power, we completed the 9-character space quite quickly. The 10-character key space, however, contained 1.8 million workunits. All of a sudden, the table had a lot of rows.

The mysql server held up quite well even then, until those rows started filling up with data. About 300,000 workunits later, the server became sluggish. Clients frequently timed out while trying to get or put a workunit. Manually optimizing the table and restarting apache seemed to help somewhat.

The need for manual intervention increased as time went on. Ultimately, when the table had grown to about a million full rows, I gave up; it just couldn't cope any more.

I attribute the slowdown to a few factors:

  1. Unnecessary database overhead. I didn't really need a relational database for this. I could have used fixed-length records accessed by C using structs, for example. But having the database allowed me to generate some very interesting statistics that would have been a lot of work any other way.
  2. Overly complicated getwork/putwork algorithms. Getwork tried to give the client the oldest workunit that had never been completed. Even with table indices, this operation gets slower and slower; I'd assume it's O(log n). There's no need for the client to get the very oldest workunit-- an approximate lookup would have been sufficient (a sketch of one such approach follows this list). Putwork was simpler, but still performed a couple of queries to verify the nonce and proof (if applicable) before updating the database to give credit to the appropriate user.
  3. The statistics generation--even when only computed every few hours--became a significant source of load on the server.
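
One way the approximate lookup mentioned in item 2 might have worked (it was never implemented): jump to a random id and take the first unfinished workunit at or after it, avoiding a full ORDER BY over an enormous table.

    #include <stdio.h>
    #include <stdlib.h>
    #include <mysql/mysql.h>

    /* Any unfinished workunit is good enough; start from a random id and
     * walk forward, which an index on id makes cheap. */
    static long getwork_approx(MYSQL *db, long max_id)
    {
        char q[256];
        long start = random() % max_id;

        snprintf(q, sizeof(q),
                 "SELECT id FROM workunits WHERE done = 0 AND id >= %ld "
                 "ORDER BY id LIMIT 1", start);
        mysql_query(db, q);

        MYSQL_RES *res = mysql_store_result(db);
        MYSQL_ROW row = mysql_fetch_row(res);
        long id = row ? atol(row[0]) : -1;
        mysql_free_result(res);
        return id;
    }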

An interesting statistic: only about 62% of workunits issued were ever completed. Thus, optimizing the workunit-issuing path (as opposed to the completion path) might be a win.

3. Abuse

I'll never know how much abuse occurred in version 1, but I suspect there was little of it, if any. Considering that the client was open source and that there were no anti-abuse measures, it would have been fairly easy to rapidly receive credit for 100,000 workunits. This would be enough to become one of the top 10 users, but nobody mysteriously racked up workunits like this (to my knowledge.)

Version 2 was intended for a wider distribution, and I was a bit more paranoid. There was a small amount of abuse that I was able to detect. (Obviously, I can't report on abuse that I didn't detect, except to say that I don't know of a way for someone to cheat without it being detectable.)

I detected one case of a client reporting a solution on a bogus block. But this was a case of "Hey, the source code is available and I feel like playing a fun gag!", and the faked solution was obvious (and apparently obvious on purpose). No harm done.

There were perhaps a hundred proof errors overall. A bug in a particular version of the windows client could generate incorrect proofs under some circumstances, and since client versions were all logged, the affected workunits were reassigned to other users.

Most cases of incorrect proofs seem to have come from machines that were suffering from hardware problems rather than running a maliciously modified client. Whether the problem was overclocking/overheating or defective RAM, I don't know. I spent some time with one or two of these users, sending them particular binaries--known to work on my machine--to determine whether their problem persisted. This ruled out a compilation problem. It's conceivable that they could have fooled me, but their willingness to spend time debugging the problem is inconsistent with them pulling a prank. The affected users were ultimately "blacklisted"-- the next time their client connected, the server would send a special message telling it to shut down. Blacklisting required manual intervention; it took little effort, but in a larger project it might need to be automated.

Several companies used the client to burn-in new racks of machines and reported that the intense CPU load generated by the client did, in fact, cause some computers to crash. This was a win-win situation: the project benefited from a surge of processing power, and the companies were able to find troublesome components before they were deployed.

About half of the proof errors were never investigated due to time constraints. It is possible that they were caused by improperly modified clients, but I doubt it; the best remaining explanation is that these machines, too, were unintentionally faulty-- an evidently common condition. In any case, the errors did not occur in large enough quantity to meaningfully affect the user statistics.

Thus, the proof mechanism proved essential to ensuring the integrity of the computation, but for reasons entirely different from what I would have guessed. I should have worried more about bad hardware than malicious users. Either the users who would have hacked the client were dissuaded by the presence of the proof mechanism, or malicious users are fewer in number than I would have guessed.

The users were given the opportunity to report the performance of their system while running the client. Anybody could enter data via a web form, again with no user authentication (IP addresses were, however, logged.) Abuse, while relatively harmless in this context, did occur. Of 242 reports, there were at least three obviously bogus entries with highly exaggerated performance results (by an order of magnitude or more) or fictitious CPUs. Another 10-20 entries are suspicious, but are probably the result of typos or improperly run tests (perhaps reporting total performance on an SMP machine when per-processor scores were requested, or running other CPU-intensive processes in the background.) In general, these entries were easy to detect. I thought about using a linear regression model per CPU family to validate performance (rejecting entries which were too far off), but the problem was never bad enough to warrant the effort.

Since the web form did little to validate the input, I expected a great number of entries that were essentially equivalent. For example, it would be ideal if all users with a Pentium 4 processor would type "Pentium 4" and none would type "Pentium IV". This way, sorted results would list all Pentium 4's together and make it easier to navigate the results.

To encourage uniformity, the entry form listed all previously submitted values in a pop-up. Only by selecting "none of the above" could a user type in a different value. This seemed to keep the amount of meaningless variation down fairly well, though some users still felt the need to include unsolicited information, such as the voltage of the processor! Ultimately, I believe this user-interface design was successful.

My conclusion is that most pranksters have a fairly low effort threshold. Submitting bogus data via a web page requiring no registration/authentication is apparently easy enough. Reading and understanding source code well enough to make a simple modification, such as skipping the CPU-intensive work, is evidently more work than most are willing to go through.

4. If I were to do it again... (Conclusion)

The basic structure of the server was fairly adequate-- scalability was the only real problem. I think the abuse-prevention methods were effective and perhaps worth emulating in other projects (the proof mechanism in particular.) We also kept enough data around to generate meaningful and interesting statistics.

There are a couple of alternatives:

  1. Ditch the relational database and use a flat file with fixed-length records in it (see the sketch after this list). This would be fast, but you couldn't use fancy SQL queries to generate statistics. Instead, you would need to "accumulate" data in other data files. For example, when a user puts a workunit, increment that user's record in a "users" file. Knowing which workunits to hand out next could be done by keeping a pool of suitable workunits, querying the big workunits file only when the pool is low.
  2. Ditch the idea of keeping around data for every workunit ever completed. Use a relational database (it's going to be small in size, so why not?) to track only the workunits that are currently issued to a client. When a workunit is completed, remove it from the queue and credit the user. When a new workunit is needed, call some program that knows how to make the next workunit. This would be very fast, but a lot of useful information is lost. If you found out (after the fact) that a particular user's client was buggy, you could not reissue the blocks that the client had submitted since you would have no record of which blocks they had completed. You'd have to keep an append-only log, so that the system could be manually recovered from such disasters. These disasters are probably rare enough that manual intervention is acceptable (at least for a project the size of tivo crack).
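
Here is a sketch of the fixed-length record idea from alternative 1: each workunit is a struct at a predictable offset, so lookup and update are a single seek. The field layout is illustrative only.

    #include <stdio.h>
    #include <stdint.h>

    struct workunit {
        uint64_t start_key;     /* first key in this unit's range        */
        uint32_t nonce;         /* nonce issued with the unit, 0 if none */
        uint32_t proof;         /* proof reported by the client          */
        uint32_t assigned_at;   /* unix time of last assignment          */
        uint8_t  done;
        char     user[32];      /* name string, as in the mysql design   */
    };

    static int read_workunit(FILE *f, long id, struct workunit *wu)
    {
        if (fseek(f, id * (long) sizeof(*wu), SEEK_SET) != 0)
            return -1;
        return fread(wu, sizeof(*wu), 1, f) == 1 ? 0 : -1;
    }

    static int write_workunit(FILE *f, long id, const struct workunit *wu)
    {
        if (fseek(f, id * (long) sizeof(*wu), SEEK_SET) != 0)
            return -1;
        return fwrite(wu, sizeof(*wu), 1, f) == 1 ? 0 : -1;
    }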

Distributing an open-source client did not prove to be a problem; quite the contrary. I received many helpful bug reports and patches, which I was able to integrate into subsequent revisions. Because of the proof mechanism, we could safely share the source. This is vastly preferable to relying on the "impenetrability" of stripped binaries, which only delays a motivated hacker anyway.

The server code was not open-source while it was in use. This was a matter of pragmatic security-- a case where obscurity does work. While I'm not aware of any vulnerabilities in the server software (and I took care to avoid creating any), publishing it would have been an unnecessary risk; a rooting of the server would have been a major inconvenience for me.

There was a cost, of course: if I had made the server's source code available, it is possible that others would have been able to help me improve the performance of the software without a major code overhaul.

Footnote

I wish to thank the TiVo AVS community for helping me with this project. It was a tremendous learning opportunity for me, and it was a pleasure learning from so many of you. While we never found the key, I think we all had fun trying.

Biography

Edwin Olson is a second year Computer Science PhD candidate at MIT. His interests include robotics, artificial intelligence, and violin. He lives in Cambridge, MA and can be reached at eolson@mit.edu.