Showing posts with label postmortem. Show all posts

Thursday, June 21, 2007

Clues Postmortem

Sorry for the lag between my last postmortem post and now--I was taking time to get re-acquainted with *my life*.

Today I'll be talking about my favorite part of the game, the clues. We've talked about it a lot, and while good plotting and good sites are certainly appreciated, what coed astronomy tends to remember most about games and judge them upon is clues. So, when making a game of our own, our focus from the very beginning was on puzzles.

Answer Words

The use of Palms, phones, magic wands, and word-to-location mappings a la BANG has really changed the nature of clues that can be used in games. In particular, there's a wider variety of encodings possible when the answer can be more arbitrary: things like encodings that require all answer letters to begin with a dash in Morse, or encodings where the answer has to have certain substrings.

But, even more importantly for NMS, it allows the clues to be completely divorced from the route. We *really* like writing clues. Scouting sites and planning a route... not so much. In fact, I would be willing to bet we had at least 15 clues written and playtested before we even began to talk about a route. On top of that, if clues came out poorly on the playtest because they were in the middle of the night, as long as they didn't have heavy site or plotting requirements, it wasn't too hard to swap them with other clues before the real game.

I realize that some teams don't like the weird disconnect between solving to an arbitrary word and having to do some sort of lookup to figure out where to go next. I also realize that our game made it even worse with first a phone entry followed by a laptop lookup. But for us, the clues are so much more important that we made an easy decision early on and never looked back.

Clue Types

We tried very hard to have a wide range of clue types represented in NMS. All teams have their likes and dislikes (we like word puzzles and hate data collection), and to create only a few types of clues would be to create a game that lacked broad appeal. We realized that this would be a problem pretty early on, when Jan, Yar, and I compared notes and noticed we were all working on at least one word puzzle and had others in mind.

After that, we made a concerted effort to broaden the types of clues we made. Our big fear was that most of our clues would be paper clues, and teams would complain that they could have done all of our puzzles from the comfort of their own living rooms.

In several cases, we simply took clues that could have been done on paper and tried to translate them to a different format. For me, both Blinkenlights and Bugged (the No Morse Egrets clue) started out on paper and made the transition away from paper at various stages of their development.

As it turned out, we were surprised by the number of non-paper clues we were able to put on our route. And doing the field offices meant we had somewhere to put all of the extra paper clues we wrote. We hoped it wouldn't be too much of a letdown to get *another paper clue* at a field office, given that they were bonus clues meant for teams in the lead.

Playtesting

Playtesting is actually surprisingly hard to do, but it's something we got a lot of practice with while making our game. In particular, the difficulty lies in picking the optimal number of people to show the clue to. If you show it to a large group, you get great feedback, but you limit the number of playtest runs you can do because you quickly run out of people.

Given the number of core GC members and the number of people we could pull into helping us playtest, we found the optimal number of people to playtest a clue at once was often 2-3. Having a single person test is usually pretty rough, especially if the clue requires an "aha! moment". For any clue we were unsure about or thought might need a number of revisions, the creator would usually run it past their roommate/significant other with copious hints and then test it on some subset of GC. This would hopefully save a couple of people for additional feedback or to test a revision on. Obviously some clues, particularly those where it's obvious what to do but it takes some time to do it, are better tested in larger groups.

We also had a sit-down playtest and a full route playtest, as well as roping in a few friends to test a final version of a couple clues. Thus, any clue that really needed it probably could have gotten 6 tests/revisions if need be. Of course, the clue that really needed it, the elements data collection clue, didn't get such treatment. Unfortunately, the clue went through multiple major revisions, and we simply weren't able to get enough testing. Eh, you win some, you lose some, I guess.

The one thing playtesting really didn't get us was accurate times for clues. When we were playtesting on ourselves, we didn't have a full team of 6. And as we found out, even the full route playtest didn't get us accurate times. For example, all of our playtest teams took a good hour longer on the Cellular Automata clue than the average team during the real game.

Route Clues vs. Bonus Clues

Going with the whole bonus site thing means we had a lot of clues to write, which was great. It also meant that we got to select the best clues to be on the route, which probably made the game a better experience for most teams, even if it meant that the top teams had to slog through some tedious puzzles.

Our decision process for whether to put a clue on the route or in the bonus queue was as follows:

  • If it's way cool, it goes on the route
  • If it's non paper or could be made into something non-paper, it goes on the route
  • If it requires the internet (the countries clue), knowledge the average gamer may not have (cryptics), or if it's a long grind (wordsearch), it goes into bonus land
  • For everything else, high spread clues go in the bonus queue, and low spread clues go on route
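
Read as a first-match-wins rule list, the decision process above can be sketched in a few lines of Python. This is only an illustration: the field names, the idea of measuring "spread" in minutes, and the 45-minute cutoff are all made up here, and applying the rules strictly in the order listed is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Clue:
    name: str
    way_cool: bool = False
    non_paper: bool = False  # already non-paper, or could be made non-paper
    needs_internet: bool = False
    needs_special_knowledge: bool = False
    long_grind: bool = False
    spread_minutes: float = 0.0  # hypothetical measure of solve-time spread


# Hypothetical cutoff; the post does not say what counted as "high spread".
HIGH_SPREAD_MINUTES = 45.0


def place_clue(clue: Clue) -> str:
    """Return "route" or "bonus", applying the four rules in listed order."""
    if clue.way_cool:
        return "route"
    if clue.non_paper:
        return "route"
    if clue.needs_internet or clue.needs_special_knowledge or clue.long_grind:
        return "bonus"
    # Everything else: high spread goes to the bonus queue, low spread to the route.
    return "bonus" if clue.spread_minutes >= HIGH_SPREAD_MINUTES else "route"
```

For instance, `place_clue(Clue("countries", needs_internet=True))` lands in the bonus queue, while a way cool clue stays on the route even if it is also a long grind, because the first rule wins.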

Wrap-up

So that's more or less what we did in the way of clues for our game. This post has seriously been at least 2 weeks in the making, and while I was really excited to finally be able to talk about all of the stuff we did for our game, the effect has kinda worn off by now. I loved running the game, but it was a lot of work, and it's nice to not have to come home with it hanging over my head. So, I think this may have inadvertently become the last post in my NMS postmortem, at least until I can look back on it with more nostalgia and less *oh my god I'm so glad it's finally over*.

Thursday, May 24, 2007

Server Postmortem (plus graphs)

Upon solving a clue, teams would call in the answer to an automated server, which would tell them where to go next, assuming they were correct. During the course of the game, the server received almost 2000 calls from 66 different phones. Here are some graphs of interesting data:


The phone load graph is particularly interesting, as you can see when the phone server blew up on the first clue, as well as when teams started to hit the Virus clue or when they left a bonus site.
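
For anyone who wants to build a similar phone load graph from their own call log, here is a rough sketch of the counting step: bucket the call timestamps into fixed-width windows and count the calls in each. The function name and the 10-minute default are made up for illustration, and it assumes the bucket width divides evenly into 60 minutes. It is not the script that produced the graphs here.

```python
from collections import Counter
from datetime import datetime


def calls_per_bucket(timestamps, bucket_minutes=10):
    """Count calls per fixed-width time bucket, keyed by the bucket's start time.

    Assumes bucket_minutes divides 60 evenly (e.g. 5, 10, 15, 30).
    """
    counts = Counter()
    for ts in timestamps:
        # Round the timestamp down to the start of its bucket.
        start_minute = ts.minute - ts.minute % bucket_minutes
        bucket = ts.replace(minute=start_minute, second=0, microsecond=0)
        counts[bucket] += 1
    return sorted(counts.items())


# Made-up timestamps, purely for illustration.
calls = [
    datetime(2007, 1, 1, 10, 1),
    datetime(2007, 1, 1, 10, 9, 30),
    datetime(2007, 1, 1, 10, 12),
]
load = calls_per_bucket(calls)
```

Each `(bucket_start, count)` pair can then be written out as one line of a data file for gnuplot or any other plotting tool.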

Note: I am a complete gnuplot n00b, and so if you have suggestions as to how to make the graphs look better or suggestions on other data that might be interesting to graph, drop me an email at offpath@gmail.com.

The System

Well in advance of the game, we purchased VoIP service from VoicePulse Connect. They have a deal that's really nice for planning a game, in that the first 4 channels are only $11/month. This allowed us to work on and test the server for a period of 4-5 months for very little money at all. Then, a week in advance of the game, we upgraded to 8 channels, just in case we got a lot of teams on the Virus clue at the same time.

On our end, I ran a Linux box with Ubuntu, Asterisk, Apache and MySQL. When someone called in, a Python script triggered by Asterisk looked up their phone number in the MySQL database, associating it with their team, as well as the clues they were currently on. All guesses and advancements were logged through the database.
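
To give a feel for the lookup-and-log step, here is a minimal sketch. It is not the actual server code: it uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs on its own, leaves out the Asterisk side entirely, and tracks a single current clue per team. The schema, the function names, and the `answers` mapping are all hypothetical.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE teams   (id INTEGER PRIMARY KEY, name TEXT, current_clue TEXT);
        CREATE TABLE phones  (number TEXT PRIMARY KEY, team_id INTEGER REFERENCES teams(id));
        CREATE TABLE guesses (team_id INTEGER, clue TEXT, guess TEXT, correct INTEGER,
                              logged_at TEXT DEFAULT CURRENT_TIMESTAMP);
    """)
    return db


def handle_call(db: sqlite3.Connection, caller_number: str, guess: str, answers: dict) -> str:
    """Map the caller's number to a team, check the guess, and log it.

    `answers` maps clue name -> answer word. Returns "unknown" for an
    unregistered phone, otherwise "correct" or "wrong".
    """
    row = db.execute(
        "SELECT t.id, t.current_clue FROM phones p "
        "JOIN teams t ON t.id = p.team_id WHERE p.number = ?",
        (caller_number,),
    ).fetchone()
    if row is None:
        return "unknown"
    team_id, clue = row
    correct = guess.strip().upper() == answers[clue].upper()
    # Every guess is logged, right or wrong.
    db.execute(
        "INSERT INTO guesses (team_id, clue, guess, correct) VALUES (?, ?, ?, ?)",
        (team_id, clue, guess, int(correct)),
    )
    db.commit()
    return "correct" if correct else "wrong"
```

Keying everything off the caller's phone number is what lets the team skip identifying itself on each call; the cost is that every phone a team might use has to be registered ahead of time.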

Upon solving a clue, the server would look up the next site on the route that was neither closed nor marked compromised and move the team there. At two points in our game, we had branches where teams followed different routes to reduce the load on certain sites. The server was able to dynamically route teams based upon how many teams it had sent down which route.
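
The two routing ideas in that paragraph can be sketched roughly as follows. The names here are hypothetical, and "send the next team down whichever branch has received the fewest teams so far" is just one plausible balancing rule, assumed for illustration; the real server may have balanced differently.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    closed: bool = False
    compromised: bool = False


def next_open_site(route, current_index):
    """Return the index of the next usable site after current_index, or None."""
    for i in range(current_index + 1, len(route)):
        site = route[i]
        if not site.closed and not site.compromised:
            return i
    return None


def pick_branch(sent_counts):
    """Pick the branch that has received the fewest teams so far and record it.

    `sent_counts` maps branch name -> number of teams already sent. Ties go
    to whichever branch was added to the dict first.
    """
    branch = min(sent_counts, key=sent_counts.get)
    sent_counts[branch] += 1
    return branch
```

With this shape, marking a site closed or compromised is just flipping a flag; teams that solve a clue afterwards are routed past it without anyone touching the route itself.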

Since we were able to log everything in a central place, unlike a palm based system, GC could tell where all teams were headed at any given time. We were also able to change the route at a moment's notice if necessary. This allowed us to have backup sites in case of rain and to mark sites as compromised if something went wrong. Corey (of The Burninators) had told me how useful this would be, and I don't think it was until we were actually running the game that I realized it.

On the GC end of things, we had a set of mod_python PSP scripts running on my Apache server. This let us look up the location of teams and add notes every time they called. We also had a giant leaderboard, which, for a giant table of very slowly changing numbers, was amazingly interesting to watch.

What Worked

  • As I said before, the server answered almost 2000 calls. That's 2000 calls GC did not have to answer manually, which gave GC a surprising amount of down time.
  • Both Twisters Gym and the Bank Heist were restricted in terms of the number of people we could have on the clue at a given time. In this case, we split teams across 3 and 2 sites respectively. Handling this sort of routing would have been very difficult manually, but it happened seamlessly through the server.
  • At several points, we had to have backup sites in case it rained or in case we couldn't use a building at Stanford. We actually had to use the backup site, and we had to change site closing times in a few other cases. Each of these actions was pretty easy to do over a web interface.
  • As with most recent games, having an automated system allowed us to use arbitrary words as answers, making it easier to use various encodings, and making clue writing mostly independent of route.
  • Because the leaderboard was a website, it was accessible over the internet and all GC members out in the field with an internet enabled cellphone could see where all of the teams were and how long they'd been there.

What Didn't

  • The server on the first clue. The last 5 teams to leave Plaza Del Sol had to be manually routed because a horde of rabid squirrels attacked my server. I've pored over the logs generated by Asterisk, my Python scripts, and MySQL, and for the life of me, I can't figure out what happened. Somehow, a runaway MySQL process began eating 100% CPU, and for lack of a quicker fix, I had to restart the whole server. After that, it worked fine--go figure. Then I had the fun task of cleaning the bad data that got entered and manually fixing things to route the teams on the server where we had told them to go over the phone.
  • The server basically tied me to my apartment. We had generally planned to have me around GC for most of the game, but after that first snafu, it became clear that I really couldn't leave. Despite being a team of too many coders, I was the only one familiar enough with my code to fix it if it broke. If we had it to do over again, I'd have more actively tried to distribute the server knowledge.
  • There's nothing like 20 teams running through the game to test the code. Obviously, I should have written more tests, but I write code all day, and as much as I love coding, I don't always get home itching to do more of it. Besides the big crash, the server also incorrectly skipped Here Be Dragons over charades. Fortunately, the leaderboard acts as a manual double-check. I actually got calls from 2 GC members out in the field before I was able to fix this. All I have to say for myself is that 3-value logic is a scourge that should be cleansed from the land.
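
For readers who haven't been bitten by 3-value logic yet, here is a small illustration of the general hazard in SQL, where a comparison against NULL is neither true nor false. It is not a reconstruction of the actual bug, whose details aren't recorded here; the table and column are hypothetical, and `sqlite3` stands in for MySQL so the snippet runs on its own.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sites (name TEXT, compromised INTEGER)")  # hypothetical schema
db.executemany(
    "INSERT INTO sites VALUES (?, ?)",
    [("site_a", 0), ("site_b", 1), ("site_c", None)],  # None is stored as NULL
)


def names(query):
    """Run a query and return the first column of each row as a list."""
    return [row[0] for row in db.execute(query)]


# NULL <> 1 evaluates to NULL rather than true, so site_c silently drops out.
naive = names("SELECT name FROM sites WHERE compromised <> 1 ORDER BY name")

# Treating "never set" as "not compromised" has to be spelled out.
explicit = names(
    "SELECT name FROM sites "
    "WHERE compromised IS NULL OR compromised <> 1 ORDER BY name"
)
```

Here `naive` contains only `site_a`, while `explicit` contains both `site_a` and `site_c`. A flag column that was never filled in is enough to make a row vanish from a query that looks correct.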

Wrapup

We really liked the server. It was a lot of work beforehand to put it together, but it really paid off on the day of by allowing us to do creative re-routing on the spot and by taking some phone load off of GC. It had its bugs, but none of them were fatal. I'd highly recommend a centralized server system to other new teams who have a coder or two on their team. It takes a good amount of uncertainty and guesswork out of the route.

No More Secrets Postmortem

Links to postmortem writeups:

  1. Server

UPDATE: Jan has also written a postmortem of our game on her blog. It can be found here.

As you may know, coed astronomy ran a Sneakers-themed game this past weekend, titled No More Secrets. Some things went right, some things went wrong, and I'm going to slowly write up a postmortem for posterity and so that I can finally tell friends what I've been doing with the past year of my life instead of hanging out with them.

This is the first major game that coed astronomy has planned (as a team we have done 2 smaller games, and as individuals we have planned or run various other things in the past), and so before I begin to forget the details, I'd like to write out my thoughts on the process we went through. Hopefully what I write here will be useful to any other new teams who are considering running a game. I make no warranties that what worked for us will work for you or even that my opinions are shared by the rest of coed astronomy.

Over the next several posts, I'll take several different aspects of our game and try to go into detail about them and where they worked and where they failed. Since I have a somewhat personal interest in it, and since I can't presume to know how other people felt, I'll try not to comment on how I thought the game went for the players and instead stick to how it went for GC. So without further ado...