Now this is a story all about how my life got flip-turned upside down.
I’d like to take a minute — just sit right there,
I’ll tell you how chatrooms are the Prince of Bel-Air.
When I began researching, thinking always,
in chatrooms is where I spent most of my days.
I asked one little question and a sysadmin got scared…
Okay, well, he didn’t really get scared.
He got historical, and told me that we began using chatrooms at FreshBooks after a hackoff he did years ago. Receiving a direct answer like that encapsulates the magic of a simple chatroom: I asked a broad group of people about something and quickly received an answer from whoever had the time, knowledge, and interest to speak up. Chat systems are like water to fish: rarely examined, but integral to our experience — and if you’re not using named chatrooms, you’re missing out on something great.
Questions and Answers
That sysadmin was in #geek, an informal room we have for tech-related discussion which averages 20-25 users in our hundredish-person company. I asked, “hey everybody, who wants to regale me with tales of when and how we started using our chat system?” Around a minute later, our senior-most sysadmin said, “haha I was just talking about that — it was a hackoff of mine!” The discussion that followed gave me a lot of information, and many people chimed in when they had something relevant to add. No one was bothered by direct poking, we didn’t generate email clutter, and my problem was swiftly solved. Magic.
A different example comes from #dev, a more work-related discussion room which averages about 30 people. Someone asked why he got an error when piping a git command into grep. People came out of the woodwork and shortly figured out that he wanted “[color] ui = true” in his .gitconfig, instead of “[color] ui = always”.
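For reference, the relevant knob lives in the [color] section of .gitconfig; a minimal sketch of the two settings (comments are mine, not from the original discussion):

```ini
[color]
    # "true" (same as "auto") colours output only when it goes to a
    # terminal, so piping git output into grep sees plain text
    ui = true
    # "always" embeds ANSI escape codes even into pipes, which is why
    # the grep match was failing:
    ; ui = always
```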
We got robots
While that was going on, something else amazing was happening in the exact same place. A bot (named Hobbes) who watches the status of our builds reported a failure, and listed the commits and their authors since the last successful build. Everyone reading the discussion could see the failure, but no one needed to get an email to learn that the build broke. The specific people mentioned were also alerted by their clients that their name came up — which would likely grab their attention if they were looking elsewhere.
A bot named Bert sits in #release and acts as a queue-manager for hotfix patches. He’s also a cross-channel announcer: with a simple command, he can relay a message to every room he’s in. Another bot named Ernie does dozens of things. For example, he keeps track of streetcar times at the nearest stop, and reports on weather. He’s also a source of humour: there are commands to return custom image macros like ‘KHANNNNN!!!’, and any mention of ‘bees’ will link to this gem. Because Ernie is a little spammy, he only gets logged into chats like #geek and #fluffychat, where the topic of discussion is more free to wander. He is a modified version of GitHub’s popular Hubot, which we’ve found easy to use and extend.
We release a new version of our app every week. The day before a release, we hold a pre-release meeting where we make sure everyone involved is on the same page. The meeting involves a lot of people, follows a set routine, and many people only speak once in it, so it doesn’t feel worth gathering us all into a room. It’s also worth keeping a record of the output of the meeting, which is specific considerations and preparation instructions for the upcoming release. So we hold it in the #release chat room. That way, it’s instantly logged and bothers the minimum number of people for the shortest period of time.
During an actual release, the #release channel is watched closely by operations, QA, and every dev with code going into the app. Each step of the release is reported in the chat, and there’s no speaking in there aside from pertinent information. It’s an awesomely high signal to noise coordination tool.
Not-work things are fine too
There’s also a lot of value to chats beyond what’s directly work-related. For instance, jokes told in chat often elicit laughter throughout the office. Hearing a whole team start laughing when none of them said anything in real life is weird and so awesome. There are also off-topic rooms like #fluffychat which feature pictures of the internet’s cutest animals scrolling to the end of the universe. Our chatrooms keep conversation flowing throughout the office, and as a result the barriers to really valuable communication remain low.
So, you should use chatrooms
Our chatroom usage started as a day-long project by a single employee. People quickly saw its value, and it became a permanent, company-wide aspect of our lives. It’s simple, but it’s one of the things I like most about working here — it’s definitely worth a shot at your workplace.
It’s pretty easy to get this kind of thing going. In the simplest case, just log into an existing IRC server, set up some rooms and bots, and you’re done. You could also run your own IRC or Jabber server, or sign up for a packaged alternative. We run ejabberd (an erlang jabber daemon) in-house, which gives us an automatic roster of each account and its status, and also keeps all of our internal chat under our own roof. If it’s important to you not to send your data out to a third party (like it is for us), you’ll probably want to run your own chat server.
I polled all my friends (about seven or eight),
and I learned from them (sadly) that they’re missing out here
you heard about our chatrooms, now we’re finally square,
they sit on the throne as the Prince of Bel-Air.
I’ve only been on the FreshBooks dev team for a few months, so the stories I hear about the bad old days when our releases were like picking a way through a minefield are only legend to me. But I gather we’ve come a long way, and I suspect that this was in large part due to our Release Retrospective meetings. It is in these meetings that our blunderings are turned into lessons so that we learn from our mistakes instead of making the same ones ad infinitum.
Curiosity. That is what I felt while I attended my first retro. It struck me as a strange ritual. In a small room, wherever there was room to stand, devs, QA, Product, Support and Ops crowded around a table strewn with Sharpies and Post-it notes. These sticky pads, some green, some red, were passed around. People began writing on them using squeaky markers, peeling the notes off and handing them to a person at the front. This person stuck them on the whiteboard—red notes on the left, green notes on the right. When the sound of scribbling and unsticking of stickies had tapered off, the person at the front went through each note on the board: peeling it off, reading it aloud, discussing it briefly with those present, and finally handing it to one or more people.
This is what was really happening during that meeting, and what happens during all our retros: the afternoon following our weekly release, those involved gather together. If someone thinks something went well during the release, they write it on a green sticky. If something could have gone better, it goes on a red sticky. Each note is read for all to hear. A green note is handed to the people responsible for the thing that went well. If it’s a red note, everyone is brought up to speed on what went wrong, and ideas on what could have been done differently are discussed. The sticky is then handed to whoever claims the responsibility of seeing that some or all of those ideas are put into practice.
That first retro probably didn’t last more than 20 minutes, but when I came out, I was sold. In exchange for 20 easy minutes of our week, the retro meeting gave us a host of benefits. Here are a few:
- Everyone learns from the actions of others. For example, one time a dev on my team discovered he would be unable to make it to the upcoming release. He backed out his changes that were scheduled to go out, just in case. This small act saved us a lot of headache that release when related changes caught fire. At the following retro, that dev’s name was on a green sticky, and we all heard why. That day I learned in a way that words can’t teach that sometimes it’s worth making the extra effort, just in case.
- Even the best devs are exposed as merely human. We do these retros weekly and mistakes are easy to make, so chances are each person in the room has been the cause of a red sticky at some point in the past. Being exposed to that knowledge routinely teaches us to admit to our failures more readily and with more grace—we know we’re in good company. This humility also makes us more willing to change when our current habits prove problematic, a quality that is so important when trying to get somewhere as a team.
- Achievements are honoured publicly. One of the devs here recently refactored a particularly nasty piece of code and put it under test. When his work was called out publicly at a retro, I felt so proud of him as I and those around me applauded him! I also suspect that knowing his work will not go unnoticed or unappreciated will mean he is more willing to take on similar beasts in the future.
- Cross-department communication happens! Let’s face it, fellow developers. Interactions with other departments can sometimes be a bit… well, sticky. After all, each department has its own goals and methods. As far as releases go, however, all departments are on the same page: we all want smooth, uneventful releases. That’s why 5 departments are represented at retro: it’s a time for different teams to talk to each other about how each department can adjust their processes in order to help make the next release go better.
Allow me to go meta on you and do a mini retro of the 25 or so retros I’ve been to. The process isn’t perfect: there are some red stickies that seem to keep coming back, week after week. But some red stickies that were common when I started have been replaced by green stickies: the routine “People late for release” has turned into “Everyone on time—again!”. “3 hotfixes this week” has turned into “5th week in a row no hotfixes”. As I become less of an onlooker and more of a participant in these retros, I feel more and more proud of the upward direction we’re taking. And that is something every team can benefit from!
Our first retro leader was Corey. Then it was Diller, whom we fondly referred to as Sexy Corey. Then Ashok came along, known as Sexy^2 Corey. Today he was sick, and Stinson, his substitute, had to bear the brunt of our disappointment with this red sticky.
People have fun during retros. This is The Mark Story drawing a goat on a green sticky.
I surround my monitor with the feel-good stickies I receive. The ones with an arrow in a circle are referring to the t-shirt I bought just for Retro from a shop in Montreal.
We deleted millions of customer database tables and nobody noticed. And that’s a good thing. It just took some thinking (a bit of reading), and 10 months of slow-but-steady work.
FreshBooks is a classic multi-tenant application. Customers sign up with us and live in their own little island of data (received invoices and contractors are treated like telephone calls to the next island). When the first version of FreshBooks shipped (and for many years after), every new account creation resulted in a new set of database tables being created. This worked well in the early days. Every time a web page was served by PHP, it would define the names of all the tables that that account needed to access, so you never had to use a WHERE clause to restrict data access to a single account. Early on in an HTTP request we converted an account name (e.g. yourcompany.freshbooks.com) into a numeric ID (e.g. 123), and tables were named using that ID (e.g. org123_allowed_gateway).
But times change. FreshBooks grew, and with it our data. By 2010 we were living with MILLIONS of database tables (yes, literally on the order of 1E+6). Databases are designed to deal with a few thousand tables, each with millions of rows, not a few million tables with a handful of rows each! Daily database server backups started to take 18 hours. MySQL stores MyISAM tables as 3 or more files on the filesystem, and 3 times MILLIONS of files takes a very long time for the operating system to open and copy. Per-table metadata overhead was often larger than the data described. Data warehousing was also becoming horribly slow, and we were restricted to analyzing a fraction of our data to keep the impact to a minimum.
We also ran into MySQL design decisions that performed poorly on our schema. MySQL has a table cache that contains recently used file handles. The documentation suggests that the cache should be larger than the number of tables plus the number of concurrent accesses to those tables (each concurrent reader requires its own handle). With millions of tables, though, that’s just not possible: the kernel wouldn’t let us hold that many files open, even if we had enough RAM. When we got bigger database servers a few years ago, increasing the size of the table cache from 100,000 to 200,000 file handles actually decreased performance, because an exclusive, global lock is held during cache eviction, which is implemented as a naïve LRU. Doubling the size of the cache approximately doubled the time it took to evict the oldest cache entry! That in turn halved the rate at which we could open not-recently-used tables (making your first login of the day slower) and seriously hurt “background” jobs, drastically affecting interactive performance. At the time, we ended up turning the cache size down to 1000 to get reasonable interactive performance across all our workloads.
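To see why eviction cost scales with cache size, here’s a toy sketch (hypothetical Python, not MySQL’s actual implementation) of an LRU cache whose eviction does a full scan for the oldest entry:

```python
# Toy model of a naive LRU cache: every eviction scans all entries for
# the least-recently-used one, so eviction cost grows linearly with
# capacity. Doubling the cache size doubles the work per eviction once
# the cache is full.

class NaiveLRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # key -> tick of last use
        self.tick = 0

    def touch(self, key):
        """Record a use of `key`, evicting the oldest entry if full."""
        self.tick += 1
        if key not in self.entries and len(self.entries) >= self.capacity:
            # The O(n) scan -- the "naive" part of a naive LRU
            oldest = min(self.entries, key=self.entries.get)
            del self.entries[oldest]
        self.entries[key] = self.tick
```

Under a full cache, every miss pays a scan proportional to the cache size — and if that scan happens under a global lock, a bigger cache makes every miss slower, which matches what we saw.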
The “WTF”s from new hires didn’t help either. We had to take action.
The Schema Transformation
Take a simple association table that described which online payment gateways were allowed for a given invoice for FreshBooks account number 123:
CREATE TABLE `org123_allowed_gateway` (
`allowed_id` int(11) NOT NULL auto_increment COMMENT 'A synthetic PK',
`gateway_id` int(11) default NULL COMMENT 'FK to gateways',
`invoice_id` int(11) default NULL COMMENT 'FK to invoices',
PRIMARY KEY (`allowed_id`)
-- And maybe some indexes, but probably not; there usually isn't enough
-- data to make them worthwhile
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Because there’s one per account, we called these “shattered tables” (as compared to the more traditional “sharded tables”).
In order to use the database server the way it was intended, we wanted to convert it into a table that looked more like:
CREATE TABLE `invoice_allowed_gateway` (
`account_id` int(11) default NULL COMMENT 'FK to accounts',
`allowed_id` int(11) NOT NULL auto_increment COMMENT 'A synthetic PK',
`gateway_id` int(11) default NULL COMMENT 'FK to gateways',
`invoice_id` int(11) default NULL COMMENT 'FK to invoices',
PRIMARY KEY (`account_id`, `allowed_id`),
KEY `allowed_id_idx_for_autoincrement` (`allowed_id`),
KEY `account_invoice_idx` (`account_id`, `invoice_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Primary key ordering
account_id was the first part of the PK, because (almost) all queries ask for data with a single account_id value. This index ordering improves the specificity of a query including an account_id restriction (most accounts had, say, a few hundred invoices; but we had thousands of accounts with invoice_id = 1!). Because we were coalescing data from many accounts into one table, there were duplicate allowed_id values. As an additional transformation, we could have renumbered allowed_ids, but many of our tables’ PKs were FKs, so it got messy quickly. We decided that it was not a good idea to do two transformations at once because of this added complexity. Consistency is a prerequisite for pretty.
MySQL and auto_increment with composite primary keys
The index on allowed_id told MyISAM that we didn’t want to re-use auto_increment values if rows were deleted. Without `allowed_id_idx_for_autoincrement`, MyISAM would use MAX(allowed_id)+1 WHERE account_id = X, which would re-issue values for allowed_id if we ever deleted the row holding MAX(allowed_id) for an account. Again, if you have any FK references (particularly in something like an activity log) this can wreak havoc. Remember that MyISAM doesn’t actually have foreign keys, so cascading deletes aren’t provided for you. It might be better not to delete rows, but non-repeating auto_increment values are nice in their own right, too.
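To illustrate MyISAM’s documented per-group behaviour without that extra index (a sketch using the tables above; not something we wanted in production):

```sql
-- With PRIMARY KEY (account_id, allowed_id) and *no* separate index
-- leading with allowed_id, MyISAM numbers allowed_id per account:
INSERT INTO invoice_allowed_gateway (account_id, gateway_id, invoice_id)
VALUES (7, 1, 100), (7, 1, 101), (8, 1, 200);
-- account 7 gets allowed_id 1 and 2; account 8 starts over at 1.
-- Delete account 7's row with allowed_id = 2, insert again, and
-- allowed_id = 2 is re-issued -- exactly what the extra index prevents.
```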
SQLAlchemy and autoincrement with composite primary keys
When using SQLAlchemy, we had to set autoincrement=False on the account_id column definition. If it’s not there, SQLAlchemy will assume that account_id is the auto_increment column, leading to strange, non-obvious errors.
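A sketch of that column definition in SQLAlchemy Core (column names from the schema above; the rest of the app’s metadata wiring is assumed):

```python
from sqlalchemy import Column, Integer, MetaData, Table

metadata = MetaData()

invoice_allowed_gateway = Table(
    "invoice_allowed_gateway",
    metadata,
    # Without autoincrement=False, SQLAlchemy treats the first integer
    # primary-key column as the auto_increment column -- here that
    # would wrongly be account_id.
    Column("account_id", Integer, primary_key=True, autoincrement=False),
    Column("allowed_id", Integer, primary_key=True, autoincrement=True),
    Column("gateway_id", Integer),
    Column("invoice_id", Integer),
)
```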
The Hard Way
In early 2010 we bit the bullet and started fixing our schema the hard way. We migrated a few tables this way:
- Rewrote all the SQL queries to look at the new table, being very careful to add an account_id = X restriction to the WHERE clause of the query. This was tricky, particularly when there were joins and/or sub-queries involved.
- Primed the database server’s disk cache. Most table data wasn’t in memory at any given time, and IO was the slowest part of this transformation. The first time running a data migration took about 3h for one set of shattered tables. After clearing out the shard table, a second run took only 15 minutes!
- Then we shut down the website while the migration executed! It sucked for our users, at 15 minutes a pop. At this rate we’d have had bi-weekly downtime for a couple of years before finishing.
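To make the first step concrete, here’s a hypothetical rewrite (invoice 42 is made up; the tables are the ones from above):

```sql
-- Before: the table name itself scoped the query to account 123
SELECT gateway_id FROM org123_allowed_gateway WHERE invoice_id = 42;

-- After: the shard table needs an explicit account_id restriction,
-- or the query silently reads every account's rows
SELECT gateway_id FROM invoice_allowed_gateway
WHERE account_id = 123 AND invoice_id = 42;
```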
The Easy Way
If you’ve read our previous posts on using triggers to effect seamless data migrations, you know where this is headed. If you haven’t read them yet, now is a great time.
After fixing the code, database triggers allowed us to create and maintain an up-to-date copy of our data in the desired target form. For each shattered table we:
- Created the shard table.
- Created triggers on all the shattered tables, so that inserts, updates, and deletes propagated to the shard table. The triggers had to be written so that they couldn’t fail (i.e. used REPLACE INTO), because the target data wasn’t guaranteed to exist there yet.
- Ran a backfill, basically:
REPLACE INTO invoice_allowed_gateway (account_id, allowed_id, gateway_id, invoice_id)
SELECT 123, allowed_id, gateway_id, invoice_id
FROM org123_allowed_gateway;
Note that because the source table did not have an account_id column, we had to provide the literal 123 in the query.
- Every time a new account was created with the old shattered tables, an AMQP consumer would add the triggers and run the backfill for the new account. Adding code to do that in the same place as account creation would mean giving the MySQL SUPER privilege to code that doesn’t otherwise need it; it’s an unnecessary security risk. Our shard table is now eventually consistent!
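The whole trigger-and-backfill dance can be re-enacted in miniature, with sqlite3 standing in for MySQL (table names from the post; sqlite’s trigger syntax differs slightly from MySQL’s):

```python
import sqlite3

db = sqlite3.connect(":memory:")

# The old per-account ("shattered") table, with a pre-existing row.
db.executescript("""
CREATE TABLE org123_allowed_gateway (
    allowed_id INTEGER PRIMARY KEY,
    gateway_id INTEGER,
    invoice_id INTEGER
);
INSERT INTO org123_allowed_gateway VALUES (1, 10, 100);
""")

# The new shard table, plus a trigger so fresh writes propagate.
# REPLACE INTO can't fail even if the target row doesn't exist yet.
db.executescript("""
CREATE TABLE invoice_allowed_gateway (
    account_id INTEGER,
    allowed_id INTEGER,
    gateway_id INTEGER,
    invoice_id INTEGER,
    PRIMARY KEY (account_id, allowed_id)
);
CREATE TRIGGER org123_ins AFTER INSERT ON org123_allowed_gateway
BEGIN
    REPLACE INTO invoice_allowed_gateway
    VALUES (123, NEW.allowed_id, NEW.gateway_id, NEW.invoice_id);
END;
""")

# Backfill rows that predate the trigger; the literal 123 supplies the
# account_id the source table never had.
db.execute("""
REPLACE INTO invoice_allowed_gateway
SELECT 123, allowed_id, gateway_id, invoice_id
FROM org123_allowed_gateway
""")

# A write arriving after the trigger exists propagates automatically.
db.execute("INSERT INTO org123_allowed_gateway VALUES (2, 11, 101)")

rows = db.execute(
    "SELECT account_id, allowed_id FROM invoice_allowed_gateway"
    " ORDER BY allowed_id"
).fetchall()
# rows == [(123, 1), (123, 2)]
```

Once every read and write goes through the shard table, the shattered source can be dropped at leisure, exactly as described above.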
Once the new code referencing the shard table had been deployed to production, we dropped the source “shattered” tables at a leisurely pace. (You don’t even need SUPER privilege to drop tables with triggers on them!)
It seems like old hat to us now, but this particular schema transformation was the first one we tried using triggers. Instead of taking years, it allowed us to completely transform our database in 10 months with no downtime, while at the same time delivering new features and bug fixes to customers.
But, the best part of our final approach is that customers didn’t notice a thing because we did our data transformation without requiring downtime. We were confident in doing it this way because we were able to verify the data transformation before committing to it. On one occasion we did notice an issue with the transformation and fixed it before the problem could affect customers.
Operationally, finally having a “normal” database schema led to many happy outcomes. Memory usage for our Python backends dropped about 25% (on the 12th; I don’t honestly remember what we did on the 16th to improve things further).
CPU usage and response time from our Python backends also dropped by about 30%. Given other changes that we’ve made over the years, I suspect that most of this is due to the smaller memory footprint and Python’s circular garbage collector. The more Python objects you have in memory, the harder the GC has to work to find cycles and orphaned weak-references. Python’s GC is an entire topic unto itself, which deserves its own future blog post.
Disk usage on the database servers dropped dramatically, particularly for tables that had very little in them per-account (which is most of them). Consider the on-disk size of a table containing a single integer:
mysql> CREATE TABLE foo (an INT PRIMARY KEY);
mysql> INSERT INTO foo VALUES (1);
$ ls -l foo* # in bytes
-rw-rw---- 1 mysql mysql 8556 Jan 31 12:42 foo.frm
-rw-rw---- 1 mysql mysql 7 Jan 31 12:43 foo.MYD
-rw-rw---- 1 mysql mysql 2048 Jan 31 12:43 foo.MYI
$ tune2fs -l /dev/mapper/VolGroup00-root | grep Block\ size
Block size: 4096
$ ls -ls --block-size=4096 foo* # In minimally-addressable filesystem blocks
3 -rw-rw---- 1 mysql mysql 3 Jan 31 12:42 foo.frm
1 -rw-rw---- 1 mysql mysql 1 Jan 31 12:43 foo.MYD
1 -rw-rw---- 1 mysql mysql 1 Jan 31 12:43 foo.MYI
That’s 20kB to store 32 bits! This overhead adds up quickly both in terms of mostly-empty disk blocks and used inodes. Getting more actual user data per gigabyte of RAM is always a win.
When I started at FreshBooks, we didn’t think we could actually fix this. It’s been two years to the day since we deployed the last “de-org” code, and a large part of our current dev team doesn’t even know what an “org table” is. And that’s a good thing.
In January 2010 we were pleased to announce the ability to bill in multiple currencies from FreshBooks. In 2012 we announced Cloud Accounting. In 2013, we’re changing the game once more. Today, we’re pleased to announce the arrival of FreshBucks! The true Cloud Currency for all!
If you’ve ever had to work with more than one currency in an application, you know how painful and annoying it can be. Providing reasonable rates of exchange, keeping track of payments in one currency against expenses in another — it’s ugly. It’s not even worth it. So, we’re throwing it out.
That’s right; we’re putting your money in the Cloud.
Introducing the FreshBuck
We looked at other alternative currencies, but what they’re all really missing is a solid sense of the value behind them. What can you really exchange these units FOR? It’s a problem nobody has really cracked yet.
Until now. We can tell you exactly what your FreshBuck is worth: six seconds of free FreshBooks time. That’s right — for only ƒ10 you get a whole minute of FreshBooks. ƒ14,400 equals a whole DAY of FreshBooks in your pocket.
Time really IS money with FreshBucks!
Of course running our own currency will require massive security. We need to be sure that no one can forge FreshBucks. Every single transaction record is signed using the finest, hand-crafted elliptic curve digital signatures, and stored in perpetuity right here on our servers. While other projects worry about avoiding a single, central trusted source, we’re perfectly happy to act as the trusted party in transactions all over the globe. Plainly, nothing could possibly go wrong.
Cloud Currency offers all the benefits of cloud computing, but for money. It’s just like the money you know and love, but instead of storing them in your own wallet, now you can store your money on someone else’s computer. No need to worry about burglary and fraud: we’ll deal with all of that for you, honest.
The Maple Secret
But that’s not all. In order to deliver a truly reliable currency for the 21st century we also need to be sure that this new currency is backed by enough hard specie. In our history, the most trusted currencies have all been backed by real resources: from shells and peppercorns through to the gold- and silver-backed currencies of prior centuries.
We want to return to an era of truly trustworthy currency. We also wanted to build a gigantic underground vault. FreshBooks has recently, rather fortunately, taken possession of a very large quantity of maple syrup, from a private source that must remain nameless. So, we’re bootstrapping FreshBucks with just over six million gallons of maple syrup — 23 million litres, for our international colleagues.
Today we introduce the world’s first maple-backed currency. Who doesn’t love maple syrup? Accepted by all and the first step towards rejecting the complicated mix of unsubstantiated currencies run by central bankers, and utterly untrusted currencies run by people who can do very complicated sums very quickly. Time will tell who has the more solid economic outlook.
Oh yes. This is a thing.
The “Hack-Off” has a long and storied tradition at FreshBooks — a day when folks put aside their usual duties and work on whatever seems like a good idea to them, with the only stipulation being that at the end of the day they’re meant to show off their efforts to the company.
We’ve had new features built, long-standing bugs fixed, completely insane dance videos, and velociraptors running amok. There’s music, cocktails, celebrity judges (well, me) and all sorts of shenanigans.
It gets pretty crazy on Hack-Off Day at FreshBooks, but this month is shaping up to be the craziest yet. We’re combining the Hack-Off with the annual ever-popular FreshBooks Costume Contest. Epic. Ridiculous costumes plus mad hacking schemes equals MONUMENTAL AWESOMENESS. We hope.