Dev Blog

The True Cost of a Feature: PIN codes and the FreshBooks iOS App

by Craig Wilson on June 26, 2013

In the world of product development, learning how to say “no” is one of the most difficult yet important skills one can have. While the list of features that could be added to your product is always endless, being able to determine what your product does not do allows for greater focus and attention to detail. Having the confidence to say “no” also means that you can investigate and select the features that will provide the most value for your users, instead of feeling pressured to address every feature request for your application.

As an example, after the launch of our FreshBooks iOS application last year, some of our customers had concerns about the application’s lack of a PIN code. They feared that this omission raised security issues, and made their business data vulnerable to attack or accidental corruption. As a team we had discussed PIN codes during the design of the application, but ultimately decided against implementing them. In this post I’ll explain the rationale behind our decision, and give an overview of the security measures in iOS that made our decision possible.

ABOUT PIN CODES

In mobile applications, PIN codes are passphrases that the user must enter to unlock access to the app. They are typically entered after the application is launched, and function as an initial layer of authentication for applications that connect to an online service. This allows users to leave the application logged into their account with the service while still having a rudimentary method of blocking access for anyone but themselves or other trusted parties. In the case of FreshBooks, this would mean that our users could have a basic level of security without having to log out of the FreshBooks application after each use. This would also result in a speedier launch experience, as they would not incur the cost of the application re-syncing their data each time they had to log in.

While these examples make PIN codes seem like a good idea, in practice they do not provide any meaningful additional security for the amount of effort required to properly implement them. Additionally, supporting PIN codes may lead to less-than-ideal experiences for the end user. To see why, let’s step back and examine the true cost of implementing a feature in an application.

CAUSE AND EFFECT

No application feature exists in isolation; if you offer customer support to your users, then giving your customers additional powers necessitates supporting them in the event that those powers are not correctly used. If we implemented PIN codes in the FreshBooks iOS app, what happens if the user forgets their PIN? How would they regain access to the application? If no additional functionality were implemented, the user would have to delete, then re-download the FreshBooks application, log back into their account on the app, and finally create a new PIN to resume their work. This is obviously not a user-friendly workflow – would the user know that they had to delete the application to regain access, and that doing so would not delete any of their data? Would confused users reach out to FreshBooks customer support, or simply stop using the app altogether?

Looking at the above scenario, our default solution creates its own problem: being able to regain full access to the app after deleting it renders the entire PIN code feature moot. A thief who knew the customer’s FreshBooks login credentials could simply delete the application, then reinstall it to gain access to the customer’s information. As such, the right thing to do would be to build a user-invoked PIN-reset mechanism, spanning both the application and a server-side component, analogous to the mechanisms for resetting a password on a website. The PIN code would need to be stored on a server and associated with the customer’s account or device, so that when the app is reinstalled the PIN code would still need to be entered. This greatly increases the scope of the feature, however, and also increases the testing effort required to detect any vulnerabilities the mechanism may have.

This additional work extends to every area of the company. At FreshBooks we pride ourselves on having amazing in-house customer support. Our support staff would need to be trained on how to handle the situation where a customer has forgotten their PIN and needs to regain access to the app. How would we verify the customer’s identity? Would we email them a new, random PIN and guide them through the process to set a new one, or would we take them through the aforementioned password-reset process? These processes, and any associated tools, would need to be put in place to enable the support representative to get the customer’s app back in working order as soon as possible.

There are many other scenarios like the above that would require building additional functionality into the system and creating new processes across the organization, just to support the base feature of PIN codes. Should users be permitted to enter as many incorrect PIN codes as they want, or should the application take some kind of security precaution after a certain number of attempts? If the latter, should the type of precaution (lock access to the application, wipe all local data for the app, etc.) be configurable? All of the above functionality could be developed, but doing so comes at the cost of other features, most importantly ones we believe would provide a larger benefit to a greater portion of our users.

While the above considerations should be taken into account when considering any new feature for an application, we would not have been able to reject adding PIN codes to the app if our customers’ fears were legitimate. Thankfully, however, the design of iOS gave us confidence that adding PIN codes would not result in any more meaningful security in the application. In the remainder of this post I will elaborate on the security measures present in iOS, and what they mean for application developers.

DATA ENCRYPTION IN iOS

Starting with the iPhone 3GS, iOS has featured system-wide 256-bit AES encryption to protect all data on the device. At a high level, this encryption is enabled when the device is locked with a passcode, and disabled when the device is unlocked. This means that in order to read the contents of a device, one needs the passcode set by the user, or a brute-force method to try and guess it. iOS has mechanisms to prevent brute-force attacks, however, such as lengthening the wait time between passcode attempts with each successive incorrect attempt, and an optional setting to wipe the contents of the device (render them cryptographically inaccessible) after 10 failed attempts. Apple provides a much more detailed description of iOS’ security measures on its website.

For our purposes, the most important takeaway from this implementation is that once a device is unlocked, all data on the device is fair game. Having a PIN code on the FreshBooks application will not prevent a malicious third party from digging through the contents of the user’s business data if they’re able to unlock the device and connect it to a computer.

From a user’s perspective, securing a device’s data is simple. All they need to do is enable a Passcode Lock. This can be done within the “Settings” application; Apple provides step-by-step directions in a support article. While the passcode defaults to a 4-digit number, users can set the “Simple Passcode” option to “off”, allowing them to set a longer passcode that can also include letters and symbols.

ACCESS CONTROL, THE RIGHT WAY

In addition to data protection, our customers also requested PIN codes as a way to restrict access to their FreshBooks information on a shared device, with a common scenario we heard involving a device that is shared amongst family members. Obviously the parents wouldn’t want their children accidentally modifying their FreshBooks account, so having a PIN code seems like a logical way to solve the problem. It turns out, however, that iOS has built-in functionality to address this scenario in a more robust fashion.

The “Guided Access” mode, available in iOS 6.0 and above, restricts the device to a single app by disabling the “Home” button and gestures, and can even disable individual controls and functionality within the app.

The Guided Access configuration screen

Individual components, in this case the address bar, can be disabled

Other controls still function

The user cannot leave this mode without entering a passcode, and this passcode is different than the one used to unlock the device itself. Thus, a parent could lock the device to a specific game, and prevent their child from navigating to other applications on the device. More information on Guided Access is available from Apple’s support article.

In addition to Guided Access, iOS also offers “Restrictions”, where specific functionality can be disabled, while still allowing the user to use multiple different applications on the device. The user can be prevented from installing or deleting apps, purchasing content from the iTunes or App Stores, or even using specific apps like Safari, Camera, or FaceTime. Similar to Guided Access, the user must enter a special passcode to exit this mode. Apple also publishes a support article about Restrictions.

FINAL THOUGHTS

Although PIN codes seem to offer an obvious benefit in mobile applications, upon closer inspection they provide limited benefit, and can even convey a false sense of security. By leveraging iOS’ native device encryption, along with the Guided Access and Restrictions features, we were able to make a strong case that PIN codes would not provide any meaningful value to the FreshBooks iOS application.

We hope this case study enables you to more intelligently evaluate feature requests for your own applications, and devote efforts to features that will add true value for your users.


The Role of the Quality Assurance Professional in iOS Development

by Craig Wilson on May 28, 2013

Are you an independent iOS developer, looking to improve the quality of your applications? Even if you don’t have the budget to add a dedicated software testing professional, there are measures anyone can take to test more intelligently and efficiently. FreshBooks QA Analyst Craig Wilson recently sat down with iOS Developer and author Ash Furrow on the Springboard podcast to discuss the criteria FreshBooks uses to evaluate iOS testing tools and what conclusions they’ve reached so far, along with the “see how badly you can break it” mindset to have when testing any type of application.

Other highlights include:

  • FreshBooks’ experience with iOS automated testing tools, including Apple’s own “Instruments” developer tool, and Square’s “KIF” framework.
  • The philosophies and mindsets behind automated and manual testing.
  • What types of tests are suitable to be automated.

Have a listen to the episode and let Craig and Ash know what you think!


FreshBooks secret sauce: chatrooms

by wyatt on May 17, 2013

Now this is a story all about how my life got flip-turned upside down.
I’d like to take a minute — just sit right there,
I’ll tell you how chatrooms are the Prince of Bel-Air.

When I began researching, thinking always,
in chatrooms is where I spent most of my days.
I asked one little question and a sysadmin got scared…
Okay, well, he didn’t really get scared.

He got historical, and told me that we began using chatrooms at FreshBooks after a hackoff he did years ago. Receiving a direct answer like that encapsulates the magic of a simple chatroom: I asked a broad group of people about something and quickly received an answer from whomever had the time, knowledge, and interest to speak up. Chat systems are like water to fish: rarely examined, but integral to our experience — and if you’re not using named chatrooms, you’re missing out on something great.

Questions and Answers

That sysadmin was in #geek, an informal room we have for tech-related discussion which averages 20-25 users in our hundredish-person company. I asked, “hey everybody, who wants to regale me with tales of when and how we started using our chat system?” Around a minute later, our senior-most sysadmin said, “haha I was just talking about that — it was a hackoff of mine!” The discussion that followed gave me a lot of information, and many people chimed in when they had something relevant to add. No one was bothered by direct poking, we didn’t generate email clutter, and my problem was swiftly solved. Magic.

A different example comes from #dev, a more work-related discussion room which averages about 30 people. Someone asked why he got an error when piping a git command into grep. People came out of the woodwork and quickly figured out that he wanted “[color] ui = true” in his .gitconfig, instead of “[color] ui = always”.

We got robots

While that was going on, something else amazing was happening in the exact same place. A bot (named Hobbes) who watches the status of our builds reported a failure, and listed the commits and their authors since the last successful build. Everyone reading the discussion could see the failure, but no one needed to get an email to learn that the build broke. The specific people mentioned were also alerted by their clients that their name came up — which would likely grab their attention if they were looking elsewhere.

A bot named Bert sits in #release and acts as a queue-manager for hotfix patches. He’s also a cross-channel announcer: with a simple command, he can relay a message to every room he’s in. Another bot named Ernie does dozens of things. For example, he keeps track of streetcar times at the nearest stop, and reports on weather. He’s also a source of humour: there are commands to return custom image macros like ‘KHANNNNN!!!’, and any mention of ‘bees’ makes him link to a certain gem of the internet. Because Ernie is a little spammy, he only gets logged into chats like #geek and #fluffychat, where the topic of discussion is more free to wander. He is a modified version of github’s popular hubot, which we’ve found easy to make use of and extend.

Releases

We release a new version of our app every week. The day before a release, we hold a pre-release meeting where we make sure everyone involved is on the same page. The meeting involves a lot of people, follows a set routine, and many people only speak once in it, so it doesn’t feel worth gathering us all into a room. It’s also worth keeping a record of the meeting’s output, which consists of the specific considerations and preparation instructions for the upcoming release. So we hold it in the #release chat room. That way, it’s instantly logged and bothers the minimum number of people for the shortest period of time.

During an actual release, the #release channel is watched closely by operations, QA, and every dev with code going into the app. Each step of the release is reported in the chat, and there’s no speaking in there aside from pertinent information. It’s an awesomely high signal-to-noise coordination tool.

Not-work things are fine too

There’s also a lot of value to chats beyond what’s directly work-related. For instance, jokes told in chat often elicit laughter throughout the office. Hearing a whole team start laughing when none of them said anything in real life is weird and so awesome. There are also off-topic rooms like #fluffychat which feature pictures of the internet’s cutest animals scrolling to the end of the universe. Our chatrooms keep conversation flowing throughout the office, and as a result the barriers to really valuable communication remain low.

So, you should use chatrooms

Our chatroom usage started as a day-long project by a single employee. People quickly saw its value, and it became a permanent, company-wide aspect of our lives. It’s simple, but it’s one of the things I like most about working here — it’s definitely worth a shot at your workplace.

It’s pretty easy to get this kind of thing going. In the simplest case, just log into an existing IRC server, set up some rooms and bots, and you’re done. You could also run your own IRC or Jabber server, or sign up for a packaged alternative. We run ejabberd (an erlang jabber daemon) in-house, which gives us an automatic roster of each account and its status, and also keeps all of our internal chat under our own roof. If it’s important to you not to send your data out to a third party (like it is for us), you’ll probably want to run your own chat server.

I polled all my friends (about seven or eight),
and I learned from them (sadly) that they’re missing out here
you heard about our chatrooms, now we’re finally square,
they sit on the throne as the Prince of Bel-Air.


Hindsight is 20/20—so take advantage

by Marguerite on April 30, 2013

I’ve only been on the FreshBooks dev team for a few months, so the stories I hear about the bad old days when our releases were like picking a way through a minefield are only legend to me. But I gather we’ve come a long way, and I suspect that this was in large part due to our Release Retrospective meetings. It is in these meetings that our blunderings are turned into lessons so that we learn from our mistakes instead of making the same ones ad infinitum.

Curiosity. That is what I felt while I attended my first retro. It struck me as a strange ritual. In a small room, wherever there was room to stand, devs, QA, Product, Support and Ops crowded around a table strewn with Sharpies and Post-it-notes. These sticky-pads, some green, some red, were passed around. People began writing on them using squeaky markers, peeling the notes off and handing them to a person at the front. This person stuck them on the whiteboard—red notes on the left, green notes on the right. When the sound of scribbling and unsticking of stickies had tapered off, the person at the front went through each note on the board: peeling it off, reading it aloud, discussing it briefly with those present, and finally handing it to one or more people.

This is what was really happening during that meeting, and what happens during all our retros: the afternoon following our weekly release, those involved gather together. If someone thinks something went well during the release, they write it on a green sticky. If something could have gone better, it goes on a red sticky. Each note is read for all to hear. A green note is handed to the people responsible for the thing that went well. If it’s a red note, everyone is brought up to speed on what went wrong, and ideas on what could have been done differently are discussed. The sticky is then handed to whoever claims the responsibility of seeing that some or all of those ideas are put into practice.

That first retro probably didn’t last more than 20 minutes, but when I came out, I was sold. In exchange for 20 easy minutes of our week, the retro meeting gave us a host of benefits. Here are a few:

  • Everyone learns from the actions of others. For example, one time a dev on my team discovered he would be unable to make it to the upcoming release. He backed out his changes that were scheduled to go out, just in case. This small act saved us a lot of headache that release when related changes caught fire. At the following retro, that dev’s name was on a green sticky, and we all heard why. That day I learned in a way that words can’t teach that sometimes it’s worth making the extra effort, just in case.
  • Even the best devs are exposed as merely human. We do these retros weekly and mistakes are easy to make, so chances are each person in the room has been the cause of a red sticky at some point in the past. Being exposed to that knowledge routinely teaches us to admit to our failures more readily and with more grace—we know we’re in good company. This humility also makes us more willing to change when our current habits prove problematic, a quality that is so important when trying to get somewhere as a team.
  • Achievements are honoured publicly. One of the devs here recently refactored a particularly nasty piece of code and put it under test. When his work was called out publicly at a retro, I felt so proud of him as I and those around me applauded him! I also suspect that knowing his work will not go unnoticed or unappreciated will mean he is more willing to take on similar beasts in the future.
  • Cross-department communication happens! Let’s face it, fellow developers. Interactions with other departments can sometimes be a bit… well, sticky. After all, each department has its own goals and methods. As far as releases go, however, all departments are on the same page: we all want smooth, uneventful releases. That’s why 5 departments are represented at retro: it’s a time for different teams to talk to each other about how each department can adjust their processes in order to help make the next release go better.

Allow me to go meta on you and do a mini retro of the 25 or so retros I’ve been to. The process isn’t perfect: there are some red stickies that seem to keep coming back, week after week. But some red stickies that were common when I started have been replaced by green stickies: the routine “People late for release” has turned into “Everyone on time—again!”. “3 hotfixes this week” has turned into “5th week in a row no hotfixes”. As I become less of an onlooker and more of a participant in these retros, I feel more and more proud of the upward direction we’re taking. And that is something every team can benefit from!

Pictures

Our fearless leader was sick

Our first retro leader was Corey. Then it was Diller, whom we fondly referred to as Sexy Corey. Then Ashok came along, known as Sexy^2 Corey. Today he was sick, and Stinson, his substitute, had to bear the brunt of our disappointment with this red sticky.

Red and Green Stickies

Pre-discussion stickies.

drawing a goat on a green sticky

People have fun during retros. This is The Mark Story drawing a goat on a green sticky.

My monitor, surrounded by green and red stickies

I surround my monitor with the feel-good stickies I receive. The ones with an arrow in a circle are referring to the t-shirt I bought just for Retro from a shop in Montreal.


Using triggers to un-shatter a schema

by Taavi on April 12, 2013

We deleted millions of customer database tables and nobody noticed. And that’s a good thing. It just took some thinking (and a bit of reading), and 10 months of slow-but-steady work.

Background

FreshBooks is a classic multi-tenant application. Customers sign up with us and live in their own little island of data (received invoices and contractors are treated like telephone calls to the next island). When the first version of FreshBooks shipped (and for many years after), every new account creation resulted in a new set of database tables being created. This worked well in the early days. Every time a web page was served by PHP, it would define the names of all the tables that that account needed to access, so you never had to use a WHERE clause to restrict data access to a single account. Early on in an HTTP request we converted an account name (e.g. yourcompany from yourcompany.freshbooks.com) into a numeric ID (e.g. 123), and tables were named using that id (e.g. org123_invoice).

But times change. FreshBooks grew, and with it our data. By 2010 we were living with MILLIONS of database tables (yes, literally on the order of 1E+6). Databases are designed to deal with a few thousand tables, each with millions of rows, not a few million tables with a handful of rows each! Daily database server backups started to take 18 hours. MySQL stores MyISAM tables as 3 or more files on the filesystem, and 3 times MILLIONS of files takes a very long time for the operating system to open and copy. Per-table metadata overhead was often larger than the data described. Data warehousing was also becoming horribly slow, and we were restricted to analyzing a fraction of our data to keep the impact to a minimum.

We also ran into MySQL design decisions that performed poorly on our schema. MySQL has a table cache that contains recently used file handles. The documentation suggests that the cache should be larger than the number of tables plus the number of concurrent accesses to those tables (each concurrent reader requires its own handle). However, with millions of tables that’s just not possible. The kernel wouldn’t let us hold that many files open, even if we had enough RAM. When we got bigger database servers a few years ago, increasing the size of the table cache from 100 000 to 200 000 file handles actually decreased performance because there’s an exclusive, global lock held during cache eviction, which is implemented as a naïve LRU. Doubling the size of the cache approximately doubled the time it took to evict the oldest cache entry! That in turn halved the rate at which we could open not-recently-used tables, making your first login of the day slower, and seriously hurt “background” jobs, affecting interactive performance drastically. At the time, we ended up turning the cache size down to 1000 to get reasonable interactive performance through all our workloads.
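
If you want to see whether the table cache is hurting on your own server, a couple of stock MySQL commands (nothing FreshBooks-specific here) will show you the cache size and how often tables are being evicted and re-opened; note that the variable is named table_cache on older MySQL versions and table_open_cache on newer ones:

-- How big the table cache currently is:
SHOW GLOBAL VARIABLES LIKE 'table%cache%';

-- Open_tables is the number of file handles currently cached;
-- a steadily climbing Opened_tables means tables are being evicted
-- and re-opened, which is exactly where that LRU lock contention bites.
SHOW GLOBAL STATUS LIKE 'Open%tables';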

The “WTF”s from new hires didn’t help either. We had to take action.

The Schema Transformation

Take a simple association table that described which online payment gateways were allowed for a given invoice for FreshBooks account number 123:

CREATE TABLE `org123_allowed_gateway` (
  `allowed_id` int(11) NOT NULL auto_increment COMMENT 'A synthetic PK',
  `gateway_id` int(11) default NULL COMMENT 'FK to gateways',
  `invoice_id` int(11) default NULL COMMENT 'FK to invoices',
  PRIMARY KEY  (`allowed_id`)
  -- And maybe some indexes, but probably not; there usually isn't enough
  -- data to make them worthwhile
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

Because there’s one per account, we called these “shattered tables” (as compared to the more traditional “sharded tables”).

In order to use the database server the way it was intended, we wanted to convert it into a table that looked more like:

CREATE TABLE `invoice_allowed_gateway` (
  `account_id` int(11) default NULL COMMENT 'FK to accounts',
  `allowed_id` int(11) NOT NULL auto_increment COMMENT 'A synthetic PK',
  `gateway_id` int(11) default NULL COMMENT 'FK to gateways',
  `invoice_id` int(11) default NULL COMMENT 'FK to invoices',
  PRIMARY KEY  (`account_id`, `allowed_id`),
  KEY `allowed_id_idx_for_autoincrement` (`allowed_id`),
  KEY `account_invoice_idx` (`account_id`, `invoice_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

Primary key ordering

account_id was the first part of the PK, because (almost) all queries ask for data with a single account_id value. This index ordering improves the specificity of a query including an account_id restriction (most accounts had, say, a few hundred invoices; but we had thousands of accounts with invoice_id = 1!). Because we were coalescing data from many accounts into one table, there were duplicate allowed_id values. As an additional transformation, we could have renumbered allowed_ids, but many of our tables’ PKs were FKs, so it got messy quickly. We decided that it was not a good idea to do 2 transformations at once because of this added complexity. Consistency is a prerequisite for pretty.
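
To make that concrete, here is roughly how a lookup changes between the two schemas (the invoice id 456 is just an example value):

-- Old, shattered schema: the table name itself scopes the data to account 123,
-- so no account restriction is needed
SELECT gateway_id
FROM org123_allowed_gateway
WHERE invoice_id = 456;

-- New, shared schema: the same lookup must be scoped by account_id,
-- which lets MySQL use the (account_id, invoice_id) index
SELECT gateway_id
FROM invoice_allowed_gateway
WHERE account_id = 123
  AND invoice_id = 456;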

MySQL and auto_increment with composite primary keys

The index on allowed_id told MyISAM that we didn’t want to re-use auto_increment values if rows were deleted. Without `allowed_id_idx_for_autoincrement` MyISAM would use MAX(allowed_id)+1 WHERE account_id = X, which would re-issue values for allowed_id if we ever deleted a row with MAX(allowed_id) GROUP BY account_id. Again, if you have any FK references (particularly in something like an activity log) this can wreak havoc. Remember that MyISAM doesn’t actually have Foreign Keys, so cascading deletes aren’t provided for you. It might be better not to delete rows, but non-repeating auto_increment values are nice in their own right, too.
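
If that behaviour sounds abstract, a tiny throwaway table (the name and values here are made up, not part of our schema) demonstrates what the extra index protects against:

-- MyISAM with an auto_increment column as the second part of a composite PK,
-- and no other index starting with that column:
CREATE TABLE grouped_counter (
  account_id INT NOT NULL,
  allowed_id INT NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (account_id, allowed_id)
) ENGINE=MyISAM;

INSERT INTO grouped_counter (account_id) VALUES (1), (1), (2);
-- rows are now (1,1), (1,2), (2,1): one counter per account_id prefix

DELETE FROM grouped_counter WHERE account_id = 1 AND allowed_id = 2;
INSERT INTO grouped_counter (account_id) VALUES (1);
-- the new row is (1,2) again: the deleted value was re-issued

With a secondary KEY starting with allowed_id, as in the DDL above, MySQL generates values from that index instead, giving a single counter whose values are not handed out again after a delete.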

SQLAlchemy and autoincrement with composite primary keys

When using SQLAlchemy, we had to set autoincrement=False on the account_id column definition. If it’s not there, SQLAlchemy will assume that it is auto_increment, leading to strange, un-obvious errors.

The Hard Way

In early 2010 we bit the bullet and started fixing our schema the hard way. We migrated a few tables this way:

  1. Rewrote all the SQL queries to look at the new table, being very careful to add an account_id = X restriction to the WHERE clause of the query. This was tricky, particularly when there were joins and/or sub-queries involved.
  2. Primed the database server’s disk cache. Most table data wasn’t in memory at any given time, and IO was the slowest part of this transformation. The first time running a data migration took about 3h for one set of shattered tables. After clearing out the shard table, a second run took only 15 minutes!
  3. Then we shut down the website while the migration executed! It sucked for our users, at 15 minutes a pop. At this rate we’d have had bi-weekly downtime for a couple of years before finishing.

The Easy Way

If you’ve read our previous posts on using triggers to effect seamless data migrations, you know where this is headed. If you haven’t read them yet, now is a great time. :) After fixing the code, database triggers allowed us to create and maintain an up-to-date copy of our data in the desired target form. For each shattered table we:

  1. Created the shard table.
  2. Created triggers on all the shattered tables, so that inserts, updates, and deletes propagated to the shard table. The triggers had to be written so that they couldn’t fail (i.e. used REPLACE INTO), because the target data wasn’t guaranteed to exist there yet (a sketch of such a trigger appears after this list).
  3. Ran a backfill, basically:
    REPLACE INTO invoice_allowed_gateway (account_id, allowed_id, gateway_id, invoice_id)
    SELECT 123, allowed_id, gateway_id, invoice_id
    FROM org123_allowed_gateway;
    

    Note that because the source table did not have an account_id, we had to provide the literal 123 in the query.

  4. Every time a new account was created with the old shattered tables, an AMQP consumer would add the triggers and run the backfill for the new account. Adding code to do that in the same place as account creation would mean giving the MySQL SUPER privilege to code that doesn’t otherwise need it; it’s an unnecessary security risk. Our shard table is now eventually consistent!
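
As an illustration of step 2, here is a minimal sketch of what the triggers for account 123’s table might have looked like, using the example tables from above (the trigger names are invented for the example, and the real triggers also covered UPDATE, which looks much like the INSERT case):

-- Inserts on the shattered table are copied into the shard table.
-- REPLACE INTO means the trigger can't fail if the row is already there.
CREATE TRIGGER org123_allowed_gateway_ai
AFTER INSERT ON org123_allowed_gateway
FOR EACH ROW
  REPLACE INTO invoice_allowed_gateway (account_id, allowed_id, gateway_id, invoice_id)
  VALUES (123, NEW.allowed_id, NEW.gateway_id, NEW.invoice_id);

-- Deletes are propagated by key, scoped to this account's id.
CREATE TRIGGER org123_allowed_gateway_ad
AFTER DELETE ON org123_allowed_gateway
FOR EACH ROW
  DELETE FROM invoice_allowed_gateway
  WHERE account_id = 123 AND allowed_id = OLD.allowed_id;

Note that the account id is baked into each account’s triggers as a literal, for the same reason it appears as a literal in the backfill: the shattered source tables have no account_id column of their own.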

Once the new code referencing the shard table had been deployed to production, we dropped the source “shattered” tables at a leisurely pace. (You don’t even need SUPER privilege to drop tables with triggers on them!)

The Result

It seems like old hat to us now, but this particular schema transformation was the first one we tried using triggers. Instead of taking years, it allowed us to completely transform our database in 10 months with no downtime, while at the same time delivering new features and bug fixes to customers.

But, the best part of our final approach is that customers didn’t notice a thing because we did our data transformation without requiring downtime. We were confident in doing it this way because we were able to verify the data transformation before committing to it. On one occasion we did notice an issue with the transformation and fixed it before the problem could affect customers.

Operationally, finally having a “normal” database schema led to many happy outcomes. Memory usage for our Python backends dropped about 25% (on the 12th; I don’t honestly remember what we did on the 16th to improve things further):

Memory usage graph for our Python-based Evolve backend processes, across the release, showing a 25% reduction in steady-state memory usage on April 12, 2011.

CPU usage and response time from our Python backends also dropped by about 30%. Given other changes that we’ve made over the years, I suspect that most of this is due to the smaller memory footprint and Python’s circular garbage collector. The more Python objects you have in memory, the harder the GC has to work to find cycles and orphaned weak-references. Python’s GC is an entire topic unto itself, which deserves its own future blog post.

Disk usage on the database servers dropped dramatically, particularly for tables that had very little in them per-account (which is most of them). Consider the on-disk size of a table containing a single integer:

mysql> CREATE TABLE foo (an INT PRIMARY KEY);
mysql> INSERT INTO foo VALUES (1);

...

$ ls -l foo* # in bytes
-rw-rw---- 1 mysql mysql 8556 Jan 31 12:42 foo.frm
-rw-rw---- 1 mysql mysql    7 Jan 31 12:43 foo.MYD
-rw-rw---- 1 mysql mysql 2048 Jan 31 12:43 foo.MYI

$ tune2fs -l /dev/mapper/VolGroup00-root | grep Block\ size
Block size:               4096

$ ls -ls --block-size=4096 foo* # In minimally-addressable filesystem blocks
3 -rw-rw---- 1 mysql mysql 3 Jan 31 12:42 foo.frm
1 -rw-rw---- 1 mysql mysql 1 Jan 31 12:43 foo.MYD
1 -rw-rw---- 1 mysql mysql 1 Jan 31 12:43 foo.MYI

That’s 20kB to store 32 bits! This overhead adds up quickly both in terms of mostly-empty disk blocks and used inodes. Getting more actual user data per gigabyte of RAM is always a win.

Epilogue

When I started at FreshBooks, we didn’t think we could actually fix this. It’s been two years to the day since we deployed the last “de-org” code, and a large part of our current dev team doesn’t even know what an “org table” is. And that’s a good thing.

