Thursday, July 8, 2010

Facebook and transparency

As some of you may know, I use Facebook as an example of how not-to-do-transparency in my talk. Immediately following my talk at Velocity, I received the following comment from Bret Taylor (CTO of Facebook):



The
"Platform Live Status" page that is mentioned is such:


There's some really good stuff here (e.g. it exists, it looks up-to-date, and it has some great features). There is also a lot of room for improvement. Putting aside the fact that this wasn't meant to be a fully-featured dashboard, and is far better then nothing, lets run their status page through the
rules for a successful public health dashboard and see what we can advise for the next evolution of Facebook's transparency initiative:

Rule #1: Must show the current status for each "service" you offer
Today the status page only gives the status of various services through plain text. For example, at the time of this writing, the "hello, active and total user counts are currently missing from both public profile pages and the API." The two graphs to the right show API response time and error rate across all API functions, not per-API or per-function area. Showing a graph and/or status light for each API/function would add tremendous value for developers that use specific parts of the application and only need to know about those specific areas. It would also make it easier to automate functionality, and to decide which components can be relied on in your architecture.

Recommendation: A graph and status light for each specific API function and end-point that developers may use. See Google's health dashboard for ideas.

Rule #2: Data must be accurate and timely
From the outside this appears to be solid. My big worry is that updates are currently very manual, which isn't going to scale. I haven't watched the site long enough to gauge how timely the updates are, but let's give them the benefit of the doubt. The main reason for this rule, requiring that your data be accurate and timely, comes down to trust. If your users get a hint of inaccuracy or delays in updates, they lose faith in the tool and stop using it. Your users will resort back to emailing/tweeting/complaining, which defeats the entire purpose.

Recommendation: Automate status updates as much as possible. Set up regular monitoring that posts status changes automatically. Create a formal process that requires someone to post a detailed update within a Minimum-Time-To-Communicate.

Rule #3: Must be easy to find
This may be the biggest problem today with Facebook's status page. I've been collecting public heath blogs/dashboards for a couple years now, and I've never come across it. Google'ing for "facebook uptime" or "facebook status" does not help. There are over 100 links to the page, but most are from deep within developer forums. If Facebook is serious about using transparency to their advantage, this page needs to be linked to from the first place that developers would go when they experience issues with the API.

Recommendation: Link to the status page from here and here. Not being a Facebook developer, I'm not the best judge of this, but I'm sure Facebook has plenty of data to figure this out.

Rule #4: Must provide details for events in real time
We discussed this already, but it's very important, especially for API-based developers. The error rate graph is very useful for this, which appears to be real-time. I would do more with it.

Recommendations: Show error rate per API/function (including the types of issues seen), and show historical information to give an impression of what's "normal". Developer mostly need to know who is at fault. If you simply let them know that something is up on your end, they'll feel a lot better and be able to go on with their day. See trust.salesforce.com for an easy way so integrate basic updates into dashboard (click on an error icon).

Rule #5: Provide historical uptime and performance data
Mostly lacking in this area. The graphs only go back to the start of the current day, and the text status-updates go back about 2 weeks. A historical perspective gives new developers a baseline to go by, and gives existing developers a chance to correlate issues they saw on their end.

Recommendation: See OpenSRS's dashboard for a simple way to do historical uptime/performance by service/API. Clicking on the "archive" link shows you past updates for every service.

Rule #6: Provide a way to be notified of status changes
Facebook is actually doing a great job here. They have both an RSS feed and an email option, which is extremely rare and extremely awesome. This allows developers to be pushed updates, and to integrate the updates into your internal dashboards. Great job here.

Recommendations: None!

Rule #7: Provide details on how the data is gathered
Currently customers have no insight into how the API response/errors are measured, and what the policy around status updates is. Is it ad-hoc, is it comprehensive, is it automated? It's hard to rely on this data today without insight into those policies and processes.

Recommendation: Add an explanation to the bottom of the page, or as a link off of the page, going into some of these details. You don't have to reveal your special sauce, just give us confidence that we can rely on this data.

Bonus
  1. The list of top bugs along the left side is a GREAT idea. This takes transparency to another level, and I would highly encourage other sites to adopt this practice. Developers are the target audience for both health issue and outstanding bugs, so why not combine them (along with the "Developer Updates" feed) into a single dashboard ? Brilliant.
  2. I like how the "Current Status" is broken out into a big yellow box at the top, making it clear what the situation is right now. This is much better than the default approach of showing the latest status as simply the top news item in the chronological list. A nice touch.
Conclusion
The most important takeaway is that Facebook has taken the hardest step toward transparency: getting a status blog/dashboard online. If they were to implement some of the recommendations above, they would see more of the benefits that come with transparency, and set a great example for other development platforms.