Wednesday, February 4, 2009

Downtime all over the place (Denny's, Vizio, and QuickBooks)

The Super Bowl led to Dennys.com and Vizio.com falling down, while QuickBooks went offline for a number of hours. Denny's and Vizio we can live with, but QuickBooks is a different story. According to the CNet report:

Affected customers of Quickbooks Online were left without access to their financial records. While users of the software version of Quickbooks could access their records, those relying on Intuit for credit-card processing had to do authorizations manually over the phone--a slower, more expensive process.

All online services have outages from time to time, but this one appears to have been lengthy for a line-of-business service, and for many users, unsatisfactorily managed. We received complaints from users that communications from Intuit gave neither a reason for the outages nor an estimate on when the service would return.

Hopefully a lesson was learned here. Watching the Twitter traffic at the time, I was impressed to see some signs of life.

Update: Jack in the Box had its fair share of problems as well.

Saturday, January 31, 2009

The underreported half of Google's new Measurement Lab, and how it can help your online business

In describing the aims of the newly launched Measurement Lab, Vint Cerf puts it this way:
The tools will not only allow broadband customers to test their Internet connections, but also allow security and other researchers to work on ways to improve the Internet.
It's clear that the press and blogosphere are going nuts over the latter half: specifically, the effect this will have on Net Neutrality, and on the shady practices of certain telcos. As powerful as that will be in the long run, I want to focus on the other half of the description: the tools to "allow broadband customers to test their Internet connections." I promise this isn't as geeky as it sounds, and it applies directly to helping your online business save time and money.

Imagine one of your customers sitting at home ready to use your service. She opens up her web browser, types in your URL, and presses Enter. The browser starts to load the page, the status bar shows "Connecting to yoursite.com...", then "Waiting for yoursite.com...". It sits like this for about 15 seconds with a blank page the entire time. She starts to get annoyed. Just to see what happens, she presses refresh and starts the process over. Again, a blank screen, the browser sitting there waiting for your site to begin loading. She then checks that her Internet connection is working by visiting google.com, which loads fine. At this point, if you are lucky, she decides to call your support department or shoot an email over asking whether there's something wrong. If you are unlucky, she asks around on Twitter, or blogs about it, or just gives up with the new thought in the back of her mind that this service is just plain unreliable. Now, imagine that this scenario took place while your site was perfectly healthy, with no actual downtime anywhere.

Your site is up, but your customer thinks your service is down. The problem lies somewhere along the way between the client's browser and your company's firewalls. The Tubes are clogged just for this specific customer, but how is she supposed to know?
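To make that concrete, here is a minimal sketch, in Python, of the kind of check that narrows down where along that path a request is breaking: it times DNS resolution, the TCP connect, and the first HTTP response separately, since a stall at each stage implicates a different link in the chain. The hostname reuses the yoursite.com placeholder from the story above, and plain HTTP on port 80 is an assumption; this is an illustration of the idea, not a real diagnostic tool.

```python
# A rough sketch of a stage-by-stage connection check (illustration only).
# Assumptions: plain HTTP on port 80, and "yoursite.com" as a placeholder.
import socket
import time
import http.client

HOST = "yoursite.com"  # placeholder hostname from the scenario above

def timed(label, fn):
    start = time.time()
    result = fn()
    print(f"{label}: {time.time() - start:.2f}s")
    return result

# Stage 1: DNS lookup. A stall here points at the resolver, not your servers.
ip = timed("DNS lookup", lambda: socket.gethostbyname(HOST))

# Stage 2: TCP connect. A stall here points at the path in between:
# routers, ISPs, firewalls -- the "clogged Tubes" case.
sock = timed("TCP connect", lambda: socket.create_connection((ip, 80), timeout=15))
sock.close()

# Stage 3: first HTTP response. A stall here points at the site itself.
def fetch():
    conn = http.client.HTTPConnection(HOST, timeout=15)
    conn.request("GET", "/")
    return conn.getresponse().status

status = timed("HTTP response", fetch)
print(f"HTTP status: {status}")
```

If stages 1 and 2 are fast but stage 3 hangs, the problem really is on your end; if the connect itself hangs, it is somewhere in between, which is exactly where the customer above is stuck.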

There are a few levels to this problem (followed by the solution):

Level 1: The effect this has on your customer(s)
Online users are still more than likely to give you a few chances before they draw a conclusion; however, every incident like this adds to the incorrect negative impression, especially if the problem manifests itself as a performance issue, slowing or interrupting your customers' connections, rather than simply keeping them from connecting at all. Your user begins to dread using your service, and looks for alternatives every chance she gets.

Level 2: The dollar cost to your business
How many calls do you get to your support department from customers claiming they cannot connect to your site, or that your service is broken, or that it's really slow for them? How often does the problem end up being on their end, or completely unreproducible? It may be a relief for your support people, and it may be something your company is happy with, as it confirms that your site is working just fine. Unfortunately, each of these calls costs you money and time. Worse yet, these types of calls generally take the longest to diagnose, as they are vague and require long periods of debugging to get to the root cause. I haven't even mentioned the lost revenue from the missed traffic (if that affects your revenue).

Level 3: The "perception" cost to your business
As described in Level 1, any perceived downtime is just as real as actual downtime in the eyes of your customers. Word of mouth is powerful, especially with today's social media tools, in spreading negative news, unfounded as it may be. The more you can do to keep the invalid negative perception from forming, the better.

Level 4: The unknown cost
How often does this happen to your customers? No one has any idea. I said earlier that you're "lucky" if your customer decides to pick up the phone and call you about the perceived downtime. More often than not, your customer will simply give up. At worst, they give up on your service entirely. How can you capture this type of information, and help your customers at the same time?

The Solution
Provide a tool that your customers and your support department can use to quickly diagnose where the problem lies. The simplest of these would be to offer a public health dashboard. The more powerful route is to offer tools like these:
Network Diagnostic Tool - provides a sophisticated speed and diagnostic test. An NDT test reports more than just the upload and download speeds--it also attempts to determine what, if any, problems limited these speeds, differentiating between computer configuration and network infrastructure problems.

Network Path and Application Diagnosis - diagnoses some of the common problems affecting the last network mile and end-users' systems. These are the most common causes of all performance problems on wide area network paths.
And what do you know? These are two of the tools that have launched on Measurement Lab!
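For a flavor of the simplest measurement underneath a tool like NDT (the real thing goes much further, distinguishing computer configuration problems from network infrastructure problems), here is a crude sketch: fetch a file of known size and compute the effective throughput. The test URL is a made-up placeholder.

```python
# A toy throughput check -- the crudest slice of what a tool like NDT does.
# Real tools also diagnose *why* speeds are limited; this only measures them.
# TEST_URL is a placeholder for any reasonably large file you control.
import time
import urllib.request

TEST_URL = "http://yoursite.com/static/speedtest.bin"  # placeholder URL

start = time.time()
with urllib.request.urlopen(TEST_URL, timeout=30) as resp:
    data = resp.read()
elapsed = time.time() - start

megabits = len(data) * 8 / 1_000_000
print(f"Downloaded {len(data)} bytes in {elapsed:.2f}s "
      f"({megabits / elapsed:.2f} Mbit/s effective)")
```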

Clearly these are still very raw, and not for the everyday user. But I see tools like these becoming extremely important for online businesses, both in reducing costs and in controlling perception. I see them becoming part of the public health dashboard (which I hope you're hosting separately from your primary site!), allowing users to diagnose problems they are seeing that are not reflected in the Internet at large.
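To sketch what the dashboard half could look like, here is a bare-bones status endpoint; the point is that it runs on infrastructure separate from your primary site, so it stays reachable when the site itself is not. The component names, statuses, and port are invented for illustration; real status data would come from your monitoring systems.

```python
# A bare-bones public health dashboard endpoint (illustration only).
# Host this *away* from your primary site so it survives your outages.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

STATUS = {  # invented example data; feed this from real monitoring
    "web": "ok",
    "api": "ok",
    "credit-card-processing": "degraded",
}

class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps(STATUS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8080), StatusHandler).serve_forever()
```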

I'm going to be watching the development of these tools very closely over the next few months. Most interesting will be noting which other companies support the research and end up using these tools. Will the focus stay on Net Neutrality and BitTorrent, or will companies realize the potential of these other tools? We'll find out soon enough!

Google claims entire internet "harmful"

Between 6:30 a.m. PST and 7:25 a.m. PST this morning, every search on Google resulted in a message claiming each and every link in the results "may harm your computer". As usual, Twitter was all over it. This likely cost Google a lot of money in lost ad revenue, and led to much undue stress for some poor sap, but what I'm most interested in is how transparently they communicated about this event. I'm happy to report that within 30 minutes of the problem being identified, a resolution was in place, and a couple of hours later, Marissa Mayer, VP, Search Products & User Experience (who is only the fourth most powerful person at Google) clearly explained the situation on their company blog:
What happened? Very simply, human error. Google flags search results with the message "This site may harm your computer" if the site is known to install malicious software in the background or otherwise surreptitiously. We do this to protect our users against visiting sites that could harm their computers. We work with a non-profit called StopBadware.org to get our list of URLs. StopBadware carefully researches each consumer complaint to decide fairly whether that URL belongs on the list. Since each case needs to be individually researched, this list is maintained by humans, not algorithms.

We periodically receive updates to that list and received one such update to release on the site this morning. Unfortunately (and here's the human error), the URL of '/' was mistakenly checked in as a value to the file and '/' expands to all URLs. Fortunately, our on-call site reliability team found the problem quickly and reverted the file. Since we push these updates in a staggered and rolling fashion, the errors began appearing between 6:27 a.m. and 6:40 a.m. and began disappearing between 7:10 and 7:25 a.m., so the duration of the problem for any particular user was approximately 40 minutes.

Thanks to our team for their quick work in finding this. And again, our apologies to any of you who were inconvenienced this morning, and to site owners whose pages were incorrectly labelled. We will carefully investigate this incident and put more robust file checks in place to prevent it from happening again.

Thanks for your understanding.
Well handled, and hopefully this does not have negative repercussions for the company long term.
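It's easy to see how a single '/' entry could poison every result. Here is a toy illustration, my guess at the general shape of the failure rather than Google's actual code: if flagged URLs are matched by prefix, a lone '/' entry matches everything.

```python
# A toy illustration of the failure mode described above -- my guess at the
# shape of the bug, not Google's actual code. If flagged URLs are matched
# by prefix, then a stray "/" entry matches every URL on the web.
flagged_prefixes = ["/known-bad-site/", "/"]  # the stray "/" is the bug

def may_harm(url_path):
    return any(url_path.startswith(p) for p in flagged_prefixes)

for path in ["/innocent-page", "/shop/checkout", "/known-bad-site/x"]:
    verdict = "This site may harm your computer." if may_harm(path) else "ok"
    print(path, "->", verdict)
# Every path gets flagged, because every path starts with "/".
```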

Update: StopBadware.org clarifies the situation a bit further, placing the blame back in Google's court:

[Update 12:31] Google has posted an update on their official blog that erroneously states that Google gets its list of URLs from us. This is not accurate. Google generates its own list of badware URLs, and no data that we generate is supposed to affect the warnings in Google’s search listings. We are attempting to work with Google to clarify their statement.

[Update 12:41] Google is working on an updated statement. Meanwhile, to clarify some false press reports, it does not appear to be the case that Google has taken down the warnings for legitimately bad sites. We have spot checked a couple known bad sites, and Google is still flagging those sites as bad. i.e., the problem appears to be corrected on their end.

Wednesday, January 28, 2009

More on how Google's support for M-Lab is a big deal for transparency

In support of Google's own words in launching Measurement Lab:
At Google, we care deeply about sustaining the Internet as an open platform for consumer choice and innovation. No matter your views on net neutrality and ISP network management practices, everyone can agree that Internet users deserve to be well-informed about what they're getting when they sign up for broadband, and good data is the bedrock of sound policy. Transparency has always been crucial to the success of the Internet, and, by advancing network research in this area, M-Lab aims to help sustain a healthy, innovative Internet.
...a few select quotes from around the web commenting on the power of transparency:
"For years, ISPs have been notoriously shady about what they're throttling or blocking. The industry needs a healthy dose of transparency. Right now, we're just a bunch of pissed-off users complaining about our Skype calls getting dropped and our YouTube videos sputtering to a halt. But when it comes to placing blame, most of us are in the dark."
-- http://blog.wired.com/business/2009/01/new-google-tool.html

"M-Lab aims to bring more transparency to network activity by allowing researchers to deploy Internet measurement tools and share data. The platform launched Wednesday with three Google servers dedicated to the project, and within six months, Google will provide researchers with 36 servers in 12 locations around the globe. All the data collected will be made publicly available."
-- http://news.cnet.com/8301-13578_3-10152117-38.html

"A number of ISPs have lately started to clamp down on peer-to-peer networks and are actively restricting heavy usage of 'unlimited' connections. For users, however, there is very little transparency in this process and it can be very hard to figure out if an ISP is actually actively throttling a connection or preventing certain applications from working properly. In reaction to this, Google, together with the New America Foundation's Open Technology Institute and the PlanetLab Consortium announced the Measurement Lab, an open platform for researchers and a set of tools for users that can be used to examine the state of your broadband connection."
-- http://www.readwriteweb.com/archives/google_announces_measurement_lab.php

"This looks like an initiative at least partly created to deal with net neutrality issues, by providing more transparency to users. This seems to be in Google’s political interest; it would be interesting to see the same transparency be provided with issues like Google’s censorship in countries like China or Germany. Say, a Measurement Lab tool that registers which domains are censored in Google.cn, collecting those into a public database for research purposes."
-- http://blogoscoped.com/archive/2009-01-28-n84.html

Google working to make the Internet more transparent

A big (if you're a geek) announcement from Google today:
When an Internet application doesn't work as expected or your connection seems flaky, how can you tell whether there is a problem caused by your broadband ISP, the application, your PC, or something else? It can be difficult for experts, let alone average Internet users, to address this sort of question today.

Last year we asked a small group of academics about ways to advance network research and provide users with tools to test their broadband connections. Today Google, the New America Foundation's Open Technology Institute, the PlanetLab Consortium, and academic researchers are taking the wraps off of Measurement Lab (M-Lab), an open platform that researchers can use to deploy Internet measurement tools.
Basically, Google is working to help the general public diagnose the hidden problems that creep up within the Internet. I plan to dive into this a lot further, but for now here are some of the tools that have already been made public as a result of this effort:
Network Diagnostic Tool - a sophisticated speed and diagnostic test of your Internet connection
Network Path and Application Diagnosis - diagnoses common problems affecting the last network mile and end-users' systems
Exciting!

Update: I'm even more impressed with how much attention this is getting in the blogosphere. Most people are focusing on the BitTorrent aspects of this, but still, it's a lot of press for the transparency movement.

Tuesday, January 27, 2009

Seth Godin on transparency

One of my favorite bloggers/writers/speakers/personalities/gurus, Seth Godin, recently had some thoughts on the power of transparency:

Can you succeed financially by acting in an ethical way?

I think the Net has opened both ends of the curve. On one hand, black hat tactics, scams, deceit and misdirection are far easier than ever to imagine and to scale. There are certainly people quietly banking millions of dollars as they lie and cheat their way to traffic and clicks.

On the other hand, there's far bigger growth associated with transparency. When your Facebook profile shows years of real connections and outreach and help for your friends, it's a lot more likely you'll get that great job.

When your customer service policies delight rather than enrage, word of mouth more than pays your costs. When past investors blog about how successful and ethical you were, it's a lot easier to attract new investors.

The Net enlarges the public sphere and shrinks the private one. And black hats require the private sphere to exist and thrive. More light = more success for the ethical players.

In a competitive world, then, one with increasing light, the way to win is not to shave more corners or hide more behavior, because you're going against the grain, fighting the tide of increasing light. In fact, the opposite is true. Individuals and organizations that can compete on generosity and fairness repeatedly defeat those that only do it grudgingly.