Auction Request Downtime: What We’ve Done to Make Sure it Doesn’t Happen Again

As a follow-up to our recent blog post regarding steps we’ve considered to avoid the downtimes & outages we’ve experienced, I’d like to update you on where we currently stand.

This past weekend, we switched the Auction Request service over to a new dedicated server. This server is faster than the old one, and has additional resources to manage increased load. While the load on the old server did not cause any of our outages, we felt it was a good investment to make now, with an eye towards future expansion.

This past month, several of our high-throughput clients were being mysteriously blocked from accessing our service. While this took us a number of weeks to diagnose, we did eventually discover that our old server’s hosting company was behind the blocks, as they saw the high-throughput API requests as a DoS (Denial of Service) attack. In order to mitigate this, we’re now incorporating Cloudflare’s proxy service. It’s important to note that Cloudflare does not enable us to survive a server outage at our host; since we provide an API service that interfaces with eBay, not a static website, viewing a cached copy of Auction Request doesn’t work. Cloudflare does enable us to engage their more sophisticated firewall to survive outside attacks, as well as a DNS failover service that can instantly redirect to another IP address should our current IP address not resolve.

What this means in layman’s terms: we’re going to use our old server as a backup server. Should Cloudflare detect that our current (new) server has gone down, it will automatically switch the service over to the backup. In theory, this will happen within a few minutes, and should largely eliminate any further downtime. The odds of both servers being down at the same moment are mathematically pretty low.
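For the technically curious, the failover idea can be sketched in a few lines of Python. This is only an illustration of the concept, not Cloudflare’s actual logic (which this post doesn’t describe) and not our configuration: the Origin class, the pick_origin and both_down_probability functions, the IP addresses (taken from the reserved documentation range), and the 1% downtime figure are all made up for the example. The probability calculation also assumes the two servers fail independently of each other, which is a simplification.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Origin:
    """A server that can answer requests (hypothetical structure)."""
    name: str
    address: str  # placeholder address, not a real server


def pick_origin(
    origins: Sequence[Origin],
    is_healthy: Callable[[Origin], bool],
) -> Optional[Origin]:
    """Return the first origin, in priority order, that passes its health check."""
    for origin in origins:
        if is_healthy(origin):
            return origin
    return None  # every origin failed its check


def both_down_probability(p_primary_down: float, p_backup_down: float) -> float:
    """Chance that both servers are down at once, ASSUMING independent failures."""
    return p_primary_down * p_backup_down


if __name__ == "__main__":
    primary = Origin("new-server", "203.0.113.10")
    backup = Origin("old-server", "203.0.113.20")

    # Simulate the new server being down: traffic should go to the backup.
    chosen = pick_origin([primary, backup], lambda o: o.name != "new-server")
    print("Routing traffic to:", chosen.name if chosen else "nothing available")

    # With an assumed 1% downtime per server, both being down is far rarer.
    print("Chance both are down:", both_down_probability(0.01, 0.01))
```

The list order is what makes the old server a backup rather than an equal partner: it is only chosen when the server ahead of it fails its health check.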

The backup-redirection mechanics have not been enabled yet, but will be in the coming week, once we’re satisfied that the new server setup is performing optimally.

We thank you all for your continued patience as we overcome these technical challenges. Your business is very important to us, and we continue to make investments to provide the fast, reliable, professional product you deserve.


API Call Usage & Statistics

Now that we’ve been “live” for about 50 days, with a sizeable user base that’s provided much better real-world usage data than we were able to simulate in testing, we’ve implemented a number of measures to regulate how the service is used.

This article may get slightly technical, so a brief summary before we start: if your API call usage gets too high, you’ll receive an e-mail warning to that effect.  An increase much beyond that level of usage will result in another e-mail, and the system will temporarily block you.  We’ll also be notified, and will work with you to figure out the reason for the high usage.  To date, we’ve had a handful of users who’ve hit the “warning” threshold, but in each case, we were able to bring them down to “safe” levels without too much trouble.

Premium users can view their recent API call usage and click-thru statistics via the “Your Stats” page, off the main menu.

So now the more technical parts:

The true “load” on the system is actually not how many calls we exchange with the eBay API, but how much RAM and CPU our server consumes to process a single call from a user.  This was a bit of a surprise to us, and necessitated some extensive code optimization in order to minimize disk access and server database queries.  We’re comfortable with the limits we have in place now, the server load has calmed down, and thankfully we’ve been able to assist the (very few) users who had overly high throughput.  We’re also very happy with the performance of the WP eBay Product Feeds plugin.  However, users who have modified the plugin, who are using a different CMS solution, or most importantly, who offer a feature where an end-user can search eBay through our service (which ends up getting exploited by bots), have in some cases been causing issues.  As mentioned, should this occur, we’ll work diligently with you to examine your use case and hopefully implement some code additions on your side to bring you below that critical threshold.

When viewing your Stats screen, there are some things to keep in mind.  Because we have a cache system in place (a 5-minute cache for Premium users, and a 30-minute cache for free users), identical calls received by our system within that cache window will be served the cached result stored here, without needing to query the eBay API.  These “cache hits” don’t count towards your API call usage.  Additionally, most error messages you get were caught by us before querying eBay, and they won’t count either.  Lastly, we have bot-filtering algorithms in place to cut down on the number of search-engine bots, or crawlers, that are viewing your feeds and causing API calls.  Together, these strategies dramatically reduce the number of API calls counted against you, so the figure you see may be quite a bit lower than you would expect.
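To make the cache behaviour concrete, here is a minimal Python sketch of a time-limited cache in which only cache misses are counted as API calls.  It is an illustration under stated assumptions, not our actual code: the FeedCache class, its method names, and the choice to key the cache by user and query are all hypothetical.  Only the two cache windows (5 minutes for Premium, 30 minutes for free) come from the description above.

```python
import time
from collections import Counter
from typing import Callable, Dict, Tuple

# Cache windows described in the post, in seconds.
CACHE_TTL_SECONDS = {"premium": 5 * 60, "free": 30 * 60}


class FeedCache:
    """Serve repeated identical queries from a cache; count only real API calls."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch    # stand-in for the call out to the eBay API
        self._clock = clock    # injectable so the behaviour can be tested
        self._entries: Dict[Tuple[str, str], Tuple[float, str]] = {}
        self.api_calls: Counter = Counter()  # per-user count of cache misses

    def get(self, user_id: str, plan: str, query: str) -> str:
        now = self._clock()
        key = (user_id, query)  # ASSUMPTION: cache entries are per user
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < CACHE_TTL_SECONDS[plan]:
            return entry[1]  # cache hit: served locally, not counted
        result = self._fetch(query)  # cache miss: this one is counted
        self.api_calls[user_id] += 1
        self._entries[key] = (now, result)
        return result


if __name__ == "__main__":
    cache = FeedCache(fetch=lambda q: "results for " + q)
    cache.get("alice", "premium", "vintage cameras")
    cache.get("alice", "premium", "vintage cameras")  # served from cache
    print("Counted API calls:", cache.api_calls["alice"])
```

In this sketch, a second identical request inside the window never reaches the fetch function, which is why it has no effect on the counted usage.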

(Incidentally, this experience in monitoring usage in real time has revealed what I believe is the primary reason eBay shut down the RSS service.  Without any ability to filter bots, regulate usage, or serve a cache, their RSS would have been hammered by literally tens of millions of queries per day.  The resources to handle that would have been quite formidable, and costly.)

The usage metric we’re looking at is the number of calls you’ve made to the API over the last thirty-minute period.  We check this every five minutes, in order to better catch a sudden massive spike.  The API call stats you see on your Stats screen will be five minutes or less “behind live”, so if you make an API call and then run over to check the stats screen, it may take up to five minutes for that call to be reflected.  The click stats are live.  The click stats also require you to have enabled Link Cloaking in your Profile, as that’s the only way we’re able to track your clicks.
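As a rough illustration of how a rolling thirty-minute count with “warning” and “block” levels can work, here is a short Python sketch.  It is not our actual implementation: the UsageMonitor class and its method names are hypothetical, and the two threshold numbers are invented purely for the example.  Only the thirty-minute window and the five-minute check interval come from the description above.  The sketch also assumes calls are recorded in time order.

```python
from collections import deque
from typing import Deque

WINDOW_SECONDS = 30 * 60         # thirty-minute usage window (from the post)
CHECK_INTERVAL_SECONDS = 5 * 60  # how often check() would be run (from the post)
WARN_THRESHOLD = 1000            # HYPOTHETICAL value, for illustration only
BLOCK_THRESHOLD = 2000           # HYPOTHETICAL value, for illustration only


class UsageMonitor:
    """Track one user's counted API calls over a rolling window."""

    def __init__(self, warn: int = WARN_THRESHOLD, block: int = BLOCK_THRESHOLD) -> None:
        self.warn = warn
        self.block = block
        self._calls: Deque[float] = deque()  # timestamps, oldest first

    def record_call(self, timestamp: float) -> None:
        """Record a counted API call (assumed to arrive in time order)."""
        self._calls.append(timestamp)

    def check(self, now: float) -> str:
        """Drop calls older than the window, then classify current usage."""
        cutoff = now - WINDOW_SECONDS
        while self._calls and self._calls[0] <= cutoff:
            self._calls.popleft()
        count = len(self._calls)
        if count >= self.block:
            return "block"  # second e-mail plus a temporary block
        if count >= self.warn:
            return "warn"   # first e-mail warning
        return "ok"


if __name__ == "__main__":
    monitor = UsageMonitor(warn=3, block=5)  # tiny thresholds for the demo
    for second in (0, 10, 20):
        monitor.record_call(second)
    print("Status at the first check:", monitor.check(CHECK_INTERVAL_SECONDS))
```

Because old calls fall out of the window on every check, a user who slows down drops back to “ok” on their own within thirty minutes in this sketch.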