How I survived going viral on a $5 Linode
Tuesday the 26th of September started a bit differently than most mornings for me. I was very excited: I had been up all weekend working on my newest project, Reddit Grid, and it was about to go live! I booted up Google Analytics and started sharing it in a few Facebook groups and subreddits to gather feedback and add the features that were requested.
The initial feedback was that the app was great, but a few core features were missing, which I quickly added. Around 13:00 CEST I had finished enough of the core features and went over to Hacker News to show off my baby in the “Show HN” category.
Right away I could see people heading over by watching the traffic sources in Google Analytics: the count went from roughly 15 concurrent users to 35-50, bouncing up and down. I was very happy that people actually wanted to check out what I had made. I went to grab some food, came back around 14:30, and opened up my Analytics. I was shocked. I was looking at a number I was sure was an error, and refreshed the page. Nope, it was real. I was sitting at 165 concurrent users on my app at that very moment.
I went down to the sources to find out where all of this traffic was coming from, and sure enough it was Hacker News. I quickly went over to check how my post had been doing to generate such traffic, and there it was: number 8 on the front page. And not only that, it had more than 30 comments at the time, from people letting me know what features they wanted, what they liked, and what they didn’t. As of writing this article it’s at 93 comments and still on the second page of Hacker News, so you can check it out here!
So how much traffic did it actually generate?
Over the past 24 hours it generated roughly 50 GB of traffic: about 21 GB of incoming requests, and roughly 29 GB of assets and API responses served.
In just one day it managed to use 2% of my Linode’s traffic quota.
I served roughly 41,000 pageviews to a total of 8,700 unique users throughout the day, a lot of them returning, which makes me all warm and bubbly inside. And thankfully my server did not bend under the pressure; in fact it was quite alright throughout the panic hours.
How did my $5 Linode handle the traffic?
When working with web apps, you usually hear stories about people going viral on Reddit or Hacker News, and 30 minutes later their app is down because their server can’t handle it.
Reddit Grid is hosted on a $5 Linode and never surpassed 35% CPU usage throughout the 7 hours of 200 concurrent users, and it all came down to being prepared.
Beyond the initial page load, any additional content is fetched from the API, which just returns small JSON arrays with information about the images it needs to show.
When building the application I noticed a few things with the Reddit API:
- When requesting a subreddit through the API, you get an ‘after’ value back along with the posts. This value is used to grab the next page of posts.
- If a new post has made it to the front page of the subreddit, the ‘after’ value only changes on the very first call, not on the later calls where you pass in an ‘after’ value.
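The listing shape behind those two observations can be sketched in Python (the real API is Laravel/PHP; the sample payload below is a minimal, illustrative version of Reddit’s listing JSON, not real data):

```python
def parse_listing(listing):
    """Extract the posts and the 'after' pagination cursor from a
    Reddit listing payload (the shape returned by /r/<sub>.json)."""
    data = listing["data"]
    posts = [child["data"] for child in data["children"]]
    return posts, data["after"]

# Minimal illustrative payload with one post.
sample = {
    "data": {
        "children": [
            {"data": {"id": "abc", "url": "https://i.imgur.com/abc.jpg"}},
        ],
        "after": "t3_abc",
    }
}

posts, after = parse_listing(sample)
# The next page is then requested by passing after=t3_abc back to Reddit.
```

Because that cursor uniquely identifies a page, any response fetched with an ‘after’ value is effectively immutable.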
That meant that only the initial call to any given subreddit could change its value at any time, while any call after that would return the same result whether I made it now or tomorrow. Therefore I decided to implement some heavy caching.
The API is built with Laravel and really only has two endpoints as of now:
- One for fetching the autocomplete list for my search, which is based on my own database rather than Reddit’s API.
- One for fetching the posts of a subreddit, along with all of the images in each Imgur album, high-res sources from Gfycat, redditmedia, and plain Imgur images. That is actually a lot of calls to different services to produce one JSON list for my frontend.
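Fanning out to those services starts with dispatching on the media host of each post URL. A minimal Python sketch of that dispatch (the hostnames and categories are my assumptions about the kind of routing described, not the actual Reddit Grid code):

```python
from urllib.parse import urlparse

def media_kind(url):
    """Classify a post URL by host so the right resolver
    (Imgur album API, Gfycat lookup, direct image) can be called."""
    host = urlparse(url).netloc.lower()
    if host.endswith("imgur.com"):
        # Albums need a separate API call to enumerate their images.
        return "album" if "/a/" in url else "imgur"
    if host.endswith("gfycat.com"):
        return "gfycat"
    if host.endswith("redditmedia.com") or host.endswith("i.redd.it"):
        return "reddit"
    return "unknown"
```

Each kind then maps to its own upstream call, and the results are merged into the single JSON list the frontend consumes.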
The caching setup
As mentioned, the API is built using Laravel, and I used Memcached as the driver for my caching.
On the initial call to any given subreddit there is absolutely zero caching, as the content could have changed at any time; therefore the first call can take anywhere between 500 ms and 2,000 ms depending on the speed of the third-party services.
Every call after that knows which ‘after’ key is set, and that key, combined with some values from the request the frontend makes to the API, becomes the cache key. That means the first person to load a page might still see a 500-2,000 ms response time when fetching new images, but every visitor after them gets the response in roughly 40 ms instead.
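That two-tier behaviour (never cache the initial call, cache everything keyed on ‘after’ plus request parameters) can be sketched like this in Python, with a plain dict standing in for Memcached; the function names and TTL here are illustrative, not the actual Laravel code:

```python
import hashlib
import json
import time

CACHE = {}        # stand-in for Memcached: key -> (expiry, value)
TTL = 24 * 3600   # assumed day-long expiry for the sketch

def cache_key(subreddit, after, params):
    """Key combines the 'after' cursor with the frontend's request
    parameters, mirroring the scheme described above."""
    raw = json.dumps([subreddit, after, params], sort_keys=True)
    return hashlib.md5(raw.encode()).hexdigest()

def fetch_page(subreddit, after, params, fetch_fn):
    if after is None:
        # Initial call: never cached, the listing may change any time.
        return fetch_fn(subreddit, after, params)
    key = cache_key(subreddit, after, params)
    hit = CACHE.get(key)
    if hit and hit[0] > time.time():
        return hit[1]                          # the ~40 ms path
    result = fetch_fn(subreddit, after, params)  # the 500-2,000 ms path
    CACHE[key] = (time.time() + TTL, result)
    return result
```

Only one visitor per page ever pays the slow upstream round-trip; everyone else hits the cache.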
As for the autocomplete, everything is cached for an entire day on the very first request for any given query, since I can always clear that cache should I add more subreddits to the autocomplete. (By the way, if anyone has a complete list of subreddits with NSFW markers, hit me up, I want it!)
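The autocomplete caching is the same idea in miniature. A hedged Python sketch (the real endpoint uses Laravel's cache against my own database table; the names below are illustrative):

```python
import time

DAY = 24 * 3600

def autocomplete(query, cache, lookup, now=time.time):
    """Cache each distinct query for a day; adding new subreddits
    just means flushing this cache so fresh lookups repopulate it."""
    key = f"autocomplete:{query.strip().lower()}"
    entry = cache.get(key)
    if entry is None or entry[0] <= now():
        entry = (now() + DAY, lookup(query))  # hits my subreddit table
        cache[key] = entry
    return entry[1]
```

Repeated queries for the same term never touch the database more than once per day.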
It was a ton of fun to experience, and a little nerve-wracking. I was pretty much glued to my analytics screen, with Reddit and Hacker News open next to my code editor, where I was releasing new features as fast as I could.
I love all the feedback I got, and I have many more features in the pipeline that I look forward to presenting in the future!