What is SeaLion?
SeaLion is a Linux monitoring tool, built from the ground up for troubleshooting hundreds of servers.
Why did we create SeaLion?
We have another product, CloudMagic, a search-centric email app. We manage petabytes of indexed user data spread across hundreds of servers, all hosted on AWS. For a search-centric email app, search speed is critical, and we take pride in being known for blazing-fast search. When a search query takes longer than a threshold value, we get an alert and start debugging.
At this point, it’s important to determine whether the issue lies with the hosting provider or in our code. In spite of having a suite of monitoring tools, we always fall back to the output of traditional commands. What took time? We check the usual suspects: CPU, disk I/O, network, memory and so on. Where? We sift through the layers of database server, web server and application server.
We first zero in on the server we suspect and SSH into it. We look for the log files (you know how log rotation makes it tough to find the right file) and fish out the relevant one. Uncompress it. grep and sort to jump to the specific time. If everything looks fine, investigate the output of the next set of commands. Rinse and repeat for all commands till we find the issue. Ah! This is indeed cumbersome and time-consuming. Imagine doing it every time you get alerts for various other parameters.
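To make that manual routine concrete, here is a minimal Python sketch of the "uncompress, then jump to a specific time" step. It is only an illustration, not anything SeaLion does internally. The function names (open_log, lines_in_window) are hypothetical, and it assumes each log line starts with an ISO 8601 timestamp followed by a space, which your logs may not.

```python
import gzip
from datetime import datetime
from pathlib import Path
from typing import Iterable, Iterator


def open_log(path: Path):
    """Open a log file as text, transparently handling gzip-rotated files."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "rt", encoding="utf-8", errors="replace")


def lines_in_window(paths: Iterable[Path], start: datetime, end: datetime) -> Iterator[str]:
    """Yield lines whose leading timestamp falls within [start, end].

    Assumption (hypothetical log format): every relevant line begins with
    an ISO 8601 timestamp such as 2014-03-01T12:00:05, then a space.
    """
    for path in paths:
        with open_log(path) as handle:
            for line in handle:
                stamp = line.split(" ", 1)[0]
                try:
                    when = datetime.fromisoformat(stamp)
                except ValueError:
                    continue  # skip lines without a parseable timestamp
                if start <= when <= end:
                    yield line.rstrip("\n")
```

Even with a helper like this, you still have to SSH into the right box, find the right rotated files and repeat the exercise for every command and every server, which is exactly the part that does not scale.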
Does this remind you of your own debugging terminal?
And that’s why we built SeaLion. Troubleshooting across a large number of servers is broken.
With SeaLion, we now just open the browser, and the output of standard commands is available in a nice tabbed interface. We can jump to a specific period in time with a single click. You can add your own set of commands too. We can also compare the output of different servers side by side, to find out why one server is performing worse than another of the same type. A task that used to take several minutes now gets done in under a minute.
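For readers who like to see an idea in code, the sketch below shows one way the two core operations could work: looking up the raw output a command produced at a given moment, and diffing two servers at that moment. This is our illustration under stated assumptions, not SeaLion's actual implementation. SnapshotStore, capture and compare are hypothetical names, everything lives in memory, and the server names in the usage note are made up.

```python
import bisect
import difflib
import subprocess
from datetime import datetime


class SnapshotStore:
    """Keeps raw command output per (server, command), ordered by capture time."""

    def __init__(self):
        self._times = {}    # key -> sorted list of capture datetimes
        self._outputs = {}  # key -> outputs aligned index-by-index with _times

    def record(self, server, command, when, output):
        """Store one raw output, keeping the per-key lists sorted by time."""
        key = (server, command)
        times = self._times.setdefault(key, [])
        outputs = self._outputs.setdefault(key, [])
        index = bisect.bisect_right(times, when)
        times.insert(index, when)
        outputs.insert(index, output)

    def at(self, server, command, when):
        """Return the latest output captured at or before `when`, or None."""
        key = (server, command)
        times = self._times.get(key, [])
        index = bisect.bisect_right(times, when)
        if index == 0:
            return None
        return self._outputs[key][index - 1]


def capture(command):
    """Run a command (given as an argument list) and return its raw stdout."""
    return subprocess.run(command, capture_output=True, text=True, check=False).stdout


def compare(store, command, when, server_a, server_b):
    """Return a unified diff of two servers' output for the same moment."""
    a = (store.at(server_a, command, when) or "").splitlines()
    b = (store.at(server_b, command, when) or "").splitlines()
    return list(difflib.unified_diff(a, b, fromfile=server_a, tofile=server_b, lineterm=""))
```

A caller might run store.record("web-1", "uptime", datetime.now(), capture(["uptime"])) on a schedule, and later call compare(store, "uptime", some_time, "web-1", "web-2") to see where two servers of the same type diverge. Note that the output is stored and returned untouched, in keeping with the raw-output approach.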
I’d like to emphasize that we consciously decided not to process the output because, as mentioned earlier, we system admins fall back to the raw output of commands anyway. Our eyes are accustomed to looking at the output of commands, and we use our own judgement to fix things rather than trusting shiny charts. In fact, our experience says that processing command output masks critical information. Raw data FTW.
A picture is worth a thousand words. A video? Check out this 2-minute video.
What’s next for SeaLion?
Well, there are several things that we debug on servers. We’ll be adding many more features to ease the debugging process and in turn save time.
And now that we have built a centralized framework to record command output across many servers, the possibilities are endless. We will also explore ways to build alerting, monitoring and reporting.
Do give it a spin. We’ve made it really easy to get started. We’re open to ideas: tell us what you would like to see and we might just build it next. We are all ears!
(This post first appeared on SeaLion Blog.)