Best Practices in Problem Determination – Service Linux


Have you ever had customers screaming over the phone because the network or a system is down?

Whatever the situation, the reality is that there’s a problem and you need to solve it. What follows may look basic to an advanced Linux professional, but it’s a ready-to-use toolkit for any system or network administrator.

Here are the Linux problem determination tools you can use.

-> strace: The strace tool traces system calls, the special functions a process uses to interact with the operating system kernel. You can use it for many types of problems, especially those that relate to the operating system.

-> ltrace: The ltrace tool traces the library functions that a process calls. It is similar to strace, but library calls often give more detail about what the program is doing.

-> lsof: The lsof tool lists all of the open files on the operating system (OS), together with their process IDs and file descriptors. When a process opens a file, the OS returns a numeric file descriptor for the process to use.

-> top: This tool lists the “top” processes that are running on the system. By default it sorts them by the amount of CPU each process is currently consuming.

-> readelf: This tool can read and display information about various sections of an Executable and Linking Format (ELF) file.

-> traceroute/tcptraceroute: These tools can be used to trace a network route (or at least one direction of it).

-> ping: Ping simply checks whether a remote system can respond. Sometimes firewalls block the network packets ping uses, but it is still very useful.

-> GDB: This is a powerful debugger that can be used to investigate some of the more difficult problems.

-> tcpdump and/or Wireshark (formerly Ethereal): Used for network problems, these tools can capture and display the packets of network traffic.
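Before an incident hits, it helps to know which of these tools are actually installed on the box. Here is a minimal sketch, assuming Python 3 is available; the `find_tools` helper and the `TOOLS` list are hypothetical names made up for this illustration, and the exact binary names can differ between distributions.

```python
import shutil

# Binary names are assumptions based on the tools listed in this post;
# packages and names can vary between distributions.
TOOLS = [
    "strace", "ltrace", "lsof", "top", "readelf",
    "traceroute", "tcptraceroute", "ping", "gdb",
    "tcpdump", "wireshark",  # Ethereal was renamed Wireshark
]


def find_tools(names):
    """Map each tool name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(TOOLS).items():
        print(f"{name:15} {path or 'NOT INSTALLED'}")
```

Running it on a fresh server gives you a quick list of what to install before the phone starts ringing.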

You have the tools in your toolbox, so what’s next? A toolbox is just sitting on the bench doing nothing if you don’t understand how and when to use the tools in it. Let’s discuss best practices for solving problems quickly and effectively.

1. Use your knowledge of the stack and your acquired skills to do an initial investigation.

2. Google for an answer! Search the internet effectively, join a group or community, ask questions politely, and accept answers with an open mind.

3. If you’ve exhausted those options, start a deeper investigation.

a. Collect relevant information when the problem/incident happens.

b. Write it down in detail; be descriptive.

c. You have assumptions; now challenge them until they are proven or disproven. Try to create a reproducible test case.

d. Use the “process of elimination” to narrow down the scope of the problem/incident.
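One way to apply the process of elimination, once you have a reproducible test case, is to halve the list of suspects on each try. The sketch below is only an illustration under stated assumptions: the list of recent changes is invented, `first_bad` and `is_bad` are hypothetical names, and it assumes the problem stays present once the culprit change is applied (so at least the last state is bad).

```python
from typing import Callable, Sequence


def first_bad(candidates: Sequence[str], is_bad: Callable[[int], bool]) -> int:
    """Return the index of the first candidate that introduces the problem.

    is_bad(i) should run the reproducible test case with candidates[0..i]
    applied and return True if the problem shows up. Assumes is_bad is
    False before the culprit and True from the culprit onward.
    """
    if not candidates:
        raise ValueError("nothing to eliminate")
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid          # culprit is at mid or earlier
        else:
            lo = mid + 1      # culprit is after mid
    return lo


# Hypothetical list of recent changes, oldest first.
changes = ["kernel update", "firewall rule", "DNS change", "NIC driver", "cron job"]

# Stand-in for a real test: pretend the problem started with the DNS change.
culprit = first_bad(changes, lambda i: i >= 2)
print(changes[culprit])  # prints "DNS change"
```

With five suspects this takes two or three test runs instead of five, and the saving grows quickly as the list gets longer.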

4. Seek advice from a subject matter expert. In this phase we tend to let our ego get in the way, or sometimes we just keep silent. The key is to keep an open mind and be polite when delivering your rebuttal. The point is that we are testing our assumptions in order to solve the problem.

Let’s apply netiquette as described in RFC 1855 (RFC stands for Request for Comments): see http://www.faqs.org/rfcs/rfc1855.html for the full text. When people feel comfortable with how you communicate over the internet, you are more likely to get answers to your requests for help. Best practice also includes communicating clearly and concisely, using relevant facts, and being mindful of your readers.
