Top 20 High‑Frequency Ops Interview Questions with Expert Answers
This guide presents the most common operations interview questions—covering Linux mounting, filesystem issues, server performance, networking fundamentals, RAID, load balancing, and web server configuration—along with detailed, high‑scoring answers that showcase systematic thinking, troubleshooting logic, and production‑grade awareness.
In operations interviews, interviewers test systematic thinking, troubleshooting logic, production awareness, and communication, not just rote knowledge.
This article focuses on OS and networking fundamentals, offering high‑scoring approaches to the top 20 frequent questions to demonstrate engineering competence.
1. A mount point cannot be unmounted and the server cannot be rebooted. What do you do?
High‑scoring answer: First identify the process using the mount point with lsof +D /mount_point or fuser -v /mount_point. If it is a non‑critical process, terminate it safely; if it is a business‑critical process, coordinate with stakeholders before proceeding. Only in extreme cases consider umount -f, after evaluating data‑consistency risks.
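When lsof and fuser are unavailable, you can find the holding processes directly from /proc. A minimal sketch (Linux-only; the mount path here is a throwaway stand-in that the script creates and holds open itself so the scan has something to find):

```shell
#!/bin/sh
# Fallback: find PIDs holding files open under a mount point by scanning /proc.
# MNT is a stand-in path -- point it at the real stuck mount point in practice.
MNT="/tmp/demo_mnt.$$"
mkdir -p "$MNT"
exec 3>"$MNT/busy.log"   # hold a file open ourselves, simulating the busy process

holders=""
for fd in /proc/[0-9]*/fd/*; do
    target=$(readlink "$fd" 2>/dev/null) || continue
    case "$target" in
        "$MNT"/*) pid=${fd#/proc/}; pid=${pid%%/*}; holders="$holders $pid" ;;
    esac
done
echo "PIDs holding files under $MNT:$holders"

exec 3>&-                # release the descriptor; umount would now succeed
rm -rf "$MNT"
```

Once the PIDs are known, the same judgment call applies: kill safely, coordinate, or escalate.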
2. Writes to a filesystem fail. How do you handle it?
High‑scoring answer: Check two common resource bottlenecks: disk space with df -h and inode usage with df -i. If space appears sufficient but writes still fail, look for “deleted but still held” files using lsof | grep deleted; clean or expand storage accordingly.
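The two checks above can be scripted so neither is forgotten. A sketch (FS=/ keeps it runnable anywhere; point it at the failing filesystem, and the 90% thresholds are arbitrary examples):

```shell
#!/bin/sh
# When writes fail, check block usage AND inode usage before digging deeper.
FS=/
space=$(df -P  "$FS" | awk 'NR==2 {sub("%","",$5); print $5}')
inode=$(df -Pi "$FS" | awk 'NR==2 {sub("%","",$5); print $5}')
echo "blocks: ${space}% used, inodes: ${inode}% used"
if [ "$space" -ge 90 ] 2>/dev/null; then echo "WARN: low disk space on $FS"; fi
if [ "$inode" -ge 90 ] 2>/dev/null; then echo "WARN: inode table nearly full on $FS"; fi
# If both look healthy, hunt for space pinned by deleted-but-open files:
#   lsof 2>/dev/null | grep deleted
```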
3. The server is sluggish. How do you troubleshoot?
High‑scoring answer: Use a top‑down approach: run top or htop to view CPU, memory, and I/O wait. If load is high, determine whether it is CPU‑bound, memory‑bound, or I/O‑bound. Check database connections (e.g., MySQL SHOW FULL PROCESSLIST) for slow queries or locks, then correlate with historical monitoring data to identify spikes or anomalies.
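The first step can be done without even opening top, by reading the raw numbers top itself reports. A sketch straight from /proc (Linux-only):

```shell
#!/bin/sh
# First-pass snapshot: load averages, available memory, and cumulative iowait.
read load1 load5 load15 _ < /proc/loadavg
echo "load averages (1/5/15 min): $load1 $load5 $load15"

awk '/MemAvailable/ {printf "memory available: %d MiB\n", $2/1024}' /proc/meminfo

# The cpu line in /proc/stat is: user nice system idle iowait irq softirq ...
awk '/^cpu / {total=0; for (i=2; i<=NF; i++) total+=$i;
              printf "iowait share since boot: %.1f%%\n", 100*$6/total}' /proc/stat
```

A high load with a large iowait share points at disk or network storage, not CPU.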
4. Linux commands execute slowly. How do you resolve it?
High‑scoring answer: Examine overall system health: uptime for load, free -m for memory, iostat for disk I/O. If memory pressure is forcing the system to swap, every command slows down. Also inspect scheduled jobs (crontab -l) and rogue background processes. Distinguish between systemic issues and problems isolated to a single command.
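One quick way to separate "the system is slow" from "this command's first run was slow" is to time the same command twice: a big gap between runs suggests cold page cache or disk I/O, while two similar times point at the command or a loaded system. A sketch (assumes GNU date with %N nanosecond support; the target command is a harmless stand-in):

```shell
#!/bin/sh
# Time the same command twice to expose cold-cache cost.
target="ls /usr"   # stand-in for the slow command under investigation

t0=$(date +%s%N); $target >/dev/null; t1=$(date +%s%N)
first=$(( (t1 - t0) / 1000000 ))
t0=$(date +%s%N); $target >/dev/null; t1=$(date +%s%N)
second=$(( (t1 - t0) / 1000000 ))

echo "first run: ${first} ms, second run: ${second} ms"
```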
5. Explain the OSI seven‑layer model, TCP vs UDP, and Layer‑2 vs Layer‑3 switches.
High‑scoring answer: The OSI layers from bottom to top are Physical, Data Link, Network, Transport, Session, Presentation, Application. TCP is connection‑oriented, reliable, and ordered; UDP is connection‑less, low‑latency. Layer‑2 switches forward based on MAC addresses (Data Link layer); Layer‑3 switches add routing capabilities based on IP addresses (Network layer).
6. Describe TCP three‑way handshake and four‑way termination.
High‑scoring answer: The three‑way handshake synchronizes initial sequence numbers: client sends SYN, server replies SYN+ACK, client sends ACK. The four‑way termination occurs because TCP is full‑duplex; one side closes its send channel with FIN, the other acknowledges (ACK) and later sends its own FIN, which is then acknowledged, ensuring orderly release.
7. Why does four‑way termination include a TIME_WAIT state?
High‑scoring answer: TIME_WAIT ensures the final ACK reaches the peer. If that ACK is lost, the peer retransmits its FIN, and the side in TIME_WAIT (which lasts 2 MSL, twice the Maximum Segment Lifetime) can acknowledge it again, preventing premature closure. It also keeps delayed segments from the old connection from being delivered to a new connection that reuses the same address/port pair.
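You can see these states live on any Linux box with iproute2's ss. A one-liner that counts sockets per TCP state (a TIME_WAIT pile-up on a busy client or proxy is expected behaviour, not a leak):

```shell
#!/bin/sh
# Count sockets per TCP state; column 1 of `ss -tan` is the state name.
ss -tan | awk 'NR>1 {states[$1]++} END {for (s in states) print s, states[s]}'
```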
8. Why is the connection established with three handshakes but terminated with four?
High‑scoring answer: During connection setup, the server can combine its SYN+ACK response, reducing an exchange. During teardown, the side receiving FIN must ACK before it can safely send its own FIN, so the actions cannot be merged, requiring four steps.
9. Difference between soft links and hard links? How to create them?
High‑scoring answer: Hard links share the same inode as the original file; deleting the original does not affect the hard link, but they cannot cross filesystems or link directories. Soft links are separate files storing the target path, can cross partitions and point to directories, but become dangling if the target is removed. Create a hard link with ln source target and a soft link with ln -s source target.
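All of the behaviours above can be demonstrated in a throwaway directory:

```shell
#!/bin/sh
# Runnable demo of hard vs soft links.
dir=$(mktemp -d)
cd "$dir"
echo "hello" > original.txt

ln original.txt hard.txt        # hard link: same inode, same data blocks
ln -s original.txt soft.txt     # soft link: separate file storing the path

ls -i original.txt hard.txt     # identical inode numbers
rm original.txt
cat hard.txt                    # still works -- the data survives
cat soft.txt 2>/dev/null || echo "soft link is dangling"

cd / && rm -rf "$dir"
```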
10. What is RAID? Difference between software and hardware RAID? Common RAID principles?
High‑scoring answer: RAID combines multiple disks for performance or redundancy. Software RAID (e.g., mdadm) is OS‑based, low cost, but depends on host resources. Hardware RAID uses dedicated RAID cards with battery‑backed cache, offering stable performance for enterprise. RAID 0 stripes data for speed without redundancy; RAID 1 mirrors for redundancy (halves capacity); RAID 5 distributes parity, tolerating one disk failure; RAID 10 combines striping and mirroring, recommended for production.
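A quick way to make the trade-offs concrete in an interview is the capacity arithmetic. A sketch for N identical disks (the 4 x 4 TB figures are illustrative):

```shell
#!/bin/sh
# Usable capacity per RAID level, given N identical disks of SIZE TB.
N=4; SIZE=4

echo "RAID 0:  $(( N * SIZE )) TB usable, no redundancy"
echo "RAID 1:  $(( SIZE )) TB usable (full mirror), survives N-1 failures"
echo "RAID 5:  $(( (N - 1) * SIZE )) TB usable, survives 1 failure"
echo "RAID 10: $(( N / 2 * SIZE )) TB usable, survives 1 failure per mirror pair"
```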
11. Which command monitors Linux system resources?
High‑scoring answer: Prefer dstat, which aggregates vmstat, iostat, netstat, etc., showing CPU, memory, disk, and network in real time. If unavailable, combine top, iostat, and free as a fallback.
12. CentOS boot process?
High‑scoring answer: CentOS 6 uses SysV init: BIOS → MBR → GRUB → init → /etc/inittab → rc.sysinit → rcN.d. CentOS 7 switches to systemd: BIOS → GRUB2 → systemd → default.target → service units. Systemd enables parallel startup, dramatically reducing boot time.
13. How to modify the Apache homepage? Differences between Nginx and Apache?
High‑scoring answer: Apache’s homepage is defined by the DirectoryIndex directive; edit httpd.conf or the virtual‑host config to change it. Nginx uses an asynchronous, non‑blocking model with low resource usage and strong concurrency; Apache follows a synchronous, blocking model with a richer module ecosystem but higher resource consumption. Common practice: place Nginx as a reverse proxy and let Apache handle dynamic content.
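For the homepage change, a minimal config sketch (assuming a stock httpd layout; the path and the filename home.html are illustrative):

```apache
# httpd.conf or vhost snippet: try home.html first, then fall back to index.html
<Directory "/var/www/html">
    DirectoryIndex home.html index.html
</Directory>
```

Run apachectl configtest before reloading to catch syntax mistakes.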
14. Difference between proxy and load balancer?
High‑scoring answer: A proxy (forward or reverse) represents a client or server to hide the underlying resources. A load balancer distributes incoming requests across multiple servers to increase throughput and availability. Reverse proxies often incorporate load‑balancing functionality, but their primary role is representation, whereas load balancers focus on distribution.
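The overlap is easy to show in nginx, where one config both reverse-proxies and load-balances. A sketch (the upstream name and addresses are made up):

```nginx
# nginx as reverse proxy *and* load balancer
upstream app_pool {
    least_conn;                       # pick the backend with fewest active connections
    server 10.0.0.11:8080;
    server 10.0.0.12:8080 weight=2;   # capacity-weighted distribution
}

server {
    listen 80;
    location / {
        proxy_pass http://app_pool;   # clients only ever see the proxy
    }
}
```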
15. LVS vs Nginx for load balancing—what are the differences?
High‑scoring answer: LVS operates at Layer 4 (transport), forwarding IP packets with very high performance, suitable for massive traffic. Nginx works at Layer 7 (application), parsing HTTP, supporting URL routing, health checks, and static‑dynamic separation. LVS offers stability with complex configuration; Nginx is easier to configure but consumes more resources. Large architectures often stack LVS (first‑level) with Nginx (second‑level) for fine‑grained routing.
16. How does LVS work?
High‑scoring answer: LVS relies on the Linux kernel's IPVS module, which intercepts packets at netfilter's INPUT hook and dispatches them to backend servers according to a scheduling algorithm. It is managed with the ipvsadm command. Because forwarding happens entirely in kernel space, per‑packet overhead is very low.
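An illustrative ipvsadm configuration, shown here as a recipe rather than a runnable script (it needs root and the ip_vs module; all addresses are made up). It also previews the two questions that follow: -g selects DR mode and -s selects the scheduler:

```shell
# Define a virtual service on the VIP, using weighted least-connections
ipvsadm -A -t 192.168.10.100:80 -s wlc
# Attach real servers in Direct Routing mode (-g), with capacity weights
ipvsadm -a -t 192.168.10.100:80 -r 192.168.10.11:80 -g -w 2
ipvsadm -a -t 192.168.10.100:80 -r 192.168.10.12:80 -g -w 1
# Inspect the virtual server table
ipvsadm -Ln
```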
17. Common LVS working modes?
High‑scoring answer: In Direct Routing (DR) mode, LVS rewrites the MAC address and forwards packets directly; backends reply to the client, offering the highest efficiency. NAT mode performs DNAT and SNAT, suitable for small clusters. TUN mode encapsulates traffic via IP tunnels for cross‑datacenter deployments. DR is the most widely used in production.
18. Typical LVS scheduling algorithms?
High‑scoring answer: rr (round‑robin) and wrr (weighted round‑robin) for even or capacity‑based distribution; lc (least‑connections) and wlc (weighted least‑connections) for long‑lived connections; dh/sh (destination/source hash) for session persistence; lblc for locality‑based scheduling. Choose based on service characteristics—e.g., web services often use wlc, caching clusters may prefer dh.
19. Three Apache operation modes?
High‑scoring answer: prefork – multi‑process, single‑thread, stable but resource‑heavy; worker – multi‑process, multi‑thread, saves memory but requires thread‑safety; event – builds on worker, allowing threads waiting on I/O to handle new requests, ideal for high concurrency. Modern Apache defaults to event.
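Selecting and sizing the event MPM is a one-stanza config change. A sketch (module path and sizing values are illustrative, not tuned recommendations):

```apache
# Load the event MPM instead of prefork/worker
LoadModule mpm_event_module modules/mod_mpm_event.so

<IfModule mpm_event_module>
    StartServers            3
    ThreadsPerChild        25
    MaxRequestWorkers     400   # upper bound on simultaneously served requests
</IfModule>
```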
20. Why does Nginx consume few resources and handle high concurrency?
High‑scoring answer: Nginx uses an event‑driven asynchronous non‑blocking architecture. When a request waits on I/O, Nginx suspends it and registers a callback, freeing the worker process to serve other connections. This design enables a small number of workers to manage tens of thousands of simultaneous connections, minimizing context switches and memory usage.
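This architecture is reflected directly in nginx's core configuration, where a handful of workers each own thousands of connections. A sketch (the connection count is an example, not a recommendation):

```nginx
worker_processes auto;        # typically one worker per CPU core

events {
    use epoll;                # Linux event-notification backend
    worker_connections 10240; # connections per worker -- far beyond one-thread-per-connection designs
}
```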
Mastering these underlying principles is the first step to handling complex failures. True ops experts also need deep knowledge of service architecture and high‑availability design.
Xiao Liu Lab
An operations lab passionate about server tinkering 🔬 Sharing automation scripts, high-availability architecture, alert optimization, and incident reviews. Using technology to reduce overtime and experience to avoid major pitfalls. Follow me for easier, more reliable operations!
