Top 10 SRE Interview Questions with Expert Answers

This article presents ten essential SRE interview questions covering process priority, shell variable persistence, TTD/TTR metrics, system design for LinkedIn and Twitter, load‑balancing strategies, conflict handling, REST API usage, and log‑parsing code, each with detailed explanations and practical examples.

DevOps Coach

01. How to change a process’s priority?

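Use renice or the interactive top interface. Example: renice 5 -p 1234 sets the nice value of process 1234 to 5, which lowers its scheduling priority: a higher nice value means lower priority, and on Linux the range runs from -20 (highest priority) to 19 (lowest). Lowering a process's nice value, including setting a negative one, normally requires root privileges.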

In top, press r, enter the PID, and then enter the new nice value.
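
The same change can be made programmatically. Below is a minimal sketch in Python, assuming a Unix-like system. The helper name set_niceness is illustrative; os.setpriority and os.getpriority are standard-library calls. The demo raises the niceness of the current process by one step, which needs no special privileges.

```python
import os


def set_niceness(pid: int, niceness: int) -> int:
    """Set the nice value of `pid` (0 means the calling process).

    Returns the nice value read back from the kernel.
    """
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
    return os.getpriority(os.PRIO_PROCESS, pid)


if __name__ == "__main__":
    current = os.getpriority(os.PRIO_PROCESS, 0)
    # Raising niceness (lowering priority) is allowed for unprivileged users;
    # 19 is the maximum nice value on Linux.
    target = min(current + 1, 19)
    print(f"niceness: {current} -> {set_niceness(0, target)}")
```

Passing a value lower than the current one would typically raise PermissionError unless the script runs as root.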

02. How to keep a variable from a shell script after it exits?

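A script normally runs in a child shell process, so the variables it defines disappear when it exits. Run it with source script.sh (or . script.sh) so it executes in the current shell and its variables persist. Note that export only passes variables down to child processes, never back up to the parent.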
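
The difference can be observed directly. The sketch below assumes a POSIX sh is on the PATH; the script name set_var.sh and the variable DEMO_GREETING are made up for illustration. It writes a one-line script, then checks the variable after executing the script as a child process and after sourcing it.

```python
import os
import subprocess
import tempfile


def run_in_shell(command: str) -> str:
    """Run `command` in a fresh sh process and return its trimmed stdout."""
    result = subprocess.run(
        ["sh", "-c", command], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def compare_execution_modes():
    """Return (value after executing the script, value after sourcing it)."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "set_var.sh")  # hypothetical script
        with open(script, "w") as handle:
            handle.write("DEMO_GREETING=hello\n")
        # Child process: the assignment is lost when the child exits.
        executed = run_in_shell(f'sh "{script}"; echo "${{DEMO_GREETING:-unset}}"')
        # Sourced with ".": the assignment happens in the calling shell.
        sourced = run_in_shell(f'. "{script}"; echo "${{DEMO_GREETING:-unset}}"')
    return executed, sourced


if __name__ == "__main__":
    executed, sourced = compare_execution_modes()
    print(f"executed: {executed}")
    print(f"sourced:  {sourced}")
```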

03. Explain TTD, TTR and why they matter.

TTD (Time To Detect) is the time from a failure occurring to its detection. TTR (Time To Resolve) is the time from detection to full remediation. Lower TTD leads to faster response; lower TTR reduces downtime and improves reliability. Both are key SRE metrics.
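
As a worked example of these definitions, the sketch below computes both metrics from three incident timestamps. The function name incident_metrics and the sample times are hypothetical, not taken from a real incident.

```python
from datetime import datetime, timedelta


def incident_metrics(occurred: datetime, detected: datetime, resolved: datetime) -> dict:
    """Return TTD and TTR as timedeltas for a single incident."""
    if not occurred <= detected <= resolved:
        raise ValueError("expected occurred <= detected <= resolved")
    return {
        "ttd": detected - occurred,  # failure start -> detection
        "ttr": resolved - detected,  # detection -> full remediation
    }


if __name__ == "__main__":
    metrics = incident_metrics(
        occurred=datetime(2024, 1, 1, 10, 0),
        detected=datetime(2024, 1, 1, 10, 4),
        resolved=datetime(2024, 1, 1, 10, 34),
    )
    print(f"TTD: {metrics['ttd']}, TTR: {metrics['ttr']}")
```

Averaging these values across incidents gives the mean figures (MTTD, MTTR) that teams usually track over time.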

04. Design a system architecture for a LinkedIn profile page.

Key components: client apps, API Gateway, Profile Service (microservice), database (MySQL or DynamoDB), cache (Redis/Memcached), media service (object storage like S3), search index (Elasticsearch), connections service, feed service, load balancer, CDN. Non‑functional considerations include caching hot profiles, rate limiting, and security/privacy compliance.
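
Caching hot profiles is commonly done with a cache-aside pattern: read from the cache first, fall back to the database on a miss, then populate the cache. The sketch below is an assumption-laden illustration, not LinkedIn's implementation: ProfileCache is a hypothetical in-memory stand-in for Redis/Memcached, and load_profile stands in for a Profile Service database read.

```python
import time
from typing import Callable, Dict, Tuple


class ProfileCache:
    """Cache-aside wrapper with a per-entry TTL (in-memory stand-in for Redis)."""

    def __init__(
        self,
        load_profile: Callable[[str], dict],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._load = load_profile
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}  # id -> (expiry, profile)

    def get(self, user_id: str) -> dict:
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit
        profile = self._load(user_id)  # miss or expired: read the database
        self._entries[user_id] = (now + self._ttl, profile)
        return profile

    def invalidate(self, user_id: str) -> None:
        """Call after a profile update so readers do not see stale data."""
        self._entries.pop(user_id, None)


if __name__ == "__main__":
    db_reads = []

    def fake_db(user_id: str) -> dict:
        db_reads.append(user_id)
        return {"id": user_id, "headline": "SRE"}

    cache = ProfileCache(fake_db)
    cache.get("u1")
    cache.get("u1")
    print(f"database reads: {len(db_reads)}")
```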

05. Design a system architecture for Twitter.

Core components: client side, API Gateway, User Service, Tweet Service, Timeline Service, media service (S3), search service (Elasticsearch), caching layer (Redis/Memcached), load balancer, CDN. Data stores: MySQL for user data, Cassandra for tweets. Timeline generation can use push (pre‑compute) or pull (on‑demand) models; Twitter typically combines both. Include rate limiting, cross‑region replication, eventual consistency, and scalability for massive tweet volume.
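
The push/pull combination can be sketched as follows. This is a simplified illustration under stated assumptions, not Twitter's actual design: TimelineService, the celebrity_threshold parameter, and the in-memory dictionaries are all hypothetical. Tweets from ordinary accounts are pushed to follower inboxes at write time, while tweets from accounts with many followers are pulled at read time.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class TimelineService:
    def __init__(self, celebrity_threshold: int = 2):
        self.followers: Dict[str, Set[str]] = defaultdict(set)  # author -> followers
        self.following: Dict[str, Set[str]] = defaultdict(set)  # user -> authors
        self.tweets: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
        self.inbox: Dict[str, List[Tuple[int, str, str]]] = defaultdict(list)
        self.threshold = celebrity_threshold
        self._seq = 0  # global ordering stand-in for timestamps

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def _is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= self.threshold

    def post(self, author: str, text: str) -> None:
        self._seq += 1
        self.tweets[author].append((self._seq, text))
        if not self._is_celebrity(author):
            # Push model: fan out on write to each follower's inbox.
            for follower in self.followers[author]:
                self.inbox[follower].append((self._seq, author, text))

    def timeline(self, user: str, limit: int = 10) -> List[Tuple[str, str]]:
        # Start from the precomputed inbox, keyed by sequence to avoid duplicates.
        merged = {seq: (author, text) for seq, author, text in self.inbox[user]}
        # Pull model: fetch celebrity tweets on read.
        for author in self.following[user]:
            if self._is_celebrity(author):
                for seq, text in self.tweets[author]:
                    merged[seq] = (author, text)
        newest_first = sorted(merged, reverse=True)[:limit]
        return [merged[seq] for seq in newest_first]


if __name__ == "__main__":
    svc = TimelineService(celebrity_threshold=2)
    svc.follow("alice", "bob")
    svc.follow("alice", "star")
    svc.follow("carol", "star")
    svc.post("bob", "hi")
    svc.post("star", "news")
    print(svc.timeline("alice"))
```

One simplification to note: a user who follows an ordinary account after it has tweeted will not see the earlier tweets, since they were never pushed to that user's inbox.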

06. Common load‑balancing strategies or techniques?

Round Robin – simple sequential distribution.

Least Connections – routes to server with fewest active connections.

IP Hash – hashes the client IP so a given client consistently reaches the same server (session stickiness).

Weighted Round Robin – gives more traffic to higher‑capacity servers.

Random – selects a server at random.

Consistent Hashing – useful for distributed caches.

Health Checks – not a distribution algorithm on their own, but used alongside the strategies above to ensure traffic only goes to healthy instances.
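
Four of these strategies can be sketched in a few lines each. The function names and server labels below are illustrative, and a production load balancer would also handle health state, concurrency, and changes to the server pool.

```python
import hashlib
import itertools
from typing import Dict, Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """Yield servers in a fixed repeating order."""
    return itertools.cycle(servers)


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Repeat each server according to its weight, then cycle."""
    expanded = [server for server, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_connections(active: Dict[str, int]) -> str:
    """Pick the server with the fewest active connections."""
    return min(active, key=active.get)


def ip_hash(client_ip: str, servers: List[str]) -> str:
    """Map a client IP to a server using a stable hash."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


if __name__ == "__main__":
    rr = round_robin(["a", "b", "c"])
    print([next(rr) for _ in range(4)])
    wrr = weighted_round_robin({"a": 2, "b": 1})
    print([next(wrr) for _ in range(4)])
    print(least_connections({"a": 3, "b": 1, "c": 2}))
    print(ip_hash("203.0.113.7", ["a", "b", "c"]))
```

The simple modulo in ip_hash remaps most clients whenever the server list changes, which is the problem consistent hashing is designed to reduce.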

07. Describe a situation where you strongly opposed a proposal and how you handled it.

Example: the team wanted to skip unit testing to meet a deadline. I explained the risks (technical debt, bugs, future delays), cited a past incident where lack of tests caused production issues, and suggested a minimal set of critical‑path tests. The team agreed to implement those tests, preserving quality without missing the deadline.

08. How to fetch JSON from a REST API?

Use

curl -H "Accept: application/json" https://api.example.com/data

or Python’s requests library:

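import requests

# Set a timeout so the call cannot hang, and fail fast on HTTP error statuses.
response = requests.get('https://api.example.com/data', headers={'Accept': 'application/json'}, timeout=10)
response.raise_for_status()
data = response.json()  # Parse the JSON body into Python objects
print(data)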

09. Write code to parse a log file and count error types.

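# parse_logs.py
from collections import Counter

log_counts = Counter()  # dictionary subclass for tallying
with open('application.log', 'r') as file:
    for line in file:
        if 'ERROR' in line:
            # Take the first token after "ERROR" as the error type
            tokens = line.split('ERROR', 1)[1].split()
            # Guard against lines that end right after "ERROR"
            error_type = tokens[0] if tokens else 'UNKNOWN'
            log_counts[error_type] += 1

# Display the counts, most frequent first
for error, count in log_counts.most_common():
    print(f"{error}: {count}")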

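The script opens the log, scans each line for “ERROR”, takes the first whitespace-separated token after it as the error type, tallies occurrences in a dictionary, and prints the results. It assumes the error type immediately follows “ERROR” in each log line; other log formats would need a different extraction step.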
