Master Linux Server Administration: 250+ Real-World Interview Q&A
From Beginner to Most-Expert — AI-Oriented, Cloud & On-Prem, Business Problem-Solving Approach. Includes Hands-On Labs, Scenarios & Code Exercises.
250+ Questions | 5 Experience Levels | 30+ Lab Scenarios | 50+ Code Exercises
Beginner Level — Linux Server Administration
0–2 Years Experience
Q1 What is Linux and why is it preferred for server environments?
Business Perspective: Linux is an open-source, Unix-like operating system kernel first released by Linus Torvalds in 1991. Organizations choose Linux for servers because it offers zero licensing costs, unparalleled stability, robust security, and massive community support. For a startup running 50 servers, choosing Linux over Windows Server can save $40,000–$100,000+ annually in licensing alone.
Key Advantages:
Cost Efficiency: No per-core or per-user licensing fees (Red Hat offers paid support but CentOS Stream/AlmaLinux/Rocky Linux are free)
Stability: Well-configured Linux servers commonly run for months between reboots; availability targets such as 99.999% additionally require redundancy, not just a stable OS
Security: Open-source code means vulnerabilities are discovered and patched rapidly by the global community
Performance: Minimal overhead — a basic Linux server install uses ~512MB RAM vs 2–4GB for Windows Server
Automation-Friendly: Everything is scriptable via Bash, making DevOps and CI/CD seamless
Tags: Fundamentals, Business Case, Cost Analysis
Q2 How do you check the current Linux kernel version and distribution details?
Use multiple commands for comprehensive system identification:
# Kernel version
uname -r
# Output: 6.8.0-45-generic
# Full system info
uname -a
# Output: Linux hostname 6.8.0-45-generic #46-Ubuntu SMP x86_64 GNU/Linux
# Distribution details (works on most distros)
cat /etc/os-release
lsb_release -a
# For Red Hat based systems
cat /etc/redhat-release
# Kernel build details (version string, compiler, build date)
cat /proc/version
Interview Tip: Knowing /etc/os-release is crucial as it's the modern standard across all major distributions. In a business context, you need this to verify compliance with vendor support matrices.
Tags: Commands, System Info
Q3 Explain the Linux file system hierarchy. Why is understanding it critical for server administration?
Business Impact: Misplacing application files in wrong directories can break backup scripts, cause security audits to fail, and create operational chaos. The FHS (Filesystem Hierarchy Standard) ensures consistency.
/ # Root — everything starts here
/bin # Essential user binaries (ls, cp, mv)
/sbin # System binaries (fdisk, mount, iptables) — often needs root
/etc # Configuration files — THE most critical directory for admins
/var # Variable data — logs (/var/log), databases, email queues
/home # User home directories
/root # Root user's home
/tmp # Temporary files — cleared on reboot (often)
/usr # Distribution-installed programs, libraries, and shared read-only data (/usr/local holds locally built software)
/proc # Virtual filesystem — kernel & process info in real-time
/sys # Virtual filesystem — device & driver info
/dev # Device files
/boot # Boot loader files, kernel images
/opt # Optional/third-party software packages
/mnt & /media # Mount points
Real Scenario: A junior admin once stored application logs in /tmp. After a server reboot (routine patching), all logs were lost and the security team couldn't investigate an incident. Always use /var/log for persistent logs.
Tags: FHS, File System, Best Practice
Q4 How do you create, modify, and delete users from the command line?
# Create a new user with home directory
sudo useradd -m -s /bin/bash john_doe
# Set password
sudo passwd john_doe
# Create user with specific UID, group, and expiry
sudo useradd -m -u 1500 -g developers -e 2026-12-31 jane_doe
# Modify user — add to supplementary group
sudo usermod -aG docker,sudo john_doe
# Lock / unlock account
sudo usermod -L john_doe # Lock
sudo usermod -U john_doe # Unlock
# Delete user (keep home dir)
sudo userdel john_doe
# Delete user AND home directory
sudo userdel -r john_doe
# List all users
cat /etc/passwd
getent passwd
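Business Scenario: When offboarding an employee, disable the account immediately. Note that usermod -L only locks the password, so SSH key logins still work; also expire the account (usermod -L -e 1 john_doe, or chage -E 0 john_doe) and end running sessions (pkill -KILL -u john_doe). Then back up the home directory and delete the account after 30 days per HR policy. Automate this with a script that integrates with your HR system.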
Tags: User Management, Security, Onboarding/Offboarding
Q5 What are file permissions in Linux? Explain numeric and symbolic modes.
Core Concept: Every file has three permission sets — Owner (u), Group (g), Others (o). Each set: Read (r=4), Write (w=2), Execute (x=1).
# Symbolic mode
chmod u=rwx,g=rx,o= script.sh # Owner: rwx, Group: rx, Others: nothing ('=' sets exact bits; '+' only adds)
chmod g-w file.txt # Remove write from group
chmod a+x script.sh # Add execute for all (a = u+g+o)
# Numeric mode (most common in scripts)
chmod 755 script.sh # rwxr-xr-x (Owner:7, Group:5, Others:5)
chmod 644 file.txt # rw-r--r-- (Owner:6, Group:4, Others:4)
chmod 600 id_rsa # rw------- (SSH private key — CRITICAL)
chmod 777 dangerous # rwxrwxrwx — NEVER use on production servers!
# Common production patterns:
# Configuration files: 640 (owner read-write, group read)
# Executable scripts: 750 (owner full, group read-execute)
# Web content: 644 (world-readable, owner-writable)
# SSH keys: 600 (owner-only read-write)
Audit Impact: During a PCI-DSS audit, finding a file with 777 permissions containing sensitive data is an automatic finding. Use find / -perm /o=w -type f 2>/dev/null to locate world-writable files.
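The r=4, w=2, x=1 arithmetic can be checked with a few lines of shell. This is a minimal sketch; sym_to_octal is a hypothetical helper (not a standard command) that takes the nine permission characters shown by ls -l, without the leading file-type character, and ignores special bits such as setuid and sticky.

```bash
#!/usr/bin/env bash
# sym_to_octal: hypothetical helper that converts e.g. "rwxr-xr-x" to "755".
sym_to_octal() {
    local perms=$1 result="" start chunk digit
    for start in 1 4 7; do
        # Take the three characters for owner, group, then others
        chunk=$(printf '%s' "$perms" | cut -c "${start}-$((start + 2))")
        digit=0
        case $chunk in r??) digit=$((digit + 4)) ;; esac   # read
        case $chunk in ?w?) digit=$((digit + 2)) ;; esac   # write
        case $chunk in ??x) digit=$((digit + 1)) ;; esac   # execute
        result="$result$digit"
    done
    echo "$result"
}

sym_to_octal rwxr-xr-x   # 755
sym_to_octal rw-r-----   # 640
sym_to_octal rw-------   # 600
```

On a real file, stat -c '%a %A' FILE (GNU coreutils) prints both forms side by side.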
Tags: Permissions, Security Audit, chmod
Q6 How do you manage services using systemctl? Give examples for a web server.
# Start / Stop / Restart
sudo systemctl start nginx
sudo systemctl stop nginx
sudo systemctl restart nginx
sudo systemctl reload nginx # Graceful reload (no downtime)
# Enable on boot / Disable
sudo systemctl enable nginx
sudo systemctl disable nginx
# Status and logs
sudo systemctl status nginx
journalctl -u nginx -f # Follow logs in real-time
journalctl -u nginx --since "1 hour ago"
# List all services
systemctl list-units --type=service --state=running
# Mask (prevent service from being started)
sudo systemctl mask unwanted-service
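Business Scenario: During a production deployment at 3 AM, you need to reload the Nginx config without dropping connections. Use systemctl reload nginx: the master process starts new workers with the new configuration and lets the old workers finish their in-flight requests. If the new configuration is invalid, Nginx keeps running with the old one. Do not rely on that: always validate with nginx -t before reloading.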
Tags: systemd, Service Management, Nginx
Q7 How do you monitor disk usage and find large files consuming space?
# Overall disk usage (human-readable)
df -h
# Output: Filesystem Size Used Avail Use% Mounted on
# /dev/sda1 50G 38G 9.3G 81% /
# Directory usage summary
du -sh /var/*
du -h --max-depth=1 /home | sort -rh | head -20
# Find largest files (top 20)
find / -type f -size +100M -exec ls -lh {} \; 2>/dev/null | sort -k5 -rh | head -20
# Find files older than 90 days and larger than 1GB
find /var/log -type f -mtime +90 -size +1G
# Check inode usage (critical! — running out of inodes = can't create files)
df -i
Real Incident: A production server went down because /var/log filled up. The application couldn't write logs and crashed. Solution: Implement logrotate and set up monitoring alerts at 80% disk usage. Business cost: 2 hours of downtime = ~$20,000 for an e-commerce site.
Tags: Disk Management, Monitoring, Incident Response
Q8 What is the difference between a process and a service (daemon)?
Process: Any running instance of a program. Has a PID, consumes CPU/memory, can be foreground or background. Created when you run ls, vim, or any command.
Daemon (Service): A background process that runs continuously, usually started at boot. Examples: sshd, nginx, mysqld. Managed by systemd (or init). Daemons detach from the terminal, often run as specific users, and have restart policies.
# View processes
ps aux
top
htop
# View daemons/services
systemctl list-units --type=service
# Key difference: A daemon survives terminal closure; a foreground process dies
Tags: Processes, Daemons, Fundamentals
Q9 How do you install, update, and remove packages on Debian-based vs Red Hat-based systems?
Business Note: Always test upgrades in a staging environment first. A production apt upgrade that pulls a broken kernel can cause extended downtime. Use canary deployments — upgrade 10% of servers, monitor for 24 hours, then proceed.
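The everyday commands on the two families map onto each other almost one to one. The sketch below only prints the command for each case rather than running it; pkg_cmd and detect_pkg_manager are hypothetical helpers, and nginx is just an example package.

```bash
#!/usr/bin/env bash
# pkg_cmd: hypothetical helper that prints (does not run) the command
# for a given package manager and action.
pkg_cmd() {
    local manager=$1 action=$2 package=${3:-}
    case "$manager:$action" in
        apt:refresh) echo "sudo apt-get update" ;;               # refresh package index
        apt:install) echo "sudo apt-get install -y $package" ;;
        apt:upgrade) echo "sudo apt-get upgrade -y" ;;           # upgrade installed packages
        apt:remove)  echo "sudo apt-get remove -y $package" ;;   # keeps config files
        apt:purge)   echo "sudo apt-get purge -y $package" ;;    # removes config files too
        dnf:refresh) echo "sudo dnf makecache" ;;
        dnf:install) echo "sudo dnf install -y $package" ;;
        dnf:upgrade) echo "sudo dnf upgrade -y" ;;
        dnf:remove)  echo "sudo dnf remove -y $package" ;;
        *) echo "unsupported: $manager $action" >&2; return 1 ;;
    esac
}

# detect_pkg_manager: hypothetical helper that guesses the family of the current host.
detect_pkg_manager() {
    if command -v apt-get >/dev/null 2>&1; then echo apt
    elif command -v dnf >/dev/null 2>&1; then echo dnf
    else echo unknown
    fi
}

for manager in apt dnf; do
    for action in refresh install upgrade remove; do
        pkg_cmd "$manager" "$action" nginx
    done
done
```

Printing instead of executing keeps the example safe to run anywhere; on a real host you would run the printed command for the family that detect_pkg_manager reports.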
Tags: Package Management, apt, dnf
Q10 How do you check network connectivity and troubleshoot basic network issues?
# Check IP configuration
ip addr show
ifconfig -a # Legacy, but still used
# Test connectivity
ping -c 4 google.com
ping -c 4 8.8.8.8 # Test without DNS dependency
# DNS resolution
nslookup example.com
dig example.com
host example.com
# Trace route
traceroute google.com
mtr google.com # My favorite — combines ping + traceroute
# Check open ports
ss -tlnp # Listening TCP ports
ss -tunap # All connections
netstat -tlnp # Legacy alternative
# Check firewall
sudo iptables -L -n
sudo ufw status
# Download test
curl -I https://example.com
wget --spider https://example.com
Troubleshooting Flow: 1) Check IP config → 2) Ping gateway → 3) Ping external IP → 4) DNS resolution → 5) Check firewall → 6) Check application logs. This systematic approach saves hours vs random guessing.
Tags: Networking, Troubleshooting, Diagnostics
Q11 Explain the difference between absolute and relative paths with business context.
Absolute Path: Starts from root /. Always resolves to the same location regardless of current directory. Example: /etc/nginx/nginx.conf
Relative Path: Relative to current working directory. Example: ../logs/app.log (go up one level, then into logs).
Business Risk: Using relative paths in cron jobs or scripts run by different users can lead to catastrophic errors. A script that uses rm -rf ./temp/* run from the wrong directory could delete critical data. Always use absolute paths in production scripts and cron jobs.
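One way to enforce this in scripts is to refuse anything that is not an absolute path before deleting. A minimal sketch, assuming a hypothetical helper named clean_dir; the demo runs against a throwaway directory created with mktemp.

```bash
#!/usr/bin/env bash
# clean_dir: hypothetical helper that empties a directory only when it is
# given as an absolute path, and never operates on "/".
clean_dir() {
    local target=$1
    case "$target" in
        /)  echo "refusing to clean /" >&2; return 1 ;;
        /*) ;;   # absolute path: continue
        *)  echo "refusing relative path: $target" >&2; return 1 ;;
    esac
    [ -d "$target" ] || { echo "not a directory: $target" >&2; return 1; }
    # -mindepth 1 keeps the directory itself and removes only its contents
    find "$target" -mindepth 1 -delete
}

demo_dir=$(mktemp -d)
touch "$demo_dir/old.tmp"
clean_dir "$demo_dir" && echo "cleaned $demo_dir"
clean_dir ./temp || echo "relative path rejected"
```

In cron jobs the same idea applies to every path in the command line, including the script itself and its log file.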
Tags: Paths, Scripting Safety, Best Practice
Q12 How do you use grep, awk, and sed for log analysis? Give a real-world example.
# grep: Count 500 responses in the nginx access log (the closing quote of the request anchors the match to the status field)
grep -c '" 500 ' /var/log/nginx/access.log
# awk: Extract IPs with most requests (top 10)
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
# sed: Replace IP addresses for anonymization before sharing logs
sed 's/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/[REDACTED]/g' access.log
# Combined: Find top 5 URLs returning 500 errors (field 7 is the request path in the default combined format)
grep '" 500 ' access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -5
Business Value: During an incident, quickly identifying which endpoint is failing helps the dev team focus their fix. This skill saved my team 45 minutes during a critical outage.
Tags: Log Analysis, grep, awk, sed
Q13 What are soft links (symlinks) and hard links? When would you use each in production?
# Soft link (symbolic link) — pointer to file path
ln -s /opt/app/releases/v2.5 /opt/app/current
# If original is deleted, symlink breaks (dangling link)
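# Hard link: an additional name for the same inode (both names must be on the same filesystem)
ln /data/important.db /data/important.db.link
# Both names point to the same data; delete one and the data persists via the other.
# A hard link is not a backup: an overwrite or corruption shows up under both names.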
Production Use: Symlinks are used for zero-downtime deployments. Deploy new code to /opt/app/releases/v2.6, then update symlink /opt/app/current → v2.6. Nginx points to /opt/app/current. Rollback is instant — just point symlink back to v2.5.
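The switch and rollback can be tried safely in a scratch directory. This is a sketch under assumptions: switch_release is a hypothetical helper, and the releases/ and current layout mirrors the /opt/app example above inside a mktemp directory.

```bash
#!/usr/bin/env bash
# Throwaway directory standing in for /opt/app
demo_root=$(mktemp -d)
mkdir -p "$demo_root/releases/v2.5" "$demo_root/releases/v2.6"

# switch_release: hypothetical helper that repoints "current" at a release.
switch_release() {
    local root=$1 version=$2
    [ -d "$root/releases/$version" ] || { echo "no such release: $version" >&2; return 1; }
    # -s symbolic, -f replace an existing link, -n treat the old link as a file
    # instead of following it into the directory it points to
    ln -sfn "$root/releases/$version" "$root/current"
}

switch_release "$demo_root" v2.5
switch_release "$demo_root" v2.6      # deploy
readlink "$demo_root/current"
switch_release "$demo_root" v2.5      # rollback
readlink "$demo_root/current"
```

ln -sfn removes the old link and then creates the new one, so there is a very short window with no link. Where that matters, a common alternative is to create the new link under a temporary name and rename it over the old one with mv -T (GNU coreutils).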
Tags: Symlinks, Deployment, Zero-Downtime
Q14 How do you schedule tasks with cron? Share a business example.
# Crontab format: MIN HOUR DOM MON DOW COMMAND
# Edit crontab
crontab -e
# Examples:
# Daily database backup at 2 AM
0 2 * * * /usr/local/bin/backup-db.sh >> /var/log/backup.log 2>&1
# Every 5 minutes — health check
*/5 * * * * /opt/scripts/health-check.sh
# Every Monday at 3 AM — log rotation
0 3 * * 1 /usr/sbin/logrotate /etc/logrotate.conf
# List cron jobs
crontab -l
# System-wide cron
cat /etc/crontab
ls /etc/cron.d/
ls /etc/cron.daily/
Business Scenario: An e-commerce company runs nightly cron jobs to generate sales reports. If the cron fails silently (no error logging), the finance team misses data. Always redirect output to a log file and set up cron monitoring (e.g., Cronitor, Healthchecks.io) to alert on failures.
Tags: Cron, Automation, Scheduling
Q15 How do you check memory usage and identify processes consuming the most RAM?
# Overall memory
free -h
# Output: total used free shared buff/cache available
# Top processes by memory
ps aux --sort=-%mem | head -15
top -o %MEM # Interactive, press 'M' to sort by memory
# System-wide memory details
cat /proc/meminfo
# Per-process view sorted by PSS, which splits shared memory proportionally (requires the smem package)
smem -rs pss
# Check for memory leaks
watch -n 2 'ps aux --sort=-%mem | head -10'
Incident Example: A Java application had a memory leak — %MEM grew from 15% to 85% over 4 days. Using ps aux --sort=-%mem identified the PID, and pmap -x <PID> showed heap growth. The dev team fixed the leak, but the immediate fix was a nightly restart via cron until the patch deployed.
Tags: Memory, Performance, Troubleshooting
Q16 What is the purpose of /etc/fstab and how do you configure auto-mount at boot?
Business Risk: An incorrect fstab entry can prevent the system from booting. Always test with mount -a before rebooting. Use nofail option for non-critical mounts so the system boots even if that mount fails.
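For reference, an fstab entry has six whitespace-separated fields: device, mount point, filesystem type, options, dump flag, and fsck pass. The sketch below writes two sample entries to a temporary file and lints the field count; the UUID, the NFS server name, and the check_fstab_fields helper are all illustrative assumptions. fstab(5) allows the last two fields to be omitted, so the lint is stricter than the format itself.

```bash
#!/usr/bin/env bash
# Sample entries only; the UUID and NFS server name are placeholders.
sample_fstab=$(mktemp)
cat > "$sample_fstab" <<'EOF'
# <device>                                 <mount>   <type> <options>         <dump> <pass>
UUID=0a1b2c3d-0000-4000-8000-000000000001  /data     ext4   defaults,nofail   0      2
nfs01.internal:/exports/share              /mnt/nfs  nfs    defaults,_netdev  0      0
EOF

# check_fstab_fields: hypothetical lint that flags non-comment lines
# which do not spell out all six fields.
check_fstab_fields() {
    awk '!/^[ \t]*(#|$)/ && NF != 6 {
             printf "line %d: expected 6 fields, got %d\n", NR, NF
             bad = 1
         }
         END { exit bad }' "$1"
}

check_fstab_fields "$sample_fstab" && echo "field count OK"
```

On a real system, follow an edit with sudo findmnt --verify and sudo mount -a before rebooting; both report problems while you still have a working shell.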
Tags: fstab, Mount, Boot
Q17 How do you redirect output and errors in shell scripts?
# Standard output to file (overwrite)
command > file.txt
# Append
command >> file.txt
# Standard error to file
command 2> error.log
# Both stdout and stderr to same file
command > all_output.log 2>&1
# Discard output
command > /dev/null 2>&1
# Separate files for stdout and stderr
command > output.log 2> error.log
Tags: Shell, Redirection, Scripting
Q18 Explain the Linux boot process step-by-step.
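1. BIOS/UEFI: Firmware runs POST and selects the boot device.
2. Boot loader (GRUB2): Loads the kernel image and initramfs into memory.
3. Kernel: Initializes hardware and unpacks the initramfs as a temporary root filesystem.
4. initramfs: Loads the drivers needed to reach the real root filesystem (storage, LVM, RAID, encryption), mounts it, and switches to it.
5. systemd (PID 1): First userspace process on the real root; mounts the remaining filesystems and starts services.
6. Target: The system reaches multi-user.target or graphical.target (the systemd equivalents of runlevels 3 and 5).
Troubleshooting: If a server won't boot, boot from live media, mount the root filesystem, chroot into it, and check /var/log/boot.log and the journal of the failed boot (journalctl --list-boots, then journalctl -b <offset>; this requires persistent journaling).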
Tags: Boot Process, Troubleshooting, GRUB
Q19 How do you use tar to create and extract archives? Include compression options.
# Create tar.gz (gzip compressed)
tar -czvf archive.tar.gz /path/to/directory
# Create tar.bz2 (bzip2 — better compression, slower)
tar -cjvf archive.tar.bz2 /path/to/directory
# Create tar.xz (xz — best compression)
tar -cJvf archive.tar.xz /path/to/directory
# Extract
tar -xzvf archive.tar.gz
tar -xjvf archive.tar.bz2
tar -xJvf archive.tar.xz
# List contents without extracting
tar -tzvf archive.tar.gz
# Extract to specific directory
tar -xzvf archive.tar.gz -C /target/path/
Business Use: Pre-deployment backups. Before deploying, tar -czvf /backup/pre_deploy_$(date +%Y%m%d_%H%M%S).tar.gz /opt/app. This creates a timestamped backup for instant rollback.
Tags: tar, Compression, Backup
Q20 What is SSH and how do you configure key-based authentication?
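Security: Disable password authentication entirely (PasswordAuthentication no in /etc/ssh/sshd_config); a single weak password is enough for a breach. Key-based authentication removes password guessing as an attack vector, and fail2ban bans the sources that keep trying anyway.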
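A minimal sketch of the setup follows. The client-side commands appear as comments because they need a real server (user@server is a placeholder); the server-side part is a hypothetical helper, check_sshd_config, that reports which hardening settings are missing from a config file. It matches lines exactly, so it is a teaching aid rather than a full sshd_config parser.

```bash
#!/usr/bin/env bash
# Client side (run on your workstation):
#   ssh-keygen -t ed25519 -C "you@example.com"   # generate a key pair
#   ssh-copy-id user@server                      # append the public key to ~/.ssh/authorized_keys
#
# Server side: after editing /etc/ssh/sshd_config, validate with `sudo sshd -t`
# and reload the SSH service. Keep an existing session open while testing.

# check_sshd_config: hypothetical helper that lists missing hardening settings.
check_sshd_config() {
    local file=$1 missing=0 setting
    for setting in \
        "PubkeyAuthentication yes" \
        "PasswordAuthentication no" \
        "PermitRootLogin prohibit-password"
    do
        grep -qxF "$setting" "$file" || { echo "missing: $setting"; missing=1; }
    done
    return "$missing"
}

# Demo against a throwaway file instead of the real /etc/ssh/sshd_config
demo_cfg=$(mktemp)
printf '%s\n' "PubkeyAuthentication yes" "PasswordAuthentication yes" > "$demo_cfg"
check_sshd_config "$demo_cfg" || echo "config needs hardening"
```

For the effective configuration on a live server, sudo sshd -T prints every setting after includes and defaults are applied.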
Tags: SSH, Security, Authentication
Q21 How do you find and kill a process? Explain signals.
# Find process
ps aux | grep nginx
pgrep nginx
pidof nginx
# Kill by PID
kill 1234 # SIGTERM (graceful, default)
kill -15 1234 # Same as above
kill -9 1234 # SIGKILL (force — last resort!)
kill -HUP 1234 # SIGHUP (reload config)
# Kill by name
pkill nginx
killall nginx
# Kill all processes by user
pkill -u username
Best Practice: Always try SIGTERM first — it lets the process clean up (close files, finish transactions). SIGKILL is like pulling the power cord; use only when SIGTERM fails. For databases, SIGKILL can corrupt data.
Tags: Process Management, Signals, kill
Q22 Explain the difference between apt, apt-get, and aptitude.
apt: Modern, user-friendly front end (Ubuntu 16.04+). Combines the most-used apt-get and apt-cache commands, with progress bars and colored output. Best for interactive use.
apt-get: Lower-level CLI whose output format is kept consistent across versions. Best for scripts.
aptitude: Full-featured package manager with an ncurses text interface and advanced dependency resolution. Useful for complex dependency conflicts.
Scripting Rule: Always use apt-get in scripts. apt warns: "WARNING: apt does not have a stable CLI interface. Use with caution in scripts."
Tags: Package Management, apt, Scripting
Q23 How do you configure and use sudo? What is the sudoers file?
# Edit sudoers safely (ALWAYS use visudo)
sudo visudo
# Grant user full sudo access
john_doe ALL=(ALL:ALL) ALL
# Grant group sudo access
%developers ALL=(ALL:ALL) ALL
# Passwordless sudo for specific command
%devops ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart nginx
# View your sudo privileges
sudo -l
Security: Never edit /etc/sudoers directly — a syntax error can lock all users out of sudo. Always use visudo which validates syntax before saving. Grant least privilege — only the commands needed.
Tags: sudo, Security, Privilege Management
Q24 How do you manage environment variables? Differentiate between session, user, and system-wide.
# Session-level (current shell only)
export APP_ENV=production
echo $APP_ENV
# User-level (~/.bashrc, ~/.bash_profile, ~/.profile)
echo 'export JAVA_HOME=/usr/lib/jvm/java-17' >> ~/.bashrc
source ~/.bashrc
# System-wide (/etc/environment, /etc/profile.d/)
echo 'APP_ENV=production' | sudo tee -a /etc/environment
# For systemd services
# In service file: Environment="APP_ENV=production"
# Or use EnvironmentFile=/etc/app/config.env
Business Scenario: A production incident occurred when a developer hardcoded API keys in code. The fix: store secrets in environment variables loaded from a secure vault (HashiCorp Vault) at service startup. Never hardcode credentials.
Tags: Environment Variables, Configuration, Security
🧪 Beginner Hands-On Lab Scenario
Situation: You're a junior admin. The production web server's disk is 92% full. The senior admin is on vacation. You need to free up space immediately without breaking anything. Your Task: 1) Identify what's consuming space using du -sh /* 2>/dev/null | sort -rh | head -10. 2) Find that /var/log/nginx/access.log is 28GB. 3) Don't delete — truncate it: sudo truncate -s 0 /var/log/nginx/access.log. 4) Set up logrotate to prevent recurrence. 5) Document the incident for the team. Business Impact: You prevented a potential outage that could have cost $15,000/hour in lost sales.
💻 Code Exercise — Beginner
Write a script that checks disk usage of / and sends an email alert if usage exceeds 80%.
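#!/bin/bash
# Alert when the root filesystem is above THRESHOLD percent (requires a configured mail command, e.g. mailx)
THRESHOLD=80
# -P forces POSIX single-line output so a long device name cannot wrap onto a second line
USAGE=$(df -P / | awk 'NR==2 {gsub("%", "", $5); print $5}')
if [ "$USAGE" -gt "$THRESHOLD" ]; then
    echo "Disk usage is at ${USAGE}% on $(hostname)" | mail -s "Disk Alert" admin@company.com
fi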
Q25 What are runlevels/targets in systemd? How do you switch between them?
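In systemd, targets take the place of SysV runlevels: a target groups the units that define a system state, and compatibility aliases map the old runlevel numbers onto targets. The sketch below encodes that mapping in a hypothetical helper, runlevel_to_target, and lists the usual switching commands as comments because they need a systemd host and, for the last two, root.

```bash
#!/usr/bin/env bash
# runlevel_to_target: hypothetical helper mapping a SysV runlevel
# to the systemd target that replaces it.
runlevel_to_target() {
    case "$1" in
        0)     echo poweroff.target ;;
        1)     echo rescue.target ;;
        2|3|4) echo multi-user.target ;;
        5)     echo graphical.target ;;
        6)     echo reboot.target ;;
        *)     echo "unknown runlevel: $1" >&2; return 1 ;;
    esac
}

for level in 0 1 3 5 6; do
    echo "runlevel $level -> $(runlevel_to_target "$level")"
done

# Common commands on a systemd host:
#   systemctl get-default                          # target reached at boot
#   sudo systemctl set-default multi-user.target   # change the boot target
#   sudo systemctl isolate rescue.target           # switch now, stopping units not in the target
```

isolate takes effect immediately and stops everything that is not part of the new target, so on a remote server it is worth checking that sshd belongs to the target you are switching to.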
Q27 Explain the difference between TCP and UDP. When is UDP preferred?
TCP: Connection-oriented, reliable and ordered delivery, flow and congestion control. Used for HTTP/1.1 and HTTP/2, SSH, databases. UDP: Connectionless, no delivery or ordering guarantee, lower per-packet overhead and latency. Used for DNS, VoIP, gaming, live video, and QUIC (HTTP/3). Business: UDP is preferred when a late packet is worse than a lost one. In a video call, retransmitting a lost frame only adds delay, so the application skips it and keeps playback smooth.
Tags: Networking, TCP/UDP, Protocols
Q28 How do you create and manage LVM (Logical Volume Manager) volumes?
Business Value: LVM allows extending volumes without unmounting — critical for databases that can't go offline. Saves hours of planned downtime.
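The typical workflow is: create a physical volume, group it into a volume group, carve out a logical volume, and later extend it online. The sketch below prints the commands instead of running them (DRY_RUN=1 by default), since the real thing needs root and a spare disk; /dev/sdb, vg_data, lv_db, and the sizes are placeholders, and run and lvm_demo are hypothetical helpers.

```bash
#!/usr/bin/env bash
# DRY_RUN=1 (default) only prints each command; set DRY_RUN=0 as root on a test machine to execute.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

lvm_demo() {
    run pvcreate /dev/sdb                        # mark the disk as an LVM physical volume
    run vgcreate vg_data /dev/sdb                # create a volume group on it
    run lvcreate -n lv_db -L 50G vg_data         # carve out a 50G logical volume
    run mkfs.ext4 /dev/vg_data/lv_db             # put a filesystem on it
    # Later, when the volume fills up: grow the LV and resize the filesystem in one step
    run lvextend -r -L +20G /dev/vg_data/lv_db
    run pvs
    run vgs
    run lvs
}

lvm_demo
```

The -r flag of lvextend resizes the filesystem together with the logical volume, which is what lets ext4 and XFS volumes grow while mounted.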
Tags: LVM, Storage, Volume Management
Q29 What is the purpose of /etc/hosts file? How does it relate to DNS?
/etc/hosts provides local hostname-to-IP mapping. With the default "hosts: files dns" order in /etc/nsswitch.conf it is consulted before DNS. Used for local development, overriding DNS, or blocking domains (point to 127.0.0.1). Format: 192.168.1.100 app.internal myapp. In production, use it sparingly: DNS should remain the single source of truth.
Tags: DNS, hosts, Networking
Q30 How do you use rsync for efficient file synchronization?
# Local sync
rsync -avz /source/ /destination/
# Remote sync (push)
rsync -avz /local/dir/ user@remote:/remote/dir/
# Remote sync (pull)
rsync -avz user@remote:/remote/dir/ /local/dir/
# Delete files at dest that don't exist at source
rsync -avz --delete /source/ /destination/
# Dry run (test first!)
rsync -avz --dry-run /source/ /destination/
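Business: rsync uses a delta-transfer algorithm: when an older copy of a file already exists at the destination, only the changed blocks are sent. For a 10GB database dump where only 50MB changed, rsync transfers roughly 50MB plus checksum overhead, whereas scp would transfer all 10GB, about 200x more. (For local-to-local copies rsync defaults to whole-file transfer.)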
Tags: rsync, Sync, Backup
Q31 How do you check CPU information and load average?
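Load average counts processes that are running, waiting for CPU, or (on Linux) blocked in uninterruptible I/O. As a rule of thumb, a sustained load above the number of CPU cores means work is queuing. For a 4-core server, a load of 4.0 means the cores are fully busy; a load of 8.0 means about half the runnable work is waiting. Check it with uptime or cat /proc/loadavg, and count cores with nproc or lscpu.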
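Dividing the load by the core count gives a number that is comparable across machines. A minimal sketch, assuming a hypothetical helper named load_per_core; the live reading is guarded so the script also runs on systems without /proc/loadavg or nproc.

```bash
#!/usr/bin/env bash
# load_per_core: hypothetical helper that divides a load average by a core count.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Live reading on a Linux host: 1-minute load from /proc/loadavg, cores from nproc
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r load1 _ < /proc/loadavg
    echo "1-min load per core: $(load_per_core "$load1" "$(nproc)")"
fi

load_per_core 4.0 4   # 1.00: cores fully busy
load_per_core 8.0 4   # 2.00: as much work waiting as running
```

Values that stay above 1.00 across the 5- and 15-minute averages are the ones worth alerting on; a short spike in the 1-minute figure is usually harmless.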
Tags: CPU, Load, Monitoring
Q32 What is a shell? Compare bash, zsh, and sh.
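sh: The POSIX shell interface; minimal features, highly portable. On modern Linux, /bin/sh is usually dash (Debian/Ubuntu) or bash running in POSIX mode (RHEL family). bash: Bourne Again Shell, the default on most Linux distributions. Rich features: arrays, command history, [[ ]] tests. zsh: A separate Bourne-style shell (not a bash extension) with stronger autocompletion, theming (oh-my-zsh), and spell correction. Popular for dev workstations. For scripts: Use #!/bin/bash when you need bash features, or #!/bin/sh for maximum portability across Unix systems, in which case avoid bash-only syntax.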
Tags: Shell, bash, Scripting
Q33 How do you configure NTP for time synchronization?
# Using timedatectl (modern)
sudo timedatectl set-ntp true
timedatectl status
# Using chrony
sudo apt install chrony
sudo systemctl enable --now chrony # The unit is named chronyd on RHEL-based systems
chronyc sources -v
Business Critical: Time sync is essential for distributed systems, database replication, and security (Kerberos requires <5 min skew). Log timestamps must be accurate for forensic analysis.
Tags: NTP, Time Sync, chrony
Q34 How do you use journalctl to query systemd logs?
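journalctl filters the systemd journal by unit, priority, boot, and time range. The sketch below wraps one common query in a hypothetical helper, unit_errors, and lists other frequently used invocations as comments; all flags shown are standard journalctl options, while the unit name nginx is only an example.

```bash
#!/usr/bin/env bash
# unit_errors: hypothetical wrapper showing error-level (and worse) messages
# for one unit since a given time.
unit_errors() {
    local unit=$1 since=${2:-1 hour ago}
    journalctl -u "$unit" -p err --since "$since" --no-pager
}

# Usage on a systemd host:
#   unit_errors nginx
#   unit_errors sshd "2 days ago"
#
# Other common queries:
#   journalctl -b                        # current boot
#   journalctl -b -1                     # previous boot (needs a persistent journal)
#   journalctl -k                        # kernel messages only
#   journalctl -u nginx -f               # follow a unit in real time
#   journalctl -u nginx -o json-pretty   # structured output for tooling
#   journalctl --disk-usage              # space used by the journal
#   sudo journalctl --vacuum-time=14d    # drop entries older than 14 days
```

The journal survives reboots only when it is stored under /var/log/journal, which typically means Storage=persistent in /etc/systemd/journald.conf or simply creating that directory.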
Q35 Explain file descriptors (stdin, stdout, stderr) with examples.
FD 0 (stdin): Input stream. FD 1 (stdout): Normal output. FD 2 (stderr): Error output. Redirection: command 1>out.txt 2>err.txt for separate files, or command >all.txt 2>&1 for one combined file (Bash also accepts the shorthand command &>all.txt). Understanding FDs is crucial for debugging cron jobs and pipeline scripts.
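A short experiment makes the two output streams visible. This is a sketch with a hypothetical function, emit, that writes one line to each stream; the temporary files come from mktemp.

```bash
#!/usr/bin/env bash
# emit: hypothetical function writing one line to stdout and one to stderr.
emit() {
    echo "normal output"          # goes to FD 1
    echo "error output" >&2       # goes to FD 2
}

out_file=$(mktemp)
err_file=$(mktemp)

# Send each stream to its own file
emit >"$out_file" 2>"$err_file"
echo "stdout file: $(cat "$out_file")"
echo "stderr file: $(cat "$err_file")"

# Command substitution captures FD 1 only, so discard or merge FD 2 explicitly
only_stdout=$(emit 2>/dev/null)
merged=$(emit 2>&1)
echo "captured without stderr: $only_stdout"
echo "captured lines with 2>&1: $(printf '%s\n' "$merged" | wc -l)"
```

Order matters when merging: in command >file 2>&1 stderr follows stdout into the file, while command 2>&1 >file leaves stderr on the terminal.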
Tags: File Descriptors, I/O, Shell
Q36 How do you find the IP address of a server?
ip addr show
hostname -I
curl ifconfig.me # Public IP
ip route get 1.1.1.1 | awk '{for (i = 1; i < NF; i++) if ($i == "src") print $(i+1)}' # Source IP used for outbound traffic
Tags: IP, Networking, Commands
Q37 What is swap space? When should you use it?
Swap is disk space used as virtual memory when RAM is full. Modern recommendation: For servers with >16GB RAM, 2-4GB swap is sufficient. Swap is a safety net, not a performance solution. If a server is swapping heavily, add RAM or optimize the application. swapon --show to check.
Tags: Swap, Memory, Performance
Q38 How do you change the hostname of a Linux server?
sudo hostnamectl set-hostname new-name.company.com
# Also update /etc/hosts
echo "127.0.1.1 new-name.company.com new-name" | sudo tee -a /etc/hosts
Tags: Hostname, Configuration
Q39 How do you check which ports are listening on a server?
ss -tlnp # TCP listening
ss -ulnp # UDP listening
lsof -i :80 # What's using port 80?
netstat -tlnp # Legacy
Tags: Ports, Networking, ss
Q40 Explain the difference between a shell variable and an environment variable.
Shell variable: Only available in the current shell. Example: MY_VAR=hello
Environment variable: Passed to child processes. Example: export MY_VAR=hello
Use export to promote a shell variable to an environment variable. Child processes inherit environment variables but not shell variables.
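The difference is easy to observe by asking a child process what it sees. A minimal sketch; MY_VAR matches the example above, and sh -c stands in for any child process.

```bash
#!/usr/bin/env bash
unset MY_VAR

# Plain assignment: a shell variable, invisible to child processes
MY_VAR=hello
before_export=$(sh -c 'echo "${MY_VAR:-unset}"')
echo "child sees before export: $before_export"   # unset

# After export the same variable is part of the environment
export MY_VAR
after_export=$(sh -c 'echo "${MY_VAR:-unset}"')
echo "child sees after export: $after_export"     # hello

# One-off environment for a single command, without touching the current shell
one_off=$(OTHER_VAR=temp sh -c 'echo "$OTHER_VAR"')
echo "one-off value: $one_off, in parent: ${OTHER_VAR:-unset}"
```

This is also why a variable set in an interactive shell is missing inside a cron job or systemd service: those processes never inherited it.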
Tags: Variables, Shell, Environment
Q41 How do you use scp to securely copy files between servers?
scp file.txt user@remote:/path/
scp -r /local/dir user@remote:/remote/dir/
scp user@remote:/remote/file.txt /local/path/
# Use port 2222
scp -P 2222 file.txt user@remote:/path/
Tags: scp, File Transfer, SSH
Q42 What is /dev/null and what is it used for?
/dev/null is a special device file that discards all data written to it. Used to suppress output: command > /dev/null 2>&1. Also used as an empty input: command < /dev/null. Essential for clean cron job output.
Tags: /dev/null, I/O, Shell
Q43 How do you set up a basic NFS share?
# Server
sudo apt install nfs-kernel-server
echo "/data 192.168.1.0/24(rw,sync,no_subtree_check)" | sudo tee -a /etc/exports
sudo exportfs -a
# Client
sudo mount -t nfs server:/data /mnt/nfs
Tags: NFS, File Sharing, Network
Q44 Explain the purpose of the /proc filesystem.
/proc is a virtual filesystem exposing kernel and process information. /proc/cpuinfo, /proc/meminfo, /proc/loadavg, /proc/PID/ for per-process details. Not a real disk — it's a window into the kernel's data structures. Invaluable for performance analysis and debugging.
Tags: /proc, Kernel, Virtual FS
Q45 How do you use the find command to locate files by name, size, and modification time?
find / -name "*.log" -type f
find /var -size +100M
find /tmp -type f -mtime +7 -delete # Delete regular files older than 7 days
find / -user john_doe -type f
find . -name "*.conf" -exec grep -l "error" {} \;
Tags: find, Search, File Management
Q46 What is the difference between systemd and SysV init?
SysV init: Sequential boot, shell scripts in /etc/init.d/, slow, limited dependency management. systemd: Parallel boot, socket activation, cgroups integration, unified logging (journald), faster boot times. systemd is the default on most major distributions (Debian, Ubuntu, the RHEL family, SUSE, Fedora, Arch); exceptions include Alpine (OpenRC) and Devuan. Critics cite complexity, but it solves real enterprise problems.
Tags: systemd, init, Boot
Q47 How do you compress and decompress files using gzip, bzip2, and xz?
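All three tools share the same basic interface: -c writes to stdout, -d decompresses, and -1 to -9 trade speed for ratio. The sketch below compresses a generated sample file with each tool that is installed and verifies the round trip; the sample file and the work directory are throwaway test data.

```bash
#!/usr/bin/env bash
work=$(mktemp -d)
# Generate a compressible sample file
seq 1 1000 | sed 's/^/log line /' > "$work/sample.log"
original_sum=$(cksum < "$work/sample.log")

for tool in gzip bzip2 xz; do
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: not installed, skipping"
        continue
    fi
    # -c writes the compressed stream to stdout and leaves the original untouched
    "$tool" -c "$work/sample.log" > "$work/sample.$tool"
    # -dc decompresses to stdout; compare checksums to confirm the round trip
    restored_sum=$("$tool" -dc "$work/sample.$tool" | cksum)
    size=$(wc -c < "$work/sample.$tool")
    if [ "$restored_sum" = "$original_sum" ]; then status=ok; else status=MISMATCH; fi
    echo "$tool: $size bytes, round trip $status"
done

# In-place forms (each replaces the input file):
#   gzip file.log    ->  file.log.gz     gunzip file.log.gz
#   bzip2 file.log   ->  file.log.bz2    bunzip2 file.log.bz2
#   xz file.log      ->  file.log.xz     unxz file.log.xz
```

As a rough guide, gzip is the fastest, xz usually produces the smallest files at the highest CPU cost, and bzip2 sits in between.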
The shebang tells the kernel which interpreter to use when the script is executed directly: #!/bin/bash, #!/usr/bin/env python3, #!/bin/sh. Without it, the script is run by whatever shell the caller happens to be using, which may behave differently. Always include it for portability and clarity.
Tags: Shebang, Scripting, Best Practice
Q50 How do you create a systemd service file for a custom application?
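A unit file for a simple long-running application needs three sections: [Unit] for ordering, [Service] for how to run it, and [Install] for when to start it. The sketch below prints such a file from a hypothetical helper, render_unit; the name myapp, its user, and all paths are placeholders for your own application.

```bash
#!/usr/bin/env bash
# render_unit: hypothetical helper printing a minimal service unit.
render_unit() {
    cat <<'EOF'
[Unit]
Description=My custom application
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=myapp
WorkingDirectory=/opt/myapp
EnvironmentFile=-/etc/myapp/config.env
ExecStart=/opt/myapp/bin/myapp
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
}

render_unit

# Install and start (needs root):
#   render_unit | sudo tee /etc/systemd/system/myapp.service >/dev/null
#   sudo systemd-analyze verify /etc/systemd/system/myapp.service
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now myapp
```

The leading "-" in EnvironmentFile makes the file optional, and Restart=on-failure with RestartSec=5 gives the service a basic self-healing policy without restart loops on a clean exit.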
Q52 How do you monitor real-time system performance?
htop # Interactive process viewer
iotop # Disk I/O by process
iftop # Network bandwidth
nmon # All-in-one performance monitor
glances # Curses-based overview with an optional web UI (glances -w)
dstat # Versatile resource statistics
Tags: Monitoring, Performance, Tools
Q53 What are inodes and how do you check inode usage?
df -i
# Find directories with many small files
for dir in /*; do echo "$(find "$dir" -type f 2>/dev/null | wc -l) $dir"; done | sort -rn | head -10
Running out of inodes means you can't create new files even if disk space is available. Common culprit: session files in /tmp or cache directories with millions of tiny files.
Tags: Inodes, File System, Troubleshooting
Q54 How do you use the man command effectively?
man ls # Manual page
man -k keyword # Search (apropos)
man 5 crontab # Section 5 (file formats)
# Sections: 1=User commands, 5=File formats, 8=Admin commands
whatis ls # One-line description
Tags: man, Documentation, Help
Q55 What steps do you take when a user reports "the server is slow"?
Systematic approach: 1) Check load average (uptime). 2) Check memory (free -h). 3) Check disk I/O (iostat -x 1). 4) Check for swap usage. 5) Identify top CPU/memory consumers (top). 6) Check network (ping, iperf). 7) Review recent changes. 8) Check application logs for errors. This methodical approach impresses interviewers — it shows you don't jump to conclusions.
Tags: Troubleshooting, Performance, Methodology
Intermediate Level — Linux Server Administration
2–5 Years Experience
Q56 How do you troubleshoot a server that runs out of disk space overnight?
Business Context: This is the #1 overnight alert for production servers. A 2 AM disk-full alert means the on-call engineer must act fast.
# 1. Quick assessment
df -h
# 2. Find what grew recently
find / -type f -mtime -1 -size +100M -exec ls -lh {} \; 2>/dev/null
# 3. Check largest directories
du -sh /* 2>/dev/null | sort -rh | head -10
# 4. Common culprits:
# - /var/log (unrotated logs)
# - /tmp (session files, uploads)
# - /var/lib/docker (container images)
# - /home (user uploads)
# - Core dump files
find / -name "core.*" -type f -size +1G 2>/dev/null
Immediate Fix: Truncate logs (truncate -s 0), clean package cache (apt clean), remove old Docker images (docker system prune -a). Long-term: Set up logrotate, implement disk monitoring alerts at 75% and 85%, and create a runbook for the on-call team.
Tags: Disk Space, Incident Response, Troubleshooting
Q57 Explain how to configure and use iptables for a web server. Provide a production-ready ruleset.
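#!/bin/bash
# Baseline iptables ruleset for a web server (IPv4 only; mirror it with ip6tables).
# Run it from the console or with a rollback timer: a mistake here can cut off SSH.
iptables -F
# Allow loopback
iptables -A INPUT -i lo -j ACCEPT
# Allow established connections (conntrack replaces the deprecated state match)
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Allow SSH, rate limited: drop sources that open 4 or more new connections within 60 seconds
iptables -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW -m recent --name SSH --set
iptables -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW -m recent --name SSH --update --seconds 60 --hitcount 4 -j DROP
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
# Allow HTTP/HTTPS
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
# Set default policies only after the ACCEPT rules are in place
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT
# Save rules (path used by the iptables-persistent package on Debian/Ubuntu)
iptables-save > /etc/iptables/rules.v4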
Business Impact: A default-deny policy with only ports 22, 80, and 443 open removes most of the attack surface that automated scanners probe. Rate-limiting SSH slows brute-force attempts. For PCI-DSS compliance, you must document every open port with a business justification.
Tags: iptables, Firewall, Security, Production
Q58 How do you set up and manage a MySQL/MariaDB database on Linux?
# Install
sudo apt install mariadb-server
sudo mysql_secure_installation
# Create database and user
sudo mysql -e "CREATE DATABASE appdb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
sudo mysql -e "CREATE USER 'appuser'@'localhost' IDENTIFIED BY 'strong_password';"
sudo mysql -e "GRANT ALL PRIVILEGES ON appdb.* TO 'appuser'@'localhost';"
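# FLUSH PRIVILEGES is not needed after CREATE USER/GRANT; it is only required after editing the grant tables directly
# Backup (--single-transaction takes a consistent InnoDB snapshot without locking tables)
mysqldump -u appuser -p --single-transaction appdb | gzip > /backup/appdb_$(date +%Y%m%d).sql.gz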
# Restore
gunzip < backup.sql.gz | mysql -u appuser -p appdb
Performance Tuning: Adjust innodb_buffer_pool_size to 70-80% of available RAM for dedicated DB servers. Use mysqltuner for recommendations.
Tags: MySQL, Database, MariaDB, Backup
Q59 How do you configure SSL/TLS certificates with Let's Encrypt and automate renewal?
Business: Let's Encrypt saves $200-500/year per domain vs paid SSL. For a company with 50 domains, that's $10,000-$25,000 annual savings. The 90-day expiry with auto-renewal is actually a security feature — compromised certs expire quickly.
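The usual setup is one certbot run to obtain the certificate plus a scheduled certbot renew, with independent monitoring in case renewal silently stops. The sketch lists the certbot commands as comments (they need root and a publicly reachable domain; example.com is a placeholder) and adds a hypothetical monitoring helper, cert_expires_within, built on openssl's -checkend option.

```bash
#!/usr/bin/env bash
# Obtain and install a certificate for Nginx, then test renewal:
#   sudo certbot --nginx -d example.com -d www.example.com
#   sudo certbot renew --dry-run
# Most certbot packages install a systemd timer or cron entry that runs
# `certbot renew` automatically; check with: systemctl list-timers | grep certbot

# cert_expires_within: hypothetical helper that succeeds (exit 0) when the
# certificate file expires within the given number of days.
cert_expires_within() {
    local cert=$1 days=$2
    # openssl -checkend exits 0 if the certificate is still valid after N seconds
    ! openssl x509 -checkend "$((days * 86400))" -noout -in "$cert" >/dev/null
}

# Example use in a monitoring script (the path follows certbot's default layout):
#   if cert_expires_within /etc/letsencrypt/live/example.com/cert.pem 14; then
#       echo "certificate expires in under 14 days" | mail -s "Cert alert" admin@company.com
#   fi
```

Certbot attempts renewal once a certificate has less than 30 days left, so an alert threshold of around 14 days fires only when several automatic attempts have already failed.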
Tags: SSL/TLS, Let's Encrypt, Security, Automation
Q60 Explain load balancing concepts and how to configure HAProxy.
# /etc/haproxy/haproxy.cfg
frontend web_front
    bind *:80
    default_backend web_back

backend web_back
    balance roundrobin
    option httpchk GET /health
    server web1 192.168.1.10:80 check
    server web2 192.168.1.11:80 check
    server web3 192.168.1.12:80 check backup   # Used only when web1 and web2 are both down
Algorithms: Round-robin (equal distribution), leastconn (sends to server with fewest connections), source (session persistence by IP hash). Business: HAProxy enables horizontal scaling — add more servers as traffic grows without changing application code.
Tags: Load Balancing, HAProxy, High Availability
Q61 How do you use Ansible for configuration management? Share a playbook example.
Business: Ansible reduces configuration drift across hundreds of servers. A security patch that takes 5 minutes per server by hand takes roughly the same few minutes for a whole batch, because Ansible works on hosts in parallel (5 at a time by default, tunable with the forks setting). ROI is immediate for teams managing 10+ servers.
Tags: Ansible, Automation, Configuration Management
Q62 How do you perform a MySQL database migration with zero downtime?
Strategy: 1) Set up replication from old master to new master. 2) Let replication catch up. 3) Stop writes to old master (brief read-only mode). 4) Verify replication is fully synced. 5) Promote new master. 6) Update application connection strings. 7) Decommission old master after 48 hours of monitoring. Tools: pt-online-schema-change from Percona Toolkit (or gh-ost) applies schema changes without long table locks. For large tables (100M+ rows), an online schema-change tool is the safe approach; a plain ALTER TABLE can block writes for hours.
Tags: MySQL, Migration, Zero Downtime, Replication
Q63 Explain the concept of Linux namespaces and cgroups. How do they enable containerization?
Namespaces: Isolate what a process can SEE. PID namespace (isolated process tree), NET namespace (isolated network stack), MNT namespace (isolated filesystem mounts), UTS namespace (isolated hostname), IPC namespace, USER namespace. cgroups: Limit what a process can USE. CPU shares, memory limits, block I/O throttling, network priority. Together: Docker/LXC use namespaces for isolation and cgroups for resource control. Without these kernel features, containers as we know them wouldn't exist.
NamespacescgroupsContainersDocker
🧪 Intermediate Hands-On Lab Scenario
Situation: Your company's e-commerce site is experiencing intermittent 502 errors. Users complain orders are failing. The stack: Nginx → PHP-FPM → MySQL. You have 15 minutes to diagnose before the VP of Engineering escalates. Your Task: 1) Check Nginx error log: tail -f /var/log/nginx/error.log — see "upstream timed out". 2) Check PHP-FPM status: systemctl status php-fpm — service is running but slow. 3) Check PHP-FPM pool: ss -s shows many connections. 4) Check slow MySQL queries: SHOW FULL PROCESSLIST; — find a query taking 30+ seconds. 5) Kill the blocking query, increase PHP-FPM pm.max_children, and add index to the slow query's table. 6) Document the root cause for the post-mortem. Result: You resolved the issue in 12 minutes, saved ~$8,000 in potential lost orders, and earned the team's trust.
💻 Code Exercise — Intermediate
Write a Bash script that monitors a process and restarts it if it exceeds 80% CPU for 10 consecutive seconds.
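#!/bin/bash
# Watchdog: restart a service when its CPU usage stays above THRESHOLD for 10 samples in a row.
PROCESS_NAME="myapp"
THRESHOLD=80
COUNT=0
while true; do
    # Highest %CPU among matching processes; 0 when none is running (avoids an empty-string test error).
    # Note: ps reports CPU averaged over the process lifetime, so this is an approximation.
    CPU=$(ps -C "$PROCESS_NAME" -o %cpu= 2>/dev/null | awk 'BEGIN {m=0} $1>m {m=$1} END {print int(m)}')
    if [ "$CPU" -gt "$THRESHOLD" ]; then
        COUNT=$((COUNT+1))
    else
        COUNT=0
    fi
    if [ "$COUNT" -ge 10 ]; then
        echo "$(date): Restarting $PROCESS_NAME (CPU: ${CPU}%)" >> /var/log/watchdog.log
        systemctl restart "$PROCESS_NAME"
        COUNT=0
    fi
    sleep 1
done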
Q64 How do you configure logrotate for application logs?
Q69 How do you perform a kernel upgrade without rebooting?
Use Ksplice (Oracle), KernelCare (CloudLinux), kpatch (Red Hat), or Livepatch (Canonical/Ubuntu Pro). These apply security patches to the running kernel without rebooting; moving to a new kernel version still requires a reboot. Business: For servers requiring 99.999% uptime, live patching avoids most of the 4-12 planned reboots per year, each requiring maintenance windows and potential service disruption.
Bridge: Default, isolated network with NAT. Containers communicate via docker0 bridge. Host: Container shares host's network stack directly. Best performance, no isolation. Overlay: Multi-host networking for Swarm/Kubernetes. Uses VXLAN tunneling. Macvlan: Container gets its own MAC address, appears as physical device on network. Used when containers need direct LAN access.
DockerNetworkingContainers
Q71 How do you monitor server health with Prometheus and Grafana?
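# Install node_exporter (release assets are versioned; set VERSION to the current release)
VERSION=1.8.2
wget https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/node_exporter-${VERSION}.linux-amd64.tar.gz
tar xzf node_exporter-${VERSION}.linux-amd64.tar.gz
# The binary is unpacked into a versioned subdirectory
sudo mv node_exporter-${VERSION}.linux-amd64/node_exporter /usr/local/bin/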
# Create systemd service, enable, and add to Prometheus targets
# In Grafana, import dashboard ID 1860 (Node Exporter Full)
PrometheusGrafanaMonitoring
Q72 What is SELinux and how do you troubleshoot it?
# Check status
getenforce
# Temporarily set to permissive (logs but doesn't block)
setenforce 0
# Check audit log for denials
ausearch -m avc -ts recent
sealert -a /var/log/audit/audit.log
# Create policy to allow
audit2allow -a -M mypol
semodule -i mypol.pp
Business: SELinux provides mandatory access control — even if an attacker compromises the web server, SELinux can prevent them from accessing /etc/shadow or spawning a reverse shell. Never disable SELinux in production; troubleshoot and create proper policies.
WireGuard is faster and simpler than OpenVPN — 4,000 lines of code vs 100,000+. Kernel-integrated, lower latency, better battery life for mobile clients.
WireGuardVPNSecurity
Q74 Explain the difference between GitOps and traditional CI/CD deployment.
Traditional CI/CD: CI server pushes changes to servers. The CI server has credentials to production. GitOps: Git is the single source of truth. An agent (ArgoCD/Flux) running in the cluster pulls changes from Git and reconciles. No external push access needed — more secure. Rollback = git revert. Business: GitOps reduces deployment-related security incidents by 60% according to industry surveys.
GitOpsCI/CDDevOps
Q75 How do you analyze and optimize slow MySQL queries?
# Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 2;
# Analyze with pt-query-digest
pt-query-digest /var/log/mysql/slow.log
# Use EXPLAIN
EXPLAIN SELECT * FROM orders WHERE customer_id = 12345;
# Add index
CREATE INDEX idx_customer_id ON orders(customer_id);
MySQLPerformanceQuery Optimization
Q76 How do you set up Redis for caching and session storage?
Q77 Explain the CAP theorem and its implications for distributed systems.
Consistency: All nodes see the same data at the same time. Availability: Every request receives a response. Partition Tolerance: System continues despite network partitions. The "pick 2 of 3" slogan is shorthand: network partitions cannot be ruled out, so the real choice during a partition is between consistency (CP) and availability (AP). Business: Choosing CP (e.g., PostgreSQL with synchronous replication) means the system may be unavailable during a network split. Choosing AP (e.g., Cassandra) means you might serve stale data. The choice depends on whether your business tolerates downtime or stale data.
CAP TheoremDistributed SystemsArchitecture
Q78 How do you use strace to debug a running process?
strace -p <PID> -f -e trace=file,network -o /tmp/strace.log
# Find why a process is hanging
strace -p <PID> -e trace=read,write
# Count system calls
strace -c command
straceDebuggingSystem Calls
Q79 How do you configure centralized logging with the ELK stack?
Elasticsearch: Stores and indexes logs. Logstash: Processes and transforms logs. Kibana: Visualizes and searches logs. Filebeat: Lightweight agent on each server that ships logs. Business: Centralized logging is essential for security compliance (SOC2, PCI-DSS) and enables cross-server correlation during incident investigations.
ELKLoggingObservability
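Q80 How do you manage swappiness and kernel parameters via sysctl?
# /etc/sysctl.conf (comments go on their own lines; only lines starting with # or ; are treated as comments)
# Lower values make the kernel prefer dropping page cache over swapping out process memory (default 60)
vm.swappiness=10
# Max % of memory holding dirty pages before writers are forced to flush
vm.dirty_ratio=15
# Max connections backlog
net.core.somaxconn=1024
# Max open files
fs.file-max=65535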
# Apply
sudo sysctl -p
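After applying settings it helps to confirm what the kernel is actually using. This sketch reads values straight from /proc/sys; the helper names sysctl_path and check_sysctl are invented here, and the simple dot-to-slash mapping assumes keys without dots inside a component (it would mis-handle interface names such as eth0.100).

```bash
# Map a sysctl key to its /proc/sys path (dots become slashes).
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Compare the running value of a key with the expected value.
check_sysctl() {
    key=$1
    want=$2
    path=$(sysctl_path "$key")
    if [ ! -r "$path" ]; then
        echo "SKIP $key (not readable)"
        return 0
    fi
    have=$(cat "$path")
    if [ "$have" = "$want" ]; then
        echo "OK   $key=$have"
    else
        echo "DIFF $key: have $have, want $want"
    fi
}

check_sysctl vm.swappiness 10
check_sysctl net.core.somaxconn 1024
```

A DIFF line after sysctl -p usually means a later file under /etc/sysctl.d/ overrides the value, or the key was misspelled and silently skipped.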
sysctlKernel TuningPerformance
Q81 How do you set up a PostgreSQL streaming replication?
Q82 What is the OOM killer and how do you protect critical processes?
# Check OOM score
cat /proc/<PID>/oom_score
# Protect critical process (lower = less likely to be killed)
echo -1000 | sudo tee /proc/<PID>/oom_score_adj
# Or in systemd service:
[Service]
OOMScoreAdjust=-500
OOMMemorysystemd
Q83 How do you perform a security audit of a Linux server?
# Lynis — comprehensive security audit
sudo apt install lynis
sudo lynis audit system
# Check for world-writable files
find / -perm /o=w -type f 2>/dev/null
# Check for files with SUID bit
find / -perm /4000 -type f 2>/dev/null
# Check listening ports
ss -tlnp
# Check for rootkits
sudo apt install rkhunter chkrootkit
sudo rkhunter --check
Security AuditLynisCompliance
Q84 How do you configure SAMBA for file sharing with Windows?
Q85 Explain the use of ulimit and how to set resource limits.
ulimit -n 65535 # Max open files
ulimit -u 4096 # Max user processes
# Permanent: /etc/security/limits.conf
myapp soft nofile 65535
myapp hard nofile 65535
ulimitResource LimitsPerformance
Q86 How do you use Git hooks for automated deployment?
ip link add link eth0 name eth0.100 type vlan id 100
ip addr add 192.168.100.1/24 dev eth0.100
ip link set eth0.100 up
VLANNetworkingSegmentation
Q88 What is the difference between SNAT and DNAT in iptables?
SNAT (Source NAT): Changes source IP of outgoing packets. Used for internet access from private networks. DNAT (Destination NAT): Changes destination IP of incoming packets. Used for port forwarding to internal servers. MASQUERADE: Dynamic SNAT for interfaces with changing IPs (DHCP).
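The three rule types look like this in practice. The sketch prints the commands by default and only executes them when APPLY=1 is set (as root); all addresses and the interface name are placeholder assumptions: 10.0.0.0/24 as the private LAN, 203.0.113.5 as the public IP, 10.0.0.10 as an internal web server, eth0 as the internet-facing interface.

```bash
# Dry-run wrapper: print the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

nat_rules() {
    # SNAT: rewrite the source of outbound LAN traffic to a fixed public IP.
    run iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o eth0 -j SNAT --to-source 203.0.113.5
    # MASQUERADE: same idea when the public IP is dynamic (use instead of SNAT).
    run iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o eth0 -j MASQUERADE
    # DNAT: forward inbound port 8080 to an internal web server on port 80.
    run iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 8080 -j DNAT --to-destination 10.0.0.10:80
}

nat_rules
```

SNAT and MASQUERADE live in POSTROUTING because the routing decision has already been made; DNAT lives in PREROUTING so the new destination can influence routing. Forwarding between interfaces also requires net.ipv4.ip_forward=1.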
NATiptablesNetworking
Q89 How do you use nc (netcat) for network troubleshooting?
# Test port connectivity
nc -zv server.example.com 3306
# Simple chat server
nc -l 1234
# File transfer
# Receiver: nc -l 1234 > file.txt
# Sender: nc server 1234 < file.txt
netcatNetworkTroubleshooting
Q90 How do you set up email alerts with Postfix for system monitoring?
sudo apt install postfix mailutils
# Configure as "Internet Site"
echo "Test alert from $(hostname)" | mail -s "Alert" admin@company.com
# Use in cron: command || echo "Failed" | mail -s "Cron Error" admin@company.com
PostfixEmailMonitoring
Q91 What is Btrfs and how does it compare to ext4 and ZFS?
ext4: Stable, mature, no checksumming, no snapshots. Best for general use. Btrfs: Copy-on-write, snapshots, compression, checksumming. Good for workstations and backup servers. Some stability concerns with RAID5/6. ZFS: Enterprise-grade, best data integrity, built-in RAID, deduplication, snapshots. Higher memory requirements. Best for critical data storage.
FilesystemBtrfsZFSext4
Q92 How do you deploy a web application using Docker Compose?
perf record -p <PID> -g -- sleep 30
perf report
perf top # Real-time
perfProfilingPerformance
Q95 Explain Blue-Green deployment strategy.
Blue: Current production environment. Green: New version, fully tested but not live. Switch traffic from Blue to Green via load balancer. Instant rollback by switching back. Requires double the infrastructure but enables zero-downtime deployments. Business: Reduces deployment risk — if the new version has issues, rollback takes seconds, not hours.
Blue-GreenDeploymentDevOps
Q96 How do you configure a mail server with Postfix + Dovecot?
# Postfix for SMTP, Dovecot for IMAP/POP3
sudo apt install postfix dovecot-imapd dovecot-pop3d
# Configure virtual domains, SSL certificates, authentication
# Business: Full email server setup for small business — saves $5-15/user/month vs Google Workspace
Mail ServerPostfixDovecot
Q97 How do you use iotop and iostat to diagnose disk performance?
iostat -x 1 # Detailed disk stats
iotop -o # Processes doing I/O
# Look for high await (average time per I/O request, queueing included) and %util near 100%
I/OPerformanceDiagnostics
Q98 What is the difference between active and passive FTP?
Active FTP: Server connects back to client for data transfer. Problematic with firewalls/NAT. Passive FTP: Client initiates both control and data connections. Firewall-friendly. Modern recommendation: Avoid FTP entirely — use SFTP (SSH-based) or HTTPS for file transfers. FTP sends credentials in plaintext.
FTPSecurityNetworking
Q99 How do you integrate LDAP for centralized authentication?
Q101 How do you use ssh tunneling for secure access?
# Local port forwarding
ssh -L 3306:db.internal:3306 jump-server
# Remote port forwarding
ssh -R 8080:localhost:3000 remote-server
# Dynamic SOCKS proxy
ssh -D 1080 jump-server
SSHTunnelingSecurity
Q102 Explain the concept of immutable infrastructure.
Servers are never modified after deployment. Instead of patching, you build a new image and replace the server. Business Benefits: No configuration drift, predictable deployments, easier rollbacks, improved security posture. Tools: Packer + Terraform + Docker.
ImmutableInfrastructureDevOps
Q103 How do you use auditd for system call auditing?
# Monitor changes to /etc/passwd
auditctl -w /etc/passwd -p wa -k passwd_changes
# Search logs
ausearch -k passwd_changes
auditdSecurityCompliance
Q104 How do you configure a high-availability cluster with Corosync and Pacemaker?
Q111 How do you design a disaster recovery plan for a Linux-based infrastructure?
Business-First Approach: DR planning starts with RPO (Recovery Point Objective — how much data can you lose?) and RTO (Recovery Time Objective — how long can you be down?).
Implementation: 1) Multi-region database replication (async for cost, sync for zero data loss). 2) Automated failover with health checks. 3) Infrastructure as Code (Terraform) to recreate environment. 4) Regular DR testing (quarterly minimum). 5) Documented runbooks.
Cost-Benefit: A DR solution costing $50K/year is cheap if a single day of downtime costs $500K. Present this math to management to get budget approval.
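The cost-benefit argument reduces to one division. As a rough sketch using the figures above ($50K/year for DR, $500K per day of downtime), the function below computes how many hours of avoided downtime per year pay for the DR spend; the function name is hypothetical and the model ignores reputational and contractual costs.

```bash
# Hours of avoided downtime per year at which DR spending breaks even.
# Arguments: annual DR cost, cost of one full day of downtime.
dr_breakeven_hours() {
    awk -v c="$1" -v d="$2" 'BEGIN { printf "%.1f\n", c / (d / 24) }'
}

dr_breakeven_hours 50000 500000
```

With these inputs the break-even point is 2.4 hours: if DR prevents more than about two and a half hours of outage in a year, it has paid for itself.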
Disaster RecoveryRPO/RTOBusiness Continuity
Q112 How do you optimize Linux kernel parameters for a high-traffic web server handling 100K+ concurrent connections?
# /etc/sysctl.conf for high-concurrency web server
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_tw_reuse = 1
# net.ipv4.tcp_tw_recycle was removed in Linux 4.12 (it broke clients behind NAT); do not set it
net.ipv4.ip_local_port_range = 1024 65535
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
fs.file-max = 2097152
fs.nr_open = 2097152
# Also increase ulimit
# /etc/security/limits.conf
* soft nofile 1048576
* hard nofile 1048576
Verification: Use ss -s to monitor socket statistics. Use ab or wrk for load testing. These settings enabled a client's server to handle 150K concurrent WebSocket connections on a single 32-core machine.
Kernel TuningHigh ConcurrencyPerformance
Q113 Explain how to implement a multi-tier caching strategy for a web application.
Tier 1, CDN (Cloudflare/Fastly): Cache static assets at edge. 90%+ cache hit rate for images, CSS, JS.
Tier 2, Varnish/NGINX FastCGI Cache: Full-page cache for anonymous users. Reduces application server load by 70%.
Tier 3, Redis/Memcached: Object cache for database queries, sessions, API responses. Sub-millisecond response times.
Tier 4, Application-level: In-memory caching within the app for frequently accessed computed data.
Business Result: A properly implemented 4-tier cache can reduce database load by 95% and improve page load times from 2 seconds to 200ms.
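A quick way to sanity-check a figure like "95% less database load" is to multiply the miss rates of the tiers a request passes through. The sketch below assumes independent hit rates per tier, which is a simplification, and the rates in the example call (70% page cache, 85% object cache) are illustrative rather than measured.

```bash
# Fraction of requests still reaching the database after each cache tier.
# Arguments: one hit rate per tier, as a fraction between 0 and 1.
residual_load() {
    awk 'BEGIN {
        r = 1
        for (i = 1; i < ARGC; i++) r *= (1 - ARGV[i])
        printf "%.3f\n", r
    }' "$@"
}

residual_load 0.70 0.85
```

A 70% page-cache hit rate followed by an 85% object-cache hit rate leaves about 4.5% of requests for the database, which is in line with the 95% reduction quoted above.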
CachingPerformanceArchitecture
Q114 How do you implement a zero-downtime database schema migration for a table with 500M+ rows?
Strategy using pt-online-schema-change (Percona Toolkit):
How it works: 1) Creates a shadow copy of the table. 2) Adds triggers to sync changes. 3) Copies data in chunks. 4) Atomically swaps tables. Business: No downtime, no locked tables, users never notice. For a fintech company, this meant deploying schema changes during business hours instead of Sunday 3 AM maintenance windows.
DatabaseZero DowntimeMigrationPercona
Q115 How do you architect a multi-region, active-active database setup?
Challenge: Active-active multi-region means writes can happen in any region simultaneously. Conflict resolution is the hard part.
Solutions:
1) CRDTs (Conflict-Free Replicated Data Types): Mathematical approach in which operations commute, so order doesn't matter. Used by Redis CRDB, Riak.
2) Last-Write-Wins (LWW): Simplest, but can lose data. OK for caches, not for financial data.
3) Application-level conflict resolution: Custom merge logic. Complex but most flexible.
4) Partition by region: Users in Asia write to Asia DB, users in Europe write to Europe DB. No conflicts. Used by many SaaS companies.
Business: Active-active reduces latency (users connect to nearest region) and provides true disaster resilience. Cost: 2-3x infrastructure spend plus engineering complexity.
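To see why Last-Write-Wins can lose data, consider this toy merge. Records are written as timestamp:value, a format invented for this sketch, and lww_merge is a hypothetical helper; real systems also have to deal with clock skew between regions.

```bash
# Keep whichever record carries the larger timestamp; ties go to the first argument.
lww_merge() {
    ts_a=${1%%:*}
    ts_b=${2%%:*}
    if [ "$ts_b" -gt "$ts_a" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}

# Two regions update the same key at nearly the same time.
asia="1700000005:balance=90"
europe="1700000006:balance=70"
echo "merged: $(lww_merge "$asia" "$europe")"
```

The Asia write is discarded outright even though both updates were valid, which is acceptable for a cache entry and unacceptable for an account balance.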
Multi-RegionActive-ActiveArchitecture
Q116 How do you use eBPF for advanced observability?
eBPF runs sandboxed programs in the kernel without changing kernel source or loading kernel modules. Tools: BCC, bpftrace, Cilium for networking, Falco for security. Example: trace every openat() syscall across the system (for instance with BCC's opensnoop) at very low overhead. eBPF is revolutionizing observability: it's like having a programmable microscope into the kernel.
eBPFObservabilityKernel
Q117 How do you implement secrets management with HashiCorp Vault?
# Dynamic database credentials
# PostgreSQL syntax, matching db_name=postgres
vault write database/roles/myapp \
    db_name=postgres \
    creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
    default_ttl="1h" max_ttl="24h"
Each application instance gets unique, time-limited credentials. If compromised, the credentials auto-expire. This is the gold standard for database security.
VaultSecretsSecurity
Q118 Explain Kubernetes architecture and how to troubleshoot pod networking.
Control Plane: API Server, etcd (state), Scheduler, Controller Manager. Worker Nodes: Kubelet, kube-proxy, Container Runtime. Networking: CNI plugins (Calico, Flannel, Cilium). Troubleshooting: kubectl exec -it pod -- netstat -tlnp, kubectl describe pod, check CNI logs, use tcpdump inside pods.
KubernetesArchitectureNetworking
Q119 How do you perform a live migration of a running VM with KVM?
Q121 How do you implement a service mesh with Istio?
Istio adds mTLS, traffic management, and observability to Kubernetes without changing application code. Sidecar proxy (Envoy) injected into each pod handles all network traffic. Business: Zero-trust security between microservices, canary deployments with traffic splitting, and distributed tracing — all without developer effort.
IstioService MeshKubernetes
Q122 How do you optimize disk I/O for a PostgreSQL database?
# Use separate disks for WAL and data
# /var/lib/postgresql/data on SSD (data)
# /var/lib/postgresql/wal on NVMe (WAL)
# Mount options in /etc/fstab:
UUID=xxx /data ext4 defaults,noatime,nodiratime,data=writeback 0 2
# PostgreSQL conf:
effective_io_concurrency = 200
random_page_cost = 1.1 # For SSD
PostgreSQLI/OPerformance
Q123 Explain the Linux memory management subsystem: slab, buddy allocator, page cache.
Buddy Allocator: Allocates contiguous physical pages (4KB each). Merges adjacent free blocks into larger ones. Slab Allocator: Caches frequently allocated kernel objects (inodes, dentries). Reduces fragmentation. Page Cache: Caches file data in RAM. free -h shows it as "buff/cache" — this is available memory, not used memory. Linux will free it if applications need RAM.
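The "buff/cache is not used memory" point can be checked directly against /proc/meminfo. This sketch, with a made-up function name, adds Buffers, Cached, and SReclaimable (the fields that free combines into its buff/cache column) and prints them next to MemAvailable.

```bash
# Summarize a meminfo-format file (default: /proc/meminfo) in MiB.
mem_summary() {
    awk '
        /^MemTotal:/     { total = $2 }
        /^MemAvailable:/ { avail = $2 }
        /^Buffers:/      { buffers = $2 }
        /^Cached:/       { cached = $2 }
        /^SReclaimable:/ { slab = $2 }
        END {
            printf "total=%dMiB available=%dMiB buff/cache=%dMiB\n",
                total / 1024, avail / 1024, (buffers + cached + slab) / 1024
        }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    mem_summary
fi
```

On a busy file server the buff/cache figure is often most of RAM while available stays high, which is the healthy state: the kernel is using idle memory as cache and will hand it back on demand.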
Memory ManagementKernelDeep Dive
Q124 How do you implement a CI/CD pipeline with GitLab CI for a microservices application?
Q125 How do you troubleshoot a memory leak in a Java application on Linux?
# Monitor heap usage
jstat -gc <PID> 1000
# Heap dump
jmap -dump:live,format=b,file=heap.hprof <PID>
# Analyze with Eclipse MAT or VisualVM (jhat was removed in JDK 9)
# Check native memory (off-heap leak)
pmap -x <PID> | sort -k3 -rn | head -20
JavaMemory LeakTroubleshooting
Q126 How do you configure OpenLDAP with SSL/TLS for enterprise authentication?
# Generate certificates, configure slapd
# Enable LDAPS on port 636
# Integrate with PAM/NSS for system auth
# Business: Single source of truth for 10,000+ employees across all Linux servers
OpenLDAPEnterpriseAuthentication
Q127 How do you use Terraform to provision Linux servers on AWS/Azure/GCP?
Q130 Explain the Linux I/O scheduler algorithms: CFQ, Deadline, NOOP, mq-deadline, kyber.
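mq-deadline: Default for single-queue devices (SATA/SAS disks and SSDs) in modern kernels. Fair, low latency, good for mixed workloads. kyber: Designed for fast SSDs/NVMe. Uses token bucket to control latency. none (formerly noop): Minimal overhead, lets the device handle queuing. Best for NVMe and virtualized storage. The legacy CFQ, Deadline, and NOOP schedulers were removed in kernel 5.0 together with the single-queue block layer; BFQ is the multi-queue successor to CFQ. Check current: cat /sys/block/sda/queue/scheduler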
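The scheduler file lists every available scheduler and marks the active one with square brackets. A small sketch for reading it across all block devices follows; active_scheduler is a helper name made up for this example.

```bash
# Extract the bracketed (active) entry from a line such as "none [mq-deadline] kyber".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Report the active scheduler for every block device present.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    echo "$dev: $(active_scheduler "$(cat "$f")")"
done
```

To switch at runtime, write the name back to the same file, for example echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler; a udev rule is the usual way to make the choice persistent.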
I/O SchedulerKernelPerformance
Q131 How do you use Ceph for distributed storage?
Ceph provides object (S3-compatible), block (RBD), and file (CephFS) storage in a single cluster. Self-healing, no single point of failure. Used by CERN for petabyte-scale storage. Business: Replace expensive SAN storage with commodity servers — 60-80% cost reduction for large-scale storage.
CephStorageDistributed
Q132 How do you implement canary deployments in Kubernetes?
# Using Istio/Flagger
# Deploy new version, route 5% traffic to it
# Monitor error rate and latency
# Gradually increase to 100% or auto-rollback
# Business: Reduce deployment risk — if canary fails, only 5% of users are affected
CanaryKubernetesDeployment
Q133 How do you use systemd-nspawn for lightweight containers?
# Create container
sudo debootstrap stable /var/lib/machines/mycontainer
# Start
sudo systemd-nspawn -D /var/lib/machines/mycontainer -b
# Lighter than Docker, integrates with systemd, good for system containers
systemd-nspawnContainerssystemd
Q134 How do you audit and harden a Linux server for PCI-DSS compliance?
Key Requirements: 1) Firewall with documented rules. 2) No default passwords. 3) File integrity monitoring (AIDE). 4) Centralized logging with tamper protection. 5) Quarterly vulnerability scans. 6) Access control with least privilege. 7) Encryption at rest and in transit. 8) Regular patching with documented SLAs. Tools: lynis audit system for host auditing, OpenSCAP for automated compliance scanning.
PCI-DSSComplianceSecurity
Q135 How do you configure VXLAN tunnels for overlay networking?
ip link add vxlan0 type vxlan id 100 dstport 4789 group 239.1.1.1 dev eth0
ip addr add 10.100.0.1/24 dev vxlan0
ip link set vxlan0 up
VXLAN encapsulates Layer 2 frames in UDP packets, enabling virtual networks across physical infrastructure. Used by Docker overlay, Kubernetes flannel, OpenStack Neutron.
VXLANOverlayNetworking
Q136 How do you implement automated patching with Ansible and a canary strategy?
# Patch canary servers first (10% of fleet)
ansible-playbook -l canary_group patch.yml
# Wait 24 hours, monitor for issues
# If healthy, patch remaining servers
ansible-playbook -l production_group patch.yml
AnsiblePatchingAutomation
Q137 How do you use Kafka for event streaming on Linux?
Q141 How do you use kernel live patching with Kpatch?
# Install kpatch
sudo apt install kpatch
# Load a live patch module built with kpatch-build (module name is a placeholder)
sudo kpatch load livepatch-mypatch.ko
# List active patches
sudo kpatch list
Live PatchingKernelSecurity
Q142 How do you configure SR-IOV for network performance?
SR-IOV allows a single physical NIC to present multiple virtual NICs (VFs) directly to VMs/containers, bypassing the hypervisor for near-native network performance. Used in telco/NFV and high-frequency trading.
SR-IOVNetworkingPerformance
Q143 How do you use OSSEC for host-based intrusion detection?
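# OSSEC is not in the default Ubuntu/Debian repositories; add the vendor (Atomicorp) repository first
sudo apt install ossec-hids-server   # or ossec-hids-agent on monitored hosts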
# Monitors file integrity, log analysis, rootkit detection
# Alerts on suspicious activity via email/SIEM integration
OSSECIDSSecurity
Q144 Explain the design of a message queue system with RabbitMQ on Linux.
Q149 How do you perform a security penetration test on a Linux server?
# Using nmap for port scanning
nmap -sV -sC -p- target_server
# Using nikto for web vulnerability scanning
nikto -h https://target_server
# Using metasploit for exploitation testing
msfconsole
Penetration TestingSecuritynmap
Q150 How do you configure a load-balanced MySQL cluster with ProxySQL?
# ProxySQL sits between app and MySQL servers
# Provides connection pooling, query routing, read/write splitting
# Reduces DB connections by 90%+ through connection multiplexing
ProxySQLMySQLLoad Balancing
Q151 Explain the concept of chaos engineering and how to implement it on Linux.
Chaos Engineering: Deliberately introduce failures to test system resilience. Tools: Chaos Monkey (Netflix), LitmusChaos (Kubernetes), stress-ng (Linux). Example: Randomly kill 50% of web servers during business hours and verify the load balancer handles it. Business: Proactively discovering weaknesses prevents production outages.
Chaos EngineeringResilienceTesting
Q152 How do you configure automatic failover with keepalived?
Q164 How do you configure a production-ready Elasticsearch cluster?
# elasticsearch.yml
cluster.name: production
node.name: node-1
network.host: 0.0.0.0
discovery.seed_hosts: ["node-1","node-2","node-3"]
cluster.initial_master_nodes: ["node-1","node-2","node-3"]
# Set heap to 50% of RAM, max 31GB
# Use SSD for data path
ElasticsearchClusterProduction
Q165 How do you use SSH certificates for authentication at scale?
# Instead of managing authorized_keys on 1000 servers
# Issue short-lived SSH certificates signed by a CA
ssh-keygen -s ca_key -I user_id -n username -V +1d user_key.pub
# Servers trust the CA — no per-server key management
SSHCertificatesScale
Most-Expert Level — Linux Server Administration
10+ Years Experience
Q166 How would you architect a globally distributed, multi-cloud Kubernetes platform serving 500M+ users?
Architecture Vision:
Control Plane: Multi-cluster management with Karmada or Google Anthos — single pane of glass across AWS, GCP, Azure.
Networking: Service mesh (Istio) with multi-cluster federation. Cross-cluster mTLS, global traffic routing based on latency and health.
Data Layer: CockroachDB or YugabyteDB for globally consistent SQL. Cassandra/ScyllaDB for high-throughput NoSQL.
CDN: CloudFront + Cloudflare for static content. Dynamic content routed to nearest PoP.
Observability: Centralized Prometheus/Thanos for metrics, Tempo for tracing, Loki for logs — all in Grafana Cloud.
CI/CD: ArgoCD with ApplicationSets for automated multi-cluster deployments.
Cost Optimization: Spot instances for stateless workloads, reserved instances for databases, auto-scaling across clouds based on pricing.
Business Impact: This architecture provides 99.99% availability, < 100ms global latency, and avoids single-cloud vendor lock-in. Estimated infrastructure cost: $2-5M/month for 500M users, but the business can survive any single cloud provider outage.
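An availability percentage is easier to reason about as a downtime budget. The sketch below converts a target into allowed minutes of downtime per 365-day year; the function name is an assumption for this example.

```bash
# Allowed downtime per year, in minutes, for an availability percentage.
downtime_minutes_per_year() {
    awk -v a="$1" 'BEGIN { printf "%.2f\n", (100 - a) / 100 * 365 * 24 * 60 }'
}

for target in 99.9 99.99 99.999; do
    echo "$target% -> $(downtime_minutes_per_year "$target") minutes/year"
done
```

At 99.99% the whole platform gets under an hour of downtime per year across all causes, so a single slow failover between clouds can consume most of the budget.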
Multi-CloudKubernetesArchitectureGlobal Scale
Q167 How do you diagnose and resolve a kernel panic in a production server without physical access?
Immediate Actions:
# 1. Check IPMI/iDRAC/iLO for console access
ipmitool -I lanplus -H <ipmi_ip> -U admin -P password sol activate
# 2. Configure kdump for crash dumps (pre-incident setup)
sudo apt install kdump-tools
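# Reserve crash-kernel memory with the crashkernel=256M kernel boot parameter (GRUB config, then reboot)
# and set USE_KDUMP=1 in /etc/default/kdump-tools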
# After crash, find dump in /var/crash/
# 3. Analyze crash dump
crash /usr/lib/debug/boot/vmlinux-$(uname -r) /var/crash/dump.xxx
# 4. Common causes:
# - Faulty kernel module
# - Hardware failure (bad RAM — check with memtest)
# - Filesystem corruption
# - Out-of-memory with critical process killed
Prevention: Use netconsole to stream kernel logs to another server. Configure watchdog timers. Set up automatic reboot after panic: echo 10 > /proc/sys/kernel/panic. Business: A kernel panic on a critical server without kdump configured = 4+ hours of debugging vs 30 minutes with proper crash dump analysis.
Kernel PanicCrash DumpkdumpIPMI
Q168 Design a real-time data processing pipeline handling 10M events/second on Linux.
Pipeline Architecture:
Ingestion: Kafka cluster (20+ brokers) with partitioned topics. Use io_uring for disk I/O — 2x throughput vs traditional AIO.
Processing: Apache Flink on Kubernetes for stateful stream processing with exactly-once semantics.
Storage: S3 data lake (via Kafka Connect S3 sink) + ClickHouse for real-time analytics.
Kernel Tuning: XDP for packet filtering at NIC level (bypasses kernel networking stack for 10x performance). HugePages for JVM.
Monitoring: Prometheus + custom eBPF probes for pipeline latency tracking.
Business: This pipeline can process financial transactions in real-time for fraud detection — a 100ms delay can mean a $10M fraudulent transaction slipping through.
Real-TimeKafkaStreaming10M EPS
Q169 How do you implement a custom Linux kernel module for a specific business requirement?
Use Case: A fintech company needed a kernel module to intercept all network I/O for real-time compliance monitoring at wire speed — impossible to achieve in userspace.
Kernel ModuleC ProgrammingLow-Level
Q170 How do you use XDP (eXpress Data Path) for high-performance packet processing?
XDP runs eBPF programs at the NIC driver level, before the kernel allocates sk_buff structures. This enables line-rate packet processing (40Gbps+) with minimal CPU. Use cases: DDoS mitigation, load balancing, telemetry. Companies like Cloudflare use XDP to drop attack traffic at the edge at very low CPU cost.
XDPeBPFHigh Performance
Q171 How do you design a storage architecture using NVMe-oF for a database cluster?
NVMe over Fabrics allows accessing NVMe storage over the network (RDMA, Fibre Channel, or TCP) with latency approaching local NVMe (< 10µs overhead vs DAS). Architecture: NVMe-oF target servers expose NVMe namespaces. Database servers connect via RoCE (RDMA over Converged Ethernet). Result: Shared NVMe storage with < 100µs latency — ideal for Oracle RAC or PostgreSQL with shared storage.
NVMe-oFStorageRDMA
Q172 How do you implement a consensus algorithm (Raft) for a distributed system?
Raft is used by etcd, Consul, and TiKV. Leader election, log replication, and safety. In production, you need odd number of nodes (3, 5, or 7), proper timeout tuning, and disk persistence for the write-ahead log. Business: Raft ensures your distributed lock service or configuration store remains consistent even during network partitions.
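The advice to use an odd number of nodes follows from majority arithmetic: a Raft cluster needs more than half of its members to commit an entry. A short sketch with hypothetical helper names:

```bash
# Votes needed for a majority in a cluster of N nodes.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Node failures the cluster can survive while still reaching quorum.
tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

for n in 3 4 5 6 7; do
    echo "nodes=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Going from 3 to 4 nodes raises the quorum from 2 to 3 but still tolerates only one failure, so the extra node adds cost and replication traffic without adding resilience.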
RaftConsensusDistributed Systems
Q173 How do you use DPDK for userspace networking?
DPDK (Data Plane Development Kit) bypasses the kernel network stack entirely. Applications poll the NIC directly from userspace using huge pages and CPU pinning. Achieves tens of millions of packets per second per core with small packets. Used in telco (5G infrastructure), financial trading systems, and high-performance load balancers.
DPDKNetworkingUserspace
Q174 How do you architect a multi-tenant SaaS platform with strict data isolation on Linux?
Options: 1) Database-per-tenant: Strongest isolation, most overhead. 2) Schema-per-tenant: PostgreSQL schemas — good balance. 3) Row-level security: PostgreSQL RLS policies — efficient but complex. 4) OS-level isolation: Each tenant gets a Linux namespace or lightweight container with dedicated resources. Business: For healthcare SaaS (HIPAA), database-per-tenant is non-negotiable. For a project management tool, row-level security with encryption is sufficient.
SaaSMulti-TenantArchitecture
Q175 How do you perform a live migration of a running container between hosts?
CRIU (Checkpoint/Restore In Userspace) can checkpoint a running process/container and restore it on another host. Docker supports this experimentally: docker checkpoint create and docker start --checkpoint. Limitations: Requires same kernel version, doesn't work with GPU passthrough, open TCP connections may break. Alternative: Kubernetes graceful eviction with preStop hooks.
CRIULive MigrationContainers
Q176 How do you configure Linux for low-latency trading (sub-microsecond)?
# CPU isolation: dedicate cores to the trading app
# (kernel boot parameters, added to GRUB_CMDLINE_LINUX; reboot required)
isolcpus=2-7 nohz_full=2-7 rcu_nocbs=2-7
# Disable power management
cpupower frequency-set -g performance
# Run pinned to the isolated CPUs with real-time FIFO scheduling
taskset -c 2-7 chrt -f 99 ./trading_app
# Use huge pages, disable swap, pin NIC interrupts to dedicated cores
Low LatencyTradingPerformance
Q177 How do you implement a network traffic generator for testing at 100Gbps?
# Using pktgen (kernel module)
modprobe pktgen
echo "add_device eth0" > /proc/net/pktgen/kpktgend_0
echo "count 10000000" > /proc/net/pktgen/eth0
echo "start" > /proc/net/pktgen/pgctrl
# For more advanced: T-Rex (Cisco) or Warp17
Traffic Generator100GbpsTesting
Q178 How do you design a hot-hot disaster recovery solution with real-time data sync?
Hot-Hot: Both data centers serve traffic simultaneously. Data Sync: Use database-native multi-master (MySQL Group Replication, PostgreSQL BDR) or application-level dual-write with conflict resolution. Global Load Balancing: DNS-based (Route 53, NS1) with health checks. Business: Banks use hot-hot for zero RPO/RTO. Cost is 2-3x infrastructure but zero revenue loss during a DC failure.
Hot-HotDRReal-Time
Q179 How do you use io_uring for high-performance asynchronous I/O?
io_uring (kernel 5.1+) is the next-gen async I/O interface. It uses shared memory ring buffers between kernel and userspace, so many I/O operations can be submitted and completed per syscall (or with no syscalls at all in SQPOLL mode). Can achieve 2-3x throughput vs libaio on I/O-heavy workloads. Used by RocksDB, ScyllaDB, and modern storage systems. Code: the liburing library provides an easy API for C/C++ applications.
io_uringAsync I/OPerformance
Q180 How do you implement a zero-trust network with SPIFFE/SPIRE?
SPIFFE: Standard for workload identity (SPIFFE ID like spiffe://company.com/app/frontend). SPIRE: Implementation — issues short-lived X.509 certificates and JWTs to workloads. Every service-to-service call is mutually authenticated via mTLS. No more hardcoded API keys or static credentials.
SPIFFEZero TrustSecurity
Q181 How do you use perf and flame graphs to identify CPU bottlenecks?
perf record -F 99 -p <PID> -g -- sleep 30
perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg
# Flame graph shows where CPU time is spent — width = time
Flame GraphsperfPerformance
Q182 How do you configure Linux as a high-performance router with BGP and OSPF?
# FRRouting for routing protocols
# VPP (Vector Packet Processing) for forwarding plane
# Achieves 100Gbps routing on commodity hardware
# Used by cloud providers for virtual networking
RouterBGPVPP
Q183 How do you implement a blockchain node on Linux?
# Ethereum node (Geth)
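geth --syncmode "snap" --http --http.api "eth,net,web3"
# The personal namespace is deprecated and unsafe to expose over HTTP, so it is left out
# Since the Merge, geth also needs a consensus client (e.g. Lighthouse or Prysm) connected over the Engine API
# Requires fast SSD (NVMe recommended), 32GB+ RAM
# Storage: 1TB+ for a snap-synced full node; an archive node needs many times more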
BlockchainEthereumNode
Q184 How do you use LTTng for low-overhead kernel and userspace tracing?
LTTng (Linux Trace Toolkit Next Generation) provides sub-microsecond overhead tracing. Used for debugging latency issues in production without impacting performance. Integrates with Trace Compass for visualization.
LTTngTracingPerformance
Q185 How do you architect a solution for GDPR-compliant data processing on Linux?
Key Technical Requirements: 1) Data encryption at rest (LUKS/dm-crypt) and in transit (TLS 1.3). 2) Data anonymization/pseudonymization. 3) Right to erasure — ability to delete specific user data across all systems. 4) Audit logging of all data access. 5) Data residency — keep EU user data in EU data centers. 6) Breach notification within 72 hours — requires comprehensive monitoring.
GDPRComplianceData Privacy
Q186 How do you perform capacity planning for a Linux-based infrastructure?
Methodology: 1) Baseline current usage (CPU, memory, disk, network). 2) Analyze growth trends (linear, exponential, seasonal). 3) Model future demand with headroom (typically 30-50% buffer). 4) Plan for peak (Black Friday, product launches). 5) Right-size instances — most companies over-provision by 40% on average. Tools: Prometheus + Grafana for trend analysis, sar for historical data.
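Steps 2 and 3 of the methodology can be sketched as a least-squares trend plus a headroom factor. This is only an illustration for linear growth; the function names and the sample numbers are made up, and seasonal or exponential growth needs a different model.

```python
def linear_forecast(history, periods_ahead):
    """Fit y = a + b*x to equally spaced samples and extrapolate forward."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + periods_ahead)


def required_capacity(history, periods_ahead, headroom=0.3):
    """Forecast demand and add a safety buffer (30% here, the low end of the range above)."""
    return linear_forecast(history, periods_ahead) * (1 + headroom)


if __name__ == "__main__":
    monthly_peak_cpu_cores = [100, 110, 120, 130]  # hypothetical baseline
    print(required_capacity(monthly_peak_cpu_cores, periods_ahead=2))
```

Feed it monthly peaks exported from Prometheus or sar rather than averages, since capacity has to cover the peak.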
Capacity PlanningInfrastructureStrategy
Q187 How do you implement a message broker cluster with Apache Pulsar?
# Pulsar separates serving (brokers) from storage (BookKeeper)
# Enables independent scaling, multi-tenancy, geo-replication
# Used by Yahoo, Verizon, Splunk for trillion+ messages/day
PulsarMessagingScale
Q188 How do you use systemd sandboxing features for service security?
[Service]
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes
NoNewPrivileges=yes
ReadOnlyPaths=/etc/myapp
ReadWritePaths=/var/lib/myapp
# systemd can isolate services without Docker — built-in security
systemdSandboxingSecurity
Q189 How do you use BPF CO-RE for portable eBPF programs?
CO-RE (Compile Once, Run Everywhere) allows eBPF programs to run across different kernel versions without recompilation. Uses BTF (BPF Type Format) for type information. Essential for distributing eBPF tools as binaries.
BPF CO-REeBPFPortability
Q190 How do you design a hybrid cloud architecture with consistent security policies?
Key Components: 1) Unified identity (LDAP/AD + cloud IAM federation). 2) Consistent firewall policies via IaC (Terraform for cloud, Ansible for on-prem). 3) Centralized logging (ELK stack spanning both). 4) VPN/Direct Connect for secure interconnect. 5) Container orchestration (OpenShift/Rancher) that spans on-prem and cloud. Business: Hybrid cloud provides flexibility — keep sensitive data on-prem while bursting to cloud for peak loads.
Hybrid CloudArchitectureSecurity
Q191 How do you implement a service level objective (SLO) monitoring system?
SLO: Target for service reliability (e.g., 99.9% availability). SLI: Metric measured (e.g., successful requests / total requests). Error Budget: 1 - SLO = allowable failures (0.1% for 99.9% SLO). Implementation: Prometheus recording rules for SLI, Grafana dashboards, alert when error budget burn rate exceeds threshold. Business: SLOs align engineering with business expectations — prevents over-engineering (99.999% when 99.9% is sufficient).
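The error budget arithmetic above can be sketched in a few lines. The function names are illustrative; in production the same ratios would normally be Prometheus recording rules, and the alert threshold on burn rate is a policy choice.

```python
def error_budget(slo: float) -> float:
    """Allowable failure fraction, e.g. about 0.001 for a 99.9% SLO."""
    return 1.0 - slo


def burn_rate(failed: int, total: int, slo: float) -> float:
    """How fast the budget is being spent: 1.0 means exactly on budget."""
    if total == 0:
        return 0.0
    return (failed / total) / error_budget(slo)


def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Total downtime the budget permits over the SLO window."""
    return error_budget(slo) * window_days * 24 * 60


if __name__ == "__main__":
    print(allowed_downtime_minutes(0.999))        # roughly 43 minutes per 30 days
    print(burn_rate(failed=50, total=10_000, slo=0.999))
```

A burn rate of 5 means the service would exhaust a 30-day budget in about 6 days if the current error ratio continued.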
SLOSREMonitoring
Q192 How do you use user namespaces for rootless containers?
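# Podman supports rootless containers out of the box: run it as a regular user, no sudo
podman run -d -p 8080:80 docker.io/library/nginx
# Root inside the container maps to your unprivileged UID on the host (ranges in /etc/subuid and /etc/subgid)
# A container escape lands in an unprivileged account instead of host root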
RootlessContainersSecurity
Q193 How do you configure Linux for real-time audio/video processing?
# Install real-time kernel
sudo apt install linux-image-rt-amd64
# Set thread priorities
chrt -f 80 ./audio_process
# Use ALSA with mmap for low-latency audio
Real-TimeAudioKernel
Q194 How do you implement a distributed key-value store with etcd?
Q195 How do you use the kernel's ftrace for function-level tracing?
echo function > /sys/kernel/debug/tracing/current_tracer
echo "kfree_skb" > /sys/kernel/debug/tracing/set_ftrace_filter
cat /sys/kernel/debug/tracing/trace
ftraceKernelTracing
Q196 How do you design a CDN using Linux and open-source tools?
# Nginx + Varnish for caching
# GeoDNS (PowerDNS) for routing users to nearest PoP
# BGP Anycast for IP-level routing
# Rsync/Lsyncd for content replication
CDNOpen SourceArchitecture
Q197 How do you implement disk encryption with LUKS and manage keys?
cryptsetup luksFormat /dev/sdb
cryptsetup luksOpen /dev/sdb encrypted_volume
mkfs.ext4 /dev/mapper/encrypted_volume
# Key management: Store master key in HSM or remote key server (Tang/Clevis)
LUKSEncryptionSecurity
Q198 How do you use the perf subsystem for hardware performance counters?
perf stat -e cache-misses,cache-references,branch-misses ./app
perf record -e intel_pt// ./app # Intel PT for cycle-accurate tracing
perfHardware CountersProfiling
Q199 How do you configure Linux for HPC (High-Performance Computing) workloads?
# Use Mellanox OFED for InfiniBand
# Configure SLURM for job scheduling
# Use Lustre/GPFS for parallel filesystem
# Enable huge pages, CPU frequency scaling governor=performance
HPCInfiniBandSLURM
Q200 How do you implement a multi-factor authentication system with PAM?
sudo apt install libpam-google-authenticator
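# Configure /etc/pam.d/sshd
auth required pam_google_authenticator.so
# In /etc/ssh/sshd_config set KbdInteractiveAuthentication yes (ChallengeResponseAuthentication on older OpenSSH), then restart sshd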
# Users get TOTP codes via authenticator app + password
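For interviews it helps to know what the authenticator app actually computes. Below is a sketch of TOTP as defined in RFC 6238 (HMAC-SHA1, 30-second steps), written with the standard library only. It is an illustration of the algorithm, not the code inside pam_google_authenticator, and the secret used is the RFC test key, not a real one.

```python
from __future__ import annotations

import hashlib
import hmac
import struct
import time


def totp(secret: bytes, for_time: float | None = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password per RFC 6238 using HMAC-SHA1."""
    now = time.time() if for_time is None else for_time
    counter = int(now // step)                      # number of elapsed time steps
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                         # dynamic truncation (RFC 4226)
    code = int.from_bytes(mac[offset:offset + 4], "big") & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    rfc_test_key = b"12345678901234567890"
    print(totp(rfc_test_key, for_time=59, digits=8))  # RFC 6238 test vector: 94287082
```

Because the code depends only on the shared secret and the clock, server and phone must keep reasonably accurate time (NTP) for logins to succeed.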
MFAPAMSecurity
Q201 How do you use namespace manipulation for advanced container isolation?
unshare --pid --net --mount --fork --ipc --uts /bin/bash
# Creates isolated namespaces manually
# The building blocks of Docker/LXC
NamespacesContainersIsolation
Q202 How do you implement a Git server with Gitea?
# Self-hosted Git (lightweight, Go-based)
docker run -d -p 3000:3000 gitea/gitea
# Alternative to GitHub/GitLab for internal use
# Saves $21/user/month vs GitHub Enterprise
GiteaGitSelf-Hosted
Q203 How do you configure Linux for deep learning model training with multiple GPUs?
# Install NVIDIA drivers + CUDA + cuDNN
nvidia-smi # Verify GPU availability
# Use NCCL for multi-GPU communication
# PyTorch: model = nn.DataParallel(model) or DistributedDataParallel
GPUDeep LearningCUDA
Q204 How do you use systemd timers as a cron replacement?
# /etc/systemd/system/backup.timer
[Timer]
OnCalendar=daily
Persistent=true
[Install]
WantedBy=timers.target
# More features than cron: random delays, monotonic timers, dependencies
systemdTimersScheduling
Q205 How do you use netfilter hooks for custom firewall logic?
# Write kernel module that registers netfilter hook
nf_register_net_hook(&init_net, &my_hook);
# Hook at NF_INET_PRE_ROUTING, NF_INET_POST_ROUTING, etc.
# Can inspect/modify/drop packets at kernel level
NetfilterKernelFirewall
Q206 How do you use the Linux kernel's KVM for nested virtualization?
# Enable nested KVM
echo "options kvm-intel nested=1" > /etc/modprobe.d/kvm-intel.conf
# Verify
cat /sys/module/kvm_intel/parameters/nested # Should show Y
KVMNested VirtualizationHypervisor
Q207 How do you implement a sidecar pattern in Kubernetes for logging?
# Pod with app container + Fluentd sidecar
# App writes to shared emptyDir volume
# Fluentd reads and ships to Elasticsearch
# Pattern: Separation of concerns — app doesn't know about logging infrastructure
SidecarKubernetesLogging
Q208 How do you perform a rolling kernel upgrade across a server fleet?
# Strategy:
# 1. Upgrade kernel on 5% of fleet (canary)
# 2. Monitor for 48 hours (crash rate, performance)
# 3. If healthy, upgrade 25% batches every 2 hours
# 4. Use ksplice/livepatch for critical security patches (no reboot)
# 5. Have rollback plan: make the previous kernel the default (grub2-set-default, grub-set-default, or grubby --set-default) and reboot
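The canary-then-batches strategy can be sketched as a small planner that an orchestration tool would iterate over. The 5% and 25% fractions come from the steps above; the function name and host naming are hypothetical, and health checks between batches are left to the caller.

```python
def rollout_batches(hosts, canary_fraction=0.05, batch_fraction=0.25):
    """Split a fleet into one canary batch followed by equal-sized batches."""
    hosts = list(hosts)
    if not hosts:
        return []
    canary_size = max(1, round(len(hosts) * canary_fraction))
    batch_size = max(1, round(len(hosts) * batch_fraction))
    batches = [hosts[:canary_size]]                 # batch 0 is the canary group
    rest = hosts[canary_size:]
    for start in range(0, len(rest), batch_size):
        batches.append(rest[start:start + batch_size])
    return batches


if __name__ == "__main__":
    fleet = [f"node{i:03d}" for i in range(100)]    # hypothetical host names
    for number, batch in enumerate(rollout_batches(fleet)):
        print(number, len(batch))
```

Stopping the loop as soon as a batch fails its health check keeps the blast radius to one batch.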
KernelRolling UpgradeFleet Management
Q209 How do you use the kernel's cgroup v2 for I/O throttling?
echo "8:0 wbps=104857600" > /sys/fs/cgroup/myapp/io.max
# Limits writes to 100MB/s on device 8:0
# Prevents noisy neighbor problem in multi-tenant systems
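A small helper can generate the io.max line so the byte arithmetic is not done by hand. This is a sketch: the function names are illustrative, the limits are taken in MiB/s, and writing the result into the cgroup file is left to the caller.

```python
import os


def device_numbers(device_path):
    """Return (major, minor) for a block device such as /dev/sda."""
    rdev = os.stat(device_path).st_rdev
    return os.major(rdev), os.minor(rdev)


def io_max_line(major, minor, write_mib_per_s=None, read_mib_per_s=None):
    """Build one line for a cgroup v2 io.max file from MiB/s limits."""
    parts = [f"{major}:{minor}"]
    if read_mib_per_s is not None:
        parts.append(f"rbps={read_mib_per_s * 1024 * 1024}")
    if write_mib_per_s is not None:
        parts.append(f"wbps={write_mib_per_s * 1024 * 1024}")
    return " ".join(parts)


if __name__ == "__main__":
    print(io_max_line(8, 0, write_mib_per_s=100))   # same limit as the echo command above
```

Limits that are not named in the line stay at their current value, so read and write caps can be set independently.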
cgroups v2I/OThrottling
Q210 How do you design a service that handles 1 million WebSocket connections on a single server?
# Kernel tuning (see Q112) + application architecture
# Use epoll (or io_uring) for event-driven I/O; kqueue is the BSD/macOS equivalent
# Minimize per-connection memory (goal: <10KB per connection)
# 1M connections * 10KB = 10GB RAM — feasible on a single 32GB server
# Test with: https://github.com/ericmoritz/wsdemo
WebSocket1M ConnectionsScaling
Q211 How do you use TPM (Trusted Platform Module) for measured boot?
# TPM stores hashes of boot components
# On boot, compare PCR values against known good values
# Detect tampering — if BIOS/bootloader/kernel is modified, alert
# Used with LUKS for automatic disk decryption only if boot chain is trusted
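The way a PCR accumulates boot measurements can be modelled in software. The sketch below mimics the SHA-256 extend operation (new PCR = hash of old PCR concatenated with the measurement digest); it is a model for understanding, not a replacement for tpm2-tools, and the component byte strings are placeholders.

```python
import hashlib


def pcr_extend(pcr: bytes, component: bytes) -> bytes:
    """Model of a TPM extend: PCR_new = SHA-256(PCR_old || SHA-256(component))."""
    measurement = hashlib.sha256(component).digest()
    return hashlib.sha256(pcr + measurement).digest()


def replay_boot_chain(components) -> bytes:
    """Replay measurements in boot order, starting from an all-zero PCR."""
    pcr = bytes(32)
    for component in components:
        pcr = pcr_extend(pcr, component)
    return pcr


if __name__ == "__main__":
    good = replay_boot_chain([b"firmware", b"bootloader", b"kernel"])
    tampered = replay_boot_chain([b"firmware", b"bootloader", b"kernel-modified"])
    print(good.hex())
    print(good == tampered)   # False: any change alters the final value
```

Because each value depends on everything measured before it, a PCR cannot be set back to a known good value after a modified component has been measured, which is why sealing a LUKS key to PCR values works.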
TPMMeasured BootSecurity
Q212 How do you implement a data pipeline with Apache NiFi on Linux?
# NiFi provides visual dataflow programming
# Drag-and-drop processors for ingest, transform, route, store
# Built-in backpressure, prioritization, provenance tracking
NiFiData PipelineETL
Q213 How do you use the kernel's DAMON for memory access monitoring?
DAMON (Data Access MONitor) (kernel 5.15+) monitors memory access patterns with minimal overhead. Can identify cold memory regions for proactive reclaim or tiering. Used with DAMOS for automated memory management — migrate cold pages to slower storage automatically.
DAMONMemoryKernel
Q214 How do you configure Linux for confidential computing with AMD SEV/Intel TDX?
Confidential computing encrypts VM memory with keys managed by the CPU, so even the hypervisor can't read it. The main hardware implementations are AMD SEV (including SEV-ES and SEV-SNP) and Intel TDX. This enables running sensitive workloads in untrusted cloud environments. Business: Financial institutions can run trading algorithms in the public cloud without exposing data to the cloud provider.
Confidential ComputingSEVSecurity
Q215 How do you build a custom Linux distribution for embedded/IoT?
# Yocto Project — build custom distro
bitbake core-image-minimal
# Configure kernel, packages, init system
# Create bootable image for target device
# Used by automotive, industrial IoT, smart devices
YoctoEmbeddedIoT
AI-Oriented & Modern Trends in Linux Administration (2026)
AI/ML · Cloud-Native · Future
Q216 How is AI changing Linux server administration in 2026?
AI-Driven Operations (AIOps):
Predictive Scaling: AI models analyze traffic patterns and pre-scale infrastructure 30 minutes before demand spikes.
Anomaly Detection: ML models on metrics/logs detect subtle anomalies humans miss — a 0.5% increase in disk latency that precedes a drive failure.
Automated Root Cause Analysis: When an incident occurs, AI correlates logs across 1000+ services and suggests the most likely root cause.
Self-Healing: AI agents diagnose and fix common issues (restart service, clear cache, scale up) without human intervention.
ChatOps with LLMs: Natural language interface to infrastructure — "Show me all servers with CPU > 80% in the last hour" queries Prometheus via LLM.
Business Impact: Some companies using AIOps report reductions of around 50% in MTTR (Mean Time to Resolution) and around 30% in operational costs, though results vary widely with data quality and process maturity. The role of Linux admin is evolving from "operator" to "AI supervisor."
AIAIOpsFutureAutomation
Q217 How do you set up GPU-accelerated workloads on Linux for AI/ML training?
# Install NVIDIA drivers
sudo apt install nvidia-driver-550
# Install CUDA toolkit
wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_550.54.14_linux.run
sudo sh cuda_12.4.0_550.54.14_linux.run
# Install cuDNN for deep learning
# Verify
nvidia-smi
python3 -c "import torch; print(torch.cuda.is_available())"
# Run GPU containers (requires the NVIDIA Container Toolkit on the host)
docker run --gpus all -it pytorch/pytorch:latest
Infrastructure: For multi-GPU training, use NCCL (NVIDIA Collective Communications Library). For distributed training across nodes, use Horovod or PyTorch Distributed. Business: Proper GPU setup can reduce model training time from weeks to hours — directly impacting time-to-market for AI products.
GPUCUDAAI TrainingNVIDIA
Q218 How do you deploy and scale LLM (Large Language Model) inference on Linux servers?
Stack Options:
vLLM: High-throughput LLM serving with PagedAttention; its authors report up to 24x higher throughput than Hugging Face Transformers.
Ollama: Easy local LLM deployment — ollama run llama3 for instant inference.
llama.cpp: CPU-optimized inference with quantization (4-bit models run on 32GB RAM).
Text Generation Inference (TGI): HuggingFace's production-grade server.
# Deploy with vLLM
docker run --gpus all -p 8000:8000 \
vllm/vllm-openai:latest \
--model mistralai/Mixtral-8x7B-Instruct-v0.1
Infrastructure Requirements: For a 70B parameter model: 4x A100 (80GB) GPUs for FP16 (about 140GB of weights plus KV cache), or 1x A100 for 4-bit quantized. Business: Self-hosting LLMs can save $0.002-0.01 per 1,000 tokens vs API providers; for 1B tokens/month, that's $2,000-$10,000 in monthly savings before hardware and staffing costs.
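The GPU counts above follow from simple arithmetic on parameter count and precision. The sketch below makes it explicit; the 20% overhead for KV cache and activations and the power-of-two rounding for tensor parallelism are assumptions for illustration, not measured values.

```python
import math


def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GB (1B params at 8 bits = 1 GB)."""
    return params_billions * bits_per_param / 8


def min_gpus(params_billions, bits_per_param, gpu_gb=80, overhead=1.2):
    """Smallest GPU count whose combined memory holds weights plus assumed overhead."""
    needed = weight_memory_gb(params_billions, bits_per_param) * overhead
    return math.ceil(needed / gpu_gb)


def round_up_to_power_of_two(n: int) -> int:
    """Tensor-parallel sizes are usually powers of two, so round the count up."""
    return 1 << (n - 1).bit_length()


if __name__ == "__main__":
    for bits in (16, 4):
        gpus = min_gpus(70, bits)
        print(bits, weight_memory_gb(70, bits), gpus, round_up_to_power_of_two(gpus))
```

For FP16 the raw count is 3 GPUs, which rounds up to the 4x A100 configuration quoted above; at 4 bits the weights fit on a single 80GB card.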
LLMInferencevLLMAI Deployment
Q219 How do you use MLOps tools (Kubeflow, MLflow) on Linux for ML lifecycle management?
Kubeflow: ML workflow orchestration on Kubernetes. Pipelines for training, hyperparameter tuning, and serving. MLflow: Experiment tracking, model registry, and deployment. Linux Admin Role: Set up Kubernetes cluster with GPU support, configure storage (PVCs for datasets), set up monitoring for training jobs, manage resource quotas to prevent ML jobs from starving production services.
MLOpsKubeflowMLflowAI
Q220 How do you secure AI/ML infrastructure on Linux? What are the unique threats?
Unique AI Security Threats:
Model Poisoning: Attacker injects malicious data into training set — model behaves incorrectly on specific inputs.
Model Theft: Unauthorized access to trained model weights (the company's IP).
Adversarial Inputs: Carefully crafted inputs that cause the model to fail.
GPU Memory Attacks: Malicious code in shared GPU environments reading other processes' GPU memory.
Supply Chain: Compromised pre-trained models from public repositories (HuggingFace, PyTorch Hub).
Mitigations: Model encryption at rest, access control for model servers, input validation, GPU isolation (MIG — Multi-Instance GPU), and scanning model files for embedded malware.
AI SecurityModel ProtectionThreats
Q221 How do you use Vector Databases (Pinecone, Weaviate, Milvus) on Linux for RAG applications?
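# Deploy Milvus (open-source vector DB); a standalone instance also needs etcd and object storage, so prefer the official install script or compose file for real use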
docker run -d --name milvus \
-p 19530:19530 \
milvusdb/milvus:latest
# Used for semantic search, RAG (Retrieval-Augmented Generation)
# Stores embeddings from LLMs for fast similarity search
Vector DBRAGAI
Q222 How do you implement a CI/CD pipeline for ML models?
ML CI/CD Pipeline: 1) Data validation (Great Expectations). 2) Model training with versioned datasets (DVC). 3) Model evaluation against baseline. 4) Model registry (MLflow). 5) A/B testing deployment (Canary with Istio). 6) Continuous monitoring for data drift and model decay. Business: Automating ML deployment reduces time-to-production from months to days.
ML CI/CDAutomationAI
Q223 How do you monitor GPU metrics on Linux for AI workloads?
nvidia-smi dmon -s pucvmet -d 2
# Prometheus: DCGM (Data Center GPU Manager) exporter
# Grafana dashboard for GPU utilization, memory, temperature
# Alert when GPU memory > 90% or temperature > 80°C
GPU MonitoringDCGMAI
Q224 How do you implement federated learning infrastructure on Linux?
Federated Learning: Train models across decentralized devices without centralizing data. Each node trains locally and shares only model updates (not raw data). Linux Stack: Flower framework, TensorFlow Federated. Use Case: Healthcare, where hospitals train diagnostic models together without sharing patient records, which supports HIPAA compliance but does not guarantee it on its own.
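The aggregation step on the central server can be sketched with federated averaging (FedAvg), one common rule that the answer above does not name explicitly. Weights are plain Python lists here to keep the example dependency-free; the hospital sample counts are invented.

```python
def federated_average(updates):
    """Average client weight vectors, weighted by each client's sample count.

    updates: list of (weights, num_samples) tuples, one per client.
    """
    total_samples = sum(n for _, n in updates)
    if total_samples == 0:
        raise ValueError("no training samples reported")
    dim = len(updates[0][0])
    averaged = [0.0] * dim
    for weights, n in updates:
        if len(weights) != dim:
            raise ValueError("clients returned weight vectors of different sizes")
        for i, w in enumerate(weights):
            averaged[i] += w * n / total_samples
    return averaged


if __name__ == "__main__":
    hospital_a = ([1.0, 2.0], 100)   # (local model weights, local sample count)
    hospital_b = ([3.0, 4.0], 300)
    print(federated_average([hospital_a, hospital_b]))
```

Only the weight vectors and sample counts cross the network; the patient records used to compute them never leave each site.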
Federated LearningPrivacyAI
Q225 How do you use Linux containers for reproducible ML environments?
# Dockerfile for reproducible ML
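FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04
# The CUDA base image ships without Python, so install it first
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip && rm -rf /var/lib/apt/lists/*
RUN pip3 install --no-cache-dir torch==2.3.0 transformers==4.40.0
# Pin exact versions for reproducibility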
# Use singularity/apptainer for HPC environments
ReproducibilityDockerML
Q226 How do you optimize Linux kernel for AI training workloads?
# Huge pages for large memory allocations
echo 4096 > /proc/sys/vm/nr_hugepages
# Transparent huge pages
echo always > /sys/kernel/mm/transparent_hugepage/enabled
# GPU direct RDMA for multi-node training
# IOMMU passthrough for GPU
# Disable CPU frequency scaling during training
Kernel TuningAI TrainingPerformance
Q227 How do you set up a data lakehouse with Iceberg/Delta Lake on Linux?
# Apache Iceberg on S3-compatible storage (MinIO on Linux)
# Provides ACID transactions on data lake
# Query with Spark, Trino, or Flink
# Business: Combine data warehouse reliability with data lake flexibility
LakehouseIcebergData
Q228 How do you use WebAssembly (Wasm) on Linux servers for edge computing?
# WasmEdge — lightweight WebAssembly runtime
wasmedge run --env PORT=8080 app.wasm
# Faster cold start than containers (microseconds vs seconds)
# 10x smaller than Docker images
# Used for edge AI inference, CDN edge compute
WebAssemblyEdgeWasm
Q229 How do you implement a GitOps workflow with ArgoCD for AI model deployment?
# ArgoCD Application pointing to Git repo with model serving config
# Git commit triggers automatic deployment
# Rollback = git revert
# Business: Audit trail for every model deployment — critical for regulated industries
ArgoCDGitOpsAI
Q230 How do you use eBPF for AI workload observability?
eBPF can trace GPU memory allocations, CUDA kernel launch latency, and data transfer between CPU and GPU — all with zero code changes to ML frameworks. Tools like gpud and custom bpftrace scripts provide unprecedented visibility into AI workloads.
eBPFAI ObservabilityGPU
Q231 How do you implement a feature store (Feast) on Linux for ML?
# Feast — open-source feature store
feast apply
# Manages feature definitions, online/offline serving
# Integrates with Redis for online, BigQuery/Snowflake for offline
Feature StoreMLFeast
Q232 How do you configure Linux for edge AI inference (Jetson, Raspberry Pi)?
# NVIDIA Jetson with JetPack SDK
# TensorRT for optimized inference
# ONNX Runtime for cross-platform
# Use CPU governors, disable unnecessary services for power efficiency
Edge AIJetsonInference
Q233 How do you use Ray for distributed AI workloads on a Linux cluster?
# Ray — distributed computing framework
ray start --head
# Python: @ray.remote decorator for distributed functions
# Ray Train for distributed training, Ray Serve for model serving
RayDistributed AICluster
Q234 How do you implement a model registry with MLflow on Linux?
Q236 How do you implement drift detection for ML models in production?
# Use Evidently AI, Alibi Detect, or NannyML
# Monitor data drift, concept drift, prediction drift
# Trigger retraining when drift exceeds threshold
# Business: Prevents model degradation that could cost millions in wrong predictions
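What "drift exceeds threshold" means can be shown with the Population Stability Index (PSI), one simple data-drift score. This is a hand-rolled sketch, not the API of Evidently, Alibi Detect, or NannyML, and the 0.2 retraining threshold is an assumed rule of thumb.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0          # avoid zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1   # clamp out-of-range values
        # A small floor keeps log() defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def needs_retraining(expected, actual, threshold=0.2):
    return psi(expected, actual) > threshold


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]
    shifted = [v + 0.5 for v in baseline]
    print(needs_retraining(baseline, baseline), needs_retraining(baseline, shifted))
```

Run the check per feature on a schedule and alert or trigger the retraining pipeline when any feature crosses the threshold.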
Drift DetectionML MonitoringProduction
Q237 How do you use Apache Airflow for ML pipeline orchestration?
# Airflow DAG for ML pipeline
# Extract data → Validate → Train → Evaluate → Deploy
# Schedule, monitor, retry, alert
# Production-grade ML pipelines need robust orchestration
AirflowOrchestrationML Pipeline
Q238 How do you use the Hugging Face ecosystem on Linux for NLP?
pip install transformers datasets accelerate
# Load any of 200K+ models
from transformers import pipeline
classifier = pipeline("sentiment-analysis")
result = classifier("I love Linux!")
HuggingFaceNLPTransformers
Q239 How do you deploy Stable Diffusion on a Linux server for image generation?
# AUTOMATIC1111 WebUI
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
# Requires 8GB+ VRAM GPU
# API mode for production: --api --listen
# Can generate 60+ images/minute on A100
Stable DiffusionImage GenerationAI
Q240 How do you implement a chatbot with RAG on Linux using open-source tools?
# LangChain + ChromaDB + Llama 3
# 1. Ingest documents → chunk → embed → store in vector DB
# 2. Query: embed query → retrieve similar chunks → prompt LLM
# 3. LLM generates answer grounded in retrieved documents
# Business: Internal knowledge base chatbot — answers HR/IT questions instantly
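The retrieve-then-prompt flow in steps 1 to 3 can be sketched without any external services. Here a bag-of-words vector stands in for a real embedding model, an in-memory list stands in for ChromaDB, and the final LLM call is omitted; all function names and documents are illustrative.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query, chunks, k=2):
    """Ground the question in retrieved context before sending it to the LLM."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "VPN access requires an approved ticket from IT.",
        "Annual leave requests go through the HR portal.",
        "The cafeteria opens at 8 am.",
    ]
    print(build_prompt("How do I request annual leave?", docs, k=1))
```

Swapping `embed` for a real embedding model and the list for a vector database changes the quality of retrieval but not the shape of the pipeline.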
RAGChatbotOpen Source AI
Q241 How do you monitor carbon footprint of AI workloads on Linux?
# CodeCarbon — track CO2 emissions
pip install codecarbon
from codecarbon import EmissionsTracker
tracker = EmissionsTracker()
tracker.start()
# ... training code ...
emissions = tracker.stop()
# Business: ESG compliance, green AI initiatives
Green AICarbonSustainability
Q242 How do you use lakeFS for data versioning in ML pipelines?
# lakeFS — Git-like versioning for data lakes
lakectl branch create lakefs://my-repo/experiment-1 --source lakefs://my-repo/main  # my-repo is a placeholder repository name
# Experiment on branch without affecting production data
# Merge if successful — rollback if not
# Business: Reproducible ML experiments with data lineage
LakeFSData VersioningML
Q243 How do you implement real-time model serving with Triton Inference Server?
# NVIDIA Triton — enterprise model serving
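docker run --gpus all -p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v /path/to/model_repository:/models \
nvcr.io/nvidia/tritonserver:24.04-py3 tritonserver --model-repository=/models
# /path/to/model_repository is a placeholder; ports are 8000 (HTTP), 8001 (gRPC), 8002 (metrics)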
# Supports TensorRT, ONNX, PyTorch, TensorFlow, Python models
# Dynamic batching, model ensembles, GPU metrics
TritonModel ServingNVIDIA
Q244 How do you use JupyterHub on Linux for collaborative data science?
# JupyterHub — multi-user Jupyter notebook server
sudo apt install jupyterhub
# Docker spawner for isolated environments per user
# Integrate with LDAP for enterprise auth
# Business: 50 data scientists sharing GPU resources efficiently
JupyterHubData ScienceCollaboration
Q245 How do you use Prefect for modern workflow orchestration?
# Prefect — Python-native workflow engine
from prefect import flow, task
@task
def extract():
    return [1, 2, 3]  # placeholder for real data loading
@flow
def ml_pipeline():
    data = extract()
    return data
# Modern alternative to Airflow with better Python support
PrefectOrchestrationPython
Q246 How do you implement model quantization for efficient inference on CPU?
# llama.cpp — 4-bit quantization
./llama-quantize model-f16.gguf model-q4.gguf Q4_K_M  # older llama.cpp builds name the binary ./quantize
# 70B model: 140GB → 40GB — runs on consumer hardware
# GGUF format for CPU inference
# Business: Run LLMs without $30K GPU servers
QuantizationCPU InferenceOptimization
Q247 How do you use DVC (Data Version Control) for ML data management?
dvc init
dvc add dataset/
git add dataset.dvc .gitignore
git commit -m "Add dataset v1"
# DVC tracks data versions in Git while storing data in S3/GCS
# Reproducible ML pipelines with dvc.yaml
DVCData VersioningML
Q248 How do you implement a private container registry for AI images?
# Harbor — enterprise container registry
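# Harbor is not a single image: download the installer from the Harbor releases page,
# set hostname and TLS certificate paths in harbor.yml, then run:
sudo ./install.sh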
# Vulnerability scanning, image signing, RBAC
# Store custom ML images with pre-installed CUDA, frameworks
HarborRegistryAI
Q249 How do you use K3s for lightweight Kubernetes on edge for AI?
curl -sfL https://get.k3s.io | sh -
# Single small binary (under 100MB), runs on Raspberry Pi
# Perfect for edge AI deployments
# Manage edge devices like cloud servers
K3sEdge AIKubernetes
Q250 What is the future of Linux server administration with AI agents?
2026-2030 Vision: AI agents will handle 80% of routine Linux administration tasks — patching, scaling, troubleshooting common issues. Humans will focus on architecture, security strategy, and AI supervision. The Linux admin becomes an "AI Operations Engineer" — training AI models on infrastructure data, writing prompts for infrastructure automation, and handling complex edge cases. Skills to develop: AI/ML fundamentals, prompt engineering for infrastructure, eBPF, and distributed systems architecture. The role isn't disappearing — it's evolving to a higher level of abstraction.
FutureAI AgentsCareer Evolution
Md Mominul Islam
S/W Development Lead | Project Mgmt | DBA | Data Engineering
Microsoft-certified professional with 16+ years experience delivering 40+ enterprise solutions across diverse industries.