250+ Linux Server Admin Interview Q&A | Beginner to Most-Expert (2026) | FreeLearning365

📋 2026 Ultimate Guide

Master Linux Server Administration:
250+ Real-World Interview Q&A

From Beginner to Most-Expert — AI-Oriented, Cloud & On-Prem, Business Problem-Solving Approach. Includes Hands-On Labs, Scenarios & Code Exercises.

250+ Questions | Experience Levels: Beginner to Most-Expert | 30+ Lab Scenarios | 50+ Code Exercises

Beginner Level — Linux Server Administration

0–2 Years Experience

Q1 What is Linux and why is it preferred for server environments?

Business Perspective: Linux is an open-source, Unix-like operating system kernel first released by Linus Torvalds in 1991. Organizations choose Linux for servers because it offers zero licensing costs, unparalleled stability, robust security, and massive community support. For a startup running 50 servers, choosing Linux over Windows Server can save $40,000–$100,000+ annually in licensing alone.

Key Advantages:

Cost Efficiency: No per-core or per-user licensing fees (Red Hat offers paid support but CentOS Stream/AlmaLinux/Rocky Linux are free)
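Stability: Well-maintained Linux servers commonly run for months or years without unplanned restarts; reaching 99.999% availability still depends on redundancy and failover, not the OS alone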
Security: Open-source code means vulnerabilities are discovered and patched rapidly by the global community
Performance: Minimal overhead — a basic Linux server install uses ~512MB RAM vs 2–4GB for Windows Server
Automation-Friendly: Everything is scriptable via Bash, making DevOps and CI/CD seamless

Fundamentals | Business Case | Cost Analysis

Q2 How do you check the current Linux kernel version and distribution details?

Use multiple commands for comprehensive system identification:

# Kernel version
uname -r
# Output: 6.8.0-45-generic

# Full system info
uname -a
# Output: Linux hostname 6.8.0-45-generic #46-Ubuntu SMP x86_64 GNU/Linux

# Distribution details (works on most distros)
cat /etc/os-release
lsb_release -a

# For Red Hat based systems
cat /etc/redhat-release

# Kernel build string (version, compiler, build date)
cat /proc/version

Interview Tip: Knowing /etc/os-release is crucial as it's the modern standard across all major distributions. In a business context, you need this to verify compliance with vendor support matrices.

Commands | System Info

Q3 Explain the Linux file system hierarchy. Why is understanding it critical for server administration?

Business Impact: Misplacing application files in wrong directories can break backup scripts, cause security audits to fail, and create operational chaos. The FHS (Filesystem Hierarchy Standard) ensures consistency.

/          # Root — everything starts here
/bin       # Essential user binaries (ls, cp, mv)
/sbin      # System binaries (fdisk, mount, iptables) — often needs root
/etc       # Configuration files — THE most critical directory for admins
/var       # Variable data — logs (/var/log), databases, email queues
/home      # User home directories
/root      # Root user's home
/tmp       # Temporary files — cleared on reboot (often)
/usr       # Most installed programs, libraries, and shared read-only data (/usr/bin, /usr/lib, /usr/share)
/proc      # Virtual filesystem — kernel & process info in real-time
/sys       # Virtual filesystem — device & driver info
/dev       # Device files
/boot      # Boot loader files, kernel images
/opt       # Optional/third-party software packages
/mnt & /media  # Mount points

Real Scenario: A junior admin once stored application logs in /tmp. After a server reboot (routine patching), all logs were lost and the security team couldn't investigate an incident. Always use /var/log for persistent logs.

FHS | File System | Best Practice

Q4 How do you create, modify, and delete users from the command line?

# Create a new user with home directory
sudo useradd -m -s /bin/bash john_doe

# Set password
sudo passwd john_doe

# Create user with specific UID, group, and expiry
sudo useradd -m -u 1500 -g developers -e 2026-12-31 jane_doe

# Modify user — add to supplementary group
sudo usermod -aG docker,sudo john_doe

# Lock / unlock account
sudo usermod -L john_doe   # Lock
sudo usermod -U john_doe   # Unlock

# Delete user (keep home dir)
sudo userdel john_doe

# Delete user AND home directory
sudo userdel -r john_doe

# List all users
cat /etc/passwd
getent passwd

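Business Scenario: When offboarding an employee, lock the account immediately, back up the home directory, then delete the account after 30 days per HR policy. Note that usermod -L only locks the password; SSH key logins still work, so also expire the account (usermod -e 1 or chage -E 0) or remove ~/.ssh/authorized_keys. Automate this with a script that integrates with your HR system.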
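The offboarding sequence described above could be sketched as a small Bash function. This is a minimal sketch, not an HR-integrated tool: the function names offboard_user and _run, the /backup/offboarded archive path, and the DRY_RUN switch are illustrative assumptions. By default it only prints the commands it would run.

```bash
# Hypothetical helper: print (DRY_RUN=1, the default) or execute a command.
_run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Lock an account, expire it, and archive its home directory.
# ARCHIVE_DIR defaults to /backup/offboarded (an assumed location).
offboard_user() {
    local user="$1"
    local archive_dir="${2:-/backup/offboarded}"
    local stamp
    stamp="$(date +%Y%m%d)"

    if [ -z "$user" ]; then
        echo "usage: offboard_user USER [ARCHIVE_DIR]" >&2
        return 2
    fi

    _run usermod -L "$user"     # lock the password
    _run usermod -e 1 "$user"   # expire the account so key-based logins fail too
    _run tar -czf "${archive_dir}/${user}_${stamp}.tar.gz" "/home/${user}"
}
```

Run it once with the default dry run to review the commands, then as root with DRY_RUN=0 to apply them. Deleting the account after the retention period stays a separate, deliberate step.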

User Management | Security | Onboarding/Offboarding

Q5 What are file permissions in Linux? Explain numeric and symbolic modes.

Core Concept: Every file has three permission sets — Owner (u), Group (g), Others (o). Each set: Read (r=4), Write (w=2), Execute (x=1).

# Symbolic mode
chmod u=rwx,g=rx,o= script.sh       # Owner: rwx, Group: r-x, Others: nothing ("=" sets exactly; "+" only adds bits)
chmod g-w file.txt                   # Remove write from group
chmod a+x script.sh                  # Add execute for all (a = u+g+o)

# Numeric mode (most common in scripts)
chmod 755 script.sh   # rwxr-xr-x (Owner:7, Group:5, Others:5)
chmod 644 file.txt    # rw-r--r-- (Owner:6, Group:4, Others:4)
chmod 600 id_rsa      # rw------- (SSH private key — CRITICAL)
chmod 777 dangerous   # rwxrwxrwx — NEVER use on production servers!

# Common production patterns:
# Configuration files: 640 (owner read-write, group read)
# Executable scripts: 750 (owner full, group read-execute)
# Web content: 644 (world-readable, owner-writable)
# SSH keys: 600 (owner-only read-write)

Audit Impact: During a PCI-DSS audit, finding a file with 777 permissions containing sensitive data is an automatic finding. Use find / -perm /o=w -type f 2>/dev/null to locate world-writable files.
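The r=4, w=2, x=1 arithmetic can be checked with a short function. This is an illustrative sketch: perm_to_octal is a made-up name, and it ignores special bits (setuid, setgid, sticky).

```bash
# Convert a 9-character permission string (as printed by ls -l, without the
# leading file-type character) to octal by adding r=4, w=2, x=1 per set.
perm_to_octal() {
    local perms="$1" result="" start triplet digit
    if [ "${#perms}" -ne 9 ]; then
        echo "expected 9 characters such as rwxr-xr-x" >&2
        return 1
    fi
    for start in 1 4 7; do
        # Cut out the owner, group, and others triplet in turn.
        triplet="$(printf '%s' "$perms" | cut -c"${start}-$((start + 2))")"
        digit=0
        case "$triplet" in r??) digit=$((digit + 4)) ;; esac
        case "$triplet" in ?w?) digit=$((digit + 2)) ;; esac
        case "$triplet" in ??x) digit=$((digit + 1)) ;; esac
        result="${result}${digit}"
    done
    echo "$result"
}
```

For example, perm_to_octal rwxr-xr-x prints 755 and perm_to_octal rw------- prints 600, matching the numeric modes listed above.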

Permissions | Security Audit | chmod

Q6 How do you manage services using systemctl? Give examples for a web server.

# Start / Stop / Restart
sudo systemctl start nginx
sudo systemctl stop nginx
sudo systemctl restart nginx
sudo systemctl reload nginx    # Graceful reload (no downtime)

# Enable on boot / Disable
sudo systemctl enable nginx
sudo systemctl disable nginx

# Status and logs
sudo systemctl status nginx
journalctl -u nginx -f          # Follow logs in real-time
journalctl -u nginx --since "1 hour ago"

# List all services
systemctl list-units --type=service --state=running

# Mask (prevent service from being started)
sudo systemctl mask unwanted-service

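Business Scenario: During a production deployment at 3 AM, you need to apply a new Nginx config without dropping connections. Use systemctl reload nginx: the master process starts new workers with the new config and lets the old workers finish their in-flight requests. If the new config is invalid, nginx keeps running the old one, but a reload does not replace an explicit check, so always run nginx -t first to see any syntax error before reloading.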

systemd | Service Management | Nginx

Q7 How do you monitor disk usage and find large files consuming space?

# Overall disk usage (human-readable)
df -h
# Output: Filesystem  Size  Used Avail Use% Mounted on
#         /dev/sda1    50G   38G   9.3G  81% /

# Directory usage summary
du -sh /var/*
du -h --max-depth=1 /home | sort -rh | head -20

# Find largest files (top 20)
find / -type f -size +100M -exec ls -lh {} \; 2>/dev/null | sort -k5 -rh | head -20

# Find files older than 90 days and larger than 1GB
find /var/log -type f -mtime +90 -size +1G

# Check inode usage (critical! — running out of inodes = can't create files)
df -i

Real Incident: A production server went down because /var/log filled up. The application couldn't write logs and crashed. Solution: Implement logrotate and set up monitoring alerts at 80% disk usage. Business cost: 2 hours of downtime = ~$20,000 for an e-commerce site.

Disk Management | Monitoring | Incident Response

Q8 What is the difference between a process and a service (daemon)?

Process: Any running instance of a program. Has a PID, consumes CPU/memory, can be foreground or background. Created when you run ls, vim, or any command.

Daemon (Service): A background process that runs continuously, usually started at boot. Examples: sshd, nginx, mysqld. Managed by systemd (or init). Daemons detach from the terminal, often run as specific users, and have restart policies.

# View processes
ps aux
top
htop

# View daemons/services
systemctl list-units --type=service

# Key difference: A daemon survives terminal closure; a foreground process dies

Processes | Daemons | Fundamentals

Q9 How do you install, update, and remove packages on Debian-based vs Red Hat-based systems?

# Debian/Ubuntu (apt)
sudo apt update                    # Refresh package index
sudo apt upgrade                   # Upgrade all packages
sudo apt install nginx             # Install
sudo apt remove nginx              # Remove (keep configs)
sudo apt purge nginx               # Remove completely
sudo apt autoremove                # Clean orphaned dependencies
apt list --installed               # List installed packages

# Red Hat/CentOS/Rocky/Alma (dnf/yum)
sudo dnf check-update
sudo dnf upgrade
sudo dnf install nginx
sudo dnf remove nginx
sudo dnf autoremove
dnf list installed

# Universal: Snap & Flatpak
sudo snap install --classic certbot   # certbot uses classic confinement
flatpak install flathub org.app.Name

Business Note: Always test upgrades in a staging environment first. A production apt upgrade that pulls a broken kernel can cause extended downtime. Use canary deployments — upgrade 10% of servers, monitor for 24 hours, then proceed.

Package Management | apt | dnf

Q10 How do you check network connectivity and troubleshoot basic network issues?

# Check IP configuration
ip addr show
ifconfig -a          # Legacy, but still used

# Test connectivity
ping -c 4 google.com
ping -c 4 8.8.8.8    # Test without DNS dependency

# DNS resolution
nslookup example.com
dig example.com
host example.com

# Trace route
traceroute google.com
mtr google.com       # My favorite — combines ping + traceroute

# Check open ports
ss -tlnp             # Listening TCP ports
ss -tunap            # All connections
netstat -tlnp        # Legacy alternative

# Check firewall
sudo iptables -L -n
sudo ufw status

# HTTP reachability test (fetches headers only)
curl -I https://example.com
wget --spider https://example.com

Troubleshooting Flow: 1) Check IP config → 2) Ping gateway → 3) Ping external IP → 4) DNS resolution → 5) Check firewall → 6) Check application logs. This systematic approach saves hours vs random guessing.

Networking | Troubleshooting | Diagnostics

Q11 Explain the difference between absolute and relative paths with business context.

Absolute Path: Starts from root /. Always resolves to the same location regardless of current directory. Example: /etc/nginx/nginx.conf

Relative Path: Relative to current working directory. Example: ../logs/app.log (go up one level, then into logs).

Business Risk: Using relative paths in cron jobs or scripts run by different users can lead to catastrophic errors. A script that uses rm -rf ./temp/* run from the wrong directory could delete critical data. Always use absolute paths in production scripts and cron jobs.
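One way to guard against the rm -rf ./temp/* failure mode is to refuse anything that is not an absolute, existing directory. This is a minimal sketch; clean_dir is a hypothetical helper name, and it assumes a find that supports -mindepth and -delete (GNU and BSD find do).

```bash
# Delete the contents of a directory only when it is given as an absolute
# path and exists. Returns non-zero instead of guessing.
clean_dir() {
    local target="$1"
    case "$target" in
        /*) ;;   # absolute path: continue
        *)  echo "refusing relative path: '$target'" >&2; return 1 ;;
    esac
    if [ "$target" = "/" ] || [ ! -d "$target" ]; then
        echo "refusing to clean: '$target'" >&2
        return 1
    fi
    # -mindepth 1 keeps the directory itself and removes only what is inside.
    find "$target" -mindepth 1 -delete
}
```

In a cron job this would be called as clean_dir /opt/app/temp (an example path), so the result no longer depends on the directory cron happens to start in.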

Paths | Scripting Safety | Best Practice

Q12 How do you use grep, awk, and sed for log analysis? Give a real-world example.

# grep: Count lines containing " 500 " (quick check; can also match a 500-byte response size)
grep -c " 500 " /var/log/nginx/access.log

# awk: Extract IPs with most requests (top 10)
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10

# sed: Replace IP addresses for anonymization before sharing logs
sed 's/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/[REDACTED]/g' access.log

# Combined: Top 5 URLs returning 500 errors (default combined log format: $9 = status, $7 = path)
grep " 500 " access.log | awk '$9 == 500 {print $7}' | sort | uniq -c | sort -rn | head -5

Business Value: During an incident, quickly identifying which endpoint is failing helps the dev team focus their fix. This skill saved my team 45 minutes during a critical outage.
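The "which endpoint is failing" question can be wrapped in a reusable function. This sketch assumes the default nginx combined log format, where field 7 is the request path and field 9 is the status code; top_urls_by_status is an illustrative name.

```bash
# Print the most frequent request paths for one HTTP status code.
# Arguments: LOGFILE [STATUS=500] [LIMIT=5]
top_urls_by_status() {
    local logfile="$1" status="${2:-500}" limit="${3:-5}"
    # Matching on field 9 avoids false hits from a 500-byte response size.
    awk -v s="$status" '$9 == s { print $7 }' "$logfile" \
        | sort | uniq -c | sort -rn | head -n "$limit"
}
```

During an incident, top_urls_by_status /var/log/nginx/access.log 500 10 gives the dev team a ranked list to start from. With a custom log_format the field numbers will differ.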

Log Analysis | grep | awk | sed

Q13 What are soft links (symlinks) and hard links? When would you use each in production?

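# Soft link (symbolic link): pointer to a path; can cross filesystems
ln -s /opt/app/releases/v2.5 /opt/app/current
# If the target is deleted, the symlink breaks (dangling link)

# Hard link: additional name for the same inode (same filesystem only, not for directories)
ln /data/important.db /data/important.db.link
# Both names point to the same data; delete one and the data persists via the other.
# Not a backup: a change or corruption made through one name is visible through both.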

Production Use: Symlinks are used for zero-downtime deployments. Deploy new code to /opt/app/releases/v2.6, then update symlink /opt/app/current → v2.6. Nginx points to /opt/app/current. Rollback is instant — just point symlink back to v2.5.
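The symlink switch and rollback could look like the following sketch. switch_release is a hypothetical helper; the releases/ and current names follow the layout described above.

```bash
# Point APP_ROOT/current at APP_ROOT/releases/VERSION.
# -n treats an existing "current" symlink as a plain file instead of
# following it into the old release; -f replaces it.
switch_release() {
    local app_root="$1" version="$2"
    local target="${app_root}/releases/${version}"
    if [ ! -d "$target" ]; then
        echo "release not found: $target" >&2
        return 1
    fi
    ln -sfn "$target" "${app_root}/current"
}
```

Deploying is switch_release /opt/app v2.6 and rolling back is switch_release /opt/app v2.5. ln -sfn removes and recreates the link, which leaves a very short gap; where that matters, a common alternative is to create a temporary link and rename it over current with mv -T (GNU coreutils).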

Symlinks | Deployment | Zero-Downtime

Q14 How do you schedule tasks with cron? Share a business example.

# Crontab format: MIN HOUR DOM MON DOW COMMAND
# Edit crontab
crontab -e

# Examples:
# Daily database backup at 2 AM
0 2 * * * /usr/local/bin/backup-db.sh >> /var/log/backup.log 2>&1

# Every 5 minutes — health check
*/5 * * * * /opt/scripts/health-check.sh

# Every Monday at 3 AM — log rotation
0 3 * * 1 /usr/sbin/logrotate /etc/logrotate.conf

# List cron jobs
crontab -l

# System-wide cron
cat /etc/crontab
ls /etc/cron.d/
ls /etc/cron.daily/

Business Scenario: An e-commerce company runs nightly cron jobs to generate sales reports. If the cron fails silently (no error logging), the finance team misses data. Always redirect output to a log file and set up cron monitoring (e.g., Cronitor, Healthchecks.io) to alert on failures.
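To keep a cron job from failing silently, the job can be run through a small wrapper that records its output and exit status. This is a sketch under assumptions: run_logged is an illustrative name, and the alerting hook (Cronitor, Healthchecks.io, or similar) is left to the caller.

```bash
# Run a command, append its output and a timestamped START/END record to
# LOGFILE, and return the command's exit status so a monitor can react.
run_logged() {
    local logfile="$1"
    shift
    local status=0
    echo "$(date '+%Y-%m-%d %H:%M:%S') START $*" >> "$logfile"
    "$@" >> "$logfile" 2>&1 || status=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S') END exit=${status} $*" >> "$logfile"
    return "$status"
}
```

A script called from cron can then use run_logged /var/log/backup.log /usr/local/bin/backup-db.sh and notify the monitoring service only when the return status is zero, so a missing ping raises the alert.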

Cron | Automation | Scheduling

Q15 How do you check memory usage and identify processes consuming the most RAM?

# Overall memory
free -h
# Output: total  used  free  shared  buff/cache  available

# Top processes by memory
ps aux --sort=-%mem | head -15
top -o %MEM    # Interactive, press 'M' to sort by memory

# System-wide breakdown
cat /proc/meminfo
# Per-process, sorted by PSS (shared pages split proportionally); smem may need installing
smem -rs pss

# Check for memory leaks
watch -n 2 'ps aux --sort=-%mem | head -10'

Incident Example: A Java application had a memory leak — %MEM grew from 15% to 85% over 4 days. Using ps aux --sort=-%mem identified the PID, and pmap -x <PID> showed heap growth. The dev team fixed the leak, but the immediate fix was a nightly restart via cron until the patch deployed.

Memory | Performance | Troubleshooting

Q16 What is the purpose of /etc/fstab and how do you configure auto-mount at boot?

# /etc/fstab format:
# DEVICE    MOUNT_POINT  FS_TYPE  OPTIONS        DUMP  PASS
UUID=abc123 /data        ext4     defaults,noatime 0   2
//nas/backup /mnt/backup cifs    credentials=/etc/samba/creds,uid=1000,gid=1000 0 0

Business Risk: An incorrect fstab entry can prevent the system from booting. Always test with mount -a before rebooting. Use nofail option for non-critical mounts so the system boots even if that mount fails.

fstab | Mount | Boot

Q17 How do you redirect output and errors in shell scripts?

# Standard output to file (overwrite)
command > file.txt
# Append
command >> file.txt
# Standard error to file
command 2> error.log
# Both stdout and stderr to same file
command > all_output.log 2>&1
# Discard output
command > /dev/null 2>&1
# Separate files for stdout and stderr
command > output.log 2> error.log
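The order of redirections matters, which is a common interview follow-up. The sketch below uses two made-up helper functions, emit and demo_redirect_order, to show the difference.

```bash
# emit writes one line to stdout and one line to stderr.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Write two files into DIR to show why "> file 2>&1" and "2>&1 > file" differ.
demo_redirect_order() {
    local dir="$1"
    # stdout is sent to the file first, then stderr is pointed at the same file.
    emit > "$dir/both.log" 2>&1
    # stderr is pointed at the current stdout (e.g. the terminal) first; only
    # stdout is then redirected, so the file receives just the stdout line.
    emit 2>&1 > "$dir/stdout_only.log"
}
```

After demo_redirect_order /tmp, both.log holds two lines while stdout_only.log holds one, and "to stderr" from the second call appears on the terminal.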

Shell | Redirection | Scripting

Q18 Explain the Linux boot process step-by-step.

1. BIOS/UEFI: Firmware runs POST, selects boot device.
2. Boot Loader (GRUB2): Loads kernel image and initramfs into memory.
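3. Kernel: Decompresses itself, initializes core hardware, and unpacks the initramfs as a temporary root.
4. initramfs: Loads the drivers (storage, LVM, RAID, encryption) needed to find the real root filesystem, mounts it (read-only initially), and switches to it.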
5. systemd (PID 1): First userspace process, mounts filesystems, starts services.
6. Target/runlevel: Reaches multi-user.target or graphical.target.

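Troubleshooting: If a server won't boot, start from a live CD/USB, mount the installed root filesystem, and chroot into it. Check /var/log/boot.log and the installed system's journal (for example journalctl -D /mnt/var/log/journal, assuming the root is mounted at /mnt and persistent journaling is enabled). A plain journalctl -b on the live system shows the live session, not the failed boot.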

Boot Process | Troubleshooting | GRUB

Q19 How do you use tar to create and extract archives? Include compression options.

# Create tar.gz (gzip compressed)
tar -czvf archive.tar.gz /path/to/directory

# Create tar.bz2 (bzip2 — better compression, slower)
tar -cjvf archive.tar.bz2 /path/to/directory

# Create tar.xz (xz — best compression)
tar -cJvf archive.tar.xz /path/to/directory

# Extract
tar -xzvf archive.tar.gz
tar -xjvf archive.tar.bz2
tar -xJvf archive.tar.xz

# List contents without extracting
tar -tzvf archive.tar.gz

# Extract to specific directory
tar -xzvf archive.tar.gz -C /target/path/

Business Use: Pre-deployment backups. Before deploying, tar -czvf /backup/pre_deploy_$(date +%Y%m%d_%H%M%S).tar.gz /opt/app. This creates a timestamped backup for instant rollback.
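A backup is only useful for rollback if the archive is readable, so the one-liner above could be extended with a verification step. This is a sketch; backup_dir is an illustrative name and the pre_deploy_ prefix follows the example above.

```bash
# Create a timestamped tar.gz of SRC inside DEST_DIR, check that the archive
# can be listed, and print its path.
backup_dir() {
    local src="$1" dest_dir="$2"
    local stamp archive
    stamp="$(date +%Y%m%d_%H%M%S)"
    archive="${dest_dir}/pre_deploy_${stamp}.tar.gz"

    # -C switches to the parent directory so the archive stores a relative
    # path rather than the full absolute one.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1

    # Listing the contents is a cheap integrity check before relying on it.
    tar -tzf "$archive" > /dev/null || return 1
    echo "$archive"
}
```

A deploy script could call archive="$(backup_dir /opt/app /backup)" and abort the deployment if the call fails.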

tar | Compression | Backup

Q20 What is SSH and how do you configure key-based authentication?

# Generate SSH key pair (Ed25519 — modern, secure)
ssh-keygen -t ed25519 -C "admin@company.com"

# Copy public key to server
ssh-copy-id user@server.example.com

# Manual method
cat ~/.ssh/id_ed25519.pub | ssh user@server "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"

# Secure SSH config (/etc/ssh/sshd_config)
PermitRootLogin prohibit-password
PasswordAuthentication no
PubkeyAuthentication yes
MaxAuthTries 3

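Security: Disable password authentication entirely; all it takes is one weak password for a breach. Key-based auth removes password guessing as an attack path, and fail2ban cuts down the remaining noise from scanners. Validate changes with sshd -t and keep an existing session open while restarting sshd, so a typo cannot lock you out.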

SSH | Security | Authentication

Q21 How do you find and kill a process? Explain signals.

# Find process
ps aux | grep nginx
pgrep nginx
pidof nginx

# Kill by PID
kill 1234        # SIGTERM (graceful, default)
kill -15 1234    # Same as above
kill -9 1234     # SIGKILL (force — last resort!)
kill -HUP 1234   # SIGHUP (reload config)

# Kill by name
pkill nginx
killall nginx

# Kill all processes by user
pkill -u username

Best Practice: Always try SIGTERM first — it lets the process clean up (close files, finish transactions). SIGKILL is like pulling the power cord; use only when SIGTERM fails. For databases, SIGKILL can corrupt data.
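The "SIGTERM first, SIGKILL only as a last resort" rule can be written as a small function. This is a minimal sketch; stop_gracefully and its 10-second default are illustrative choices, not a standard tool.

```bash
# Send SIGTERM, wait up to TIMEOUT seconds for the process to exit, and fall
# back to SIGKILL only if it is still running.
stop_gracefully() {
    local pid="$1" timeout="${2:-10}" waited=0
    kill -TERM "$pid" 2>/dev/null || return 0   # already gone
    # kill -0 sends no signal; it only tests whether the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "PID $pid still running after ${timeout}s, sending SIGKILL" >&2
            kill -KILL "$pid" 2>/dev/null || true
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

For services managed by systemd, systemctl stop already follows the same pattern, using the unit's TimeoutStopSec setting, so a helper like this is mainly for processes started outside systemd.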

Process Management | Signals | kill

Q22 Explain the difference between apt, apt-get, and aptitude.

apt: Modern, user-friendly frontend (Ubuntu 16.04+). Combines most-used apt-get/apt-cache commands. Has progress bars, color output. Best for interactive use.
apt-get: Lower-level, stable CLI. Best for scripts — output format is consistent across versions.
aptitude: Full-featured package manager with ncurses GUI and advanced dependency resolution. Useful for complex dependency conflicts.

Scripting Rule: Always use apt-get in scripts. apt warns: "WARNING: apt does not have a stable CLI interface. Use with caution in scripts."

Package Management | apt | Scripting

Q23 How do you configure and use sudo? What is the sudoers file?

# Edit sudoers safely (ALWAYS use visudo)
sudo visudo

# Grant user full sudo access
john_doe ALL=(ALL:ALL) ALL

# Grant group sudo access
%developers ALL=(ALL:ALL) ALL

# Passwordless sudo for specific command
%devops ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart nginx

# View your sudo privileges
sudo -l

Security: Never edit /etc/sudoers directly — a syntax error can lock all users out of sudo. Always use visudo which validates syntax before saving. Grant least privilege — only the commands needed.

sudo | Security | Privilege Management

Q24 How do you manage environment variables? Differentiate between session, user, and system-wide.

# Session-level (current shell only)
export APP_ENV=production
echo $APP_ENV

# User-level (~/.bashrc, ~/.bash_profile, ~/.profile)
echo 'export JAVA_HOME=/usr/lib/jvm/java-17' >> ~/.bashrc
source ~/.bashrc

# System-wide (/etc/environment, /etc/profile.d/)
echo 'APP_ENV=production' | sudo tee -a /etc/environment

# For systemd services
# In service file: Environment="APP_ENV=production"
# Or use EnvironmentFile=/etc/app/config.env

Business Scenario: A production incident occurred when a developer hardcoded API keys in code. The fix: store secrets in environment variables loaded from a secure vault (HashiCorp Vault) at service startup. Never hardcode credentials.

Environment Variables | Configuration | Security

🧪 Beginner Hands-On Lab Scenario

Situation: You're a junior admin. The production web server's disk is 92% full. The senior admin is on vacation. You need to free up space immediately without breaking anything.
Your Task: 1) Identify what's consuming space using du -sh /* 2>/dev/null | sort -rh | head -10. 2) Find that /var/log/nginx/access.log is 28GB. 3) Don't delete — truncate it: sudo truncate -s 0 /var/log/nginx/access.log. 4) Set up logrotate to prevent recurrence. 5) Document the incident for the team.
Business Impact: You prevented a potential outage that could have cost $15,000/hour in lost sales.

💻 Code Exercise — Beginner

Write a script that checks disk usage of / and sends an email alert if usage exceeds 80%.

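#!/bin/bash
THRESHOLD=80
# -P keeps each filesystem on one line; column 5 is the usage percentage
USAGE=$(df -P / | awk 'NR==2 {gsub(/%/, "", $5); print $5}')
if [ "$USAGE" -gt "$THRESHOLD" ]; then
    # mail needs a configured MTA (for example the mailutils or mailx package)
    echo "Disk usage is at ${USAGE}% on $(hostname)" | mail -s "Disk Alert" admin@company.com
fi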

Q25 What are runlevels/targets in systemd? How do you switch between them?

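# View default target (applied at boot)
systemctl get-default
# View currently active targets
systemctl list-units --type=target --state=active
# Switch targets
systemctl isolate multi-user.target    # CLI mode
systemctl isolate graphical.target      # GUI mode
# Set default
systemctl set-default multi-user.target
# Rescue mode (single user, basic services); "systemctl emergency" gives the most minimal shell
systemctl rescue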

systemd | Targets | Runlevels

Q26 How do you set up a basic firewall with ufw (Uncomplicated Firewall)?

sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp    # SSH
sudo ufw allow 80/tcp    # HTTP
sudo ufw allow 443/tcp   # HTTPS
sudo ufw enable
sudo ufw status verbose

Firewall | UFW | Security

Q27 Explain the difference between TCP and UDP. When is UDP preferred?

TCP: Connection-oriented, guaranteed delivery, ordered, flow control. Used for HTTP, SSH, databases.
UDP: Connectionless, no delivery guarantee, no ordering, lower latency. Used for DNS, video streaming, VoIP, gaming.
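Business: Live video, VoIP, and gaming traffic prefer UDP because a late packet is useless and waiting for a TCP retransmission causes stalls. QUIC (the transport under HTTP/3) also runs over UDP and adds its own reliability, which is why many CDNs now serve web traffic over UDP.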

Networking | TCP/UDP | Protocols

Q28 How do you create and manage LVM (Logical Volume Manager) volumes?

# Create PV, VG, LV
pvcreate /dev/sdb
vgcreate vg_data /dev/sdb
lvcreate -L 50G -n lv_appdata vg_data
mkfs.ext4 /dev/vg_data/lv_appdata
mount /dev/vg_data/lv_appdata /data
# Extend LV (online!)
lvextend -L +20G /dev/vg_data/lv_appdata
resize2fs /dev/vg_data/lv_appdata

Business Value: LVM allows extending volumes without unmounting — critical for databases that can't go offline. Saves hours of planned downtime.

LVM | Storage | Volume Management

Q29 What is the purpose of /etc/hosts file? How does it relate to DNS?

/etc/hosts provides local hostname-to-IP mapping. With the usual "hosts: files dns" order in /etc/nsswitch.conf it is checked before DNS. Used for local development, overriding DNS, or blocking domains (point to 127.0.0.1). Format: 192.168.1.100 app.internal myapp. In production, use it sparingly and keep DNS as the single source of truth, because per-host entries drift and are easy to forget.

DNS | hosts | Networking

Q30 How do you use rsync for efficient file synchronization?

# Local sync
rsync -avz /source/ /destination/
# Remote sync (push)
rsync -avz /local/dir/ user@remote:/remote/dir/
# Remote sync (pull)
rsync -avz user@remote:/remote/dir/ /local/dir/
# Delete files at dest that don't exist at source
rsync -avz --delete /source/ /destination/
# Dry run (test first!)
rsync -avz --dry-run /source/ /destination/

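Business: rsync uses a delta-transfer algorithm: when an older copy of a file already exists at the destination, only the changed blocks cross the network. For a 10GB database dump where only 50MB changed, rsync transfers roughly 50MB plus checksum overhead, while scp would transfer all 10GB. That's about 200x bandwidth savings. (Delta transfer applies to remote syncs; local copies use whole-file mode by default, and compressed dumps rarely share blocks between runs.)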

rsync | Sync | Backup

Q31 How do you check CPU information and load average?

lscpu
cat /proc/cpuinfo | grep "model name" | uniq
uptime   # load average: 1 min, 5 min, 15 min
top
htop

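Rule of thumb: a load average consistently above the number of CPU cores means the system is overloaded. For a 4-core server, a load of 4.0 means every core is busy; a load of 8.0 means about as many tasks are waiting as running. On Linux the figure also counts tasks blocked on disk I/O (uninterruptible sleep), so a high load with idle CPUs points to storage, not compute.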
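The comparison of load average against core count is a simple division, sketched below. load_ratio and current_load_ratio are illustrative names; the second function assumes Linux (/proc/loadavg and nproc).

```bash
# Divide a load average by the number of CPU cores.
# A result above 1.00 means more runnable or I/O-blocked tasks than cores.
load_ratio() {
    local load="$1" cores="$2"
    awk -v l="$load" -v c="$cores" 'BEGIN { printf "%.2f\n", l / c }'
}

# Same calculation for the current 1-minute load of this host.
current_load_ratio() {
    local load
    read -r load _ < /proc/loadavg
    load_ratio "$load" "$(nproc)"
}
```

With the figures from the text, load_ratio 4.0 4 prints 1.00 and load_ratio 8.0 4 prints 2.00.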

CPU | Load | Monitoring

Q32 What is a shell? Compare bash, zsh, and sh.

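sh: The POSIX shell interface. On modern Linux /bin/sh is usually a link to dash (Debian/Ubuntu) or bash in POSIX mode (RHEL family). Minimal features, highly portable.
bash: Bourne Again Shell, the default on most Linux distributions. Rich features, arrays, command history.
zsh: A separate shell, largely compatible with Bourne-style syntax, with stronger autocompletion, theming (oh-my-zsh), and spell correction. Popular for dev workstations.
For scripts: Use #!/bin/bash when you need bash features, or #!/bin/sh with strictly POSIX syntax for maximum portability across Unix systems.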

Shell | bash | Scripting

Q33 How do you configure NTP for time synchronization?

# Using timedatectl (modern)
sudo timedatectl set-ntp true
timedatectl status
# Using chrony
sudo apt install chrony
sudo systemctl enable --now chrony   # the unit is named chronyd on RHEL-family systems
chronyc sources -v

Business Critical: Time sync is essential for distributed systems, database replication, and security (Kerberos requires <5 min skew). Log timestamps must be accurate for forensic analysis.

NTP | Time Sync | chrony

Q34 How do you use journalctl to query systemd logs?

journalctl -u nginx --since "2026-07-01" --until "2026-07-02"
journalctl -p err -b   # Errors from current boot
journalctl -f          # Follow (like tail -f)
journalctl --disk-usage

Logging | journald | systemd

Q35 Explain file descriptors (stdin, stdout, stderr) with examples.

FD 0 (stdin): Input stream.
FD 1 (stdout): Normal output.
FD 2 (stderr): Error output.
Redirection: command 1>out.txt 2>err.txt, or combined with command >all.txt 2>&1 (bash also accepts the shorthand command &>all.txt). Understanding FDs is crucial for debugging cron jobs and pipeline scripts.

File Descriptors | I/O | Shell

Q36 How do you find the IP address of a server?

ip addr show
hostname -I
curl ifconfig.me   # Public IP
ip -4 route get 1.1.1.1 | awk '{for (i = 1; i < NF; i++) if ($i == "src") print $(i + 1)}'   # Source IP used for outbound traffic

IP | Networking | Commands

Q37 What is swap space? When should you use it?

Swap is disk space used as virtual memory when RAM is full. Modern recommendation: For servers with >16GB RAM, 2-4GB swap is sufficient. Swap is a safety net, not a performance solution. If a server is swapping heavily, add RAM or optimize the application. swapon --show to check.

Swap | Memory | Performance

Q38 How do you change the hostname of a Linux server?

sudo hostnamectl set-hostname new-name.company.com
# Also update /etc/hosts
echo "127.0.1.1 new-name.company.com new-name" | sudo tee -a /etc/hosts

Hostname | Configuration

Q39 How do you check which ports are listening on a server?

ss -tlnp   # TCP listening
ss -ulnp   # UDP listening
lsof -i :80   # What's using port 80?
netstat -tlnp  # Legacy

Ports | Networking | ss

Q40 Explain the difference between a shell variable and an environment variable.

Shell variable: Only available in the current shell session. MY_VAR=hello
Environment variable: Passed to child processes. export MY_VAR=hello
Use export to promote a shell variable to an environment variable. Child processes inherit environment variables but not shell variables.
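The difference is easy to observe by asking a child process what it can see. In this sketch, child_sees and demo_export are made-up helper names; printenv runs as a separate process, so it only finds exported variables.

```bash
# Report what a child process sees for the variable named in $1.
child_sees() {
    printenv "$1" || echo "<not inherited>"
}

# Runs in a subshell (parentheses) so the caller's variables stay untouched.
demo_export() (
    unset MY_VAR
    MY_VAR=hello          # shell variable only
    child_sees MY_VAR     # prints: <not inherited>
    export MY_VAR         # promote to environment variable
    child_sees MY_VAR     # prints: hello
)
```

This is also why a variable set in an interactive shell without export is invisible to scripts started from that shell.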

Variables | Shell | Environment

Q41 How do you use scp to securely copy files between servers?

scp file.txt user@remote:/path/
scp -r /local/dir user@remote:/remote/dir/
scp user@remote:/remote/file.txt /local/path/
# Use port 2222
scp -P 2222 file.txt user@remote:/path/

scp | File Transfer | SSH

Q42 What is /dev/null and what is it used for?

/dev/null is a special device file that discards all data written to it. Used to suppress output: command > /dev/null 2>&1. Also used as an empty input: command < /dev/null. Essential for clean cron job output.

/dev/null | I/O | Shell

Q43 How do you set up a basic NFS share?

# Server
sudo apt install nfs-kernel-server
echo "/data 192.168.1.0/24(rw,sync,no_subtree_check)" | sudo tee -a /etc/exports
sudo exportfs -a
# Client
sudo mount -t nfs server:/data /mnt/nfs

NFS | File Sharing | Network

Q44 Explain the purpose of the /proc filesystem.

/proc is a virtual filesystem exposing kernel and process information. /proc/cpuinfo, /proc/meminfo, /proc/loadavg, /proc/PID/ for per-process details. Not a real disk — it's a window into the kernel's data structures. Invaluable for performance analysis and debugging.

/proc | Kernel | Virtual FS

Q45 How do you use the find command to locate files by name, size, and modification time?

find / -name "*.log" -type f
find /var -size +100M
find /tmp -type f -mtime +7 -delete   # Delete files not modified for more than 7 days
find / -user john_doe -type f
find . -name "*.conf" -exec grep -l "error" {} \;

find | Search | File Management

Q46 What is the difference between systemd and SysV init?

SysV init: Sequential boot, shell scripts in /etc/init.d/, slow, limited dependency management.
systemd: Parallel boot, socket activation, cgroups integration, unified logging (journald), faster boot times. systemd is the modern standard on all major distributions. Critics cite complexity, but it solves real enterprise problems.

systemd | init | Boot

Q47 How do you compress and decompress files using gzip, bzip2, and xz?

gzip file.txt        # Produces file.txt.gz
gunzip file.txt.gz
bzip2 file.txt       # Better compression
bunzip2 file.txt.bz2
xz file.txt          # Best compression
unxz file.txt.xz

Compression | gzip | bzip2 | xz

Q48 How do you check the status of a service and its logs?

systemctl status nginx
journalctl -u nginx -n 50 --no-pager
tail -f /var/log/nginx/error.log

Services | Logs | Monitoring

Q49 What is the shebang (#!) line in scripts?

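The shebang tells the kernel which interpreter to run the script with: #!/bin/bash, #!/usr/bin/env python3, #!/bin/sh. Without it, a directly executed script is interpreted by whatever the calling shell falls back to (often /bin/sh, or the calling shell itself), so behavior can differ between users and between cron and an interactive login. Always include it for portability and clarity.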

Shebang | Scripting | Best Practice

Q50 How do you create a systemd service file for a custom application?

# /etc/systemd/system/myapp.service
[Unit]
Description=My Custom Application
After=network.target
[Service]
Type=simple
User=myapp
WorkingDirectory=/opt/myapp
ExecStart=/opt/myapp/start.sh
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl enable --now myapp

systemd | Service | Custom App

Q51 Explain the PATH variable and how to modify it.

echo $PATH
# Add to PATH (temporary)
export PATH=$PATH:/opt/custom/bin
# Permanent (user)
echo 'export PATH=$PATH:/opt/custom/bin' >> ~/.bashrc
# System-wide
echo 'export PATH=$PATH:/opt/custom/bin' | sudo tee /etc/profile.d/custom.sh

PATH | Environment | Configuration

Q52 How do you monitor real-time system performance?

htop        # Interactive process viewer
iotop       # Disk I/O by process
iftop       # Network bandwidth
nmon        # All-in-one performance monitor
glances     # Terminal dashboard; optional web UI with glances -w
dstat       # Versatile resource statistics (unmaintained; pcp-dstat replaces it on newer distros)

Monitoring | Performance | Tools

Q53 What are inodes and how do you check inode usage?

df -i
# Find directories with many small files
for dir in /*; do echo "$(find "$dir" -type f 2>/dev/null | wc -l) $dir"; done | sort -rn | head -10

Running out of inodes means you can't create new files even if disk space is available. Common culprit: session files in /tmp or cache directories with millions of tiny files.

Inodes | File System | Troubleshooting

Q54 How do you use the man command effectively?

man ls        # Manual page
man -k keyword  # Search (apropos)
man 5 crontab   # Section 5 (file formats)
# Sections: 1=User commands, 5=File formats, 8=Admin commands
whatis ls      # One-line description

man | Documentation | Help

Q55 What steps do you take when a user reports "the server is slow"?

Systematic approach: 1) Check load average (uptime). 2) Check memory (free -h). 3) Check disk I/O (iostat -x 1). 4) Check for swap usage. 5) Identify top CPU/memory consumers (top). 6) Check network (ping, iperf). 7) Review recent changes. 8) Check application logs for errors. This methodical approach impresses interviewers — it shows you don't jump to conclusions.

Troubleshooting | Performance | Methodology

Intermediate Level — Linux Server Administration

2–5 Years Experience

Q56 How do you troubleshoot a server that runs out of disk space overnight?

Business Context: This is the #1 overnight alert for production servers. A 2 AM disk-full alert means the on-call engineer must act fast.

# 1. Quick assessment
df -h
# 2. Find what grew recently
find / -type f -mtime -1 -size +100M -exec ls -lh {} \; 2>/dev/null
# 3. Check largest directories
du -sh /* 2>/dev/null | sort -rh | head -10
# 4. Common culprits:
# - /var/log (unrotated logs)
# - /tmp (session files, uploads)
# - /var/lib/docker (container images)
# - /home (user uploads)
# - Core dump files
find / -name "core.*" -type f -size +1G 2>/dev/null

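Immediate Fix: Truncate oversized logs in place (truncate -s 0 <file>) rather than deleting them, clean the package cache (apt clean), and remove unused Docker images (docker system prune -a, which deletes every image not used by a container). If df still reports a full disk after cleanup, look for deleted files that a running process still holds open (lsof +L1). Long-term: Set up logrotate, implement disk monitoring alerts at 75% and 85%, and create a runbook for the on-call team.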
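The 75% and 85% alert levels can be prototyped before a full monitoring stack is in place. A rough sketch, assuming the POSIX layout of df -P (Capacity in column 5, mount point in column 6); disk_alerts is a hypothetical name and the output format is arbitrary.

```bash
# Flag filesystems at or above warning/critical usage levels.
# stdin: output of `df -P`; $1: warning percent; $2: critical percent.
disk_alerts() {
    awk -v w="${1:-75}" -v c="${2:-85}" '
        NR == 1 { next }                  # skip the header row
        {
            pct = $5                      # Capacity column, e.g. "90%"
            sub(/%/, "", pct)
            if (pct !~ /^[0-9]+$/) next   # ignore rows without a numeric value
            if (pct + 0 >= c + 0)      print "CRITICAL", $6, pct "%"
            else if (pct + 0 >= w + 0) print "WARNING", $6, pct "%"
        }'
}

# Typical use from cron, mailing only when something is reported:
#   out=$(df -P | disk_alerts 75 85); [ -n "$out" ] && echo "$out" | mail -s "Disk alert" admin@company.com
```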

Disk SpaceIncident ResponseTroubleshooting

Q57 Explain how to configure and use iptables for a web server. Provide a production-ready ruleset.

#!/bin/bash
# Production iptables for web server (IPv4 only; mirror the rules with ip6tables)
iptables -F
# Allow loopback
iptables -A INPUT -i lo -j ACCEPT
# Allow established connections (conntrack replaces the deprecated state match)
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Allow SSH (rate limited: the 4th new connection from one IP within 60 seconds is dropped)
iptables -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW -m recent --name ssh --set
iptables -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW -m recent --name ssh --update --seconds 60 --hitcount 4 -j DROP
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
# Allow HTTP/HTTPS
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
# Set default policies last so the accept rules already exist when DROP takes effect
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT
# Save rules (path used by the iptables-persistent package on Debian/Ubuntu)
iptables-save > /etc/iptables/rules.v4

Business Impact: This default-deny ruleset exposes only SSH, HTTP, and HTTPS, which removes most of the attack surface that automated scanners probe. Rate-limiting SSH slows brute-force attempts. For PCI-DSS compliance, you must document every open port with a business justification.

iptablesFirewallSecurityProduction

Q58 How do you set up and manage a MySQL/MariaDB database on Linux?

# Install
sudo apt install mariadb-server
sudo mysql_secure_installation
# Create database and user
sudo mysql -e "CREATE DATABASE appdb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
sudo mysql -e "CREATE USER 'appuser'@'localhost' IDENTIFIED BY 'strong_password';"
sudo mysql -e "GRANT ALL PRIVILEGES ON appdb.* TO 'appuser'@'localhost';"
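# FLUSH PRIVILEGES is not needed after GRANT; it only matters when grant tables are edited directly
# Backup (--single-transaction takes a consistent InnoDB snapshot without locking tables)
mysqldump --single-transaction -u appuser -p appdb | gzip > /backup/appdb_$(date +%Y%m%d).sql.gz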
# Restore
gunzip < backup.sql.gz | mysql -u appuser -p appdb

Performance Tuning: Adjust innodb_buffer_pool_size to 70-80% of available RAM for dedicated DB servers. Use mysqltuner for recommendations.

MySQLDatabaseMariaDBBackup

Q59 How do you configure SSL/TLS certificates with Let's Encrypt and automate renewal?

# Install certbot
sudo apt install certbot python3-certbot-nginx
# Obtain certificate
sudo certbot --nginx -d example.com -d www.example.com
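# Test renewal without changing any certificate (real renewals run from the timer or cron job the package installs)
sudo certbot renew --dry-run
# Check renewal timer
systemctl status certbot.timer
# Force an immediate real renewal (counts against Let's Encrypt rate limits; this is not a test)
sudo certbot renew --force-renewal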

Business: Let's Encrypt saves $200-500/year per domain vs paid SSL. For a company with 50 domains, that's $10,000-$25,000 annual savings. The 90-day expiry with auto-renewal is actually a security feature — compromised certs expire quickly.
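Automatic renewal can still fail silently, so it is worth alerting on the days remaining before expiry. A sketch assuming GNU date (for the -d option); days_until_expiry is a hypothetical helper, and example.com in the usage comment is the same placeholder domain used in the commands.

```bash
# Whole days between a reference time and a certificate's notAfter date.
# $1: expiry date as printed by openssl, e.g. "Mar 31 12:00:00 2026 GMT"
# $2: reference time in epoch seconds (defaults to now)
days_until_expiry() (
    not_after=$1
    now_epoch=${2:-$(date +%s)}
    # GNU date parses the openssl date format directly.
    expiry_epoch=$(date -u -d "$not_after" +%s) || exit 1
    echo $(( (expiry_epoch - now_epoch) / 86400 ))
)

# Typical use against a live endpoint:
#   end=$(echo | openssl s_client -connect example.com:443 -servername example.com 2>/dev/null \
#         | openssl x509 -noout -enddate | cut -d= -f2)
#   [ "$(days_until_expiry "$end")" -lt 21 ] && echo "Certificate renewal is overdue"
```

Since certbot normally renews around 30 days before expiry, a value well below that (21 is an arbitrary choice here) suggests the renewal job is broken.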

SSL/TLSLet's EncryptSecurityAutomation

Q60 Explain load balancing concepts and how to configure HAProxy.

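# /etc/haproxy/haproxy.cfg
defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s

frontend web_front
    bind *:80
    default_backend web_back

backend web_back
    balance roundrobin
    option httpchk GET /health
    server web1 192.168.1.10:80 check
    server web2 192.168.1.11:80 check
    # A "backup" server only receives traffic when web1 and web2 are both down
    server web3 192.168.1.12:80 check backup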

Algorithms: Round-robin (equal distribution), leastconn (sends to server with fewest connections), source (session persistence by IP hash). Business: HAProxy enables horizontal scaling — add more servers as traffic grows without changing application code.
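To make the difference between round-robin and source hashing concrete, here is a toy model of the selection logic. It is not HAProxy's implementation: the function names are made up, and cksum stands in for HAProxy's internal hash purely for illustration.

```bash
# Round-robin: request number N (0-based) goes to server N mod count.
# $1: request number; remaining arguments: server names.
pick_round_robin() (
    index=$1; shift
    shift $(( index % $# ))               # skip ahead to the chosen server
    printf '%s\n' "$1"
)

# Source hashing: hash the client IP so one client always lands on the same server.
# $1: client IP; remaining arguments: server names.
pick_by_source() (
    ip=$1; shift
    hash=$(printf '%s' "$ip" | cksum | cut -d' ' -f1)
    shift $(( hash % $# ))
    printf '%s\n' "$1"
)
```

Calling pick_round_robin with request numbers 0, 1, 2, 3 over three servers cycles through them and wraps around, while pick_by_source returns the same server for a given IP until the server list changes, which is why adding or removing a server reshuffles sessions under plain source hashing.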

Load BalancingHAProxyHigh Availability

Q61 How do you use Ansible for configuration management? Share a playbook example.

# playbook.yml
- hosts: webservers
  become: true
  vars:
    nginx_version: "1.24.0"
  tasks:
    - name: Install nginx
      apt:
        # The trailing * matches the distro's package revision suffix
        name: "nginx={{ nginx_version }}*"
        state: present
        update_cache: true
    - name: Deploy config
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: restart nginx
  handlers:
    - name: restart nginx
      systemd:
        name: nginx
        state: restarted

Business: Ansible eliminates configuration drift across hundreds of servers. A security patch that takes 5 minutes per server by hand becomes a single playbook run: hosts are patched in parallel batches (sized by the forks setting), so total time grows far more slowly than server count. ROI is immediate for teams managing 10+ servers.

AnsibleAutomationConfiguration Management

Q62 How do you perform a MySQL database migration with zero downtime?

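Strategy: 1) Set up replication from old master to new master. 2) Let replication catch up. 3) Stop writes to old master (SET GLOBAL read_only = ON); this brief read-only window, usually seconds, is the only user-visible impact. 4) Verify replication is fully synced (Seconds_Behind_Master is 0 and binlog/GTID positions match). 5) Promote new master. 6) Update application connection strings. 7) Decommission old master after 48 hours of monitoring.
Tools: For schema changes made as part of the migration, use pt-online-schema-change from Percona Toolkit or gh-ost to avoid long table locks. For large tables (100M+ rows), a blocking ALTER TABLE is rarely acceptable, so use one of these tools or the engine's native online DDL where it supports the change.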

MySQLMigrationZero DowntimeReplication

Q63 Explain the concept of Linux namespaces and cgroups. How do they enable containerization?

Namespaces: Isolate what a process can SEE. PID namespace (isolated process tree), NET namespace (isolated network stack), MNT namespace (isolated filesystem mounts), UTS namespace (isolated hostname), IPC namespace (isolated shared memory and message queues), USER namespace (isolated UID/GID mappings), plus the newer cgroup and time namespaces.
cgroups: Limit what a process can USE. CPU shares, memory limits, block I/O throttling, process counts, network priority.
Together: Docker/LXC use namespaces for isolation and cgroups for resource control. Without these kernel features, containers as we know them wouldn't exist.

NamespacescgroupsContainersDocker

🧪 Intermediate Hands-On Lab Scenario

Situation: Your company's e-commerce site is experiencing intermittent 502 errors. Users complain orders are failing. The stack: Nginx → PHP-FPM → MySQL. You have 15 minutes to diagnose before the VP of Engineering escalates.
Your Task: 1) Check Nginx error log: tail -f /var/log/nginx/error.log — see "upstream timed out". 2) Check PHP-FPM status: systemctl status php-fpm — service is running but slow. 3) Check PHP-FPM pool: ss -s shows many connections. 4) Check slow MySQL queries: SHOW FULL PROCESSLIST; — find a query taking 30+ seconds. 5) Kill the blocking query, increase PHP-FPM pm.max_children, and add index to the slow query's table. 6) Document the root cause for the post-mortem.
Result: You resolved the issue in 12 minutes, saved ~$8,000 in potential lost orders, and earned the team's trust.

💻 Code Exercise — Intermediate

Write a Bash script that monitors a process and restarts it if it exceeds 80% CPU for 10 consecutive seconds.

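#!/bin/bash
PROCESS_NAME="myapp"   # also assumed to be the systemd unit name
THRESHOLD=80           # percent of one core
COUNT=0
while true; do
    PID=$(pgrep -o -x "$PROCESS_NAME")
    if [ -z "$PID" ]; then
        COUNT=0; sleep 1; continue
    fi
    # ps reports %cpu averaged over the process lifetime, so sample with top instead:
    # the second batch iteration covers only the last one-second interval.
    CPU=$(LC_ALL=C top -b -n 2 -d 1 -p "$PID" | awk -v pid="$PID" '$1 == pid {cpu = $9} END {print int(cpu)}')
    if [ "${CPU:-0}" -gt "$THRESHOLD" ]; then
        COUNT=$((COUNT+1))
    else
        COUNT=0
    fi
    if [ "$COUNT" -ge 10 ]; then
        echo "$(date): Restarting $PROCESS_NAME (CPU: ${CPU}%)" >> /var/log/watchdog.log
        systemctl restart "$PROCESS_NAME"
        COUNT=0
    fi
    # No extra sleep: each top call already takes about one second
done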

Q64 How do you configure logrotate for application logs?

# /etc/logrotate.d/myapp
/var/log/myapp/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    # Run postrotate once for all matched logs instead of once per file
    sharedscripts
    postrotate
        systemctl reload myapp > /dev/null 2>&1 || true
    endscript
}

logrotateLog ManagementAutomation

Q65 How do you secure SSH with fail2ban?

sudo apt install fail2ban
# /etc/fail2ban/jail.local
[sshd]
enabled = true
maxretry = 3
bantime = 3600
sudo systemctl enable --now fail2ban
fail2ban-client status sshd

fail2banSSHSecurity

Q66 Explain the difference between RAID levels and when to use each.

RAID 0: Striping, no redundancy, max performance. Use: Temporary data, cache.
RAID 1: Mirroring, 50% usable capacity. Use: OS drives, small databases.
RAID 5: Striping with distributed parity, needs 3+ disks, tolerates 1 failure. Use: General purpose storage.
RAID 6: Like RAID 5 but with double parity; needs 4+ disks, tolerates 2 failures. Use: Large arrays (8+ disks).
RAID 10: Mirroring + striping, best performance + redundancy. Use: High-performance databases.
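The capacity trade-offs follow directly from how each level stores redundancy. A small calculator sketch, assuming equal-sized disks; raid_usable is a hypothetical helper and the unit of the size argument is whatever the caller passes in (GB here).

```bash
# Usable capacity of an array built from equal-sized disks.
# $1: RAID level (0, 1, 5, 6, 10); $2: number of disks; $3: size of one disk.
raid_usable() (
    level=$1 disks=$2 size=$3
    case "$level" in
        0)  min=2; usable=$(( disks * size )) ;;          # no redundancy
        1)  min=2; usable=$size ;;                        # every disk holds the same data
        5)  min=3; usable=$(( (disks - 1) * size )) ;;    # one disk's worth of parity
        6)  min=4; usable=$(( (disks - 2) * size )) ;;    # two disks' worth of parity
        10) min=4; usable=$(( disks / 2 * size )) ;;      # mirrored pairs, then striped
        *)  exit 1 ;;
    esac
    [ "$disks" -ge "$min" ] || exit 1                     # too few disks for this level
    echo "$usable"
)
```

For example, four 2000 GB disks give 6000 GB in RAID 5 but only 4000 GB in RAID 10; the extra 2000 GB is what RAID 10 spends to gain faster writes and quicker rebuilds.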

RAIDStoragePerformance

Q67 How do you use tcpdump to analyze network traffic?

tcpdump -i eth0 port 80 -nn -A
tcpdump -i any host 192.168.1.100 -w capture.pcap
tcpdump -r capture.pcap -nn -A | grep "HTTP/1.1 500"

tcpdumpNetwork AnalysisTroubleshooting

Q68 How do you configure a reverse proxy with Nginx?

server {
    listen 80;
    server_name app.example.com;
    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

NginxReverse ProxyWeb Server

Q69 How do you perform a kernel upgrade without rebooting?

A full kernel version upgrade still needs a reboot, but security fixes can be applied to the running kernel with live patching: Ksplice (Oracle), KernelCare (CloudLinux), Livepatch (Canonical/Ubuntu Pro), or kpatch (Red Hat). Business: For servers requiring 99.999% uptime, live patching eliminates 4-12 planned reboots per year, each requiring maintenance windows and potential service disruption.

KernelLive PatchingUptime

Q70 Explain Docker networking modes: bridge, host, overlay, macvlan.

Bridge: Default, isolated network with NAT. Containers communicate via docker0 bridge.
Host: Container shares host's network stack directly. Best performance, no isolation.
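Overlay: Multi-host networking for Docker Swarm. Uses VXLAN tunneling. Kubernetes does not use Docker's overlay driver; it relies on CNI plugins, some of which also use VXLAN.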
Macvlan: Container gets its own MAC address, appears as physical device on network. Used when containers need direct LAN access.

DockerNetworkingContainers

Q71 How do you monitor server health with Prometheus and Grafana?

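# Install node_exporter (release assets are versioned; check the releases page for the current version)
VERSION=1.8.2
wget https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/node_exporter-${VERSION}.linux-amd64.tar.gz
tar xzf node_exporter-${VERSION}.linux-amd64.tar.gz
sudo mv node_exporter-${VERSION}.linux-amd64/node_exporter /usr/local/bin/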
# Create systemd service, enable, and add to Prometheus targets
# In Grafana, import dashboard ID 1860 (Node Exporter Full)

PrometheusGrafanaMonitoring

Q72 What is SELinux and how do you troubleshoot it?

# Check status
getenforce
# Temporarily set to permissive (logs but doesn't block)
setenforce 0
# Check audit log for denials
ausearch -m avc -ts recent
sealert -a /var/log/audit/audit.log
# Create policy to allow
audit2allow -a -M mypol
semodule -i mypol.pp

Business: SELinux provides mandatory access control — even if an attacker compromises the web server, SELinux can prevent them from accessing /etc/shadow or spawning a reverse shell. Never disable SELinux in production; troubleshoot and create proper policies.

SELinuxSecurityMAC

Q73 How do you configure a VPN with WireGuard?

# Install
sudo apt install wireguard
# Generate keys
wg genkey | tee privatekey | wg pubkey > publickey
# Config /etc/wireguard/wg0.conf
[Interface]
Address = 10.0.0.1/24
PrivateKey = <server-private-key>
ListenPort = 51820
[Peer]
PublicKey = <client-public-key>
AllowedIPs = 10.0.0.2/32

WireGuard is faster and simpler than OpenVPN — 4,000 lines of code vs 100,000+. Kernel-integrated, lower latency, better battery life for mobile clients.

WireGuardVPNSecurity

Q74 Explain the difference between GitOps and traditional CI/CD deployment.

Traditional CI/CD: CI server pushes changes to servers. The CI server has credentials to production.
GitOps: Git is the single source of truth. An agent (ArgoCD/Flux) running in the cluster pulls changes from Git and reconciles. No external push access is needed, which is more secure. Rollback = git revert. Business: Because production credentials never leave the cluster, a compromised CI server cannot deploy directly, and every change is auditable through Git history.

GitOpsCI/CDDevOps

Q75 How do you analyze and optimize slow MySQL queries?

# Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 2;
# Analyze with pt-query-digest
pt-query-digest /var/log/mysql/slow.log
# Use EXPLAIN
EXPLAIN SELECT * FROM orders WHERE customer_id = 12345;
# Add index
CREATE INDEX idx_customer_id ON orders(customer_id);

MySQLPerformanceQuery Optimization

Q76 How do you set up Redis for caching and session storage?

sudo apt install redis-server
# /etc/redis/redis.conf
maxmemory 256mb
maxmemory-policy allkeys-lru
requirepass strong_password
bind 127.0.0.1

RedisCachingPerformance

Q77 Explain the CAP theorem and its implications for distributed systems.

Consistency: All nodes see the same data at the same time.
Availability: Every request receives a response.
Partition Tolerance: System continues despite network partitions.
Network partitions cannot be ruled out in a distributed system, so the practical choice is between consistency and availability while a partition lasts: CP or AP. Business: Choosing CP (e.g., PostgreSQL with synchronous replication) means the system may be unavailable during a network split. Choosing AP (e.g., Cassandra) means you might serve stale data. The choice depends on whether your business tolerates downtime or stale data.

CAP TheoremDistributed SystemsArchitecture

Q78 How do you use strace to debug a running process?

strace -p <PID> -f -e trace=file,network -o /tmp/strace.log
# Find why a process is hanging
strace -p <PID> -e trace=read,write
# Count system calls
strace -c command

straceDebuggingSystem Calls

Q79 How do you configure centralized logging with the ELK stack?

Elasticsearch: Stores and indexes logs.
Logstash: Processes and transforms logs.
Kibana: Visualizes and searches logs.
Filebeat: Lightweight agent on each server that ships logs. Business: Centralized logging is essential for security compliance (SOC2, PCI-DSS) and enables cross-server correlation during incident investigations.

ELKLoggingObservability

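Q80 How do you manage swappiness and kernel parameters via sysctl?

# /etc/sysctl.conf (keep comments on their own lines)
# Prefer reclaiming page cache over swapping out application memory (default is 60)
vm.swappiness=10
# Percentage of available memory that may hold dirty pages before writers block
vm.dirty_ratio=15
# Max length of the listen() accept queue
net.core.somaxconn=1024
# System-wide max open file handles (check the current value first; modern defaults are often higher)
fs.file-max=65535
# Apply
sudo sysctl -p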

sysctlKernel TuningPerformance

Q81 How do you set up a PostgreSQL streaming replication?

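# Primary: create a replication role and allow it in pg_hba.conf
#   CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD '<password>';
#   host replication replicator <standby-ip>/32 scram-sha-256
# Primary postgresql.conf
wal_level = replica
max_wal_senders = 3
# Standby (target directory must be empty; -R writes standby.signal and primary_conninfo)
pg_basebackup -h primary_host -D /var/lib/postgresql/data -U replicator -P -R
# Start standby: it will continuously replay WAL from primary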

PostgreSQLReplicationHigh Availability

Q82 What is the OOM killer and how do you protect critical processes?

# Check OOM score
cat /proc/<PID>/oom_score
# Protect critical process (range -1000 to 1000; lower = less likely to be killed, -1000 exempts it entirely)
echo -1000 | sudo tee /proc/<PID>/oom_score_adj
# Or in systemd service:
[Service]
OOMScoreAdjust=-500

OOMMemorysystemd

Q83 How do you perform a security audit of a Linux server?

# Lynis — comprehensive security audit
sudo apt install lynis
sudo lynis audit system
# Check for world-writable files
find / -perm /o=w -type f 2>/dev/null
# Check for files with SUID bit
find / -perm /4000 -type f 2>/dev/null
# Check listening ports
ss -tlnp
# Check for rootkits
sudo apt install rkhunter chkrootkit
sudo rkhunter --check

Security AuditLynisCompliance

Q84 How do you configure SAMBA for file sharing with Windows?

sudo apt install samba
# /etc/samba/smb.conf
[shared]
path = /data/shared
browsable = yes
writable = yes
valid users = @developers
sudo systemctl enable --now smbd

SAMBAFile SharingWindows Integration

Q85 Explain the use of ulimit and how to set resource limits.

ulimit -n 65535     # Max open files
ulimit -u 4096      # Max user processes
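# Permanent for login sessions: /etc/security/limits.conf
myapp soft nofile 65535
myapp hard nofile 65535
# systemd services ignore limits.conf; set LimitNOFILE=65535 in the unit's [Service] section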

ulimitResource LimitsPerformance

Q86 How do you use Git hooks for automated deployment?

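# hooks/post-receive inside the bare repository on the server (must be executable)
#!/bin/bash
GIT_WORK_TREE=/var/www/app git checkout -f
# Reloading nginx needs root; give the deploy user a narrow sudo rule for this one command
sudo systemctl reload nginx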
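A hook like this deploys on every push, including pushes to feature branches. Git passes one line per updated ref to post-receive on stdin in the form "old-sha new-sha refname", so the hook can filter on the branch. A sketch of that filter; pushed_commit_for is a hypothetical helper and main is an assumed branch name.

```bash
# Print the new commit ID if the given branch was updated by this push.
# stdin: the "<old-sha> <new-sha> <refname>" lines Git feeds to post-receive.
# $1: branch name to watch.
pushed_commit_for() (
    branch=$1
    while read -r oldrev newrev refname; do
        [ "$refname" = "refs/heads/$branch" ] || continue
        case $newrev in
            *[!0]*) printf '%s\n' "$newrev" ;;   # an all-zero ID means the branch was deleted
        esac
    done
)

# Inside the hook this might be used as:
#   commit=$(pushed_commit_for main)
#   [ -n "$commit" ] && GIT_WORK_TREE=/var/www/app git checkout -f main
```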

GitDeploymentAutomation

Q87 How do you configure VLANs on Linux?

ip link add link eth0 name eth0.100 type vlan id 100
ip addr add 192.168.100.1/24 dev eth0.100
ip link set eth0.100 up

VLANNetworkingSegmentation

Q88 What is the difference between SNAT and DNAT in iptables?

SNAT (Source NAT): Changes source IP of outgoing packets. Used for internet access from private networks.
DNAT (Destination NAT): Changes destination IP of incoming packets. Used for port forwarding to internal servers.
MASQUERADE: Dynamic SNAT for interfaces with changing IPs (DHCP).

NATiptablesNetworking

Q89 How do you use nc (netcat) for network troubleshooting?

# Test port connectivity
nc -zv server.example.com 3306
# Simple chat server
nc -l 1234
# File transfer
# Receiver: nc -l 1234 > file.txt
# Sender: nc server 1234 < file.txt

netcatNetworkTroubleshooting

Q90 How do you set up email alerts with Postfix for system monitoring?

sudo apt install postfix mailutils
# Configure as "Internet Site"
echo "Test alert from $(hostname)" | mail -s "Alert" admin@company.com
# Use in cron: command || echo "Failed" | mail -s "Cron Error" admin@company.com

PostfixEmailMonitoring

Q91 What is Btrfs and how does it compare to ext4 and ZFS?

ext4: Stable, mature, no checksumming, no snapshots. Best for general use.
Btrfs: Copy-on-write, snapshots, compression, checksumming. Good for workstations and backup servers. Some stability concerns with RAID5/6.
ZFS: Enterprise-grade, best data integrity, built-in RAID, deduplication, snapshots. Higher memory requirements. Best for critical data storage.

FilesystemBtrfsZFSext4

Q92 How do you deploy a web application using Docker Compose?

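# docker-compose.yml
services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./html:/usr/share/nginx/html
  db:
    image: mysql:8
    environment:
      # Demo value only; load real secrets from an env file or Docker secrets
      MYSQL_ROOT_PASSWORD: secret
      MYSQL_DATABASE: appdb
    volumes:
      - db_data:/var/lib/mysql
volumes:
  db_data:
# Start the stack (Compose v2; the old top-level "version" key is obsolete and ignored)
docker compose up -d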

DockerComposeDeployment

Q93 How do you configure CORS headers in Nginx?

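location /api/ {
    # '*' allows any origin; list specific origins for APIs that use cookies or credentials
    add_header 'Access-Control-Allow-Origin' '*' always;
    add_header 'Access-Control-Allow-Methods' 'GET,POST,OPTIONS' always;
    add_header 'Access-Control-Allow-Headers' 'Content-Type,Authorization' always;
    if ($request_method = 'OPTIONS') {
        return 204;
    }
    proxy_pass http://backend;
}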

NginxCORSWeb

Q94 How do you use perf for CPU profiling?

perf record -p <PID> -g -- sleep 30
perf report
perf top   # Real-time

perfProfilingPerformance

Q95 Explain Blue-Green deployment strategy.

Blue: Current production environment.
Green: New version, fully tested but not live.
Switch traffic from Blue to Green via load balancer. Instant rollback by switching back. Requires double the infrastructure but enables zero-downtime deployments. Business: Reduces deployment risk — if the new version has issues, rollback takes seconds, not hours.

Blue-GreenDeploymentDevOps

Q96 How do you configure a mail server with Postfix + Dovecot?

# Postfix for SMTP, Dovecot for IMAP/POP3
sudo apt install postfix dovecot-imapd dovecot-pop3d
# Configure virtual domains, SSL certificates, authentication
# Business: Full email server setup for small business — saves $5-15/user/month vs Google Workspace

Mail ServerPostfixDovecot

Q97 How do you use iotop and iostat to diagnose disk performance?

iostat -x 1   # Detailed disk stats
iotop -o      # Processes doing I/O
# Look for high await (average ms per request, queueing plus service time) and %util near 100%

I/OPerformanceDiagnostics

Q98 What is the difference between active and passive FTP?

Active FTP: Server connects back to client for data transfer. Problematic with firewalls/NAT.
Passive FTP: Client initiates both control and data connections. Firewall-friendly. Modern recommendation: Avoid FTP entirely — use SFTP (SSH-based) or HTTPS for file transfers. FTP sends credentials in plaintext.

FTPSecurityNetworking

Q99 How do you integrate LDAP for centralized authentication?

sudo apt install sssd-ldap
# Configure /etc/sssd/sssd.conf, then reference sss in /etc/nsswitch.conf and PAM (the older libnss-ldap/libpam-ldap packages are deprecated)
# Business: Centralized auth for 500+ servers — one password change propagates everywhere

LDAPAuthenticationEnterprise

Q100 How do you implement rate limiting in Nginx?

# limit_req_zone belongs in the http {} context
limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/s;
server {
    location /api/ {
        limit_req zone=mylimit burst=20 nodelay;
        proxy_pass http://backend;
    }
}
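How rate and burst interact is easier to see in a simulation. The sketch below models leaky-bucket accounting for a single client in the spirit of limit_req with nodelay; it is a simplified model written for illustration, not nginx's code, and simulate_limit_req is a made-up name.

```bash
# Simulate leaky-bucket rate limiting for one client.
# stdin: one request arrival time per line, in milliseconds, ascending.
# $1: rate in requests per second; $2: burst size.
simulate_limit_req() {
    awk -v rate="$1" -v burst="$2" '
        NR == 1 { excess = 0; last = $1; print $1, "ALLOW"; next }
        {
            elapsed = $1 - last
            # The bucket drains at `rate` per second, then this request adds 1.
            candidate = excess - rate * elapsed / 1000 + 1
            if (candidate < 0) candidate = 0
            # Rejected requests do not change the bucket state.
            if (candidate > burst) { print $1, "REJECT"; next }
            excess = candidate
            last = $1
            print $1, "ALLOW"
        }'
}
```

Under this model, 25 requests arriving in the same millisecond with rate=10 and burst=20 yield 21 ALLOW lines (one in-rate request plus 20 burst slots) and 4 REJECT lines. With nodelay the allowed requests are forwarded immediately; without it they would be queued and released at the configured rate.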

NginxRate LimitingSecurity

Q101 How do you use ssh tunneling for secure access?

# Local port forwarding
ssh -L 3306:db.internal:3306 jump-server
# Remote port forwarding
ssh -R 8080:localhost:3000 remote-server
# Dynamic SOCKS proxy
ssh -D 1080 jump-server

SSHTunnelingSecurity

Q102 Explain the concept of immutable infrastructure.

Servers are never modified after deployment. Instead of patching, you build a new image and replace the server. Business Benefits: No configuration drift, predictable deployments, easier rollbacks, improved security posture. Tools: Packer + Terraform + Docker.

ImmutableInfrastructureDevOps

Q103 How do you use auditd for system call auditing?

# Monitor changes to /etc/passwd
auditctl -w /etc/passwd -p wa -k passwd_changes
# Search logs
ausearch -k passwd_changes

auditdSecurityCompliance

Q104 How do you configure a high-availability cluster with Corosync and Pacemaker?

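sudo apt install corosync pacemaker pcs
# Authenticate the nodes to each other (pcs 0.10+), then create and start the cluster
pcs host auth node1 node2
pcs cluster setup ha_cluster node1 node2
pcs cluster start --all
# Then add resources (virtual IP, services) with pcs resource create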

HACorosyncPacemaker

Q105 How do you use dd for disk cloning and backup?

# Clone disk
dd if=/dev/sda of=/dev/sdb bs=4M status=progress
# Create disk image
dd if=/dev/sda of=/backup/disk.img bs=4M
# Restore
dd if=/backup/disk.img of=/dev/sda bs=4M

Warning: dd is nicknamed "disk destroyer" — one wrong command can wipe a production disk. Always double-check the of= parameter.

ddBackupDisk

Q106 How do you configure IPv6 on a Linux server?

# Check IPv6
ip -6 addr show
# Add IPv6 address
ip -6 addr add 2001:db8::1/64 dev eth0
# Test connectivity
ping -6 2001:4860:4860::8888

IPv6Networking

Q107 How do you use lsof to troubleshoot "file in use" issues?

lsof /var/log/app.log   # What process has this file open?
lsof -i :80             # What's using port 80?
lsof -u username        # All files opened by user

lsofTroubleshootingFile Locking

Q108 How do you manage kernel modules with modprobe?

lsmod                    # List loaded modules
modprobe nfs             # Load module
modprobe -r nfs          # Remove module
modinfo nfs              # Module details

KernelModulesmodprobe

Q109 How do you set up a TFTP server for network booting?

sudo apt install tftpd-hpa
# Configure /etc/default/tftpd-hpa
# Used with PXE for automated OS installations

TFTPPXEAutomation

Q110 How do you use curl for API testing and debugging?

curl -X POST https://api.example.com/data \
-H "Content-Type: application/json" \
-d '{"key":"value"}' \
-v   # Verbose — shows headers, TLS handshake

curlAPIDebugging

Expert Level — Linux Server Administration

5–10 Years Experience

Q111 How do you design a disaster recovery plan for a Linux-based infrastructure?

Business-First Approach: DR planning starts with RPO (Recovery Point Objective — how much data can you lose?) and RTO (Recovery Time Objective — how long can you be down?).

RPO Examples: Financial trading: 0 seconds (synchronous replication). E-commerce: 5 minutes. Internal wiki: 24 hours.
RTO Examples: Critical SaaS: < 15 minutes. Corporate website: 4 hours. Archive server: 48 hours.

Implementation: 1) Multi-region database replication (async for cost, sync for zero data loss). 2) Automated failover with health checks. 3) Infrastructure as Code (Terraform) to recreate environment. 4) Regular DR testing (quarterly minimum). 5) Documented runbooks.

Cost-Benefit: A DR solution costing $50K/year is cheap if a single day of downtime costs $500K. Present this math to management to get budget approval.
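The budget argument can be reduced to a break-even figure: how many hours of avoided downtime per year pay for the DR spend. A sketch using the numbers from the paragraph above; dr_break_even_hours is a hypothetical helper and it assumes downtime cost accrues evenly over a 24-hour day.

```bash
# Hours of avoided downtime per year at which DR spend pays for itself.
# $1: annual DR cost; $2: cost of one full day of downtime (same currency).
dr_break_even_hours() {
    awk -v c="$1" -v d="$2" 'BEGIN {
        if (d <= 0) exit 1                # a non-positive downtime cost has no break-even
        printf "%.1f\n", c / (d / 24)     # annual cost divided by hourly downtime cost
    }'
}

# dr_break_even_hours 50000 500000   prints 2.4
```

With $50K/year of DR spend against $500K per day of downtime, the plan pays for itself if it prevents about two and a half hours of outage per year.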

Disaster RecoveryRPO/RTOBusiness Continuity

Q112 How do you optimize Linux kernel parameters for a high-traffic web server handling 100K+ concurrent connections?

# /etc/sysctl.conf for high-concurrency web server
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_fin_timeout = 10
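# tcp_tw_reuse only affects outgoing connections (for example, to upstream servers)
net.ipv4.tcp_tw_reuse = 1
# Do not set net.ipv4.tcp_tw_recycle: it broke clients behind NAT and was removed in Linux 4.12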
net.ipv4.ip_local_port_range = 1024 65535
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
fs.file-max = 2097152
fs.nr_open = 2097152

# Also increase ulimit
# /etc/security/limits.conf
* soft nofile 1048576
* hard nofile 1048576

Verification: Use ss -s to monitor socket statistics. Use ab or wrk for load testing. These settings enabled a client's server to handle 150K concurrent WebSocket connections on a single 32-core machine.

Kernel TuningHigh ConcurrencyPerformance

Q113 Explain how to implement a multi-tier caching strategy for a web application.

Tier 1 — CDN (Cloudflare/Fastly): Cache static assets at edge. 90%+ cache hit rate for images, CSS, JS.
Tier 2 — Varnish/NGINX FastCGI Cache: Full-page cache for anonymous users. Reduces application server load by 70%.
Tier 3 — Redis/Memcached: Object cache for database queries, sessions, API responses. Sub-millisecond response times.
Tier 4 — Application-level: In-memory caching within the app for frequently accessed computed data.
Business Result: A properly implemented 4-tier cache can reduce database load by 95% and improve page load times from 2 seconds to 200ms.
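A figure like "95% less database load" comes from multiplying the miss rates of each tier. A back-of-the-envelope sketch, assuming every tier sees only the misses of the tier in front of it and that hit rates are independent; the function name and the sample hit rates are illustrative, not measurements.

```bash
# Percentage of requests that still reach the database after all cache tiers.
# Arguments: one hit rate per tier as a fraction (0..1), outermost tier first.
db_load_after_caches() {
    awk 'BEGIN {
        remaining = 1
        for (i = 1; i < ARGC; i++)
            remaining *= (1 - ARGV[i])    # each tier passes on only its misses
        printf "%.1f\n", remaining * 100
    }' "$@"
}

# db_load_after_caches 0.5 0.6 0.75   prints 5.0
```

In that example three modest tiers (50%, 60%, 75% hit rates) already leave only 5% of requests for the database, which is why even imperfect caches compound well.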

CachingPerformanceArchitecture

Q114 How do you implement a zero-downtime database schema migration for a table with 500M+ rows?

Strategy using pt-online-schema-change (Percona Toolkit):

pt-online-schema-change \
--alter "ADD COLUMN new_field VARCHAR(255) DEFAULT NULL, ADD INDEX idx_new (new_field)" \
--execute \
--max-load="Threads_running=50" \
--critical-load="Threads_running=100" \
--chunk-size=5000 \
--progress=time,30 \
h=localhost,D=appdb,t=huge_table

How it works: 1) Creates a shadow copy of the table. 2) Adds triggers to sync changes. 3) Copies data in chunks. 4) Atomically swaps tables. Business: No downtime, no locked tables, users never notice. For a fintech company, this meant deploying schema changes during business hours instead of Sunday 3 AM maintenance windows.

DatabaseZero DowntimeMigrationPercona

Q115 How do you architect a multi-region, active-active database setup?

Challenge: Active-active multi-region means writes can happen in any region simultaneously. Conflict resolution is the hard part.

Solutions:
1) CRDTs (Conflict-Free Replicated Data Types): Mathematical approach — operations commute, so order doesn't matter. Used by Redis CRDB, Riak.
2) Last-Write-Wins (LWW): Simplest, but can lose data. OK for caches, not for financial data.
3) Application-level conflict resolution: Custom merge logic. Complex but most flexible.
4) Partition by region: Users in Asia write to Asia DB, users in Europe write to Europe DB. No conflicts. Used by many SaaS companies.

Business: Active-active reduces latency (users connect to nearest region) and provides true disaster resilience. Cost: 2-3x infrastructure spend plus engineering complexity.

Multi-RegionActive-ActiveArchitecture

Q116 How do you use eBPF for advanced observability?

eBPF runs sandboxed, verifier-checked programs in the kernel without changing kernel source or loading modules. Tools: bcc (BCC tools), bpftrace, Cilium for networking, Falco for security. Example: trace every open()/openat() syscall across the system (bcc's opensnoop does this) with overhead low enough for production use. eBPF is revolutionizing observability: it's like having a programmable microscope into the kernel.

eBPFObservabilityKernel

Q117 How do you implement secrets management with HashiCorp Vault?

# Dynamic database credentials (PostgreSQL syntax, matching the postgres connection)
vault write database/roles/myapp \
db_name=postgres \
creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
default_ttl="1h" max_ttl="24h"

Each application instance gets unique, time-limited credentials. If compromised, the credentials auto-expire. This is the gold standard for database security.

VaultSecretsSecurity

Q118 Explain Kubernetes architecture and how to troubleshoot pod networking.

Control Plane: API Server, etcd (state), Scheduler, Controller Manager.
Worker Nodes: Kubelet, kube-proxy, Container Runtime.
Networking: CNI plugins (Calico, Flannel, Cilium). Troubleshooting: kubectl exec -it pod -- netstat -tlnp, kubectl describe pod, check CNI logs, use tcpdump inside pods.

KubernetesArchitectureNetworking

Q119 How do you perform a live migration of a running VM with KVM?

virsh migrate --live vm_name qemu+ssh://dest-host/system
# Requires shared storage (NFS/iSCSI) and compatible CPUs

KVMVirtualizationLive Migration

Q120 How do you configure BGP on Linux for a data center network?

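# Using FRRouting (FRR)
sudo apt install frr
# Enable the daemon: set bgpd=yes in /etc/frr/daemons, then restart frr
# Configure BGP in /etc/frr/frr.conf
router bgp 65001
 neighbor 192.168.1.1 remote-as 65002
 address-family ipv4 unicast
  network 10.0.0.0/24
 exit-address-family
# Recent FRR releases require route-map policy on eBGP sessions unless "no bgp ebgp-requires-policy" is set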

BGPFRRNetworking

Q121 How do you implement a service mesh with Istio?

Istio adds mTLS, traffic management, and observability to Kubernetes without changing application code. Sidecar proxy (Envoy) injected into each pod handles all network traffic. Business: Zero-trust security between microservices, canary deployments with traffic splitting, and distributed tracing — all without developer effort.

IstioService MeshKubernetes

Q122 How do you optimize disk I/O for a PostgreSQL database?

# Use separate disks for WAL and data
# /var/lib/postgresql/data on SSD (data)
# /var/lib/postgresql/wal on NVMe (WAL)
# Mount options in /etc/fstab:
UUID=xxx /data ext4 defaults,noatime,nodiratime,data=writeback 0 2
# PostgreSQL conf:
effective_io_concurrency = 200
random_page_cost = 1.1  # For SSD

PostgreSQLI/OPerformance

Q123 Explain the Linux memory management subsystem: slab, buddy allocator, page cache.

Buddy Allocator: Allocates contiguous physical pages (4KB each). Merges adjacent free blocks into larger ones.
Slab Allocator: Caches frequently allocated kernel objects (inodes, dentries). Reduces fragmentation.
Page Cache: Caches file data in RAM. free -h shows it as "buff/cache" — this is available memory, not used memory. Linux will free it if applications need RAM.
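The point about buff/cache can be checked against the kernel's own counters. A sketch that summarises /proc/meminfo-style input; mem_summary is a hypothetical helper, and it treats Buffers + Cached + SReclaimable as the cache figure, which is an approximation of what free reports as buff/cache.

```bash
# Summarise memory counters in MiB from /proc/meminfo-style input on stdin.
mem_summary() {
    awk '
        $1 == "MemTotal:"     { total = $2 }      # values are in kB
        $1 == "MemAvailable:" { avail = $2 }
        $1 == "Buffers:"      { buffers = $2 }
        $1 == "Cached:"       { cached = $2 }
        $1 == "SReclaimable:" { slab = $2 }       # reclaimable slab (dentries, inodes)
        END {
            printf "total=%d available=%d cache=%d MiB\n",
                   total / 1024, avail / 1024, (buffers + cached + slab) / 1024
        }'
}

# Typical use on a live system:
#   mem_summary < /proc/meminfo
```

When available stays high while cache is large, the server is not short of memory; the alarm signal is a low available figure, not a low free figure.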

Memory ManagementKernelDeep Dive

Q124 How do you implement a CI/CD pipeline with GitLab CI for a microservices application?

# .gitlab-ci.yml
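stages: [build, test, deploy]
build:
  stage: build
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    # Push (after docker login to $CI_REGISTRY) so later jobs on other runners can pull the image
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
test:
  stage: test
  script:
    - docker run $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA pytest
deploy:
  stage: deploy
  script:
    - kubectl set image deployment/app app=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA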

CI/CDGitLabDevOps

Q125 How do you troubleshoot a memory leak in a Java application on Linux?

# Monitor heap usage
jstat -gc <PID> 1000
# Heap dump
jmap -dump:live,format=b,file=heap.hprof <PID>
# Analyze with Eclipse MAT or VisualVM (jhat was removed in JDK 9)
# Check native memory (off-heap leak)
pmap -x <PID> | sort -k3 -rn | head -20

JavaMemory LeakTroubleshooting

Q126 How do you configure OpenLDAP with SSL/TLS for enterprise authentication?

# Generate certificates, configure slapd
# Enable LDAPS on port 636
# Integrate with PAM/NSS for system auth
# Business: Single source of truth for 10,000+ employees across all Linux servers

OpenLDAPEnterpriseAuthentication

Q127 How do you use Terraform to provision Linux servers on AWS/Azure/GCP?

resource "aws_instance" "web" {
    ami = "ami-0c55b159cbfafe1f0"
    instance_type = "t3.medium"
    key_name = "production-key"
    vpc_security_group_ids = [aws_security_group.web.id]
    user_data = file("bootstrap.sh")
    tags = { Name = "web-${count.index}" }
    count = 3
}

TerraformIaCCloud

Q128 How do you implement a zero-trust network architecture on Linux?

Principles: Never trust, always verify. Every connection is authenticated and authorized.
Implementation: 1) mTLS everywhere (mutual TLS). 2) Identity-based access (SPIFFE/SPIRE). 3) Micro-segmentation (every service has its own firewall rules). 4) Continuous verification (re-authenticate periodically). 5) Assume breach — limit blast radius.

Zero TrustSecurityArchitecture

Q129 How do you set up a production Kubernetes cluster with kubeadm?

# Control plane (the pod CIDR must match the CNI plugin's configuration)
kubeadm init --pod-network-cidr=10.244.0.0/16
# Join workers
kubeadm join <control-plane-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
# Install CNI (Calico)
kubectl apply -f calico.yaml

KuberneteskubeadmProduction

Q130 Explain the Linux I/O scheduler algorithms: CFQ, Deadline, NOOP, mq-deadline, kyber.

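CFQ, Deadline, NOOP: Legacy single-queue schedulers, removed in Linux 5.0 when the block layer became multi-queue only. Their successors are BFQ, mq-deadline, and none.
mq-deadline: Multi-queue version of Deadline. Bounds request latency with per-request deadlines; good for mixed workloads.
kyber: Designed for fast SSDs/NVMe. Throttles queue depth with tokens to meet read and write latency targets.
none: Minimal overhead, lets the device handle queuing. Best for NVMe and virtualized storage.
Check current: cat /sys/block/sda/queue/scheduler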

I/O SchedulerKernelPerformance

Q131 How do you use Ceph for distributed storage?

Ceph provides object (S3-compatible), block (RBD), and file (CephFS) storage in a single cluster. Self-healing, no single point of failure. Used by CERN for petabyte-scale storage. Business: Replace expensive SAN storage with commodity servers — 60-80% cost reduction for large-scale storage.

CephStorageDistributed

Q132 How do you implement canary deployments in Kubernetes?

# Using Istio/Flagger
# Deploy new version, route 5% traffic to it
# Monitor error rate and latency
# Gradually increase to 100% or auto-rollback
# Business: Reduce deployment risk — if canary fails, only 5% of users are affected

CanaryKubernetesDeployment

Q133 How do you use systemd-nspawn for lightweight containers?

# Create container
sudo debootstrap stable /var/lib/machines/mycontainer
# Start
sudo systemd-nspawn -D /var/lib/machines/mycontainer -b
# Lighter than Docker, integrates with systemd, good for system containers

systemd-nspawnContainerssystemd

Q134 How do you audit and harden a Linux server for PCI-DSS compliance?

Key Requirements: 1) Firewall with documented rules. 2) No default passwords. 3) File integrity monitoring (AIDE). 4) Centralized logging with tamper protection. 5) Quarterly vulnerability scans. 6) Access control with least privilege. 7) Encryption at rest and in transit. 8) Regular patching with documented SLAs. Tool: lynis audit system, OpenSCAP for automated compliance scanning.

PCI-DSSComplianceSecurity

Q135 How do you configure VXLAN tunnels for overlay networking?

ip link add vxlan0 type vxlan id 100 dstport 4789 group 239.1.1.1 dev eth0
ip addr add 10.100.0.1/24 dev vxlan0
ip link set vxlan0 up

VXLAN encapsulates Layer 2 frames in UDP packets, enabling virtual networks across physical infrastructure. Used by Docker overlay, Kubernetes flannel, OpenStack Neutron.
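
To make the encapsulation concrete, here is a minimal Python sketch of the 8-byte VXLAN header described in RFC 7348: one flags byte with the "VNI valid" bit set, then a 24-bit VNI. The function names are illustrative and not part of any library, and the outer Ethernet, IP, and UDP headers that a real tunnel endpoint adds are left out.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag from RFC 7348


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VXLAN Network Identifier."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, 24 reserved bits.
    # Word 2: VNI in the top 24 bits, 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prefix an inner Layer 2 frame with a VXLAN header (UDP payload only)."""
    return vxlan_header(vni) + inner_frame


def parse_vni(udp_payload: bytes) -> int:
    """Extract the VNI from a VXLAN UDP payload."""
    flags_word, vni_word = struct.unpack("!II", udp_payload[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8
```

With `id 100` as in the `ip link` command above, `vxlan_header(100)` yields the bytes `08 00 00 00 00 00 64 00`.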

VXLANOverlayNetworking

Q136 How do you implement automated patching with Ansible and a canary strategy?

# Patch canary servers first (10% of fleet)
ansible-playbook -l canary_group patch.yml
# Wait 24 hours, monitor for issues
# If healthy, patch remaining servers
ansible-playbook -l production_group patch.yml

AnsiblePatchingAutomation

Q137 How do you use Kafka for event streaming on Linux?

# ZooKeeper mode applies to Kafka 3.x and earlier; Kafka 4.0+ is KRaft-only and needs no ZooKeeper
# Start ZooKeeper, then Kafka
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
# Create topic
bin/kafka-topics.sh --create --topic events --bootstrap-server localhost:9092

KafkaStreamingEvent-Driven

Q138 How do you configure DNSSEC for a domain?

# Sign zone with dnssec-signzone
# Publish DS record with registrar
# DNSSEC prevents DNS spoofing/cache poisoning

DNSSECDNSSecurity

Q139 How do you troubleshoot "Too many open files" errors in production?

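# Check limits (ulimit shows the current shell; a running process has its own limits)
ulimit -n
grep "open files" /proc/<PID>/limits
cat /proc/sys/fs/file-max
# Find commands with the most open files (lsof also lists memory maps, so counts are approximate)
lsof | awk '{print $1}' | sort | uniq -c | sort -rn | head -10
# Increase limits permanently in /etc/security/limits.conf (login sessions)
# or with LimitNOFILE= in the unit file (systemd services do not read limits.conf)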
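
As a more precise alternative to counting `lsof` lines, the per-process descriptor count can be read from `/proc/<PID>/fd`. The following is a small sketch, not a standard tool: the function names are made up for illustration, and it only sees processes the calling user is allowed to inspect (run it as root for a full picture).

```python
import os
from typing import Dict, List, Tuple


def open_fd_counts(proc_root: str = "/proc") -> Dict[int, int]:
    """Map each visible PID to its number of open file descriptors."""
    counts: Dict[int, int] = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # skip non-process entries such as "self"
            continue
        try:
            fd_dir = os.path.join(proc_root, entry, "fd")
            counts[int(entry)] = len(os.listdir(fd_dir))
        except OSError:
            # The process exited mid-scan, or we may not read its fd directory.
            continue
    return counts


def top_fd_consumers(limit: int = 10, proc_root: str = "/proc") -> List[Tuple[int, int]]:
    """Return (pid, fd_count) pairs, highest count first."""
    ranked = sorted(open_fd_counts(proc_root).items(),
                    key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```

Calling `top_fd_consumers(10)` on the affected host gives the PIDs to compare against the "open files" line in `/proc/<PID>/limits`.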

File DescriptorsLimitsTroubleshooting

Q140 How do you implement distributed tracing with Jaeger?

# Deploy Jaeger operator in Kubernetes
# Instrument applications with OpenTelemetry SDK
# Traces flow: App → Jaeger Collector → Elasticsearch/Cassandra → Jaeger UI

JaegerTracingObservability

Q141 How do you use kernel live patching with Kpatch?

# Install kpatch (package availability varies by distribution)
sudo apt install kpatch
# Load a live patch module built with kpatch-build
sudo kpatch load patch.ko
# List active patches
sudo kpatch list

Live PatchingKernelSecurity

Q142 How do you configure SR-IOV for network performance?

SR-IOV allows a single physical NIC to present multiple virtual NICs (VFs) directly to VMs/containers, bypassing the hypervisor for near-native network performance. Used in telco/NFV and high-frequency trading.

SR-IOVNetworkingPerformance

Q143 How do you use OSSEC for host-based intrusion detection?

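# Packages come from the OSSEC (Atomicorp) repository, not the default Debian/Ubuntu archives
sudo apt install ossec-hids-server   # or ossec-hids-agent on monitored hosts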
# Monitors file integrity, log analysis, rootkit detection
# Alerts on suspicious activity via email/SIEM integration

OSSECIDSSecurity

Q144 Explain the design of a message queue system with RabbitMQ on Linux.

sudo apt install rabbitmq-server
rabbitmqctl add_user app_user strong_password
rabbitmqctl add_vhost /app
rabbitmqctl set_permissions -p /app app_user ".*" ".*" ".*"

RabbitMQMessage QueueArchitecture

Q145 How do you use Cloud-init for automated server provisioning?

#cloud-config
packages:
  - nginx
  - docker.io
users:
  - name: deploy
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      - ssh-rsa AAAAB3...

Cloud-initAutomationProvisioning

Q146 How do you use the magic SysRq key for kernel debugging?

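# Enable all SysRq functions
echo 1 > /proc/sys/kernel/sysrq
# On a hung system, sync and remount read-only BEFORE rebooting
echo s > /proc/sysrq-trigger  # Sync filesystems
echo u > /proc/sysrq-trigger  # Remount filesystems read-only
echo b > /proc/sysrq-trigger  # Reboot immediately (does not sync or unmount by itself)
# Mnemonic for the full keyboard sequence R-E-I-S-U-B: "Raising Skinny Elephants Is Utterly Boring"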

SysRqKernelDebugging

Q147 How do you implement a log aggregation pipeline with Fluentd?

# fluentd.conf
<source>
  @type tail
  path /var/log/app/*.log
  tag app.logs
</source>

<match app.logs>
  @type elasticsearch
  host elasticsearch.internal
</match>

FluentdLoggingObservability

Q148 How do you use cgroups v2 for resource isolation?

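# Enable the controllers for child cgroups, then create the cgroup
echo "+cpu +memory" > /sys/fs/cgroup/cgroup.subtree_control
mkdir /sys/fs/cgroup/myapp
echo "500000 1000000" > /sys/fs/cgroup/myapp/cpu.max   # 500ms of CPU per 1s period = 50% of one core
echo "2G" > /sys/fs/cgroup/myapp/memory.max
# Add process
echo $PID > /sys/fs/cgroup/myapp/cgroup.procs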
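
The two numbers written to `cpu.max` are a quota and a period in microseconds, and their ratio is the CPU share. As a rough illustration, the helpers below (hypothetical names, not part of any library) convert between that format and a number of CPUs.

```python
from typing import Optional


def cpu_limit_from_cpu_max(value: str) -> Optional[float]:
    """Convert a cpu.max string into a CPU count; None means unlimited."""
    quota, period = value.split()
    if quota == "max":  # the kernel reports "max <period>" when no quota is set
        return None
    return int(quota) / int(period)


def format_cpu_max(cpus: float, period_us: int = 100_000) -> str:
    """Build a cpu.max string that allows `cpus` CPUs per period."""
    return f"{round(cpus * period_us)} {period_us}"
```

For the value used above, `cpu_limit_from_cpu_max("500000 1000000")` evaluates to 0.5, i.e. half of one core.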

cgroups v2Resource ControlContainers

Q149 How do you perform a security penetration test on a Linux server?

# Using nmap for port scanning
nmap -sV -sC -p- target_server
# Using nikto for web vulnerability scanning
nikto -h https://target_server
# Using metasploit for exploitation testing
msfconsole

Penetration TestingSecuritynmap

Q150 How do you configure a load-balanced MySQL cluster with ProxySQL?

# ProxySQL sits between app and MySQL servers
# Provides connection pooling, query routing, read/write splitting
# Reduces DB connections by 90%+ through connection multiplexing

ProxySQLMySQLLoad Balancing

Q151 Explain the concept of chaos engineering and how to implement it on Linux.

Chaos Engineering: Deliberately introduce failures to test system resilience. Tools: Chaos Monkey (Netflix), LitmusChaos (Kubernetes), stress-ng (Linux). Example: Randomly kill 50% of web servers during business hours and verify the load balancer handles it. Business: Proactively discovering weaknesses prevents production outages.

Chaos EngineeringResilienceTesting

Q152 How do you configure automatic failover with keepalived?

# keepalived.conf
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        192.168.1.100/24
    }
}

keepalivedHAFailover

Q153 How do you use BPF Compiler Collection (BCC) tools?

# Install BCC tools
sudo apt install bpfcc-tools
# Trace file opens
sudo opensnoop-bpfcc
# Trace new processes
sudo execsnoop-bpfcc
# Analyze disk latency
sudo biolatency-bpfcc

BCCeBPFObservability

Q154 How do you migrate from on-premise to AWS using Server Migration Service?

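# AWS Server Migration Service (SMS) was retired in 2022; its successor is AWS Application Migration Service (MGN)
# Install the AWS Replication Agent on each on-prem source server
# Servers replicate continuously to AWS; launch test instances to validate
# Cut over with minimal downtime
# Business: Lift-and-shift migration reduces time-to-cloud by 60%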

MigrationAWSCloud

Q155 How do you configure SSL/TLS termination at the load balancer?

# HAProxy SSL termination
frontend https_front
bind *:443 ssl crt /etc/ssl/certs/mycert.pem
default_backend web_back
# Backend servers receive plain HTTP (internal network)

SSLTLSLoad Balancer

Q156 How do you use InfluxDB + Telegraf for time-series monitoring?

# Telegraf collects metrics, InfluxDB stores, Grafana visualizes
sudo apt install telegraf influxdb
# Configure telegraf.conf with inputs (cpu, mem, disk, net)
# Create dashboards in Grafana

InfluxDBTelegrafMonitoring

Q157 How do you configure SELinux policies for a custom application?

# Generate policy from audit logs
grep myapp /var/log/audit/audit.log | audit2allow -M myapp_policy
semodule -i myapp_policy.pp

SELinuxPolicySecurity

Q158 How do you use Grafana Loki for log aggregation?

# Loki is like Prometheus but for logs
# Uses labels for indexing, much cheaper than Elasticsearch
# Integrates seamlessly with Grafana

LokiLoggingGrafana

Q159 How do you implement a Web Application Firewall with ModSecurity?

sudo apt install libapache2-mod-security2
# Enable OWASP Core Rule Set
# Blocks SQL injection, XSS, and other attacks at the web server level

WAFModSecuritySecurity

Q160 How do you use Packer to create golden images?

{
    "builders": [{"type": "amazon-ebs", "region": "us-east-1",
    "source_ami": "ami-0c55b159cbfafe1f0"}],
    "provisioners": [{"type": "shell", "script": "setup.sh"}]
}

PackerGolden ImageIaC

Q161 How do you use stress-ng for load testing?

# CPU stress
stress-ng --cpu 8 --timeout 60s
# Memory stress
stress-ng --vm 4 --vm-bytes 2G --timeout 60s
# Combined
stress-ng --cpu 8 --io 4 --vm 4 --vm-bytes 2G --timeout 120s

stress-ngLoad TestingPerformance

Q162 How do you configure network bonding for redundancy?

# /etc/network/interfaces
auto bond0
iface bond0 inet static
address 192.168.1.10
netmask 255.255.255.0
bond-slaves eth0 eth1
bond-mode active-backup
bond-miimon 100

BondingRedundancyNetworking

Q163 How do you use the Linux audit framework (auditd) for compliance?

# Monitor sensitive files
auditctl -w /etc/shadow -p wa -k shadow_access
auditctl -w /etc/passwd -p wa -k passwd_changes
# Generate report
aureport --summary

auditdComplianceMonitoring

Q164 How do you configure a production-ready Elasticsearch cluster?

# elasticsearch.yml
cluster.name: production
node.name: node-1
network.host: 0.0.0.0
discovery.seed_hosts: ["node-1","node-2","node-3"]
cluster.initial_master_nodes: ["node-1","node-2","node-3"]
# Set heap to 50% of RAM, max 31GB
# Use SSD for data path
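
The heap rule in the comments above (half of RAM, capped at 31GB) is easy to get wrong on large hosts, so here is a small sketch of it. The function names are illustrative only; the output mirrors the `-Xms`/`-Xmx` pair that goes into `jvm.options`.

```python
from typing import List


def recommended_heap_gb(total_ram_gb: float, cap_gb: float = 31.0) -> float:
    """Half of physical RAM, capped (31GB in the rule quoted above)."""
    return min(total_ram_gb / 2, cap_gb)


def jvm_heap_options(total_ram_gb: float) -> List[str]:
    """Return matching -Xms/-Xmx flags; min and max heap are set equal."""
    heap = max(1, int(recommended_heap_gb(total_ram_gb)))
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]
```

A 32GB node gets a 16GB heap, while 64GB and 128GB nodes both stop at 31GB, leaving the rest of the memory to the filesystem cache.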

ElasticsearchClusterProduction

Q165 How do you use SSH certificates for authentication at scale?

# Instead of managing authorized_keys on 1000 servers
# Issue short-lived SSH certificates signed by a CA
ssh-keygen -s ca_key -I user_id -n username -V +1d user_key.pub
# Servers trust the CA — no per-server key management

SSHCertificatesScale

Most-Expert Level — Linux Server Administration

10+ Years Experience

Q166 How would you architect a globally distributed, multi-cloud Kubernetes platform serving 500M+ users?

Architecture Vision:

Control Plane: Multi-cluster management with Karmada or Google Anthos — single pane of glass across AWS, GCP, Azure.
Networking: Service mesh (Istio) with multi-cluster federation. Cross-cluster mTLS, global traffic routing based on latency and health.
Data Layer: CockroachDB or YugabyteDB for globally consistent SQL. Cassandra/ScyllaDB for high-throughput NoSQL.
CDN: CloudFront + Cloudflare for static content. Dynamic content routed to nearest PoP.
Observability: Centralized Prometheus/Thanos for metrics, Tempo for tracing, Loki for logs — all in Grafana Cloud.
CI/CD: ArgoCD with ApplicationSets for automated multi-cluster deployments.
Cost Optimization: Spot instances for stateless workloads, reserved instances for databases, auto-scaling across clouds based on pricing.

Business Impact: This architecture provides 99.99% availability, < 100ms global latency, and avoids single-cloud vendor lock-in. Estimated infrastructure cost: $2-5M/month for 500M users, but the business can survive any single cloud provider outage.

Multi-CloudKubernetesArchitectureGlobal Scale

Q167 How do you diagnose and resolve a kernel panic in a production server without physical access?

Immediate Actions:

# 1. Check IPMI/iDRAC/iLO for console access (Serial-over-LAN needs the lanplus interface)
ipmitool -I lanplus -H <ipmi_ip> -U admin -P password sol activate

# 2. Configure kdump for crash dumps (pre-incident setup)
sudo apt install kdump-tools
# Reserve memory with the crashkernel=256M kernel boot parameter (GRUB), set USE_KDUMP=1 in /etc/default/kdump-tools, then reboot
# After crash, find dump in /var/crash/

# 3. Analyze crash dump
crash /usr/lib/debug/boot/vmlinux-$(uname -r) /var/crash/dump.xxx

# 4. Common causes:
# - Faulty kernel module
# - Hardware failure (bad RAM — check with memtest)
# - Filesystem corruption
# - Out-of-memory with critical process killed

Prevention: Use netconsole to stream kernel logs to another server. Configure watchdog timers. Set up automatic reboot after panic: echo 10 > /proc/sys/kernel/panic. Business: A kernel panic on a critical server without kdump configured = 4+ hours of debugging vs 30 minutes with proper crash dump analysis.

Kernel PanicCrash DumpkdumpIPMI

Q168 Design a real-time data processing pipeline handling 10M events/second on Linux.

Pipeline Architecture:

Ingestion: Kafka cluster (20+ brokers) with partitioned topics. Use io_uring for disk I/O — 2x throughput vs traditional AIO.
Processing: Apache Flink on Kubernetes for stateful stream processing with exactly-once semantics.
Storage: S3 data lake (via Kafka Connect S3 sink) + ClickHouse for real-time analytics.
Kernel Tuning: XDP for packet filtering at NIC level (bypasses kernel networking stack for 10x performance). HugePages for JVM.
Monitoring: Prometheus + custom eBPF probes for pipeline latency tracking.

Business: This pipeline can process financial transactions in real-time for fraud detection — a 100ms delay can mean a $10M fraudulent transaction slipping through.

Real-TimeKafkaStreaming10M EPS

Q169 How do you implement a custom Linux kernel module for a specific business requirement?

#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>

/* Runs once when the module is loaded (insmod/modprobe) */
static int __init mymodule_init(void) {
    printk(KERN_INFO "Custom module loaded\n");
    return 0;
}

/* Runs when the module is removed (rmmod) */
static void __exit mymodule_exit(void) {
    printk(KERN_INFO "Custom module unloaded\n");
}

module_init(mymodule_init);
module_exit(mymodule_exit);
MODULE_LICENSE("GPL");

Use Case: A fintech company needed a kernel module to intercept all network I/O for real-time compliance monitoring at wire speed — impossible to achieve in userspace.

Kernel ModuleC ProgrammingLow-Level

Q170 How do you use XDP (eXpress Data Path) for high-performance packet processing?

XDP runs eBPF programs at the NIC driver level, before the kernel allocates sk_buff structures. This enables line-rate packet processing (40Gbps+) with minimal CPU. Use cases: DDoS mitigation, load balancing, telemetry. Companies like Cloudflare use XDP to drop attack traffic at the edge at very low CPU cost per packet.

XDPeBPFHigh Performance

Q171 How do you design a storage architecture using NVMe-oF for a database cluster?

NVMe over Fabrics allows accessing NVMe storage over the network (RDMA, Fibre Channel, or TCP) with latency approaching local NVMe (< 10µs overhead vs DAS). Architecture: NVMe-oF target servers expose NVMe namespaces. Database servers connect via RoCE (RDMA over Converged Ethernet). Result: Shared NVMe storage with < 100µs latency — ideal for Oracle RAC or PostgreSQL with shared storage.

NVMe-oFStorageRDMA

Q172 How do you implement a consensus algorithm (Raft) for a distributed system?

Raft is used by etcd, Consul, and TiKV. Leader election, log replication, and safety. In production, you need odd number of nodes (3, 5, or 7), proper timeout tuning, and disk persistence for the write-ahead log. Business: Raft ensures your distributed lock service or configuration store remains consistent even during network partitions.
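
The preference for odd cluster sizes follows from the majority rule: a Raft cluster makes progress only while a majority of nodes is reachable. A short sketch of that arithmetic (function names are illustrative):

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can fail while a majority still remains."""
    return cluster_size - quorum(cluster_size)


if __name__ == "__main__":
    for size in (3, 4, 5, 6, 7):
        print(f"{size} nodes: quorum {quorum(size)}, "
              f"survives {tolerated_failures(size)} failure(s)")
```

A 4-node cluster tolerates one failure, exactly like a 3-node cluster, so the extra even node adds replication cost without adding fault tolerance.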

RaftConsensusDistributed Systems

Q173 How do you use DPDK for userspace networking?

DPDK (Data Plane Development Kit) bypasses the kernel network stack entirely. Applications poll the NIC directly from userspace using huge pages and CPU pinning. Simple forwarding workloads can reach tens of millions of packets per second per core. Used in telco (5G infrastructure), financial trading systems, and high-performance load balancers.

DPDKNetworkingUserspace

Q174 How do you architect a multi-tenant SaaS platform with strict data isolation on Linux?

Options:
1) Database-per-tenant: Strongest isolation, most overhead.
2) Schema-per-tenant: PostgreSQL schemas — good balance.
3) Row-level security: PostgreSQL RLS policies — efficient but complex.
4) OS-level isolation: Each tenant gets a Linux namespace or lightweight container with dedicated resources.
Business: For healthcare SaaS (HIPAA), database-per-tenant is non-negotiable. For a project management tool, row-level security with encryption is sufficient.

SaaSMulti-TenantArchitecture

Q175 How do you perform a live migration of a running container between hosts?

CRIU (Checkpoint/Restore In Userspace) can checkpoint a running process/container and restore it on another host. Docker supports this experimentally: docker checkpoint create and docker start --checkpoint. Limitations: Requires same kernel version, doesn't work with GPU passthrough, open TCP connections may break. Alternative: Kubernetes graceful eviction with preStop hooks.

CRIULive MigrationContainers

Q176 How do you configure Linux for low-latency trading (sub-microsecond)?

# CPU isolation: dedicate cores to the trading app (kernel boot parameters, set via GRUB)
isolcpus=2-7 nohz_full=2-7 rcu_nocbs=2-7
# Disable power management
cpupower frequency-set -g performance
# Launch once, pinned to the isolated CPUs with real-time FIFO priority
taskset -c 2-7 chrt -f 99 ./trading_app
# Use huge pages, disable swap, pin NIC interrupts to dedicated cores

Low LatencyTradingPerformance

Q177 How do you implement a network traffic generator for testing at 100Gbps?

# Using pktgen (kernel module)
modprobe pktgen
echo "add_device eth0" > /proc/net/pktgen/kpktgend_0
echo "count 10000000" > /proc/net/pktgen/eth0
echo "start" > /proc/net/pktgen/pgctrl
# For more advanced testing: TRex (Cisco) or Warp17

Traffic Generator100GbpsTesting

Q178 How do you design a hot-hot disaster recovery solution with real-time data sync?

Hot-Hot: Both data centers serve traffic simultaneously. Data Sync: Use database-native multi-master (MySQL Group Replication, PostgreSQL BDR) or application-level dual-write with conflict resolution. Global Load Balancing: DNS-based (Route 53, NS1) with health checks. Business: Banks use hot-hot for zero RPO/RTO. Cost is 2-3x infrastructure but zero revenue loss during a DC failure.

Hot-HotDRReal-Time

Q179 How do you use io_uring for high-performance asynchronous I/O?

io_uring (kernel 5.1+) is the next-gen async I/O interface. It uses shared-memory ring buffers between kernel and userspace, so many requests can be batched into very few syscalls (or none on the submission side in SQPOLL mode). Benchmarks often show 2-3x throughput vs libaio. Used by RocksDB, ScyllaDB, and modern storage systems. Code: the liburing library provides an easy API for C/C++ applications.

io_uringAsync I/OPerformance

Q180 How do you implement a zero-trust network with SPIFFE/SPIRE?

SPIFFE: Standard for workload identity (SPIFFE ID like spiffe://company.com/app/frontend). SPIRE: Implementation — issues short-lived X.509 certificates and JWTs to workloads. Every service-to-service call is mutually authenticated via mTLS. No more hardcoded API keys or static credentials.
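
A SPIFFE ID is a URI whose host part is the trust domain and whose path names the workload. The sketch below shows how a service might split and check one. It is an illustration only: the function names are hypothetical, and in practice the ID is taken from the peer's verified certificate, with validation done by the SPIRE or SPIFFE libraries.

```python
from typing import Iterable, Tuple
from urllib.parse import urlparse


def parse_spiffe_id(spiffe_id: str) -> Tuple[str, str]:
    """Split a SPIFFE ID into (trust_domain, workload_path)."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe" or not parsed.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    if parsed.query or parsed.fragment:
        raise ValueError("SPIFFE IDs must not carry a query or fragment")
    return parsed.netloc, parsed.path


def is_allowed(spiffe_id: str, trust_domain: str, allowed_paths: Iterable[str]) -> bool:
    """Authorize a caller by trust domain and exact workload path."""
    try:
        domain, path = parse_spiffe_id(spiffe_id)
    except ValueError:
        return False
    return domain == trust_domain and path in set(allowed_paths)
```

For example, `is_allowed("spiffe://company.com/app/frontend", "company.com", ["/app/frontend"])` is true, while the same path under another trust domain is rejected.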

SPIFFEZero TrustSecurity

Q181 How do you use perf and flame graphs to identify CPU bottlenecks?

perf record -F 99 -p <PID> -g -- sleep 30
perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg
# Flame graph shows where CPU time is spent — width = time

Flame GraphsperfPerformance

Q182 How do you configure Linux as a high-performance router with BGP and OSPF?

# FRRouting for routing protocols
# VPP (Vector Packet Processing) for forwarding plane
# Achieves 100Gbps routing on commodity hardware
# Used by cloud providers for virtual networking

RouterBGPVPP

Q183 How do you implement a blockchain node on Linux?

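# Ethereum execution node (Geth); since the Merge it must be paired with a consensus client
geth --syncmode "snap" --http --http.api "eth,net,web3"
# Do not expose account-management APIs such as "personal" over HTTP
# Requires fast SSD (NVMe recommended), 32GB+ RAM
# Storage: 1TB+ for a snap-synced full node; an archive node needs many times more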

BlockchainEthereumNode

Q184 How do you use LTTng for low-overhead kernel and userspace tracing?

LTTng (Linux Trace Toolkit Next Generation) provides sub-microsecond overhead tracing. Used for debugging latency issues in production without impacting performance. Integrates with Trace Compass for visualization.

LTTngTracingPerformance

Q185 How do you architect a solution for GDPR-compliant data processing on Linux?

Key Technical Requirements: 1) Data encryption at rest (LUKS/dm-crypt) and in transit (TLS 1.3). 2) Data anonymization/pseudonymization. 3) Right to erasure — ability to delete specific user data across all systems. 4) Audit logging of all data access. 5) Data residency — keep EU user data in EU data centers. 6) Breach notification within 72 hours — requires comprehensive monitoring.

GDPRComplianceData Privacy

Q186 How do you perform capacity planning for a Linux-based infrastructure?

Methodology: 1) Baseline current usage (CPU, memory, disk, network). 2) Analyze growth trends (linear, exponential, seasonal). 3) Model future demand with headroom (typically 30-50% buffer). 4) Plan for peak (Black Friday, product launches). 5) Right-size instances — most companies over-provision by 40% on average. Tools: Prometheus + Grafana for trend analysis, sar for historical data.
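
Steps 2 and 3 of the methodology can be sketched as a simple projection. The model below assumes compound monthly growth and treats headroom as a buffer added on top of projected demand; both are simplifying assumptions, and the function names are illustrative.

```python
def projected_demand(current: float, monthly_growth: float, months: int) -> float:
    """Demand after `months` of compound growth (0.05 means 5% per month)."""
    return current * (1 + monthly_growth) ** months


def required_capacity(current: float, monthly_growth: float, months: int,
                      headroom: float = 0.3) -> float:
    """Projected demand plus a headroom buffer (0.3 means 30%)."""
    return projected_demand(current, monthly_growth, months) * (1 + headroom)


if __name__ == "__main__":
    # Example: 100 CPU cores in use today, growing 5% per month, 12-month horizon
    print(round(projected_demand(100, 0.05, 12), 1))
    print(round(required_capacity(100, 0.05, 12, headroom=0.3), 1))
```

Seasonal peaks such as Black Friday do not fit a smooth growth curve and should be modeled separately from the baseline trend.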

Capacity PlanningInfrastructureStrategy

Q187 How do you implement a message broker cluster with Apache Pulsar?

# Pulsar separates serving (brokers) from storage (BookKeeper)
# Enables independent scaling, multi-tenancy, geo-replication
# Used by Yahoo, Verizon, Splunk for trillion+ messages/day

PulsarMessagingScale

Q188 How do you use systemd sandboxing features for service security?

[Service]
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes
NoNewPrivileges=yes
ReadOnlyPaths=/etc/myapp
ReadWritePaths=/var/lib/myapp
# systemd can isolate services without Docker — built-in security

systemdSandboxingSecurity

Q189 How do you use BPF CO-RE for portable eBPF programs?

CO-RE (Compile Once, Run Everywhere) allows eBPF programs to run across different kernel versions without recompilation. Uses BTF (BPF Type Format) for type information. Essential for distributing eBPF tools as binaries.

BPF CO-REeBPFPortability

Q190 How do you design a hybrid cloud architecture with consistent security policies?

Key Components: 1) Unified identity (LDAP/AD + cloud IAM federation). 2) Consistent firewall policies via IaC (Terraform for cloud, Ansible for on-prem). 3) Centralized logging (ELK stack spanning both). 4) VPN/Direct Connect for secure interconnect. 5) Container orchestration (OpenShift/Rancher) that spans on-prem and cloud. Business: Hybrid cloud provides flexibility — keep sensitive data on-prem while bursting to cloud for peak loads.

Hybrid CloudArchitectureSecurity

Q191 How do you implement a service level objective (SLO) monitoring system?

SLO: Target for service reliability (e.g., 99.9% availability). SLI: Metric measured (e.g., successful requests / total requests). Error Budget: 1 - SLO = allowable failures (0.1% for 99.9% SLO). Implementation: Prometheus recording rules for SLI, Grafana dashboards, alert when error budget burn rate exceeds threshold. Business: SLOs align engineering with business expectations — prevents over-engineering (99.999% when 99.9% is sufficient).
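
The error budget arithmetic is worth working through once. This sketch computes the budget, the downtime it allows over a window, and the burn rate (observed error ratio divided by budget). The function names are illustrative, and the alert threshold itself is a policy choice not shown here.

```python
def error_budget(slo: float) -> float:
    """Allowed failure fraction, e.g. about 0.001 for a 99.9% SLO."""
    return 1.0 - slo


def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Total downtime the budget permits over the window."""
    return error_budget(slo) * window_days * 24 * 60


def burn_rate(failed: int, total: int, slo: float) -> float:
    """1.0 means the budget is consumed exactly on schedule; higher is faster."""
    if total == 0:
        return 0.0
    return (failed / total) / error_budget(slo)
```

A 99.9% SLO allows roughly 43.2 minutes of downtime in 30 days. If 5 of the last 1,000 requests failed, the burn rate is about 5, meaning the monthly budget would be gone in about six days at that pace.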

SLOSREMonitoring

Q192 How do you use user namespaces for rootless containers?

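# Podman supports rootless containers out of the box: run it as a regular user, without sudo
podman run -d docker.io/library/nginx
# Show the UID mapping: root (UID 0) inside the namespace maps to your unprivileged UID outside
podman unshare cat /proc/self/uid_map
# A container escape lands as an unprivileged user instead of host root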

RootlessContainersSecurity

Q193 How do you configure Linux for real-time audio/video processing?

# Install real-time kernel
sudo apt install linux-image-rt-amd64
# Set thread priorities
chrt -f 80 ./audio_process
# Use ALSA with mmap for low-latency audio

Real-TimeAudioKernel

Q194 How do you implement a distributed key-value store with etcd?

# etcd cluster (3 nodes minimum)
etcd --name node1 \
--initial-cluster node1=http://10.0.0.1:2380,node2=http://10.0.0.2:2380,node3=http://10.0.0.3:2380 \
--initial-cluster-state new

etcdDistributedKey-Value

Q195 How do you use the kernel's ftrace for function-level tracing?

echo function > /sys/kernel/debug/tracing/current_tracer
echo "kfree_skb" > /sys/kernel/debug/tracing/set_ftrace_filter
cat /sys/kernel/debug/tracing/trace

ftraceKernelTracing

Q196 How do you design a CDN using Linux and open-source tools?

# Nginx + Varnish for caching
# GeoDNS (PowerDNS) for routing users to nearest PoP
# BGP Anycast for IP-level routing
# Rsync/Lsyncd for content replication

CDNOpen SourceArchitecture

Q197 How do you implement disk encryption with LUKS and manage keys?

cryptsetup luksFormat /dev/sdb
cryptsetup luksOpen /dev/sdb encrypted_volume
mkfs.ext4 /dev/mapper/encrypted_volume
# Key management: Store master key in HSM or remote key server (Tang/Clevis)

LUKSEncryptionSecurity

Q198 How do you use the perf subsystem for hardware performance counters?

perf stat -e cache-misses,cache-references,branch-misses ./app
perf record -e intel_pt// ./app  # Intel PT for cycle-accurate tracing

perfHardware CountersProfiling

Q199 How do you configure Linux for HPC (High-Performance Computing) workloads?

# Use Mellanox OFED for InfiniBand
# Configure SLURM for job scheduling
# Use Lustre/GPFS for parallel filesystem
# Enable huge pages, CPU frequency scaling governor=performance

HPCInfiniBandSLURM

Q200 How do you implement a multi-factor authentication system with PAM?

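sudo apt install libpam-google-authenticator
# Each user runs google-authenticator once to generate a TOTP secret
# Configure /etc/pam.d/sshd
auth required pam_google_authenticator.so
# In /etc/ssh/sshd_config set KbdInteractiveAuthentication yes (ChallengeResponseAuthentication on older OpenSSH), then restart sshd
# Users log in with their password plus a TOTP code from an authenticator app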
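
The codes themselves follow the TOTP algorithm from RFC 6238, which is HOTP (RFC 4226) with the counter derived from the clock. The sketch below implements the algorithm for illustration; it is not the PAM module's code, and the shared secret is passed as raw bytes rather than the Base32 string authenticator apps display.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226)."""
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation: low nibble picks the offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(key: bytes, for_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238) using 30-second steps."""
    if for_time is None:
        for_time = time.time()
    return hotp(key, int(for_time) // step, digits)
```

Because the server and the app both derive the counter from the current time, clock drift beyond the accepted window causes valid codes to be rejected, so keep NTP running on the servers.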

MFAPAMSecurity

Q201 How do you use namespace manipulation for advanced container isolation?

unshare --pid --net --mount --fork --ipc --uts /bin/bash
# Creates isolated namespaces manually
# The building blocks of Docker/LXC

NamespacesContainersIsolation

Q202 How do you implement a Git server with Gitea?

# Self-hosted Git (lightweight, Go-based)
docker run -d -p 3000:3000 gitea/gitea
# Alternative to GitHub/GitLab for internal use
# Saves $21/user/month vs GitHub Enterprise

GiteaGitSelf-Hosted

Q203 How do you configure Linux for deep learning model training with multiple GPUs?

# Install NVIDIA drivers + CUDA + cuDNN
nvidia-smi   # Verify GPU availability
# Use NCCL for multi-GPU communication
# PyTorch: model = nn.DataParallel(model) or DistributedDataParallel

GPUDeep LearningCUDA

Q204 How do you use systemd timers as a cron replacement?

# /etc/systemd/system/backup.timer
[Timer]
OnCalendar=daily
Persistent=true
[Install]
WantedBy=timers.target
# Needs a matching backup.service unit; activate with: systemctl enable --now backup.timer
# More features than cron: random delays, monotonic timers, dependencies

systemdTimersScheduling

Q205 How do you use netfilter hooks for custom firewall logic?

# Write kernel module that registers netfilter hook
nf_register_net_hook(&init_net, &my_hook);
# Hook at NF_INET_PRE_ROUTING, NF_INET_POST_ROUTING, etc.
# Can inspect/modify/drop packets at kernel level

NetfilterKernelFirewall

Q206 How do you use the Linux kernel's KVM for nested virtualization?

# Enable nested KVM
echo "options kvm-intel nested=1" > /etc/modprobe.d/kvm-intel.conf
# Verify
cat /sys/module/kvm_intel/parameters/nested  # Should show Y

KVMNested VirtualizationHypervisor

Q207 How do you implement a sidecar pattern in Kubernetes for logging?

# Pod with app container + Fluentd sidecar
# App writes to shared emptyDir volume
# Fluentd reads and ships to Elasticsearch
# Pattern: Separation of concerns — app doesn't know about logging infrastructure

SidecarKubernetesLogging

Q208 How do you perform a rolling kernel upgrade across a server fleet?

# Strategy:
# 1. Upgrade kernel on 5% of fleet (canary)
# 2. Monitor for 48 hours (crash rate, performance)
# 3. If healthy, upgrade 25% batches every 2 hours
# 4. Use ksplice/livepatch for critical security patches (no reboot)
# 5. Have rollback plan: boot the previous kernel (grub-set-default on Debian/Ubuntu, grubby --set-default on RHEL)
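
The canary-then-batches rollout in steps 1 to 3 can be planned with a few lines of code. This is a sketch under the percentages given above (5% canary, 25% batches); the function name and host names are hypothetical, and real tooling would add health checks between batches.

```python
import math
from typing import List


def plan_batches(hosts: List[str], canary_percent: int = 5,
                 batch_percent: int = 25) -> List[List[str]]:
    """Split a fleet into one canary group followed by fixed-size batches."""
    if not hosts:
        return []
    canary_size = max(1, math.ceil(len(hosts) * canary_percent / 100))
    batch_size = max(1, math.ceil(len(hosts) * batch_percent / 100))
    batches = [hosts[:canary_size]]
    remaining = hosts[canary_size:]
    for start in range(0, len(remaining), batch_size):
        batches.append(remaining[start:start + batch_size])
    return batches


if __name__ == "__main__":
    fleet = [f"web-{i:03d}" for i in range(100)]
    print([len(batch) for batch in plan_batches(fleet)])
```

Each resulting group can be passed to the upgrade tooling in turn, for example as a host limit for an Ansible run.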

KernelRolling UpgradeFleet Management

Q209 How do you use the kernel's cgroup v2 for I/O throttling?

echo "8:0 wbps=104857600" > /sys/fs/cgroup/myapp/io.max
# Limits writes to 100MB/s on device 8:0
# Prevents noisy neighbor problem in multi-tenant systems

cgroups v2I/OThrottling

Q210 How do you design a service that handles 1 million WebSocket connections on a single server?

# Kernel tuning (see Q112) + application architecture
# Use epoll (or io_uring) for event-driven I/O; kqueue is the BSD/macOS equivalent
# Minimize per-connection memory (goal: <10KB per connection)
# 1M connections * 10KB = 10GB RAM — feasible on a single 32GB server
# Test with: https://github.com/ericmoritz/wsdemo
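
The memory estimate above is simple multiplication, but it helps to check it against the server size with some reserve for the OS. A minimal sketch, using decimal units (1KB = 1,000 bytes) to match the figures above; the function names and the 25% reserve are assumptions for illustration.

```python
def connection_memory_gb(connections: int, bytes_per_connection: int) -> float:
    """Total memory for all connections, in decimal gigabytes."""
    return connections * bytes_per_connection / 1e9


def fits_in_ram(connections: int, bytes_per_connection: int,
                ram_gb: float, os_reserve_fraction: float = 0.25) -> bool:
    """True if the connections fit after reserving a share of RAM for the OS."""
    usable_gb = ram_gb * (1 - os_reserve_fraction)
    return connection_memory_gb(connections, bytes_per_connection) <= usable_gb
```

At 10KB per connection, one million connections need 10GB and fit comfortably in 32GB. At 50KB per connection the same load needs 50GB and no longer fits, which is why per-connection memory is the number to measure first. Each connection also consumes one file descriptor, so the process's open-files limit must exceed one million.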

WebSocket1M ConnectionsScaling

Q211 How do you use TPM (Trusted Platform Module) for measured boot?

# TPM stores hashes of boot components
# On boot, compare PCR values against known good values
# Detect tampering — if BIOS/bootloader/kernel is modified, alert
# Used with LUKS for automatic disk decryption only if boot chain is trusted

TPMMeasured BootSecurity

Q212 How do you implement a data pipeline with Apache NiFi on Linux?

# NiFi provides visual dataflow programming
# Drag-and-drop processors for ingest, transform, route, store
# Built-in backpressure, prioritization, provenance tracking

NiFiData PipelineETL

Q213 How do you use the kernel's DAMON for memory access monitoring?

DAMON (Data Access MONitor) (kernel 5.15+) monitors memory access patterns with minimal overhead. Can identify cold memory regions for proactive reclaim or tiering. Used with DAMOS for automated memory management — migrate cold pages to slower storage automatically.

DAMONMemoryKernel

Q214 How do you configure Linux for confidential computing with AMD SEV/Intel TDX?

Confidential Computing encrypts VM memory so even the hypervisor can't read it. AMD SEV, Intel TDX. Enables running sensitive workloads in untrusted cloud environments. Business: Financial institutions can run trading algorithms in the public cloud without exposing data to the cloud provider.

Confidential ComputingSEVSecurity

Q215 How do you build a custom Linux distribution for embedded/IoT?

# Yocto Project — build custom distro
bitbake core-image-minimal
# Configure kernel, packages, init system
# Create bootable image for target device
# Used by automotive, industrial IoT, smart devices

YoctoEmbeddedIoT

AI-Oriented & Modern Trends in Linux Administration (2026)

AI/ML · Cloud-Native · Future

Q216 How is AI changing Linux server administration in 2026?

AI-Driven Operations (AIOps):

Predictive Scaling: AI models analyze traffic patterns and pre-scale infrastructure 30 minutes before demand spikes.
Anomaly Detection: ML models on metrics/logs detect subtle anomalies humans miss — a 0.5% increase in disk latency that precedes a drive failure.
Automated Root Cause Analysis: When an incident occurs, AI correlates logs across 1000+ services and suggests the most likely root cause.
Self-Healing: AI agents diagnose and fix common issues (restart service, clear cache, scale up) without human intervention.
ChatOps with LLMs: Natural language interface to infrastructure — "Show me all servers with CPU > 80% in the last hour" queries Prometheus via LLM.

Business Impact: Companies using AIOps report 50% reduction in MTTR (Mean Time to Resolution) and 30% reduction in operational costs. The role of Linux admin is evolving from "operator" to "AI supervisor."

AIAIOpsFutureAutomation

Q217 How do you set up GPU-accelerated workloads on Linux for AI/ML training?

# Install NVIDIA drivers
sudo apt install nvidia-driver-550
# Install CUDA toolkit
wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_550.54.14_linux.run
sudo sh cuda_12.4.0_550.54.14_linux.run
# Install cuDNN for deep learning
# Verify
nvidia-smi
python3 -c "import torch; print(torch.cuda.is_available())"
# Set up GPU passthrough for containers (requires the NVIDIA Container Toolkit on the host)
docker run --gpus all -it pytorch/pytorch:latest

Infrastructure: For multi-GPU training, use NCCL (NVIDIA Collective Communications Library). For distributed training across nodes, use Horovod or PyTorch Distributed. Business: Proper GPU setup can reduce model training time from weeks to hours — directly impacting time-to-market for AI products.

GPU · CUDA · AI Training · NVIDIA

Q218 How do you deploy and scale LLM (Large Language Model) inference on Linux servers?

Stack Options:

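vLLM: High-throughput LLM serving with PagedAttention. The project reports up to 24x higher throughput than Hugging Face Transformers in its own benchmarks.
Ollama: Easy local LLM deployment. Running ollama run llama3 downloads the model and starts an interactive session.
llama.cpp: CPU-optimized inference with quantization (for example, 4-bit models of roughly 7B to 30B parameters fit in 32GB RAM).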
Text Generation Inference (TGI): HuggingFace's production-grade server.

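# Deploy with vLLM (Mixtral 8x7B in FP16 needs roughly 90GB+ for weights, so shard it across GPUs)
docker run --gpus all --ipc=host -p 8000:8000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
vllm/vllm-openai:latest \
--model mistralai/Mixtral-8x7B-Instruct-v0.1 \
--tensor-parallel-size 2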

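Infrastructure Requirements: For a 70B parameter model, FP16 weights alone take about 140GB, so plan for 4x A100 (80GB) GPUs to leave room for the KV cache, or 1x A100 (80GB) with 4-bit quantization. Business: Self-hosting LLMs can save $0.002-0.01 per 1K tokens vs API providers. At 1B tokens/month, that is $2,000-$10,000 in monthly savings before hardware and operations costs.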
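
The sizing and savings figures above are simple arithmetic, and it helps to be able to reproduce them in an interview. The sketch below is a back-of-the-envelope calculator; the function names are hypothetical, the memory estimate covers weights only (no KV cache or activations), and the savings figure only works out to $2,000-$10,000 if the rate is read as dollars per 1,000 tokens.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for model weights only, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8

def monthly_savings_usd(tokens_per_month, savings_per_1k_tokens):
    """Savings when the per-token price difference is quoted per 1,000 tokens."""
    return tokens_per_month / 1000 * savings_per_1k_tokens

print(weight_memory_gb(70, 16))   # FP16 weights for a 70B model
print(weight_memory_gb(70, 4))    # 4-bit quantized weights
print(monthly_savings_usd(1_000_000_000, 0.002))
print(monthly_savings_usd(1_000_000_000, 0.01))
```

Real deployments need headroom beyond the weights for the KV cache and runtime buffers, which is why the GPU count is usually higher than the weights alone would suggest.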

LLM · Inference · vLLM · AI Deployment

Q219 How do you use MLOps tools (Kubeflow, MLflow) on Linux for ML lifecycle management?

Kubeflow: ML workflow orchestration on Kubernetes. Pipelines for training, hyperparameter tuning, and serving.
MLflow: Experiment tracking, model registry, and deployment.
Linux Admin Role: Set up a Kubernetes cluster with GPU support, configure storage (PVCs for datasets), set up monitoring for training jobs, and manage resource quotas to prevent ML jobs from starving production services.

MLOps · Kubeflow · MLflow · AI

Q220 How do you secure AI/ML infrastructure on Linux? What are the unique threats?

Unique AI Security Threats:

Model Poisoning: Attacker injects malicious data into training set — model behaves incorrectly on specific inputs.
Model Theft: Unauthorized access to trained model weights (the company's IP).
Adversarial Inputs: Carefully crafted inputs that cause the model to fail.
GPU Memory Attacks: Malicious code in shared GPU environments reading other processes' GPU memory.
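Supply Chain: Compromised pre-trained models from public repositories (HuggingFace, PyTorch Hub). Pickle-based model formats can execute arbitrary code when loaded.

Mitigations: Model encryption at rest, access control for model servers, input validation, GPU isolation (MIG, Multi-Instance GPU), scanning model files for embedded malware, and preferring non-executable formats such as safetensors over pickle-based files.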
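
One inexpensive supply-chain control is to verify a downloaded model file against a checksum published by a trusted source before loading it. The following is a minimal sketch using only the standard library; the function names are hypothetical, and it assumes you already have a trusted SHA-256 value to compare against.

```python
import hashlib
import hmac

def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so multi-gigabyte model files do not fill RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_model_file(path, expected_sha256):
    """Return True only if the file's SHA-256 matches the trusted value."""
    actual = sha256_of_file(path)
    # compare_digest avoids leaking match position through timing
    return hmac.compare_digest(actual, expected_sha256.lower())
```

A checksum only proves the file is the one the publisher released; it does not prove the publisher's model is safe, so it complements malware scanning rather than replacing it.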

AI Security · Model Protection · Threats

Q221 How do you use Vector Databases (Pinecone, Weaviate, Milvus) on Linux for RAG applications?

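# Deploy Milvus standalone (open-source vector DB) with the project's helper script
# A bare docker run of milvusdb/milvus is not enough: the server also needs etcd and object storage configured
curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh
bash standalone_embed.sh start   # listens on port 19530
# Used for semantic search, RAG (Retrieval-Augmented Generation)
# Stores embeddings produced by embedding models for fast similarity search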
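
To make "similarity search" concrete, here is a minimal sketch of what a vector database does conceptually: rank stored embeddings by cosine similarity to a query embedding. This is a brute-force illustration, not Milvus's API; real systems use approximate nearest-neighbor indexes to avoid scanning every vector. The three-dimensional vectors and document IDs are made up for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query, index, k=2):
    """Return the k document IDs most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical 3-dimensional embeddings; real ones have hundreds of dimensions
index = {
    "doc-linux": [0.9, 0.1, 0.0],
    "doc-gpu": [0.1, 0.9, 0.2],
    "doc-hr": [0.0, 0.1, 0.9],
}
print(top_k([0.8, 0.2, 0.0], index, k=2))
```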

Vector DB · RAG · AI

Q222 How do you implement a CI/CD pipeline for ML models?

ML CI/CD Pipeline:
1) Data validation (Great Expectations).
2) Model training with versioned datasets (DVC).
3) Model evaluation against baseline.
4) Model registry (MLflow).
5) Gradual rollout to production: canary or A/B deployment (for example, with Istio traffic splitting).
6) Continuous monitoring for data drift and model decay.
Business: Automating ML deployment reduces time-to-production from months to days.
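
Step 3, evaluating a candidate model against a baseline, is usually implemented as a gate that fails the pipeline when any tracked metric regresses. The following is a minimal sketch of such a gate; the function name, the metric names, and the 0.01 tolerance are hypothetical, and it assumes every metric is a "higher is better" score.

```python
def evaluate_against_baseline(candidate, baseline, tolerance=0.01):
    """Compare metric dicts and return (ok, reasons).

    The candidate passes only if no baseline metric drops by more
    than `tolerance`. Metrics missing from the candidate also fail.
    """
    reasons = []
    for name, base_value in baseline.items():
        if name not in candidate:
            reasons.append(f"{name}: missing from candidate")
            continue
        if candidate[name] < base_value - tolerance:
            reasons.append(
                f"{name}: {candidate[name]:.3f} is below baseline {base_value:.3f}"
            )
    return (not reasons, reasons)

baseline = {"accuracy": 0.90, "f1": 0.85}
ok, reasons = evaluate_against_baseline({"accuracy": 0.92, "f1": 0.80}, baseline)
print(ok, reasons)
```

In a CI job, a failed gate would typically exit non-zero so the registry and deployment stages never run.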

ML CI/CD · Automation · AI

Q223 How do you monitor GPU metrics on Linux for AI workloads?

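# Stream per-GPU stats every 2 seconds: power/temperature (p), utilization (u), clocks (c), violations (v), memory (m), ECC (e), PCIe throughput (t)
nvidia-smi dmon -s pucvmet -d 2
# Prometheus: run NVIDIA's DCGM (Data Center GPU Manager) exporter and scrape its metrics endpoint
# Grafana dashboard for GPU utilization, memory, temperature
# Alert when GPU memory > 90% or temperature > 80°C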
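
The alert thresholds above can be prototyped before a full Prometheus setup exists. This is a minimal sketch that parses CSV text in the shape produced by nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total,temperature.gpu --format=csv,noheader,nounits and applies the 90% memory and 80°C limits. The function names and the sample text are hypothetical; the sample is hand-written, not captured from a real GPU.

```python
def parse_gpu_csv(text):
    """Parse nvidia-smi CSV rows (index, util %, mem used MiB, mem total MiB, temp C)."""
    gpus = []
    for line in text.strip().splitlines():
        index, util, mem_used, mem_total, temp = (f.strip() for f in line.split(","))
        gpus.append({
            "index": int(index),
            "util_pct": float(util),
            "mem_used_mib": float(mem_used),
            "mem_total_mib": float(mem_total),
            "temp_c": float(temp),
        })
    return gpus

def check_alerts(gpus, mem_pct_limit=90.0, temp_limit_c=80.0):
    """Return (gpu_index, kind) tuples for every threshold that is exceeded."""
    alerts = []
    for gpu in gpus:
        mem_pct = 100.0 * gpu["mem_used_mib"] / gpu["mem_total_mib"]
        if mem_pct > mem_pct_limit:
            alerts.append((gpu["index"], "memory"))
        if gpu["temp_c"] > temp_limit_c:
            alerts.append((gpu["index"], "temperature"))
    return alerts

# Hand-written sample in the expected format
SAMPLE = "0, 97, 78000, 81920, 83\n1, 12, 1024, 81920, 45\n"
print(check_alerts(parse_gpu_csv(SAMPLE)))
```

On a live host, the text would come from running the nvidia-smi query through subprocess; for fleet-wide alerting, the DCGM exporter with Prometheus alert rules is the more maintainable route.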

GPU Monitoring · DCGM · AI

Q224 How do you implement federated learning infrastructure on Linux?

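Federated Learning: Train models across decentralized devices without centralizing data. Each node trains locally and shares only model updates, not raw data.
Linux Stack: Flower framework, TensorFlow Federated.
Use Case: Healthcare. Hospitals train a shared diagnostic model without exchanging patient records. This supports HIPAA compliance but does not guarantee it on its own, because model updates can still leak information without safeguards such as secure aggregation or differential privacy.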
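
The aggregation step at the heart of federated learning is commonly done with federated averaging (FedAvg): the server averages client weights, weighting each client by its number of training samples. The sketch below shows that step in plain Python; it is not the Flower or TensorFlow Federated API, and the function name and the hospital data are hypothetical.

```python
def federated_average(client_updates):
    """Weighted average of client model weights (FedAvg aggregation step).

    client_updates: list of (weights, num_samples) pairs, where weights
    is a flat list of floats of the same length for every client.
    """
    total_samples = sum(n for _, n in client_updates)
    if total_samples == 0:
        raise ValueError("no training samples reported by any client")
    dim = len(client_updates[0][0])
    averaged = [0.0] * dim
    for weights, n in client_updates:
        for i, w in enumerate(weights):
            averaged[i] += w * n / total_samples
    return averaged

# Two hypothetical hospitals: the larger dataset pulls the average toward its weights
updates = [([1.0, 2.0], 100), ([3.0, 4.0], 300)]
print(federated_average(updates))
```

Only the weight vectors and sample counts cross the network here; the raw records never leave each client.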

Federated Learning · Privacy · AI

Q225 How do you use Linux containers for reproducible ML environments?

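# Dockerfile for reproducible ML
FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04
# The CUDA base image ships without Python, so install it first
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip \
&& rm -rf /var/lib/apt/lists/*
# Pin exact versions for reproducibility
RUN pip3 install --no-cache-dir torch==2.3.0 transformers==4.40.0
# Use Singularity/Apptainer for HPC environments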

Reproducibility · Docker · ML

Q226 How do you optimize Linux kernel for AI training workloads?

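# Huge pages for large memory allocations (4096 pages x 2MB default page size on x86_64 = 8GB reserved)
sudo sysctl -w vm.nr_hugepages=4096
# Transparent huge pages (consider madvise instead of always if latency spikes appear)
echo always | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
# GPUDirect RDMA for multi-node training (loads NVIDIA's peer-memory module; needs an RDMA-capable NIC)
sudo modprobe nvidia-peermem
# IOMMU passthrough mode: add iommu=pt to the kernel command line and reboot
# Pin the CPU frequency governor to performance during training
sudo cpupower frequency-set -g performance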

Kernel Tuning · AI Training · Performance

Q227 How do you set up a data lakehouse with Iceberg/Delta Lake on Linux?

# Apache Iceberg on S3-compatible storage (MinIO on Linux)
# Provides ACID transactions on data lake
# Query with Spark, Trino, or Flink
# Business: Combine data warehouse reliability with data lake flexibility

Lakehouse · Iceberg · Data

Q228 How do you use WebAssembly (Wasm) on Linux servers for edge computing?

# WasmEdge: lightweight WebAssembly runtime
wasmedge run --env PORT=8080 app.wasm
# Typically much faster cold start than containers, which often take hundreds of milliseconds to seconds
# Modules are often an order of magnitude smaller than equivalent Docker images
# Used for edge AI inference, CDN edge compute

WebAssembly · Edge · Wasm

Q229 How do you implement a GitOps workflow with ArgoCD for AI model deployment?

# ArgoCD Application pointing to Git repo with model serving config
# Git commit triggers automatic deployment
# Rollback = git revert
# Business: Audit trail for every model deployment — critical for regulated industries

ArgoCD · GitOps · AI

Q230 How do you use eBPF for AI workload observability?

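eBPF can trace host-side GPU activity such as CUDA memory allocations, kernel launch latency, and data transfers between CPU and GPU by attaching uprobes to the CUDA runtime and driver libraries, with no code changes to ML frameworks. It observes the API calls made on the CPU, not execution inside the GPU itself. Tools like gpud and custom bpftrace scripts give detailed visibility into AI workloads.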

eBPF · AI Observability · GPU

Q231 How do you implement a feature store (Feast) on Linux for ML?

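# Feast: open-source feature store
pip install feast
feast init my_project && cd my_project/feature_repo   # scaffold an example repository (my_project is a placeholder)
feast apply   # register feature definitions
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")   # load features into the online store
# Manages feature definitions, online/offline serving
# Integrates with Redis for online, BigQuery/Snowflake for offline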

Feature Store · ML · Feast

Q232 How do you configure Linux for edge AI inference (Jetson, Raspberry Pi)?

# NVIDIA Jetson with JetPack SDK
# TensorRT for optimized inference
# ONNX Runtime for cross-platform
# Use CPU governors, disable unnecessary services for power efficiency

Edge AI · Jetson · Inference

Q233 How do you use Ray for distributed AI workloads on a Linux cluster?

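# Ray: distributed computing framework
ray start --head --port=6379
# On each worker node, join the cluster (replace <head-ip> with the head node's address)
ray start --address=<head-ip>:6379
# Python: @ray.remote decorator for distributed functions
# Ray Train for distributed training, Ray Serve for model serving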

Ray · Distributed AI · Cluster

Q234 How do you implement a model registry with MLflow on Linux?

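# Credentials and bucket name are placeholders. Binding to 0.0.0.0 exposes the server on all interfaces, so restrict access with a firewall or an authenticating reverse proxy.
mlflow server --backend-store-uri postgresql://user:pass@localhost/mlflow \
--default-artifact-root s3://mlflow-artifacts \
--host 0.0.0.0 --port 5000
# Track experiments, register models, manage versions, deploy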

MLflow · Model Registry · MLOps

Q235 How do you use KServe for serverless model inference on Kubernetes?

# KServe — serverless inference on Kubernetes
# Auto-scaling (including scale-to-zero), canary rollouts
# Supports TensorFlow, PyTorch, ONNX, XGBoost
# Business: Pay-per-inference instead of running GPU servers 24/7

KServe · Serverless · Inference

Q236 How do you implement drift detection for ML models in production?

# Use Evidently AI, Alibi Detect, or NannyML
# Monitor data drift, concept drift, prediction drift
# Trigger retraining when drift exceeds threshold
# Business: Prevents model degradation that could cost millions in wrong predictions
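
To show what a drift check computes under the hood, here is a minimal sketch using the Population Stability Index (PSI), a common drift statistic. This is a hand-rolled illustration, not the API of Evidently AI, Alibi Detect, or NannyML; the function name, bin count, and sample data are hypothetical. A commonly cited rule of thumb treats PSI above roughly 0.25 as significant drift, but thresholds should be tuned per feature.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample and a new sample.

    Bins are equal-width over the range of the reference (expected) data;
    values outside that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # floor at a tiny value so the logarithm is always defined
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

reference = [i / 100 for i in range(100)]        # training-time feature values
shifted = [0.5 + i / 200 for i in range(100)]    # production values, shifted upward
print(psi(reference, reference), psi(reference, shifted))
```

A monitoring job would compute this per feature on a schedule and trigger retraining or an alert when the value crosses the chosen threshold.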

Drift Detection · ML Monitoring · Production

Q237 How do you use Apache Airflow for ML pipeline orchestration?

# Airflow DAG for ML pipeline
# Extract data → Validate → Train → Evaluate → Deploy
# Schedule, monitor, retry, alert
# Production-grade ML pipelines need robust orchestration

Airflow · Orchestration · ML Pipeline

Q238 How do you use the Hugging Face ecosystem on Linux for NLP?

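# Shell: install the libraries plus a backend (PyTorch here)
pip install transformers datasets accelerate torch
# Python: load a model from the Hugging Face Hub, which hosts hundreds of thousands of models
from transformers import pipeline
# Name the model explicitly instead of relying on the task default
classifier = pipeline("sentiment-analysis", model="distilbert/distilbert-base-uncased-finetuned-sst-2-english")
result = classifier("I love Linux!")
print(result)  # a list with one dict containing "label" and "score"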

HuggingFace · NLP · Transformers

Q239 How do you deploy Stable Diffusion on a Linux server for image generation?

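# AUTOMATIC1111 WebUI
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui
# API mode for production; --listen binds to all interfaces, so restrict access with a firewall or reverse proxy
./webui.sh --api --listen
# 8GB+ VRAM GPU recommended
# Throughput varies with model, resolution, and step count (figures of 60+ images/minute on an A100 assume favorable settings)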

Stable Diffusion · Image Generation · AI

Q240 How do you implement a chatbot with RAG on Linux using open-source tools?

# LangChain + ChromaDB + Llama 3
# 1. Ingest documents → chunk → embed → store in vector DB
# 2. Query: embed query → retrieve similar chunks → prompt LLM
# 3. LLM generates answer grounded in retrieved documents
# Business: Internal knowledge base chatbot — answers HR/IT questions instantly
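
The ingest and prompt steps above can be sketched without any framework. The code below is a minimal illustration, not LangChain's API: it splits a document into overlapping word-based chunks (step 1) and assembles a grounded prompt from chunks that a retriever has already selected (steps 2 and 3). The function names, chunk sizes, and prompt wording are hypothetical; embedding and retrieval are deliberately left out.

```python
def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into word-based chunks, repeating `overlap` words between neighbors."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

def build_prompt(question, retrieved_chunks):
    """Assemble a prompt that asks the LLM to answer only from the retrieved context."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

doc = " ".join(f"w{i}" for i in range(12))
print(chunk_text(doc, chunk_size=5, overlap=2))
```

Overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.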

RAG · Chatbot · Open Source AI

Q241 How do you monitor carbon footprint of AI workloads on Linux?

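# CodeCarbon: track CO2 emissions
# Shell: install the package
pip install codecarbon
# Python: wrap the training run to estimate its emissions
from codecarbon import EmissionsTracker
tracker = EmissionsTracker()
tracker.start()
# ... training code ...
emissions = tracker.stop()  # estimated emissions in kg CO2-equivalent
print(f"Estimated emissions: {emissions} kg CO2eq")
# Business: ESG compliance, green AI initiatives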

Green AI · Carbon · Sustainability

Q242 How do you use lakeFS for data versioning in ML pipelines?

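# lakeFS: Git-like versioning for data lakes
# my-repo is a placeholder repository name
lakectl branch create lakefs://my-repo/experiment-1 --source lakefs://my-repo/main
# Experiment on the branch without affecting production data
# Merge if successful, discard the branch if not
lakectl merge lakefs://my-repo/experiment-1 lakefs://my-repo/main
# Business: Reproducible ML experiments with data lineage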

lakeFS · Data Versioning · ML

Q243 How do you implement real-time model serving with Triton Inference Server?

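# NVIDIA Triton: enterprise model serving (the host path below is a placeholder)
docker run --rm --gpus all -p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v /path/to/model_repository:/models \
nvcr.io/nvidia/tritonserver:24.04-py3 \
tritonserver --model-repository=/models
# Ports: 8000 HTTP, 8001 gRPC, 8002 Prometheus metrics
# Supports TensorRT, ONNX, PyTorch, TensorFlow, Python models
# Dynamic batching, model ensembles, GPU metrics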

Triton · Model Serving · NVIDIA

Q244 How do you use JupyterHub on Linux for collaborative data science?

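# JupyterHub: multi-user Jupyter notebook server (upstream documents pip or conda installation)
python3 -m pip install jupyterhub jupyterlab
sudo npm install -g configurable-http-proxy
# Docker spawner for isolated environments per user
python3 -m pip install dockerspawner
# Integrate with LDAP for enterprise auth
python3 -m pip install jupyterhub-ldapauthenticator
# Business: 50 data scientists sharing GPU resources efficiently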

JupyterHub · Data Science · Collaboration

Q245 How do you use Prefect for modern workflow orchestration?

# Prefect: Python-native workflow engine
from prefect import flow, task

@task
def extract():
    return [1, 2, 3]  # placeholder data

@flow
def ml_pipeline():
    data = extract()
    return data

if __name__ == "__main__":
    ml_pipeline()
# An alternative to Airflow that defines workflows as plain Python functions

Prefect · Orchestration · Python

Q246 How do you implement model quantization for efficient inference on CPU?

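# llama.cpp: 4-bit quantization (the tool is named llama-quantize in current releases, quantize in older ones)
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
# 70B model: about 140GB in FP16 shrinks to about 40GB, which fits on a workstation with 64GB RAM
# GGUF format for CPU inference
# Business: Run LLMs without $30K GPU servers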
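
The size reduction comes from storing each weight as a small integer plus a shared scale factor. The sketch below shows plain symmetric 4-bit quantization of one block of weights. It is a simplified illustration and not llama.cpp's Q4_K_M scheme, which uses a more elaborate block layout; the function names and sample weights are hypothetical.

```python
def quantize_block(values, bits=4):
    """Symmetric quantization: map floats to signed integers sharing one scale."""
    qmax = 2 ** (bits - 1) - 1          # 7 for 4-bit
    scale = max(abs(v) for v in values) / qmax
    if scale == 0:
        return [0] * len(values), 0.0   # all-zero block
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale

def dequantize_block(quantized, scale):
    """Recover approximate floats from the integers and the block scale."""
    return [q * scale for q in quantized]

weights = [0.7, -0.3, 0.0, 0.14]        # hypothetical block of model weights
q, scale = quantize_block(weights)
restored = dequantize_block(q, scale)
print(q, restored)
```

Each restored weight differs from the original by at most half a quantization step, which is the accuracy traded for the smaller file.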

Quantization · CPU Inference · Optimization

Q247 How do you use DVC (Data Version Control) for ML data management?

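dvc init
dvc add dataset/
git add dataset.dvc .gitignore
git commit -m "Add dataset v1"
# Configure remote storage and upload the data (the bucket path is a placeholder)
dvc remote add -d storage s3://my-bucket/dvc-store
dvc push
# DVC keeps small .dvc pointer files in Git while the data itself lives in S3/GCS
# Reproducible ML pipelines with dvc.yaml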

DVC · Data Versioning · ML

Q248 How do you implement a private container registry for AI images?

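# Harbor: enterprise container registry
# Harbor ships as a multi-container installer bundle (or Helm chart), not a single image
# 1. Download and extract the installer from the Harbor GitHub releases page
# 2. Copy harbor.yml.tmpl to harbor.yml and set the hostname, TLS certificate, and admin password
sudo ./install.sh --with-trivy   # also enables Trivy vulnerability scanning
# Vulnerability scanning, image signing, RBAC
# Store custom ML images with pre-installed CUDA, frameworks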

Harbor · Registry · AI

Q249 How do you use K3s for lightweight Kubernetes on edge for AI?

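# Review the install script before piping it to a shell on production hosts
curl -sfL https://get.k3s.io | sh -
# Verify the node is ready
sudo k3s kubectl get nodes
# Single small binary (well under 100MB), runs on Raspberry Pi
# Well suited to edge AI deployments
# Manage edge devices like cloud servers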

K3s · Edge AI · Kubernetes

Q250 What is the future of Linux server administration with AI agents?

2026-2030 Vision: AI agents are expected to take over a large share of routine Linux administration tasks (this guide's estimate is around 80%), including patching, scaling, and troubleshooting common issues. Humans will focus on architecture, security strategy, and AI supervision. The Linux admin becomes an "AI Operations Engineer": training AI models on infrastructure data, writing prompts for infrastructure automation, and handling complex edge cases. Skills to develop: AI/ML fundamentals, prompt engineering for infrastructure, eBPF, and distributed systems architecture. The role isn't disappearing. It is evolving to a higher level of abstraction.

Future · AI Agents · Career Evolution

Mominul's Blog

Thursday, July 2, 2026
