The Linux Debugging Superpower Nobody Teaches You (strace/Syscalls)
Deep Dive Linux & Networking: The Real Engineering Path
Part 3 of 10
What if you could see exactly what your programs do behind the scenes? Learn strace, the debugging tool that reveals every file access, network call, and system interaction. From zero to practical mastery.
Introduction: The Hidden World Your Programs Live In
Have you ever wondered what really happens when you run a simple command like echo "hello world"? Or struggled to debug why your application suddenly can't open a file or connect to a network?
Meet strace - the X-ray vision tool for Linux that shows you exactly what your programs are doing behind the scenes.
In this comprehensive guide, we'll journey from absolute basics to practical mastery. Whether you're a Cloud Engineer, DevOps professional, or system administrator, strace will become your secret weapon for troubleshooting mysterious issues.
Why Strace Matters: The Real Problem It Solves
The Debugging Nightmare
Imagine this scenario: Your application suddenly throws "Connection failed" but gives no details. Where is it trying to connect? What files is it accessing? What permissions does it need?
Without strace, you're debugging in the dark. With strace, you can see:
- Every file your program tries to open (and whether it succeeds)
- Every network connection attempt
- Every permission check
- Every interaction with the operating system
Real-World Impact
In my work managing OpenStack infrastructure, I ran into a classic example: a critical service intermittently failed to start because of a misconfigured library path. Without strace, a problem like that can take days to track down.
The application logs showed nothing useful, and traditional debugging tools didn't help because the failure was intermittent. With strace, I saw exactly which .so file the service was looking for and where it was searching. Problem solved in 15 minutes.
Understanding the Foundation: What Are System Calls?
Before diving into strace, we need to understand system calls (syscalls) - the bridge between your program and the Linux kernel.
The Hotel Analogy
Think of your program as a hotel guest and the kernel as the front desk. Every time the guest needs something - room service, opening a door, using facilities - they must call the front desk. They can't directly access hotel resources.
Similarly, when a program wants to:
- Read or write a file → Must ask kernel via syscall
- Open a network connection → Must ask kernel via syscall
- Allocate memory → Must ask kernel via syscall
Strace is the call log that records every conversation between your program and the kernel.
Why This Architecture?
This separation exists for security and stability:
- Security: Programs can't directly access hardware or other programs' memory
- Stability: The kernel controls resource allocation, preventing conflicts
- Abstraction: Programs don't need to know hardware specifics
Your First Strace: Hello World Under the Microscope
Let's start with the simplest possible command:
strace echo "hello world"
What You'll See
execve("/usr/bin/echo", ["echo", "hello world"], 0x7fff9c8b7e90 /* 24 vars */) = 0
brk(NULL) = 0x55b8f4a3d000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffd8e9c4a50) = -1 EINVAL (Invalid argument)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9e3c8a5000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
...
write(1, "hello world\n", 12) = 12
close(1) = 0
close(2) = 0
exit_group(0) = ?
+++ exited with 0 +++
Overwhelming, right? Don't worry! Let's understand what's happening.
Counting System Calls
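strace echo "hello world" 2>&1 | wc -l
Output:
113
That is 113 lines of output (roughly one per system call, plus the program's own "hello world" line and strace's final exit line) just to print "hello world"! The exact number varies from system to system. But why redirect with 2>&1?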
Understanding stderr vs stdout: The Three Communication Channels
In Linux, every program has 3 standard communication channels (file descriptors):
| File Descriptor | Name | Purpose | Analogy |
|---|---|---|---|
| 0 | stdin | Input channel | Your inbox (receive) |
| 1 | stdout | Normal output | Your outbox (send normal mail) |
| 2 | stderr | Error/diagnostic output | Your urgent mailbox (send alerts) |
Why Does This Matter?
Strace deliberately outputs to stderr (fd 2) so it doesn't interfere with the actual program output on stdout (fd 1).
Experiment: See the Difference
# Redirect stdout only
strace echo "hello" > output.txt
# Result: You see strace output on screen, "hello" goes to file
# Redirect stderr only
strace echo "hello" 2> output.txt
# Result: "hello" on screen, strace output goes to file
# Redirect both to the same place
strace echo "hello" 2>&1 | less
# Result: Everything piped to less for easy viewing
Pro tip: Always use 2>&1 when piping strace output for analysis.
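The same separation can be reproduced without strace. The sketch below uses a hypothetical shell function, emit, as a stand-in for a traced program: it writes one line to each stream, so you can check which redirection captures what.

```shell
# Hypothetical stand-in for "a program being traced": one line per stream.
emit() {
    echo "program output"       # goes to stdout (fd 1)
    echo "trace output" >&2     # goes to stderr (fd 2), like strace's lines
}

only_stdout=$(emit 2>/dev/null)       # capture fd 1, discard fd 2
only_stderr=$(emit 2>&1 >/dev/null)   # point fd 2 at the capture, then discard fd 1
both=$(emit 2>&1)                     # merge both streams into the capture

echo "stdout only: $only_stdout"
echo "stderr only: $only_stderr"
```

The order of redirections matters: 2>&1 >/dev/null first points stderr at the capture and only then discards stdout.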
Decoding Syscall Format: Reading the Matrix
Every strace line follows this pattern:
syscall_name(argument1, argument2, ...) = return_value
Analyzing a Real Example
write(1, "hello world\n", 12) = 12
Let's break it down:
| Component | Value | Meaning |
|---|---|---|
| syscall name | write | Write data to a file descriptor |
| argument 1 | 1 | File descriptor 1 (stdout) |
| argument 2 | "hello world\n" | The actual data to write |
| argument 3 | 12 | Number of bytes to write |
| return value | = 12 | Successfully wrote 12 bytes |
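If you want to pull these components out of a saved trace, plain shell parameter expansion is often enough. This is a minimal sketch that assumes the simple one-line format shown above; the variable names are made up for illustration.

```shell
# A sample line copied from the trace above (single quotes keep \n literal).
line='write(1, "hello world\n", 12) = 12'

name=${line%%"("*}      # drop everything from the first "(" onward
ret=${line##* = }       # drop everything up to the last " = "

echo "syscall: $name"
echo "return:  $ret"
```

For a failed call, the part after " = " would also carry the error name and message, for example -1 ENOENT (No such file or directory).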
Counting Characters
h  e  l  l  o  ␣  w  o  r  l  d  \n
1  2  3  4  5  6  7  8  9  10 11 12
Perfect match! The space (shown as ␣) and the \n (newline) each count as one character.
Key insight: The return value tells you how many bytes were actually written. A count lower than requested is a partial write: the program has to call write() again for the remainder, so it deserves a closer look.
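You can double-check the byte count from the shell. A small sketch, assuming the same message as in the trace:

```shell
msg='hello world'
# printf adds the trailing newline, as echo does; wc -c counts bytes.
bytes=$(printf '%s\n' "$msg" | wc -c | tr -d ' ')
echo "$bytes"   # 12
```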
File Descriptors: Your Program's Reference System
This is crucial to understand. File descriptors are NOT the data - they are reference numbers (handles) to access resources.
The Office Analogy
Program = You working at your desk
Kernel = Resource manager
fd 0 (stdin) = Your inbox (receive input from others)
fd 1 (stdout) = Your outbox (send normal documents)
fd 2 (stderr) = Urgent outbox (send error reports)
fd 3, 4, 5... = Filing cabinet drawers (opened files/resources)
Proof of Concept
strace -e trace=openat,read,write,close cat /etc/hostname
You'll see this sequence (after the loader's own calls):
openat(AT_FDCWD, "/etc/hostname", O_RDONLY) = 3    → Open file, get handle #3
read(3, "thinkx13\n", 131072) = 9                  → Use handle #3 to READ
write(1, "thinkx13\n", 9) = 9                      → Use handle #1 to WRITE to screen
close(3) = 0                                       → Return handle #3
Notice:
- We read from fd 3 (the file we opened)
- We write to fd 1 (stdout - the screen)
- File descriptor is just a handle, not the data itself
- After close(3), fd 3 can be reused for another file
Why Start at 3?
Because 0, 1, and 2 are normally already open when your program starts (it inherits them from the process that launched it):
- 0 = stdin (usually your keyboard)
- 1 = stdout (usually your screen)
- 2 = stderr (usually your screen)
So the first file you open usually gets fd 3, the next gets fd 4, and so on: the kernel hands out the lowest unused number.
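The shell lets you work with descriptor numbers directly, which makes the "handle" idea easy to try out. A minimal sketch, using a scratch file created just for the demo; note that in the shell you pick the number yourself, whereas a program calling open() gets the lowest free one from the kernel.

```shell
tmpfile=$(mktemp)                  # scratch file for the demo
printf 'first line\nsecond line\n' > "$tmpfile"

exec 3< "$tmpfile"                 # open the file for reading on fd 3
read -r first <&3                  # read one line through the handle
read -r second <&3                 # same handle: the file offset has advanced
exec 3<&-                          # close fd 3 so the number can be reused

rm -f "$tmpfile"
echo "$first / $second"
```

The two reads return different lines because the position belongs to the open file behind the handle, not to the number itself.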
Practical Strace Techniques: From Noise to Signal
1. Filtering Specific Syscalls
Instead of seeing all 113 syscalls, focus on what matters:
strace -e trace=openat,read,write,close echo "hello world"
Output is much cleaner:
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
close(3) = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0"..., 832) = 832
close(3) = 0
write(1, "hello world\n", 12) = 12
close(1) = 0
close(2) = 0
2. Statistics Mode: The High-Level View
strace -c echo "hello world"
Output:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         1           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0         3           close
  0.00    0.000000           0         7           mmap
  0.00    0.000000           0         3           mprotect
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000           0        47         3 total
(Only some rows are shown here; the total line covers every syscall that was made.)
This shows:
- Total number of each syscall type
- How many failed (errors column)
- Time spent in each syscall
- Percentage of total time
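Use case: Quickly identify which syscalls are called most often or take the most time. By default the time columns count kernel (system) time only; add -w to summarize wall-clock time instead.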
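Once a summary is saved, standard text tools can rank it. The sketch below reuses the rows from the sample output above (header and total lines left out) and picks the most-called syscall; the variable names are illustrative.

```shell
# Rows copied from the sample summary above.
summary='  0.00    0.000000           0         1           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0         3           close
  0.00    0.000000           0         7           mmap
  0.00    0.000000           0         3           mprotect'

# Column 4 is "calls"; the syscall name is always the last field.
top=$(printf '%s\n' "$summary" | sort -k4,4nr | head -n 1 | awk '{print $NF}')
echo "most-called syscall: $top"
```

On a live run, strace -c -S calls asks strace to sort the summary by call count itself.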
3. Filter by Category
# Only network syscalls
strace -e trace=network curl google.com
# Only file operations
strace -e trace=file ls /etc
# Only process operations
strace -e trace=process bash -c "echo hello"
# Everything EXCEPT memory operations
strace -e trace=\!memory cat /etc/hostname
Real-World Debugging Scenario: Permission Denied Mystery
The Problem
Let's simulate a common debugging scenario:
# Create a file with no permissions
echo "secret data" > /tmp/secret.txt
chmod 000 /tmp/secret.txt
# Try to read it
cat /tmp/secret.txt
Output:
cat: /tmp/secret.txt: Permission denied
Generic error message! Not very helpful.
Using Strace to Investigate
strace cat /tmp/secret.txt 2>&1 | grep secret
Output:
openat(AT_FDCWD, "/tmp/secret.txt", O_RDONLY) = -1 EACCES (Permission denied)
Now we see:
- syscall: openat - it's failing at the open stage
- path: /tmp/secret.txt - confirmed the file path
- flags: O_RDONLY - trying to open for reading
- return: -1 EACCES - specific error code for permission denied
The Fix
# Check actual permissions
ls -l /tmp/secret.txt
# Output: ---------- 1 user user 12 Nov 6 10:30 /tmp/secret.txt
# Fix permissions
chmod 644 /tmp/secret.txt
# Verify with strace
strace -e openat cat /tmp/secret.txt 2>&1 | grep secret
New output:
openat(AT_FDCWD, "/tmp/secret.txt", O_RDONLY) = 3    → Success! Got fd 3
Understanding Return Values: Success or Failure?
Return values tell you whether the syscall succeeded or failed:
| Return Value | Meaning | Example |
|---|---|---|
| Positive number | Success: usually a byte count or a new file descriptor | write(...) = 12 |
| 0 | Success (EOF or nothing to process) | read(...) = 0 |
| -1 ERRORCODE | Failed with specific error | openat(...) = -1 ENOENT |
Common Error Codes
# ENOENT - File doesn't exist
strace cat /file_does_not_exist 2>&1 | grep ENOENT
# EACCES - Permission denied
strace cat /etc/shadow 2>&1 | grep EACCES
# EISDIR - Is a directory (can't cat a directory)
strace cat /etc 2>&1 | grep EISDIR
# ECONNREFUSED - Connection refused (network)
strace curl http://localhost:9999 2>&1 | grep ECONNREFUSED
Pro tip: Understanding error codes helps you debug faster. Run man errno or search for "errno ENOENT" for detailed explanations.
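When a trace is long, it helps to tally the failures instead of reading them one by one. A minimal sketch, using three lines copied from earlier examples in this article as the input; in practice you would feed it a file written with strace -o.

```shell
# Three lines copied from earlier examples in this article.
trace='openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/tmp/secret.txt", O_RDONLY) = -1 EACCES (Permission denied)'

# Count the failed calls.
failed=$(printf '%s\n' "$trace" | grep -c -- '= -1 E')

# Tally the error names: one "count NAME" line per distinct errno.
codes=$(printf '%s\n' "$trace" | sed -n 's/.* = -1 \(E[A-Z]*\) .*/\1/p' | sort | uniq -c)

echo "failed calls: $failed"
echo "$codes"
```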
The Fallback Pattern: How Programs Handle Failure Gracefully
Programs often try multiple locations before giving up. Let's see this in action:
strace -e trace=openat cat /etc/hostname 2>&1 | grep locale
You'll see patterns like:
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_CTYPE", ...) = -1 ENOENT
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_CTYPE", ...) = 3
The program has a priority list:
- Try location A (C.UTF-8) → Failed (file doesn't exist)
- Try location B (C.utf8) → Success!
This is called a fallback mechanism - graceful degradation instead of immediate crash.
Why This Matters
When you see multiple failed openat calls, don't panic! The program is just trying different options. Only worry if all attempts fail.
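The fallback logic itself is simple, and a few lines of shell can mimic it. This is an illustrative sketch only, not how any particular program implements it: first_existing is a hypothetical helper, and the scratch directory stands in for the locale directory.

```shell
# Print the first path that exists, trying candidates in order.
first_existing() {
    for candidate in "$@"; do
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1    # every attempt failed: this is the case worth worrying about
}

demo_dir=$(mktemp -d)              # scratch directory for the demo
: > "$demo_dir/C.utf8"             # only the second candidate exists

found=$(first_existing "$demo_dir/C.UTF-8" "$demo_dir/C.utf8")
echo "using: ${found##*/}"
rm -rf "$demo_dir"
```

In a trace, each skipped candidate shows up as one ENOENT line before the successful open.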
Understanding AT_FDCWD: Relative Path Magic
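You've seen AT_FDCWD in every openat() call. What is it?
AT_FDCWD is a special value for the first argument of openat() (the directory file descriptor). The name reads roughly as "at file descriptor: current working directory".
It means: "Resolve a relative path starting from the current working directory." For an absolute path such as /etc/hostname, this argument is ignored.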
Proof with Experiment
# From /tmp
cd /tmp
strace -e openat cat hostname 2>&1 | grep "hostname"
# Output: openat(AT_FDCWD, "hostname", ...) = -1 ENOENT
# It looks for /tmp/hostname - doesn't exist!
# From /etc
cd /etc
strace -e openat cat hostname 2>&1 | grep "hostname"
# Output: openat(AT_FDCWD, "hostname", ...) = 3
# It looks for /etc/hostname - Success!
Key insight: Relative paths are resolved from your current directory. That's why cd matters!
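You can observe the same effect without strace and without touching /etc. A small sketch, assuming a scratch directory and a file name (strace-demo-hostname) invented for the demo:

```shell
workdir=$(mktemp -d)                          # scratch directory for the demo
echo "demo-host" > "$workdir/strace-demo-hostname"

# Same relative name, two different working directories.
# Each $(...) runs in a subshell, so the cd does not leak out.
from_root=$(cd / && cat strace-demo-hostname 2>/dev/null || echo "not found")
from_workdir=$(cd "$workdir" && cat strace-demo-hostname)

echo "from /:       $from_root"
echo "from workdir: $from_workdir"
rm -rf "$workdir"
```

Under strace, both runs would show the same openat(AT_FDCWD, "strace-demo-hostname", ...) call; only the working directory, and therefore the result, differs.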
Advanced: Tracing Running Processes
You can attach strace to already-running processes:
# Find process ID
ps aux | grep nginx
# Attach to it (needs root/sudo for processes you don't own, or when ptrace is restricted)
sudo strace -p <PID>
# Follow child processes too
sudo strace -f -p <PID>
# Save to file for later analysis
sudo strace -o trace.log -p <PID>
Real-World Use Case
Scenario: Your production web server is occasionally slow, but you can't reproduce it in testing.
Solution: Attach strace during the slow period:
sudo strace -c -p $(pgrep nginx | head -1)
# Let it run for 60 seconds
# Press Ctrl+C
# Now analyze which syscalls are taking the most time
Warning: Strace adds significant overhead: a syscall-heavy process can run 10-100x slower while it is being traced. Use it sparingly in production!
Performance Considerations: When NOT to Use Strace
The Overhead Problem
Strace adds massive overhead because:
- It must intercept every syscall
- It must stop the process to read arguments
- It must format and output the data
Impact: A syscall-heavy program can run 10-100x slower! Code that mostly computes and rarely enters the kernel is affected much less.
Safer Alternatives for Production
# Only trace specific syscalls (less output to format and write)
strace -e trace=openat,connect -p <PID>
# Set time limits
timeout 10s strace -p <PID>
# Count calls only (no per-call output, but every syscall is still intercepted)
strace -c -p <PID>
When to Use Strace in Production
✅ Good use cases:
- Diagnosing a specific issue for a short time
- Understanding why a service won't start
- Finding configuration files being accessed
❌ Bad use cases:
- Continuous monitoring (use proper monitoring tools instead)
- Performance profiling (use perf instead)
- Production load testing (will skew results)
Practical Debugging Exercises
Exercise 1: Find Configuration Files
Question: What configuration files does sshd read on startup?
strace -e trace=openat /usr/sbin/sshd -t 2>&1 | grep -E "_config|\.conf"
You'll discover files like:
- /etc/ssh/sshd_config
- /etc/gai.conf
- /etc/nsswitch.conf
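To turn a trace into a plain list of files that were actually opened, you can drop the failed calls and keep only the path argument. A minimal sketch that assumes the openat line format shown in this article and reuses sample lines from earlier sections:

```shell
# Sample lines copied from earlier examples in this article.
trace='openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/tmp/secret.txt", O_RDONLY) = -1 EACCES (Permission denied)'

# Drop failed calls, then keep only the quoted path argument.
opened=$(printf '%s\n' "$trace" \
    | grep -v -- ' = -1 ' \
    | sed -n 's/^openat([^,]*, "\([^"]*\)".*/\1/p')

printf '%s\n' "$opened"
```

Piping the result through sort -u gives a de-duplicated list when the same file is opened more than once.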
Exercise 2: Debug Network Connection
Question: Where is curl trying to connect when you hit a timeout?
strace -e trace=connect curl https://example.com 2>&1
Look for the connect() syscall with IP addresses and port numbers.
Exercise 3: Find Missing Library
Question: Why does this custom binary fail with "error while loading shared libraries"?
strace ./my_program 2>&1 | grep "\.so"
You'll see which .so files it's looking for and where.
Common Pitfalls and How to Avoid Them
1. Forgetting stderr Redirect
# ❌ Wrong - strace writes to stderr, so its output bypasses the pipe and grep never sees it
strace cat /etc/hostname | grep hostname
# ✅ Correct - redirect stderr to stdout first
strace cat /etc/hostname 2>&1 | grep hostname
2. Not Filtering When Needed
# ❌ Too much noise - thousands of lines
strace curl google.com
# ✅ Focus on what matters
strace -e trace=network,openat curl google.com
3. Missing Permissions
# ❌ Fails when tracing other users' processes
strace -p 1234
# ✅ Use sudo
sudo strace -p 1234
4. Forgetting About Child Processes
# ❌ Only traces parent process
strace bash -c "ls /tmp"
# ✅ Follow children with -f
strace -f bash -c "ls /tmp"
Quick Reference Cheat Sheet
# ============ BASIC USAGE ============
strace <command> # Basic tracing
strace -o output.log <command> # Save to file
strace -p <PID> # Attach to running process
# ============ FILTERING ============
strace -e trace=openat,read <command> # Specific syscalls
strace -e trace=file <command> # File operations only
strace -e trace=network <command> # Network operations only
strace -e trace=process <command> # Process operations only
strace -e trace=\!memory <command> # Everything EXCEPT memory
# ============ OUTPUT CONTROL ============
strace -c <command> # Statistics summary
strace -t <command> # Show timestamps
strace -T <command> # Show time spent per call
strace -v <command> # Verbose (full structures)
strace -s 1000 <command> # Show first 1000 bytes of strings
# ============ PROCESS CONTROL ============
strace -f <command> # Follow child processes
strace -ff -o trace <command> # Separate file per process
sudo strace -p <PID> # Attach to running process
# ============ ADVANCED ============
strace -e trace=file -e signal=none <cmd> # No signal output
strace -y <command> # Show file descriptor paths
strace -k <command> # Show stack traces (if strace was built with unwinding support)
Next Steps: Beyond Strace
You've mastered the fundamentals! Here's what to explore next:
1. ltrace - Library Call Tracer
Similar to strace but traces library function calls instead of syscalls.
ltrace ls
2. perf - Performance Analysis
Better for production performance profiling with minimal overhead.
perf record -g -p <PID>
perf report
3. bpftrace - Modern System Tracing
Uses eBPF for efficient tracing with custom scripts.
bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s\n", str(args->filename)); }'
4. SystemTap - Advanced System Instrumentation
Enterprise-grade tracing and monitoring.
Conclusion: Your New Debugging Superpower
Strace is like having X-ray vision for your Linux system. You've learned:
✅ What syscalls are and why they matter
✅ How to read and interpret strace output
✅ File descriptors and their purpose
✅ Practical debugging techniques for real problems
✅ Common patterns, errors, and how to fix them
✅ When to use strace and when to use alternatives
The Practice Challenge
The best way to master strace is through practice. Here's your challenge:
- Pick any command you use daily
- Run it with strace -c to see statistics
- Find the top 3 most-called syscalls
- Research what those syscalls do
- Run it again with detailed tracing of just those syscalls
Every time you encounter a mysterious error, reach for strace first. You'll be amazed at what you discover!
Share Your Experience
Have you used strace to solve a difficult problem? Found an interesting pattern? I'd love to hear about it!
Follow me for more deep-dive technical content on Linux, Cloud Engineering, and Infrastructure.
Tags: #Linux #Strace #Debugging #Syscalls #SystemAdministration #DevOps #CloudEngineering #Troubleshooting #LinuxInternals #OpenStack #InfrastructureMonitoring #SystemCalls #FileDescriptors #RealWorldDebugging
Happy debugging!