The Linux Debugging Superpower Nobody Teaches You (Strace/SysCalls)

Blog Series

Deep Dive Linux & Networking: The Real Engineering Path

Part 3 of 10

System Administration
Tags: linux, strace, syscalls

What if you could see exactly what your programs do behind the scenes? Learn strace, the debugging tool that reveals every file access, network call, and system interaction. From zero to practical mastery.

Introduction: The Hidden World Your Programs Live In

Have you ever wondered what really happens when you run a simple command like echo "hello world"? Or struggled to debug why your application suddenly can’t open a file or connect to a network?

Meet strace - the X-ray vision tool for Linux that shows you exactly what your programs are doing behind the scenes.

In this comprehensive guide, we’ll journey from absolute basics to practical mastery. Whether you’re a Cloud Engineer, DevOps professional, or system administrator, strace will become your secret weapon for troubleshooting mysterious issues.


Why Strace Matters: The Real Problem It Solves

The Debugging Nightmare

Imagine this scenario: Your application suddenly throws “Connection failed” but gives no details. Where is it trying to connect? What files is it accessing? What permissions does it need?

Without strace, you’re debugging in the dark. With strace, you can see:

  • Every file your program tries to open (and whether it succeeds)
  • Every network connection attempt
  • Every permission check
  • Every interaction with the operating system

Real-World Impact

In my work managing OpenStack infrastructure, I ran into a classic example. A critical service intermittently failed to start because of a misconfigured library path, the kind of problem that could take days to debug without the right tool.

The application logs showed nothing useful, and traditional debugging tools couldn’t help because the failures were random. With strace, I saw exactly which .so file the service was looking for and where it was searching. Problem solved in 15 minutes.


Understanding the Foundation: What Are System Calls?

Before diving into strace, we need to understand system calls (syscalls) - the bridge between your program and the Linux kernel.

The Hotel Analogy

Think of your program as a hotel guest and the kernel as the front desk. Every time the guest needs something - room service, opening a door, using facilities - they must call the front desk. They can’t directly access hotel resources.

Similarly, when a program wants to:

  • Read or write a file → Must ask kernel via syscall
  • Open a network connection → Must ask kernel via syscall
  • Allocate memory → Must ask kernel via syscall

Strace is the call log that records every conversation between your program and the kernel.

Why This Architecture?

This separation exists for security and stability:

  • Security: Programs can’t directly access hardware or other programs’ memory
  • Stability: The kernel controls resource allocation, preventing conflicts
  • Abstraction: Programs don’t need to know hardware specifics

Your First Strace: Hello World Under the Microscope

Let’s start with the simplest possible command:

strace echo "hello world"

What You’ll See

execve("/usr/bin/echo", ["echo", "hello world"], 0x7fff9c8b7e90 /* 24 vars */) = 0
brk(NULL)                               = 0x55b8f4a3d000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffd8e9c4a50) = -1 EINVAL (Invalid argument)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9e3c8a5000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
...
write(1, "hello world\n", 12)           = 12
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

Overwhelming, right? Don’t worry! Let’s understand what’s happening.

Counting System Calls

strace echo "hello world" 2>&1 | wc -l

Output:

113

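113 lines just to print “hello world”! Almost every one of them is a system call (the count also includes the program’s own output line and the final “+++ exited” line, and the exact number varies between systems). But why redirect with 2>&1?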


Understanding stderr vs stdout: The Three Communication Channels

In Linux, every program has 3 standard communication channels (file descriptors):

File Descriptor | Name | Purpose | Analogy
0 | stdin | Input channel | Your inbox (receive)
1 | stdout | Normal output | Your outbox (send normal mail)
2 | stderr | Error/diagnostic output | Your urgent mailbox (send alerts)

Why Does This Matter?

Strace deliberately outputs to stderr (fd 2) so it doesn’t interfere with the actual program output on stdout (fd 1).

Experiment: See the Difference

# Redirect stdout only
strace echo "hello" > output.txt
# Result: You see strace output on screen, "hello" goes to file
 
# Redirect stderr only  
strace echo "hello" 2> output.txt
# Result: "hello" on screen, strace output goes to file
 
# Redirect both to the same place
strace echo "hello" 2>&1 | less
# Result: Everything piped to less for easy viewing

Pro tip: Always use 2>&1 when piping strace output for analysis.
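
If the redirection rules feel abstract, the sketch below reproduces them without strace. The function noisy is a made-up stand-in for a traced command: it prints one line to stdout and one to stderr, and the three captures mirror the three experiments above.

```shell
# Hypothetical stand-in for a traced program: one line on each channel.
noisy() {
    echo "program output"        # goes to stdout (fd 1)
    echo "trace output" >&2      # goes to stderr (fd 2), like strace does
}

# $(...) captures stdout only; stderr is thrown away here.
stdout_only=$(noisy 2>/dev/null)

# Point stderr at the capture first, then send stdout to /dev/null.
stderr_only=$(noisy 2>&1 >/dev/null)

# 2>&1 merges both channels into the pipe, so wc sees two lines.
both=$(noisy 2>&1 | wc -l | tr -d ' ')

echo "stdout: $stdout_only"
echo "stderr: $stderr_only"
echo "merged line count: $both"
```

The order of redirections matters: in the second capture, 2>&1 is evaluated before >/dev/null, so stderr ends up in the capture while stdout is discarded.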


Decoding Syscall Format: Reading the Matrix

Every strace line follows this pattern:

syscall_name(argument1, argument2, ...) = return_value

Analyzing a Real Example

write(1, "hello world\n", 12) = 12

Let’s break it down:

Component | Value | Meaning
syscall name | write | Write data to a file descriptor
argument 1 | 1 | File descriptor 1 (stdout)
argument 2 | "hello world\n" | The actual data to write
argument 3 | 12 | Number of bytes to write
return value | = 12 | Successfully wrote 12 bytes

Counting Characters

h e l l o   w o r l d \n
1 2 3 4 5 6 7 8 9 10 11 12

Perfect match! The \n (newline) counts as one character.

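Key insight: The return value tells you how many bytes were actually written. A smaller number than requested means a partial write, and the program is expected to write the remainder itself; an outright failure shows up as -1 with an error code.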
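
The same breakdown can be done mechanically. This is a rough sketch using plain shell parameter expansion on the sample line above; the variable names are arbitrary, and it assumes the simple one-line format (no unfinished or resumed calls).

```shell
# Sample line copied from the trace above (single quotes keep \n literal).
line='write(1, "hello world\n", 12) = 12'

# Syscall name: everything before the first "(".
name=${line%%"("*}

# Return value: everything after the last " = ".
ret=${line##*" = "}

# Arguments: drop "name(" from the left, then ") = <return>" from the right.
args=${line#*"("}
args=${args%") = "*}

printf 'name=%s\nargs=%s\nret=%s\n' "$name" "$args" "$ret"
```

For a failing call, ret would hold the whole tail, for example "-1 ENOENT (No such file or directory)", which is usually what you want to grep for anyway.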


File Descriptors: Your Program’s Reference System

This is crucial to understand. File descriptors are NOT the data - they are reference numbers (handles) to access resources.

The Office Analogy

Program = You working at your desk
Kernel = Resource manager
 
fd 0 (stdin)  = Your inbox    (receive input from others)
fd 1 (stdout) = Your outbox   (send normal documents)  
fd 2 (stderr) = Urgent outbox (send error reports)
fd 3, 4, 5... = Filing cabinet drawers (opened files/resources)

Proof of Concept

strace -e trace=openat,read,write,close cat /etc/hostname

You’ll see this sequence:

openat(AT_FDCWD, "/etc/hostname", O_RDONLY) = 3  ← Open file, get handle #3
read(3, "thinkx13\n", 131072)            = 9     ← Use handle #3 to READ
write(1, "thinkx13\n", 9)                = 9     ← Use handle #1 to WRITE to screen
close(3)                                 = 0     ← Return handle #3

Notice:

  1. We read from fd 3 (the file we opened)
  2. We write to fd 1 (stdout - the screen)
  3. File descriptor is just a handle, not the data itself
  4. After close(3), fd 3 can be reused for another file

Why Start at 3?

Because 0, 1, and 2 are, by convention, already open when your program starts:

  • 0 = stdin (usually your keyboard)
  • 1 = stdout (usually your screen)
  • 2 = stderr (usually your screen)

The kernel always hands out the lowest unused number, so the first file you open usually gets fd 3, the next gets fd 4, and so on.
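
You can walk through the same open, read, close sequence by hand in the shell. In this sketch the temporary file stands in for /etc/hostname, and the descriptor number 3 is chosen explicitly by the script, whereas a program calling openat() gets whatever number the kernel assigns.

```shell
# Throwaway file standing in for /etc/hostname.
tmpfile=$(mktemp)
echo "thinkx13" > "$tmpfile"

# Open the file for reading on descriptor 3.
exec 3< "$tmpfile"

# Read through the handle, not through the file name.
read -r first_line <&3

# Close the handle; fd 3 is free to be reused after this.
exec 3<&-

rm -f "$tmpfile"
echo "read via fd 3: $first_line"
```

Running this script under strace -e trace=openat,read,close should show the shell making the corresponding calls, although the descriptor numbers inside the trace may differ because the shell duplicates descriptors internally.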


Practical Strace Techniques: From Noise to Signal

1. Filtering Specific Syscalls

Instead of reading all 113 lines, focus on what matters:

strace -e trace=openat,read,write,close echo "hello world"

Output is much cleaner:

openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
close(3)                                = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0"..., 832) = 832
close(3)                                = 0
write(1, "hello world\n", 12)           = 12
close(1)                                = 0
close(2)                                = 0

2. Statistics Mode: The High-Level View

strace -c echo "hello world"

Output (abbreviated to a few rows):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         1           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0         3           close
  0.00    0.000000           0         7           mmap
  0.00    0.000000           0         3           mprotect
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000           0        47         3 total

This shows:

  • Total number of each syscall type
  • How many failed (errors column)
  • Time spent in each syscall
  • Percentage of total time

Use case: Quickly identify which syscalls are taking the most time.
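
When you only have a saved trace rather than a live process, a similar per-syscall tally can be approximated with awk. The sketch below assumes a log in the plain format shown earlier (no PID or timestamp prefixes); the excerpt is illustrative, pieced together from the lines above, and it counts calls and errors only, not time.

```shell
# Illustrative excerpt of a saved trace (for example from strace -o trace.log).
trace_log=$(mktemp)
cat > "$trace_log" <<'EOF'
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
close(3)                                = 0
write(1, "hello world\n", 12)           = 12
close(1)                                = 0
close(2)                                = 0
+++ exited with 0 +++
EOF

# Count calls and failures per syscall name. Lines that are not syscalls
# (such as "+++ exited with 0 +++") do not match the pattern and are skipped.
summary=$(awk '
    /^[a-z_0-9]+\(/ {
        name = $0
        sub(/\(.*/, "", name)          # keep only the text before "("
        calls[name]++
        if ($0 ~ / = -1 E[A-Z0-9]+/) errors[name]++
    }
    END {
        for (n in calls) printf "%s %d %d\n", n, calls[n], errors[n] + 0
    }
' "$trace_log" | sort)

rm -f "$trace_log"
echo "syscall calls errors"
echo "$summary"
```

This also shows why a raw line count overstates the number of syscalls: the exit line is in the log but is not a call.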

3. Filter by Category

# Only network syscalls
strace -e trace=network curl google.com
 
# Only file operations
strace -e trace=file ls /etc
 
# Only process operations
strace -e trace=process bash -c "echo hello"
 
# Everything EXCEPT memory operations
strace -e trace=\!memory cat /etc/hostname

Real-World Debugging Scenario: Permission Denied Mystery

The Problem

Let’s simulate a common debugging scenario:

# Create a file with no permissions
echo "secret data" > /tmp/secret.txt
chmod 000 /tmp/secret.txt
 
# Try to read it
cat /tmp/secret.txt

Output:

cat: /tmp/secret.txt: Permission denied

Generic error message! Not very helpful.

Using Strace to Investigate

strace cat /tmp/secret.txt 2>&1 | grep secret

Output:

openat(AT_FDCWD, "/tmp/secret.txt", O_RDONLY) = -1 EACCES (Permission denied)

Now we see:

  • syscall: openat - it’s failing at the open stage
  • path: /tmp/secret.txt - confirmed the file path
  • flags: O_RDONLY - trying to open for reading
  • return: -1 EACCES - specific error code for permission denied

The Fix

# Check actual permissions
ls -l /tmp/secret.txt
# Output: ---------- 1 user user 12 Nov  6 10:30 /tmp/secret.txt
 
# Fix permissions
chmod 644 /tmp/secret.txt
 
# Verify with strace
strace -e openat cat /tmp/secret.txt 2>&1 | grep secret

New output:

openat(AT_FDCWD, "/tmp/secret.txt", O_RDONLY) = 3  ← Success! Got fd 3

Understanding Return Values: Success or Failure?

Return values tell you whether the syscall succeeded or failed:

Return Value | Meaning | Example
Positive number | Success; the meaning depends on the syscall (bytes processed, a new file descriptor) | write(...) = 12
0 | Success (EOF or nothing to process) | read(...) = 0
-1 ERRORCODE | Failed with specific error | openat(...) = -1 ENOENT

Common Error Codes

# ENOENT - File doesn't exist
strace cat /file_does_not_exist 2>&1 | grep ENOENT
 
# EACCES - Permission denied
strace cat /etc/shadow 2>&1 | grep EACCES
 
# EISDIR - Is a directory (can't cat a directory)
strace cat /etc 2>&1 | grep EISDIR
 
# ECONNREFUSED - Connection refused (network)
strace curl http://localhost:9999 2>&1 | grep ECONNREFUSED

Pro tip: Understanding error codes helps you debug faster. Run man errno or search for “errno ENOENT” for detailed explanations.
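
The names strace prints map directly to the messages that ordinary tools show. A small sketch, assuming a typical Linux system where /file_does_not_exist is absent: it triggers two of the errors above without strace and captures the text form of each code.

```shell
# LC_ALL=C pins the messages to English regardless of the system locale.

# ENOENT: the path does not exist.
msg_enoent=$(LC_ALL=C cat /file_does_not_exist 2>&1)

# EISDIR: the path exists but is a directory.
msg_eisdir=$(LC_ALL=C cat / 2>&1)

echo "$msg_enoent"
echo "$msg_eisdir"
```

The text after the last colon in each message is the same description strace prints in parentheses next to the code, such as ENOENT (No such file or directory).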


The Fallback Pattern: How Programs Handle Failure Gracefully

Programs often try multiple locations before giving up. Let’s see this in action:

strace -e trace=openat cat /etc/hostname 2>&1 | grep locale

You’ll see patterns like:

openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_CTYPE", ...) = -1 ENOENT
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_CTYPE", ...) = 3

The program has a priority list:

  1. Try location A (C.UTF-8) → Failed (file doesn’t exist)
  2. Try location B (C.utf8) → Success!

This is called a fallback mechanism - graceful degradation instead of immediate crash.

Why This Matters

When you see multiple failed openat calls, don’t panic! The program is just trying different options. Only worry if all attempts fail.
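
One way to apply this rule to a saved trace is to group openat calls by file name and report only the names that never opened successfully anywhere. This is a sketch, not a complete parser: the libmissing.so lines and the /opt/app path are invented for illustration, and the "..." placeholders for flags are kept as in the excerpt above.

```shell
# Illustrative log: LC_CTYPE eventually opens, libmissing.so never does.
trace_log=$(mktemp)
cat > "$trace_log" <<'EOF'
openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_CTYPE", ...) = -1 ENOENT
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_CTYPE", ...) = 3
openat(AT_FDCWD, "/opt/app/lib/libmissing.so", ...) = -1 ENOENT
openat(AT_FDCWD, "/usr/lib/libmissing.so", ...) = -1 ENOENT
EOF

# Print file names for which every openat attempt failed.
never_found=$(awk '
    /^openat\(/ {
        split($0, parts, "\"")         # parts[2] is the quoted path
        base = parts[2]
        sub(/.*\//, "", base)          # reduce the path to its file name
        if ($0 ~ / = -1 /) failed[base]++
        else               opened[base]++
    }
    END {
        for (b in failed) if (!(b in opened)) print b
    }
' "$trace_log")

rm -f "$trace_log"
echo "never opened: $never_found"
```

Names that appear in this list are the ones worth investigating; failed attempts for a name that later succeeded are just the fallback mechanism at work.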


Understanding AT_FDCWD: Relative Path Magic

You’ve seen AT_FDCWD in every openat() call. What is it?

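AT_FDCWD is a special value for the directory argument of openat() that stands for the current working directory (the name reads as “at file descriptor: current working directory”).

It means: “If the path is relative, resolve it against the current working directory.” Absolute paths such as /etc/hostname ignore it.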

Proof with Experiment

# From /tmp
cd /tmp
strace -e openat cat hostname 2>&1 | grep "hostname"
# Output: openat(AT_FDCWD, "hostname", ...) = -1 ENOENT
# It looks for /tmp/hostname - doesn't exist!
 
# From /etc
cd /etc
strace -e openat cat hostname 2>&1 | grep "hostname"
# Output: openat(AT_FDCWD, "hostname", ...) = 3
# It looks for /etc/hostname - Success!

Key insight: Relative paths are resolved from your current directory. That’s why cd matters!
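
The same effect is visible without strace. In this sketch two scratch directories stand in for /tmp and /etc (the names dir_a and dir_b are arbitrary), and the identical relative path succeeds or fails depending only on the working directory.

```shell
# Two scratch directories; only the second contains a file named "hostname".
dir_a=$(mktemp -d)
dir_b=$(mktemp -d)
echo "thinkx13" > "$dir_b/hostname"

# Same command, same relative path, different working directory.
# The subshell parentheses keep each cd from leaking into the rest of the script.
if (cd "$dir_a" && cat hostname) >/dev/null 2>&1; then from_a="found"; else from_a="missing"; fi
if (cd "$dir_b" && cat hostname) >/dev/null 2>&1; then from_b="found"; else from_b="missing"; fi

rm -rf "$dir_a" "$dir_b"
echo "from dir_a: $from_a"
echo "from dir_b: $from_b"
```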


Advanced: Tracing Running Processes

You can attach strace to already-running processes:

# Find process ID
ps aux | grep nginx
 
# Attach to it (requires root/sudo)
sudo strace -p <PID>
 
# Follow child processes too
sudo strace -f -p <PID>
 
# Save to file for later analysis
sudo strace -o trace.log -p <PID>

Real-World Use Case

Scenario: Your production web server is occasionally slow, but you can’t reproduce it in testing.

Solution: Attach strace during the slow period:

# Attach to one nginx worker; workers, not the master process, serve requests
sudo strace -c -p "$(pgrep -f 'nginx: worker' | head -1)"
# Let it run for 60 seconds
# Press Ctrl+C
 
# Now analyze which syscalls are taking the most time

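Warning: Strace can slow the traced process dramatically. The more syscalls the process makes, the worse it gets, and 10-100x is possible for syscall-heavy workloads. Use it sparingly in production!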


Performance Considerations: When NOT to Use Strace

The Overhead Problem

Strace adds massive overhead because:

  • It must intercept every syscall
  • It must stop the process to read arguments
  • It must format and output the data

Impact: A syscall-heavy program can run 10-100x slower! A program that makes few syscalls is affected far less.

Safer Alternatives for Production

# Only trace specific syscalls (less overhead)
strace -e trace=openat,connect -p <PID>
 
# Set time limits
timeout 10s strace -p <PID>
 
# Count calls only (no per-call output, though every syscall is still intercepted)
strace -c -p <PID>
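
These precautions can be combined into a small wrapper. The function below is a hypothetical helper, not part of strace: it validates its arguments, then attaches for a bounded time using the same timeout and -p options shown above. Any extra arguments are passed straight to strace.

```shell
# Hypothetical helper: attach to PID for at most SECONDS, then detach.
# Usage: trace_briefly PID SECONDS [extra strace options...]
trace_briefly() {
    pid=$1
    seconds=$2

    # Reject anything that is not a plain non-negative integer.
    case $pid in
        ''|*[!0-9]*) echo "trace_briefly: invalid PID: $pid" >&2; return 2 ;;
    esac
    case $seconds in
        ''|*[!0-9]*) echo "trace_briefly: invalid duration: $seconds" >&2; return 2 ;;
    esac

    shift 2
    # timeout sends SIGTERM when the time is up; strace then detaches and
    # the traced process keeps running.
    timeout "${seconds}s" strace "$@" -p "$pid"
}
```

For example, trace_briefly 1234 10 -c -e trace=openat,connect would collect ten seconds of statistics for two syscalls from PID 1234, assuming the shell has enough privileges to attach to that process.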

When to Use Strace in Production

✅ Good use cases:

  • Diagnosing a specific issue for a short time
  • Understanding why a service won’t start
  • Finding configuration files being accessed

❌ Bad use cases:

  • Continuous monitoring (use proper monitoring tools instead)
  • Performance profiling (use perf instead)
  • Production load testing (will skew results)

Practical Debugging Exercises

Exercise 1: Find Configuration Files

Question: What configuration files does sshd read on startup?

strace -e trace=openat /usr/sbin/sshd -t 2>&1 | grep -E "\.conf|_config"

You’ll discover files like:

  • /etc/ssh/sshd_config
  • /etc/gai.conf
  • /etc/nsswitch.conf

Exercise 2: Debug Network Connection

Question: Where is curl trying to connect when you hit a timeout?

strace -e trace=connect curl https://example.com 2>&1

Look for the connect() syscall with IP addresses and port numbers.
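
Once you have a connect() line, the address and port can be pulled out with sed. The line in this sketch is illustrative (the address comes from a range reserved for documentation), and it assumes strace's usual IPv4 formatting with htons() and inet_addr(); IPv6 lines look different and are not handled.

```shell
# Illustrative connect() line; non-blocking clients such as curl commonly
# show EINPROGRESS here rather than an immediate result.
line='connect(5, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("203.0.113.10")}, 16) = -1 EINPROGRESS (Operation now in progress)'

# Pull the port out of htons(...) and the address out of inet_addr("...").
port=$(printf '%s\n' "$line" | sed -n 's/.*sin_port=htons(\([0-9]*\)).*/\1/p')
addr=$(printf '%s\n' "$line" | sed -n 's/.*inet_addr("\([0-9.]*\)").*/\1/p')

echo "target: $addr:$port"
```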

Exercise 3: Find Missing Library

Question: Why does this custom binary fail with “error while loading shared libraries”?

strace ./my_program 2>&1 | grep "\.so"

You’ll see which .so files it’s looking for and where.


Common Pitfalls and How to Avoid Them

1. Forgetting stderr Redirect

# ❌ Wrong - strace output bypasses the pipe and goes straight to the screen; grep only sees cat's output
strace cat /etc/hostname | grep hostname

# ✅ Correct - redirect stderr to stdout first
strace cat /etc/hostname 2>&1 | grep hostname

2. Not Filtering When Needed

# ❌ Too much noise - thousands of lines
strace curl google.com
 
# ✅ Focus on what matters
strace -e trace=network,openat curl google.com

3. Missing Permissions

# ❌ Fails when tracing other users' processes
strace -p 1234
 
# ✅ Use sudo
sudo strace -p 1234

4. Forgetting About Child Processes

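# ❌ Only traces the parent shell
# (two commands are used because bash may exec a single command directly, leaving no child to miss)
strace bash -c "ls /tmp; echo done"

# ✅ Follow children with -f
strace -f bash -c "ls /tmp; echo done"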

Quick Reference Cheat Sheet

# ============ BASIC USAGE ============
strace <command>                           # Basic tracing
strace -o output.log <command>             # Save to file
strace -p <PID>                            # Attach to running process
 
# ============ FILTERING ============
strace -e trace=openat,read <command>      # Specific syscalls
strace -e trace=file <command>             # File operations only
strace -e trace=network <command>          # Network operations only
strace -e trace=process <command>          # Process operations only
strace -e trace=\!memory <command>         # Everything EXCEPT memory
 
# ============ OUTPUT CONTROL ============
strace -c <command>                        # Statistics summary
strace -t <command>                        # Show timestamps
strace -T <command>                        # Show time spent per call
strace -v <command>                        # Verbose (full structures)
strace -s 1000 <command>                   # Show first 1000 bytes of strings
 
# ============ PROCESS CONTROL ============
strace -f <command>                        # Follow child processes
strace -ff -o trace <command>              # Separate file per process
sudo strace -p <PID>                       # Attach to running process
 
# ============ ADVANCED ============
strace -e trace=file -e signal=none <cmd>  # No signal output
strace -y <command>                        # Show file descriptor paths
strace -k <command>                        # Show stack traces

Next Steps: Beyond Strace

You’ve mastered the fundamentals! Here’s what to explore next:

1. ltrace - Library Call Tracer

Similar to strace but traces library function calls instead of syscalls.

ltrace ls

2. perf - Performance Analysis

Better for production performance profiling with minimal overhead.

perf record -g -p <PID>
perf report

3. bpftrace - Modern System Tracing

Uses eBPF for efficient tracing with custom scripts.

bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s\n", str(args->filename)); }'

4. SystemTap - Advanced System Instrumentation

Enterprise-grade tracing and monitoring.


Conclusion: Your New Debugging Superpower

Strace is like having X-ray vision for your Linux system. You’ve learned:

✅ What syscalls are and why they matter
✅ How to read and interpret strace output
✅ File descriptors and their purpose
✅ Practical debugging techniques for real problems
✅ Common patterns, errors, and how to fix them
✅ When to use strace and when to use alternatives

The Practice Challenge

The best way to master strace is through practice. Here’s your challenge:

  1. Pick any command you use daily
  2. Run it with strace -c to see statistics
  3. Find the top 3 most-called syscalls
  4. Research what those syscalls do
  5. Run it again with detailed tracing of just those syscalls

Every time you encounter a mysterious error, reach for strace first. You’ll be amazed at what you discover!

Share Your Experience

Have you used strace to solve a difficult problem? Found an interesting pattern? I’d love to hear about it!

Follow me for more deep-dive technical content on Linux, Cloud Engineering, and Infrastructure.


Pendahuluan: Dunia Tersembunyi Tempat Program Kamu Hidup

Pernahkah kamu penasaran apa yang sebenarnya terjadi ketika menjalankan command sederhana seperti echo "hello world"? Atau kesulitan men-debug kenapa aplikasi tiba-tiba tidak bisa buka file atau koneksi ke network?

Kenalkan strace - tool X-ray untuk Linux yang menunjukkan persis apa yang dilakukan program di balik layar.

Dalam panduan komprehensif ini, kita akan belajar dari dasar sampai mahir secara praktis. Baik kamu seorang Cloud Engineer, DevOps professional, atau system administrator, strace akan menjadi senjata rahasia untuk troubleshooting masalah misterius.


Mengapa Strace Itu Penting: Masalah Nyata yang Diselesaikan

Mimpi Buruk Debugging

Bayangkan skenario ini: Aplikasi tiba-tiba error β€œConnection failed” tapi tidak memberikan detail. Ke mana sebenarnya dia coba koneksi? File apa yang dia akses? Permission apa yang dia butuhkan?

Tanpa strace, kamu debugging dalam kegelapan. Dengan strace, kamu bisa lihat:

  • Setiap file yang program coba buka (dan apakah berhasil)
  • Setiap percobaan koneksi network
  • Setiap pengecekan permission
  • Setiap interaksi dengan operating system

Impact Real-World

Di pekerjaan saya mengelola infrastruktur OpenStack, strace membantu mengidentifikasi library path yang salah konfigurasi yang menyebabkan service failure intermittent - masalah yang akan memakan waktu berhari-hari untuk debug tanpa tool ini.

Application logs tidak menunjukkan apa-apa yang berguna. Traditional debugging tools tidak bisa membantu karena service gagal secara random. Tapi dengan strace, saya lihat persis file .so mana yang dicari dan di mana dia mencarinya. Masalah solved dalam 15 menit.


Memahami Fondasi: Apa Itu System Calls?

Sebelum masuk ke strace, kita perlu paham system calls (syscalls) - jembatan antara program dan kernel Linux.

Analogi Hotel

Bayangkan program adalah tamu hotel dan kernel adalah resepsionis. Setiap kali tamu butuh sesuatu - room service, buka pintu, pakai fasilitas - mereka harus telepon resepsionis. Mereka tidak bisa langsung akses resource hotel.

Sama halnya, ketika program ingin:

  • Baca atau tulis file β†’ Harus minta ke kernel via syscall
  • Buka koneksi network β†’ Harus minta ke kernel via syscall
  • Alokasi memory β†’ Harus minta ke kernel via syscall

Strace adalah call log yang merekam setiap percakapan antara program dan kernel.

Mengapa Arsitektur Ini Ada?

Pemisahan ini ada untuk keamanan dan stabilitas:

  • Keamanan: Program tidak bisa langsung akses hardware atau memory program lain
  • Stabilitas: Kernel mengontrol alokasi resource, mencegah konflik
  • Abstraksi: Program tidak perlu tahu spesifik hardware

Strace Pertama Kamu: Hello World di Bawah Mikroskop

Mari mulai dengan command paling sederhana:

strace echo "hello world"

Apa yang Akan Kamu Lihat

execve("/usr/bin/echo", ["echo", "hello world"], 0x7fff9c8b7e90 /* 24 vars */) = 0
brk(NULL)                               = 0x55b8f4a3d000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffd8e9c4a50) = -1 EINVAL (Invalid argument)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9e3c8a5000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
...
write(1, "hello world\n", 12)           = 12
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

Overwhelming, kan? Jangan khawatir! Mari kita pahami apa yang terjadi.

Menghitung System Calls

strace echo "hello world" 2>&1 | wc -l

Output:

113

113 system calls hanya untuk print β€œhello world”! Tapi kenapa redirect dengan 2>&1?


Memahami stderr vs stdout: Tiga Channel Komunikasi

Di Linux, setiap program punya 3 channel komunikasi standard (file descriptors):

File DescriptorNamaTujuanAnalogi
0stdinChannel inputInbox kamu (terima)
1stdoutOutput normalOutbox kamu (kirim mail normal)
2stderrOutput error/diagnosticMailbox urgent (kirim alert)

Mengapa Ini Penting?

Strace sengaja output ke stderr (fd 2) supaya tidak mengotori output program yang sebenarnya di stdout (fd 1).

Eksperimen: Lihat Perbedaannya

# Redirect stdout saja
strace echo "hello" > output.txt
# Hasil: Kamu lihat output strace di layar, "hello" masuk ke file
 
# Redirect stderr saja
strace echo "hello" 2> output.txt
# Hasil: "hello" di layar, output strace masuk ke file
 
# Redirect keduanya ke tempat yang sama
strace echo "hello" 2>&1 | less
# Hasil: Semuanya di-pipe ke less untuk viewing mudah

Pro tip: Selalu pakai 2>&1 ketika piping output strace untuk analisa.


Memahami Format Syscall: Membaca Matrix

Setiap baris strace mengikuti pola ini:

nama_syscall(argument1, argument2, ...) = return_value

Menganalisa Contoh Nyata

write(1, "hello world\n", 12) = 12

Mari kita bedah:

KomponenValueArti
nama syscallwriteTulis data ke file descriptor
argument 11File descriptor 1 (stdout)
argument 2"hello world\n"Data yang akan ditulis
argument 312Jumlah byte yang ditulis
return value= 12Berhasil tulis 12 byte

Menghitung Karakter

h e l l o   w o r l d \n
1 2 3 4 5 6 7 8 9 10 11 12

Perfect match! \n (newline) dihitung sebagai satu karakter.

Key insight: Return value memberitahu berapa byte yang sebenarnya diproses. Jika kurang dari yang diminta, ada yang salah.


File Descriptors: Sistem Referensi Program Kamu

Ini sangat penting untuk dipahami. File descriptors itu BUKAN data-nya - mereka adalah nomor referensi (handle) untuk akses resource.

Analogi Kantor yang Diperbaiki

Program = Kamu kerja di meja
Kernel = Manager resource
 
fd 0 (stdin)  = Inbox kamu    (terima input dari orang lain)
fd 1 (stdout) = Outbox kamu   (kirim dokumen normal)  
fd 2 (stderr) = Outbox urgent (kirim laporan error)
fd 3, 4, 5... = Laci filing cabinet (file/resource yang dibuka)

Bukti Konsep

strace -e trace=openat,read,write,close cat /etc/hostname

Kamu akan lihat urutan ini:

openat(AT_FDCWD, "/etc/hostname", O_RDONLY) = 3  ← Buka file, dapat handle #3
read(3, "thinkx13\n", 131072)            = 9     ← Pakai handle #3 untuk BACA
write(1, "thinkx13\n", 9)                = 9     ← Pakai handle #1 untuk TULIS ke layar
close(3)                                 = 0     ← Kembalikan handle #3

Perhatikan:

  1. Kita read dari fd 3 (file yang kita buka)
  2. Kita write ke fd 1 (stdout - layar)
  3. File descriptor hanya handle, bukan data itu sendiri
  4. Setelah close(3), fd 3 bisa dipakai lagi untuk file lain

Mengapa Mulai dari 3?

Karena 0, 1, dan 2 selalu sudah dibuka ketika program start:

  • 0 = stdin (biasanya keyboard kamu)
  • 1 = stdout (biasanya layar kamu)
  • 2 = stderr (biasanya layar kamu)

Jadi file pertama yang kamu buka dapat fd 3, berikutnya fd 4, dan seterusnya.


Teknik Praktis Strace: Dari Noise ke Signal

1. Filter Syscall Tertentu

Daripada lihat semua 113 syscalls, fokus ke yang penting:

strace -e trace=openat,read,write,close echo "hello world"

Output jauh lebih bersih:

openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
close(3)                                = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0"..., 832) = 832
close(3)                                = 0
write(1, "hello world\n", 12)           = 12
close(1)                                = 0
close(2)                                = 0

2. Mode Statistik: Pandangan High-Level

strace -c echo "hello world"

Output:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         1           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0         3           close
  0.00    0.000000           0         7           mmap
  0.00    0.000000           0         3           mprotect
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000           0        47         3 total

Ini menunjukkan:

  • Total jumlah setiap tipe syscall
  • Berapa yang gagal (kolom errors)
  • Waktu yang dihabiskan di setiap syscall
  • Persentase dari total waktu

Use case: Cepat identifikasi syscall mana yang paling banyak menghabiskan waktu.

3. Filter berdasarkan Kategori

# Hanya network syscalls
strace -e trace=network curl google.com
 
# Hanya file operations
strace -e trace=file ls /etc
 
# Hanya process operations
strace -e trace=process bash -c "echo hello"
 
# Semua KECUALI memory operations
strace -e trace=\!memory cat /etc/hostname

Skenario Debugging Real-World: Misteri Permission Denied

Masalahnya

Mari simulasi skenario debugging umum:

# Buat file tanpa permission
echo "secret data" > /tmp/secret.txt
chmod 000 /tmp/secret.txt
 
# Coba baca
cat /tmp/secret.txt

Output:

cat: /tmp/secret.txt: Permission denied

Error message generic! Tidak terlalu membantu.

Pakai Strace untuk Investigasi

strace cat /tmp/secret.txt 2>&1 | grep secret

Output:

openat(AT_FDCWD, "/tmp/secret.txt", O_RDONLY) = -1 EACCES (Permission denied)

Sekarang kita lihat:

  • syscall: openat - gagal di tahap buka
  • path: /tmp/secret.txt - konfirmasi path file
  • flags: O_RDONLY - coba buka untuk baca
  • return: -1 EACCES - kode error spesifik untuk permission denied

Perbaikannya

# Cek permission sebenarnya
ls -l /tmp/secret.txt
# Output: ---------- 1 user user 12 Nov  6 10:30 /tmp/secret.txt
 
# Perbaiki permission
chmod 644 /tmp/secret.txt
 
# Verifikasi dengan strace
strace -e openat cat /tmp/secret.txt 2>&1 | grep secret

Output baru:

openat(AT_FDCWD, "/tmp/secret.txt", O_RDONLY) = 3  ← Berhasil! Dapat fd 3

Memahami Return Values: Berhasil atau Gagal?

Return values memberitahu apakah syscall berhasil atau gagal:

Return ValueArtiContoh
Angka positifBerhasil, biasanya byte yang diproseswrite(...) = 12
0Berhasil (EOF atau tidak ada yang diproses)read(...) = 0
-1 KODEERRORGagal dengan error spesifikopenat(...) = -1 ENOENT

Kode Error Umum

# ENOENT - File tidak ada
strace cat /file_tidak_ada 2>&1 | grep ENOENT
 
# EACCES - Permission denied
strace cat /etc/shadow 2>&1 | grep EACCES
 
# EISDIR - Adalah directory (tidak bisa cat directory)
strace cat /etc 2>&1 | grep EISDIR
 
# ECONNREFUSED - Connection refused (network)
strace curl http://localhost:9999 2>&1 | grep ECONNREFUSED

Pro tip: Memahami error codes membantu debug lebih cepat. Google β€œerrno ENOENT” untuk penjelasan detail.


Pola Fallback: Bagaimana Program Handle Failure dengan Graceful

Program sering coba beberapa lokasi sebelum menyerah. Mari lihat aksi ini:

strace -e trace=openat cat /etc/hostname 2>&1 | grep locale

Kamu akan lihat pola seperti:

openat(AT_FDCWD, "/usr/lib/locale/C.UTF-8/LC_CTYPE", ...) = -1 ENOENT
openat(AT_FDCWD, "/usr/lib/locale/C.utf8/LC_CTYPE", ...) = 3

Program punya priority list:

  1. Coba lokasi A (C.UTF-8) β†’ Gagal (file tidak ada)
  2. Coba lokasi B (C.utf8) β†’ Berhasil!

Ini disebut mekanisme fallback - degradasi graceful daripada crash langsung.

Mengapa Ini Penting

Ketika kamu lihat beberapa openat calls yang gagal, jangan panik! Program hanya coba berbagai opsi. Baru worry kalau semua percobaan gagal.


Memahami AT_FDCWD: Magic Relative Path

Kamu sudah lihat AT_FDCWD di setiap panggilan openat(). Apa itu?

AT_FDCWD = β€œAT File Descriptor Current Working Directory”

Artinya: β€œBuka file ini relatif terhadap current working directory”

Bukti dengan Eksperimen

# Dari /tmp
cd /tmp
strace -e openat cat hostname 2>&1 | grep "hostname"
# Output: openat(AT_FDCWD, "hostname", ...) = -1 ENOENT
# Dia cari /tmp/hostname - tidak ada!
 
# Dari /etc
cd /etc
strace -e openat cat hostname 2>&1 | grep "hostname"
# Output: openat(AT_FDCWD, "hostname", ...) = 3
# Dia cari /etc/hostname - Berhasil!

Key insight: Relative paths di-resolve dari directory kamu saat ini. Itu sebabnya cd penting!


Advanced: Trace Process yang Sedang Berjalan

Kamu bisa attach strace ke process yang sudah running:

# Cari process ID
ps aux | grep nginx
 
# Attach ke process (butuh root/sudo)
sudo strace -p <PID>
 
# Follow child processes juga
sudo strace -f -p <PID>
 
# Save ke file untuk analisa nanti
sudo strace -o trace.log -p <PID>

Use Case Real-World

Skenario: Web server production kamu kadang lambat, tapi tidak bisa reproduce di testing.

Solusi: Attach strace selama periode lambat:

sudo strace -c -p $(pgrep nginx | head -1)
# Biarkan jalan 60 detik
# Tekan Ctrl+C
 
# Sekarang analisa syscall mana yang paling banyak menghabiskan waktu

Peringatan: Strace menambah overhead signifikan (10-100x lebih lambat). Gunakan dengan hemat di production!


Pertimbangan Performance: Kapan TIDAK Pakai Strace

Masalah Overhead

Strace menambah overhead massive karena:

  • Harus intercept setiap syscall
  • Harus stop process untuk baca arguments
  • Harus format dan output data

Impact: Program kamu bisa jalan 10-100x lebih lambat!

Alternatif Lebih Aman untuk Production

# Hanya trace syscall spesifik (overhead lebih rendah)
strace -e trace=openat,connect -p <PID>
 
# Set time limits
timeout 10s strace -p <PID>
 
# Count calls saja (overhead minimal)
strace -c -p <PID>

When to Use Strace in Production

✅ Good use cases:

  • Diagnosing a specific problem for a short time
  • Understanding why a service will not start
  • Finding which configuration files are being accessed

❌ Bad use cases:

  • Continuous monitoring (use proper monitoring tools)
  • Performance profiling (use perf)
  • Production load testing (it will distort the results)

Practical Debugging Exercises

Exercise 1: Find Configuration Files

Question: Which configuration files does sshd read at startup?

strace -e trace=openat /usr/sbin/sshd -t 2>&1 | grep -E "\.conf|_config"

You will discover files such as:

  • /etc/ssh/sshd_config
  • /etc/gai.conf
  • /etc/nsswitch.conf
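Grepping for name patterns can miss config files with unusual names. An alternative is to save the trace with -o and list every path that was opened successfully. A sketch, assuming the standard openat(...) = <fd> line format; opened_files is a hypothetical helper and the sample log lines are illustrative, not captured from a real run:

```shell
# Hypothetical helper: list unique paths that openat() opened successfully
# in a trace saved with: strace -e trace=openat -o trace.log <command>
opened_files() {
    # Keep openat lines that returned a file descriptor (not "-1 E..."),
    # then pull out the first double-quoted string, which is the path.
    grep 'openat(' "$1" \
        | grep -E '\) = [0-9]+' \
        | sed -E 's/^[^"]*"([^"]*)".*$/\1/' \
        | sort -u
}

# Illustrative trace lines.
log=$(mktemp)
cat > "$log" <<'EOF'
openat(AT_FDCWD, "/etc/ssh/sshd_config", O_RDONLY) = 3
openat(AT_FDCWD, "/etc/ssh/sshd_config.d/custom.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/etc/ssh/sshd_config", O_RDONLY) = 3
EOF

opened_files "$log"
```

The failed lookup is filtered out and the duplicate open is collapsed, leaving only the files the program really read.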

Exercise 2: Debug a Network Connection

Question: Where is curl trying to connect when it times out?

strace -e trace=connect curl https://example.com 2>&1

Look for connect() syscalls that carry an IP address and port number.
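Each IPv4 attempt carries its destination inside the sockaddr structure as sin_port=htons(...) and sin_addr=inet_addr("..."). For a non-blocking socket, a return of -1 EINPROGRESS is normal and only means the connection is still being set up. A sketch that extracts IP:port pairs from a saved trace, assuming that IPv4 format (IPv6 lines use different field names and are ignored here); connect_targets is a hypothetical helper and the addresses come from documentation ranges:

```shell
# Hypothetical helper: print unique IPv4 "address:port" targets of connect()
# from a trace saved with: strace -e trace=connect -o trace.log <command>
connect_targets() {
    sed -nE 's/.*connect\(.*sin_port=htons\(([0-9]+)\), sin_addr=inet_addr\("([0-9.]+)"\).*/\2:\1/p' "$1" \
        | sort -u
}

# Illustrative trace lines (documentation-range addresses, not a real capture).
log=$(mktemp)
cat > "$log" <<'EOF'
connect(5, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(5, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.0.2.53")}, 16) = 0
connect(6, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("203.0.113.10")}, 16) = -1 EINPROGRESS (Operation now in progress)
EOF

connect_targets "$log"
```

In this sample the port 53 entry would be the DNS lookup and the port 443 entry the HTTPS connection itself.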

Exercise 3: Find a Missing Library

Question: Why does this custom binary fail with "error while loading shared libraries"?

strace ./my_program 2>&1 | grep "\.so"

You will see which .so files it looks for and where it looks for them.
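The dynamic loader tries several directories for each library, so a few ENOENT lines are normal. A library is only really missing when no attempt succeeds. A sketch that reports such libraries from a saved trace, assuming the usual openat line format; missing_libs is a hypothetical helper and libdemo.so.1 is a made-up name:

```shell
# Hypothetical helper: list libraries that were looked up but never found
# in a trace saved with: strace -o trace.log ./my_program
missing_libs() {
    awk -F'"' '
        /openat\(/ && $2 ~ /\.so(\.[0-9]+)*$/ {
            n = split($2, parts, "/")          # last path component is the library name
            lib = parts[n]
            if ($0 ~ /= -1 ENOENT/)      failed[lib] = 1
            else if ($0 ~ /\) = [0-9]+/) found[lib] = 1
        }
        END { for (lib in failed) if (!(lib in found)) print lib }
    ' "$1" | sort
}

# Illustrative trace lines (not captured from a real run).
log=$(mktemp)
cat > "$log" <<'EOF'
openat(AT_FDCWD, "/usr/local/lib/libssl.so.3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libssl.so.3", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/local/lib/libdemo.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libdemo.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
EOF

missing_libs "$log"
```

Here libssl.so.3 fails once and is then found, so only libdemo.so.1 is reported.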


Common Mistakes and How to Avoid Them

1. Forgetting to Redirect stderr

# ❌ Wrong: strace writes to stderr, which bypasses the pipe, so grep never sees it
strace cat /etc/hostname | grep hostname

# ✅ Right: redirect stderr to stdout first
strace cat /etc/hostname 2>&1 | grep hostname
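The reason is that a pipe only carries stdout, while strace writes its trace to stderr. The effect can be reproduced without strace. A small sketch, where emit is a hypothetical stand-in for a traced command:

```shell
# Hypothetical stand-in for "strace <command>": the command's own output
# goes to stdout, the trace goes to stderr.
emit() {
    echo "program output"
    echo 'openat(AT_FDCWD, "/etc/hostname", O_RDONLY) = 3' >&2
}

# Without 2>&1 the trace line bypasses the pipe and lands on the terminal,
# so grep counts 0 matching lines (and exits non-zero, hence "|| true").
emit | grep -c openat || true

# With 2>&1 both streams enter the pipe, so grep counts 1 matching line.
emit 2>&1 | grep -c openat
```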

2. Not Filtering When You Should

# ❌ Too much noise: thousands of lines
strace curl google.com

# ✅ Focus on what matters
strace -e trace=network,openat curl google.com

3. Insufficient Permissions

# ❌ Fails when tracing another user's process
# (and, on systems that restrict ptrace, sometimes your own)
strace -p 1234

# ✅ Use sudo
sudo strace -p 1234

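4. Forgetting About Child Processes

# ❌ Traces only the parent process, so syscalls made by its children are invisible
# (a two-command script makes bash fork a child for ls instead of exec-ing it directly)
strace bash -c "ls /tmp; echo done"

# ✅ Follow children with -f
strace -f bash -c "ls /tmp; echo done"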
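With -f, every line is tagged with the PID that made the call, which shows which programs a script actually launched. A sketch for a trace saved with strace -f -o, where each line starts with the PID; spawned_programs is a hypothetical helper and the sample lines are illustrative:

```shell
# Hypothetical helper: print "PID program" for every successful execve()
# in a trace saved with: strace -f -o trace.log <command>
spawned_programs() {
    # Matching lines look like: PID execve("/path", [...], ...) = 0
    awk -F'"' '/^[0-9]+ +execve\(/ && / = 0$/ {
        split($1, head, " ")      # head[1] is the PID that starts the line
        print head[1], $2         # $2 is the first quoted string: the program path
    }' "$1"
}

# Illustrative trace lines (not captured from a real run).
log=$(mktemp)
cat > "$log" <<'EOF'
4100 execve("/usr/bin/bash", ["bash", "-c", "ls /tmp; echo done"], 0x7ffd3c1e2a40 /* 24 vars */) = 0
4100 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f2a4c0d1a10) = 4101
4101 execve("/usr/bin/ls", ["ls", "/tmp"], 0x55d0c8a3f2b0 /* 24 vars */) = 0
4101 +++ exited with 0 +++
4100 +++ exited with 0 +++
EOF

spawned_programs "$log"
```

Without -f the lines for PID 4101 would not exist in the trace at all, and the ls execution would go unnoticed.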

Quick Reference Cheat Sheet

# ============ BASIC USAGE ============
strace <command>                           # Basic tracing
strace -o output.log <command>             # Save to a file
strace -p <PID>                            # Attach to a running process

# ============ FILTERING ============
strace -e trace=openat,read <command>      # Specific syscalls
strace -e trace=file <command>             # File operations only
strace -e trace=network <command>          # Network operations only
strace -e trace=process <command>          # Process operations only
strace -e trace=\!memory <command>         # Everything EXCEPT memory

# ============ OUTPUT CONTROL ============
strace -c <command>                        # Summary statistics
strace -t <command>                        # Show wall-clock timestamps
strace -T <command>                        # Show time spent in each call
strace -v <command>                        # Verbose (full structures)
strace -s 1000 <command>                   # Show up to 1000 bytes of each string (default 32)

# ============ PROCESS CONTROL ============
strace -f <command>                        # Follow child processes
strace -ff -o trace <command>              # Separate file per process (trace.<PID>)
sudo strace -p <PID>                       # Attach to a running process

# ============ ADVANCED ============
strace -e trace=file -e signal=none <cmd>  # No signal output
strace -y <command>                        # Show file descriptor paths
strace -k <command>                        # Show stack traces (needs a build with unwinding support)

Next Steps: Beyond Strace

You have mastered the fundamentals! Here is what to explore next:

1. ltrace - Library Call Tracer

Similar to strace, but it traces library function calls instead of syscalls.

ltrace ls

2. perf - Performance Analysis

Better suited to production performance profiling, with far lower overhead.

perf record -g -p <PID>
perf report

3. bpftrace - Modern System Tracing

Uses eBPF for efficient tracing with custom scripts.

bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s\n", str(args->filename)); }'

4. SystemTap - Advanced System Instrumentation

Enterprise-grade tracing and monitoring.


Conclusion: Your New Debugging Superpower

strace is like having X-ray vision for your Linux system. You have learned:

✅ What syscalls are and why they matter
✅ How to read and interpret strace output
✅ File descriptors and what they do
✅ Practical debugging techniques for real problems
✅ Common patterns, errors, and how to fix them
✅ When to use strace and when to reach for an alternative

Practice Challenge

The best way to master strace is through practice. Here is your challenge:

  1. Pick any command you use every day
  2. Run it with strace -c to see the statistics
  3. Find the top 3 most frequently called syscalls
  4. Research what those syscalls do
  5. Run it again with detailed tracing for only those syscalls

Whenever you hit a mysterious error, reach for strace first. You will be amazed at what you find!

Share Your Experience

Have you used strace to solve a hard problem? Found an interesting pattern? I would love to hear the story!

Follow me for more deep-dive technical content on Linux, Cloud Engineering, and Infrastructure.


Tags: #Linux #Strace #Debugging #Syscalls #SystemAdministration #DevOps #CloudEngineering #Troubleshooting #LinuxInternals #OpenStack #InfrastructureMonitoring #SystemCalls #FileDescriptors #RealWorldDebugging


Happy debugging! 🐛🔍

Written by Minh Phu Pham

Published on November 6, 2025


© 2025 | Minh Phu Pham. All rights reserved.