Logging Everything and Nothing Useful — The Noise Problem

Introduction

Logging too little and logging too much are both problems, but too much is often harder to fix. When every log line costs the same amount of attention, flooding the log with noise makes finding the signal nearly impossible. The goal is actionable logs: each line should tell you something you can act on.

What "Logging Everything Uselessly" Looks Like

// ❌ Classic noise logging
app.get('/health', (req, res) => {
  logger.info('Health check requested')  // 60 times/minute — pure noise
  res.json({ status: 'ok' })
  logger.info('Health check completed')  // doubles the noise
})

// ❌ Every database query logged at INFO
async function getUser(id: string) {
  logger.info(`Executing query: SELECT * FROM users WHERE id = '${id}'`)  // security risk too
  const user = await db.query('SELECT * FROM users WHERE id = $1', [id])
  logger.info(`Query returned ${user.rows.length} rows`)
  return user.rows[0]
}

// ❌ Every cache operation logged
async function getCached(key: string) {
  logger.info(`Cache GET: ${key}`)
  const val = await redis.get(key)
  logger.info(`Cache ${val ? 'HIT' : 'MISS'}: ${key}`)
  return val
}
// Result: 80% of log volume is health checks, SQL, and cache ops
// When a real error occurs, it's page 47 in the log viewer

Log Levels and What They Mean

// Use log levels intentionally — each level has a specific audience

logger.error(...)   // An error that requires IMMEDIATE human attention
                    // → Alerts fire, on-call engineer woken up
                    // → "Payment processing completely down"

logger.warn(...)    // Something unexpected but the system is still working
                    // → Reviewed in morning standup
                    // → "Retry 3/3 succeeded for payment", "Cache miss rate high"

logger.info(...)    // Business-significant events (not operational noise)
                    // → Reviewed for trend analysis
                    // → "Order created", "User registered", "Payment succeeded"

logger.debug(...)   // Development and deep debugging only
                    // → NEVER enabled in production by default
                    // → SQL queries, cache hits, internal state

Fix 1: Log Business Events, Not Technical Operations

// ❌ Logging technical operations (noise)
logger.info('DB query executed')
logger.info('Cache key set')
logger.info('HTTP request received')
logger.info('HTTP response sent: 200')

// ✅ Logging business events (signal)
// What matters to a human reading this later?
logger.info({ userId, orderId, total, paymentMethod }, 'Order placed')
logger.warn({ userId, orderId, attempt: 3 }, 'Payment retry required')
logger.error({ userId, orderId, error: err.message }, 'Payment failed — customer action required')
logger.info({ userId, plan: 'pro', mrr: 99 }, 'User upgraded')
logger.warn({ userId, daysOverdue: 3 }, 'Invoice payment overdue')

Fix 2: Filter Out Health Check Noise

import morgan from 'morgan'

// Custom morgan format — skip health check logs
app.use(morgan('combined', {
  skip: (req) => req.path === '/health' || req.path === '/metrics',
}))

// Or with pino-http:
import pinoHttp from 'pino-http'
app.use(pinoHttp({
  autoLogging: {
    ignore: (req) => ['/health', '/ready', '/metrics'].includes(req.url ?? ''),
  },
  customLogLevel: (req, res) => {
    if (res.statusCode >= 500) return 'error'
    if (res.statusCode >= 400) return 'warn'
    if (res.statusCode >= 300) return 'silent'  // redirects aren't interesting
    return 'info'
  },
}))

Fix 3: Sampling High-Volume Logs

// For very high-volume operations, sample — don't log every instance
let requestCount = 0

app.use((req, res, next) => {
  const startTime = Date.now()
  requestCount++
  const seq = requestCount

  // The status code isn't known until the response is sent,
  // so decide inside the 'finish' handler rather than before next()
  res.on('finish', () => {
    // Log roughly 1% of successful requests for performance sampling;
    // always log 100% of errors
    const shouldLog = res.statusCode >= 400 || seq % 100 === 0

    if (shouldLog) {
      logger.info({
        method: req.method,
        path: req.path,
        status: res.statusCode,
        duration: Date.now() - startTime,
        sampled: res.statusCode < 400,
      }, 'HTTP request')
    }
  })

  next()
})

Fix 4: Log Context, Not Strings

// ❌ String concatenation — not queryable
logger.error('Failed to process order ' + orderId + ' for user ' + userId + ': ' + error.message)

// ✅ Structured fields — every field is queryable in Elasticsearch/Loki
logger.error({
  orderId,
  userId,
  error: error.message,
  errorCode: error.code,
  attempt: 3,
  duration: Date.now() - startTime,
}, 'Order processing failed')

// Now you can query: error_code:"PAYMENT_DECLINED" AND user_id:"u123"
// Or alert: count of error logs with error_code:"DB_CONNECTION_FAILED" > 5 in 1 minute

Fix 5: Adaptive Log Level in Production

// Normally INFO — increase to DEBUG only during incidents
const logLevel = process.env.LOG_LEVEL ?? 'info'
const logger = pino({ level: logLevel })

// Runtime log level change without restart (via admin API)
app.post('/admin/log-level', adminAuth, (req, res) => {
  const { level } = req.body
  if (!['error', 'warn', 'info', 'debug'].includes(level)) {
    return res.status(400).json({ error: 'Invalid level' })
  }

  logger.level = level
  logger.info({ level }, 'Log level changed')
  res.json({ level })
})
// During incident: POST /admin/log-level { "level": "debug" }
// After incident: POST /admin/log-level { "level": "info" }

Useful Logging Checklist

  • ✅ ERROR = immediate action required, WARN = review tomorrow, INFO = business event
  • ✅ Health check and metrics endpoints excluded from access logs
  • ✅ All logs are structured JSON — every important value is a field, not embedded in a string
  • ✅ SQL queries, cache operations, and internal state at DEBUG level (off in production)
  • ✅ Correlation IDs (request ID, trace ID) on every log line
  • ✅ Sample high-volume success logs; always log errors at 100%
  • ✅ Log volume measured and monitored — unexpected spikes indicate a bug
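The correlation-ID item in the checklist is the only fix without an example above. One way to get a request ID onto every log line is Node's built-in AsyncLocalStorage; a minimal sketch follows, with a stand-in log() helper in place of a real logger (pino users can achieve the same merge via its mixin option). The x-request-id header and requestId field names here are illustrative conventions, not a standard:

```typescript
import { AsyncLocalStorage } from 'node:async_hooks'
import { randomUUID } from 'node:crypto'

const requestContext = new AsyncLocalStorage<{ requestId: string }>()

// Every line emitted while a request is in flight automatically
// carries that request's ID — no manual threading through call sites
function log(fields: Record<string, unknown>, msg: string) {
  const ctx = requestContext.getStore() ?? {}
  console.log(JSON.stringify({ ...ctx, ...fields, msg }))
}

// Express-style middleware: reuse an incoming x-request-id or mint one,
// then run the rest of the request inside the context
function withRequestId(
  req: { headers: Record<string, string | undefined> },
  res: { setHeader: (name: string, value: string) => void },
  next: () => void,
) {
  const id = req.headers['x-request-id'] ?? randomUUID()
  res.setHeader('x-request-id', id)
  requestContext.run({ requestId: id }, next)
}
```

Because AsyncLocalStorage follows async continuations, the ID survives awaits and callbacks inside the handler, which is what makes log lines from one request greppable as a group.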

Conclusion

Good logging is more about what you don't log than what you do. Health checks, routine SQL queries, cache hits, and successful HTTP responses are operational noise — they don't tell you anything you can act on. Log business events at INFO, warnings at WARN, and errors at ERROR with full context. Make every log line structured JSON so it's queryable. The test: in a production incident, can you find the root cause from logs alone within 5 minutes? If not, you're logging the wrong things.