Telemetry
Metrics, traces, and monitoring - understanding what your application is doing in production.
Beyond Logging#
Logs tell you what happened. Telemetry tells you:
- How often things happen (metrics)
- How long things take (traces)
- What's connected to what (distributed tracing)
Together, they give you complete visibility into your application.
The Three Pillars of Observability#
1. Logs#
What happened, in detail.
"User 123 logged in at 14:23:45"
"Order 456 failed: payment declined"
2. Metrics#
Numbers over time.
requests_total: 1,234,567
response_time_p99: 245ms
error_rate: 0.1%
active_users: 523
3. Traces#
Request flow across services.
Request → API Gateway (2ms) → Auth Service (15ms) → User Service (8ms) → Database (25ms)
Simple Metrics with prom-client#
npm install prom-client
// src/utils/metrics.js
import client from 'prom-client';
// Collect default metrics (CPU, memory, etc.)
client.collectDefaultMetrics();
// Custom metrics
export const httpRequestsTotal = new client.Counter({
name: 'http_requests_total',
help: 'Total number of HTTP requests',
labelNames: ['method', 'path', 'status'],
});
export const httpRequestDuration = new client.Histogram({
name: 'http_request_duration_seconds',
help: 'HTTP request duration in seconds',
labelNames: ['method', 'path', 'status'],
buckets: [0.01, 0.05, 0.1, 0.5, 1, 5],
});
export const activeConnections = new client.Gauge({
name: 'active_connections',
help: 'Number of active connections',
});
// Export the registry for the /metrics endpoint
export { client };
Metrics Types#
Counter - Only goes up
const loginAttempts = new client.Counter({
name: 'login_attempts_total',
help: 'Total login attempts',
labelNames: ['success'],
});
loginAttempts.inc({ success: 'true' });
loginAttempts.inc({ success: 'false' });
Gauge - Can go up or down
const activeUsers = new client.Gauge({
name: 'active_users',
help: 'Currently active users',
});
activeUsers.inc(); // User connected
activeUsers.dec(); // User disconnected
activeUsers.set(42); // Set directly
Histogram - Distribution of values
const responseTimes = new client.Histogram({
name: 'response_time_seconds',
help: 'Response time distribution',
buckets: [0.1, 0.5, 1, 2, 5],
});
responseTimes.observe(0.234); // Record a value
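Under the hood, a Prometheus histogram keeps only cumulative bucket counters plus a running sum and count — individual observations are not stored. A minimal plain-JavaScript sketch of what `observe()` actually records:

```javascript
// Sketch of what a Prometheus histogram records: each bucket counts
// observations less than or equal to its bound (le), so bucket
// counters are cumulative.
const buckets = [0.1, 0.5, 1, 2, 5];
const counts = new Map(buckets.map((b) => [b, 0]));
let sum = 0;
let count = 0;

function observe(value) {
  sum += value;
  count += 1;
  for (const b of buckets) {
    if (value <= b) counts.set(b, counts.get(b) + 1);
  }
}

observe(0.234);
observe(0.8);
observe(3.1);

console.log(counts.get(0.5)); // 1  (only 0.234 is <= 0.5)
console.log(counts.get(5));   // 3  (all observations are <= 5)
console.log(count);           // 3
```

This is why bucket boundaries must be chosen up front: values are collapsed into counters at observation time, and quantiles are later estimated from those counters (e.g. with PromQL's `histogram_quantile`).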
Metrics Middleware#
// src/middleware/metrics.js
import { httpRequestsTotal, httpRequestDuration } from '../utils/metrics.js';
export function metricsMiddleware(req, res, next) {
const start = Date.now();
res.on('finish', () => {
const duration = (Date.now() - start) / 1000;
const labels = {
method: req.method,
path: req.route?.path || req.path,
status: res.statusCode,
};
httpRequestsTotal.inc(labels);
httpRequestDuration.observe(labels, duration);
});
next();
}
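One caveat with the `path` label: when `req.route` is unset, the middleware falls back to the raw URL, and raw URLs containing user IDs or UUIDs create a new time series per distinct value — enough to overwhelm Prometheus. A sketch of normalizing paths before using them as label values (the patterns are illustrative, not exhaustive):

```javascript
// High-cardinality label values bloat Prometheus. Collapse IDs in raw
// paths into placeholders so each route produces one time series.
// Note: UUIDs must be replaced before bare numbers, or the numeric
// pattern would eat the leading digits of a UUID segment.
function normalizePath(path) {
  return path
    .replace(/\/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, '/:uuid')
    .replace(/\/\d+/g, '/:id');
}

console.log(normalizePath('/api/users/123/orders/456'));
// /api/users/:id/orders/:id
```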
Expose Metrics Endpoint#
// src/app.js
import { client } from './utils/metrics.js';
import { metricsMiddleware } from './middleware/metrics.js';
app.use(metricsMiddleware);
// Prometheus scrapes this endpoint
app.get('/metrics', async (req, res) => {
res.set('Content-Type', client.register.contentType);
res.send(await client.register.metrics());
});
Output:
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",path="/api/users",status="200"} 1234
http_requests_total{method="POST",path="/api/orders",status="201"} 567
# HELP http_request_duration_seconds HTTP request duration in seconds
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{method="GET",path="/api/users",status="200",le="0.1"} 1000
http_request_duration_seconds_bucket{method="GET",path="/api/users",status="200",le="0.5"} 1200
Business Metrics#
Track what matters to your business:
// src/utils/metrics.js
export const ordersCreated = new client.Counter({
name: 'orders_created_total',
help: 'Total orders created',
labelNames: ['status'],
});
export const orderValue = new client.Histogram({
name: 'order_value_dollars',
help: 'Order value distribution',
buckets: [10, 50, 100, 250, 500, 1000],
});
export const paymentFailures = new client.Counter({
name: 'payment_failures_total',
help: 'Payment failures',
labelNames: ['reason'],
});
Use in services:
// src/services/orders.js
import { ordersCreated, orderValue } from '../utils/metrics.js';
export async function createOrder(userId, items) {
const order = await Order.create({ ... });
ordersCreated.inc({ status: 'success' });
orderValue.observe(order.total);
return order;
}
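The `paymentFailures` counter takes a `reason` label, and label values should stay bounded — Prometheus creates one time series per distinct value. A sketch of collapsing arbitrary error codes into a fixed set before incrementing (the reason names here are illustrative):

```javascript
// Keep the 'reason' label bounded: unknown error codes collapse to
// 'other' instead of creating a new time series per code.
const KNOWN_REASONS = new Set(['card_declined', 'insufficient_funds', 'expired_card']);

function failureReason(error) {
  return KNOWN_REASONS.has(error.code) ? error.code : 'other';
}

// e.g. paymentFailures.inc({ reason: failureReason(error) });
console.log(failureReason({ code: 'card_declined' })); // card_declined
console.log(failureReason({ code: 'ECONNRESET' }));    // other
```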
Health Checks#
Beyond simple "200 OK":
// src/routes/health.js
import mongoose from 'mongoose';
import { redis } from '../config/redis.js';
export async function healthCheck(req, res) {
const health = {
status: 'ok',
timestamp: new Date().toISOString(),
uptime: process.uptime(),
checks: {},
};
// Check MongoDB
try {
await mongoose.connection.db.admin().ping();
health.checks.mongodb = { status: 'ok' };
} catch (error) {
health.checks.mongodb = { status: 'error', message: error.message };
health.status = 'degraded';
}
// Check Redis
try {
await redis.ping();
health.checks.redis = { status: 'ok' };
} catch (error) {
health.checks.redis = { status: 'error', message: error.message };
health.status = 'degraded';
}
// Memory check
const memUsage = process.memoryUsage();
health.checks.memory = {
heapUsed: Math.round(memUsage.heapUsed / 1024 / 1024) + 'MB',
heapTotal: Math.round(memUsage.heapTotal / 1024 / 1024) + 'MB',
};
const statusCode = health.status === 'ok' ? 200 : 503;
res.status(statusCode).json(health);
}
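One gotcha: a hung dependency makes the health endpoint itself hang, which an orchestrator reads as a dead instance. Bounding each ping with a timeout is a common guard; a sketch using `Promise.race`:

```javascript
// Reject if a health-check ping takes longer than `ms` milliseconds,
// so one hung dependency can't stall the whole /health response.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  // Clear the timer either way so it doesn't keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// usage: await withTimeout(mongoose.connection.db.admin().ping(), 1000)
```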
Kubernetes Probes#
// Liveness - is the app running?
app.get('/health/live', (req, res) => {
res.json({ status: 'ok' });
});
// Readiness - is the app ready to serve traffic?
app.get('/health/ready', async (req, res) => {
try {
await mongoose.connection.db.admin().ping();
await redis.ping();
res.json({ status: 'ok' });
} catch {
res.status(503).json({ status: 'not ready' });
}
});
Distributed Tracing with OpenTelemetry#
For microservices, trace requests across services:
npm install @opentelemetry/api @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node
// src/tracing.js
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
const sdk = new NodeSDK({
serviceName: 'user-api',
traceExporter: new OTLPTraceExporter({
url: process.env.OTEL_EXPORTER_URL || 'http://localhost:4318/v1/traces',
}),
instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
Load it first:
// src/index.js
import './tracing.js'; // Must be first!
import { app } from './app.js';
// ...
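Traces span services because context travels with the request: OpenTelemetry's HTTP instrumentation injects a W3C `traceparent` header, and the next service continues the same trace instead of starting a new one. The SDK handles this for you; a purely illustrative sketch of what that header contains:

```javascript
// W3C traceparent format: version-traceid-spanid-flags
// e.g. 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
function parseTraceparent(header) {
  const m = /^(\d{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null;
  const [, version, traceId, spanId, flags] = m;
  // Bit 0 of the flags byte marks the trace as sampled.
  return { version, traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

const ctx = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
console.log(ctx.traceId); // 4bf92f3577b34da6a3ce929d0e0e4736
console.log(ctx.sampled); // true
```

Every service in the chain logs spans under the same trace ID, which is what lets the tracing backend stitch the `API Gateway → Auth Service → User Service → Database` picture back together.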
Simple APM Without Infrastructure#
If you don't want to run Prometheus/Grafana:
Logging-Based Metrics#
// Log metrics periodically
import { logger } from './utils/logger.js';
setInterval(() => {
const memUsage = process.memoryUsage();
logger.info({
type: 'metrics',
memory: {
heapUsed: Math.round(memUsage.heapUsed / 1024 / 1024),
heapTotal: Math.round(memUsage.heapTotal / 1024 / 1024),
},
uptime: process.uptime(),
}, 'System metrics');
}, 60000); // Every minute
Request Timing Logs#
// Already have this from request logging middleware
// Just ensure you log duration
req.log.info({
duration: Date.now() - start,
statusCode: res.statusCode,
}, 'Request completed');
Simple SaaS Options#
No infrastructure to manage:
- Datadog - Full APM suite
- New Relic - Metrics, traces, logs
- Sentry - Error tracking + performance
- Better Stack (Logtail) - Logs + uptime
What to Monitor#
System Metrics#
- CPU usage
- Memory usage
- Disk space
- Network I/O
Application Metrics#
- Request rate (requests/second)
- Error rate (errors/requests)
- Response time (p50, p95, p99)
- Active connections
Business Metrics#
- User signups
- Orders created
- Revenue
- Feature usage
Dependencies#
- Database response time
- Cache hit rate
- External API latency
- Queue depth
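The percentiles above are worth understanding concretely: p99 is the value below which 99% of samples fall, which is why averages hide tail latency. A sketch using the nearest-rank method on raw samples:

```javascript
// Nearest-rank percentile: sort samples, take the value at the
// ceil(p% * n)-th position. Averages hide tail latency — this sample
// set averages ~212ms, but one request in ten takes nearly a second.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

const durationsMs = [120, 95, 240, 110, 105, 980, 130, 101, 115, 125];
console.log(percentile(durationsMs, 50)); // 115
console.log(percentile(durationsMs, 99)); // 980
```

In production you rarely compute this by hand — Prometheus estimates it from histogram buckets — but it explains why dashboards track p50, p95, and p99 side by side rather than a single mean.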
Key Takeaways#
- Logs, metrics, traces - Different tools for different insights
- Use Prometheus metrics - Standard format, works everywhere
- Track business metrics - Not just technical health
- Health checks matter - Kubernetes needs them
- Start simple - Logging + basic metrics, add tracing later
The Minimum#
At minimum:
- Request logging with duration
- Health check endpoint
- Error tracking (even just logs)
Add Prometheus metrics when you need dashboards. Add tracing when you have multiple services.