StatsD, File Descriptors, and the Singleton Pattern
TLDR - Each new hot-shots client opens its own UDP socket, consuming a file descriptor. Since we were creating a new client for every API request, our API instances exhausted the UDP file descriptors available to the container, and DNS lookups (which also go over UDP) started failing whenever the API tried to resolve another service's hostname. We fixed this by exporting the StatsD client as a singleton, guaranteeing a single StatsD client per Node.js process.
Background
All of our services are containerized and managed with Kubernetes. We recently added StatsD metrics to our entire Node.js API layer. The metrics being collected range from 'number of requests per route' to 'amount of time a worker takes to process an item from a queue'. hot-shots is our StatsD client of choice. It is a fork of node-statsd and provides a nice API for integrating with DogStatsD, Telegraf, and Etsy's StatsD.
The Problem
About a day after deploying our updated API with the StatsD changes, it threw an out-of-memory error in the middle of the night. The next day we started seeing a flurry of EADDRINUSE errors in our logs that indicated a possible port binding/DNS issue. Weird. To validate this, we exec'd into one of our containers and used nslookup and dns.resolve to manually resolve our services' hostnames. Sure enough, none of them would resolve.
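If you want to run the same check yourself, here is a minimal Node sketch of the dns.resolve test we ran from inside the container (the hostname is a placeholder, not one of our actual service names):

import { promises as dns } from 'dns'

async function canResolve(hostname: string): Promise<boolean> {
  try {
    const addresses = await dns.resolve(hostname)
    console.log(`${hostname} -> ${addresses.join(', ')}`)
    return true
  } catch (err) {
    // This is the branch we kept hitting once the container was out of UDP file descriptors.
    console.error(`could not resolve ${hostname}`, err)
    return false
  }
}

canResolve('some-internal-service')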
The googling intensified. We looked into things ranging from Intermittent DNS failures on GKE to using Docker container networking to build a debugging container. Unfortunately, none of the articles we found pointed us to a clear diagnosis for our problem. It was then, in our moment of despair, that @smerchek finally saw the pattern: all of the services that were failing to resolve had the new StatsD changes on them.
This realization made us look closer at the new code. Were we calling the wrong hot-shots functions? Were we initializing the client incorrectly? It wasn't too long after examining our StatsD client code that we realized a potential problem. We had been using the factory pattern to create our client like so:
import { StatsD } from 'hot-shots'
import { Logger } from './logger'

type StatsDFactoryConfig = {
  stat_prefix: string
}

const STATSD_ENVIRONMENT_TAG = `environment:${process.env.STATSD_ENV_TAG || 'development'}`

export const StatsDFactory = {
  create: (config: StatsDFactoryConfig): StatsD => {
    // Every call builds a brand-new client, and every client opens its own UDP socket.
    return new StatsD({
      host: process.env.STATSD_HOST || 'localhost',
      port: Number(process.env.STATSD_PORT) || 8125,
      prefix: config.stat_prefix || '',
      globalTags: [STATSD_ENVIRONMENT_TAG],
      errorHandler: err =>
        Logger.warn('StatsD client error', {
          log_key: 'statsd_client_error',
          error: err,
        }),
    })
  },
}
Every time you create a new hot-shots client, it opens a UDP socket, which consumes a file descriptor. Because we were creating a new client for each API request, database query, and worker process, we quickly exhausted the file descriptors allocated to each container. Since DNS queries also use UDP, it makes sense that we were unable to resolve any of the service hostnames.
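To make the failure mode concrete, here is a simplified sketch of the anti-pattern (the web framework, route, and import path are illustrative, not our actual API code):

import express from 'express'
import { StatsDFactory } from './statsd-factory'

const app = express()

app.get('/items/:id', (_req, res) => {
  // A fresh client per request means a fresh UDP socket per request,
  // and nothing ever calls close() on it.
  const statsd = StatsDFactory.create({ stat_prefix: 'api.items.' })
  statsd.increment('request')
  res.send('ok')
})

app.listen(3000)

hot-shots does expose a close() method, but the cleaner fix for our case was to stop creating per-request clients altogether.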
We verified the abundance of UDP sockets in use by running:
/app # cat /proc/net/sockstat
sockets: used 158893
TCP: inuse 3 orphan 0 tw 232 alloc 188 mem 47
UDP: inuse 28232 mem 4
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
I don't know about you, but I think 28232 UDP sockets in use seems like a bit too much.
The Solution
Ideally, we only need one instance of hot-shots throughout the life of the Node process. This use case is perfect for the singleton pattern. We modified our client to use the singleton pattern instead of the factory pattern:
import { StatsD } from 'hot-shots'
import { Logger } from './logger'

const STATSD_ENVIRONMENT_TAG = `environment:${process.env.STATSD_ENV_TAG}`
const STATSD_PROCESS_NAME_TAG = `process:${process.env.PROCESS_NAME}`
const STATSD_COMMIT_SHA1_TAG = `commit:${process.env.COMMIT_SHA1}`

// This module is evaluated once per process, so this client (and its single
// UDP socket) is shared by every file that imports it.
export const StatsdClient: StatsD = new StatsD({
  host: process.env.STATSD_HOST,
  port: Number(process.env.STATSD_PORT) || 8125,
  prefix: '',
  globalTags: [STATSD_ENVIRONMENT_TAG, STATSD_PROCESS_NAME_TAG, STATSD_COMMIT_SHA1_TAG],
  errorHandler: err =>
    Logger.warn('StatsD client error', {
      log_key: 'statsd_client_error',
      error: err,
    }),
})
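Because Node caches a module after its first evaluation, every consumer that imports StatsdClient gets this same instance. A usage sketch (the import path and stat names are illustrative):

import { StatsdClient } from './statsd-client'

// Both calls go through the one UDP socket owned by the shared client.
StatsdClient.increment('api.request')
StatsdClient.timing('worker.queue_item.processing_time', 412)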
As soon as we deployed this change, we went from 28232 UDP sockets in use down to 1!
/app # cat /proc/net/sockstat
sockets: used 941
TCP: inuse 8 orphan 0 tw 135 alloc 166 mem 27
UDP: inuse 1 mem 3
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
The DNS resolution errors vanished and all was well in the world.
To recap:
- We started to see DNS errors after integrating StatsD (using hot-shots) into our services.
- We realized that the problem was caused by the way we were initializing our hot-shots client.
- We changed our approach from having multiple instances of the client using the factory pattern to a single instance of the client using the singleton pattern.
- Our DNS issues were resolved (ha) and life could carry on as normal!
By the way, we are looking for highly skilled software engineers to join our team. Check out our job listing to learn more!