raby.sh - unix

Post index

MongoDB, Ruby and ObjectId collisions - Thu 27 July 2023
SIGTERM and PID 1: Why does a container linger after receiving a SIGTERM. - Wed 23 February 2022
Debugging 90s hangs during shutdown on Ubuntu 20.04 - Sat 15 January 2022
Adventures in Ansible land - Fri 26 November 2021
Installing OpenBSD on OVH's VPS 2016 KVM machines. - Wed 28 October 2015
Needles and haystacks: Finding the one bad request among billions with tcpdump - Fri 23 October 2015
1M HTTP Requests per second using Nginx and Ubuntu 12.04 on EC2 - Wed 12 August 2015
OSX: Where are my TIME_WAIT ?! - Sat 13 September 2014
Python: pcap modules comparison - Sat 13 September 2014
VMWare: Routing and Large Receive Offload considered harmful - Sat 17 May 2014

Thu 27 July 2023
MongoDB, Ruby and ObjectId collisions
This is a (hopefully) short piece explaining the process of debugging ObjectId collisions in MongoDB, in an environment involving Ruby and containers (GCP Cloud Run Jobs, which are basically Kubernetes jobs).

MongoDB is a document database: JSON documents can be stored in it, and each document has its own unique ObjectId. (This is a very simplified summary, more info here.)

An ObjectId is a 12-byte value that uniquely identifies a document. It is generated on the client, before the new document is inserted into the database. While the format is standard, each client library has to implement it.

The ObjectId format is as follows:
- 4-byte timestamp, in seconds since the Unix epoch
- 5-byte random value generated once per process. This random value is unique to the machine and process.
- 3-byte counter, initialized to a random value and incremented every time an ObjectId is generated.
I like these IDs: they're sortable-ish across machines, the 5 random bytes acting as a "process id" should be enough to avoid collisions, they're fast to generate, and each process can get 16M (2^24) of them per second before they repeat.
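The layout above is simple enough to sketch. Below is a minimal Python illustration, assuming big-endian encoding of the timestamp and counter (as in the BSON ObjectId spec). `ObjectIdGenerator` is a hypothetical name: this is not the Ruby driver's code, or any driver's.

```python
import os
import struct
import threading
import time


class ObjectIdGenerator:
    """Hypothetical sketch of the ObjectId layout: 4B timestamp | 5B random | 3B counter."""

    def __init__(self) -> None:
        # 5 random bytes, generated once for the lifetime of this generator (i.e. per process)
        self._process_random = os.urandom(5)
        # 3-byte counter, initialized to a random value
        self._counter = int.from_bytes(os.urandom(3), "big")
        self._lock = threading.Lock()

    def next_id(self) -> bytes:
        with self._lock:
            # Increment, wrapping around after 2**24 = 16,777,216 values
            self._counter = (self._counter + 1) % (1 << 24)
            counter = self._counter
        # Seconds since the Unix epoch, as an unsigned 32-bit big-endian integer
        timestamp = struct.pack(">I", int(time.time()) & 0xFFFFFFFF)
        return timestamp + self._process_random + counter.to_bytes(3, "big")


gen = ObjectIdGenerator()
print(gen.next_id().hex())  # 24 hex characters, the form MongoDB usually displays
```

With this layout, two ids can only collide if they share the same second, the same 5 random bytes and the same counter value.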
Read more

Wed 23 February 2022
SIGTERM and PID 1: Why does a container linger after receiving a SIGTERM.

tl;dr: PID 1 is special on Linux. It is effectively unkillable: signals that would normally terminate a process that has no handler installed do not terminate it. In other words, PID 1 must handle SIGTERM explicitly for the usual semantics to apply. I keep rediscovering this with containers...
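In practice, the program running as PID 1 in the container has to install its own SIGTERM handler. Below is a minimal Python sketch with hypothetical function names. Exiting straight from the handler is just one choice: a real service would usually finish in-flight work first.

```python
import signal
import sys
from types import FrameType
from typing import Optional


def handle_sigterm(signum: int, frame: Optional[FrameType]) -> None:
    # As PID 1, the kernel does not apply SIGTERM's default "terminate" action,
    # so the process has to decide to exit on its own.
    print(f"received signal {signum}, exiting", file=sys.stderr)
    sys.exit(0)  # raises SystemExit in the main thread


def install_sigterm_handler() -> None:
    # Python only allows installing signal handlers from the main thread.
    signal.signal(signal.SIGTERM, handle_sigterm)
```

Call `install_sigterm_handler()` early, before the program's main loop starts.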

Read more

Sat 15 January 2022
Debugging 90s hangs during shutdown on Ubuntu 20.04
I spent some time yesterday trying to understand why one of the machines here seemed to always take 90s before shutting down or rebooting. This article is a summary of the wild goose chase I embarked on to try and understand what was happening.

When a systemd machine hangs during shutdown, it usually means that a service is not stopping fast enough, or that systemd is waiting for the DefaultTimeoutStopSec timeout (90s by default) to expire before killing the processes that are still running after all services have been stopped.
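Roughly, the stop sequence systemd applies is: send SIGTERM, wait up to the timeout, then send SIGKILL. Below is a Python sketch of that escalation for a single child process. The hypothetical `timeout` parameter mirrors the 90s default. This illustrates the pattern only and is not how systemd itself is implemented.

```python
import signal
import subprocess


def stop_with_timeout(proc: subprocess.Popen, timeout: float = 90.0) -> int:
    """Ask `proc` to stop with SIGTERM; SIGKILL it if still running after `timeout` seconds.

    Returns the exit status; on POSIX, a negative value -N means "killed by signal N".
    """
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The process ignored SIGTERM or did not finish shutting down in time
        proc.kill()  # sends SIGKILL on POSIX
        return proc.wait()
```

A process that ignores SIGTERM therefore costs the full timeout, which is how a single stuck process turns into a 90s shutdown.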

In my case, this is what I was seeing:
```
[ OK ] Finished Reboot.
[ OK ] Reached target Reboot.
[ 243.652848 ] systemd-shutdown[1]: Waiting for process: containerd-shim, containerd-shim, containerd-shim, fluent-bit
```
These processes are part of k3s: they are either the parent processes of containers or processes running inside the containers themselves. Clearly, it has something to do with k3s.

Update: the systemd bugfix was backported to Ubuntu 20.04.
Read more

Fri 26 November 2021
Adventures in Ansible land

I've been meaning to look into Ansible for a long time now, but somehow never got around to it.

SaltStack has been my automation/orchestration tool of choice since about 2013 and I can usually get stuff done with it without glancing at the docs too much.

However, the security history of Salt isn't great: there have been enough security issues in it that I started thinking that maybe I should look at alternatives. (For example, this was a very good one... and there have been several authentication bypasses too, which do not inspire confidence.)

Anyways, I started tinkering with Ansible to automate some stuff on my workstation.

The following is a collection of short notes and weirdness I encountered so far, to save my future self some time when I encounter them again.

Read more

Wed 28 October 2015
Installing OpenBSD on OVH's VPS 2016 KVM machines.

I've been thinking about running OpenBSD again for a while now; yesterday I had some inspiration and decided to try to boot it on OVH's VPS machines.

OVH doesn't have the best reputation regarding availability and support (both of which I confirmed in the past...), but they are cheap and they have a datacenter in Beauharnois, QC, less than 50 km from home.

They're only offering Linux distributions for their VPS SSD instances at the moment, but since the virtualization technology is KVM, booting the OpenBSD ramdisk kernel (bsd.rd) and doing the installation is all that is needed to get a working OpenBSD machine.

Here's how I did it.

Read more

Fri 23 October 2015
Needles and haystacks: Finding the one bad request among billions with tcpdump

This week, we had a few weird crashes with an HTTP server that we could not easily reproduce, and we had a hard time pinpointing the source of the issue. We knew the problems were triggered by bad input, but since the process was continuously receiving around 3000 requests per second at the time, it was pretty hard to isolate the exact request(s) that made it crash.

The idea we had was to capture HTTP request data up to the point where the process crashed. Then we would open the trace and look at the last successful requests; the faulty one would be in there somewhere.
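`tcpdump -w` writes the classic pcap file format, which can be walked with the standard library alone. Below is a minimal sketch, assuming a classic pcap file (not pcapng). It only keeps the last `n` raw packets, using a bounded deque. Decoding the HTTP requests out of the Ethernet/IP/TCP layers is left out. The function names are hypothetical.

```python
import struct
from collections import deque
from typing import BinaryIO, Iterator, List, Tuple

# Magic number bytes as they appear on disk -> (struct byte order, timestamp fraction unit)
_MAGICS = {
    b"\xd4\xc3\xb2\xa1": ("<", 1e-6),  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": (">", 1e-6),  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": ("<", 1e-9),  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": (">", 1e-9),  # big-endian, nanosecond timestamps
}


def read_pcap(f: BinaryIO) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp, captured bytes) for each record of a classic pcap file."""
    header = f.read(24)  # global header: magic, version, thiszone, sigfigs, snaplen, linktype
    if len(header) < 24 or header[:4] not in _MAGICS:
        raise ValueError("not a classic pcap file")
    order, unit = _MAGICS[header[:4]]
    while True:
        record = f.read(16)  # record header: ts_sec, ts_frac, incl_len, orig_len
        if len(record) < 16:
            return  # end of file, or a truncated last record header
        ts_sec, ts_frac, incl_len, _orig_len = struct.unpack(order + "IIII", record)
        data = f.read(incl_len)
        if len(data) < incl_len:
            return  # truncated last packet
        yield ts_sec + ts_frac * unit, data


def last_packets(f: BinaryIO, n: int) -> List[Tuple[float, bytes]]:
    # deque(maxlen=n) discards older entries, so memory stays bounded on huge captures
    return list(deque(read_pcap(f), maxlen=n))
```

At around 3000 requests per second, even a few seconds' worth of packets before the crash is a small enough haystack to inspect by hand or with a protocol decoder.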

Read more

Wed 12 August 2015
1M HTTP Requests per second using Nginx and Ubuntu 12.04 on EC2

At work, we have been doing quite a few tests lately to find the maximum number of HTTP queries per second (QPS) that a modern server running Ubuntu 12.04 with a recent Linux kernel can handle.

Read more

Sat 13 September 2014
OSX: Where are my TIME_WAIT ?!

While doing some packet drop testing for a pcap script I'm writing with a colleague at work, I hit a strange situation on my Mac where ab would open ~16k HTTP connections really fast, then stall, then time out.

Turns out this is caused by the lack of available ephemeral port …
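Here is a back-of-the-envelope check of the ~16k figure. It assumes macOS uses the IANA dynamic port range 49152-65535 for ephemeral ports (`sysctl net.inet.ip.portrange.first` and `sysctl net.inet.ip.portrange.last` show the actual values). It also assumes ab talks to a single destination address and port, so the source port is the only part of the connection tuple that varies. The TIME_WAIT duration is a parameter here, not a measured value.

```python
def ephemeral_port_count(first: int = 49152, last: int = 65535) -> int:
    # Both ends of the range are usable, hence the +1
    return last - first + 1


def max_connection_rate(ports: int, time_wait_s: float) -> float:
    # If each used source port stays unavailable for time_wait_s seconds after
    # its connection closes (an assumption; the duration depends on the OS),
    # this is the sustainable rate of new connections to one destination.
    return ports / time_wait_s


ports = ephemeral_port_count()
print(ports)  # 16384
print(max_connection_rate(ports, time_wait_s=30.0))
```

With 16384 ports, a benchmark that opens connections faster than ports are released runs out after roughly 16k connections, which matches what ab showed.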

Read more

Sat 13 September 2014
Python: pcap modules comparison

An overview of Python modules wrapping libpcap.

Read more

Sat 17 May 2014
VMWare: Routing and Large Receive Offload considered harmful

If you're doing IP routing on a VMware virtual machine, make sure to disable Large Receive Offload (LRO) at the VMware level, or it will fail in interesting ways.

Read more
