1. MongoDB, Ruby and ObjectId collisions

    This is a (hopefully) short piece explaining the process of debugging ObjectId collisions in MongoDB, in an environment involving Ruby and containers (GCP Cloud Run Jobs, which are basically Kubernetes jobs).

    MongoDB is a document database: JSON documents can be stored in it, and each document has its own unique ObjectId. (This is a very simplified summary, more info here.)

    An ObjectId is a 12-byte number that uniquely identifies a document. This number is generated on the client, before inserting a new document in the db. While the format is standard, each client library has to implement it.

    The ObjectId format is as follows:

    • 4 bytes: timestamp, in seconds since the Unix epoch.
    • 5 bytes: random value generated once per process. This random value is unique to the machine and process.
    • 3 bytes: counter, initialized to a random value and incremented every time an ObjectId is generated.
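
    The layout above can be sketched in a few lines of Python. This is only an illustration of the format, not the code of any MongoDB driver: the names ObjectIdGenerator and split_object_id are made up for this example, and the big-endian byte order for the timestamp and counter is an assumption the list above does not spell out.

```python
import os
import struct
import threading
import time


class ObjectIdGenerator:
    """Illustrative generator for the 4 + 5 + 3 byte layout (hypothetical name)."""

    def __init__(self):
        # 5 random bytes, drawn once when the process creates the generator.
        self._process_random = os.urandom(5)
        # 3-byte counter, seeded with a random value.
        self._counter = int.from_bytes(os.urandom(3), "big")
        self._lock = threading.Lock()

    def generate(self, now=None):
        # Timestamp in whole seconds since the Unix epoch (4 bytes).
        timestamp = int(time.time() if now is None else now)
        with self._lock:
            counter = self._counter
            # Wrap around after 2**24 values so the counter stays on 3 bytes.
            self._counter = (self._counter + 1) % (1 << 24)
        return (
            struct.pack(">I", timestamp)   # assumed big-endian
            + self._process_random
            + counter.to_bytes(3, "big")   # assumed big-endian
        )


def split_object_id(oid):
    """Split a 12-byte id back into its three fields (hypothetical helper)."""
    return {
        "timestamp": struct.unpack(">I", oid[:4])[0],
        "process_random": oid[4:9],
        "counter": int.from_bytes(oid[9:], "big"),
    }
```

    Two ids generated in the same second by the same process differ only in the counter, so uniqueness across processes rests entirely on the 5 random bytes being different in each process.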

    I like these IDs: they're sortable-ish across machines, the 5 bytes of random "process id" should be enough to avoid collisions, they're fast to generate, and a single process can get about 16M of them per second (2^24 counter values) before they repeat.

    Read more


  2. Debugging 90s hangs during shutdown on Ubuntu 20.04

    I spent some time yesterday trying to understand why one of the machines here seemed to always take 90s before shutting down or rebooting. This article is a summary of the wild goose chase I embarked on to try and understand what was happening.

    When a systemd machine hangs during shutdown, it usually means one of two things: either a service is not stopping fast enough, or all services have been stopped and systemd is waiting for the DefaultTimeoutStopSec timeout to expire before killing the remaining processes.

    In my case, what I was seeing is the following:

    [ OK ] Finished Reboot.
    [ OK ] Reached target Reboot.
    [ 243.652848 ] systemd-shutdown[1]: Waiting for process: containerd-shim, containerd-shim, containerd-shim, fluent-bit
    

    These processes are part of k3s: they are either the parent processes of containers or processes running in the containers themselves. Clearly, it has something to do with k3s.

    Update: the systemd bugfix was backported to Ubuntu 20.04.

    Read more

  3. Adventures in Ansible land

    I've been meaning to look into Ansible for a long time now, but somehow never got around to it.

    SaltStack has been my automation/orchestration tool of choice since about 2013 and I can usually get stuff done with it without glancing at the docs too much.

    However, the security history of Salt isn't great: there have been a bunch of security issues in it, which got me thinking that maybe I should look at alternatives. (For example, this was a very good one... and there have been a bunch of authentication bypasses too, which does not inspire confidence.)

    Anyways, I started tinkering with Ansible to automate some stuff on my workstation.

    The following is a collection of short notes and weirdness I encountered so far, to save my future self some time when I encounter them again.

    Read more

  4. Installing OpenBSD on OVH's VPS 2016 KVM machines.

    I've been thinking about running OpenBSD again for a while now. Yesterday I had some inspiration and decided to try to boot it on OVH's VPS machines.

    OVH doesn't have the best reputation regarding availability and support (both of which I confirmed in the past...), but they are cheap and they have a datacenter in Beauharnois, Qc, that's less than 50km from home.

    They're only offering Linux distributions for their VPS SSD instances at the moment, but since the virtualization technology is KVM, booting the OpenBSD ramdisk kernel (bsd.rd) and doing the installation is all that is needed to get a working OpenBSD machine.

    Here's how I did it.

    Read more

  5. Needles and haystacks: Finding the one bad request among billions with tcpdump

    This week, we had a few weird crashes with an HTTP server which we could not easily reproduce, and we had a hard time pinpointing the source of the issue. We knew the problems were triggered by bad input, but since the process was continuously receiving around 3000 requests per second at the time, it was pretty hard to isolate the exact request(s) that made it crash.

    The idea we had was to capture HTTP request data up to the point where the process crashed. Then, we would open the trace and look at the last requests before the crash; the faulty one would be in there somewhere.

    Read more


  6. OSX: Where are my TIME_WAIT ?!

    While doing some packet drop testing for a pcap script I'm writing with a colleague at work, I hit a strange situation on my Mac where ab would do ~16k HTTP connections really fast, then stop, then time out.

    Turns out this is caused by the lack of available ephemeral port …

    Read more


