Recording sleep sounds - Roman Klimenko

Sleep audio recording apps have recently become very popular. I was also curious to hear what I say when I sleep and whether I snore.

Assuming your phone can record audio and you have a computer that can run a simple Python script, you don't have to buy anything or pay for a subscription to find the loud moments of your night's sleep.

First, we need to record audio while we sleep. In my case, I just started recording on my iPhone before I went to bed, and the next morning, I had an eight-hour M4A file.

Scrolling through such a long recording is impractical, so let's extract only the fragments that are louder than a given threshold.

Setting up a Python environment is outside the scope of this article. We still need to install pydub (a Python module to "manipulate audio with a simple and easy high-level interface") and create an empty Python script (e.g., sleep.py) to start coding.
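
One setup detail is worth checking before going further: pydub relies on an external converter (ffmpeg or avconv) to read and write compressed formats such as M4A and MP3. Below is a minimal sketch for checking that one of them is on your PATH. The helper name find_converter is hypothetical and not part of pydub.

```python
import shutil


def find_converter(candidates=("ffmpeg", "avconv")):
    """Return the path of the first converter found on PATH, or None.

    Hypothetical helper for illustration; not part of pydub.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


converter = find_converter()
if converter is None:
    print("No ffmpeg/avconv found on PATH; M4A and MP3 files will not load or export.")
else:
    print(f"Using converter: {converter}")
```

If nothing is found, install ffmpeg with your system's package manager before running the examples.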

The simplest example looks like this:

from pydub import AudioSegment
from pydub.silence import split_on_silence

audio = AudioSegment.from_file("sleep_audio.m4a", format="m4a")

chunks = split_on_silence(audio, silence_thresh=-54)

sum(chunks).export("loud_fragments.mp3", format="mp3")

Here, we load our audio file and split it on silence: split_on_silence returns the non-silent chunks, treating anything quieter than the given threshold in dBFS as silence (0 is the maximum volume, so -54 means 54 decibels below the maximum). In the last line, we join the chunks and export the result to an MP3 file.
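
To get a feel for how quiet -54 dBFS is, here is a small sketch that converts a dBFS level to a linear amplitude using the standard 20 * log10 relationship. The function names are my own, and the sample-value helper assumes signed 16-bit PCM audio.

```python
def dbfs_to_ratio(dbfs):
    """Convert a dBFS level to a linear amplitude ratio (1.0 = full scale)."""
    return 10 ** (dbfs / 20)


def dbfs_to_sample_value(dbfs, bit_depth=16):
    """Amplitude in sample units for a dBFS level, assuming signed PCM audio."""
    full_scale = 2 ** (bit_depth - 1)  # 32768 for 16-bit audio
    return full_scale * dbfs_to_ratio(dbfs)


# Fraction of full scale at the threshold used in this article
print(f"{dbfs_to_ratio(-54):.4f}")
# The same level expressed in 16-bit sample units
print(f"{dbfs_to_sample_value(-54):.1f}")
```

In other words, -54 dBFS is roughly 0.2% of full scale, so the threshold is low enough to catch fairly quiet sounds. If your recording is noisier, a higher value (closer to 0) may work better.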

Now, let's make the processing faster and fine-tune a few parameters to improve the outcome:

...

chunks = split_on_silence(
    audio,
    keep_silence=10_000,
    min_silence_len=10_000,
    silence_thresh=-54,
    seek_step=1_000)

...

keep_silence=10_000 - (in ms) leave some silence at the beginning and end of each chunk. This keeps the sound from being cut off abruptly.
min_silence_len=10_000 - (in ms) the minimum length of silence to be used for a split.
silence_thresh=-54 - (in dBFS) anything quieter than this is considered silence.
seek_step=1_000 - (in ms) the step size for iterating over the segment. By default, it iterates every millisecond, which makes processing an eight-hour recording very slow.
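
To see why seek_step matters so much, here is a rough back-of-the-envelope sketch. It assumes the detector evaluates one window of min_silence_len at every step, which is a simplification for illustration and not pydub's actual code. The function name window_checks is hypothetical.

```python
def window_checks(duration_ms, min_silence_len, seek_step):
    """Count candidate window start positions, one every seek_step ms.

    Simplified model for illustration; not taken from pydub.
    """
    last_start = duration_ms - min_silence_len
    if last_start < 0:
        return 0  # the recording is shorter than one window
    return len(range(0, last_start + 1, seek_step))


EIGHT_HOURS_MS = 8 * 60 * 60 * 1_000

for step in (1, 100, 1_000):
    checks = window_checks(EIGHT_HOURS_MS, min_silence_len=10_000, seek_step=step)
    print(f"seek_step={step:>5}: {checks:>10,} windows to check")
```

Under this model, going from a 1 ms step to a 1,000 ms step cuts the number of windows from about 28.8 million to about 28.8 thousand, at the cost of locating silence boundaries only to the nearest second.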

To distinguish one fragment from another, we can add a "beep" in between, preceded and followed by half a second of silence. We can also raise the volume of the audio chunks by adding a number of decibels, e.g., chunk + 3.

...

from pydub.generators import Square

...

silence = AudioSegment.silent(duration=500)
beep = Square(1_000).to_audio_segment(duration=250, volume=-54)
beep_with_silence = silence + beep + silence

loud_fragments = sum([(chunk + 3) + beep_with_silence for chunk in chunks])

loud_fragments.export("loud_fragments.mp3", format="mp3")
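
For a sense of what a 3 dB gain does, the sketch below converts a gain in decibels to linear ratios using the standard decibel formulas. The function names are my own and are not part of pydub.

```python
def gain_to_amplitude_ratio(gain_db):
    """Linear amplitude multiplier for a gain in dB (20 * log10 convention)."""
    return 10 ** (gain_db / 20)


def gain_to_power_ratio(gain_db):
    """Linear power multiplier for a gain in dB (10 * log10 convention)."""
    return 10 ** (gain_db / 10)


for gain_db in (3, 6, 10):
    print(
        f"+{gain_db} dB: amplitude x{gain_to_amplitude_ratio(gain_db):.2f}, "
        f"power x{gain_to_power_ratio(gain_db):.2f}"
    )
```

So + 3 multiplies the amplitude by about 1.41 and roughly doubles the power. Larger gains make quiet fragments easier to hear but can clip fragments that are already loud.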

The final version of my script looks like this:

import sys
from pathlib import Path

from pydub import AudioSegment
from pydub.generators import Square
from pydub.silence import split_on_silence

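GAIN = 3  # dB added to each chunk
KEEP_SILENCE = 10_000  # ms of silence kept around each chunk
MIN_SILENCE_LEN = 60_000  # ms of silence required for a split
SEEK_STEP = 100  # ms between silence checks
SILENCE_THRESH = -54  # dBFS; anything quieter counts as silence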


def main(file_path):

    # Load the whole recording into memory
    audio = AudioSegment.from_file(file_path, format="m4a")

    print(f"Loaded audio: {audio.duration_seconds} seconds")

    # Split on silence and keep only chunks longer than twice the padding
    chunks = [chunk for chunk in split_on_silence(audio,
                                                  keep_silence=KEEP_SILENCE,
                                                  min_silence_len=MIN_SILENCE_LEN,
                                                  seek_step=SEEK_STEP,
                                                  silence_thresh=SILENCE_THRESH)
              if chunk.duration_seconds > (KEEP_SILENCE * 2 / 1_000)]

    print(f"Found {len(chunks)} chunks")

    # Separator: a short 1 kHz beep with half a second of silence on each side
    silence = AudioSegment.silent(duration=500)
    beep = Square(1_000).to_audio_segment(duration=250, volume=SILENCE_THRESH)
    beep_with_silence = silence + beep + silence

    # Boost each chunk by GAIN dB and follow it with the separator
    loud_audio = sum([(chunk + GAIN) + beep_with_silence for chunk in chunks])

    print(f"Combined loud audio: {loud_audio.duration_seconds} seconds")

    # Write the result to ./data/<input name>_loud.mp3
    Path("./data").mkdir(parents=True, exist_ok=True)

    loud_audio.export(f"./data/{Path(file_path).stem}_loud.mp3", format="mp3")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python sleep.py <file_path>")
        sys.exit(1)
    main(sys.argv[1])

I can run it with python sleep.py myaudio.m4a, and it generates a shorter version of my sleep audio recording in the data folder. The result is usually 8-12 minutes long, extracted from eight hours of sleep.