PostgreSQL Trap: Arrays

A few months ago, we partnered with a library to make their staff multilingual. This was our first production user of our translation app.

This post covers the engineering lessons we learned from the first week of usage.

We had barely any analytics in the app. It’s an MVP and we wanted to focus on qualitative feedback rather than hiding behind dashboards. But we didn’t need analytics to know that they were using the app because my phone was buzzing like mad:

Alert emails from Opsgenie: "Disk is almost full", "Disk latency is growing", "CPU load is >80%", "Time spent in IO is too high", etc.

Thank goodness I set up production alerts!

Our servers were apparently on fire, but how exactly? I hopped on the server and everything pointed to the database: htop said that the postgres process was using lots of CPU and performing lots of disk I/O, and /var/lib/postgres was 27 GiB.

The server’s disk was full!

Graph of disk space used, I/O utilization, and CPU utilization. Sudden jump from 0% CPU to 60-90%; 0% I/O to 100% I/O; and 10 GiB usage to 27 GiB usage.

Server health graphs on Grafana showing problems across multiple metrics

Checking the application’s logs showed no database errors, but because the disk was full, I couldn’t be sure that the logs were complete.

Fixing Production

First, I made sure we weren’t being DDoSed. I looked at the app’s admin panel and didn’t see anything suspicious. There was usage of the app, but nothing crazy.

A server with 0 bytes of free disk space isn’t good, so I looked for a quick way to reclaim some disk space. Unfortunately, SSH-ing into the server took a while! But that gave me time to do some research and planning.

Once logged in, I ran journalctl --vacuum-size=20M to clean up old log files for some systemd services. This gave me a couple dozen megabytes to work with. After making some room, the PostgreSQL database settled down and stopped hogging the CPU.

I looked up cheap ways to reduce disk usage in the database. I’m new to PostgreSQL, so I wasn’t familiar with the VACUUM command. I tried it out, and sure enough, there was a lot of stuff that needed to be cleaned up in the client_upload table.

First, I ran a normal VACUUM. The command took a while (at least 20 minutes). Disk usage went down a little bit, but the database was still consuming 27 GiB. I then ran a VACUUM (FULL) and the database’s disk usage came crashing down. Hurray!
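For reference, the investigation boiled down to a few commands like these. This is a sketch, not a transcript: client_upload is our table name, and VACUUM (FULL) takes an exclusive lock, so be careful running it against production.

```sql
-- Which relations consume the most disk (including TOAST and indexes)?
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 5;

-- Reclaim dead-tuple space for reuse (runs without an exclusive lock):
VACUUM (VERBOSE) client_upload;

-- Rewrite the table and return the space to the OS (exclusive lock!):
VACUUM (FULL) client_upload;
```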

Disk space used went from 10 GiB up to 28 GiB then down to 3 GiB.

PostgreSQL vacuum saved the day

Lesson Learned

Production monitoring paid off immediately. We chose Opsgenie because of its free plan with SMS notifications. I also configured a custom buzz sound for Opsgenie notifications, which helped me stay on top of production issues.

Disk space is a shared resource. Put the database on a separate partition so you can be sure that application logs are available. (We still need to do this in our production environment…)

SSH-ing into a machine with no disk space can suck. In case SSH hangs the next time we need to log in, we created a local rescue account that we can log into via the serial interface.

Root Cause #1

Why did our server explode when we put the app in the hands of the customer? I first dug into the data in the PostgreSQL database.

One row had 1.29 GiB of data.

Yeah, a gigabyte in one row is fishy, but that doesn’t explain why the database was consuming over 27 GiB of space. But let’s figure out why the row had so much data anyway.
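(If you’re wondering how I spotted that row: pg_column_size reports a value’s size as stored on disk, so a query along these lines, written against the recording schema introduced later in this post, surfaces the outliers.)

```sql
-- Find the rows whose uploaded_pieces array takes the most space:
SELECT id,
       pg_size_pretty(pg_column_size(uploaded_pieces)::bigint) AS size
FROM recording
ORDER BY pg_column_size(uploaded_pieces) DESC
LIMIT 5;
```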

Our app is a voice translator app. When a user speaks into the app, the recording is sent to our server for processing. For quality assurance, we also store these recordings in our database.

The fishy 1.29 GiB row had about 9 hours of uncompressed audio data. That many hours is not reasonable. We only save data while the user is recording, and there’s no way someone intentionally held down the record button for 9 hours.

What could keep the app recording for 9 hours? Maybe the user put a book on the iPad? I doubt it. It’s probably a stupid bug in our software.

I’ll cut to the chase and just tell you what the bug was. If the user taps the record button with multiple fingers, DragGesture’s onEnded callback is never called, even after all fingers are released:

Code sample showing SwiftUI DragGesture’s onEnded callback

How to use SwiftUI incorrectly, apparently

Because of this quirk in SwiftUI, our app got into an inconsistent state and was continuously recording (for 9 hours).

Fixing this bug was easy:

  • Instead of listening for touches on the record button directly, create an overlay view that listens for touches.
  • Write a custom UIViewRepresentable class which serves as the overlay view.
  • Write a custom UIView class which shows nothing. (Don’t forget to implement an NSCoder initializer for deserialization!)
  • Create a custom UIGestureRecognizer class which keeps track of touches. (Don’t forget to implement all the optional callbacks too, especially touchesCancelled!)
  • Connect your UIGestureRecognizer to your UIView.
  • Plumb callbacks through all the layers so that your gesture recognizer can cleanly communicate touches beginning and ending to your SwiftUI code (like DragGesture was supposed to do).

Maybe I should have just gone with React Native instead of SwiftUI. 🙄

Anyway, after fixing the client and fighting Apple’s app review process, these super long 1.29 GiB recordings never happened again.

Lessons Learned

Don’t trust SwiftUI’s innocent-looking examples that seem to do exactly what you need.

SwiftUI is incomplete. You will eventually need to dip into ‘legacy’ UIKit.

Root Cause #2

We figured out what caused 1.29 GiB of data to be logged to PostgreSQL. But how did 1.29 GiB turn into 27 GiB?

Let’s discuss how exactly recordings are stored in the database. Here’s what our schema looks like for recordings:

CREATE TABLE recording (
  id UUID DEFAULT gen_random_uuid(),
  uploaded_pieces bytea[]
);

uploaded_pieces is a column which stores an array of byte arrays. When the server receives data from the client, it stores the data in the database by appending the received chunk to the uploaded_pieces column using the || operator:

-- First, we create an empty row
-- when the client begins the upload.
INSERT INTO recording (uploaded_pieces)
VALUES (array[]::bytea[]);

-- Each time the server receives a chunk
-- from the client, it stores the data:
UPDATE recording
SET uploaded_pieces
  = uploaded_pieces || '\x123456'::bytea
WHERE id = ...;

-- Second chunk:
UPDATE recording
SET uploaded_pieces
  = uploaded_pieces || '\x7890'::bytea
WHERE id = ...;

-- Third chunk:
UPDATE recording
SET uploaded_pieces
  = uploaded_pieces || '\xabcdef'::bytea
WHERE id = ...;

-- etc.

When designing this schema, I considered how PostgreSQL would represent this data on disk. After reading up on PostgreSQL tuples, TOAST, and HOT, I had a decent understanding of how arrays worked. Each row pointed to the uploaded_pieces array which was likely stored on the heap, and the array pointed to different bytea pieces also on the heap. My mental model of a row in the recording table looked like the following diagram:

Graphical representation of a PostgreSQL row containing an array of byte arrays.

How bytea[] worked in my head

When appending to uploaded_pieces, PostgreSQL cannot modify the array in place. Instead, it creates a new array. PostgreSQL leaves the old array unmodified. When the transaction commits, the old array will not be referenced anymore, and it will eventually be vacuumed (aka garbage collected).

Graphical representation of a PostgreSQL row containing an array of byte arrays.

How bytea[] appending worked in my head

Creating new arrays by appending seems to be a problem because it leaves around old arrays. Those old arrays need to be vacuumed, but because vacuuming happens after a time delay, PostgreSQL does not vacuum immediately after committing the append. My hypothesis is that this accumulation of arrays is what caused 1.29 GiB to grow to 27 GiB.
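This hypothesis is easy to check without guessing: PostgreSQL’s statistics views count dead tuples (old row versions awaiting vacuum) per table. A sketch:

```sql
-- How many old row versions are waiting to be vacuumed?
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'recording';
```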

Testing my Theory

To make sure this really was the problem, I created a test program that performed many array appends with a single row. Each bytea is 1024 bytes. Every 1,000 appends, we perform a VACUUM FULL. Here’s what disk usage looked like during the test:
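The test program itself isn’t shown here, but each iteration boils down to SQL like the following sketch (decode with repeat is just a convenient way to manufacture a 1024-byte piece):

```sql
-- Append one 1024-byte piece to the single test row:
UPDATE recording
SET uploaded_pieces
  = uploaded_pieces || decode(repeat('00', 1024), 'hex')
WHERE id = ...;

-- Track disk usage as the test runs (includes TOAST data):
SELECT pg_size_pretty(pg_total_relation_size('recording'));

-- Every 1,000 appends:
VACUUM FULL recording;
```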

Graph of disk usage while appending 1024-byte pieces to a bytea[]. 19,182.5 MiB usage at 20,000 appends before vacuum; 20.5 MiB after vacuum.

Disk usage increases as we append and decreases every 1,000 appends due to explicit vacuuming

After 20,000 appends, the database consumed 18.7 GiB of storage. The final vacuum brought that down to 20.5 MiB, cleaning up 99.9% of the usage (a nearly 1,000× reduction). Wow!

Why is there so much garbage? Let’s estimate how much there is, focusing on the last part of the graph between 19,000 and 20,000 appends.

After the vacuum at 19,000 appends, the database consumed only 19.43 MiB of disk space. The next 1,000 appends (without vacuuming) grew disk usage by 18.7 GiB. Basically all of that is vacuumable garbage, so the 1,000 discarded array versions account for the 18.7 GiB: each one consumes 18.7 GiB ÷ 1,000 ≈ 19.1 MiB. Hmm, that number looks interesting; it’s close to how big the database was before appending…

The Light Bulb Moment 💡

The dots connected in my brain and I figured out a flaw in my mental model. My mental model did not consider that the byteas could become garbage. To test this, I tweaked my test program to append 1 MiB byteas instead of 1 KiB byteas. (No vacuuming this time. I wanted to focus on the garbage accumulation.) The size of each bytea should now dwarf the size of the array:

Graph of disk usage while appending 1 megabyte pieces to a []bytea. 10.4 MiB increase in disk usage when growing the array from 9 items to 10 items.

Analyzing fewer appends with bigger pieces

The chart shows a quadratic trend, not a linear one. If disk usage were linear (which is what we want), we would expect append #10 to increase disk usage by only 1 MiB, but in reality appending to a 9-element array consumed 10.4 MiB (57.1 MiB − 46.7 MiB).

In fact, cumulative usage follows the triangular numbers: after the first append the database had consumed 1 MiB, after the second 1 + 2 = 3 MiB, after the third 1 + 2 + 3 = 6 MiB, and so on. In general, after n appends of p MiB each, disk usage is roughly p × n(n+1) ÷ 2.

Why is consumption quadratic in the size of the byteas? It’s as if all the byteas are copied whenever we create a new array:

Graphical representation of a PostgreSQL row containing an array of byte arrays.

Appending to a bytea[] copies each individual bytea in addition to copying the array itself

My original intuition about PostgreSQL’s tuple storage was wrong: a bytea[] is stored as a single contiguous value, not as an array of references to independently stored byteas. When the array is updated, PostgreSQL writes a whole new copy of that value, byteas included; there is no copy-on-write.

Lessons Learned

Modifying big rows in PostgreSQL is disk intensive. The database will copy the row entirely, including “heap” data, without reusing the old row’s storage.

Application developers need to keep in mind how their application impacts PostgreSQL’s vacuum system. Vacuuming is not as magical as the garbage collectors in C#, Go, and JavaScript.

Fixing the Bug

How do we prevent excessive bytea copying in our database? I tried various tuning parameters, such as fillfactor, but nothing seemed to work.
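(For the curious, fillfactor is one of those per-table knobs; setting it looks like the sketch below. It reserves free space in each heap page so updated rows can stay on the same page, but that can’t help here: each new array version is far too large to fit in a page, so it lands in TOAST storage regardless.)

```sql
-- Keep 50% of each heap page free for future row versions:
ALTER TABLE recording SET (fillfactor = 50);
-- Rewrite the table so existing pages honor the new setting:
VACUUM (FULL) recording;
```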

The real solution is to change the table’s design. Instead of appending into an array, insert a new row. In my case, this necessitated a new table:

CREATE TABLE recording (
  id UUID DEFAULT gen_random_uuid()
);
CREATE TABLE recording_uploaded_piece (
  recording_id UUID NOT NULL,
  order_index INT NOT NULL,
  uploaded_piece bytea NOT NULL,
  PRIMARY KEY (recording_id, order_index)
);

When the client uploads data to the server, we store the data in the database by simply inserting a new row:

-- Each time the server receives a chunk
-- from the client, it stores the data:
INSERT INTO recording_uploaded_piece
(order_index, uploaded_piece, recording_id)
VALUES (1, '\x123456'::bytea, ...);

-- Second chunk:
INSERT INTO recording_uploaded_piece
(order_index, uploaded_piece, recording_id)
VALUES (2, '\x7890'::bytea, ...);

-- Third chunk:
INSERT INTO recording_uploaded_piece
(order_index, uploaded_piece, recording_id)
VALUES (3, '\xabcdef'::bytea, ...); 

-- etc.

When inserting, we need to keep track of an order_index. Without it, pieces can get out of order; we wouldn’t be able to tell which piece was first and which was last. order_index is later used when loading pieces from the database to reconstruct the original recording:

SELECT uploaded_piece
FROM recording_uploaded_piece
WHERE recording_id = ...
ORDER BY order_index ASC;
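As an aside, if the application doesn’t need the pieces individually, the server can also reassemble the recording in a single query; string_agg has a bytea variant that concatenates values with a (here empty) delimiter:

```sql
-- Concatenate all pieces, in order, into one bytea:
SELECT string_agg(uploaded_piece, ''::bytea ORDER BY order_index)
FROM recording_uploaded_piece
WHERE recording_id = ...;
```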

How well does this solution perform? I ran it through my test program:

Graph of disk usage while appending 1024-byte pieces to a table using INSERT. 23.1 MiB usage at 20,000 appends.

With INSERT, no garbage is generated, so no vacuuming is necessary

No crazy disk usage. Hurray! Problem fixed.

Lessons Learned

Normalized data can outperform de-normalized data. I always thought that de-normalized data was the way to go for performance (except that data duplication can cause problems), but apparently not!

Inserting into a database is what PostgreSQL is good at. Prefer inserts over updates.

Final Thoughts

When I was originally planning the design for the recording feature, I researched pitfalls of PostgreSQL arrays. I couldn’t find anyone complaining about arrays, so I tried to understand how they work from PostgreSQL’s documentation just to be sure. Despite my due diligence, I still fell into a PostgreSQL performance trap.

I wrote this article to answer the question: “Do PostgreSQL arrays have any downsides? Will they explode with simple usage patterns?”

The answer is yes. Someone experienced in PostgreSQL probably knew the answer already, but a professional newbie like me obviously didn’t, even after some research!

Or perhaps my research method was wrong. Maybe I should have asked ChatGPT. At first, I thought it gave me a pretty decent answer:

Question: Are there any performance downsides to frequently appending to a bytea[] in PostgreSQL?

Answer: Yes, there are performance downsides to frequently appending to a bytea field in PostgreSQL. Here are the main concerns:

  • Data Size and Rewrites: Each time you append data to a bytea column, PostgreSQL has to rewrite the entire row to accommodate the new data. Since PostgreSQL handles updates by creating a new version of the row and marking the old version for deletion, appending frequently can lead to a lot of write operations and increased disk usage.
  • Wasted Space and VACUUM: Due to the nature of PostgreSQL’s Multi-Version Concurrency Control (MVCC), frequent updates to the same bytea field can result in “dead tuples” (old, now-unused row versions). This can increase the size of your table on disk until a VACUUM operation is performed to clean up these dead tuples.

Actually, no, ChatGPT 4o completely misunderstood my request. It thought I was talking about just a bytea, not an array of byteas. 🤦‍♀️ Later in its response, it even suggested I use jsonb instead of bytea[]. Yikes!

This post was written without the assistance of LLMs.

Update January 8, 2025: Replaced instances of []bytea with bytea[]. (Too much Go programming…)
