This post covers the engineering lessons we learned from the first week of usage.
We had barely any analytics in the app. It’s an MVP, and we wanted to focus on qualitative feedback rather than hiding behind dashboards. But we didn’t need analytics to know that customers were using the app, because my phone was buzzing like mad:

Alert emails from Opsgenie: “Disk latency is growing”, “CPU load is >80%”, “Time spent in IO is too high”
Thank goodness I set up production alerts!
Our servers were apparently on fire, but how exactly? I hopped on the server and everything pointed to the database: htop said that the postgres process was using lots of CPU and performing lots of disk I/O, and /var/lib/postgres was 27 GiB.
The server’s disk was full!

Server health graphs on Grafana showing problems across multiple metrics
Checking the application’s logs showed no database errors, but because the disk was full, I couldn’t be sure that the logs were complete.
Fixing Production
First, I made sure we weren’t being DDoSed. I looked at the app’s admin panel and didn’t see anything suspicious. There was usage of the app, but nothing crazy.
A server with 0 bytes of free disk space isn’t good, so I looked for a quick way to reclaim some disk space. Unfortunately, SSH-ing into the server took a while! But that gave me time to do some research and planning.
Once logged in, I ran journalctl --vacuum-size=20M to clean up old log files for some systemd services. This gave me a couple dozen megabytes to work with. After making some room, the PostgreSQL database settled down and stopped hogging the CPU.
I looked up cheap ways to reduce disk usage in the database. I’m new to PostgreSQL, so I wasn’t familiar with the VACUUM command. I tried it out, and sure enough, there was a lot of stuff that needed to be cleaned up in the client_upload table.
First, I ran a normal VACUUM. The command took a while (at least 20 minutes). Disk usage went down a little bit, but the database was still consuming 27 GiB. I then ran a VACUUM (FULL) and the database’s disk usage came crashing down. Hurray!

PostgreSQL vacuum saved the day
Lesson Learned
Production monitoring paid off immediately. We chose Opsgenie because of its free plan with SMS notifications. I also configured a custom buzz sound for notifications from Opsgenie which helped me keep on top of production issues.
Disk space is a shared resource. Put the database on a separate partition so you can be sure that application logs are available. (We still need to do this in our production environment…)
SSH-ing into a machine with no disk space can suck. In case SSH hangs the next time we need to log in, we created a local rescue account that we can log into via the serial interface.
Root Cause #1
Why did our server explode when we put the app in the hands of the customer? I first dug into the data in the PostgreSQL database.
One row had 1.29 GiB of data.
Yeah, a gigabyte in one row is fishy, but that doesn’t explain why the database was consuming over 27 GiB of space. But let’s figure out why the row had so much data anyway.
Our app is a voice translator app. When a user speaks into the app, the recording is sent to our server for processing. For quality assurance, we also store these recordings in our database.
The fishy 1.29 GiB row had about 9 hours of uncompressed audio data. That many hours is not reasonable. We only save data while the user is recording, and there’s no way someone intentionally held down the record button for 9 hours.
What could keep the app recording for 9 hours? Maybe the user put a book on the iPad? I doubt it. It’s probably a stupid bug in our software.
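As a sanity check on that 1.29 GiB figure, here’s a back-of-the-envelope estimate. The audio format here is my assumption (the post doesn’t state the recordings’ sample rate or bit depth), but roughly 16-bit mono PCM at 22.05 kHz lands in the right ballpark:

```python
# Rough estimate of 9 hours of uncompressed audio.
# Assumption: ~16-bit mono PCM at 22.05 kHz (the actual format is unknown).
SAMPLE_RATE_HZ = 22_050
BYTES_PER_SAMPLE = 2
HOURS = 9

total_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * HOURS * 3600
print(f"{total_bytes / 2**30:.2f} GiB")  # ~1.33 GiB, close to the 1.29 GiB row
```

So a single runaway recording session really can produce a gigabyte-scale row without any compression in the pipeline.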
I’ll cut to the chase and just tell you what the bug was. If the user taps the record button with multiple fingers, `DragGesture`’s onEnded callback is never called, even if all fingers were released:

How to use SwiftUI incorrectly, apparently
Because of this quirk in SwiftUI, our app got into an inconsistent state and was continuously recording (for 9 hours).
Fixing this bug was easy:
- Instead of listening for touches on the record button directly, create an overlay view that listens for touches.
- Write a custom `UIViewRepresentable` class which serves as the overlay view.
- Write a custom `UIView` class which shows nothing. (Don’t forget to implement an `NSCoder` initializer for deserialization!)
- Create a custom `UIGestureRecognizer` class which keeps track of touches. (Don’t forget to implement all the optional callbacks too, especially `touchesCancelled`!)
- Connect your `UIGestureRecognizer` to your `UIView`.
- Plumb callbacks through all the layers so that your gesture recognizer can cleanly communicate touches beginning and ending to your SwiftUI code (like `DragGesture` was supposed to do).
Maybe I should have just gone with React Native instead of SwiftUI. 🙄
Anyway, after fixing the client and fighting Apple’s app review process, these super long 1.29 GiB recordings never happened again.
Lessons Learned
Don’t trust SwiftUI’s innocent-looking examples that seem to do exactly what you need.
SwiftUI is incomplete. You will eventually need to dip into ‘legacy’ UIKit.
Root Cause #2
We figured out what caused 1.29 GiB of data to be logged to PostgreSQL. But how did 1.29 GiB turn into 27 GiB?
Let’s discuss how exactly recordings are stored in the database. Here’s what our schema looks like for recordings:
```sql
CREATE TABLE recording (
  id UUID DEFAULT gen_random_uuid(),
  uploaded_pieces bytea[]
);
```
uploaded_pieces is a column which stores an array of byte arrays. When the server receives data from the client, it stores the data in the database by appending the received chunk to the uploaded_pieces column using the || operator:
```sql
-- First, we create an empty row
-- when the client begins the upload.
INSERT INTO recording (uploaded_pieces)
VALUES (array[]::bytea[]);

-- Each time the server receives a chunk
-- from the client, it stores the data:
UPDATE recording
SET uploaded_pieces = uploaded_pieces || '\x123456'::bytea
WHERE id = ...;

-- Second chunk:
UPDATE recording
SET uploaded_pieces = uploaded_pieces || '\x7890'::bytea
WHERE id = ...;

-- Third chunk:
UPDATE recording
SET uploaded_pieces = uploaded_pieces || '\xabcdef'::bytea
WHERE id = ...;

-- etc.
```
When designing this schema, I considered how PostgreSQL would represent this data on disk. After reading up on PostgreSQL tuples, TOAST, and HOT, I had a decent understanding of how arrays worked. Each row pointed to the uploaded_pieces array which was likely stored on the heap, and the array pointed to different bytea pieces also on the heap. My mental model of a row in the recording table looked like the following diagram:

How bytea[] worked in my head
When the server appends to `uploaded_pieces`, PostgreSQL cannot modify the array in place. Instead, it creates a new array and leaves the old one unmodified. When the transaction commits, the old array is no longer referenced, and it will eventually be vacuumed (i.e., garbage collected).
How bytea[] appending worked in my head
Testing my Theory
To make sure this really was the problem, I created a test program that performed many array appends on a single row. Each `bytea` is 1,024 bytes, and every 1,000 appends the program runs a `VACUUM FULL`. Here’s what disk usage looked like during the test:
![Graph of disk usage while appending 1024-byte pieces to a []bytea. 19,182.5 MiB usage at 20,000 appends before vacuum; 20.5 MiB after vacuum.](https://traduality.com/wp-content/uploads/2024/11/Screenshot-2024-11-02-at-00.09.21-1024x581.png)
After 20,000 appends, the database consumed 18.7 GiB of storage. Vacuuming at the end of the test brought that down to 20.5 MiB, a nearly 1,000× reduction that cleaned up 99.9% of the usage. Wow!
Why is there so much garbage? Let’s estimate how much there is, focusing on the last part of the graph, between 19,000 and 20,000 appends.
After the vacuum at 19,000 appends, the database consumed only 19.43 MiB of disk space. The next 1,000 appends (without vacuuming) increased disk usage by 18.7 GiB. Essentially all of that is vacuumable garbage, so we have 1,000 dead arrays consuming 18.7 GiB, or 18.7 GiB ÷ 1,000 ≈ 19.1 MiB per array. Hmm, that number looks interesting; it’s close to how big the database was before appending…
The Light Bulb Moment 💡
The dots connected in my brain and I figured out a flaw in my mental model. My mental model did not consider that the byteas could become garbage. To test this, I tweaked my test program to append 1 MiB byteas instead of 1 KiB byteas. (No vacuuming this time. I wanted to focus on the garbage accumulation.) The size of each bytea should now dwarf the size of the array:
![Graph of disk usage while appending 1 megabyte pieces to a []bytea. 10.4 MiB increase in disk usage when growing the array from 9 items to 10 items.](https://traduality.com/wp-content/uploads/2024/11/Screenshot-2024-11-01-at-23.56.32-1024x591.png)
Analyzing fewer appends with bigger pieces
If disk usage were linear—which is what we want—we would expect append #10 to increase disk usage by only 1 MiB, but in reality appending to the 9-element array consumed 10.4 MiB (57.1 MiB – 46.7 MiB).
In other words, after the first append total consumption was 0 + 1 = 1 MiB, after the second it was 0 + 1 + 2 = 3 MiB, after the third it was 0 + 1 + 2 + 3 = 6 MiB, and so on.
Why is consumption quadratic in the size of the byteas? It’s as if all the byteas are copied whenever we create a new array:

Appending to a bytea[] copies each individual bytea in addition to copying the array itself
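The difference between my two mental models can be sketched in a few lines of Python. This is a toy model, not real PostgreSQL behavior: “shallow” assumes each append only rewrites the array of pointers (what I expected), while “deep” assumes every existing `bytea` is copied too (what the measurements show):

```python
PIECE_MIB = 1  # each appended bytea is 1 MiB, as in the second test

def garbage_after(appends: int, deep: bool) -> int:
    """MiB of dead (vacuumable) data accumulated after `appends` appends."""
    garbage = 0
    for n in range(1, appends + 1):
        # Append #n replaces an (n-1)-element array; the old version dies.
        if deep:
            garbage += (n - 1) * PIECE_MIB  # old byteas are copied, then orphaned
        # Shallow model: only the pointer array dies, ~0 MiB of garbage.
    return garbage

print(garbage_after(10, deep=False))  # 0 MiB  (what I expected)
print(garbage_after(10, deep=True))   # 45 MiB (quadratic, what I observed)
```

In the deep model, total consumption after 10 appends is the 10 MiB of live data plus 45 MiB of garbage: 55 MiB, the triangular-number growth from the arithmetic above.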
Lessons Learned
Modifying big rows in PostgreSQL is disk intensive. The database will copy the row entirely, including “heap” data, without reusing the old row’s storage.
Application developers need to keep in mind how their application impacts PostgreSQL’s vacuum system. Vacuuming is not as magical as the garbage collectors in C#, Go, and JavaScript.
Fixing the Bug
How do we prevent excessive bytea copying in our database? I tried various tuning parameters, such as fillfactor, but nothing seemed to work.
The real solution is to change the table’s design. Instead of appending into an array, insert a new row. In my case, this necessitated a new table:
```sql
CREATE TABLE recording (
  id UUID DEFAULT gen_random_uuid()
);

CREATE TABLE recording_uploaded_piece (
  recording_id UUID NOT NULL,
  order_index INT NOT NULL,
  uploaded_piece bytea NOT NULL,
  PRIMARY KEY (recording_id, order_index)
);
```
When the client uploads data to the server, we store the data in the database by simply inserting a new row:
```sql
-- Each time the server receives a chunk
-- from the client, it stores the data:
INSERT INTO recording_uploaded_piece (order_index, uploaded_piece, recording_id)
VALUES (1, '\x123456'::bytea, ...);

-- Second chunk:
INSERT INTO recording_uploaded_piece (order_index, uploaded_piece, recording_id)
VALUES (2, '\x7890'::bytea, ...);

-- Third chunk:
INSERT INTO recording_uploaded_piece (order_index, uploaded_piece, recording_id)
VALUES (3, '\xabcdef'::bytea, ...);

-- etc.
```
When inserting, we need to keep track of an order_index. Without it, pieces can get out of order; we wouldn’t be able to tell which piece was first and which was last. order_index is later used when loading pieces from the database to reconstruct the original recording:
```sql
SELECT uploaded_piece
FROM recording_uploaded_piece
WHERE recording_id = ...
ORDER BY order_index ASC;
```
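On the application side, reconstruction is just a sorted concatenation. Here’s a hypothetical sketch where the tuples stand in for the query’s result set:

```python
# Hypothetical rows fetched for one recording_id, as
# (order_index, uploaded_piece) tuples. Without an ORDER BY they could
# arrive in any order, so we sort by order_index before joining.
rows = [
    (3, b"\xab\xcd\xef"),
    (1, b"\x12\x34\x56"),
    (2, b"\x78\x90"),
]

recording = b"".join(piece for _, piece in sorted(rows))
print(recording.hex())  # 1234567890abcdef
```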
How well does this solution perform? I ran it through my test program:

With INSERT, no garbage is generated, so no vacuuming is necessary
No crazy disk usage. Hurray! Problem fixed.
Lessons Learned
Normalized data can outperform de-normalized data. I always thought that de-normalized data was the way to go for performance (except that data duplication can cause problems), but apparently not!
Inserting into a database is what PostgreSQL is good at. Prefer inserts over updates.
Final Thoughts
When I was originally planning the design for the recording feature, I researched pitfalls of PostgreSQL arrays. I couldn’t find anyone complaining about arrays, so I tried to understand how they work from PostgreSQL’s documentation just to be sure. Despite my due diligence, I still fell into a PostgreSQL performance trap.
I wrote this article to answer the question: “Do PostgreSQL arrays have any downsides? Will they explode with simple usage patterns?”
The answer is yes. Someone experienced in PostgreSQL probably knew the answer already, but a professional newbie like me obviously didn’t, even after some research!
Or perhaps my research method was wrong. Maybe I should have asked ChatGPT. I thought it gave me a pretty decent answer:
Question: Are there any performance downsides to frequently appending to a `bytea[]` in PostgreSQL?
Answer: Yes, there are performance downsides to frequently appending to a `bytea` field in PostgreSQL. Here are the main concerns:
- Data Size and Rewrites: Each time you append data to a `bytea` column, PostgreSQL has to rewrite the entire row to accommodate the new data. Since PostgreSQL handles updates by creating a new version of the row and marking the old version for deletion, appending frequently can lead to a lot of write operations and increased disk usage.
- Wasted Space and VACUUM: Due to the nature of PostgreSQL’s Multi-Version Concurrency Control (MVCC), frequent updates to the same `bytea` field can result in “dead tuples” (old, now-unused row versions). This can increase the size of your table on disk until a `VACUUM` operation is performed to clean up these dead tuples.
Actually, no, ChatGPT 4o completely misunderstood my request. It thought I was talking about just a `bytea`, not an array of `bytea`s. 🤦‍♀️ Later in its response, it even suggested I use `jsonb` instead of `bytea[]`. Yikes!
This post was written without the assistance of LLMs.
Update January 8, 2025: Replaced instances of []bytea with bytea[]. (Too much Go programming…)



