the problem
select length('𝓦𝓮𝓲𝓻𝓭 𝓒𝓱𝓪𝓻𝓪𝓬𝓽𝓮𝓻');
returns:
7
this is not incorrect. length() counts characters, and those unicode glyphs are single code points.
the problem shows up later.
you validate with length(), but the database enforces limits in bytes.
what actually breaks
example:
varchar(10)
your app logic:
length(text) <= 10→ passes
postgres storage:
- utf-8 bytes > 10 → insert FAILS
result: validation passes but error on insert.
the fix
validate using bytes when storage limits matter:
select octet_length(text) <= 10
mental model
length()→ semantic length (characters)octet_length()→ physical length (bytes on disk / wire)
variation
bit_length()→ bit length (bits)byte_length()→ byte length (bigquery)char_length()→ character length (characters)
read more: