Symptoms
Input contained a byte sequence that is not valid in the UTF8 encoding. PostgreSQL raises SQLSTATE 22021 (character_not_in_repertoire).
- A byte sequence isn’t valid UTF8.
- Common when loading data in a different encoding (e.g. Latin-1).
- The offending byte is shown in hex.
What the server log shows
ERROR: invalid byte sequence for encoding "UTF8": 0xe9
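The hex value in that message can be checked against likely source encodings. The following is a minimal Python sketch; the helper names (offending_bytes, interpretations) and the list of candidate encodings are illustrative assumptions, not part of PostgreSQL.

```python
import re


def offending_bytes(message):
    """Collect every 0xNN value listed in the error message into a bytes object."""
    return bytes(int(h, 16) for h in re.findall(r"0x([0-9a-fA-F]{2})", message))


def interpretations(raw, encodings=("latin-1", "cp1252")):
    """Decode the bytes under each candidate encoding (None if undefined there)."""
    result = {}
    for enc in encodings:
        try:
            result[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            result[enc] = None
    return result


message = 'ERROR: invalid byte sequence for encoding "UTF8": 0xe9'
raw = offending_bytes(message)
print(raw, interpretations(raw))
```

If one candidate yields a plausible character (here "é" for 0xe9), that encoding is a reasonable first guess for the source data, though a single byte is rarely conclusive.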
Why PostgreSQL raises this — what the manual says
Section 23.3.3 Automatic Character Set Conversion Between Server and Client:
“the server will still check that incoming data is valid for that encoding”
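A UTF8 database validates that incoming bytes form legal UTF8. In Latin-1, the single byte 0xe9 is "é". In UTF8, 0xe9 can only start a three-byte sequence and must be followed by two continuation bytes (0x80 to 0xbf). Latin-1 text almost never satisfies that, so PostgreSQL rejects the input with 22021.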
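The same check can be reproduced outside the database. This is a small Python sketch that uses Python's strict UTF-8 decoder as a stand-in for the server-side validation; it approximates the check and is not PostgreSQL's own code, and is_valid_utf8 is a hypothetical helper name.

```python
def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


latin1_bytes = "café".encode("latin-1")  # ends with the single byte 0xe9
utf8_bytes = "café".encode("utf-8")      # ends with the two bytes 0xc3 0xa9

print(is_valid_utf8(latin1_bytes))
print(is_valid_utf8(utf8_bytes))
```

The Latin-1 form fails because 0xe9 announces a three-byte sequence that never arrives; the UTF-8 form of the same text passes.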
Common causes
- Loading Latin-1/Windows-1252 data into a UTF8 database without conversion.
- client_encoding not matching the actual data encoding.
- Binary/garbage bytes in a text field.
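To tell these causes apart, it can help to locate the invalid bytes in the input before loading it. Below is a minimal Python sketch; find_invalid_utf8 and its limit parameter are hypothetical names chosen for illustration, and the function assumes the data fits in memory.

```python
def find_invalid_utf8(data, limit=10):
    """Return up to `limit` (offset, hex) pairs for invalid UTF-8 sequences."""
    problems = []
    pos = 0
    while pos < len(data) and len(problems) < limit:
        try:
            data[pos:].decode("utf-8")
            break  # the remainder is valid UTF-8
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are relative to the slice passed to decode()
            start = pos + exc.start
            end = pos + exc.end
            problems.append((start, data[start:end].hex()))
            pos = end  # resume scanning just past the bad bytes
    return problems


sample = b"caf\xe9 au lait"
for offset, hex_bytes in find_invalid_utf8(sample):
    print(f"offset {offset}: 0x{hex_bytes}")
```

A few isolated hits in the 0x80 to 0xff range that line up with accented letters suggest Latin-1 or Windows-1252 data; hits scattered through otherwise unreadable content point to binary data in a text field.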
How to fix it
- Set client_encoding to the source encoding so the server converts it (e.g. SET client_encoding = 'LATIN1';).
- Convert the file to UTF8 first (e.g. iconv -f latin1 -t utf8).
- Clean out invalid bytes before loading.
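The second and third fixes can also be done in a script. The following Python sketch assumes the source really is Latin-1 (swap in another codec name if not); to_utf8 and scrub_invalid are hypothetical helper names. Replacing invalid bytes with U+FFFD is one possible cleaning policy, not the only one.

```python
def to_utf8(data, source_encoding="latin-1"):
    """Re-encode bytes from a known source encoding to UTF-8 (like iconv)."""
    return data.decode(source_encoding).encode("utf-8")


def scrub_invalid(data):
    """Replace each invalid UTF-8 sequence with U+FFFD, keeping valid text."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


latin1_row = b"caf\xe9"
print(to_utf8(latin1_row))        # valid UTF-8, safe to load
print(scrub_invalid(latin1_row))  # loads, but the original character is lost
```

Converting from the correct source encoding preserves the text, so it is usually preferable; scrubbing discards information and is better kept for bytes that are genuinely garbage.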
Related & next steps
Reference: PostgreSQL 18 Section 23.3 “Character Set Support”.