Diagnostic Queries
Symptoms
Bytes being read into the database are not valid in the target/client encoding. PostgreSQL raises SQLSTATE 22021 (character_not_in_repertoire) and shows the offending byte.
- Common during COPY/import of files with the wrong declared encoding.
- The message shows the bad byte, e.g. 0x80.
- Mismatch between actual file bytes and client_encoding.
What the server log shows
ERROR: invalid byte sequence for encoding "UTF8": 0x80
Why PostgreSQL raises this — what the manual says
Section 23.3.3 Automatic Character Set Conversion Between Server and Client:
“the server will still check that incoming data is valid for that encoding”
PostgreSQL validates that incoming bytes form legal byte sequences in the declared client encoding before converting them to the database encoding. An illegal byte sequence (e.g. Latin-1 bytes labelled as UTF-8) cannot be decoded, so the statement fails with SQLSTATE 22021.
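The validation failure can be reproduced outside the database. The following is a minimal sketch that uses Python's built-in codecs as a stand-in for the server's check (it is not PostgreSQL's own validator, and `try_decode` is a hypothetical helper name). It shows that the byte 0x80 from the log line above is not a legal start of a UTF-8 sequence, while the same byte is the euro sign in Windows-1252.

```python
def try_decode(raw: bytes, encoding: str) -> str:
    """Return the decoded text, or a short description of the first failure."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        bad = raw[exc.start]  # first byte the decoder could not accept
        return f"invalid in {encoding}: byte 0x{bad:02x} at offset {exc.start}"


# "5 €" as it would be stored in a Windows-1252 file: b'5 \x80'
sample = "5 €".encode("cp1252")

print(try_decode(sample, "utf-8"))   # reports byte 0x80 as invalid
print(try_decode(sample, "cp1252"))  # decodes cleanly
```

The same bytes are valid or invalid depending only on the encoding label, which is why correcting the declared encoding (rather than the data) is often enough.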
Common causes
- Importing a Latin-1/Windows-1252 file as UTF-8.
- A wrong client_encoding for the data being sent.
- Mixed-encoding data in one file.
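To tell these causes apart, it can help to locate exactly which lines of the input are not valid UTF-8 before loading. The sketch below is one possible approach, not a PostgreSQL tool: `find_invalid_utf8_lines` is a hypothetical helper that accepts any iterable of byte lines (for example a file opened in binary mode) and reports the line number, byte offset, and offending byte of each failure. A few scattered hits suggest mixed-encoding data, while hits on every non-ASCII line suggest the whole file is in another encoding.

```python
from typing import Iterable, Iterator, Tuple


def find_invalid_utf8_lines(lines: Iterable[bytes]) -> Iterator[Tuple[int, int, int]]:
    """Yield (line_number, byte_offset, bad_byte) for each line that is not valid UTF-8."""
    for line_number, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first byte the decoder rejected
            yield line_number, exc.start, raw[exc.start]


# In-memory stand-in for a file; with a real file, pass open(path, "rb") instead.
rows = [b"id,price\n", "1,caf\u00e9\n".encode("utf-8"), b"2,5 \x80\n"]
for line_number, offset, bad in find_invalid_utf8_lines(rows):
    print(f"line {line_number}, offset {offset}: 0x{bad:02x}")
```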
How to fix it
- Set the correct client encoding: SET client_encoding = 'LATIN1'; (or convert the file to UTF-8 first).
- For COPY, specify the file’s encoding: COPY t FROM '…' WITH (ENCODING 'WIN1252');
- Clean/convert the source data with iconv before loading.
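Where iconv is not available, the conversion step can be approximated in a few lines. This is a minimal sketch under stated assumptions: the source is assumed to be entirely Windows-1252 (Python codec name `cp1252`), `transcode_to_utf8` is a hypothetical helper name, and decoding is strict, so bytes that Python's `cp1252` codec leaves undefined (such as 0x81) raise an error instead of being silently replaced.

```python
import codecs
import io
from typing import BinaryIO


def transcode_to_utf8(src: BinaryIO, dst: BinaryIO,
                      source_encoding: str = "cp1252",
                      chunk_size: int = 64 * 1024) -> None:
    """Stream src to dst, re-encoding from source_encoding to UTF-8."""
    # An incremental decoder handles multi-byte characters split across chunks.
    decoder = codecs.getincrementaldecoder(source_encoding)(errors="strict")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode("utf-8"))
    # Flush the decoder; raises if the input ended in the middle of a character.
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))


# In-memory demonstration; with real files, open both in binary mode.
source = io.BytesIO(b"5 \x80\n")
target = io.BytesIO()
transcode_to_utf8(source, target)
print(target.getvalue())
```

Strict decoding is a deliberate choice here: a failure during conversion points at data that does not match the assumed source encoding, which is easier to investigate before the load than after it.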
Related & next steps
Reference: PostgreSQL 18 Section 23.3 “Character Set Support”.