SQLSTATE 22021 ERROR Class 22: Data Exception

character_not_in_repertoire: invalid byte sequence for encoding "UTF8": 0x… (SQLSTATE 22021)

PostgreSQL error "invalid byte sequence for encoding "UTF8": 0x…" (SQLSTATE 22021): what it means, common causes, and how to fix it.

Applies to: PostgreSQL 12, 13, 14, 15, 16, 17, 18
Last reviewed: May 2025

Symptoms

Input contained a byte sequence that is not valid in the UTF8 encoding. PostgreSQL raises SQLSTATE 22021 (character_not_in_repertoire).

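  • The statement or load fails because the input contains a byte sequence that is not valid UTF8.
  • It is common when loading data that was written in a different encoding (e.g. Latin-1).
  • The error message shows the offending byte or bytes in hexadecimal (e.g. 0xe9).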

What the server log shows

ERROR:  invalid byte sequence for encoding "UTF8": 0xe9

Why PostgreSQL raises this — what the manual says

Section 23.3.3 Automatic Character Set Conversion Between Server and Client:

“the server will still check that incoming data is valid for that encoding”

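A UTF8 database checks that incoming bytes form legal UTF8 sequences. In Latin-1, the single byte 0xe9 is "é". In UTF8, 0xe9 is the lead byte of a three-byte sequence and must be followed by two continuation bytes in the range 0x80 to 0xbf. A Latin-1 "é" followed by ordinary text does not satisfy that rule, so PostgreSQL rejects the input with 22021.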
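The validity check can be illustrated outside the database. The following is a minimal sketch that uses Python's strict UTF-8 decoder as a stand-in for the server's validation; it is not PostgreSQL's own code, and the helper name is_valid_utf8 is hypothetical. It shows that the same word is accepted or rejected depending only on how it was encoded.

```python
# Illustration only: Python's strict UTF-8 decoder stands in for the
# server-side check. is_valid_utf8 is a hypothetical helper name.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


latin1_bytes = "café".encode("latin-1")  # "é" becomes the single byte 0xe9
utf8_bytes = "café".encode("utf-8")      # "é" becomes the two bytes 0xc3 0xa9

print(latin1_bytes.hex(" "), is_valid_utf8(latin1_bytes))
print(utf8_bytes.hex(" "), is_valid_utf8(utf8_bytes))
```

The Latin-1 bytes fail the check because 0xe9 announces a three-byte sequence that never arrives, while the UTF-8 bytes pass.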

Common causes

  • Loading Latin-1/Windows-1252 data into a UTF8 database without conversion.
  • client_encoding not matching the actual data encoding.
  • Binary/garbage bytes in a text field.
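To tell these causes apart, it can help to locate the offending bytes in the input before loading it. The sketch below is one possible approach, assuming the data fits in memory; find_invalid_utf8 is a hypothetical helper, and Python's decoder may group bytes slightly differently from the server's error message.

```python
from typing import List, Tuple


def find_invalid_utf8(data: bytes) -> List[Tuple[int, str]]:
    """Return (offset, hex) for each byte run that is not valid UTF-8."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the input is valid
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are relative to the slice we decoded
            start = pos + exc.start
            end = pos + exc.end
            problems.append((start, "0x" + data[start:end].hex()))
            pos = end  # resume scanning after the bad run
    return problems


sample = b"caf\xe9 cr\xe8me"  # Latin-1 bytes for "café crème"
print(find_invalid_utf8(sample))
```

A handful of isolated high bytes such as 0xe9 or 0xe8 in otherwise readable text usually points to a single-byte source encoding, whereas long runs of invalid bytes suggest binary data in a text field.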

How to fix it

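  1. Set client_encoding to the encoding the data is actually in, so the server converts it to UTF8 on input (e.g. SET client_encoding = 'LATIN1';). For a single file loaded with COPY, the ENCODING option serves the same purpose.
  2. Convert the file to UTF8 before loading it (e.g. iconv -f latin1 -t utf8).
  3. If the data contains binary or corrupt bytes that no text encoding explains, remove or replace them before loading.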
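Fixes 2 and 3 can also be done in a small script instead of iconv. This is a minimal sketch, assuming the source encoding is known and the data fits in memory; the function names are hypothetical.

```python
def transcode_to_utf8(data: bytes, source_encoding: str = "latin-1") -> bytes:
    """Fix 2: reinterpret bytes from a known source encoding as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def scrub_invalid_utf8(data: bytes) -> bytes:
    """Fix 3: replace each invalid run with U+FFFD. This loses information."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


raw = b"caf\xe9"  # Latin-1 bytes for "café"
print(transcode_to_utf8(raw))   # the "é" is preserved
print(scrub_invalid_utf8(raw))  # the "é" is replaced by U+FFFD
```

Transcoding is preferable whenever the source encoding is known, because scrubbing discards the original character. Note that Latin-1 maps every byte to a character, so decoding with the wrong guess never raises an error and silently yields wrong text. For Windows-1252 data, pass "cp1252" instead, since the two encodings differ in the range 0x80 to 0x9f.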

Related & next steps

Reference: PostgreSQL 18 Section 24.3 “Character Set Support”.
