Wednesday, February 4, 2026

Selecting the best code page and collation for migration from mainframe Db2 to Amazon RDS for Db2


When migrating from mainframe Db2 (z/OS) to Amazon Relational Database Service (Amazon RDS) for Db2 on Linux, deciding on the right code page and collation sequence is critical for ensuring data compatibility. The right choice helps prevent data truncation and character expansion issues, and maintains consistent display of foreign characters (such as accented Latin letters) and sorting behavior.

While Amazon RDS for Db2 defaults to Unicode, you can achieve mainframe compatibility through careful configuration. This post guides you through selecting the appropriate code page for Amazon RDS for Db2 to help ensure seamless data migration.

Code page, collation, and territory parameters in Amazon RDS for Db2 can't be changed after database creation. Careful selection upfront avoids the need for database recreation and remigration.

Understanding mainframe CCSID and code page

In Db2 for z/OS®, every character is mapped to a number depending on the CCSID (coded character set identifier) and code page associated with that character. A code page is a numeric identifier for a specific mapping of bytes; it is just the byte-to-glyph mapping. An example is code page 037, also known as Extended Binary Coded Decimal Interchange Code (EBCDIC) for the US and Canada. The CCSID is a full identifier describing the character set, encoding, and conversion rules; it includes the code page and locale along with conversion metadata. An example is CCSID 0037 (EBCDIC for the US and Canada, together with conversion rules). Every CCSID implies a code page, but a code page doesn't imply a complete CCSID. Db2 for z/OS performs character conversion internally and needs the CCSID to know exactly how to convert data.
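You can see the byte-to-glyph difference directly by comparing the stored bytes for the same character on each platform. The following is a minimal sketch, assuming a Db2 for z/OS subsystem with CCSID 37 and an ISO-8859-1 Db2 LUW database; SYSIBM.SYSDUMMY1 exists on both platforms:

-- On Db2 for z/OS (EBCDIC CCSID 37), the letter A is stored as X'C1'
SELECT HEX('A') FROM SYSIBM.SYSDUMMY1;

-- On an ISO-8859-1 (code page 819) Db2 LUW database, the same query returns 41
SELECT HEX('A') FROM SYSIBM.SYSDUMMY1;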

Regional code pages

North American and Western European code pages

  • IBM-37 (CCSID 37, CP037)
  • IBM-500 (CCSID 500, CP500)
  • IBM-1047 (CCSID 1047, CP1047)

German and Austrian code pages

  • IBM-273 (CCSID 273, CP273): Traditional German and Austrian EBCDIC
  • IBM-1141 (CCSID 1141, CP1141): German and Austrian EBCDIC with Euro symbol support

Japanese code pages

  • IBM-930 (CCSID 930): Japanese EBCDIC mixed SBCS and DBCS code page using Katakana SBCS and JIS X 0208 DBCS
  • IBM-939 (CCSID 939): Japanese EBCDIC mixed SBCS and DBCS code page using Latin (Roman) SBCS and JIS X 0208 DBCS
  • IBM-1390 (CCSID 1390): Japanese EBCDIC mixed SBCS/DBCS character CCSID with Euro (€) symbol support
  • IBM-1399 (CCSID 1399): Japanese EBCDIC DBCS-only graphic CCSID
  • IBM-5026 (CCSID 5026): Japanese EBCDIC Katakana SBCS code page with uppercase-only Latin letters and Katakana-oriented collation
  • IBM-5035 (CCSID 5035): Japanese EBCDIC DBCS-only graphic CCSID

These EBCDIC code pages encode character sets optimized for specific regions and languages. While many share similar Latin-1 character repertoires, they differ in the assignment of specific code points to enhance compatibility with local requirements and open systems. On IBM platforms, character CCSIDs (CHAR and VARCHAR) and graphic CCSIDs (GRAPHIC and VARGRAPHIC) are distinct. DBCS-only CCSIDs such as 1399 and 5035 are intended for graphic data and can't represent SBCS characters.
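To make the distinction concrete, here is a hedged z/OS DDL sketch; the table and column names are hypothetical, and the CCSIDs in the comments are examples:

-- Hypothetical z/OS table mixing character and graphic columns
CREATE TABLE JP_SAMPLE
  (NAME_CHAR    VARCHAR(30),      -- character data; uses the character CCSID (for example, 1390)
   NAME_GRAPHIC VARGRAPHIC(30));  -- graphic (DBCS) data; uses the graphic CCSID (for example, 1399)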

Regional code page details

North American and Western European:

  • CCSID 37 (CP037): Traditional EBCDIC for the U.S., Canada, the Netherlands, and Portugal. Optimized for legacy mainframe applications with standard symbol placements. Originated in the 1960s for System/360 mainframes.
  • CCSID 500 (CP500): International EBCDIC variant for Belgium, French Canada, and Switzerland. Features adjustments for local symbols (for example, currency symbols like CHF). Shares most code points with CCSID 37 but includes swaps for multilingual support.
  • CCSID 1047 (CP1047): Open Systems EBCDIC introduced in the 1990s for OS/390 USS (UNIX System Services). A variant of CCSID 37 with six special character swaps to align with POSIX/C standards.
  • CCSID 1140 (CP1140): Enhanced U.S. and Canada (CP037) that includes the Euro symbol (€).

German and Austrian:

  • CCSID 273 (CP273): Traditional German and Austrian EBCDIC encoding with German-specific characters such as ä, ö, ü, and ß. Widely used in German-speaking countries for legacy mainframe systems.
  • CCSID 1141 (CP1141): Enhanced German and Austrian EBCDIC that includes the Euro symbol (€). Introduced to support European monetary union requirements while maintaining German character support.

Japanese:

  • CCSID 930 (CP930): Japanese EBCDIC mixed SBCS and DBCS code page using Katakana SBCS and JIS X 0208 DBCS.
  • CCSID 939 (CP939): Japanese EBCDIC mixed SBCS and DBCS code page using Latin (Roman) SBCS and JIS X 0208 DBCS. It supports both SBCS (Katakana, Latin, and digits) and DBCS Kanji. It's very common on z/OS and older mainframe environments.
  • CCSID 1390 (CP1390): Japanese EBCDIC mixed SBCS and DBCS code page with Euro (€) symbol support.
  • CCSID 1399 (CP1399): Japanese EBCDIC DBCS graphic CCSID (no SBCS).
  • CCSID 5026 (CP5026): Japanese EBCDIC Katakana SBCS code page with uppercase-only Latin letters and a Katakana-oriented collation sequence.
  • CCSID 5035 (CP5035): Japanese EBCDIC DBCS graphic CCSID (DBCS only).

Migration compatibility and ISO standards

For Latin-based code pages (37, 500, 1047, 273): ISO-8859-1 (Latin-1, or CCSID 819) serves as the direct ASCII equivalent, enabling lossless conversion for most characters.

Euro symbol consideration: ISO-8859-1 doesn't include the Euro symbol (€) because it was standardized before the Euro's introduction. For mainframe systems using Euro-enabled code pages (1140, 1141, 1390), consider:

  • ISO-8859-15 (Latin-9): Includes the Euro symbol and replaces some rarely used characters from ISO-8859-1
  • UTF-8: Comprehensive Unicode support, including the Euro and all international characters

For Japanese code pages (930, 939, 1390, 1399): UTF-8 is the recommended target encoding because ISO-8859-1 can't represent Japanese characters. Note that either 930 or 939 can be converted to UTF-8, but the conversion is one way: if you try to convert back from UTF-8 to either 930 or 939, almost half of the characters in UTF-8 won't work in 930 and 939, so you can't convert back and forth without issues.

Understanding Db2 LUW code pages

Db2 LUW (Linux, UNIX, and Windows) doesn't use CCSIDs the same way Db2 for z/OS does.

  • Db2 LUW is built on a code page instead of CCSIDs.
  • The code page is immutable and defined at database creation time.
  • Db2 LUW relies on client/server code page conversion.
  • Db2 LUW doesn't have a CCSID catalog column—the platform uses International Components for Unicode (ICU) and OS locales rather than IBM CCSID definitions.

Code page support: Db2 LUW supports various code pages, including UTF-8 (Unicode) for comprehensive character support.

Character compatibility: ISO-8859-1 serves as the direct ASCII counterpart to the IBM-37, IBM-500, and IBM-1047 character repertoire, enabling lossless conversion for Latin-1 characters.

Collation behavior:

  • Standard collations are locale-based (for example, en_US)
  • Custom 256-byte sequences can simulate EBCDIC sorting for single-byte databases
  • Unicode databases use CLDR (Common Locale Data Repository) or UCA (Unicode Collation Algorithm) collations, making EBCDIC simulation more complex

Character display consistency: Foreign (non-ASCII) characters display correctly when data is properly converted during transfer. Without conversion, EBCDIC bytes appear as invalid characters in LUW (for example, 0xC1 represents A in EBCDIC but is invalid in ASCII). Db2's DRDA protocol and tools like db2move handle automatic conversion.
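A quick way to confirm an existing LUW database's code page, code set, and collation is the database configuration. The following is a minimal sketch, assuming a database named MYDB; the exact output labels vary slightly by Db2 version:

$ db2 connect to mydb user <user_id> using <password>
$ db2 get db cfg for mydb | grep -iE 'code page|code set|collating'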

Code page selection strategy for Amazon RDS for Db2

In this section, we compare mainframe code page flexibility with that of Amazon RDS.

Mainframe Db2 allows code page definition at multiple levels:

  • Subsystem level: Defining CCSIDs (SBCS_CCSID, MIXED_CCSID, and DBCS_CCSID) establishes defaults for objects and applications that don't explicitly specify a CCSID
  • Tablespace and table level: A CCSID clause during creation sets the default encoding for all character columns unless overridden at the column level
  • Column level: Specific CCSID for individual columns (for example, VARCHAR(50) CCSID 1208 for UTF-8), as sketched after this list
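The following hedged z/OS DDL sketch combines a table-level encoding scheme with a column-level CCSID override; the table and column names are hypothetical:

CREATE TABLE ORDERS
  (ORDER_ID  CHAR(10),                -- inherits the table encoding scheme
   NOTE_UTF8 VARCHAR(50) CCSID 1208)  -- column-level override to Unicode (UTF-8)
  CCSID EBCDIC;                       -- table-level encoding scheme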

Db2 LUW, in contrast, defines the code page only at database creation time, and it can't be changed afterward.

Key considerations

When creating an Amazon RDS for Db2 instance, avoid specifying the default database name during instance creation, because that database defaults to UTF-8 with the US territory.

Code page equivalency:

  • IBM code page 819 (ISO-8859-1) is the exact equivalent of mainframe code pages 37, 500, 1047, and 273 (excluding the Euro symbol).
  • For Euro symbol support, ISO-8859-15 (Latin-9) provides compatibility with code page 1141 and other Euro-enabled variants.
  • Japanese code pages (930, 939, 1390, and 1399) require UTF-8 encoding for proper character representation. However, the counterpart of Japanese code pages 930 and 939 in Db2 LUW is IBM-943, and in practice most Japanese customers choose IBM-943 for code pages 930 and 939 (see the sketch after this list).
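If you choose IBM-943, the database creation call looks like the following minimal sketch; it assumes that Amazon RDS for Db2 accepts the IBM-943 code set with the JP territory, so verify that combination against the Amazon RDS documentation first:

$ db2 "call rdsadmin.create_database('JPDB',32768,'IBM-943','JP','SYSTEM')"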

Selection framework

Choose the appropriate code page based on your specific migration requirements.

Option 1a: ISO-8859-1 code set

Use ISO-8859-1 when:

  • Using only one CCSID (37, 500, or 1047) for all tables and columns in mainframe Db2.
  • Zero data truncation is required.
  • Exact mainframe sorting order must be preserved. The sorting order in Db2 is set at database creation time through the collation parameter.
  • Multi-language support isn't needed.
  • The Euro symbol isn't required (ISO-8859-1 predates the Euro's introduction).

Implementation:

$ db2 connect to rdsadmin user <master_user> using <master_password>
$ db2 "call rdsadmin.create_database('<database_name>',32768,'ISO-8859-1','US','EBCDIC_819_037')"

Replace <database_name>, <master_user>, and <master_password> with your actual values.

Option 1b: ISO-8859-15 code set (Euro support)

Use ISO-8859-15 when:

  • Migrating from Euro-enabled mainframe code pages (1141 and 1390)
  • Euro symbol (€) support is required
  • Maintaining single-byte character encoding—SBCS (single-byte character set)
  • Limited multi-language requirements

Implementation:

$ db2 "name rdsadmin.create_database('',32768,'ISO-8859-15','US','SYSTEM')"

Collation sequence options:

The fifth parameter specifies the collation sequence. Choose based on your mainframe CCSID:

Collation value   | Target code page     | Mainframe CCSID       | Description
------------------+----------------------+-----------------------+--------------------------
EBCDIC_819_037    | ISO-8859-1           | 37                    | EBCDIC US English
EBCDIC_819_500    | ISO-8859-1           | 500, 1047             | EBCDIC International
EBCDIC_850_037    | ASCII Latin (850)    | 37                    | EBCDIC US English
EBCDIC_850_500    | ASCII Latin (850)    | 500                   | EBCDIC International
EBCDIC_932_5026   | Japanese (932)       | 930 (collation 5026)  | EBCDIC Japanese Katakana
EBCDIC_932_5035   | Japanese (932)       | 939 (collation 5035)  | EBCDIC Japanese Latin
EBCDIC_1252_037   | Windows Latin (1252) | 37                    | EBCDIC US English
EBCDIC_1252_500   | Windows Latin (1252) | 500                   | EBCDIC International

Selection guide:

  • Mainframe CCSID 37: Use EBCDIC_819_037
  • Mainframe CCSID 500 or 1047: Use EBCDIC_819_500
  • Mainframe CCSID 273: Use EBCDIC_819_500
  • Mainframe CCSID 930: Use EBCDIC_932_5026 (Katakana collation)
  • Mainframe CCSID 939: Use EBCDIC_932_5035 (Latin collation)
  • Euro-enabled CCSIDs (1141, 1390, 1399): Consider UTF-8 or ISO-8859-15

The CREATE_DATABASE command creates a Db2 database with:

  • Recommended 32 K page size (the default page size is 8 K)
  • ISO-8859-1 code set (equivalent to IBM code page 819)
  • US territory
  • EBCDIC collation for the specified mainframe code page

Supported territory codes for ISO-8859-1 and ISO-8859-15: AL, AU, AT, BE, BR, CA, CH, CN, DE, DK, ES, ET, FI, GB, ID, IE, IN, IS, IT, JP, KE, KR, Lat, MY, NL, NZ, NO, PH, PT, TW, TZ, US, and ZA

Consult the IBM documentation for valid territory and code page combinations.

Option 2: UTF-8 code page

Use UTF-8 when:

  • Multi-language data support is required
  • Migrating from Japanese mainframe code pages (930, 939, or 1390)
  • Euro symbol (€) and comprehensive international character support are needed
  • Future-proofing for global character handling is important
  • Mainframe sorting order preservation isn't critical

Important considerations:

  • Character expansion: UTF-8 might require additional storage space
    • Accented vowels (à, é, î), currency symbols (¢, £, ¥), fractions (¼, ½, ¾), and special characters (ß, ¬, µ) require 1 byte in mainframe code pages but 2 bytes in UTF-8
    • DDL changes required: CHAR and VARCHAR lengths must be adjusted, which isn't trivial

Implementation:

$ db2 connect to rdsadmin user <master_user> using <master_password>
$ db2 "call rdsadmin.create_database('<database_name>',32768,'UTF-8','US','SYSTEM')"

UTF-8 is supported in all territories. Consult the IBM documentation for available territory options.

Understanding CODEUNITS32 in UTF-8 Db2 databases

When using UTF-8 in Amazon RDS for Db2, mainframe migration inevitably encounters data truncation issues because of character expansion—some characters grow from 1 byte on the mainframe to 2 or more bytes in UTF-8.

To avoid DDL changes, Db2 provides a mechanism to change the default string units from OCTETS to CODEUNITS32 by modifying the STRING_UNITS database configuration parameter.

$ db2 connect to rdsadmin user <master_user> using <master_password>
$ db2 "call rdsadmin.update_db_param('<database_name>','STRING_UNITS','CODEUNITS32','NO')"

This change requires an instance restart because STRING_UNITS isn't a dynamic parameter. Any DDL must be created after updating the parameter for it to take effect.
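After the restart, you can confirm the new default; a minimal sketch (in the database configuration output, the parameter appears as Default string units (STRING_UNITS)):

$ db2 connect to <database_name> user <master_user> using <master_password>
$ db2 get db cfg for <database_name> | grep -i string_units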

Practical example: ISO-8859-1 database

Alternatively, you can specify CODEUNITS32 at the column level when creating objects, without changing the STRING_UNITS parameter.

Create an ISO-8859-1 database:

$ db2 connect to rdsadmin user <master_user> using <master_password>
$ db2 "call rdsadmin.create_database('MYDB',32768,'ISO-8859-1','US','EBCDIC_819_037')"

Test with sample data:

db2 connect to mydb user <user_id> using <password>
db2 "create table t1 (c1 char(2))"
db2 "insert into t1 values ('ßA')"
db2 "insert into t1 values ('ßB')"
db2 "insert into t1 values ('ßC')"
db2 "insert into t1 values ('¬I')"
db2 "insert into t1 values ('µ5')"
db2 "insert into t1 values ('¼O')"
db2 "insert into t1 values ('¼Q')"
db2 "insert into t1 values ('¼S')"
db2 "choose c1, character_length(c1, octets) LENGTH from t1"

Result:

C1 LENGTH          
-- -----------
ßA           2
ßB           2
ßC           2
¬I           2
µ5           2
¼O           2
¼Q           2
¼S           2
 
  8 record(s) selected.

Notice that each value uses exactly 2 bytes in the ISO-8859-1 database. Now run the same test with a UTF-8 database.

Create a UTF-8 database:

$ db2 connect to rdsadmin user <master_user> using <master_password>
$ db2 "call rdsadmin.create_database('MYUTFDB',32768,'UTF-8','US','SYSTEM')"

Test with CODEUNITS32:

db2 connect to myutfdb user <user_id> using <password>
db2 "create table t1 (c1 char(2 CODEUNITS32))"
db2 "insert into t1 values ('ßA')"
db2 "insert into t1 values ('ßB')"
db2 "insert into t1 values ('ßC')"
db2 "insert into t1 values ('¬I')"
db2 "insert into t1 values ('µ5')"
db2 "insert into t1 values ('¼O')"
db2 "insert into t1 values ('¼Q')"
db2 "insert into t1 values ('¼S')"
db2 "choose c1, character_length(c1, octets) LENGTH from t1"

Result:

C1 LENGTH          
-- -----------
ßA           3
ßB           3
ßC           3
¬I           3
µ5           3
¼O           3
¼Q           3
¼S           3
 
  8 record(s) selected.

Key observation: Each value uses 3 bytes in the UTF-8 database, despite the column being defined as CHAR(2 CODEUNITS32). The CODEUNITS32 specification allows two characters of up to 4 bytes each (8 bytes total allocation).

CODEUNITS32 trade-offs and recommendations

Impact of database-level CODEUNITS32:

  • Default CHAR and VARCHAR allocation changes from 1 byte to 4 bytes per character
  • Reduces maximum lengths:
    • CHAR: from 255 to 63 characters
    • VARCHAR: from 32,704 to 8,174 bytes (32 KB page size)
  • For databases with 90% ASCII characters, size can expand to 3.8 times the original

It's advisable to avoid database-level CODEUNITS32. One option is to use selective column-level CODEUNITS32, but do so with caution, because it can waste significant space.

For example, CREATE TABLE t1 (c1 CHAR(2 CODEUNITS32)) allocates 8 bytes.

A better approach is to use OCTETS with an adjusted length. An octet represents one byte; in non-Unicode code pages there is a one-to-one mapping between a character and a byte. CREATE TABLE t1 (c1 CHAR(4 OCTETS)) allocates exactly 4 bytes.

Best practice: Use CODEUNITS32 only when you're certain you'll store 3–4 byte characters (for example, East Asian languages). For most international characters, use OCTETS and adjust the length accordingly.
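The difference is easy to demonstrate with CHARACTER_LENGTH, which can count in either unit. A minimal sketch in a UTF-8 database:

db2 "values character_length('ß', codeunits32)"
db2 "values character_length('ß', octets)"

The first statement returns 1 (one character); the second returns 2, because ß occupies two bytes in UTF-8.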

Collation differences: EBCDIC compared to SYSTEM

Sort order comparison:

EBCDIC collation order: Special characters, lowercase letters, uppercase letters, then numerals

SYSTEM collation order: Numerals, uppercase letters, lowercase letters, then special characters

Practical examples

EBCDIC collation test:

db2 "with s(v) as (values '-','0','A','a','ß','¬','µ','¼') choose v as VALUE from s order by v"

Result:

VALUE
-----
ß    
¬    
-    
a    
µ    
¼    
A    
0    
  8 record(s) selected.

The result shows the expected sort order: special characters, lowercase, uppercase, and numerals.

SYSTEM collation test:

$ db2 "with s(v) as (values '-','0','A','a','ß','¬','µ','¼') choose v as VALUE from s order by v"

Result:

VALUE
-----
-
0
A
a
¬
µ
¼
ß
  8 record(s) selected.

The result shows the expected sort order: numerals, uppercase, lowercase, and special characters.

Handling unsupported characters in ISO-8859-1

When applications attempt to insert characters that aren't available in ISO-8859-1 (CCSID 819), the database performs silent character substitution.

Example test:

db2 "create desk t1(c1 char(4))"
db2 "insert into t1 values ('常')" -- Japanese character
db2 "choose c1, hex(c1) hex, character_length(c1, octets) size from t1"

Result:

C1   HEX      LENGTH     
---- -------- -----------
     1A202020           4

What happened:

  • No error or warning was generated
  • The Japanese character was replaced by the SUB (substitute) character (0x1A)
  • The remaining bytes were filled with spaces (0x20)

Key takeaway: ISO-8859-1 databases can't store unsupported characters. Unknown characters are silently replaced with 0x1A, followed by spaces (0x20) in the remaining byte positions.
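To find rows affected by substitution after a load, you can search for the SUB character itself. The following is a hedged sketch against the t1 table from the example; CHR(26) returns the 0x1A control character:

db2 "select count(*) from t1 where locate(chr(26), c1) > 0"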

Ensuring character consistency across platforms

Data migration best practices

Conversion tools: Use Db2 federation (catalog the z/OS database in LUW) or tools like db2move/EXPORT with ASC/DEL formats for automatic character conversion.
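For example, a minimal EXPORT/IMPORT round trip in DEL format; the file name and target table are hypothetical, and Db2 converts character data to the target database's code page automatically:

$ db2 "export to t1.del of del select * from t1"
$ db2 "import from t1.del of del insert into t1_target"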

Check the mainframe CCSID configuration

Table-level CCSID:

SELECT NAME, ENCODING_SCHEME
FROM SYSIBM.SYSTABLES
WHERE NAME = '<table_name>' AND CREATOR = '<schema_name>';

Column-level CCSID:

SELECT NAME, CCSID
FROM SYSIBM.SYSCOLUMNS WHERE TBNAME = '<table_name>' AND TBCREATOR = '<schema_name>';

Example result:

NAME                |CCSID|
--------------------+-----+
COL1                |   37|
COL2                |    0| -- 0 = subsystem default
COL3                |    0|
COL4                |   37|

Validation and testing

Character testing: Insert test data with international characters (for example, café, niño) on the mainframe, export and import it to LUW, and verify display consistency using GUI clients such as:

  • DBeaver
  • DataGrip
  • IBM Data Studio
  • IBM Database Management Console

Conversion risks: Round-trip conversions (EBCDIC → ASCII → EBCDIC) might lose character variants that lack exact mappings. Verify the results using IBM's official CCSID tables.
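A simple spot check is to compare the stored bytes on both sides; the following sketch assumes a test table T1 with a character column C1 exists on both platforms. On Db2 for z/OS:

SELECT C1, HEX(C1) FROM T1;

On Amazon RDS for Db2:

db2 "select c1, hex(c1) from t1"

Compare the byte values for each row against IBM's CCSID conversion tables.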

Conclusion

Code page, collation, and territory parameters in Db2 are immutable after database creation—unlike in some other database systems. This makes the initial selection critical for migration success.

Key considerations:

  • ISO-8859-1: Choose for exact mainframe compatibility and zero data loss. Use an Amazon RDS for Db2 defined collation sequence to preserve mainframe sorting behavior
  • ISO-8859-15: Consider for Euro symbol support when migrating from Euro-enabled mainframe code pages (1141, 1390), and verify Amazon RDS support
  • UTF-8: Select for multi-language support, but be aware of character expansion and potential DDL changes
  • Early planning: Address code page and collation requirements early in the migration cycle to prevent data quality issues

Proper code page selection prevents data truncation, manages character expansion effectively, and helps ensure consistent application behavior across platforms.

Acknowledgements

Thanks to Chiranjeev Mukherjee, Hajime Minagawa, and Akira Shimosako for carefully reviewing this blog post from the mainframe migration perspective, especially the Japanese character sets.


About the authors

Vikram

Vikram is a Sr. DBE for Amazon RDS for Db2. Vikram has over 20 years of experience in Db2. He enjoys creating new products from the ground up. In his spare time, he practices meditation and enjoys listening to podcasts.

Sumit Kumar

Sumit is a Senior Solutions Architect at AWS and enjoys solving complex problems. He has been helping customers across various industries build and design their workloads on the AWS Cloud. He enjoys cooking, playing chess, and spending time with his family.

Rajib Sarkar

Rajib is a Senior Database Engineer for Amazon RDS for Db2. Rajib has over 20 years of Db2 experience.

Kshitij Sanghoi

Kshitij is a Sr. Software Development Engineer and has over 15 years of IT experience, including more than 11 years at AWS.
