SC22 N914
US position regarding the German NB proposal on UTF-16 datatype
in SC22 N3356
L2 has considered the proposal from the German NB for a new work item to add
a UTF-16 data type to the C standard (SC22 N3356). This was discussed in a
meeting with C language
committee members during the L2 meeting on 2002-02-12. On the basis of that
discussion, L2 recommends that the US JTC 1 TAG adopt the following as the US
position:
  - The U.S. NB supports this new work item. Adding a UTF-16 datatype and
    string literal support to the C standard would greatly benefit implementers
    of Unicode / 10646 in making use of the C standard.
  
  - In particular, the following additions would be technically advantageous:
    
      - UTF-16 datatype.  Exactly 16 bits, to explicitly hold a
        Unicode / 10646 UTF-16 code unit.
      
      - UTF-16 string type. Linked explicitly with the UTF-16 datatype,
        so that static string initialization with UTF-16 data would be easy and
        explicit.
      
      - UTF-32 datatype. Exactly 32 bits, to explicitly hold a
        Unicode / 10646 code point (without the cross-platform size ambiguity of
        wchar_t).
      
      - UTF-32 string type (optional). Linked explicitly with
        the UTF-32 datatype. This might be useful, but for most implementations
        is less important than having the UTF-16 string type.
    
 
  - Regarding the terminology to be associated with any such new datatypes for
    C, usage of "UTF-16" and "UTF-32" is preferred. The
    exact form of the names for new datatypes would, of course, be up to the C
    committee to determine, but names along the lines of "utf16_t",
    "utf32_t" or the like would be satisfactory.
    
      - It is advisable to avoid any terminological usage involving
        "UCS-2" and "UCS-4". The term "UCS-2"
        would be misleading, since it is the fixed-width 16-bit form of
        10646, limited only to the BMP, whereas all significant implementations
        are now moving to the variable-width UTF-16, to get all-plane
        support for 10646. Use of "UCS-4" is not parallel, and just
        induces a cognitive matching problem of converting from 4 octets to 32
        bits -- which is the more normal concept for a 32-bit datatype.
        Furthermore, the "16" and "32" are more normal
        concepts for C programmers dealing with datatype sizes.
 
    
  - The U.S. does not suggest adding any corresponding APIs to the standard
    libraries to match the already existing APIs for char and wchar_t
    string types. Simply making the datatype additions listed in the second
    item above would meet the essential requirements that vendors have on the
    language to make their Unicode porting and other tasks simpler and more
    uniform. API support for Unicode semantics is, at this point at least, more
    appropriately provided by various third-party add-on libraries.
  
  - The U.S. considers it important that other language standards, and in
    particular, C++, take these issues into account, so that if a new datatype
    or datatypes are added to C, interoperability with other languages can be
    maintained as well.
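
For illustration only, the following C sketch shows what the datatype additions
discussed above might look like to a programmer. The names utf16_t and utf32_t
follow the naming suggested above and are assumptions, not existing standard
types; they are defined here on top of the exact-width types of C99 <stdint.h>.
The helper utf16_encode is likewise hypothetical. It shows why UTF-16 is
variable-width: a code point outside the BMP takes two 16-bit code units (a
surrogate pair), which fixed-width UCS-2 cannot represent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, modeled on the "utf16_t" / "utf32_t" suggestion.
   uint16_t and uint32_t are exactly 16 and 32 bits where they exist. */
typedef uint16_t utf16_t;   /* holds one UTF-16 code unit */
typedef uint32_t utf32_t;   /* holds one Unicode / 10646 code point */

/* Encode one code point as UTF-16 (hypothetical helper).
   Returns the number of code units written to out: 1 for a BMP
   code point, 2 (a surrogate pair) for a supplementary-plane one. */
static size_t utf16_encode(utf32_t cp, utf16_t out[2])
{
    assert(cp <= 0x10FFFF);              /* outside the code space */
    assert(cp < 0xD800 || cp > 0xDFFF);  /* surrogates are not characters */

    if (cp < 0x10000) {
        out[0] = (utf16_t)cp;
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (utf16_t)(0xD800 + (cp >> 10));     /* high surrogate */
    out[1] = (utf16_t)(0xDC00 + (cp & 0x3FF));   /* low surrogate */
    return 2;
}

/* Without a UTF-16 string literal, static initialization has to spell
   out the code units by hand. This array holds "Hi" plus a terminator. */
static const utf16_t greeting[] = { 0x0048, 0x0069, 0x0000 };
```

With these definitions, encoding U+0041 yields the single code unit 0x0041,
while encoding U+1D11E yields the pair 0xD834, 0xDD1E. A UTF-16 string type
linked to the datatype would let the greeting array be written as an ordinary
string literal rather than as a list of numeric code units.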