Overview of meetings

A planning meeting of the European Ordering Rules Project Team (EOR PT) and an ad hoc meeting on country codes and language codes were held on the afternoon of 2 February 1998.

The ad hoc meeting was not directly related to the work of the EOR PT, except that these codes might be used in charts in documents produced by the EOR PT. That meeting is documented separately.

The EOR PT meeting also included an Open Session to allow input from others involved in multilingual sorting standardisation, in CEN/TC304 and ISO/IEC JTC1/SC22/WG20 in particular.

The EOR PT meeting also continued on 3 February 1998.

The main actions of the EOR PT in Brussels would be to define its aims; to produce a Policy Statement, analogous to the one produced by the Project Team on Multilingual European Subsets; to derive a press release from this, for circulation to those from whom the EOR PT intended to solicit opinions; and to draw up an action plan with target dates.

The aims and action plan were discussed in detail during the meeting; the policy statement and press release were drawn up later.

Roll call

In addition, for the Open section of the meeting, the following also attended:

Scope of work

For the area of its work, this meeting of the EOR PT would define its Aims, which would be:

  • to provide a language-independent sorting specification;
  • to provide an IT-enablement of that specification;
  • to make every effort to avoid conflict with any other standard on alphabetical ordering originating from CEN, ISO and ISO/IEC JTC1;
  • to write a draft standard in plain English, providing a single default ordering of characters of the Multilingual European Subsets numbers 1 and 2 (MES-1 and MES-2);
  • to agree ordering for one script in relation to other scripts;
  • to agree ordering for Latin characters
  • to agree ordering for Greek characters
  • to agree ordering for Cyrillic characters
  • to agree ordering for Special characters
  • to be readable by humans, and to provide a single specification for implementors (one option only); to note that any variations are outside the scope of this standard;
  • to decide whether fonts have any relevance in sorting;
  • to provide sorting specifications at the character level only: specific meanings or uses, e.g. visual appearance, names or meanings of characters would have no effect on the basic character-by-character sorting specification.
  • to decide on word-by-word or letter-by-letter sorting.
  • to agree precedence of upper and lower case provision.
  • to agree the repertoire of basic letters (e.g. A-Z)
  • to agree on treatment of letters with diacritics
  • to agree on treatment of modified letters, e.g. thorn, b with hook, d with hook (and possibly dz with caron?), and of ligatures such as ae and ffi;
  • to decide whether a modified character should be filed after the unmodified one, or treated as if it were the unmodified character. With letters like THORN, the problem arises not for those who know what thorn is, but for those who do not.
  • to agree methods of representing the glyphs or description of characters in tables, both in the printed drafts for comment and in email drafts for comment.
  • to meet users' expectations regarding multilingual sorting;
  • to achieve consensus with users
  • to list relationships and dependencies, and to liaise with, other CEN/TC304 projects, including:
  • to provide an informative annex regarding any extremely common variations on the default ordering (to be related to information in the European Alphabets project and in ISO 12199?);
  • to provide an informative annex showing a list of common variations found in more than one country, e.g. umlaut; y/ij; aa in Norwegian and Danish; Turkish i/I (ISO 12199 is not elegant here);
  • to define sorting in the MES-1 and MES-2; to extrapolate from this recommendations for input into sorting in the MES-3; and for sorting in other ISO and/or ISO/IEC JTC1 standards.
  • to achieve consensus from users at the earliest possible stage, and to ensure continued consensus.

    Action plan, with target dates, and methods of working

    Time frame (February through June):
    1. to develop a draft standard;
    2. to prepare documentation for any open meetings;
    3. to have email/web page communication as an initial consensus exercise; with an initial draft;
    4. 27-28 April 1998: reserved for meeting/electronic meeting of PT
    5. 1-5 June, Reykjavik: EOR PT Open meeting; resolving comments;
    6. email voting/comments on the draft sorting standard to CEN/TC304.
    This time frame is also based on comments from the Ad Hoc Meeting on Organization of Project Teams, 2 February 1998.

    To hold initial planning meetings

    Håvard Hjulstad, Marc Küster, John Clews: in person in Brussels, 2-4 February 1998, and by email during the time of the EOR PT;

    Initial consultation exercise

    Håvard Hjulstad, Marc Küster, John Clews; Keld Simonsen; Johan van Wingen, Arnold Winkler; Mike Ksar, 2 February 1998;

    Secondary consultation exercise

    To hold initial consultation with others, in order to focus on aims, overlaps, and differences between the various input documents.

    This would be done by informing CEN/TC304 members of developments, and asking for general agreement/disagreement/comments; and also by sending documents and/or web-site details to relevant email lists.

    Allocation of tasks among PT members

    This would be agreed by PT members during the Brussels meeting. Håvard Hjulstad and Marc Küster would present a verbal report on initial plans on scope and working methods to the CEN/TC304 Plenary meeting.

    EOR PT Open meeting session

    List of possible input documents (expanded after Open session)

    Discussion of selected existing standards related to sorting

  • ISO FDIS 12199 (ISO/TC37/SC2)
  • ISO 999 (ISO/TC46)
  • Work in Unicode Consortium/Java community

    ENV draft on sorting

    Keld Simonsen had written earlier drafts on European sorting which (like other input documents) could serve as valuable input to the new PT work, to avoid any reinvention of the wheel.

    The ENV draft had fairly brief text, and several tables to show IT-enablement of sorting. The question was raised whether it might be possible to abbreviate these tables to show the general pattern rather than every element. There were complications in doing this, as some other standards were intended to reference this document.

    Discussion between CEN/TC304 and ELOT had enabled Greece to reduce their requirements to four levels of sorting rather than the original five levels.

    As far as possible this work had also been coordinated with the development of ISO/IEC 14651. There were four levels of sorting:

    In addition there were Specific rules, e.g. for AE, and Sharp S. This defined a default order: the only variations allowed were for treatment of space to allow word-by-word or letter-by-letter sorting.
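    The effect of a toggle on the treatment of space can be illustrated with a minimal sketch. It is not taken from the ENV draft: the function names and the sample entries are hypothetical, and plain lower-cased code point comparison stands in for a real multi-level ordering.

```python
def word_by_word_key(entry):
    # Word-by-word: compare one word at a time, so a word boundary
    # sorts before any letter ("New York" files before "Newark").
    return tuple(entry.lower().split())


def letter_by_letter_key(entry):
    # Letter-by-letter: ignore spaces, so "New York" is compared as "newyork".
    return entry.lower().replace(" ", "")


entries = ["Newark", "New York", "New Haven", "Newton"]

word_by_word = sorted(entries, key=word_by_word_key)
letter_by_letter = sorted(entries, key=letter_by_letter_key)

print(word_by_word)      # ['New Haven', 'New York', 'Newark', 'Newton']
print(letter_by_letter)  # ['Newark', 'New Haven', 'Newton', 'New York']
```

    The same four entries come out in a different order under each convention, which is why a default ordering has to state which one it assumes.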

    ISO/IEC 14651

    Arnold Winkler as convenor of ISO/IEC JTC1/SC22/WG20 described ISO/IEC 14651, with additional input from Keld Simonsen.

    ISO/IEC 14651 has two main aims:

    1. fully deterministic sorting order
    2. to show expected ordering.

    The text of ISO/IEC 14651 focused on generalized sorting methods and APIs for sorting, with tailorability to specific requirements being a major feature. Sorting on diacritics allows sorting backwards or forwards in a word (French is the one major language where sorting backwards is used).

    This is followed by three tables (spread over 30 pages) covering letters, specials, and others, given in a POSIX-compatible syntax, and in a LOTOS specification in Annex D, supplied by Greece.

    Annex E provides test data for checking conformance, and Annex F provides an example (Danish) to show specific variations.
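    The forward and backward sorting on diacritics mentioned above can be sketched as follows. This is a simplified two-level illustration, not the ISO/IEC 14651 algorithm or its tables: the function names are hypothetical, Python's unicodedata module is used to separate base letters from accents, and accents are weighted by code point purely for convenience.

```python
import unicodedata


def split_letters_and_accents(word):
    """Return (base letters, accents per letter) from the decomposed word."""
    bases, accents = [], []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch):
            # A combining mark belongs to the preceding base letter.
            if accents:
                accents[-1] += ch
        else:
            bases.append(ch)
            accents.append("")
    return bases, accents


def sort_key(word, backward_accents=False):
    bases, accents = split_letters_and_accents(word)
    if backward_accents:
        # French convention: accent differences are examined from the
        # end of the word towards the beginning.
        accents = accents[::-1]
    # Level 1: base letters. Level 2: accents (forward or backward).
    return ("".join(bases), tuple(accents))


words = ["côté", "cote", "coté", "côte"]

forward = sorted(words, key=sort_key)
backward = sorted(words, key=lambda w: sort_key(w, backward_accents=True))

print(forward)   # cote, coté, côte, côté
print(backward)  # cote, côte, coté, côté
```

    All four words share the base letters "cote", so only the second level decides the order, and the direction in which accents are scanned changes the result.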

    Of the two draft standards, the European standard was simpler, with only a toggle on SPACE.

    Arnold Winkler agreed that the table alone was not easy to use for non-IT purposes. Alain La Bonte had been invited to the NCITS/L2 meeting (formerly ANSI X3L2) on character coding.

    The goal of ISO/IEC 14651 is to have predictable output, but the options confuse this.

    ISO/IEC FCD 14651 was going for voting over April and May 1998, and ISO/IEC JTC1/SC22/WG20 would deal with the results in its next meeting in the beginning of June 1998.

    Arnold Winkler considered that ISO/IEC 14651 was not so much a sorting standard as a set of APIs for a very specific environment, with emphasis on pre-sorting etc.

    It would be useful to contact Alain La Bonte (editor of ISO/IEC 14651) with lists of sources where we do not agree at the earliest opportunity.

    ISO/IEC FCD 14651 can take into account combining sequences. There are no combining sequences in the MES. Sorting the repertoire is what we require. Decomposition is what you do in sorting.
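    As a hedged illustration of the point about decomposition, the sketch below shows a precomposed character and the equivalent combining sequence being reduced to the same form before comparison. It uses canonical decomposition (NFD) from Python's unicodedata module as a stand-in; the helper name is hypothetical and nothing here is taken from ISO/IEC FCD 14651 itself.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE as one character
combining = "e\u0301"    # e followed by COMBINING ACUTE ACCENT


def decomposed(text):
    # Canonical decomposition: base letter followed by its combining marks.
    return unicodedata.normalize("NFD", text)


# The raw strings differ, so a naive comparison treats them as distinct.
print(precomposed == combining)                          # False
# After decomposition both forms are identical and sort together.
print(decomposed(precomposed) == decomposed(combining))  # True
```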

    Keld Simonsen discussed IT-enablement of sorting: this was done via a POSIX specification, enhanced in ISO/IEC 14652, which provided tables to be fed into an IT implementation. For POSIX, LOTOS, and SGML formats, mappings are provided for ISO/IEC 9945 and ISO/IEC 14652.

    Less time was spent on the other input documents:

    ISO FDIS 12199

    ISO FDIS 12199 was known to all participants through liaison, and currently provided a default sorting order for Latin characters, with several appendices containing relevant information on sorting characteristics for different languages. This draft standard overlapped the work both of the EOR PT and of Project 11 on Alphabets of European Languages. It had been developed by ISO/TC37/SC2, whose primary focus was terminology, but which (like ISO/TC46 committees) had attracted various language-related standards into its work programme.

    John Clews pointed out that the possibility of suspending processing of FDIS 12199 had been raised with the ISO/TC37/SC2 Secretariat. Since then the ISO/TC37/SC2 Secretariat has sent a letter to ISO Central Secretariat requesting permission to hold back ISO 12199 until its coordination with other related projects has been ensured.

    The question was raised whether that possibility also existed for ISO/IEC 14651, now at the FCD stage. Arnold Winkler replied that anything was possible, but that one should try to contact the editor directly, or ensure that national member body votes stated what was required: a NO vote with comments, stating that resolution of these comments would transform the national body's NO vote into a YES vote.

    ISO 999

    ISO 999 had been developed by ISO/TC46, and its Section 8 contained some information on sorting, but this was in fact very general, and there was no need to spend time on this. There was another ISO/TC46 standard that covered sorting, but this was not available.

    Unicode

    No information on relevant work going on in the Unicode Consortium was available to the meeting.

    Additional comments by EOR PT on existing work on sorting

    For the CEN/TC304 PT on sorting, Håvard Hjulstad wondered if it might be possible just to reference ISO/IEC 14651? The options are referencing it or repeating it.

    The European draft standard could provide a list of characters in a sorted order, and ISO/IEC 14651 might be able to specify contiguous ranges of characters where the rest of the field was otherwise identical except for the character, ID or reference.

    However, for special characters (in particular symbols), perhaps the European standard could reference the tables in ISO/IEC 14651, and use the rest as an example?

    John Clews and Marc Küster considered that the CEN system did not have access to enough experts for reaching consensus on established sorting order, particularly in the academic community and the library community, who would be the biggest users, with most experience, of existing multilingual sort orders.

    The academic community has already reacted adversely to UCS (ISO/IEC 10646 and Unicode) because of some major errors in the glyphs of the Unicode version of the Greek IOTA ADSCRIPT characters, although they are correct in the equivalent tables in ISO/IEC 10646.

    If there is insufficient consultation and consensus building, ISO/IEC 14651 might meet the same adverse reaction from its intended users. Although there was consultation with national member bodies of CEN/TC304, there was insufficient contact with academic users and library users. ISO/IEC 14651 currently provided too prescriptive a sort order without having established sufficient contact with, and feedback from, existing multilingual users.

    The PT also undertook a brief run through some of the main input documents to determine which were most useful for the present work of the PT: this is not documented here.

    Author: John Clews
    2 March 1998 -- John Clews, SESAME Computer Projects, 8 Avenue Rd, Harrogate, HG2 7PG tel: +44 (0) 1423 888 432

