This last one was not directly related to the work of the EOR PT, except that these codes might be used in charts in documents produced by the EOR PT. This meeting is documented separately.
The EOR PT meeting also included an Open Session to allow input from others, particularly those involved in multilingual sorting standardisation in CEN/TC304 and ISO/IEC JTC1/SC22/WG20.
The EOR PT meeting also continued on 3 September 1998.
The main actions of the EOR PT in Brussels would be to define its aims; to produce a Policy Statement, analogous to the one produced by the Project Team on Multilingual European Subsets; to derive a press release from this, for circulation to those from whom the EOR PT intended to solicit opinions; and to draw up an action plan with target dates.
The aims and action plan were discussed in detail during the meeting; the policy statement and press release were drawn up later.
In addition, for the Open section of the meeting, the following also attended:
For the area of its work, this meeting of the EOR PT would define its Aims, which would be:
This would be done by informing CEN/TC304 members of developments, and asking for general agreement/disagreement/comments; and also by sending documents and/or web-site details to relevant email lists.
This had fairly brief text, and several tables to show IT-enablement of sorting. The question was raised whether it might be possible to abbreviate these tables to show the general pattern rather than every element. There were complications in doing this, as some other standards were intended to reference this document.
Discussion between CEN/TC304 and ELOT had enabled Greece to reduce its requirements to four levels of sorting rather than the original five levels.
As far as possible this work had also been coordinated with the development of ISO/IEC 14651. There were four levels of sorting:
ISO/IEC 14651 has two main aims:
The text of ISO/IEC 14651 focused on generalized sorting methods and APIs for sorting, with tailorability to specific requirements being a major feature. Sorting on diacritics allows sorting backwards or forwards in a word (French is the one major language where sorting backwards is used).
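The difference between forward and backward sorting on diacritics can be illustrated with a minimal sketch. It assumes a simplified two-level key (base letters first, then accents) built with Python's standard unicodedata module; it does not use the tables of ISO/IEC 14651, and the name sort_key and its backward_accents parameter are hypothetical, chosen for illustration only.

```python
import unicodedata

def sort_key(word, backward_accents=False):
    """Simplified two-level key: base letters first, then diacritics.

    Illustration only; this is not the ISO/IEC 14651 table.
    """
    decomposed = unicodedata.normalize("NFD", word.lower())
    base = []      # level 1: base letters
    accents = []   # level 2: combining marks carried by each base letter
    for ch in decomposed:
        if unicodedata.combining(ch):
            if accents:
                accents[-1] += (ord(ch),)
        else:
            base.append(ch)
            accents.append(())
    if backward_accents:
        # Compare accents starting from the end of the word.
        accents.reverse()
    return ("".join(base), tuple(accents))

words = ["côté", "coté", "côte", "cote"]
forward = sorted(words, key=sort_key)
backward = sorted(words, key=lambda w: sort_key(w, backward_accents=True))
print(forward)   # accents compared from the start of the word
print(backward)  # accents compared from the end of the word
```

With forward comparison the first accent difference from the left decides the order, so "coté" comes before "côte"; with backward comparison the last accent difference decides, so "côte" comes before "coté".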
This is followed by three tables (spread over 30 pages) covering letters, specials, and others, given in a POSIX-compatible syntax, and in a LOTOS specification in Annex D, supplied by Greece.
Annex E provides test data for checking conformance, and Annex F provides an example (Danish) to show specific variations.
Of the two draft standards, the European standard was simpler, with only a toggle on SPACE,
Arnold Winkler agreed that the table alone was not easy to use for non-IT purposes. Alain La Bonte had been invited to the NCITS/L2 meeting (formerly ANSI X3L2) on character coding.
Its goal is to have predictable output, but the options confuse this.
ISO/IEC FCD 14651 was going for voting over April and May 1998, and ISO/IEC JTC1/SC22/WG20 would deal with the results at its next meeting at the beginning of June 1998.
Arnold Winkler considered that ISO/IEC 14651 was not so much a sorting standard - but APIs for a very specific environment, with emphasis on pre-sorting etc.
It would be useful to contact Alain La Bonte (editor of ISO/IEC 14651) with lists of sources where we do not agree at the earliest opportunity.
ISO/IEC FCD 14651 can take into account combining sequences. There are no combining sequences in the MES. Sorting the repertoire is what we require. Decomposition is what you do in sorting.
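The role of decomposition can be sketched as follows. This is an illustration only, assuming Unicode canonical decomposition (NFD) as provided by Python's standard unicodedata module; the helper name decompose is hypothetical and is not taken from either draft standard.

```python
import unicodedata

def decompose(text):
    """Split text into base letters and the names of its combining marks."""
    nfd = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in nfd if not unicodedata.combining(ch))
    marks = [unicodedata.name(ch) for ch in nfd if unicodedata.combining(ch)]
    return base, marks

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one character
combining = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT

# Both spellings reduce to the same base letter and the same mark,
# so a sort key built from the decomposed form treats them alike.
print(decompose(precomposed))
print(decompose(combining))
```

A repertoire made up only of precomposed characters, with no combining sequences, can therefore still be sorted by decomposing each character first.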
Keld Simonsen discussed IT-enablement of sorting - this was done via a POSIX specification, enhanced in ISO/IEC 14652, which provided tables to be fed into an IT implementation. For POSIX, LOTOS, and SGML formats, mappings are provided for ISO/IEC 9945 and 14652. Less time was spent on other alternative input documents:
John Clews pointed out that with ISO 12199, the possibility had been raised with the ISO/TC37/SC2 Secretariat of suspending processing of FDIS 12199. Since then the ISO/TC37/SC2 Secretariat has sent a letter to ISO Central Secretariat requesting permission to hold back ISO 12199 until its coordination with other related projects has been ensured.
Did that possibility also exist for ISO/IEC 14651, now at the FCD stage? Arnold Winkler replied that anything was possible, but that one should try and contact the editor direct, or that one should ensure that national member body votes stated what was required, and vote NO with comments, stating that resolution of these comments would transform the national body's NO vote into a YES vote.
The European draft standard could provide a list of characters in a sorted order, and ISO/IEC 14651 might be able to specify contiguous ranges of characters where the rest of the field was otherwise identical except for the character, ID or reference.
However, for special characters (in particular symbols), perhaps the European standard could reference the tables in ISO/IEC 14651, and use the rest as an example?
John Clews and Marc Küster considered that the CEN system did not have access to enough experts to reach consensus on established sorting order, particularly among the academic community and the library community, who would be the biggest users, with most experience, of existing multilingual sort orders.
The academic community has already reacted adversely to UCS (ISO/IEC 10646 and Unicode) because of some major errors in the glyphs of the Unicode version of the Greek IOTA ADSCRIPT characters, although they are correct in the equivalent tables in ISO/IEC 10646.
If there is insufficient consultation and consensus building, the same adverse reactions might be directed at ISO/IEC 14651 by its intended users. Although there was consultation with national member bodies of CEN/TC304, there was insufficient contact with academic users and library users. ISO/IEC 14651 currently provided too prescriptive a sort order without having established sufficient contact with, and feedback from, existing multilingual users.
The PT also undertook a brief run through some of the main input documents to determine which were most useful for the present work of the PT: this is not documented here.
Author: John Clews
2 March 1998
--
John Clews, SESAME Computer Projects, 8 Avenue Rd, Harrogate, HG2 7PG
tel: +44 (0) 1423 888 432