In UnicodeData 3.0.1, the two entries read:
0598;HEBREW ACCENT ZARQA;Mn;230;NSM;;;;;N;;*;;; (230=above)
05AE;HEBREW ACCENT ZINOR;Mn;228;NSM;;;;;N;;;;; (228=above left)
This has obviously been changed since Unicode 1.0, but in the wrong direction, so that the entries are now more consistently wrong.
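As a reading aid, the two records quoted above can be taken apart field by field. The following Python sketch assumes the usual semicolon-separated UnicodeData layout, in which the fourth field is the canonical combining class. The names `Record` and `parse_record` are illustrative only, and the parenthesised notes at the end of each record are annotations of this article, not part of the data file.

```python
from typing import NamedTuple

# The two records exactly as quoted in this article (annotations included).
LINES = [
    "0598;HEBREW ACCENT ZARQA;Mn;230;NSM;;;;;N;;*;;; (230=above)",
    "05AE;HEBREW ACCENT ZINOR;Mn;228;NSM;;;;;N;;;;; (228=above left)",
]

class Record(NamedTuple):
    code_point: int
    name: str
    category: str
    combining_class: int

def parse_record(line: str) -> Record:
    """Read the first four fields of a UnicodeData-style record.

    The trailing "(...)" annotation sits after the last semicolon,
    so it never reaches the fields that are used here.
    """
    fields = line.split(";")
    return Record(int(fields[0], 16), fields[1], fields[2], int(fields[3]))

records = [parse_record(line) for line in LINES]
for r in records:
    print(f"U+{r.code_point:04X} {r.name}: combining class {r.combining_class}")
```
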
All sources other than code tables, that is, various grammars of Biblical Hebrew and Breuer's book on cantillation marks (Mordekhai Broyer (Breuer): Taamey hammiqra be-21 sfarim uvesifrey eme"t; Jerusalem, TShM"B (=1981)) which I took as ultimate referee, agree on the following:
The table of cantillation marks on page 867 of volume 18 of the Hebrew Encyclopedia contains only the accents of the 21 books and therefore no Tsinorit. Hence, it is not a source that can resolve the confusion of Tsinorit with Zarqa=Tsinor. That "Tsinori" instead of "Tsinor" is given as a synonym of Zarqa adds a little to that confusion but does not contradict the above summary.
In contrast to these findings, Unicode (here following Israeli national standard SI 1311-2) makes a distinction between ZARQA and ZINOR (sic!), where ZINOR seems to play the rôle of Tsinorit, as the much more similar names suggest. This interpretation, to wit that ZINOR should have been TSINORIT, is also supported by the order of the accents: first all distinctive accents in decreasing strength, then the conjunctive accents; in each class first the accents for the 21 books (or for all books), then those for the 3 books. From this order, one sees that U+0598 was intended to be a distinctive accent of medium strength in the 21 books, which is exactly what Zarqa is. One can safely conclude that ZARQA indeed means Zarqa=Tsinor and that ZINOR means Tsinorit.
However, the glyph chart shows the two characters swapped, and the combining classes (whose impact on normalisation is minimal in this particular case) are in accordance with the glyph chart and not with the above interpretation of the character names. After Unicode 1.0, but before Unicode 3.0, a remark "=zinorit" was added to the character ZARQA, in conformance with the glyphs. Opinions are divided on whether this remark is sufficient to safely refute the initial interpretation that ZARQA means Zarqa and ZINOR means Tsinorit. In any case, both interpretations have been taken for granted by several people in the recent discussion, so it is fair to assert that a significant ambiguity remains.
Here are the criteria according to which problems are identified and possible solutions are evaluated in this article:
1. Unicode (and also SI 1311-2) has the strategy that graphically equal but semantically different characters are to be treated as the same character and not as distinct characters (Unicode 3.0, p.17), with a number of exceptions that do not apply here. This strategy is also followed with other cantillation marks: Tipeha=Tarha, Merkha=Yored, Meteg=Siluq, etc. Hence, Zarqa and Tsinor have to be treated as the same character. Whether Tsinorit is to be treated as distinct from Zarqa=Tsinor depends on whether one considers its different position relative to the base letter as a feature of the character or as a detail of the rendering process. As there are now two code points assigned, and there are glyph variants that are definitely wrong for Tsinorit but not for Zarqa=Tsinor, it would be a step backwards to unify all three.
2. Given a sequence of characters, the standard must specify unambiguously how to encode it.
However, there may be some unavoidable ambiguity when the standard specifies characters as distinct although they may have a similar appearance in some renderings. Examples of such a situation could be:
(a) It is not defined whether the encoding of an 18th century Latin-script text is required to preserve the distinction of long s (U+017F) and final s (U+0073) or whether these characters may both be represented as the unique s (U+0073) that is in use today.
(b) Analogously, it is not defined whether an occurrence of Zarqa that is positioned like a Tsinorit (i.e. on top of the base letter) may be encoded with the same code as a Tsinorit, even though they are distinct characters according to criterion 1 ("code what is written, not what is meant", in the spirit of unification).
In such cases, where absolute uniqueness of encoding cannot be achieved, it is at least required that the standard be unambiguous once the users of the standard have set up their policy on how to treat the pertinent characters. In other words: ambiguities about how to apply the standard in a given situation may be unavoidable, but there should not be ambiguities about what the standard says.
3. Even though glyphs are not normative for characters, the use of glyphs in the standard must not violate the character identity (Unicode 3.0, p.40, item D2). This principle is currently violated: if one of the two glyphs now used for ZARQA and ZINOR denotes a Zarqa and the other does not, then the latter is unambiguously not the glyph of ZARQA.
4. Names of Unicode characters must no longer be changed (policy on the Unicode WWW site, also in the standard document?).
5. Changes of properties of Unicode characters are to be kept to a minimum (Unicode 3.0, p.73).
6. It is desirable but not mandatory that the set of cantillation marks in Unicode follow Israeli standard SI 1311-2. If it does not, the remark on p.187 of Unicode 3.0 has to be modified.
7. It is desirable but not mandatory that the cantillation marks appear in the code in an order which appears to be logical, given the semantics of the marks.
When we start with criterion 1 above, we find that two different Unicode character names, ZARQA and ZINOR, denote the same character and a unique name of a character to be encoded, TSINORIT, is missing as the name of a Unicode character. Because of criterion 4, this problem cannot be fixed. Similarly, criterion 3 cannot be completely fulfilled as it is impossible to give two character names a different identity and different glyphs when in reality they are names of the same character. A solution will consist in providing enough comment for the user that, despite the unavoidable inaccuracy of character names, criterion 2 is fulfilled, and to define the character identities so that criterion 3 is not too grossly violated.
If one takes seriously the statement that Unicode defines characters, not glyphs, in particular as expressed in principle D2, then one has to change the glyph to match the character name instead of leaving both the glyph and the character name as they are: if the glyph picture of "LATIN CAPITAL LETTER A" shows a "B", then the picture is wrong; it is not the case that "LATIN CAPITAL LETTER A" is a somewhat outlandish name for a "B". In this spirit, this page originally contained a request to change one name (ZINOR->TSINORIT), and then, in order to comply with the policy not to change names, a request that the glyphs be restored to the correct order although the names cannot be straightened. The fact that the shown glyphs are already being implemented in fonts (so that they are de facto treated as normative) and that at least one name cannot be made entirely correct could be used as an argument to keep the two characters swapped as they are now in the glyph chart.
Now, there are four ways to proceed:
Place emphasis on character identity, interpret the character names in the most straightforward way (which is also the originally intended way), discard character properties and glyph charts: in other words enforce criteria 3 and 7 at the expense of criterion 5. This is solution 1 below.
Place emphasis on stability of character properties, in particular of the present glyphs and combining classes: in other words enforce criterion 5 at the expense of criteria 3 and 7. As the position of the mark relative to the base letter is not always the same, the definition of the characters can be worded so that criterion 3 is at least sometimes fulfilled for each of the characters. This is solution 2 below.
Solution 3 below is a minor variant of solution 2 in which the sometimes existing uncertainty about placement of the marks is not exploited to render the inaccurate character names more plausible. Rather, one of the character names is declared plain wrong.
The strongest way to achieve fulfilment of criterion 3 is to take one of the spurious character names out of the game by deprecating the character and to introduce the missing name as a new character. This can be done in many possible ways. One of them is presented below as solution 4.
Initially, solution 1 was suggested here as the only solution. Now, as an exact match between character names and character identity cannot be achieved anyway, solution 2 might be a fair compromise that strikes a balance between the stability of the standard and the plausibility of the definitions contained therein. I leave it to the various standards bodies to find the solution they consider most consistent with the standards' policies. My personal preference is still for solution 1 (or even the more consistent solution 4), but it is much more important that the standard become unambiguous as soon as possible, also in the context of the users (people who encode text and process encoded text) and not only in the context of the font designers.
Solution 1. Change the relevant standard text on p.387 to become:
0598 HEBREW ACCENT ZARQA
= tsinor, zinor
05AE HEBREW ACCENT ZINOR
= tsinorit, zinorit
* despite the character name, this is not the accent Tsinor (=Zinor)
but Tsinorit (=Zinorit)
-> 0598 zarqa
Modify the combining classes of the characters to become
0598;HEBREW ACCENT ZARQA;Mn;228;NSM;;;;;N;;*;;; (228=above left)
05AE;HEBREW ACCENT ZINOR;Mn;230;NSM;;;;;N;;;;; (230=above)
This modification should not have any impact on the outcome of the normalisation as there is never more than one accent of these classes applied to the same base letter.
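The claim about normalisation can be checked with a small model of canonical reordering. The sketch below is not the full normalisation algorithm: it only performs the reordering step (a stable sort of each run of marks by combining class) against a hand-written class table. The function name `canonical_order` and the two tables are illustrative assumptions; `PROPOSED` encodes the swapped classes suggested in this solution.

```python
BET = "\u05D1"     # any base letter will do
ZARQA = "\u0598"
ZINOR = "\u05AE"

CURRENT = {ZARQA: 230, ZINOR: 228}    # classes as in UnicodeData 3.0.1
PROPOSED = {ZARQA: 228, ZINOR: 230}   # classes after the suggested swap

def canonical_order(text: str, ccc: dict) -> str:
    """Stable-sort every run of marks with non-zero class by class value.

    Characters missing from the table are treated as class 0 (starters).
    """
    out, run = [], []
    for ch in text:
        if ccc.get(ch, 0) == 0:
            out.extend(sorted(run, key=lambda c: ccc[c]))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=lambda c: ccc[c]))
    return "".join(out)

# One accent per base letter: both tables give the same result.
single = BET + ZARQA
print(canonical_order(single, CURRENT) == canonical_order(single, PROPOSED))

# Both accents on one base letter: only here would the swap become visible.
both = BET + ZARQA + ZINOR
print(canonical_order(both, CURRENT) == canonical_order(both, PROPOSED))
```

The second comparison differs between the tables, which is why the argument rests on the observation that the two accents never occur together on the same base letter.
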
Correct the position of the marks in the charts to reflect this correction of combining classes. In other words: just swap the two pictures.
Solution 2. Change the relevant standard text on p.387 to become:
0598 HEBREW ACCENT ZARQA
= tsinor, zinor, tsinorit, zinorit
* must be used for the accent Tsinorit (=Zinorit)
* may be used for the accents Zarqa and Tsinor (=Zinor)
for explicitly specifying that the accent is to appear above
the letter like a Tsinorit
-> 05AE zinor
05AE HEBREW ACCENT ZINOR
= zarqa, tsinor
* should regularly be used for the accents Zarqa and Tsinor (=Zinor),
except for explicitly specifying that the accent is to appear above
the letter
-> 0598 zarqa
Solution 3. Change the relevant standard text on p.387 to become:
0598 HEBREW ACCENT ZARQA
= tsinorit, zinorit
* despite the character name, this is not the accent Zarqa but Tsinorit (=Zinorit)
-> 05AE zinor
05AE HEBREW ACCENT ZINOR
= zarqa, tsinor
-> 0598 zarqa
Solution 4. Change the relevant standard text on p.387 to become:
0598 HEBREW ACCENT ZARQA
= tsinor, zinor
05A2 HEBREW ACCENT TSINORIT
= zinorit
05AE HEBREW ACCENT ZINOR
* deprecated
-> 0598 zarqa
-> 05A2 tsinorit
Modify the combining classes of the characters to become
0598;HEBREW ACCENT ZARQA;Mn;228;NSM;;;;;N;;*;;; (228=above left)
05A2;HEBREW ACCENT TSINORIT;Mn;230;NSM;;;;;N;;;;; (230=above)
This modification (only of ZARQA) should not have any impact on the outcome of the normalisation as there is never more than one accent of these classes applied to the same base letter.
Correct the position of the marks in the charts to reflect this correction of combining classes.
In the case that a vowel is represented in vocalised text by both a vowel point and a consonant (a mater lectionis), Unicode 3.0 fails to define the order of these two characters. In nearly all cases, this order is evident from the typographical appearance which is the same as if one of the consonants had no vowel point. In the case of the combination of Holam and Vav, however, there is a need to define the intended sequence. Whatever the desired sequence of Unicode characters, it has an influence on the definition of character VAV WITH HOLAM (U+FB4B). Example:
Is the word "shalom" to be spelt as
SHIN + SHIN DOT + QAMATS LAMED + HOLAM VAV FINAL MEM
or as
SHIN + SHIN DOT + QAMATS LAMED VAV + HOLAM FINAL MEM
and is
SHIN WITH SHIN DOT + QAMATS LAMED VAV WITH HOLAM FINAL MEM
equivalent?
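The three candidate spellings of "shalom" can be compared mechanically. The following Python sketch uses the `unicodedata` module, whose character data is that of the Unicode version bundled with the interpreter and therefore newer than Unicode 3.0; it is assumed here that the decompositions of U+FB2A and U+FB4B are the same as in 3.0. The variable names are illustrative.

```python
import unicodedata

SHIN, SHIN_DOT, QAMATS = "\u05E9", "\u05C1", "\u05B8"
LAMED, HOLAM, VAV, FINAL_MEM = "\u05DC", "\u05B9", "\u05D5", "\u05DD"
SHIN_WITH_SHIN_DOT, VAV_WITH_HOLAM = "\uFB2A", "\uFB4B"

# First spelling: the holam is applied to the lamed, the vav stays bare.
holam_on_lamed = SHIN + SHIN_DOT + QAMATS + LAMED + HOLAM + VAV + FINAL_MEM
# Second spelling: the holam is applied to the vav.
holam_on_vav = SHIN + SHIN_DOT + QAMATS + LAMED + VAV + HOLAM + FINAL_MEM
# Third spelling: precomposed presentation forms.
precomposed = SHIN_WITH_SHIN_DOT + QAMATS + LAMED + VAV_WITH_HOLAM + FINAL_MEM

def nfd(text: str) -> str:
    """Canonical decomposition, including reordering of the points."""
    return unicodedata.normalize("NFD", text)

# How the interpreter's data defines VAV WITH HOLAM (code points in hex).
print(unicodedata.decomposition(VAV_WITH_HOLAM))

print(nfd(precomposed) == nfd(holam_on_vav))     # same canonical form?
print(nfd(precomposed) == nfd(holam_on_lamed))   # same canonical form?
```

With the decomposition VAV + HOLAM, the precomposed spelling is canonically equivalent to the second spelling only. Nothing in the normalisation data relates the first spelling to the other two.
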
A good place to insert such additional clarification into the Unicode standard is the paragraph near the end of p.186 which begins with "Vowels". It could be enhanced with the following explanation appended. In its wording, the same strategy was followed as with other scripts explained in chapters 7 to 11 of the standard: the principles of the script are explained in a bit more detail than is needed for readers that are already acquainted with the script:
These vowel points are used in liturgical texts including the Bible, in poems, in dictionaries, and whenever the exact vocalisation must be uniquely specified. In most other texts, they are omitted. Independently of the presence of vowel points, vowels are frequently represented by the letters U+05D0 HEBREW LETTER ALEF, U+05D5 HEBREW LETTER VAV, U+05D9 HEBREW LETTER YOD, and, restricted to the end of a word, U+05D4 HEBREW LETTER HE. When vowel points are present, they do not only denote vowels that are not represented by one of these letters, but they also determine which occurrences of these letters serve as substitutes for vowels, and if so, for which vowels. A vowel may thus be represented by both a letter and a vowel point. In case of the vowel shuruq, i.e. the vowel /u/ in an open or a word-final syllable, the vowel point U+05BC HEBREW POINT DAGESH OR MAPIQ is applied to the vav which acts as a substitute for the vowel. In all other cases the vowel point is applied to the preceding consonant, and the letter representing the vowel remains without vowel point.
If this interpretation is not wanted, the text has to be modified to read:
These vowel points [...] both a letter and a vowel point. If the letter is vav, the vowel point (either U+05B9 HEBREW POINT HOLAM or U+05BC HEBREW POINT DAGESH OR MAPIQ) is applied to the vav. In all other cases the vowel point is applied to the preceding consonant, and the letter representing the vowel remains without vowel point.
The first of these alternatives requires that the definition of VAV WITH HOLAM (U+FB4B) be changed to HOLAM + VAV; otherwise the character is of no use.
On the other hand, the second of the alternatives has the consequence that HOLAM may be applied to the last character of a word so that, theoretically, it interferes typographically with an accent that is positioned left above the word like a Zarqa (whatever the Unicode name of that accent will become). Practically, it does not interfere, as such a Holam is written on top of the Vav and thus to the right of the accent, as the combining class also specifies.
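The interplay of a Holam on a final Vav with an accent on the same letter can be illustrated with the combining classes known to Python's `unicodedata` module (again the interpreter's bundled Unicode version, not 3.0). The choice of U+05AE as the accent placed above left is an assumption that follows the current class values, not a statement about which name that accent should finally carry.

```python
import unicodedata

VAV, HOLAM = "\u05D5", "\u05B9"
ACCENT_ABOVE_LEFT = "\u05AE"   # class 228 in the data; the name is disputed above

# Combining classes as recorded in the interpreter's Unicode data.
holam_class = unicodedata.combining(HOLAM)
accent_class = unicodedata.combining(ACCENT_ABOVE_LEFT)
print(holam_class, accent_class)

# Typing the accent before the vowel point does not matter:
# canonical reordering sorts the point (lower class) before the accent.
typed = VAV + ACCENT_ABOVE_LEFT + HOLAM
normalised = unicodedata.normalize("NFD", typed)
print([f"U+{ord(c):04X}" for c in normalised])
```

Because the two marks have different non-zero classes, both input orders normalise to the same sequence, so the encoding of such a word-final Vav is unambiguous after normalisation.
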
The pictures in the glyph chart are unclear for the following accents. Here is what they should look like:
The accent SEGOL (U+0592) consists of three dots that are close together and not scattered around the top of the base character. Take the picture of the point SEGOL (U+05B6), rotate the whole picture by 180°, and shift the accent into an upper left position like U+0599.
SHALSHELET (U+0593) consists of three complete < signs touching each other, not of two and a half.
DARGA (U+05A7) has sharp corners; it looks like a mirrored Z, not like an S.