Documentation of mc2xbm

by Helmut Richter


Purpose

mc2xbm translates Hebrew text from Michigan-Claremont encoding to a pixel file in xbm format (black and white only). Further processing, in particular the introduction of colour or the conversion to other formats like gif or jpg, must be done by means of appropriate graphics software.
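
As an illustration of what such further processing starts from, here is a minimal Python sketch that decodes the text of an xbm file into rows of pixels. It relies only on the general xbm conventions (a C-style header with width and height, hexadecimal bytes, rows padded to whole bytes, least significant bit leftmost); the function name parse_xbm is hypothetical and not part of mc2xbm.

```python
import re


def parse_xbm(source: str):
    """Decode XBM text into (width, height, rows); rows hold 0/1 pixels."""
    width = int(re.search(r"#define\s+\w*width\s+(\d+)", source).group(1))
    height = int(re.search(r"#define\s+\w*height\s+(\d+)", source).group(1))

    # The pixel data is the list of hexadecimal bytes between the braces.
    body = source[source.index("{") + 1 : source.index("}")]
    data = [int(h, 16) for h in re.findall(r"0[xX]([0-9a-fA-F]+)", body)]

    bytes_per_row = (width + 7) // 8  # each row is padded to a whole byte
    rows = []
    for y in range(height):
        row_bytes = data[y * bytes_per_row : (y + 1) * bytes_per_row]
        # Within each byte the least significant bit is the leftmost pixel.
        rows.append([(row_bytes[x // 8] >> (x % 8)) & 1 for x in range(width)])
    return width, height, rows
```

A pixel value of 1 denotes a set (foreground) bit. The returned height could be used, for instance, to check the line heights stated for the -e and -i options below.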

The program is copyrighted but is available free of charge for non-commercial purposes. Write to the author to obtain a copy.

Warning: Due to the continuing proliferation of books, extensive studying may be hazardous to your health.
(The Bible: Ecclesiastes 12:12)


Command line options

-e text

Input text; overrides -i option. The text is read from the command line; only a single line of output is generated (19 pixels high).

-i file

Input file; may be overridden by -e option. The text is read from a file; one or more lines of output are generated (height is 20 pixels/line). If neither the -e nor the -i option is specified, the input is read from the standard input.

-o file

Output file (specify "-" to denote standard output). Must be specified, otherwise no output is produced. This option is ignored when the -d option is specified.

-d directory

Search the directory for a GIF image file which contains the desired pixel file. If none is found, produce one. Return the file name on the standard output in the format of an HTML image tag. This option requires that the convert command for the conversion of graphic file formats is installed on the system. This option is only suited for short texts that take up less space than a line.

-l file

Output file for word list. The pixel positions of all words are listed.

Currently, this option is not implemented.

-v n

Vocalisation level:

-v 0: No vocalisation marks are output.
-v 1: Only the following vocalisation marks are output:
  Sin dot, Cholam, Shuruq, Dagesh (only in Bet, Gimel, Dalet, Kaf, Pe, and Taw)
-v 2: All vocalisation marks are output.

Currently, this option is not implemented; v=2 is used.

-c n

Cantillation level:

-c 0: No cantillation marks are output.
-c 1: Only the following cantillation marks are output:
  Maqqef, Sof pasuq
-c 2: All cantillation marks are output.

Currently, this option is not implemented; c=1 is used.

-t file

Test output file; receives the input tokens that were processed.
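
The interplay of the input and output options described above can be summarised in a small Python sketch that assembles an argument list. It assumes the program is installed under the name mc2xbm and is found on the search path; the helper build_argv and its parameter names are hypothetical and only mirror the documented options.

```python
def build_argv(text=None, infile=None, outfile=None, directory=None, test_file=None):
    """Assemble an mc2xbm argument list from the documented options."""
    argv = ["mc2xbm"]  # assumption: the executable is on the search path

    if text is not None:
        argv += ["-e", text]        # -e overrides -i, so -i is not passed
    elif infile is not None:
        argv += ["-i", infile]
    # With neither -e nor -i, mc2xbm reads the standard input.

    if directory is not None:
        argv += ["-d", directory]   # -o is ignored when -d is specified
    elif outfile is not None:
        argv += ["-o", outfile]     # "-" denotes the standard output
    else:
        raise ValueError("without -o or -d no output is produced")

    if test_file is not None:
        argv += ["-t", test_file]
    return argv
```

The resulting list could then be passed to subprocess.run, e.g. subprocess.run(build_argv(text="M''{M}", outfile="-"), check=True). The options -l, -v and -c are left out because they are documented as not implemented.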


Encoding of Hebrew characters

The input is processed in one of the following modes:

Normal mode

This is the initial mode. Hebrew characters and marks are processed according to the Michigan-Claremont encoding.

Strict mode

The processor switches to strict mode upon a "{" and switches back to normal mode upon a "}".

In strict mode, the typographic appearance of the letters is denoted in more detail. Basically, the Michigan-Claremont encoding is employed, but with the following modifications:

Strict mode is only needed if the way something is written deviates from the standard, e.g. a non-final Mem as last letter in an abbreviation like "M''{M}".

Latin mode

The processor switches to Latin mode upon a "[" and switches back to the previous mode upon an occurrence of a "]" that is not immediately followed by another "]". Latin mode may not extend over more than one line.

Example: "M[[X]]N" is interpreted as
  M (normal mode)
  [ (Latin mode)
  X (Latin mode)
  ] (Latin mode)
  N (normal mode).
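
The bracket rule can be sketched in Python as follows. This is only an illustration of the rule as stated here, not the actual code of mc2xbm: the function split_latin is hypothetical, it handles a single line, it treats normal mode as the previous mode, and it assumes that an unclosed "[" is an error.

```python
def split_latin(line: str):
    """Return (character, mode) pairs for one line; only normal and Latin mode."""
    out = []
    latin = False
    for i, ch in enumerate(line):
        if not latin:
            if ch == "[":
                latin = True               # the switching "[" is not output
            else:
                out.append((ch, "normal"))
        elif ch == "]" and line[i + 1 : i + 2] != "]":
            latin = False                  # "]" not followed by "]" ends Latin mode
        else:
            out.append((ch, "latin"))      # includes "[" and a "]" followed by "]"
    if latin:
        # Assumption: Latin mode may not extend over a line, so report it.
        raise ValueError("Latin mode not closed at end of line")
    return out
```

For the input "M[[X]]N" this yields the five pairs listed in the example above.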

Text in Latin mode represents text in Latin script according to the Latin1 character set (ISO 8859-1). The existence of this mode does not imply that all characters of the Latin1 character set are supported (in fact, very few are); it is only a means to denote such characters if they are supported. Currently, the following characters are supported:

Comment mode

The processor switches to comment mode upon a "~" and switches back to the previous mode upon the next newline character. Text is not processed while in comment mode.
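
A minimal Python sketch of this rule, useful for instance to preview which part of an input file is actually processed. The function strip_comments is hypothetical; it assumes that "~" starts a comment regardless of the current mode and that the terminating newline itself is kept.

```python
def strip_comments(text: str) -> str:
    """Drop everything from "~" up to, but not including, the next newline."""
    kept = []
    in_comment = False
    for ch in text:
        if in_comment:
            if ch == "\n":
                in_comment = False
                kept.append(ch)    # assumption: the newline itself is kept
        elif ch == "~":
            in_comment = True      # text is ignored until the next newline
        else:
            kept.append(ch)
    return "".join(kept)
```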


© Helmut Richter      published here 1998-03-02; last update 1998-12-29      http://www.lrz.de/~hr/lang/mc2xbm.html