GML Tag Notes
From Open Watcom
Introduction
This page is intended to accumulate notes on how wgml 4.0 uses the various tags. The information presented here should be used to update the WGML Reference so that it actually describes wgml 4.0.
Since I have been working with those tags used with the device library, the entries will include and may well be dominated by those tags. It might be wondered how the information here differs from that presented in discussing the device library.
The pages which deal with the device library are concerned primarily with where the tags appear in the source file and how they are encoded in the binary file. This page is intended to discuss how those tags (or, more correctly, the blocks defined by these tags) are used by wgml 4.0.
:BINCLUDE and :GRAPHIC
These notes are being accumulated prior to implementing these tags.
These tags have several interesting characteristics.
For example, they are very hard to test. When tested with the PS device, the file will not display, because the content of the specified input file is output as-is (at least, it is when :BINCLUDE is given a record type), and that content for files in general does not consist of valid PS language statements. When used with TASA, it becomes clear that :GRAPHIC, used with a non-PS text file, doesn't output any text at all, but just leaves the space where the content is to appear.
Binary Data
The term binary data generally means that any byte value may appear in the file. In particular, it means that byte values below 0x20 may appear in the file, including several which can cause problems if the data is processed as text. When UTF-8 is considered, the escape characters for multi-byte characters at the upper end of possible byte values can also be problematic.
Some binary files encode the binary data so that it can be processed as text. The phrase true binary data will be used to exclude such data, since it does not pose the same problems.
A .BMP file was used as a sample of true binary data. Although it can not be guaranteed that absolutely every byte value appeared in it, it was possible to verify that many values below 0x20 did (including 0x0d and 0x0a). It should be noted that the WGML Reference does not provide any reason to believe that wgml 4.0 can actually process BMP files for inclusion in a document.
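The check described above, confirming that byte values below 0x20 really do occur in a sample file, can be sketched in a few lines of Python. This is only an illustration: the function name `low_byte_values` and the sample bytes are hypothetical, and the sample merely stands in for part of a file containing true binary data (it is not taken from the .BMP file used in testing).

```python
def low_byte_values(data: bytes) -> set[int]:
    """Return the distinct byte values below 0x20 that occur in data."""
    return {b for b in data if b < 0x20}


# Hypothetical sample: a few printable bytes mixed with 0x0d, 0x0a and 0x00.
sample = b"BM" + bytes([0x36, 0x0D, 0x0A, 0x00]) + b"pixels"

found = low_byte_values(sample)
print(sorted(hex(b) for b in found))
print("contains CR and LF:", {0x0D, 0x0A} <= found)
```

For a real file, the bytes would be read with `open(path, "rb").read()` and passed to the same function.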
When tag :BINCLUDE, with no record type, was used as the first item producing output in a document specification, these effects were observed:
- The first line contained one space character, which indicates "newline 1" and a blank line.
- The second line started with +, which indicates an overprinted line, plus ten spaces, which is the correct number of spaces for a one-inch margin.
- The data then followed, broken up into 80-byte groups.
- A CR/LF (0x0d 0x0a) sequence was inserted after each 80-byte group.
- A CR/LF (0x0d 0x0a) sequence was inserted after the end of the data in the file.
- This was followed by the next (first) text line, which was also indented by one inch.
Since, if I am interpreting the BMP format correctly, the first block splits a Palette entry between the second and third byte, it seems reasonable to conclude that this behavior can be very disruptive to any file containing true binary data.
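The grouping behavior observed above can be modelled with a short Python sketch. It is a model of the observations only, not of how wgml 4.0 is actually implemented: the function name `emit_without_record_type` is hypothetical, the leading blank line and margin are omitted, and what happens when the file length is an exact multiple of 80 was not tested, so the sketch simply assumes no extra CR/LF in that case.

```python
CRLF = b"\r\n"


def emit_without_record_type(data: bytes, group_size: int = 80) -> bytes:
    """Break data into fixed-size groups, appending CR/LF after each group.

    The CR/LF after the final (possibly short) group doubles as the CR/LF
    observed after the end of the data. This is an assumption.
    """
    out = bytearray()
    for start in range(0, len(data), group_size):
        out += data[start:start + group_size]
        out += CRLF
    return bytes(out)


# 256 bytes of "true binary data": every possible byte value once.
data = bytes(range(256))
emitted = emit_without_record_type(data)
print(len(data), "bytes in,", len(emitted), "bytes out")
```

The point of the sketch is that the inserted CR/LF pairs land wherever the 80-byte boundary happens to fall, without regard to the structure of the data.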
When tag :BINCLUDE, with a record type of "(t:)", was used as the first item producing output in a document specification, these effects were observed:
- The first line contained one space character, which indicates "newline 1" and a blank line.
- The second line started with +, which indicates an overprinted line, plus ten spaces, which is the correct number of spaces for a one-inch margin.
- The data then followed, up to (in this case) the first 0x0a character.
- Every 0x0a is replaced by a CR/LF (0x0d 0x0a) sequence.
- This was followed by the next (first) text line, which was also indented by one inch.
Replacing every 0x0a character with "0x0d 0x0a" is also unlikely to promote proper use of the data.
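Why this replacement damages true binary data can be seen in a minimal Python sketch. The function name `emit_with_text_record_type` is hypothetical and the sketch models only the observed substitution, nothing else that wgml 4.0 may do with a record type.

```python
def emit_with_text_record_type(data: bytes) -> bytes:
    """Replace every 0x0a byte with the two-byte sequence 0x0d 0x0a."""
    return data.replace(b"\n", b"\r\n")


# Hypothetical binary fragment containing an existing 0x0d 0x0a pair
# and a lone 0x0a.
original = bytes([0x42, 0x0D, 0x0A, 0x00, 0x0A, 0x4D])
converted = emit_with_text_record_type(original)

print(original.hex(" "))
print(converted.hex(" "))
```

Both 0x0a bytes gain a preceding 0x0d, so the existing 0x0d 0x0a pair becomes 0x0d 0x0d 0x0a and every later byte is shifted from its original offset.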
There are a few additional observations to make:
- If device TERM is used, then output halts and this message is produced:
IO--004: System message is 'No space left on device'
Error number is 12
Output operation failed
- If device PS is used, then output halts and this message is produced:
IO--011: Output file's record size is too small for
the device 'ps'
- The remaining devices tested (mostly TASA, but also HELP and WHELP as a check) reported no such problems, yet each of them has an output file record size which is certainly too small for the amount of data produced, especially when a record type was specified (81,383 bytes from the start of the second line to the end at the first 0x0a byte).
But perhaps :BINCLUDE is not intended for use with files containing true binary data; that is, perhaps it is intended for use only with files whose content can be treated as text.
When tag :GRAPHIC was used as the first item producing output in a document specification, these effects were observed:
- For character devices (TASA, TERM, HELP, WHELP) the space occupied by the image is reserved, but the image data does not appear in any form (this is documented behavior with devices not supported by :GRAPHIC).
- If device PS is used, then output halts and this message is produced:
IO--011: Output file's record size is too small for
the device 'ps'
- Use or omission of a record type had no effect.
The WGML Reference has this to say about tag :GRAPHIC:
If the image file is not a PostScript graphic, a special validity check is performed on the file to determine if it is a WATCOM GKS PXA image file. If it is not a PXA file, it is assumed to be a PostScript graphic file. PXA files are supported with PostScript, HP LaserJet Plus, and IBM PC Graphic printers, although grey scales are only supported with a PostScript device.
This suggests that :GRAPHIC, at one time, did process a graphics file, although whether the PXA file contained true binary data or not is unknown.
Some information on GKS is available: the file DOCS\DOC\FG\fgkslib.old. It was (essentially) a 16-bit DOS TSR ("Graphics Kernal System") which (as other programs did at the time) was oriented toward displaying graphics on as many graphics adapters (Hercules, Tandy, IBM PC, whatever) as possible, and which included an internal device, the Pixel Array, which produced PXA files. Since the same function that produced PXA files also produced metafiles (MET), it is likely that PXA was a true binary format. This one file (DOCS\DOC\FG\fgkslib.old) is the only trace found in the repository: neither GKS, nor source for GKS, nor any PXA files can be found. It is, of course, possible that the DOS graphics package was developed from GKS.
Since :BINCLUDE and :GRAPHIC are only used with device PS in the Open Watcom Document Build System, and searching the repository shows that they are invariably used with PS (or EPS) files, examining those files may be useful. There are two sets, both of which contain files which are definitely used:
- Those created by bmp2eps and stored in DOCS\PS\TMP.
- Those stored in DOCS\DOC\GML, created by various programs.
- Those created by CSG Graphics Screen Capture and stored in DOCS\DOC\LR\GP.
The second group consists of:
- light2.eps, created by CSG Bitmap to EPS Converter
- ltning.eps, created by FreeHand 7.0
- owlogo.eps, created by Corel Draw
- pwrs.eps, named "UNTITLED.CDR from CorelDRAW!"
(to be continued, need to verify if any second group files are used other than ltning.eps & see what sort of data they contain)
:BINCLUDE
The attributes for :BINCLUDE are all required. They are used in this way:
- attribute file provides the name of the input file to process;
- attribute depth provides the vertical space the contents of the input file will occupy when those contents are processed; and
- attribute reposition advises wgml whether or not it needs to reposition the print position.
Note that attribute reposition is device-specific, which is an unusual feature for a tag.
When the input file to :BINCLUDE does not have a record type, the effect is the same as if the output record type was "(t:46)" instead of whatever it actually is. The WGML Reference describes this effect in these words:
The required attribute file specifies the name of the file to include. The value of the attribute is a character string, and may be any valid file name. The input file is processed as containing binary data. If the input is text data, a record type such as "(t:80)" must be prefixed to the file name.
How, exactly, "processed as containing binary data" equates to pretending that the output file has record type "(t:46)" is unclear. ***this needs to be verified, see results above with .BMP file***
If the record type used is "(t:0)", then the following error appears:
IO--001: For file '(t:0)julia256.bmp'
System message is 'No such file or directory'
Cannot open file
That is, the record type is treated as part of the file name instead of being detached and (in the case shown) "julia256.bmp" being passed to the operating system as the file to open.
These forms of the record type all appear to work identically (research continues):
- (t)
- (t:)
- (t:1)
- (t:120) or any other positive integer of reasonable size
There is no indication that the input records are truncated, no matter how long they may appear to be.
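The observations above about which record type forms are detached from the file name can be summarized in a small Python sketch. This is a model of the observed behavior only: the function name `split_record_type` and the regular expression are my own, and nothing here is taken from the wgml 4.0 parser, which may well accept or reject forms that were not tested.

```python
import re
from typing import Optional, Tuple

# Matches "(t)", "(t:)" and "(t:N)" where N is a positive integer.
# "(t:0)" is deliberately not matched, mirroring the observed error.
_RECORD_TYPE = re.compile(r"\(t(?::(?:[1-9][0-9]*)?)?\)")


def split_record_type(file_attr: str) -> Tuple[Optional[str], str]:
    """Split a :BINCLUDE file value into (record type, file name).

    If no accepted record type prefix is found, the whole value is
    treated as the file name.
    """
    match = _RECORD_TYPE.match(file_attr)
    if match is None:
        return None, file_attr
    return match.group(0), file_attr[match.end():]


for value in ("(t)rule.eps", "(t:)rule.eps", "(t:120)rule.eps",
              "(t:0)julia256.bmp", "julia256.bmp"):
    print(value, "->", split_record_type(value))
```

In this model "(t:0)julia256.bmp" is returned whole as the file name, which corresponds to the "No such file or directory" error shown above.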
Testing with TASA clarified the role of "start" and "end" as values for the :BINCLUDE attribute reposition:
- When the value "start" is given, :BINCLUDE emits blank lines equal to the value of attribute depth after the content of the input file.
- When the value "end" is given, :BINCLUDE only emits the content of the input file.
Of course, with a device like TASA, the correct value will be "end", since the data itself will cause the print head to move down the page. If, as seems likely, the current vertical position is updated by the value of attribute depth, then this value must be correct or the actual position of the print head and the position used internally will differ. With PS, the value is always "start", which should result at most in an :ABSOLUTEADDRESS invocation to reset the print head to the proper position. In this case, an incorrect value for depth will have visible effects on the output: missing lines of the image, or undesired blank space at the top. At least, that is how the situation appears at the moment; testing continues.
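The difference between the two reposition values, as observed with TASA, can be sketched in Python. The function name `tasa_output_lines` is hypothetical, and the sketch assumes that depth has already been converted to a whole number of lines; it models what appeared in the output file, not how wgml 4.0 tracks the vertical position internally.

```python
def tasa_output_lines(content: list[str], depth_lines: int,
                      reposition: str) -> list[str]:
    """Model the lines emitted by :BINCLUDE on a character device."""
    if reposition not in ("start", "end"):
        raise ValueError("reposition must be 'start' or 'end'")
    lines = list(content)
    if reposition == "start":
        # Observed: blank lines equal to depth follow the file content.
        lines += [""] * depth_lines
    return lines


included = ["line one of the file", "line two of the file"]
print(tasa_output_lines(included, 2, "end"))
print(tasa_output_lines(included, 2, "start"))
```

With "end" the two content lines alone move the print head down; with "start" the same content is followed by two further blank lines, which is why "end" appears to be the correct value for a device like TASA.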
Tag :BINCLUDE occurs inside several macros (chap, preface, h1); it is only used with various "rule" files, which print the lines at the top of the first page of a major section and, since those lines appear, it is most likely being used in the Open Watcom documents. One peculiarity was noted: rule6x8.eps does not have a newline at the end of its second line, and that line does not appear in the output file. At least, that is true of the files in docs\doc\gml. The version of rule.eps in docs\gml\help fails in a different way: the last operator ("restore") is joined with the next token ("1000", that is, a one-inch margin) and the result ("restore1000") is not recognized by the Ghostscript parser. This can be fixed by placing a space after "restore" on the last line. However, this is not generally true: the last lines in the files in docs\doc\gml do not have trailing spaces, and yet no such problems occur. It seems unlikely that this is the intended behavior of :BINCLUDE.
Tag :BINCLUDE can occur after the start of a line, that is, it can be preceded by document text. (testing continues)
:GRAPHIC
Tag :GRAPHIC occurs both inside macros and explicitly in some files.
Tag :GRAPHIC can occur after the start of a line, that is, it can be preceded by document text. (testing continues)
Implementation
At this time, this section contains the remaining pre-implementation notes. Eventually it will discuss the implementation.
At this point, several puzzles need to be solved:
- Although the PS manual suggests that the data can be pure binary data, the actual eps files are either text (albeit with the character codes expressed as two-hex-digits) or something called ASCII85. It is, thus, not clear that PS can actually handle, or wgml 4.0 actually process, binary data as such.
- The use of :BINCLUDE and :GRAPHIC on the same file produces radically different results, apparently because :GRAPHIC encapsulates the image and controls its size and position while :BINCLUDE does not. :BINCLUDE does, however, produce a PS file that suppresses all further images and/or text from actually appearing.
- The behavior of :GRAPHIC in mid-line must be seen to be believed. As noted for aspects of how :BINCLUDE behaves, it is hard to believe that this is the intended behavior.
... to be continued
:FONTPAUSE
The WGML Reference states in section 15.10.3.3 FONTPAUSE Attribute:
The fontpause attribute specifies a character value which is the font pausing method to be used when switching into the font.
and in section 15.10.5 FONTPAUSE Block:
In some cases, the font switch may require physical intervention at the output device by the operator. Examples of such an intervention would be changing a print wheel or color ribbon.
This section uses terminology discussed here to describe the various blocks.
With my test device and test driver files I used ten test font files. For each of these fonts, a separate :FONTPAUSE was defined. Since the fonts were numbered from "01" to "10", the :FONTPAUSE blocks were named (that is, had for their value of attribute type) "pause01" through "pause10". These fonts were used in specific contexts:
- "01" through "06" were used with :DEFAULTFONT blocks 0 through 5 (and so paired with font styles "plain", "bold", "uline", "uscore", "ulbold", and "usbold" respectively).
- "07" and "08" were used with the :BOX and :UNDERSCORE blocks, respectively.
- "09" and "10" were reserved for use with the command-line FONT option.
Each :FONTPAUSE block was configured to identify itself when interpreted by wgml 4.0. The "pause02" block was also used to contain the function sequences being tested; a "FONT" line in default.opt was used to vary the font used with the :DEFAULTFONT 0 (and so interpreted first) between font "01" and font "02", which aided in testing.
The results reported here can be expanded to show which :FONTPAUSE corresponded to which instance:
The output for the two situations with minimal :FONTPAUSE blocks was:
instance   "pause01" is first   "pause02" is first
1          pause01              pause02
2          pause04              pause04
3          pause08              pause08
4          pause04              pause04
5          pause08              pause08
6          pause04              pause04
7          pause01              pause02
8          pause02              pause02
9          pause01              pause02
10         pause02              pause02
11         pause01              pause02
The variation between "pause04" and "pause08" is the result of using the corresponding :DEVICEFONTs for the :FONTSTYLE with the value "uscore" for its attribute type ("pause04") and for the value of attribute font in the :UNDERSCORE block ("pause08").
As discussed here, the only change occurs when both fonts (those using "pause01" and "pause02" in the first column and those using "pause02" in the second column) use the style "plain": the last two "pause02" lines disappear, a result of the fact that font style "plain" only requires one line pass while font style "bold" requires two. This, of course, means that when a :FONTPAUSE is interpreted depends not only on the :DEVICEFONT it is associated with and the font switching process but also on the font style it is associated with in the :DEFAULTFONT block, making the description quoted above not quite complete. Of course, the font style does this by requiring multiple line passes, which in turn require additional font switches, so the description is correct as far as it goes.
The discussion here also notes that a :FONTPAUSE will be interpreted, in some instances, even when the :FONTSWITCH blocks are not. One of those situations, as might be expected, is that the fonts being switched are the same font. The problem is that they can be associated with different font styles.
Consider the example given above of manually changing the ribbon, so that, for example, :FONTSTYLE "bold" prints text in red while :FONTSTYLE "plain" does not. Associating the same :FONTPAUSE with both :FONTSTYLE instances is then going to cause problems: the operator will not be able to tell whether to change the ribbon or not.
The only tool available to distinguish between the two :FONTSTYLE instances is the device function %font_number(). Unfortunately, the command line option FONT can remap both the font and the font style assigned to a given :DEFAULTFONT and so to a particular %font_number(). What is really needed is a %font_style() function, but none exists.
The net effect is that, if a :FONTPAUSE is needed, it may be a very bad idea to use the corresponding font (that is, :DEVICEFONT, which maps the font name to the font pause) with more than one :FONTSTYLE or more than one :DEFAULTFONT (which maps the :DEVICEFONT to a :FONTSTYLE), depending on just what the :FONTPAUSE is intended to accomplish.
Implementation Notes
The :FONTPAUSE block occupies a very odd position: there is no need to implement it at all, since it is not used in any :DEVICE block known to me; and yet it is so useful in analysing the use of the :FONTSWITCH and :FONTSTYLE blocks that, inevitably, its implementation in wgml 4.0 is also made quite clear:
The :FONTPAUSE block is interpreted whenever a font switch is called for. When the font switch occurs, then the :FONTPAUSE is interpreted after the :ENDVALUE block of the font being switched from (if any) and before the :STARTVALUE block of the font being switched to; even if the font switch does not actually occur, the :FONTPAUSE block is still interpreted.
The situations in which a font switch does not actually occur when called for are discussed here.
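The ordering just described can be written down as a small Python sketch. It is a restatement of the observations, not wgml 4.0 code: the function name `font_switch_blocks` and the tuple labels are hypothetical, and the ENDVALUE and STARTVALUE entries are assumed to be the sub-blocks of the :FONTSWITCH blocks belonging to the two fonts.

```python
from typing import List, Optional, Tuple


def font_switch_blocks(old_font: Optional[str], new_font: str,
                       switch_occurs: bool) -> List[Tuple[str, str]]:
    """List the blocks interpreted when a font switch is called for."""
    blocks = []
    if switch_occurs and old_font is not None:
        # :ENDVALUE of the font being switched from, if there is one.
        blocks.append(("ENDVALUE", old_font))
    # The :FONTPAUSE is interpreted whether or not the switch occurs.
    blocks.append(("FONTPAUSE", new_font))
    if switch_occurs:
        # :STARTVALUE of the font being switched to.
        blocks.append(("STARTVALUE", new_font))
    return blocks


print(font_switch_blocks("01", "02", switch_occurs=True))
print(font_switch_blocks("02", "02", switch_occurs=False))
```

The second call shows the case that causes trouble for an operator: the :FONTPAUSE appears alone, with nothing else in the sequence to indicate whether anything has changed.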
:FONTSTYLE
This block is not documented in the WGML Reference. As a result, a detailed examination of how it and each of its sub-blocks is used is unavoidable. This may take some time to assemble and organize properly.
The :STARTVALUE Block
This section discusses the :FONTSTYLE block :STARTVALUE block. The :LINEPROC block also has a :STARTVALUE block; it is discussed in its own section.
There is reason to believe that the :FONTSTYLE block :STARTVALUE and :ENDVALUE blocks are not intended to be used as an ON/OFF pair: when an extremely simple test file, one containing nothing but text organized into paragraphs with the :P. tag (that is, no header, no title, no TOC, no index, no footers, no markup), was processed and examined, the :FONTSTYLE block :ENDVALUE block never appeared. The :FONTSTYLE block :STARTVALUE block, on the other hand, appeared at the start of each text line.
The :FONTSTYLE block :STARTVALUE block is always followed immediately by the :LINEPROC block :STARTVALUE block. The :LINEPROC block :STARTVALUE block is usually preceded immediately by the :FONTSTYLE block :STARTVALUE block; the exceptions are discussed here.
The :FONTSTYLE block :STARTVALUE block appears in these contexts:
- As part of the action of device function %enterfont().
- As part of the normal font switch sequence (but not of the alternate font switch, used with device functions %ulineon()/%ulineoff()).
- As part of the first line pass font style application sequence, when a font switch is not required.
- As part of the subsequent line pass font style application sequence, when a font switch is not required.
- As part of the alternate font style application sequence, when a font switch is not required.
The :ENDVALUE Block
This section discusses the :FONTSTYLE block :ENDVALUE block. The :LINEPROC block also has an :ENDVALUE block; it is discussed in its own section.
The :ENDVALUE block occurs in these contexts:
- During a font switch.
- During a subsequent line pass, under certain conditions.
It does not appear in this context:
- The current font style is the last (or only) font style used in the text_line.
It is, of course, this fact that prevents the :FONTSTYLE block :STARTVALUE and :ENDVALUE blocks from being used as an ON/OFF switch.
The context in which it is interpreted differs in the two cases listed above:
- During a font switch, the :ENDVALUE block of the font style associated with the font being switched from is interpreted in the context of the font being switched to, with which it may or may not be associated (nothing prevents two :DEFAULTFONT blocks from associating the same font style with two different fonts).
- During a subsequent line pass, the :ENDVALUE block is interpreted outside of a font switch, and then it is interpreted in the context of the font it is associated with.
This suggests that the :ENDVALUE block should not depend on any of the device functions which return values associated with the current font.
Usage Notes
It is reasonably clear from the two prior sections that the :FONTSTYLE block :STARTVALUE and :ENDVALUE blocks are not, in fact, used by wgml 4.0 as an ON/OFF switch.
On the other hand, the :DRIVER block in HELPDRV.PCD in the Open Watcom repository does this in font style "bold":
- the :STARTVALUE block emits "0x1b" followed by "b", and
- the :ENDVALUE block emits "0x1b" followed by "p"
which certainly looks like an ON/OFF switch switching the style to "bold" and back to "plain".
This can only work if the targeted device does not require that the :FONTSTYLE block :STARTVALUE and :ENDVALUE blocks be used as an ON/OFF switch to function properly. Possible examples of how a device might do this are:
- the device resets itself to its default state at the end of each line; or
- the device has no memory: it can process these codes repeatedly and the effect is exactly the same as if it processed them once.
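The second possibility can be illustrated with a toy device model in Python. Everything here is an assumption made for illustration: the function name `render_styles` is hypothetical, and the meanings given to the two escape sequences (0x1b "b" selects bold, 0x1b "p" selects plain) are only what the HELPDRV.PCD codes appear to do, not documented behavior of any real device.

```python
ESC = "\x1b"


def render_styles(stream: str) -> list[tuple[str, str]]:
    """Return (style, character) pairs for a toy escape-code device."""
    style = "plain"
    rendered = []
    i = 0
    while i < len(stream):
        if stream[i] == ESC and i + 1 < len(stream):
            code = stream[i + 1]
            if code == "b":
                style = "bold"    # selecting bold again changes nothing
            elif code == "p":
                style = "plain"
            i += 2                # skip the two-character escape sequence
            continue
        rendered.append((style, stream[i]))
        i += 1
    return rendered


once = render_styles(ESC + "b" + "ab")
repeated = render_styles(ESC + "b" + "a" + ESC + "b" + "b")
print(once == repeated)
```

Because each code sets the style absolutely rather than toggling it, emitting the :STARTVALUE code at the start of every text line, with no matching :ENDVALUE code, gives the same result as emitting it once.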
The :LINEPROC Block
These blocks define exactly what actions the device is to take to implement the font style. Each :LINEPROC block defines the actions to take on one specified line pass.
No :LINEPROC Present
The :LINEPROC block is entirely optional; if none is present, then wgml 4.0 behaves exactly as if this :LINEPROC block was present:
:LINEPROC
pass = 1
:STARTVALUE
%textpass()
:eSTARTVALUE
:eLINEPROC
Empty :LINEPROC Instances
At the very end of this section, it is noted that a :LINEPROC of this form:
:LINEPROC pass = 1 :eLINEPROC
is accepted and compiled by gendev 4.1 as if it contained an :ENDVALUE block with no device functions present.
When a font style using such a :LINEPROC block is used by wgml 4.0, however, the result is this message:
Abnormal program termination: Memory protection fault
regardless of which line pass it is assigned to.
Examination of the output file shows that wgml 4.0 does not produce this error until it reaches the line pass affected while printing out the text to which the font style is being applied.
Sub-block Usage
Since each :LINEPROC block must contain at least one sub-block, and since, as discussed here, each sub-block must contain at least one device function, the discussion now turns to the various sub-blocks, starting with an overview.
The :LINEPROC block contains five sub-blocks. Examination of the test documents shows that, when a text_chars instance is being processed, they generally appear in these positions:
- The :STARTVALUE block and :FIRSTWORD block appear either before the first text_chars instance of the line or before the first text_chars instance with a new value for field font_number is processed. If the :FIRSTWORD block is not defined, then the :STARTWORD block appears in its place. These blocks appear before the :STARTWORD block as such.
- The :STARTWORD block appears before each text_chars instance which does not follow a font switch, even if this results in it appearing twice in a row because the :FIRSTWORD block is not defined.
- The :ENDWORD block appears after each text_chars instance.
- The :ENDVALUE block appears after the last text_chars instance (and after the :ENDWORD block).
To be specific, blocks which appear before the text_chars instance appear before the spaces (or :HTAB block or :ABSOLUTEADDRESS block) used to position the print head at the point where the first non-space character is to appear, unless, of course, device function %dotab() is involved.
From this, it appears that three ON/OFF switches exist:
- The first pairs :STARTVALUE with :ENDVALUE, and applies to each set of consecutive text_chars instances with the same value for field font_number.
- The second pairs :FIRSTWORD with :ENDVALUE, and applies to each set of consecutive text_chars instances with the same value for field font_number.
- The third pairs :STARTWORD with :ENDWORD, and applies to most text_chars instances (those whose associated font style defines a :FIRSTWORD block and which follow a font switch are the exception).
The above reflects two observed rules:
- If the :FIRSTWORD block is not defined, then the :STARTWORD block appears in every context where the :FIRSTWORD block appears when it is defined, without known exception.
- When a font switch occurs, there is no :STARTWORD block (if a :FIRSTWORD block exists) or no second :STARTWORD block (if no :FIRSTWORD block exists).
It does not matter if the :FIRSTWORD block consists entirely of "%image('')", which produces no output or side effects of any kind; the only requirement is that it be defined.
The only exception to the second rule involves the drawing of the top line using :BOX block characters with tag :FIG and the Index (that is, in these cases no font switch occurred, and in no case did the second :STARTWORD block appear in drawing such lines). This probably means that the :STARTWORD block, as such, is not used when drawing horizontal or vertical lines using the characters defined in the :BOX block.
The :FIRSTWORD block and :ENDVALUE block are regularly used to implement underlining, that is, where every character in the affected phrase (but not any preceding whitespace) is underlined, including internal spaces.
The :STARTWORD block and :ENDWORD block are regularly used to implement underscoring, that is, where every non-space character in the affected phrase is underlined, but whitespace is not.
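The distinction between underlining and underscoring, as the terms are used here, can be shown with a minimal Python sketch that builds the overprint line for a phrase. The function names are hypothetical, and the phrase is assumed to begin at its first non-space character, since preceding whitespace is not affected in either case.

```python
def underline_pass(phrase: str) -> str:
    """Underlining: every character, including internal spaces."""
    return "_" * len(phrase)


def underscore_pass(phrase: str) -> str:
    """Underscoring: every non-space character, but not whitespace."""
    return "".join(" " if ch.isspace() else "_" for ch in phrase)


phrase = "open watcom"
print(phrase)
print(underline_pass(phrase))
print(phrase)
print(underscore_pass(phrase))
```

The only difference in the output is at the internal space, which is why underlining can be switched on once per phrase (:FIRSTWORD and :ENDVALUE) while underscoring has to be switched on and off around each word (:STARTWORD and :ENDWORD).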
Considering the above information, these suggestions might be made:
- When a :FIRSTWORD block is called for, if no such block is defined, then a :STARTWORD block is used instead. This suggests that :LINEPROC blocks which actually implement underscoring should not define a :FIRSTWORD block.
- If a :FIRSTWORD block is defined, then, in some cases, no :STARTWORD block will appear. This suggests that :LINEPROC blocks which actually implement underlining should not define a :STARTWORD block.
- If no :FIRSTWORD block is defined, then, in some cases, the :STARTWORD block will be interpreted twice in succession. This suggests that the :STARTWORD block, if defined, should be defined in such a way that it can be interpreted twice in succession without causing problems for the device.
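The observed ordering of the :LINEPROC sub-blocks can be collected into one Python sketch, which may make the three suggestions above easier to check. This is a model of the observations in this section and nothing more: the function name `lineproc_sequence` is hypothetical, the first text_chars instance of a line is assumed not to follow a font switch, the :ENDVALUE block is assumed to close each run of instances sharing a font_number, and the exception noted for :BOX characters is ignored.

```python
def lineproc_sequence(font_numbers: list[int],
                      firstword_defined: bool) -> list[str]:
    """List the sub-blocks interpreted for one line of text_chars."""
    events = []
    last = len(font_numbers) - 1
    for i, font in enumerate(font_numbers):
        first_in_line = i == 0
        font_switch = not first_in_line and font != font_numbers[i - 1]
        if first_in_line or font_switch:
            events.append("STARTVALUE")
            # :STARTWORD stands in for an undefined :FIRSTWORD.
            events.append("FIRSTWORD" if firstword_defined else "STARTWORD")
        if not font_switch:
            # :STARTWORD "as such" is skipped after a font switch.
            events.append("STARTWORD")
        events.append("TEXT")
        events.append("ENDWORD")
        if i == last or font_numbers[i + 1] != font:
            events.append("ENDVALUE")
    return events


print(lineproc_sequence([0, 0, 1], firstword_defined=True))
print(lineproc_sequence([0], firstword_defined=False))
```

The second call shows the case behind the last suggestion: with no :FIRSTWORD block defined, :STARTWORD is interpreted twice in succession before the first text_chars instance of the line.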
The :STARTVALUE Block
As shown here, this block is the only place where device function %textpass() may be used; as noted here, whether or not that function is present determines whether or not the output text actually appears in the output file.
The :LINEPROC block :STARTVALUE block is usually preceded immediately by a :FONTSTYLE block :STARTVALUE block; known exceptions are:
- In some cases, as part of the preparation for the first text line, as discussed here.
- In some cases, as part of drawing a box using the characters in the :BOX block when processing tag :FIG, as discussed here.
The :LINEPROC block :STARTVALUE block is always followed immediately by the :LINEPROC block :FIRSTWORD block. Furthermore, the :LINEPROC block :FIRSTWORD block only appears when immediately preceded by the :LINEPROC block :STARTVALUE block.
Device function %ulineon() can also be placed in this block. The effect, as shown by the tests done so far, is indistinguishable from placing device function %ulineon() in the :FIRSTWORD block instead.
Unlike the :FIRSTWORD block, a :FONTSTYLE block which differs from the overprint "uscore" :FONTSTYLE block discussed below only in that the line pass 2 :LINEPROC block has a :STARTVALUE block works normally, i.e., the first word is underscored.
This block is interpreted at the start of each text_chars instance which has a value for the field font_number which is different from the value in the prior text_chars instance. However, there is at least one context in which it is interpreted at the start of each text_chars instance, as discussed at the end of this section; although the appearance of the :STARTVALUE block is not mentioned, it does in fact appear each time just as the :ENDVALUE block does. A closer examination of this issue will eventually be done.
The :FIRSTWORD Block
The :LINEPROC block :FIRSTWORD block only appears when immediately preceded by the :LINEPROC block :STARTVALUE block. Furthermore, the :LINEPROC block :STARTVALUE block is always followed immediately by the :LINEPROC block :FIRSTWORD block.
As shown here, this block can contain device function %ulineon(); indeed, the overprint :FONTSTYLE "uline" discussed below does exactly that.
However, device function %ulineon() can also be placed in the :STARTVALUE block. The effect, as shown by the tests done so far, is indistinguishable from placing device function %ulineon() in the :FIRSTWORD block.
A :FONTSTYLE block which differs from the overprint "uscore" :FONTSTYLE block discussed below only in that the line pass 2 :LINEPROC block has a :FIRSTWORD block results in the first word not being underscored. As noted at the end of the section on sub-block usage, it is generally best to implement only one of the :FIRSTWORD and :STARTWORD blocks.
This block is also allowed to contain device function %ulineoff(). Since device function %ulineoff() must be preceded by device function %ulineon() in the same :LINEPROC block, the :STARTVALUE block must contain %ulineon() or gendev 4.0 will not process the source file.
When a :FONTSTYLE block with a line pass 2 :LINEPROC block with a :STARTVALUE block containing device function %ulineon() and a :FIRSTWORD block containing device function %ulineoff() is tested, then the result is:
- If the %ulineon() function is preceded by %dotab(), then the initial horizontal positioning (left margin) is output. Nothing else appears, although various :LINEPROC block sub-blocks are interpreted.
- If the %ulineon() function is not preceded by %dotab(), then nothing whatsoever appears on the second line pass, although various :LINEPROC block sub-blocks are interpreted (that is, a second line pass does occur).
This block is interpreted at the start of each text_chars instance which has a different value for field font_number than the previous text_chars instance had. However, there is at least one context in which it is interpreted at the start of each text_chars instance, as discussed here. A closer examination of this issue will eventually be done.
The :STARTWORD Block
As shown here, this block can contain device function %ulineon(); indeed, the overprint :FONTSTYLE "uscore" discussed below does exactly that. And, if by "uscore" is meant a font style which underscores words but not spaces, then the :STARTWORD block is where device function %ulineon() needs to be.
This block is also able to contain device function %ulineoff(). Three cases exist, and the results are recorded here.
When a :FONTSTYLE block with a line pass 2 :LINEPROC block with a :STARTVALUE block containing device function %ulineon() and a :STARTWORD block containing device function %ulineoff() is tested, and no :FIRSTWORD block is present, then the result is:
- If the %ulineon() function is preceded by %dotab(), then the initial horizontal positioning (left margin) is output. Nothing else appears, although various :LINEPROC block sub-blocks are interpreted.
- If the %ulineon() function is not preceded by %dotab(), then nothing whatsoever appears on the second line pass, although various :LINEPROC block sub-blocks are interpreted (that is, a second line pass does occur).
When a :FONTSTYLE block with a line pass 2 :LINEPROC block with a :STARTVALUE block containing device function %ulineon() and a :STARTWORD block containing device function %ulineoff() is tested, and a :FIRSTWORD block is present, then the result is:
- If the %ulineon() function is preceded by %dotab(), then the first word (only) is underscored.
- If the %ulineon() function is not preceded by %dotab(), then the initial horizontal positioning (left margin) and the first word (only) are underscored.
These results also occur when the %ulineon() (with or without preceding %dotab()) is in the :FIRSTWORD block rather than the :STARTVALUE block.
This block is interpreted at the start of each text_chars instance, except as documented here.
The :ENDWORD Block
As shown here, this block can contain device function %ulineoff(). Indeed, the overprint "uscore" discussed below requires this block to contain device function %ulineoff() in order to work properly.
This block is interpreted at the end of each text_chars instance, that is, after the text has been output. When used with %ulineoff(), the observed behavior is much less clear, although the effect (stopping the output of underscore characters with the last character output previously) is quite clear. Additional research will need to be done.
The :ENDVALUE Block
As shown here, this block can contain device function %ulineoff(). Indeed, the overprint "uline" discussed below requires this block to contain device function %ulineoff() in order to work properly.
This block is only interpreted under certain conditions. It has been observed in these contexts:
- When a :NEWPAGE block is interpreted.
- When a :NEWLINE block is interpreted.
- As part of establishing the left margin before text output begins.
- As part of processing the first text line.
- As part of the sequence for processing text lines.
- Presumably as part of the sequence(s) for boxing, although this needs more work.
- As part of the "new font text_chars instance" sequence used in the first line pass sequence.
- As part of the "new font text_chars instance" sequence used in the subsequent line pass sequence as discussed here.
- As part of the sequence used with device function %ulineon() and %ulineoff(), as discussed here.
- When a :FINISH block is processed.
- If no :FINISH block is defined, at the very end of text output.
In general, it occurs when something new starts and text has been output and it has not already been done. This is implemented by encapsulating it into a function, which interprets this block if either of the textpass or uline flags have the value "true". This function also sets the value of the textpass flag to "false". So far, this appears to work quite well.
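The gating described above can be sketched as follows. This is only an illustration of the logic as stated in the text, not the actual wgml code: the names PassFlags and maybe_interpret_endvalue are hypothetical, and the callback stands in for whatever interprets the :ENDVALUE block.

```python
from dataclasses import dataclass


@dataclass
class PassFlags:
    """Hypothetical holder for the two flags named in the text."""
    textpass: bool = False  # "true" once text has been output
    uline: bool = False     # "true" while underlining is active


def maybe_interpret_endvalue(flags, interpret_endvalue):
    """Interpret the :ENDVALUE block if either flag is "true".

    The textpass flag is always "false" afterwards, so a second call
    with no intervening text output does nothing. The uline flag is
    left untouched here (an assumption: it is cleared elsewhere).
    Returns True if the block was interpreted.
    """
    interpreted = flags.textpass or flags.uline
    if interpreted:
        interpret_endvalue()
    flags.textpass = False
    return interpreted


# Usage: the block is interpreted once, then skipped until text is output again.
log = []
flags = PassFlags(textpass=True)
maybe_interpret_endvalue(flags, lambda: log.append(":ENDVALUE"))
maybe_interpret_endvalue(flags, lambda: log.append(":ENDVALUE"))
```

Calling the same function from every "something new starts" context is what keeps the block from being interpreted twice for the same run of text.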
This block is also interpreted by extremely specialized functions or parts of functions which use the at_start and set_margin flags. These functions or parts of functions are only active at the very start of the final document pass because these flags are only "true" for a very brief period at the start of the final document pass.
All four of these flags are also discussed here.
Implementing Font Styles
The implementation of a particular font style depends on the characteristics of the device.
Some devices define separate fonts for each style, which are then paired in the :DEFAULTFONT instances with font style "plain". Of course, this can lead to a very large number of :FONT blocks. The PS :DEVICE block does this for the "times" font.
Some devices perform some actions themselves. Thus, the WHELP :DEVICE block pairs the same :DEVICEFONT block with various :FONTSTYLE blocks -- and then implements those styles with a single :LINEPROC which uses the :STARTVALUE block and :ENDVALUE block as an ON/OFF switch to cause (presumably) the program WHLPCVT to implement the desired style.
Device PSDRV also provides definitions of the usual font styles which vary between having one :LINEPROC and using the :FIRSTWORD block and :ENDVALUE block for underlining and the :STARTWORD block and :ENDWORD block for underscoring, but in both cases emitting PostScript commands rather than using device functions %ulineon() and %ulineoff(). On the other hand, for font styles involving "bold", the two-line-pass approach discussed below is used.
It is clearly not possible to discuss all possible implementations of any particular style. The following sections will focus on implementations of font styles which rely entirely on the device functions and the behavior of wgml 4.0. For one thing, these are the definitions which are referred to by the page on sequencing, particularly (but not necessarily exclusively) the section on applying font styles.
A Font Style That Prints Nothing Out
This may seem like an odd choice, since no examples exist and it would seem to have no value beyond testing, but it does represent a minimal case.
This font style:
:FONTSTYLE
type='redact'
:LINEPROC
pass=1
:STARTVALUE
%image('')
:eSTARTVALUE
:eLINEPROC
:eFONTSTYLE
is accepted by both gendev 4.1 and wgml 4.0, and, since the argument to device function %image() is an empty string, produces precisely nothing when applied.
As to utility, in theory, this could be used whenever it is necessary to maintain two versions of a document: one with all information included for internal use, and one with certain information redacted for external use. If the external version uses the above :FONTSTYLE for font style "redact", then blank places will occur where the material to be removed would otherwise have been. If the internal version uses a different version of :FONTSTYLE "redact", one which does an explicit or implicit %textpass(), then the internal version will show all the information. In practice, this would require a fair amount of thought and planning; however, text that is never printed is more secure than text which is printed and then blacked out.
Overprint "bold"
This :LINEPROC prints the same text twice, starting at the same position each time:
:FONTSTYLE
type=bold
:LINEPROC
pass=1
:STARTVALUE
%textpass()
:eSTARTVALUE
:eLINEPROC
:LINEPROC
pass=2
:STARTVALUE
%textpass()
:eSTARTVALUE
:eLINEPROC
:eFONTSTYLE
The version used in testing had additional %image() statements to help in detecting the sequence of events.
Overprint "uline"
This :LINEPROC prints the text line once, and then prints underscore characters, starting at the same position each time:
:FONTSTYLE
type=uline
:LINEPROC
pass=1
:STARTVALUE
%textpass()
:eSTARTVALUE
:eLINEPROC
:LINEPROC
pass=2
:FIRSTWORD
%dotab()
%ulineon()
:eFIRSTWORD
:ENDVALUE
%dotab()
%ulineoff()
:eENDVALUE
:eLINEPROC
:eFONTSTYLE
Preliminary testing showed that this does, indeed, place the underscore character under every character included in the set of contiguous text_chars instances using this font style, including spaces between text_chars instances and any final text_chars instances which have no text but only generate spaces. Each text_chars instance is underlined separately. The initial horizontal positioning (left margin plus indentation), however, was not underlined.
Overprint "uscore"
This :LINEPROC prints the text line once, and then prints underscore characters under each word, starting at the same position each time:
:FONTSTYLE
type=uscore
:LINEPROC
pass=1
:STARTVALUE
%textpass()
:eSTARTVALUE
:eLINEPROC
:LINEPROC
pass=2
:STARTWORD
%dotab()
%ulineon()
:eSTARTWORD
:ENDWORD
%dotab()
%ulineoff()
:eENDWORD
:eLINEPROC
:eFONTSTYLE
Preliminary testing showed that this does, indeed, place the underscore character under each text_chars instance's text. For each text_chars instance, the horizontal positioning is done (using spaces only, never :HTAB, apparently) and then enough underscore characters are emitted to place one under each non-blank character in the text_chars instance's text.
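The difference between the second line pass of "uline" and that of "uscore" can be modelled for a fixed-width device as shown here. This is a simplified sketch, not wgml code: the function name second_pass_line is hypothetical, each string stands in for one text_chars instance, and the instances are assumed to be separated by exactly one space.

```python
def second_pass_line(words, left_margin, style):
    """Build the overprint line emitted on line pass 2 (fixed-width model).

    words       -- strings standing in for text_chars instances
    left_margin -- columns of initial horizontal positioning, which is
                   emitted as spaces and is not underlined
    style       -- "uline" fills the gaps between words with underscores;
                   "uscore" leaves the gaps blank
    """
    if style not in ("uline", "uscore"):
        raise ValueError("style must be 'uline' or 'uscore'")
    gap = "_" if style == "uline" else " "
    return " " * left_margin + gap.join("_" * len(w) for w in words)


words = ["the", "first", "sentence"]
uline_pass = second_pass_line(words, 2, "uline")    # 2 spaces, then 18 underscores
uscore_pass = second_pass_line(words, 2, "uscore")  # 2 spaces, then 3, 5 and 8 underscores
```

In both cases the first line pass is assumed to have printed the text itself at the same starting position.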
Overprint "ulbold"
This :LINEPROC prints the text line once, and then prints underscore characters, and then prints the text line again, starting at the same position each time:
:FONTSTYLE
type=ulbold
:LINEPROC
pass=1
:STARTVALUE
%textpass()
:eSTARTVALUE
:eLINEPROC
:LINEPROC
pass=2
:FIRSTWORD
%dotab()
%ulineon()
:eFIRSTWORD
:ENDVALUE
%dotab()
%ulineoff()
:eENDVALUE
:eLINEPROC
:LINEPROC
pass=3
:STARTVALUE
%textpass()
:eSTARTVALUE
:eLINEPROC
:eFONTSTYLE
At least, that is what it should do. Testing proceeds.
In some cases, the "bold" part is done by issuing control codes to the device while the underlining is done as shown in line pass 2 above.
Overprint "usbold"
This :LINEPROC prints the text line once, and then prints underscore characters under each word, and then prints the text line again, starting at the same position each time:
:FONTSTYLE
type=usbold
:LINEPROC
pass=1
:STARTVALUE
%textpass()
:eSTARTVALUE
:eLINEPROC
:LINEPROC
pass=2
:STARTWORD
%dotab()
%ulineon()
:eSTARTWORD
:ENDWORD
%dotab()
%ulineoff()
:eENDWORD
:eLINEPROC
:LINEPROC
pass=3
:STARTVALUE
%textpass()
:eSTARTVALUE
:eLINEPROC
:eFONTSTYLE
At least, that is what it should do. Testing proceeds.
In some cases, the "bold" part is done by issuing control codes to the device while the underscoring is done as shown in line pass 2 above.
Overprint Font Style Notes
These notes are based on examining how overprint font styles work with the PostScript device.
None of this has been implemented. The primary reason for this is that the Open Watcom document build system does not, so far as I can tell, actually use any overprint font styles. All font effects seen are produced by using specific fonts with font style plain, except for shading, which uses font style shade. Both plain and shade have only one pass.
The overprint font styles "bold", "ulbold", and "usbold" in the PostScript :DRIVER block use two line passes with device function %textpass() to produce the "bold" effect. The font styles which do underlining or underscoring, however, use PostScript macros uline and euline instead of device functions %ulineon() and %ulineoff(). How wgml 4.0 would, in theory, use %ulineon() and %ulineoff() with the PostScript device had to be investigated with a test device whose :DRIVER block's defined name starts with "ps", which causes wgml 4.0 to treat it as a PostScript device.
The simplest part of this is discovering why the PostScript :DRIVER block does not use device functions to produce underscores. Consider this phrase:
the first sentence
for underlining, this should be produced:
the first sentence
__________________
for underscoring, this should be produced:
the first sentence
___ _____ ________
however, what wgml 4.0 produces for underlining would look more like this (the actual start position of the underscore blocks is very hard to relate to the positions of the words, but the underscore blocks almost certainly do not form a continuous line):
the first sentence
__ ____ _______
and, what wgml 4.0 produces for underscoring would look like this:
the first sentence
__ ___ ______
while what our wgml produces for underlining would look more like this (the actual start position of the underscore blocks is very hard to relate to the positions of the words, but the underscore blocks almost certainly do not form a continuous line):
the first sentence
__ ____ ________
and, what our wgml produces for underscoring would look like this:
the first sentence
__ ___ ______
The results for underscoring are identical. The results for underlining are a bit different. But neither is correct.
For our wgml, at least, this is the result of dividing the space to be underscored by the width of an underscore character using integer math with no rounding. Since wgml 4.0 gets much the same result, it is, most likely, doing the same thing. What is seen above is the result of truncation errors, which are unavoidable when using a single character to fill the space occupied by text in a variable-width font. Hence the use of PostScript macros uline and euline, which turned out to work quite well when tested.
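The truncation can be illustrated with a small calculation. The widths used here are invented for the example and are not taken from any actual font metrics; the function names are likewise hypothetical.

```python
def underscore_count(span_width, underscore_width):
    """Number of whole underscore characters that fit in the span.

    Integer division with no rounding, as described above.
    """
    return span_width // underscore_width


def uncovered_width(span_width, underscore_width):
    """Width left without an underscore because of the truncation."""
    return span_width - underscore_count(span_width, underscore_width) * underscore_width


# Hypothetical widths in horizontal base units: a word 1730 units wide
# and an underscore character 500 units wide.
count = underscore_count(1730, 500)  # 3 underscores are emitted
gap = uncovered_width(1730, 500)     # 230 units of the word are not underscored
```

Whenever the span width is not an exact multiple of the underscore width, the remainder is left bare, which is consistent with the broken lines shown above for a variable-width font.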
Creating bold text by overprinting would not, one would think, work very well with a page-drawing device such as PostScript: it works with a typewriter or impact printer because more ink is deposited at the same position, but, with PostScript, one would think that each position either has a dot or does not have a dot and so would look the same no matter how many times the same text is output starting at the same position.
It should be no surprise, then, to learn that wgml 4.0 does not output the second pass text to the same position as is used on the first pass. Instead, for PostScript devices, wgml 4.0 adds 6 horizontal base units (that is, 0.006 inch) to the address used in the first pass. This can be seen when using the PS device with justification on (so that each word is positioned individually); it is not simply an effect of a test device.
This initially seemed to produce a very fertile topic for investigation. These items would need to be investigated to fully understand what wgml 4.0 is doing:
- Does this apply to all devices, at least potentially? Or is it specific to either devices which do page addressing (that is, define an :ABSOLUTEADDRESS block) or PostScript devices only?
- Does this apply to all font styles with "bold" in their name? Or does it apply only to certain font styles, specifically bold, ulbold, and usbold?
- Does this apply to all subsequent passes? Or only to those on which text is emitted? Or only on specific passes: pass 2 for "bold" and pass 3 for "ulbold" and "usbold"?
- Is the value added always "6"? Or does it vary with the number of horizontal base units per inch? If it does vary with the number of horizontal base units per inch, how is it computed?
Questions 2 and 3 have the potential to provide reasons for three of the standard font style names and for why it is the third pass of font styles ulbold and usbold which contains the second use of device function %textpass().
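The one case that has actually been observed, the 6-unit offset for PostScript devices, can be written down as a sketch. Only the value 6 and its equivalence to 0.006 inch come from the observations above. The figure of 1000 horizontal base units per inch is inferred from those two numbers, and the function names are hypothetical. Nothing here answers the open questions about other devices, font styles, or passes.

```python
PS_SECOND_PASS_OFFSET = 6      # horizontal base units, as observed with PostScript devices
ASSUMED_UNITS_PER_INCH = 1000  # inferred from 6 units == 0.006 inch (an assumption)


def ps_second_pass_x(first_pass_x):
    """Horizontal address used for the second %textpass() on a PostScript device."""
    return first_pass_x + PS_SECOND_PASS_OFFSET


def units_to_inches(units, units_per_inch=ASSUMED_UNITS_PER_INCH):
    """Convert horizontal base units to inches."""
    return units / units_per_inch


second_x = ps_second_pass_x(1000)                         # 1006
offset_inches = units_to_inches(PS_SECOND_PASS_OFFSET)    # about 0.006 inch
```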
When the output of wgml 4.0 and our wgml produced using an option file containing
( font 0 times-roman 2 10
( font 1 times-roman bold 2 10
( font 2 times-roman uline 2 10
( font 3 times-roman uscore 2 10
(that is, configured to actually use the overprint font styles) was examined, it became clear that this was far too complicated to work on unless and until it becomes necessary. The results leading to this conclusion were:
- Our wgml, despite printing the text in the same position both times, produced output that looked very similar to wgml 4.0. The output from wgml 4.0 was a bit darker, so the slight offset does have an effect.
- With some files, overprint bold looks quite nice, especially when produced by wgml 4.0 (that is, with the offset).
- With other files, overprint bold looks horrible: it looks as if each letter were separately printed as opposed to a single letter printed a bit wider than usual. The gap is so large that the result can barely be read. This happened with our wgml as well as with wgml 4.0.
The files used were a bit different; the one producing a good result was very simple, the other file was less simple. And the first and third items suggest that PostScript itself may be doing something in this situation, something that doesn't always look very good. There seems to be little point in implementing a feature that is never used by the Open Watcom document build system and which may depend more on how PostScript behaves than on how wgml behaves.
:LAYOUT
This is a collection of notes on the use of layout files.
- There should only be one layout file, at most, provided on the command-line. Providing a second causes them to be listed as "current", in inverse order, but also causes problems with document formatting later on, so this should probably be an error.
- Files intended for inclusion in :LAYOUT sections should not start with the :LAYOUT tag; it will cause an error if present: it appears that :LAYOUT sections do not nest. The error message is not very specific.
- If a file intended for inclusion in a :LAYOUT section ends with the :eLAYOUT tag, the effect appears to be indistinguishable from what would happen if it were not present, so this should probably be an error.
- Both the file given with the command-line option LAYOUT and the document specification file may contain the :LAYOUT and :eLAYOUT tags. These layout sections are not nested, but are processed sequentially.
The layout file specified on the command line, if any, is processed before the layout section, if any, in the document specification.
A clarification on the first point above: what appears to have happened in the specific case tested is this:
- The document specification was made "current".
- The second LAYOUT file was made "current".
- The first LAYOUT file was made "current".
- The first LAYOUT file was processed.
- The :LAYOUT section in the document specification was processed.
- wgml 4.0 produced the specified header (top banner).
- wgml 4.0 then reacted to the second LAYOUT file, which contained the :H0 layout tag and attributes, by complaining that an attribute was found instead of text.
This suggests that, once wgml 4.0 concluded the :LAYOUT section was done, it proceeded to process the remaining items as part of the document, starting with the second LAYOUT file, presumably because it was still part of a linked list of such files. It accepted :H0 as a valid tag -- but the document tag :H0 does not take the same set of attributes as the layout tag :H0. That wgml 4.0, in this context, recognized a layout attribute as an (illegal) attribute shows that the tag processing code is always aware of all the attribute names.

