Home > POD-table

POD-table

POD-table is a project mainly written in Perl, it's free.

Write tables in POD and convert them into HTML tables and LATEX tables

=head1 READ-ME

=head2 Presentation of the read-me file

This F file has a double purpose.

=over 4

=item 1

Describe how the layout system functions.

=item 2

Be used as an input file for a demo of the layout system.

=back

In other words, either you read the POD file as is to learn what will happen, or you read the HTML generated file or the PDF generated file to learn what has juste happened.

=head2 Short Background

I have developed this layout system because I noticed a boardgame, the rules of which are available on the net. I though it would be nice if this game had a larger audience, including French-speaking people. In addition, the rules were rather short, so I decided to translate them in French.

As many Perl programmers, I like using the POD format, because it is simple and yet it provides most basic features: chapters, lists, links, bold chars, italics. The only missing basic feature is tables. Alas, there were several tables in the game rules. Of course, when using POD, you can always show a table by using preformatted text, as this:

.-----.-----.-----. | 8 | 1 | 6 | .-----.-----.-----. | 3 | 5 | 7 | .-----.-----.-----. | 4 | 9 | 2 | .-----.-----.-----.

The result is acceptable when using C to display the POD or when reading it in your favourite text editor, but the same thing in HTML with C<EpreE> tags or in LATEX within a C environment is quite frustrating. Having the following is much better:

+-----+-----+-----+ | 8 | 1 | 6 | +-----+-----+-----+ | 3 | 5 | 7 | +-----+-----+-----+ | 4 | 9 | 2 | +-----+-----+-----+

(Of course, if you read this using C or you favourite text editor, or if you browse it on the Git Hub website, you will not see much difference).

There is the C module, which allows us to generate pretty tables in HTML, LATEX and other advanced display formats, but when you edit your source file under Emacs / vi / other or when you display the file with C, the result is disappointing.

Another solution consists in generating the HTML / LATEX file, and then edit it using Emacs, to replace the ASCII-art table with a proper table. See the C and C functions, as well as the other CI functions. The downside is that this is incompatible with automatic processing and F.

This is why I created these scripts, to layout nice tables from a F procedure, without requiring any intervention once C is running.

=head2 Installation

You just need to download the repository and store its files into the same directory. Then, run C for a demo of what the programmes do.

There are a few prerequisites for the programmes.

=over 4

=item *

A complete installation of Perl. If you have installed Perl with your distro's F<.rpm>'s or F<.deb>'s, make sure that C, C and C have been installed, too. Sometimes, they are not in the main Perl package.

=item *

A rather recent version of Emacs. Version 23 is fine, I think that version 22 is fine, too, but version 21 is too much old.

=item *

An HTML browser, compatible with UTF-8 and tables. But nowadays, is there still anybody who has not such a browser?

=item *

A LATEX installation, including the commands C and C. You will need also the Palatino (C), Helvetica (C) and Courier (C) fonts, and the C, C, C, C, C et C packages, even if C does not benefit from C. Actually, I always use all these packages, in the same manner many Perl programmers use C and C. But if you do not like these fonts and packages, you can edit the F<.skel> files to remove them and possibly add others.

=back

=head2 How to write tables in your editor

You can read the chapter about table-based text in the Emacs documentation. Anyway, here is what you need to know, even if you use another text editor.

The table must be interpreted by C and C as pre-formatted text. Therefore, each line must begin with some space.

In addition, the tables must be recognized as such by Emacs. Therefore, the lines delimiting the rows and the columns must be minus signs C<-> and pipes C<|>. The junctions between lines must be plus signs C<+>.

I For technical reasons, I will show examples below, where the line junctions are dots instead of plus signs. Just imagine they are plus signs.

=head3 Basic Example

In this table, all cells have the same size.

.----.----.----.----. | 13 | 2 | 3 | 16 | .----.----.----.----. | 1 | 14 | 15 | 4 | .----.----.----.----. | 8 | 11 | 10 | 5 | .----.----.----.----. | 12 | 7 | 6 | 9 | .----.----.----.----.

which results in:

+----+----+----+----+ | 13 | 2 | 3 | 16 | +----+----+----+----+ | 1 | 14 | 15 | 4 | +----+----+----+----+ | 8 | 11 | 10 | 5 | +----+----+----+----+ | 12 | 7 | 6 | 9 | +----+----+----+----+

=head3 Wide Cell

In this table, several cells from the same row have been merged into a wide cell.

.----.----.----.----.----.----.----.----. | H | | He | .----.----.----.----.----.----.----.----. | Li | Be | B | C | N | O | F | Ne | .----.----.----.----.----.----.----.----. | Na | Mg | Al | Si | P | S | Cl | Ar | .----.----.----.----.----.----.----.----.

which results in:

+----+----+----+----+----+----+----+----+ | H | | He | +----+----+----+----+----+----+----+----+ | Li | Be | B | C | N | O | F | Ne | +----+----+----+----+----+----+----+----+ | Na | Mg | Al | Si | P | S | Cl | Ar | +----+----+----+----+----+----+----+----+

Yet, for the upper line, the line junctions are not really line junctions. Therefore, the plus signs can be replaced by dashes, for an horizontal plain line.

.----.-----------------------------.----. | H | | He | .----.----.----.----.----.----.----.----. | Li | Be | B | C | N | O | F | Ne | .----.----.----.----.----.----.----.----. | Na | Mg | Al | Si | P | S | Cl | Ar | .----.----.----.----.----.----.----.----.

which results in:

+----+-----------------------------+----+ | H | | He | +----+----+----+----+----+----+----+----+ | Li | Be | B | C | N | O | F | Ne | +----+----+----+----+----+----+----+----+ | Na | Mg | Al | Si | P | S | Cl | Ar | +----+----+----+----+----+----+----+----+

=head3 Heightened Cell

In the following table, some cells are the result of a merger of basic cells from the same column. For example:

.------------------.--------------.------.------. | | Cretaceous | 130 | 65 | | .--------------.------.------. | Secondary | Jurassic | 200 | 130 | | .--------------.------.------. | | Triassic | 235 | 200 | .------------------.--------------.------.------. | | Permian | 280 | 235 | | .--------------.------.------. | | Carboniferous| 350 | 280 | | .--------------.------.------. | | Devonian | 400 | 350 | | Primary .--------------.------.------. | | Silurian | 440 | 400 | | .--------------.------.------. | | Ordovician | 500 | 440 | | .--------------.------.------. | | Cambrian | 540 | 500 | .------------------.--------------.------.------.

results in:

+------------------+--------------+------+------+ | | Cretaceous | 130 | 65 | | +--------------+------+------+ | Secondary | Jurassic | 200 | 130 | | +--------------+------+------+ | | Triassic | 235 | 200 | +------------------+--------------+------+------+ | | Permian | 280 | 235 | | +--------------+------+------+ | | Carboniferous| 350 | 280 | | +--------------+------+------+ | | Devonian | 400 | 350 | | Primary +--------------+------+------+ | | Silurian | 440 | 400 | | +--------------+------+------+ | | Ordovician | 500 | 440 | | +--------------+------+------+ | | Cambrian | 540 | 500 | +------------------+--------------+------+------+

Note that if you read a LATEX-generated version, there is a problem: the word "Primary" has disappeared. This is because it is aligned with a partial horizontal line and the LATEX table generator cannot cope with this situation. On the other side, the HTML version is fine. To obtain a better version in LATEX, you can shift the word one line upwards or downward:

.------------------.--------------.------.------. | | Cretaceous | 130 | 65 | | .--------------.------.------. | Secondary | Jurassic | 200 | 130 | | .--------------.------.------. | | Triassic | 235 | 200 | .------------------.--------------.------.------. | | Permian | 280 | 235 | | .--------------.------.------. | | Carboniferous| 350 | 280 | | .--------------.------.------. | Primary | Devonian | 400 | 350 | | .--------------.------.------. | | Silurian | 440 | 400 | | .--------------.------.------. | | Ordovician | 500 | 440 | | .--------------.------.------. | | Cambrian | 540 | 500 | .------------------.--------------.------.------.

which results in:

+------------------+--------------+------+------+ | | Cretaceous | 130 | 65 | | +--------------+------+------+ | Secondary | Jurassic | 200 | 130 | | +--------------+------+------+ | | Triassic | 235 | 200 | +------------------+--------------+------+------+ | | Permian | 280 | 235 | | +--------------+------+------+ | | Carboniferous| 350 | 280 | | +--------------+------+------+ | Primary | Devonian | 400 | 350 | | +--------------+------+------+ | | Silurian | 440 | 400 | | +--------------+------+------+ | | Ordovician | 500 | 440 | | +--------------+------+------+ | | Cambrian | 540 | 500 | +------------------+--------------+------+------+

The downside is that now, in the HTML version, the "Primary" word is no longer center-aligned vertically. I must admit that the absence of vertical alignment would not have been noticed if my script had not removed all C<Ebr /E> tags.

Go easy on the rowspans. In some cases, the C function gets confused while generating HTML. For example:

.----.----.----. | 1 | 11 | 21 | | 2 | 12 | 22 | | 3 .----.----. | 4 | 14 | 24 | .----. 15 | 25 | | 6 | 16 | 26 | | 7 .----. 27 | | 8 | 18 | 28 | .----.----.----.

yields:

+----+----+----+ | 1 | 11 | 21 | | 2 | 12 | 22 | | 3 +----+----+ | 4 | 14 | 24 | +----+ 15 | 25 | | 6 | 16 | 26 | | 7 +----+ 27 | | 8 | 18 | 28 | +----+----+----+

In LATEX, the result is correct, but not so in HTML.

=head3 Two-Dimension Extensions

According to Emacs' documentation, you can have cells that spans several columns and several rows at the same time, provided they are rectangular. For example,

.--.-----------------------------------------------.--. |H | |He| .--.--. .--.--.--.--.--.--. |Li|Be| |B |C |N |O |F |Ne| .--.--. .--.--.--.--.--.--. |Na|Mg| |Al|Si|P |S |Cl|Ar| .--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--. |K |Ca|Sc|Ti|V |Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr| .--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--. |Rb|Sr|Y |Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|I |Xe| .--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.

is forbidden. On the other side, you can have:

.--.--.-----------------------------.--------------.--. |H | | | |He| .--.--. .--.--.--.--.--.--. |Li|Be| |B |C |N |O |F |Ne| .--.--. .--.--.--.--.--.--. |Na|Mg| |Al|Si|P |S |Cl|Ar| .--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--. |K |Ca|Sc|Ti|V |Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr| .--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--. |Rb|Sr|Y |Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|I |Xe| .--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.

or:

.--.-----------------------------------------------.--. |H | |He| .--.--.-----------------------------.--.--.--.--.--.--. |Li|Be| |B |C |N |O |F |Ne| .--.--. .--.--.--.--.--.--. |Na|Mg| |Al|Si|P |S |Cl|Ar| .--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--. |K |Ca|Sc|Ti|V |Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr| .--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--. |Rb|Sr|Y |Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|I |Xe| .--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.--.

which give respectively:

+--+--+-----------------------------+--------------+--+ |H | | | |He| +--+--+ +--+--+--+--+--+--+ |Li|Be| |B |C |N |O |F |Ne| +--+--+ +--+--+--+--+--+--+ |Na|Mg| |Al|Si|P |S |Cl|Ar| +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ |K |Ca|Sc|Ti|V |Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr| +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ |Rb|Sr|Y |Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|I |Xe| +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

+--+-----------------------------------------------+--+ |H | |He| +--+--+-----------------------------+--+--+--+--+--+--+ |Li|Be| |B |C |N |O |F |Ne| +--+--+ +--+--+--+--+--+--+ |Na|Mg| |Al|Si|P |S |Cl|Ar| +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ |K |Ca|Sc|Ti|V |Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr| +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ |Rb|Sr|Y |Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|I |Xe| +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Actually, the first possibility, with the one empty cell, is still recognised by Emacs as a table, although there is some confusion about the extent of this cell. The HTML generation gives incorrect results, while the LATEX generation gives a quite acceptable result. See for yourself:

+--+-----------------------------------------------+--+ |H | |He| +--+--+ +--+--+--+--+--+--+ |Li|Be| |B |C |N |O |F |Ne| +--+--+ +--+--+--+--+--+--+ |Na|Mg| |Al|Si|P |S |Cl|Ar| +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ |K |Ca|Sc|Ti|V |Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr| +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ |Rb|Sr|Y |Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|I |Xe| +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

=head2 How It Works

The layout system functions with the following steps.

=over 4

=item 1

The standard conversion programme C or C

=item 2

An E-lisp script, interpreted by Emacs in batch mode, because you can launche Emacs in batch mode, that is, without opening a window and without input from the human-operated keyboard. This script searches for tables inside C<EpreE> tags in the generated HTML, or inside C environments in the generated LATEX file, and replace them with real C<EtableE> tables or real C environments.

=item 3

A Perl script, which performs various updates.

=item 4

That's all for the HTML file, but for LATEX, you still have to generate C<.dvi> or C<.pdf> files.

=back

=head3 How the E-lisp script works

Ths E-lisp script includes two interactive functions, which are nothing more than wrappers calling a third function with the proper parameteres. Even if the script is meant to be used in batch mode, it can be useful from time to time to call these functions interactively from Emacs, to debug them or to add new features.

The script main body is a loop with a regexp search for a table horizontal line. Usually, if the search finds nothing, this triggers an exception which halts the script. As a result, the lines after the end of the loop are not executed, including the save-file command. In addition, in a C, the script ends with a status code indicating a failure and the C process stops. So we prefer a search which does not halt execution, but which returns an internal value indicating that the process must leave the loop. This is achieved by giving a third parameter to the C function. As a result, we have to provide a second parameter to the function. But this is no big deal, the C function is perfect for this purpose.

Each iteration deals with a different table. The iteration uses the following steps:

=over 4

=item 1

The script locates the beginning of the text span that contains the table, that is, the C<EpreE> tag, or the LATEX statement C<egin{verbatim}>.

=item 2

The script places the C inside the table and activates the table functions of Emacs (by using C).

=item 3

The script generates the source of the table in the target language, HTML or LATEX.

=item 4

The script locates the end of the text span that contains the table, that is, the C<E/preE> tag, or the LATEX statement C<end{verbatim}>.

=item 5

The script deletes the C

 or C source.

=item 6

The script switches to the buffer with the generated source. It makes a few amendments. For example, for HTML, I do not care about the precise amount of space and the exact number of line breaks. So the script removes the C<Enbsp;> entities and the C<Ebr /E> tags. And for LATEX, the script inserts vertical spaces to prevent the table from being glued to the preceding and following paragraphs.

=item 7

The script copies and pastes the generated source into the HTML or LATEX file.

=back

And the script loops, searching a new table to process.

Upon exiting the loop, the buffer is written into a file, the name of which is the input filename, shortened by one char. So the extension F<.html1> becomes F<.html> and the extension F<.tex1> becomes F<.tex>.

=head4 Known problems

If the table is not correct, for example if the top line or the bottom line are missing, the result will be a truncated array. Thus:

| Vanish | Evaporated | Invisible | .---------------.---------------.---------------. | Stays | Present | Permanent | .---------------.---------------.---------------. | Missing | Hidden | Absent |

gives :

| Vanish | Evaporated | Invisible | +---------------+---------------+---------------+ | Stays | Present | Permanent | +---------------+---------------+---------------+ | Missing | Hidden | Absent |

And even the script crashes with:

| Vanish | Evaporated | Invisible | .---------------.---------------.---------------. | Missing | Hidden | Absent |

Also, if two arrays are contiguous, without a blank line, only the first one will appear in the generated file. Thus:

.---------------.---------------. | Stays | Present | .---------------.---------------. | Constant | Permanent | .---------------.---------------. .---------------.---------------.---------------. | Vanish | Evaporated | Invisible | .---------------.---------------.---------------. | Missing | Hidden | Absent | .---------------.---------------.---------------.

will result in:

+---------------+---------------+ | Stays | Present | +---------------+---------------+ | Constant | Permanent | +---------------+---------------+ +---------------+---------------+---------------+ | Vanish | Evaporated | Invisible | +---------------+---------------+---------------+ | Missing | Hidden | Absent | +---------------+---------------+---------------+

=head3 How the Perl Scripts Works

The Perl scripts executes a few simple modifications, but necessary ones.

=head4 F

The text I translated had a copyright and a trademark notices. No problem for the copyright, the POD C<EEcopyE> string gives the proper result. But I do not know anything similar for the trademark symbol. So the Perl script looks for the C<TM> regexp with word boundaries (so the "ATMOSPHERE" and "HETMAN" words and especially the "HTML" acronym will not be modified) and replaces it with the proper HTML sequence.

Same thing, for the present C file, the C<LATEX> regexp is replaced with the sequence including vertical shifts of the letters C and epsilon C.

=head4 F

This script has the same functions as F, plus a few others.

Firstly, for the LATEX file, we switch from C encoding to C encoding, which was not necessary for HTML.

Secondly, in the rules I translated, there were several references to other paragraphs of the rules, such as:

(see Short background --- p.1)

I have translated these references in the POD translation and I have added C<LE E> tags around the paragraph title, while keeping the page number. So the generated HTML will contain meaningless page numbers, because if you print the rules, your page numbers will vary a lot, depending on the font size you use. On the other hand, the LATEX output (C<.dvi> or C<.pdf>) will have stable page numbers, so it is worth using the right page numbers in the generated file. LATEX provides a function for this purpose: C< hepageref>, but C does not generate it. So the Perl script parses the LATEX source to insert this function where appropriate.

This is done in two passes. The first pass locates all C<section> statements and similar and stores the corresponding label. The second pass locates all references

(see Short background --- p.1)

extracts the label associated to the paragraph title ("Short background" in this case) and replaces the page number with the C<pageref> statement. That gives:

Cf. L --- p.1, cf. L --- p.1, cf. L --- p.1, cf. L --- p.1, cf. L --- p.1, cf. L --- p.1, cf. L --- p.1, cf. L --- p.1, cf. L --- p.1, cf. L --- p.1, cf. L --- p.1.

This means that you parse a LATEX file with regexps, which is as dirty as parsing HTML with regexps. But the LATEX source is a very simple one and therefore you can use simple regexps to search for simple patterns in a simple source text.

Thirdly, the LATEX file generated by C is not a standalone file, like the HTML generated file. So the F script merges this generated file with a skeleton file.

A last remark, which applies not only to my layout system, but also to LATEX in general. When a LATEX document contains page references, it can happen that some page numbers are skewed, because the generation uses an old C<.aux> file. This is why the C or C command is used twice in a row. The first time, its purpose is to refresh the C<.aux> file with up-to-date page numbers. The second time, its purpose is to generate the output file with the proper page numbers.

=head2 What next?

Coding in E-lisp is fine, but coding in Perl is better. I plan to write a Perl module which would do the same as C and perhaps also C (although there are already modules generating HTML tables on CPAN). I do not know yet if this will be a module dealing with plain text or a C and C plug-in.

=head2 License

The programmes and the additional files are published under the same terms as Perl: GNU General Public License (GPL) or Artistic License.