Pod::Man - Convert POD data to formatted *roff input
use Pod::Man;
my $parser = Pod::Man->new (release => $VERSION, section => 8);
# Read POD from STDIN and write to STDOUT.
$parser->parse_file (\*STDIN);
# Read POD from file.pod and write to file.1.
$parser->parse_from_file ('file.pod', 'file.1');
Pod::Man is a module to convert documentation in the POD format (the preferred language for documenting Perl) into *roff input using the man macro set. The resulting *roff code is suitable for display on a terminal using nroff(1), normally via man(1), or printing using troff(1). It is conventionally invoked using the driver script pod2man, but it can also be used directly.
By default (on non-EBCDIC systems), Pod::Man outputs UTF-8. Its output should work with the man program on systems that use groff (most Linux distributions) or mandoc (most BSD variants), but may result in mangled output on older UNIX systems. To choose a different, possibly more backward-compatible output mangling on such systems, set the encoding option to roff (the default in earlier Pod::Man versions). See the encoding option and "ENCODING" for more details.
See "COMPATIBILITY" for the versions of Pod::Man with significant backward-incompatible changes (other than constructor options, whose versions are documented below), and the versions of Perl that included them.
Create a new Pod::Man object. ARGS should be a list of key/value pairs, where the keys are chosen from the following. Each option is annotated with the version of Pod::Man in which that option was added with its current meaning.
[1.00] Sets the centered page header for the .TH
macro. The default, if this option is not specified, is User Contributed Perl Documentation
.
[4.00] Sets the left-hand footer for the .TH macro. If this option is not set, the contents of the environment variable POD_MAN_DATE, if set, will be used. Failing that, the value of SOURCE_DATE_EPOCH, the modification date of the input file, or the current time if stat() can't find that file (which will be the case if the input is from STDIN) will be used. If taken from any source other than POD_MAN_DATE (which is used verbatim), the date will be formatted as YYYY-MM-DD and will be based on UTC (so that the output will be reproducible regardless of local time zone).
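As a rough illustration of how the header and footer options feed the .TH macro, the following sketch sets them explicitly and prints the resulting .TH line. It assumes the constructor keys center, date, name, release, and section for the options described in this list; the sample values and POD text are invented for the example, and the exact quoting of the .TH arguments differs between Pod::Man versions.

```perl
use strict;
use warnings;
use Pod::Man;

# Made-up POD input for illustration.
my $pod = "=head1 NAME\n\nexample - demonstrate .TH fields\n";

my $parser = Pod::Man->new(
    name    => 'EXAMPLE',                  # manual page name
    section => 1,                          # manual section
    center  => 'Example Documentation',    # centered page header
    date    => '2024-01-15',               # left-hand footer
    release => 'example 1.0',              # centered footer
);

# Capture the *roff output in a scalar instead of writing to STDOUT.
my $roff = q{};
$parser->output_string(\$roff);
$parser->parse_string_document($pod);

# Extract and show the .TH line that carries these values.
my ($th) = $roff =~ /^(\.TH\s.*)$/m;
print "$th\n";
```

Setting date explicitly like this is one way to keep the output stable when the input has no file timestamp, as is the case for input from a string or STDIN.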
[5.00] Specifies the encoding of the output. The value must be an encoding recognized by the Encode module (see Encode::Supported), or the special values roff
or groff
. The default on non-EBCDIC systems is UTF-8.
If the output contains characters that cannot be represented in this encoding, that is an error that will be reported as configured by the errors
option. If error handling is other than die
, the unrepresentable character will be replaced with the Encode substitution character (normally ?
).
If the encoding
option is set to the special value groff
(the default on EBCDIC systems), or if the Encode module is not available and the encoding is set to anything other than roff
, Pod::Man will translate all non-ASCII characters to \[uNNNN]
Unicode escapes. These are not traditionally part of the *roff language, but are supported by groff and mandoc and thus by the majority of manual page processors in use today.
If the encoding
option is set to the special value roff
, Pod::Man will do its historic transformation of (some) ISO 8859-1 characters into *roff escapes that may be adequate in troff and may be readable (if ugly) in nroff. This was the default behavior of versions of Pod::Man before 5.00. With this encoding, all other non-ASCII characters will be replaced with X
. It may be required for very old troff and nroff implementations that do not support UTF-8, but its representation of any non-ASCII character is very poor and often specific to European languages.
If the output file handle has a PerlIO encoding layer set, setting encoding
to anything other than groff
or roff
will be ignored and no encoding will be done by Pod::Man. It will instead rely on the encoding layer to make whatever output encoding transformations are desired.
WARNING: The input encoding of the POD source is independent from the output encoding, and setting this option does not affect the interpretation of the POD input. Unless your POD source is US-ASCII, its encoding should be declared with the =encoding command in the source. If this is not done, Pod::Simple will attempt to guess the encoding and may be successful if it's Latin-1 or UTF-8, but it will produce warnings. See perlpod(1) for more information.
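A minimal sketch of the groff encoding follows. It declares the input encoding with =encoding, passes the POD as raw UTF-8 bytes, and asks for \[uNNNN] escapes in the output. Because the encoding option only exists in Pod::Man 5.00 and later, the sketch checks the module version first; the page name and POD text are invented for the example.

```perl
use strict;
use warnings;
use Pod::Man;

# The encoding option requires Pod::Man 5.00 or later; check before relying on it.
my $has_encoding = eval { Pod::Man->VERSION('5.00'); 1 };

# POD source as raw UTF-8 bytes, with the input encoding declared explicitly.
# "\xC3\xAF" is the UTF-8 byte sequence for U+00EF (i with diaeresis).
my $pod = "=encoding UTF-8\n\n=head1 NAME\n\nna\xC3\xAFve - sample page\n";

my $roff = q{};
if ($has_encoding) {
    my $parser = Pod::Man->new(
        name     => 'NAIVE',
        section  => 1,
        encoding => 'groff',
    );
    $parser->output_string(\$roff);
    $parser->parse_string_document($pod);

    # Show the line containing the escaped character.
    my ($line) = $roff =~ /^(na.*sample page)$/m;
    print "$line\n" if defined $line;
}
else {
    print 'Pod::Man ', Pod::Man->VERSION, " predates the encoding option\n";
}
```

Swapping 'groff' for 'roff' or for an Encode name such as 'UTF-8' in the same sketch is an easy way to compare the three kinds of output on a given system.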
[2.27] How to report errors. die
says to throw an exception on any POD formatting error. stderr
says to report errors on standard error, but not to throw an exception. pod
says to include a POD ERRORS section in the resulting documentation summarizing the errors. none
ignores POD errors entirely, as much as possible.
The default is pod
.
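The following sketch shows one way to use the die setting: wrap the parse call in eval and inspect the exception. The POD input is deliberately broken (a =back with no matching =over), and the variable names are only illustrative. Pod::Man may also print the individual POD errors to standard error before throwing.

```perl
use strict;
use warnings;
use Pod::Man;

# Deliberately broken POD: =back with no matching =over.
my $bad_pod = "=head1 NAME\n\nbroken - sample page\n\n=back\n";

my $parser = Pod::Man->new(
    name    => 'BROKEN',
    section => 1,
    errors  => 'die',
);
my $roff = q{};
$parser->output_string(\$roff);

# With errors => 'die', a POD syntax error raises an exception.
my $ok    = eval { $parser->parse_string_document($bad_pod); 1 };
my $error = $ok ? undef : $@;

print defined $error ? "caught: $error" : "no error raised\n";
```

This pattern suits build scripts that should fail loudly on malformed documentation rather than ship a page with a POD ERRORS section.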
[1.00] The fixed-width font to use for verbatim text and code. Defaults to CW
. Some systems prefer CR
instead. Only matters for troff output.
[1.00] Bold version of the fixed-width font. Defaults to CB
. Only matters for troff output.
[1.00] Italic version of the fixed-width font (something of a misnomer, since most fixed-width fonts only have an oblique version, not an italic version). Defaults to CI
. Only matters for troff output.
[1.00] Bold italic (in theory, probably oblique in practice) version of the fixed-width font. Pod::Man doesn't assume you have this, and defaults to CB
. Some systems (such as Solaris) have this font available as CX
. Only matters for troff output.
[5.00] By default, Pod::Man applies some default formatting rules based on guesswork and regular expressions that are intended to make writing Perl documentation easier and require less explicit markup. These rules may not always be appropriate, particularly for documentation that isn't about Perl. This option allows turning all or some of it off.
The special value all
enables all guesswork. This is also the default for backward compatibility reasons. The special value none
disables all guesswork. Otherwise, the value of this option should be a comma-separated list of one or more of the following keywords:
functions: Convert function references like foo() to bold even if they have no markup. The function name accepts valid Perl characters for function names (including :), and the trailing parentheses must be present and empty.
manref: Make the first part (before the parentheses) of manual page references like foo(1) bold even if they have no markup. The section must be a single number optionally followed by lowercase letters.
quoting: If no guesswork is enabled, any text enclosed in C<> is surrounded by double quotes in nroff (terminal) output unless the contents are already quoted. When this guesswork is enabled, quote marks will also be suppressed for Perl variables, function names, function calls, numbers, and hex constants.
variables: Convert Perl variable names to a fixed-width font even if they have no markup. This transformation will only be apparent in troff output, or some other output format (unlike nroff terminal output) that supports fixed-width fonts.
Any unknown guesswork name is silently ignored (for potential future compatibility), so be careful about spelling.
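To see what the guesswork rules do in practice, a sketch like the following formats the same paragraph under several settings and prints the resulting body line. It assumes the keyword names functions and manref from the upstream Pod::Man documentation; the helper body_line and the sample text are invented for the example. On Pod::Man versions before 5.00 the guesswork option is not recognized, so every setting will produce the same line.

```perl
use strict;
use warnings;
use Pod::Man;

my $pod = "=head1 DESCRIPTION\n\nCall foo() and then read ls(1).\n";

# Format the sample POD with a given guesswork setting and return the body line.
sub body_line {
    my ($guesswork) = @_;
    my $parser = Pod::Man->new(
        name      => 'DEMO',
        section   => 1,
        guesswork => $guesswork,
    );
    my $roff = q{};
    $parser->output_string(\$roff);
    $parser->parse_string_document($pod);
    my ($line) = $roff =~ /^(Call .*)$/m;
    return $line;
}

for my $setting ('all', 'functions', 'manref', 'none') {
    printf "%-10s %s\n", $setting, body_line($setting);
}
```

For documentation that is not about Perl, starting from none and adding back only the keywords that help is probably the less surprising approach.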
[5.00] Add commands telling groff that the input file is in the given language. The value of this setting must be a language abbreviation for which groff provides supplemental configuration, such as ja
(for Japanese) or zh
(for Chinese).
Specifically, this adds:
.mso <language>.tmac
.hla <language>
to the start of the file, which configure correct line breaking for the specified language. Without these commands, groff may not know how to add proper line breaks for Chinese and Japanese text if the manual page is installed into the normal manual page directory, such as /usr/share/man.
On many systems, this will be done automatically if the manual page is installed into a language-specific manual page directory, such as /usr/share/man/zh_CN. In that case, this option is not required.
Unfortunately, the commands added with this option are specific to groff and will not work with other troff and nroff implementations.
[4.08] Sets the quote marks used to surround C<> text. lquote
sets the left quote mark and rquote
sets the right quote mark. Either may also be set to the special value none
, in which case no quote mark is added on that side of C<> text (but the font is still changed for troff output).
Also see the quotes
option, which can be used to set both quotes at once. If both quotes
and one of the other options is set, lquote
or rquote
overrides quotes
.
[4.08] Set the name of the manual page for the .TH
macro. Without this option, the manual name is set to the uppercased base name of the file being converted unless the manual section is 3, in which case the path is parsed to see if it is a Perl module path. If it is, a path like .../lib/Pod/Man.pm
is converted into a name like Pod::Man
. This option, if given, overrides any automatic determination of the name.
If generating a manual page from standard input, the name will be set to STDIN
if this option is not provided. In this case, providing this option is strongly recommended to set a meaningful manual page name.
[2.27] Normally, L<> formatting codes with both a URL and anchor text are formatted to show both the anchor text and the URL. In other words:
L<foo|http://example.com/>
is formatted as:
foo <http://example.com/>
This option, if set to a true value, suppresses the URL when anchor text is given, so this example would be formatted as just foo
. This can produce less cluttered output in cases where the URLs are not particularly important.
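The effect can be checked with a small sketch that formats the same link twice, once with default settings and once with the URL-suppressing option, which is assumed here to use the constructor key nourls. The helper format_with and the sample POD are invented for the example.

```perl
use strict;
use warnings;
use Pod::Man;

my $pod = "=head1 SEE ALSO\n\n"
  . "See L<the project page|http://example.com/> for details.\n";

# Format the sample POD with the given extra constructor options.
sub format_with {
    my (%options) = @_;
    my $parser = Pod::Man->new(name => 'DEMO', section => 1, %options);
    my $roff = q{};
    $parser->output_string(\$roff);
    $parser->parse_string_document($pod);
    return $roff;
}

my $default = format_with();
my $no_urls = format_with(nourls => 1);

# Print the body line from each run for comparison.
for my $roff ($default, $no_urls) {
    my ($line) = $roff =~ /^(See .*)$/m;
    print "$line\n";
}
```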
[4.00] Sets the quote marks used to surround C<> text. If the value is a single character, it is used as both the left and right quote. Otherwise, it is split in half, and the first half of the string is used as the left quote and the second is used as the right quote.
This may also be set to the special value none
, in which case no quote marks are added around C<> text (but the font is still changed for troff output).
Also see the lquote
and rquote
options, which can be used to set the left and right quotes independently. If both quotes
and one of the other options is set, lquote
or rquote
overrides quotes
.
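A hedged sketch of the quoting options follows. It formats one paragraph with the default quotes, with quotes set to none, with a two-character value that is split into left and right halves, and with lquote overriding the left half. Where the quote marks end up in the *roff output (inline or as string definitions in the preamble) depends on the Pod::Man version, so the sketch prints any line that either contains the sample text or defines the quote strings. The helper format_with and the sample POD are invented for the example.

```perl
use strict;
use warnings;
use Pod::Man;

my $pod = "=head1 DESCRIPTION\n\nRun C<make test> first.\n";

# Format the sample POD with the given extra constructor options.
sub format_with {
    my (%options) = @_;
    my $parser = Pod::Man->new(
        name    => 'DEMO',
        section => 1,
        date    => '2024-01-15',
        %options,
    );
    my $roff = q{};
    $parser->output_string(\$roff);
    $parser->parse_string_document($pod);
    return $roff;
}

my %result = (
    default  => format_with(),
    none     => format_with(quotes => 'none'),
    brackets => format_with(quotes => '[]'),
    mixed    => format_with(quotes => '[]', lquote => '<'),
);

for my $label (sort keys %result) {
    my @lines = grep { /make test|\bds C[`']/ } split /\n/, $result{$label};
    print "$label:\n", map { "  $_\n" } @lines;
}
```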
[1.00] Set the centered footer for the .TH
macro. By default, this is set to the version of Perl you run Pod::Man under. Setting this to the empty string will cause some *roff implementations to use the system default value.
Note that some system an
macro sets assume that the centered footer will be a modification date and will prepend something like Last modified:
. If this is the case for your target system, you may want to set release
to the last modified date and date
to the version number.
[1.00] Set the section for the .TH
macro. The standard section numbering convention is to use 1 for user commands, 2 for system calls, 3 for functions, 4 for devices, 5 for file formats, 6 for games, 7 for miscellaneous information, and 8 for administrator commands. There is a lot of variation here, however; some systems (like Solaris) use 4 for file formats, 5 for miscellaneous information, and 7 for devices. Still others use 1m instead of 8, or some mix of both. About the only section numbers that are reliably consistent are 1, 2, and 3.
By default, section 1 will be used unless the file ends in .pm
in which case section 3 will be selected.
[2.19] If set to a true value, send error messages about invalid POD to standard error instead of appending a POD ERRORS section to the generated *roff output. This is equivalent to setting errors
to stderr
if errors
is not already set.
This option is for backward compatibility with Pod::Man versions that did not support errors
. Normally, the errors
option should be used instead.
[2.21] This option used to set the output encoding to UTF-8. Since this is now the default, it is ignored and does nothing.
As a derived class from Pod::Simple, Pod::Man supports the same methods and interfaces. See Pod::Simple for all the details. This section summarizes the most-frequently-used methods and the ones added by Pod::Man.
Direct the output from parse_file(), parse_lines(), or parse_string_document() to the file handle FH instead of STDOUT
.
Direct the output from parse_file(), parse_lines(), or parse_string_document() to the scalar variable pointed to by REF, rather than STDOUT
. For example:
my $man = Pod::Man->new();
my $output;
$man->output_string(\$output);
$man->parse_file('/some/input/file');
Be aware that the output in that variable will already be encoded in UTF-8.
Read the POD source from PATH and format it. By default, the output is sent to STDOUT
, but this can be changed with the output_fh() or output_string() methods.
Read the POD source from INPUT, format it, and output the results to OUTPUT.
parse_from_filehandle() is provided for backward compatibility with older versions of Pod::Man. parse_from_file() should be used instead.
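As a small end-to-end sketch of parse_from_file(), the following writes a POD file into a temporary directory, converts it, and prints the .TH line of the result. The file names are invented for the example. Since no name or section is given, the defaults described under the constructor options apply, so the name should come from the file's base name and the section should be 1.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Pod::Man;

# Work in a temporary directory that is removed on exit.
my $dir    = tempdir(CLEANUP => 1);
my $input  = File::Spec->catfile($dir, 'demo.pod');
my $output = File::Spec->catfile($dir, 'demo.1');

# Write a tiny POD document to convert.
open(my $in, '>', $input) or die "Cannot write $input: $!";
print {$in} "=head1 NAME\n\ndemo - sample page\n";
close($in) or die "Cannot close $input: $!";

# Convert the POD file to a manual page.
Pod::Man->new->parse_from_file($input, $output);

# Read the result back and show its .TH line.
open(my $out, '<', $output) or die "Cannot read $output: $!";
my $roff = do { local $/; <$out> };
close($out) or die "Cannot close $output: $!";

my ($th) = $roff =~ /^(\.TH\s.*)$/m;
print "$th\n";
```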
Parse the provided lines as POD source, writing the output to either STDOUT
or the file handle set with the output_fh() or output_string() methods. This method can be called repeatedly to provide more input lines. An explicit undef
should be passed to indicate the end of input.
This method expects raw bytes, not decoded characters.
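The incremental interface can be sketched as follows: direct output to an in-memory file handle with output_fh(), feed the source in batches with parse_lines(), and finish with an explicit undef. The sample lines are invented for the example.

```perl
use strict;
use warnings;
use Pod::Man;

my $parser = Pod::Man->new(name => 'DEMO', section => 1);

# Send output to an in-memory file handle rather than STDOUT.
my $buffer = q{};
open(my $fh, '>', \$buffer) or die "Cannot open in-memory handle: $!";
$parser->output_fh($fh);

# Feed the POD source in two batches, then signal end of input with undef.
$parser->parse_lines('=head1 NAME', q{}, 'demo - sample page');
$parser->parse_lines(q{}, '=head1 DESCRIPTION', q{}, 'Some text.');
$parser->parse_lines(undef);

close($fh) or die "Cannot close in-memory handle: $!";
print $buffer;
```

Forgetting the final undef is an easy mistake with this interface, since the parser cannot otherwise know that the last paragraph is complete.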
Parse the provided scalar variable as POD source, writing the output to either STDOUT
or the file handle set with the output_fh() or output_string() methods.
This method expects raw bytes, not decoded characters.
As of Pod::Man 5.00, the default output encoding for Pod::Man is UTF-8. This should work correctly on any modern system that uses either groff (most Linux distributions) or mandoc (Alpine Linux and most BSD variants, including macOS).
The user will probably have to use a UTF-8 locale to see correct output. This may be done by default; if not, set the LANG or LC_CTYPE environment variables to an appropriate locale. The locale C.UTF-8 is available on most systems if one wants correct output without changing the other things locales affect, such as collation.
The backward-compatible output format used in Pod::Man versions before 5.00 is available by setting the encoding
option to roff
. This may produce marginally nicer results on older UNIX versions that do not use groff or mandoc, but none of the available options will correctly render Unicode characters on those systems.
Below are some additional details about how this choice was made and some discussion of alternatives.
The default output encoding for Pod::Man has been a long-standing problem. troff and nroff predate Unicode by a significant margin, and their implementations for many UNIX systems reflect that legacy. It's common for Unicode to not be supported in any form.
Because of this, versions of Pod::Man prior to 5.00 maintained the highly conservative output of the original pod2man, which output pure ASCII with complex macros to simulate common western European accented characters when processed with troff. The nroff output was awkward and sometimes incorrect, and characters not used in western European scripts were replaced with X
. This choice maximized backwards compatibility with man and nroff/troff implementations at the cost of incorrect rendering of many POD documents, particularly those containing people's names.
The modern implementations, groff (used in most Linux distributions) and mandoc (used by most BSD variants), do now support Unicode. Other UNIX systems often do not, but they're now a tiny minority of the systems people use on a daily basis. It's increasingly common (for very good reasons) to use Unicode characters for POD documents rather than using ASCII conversions of people's names or avoiding non-English text, making the limitations in the old output format more apparent.
Four options have been proposed to fix this:
1. Optionally support UTF-8 output but don't change the default. This is the approach taken since Pod::Man 2.1.0, which added the utf8 option. Some Pod::Man users use this option for better output on platforms known to support Unicode, but since the defaults have not changed, people continued to encounter (and file bug reports about) the poor default rendering.
2. Convert characters to troff \(xx escapes. This requires maintaining a large translation table and addresses only a tiny part of the problem, since many Unicode characters have no standard troff name. groff has the largest list, but if one is willing to assume groff is the formatter, the next option is better.
3. Convert characters to groff \[uNNNN] escapes. This is implemented as the groff encoding for those who want to use it, and is supported by both groff and mandoc. However, it is no better than UTF-8 output for portability to other implementations. See "Testing results" for more details.
4. Change the default output format to UTF-8 and ask those who want maximum backward compatibility to explicitly select the old encoding. This fixes the issue for most users at the cost of backwards compatibility. While the rendering of non-ASCII characters is different on older systems that don't support UTF-8, it's not always worse than the old output.
Pod::Man 5.00 and later makes the last choice. This arguably produces worse output when manual pages are formatted with troff into PostScript or PDF, but doing this is rare and normally manual, so the encoding can be changed in those cases. The older output encoding is available by setting encoding
to roff
.
Here are the results of testing encoding values of utf-8 and groff on various operating systems. The testing methodology was to create man/man1 in the current directory, copy encoding.utf8 or encoding.groff from the podlators 5.00 distribution to man/man1/encoding.1, and then run:
LANG=C.UTF-8 MANPATH=$(pwd)/man man 1 encoding
If the locale is not explicitly set to one that includes UTF-8, the Unicode characters were usually converted to ASCII (by, for example, dropping an accent) or deleted or replaced with <?>
if there was no conversion.
Tested on 2022-09-25. Many thanks to the GCC Compile Farm project for access to testing hosts.
OS                   UTF-8    groff
------------------   -------  -------
AIX 7.1              no [1]   no [2]
Alpine 3.15.0        yes      yes
CentOS 7.9           yes      yes
Debian 7             yes      yes
FreeBSD 13.0         yes      yes
NetBSD 9.2           yes      yes
OpenBSD 7.1          yes      yes
openSUSE Leap 15.4   yes      yes
Solaris 10           yes      no [2]
Solaris 11           no [3]   no [3]
I did not have access to a macOS system for testing, but since it uses mandoc, its behavior is probably the same as the BSD hosts.
Notes:
[1] Unicode characters were converted to one or two random ASCII characters unrelated to the original character.
[2] Unicode characters were shown as the body of the groff escape rather than the indicated character (in other words, text like [u00EF]).
[3] Unicode characters were deleted entirely, as if they weren't there. Using nroff -man instead of man to format the page showed the same results as Solaris 10. Using groff -k -man -Tutf8 to format the page produced the correct output.
PostScript and PDF output using groff on a Debian 12 system do not support combining accent marks or SMP characters due to a lack of support in the default output font.
Testing on additional platforms is welcome. Please let the author know if you have additional results.
(F) You specified a *roff font (using fixed
, fixedbold
, etc.) that wasn't either one or two characters. Pod::Man doesn't support *roff fonts longer than two characters, although some *roff extensions do (the canonical versions of nroff and troff don't either).
(F) The errors
parameter to the constructor was set to an unknown value.
(F) The quote specification given (the quotes option to the constructor) was invalid. A quote specification must be either one character long or an even number (greater than one) of characters long.
(F) The POD document being formatted had syntax errors and the errors
option was set to die
.
If set and Encode is not available, silently fall back to an encoding of groff
without complaining to standard error. This environment variable is set during Perl core builds, which build Encode after podlators. Encode is expected to not (yet) be available in that case.
If set, this will be used as the value of the left-hand footer unless the date
option is explicitly set, overriding the timestamp of the input file or the current time. This is primarily useful to ensure reproducible builds of the same output file given the same source and Pod::Man version, even when file timestamps may not be consistent.
If set, and POD_MAN_DATE and the date
options are not set, this will be used as the modification time of the source file, overriding the timestamp of the input file or the current time. It should be set to the desired time in seconds since UNIX epoch. This is primarily useful to ensure reproducible builds of the same output file given the same source and Pod::Man version, even when file timestamps may not be consistent. See https://reproducible-builds.org/specs/source-date-epoch/ for the full specification.
(Arguably, according to the specification, this variable should be used only if the timestamp of the input file is not available and Pod::Man uses the current time. However, for reproducible builds in Debian, results were more reliable if this variable overrode the timestamp of the input file.)
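The precedence between the two date variables can be sketched as follows. The helper th_line and the sample values are invented for the example, and the behavior shown requires Pod::Man 4.00 or later. The epoch value 1700000000 corresponds to 2023-11-14 in UTC.

```perl
use strict;
use warnings;
use Pod::Man;

# Format a tiny document read from a string and return its .TH line.
sub th_line {
    my $parser = Pod::Man->new(name => 'DEMO', section => 1);
    my $roff = q{};
    $parser->output_string(\$roff);
    $parser->parse_string_document("=head1 NAME\n\ndemo - sample page\n");
    my ($th) = $roff =~ /^(\.TH\s.*)$/m;
    return $th;
}

my ($from_epoch, $from_date);
{
    # Only SOURCE_DATE_EPOCH set: the date is formatted as YYYY-MM-DD in UTC.
    local $ENV{SOURCE_DATE_EPOCH} = 1_700_000_000;
    delete local $ENV{POD_MAN_DATE};
    $from_epoch = th_line();
}
{
    # POD_MAN_DATE takes precedence and is used verbatim.
    local $ENV{SOURCE_DATE_EPOCH} = 1_700_000_000;
    local $ENV{POD_MAN_DATE}      = 'Release Day';
    $from_date = th_line();
}

print "$from_epoch\n$from_date\n";
```

Neither variable has any effect when the date option is passed to the constructor, so a build system should pick one mechanism and use it consistently.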
Pod::Man 1.02 (based on Pod::Parser) was the first version included with Perl, in Perl 5.6.0.
The current API based on Pod::Simple was added in Pod::Man 2.00. Pod::Man 2.04 was included in Perl 5.9.3, the first version of Perl to incorporate those changes. This is the first version that correctly supports all modern POD syntax. The parse_from_filehandle() method was re-added for backward compatibility in Pod::Man 2.09, included in Perl 5.9.4.
Support for anchor text in L<> links of type URL was added in Pod::Man 2.23, included in Perl 5.11.5.
parse_lines(), parse_string_document(), and parse_file() set a default output file handle of STDOUT
if one was not already set as of Pod::Man 2.28, included in Perl 5.19.5.
Support for SOURCE_DATE_EPOCH and POD_MAN_DATE was added in Pod::Man 4.00, included in Perl 5.23.7, and generated dates were changed to use UTC instead of the local time zone. This is also the first release that aligned the module version and the version of the podlators distribution. All modules included in podlators, and the podlators distribution itself, share the same version number from this point forward.
Pod::Man 4.10, included in Perl 5.27.8, changed the formatting for manual page references and function names to bold instead of italic, following the current Linux manual page standard.
Pod::Man 5.00, included in Perl 5.37.7, changed the default output encoding to UTF-8, overridable with the new encoding
option. It also fixed problems with bold or italic extending too far when used with C<> escapes, and began converting Unicode zero-width spaces (U+200B) to the \:
*roff escape. It also dropped attempts to add subtle formatting corrections in the output that would only be visible when typeset with troff, which had previously been a significant source of bugs.
Pod::Man v6.0.0 and later unconditionally convert -
to the \-
*roff escape, representing an ASCII hyphen-minus. Earlier versions attempted to use heuristics to decide when a given -
character should translate to a hyphen-minus or a true hyphen, but these heuristics were buggy and fragile. v6.0.0 and later also unconditionally convert `
and '
to ASCII grave accent and apostrophe marks instead of the default *roff behavior of interpreting them as paired quotes.
There are numerous bugs and language-specific assumptions in the nroff fallbacks for accented characters in the roff
encoding. Since the point of this encoding is backward compatibility with the output from earlier versions of Pod::Man, and it is deprecated except when necessary to support old systems, those bugs are unlikely to ever be fixed.
Pod::Man doesn't handle font names longer than two characters. Neither do most troff implementations, but groff does as an extension. It would be nice to support as an option for those who want to use it.
Pod::Man copies the input spacing verbatim to the output *roff document. This means your output will be affected by how nroff generally handles sentence spacing.
nroff dates from an era in which it was standard to use two spaces after sentences, and will always add two spaces after a line-ending period (or similar punctuation) when reflowing text. For example, the following input:
=pod

One sentence.
Another sentence.
will result in two spaces after the period when the text is reflowed. If you use two spaces after sentences anyway, this will be consistent, although you will have to be careful to not end a line with an abbreviation such as "e.g." or "Ms.". Output will also be consistent if you use the *roff style guide (and XKCD 1285) recommendation of putting a line break after each sentence, although that will consistently produce two spaces after each sentence, which may not be what you want.
If you prefer one space after sentences (which is the more modern style), you will unfortunately need to ensure that no line in the middle of a paragraph ends in a period or similar sentence-ending punctuation. Otherwise, nroff will add two spaces after that sentence when reflowing, and your output document will have inconsistent spacing.
The *roff language distinguishes between two types of hyphens: -
, which is a true typesetting hyphen (roughly equivalent to the Unicode U+2010 code point), and \-
, which is the ASCII hyphen-minus (U+002D) that is used for UNIX command options and most filenames. Hyphens, where appropriate, produce better typesetting, but incorrectly using them for command names and options can cause problems with searching and cut-and-paste.
POD does not draw this distinction. Before podlators v6.0.0, Pod::Man attempted to translate -
in the input into either a hyphen or a hyphen-minus, depending on context. However, this distinction proved impossible to do correctly with heuristics. Pod::Man therefore translates all -
characters in the input to \-
in the output, ensuring that command names and options are correct at the cost of somewhat inferior typesetting and line breaking issues with long hyphenated phrases.
To use true hyphens in the Pod::Man output, declare an input character set of UTF-8 (or some other Unicode encoding) and use Unicode hyphens. Pod::Man and *roff should handle those correctly with the default output format and most modern *roff implementations.
Similarly, Pod::Man disables the default *roff behavior of turning `
and '
characters into matched quotes, and pairs of those characters into matched double quotes, because there is no good way to tell from the POD input whether this interpretation is desired or whether the intent is to use a literal grave accent or neutral apostrophe. If you want paired quotes in the output, use Unicode and its paired quote characters.
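To check how a particular installation treats hyphens, a sketch like the following prints the module version next to the formatted body line. The sample text is invented for the example. On v6.0.0 and later every - in the input should appear as \- in the output; earlier versions apply the older heuristics, so the result may differ.

```perl
use strict;
use warnings;
use Pod::Man;

my $pod = "=head1 DESCRIPTION\n\nA well-known option is --verbose.\n";

my $parser = Pod::Man->new(name => 'DEMO', section => 1);
my $roff = q{};
$parser->output_string(\$roff);
$parser->parse_string_document($pod);

# Show the body line so the hyphen handling is visible.
my ($line) = $roff =~ /^(A well.*)$/m;
print 'Pod::Man ', Pod::Man->VERSION, ": $line\n";
```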
Written by Russ Allbery <rra@cpan.org>, based on the original pod2man by Tom Christiansen <tchrist@mox.perl.com>.
The modifications to work with Pod::Simple instead of Pod::Parser were contributed by Sean Burke <sburke@cpan.org>, but I've since hacked them beyond recognition and all bugs are mine.
Copyright 1999-2020, 2022-2024 Russ Allbery <rra@cpan.org>
Substantial contributions by Sean Burke <sburke@cpan.org>.
This program is free software; you may redistribute it and/or modify it under the same terms as Perl itself.
Encode::Supported, Pod::Simple, perlpod(1), pod2man(1), nroff(1), troff(1), man(1), man(7)
Ossanna, Joseph F., and Brian W. Kernighan. "Troff User's Manual," Computing Science Technical Report No. 54, AT&T Bell Laboratories. This is the best documentation of standard nroff and troff. At the time of this writing, it's available at http://www.troff.org/54.pdf.
The manual page documenting the man macro set may be man(5) instead of man(7) on your system.
See perlpodstyle(1) for documentation on writing manual pages in POD if you've not done it before and aren't familiar with the conventions.
The current version of this module is always available from its web site at https://www.eyrie.org/~eagle/software/podlators/. It is also part of the Perl core distribution as of 5.6.0.