the
first byte of the pair determines the least-significant nybble of the
output byte, and with format C<H> it determines the most-significant
nybble.
If the length of the input string is not even, it behaves as if padded
by a null byte at the end. Similarly, during unpack()ing the "extra"
nybbles are ignored.
If the input string of pack() is longer than needed, extra bytes are ignored.
A C<*> for the repeat count of pack() means to use all the bytes of
the input field. On unpack()ing the bits are converted to a string
of hexadecimal digits.
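The nybble ordering and odd-length padding rules above can be checked with a short sketch. The variable names are illustrative only.

```perl
use strict;
use warnings;

# 'H' puts the first hex digit of each pair in the high nybble,
# 'h' puts it in the low nybble.
my $high_first = pack("H4", "1234");   # bytes 0x12 0x34
my $low_first  = pack("h4", "1234");   # bytes 0x21 0x43

# An odd-length input behaves as if padded with a null byte,
# so the last output byte gets a zero low nybble.
my $odd = pack("H3", "abc");           # bytes 0xab 0xc0

# unpack() turns the bytes back into hex digits; '*' uses them all.
printf "%s %s %s\n",
    unpack("H*", $high_first),   # 1234
    unpack("H*", $low_first),    # 2143
    unpack("H*", $odd);          # abc0
```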
=item *
The C<p> type packs a pointer to a null-terminated string. You are
responsible for ensuring the string is not a temporary value (which can
potentially get deallocated before you get around to using the packed result).
The C<P> type packs a pointer to a structure of the size indicated by the
length. A NULL pointer is created if the corresponding value for C<p>
or C<P> is C<undef>, and similarly for unpack().
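A minimal sketch of the C<p> lifetime rule and the C<undef> case follows. Packed pointers are normally handed to C code or system calls; round-tripping one through unpack() inside Perl, as here, is only for illustration, and it relies on C<$greeting> still being alive when the pointer is followed.

```perl
use strict;
use warnings;
use Config;

# Keep the string in a named variable so its buffer outlives the
# packed pointer (a temporary could be freed before we use it).
my $greeting = "hello";
my $ptr = pack("p", $greeting);

# The packed value is just a native pointer, ptrsize bytes long.
print length($ptr) == $Config{ptrsize} ? "pointer-sized\n" : "unexpected size\n";

# unpack("p") follows the pointer and copies the null-terminated string.
my $copy = unpack("p", $ptr);
print "$copy\n";   # hello

# undef packs as a NULL pointer, which unpacks back to undef.
my $null = pack("p", undef);
print defined(unpack("p", $null)) ? "defined\n" : "undef\n";   # undef
```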
=item *
The C</> template character allows packing and unpacking of strings where
the packed structure contains a byte count followed by the string itself.
You write I<length-item>C</>I<string-item>.
The I<length-item> can be any C<pack> template letter, and describes
how the length value is packed. The ones likely to be of most use are
integer-packing ones like C<n> (for Java strings), C<w> (for ASN.1 or
SNMP) and C<N> (for Sun XDR).
The I<string-item> must, at present, be C<"A*">, C<"a*"> or C<"Z*">.
For C<unpack> the length of the string is obtained from the I<length-item>,
but if you put in the '*' it will be ignored.
unpack 'C/a', "\04Gurusamy";         # gives 'Guru'
unpack 'a3/A* A*', '007 Bond  J ';   # gives (' Bond','J')
pack 'n/a* w/a*','hello,','world';   # gives "\000\006hello,\005world"
The I<length-item> is not returned explicitly from C<unpack>.
Adding a count to the I<length-item> letter is unlikely to do anything
useful, unless that letter is C<A>, C<a> or C<Z>. Packing with a
I<length-item> of C<a> or C<Z> may introduce C<"\000"> characters,
which Perl does not regard as legal in numeric strings.
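As a sketch of a length-prefixed field followed by other data, the following uses an C<N> length item in the style of Sun XDR. It is an assumption-laden simplification: real XDR also pads strings to a 4-byte boundary, which this template does not do.

```perl
use strict;
use warnings;

# Encode a string with a 32-bit big-endian length prefix, followed by
# some unrelated trailing bytes.
my $wire = pack("N/a* a*", "Gurusamy", "TAIL");

# On unpack the count comes from the N field; the trailing 'a*' then
# picks up whatever follows the counted string.
my ($name, $rest) = unpack("N/a* a*", $wire);

print length($wire), "\n";   # 16 (4-byte length + 8 + 4)
print "$name|$rest\n";       # Gurusamy|TAIL
```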
=item *
The integer types C<s>, C<S>, C<l>, and C<L> may be
immediately followed by a C<!> suffix to signify native shorts or
longs. As noted above, a bare C<l> means exactly 32 bits, whereas the
native C<long> (as seen by the local C compiler) may be larger. This is
an issue mainly on 64-bit platforms. You can see whether using C<!>
makes any difference with
print length(pack("s")), " ", length(pack("s!")), "\n";
print length(pack("l")), " ", length(pack("l!")), "\n";
C<i!> and C<I!> also work, but only for completeness;
they are identical to C<i> and C<I>.
The actual sizes (in bytes) of native shorts, ints, longs, and long
longs on the platform where Perl was built are also available via
L<Config>:
use Config;
print $Config{shortsize}, "\n";
print $Config{intsize}, "\n";
print $Config{longsize}, "\n";
print $Config{longlongsize}, "\n";
(C<$Config{longlongsize}> will be undefined if your system does
not support long longs.)
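The fixed-size and native variants can be compared directly against C<%Config> in a small sketch. An explicit C<0> is packed so that no argument is missing; the hash name is illustrative.

```perl
use strict;
use warnings;
use Config;

# Bare 's' and 'l' are always exactly 16 and 32 bits; the '!' variants
# follow the native C short and long recorded in %Config.
my %packed = (
    's'  => length(pack("s",  0)),
    's!' => length(pack("s!", 0)),
    'l'  => length(pack("l",  0)),
    'l!' => length(pack("l!", 0)),
);

for my $t (sort keys %packed) {
    printf "%-3s %d bytes\n", $t, $packed{$t};
}
```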
=item *
The integer formats C<s>, C<S>, C<i>, C<I>, C<l>, C<L>, C<j>, and C<J>
are inherently non-portable between processors and operating systems
because they obey the native byteorder and endianness. For example a
4-byte integer 0x12345678 (305419896 decimal) would be ordered natively
(arranged in and handled by the CPU registers) into bytes as
0x12 0x34 0x56 0x78 # big-endian
0x78 0x56 0x34 0x12 # little-endian
Basically, the Intel and VAX CPUs are little-endian, while everybody
else, for example Motorola m68k/88k, PPC, Sparc, HP PA, Power, and
Cray are big-endian. Alpha and MIPS can be either: Digital/Compaq
used/uses them in little-endian mode; SGI/Cray uses them in big-endian
mode.
The names `big-endian' and `little-endian' are comic references to
the classic "Gulliver's Travels" (via the paper "On Holy Wars and a
Plea for Peace" by Danny Cohen, USC/ISI IEN 137, April 1, 1980) and
the egg-eating habits of the Lilliputians.
Some systems may have even weirder byte orders such as
0x56 0x78 0x12 0x34
0x34 0x12 0x78 0x56
You can see your system's preference with
print join(" ", map { sprintf "%#02x", $_ }
unpack("C*",pack("L",0x12345678))), "\n";
The byteorder on the platform where Perl was built is also available
via L<Config>:
use Config;
print $Config{byteorder}, "\n";
Byteorders C<'1234'> and C<'12345678'> are little-endian, C<'4321'>
and C<'87654321'> are big-endian.
If you want portable packed integers, use the formats C<n>, C<N>,
C<v>, and C<V>; their byte endianness and size are known.
See also L<perlport>.
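The following sketch shows that the portable formats produce the same bytes everywhere, and compares them with the native C<L> to report the local byte order (which the C<print join> one-liner above shows byte by byte).

```perl
use strict;
use warnings;

my $n = 0x12345678;

# N/V are 32-bit big-/little-endian, n/v are 16-bit big-/little-endian,
# regardless of the machine Perl runs on.
print unpack("H*", pack("N", $n)), "\n";       # 12345678
print unpack("H*", pack("V", $n)), "\n";       # 78563412
print unpack("H*", pack("n", 0x1234)), "\n";   # 1234
print unpack("H*", pack("v", 0x1234)), "\n";   # 3412

# Comparing the native 'L' against 'N' and 'V' reveals the local byte order.
my $native = pack("L", $n);
my $order = $native eq pack("N", $n) ? "big-endian"
          : $native eq pack("V", $n) ? "little-endian"
          :                            "something else";
print "$order\n";
```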
=item *
Real numbers (floats and doubles) are in the native machine format only;
due to the multiplicity of floating formats around, and the lack of a
standard "network" representation, no facility for interchange has been
made. This means that packed floating point data written on one machine
may not be readable on another, even if both use IEEE floating point
arithmetic (as the endianness of the memory representation is not part
of the IEEE spec). See also L<perlport>.
Note that Perl uses doubles internally for all numeric calculation, and
converting from double into float and thence back to double again will
lose precision (i.e., C<unpack("f", pack("f", $foo))> will not in general
equal $foo).
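The precision loss can be seen with a value that has no exact binary representation. This sketch assumes a perl built with ordinary doubles as its numeric type; on a C<uselongdouble> build the C<d> round trip may also differ.

```perl
use strict;
use warnings;

my $foo = 0.1;

# 'f' stores a single-precision float, so 0.1 comes back only approximately.
my $via_float = unpack("f", pack("f", $foo));

# 'd' stores a double; on a plain-double build the round trip is exact.
my $via_double = unpack("d", pack("d", $foo));

printf "float:  %.17g\n", $via_float;
printf "double: %.17g\n", $via_double;
print $via_float == $foo ? "float equal\n" : "float differs\n";   # float differs
```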
=item *
If the pattern begins with a C<U>, the resulting string will be treated
as Unicode-encoded. You can force UTF8 encoding on in a string with an
initial C<U0>, and the bytes that follow will be interpreted as Unicode
characters. If you don't want this to happen, you can begin your pattern
with C<C0> (or anything else) to force Perl not to UTF8 encode your
string, and then follow this with a C<U*> somewhere in your pattern.
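A small sketch of the basic C<U> behaviour: each argument is a code point, and the result is counted in characters rather than bytes. It deliberately avoids the C<U0>/C<C0> switches, whose details have varied between Perl versions.

```perl
use strict;
use warnings;

# With a leading 'U' each argument is a Unicode code point, and the
# result is a character string rather than a string of raw bytes.
my $str = pack("U*", 0x41, 0x263A);

print length($str), "\n";   # 2 (characters, not bytes)
print join(" ", map { sprintf "U+%04X", $_ } unpack("U*", $str)), "\n";
                            # U+0041 U+263A
```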
=item *
You must do any alignment or padding yourself, for example by inserting
enough C<'x'>es while packing. There is no way for pack() and unpack()
to know where the bytes are going to or coming from. Therefore
C<pack> (and C<unpack>) handle their output and input as flat
sequences of bytes.
=item *
A ()-group is a sub-TEMPLATE enclosed in parentheses. A group may
take a repeat count, both as a postfix and via the C</> template
character.
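A sketch of a group used as a repeated record: the postfix count repeats the whole sub-template when packing, and C<*> repeats it until the input is exhausted when unpacking. The record layout is made up for illustration.

```perl
use strict;
use warnings;

# A group '(A2 n)' describes one record: a 2-byte tag and a 16-bit
# big-endian value. The postfix count repeats the whole group.
my $buf = pack("(A2 n)2", "ab", 1, "cd", 2);

# On unpack, '*' repeats the group until the input runs out.
my @fields = unpack("(A2 n)*", $buf);

print unpack("H*", $buf), "\n";   # 6162000163640002
print "@fields\n";                # ab 1 cd 2
```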
=item *
C<x> and C<X> accept the C<!> modifier. In this case they act as
alignment commands: they jump forward/back to the closest position
aligned at a multiple of C<count> bytes. For example, to pack() or
unpack() C's C<struct {char c; double d; char cc[2]}> one may need to
use the template C<C x![d] d C[2]>; this assumes that doubles must be
aligned on the double's size.
For alignment commands a C<count> of 0 is equivalent to a C<count> of 1;
both result in no-ops.
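The struct template above can be exercised as follows, under the same assumption that doubles are aligned on their own size. Note that a C compiler on a typical 64-bit ABI would also add trailing padding to the struct, which this template leaves out.

```perl
use strict;
use warnings;

# struct { char c; double d; char cc[2]; } with doubles aligned on their
# own size. x![d] pads to a multiple of length(pack "d").
my $template = "C x![d] d C[2]";
my $rec = pack($template, 65, 1.5, 66, 67);

my $dsize = length(pack("d", 0));
print length($rec), " bytes (double is $dsize bytes)\n";

my ($c, $d, @cc) = unpack($template, $rec);
print "$c $d @cc\n";   # 65 1.5 66 67
```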
=item *
A comment in a TEMPLATE starts with C<#> and goes to the end of line.
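Comments combine well with the fact that whitespace between template letters is ignored, so a multi-line template can document each field. The record layout in this sketch is hypothetical.

```perl
use strict;
use warnings;

# Whitespace is ignored in templates, and '#' starts a comment that runs
# to the end of the line, so a template can document itself.
my $template = q{
    n     # record id, 16-bit big-endian
    N     # payload length, 32-bit big-endian
    a4    # 4-byte tag
};

my $rec = pack($template, 7, 1000, "DATA");
my ($id, $len, $tag) = unpack($template, $rec);
print length($rec), " $id $len $tag\n";   # 10 7 1000 DATA
```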
=item *
If TEMPLATE requires more arguments to pack() than actually given, pack()
assumes additional C<""> arguments. If TEMPLATE requires fewer arguments
to pack() than actually given, extra arguments are ignored.
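Both argument-count rules can be seen in a short sketch: a missing argument acts as C<"">, which C<a3> then pads with nulls, and a surplus argument is ignored.

```perl
use strict;
use warnings;

# Too few arguments: the missing one is treated as "", which 'a3'
# pads out with null bytes.
my $short = pack("a3 a3", "abc");

# Too many arguments: the surplus one is ignored.
my $long = pack("a2", "xy", "unused");

print unpack("H*", $short), "\n";   # 616263000000
print "$long\n";                    # xy
```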
=back
Examples:
$foo = pack("CCCC",65,66,67,68);
# foo eq "ABCD"
$foo = pack("C4",65,66,67,68);
# same thing
$foo = pack("U4",0x24b6,0x24b7,0x24b8,0x24b9);
# same thing with Unicode circled letters
$foo = pack("ccxxcc",65,66,67,68);
# foo eq "AB\0\0CD"
# note: the above examples featuring "C" and "c" are true
# only on ASCII and ASCII-derived systems such as ISO Latin 1
# and UTF-8. In EBCDIC the first example would be
# $foo = pack("CCCC",193,194,195,196);
$foo = pack("s2",1,2);
# "\1\0\2\0" on little-endian
# "\0\1\0\2" on big-endian
$foo = pack("a4","abcd","x","y","z");
# "abcd"
$foo = pack("aaaa","abcd","x","y","z");
# "axyz"
$foo = pack("a14","abcdefg");
# "abcdefg\0\0\0\0\0\0\0"
$foo = pack("i9pl", gmtime);
# a real struct tm (on my system anyway)
$utmp_template = "Z8 Z8 Z16 L";
$utmp = pack($utmp_template, @utmp1);
# a struct utmp (BSDish)
@utmp2 = unpack($utmp_template, $utmp);
# "@utmp1" eq "@utmp2"
sub bintodec {
unpack("N", pack("B32", substr("0" x 32 . shift, -32)));
}
$foo = pack('sx2l', 12, 34);
# short 12, two zero bytes padding, long 34
$bar = pack('s@4l', 12, 34);
# short 12, zero fill to position 4, long 34
# $foo eq $bar
The same template may generally also be used in unpack().
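For instance, the two templates in C<bintodec> can be run in the opposite order to go from a number back to a bit string. C<dectobin> below is a hypothetical helper written for illustration, not part of Perl.

```perl
use strict;
use warnings;

# bintodec() as in the examples above.
sub bintodec {
    unpack("N", pack("B32", substr("0" x 32 . shift, -32)));
}

# Hypothetical inverse: pack as a 32-bit big-endian integer, then
# unpack the same 4 bytes as a bit string.
sub dectobin {
    my $bits = unpack("B32", pack("N", shift));
    $bits =~ s/^0+(?=\d)//;   # drop leading zeros, but keep a lone "0"
    return $bits;
}

print bintodec("101010"), "\n";   # 42
print dectobin(42), "\n";         # 101010
```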
=back