is a type.

=item C<p>

is a pointer.

=item C<n>

is a number.

=item C<s>

is a string.

=back

C<sv>, C<av>, C<hv>, etc. represent variables of their respective types.
=head2 File Operations

Instead of the F<stdio.h> functions, you should use the Perl abstraction
layer. Instead of C<FILE*> types, you need to be handling C<PerlIO*>
types. Don't forget that with the new PerlIO layered I/O abstraction
C<FILE*> types may not even be available. See also the C<perlapio>
documentation for more information about the following functions:

  Instead Of:                 Use:

  stdin                       PerlIO_stdin()
  stdout                      PerlIO_stdout()
  stderr                      PerlIO_stderr()

  fopen(fn, mode)             PerlIO_open(fn, mode)
  freopen(fn, mode, stream)   PerlIO_reopen(fn, mode, perlio) (Deprecated)
  fflush(stream)              PerlIO_flush(perlio)
  fclose(stream)              PerlIO_close(perlio)
=head2 File Input and Output

  Instead Of:                 Use:

  fprintf(stream, fmt, ...)   PerlIO_printf(perlio, fmt, ...)

  [f]getc(stream)             PerlIO_getc(perlio)
  [f]putc(stream, n)          PerlIO_putc(perlio, n)
  ungetc(n, stream)           PerlIO_ungetc(perlio, n)

Note that the PerlIO equivalents of C<fread> and C<fwrite> are slightly
different from their C library counterparts:

  fread(p, size, n, stream)   PerlIO_read(perlio, buf, numbytes)
  fwrite(p, size, n, stream)  PerlIO_write(perlio, buf, numbytes)

  fputs(s, stream)            PerlIO_puts(perlio, s)

There is no equivalent to C<fgets>; one should use C<sv_gets> instead:

  fgets(s, n, stream)         sv_gets(sv, perlio, append)
=head2 File Positioning

  Instead Of:                 Use:

  feof(stream)                PerlIO_eof(perlio)
  fseek(stream, n, whence)    PerlIO_seek(perlio, n, whence)
  rewind(stream)              PerlIO_rewind(perlio)

  fgetpos(stream, p)          PerlIO_getpos(perlio, sv)
  fsetpos(stream, p)          PerlIO_setpos(perlio, sv)

  ferror(stream)              PerlIO_error(perlio)
  clearerr(stream)            PerlIO_clearerr(perlio)
=head2 Memory Management and String Handling

  Instead Of:                     Use:

  t* p = malloc(n)                Newx(p, n, t)
  t* p = calloc(n, s)             Newxz(p, n, t)
  p = realloc(p, n)               Renew(p, n, t)
  memcpy(dst, src, n)             Copy(src, dst, n, t)
  memmove(dst, src, n)            Move(src, dst, n, t)
  memcpy(dst, src, sizeof(t))     StructCopy(src, dst, t)
  memset(dst, 0, n * sizeof(t))   Zero(dst, n, t)
  memzero(dst, 0)                 Zero(dst, n, char)
  free(p)                         Safefree(p)

  strdup(p)                       savepv(p)
  strndup(p, n)                   savepvn(p, n) (Hey, strndup doesn't
                                  exist!)

  strstr(big, little)             instr(big, little)
  memmem(big, blen, little, len)  ninstr(big, bigend, little, little_end)
  strcmp(s1, s2)                  strLE(s1, s2) / strEQ(s1, s2)
                                  / strGT(s1, s2)
  strncmp(s1, s2, n)              strnNE(s1, s2, n) / strnEQ(s1, s2, n)

  memcmp(p1, p2, n)               memNE(p1, p2, n)
  !memcmp(p1, p2, n)              memEQ(p1, p2, n)

Notice that C<Copy> and C<Move> take their arguments in a different
order than C<memcpy> and C<memmove>.
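
The reversed argument order is easy to get wrong, so here is a minimal
standalone sketch of it. The C<DEMO_COPY> and C<DEMO_MOVE> macros are
hypothetical stand-ins written for this illustration; they are not Perl's
definitions. They mimic only the calling convention shown in the table:
source first, then destination, then an element count and a type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins, NOT Perl's definitions: the source comes
 * first, and n counts elements of type t rather than bytes. */
#define DEMO_COPY(src, dst, n, t) \
    ((void)memcpy((dst), (src), (size_t)(n) * sizeof(t)))
#define DEMO_MOVE(src, dst, n, t) \
    ((void)memmove((dst), (src), (size_t)(n) * sizeof(t)))

/* Copy n ints with the libc convention: destination first, byte count. */
static void copy_ints_libc(int *dst, const int *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n * sizeof(int));
}

/* The same copy with the Perl-style convention: source, destination,
 * element count, element type. */
static void copy_ints_perl_style(int *dst, const int *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    DEMO_COPY(src, dst, n, int);
}

/* Shift the first n-1 elements up by one slot. The regions overlap,
 * so the Move-style (memmove) variant is the one to use. */
static void shift_up_by_one(int *buf, size_t n)
{
    assert(buf != NULL && n >= 1);
    DEMO_MOVE(buf, buf + 1, n - 1, int);
}
```

Both copy helpers produce the same result; only the position of the
arguments and the unit of the count differ.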
Most of the time, though, you'll want to be dealing with SVs internally
instead of raw C strings:

  strlen(s)                   sv_len(sv)
  strcpy(dt, src)             sv_setpv(sv, s)
  strncpy(dt, src, n)         sv_setpvn(sv, s, n)
  strcat(dt, src)             sv_catpv(sv, s)
  strncat(dt, src, n)         sv_catpvn(sv, s, n)
  sprintf(s, fmt, ...)        sv_setpvf(sv, fmt, ...)

If you do need raw strings, some platforms have safer interfaces, and
Perl makes sure a version of each is available on all platforms:

  strlcat(dt, src, sizeof(dt))  my_strlcat(dt, src, sizeof(dt))
  strlcpy(dt, src, sizeof(dt))  my_strlcpy(dt, src, sizeof(dt))
  strnlen(s, maxlen)            my_strnlen(s, maxlen)

Note also the existence of C<sv_catpvf> and C<sv_vcatpvfn>, combining
concatenation with formatting.
Sometimes instead of zeroing the allocated heap by using Newxz() you
should consider "poisoning" the data. This means writing a bit
pattern into it that should be illegal as pointers (and floating point
numbers), and also hopefully surprising enough as integers, so that
any code attempting to use the data without forethought will break
sooner rather than later. Poisoning can be done using the Poison()
macros, which have similar arguments to Zero():

  PoisonWith(dst, n, t, b)    scribble memory with byte b
  PoisonNew(dst, n, t)        equal to PoisonWith(dst, n, t, 0xAB)
  PoisonFree(dst, n, t)       equal to PoisonWith(dst, n, t, 0xEF)
  Poison(dst, n, t)           equal to PoisonFree(dst, n, t)
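
To make the idea concrete, here is a minimal standalone sketch of
poisoning. The C<DEMO_POISON_*> macros and the helper functions are
hypothetical names invented for this illustration; they are not Perl's
definitions. They assume only what the table above states: the arguments
are a destination, an element count and a type, and the fill bytes are
0xAB for new memory and 0xEF for freed memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PoisonWith(dst, n, t, b): fill n elements
 * of type t with the byte b. NOT Perl's definition. */
#define DEMO_POISON_WITH(dst, n, t, b) \
    ((void)memset((dst), (b), (size_t)(n) * sizeof(t)))
#define DEMO_POISON_NEW(dst, n, t)   DEMO_POISON_WITH(dst, n, t, 0xAB)
#define DEMO_POISON_FREE(dst, n, t)  DEMO_POISON_WITH(dst, n, t, 0xEF)

/* Return 1 if every byte of the buffer equals b, else 0. */
static int all_bytes_are(const void *buf, size_t nbytes, unsigned char b)
{
    const unsigned char *p = (const unsigned char *)buf;
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < nbytes; i++) {
        if (p[i] != b)
            return 0;
    }
    return 1;
}

/* Mark n longs as "allocated but not yet initialized". */
static void poison_new_longs(long *arr, size_t n)
{
    assert(arr != NULL);
    DEMO_POISON_NEW(arr, n, long);
}

/* Mark n longs as "about to be freed". */
static void poison_freed_longs(long *arr, size_t n)
{
    assert(arr != NULL);
    DEMO_POISON_FREE(arr, n, long);
}
```

Unlike a zeroed buffer, a poisoned one never looks like a valid empty
value, so code that reads it before initializing it tends to fail
visibly.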
=head2 Character Class Tests

There are several types of character class tests that Perl implements.
All are more fully described in L<perlapi/Character classification> and
L<perlapi/Character case changing>.

The C library routines listed in the table below return values based on
the current locale. Use the entries in the final column for that
functionality. The other two columns always assume a POSIX (or C)
locale. The entries in the ASCII column are only meaningful for ASCII
inputs, returning FALSE for anything else. Use these only when you
B<know> that is what you want. The entries in the Latin1 column assume
that the non-ASCII 8-bit characters are as Unicode defines them, the
same as ISO-8859-1, often called Latin 1.

  Instead Of:  Use for ASCII:     Use for Latin1:       Use for locale:

  isalnum(c)   isALPHANUMERIC(c)  isALPHANUMERIC_L1(c)  isALPHANUMERIC_LC(c)
  isalpha(c)   isALPHA(c)         isALPHA_L1(c)         isALPHA_LC(c)
  isascii(c)   isASCII(c)                               isASCII_LC(c)
  isblank(c)   isBLANK(c)         isBLANK_L1(c)         isBLANK_LC(c)
  iscntrl(c)   isCNTRL(c)         isCNTRL_L1(c)         isCNTRL_LC(c)
  isdigit(c)   isDIGIT(c)         isDIGIT_L1(c)         isDIGIT_LC(c)
  isgraph(c)   isGRAPH(c)         isGRAPH_L1(c)         isGRAPH_LC(c)
  islower(c)   isLOWER(c)         isLOWER_L1(c)         isLOWER_LC(c)
  isprint(c)   isPRINT(c)         isPRINT_L1(c)         isPRINT_LC(c)
  ispunct(c)   isPUNCT(c)         isPUNCT_L1(c)         isPUNCT_LC(c)
  isspace(c)   isSPACE(c)         isSPACE_L1(c)         isSPACE_LC(c)
  isupper(c)   isUPPER(c)         isUPPER_L1(c)         isUPPER_LC(c)
  isxdigit(c)  isXDIGIT(c)        isXDIGIT_L1(c)        isXDIGIT_LC(c)

  tolower(c)   toLOWER(c)         toLOWER_L1(c)
  toupper(c)   toUPPER(c)
For the corresponding functions like C<iswupper()>, I<etc.>, use
C<isUPPER_uvchr()> for non-locale; or C<isUPPER_LC_uvchr()> for locale.
And use C<toLOWER_uvchr()> instead of C<towlower()>, I<etc.>. There are
no direct equivalents for locale; best to put the string into an SV.

Don't use any of the functions like C<isalnum_l()>. Those are
non-portable, and interfere with Perl's internal handling.

To emphasize that you are operating only on ASCII characters, you can
append C<_A> to each of the macros in the ASCII column: C<isALPHA_A>,
C<isDIGIT_A>, and so on.

(There is no entry in the Latin1 column for C<isascii> even though there
is an C<isASCII_L1>, which is identical to C<isASCII>; the
latter name is clearer. There is no entry in the Latin1 column for
C<toupper> because the result can be non-Latin1. You have to use
C<toUPPER_uvchr>, as described in L<perlapi/Character case changing>.)

Note that the libc caseless comparisons are crippled; Unicode
provides a richer set, using the concept of folding. If you need
more than equality/non-equality, it's probably best to store your
strings in an SV and use SV functions to do the comparison. Similarly
for collation.
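
The difference between the ASCII and Latin1 columns can be pictured with
a small standalone sketch. The two functions below are hypothetical
illustrations, not Perl's macros. They assume an ASCII-based platform
(code points are written in hex) and take the set of Latin1 alphabetic
characters from Unicode's definition of the range U+0080 to U+00FF.

```c
#include <assert.h>

/* ASCII-only alphabetic test: false for every code point above 0x7F,
 * matching the "returning FALSE for anything else" rule in the text. */
static int demo_is_alpha_ascii(unsigned int c)
{
    return (c >= 0x41 && c <= 0x5A)      /* A-Z */
        || (c >= 0x61 && c <= 0x7A);     /* a-z */
}

/* Latin1 alphabetic test: the ASCII letters plus the letters Unicode
 * defines in U+0080..U+00FF. Anything above 0xFF is not Latin1. */
static int demo_is_alpha_l1(unsigned int c)
{
    if (c > 0xFF)
        return 0;
    if (demo_is_alpha_ascii(c))
        return 1;
    if (c == 0xAA || c == 0xB5 || c == 0xBA) /* ordinal signs, micro */
        return 1;
    /* U+00C0..U+00FF are letters, except the multiplication sign
     * (U+00D7) and the division sign (U+00F7). */
    return c >= 0xC0 && c != 0xD7 && c != 0xF7;
}

/* Usage: U+00E9 (e with acute accent) is alphabetic under the Latin1
 * rules but not under the ASCII-only rules. */
static void demo_check_classes(void)
{
    assert(demo_is_alpha_ascii(0x41));
    assert(!demo_is_alpha_ascii(0xE9));
    assert(demo_is_alpha_l1(0xE9));
    assert(!demo_is_alpha_l1(0xD7));
}
```

Neither function consults the locale, which is the property the first
two columns of the table share.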
=head2 F<stdlib.h> functions

  Instead Of:                 Use:

  atof(s)                     my_atof(s) or Atof(s)
  atoi(s)                     grok_atoUV(s, &uv, &e)
  atol(s)                     grok_atoUV(s, &uv, &e)
  strtod(s, &p)               Strtod(s, &p)
  strtol(s, &p, b)            Strtol(s, &p, b)
  strtoul(s, &p, b)           Strtoul(s, &p, b)

But note that these are subject to locale; see L</Dealing with locales>.

Typical use is to do range checks on C<uv> before casting:

  int i; UV uv;
  const char* end_ptr = input_end;
  if (grok_atoUV(input, &uv, &end_ptr)
      && uv <= INT_MAX)
  {
    i = (int)uv;
    ... /* continue parsing from end_ptr */
  } else {
    ... /* parse error: not a decimal integer in range 0 .. INT_MAX */
  }

Notice also the C<grok_bin>, C<grok_hex>, and C<grok_oct> functions in
F<numeric.c> for converting strings representing numbers in the respective
bases into C<NV>s. Note that grok_atoUV() doesn't handle negative inputs,
or leading whitespace (being purposefully strict).
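
The "parse strictly, range-check, then cast" pattern can be tried outside
Perl with a standalone sketch. C<demo_strict_atouv> and C<demo_parse_int>
are hypothetical functions written for this illustration; they are not
Perl's implementation of grok_atoUV(), which may be stricter in other
ways. The sketch assumes only the behavior stated above (no sign, no
leading whitespace), adds overflow rejection, and uses C<unsigned long>
where Perl would use a C<UV>.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical strict parser: accepts one or more ASCII decimal digits
 * at the very start of s. Rejects a sign, leading whitespace, an empty
 * string, and values that overflow unsigned long. On success stores
 * the value in *out, points *end at the first unparsed character, and
 * returns 1. On failure returns 0 and leaves *out and *end untouched. */
static int demo_strict_atouv(const char *s, unsigned long *out,
                             const char **end)
{
    unsigned long val = 0;
    const char *p = s;

    assert(s != NULL && out != NULL && end != NULL);
    if (*p < '0' || *p > '9')
        return 0;
    while (*p >= '0' && *p <= '9') {
        unsigned long digit = (unsigned long)(*p - '0');
        if (val > (ULONG_MAX - digit) / 10)
            return 0;               /* val * 10 + digit would overflow */
        val = val * 10 + digit;
        p++;
    }
    *out = val;
    *end = p;
    return 1;
}

/* Range-check before narrowing to int. Returns 1 and fills in *i and
 * *end on success; returns 0 and leaves both untouched otherwise. */
static int demo_parse_int(const char *s, int *i, const char **end)
{
    unsigned long uv;
    const char *stop;

    assert(i != NULL && end != NULL);
    if (demo_strict_atouv(s, &uv, &stop) && uv <= (unsigned long)INT_MAX) {
        *i = (int)uv;
        *end = stop;
        return 1;
    }
    return 0;
}
```

Because the parser refuses anything that is not a plain digit string,
the caller decides explicitly how to treat signs and whitespace instead
of inheriting the permissive behavior of C<atoi>.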
=head2 Miscellaneous functions

You should not even B<want> to use F<setjmp.h> functions, but if you
think you do, use the C<JMPENV> stack in F<scope.h> instead.

 ~asctime()                 Perl_sv_strftime_tm()
 ~asctime_r()               Perl_sv_strftime_tm()
  chsize()                  my_chsize()
 ~ctime()                   Perl_sv_strftime_tm()
 ~ctime_r()                 Perl_sv_strftime_tm()
 ~cuserid()                 DO NOT USE; see its man page
  dirfd()                   my_dirfd()
  duplocale()               Perl_setlocale()
 ~ecvt()                    my_snprintf()
 ~endgrent_r()              endgrent()
 ~endhostent_r()            endhostent()
 ~endnetent_r()             endnetent()
 ~endprotoent_r()           endprotoent()
 ~endpwent_r()              endpwent()
 ~endservent_r()            endservent()
 ~endutent()                endutxent()
  exit(n)                   my_exit(n)
 ~fcvt()                    my_snprintf()
  freelocale()              Perl_setlocale()
 ~ftw()                     nftw()
  getenv(s)                 PerlEnv_getenv(s)
 ~gethostbyaddr()           getnameinfo()
 ~gethostbyname()           getaddrinfo()
 ~getpass()                 DO NOT USE; see its man page
 ~getpw()                   getpwuid()
 ~getutent()                getutxent()
 ~getutid()                 getutxid()
 ~getutline()               getutxline()
 ~gsignal()                 DO NOT USE; see its man page
  localeconv()              Perl_localeconv()
  mblen()                   mbrlen()
  mbtowc()                  mbrtowc()
  newlocale()               Perl_setlocale()
  pclose()                  my_pclose()
  popen()                   my_popen()
 ~pututline()               pututxline()
 ~qecvt()                   my_snprintf()
 ~qfcvt()                   my_snprintf()
  querylocale()             Perl_setlocale()
  int rand()                double Drand01()
  srand(n)                  { seedDrand01((Rand_seed_t)n);
                              PL_srand_called = TRUE; }
 ~readdir_r()               readdir()
  realloc()                 saferealloc(), Renew() or Renewc()
 ~re_comp()                 regcomp()
 ~re_exec()                 regexec()
 ~rexec()                   rcmd()
 ~rexec_af()                rcmd()
  setenv(s, val)            my_setenv(s, val)
 ~setgrent_r()              setgrent()
 ~sethostent_r()            sethostent()
  setlocale()               Perl_setlocale()
  setlocale_r()             Perl_setlocale()
 ~setnetent_r()             setnetent()
 ~setprotoent_r()           setprotoent()
 ~setpwent_r()              setpwent()
 ~setservent_r()            setservent()
 ~setutent()                setutxent()
  sigaction()               rsignal(signo, handler)
 ~siginterrupt()            rsignal() with the SA_RESTART flag instead
  signal(signo, handler)    rsignal(signo, handler)
 ~ssignal()                 DO NOT USE; see its man page
  strcasecmp()              a Perl foldEQ-family function
  strerror()                sv_string_from_errnum()
  strerror_l()              sv_string_from_errnum()
  strerror_r()              sv_string_from_errnum()
  strftime()                Perl_sv_strftime_tm()
  strtod()                  my_strtod() or Strtod()
  system(s)                 Don't. Look at pp_system or use my_popen.
 ~tempnam()                 mkstemp() or tmpfile()
 ~tmpnam()                  mkstemp() or tmpfile()
  tmpnam_r()                mkstemp() or tmpfile()
  uselocale()               Perl_setlocale()
  vsnprintf()               my_vsnprintf()
  wctob()                   wcrtomb()
  wctomb()                  wcrtomb()
  wsetlocale()              Perl_setlocale()
The Perl-furnished alternatives are documented in L<perlapi>, which you
should peruse anyway to see everything that is available to you.

The lists are incomplete. Before using an unlisted function, consider
whether it seems likely to interfere with Perl.
=head1 Dealing with locales

Like it or not, your code will be executed in the context of a locale,
as are all C language programs. See L<perllocale>. Most libc calls are
not affected by the locale, but a surprising number are:

 addmntent()           getspent_r()        sethostent()
 alphasort()           getspnam()          sethostent_r()
 asctime()             getspnam_r()        setnetent()
 asctime_r()           getwc()             setnetent_r()
 asprintf()            getwchar()          setnetgrent()
 atof()                glob()              setprotoent()
 atoi()                gmtime()            setprotoent_r()
 atol()                gmtime_r()          setpwent()
 atoll()               grantpt()           setpwent_r()
 btowc()               iconv_open()        setrpcent()
 catopen()             inet_addr()         setservent()
 ctime()               inet_aton()         setservent_r()
 ctime_r()             inet_network()      setspent()
 cuserid()             inet_ntoa()         sgetspent_r()
 daylight              inet_ntop()         shm_open()
 dirname()             inet_pton()         shm_unlink()
 dprintf()             initgroups()        snprintf()
 endaliasent()         innetgr()           sprintf()
 endgrent()            iruserok()          sscanf()
 endgrent_r()          iruserok_af()       strcasecmp()
 endhostent()          isalnum()           strcasestr()
 endhostent_r()        isalnum_l()         strcoll()
 endnetent()           isalpha()           strerror()
 endnetent_r()         isalpha_l()         strerror_l()
 endprotoent()         isascii()           strerror_r()
 endprotoent_r()       isascii_l()         strfmon()
 endpwent()            isblank()           strfmon_l()
 endpwent_r()          isblank_l()         strfromd()
 endrpcent()           iscntrl()           strfromf()
 endservent()          iscntrl_l()         strfroml()
 endservent_r()        isdigit()           strftime()
 endspent()            isdigit_l()         strftime_l()
 err()                 isgraph()           strncasecmp()
 error()               isgraph_l()         strptime()
 error_at_line()       islower()           strsignal()
 errx()                islower_l()         strtod()
 fgetwc()              isprint()           strtof()
 fgetwc_unlocked()     isprint_l()         strtoimax()
 fgetws()              ispunct()           strtol()
 fgetws_unlocked()     ispunct_l()         strtold()
 fnmatch()             isspace()           strtoll()
 forkpty()             isspace_l()         strtoq()
 fprintf()             isupper()           strtoul()
 fputwc()              isupper_l()         strtoull()
 fputwc_unlocked()     iswalnum()          strtoumax()
 fputws()              iswalnum_l()        strtouq()
 fputws_unlocked()     iswalpha()          strverscmp()
 fscanf()              iswalpha_l()        strxfrm()
 fwprintf()            iswblank()          swprintf()
 fwscanf()             iswblank_l()        swscanf()
 getaddrinfo()         iswcntrl()          syslog()
 getaliasbyname_r()    iswcntrl_l()        timegm()
 getaliasent_r()       iswdigit()          timelocal()
 getdate()             iswdigit_l()        timezone
 getdate_r()           iswgraph()          tolower()
 getfsent()            iswgraph_l()        tolower_l()
 getfsfile()           iswlower()          toupper()
 getfsspec()           iswlower_l()        toupper_l()
 getgrent()            iswprint()          towctrans()
 getgrent_r()          iswprint_l()        towlower()
 getgrgid()            iswpunct()          towlower_l()
 getgrgid_r()          iswpunct_l()        towupper()
 getgrnam()            iswspace()          towupper_l()
 getgrnam_r()          iswspace_l()        tzname
 getgrouplist()        iswupper()          tzset()
 gethostbyaddr()       iswupper_l()        ungetwc()
 gethostbyaddr_r()     iswxdigit()         vasprintf()
 gethostbyname()       iswxdigit_l()       vdprintf()
 gethostbyname2()      isxdigit()          verr()
 gethostbyname2_r()    isxdigit_l()        verrx()
 gethostbyname_r()     localeconv()        versionsort()
 gethostent()          localtime()         vfprintf()
 gethostent_r()        localtime_r()       vfscanf()
 gethostid()           MB_CUR_MAX          vfwprintf()
 getlogin()            mblen()             vprintf()
 getlogin_r()          mbrlen()            vscanf()
 getmntent()           mbrtowc()           vsnprintf()
 getmntent_r()         mbsinit()           vsprintf()
 getnameinfo()         mbsnrtowcs()        vsscanf()
 getnetbyaddr()        mbsrtowcs()         vswprintf()
 getnetbyaddr_r()      mbstowcs()          vsyslog()
 getnetbyname()        mbtowc()            vwarn()
 getnetbyname_r()      mktime()            vwarnx()
 getnetent()           nan()               vwprintf()
 getnetent_r()         nanf()              warn()
 getnetgrent()         nanl()              warnx()
 getnetgrent_r()       nl_langinfo()       wcrtomb()
 getprotobyname()      openpty()           wcscasecmp()
 getprotobyname_r()    printf()            wcschr()
 getprotobynumber()    psiginfo()          wcscoll()
 getprotobynumber_r()  psignal()           wcsftime()
 getprotoent()         putpwent()          wcsncasecmp()
 getprotoent_r()       putspent()          wcsnrtombs()
 getpw()               putwc()             wcsrchr()
 getpwent()            putwchar()          wcsrtombs()
 getpwent_r()          regcomp()           wcstod()
 getpwnam()            regexec()           wcstof()
 getpwnam_r()          res_nclose()        wcstoimax()
 getpwuid()            res_ninit()         wcstold()
 getpwuid_r()          res_nquery()        wcstombs()
 getrpcbyname_r()      res_nquerydomain()  wcstoumax()
 getrpcbynumber_r()    res_nsearch()       wcswidth()
 getrpcent_r()         res_nsend()         wcsxfrm()
 getrpcport()          rpmatch()           wctob()
 getservbyname()       ruserok()           wctomb()
 getservbyname_r()     ruserok_af()        wctrans()
 getservbyport()       scandir()           wctype()
 getservbyport_r()     scanf()             wcwidth()
 getservent()          setaliasent()       wordexp()
 getservent_r()        setgrent()          wprintf()
 getspent()            setgrent_r()        wscanf()
(The list doesn't include functions that manipulate the locale, such as
C<setlocale()>.)

If any of these functions are called directly or indirectly from your
code, you are affected by the current locale.
The first thing to know about this list is that there are better
alternatives to many of the functions, which it's highly likely that you
should be using instead. See L