=head1 NAME

perlreapi - Perl regular expression plugin interface

=head1 DESCRIPTION

As of Perl 5.9.5 there is a new interface for plugging and using
regular expression engines other than the default one.

Each engine is supposed to provide access to a constant structure of the
following format:

    typedef struct regexp_engine {
        REGEXP* (*comp) (pTHX_
                         const SV * const pattern, const U32 flags);
        I32     (*exec) (pTHX_
                         REGEXP * const rx,
                         char* stringarg,
                         char* strend, char* strbeg,
                         I32 minend, SV* screamer,
                         void* data, U32 flags);
        char*   (*intuit) (pTHX_
                           REGEXP * const rx, SV *sv,
                           char *strpos, char *strend, U32 flags,
                           struct re_scream_pos_data_s *data);
        SV*     (*checkstr) (pTHX_ REGEXP * const rx);
        void    (*free) (pTHX_ REGEXP * const rx);
        void    (*numbered_buff_FETCH) (pTHX_
                                        REGEXP * const rx,
                                        const I32 paren,
                                        SV * const sv);
        void    (*numbered_buff_STORE) (pTHX_
                                        REGEXP * const rx,
                                        const I32 paren,
                                        SV const * const value);
        I32     (*numbered_buff_LENGTH) (pTHX_
                                         REGEXP * const rx,
                                         const SV * const sv,
                                         const I32 paren);
        SV*     (*named_buff) (pTHX_
                               REGEXP * const rx,
                               SV * const key,
                               SV * const value,
                               U32 flags);
        SV*     (*named_buff_iter) (pTHX_
                                    REGEXP * const rx,
                                    const SV * const lastkey,
                                    const U32 flags);
        SV*     (*qr_package)(pTHX_ REGEXP * const rx);
    #ifdef USE_ITHREADS
        void*   (*dupe) (pTHX_ REGEXP * const rx, CLONE_PARAMS *param);
    #endif
        REGEXP* (*op_comp) (...);
    } regexp_engine;


When a regexp is compiled, its C<engine> field is then set to point at
the appropriate structure, so that when it needs to be used Perl can find
the right routines to do so.

In order to install a new regexp handler, C<$^H{regcomp}> is set
to an integer which (when cast appropriately) resolves to one of these
structures.  When compiling, the C<comp> method is executed, and the
resulting C<regexp> structure's C<engine> field is expected to point back at
the same structure.
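
The handshake described above can be sketched in plain C.  This is a
simplified model, not Perl's implementation: C<demo_regexp>,
C<demo_engine>, C<literal_engine>, C<demo_engine_handle> and
C<demo_match> are hypothetical names invented for illustration, the
callbacks take far fewer arguments than the real ones, and C<uintptr_t>
stands in for whatever integer type Perl stores in C<$^H{regcomp}>.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for REGEXP and regexp_engine. */
struct demo_engine;

typedef struct demo_regexp {
    const struct demo_engine *engine;   /* back-pointer to the compiling engine */
    const char *pattern;                /* "compiled" form: just the literal    */
} demo_regexp;

typedef struct demo_engine {
    void (*comp)(demo_regexp *rx, const char *pattern);
    int  (*exec)(const demo_regexp *rx, const char *subject);
} demo_engine;

static void literal_comp(demo_regexp *rx, const char *pattern);
static int  literal_exec(const demo_regexp *rx, const char *subject);

/* The constant structure an engine provides access to. */
static const demo_engine literal_engine = { literal_comp, literal_exec };

/* comp must leave the engine field pointing back at its own structure. */
static void literal_comp(demo_regexp *rx, const char *pattern)
{
    rx->pattern = pattern;
    rx->engine  = &literal_engine;
}

/* A toy matcher: succeeds if the pattern occurs literally in the subject. */
static int literal_exec(const demo_regexp *rx, const char *subject)
{
    return strstr(subject, rx->pattern) != NULL;
}

/* The integer an engine would publish (assumption: uintptr_t as the type). */
uintptr_t demo_engine_handle(void)
{
    return (uintptr_t)&literal_engine;
}

/* What the caller does: cast the integer back, compile, then dispatch. */
int demo_match(uintptr_t handle, const char *pattern, const char *subject)
{
    const demo_engine *eng = (const demo_engine *)handle;
    demo_regexp rx;

    eng->comp(&rx, pattern);
    assert(rx.engine == eng);           /* engine field points back at eng */
    return rx.engine->exec(&rx, subject);
}
```

Calling C<demo_match(demo_engine_handle(), "eek", "ook eek")> walks the
whole path: the integer is cast back to a structure pointer, C<comp>
runs, and the later C<exec> call is found through the compiled object's
own C<engine> field rather than through the handle.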

The C<pTHX_> symbol in the definition is a macro used by Perl under threading
to provide an extra argument to the routine holding a pointer back to
the interpreter that is executing the regexp.  So under threading all
routines get an extra argument.
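
The effect of such a macro can be sketched in a few lines of
self-contained C.  Everything here is a hypothetical miniature:
C<DEMO_pTHX_>, C<DEMO_aTHX_>, C<DEMO_THREADS>, C<demo_interp> and the
two functions are invented for illustration and only mimic the pattern;
the real macros are defined in Perl's own headers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: pretend we are building "under threading". */
#define DEMO_THREADS 1

/* Hypothetical stand-in for the interpreter structure. */
typedef struct demo_interp {
    int match_count;            /* per-interpreter state touched by the engine */
} demo_interp;

#ifdef DEMO_THREADS
   /* Threaded build: every routine receives the interpreter explicitly. */
#  define DEMO_pTHX_  demo_interp *my_interp,   /* in parameter lists */
#  define DEMO_aTHX_  my_interp,                /* in argument lists  */
#else
   /* Unthreaded build: the macros vanish and a single global is used. */
static demo_interp demo_global_interp;
#  define my_interp   (&demo_global_interp)
#  define DEMO_pTHX_
#  define DEMO_aTHX_
#endif

/* Returns 1 if `wanted` occurs in `subject`, counting successes per interpreter. */
int demo_exec(DEMO_pTHX_ const char *subject, char wanted)
{
    assert(my_interp != NULL);
    for (; *subject != '\0'; subject++) {
        if (*subject == wanted) {
            my_interp->match_count++;
            return 1;
        }
    }
    return 0;
}

/* A caller simply forwards the hidden argument with the matching macro. */
int demo_exec_twice(DEMO_pTHX_ const char *subject, char wanted)
{
    return demo_exec(DEMO_aTHX_ subject, wanted)
         + demo_exec(DEMO_aTHX_ subject, wanted);
}
```

With C<DEMO_THREADS> defined, C<demo_exec> takes three arguments; with
it removed, the same source compiles to a two-argument function.  That
is why the callback signatures in this document are written with the
macro rather than with an explicit parameter.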

=head1 Callbacks

=head2 comp

    REGEXP* comp(pTHX_ const SV * const pattern, const U32 flags);

Compile the pattern stored in C<pattern> using the given C<flags> and
return a pointer to a prepared C<REGEXP> structure that can perform
the match.  See L</The REGEXP structure> below for an explanation of
the individual fields in the REGEXP struct.

The C<pattern> parameter is the scalar that was used as the
pattern.  Previous versions of Perl would pass two C<char*> pointers
indicating the start and end of the stringified pattern; the following
snippet can be used to get the old parameters:

    STRLEN plen;
    char*  exp = SvPV(pattern, plen);
    char* xend = exp + plen;

Since any scalar can be passed as a pattern, it's possible to implement
an engine that does something with an array (C<< "ook" =~ [ qw/ eek
hlagh / ] >>) or with the non-stringified form of a compiled regular
expression (C<< "ook" =~ qr/eek/ >>). Perl's own engine will always
stringify everything using the snippet above, but that doesn't mean
other engines have to.

The C<flags> parameter is a bitfield which indicates which of the
pattern modifier flags the regex was compiled with.  It also contains
additional info, such as whether C<use locale> is in effect.

=item C</p> - RXf_PMf_KEEPCOPY

TODO: Document this

=item Character set

The character set semantics are determined by an enum that is contained
in this field.  This is still experimental and subject to change, but
the current interface returns the rules by use of the in-line function
C<get_regex_charset(const U32 flags)>.  The only currently documented
value returned from it is REGEX_LOCALE_CHARSET, which is set if
C<use locale> is in effect.