<= 0 : Preformat entire document
1 : one line triggers
>= 2 : two lines trigger
(default: 2)
=item --endpreformat_trigger_lines I | --preend I | -pe I
How many lines of unpreformatted-looking text are needed to switch back out of PRE (preformatted) mode.
<= 0 : Never preformat within document
1 : one line triggers
>= 2 : two lines trigger
(default: 2)
NOTE for --prebegin and --preend:
A zero takes precedence. If one is zero, the other is ignored.
If both are zero, the entire document is preformatted.
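Read literally, the NOTE above is a small decision table. The sketch below encodes it. `preformat_mode` and its 'all'/'never'/'auto' return values are hypothetical names chosen for illustration; they are not part of HTML::TextToHTML.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::TextToHTML): summarise how the
# --prebegin and --preend values interact, following the NOTE literally.
#   'all'   - the entire document is preformatted
#   'never' - never preformat within the document
#   'auto'  - switch in and out of PRE by counting trigger lines
sub preformat_mode {
    my ($prebegin, $preend) = @_;
    return 'all'   if $prebegin <= 0;   # a zero --prebegin wins, even if --preend is also 0
    return 'never' if $preend   <= 0;   # otherwise a zero --preend wins
    return 'auto';
}
```

So `--prebegin 0 --preend 0` preformats everything, while `--prebegin 2 --preend 0` never preformats.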
=item --preformat_start_marker I
What flags the start of a preformatted section if --use_preformat_marker
is true.
(default: "^(:?(:?&lt;)|<)PRE(:?(:?&gt;)|>)\$")
=item --preformat_end_marker I
What flags the end of a preformatted section if --use_preformat_marker
is true.
(default: "^(:?(:?&lt;)|<)/PRE(:?(:?&gt;)|>)\$")
=item --preformat_whitespace_min I | --prewhite I | -p I
Minimum number of consecutive whitespace characters to trigger
normal preformatting.
NOTE: Tabs are expanded to spaces before this check is made.
That means that if --tab_width is 8 and this is 5, then a single tab may be
expanded to as many as 8 spaces, which is enough to trigger preformatting.
(default: 5)
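To make the tab-expansion remark concrete, here is a minimal sketch of the whitespace check. It assumes the trigger is simply "a run of at least N spaces somewhere in the tab-expanded line"; the module's real test may be more involved. `triggers_preformat` is a hypothetical name. `Text::Tabs` is a core Perl module whose `expand` function honours `$Text::Tabs::tabstop`.

```perl
use strict;
use warnings;
use Text::Tabs ();    # core module: expand() and the $tabstop setting

# Hypothetical helper, not the module's real code: does $line contain a run
# of at least $min consecutive spaces once tabs are expanded to $tab_width?
sub triggers_preformat {
    my ($line, $tab_width, $min) = @_;
    local $Text::Tabs::tabstop = $tab_width;    # expand() reads this global
    my ($expanded) = Text::Tabs::expand($line);
    return index($expanded, ' ' x $min) >= 0 ? 1 : 0;
}
```

With the defaults (tab width 8, minimum 5), a single leading tab is enough, as the NOTE says. With a tab width of 4, the tab in `"a\tb"` only becomes 3 spaces and does not trigger.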
=item --prepend_file I | --prepend_body I | --pp I
If you want something prepended to the processed body text, put the
filename here. The prepended text will not be processed at all, so make
sure it's plain text or decent HTML.
(default: nothing)
=item --preserve_indent | -pi
Preserve the first-line indentation of paragraphs marked with indents
by replacing the spaces of the first line with non-breaking spaces.
(default: false)
=item --short_line_length I | --shortline I | -s I
Lines this short (or shorter) are assumed to have been broken intentionally,
and are kept that short.
(default: 40)
=item --style_url I
This gives the URL of a stylesheet; a LINK tag will be added to the
output.
=item --tab_width I | --tabwidth I | -tw I
How many spaces equal a tab?
(default: 8)
=item --table_type I=0/1
This determines which types of tables will be recognised when "make_tables"
is true. The possible types are ALIGN, PGSQL, BORDER and DELIM. For example:
--table_type ALIGN=1 --table_type BORDER=0
(default: all types are true)
=item --title I | -t I
You can specify a title. Otherwise it will use a blank one.
(default: nothing)
=item --titlefirst | -tf
Use the first non-blank line as the title.
=item --underline_delimiter I
This defines what character (or string) is taken to be the delimiter of
text which is to be interpreted as underlined (that is, to be given a
U tag). If this is empty, then no underlining of text will be done.
This is NOT the same as the following "underline" options, which are
about underlining of "header" sections.
(default: _)
=item --underline_length_tolerance I | --ulength I | -ul I
How much longer or shorter can header underlines be and still be header
underlines?
(default: 1)
=item --underline_offset_tolerance I | --uoffset I | -uo I
How far offset can header underlines be and still be header underlines?
(default: 1)
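The two tolerance options can be pictured as a comparison between a header line and the line beneath it. The following is a rough sketch under stated assumptions: `is_header_underline` is hypothetical, an underline is assumed to be one repeated character from the set used by the Mosaic heading styles (* = + - ~ .), and lengths and offsets are counted in characters after tab expansion.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's real code: is $under a plausible
# header underline for the line $text, given the two tolerances?
# Assumes an underline is a run of one repeated character from * = + - ~ .
sub is_header_underline {
    my ($text, $under, $len_tol, $off_tol) = @_;
    # $1 = leading spaces, $2 = the run of one repeated character
    return 0 unless $under =~ /^( *)(([-*=+~.])\3*) *$/;
    my ($under_offset, $under_len) = (length $1, length $2);
    # $1 = leading spaces, $2 = the header text without trailing spaces
    return 0 unless $text =~ /^( *)(\S.*?) *$/;
    my ($text_offset, $text_len) = (length $1, length $2);
    return 0 if abs($under_len - $text_len) > $len_tol;          # --ulength
    return 0 if abs($under_offset - $text_offset) > $off_tol;    # --uoffset
    return 1;
}
```

With both tolerances at their default of 1, "------" still underlines "Title", but "-------" (two characters too long) or an underline indented by two spaces does not.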
=item --unhyphenation | --unhyphenate | -u
Enables unhyphenation of text.
(default: true)
=item --use_mosaic_header | --mosaic | -mh
Use this option if you want to force the heading styles to match what Mosaic
outputs. (Underlined with "***" is H1,
with "===" is H2, with "+++" is H3, with "---" is H4, with "~~~" is H5
and with "..." is H6.)
This was the behavior of txt2html up to version 1.10.
(default: false)
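The Mosaic mapping above is essentially a lookup table. A hypothetical helper that turns an underline into a heading tag might look like this. It only mirrors the documented mapping and is not taken from the module.

```perl
use strict;
use warnings;

# Underline character => heading level, transcribed from the
# --use_mosaic_header description.
my %mosaic_level = (
    '*' => 1,
    '=' => 2,
    '+' => 3,
    '-' => 4,
    '~' => 5,
    '.' => 6,
);

# Hypothetical helper: map an underline line such as "===" to a tag name
# such as "H2". Only the first non-space character is inspected; returns
# undef if it is not one of the six characters above.
sub mosaic_heading_tag {
    my ($underline) = @_;
    my ($char) = $underline =~ /^\s*(\S)/ or return;
    my $level = $mosaic_level{$char} or return;
    return "H$level";
}
```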
=item --use_preformat_marker | --preformat_marker | -pm
Turn on preformatting when encountering "<PRE>" on a line by itself, and turn
it off when there's a line containing only "</PRE>".
When such preformatted text is detected, the PRE tag will be given the
class 'quote_explicit'.
(default: off)
=item --xhtml
Try to make the output conform to the XHTML standard, including
closing all open tags and marking empty tags correctly. This
turns on --lower_case_tags and overrides the --doctype option.
Note that if you add a header or a footer file, it is up to you
to make it conform; the header/footer isn't touched by this.
Likewise, if you make link-dictionary entries that break XHTML,
then this won't fix them, except to the degree of putting all tags
into lower-case.
(default: true)
=back
=head1 FILE FORMATS
=head2 Options Files
Options can be given in files as well as on the command-line, by
putting an @ in front of the options file's name on the command-line.
Also, the files ~/.txt2htmlrc and ./.txt2htmlrc are checked for options.
The format is as follows:
Lines starting with # are comments. Lines enclosed in POD markers are
also comments. Blank lines are ignored. The options themselves
should be given the way they would be on the command line, that is,
the option name (including the --) followed by its value (if any).
For example:
# set link dictionaries
--default_link_dict /home/kat/.TextToHTML.dict
# set options for poetry
--titlefirst
--short_line_length 60
See L<Getopt::ArgvFile> for more information.
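txt2html hands option files to Getopt::ArgvFile, so the following is only an illustration of the format described above, not the code that actually reads them. The hypothetical `read_options_lines` turns the lines of an options file into command-line words. It assumes that values contain no spaces or quoting, which a real reader would need to handle.

```perl
use strict;
use warnings;

# Hypothetical reader for the options-file format (not Getopt::ArgvFile):
# returns the command-line words, skipping comments, POD blocks and blanks.
sub read_options_lines {
    my @lines = @_;
    my @argv;
    my $in_pod = 0;
    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^=(\w+)/) {           # a POD directive such as =pod or =cut
            $in_pod = $1 ne 'cut';
            next;
        }
        next if $in_pod;                    # inside a POD block
        next if $line =~ /^\s*#/;           # comment line
        next if $line !~ /\S/;              # blank line
        push @argv, split ' ', $line;       # option name, then its value (if any)
    }
    return @argv;
}
```

Fed the example file above, this yields the words `--default_link_dict`, `/home/kat/.TextToHTML.dict`, `--titlefirst`, `--short_line_length` and `60`, which is what would otherwise be typed on the command line.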
=head2 Link Dictionary
A link dictionary file contains patterns to match, and what to convert
them to. It is called a "link" dictionary because it was intended to be
something which defined what a href link was, but it can be used for
more than that. However, if you wish to define your own links, it is
strongly advised to read up on regular expressions (regexes) because
this relies heavily on them.
The file consists of comments (which are lines starting with #),
blank lines, and link entries.
Each entry consists of a regular expression, a -> separator (with
optional flags), and a link "result".
In the simplest case, with no flags, the regular expression
defines the pattern to look for, and the result says which part
of the match is the actual link. The generated link has that
as its href, and the whole matched pattern as its visible text.
The first character of the regular expression is taken to be the
separator for the regex, so one could use either the traditional /
separator or something else, such as | (which can be helpful with URLs,
which are full of / characters).
So, for example, an ftp URL might be defined as:
|ftp:[\w/\.:+\-]+| -> $&
This takes the whole pattern as the href, and the resultant link
has the same thing in the href as in the contents of the anchor.
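The entry syntax can be sketched as a tiny parser. This is an illustration under stated assumptions, not HTML::TextToHTML's own parser: `parse_dict_entry` is hypothetical, it treats any run of dashes around the flag letters as the separator (so `->`, `-->` and `-hi->` all parse), and it does not cope with the separator character appearing escaped inside the regex.

```perl
use strict;
use warnings;

# Hypothetical parser, not the module's own: split one link-dictionary line
#   <sep>regex<sep> -flags-> result
# into its parts. The first character is taken as the regex separator.
# A separator escaped inside the regex is NOT handled here.
sub parse_dict_entry {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*#/ or $line !~ /\S/;    # comment or blank line
    my $sep = substr($line, 0, 1);
    my $end = index($line, $sep, 1);                  # closing separator
    return if $end < 0;
    my $pattern = substr($line, 1, $end - 1);
    my ($flags, $result) =
        substr($line, $end + 1) =~ /^\s*-+(\w*)-*>\s*(.*?)\s*$/
        or return;
    return { pattern => $pattern, flags => $flags, result => $result };
}
```

For the ftp entry above this gives the pattern `ftp:[\w/\.:+\-]+`, no flags, and the result `$&`.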
But sometimes the href isn't the whole pattern.
/&lt;URL:\s*(\S+?)\s*&gt;/ --> $1
With the above regex, a () grouping marks the first subexpression,
which is represented as $1 (rather than $& the whole expression).
This entry matches a URL which was marked explicitly as a URL
with the pattern <URL:foo> (note that the < is shown as the &lt;
entity, not the actual character. This is because by the
time the links dictionary is checked, all such things have
already been converted to their HTML entity forms.)
This would give us a link in the form
<A HREF="foo">&lt;URL:foo&gt;</A>
B<The h flag>
However, if we want more control over the way the link is constructed,
we can construct it ourselves. If one gives the h flag, then the
"result" part of the entry is taken not to contain the href part of
the link, but the whole link.
For example, the entry:
/&lt;URL:\s*(\S+?)\s*&gt;/ -h-> <A HREF="$1">$1</A>
will take <URL:foo> and give us <A HREF="foo">foo</A>
This is a very powerful mechanism, because it
can be used to construct custom tags which aren't links at all.
For example, to flag *italicised words* the following
entry will surround the words with EM tags.
/\B\*([a-z][a-z -]*[a-z])\*\B/ -hi-> <EM>$1</EM>
B<The i flag>
This turns on ignore case in the pattern matching.
B<The e flag>
This turns on execute in the pattern substitution. This really
only makes sense if h is turned on too. In that case, the "result"
part of the entry is taken as perl code to be executed, and the
result of that code is what replaces the pattern.
B<The o flag>
This marks the entry as a once-only link. This will convert the
first instance of a matching pattern, and ignore any others
further on.
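To show how the h, i and o flags change the outcome, here is a hypothetical `apply_link_entry` that takes an entry's pattern, flags and result and applies them to one string. It is a sketch, not the module's implementation. The e flag is left out. "Once-only" is tracked only within the one string passed in, whereas the real o flag applies across the document. No care is taken to avoid matching inside tags that have already been generated.

```perl
use strict;
use warnings;

# Hypothetical sketch: apply one link-dictionary entry (pattern, flags,
# result) to a string, honouring the h, i and o flags.
# The e flag (result is Perl code) is intentionally not handled.
sub apply_link_entry {
    my ($text, $pattern, $flags, $result) = @_;
    my %flag = map { ($_ => 1) } split //, $flags;
    my $re = $flag{i} ? qr/$pattern/i : qr/$pattern/;    # i: ignore case

    my ($out, $pos) = ('', 0);
    while ($text =~ /$re/g) {
        my ($start, $end) = ($-[0], $+[0]);
        my $whole = substr($text, $start, $end - $start);
        # Collect $1, $2, ... (unmatched groups become empty strings).
        my @groups = map {
            defined $-[$_] ? substr($text, $-[$_], $+[$_] - $-[$_]) : ''
        } 1 .. $#+;
        # Fill the $& and $N placeholders in the result template.
        (my $filled = $result) =~
            s{\$(&|\d+)}{ $1 eq '&' ? $whole : ($groups[$1 - 1] // '') }ge;
        # Without h the result is only the href; with h it is the whole markup.
        my $markup = $flag{h} ? $filled : qq{<A HREF="$filled">$whole</A>};
        $out .= substr($text, $pos, $start - $pos) . $markup;
        $pos = $end;
        last if $flag{o};    # o: leave later matches alone
    }
    return $out . substr($text, $pos);
}
```

For instance, `apply_link_entry('a *big* deal', '\B\*([a-z][a-z -]*[a-z])\*\B', 'hi', '<EM>$1</EM>')` returns `a <EM>big</EM> deal`.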
For example, the following pattern will take the first mention
of HTML::TextToHTML and convert it to a link to the module's home page.
"HTML::TextToHTML" -io-> http://www.example.com/tools/text_to_html/
=head2 Input File Format
For the most part, this module tries to use intuitive conventions for
determining the structure of the text input. Unordered lists are
marked by bullets; ordered lists are marked by numbers or letters;
in either case, an increase in indentation marks a sub-list contained
in the outer list.
Headers (apart from custom headers) are distinguished by "underlines"
underneath them; headers in all-capitals are distinguished from
those in mixed case. All headers, both normal and custom headers,
are expected to start at the first line in a "paragraph".
Tables require a more rigid convention. A table must be marked as a
separate paragraph, that is, it must be surrounded by blank lines.
Tables come in different types. For a table to be parsed, its
--table_type option must be on, and the --make_tables option must be true.
B<ALIGN>
Columns must be separated by two or more spaces (this prevents
accidental incorrect recognition of a paragraph where interword spaces
happen to line up). If there are two or more rows in a paragraph and
all rows share the same set of (two or more) columns, the paragraph is
assumed to be a table. For example,
-e  File exists.
-z  File has zero size.
-s  File has nonzero size (returns size).
becomes
<TABLE>
<TR><TD>-e</TD><TD>File exists.</TD></TR>
<TR><TD>-z</TD><TD>File has zero size.</TD></TR>
<TR><TD>-s</TD><TD>File has nonzero size (returns size).</TD></TR>
</TABLE>
This guesses for each column whether it is intended to be left,
centre or right aligned.
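A rough sketch of the ALIGN test as described: record where each cell starts (cells being separated by two or more spaces), then require every row to have the same layout, with at least two rows and two columns. `cell_starts` and `looks_like_align_table` are hypothetical names. The real module may be more forgiving, and this sketch does not attempt the alignment guessing.

```perl
use strict;
use warnings;

# Hypothetical helper: start column of every cell in a row, where cells are
# separated by two or more spaces (tabs are assumed to be expanded already).
sub cell_starts {
    my ($row) = @_;
    my @starts;
    while ($row =~ /(?:^ *| {2,})(?=\S)/g) {
        push @starts, $+[0];    # position just after the separating spaces
    }
    return @starts;
}

# Hypothetical check: two or more rows, two or more columns, same layout.
sub looks_like_align_table {
    my @rows = grep { /\S/ } @_;
    return 0 if @rows < 2;
    my @first = cell_starts($rows[0]);
    return 0 if @first < 2;
    my $layout = join ',', @first;
    for my $row (@rows[1 .. $#rows]) {
        return 0 if join(',', cell_starts($row)) ne $layout;
    }
    return 1;
}
```

The three -e/-z/-s rows above all have cells starting at columns 0 and 4, so they qualify. Ordinary prose has only single spaces between words and yields a single column.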
B<BORDER>
This table type has nice borders around it, and will be rendered
with a border, like so:
+---------+---------+
| Column1 | Column2 |
+---------+---------+
| val1 | val2 |
| val3 | val3 |
+---------+---------+
The above becomes a bordered HTML table, with Column1 and Column2 as
table-header (TH) cells, followed by the rows val1/val2 and val3/val3.
It can also have an optional caption at the start.
My Caption
+---------+---------+
| Column1 | Column2 |
+---------+---------+
| val1 | val2 |
| val3 | val3 |
+---------+---------+
B<PGSQL>
This format of table is what one gets from the output of a PostgreSQL
query.
Column1 | Column2
---------+---------
val1 | val2
val3 | val3
(2 rows)
This can also have an optional caption at the start.
This table is also rendered with a border and table-headers like
the BORDER type.
B<DELIM>
This table type is delimited by non-alphanumeric characters, and has to
have at least two rows and two columns before it's recognised as a table.
This one is delimited by the '|' character:
| val1 | val2 |
| val3 | val3 |
But one can use almost any suitable character such as : # $ % + and so on.
This is clever enough to figure out what you are using as the delimiter
if you have your data set up like a table. Note that the line has to
both begin and end with the delimiter, as well as using it to separate
values.
This can also have an optional caption at the start.
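The DELIM rules can likewise be sketched. `find_table_delimiter` is hypothetical, and it assumes the delimiter is the first non-space, non-alphanumeric character of the first row. Following the description, every row must then begin and end with that character and use it to separate at least two values.

```perl
use strict;
use warnings;

# Hypothetical sketch of DELIM detection: returns the delimiter character,
# or undef if the rows do not look like a delimited table.
sub find_table_delimiter {
    my @rows = grep { /\S/ } @_;
    return if @rows < 2;                              # at least two rows
    my ($delim) = $rows[0] =~ /^\s*([^\w\s])/ or return;
    my $q = quotemeta $delim;
    for my $row (@rows) {
        return unless $row =~ /^\s*$q.*$q\s*$/;       # begins and ends with it
        my $count = () = $row =~ /$q/g;
        return if $count < 3;                         # at least two columns
    }
    return $delim;
}
```

Both `| val1 | val2 |` style rows and `: a : b :` style rows are accepted, while ordinary text is rejected because it starts with an alphanumeric character.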
=head1 EXAMPLES
B<Convert a text file to HTML>
txt2html --infile thing.txt --outfile thing.html
This will create an HTML file called C<thing.html> from the plain
text file C<thing.txt>.
=head1 BUGS
Tell me about them.
=head1 PREREQUISITES
Pod::Usage
HTML::TextToHTML
Getopt::Long
Getopt::ArgvFile
File::Basename
YAML::Syck
perldoc
=head1 SCRIPT CATEGORIES
Web
=head1 ENVIRONMENT
=over
=item HOME
txt2html looks in the HOME directory for config files.
=back
=head1 FILES
These files are only read if the Getopt::ArgvFile module is
available on the system.
=over
=item C<~/.txt2htmlrc>
User configuration file.
=item C<.txt2htmlrc>
Configuration file in the current working directory; overrides
options in C<~/.txt2htmlrc> and is overridden by command-line options.
=back
=head1 SEE ALSO
perl(1)
htmltoc(1)
HTML::TextToHTML
Getopt::Long
Getopt::ArgvFile
=head1 AUTHOR
Kathryn Andersen (RUBYKAT)
perlkat AT katspace dot com
http://www.katspace.com/
based on txt2html by Seth Golub
=head1 COPYRIGHT
Original txt2html script copyright (c) 1994-2000 Seth Golub seth AT aigeek.com
Copyright (c) 2002-2005 Kathryn Andersen
This program is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
=cut
#################################################################
# Includes
use Pod::Usage;
use Getopt::Long;
use File::Basename;
use HTML::TextToHTML;
#################################################################
# Subroutines
sub init_data ($) {
my $data_ref = shift;
my %args = ();
$args{manpage} = 0;
$args{debug} = 0;
$args{version} = 0;
$args{quiet} = 0;
$args{help} = 0;
$args{infile} = [];
$args{instring} = [];
$data_ref->{args} = \%args;
}
sub process_args ($) {
my $data_ref = shift;
my $args_ref = $data_ref->{args};
my $ok = 1;
# check the rc file if we can
if (eval { require Getopt::ArgvFile; 1 }) {
my $bn = basename($0, '');
my $rc_name = ".${bn}rc";
Getopt::ArgvFile::argvFile(
startupFilename=>$rc_name,
home=>1,
current=>1);
}
$ok = GetOptions($args_ref,
'help',
'manpage|man_help',
'debug',
'version',
'verbose!',
'append_file|append_body|ab=s',
'append_head|ah=s',
'body_deco=s',
'bold_delimiter=s',
'bullets=s',
'bullets_ordered=s',
'caps_tag|capstag|ct=s',
'custom_heading_regexp|heading|H=s@',
'default_link_dict|dict=s',
'demoronize!',
'dict_debug|db=n',
'doctype|dt=s',
'eight_bit_clean|8!',
'escape_HTML_chars|escapechars|ec!',
'explicit_headings|EH!',
'extract!',
'hrule_min|r=n',
'indent_width|iw=n',
'indent_par_break|ipb!',
'infile=s@',
'instring=s@',
'italic_delimiter=s',
'links_dictionaries|link|l=s@',
'link_only|linkonly|LO!',
'lower_case_tags|lc_tags|LC!',
'mailmode|m!',
'make_anchors|anchors!',
'make_links!',
'make_tables|tables!',
'min_caps_length|caps|c=n',
'outfile|out|o=s',
'par_indent=n',
'preformat_trigger_lines|prebegin|pb=n',
'endpreformat_trigger_lines|preend|pe=n',
'preformat_start_marker=s',
'preformat_end_marker=s',
'preformat_whitespace_min|prewhite|p=n',
'prepend_file|prepend_body|pp=s',
'preserve_indent|pi!',
'short_line_length|shortline|s=n',
'style_url=s',
'tab_width|tabwidth|tw=n',
'table_type=n%',
'title|t=s',
'titlefirst|tf!',
'underline_delimiter=s',
'underline_length_tolerance|ulength|ul=n',
'underline_offset_tolerance|uoffset|uo=n',
'unhyphenation|unhyphenate!',
'utf8',
'use_mosaic_header|mosaic|mh!',
'use_preformat_marker|preformat_marker|pm!',
'xhtml!',
);
if (!$ok)
{
pod2usage({ -message => "$0",
-exitval => 1,
-verbose => 0,
});
}
if ($args_ref->{'version'})
{
print STDERR "$0 version: $VERSION\n";
exit 0;
}
if ($args_ref->{'manpage'})
{
pod2usage({ -message => "$0 version $VERSION",
-exitval => 0,
-verbose => 2,
});
}
if ($args_ref->{'help'})
{
pod2usage({ -message => "$0 version $VERSION",
-exitval => 0,
-verbose => 1,
});
}
# remove the script-only options before passing the rest to the module
delete $args_ref->{help};
delete $args_ref->{manpage};
delete $args_ref->{version};
# make the object
my $doc = HTML::TextToHTML->new(%{$args_ref});
$data_ref->{doc} = $doc;
}
#################################################################
# Main
MAIN: {
my %data = ();
my $result = 0;
init_data(\%data);
process_args(\%data);
# now the remainder must be input-files
# Push the infiles onto the infile array,
# because there might already have been infiles added with --infile.
foreach my $df (@ARGV)
{
if ($data{doc}->{debug}) {
print STDERR "--infile $df\n";
}
push @{$data{doc}->{infile}}, $df;
}
# if we have no infile, and no instring
# assume stdin, and mark that with '-'
if (!@{$data{doc}->{infile}} and !@{$data{doc}->{instring}})
{
if ($data{doc}->{debug}) {
print STDERR "using STDIN\n";
}
push @{$data{doc}->{infile}}, '-';
}
$data{doc}->txt2html();
}
# vim: sw=4 sts=4 ai
txt2html-2.51/README
==== NAME ====
HTML::TextToHTML - convert plain text file to HTML.
==== VERSION ====
This describes version ``2.51'' of HTML::TextToHTML.
==== DESCRIPTION ====
HTML::TextToHTML converts plain text files to HTML. The txt2html script uses
this module to do the same from the command-line.
It supports headings, tables, lists, simple character markup, and
hyperlinking, and is highly customizable. It recognizes some of the apparent
structure of the source document (mostly whitespace and typographic layout),
and attempts to mark that structure explicitly using HTML. The purpose for
this tool is to provide an easier way of converting existing text documents
to HTML format, giving something nicer than just whapping the text into a
big PRE block.
== History ==
The original txt2html script was written by Seth Golub (see
http://www.aigeek.com/txt2html/), and converted to a perl module by Kathryn
Andersen (see http://www.katspace.com/tools/text_to_html/) and made into a
SourceForge project by Sun Tong (see
http://sourceforge.net/projects/txt2html/). Earlier versions of the
HTML::TextToHTML module called the included script texthyper so as not to
clash with the original txt2html script, but now the projects have all been
merged.
==== REQUIRES ====
HTML::TextToHTML requires Perl 5.8.1 or later.
For installation, it needs:
Module::Build
The txt2html script needs:
Getopt::Long
Getopt::ArgvFile
Pod::Usage
File::Basename
For testing, it also needs:
Test::More
For debugging, it also needs:
YAML::Syck
==== INSTALLATION ====
Make sure you have the dependencies installed first! (see REQUIRES above)
Some of those modules come standard with more recent versions of perl, but I
thought I'd mention them anyway, just in case you may not have them.
If you don't know how to install these, try using the CPAN module, an easy
way of auto-installing modules from the Comprehensive Perl Archive Network,
where the above modules reside. Do "perldoc perlmodinstall" or "perldoc
CPAN" for more information.
To install this module type the following:
perl Build.PL
./Build
./Build test
./Build install
Or, if you're on a platform (like DOS or Windows) that doesn't like the "./"
notation, you can do this:
perl Build.PL
perl Build
perl Build test
perl Build install
In order to install somewhere other than the default, such as in a directory
under your home directory, like "/home/fred/perl" go
perl Build.PL --install_base /home/fred/perl
as the first step instead.
This will install the files underneath /home/fred/perl.
You will then need to make sure that you alter the PERL5LIB variable to find
the modules, and the PATH variable to find the script.
Therefore you will need to change your path to include
/home/fred/perl/script (where the script will be):
PATH=/home/fred/perl/script:${PATH}
and add /home/fred/perl/lib to the PERL5LIB variable:
PERL5LIB=/home/fred/perl/lib:${PERL5LIB}
(In a Bourne-style shell, also "export PERL5LIB" so that perl sees it.)
Note that the system links dictionary will be installed as
"/home/fred/perl/share/txt2html/txt2html.dict"
If you want to install in a temporary install directory (such as if you are
building a package) then instead of going
perl Build install
go
perl Build install destdir=/my/temp/dir
and it will be installed there, with a directory structure under
/my/temp/dir the same as it would be if it were installed plain. Note that
this is NOT the same as setting --install_base, because certain things are
done at build-time which use the install_base info.
See "perldoc perlrun" for more information on PERL5LIB, and see "perldoc
Module::Build" for more information on installation options.
==== AUTHOR ====
Kathryn Andersen (RUBYKAT)
perlkat AT katspace dot com
http://www.katspace.com/
based on txt2html by Seth Golub
==== COPYRIGHT AND LICENCE ====
Original txt2html script copyright (c) 1994-2000 Seth Golub
Copyright (c) 2002-2005 by Kathryn Andersen
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
txt2html-2.51/version.txt
2.51
txt2html-2.51/t/10para.t
#########################
use Test::More tests => 18;
use HTML::TextToHTML;
ok(1); # If we made it this far, we are ok.
# Insert your test code below, the Test::More module is use()ed here so read
# its man page ( perldoc Test::More ) for help writing this test script.
my $conv = new HTML::TextToHTML();
ok( defined $conv, 'new() returned something' );
ok( $conv->isa('HTML::TextToHTML'), " and it's the right class" );
$conv->args(
system_link_dict=>'txt2html.dict',
default_link_dict=>'',
);
#
# test the process_para method alone
#
$test_str = "Matty had a little truck
he drove it round and round
and everywhere that Matty went
the truck was *always* found.
";
$ok_str = "Matty had a little truck
he drove it round and round
and everywhere that Matty went
the truck was always found.
";
$out_str = $conv->process_para($test_str);
ok($out_str, 'converted sample string');
# compare the result
is($out_str, $ok_str, 'compare converted string with OK string');
#
# test the process_chunk method with an ordered list
#
$test_str = "Here is my list:
1. Spam
2. Jam
3. Ham
4. Pickles
";
$ok_str = "Here is my list:
- Spam
- Jam
- Ham
- Pickles
";
$out_str = $conv->process_chunk($test_str);
ok($out_str, 'converted sample string with list');
# compare the result
is($out_str, $ok_str, 'compare converted list string with OK list string');
#
# test the process_chunk method with an empty string
#
$test_str = "";
$ok_str = "";
$out_str = $conv->process_chunk($test_str);
# note we do not do an ok on this because it should be empty
# compare the result
is($out_str, $ok_str, 'compare converted empty string with OK empty string');
#
# test with is_fragment
#
$test_str = "Matty had a little truck
he drove it round and round
and everywhere that Matty went
the truck was *always* found.
";
$ok_str = "Matty had a little truck
he drove it round and round
and everywhere that Matty went
the truck was always found.
";
$out_str = $conv->process_chunk($test_str, is_fragment=>1);
ok($out_str, 'converted sample string');
# compare the result
is($out_str, $ok_str, 'compare converted string with OK string');
#
# test the process_para method with a URL
#
$test_str = "I like to look at http://www.example.com a lot";
$ok_str = 'I like to look at http://www.example.com a lot';
$out_str = $conv->process_para($test_str, is_fragment=>1);
ok($out_str, 'converted sample string with URL');
# compare the result
is($out_str, $ok_str, 'compare converted URL string with OK URL string');
#
# test process_chunk with caps_tag turned off
#
$test_str = "We have a line alone
FULL OF CAPS AND FURY
";
$ok_str = "We have a line alone
FULL OF CAPS AND FURY
";
$conv->args(caps_tag=>'');
$out_str = $conv->process_chunk($test_str, is_fragment=>1);
ok($out_str, 'converted sample string with CAPS');
# compare the result
is($out_str, $ok_str, 'compare converted CAPS string with OK CAPS string');
$conv->args(caps_tag=>'STRONG'); # restore caps to default
#
# Test with different italic/bold delimiters
#
$test_str = "I am ^bold^,
You are --really krazy--.
-----------------
";
$ok_str = "I am bold,
You are really krazy.
";
$conv->args(bold_delimiter=>'^',
italic_delimiter=>'--');
$out_str = $conv->process_chunk($test_str, is_fragment=>1);
ok($out_str, 'converted sample string with delimiters');
# compare the result
is($out_str, $ok_str, 'compare converted delimiter string with OK delimiter string');
#
# test with no bolding or italic at all
$test_str = "I am ^bold^,
You are --really krazy--.
-----------------
";
$ok_str = "I am ^bold^,
You are --really krazy--.
";
$conv->args(bold_delimiter=>'',
italic_delimiter=>'');
$out_str = $conv->process_chunk($test_str, is_fragment=>1);
ok($out_str, 'converted sample string with no bold');
# compare the result
is($out_str, $ok_str, 'compare converted no-bold string with OK no-bold string');
# restore default
$conv->args(bold_delimiter=>'#',
italic_delimiter=>'*');
txt2html-2.51/t/50xsample.t
# Before `make install' is performed this script should be runnable with
# `make test'. After `make install' it should work as `perl test.pl'
#########################
# change 'tests => 1' to 'tests => last_test_to_print';
use Test::More tests => 5;
use HTML::TextToHTML;
ok(1); # If we made it this far, we're ok.
#########################
# compare two files
sub compare {
my $file1 = shift;
my $file2 = shift;
open(F1, $file1) || return 0;
open(F2, $file2) || return 0;
my $res = 1;
my $count = 0;
while (<F1>)
{
$count++;
my $comp1 = $_;
# remove newline/carriage return (in case these aren't both Unix)
$comp1 =~ s/\n//;
$comp1 =~ s/\r//;
my $comp2 = <F2>;
# check if F2 has less lines than F1
if (!defined $comp2)
{
print "error - line $count does not exist in $file2\n $file1 : $comp1\n";
close(F1);
close(F2);
return 0;
}
# remove newline/carriage return
$comp2 =~ s/\n//;
$comp2 =~ s/\r//;
if ($comp1 ne $comp2)
{
print "error - line $count not equal\n $file1 : $comp1\n $file2 : $comp2\n";
close(F1);
close(F2);
return 0;
}
}
close(F1);
# check if F2 has more lines than F1
if (defined($comp2 = <F2>))
{
$comp2 =~ s/\n//;
$comp2 =~ s/\r//;
print "error - extra line in $file2 : '$comp2'\n";
$res = 0;
}
close(F2);
return $res;
}
# Insert your test code below, the Test::More module is use()ed here so read
# its man page ( perldoc Test::More ) for help writing this test script.
my $conv = new HTML::TextToHTML();
ok( defined $conv, 'new() returned something' );
ok( $conv->isa('HTML::TextToHTML'), " and it's the right class" );
my %args = ();
$args{system_link_dict} = "txt2html.dict";
$args{default_link_dict} = "";
$args{infile} = ["tfiles/sample.txt"];
$args{append_file} = "tfiles/sample.foot2";
$args{titlefirst} = 1;
$args{mailmode} = 1;
$args{custom_heading_regexp} = ['^ *--[\w\s]+-- *$'];
$args{make_tables} = 1;
$args{make_anchors} = 1;
$args{xhtml} = 1;
$args{outfile} = "xhtml_sample.html";
$result = $conv->txt2html(%args);
ok($result, 'converted xhtml sample.txt');
# compare the files
$result = compare('tfiles/good_xhtml_sample.html', 'xhtml_sample.html');
ok($result, 'test file xhtml_sample.html matches original good_xhtml_sample.html exactly');
if ($result) {
unlink('xhtml_sample.html');
}
txt2html-2.51/t/30sample.t
# Before `make install' is performed this script should be runnable with
# `make test'. After `make install' it should work as `perl test.pl'
#########################
# change 'tests => 1' to 'tests => last_test_to_print';
use Test::More tests => 5;
use HTML::TextToHTML;
ok(1); # If we made it this far, we're ok.
#########################
# compare two files
sub compare {
my $file1 = shift;
my $file2 = shift;
open(F1, $file1) || return 0;
open(F2, $file2) || return 0;
my $res = 1;
my $count = 0;
while (<F1>)
{
$count++;
my $comp1 = $_;
# remove newline/carriage return (in case these aren't both Unix)
$comp1 =~ s/\n//;
$comp1 =~ s/\r//;
my $comp2 = <F2>;
# check if F2 has less lines than F1
if (!defined $comp2)
{
print "error - line $count does not exist in $file2\n $file1 : $comp1\n";
close(F1);
close(F2);
return 0;
}
# remove newline/carriage return
$comp2 =~ s/\n//;
$comp2 =~ s/\r//;
if ($comp1 ne $comp2)
{
print "error - line $count not equal\n $file1 : $comp1\n $file2 : $comp2\n";
close(F1);
close(F2);
return 0;
}
}
close(F1);
# check if F2 has more lines than F1
if (defined($comp2 = <F2>))
{
$comp2 =~ s/\n//;
$comp2 =~ s/\r//;
print "error - extra line in $file2 : '$comp2'\n";
$res = 0;
}
close(F2);
return $res;
}
# Insert your test code below, the Test::More module is use()ed here so read
# its man page ( perldoc Test::More ) for help writing this test script.
my $conv = new HTML::TextToHTML(xhtml=>0);
ok( defined $conv, 'new() returned something' );
ok( $conv->isa('HTML::TextToHTML'), " and it's the right class" );
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
infile=>["tfiles/sample.txt"],
outfile=>"sample.html",
append_file=>"tfiles/sample.foot",
titlefirst=>1, mailmode=>1,
custom_heading_regexp=>['^ *--[\w\s]+-- *$'],
make_tables=>1,
);
ok($result, 'converted sample.txt');
# compare the files
$result = compare('tfiles/good_sample.html', 'sample.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('sample.html');
}
txt2html-2.51/t/00_dist.t
# Test distribution before release
# Optional for end users if Test::Distribution not installed
use Test::More;
BEGIN {
eval {
require Test::Distribution;
};
if($@) {
plan skip_all => "Test::Distribution not installed";
}
else {
import Test::Distribution;
}
}
txt2html-2.51/t/20tfiles.t
# miscellaneous tests in separate files
#########################
use Test::More tests => 50;
use HTML::TextToHTML;
#########################
# compare two files
sub compare {
my $file1 = shift;
my $file2 = shift;
if (!open(F1, $file1))
{
print "error - $file1 did not open\n";
return 0;
}
if (!open(F2, $file2))
{
print "error - $file2 did not open\n";
return 0;
}
my $res = 1;
my $count = 0;
while (<F1>)
{
$count++;
my $comp1 = $_;
# remove newline/carriage return (in case these aren't both Unix)
$comp1 =~ s/\n//;
$comp1 =~ s/\r//;
my $comp2 = <F2>;
# check if F2 has less lines than F1
if (!defined $comp2)
{
print "error - line $count does not exist in $file2\n $file1 : $comp1\n";
close(F1);
close(F2);
return 0;
}
# remove newline/carriage return
$comp2 =~ s/\n//;
$comp2 =~ s/\r//;
if ($comp1 ne $comp2)
{
print "error - line $count not equal\n $file1 : $comp1\n $file2 : $comp2\n";
close(F1);
close(F2);
return 0;
}
}
close(F1);
# check if F2 has more lines than F1
if (defined($comp2 = <F2>))
{
$comp2 =~ s/\n//;
$comp2 =~ s/\r//;
print "error - extra line in $file2 : '$comp2'\n";
$res = 0;
}
close(F2);
return $res;
}
#-----------------------------------------------------------------
my $conv = new HTML::TextToHTML();
#
# Custom headers 1
#
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>1,
infile=>["tfiles/custom-headers.txt"],
outfile=>"custom-headers.html",
custom_heading_regexp=>['^\d+\. +\w+', '^\d+\.\d+\. +\w+', '^\d+\.\d+\.\d+\. +\w+'],
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted custom-headers.txt');
# compare the files
$result = compare('tfiles/good_custom-headers.html', 'custom-headers.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('custom-headers.html');
}
#
# Custom headers 2
#
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>1,
infile=>["tfiles/custom-headers2.txt"],
outfile=>"custom-headers2.html",
custom_heading_regexp=>['^What: '],
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted custom-headers2.txt');
# compare the files
$result = compare('tfiles/good_custom-headers2.html', 'custom-headers2.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('custom-headers2.html');
}
#
# hyphens
#
$conv = undef;
$conv = new HTML::TextToHTML();
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>1,
infile=>["tfiles/hyphens.txt"],
outfile=>"hyphens.html",
custom_heading_regexp=>[],
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted hyphens.txt');
# compare the files
$result = compare('tfiles/good_hyphens.html', 'hyphens.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('hyphens.html');
}
#
# links
#
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>1,
infile=>["tfiles/links.txt"],
outfile=>"links.html",
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted links.txt');
# compare the files
$result = compare('tfiles/good_links.html', 'links.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('links.html');
}
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>1,
infile=>["tfiles/links2.txt"],
outfile=>"links2.html",
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted links2.txt');
# compare the files
$result = compare('tfiles/good_links2.html', 'links2.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('links2.html');
}
#
# Lists
#
$conv = undef;
$conv = new HTML::TextToHTML();
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>1,
infile=>["tfiles/list.txt"],
outfile=>"list.html",
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted list.txt');
# compare the files
$result = compare('tfiles/good_list.html', 'list.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('list.html');
}
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>1,
infile=>["tfiles/list-2.txt"],
outfile=>"list-2.html",
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted list-2.txt');
# compare the files
$result = compare('tfiles/good_list-2.html', 'list-2.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('list-2.html');
}
$conv = undef;
$conv = new HTML::TextToHTML();
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>1,
xhtml=>1,
infile=>["tfiles/list-3.txt"],
outfile=>"list-3.html",
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted list-3.txt');
# compare the files
$result = compare('tfiles/good_list-3.html', 'list-3.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('list-3.html');
}
$conv = undef;
$conv = new HTML::TextToHTML();
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>0,
xhtml=>1,
infile=>["tfiles/list-4.txt"],
outfile=>"list-4.html",
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted list-4.txt');
# compare the files
$result = compare('tfiles/good_list-4.html', 'list-4.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('list-4.html');
}
$conv = undef;
$conv = new HTML::TextToHTML();
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>0,
xhtml=>1,
infile=>["tfiles/list-5.txt"],
outfile=>"list-5.html",
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted list-5.txt');
# compare the files
$result = compare('tfiles/good_list-5.html', 'list-5.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('list-5.html');
}
$conv = undef;
$conv = new HTML::TextToHTML();
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
bullets=>'-=o+*',
bullets_ordered=>'#',
extract=>0,
xhtml=>1,
infile=>["tfiles/list-custom.txt"],
outfile=>"list-custom.html",
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted list-custom.txt');
# compare the files
$result = compare('tfiles/good_list-custom.html', 'list-custom.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('list-custom.html');
}
$conv = undef;
$conv = new HTML::TextToHTML();
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>0,
xhtml=>1,
infile=>["tfiles/list-advanced.txt"],
outfile=>"list-advanced.html",
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted list-advanced.txt');
# compare the files
$result = compare('tfiles/good_list-advanced.html', 'list-advanced.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('list-advanced.html');
}
$conv = undef;
$conv = new HTML::TextToHTML();
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>0,
xhtml=>1,
infile=>["tfiles/list-styles.txt"],
outfile=>"list-styles.html",
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted list-styles.txt');
# compare the files
$result = compare('tfiles/good_list-styles.html', 'list-styles.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('list-styles.html');
}
#
# news
#
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>0,
mailmode=>1,
infile=>["tfiles/news.txt"],
outfile=>"news.html",
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted news.txt');
# compare the files
$result = compare('tfiles/good_news.html', 'news.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('news.html');
}
#
# pre
#
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>1,
use_preformat_marker=>1,
infile=>["tfiles/pre.txt"],
outfile=>"pre.html",
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted pre.txt');
# compare the files
$result = compare('tfiles/good_pre.html', 'pre.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('pre.html');
}
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>1,
use_preformat_marker=>0,
infile=>["tfiles/pre2.txt"],
outfile=>"pre2.html",
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted pre2.txt');
# compare the files
$result = compare('tfiles/good_pre2.html', 'pre2.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('pre2.html');
}
#
# Tables
#
$conv = undef;
$conv = new HTML::TextToHTML();
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>1,
make_tables=>1,
infile=>["tfiles/table-align.txt"],
outfile=>"table-align.html",
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted table-align.txt');
# compare the files
$result = compare('tfiles/good_table-align.html', 'table-align.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('table-align.html');
}
$conv = undef;
$conv = new HTML::TextToHTML();
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>1,
make_tables=>1,
infile=>["tfiles/table-pgsql.txt"],
outfile=>"table-pgsql.html",
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted table-pgsql.txt');
# compare the files
$result = compare('tfiles/good_table-pgsql.html', 'table-pgsql.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('table-pgsql.html');
}
$conv = undef;
$conv = new HTML::TextToHTML();
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>1,
make_tables=>1,
xhtml=>1,
infile=>["tfiles/table-pgsql2.txt"],
outfile=>"table-pgsql2.html",
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted table-pgsql2.txt');
# compare the files
$result = compare('tfiles/good_table-pgsql2.html', 'table-pgsql2.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('table-pgsql2.html');
}
$conv = undef;
$conv = new HTML::TextToHTML();
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>1,
make_tables=>1,
infile=>["tfiles/table-border.txt"],
outfile=>"table-border.html",
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted table-border.txt');
# compare the files
$result = compare('tfiles/good_table-border.html', 'table-border.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('table-border.html');
}
$conv = undef;
$conv = new HTML::TextToHTML();
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>1,
make_tables=>1,
xhtml=>1,
infile=>["tfiles/table-delim.txt"],
outfile=>"table-delim.html",
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted table-delim.txt');
# compare the files
$result = compare('tfiles/good_table-delim.html', 'table-delim.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('table-delim.html');
}
#
# an EMPTY file (non-extracted)
#
$conv = undef;
$conv = new HTML::TextToHTML();
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>0,
xhtml=>0,
infile=>["tfiles/empty.txt"],
outfile=>"empty1.html",
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted empty.txt (default)');
# compare the files
$result = compare('tfiles/good_empty.html', 'empty1.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('empty1.html');
}
#
# an EMPTY file (non-extracted) (xhtml)
#
$conv = undef;
$conv = new HTML::TextToHTML();
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>0,
xhtml=>1,
infile=>["tfiles/empty.txt"],
outfile=>"empty2.html",
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted empty.txt (xhtml)');
# compare the files
$result = compare('tfiles/good_empty.html', 'empty2.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('empty2.html');
}
#
# an EMPTY file (extracted)
#
$conv = undef;
$conv = new HTML::TextToHTML();
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>1,
xhtml=>0,
infile=>["tfiles/empty.txt"],
outfile=>"empty3.html",
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted empty.txt (extract)');
# compare the files
$result = compare('tfiles/good_empty.html', 'empty3.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('empty3.html');
}
#
# an EMPTY file (extracted) (xhtml)
#
$conv = undef;
$conv = new HTML::TextToHTML();
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>1,
xhtml=>1,
infile=>["tfiles/empty.txt"],
outfile=>"empty4.html",
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted empty.txt (extract) (xhtml)');
# compare the files
$result = compare('tfiles/good_empty.html', 'empty4.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('empty4.html');
}
# files which have triggered bugs
#########################
use Test::More tests => 16;
use HTML::TextToHTML;
#########################
# compare two files
sub compare {
my $file1 = shift;
my $file2 = shift;
if (!open(F1, $file1))
{
print "error - $file1 did not open\n";
return 0;
}
if (!open(F2, $file2))
{
print "error - $file2 did not open\n";
return 0;
}
my $res = 1;
my $count = 0;
while (<F1>)
{
$count++;
my $comp1 = $_;
# remove newline/carriage return (in case these are not both Unix)
$comp1 =~ s/\n//;
$comp1 =~ s/\r//;
my $comp2 = <F2>;
# check if F2 has less lines than F1
if (!defined $comp2)
{
print "error - line $count does not exist in $file2\n $file1 : $comp1\n";
close(F1);
close(F2);
return 0;
}
# remove newline/carriage return
$comp2 =~ s/\n//;
$comp2 =~ s/\r//;
if ($comp1 ne $comp2)
{
print "error - line $count not equal\n $file1 : $comp1\n $file2 : $comp2\n";
close(F1);
close(F2);
return 0;
}
}
close(F1);
# check if F2 has more lines than F1
if (defined($comp2 = <F2>))
{
$comp2 =~ s/\n//;
$comp2 =~ s/\r//;
print "error - extra line in $file2 : '$comp2'\n";
$res = 0;
}
close(F2);
return $res;
}
#-----------------------------------------------------------------
my $conv = new HTML::TextToHTML();
#
# bugs : make_tables
#
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
make_tables=>1,
infile=>["tfiles/robo.txt"],
outfile=>"robo.html",
custom_heading_regexp=>[],
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted robo.txt');
# compare the files
$result = compare('tfiles/good_robo.html', 'robo.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('robo.html');
}
#
# bugs : list with bold chars
#
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
make_tables=>0,
infile=>["tfiles/mixed.txt"],
outfile=>"mixed.html",
custom_heading_regexp=>[],
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted mixed.txt');
# compare the files
$result = compare('tfiles/good_mixed.html', 'mixed.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('mixed.html');
}
#
# bugs : dos file on unix platform not detecting headings
#
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
make_tables=>0,
infile=>["tfiles/heading1.txt"],
outfile=>"heading1.html",
custom_heading_regexp=>[],
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted heading1.txt');
# compare the files
$result = compare('tfiles/good_heading1.html', 'heading1.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('heading1.html');
}
#
# bugs : link with # in it being replaced by tags
#
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
make_tables=>0,
infile=>["tfiles/links3.txt"],
outfile=>"links3.html",
custom_heading_regexp=>[],
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted links3.txt');
# compare the files
$result = compare('tfiles/good_links3.html', 'links3.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('links3.html');
}
$conv = new HTML::TextToHTML();
#
# bugs : file with umlauts not doing italics and underlines correctly
#
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
make_tables=>0,
infile=>["tfiles/umlauttest.txt"],
outfile=>"umlauttest.html",
custom_heading_regexp=>[],
xhtml=>1,
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted umlauttest.txt');
# compare the files
$result = compare('tfiles/good_umlauttest.html', 'umlauttest.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('umlauttest.html');
}
#
# bugs : UTF-8 characters are being wrongly zapped
#
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
make_anchors=>0,
infile=>["tfiles/utf8.txt"],
outfile=>"utf8.html",
custom_heading_regexp=>[],
xhtml=>1,
extract=>1,
eight_bit_clean=>1,
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted utf8.txt');
# compare the files
$result = compare('tfiles/good_utf8.html', 'utf8.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('utf8.html');
}
#
# bugs : italics with punctuation characters , and ! not converted
#
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
make_anchors=>0,
infile=>["tfiles/punct.txt"],
outfile=>"punct.html",
custom_heading_regexp=>[],
extract=>1,
eight_bit_clean=>1,
);
ok($result, 'converted punct.txt');
# compare the files
$result = compare('tfiles/good_punct.html', 'punct.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('punct.html');
}
#
# bugs : links with underscores
#
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
make_anchors=>0,
infile=>["tfiles/links4.txt"],
outfile=>"links4.html",
custom_heading_regexp=>[],
extract=>1,
eight_bit_clean=>1,
);
ok($result, 'converted links4.txt');
# compare the files
$result = compare('tfiles/good_links4.html', 'links4.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('links4.html');
}
# miscellaneous tests for filehandles
#########################
use Test::More tests => 6;
use HTML::TextToHTML;
#########################
# compare two files
sub compare {
my $file1 = shift;
my $file2 = shift;
if (!open(F1, $file1))
{
print "error - $file1 did not open\n";
return 0;
}
if (!open(F2, $file2))
{
print "error - $file2 did not open\n";
return 0;
}
my $res = 1;
my $count = 0;
while (<F1>)
{
$count++;
my $comp1 = $_;
# remove newline/carriage return (in case these aren't both Unix)
$comp1 =~ s/\n//;
$comp1 =~ s/\r//;
my $comp2 = <F2>;
# check if F2 has less lines than F1
if (!defined $comp2)
{
print "error - line $count does not exist in $file2\n $file1 : $comp1\n";
close(F1);
close(F2);
return 0;
}
# remove newline/carriage return
$comp2 =~ s/\n//;
$comp2 =~ s/\r//;
if ($comp1 ne $comp2)
{
print "error - line $count not equal\n $file1 : $comp1\n $file2 : $comp2\n";
close(F1);
close(F2);
return 0;
}
}
close(F1);
# check if F2 has more lines than F1
if (defined($comp2 = <F2>))
{
$comp2 =~ s/\n//;
$comp2 =~ s/\r//;
print "error - extra line in $file2 : '$comp2'\n";
$res = 0;
}
close(F2);
return $res;
}
#-----------------------------------------------------------------
my $conv = new HTML::TextToHTML();
#
# FILEHANDLE
#
open (IN, "tfiles/hyphens.txt")
or die "could not open tfiles/hyphens.txt";
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>1,
inhandle=>[\*IN],
outfile=>"fh_hyphens.html",
custom_heading_regexp=>[],
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted (FH) hyphens.txt');
# compare the files
$result = compare('tfiles/good_hyphens.html', 'fh_hyphens.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('fh_hyphens.html');
}
#
# filehandle variable
#
$conv = undef;
$conv = new HTML::TextToHTML();
my $fh;
open ($fh, "tfiles/hyphens.txt")
or die "could not open tfiles/hyphens.txt";
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>1,
inhandle=>[$fh],
outfile=>"fh2_hyphens.html",
custom_heading_regexp=>[],
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted (FH 2) hyphens.txt');
# compare the files
$result = compare('tfiles/good_hyphens.html', 'fh2_hyphens.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('fh2_hyphens.html');
}
#
# Output filehandle
#
$conv = undef;
$conv = new HTML::TextToHTML();
my $ofh;
open ($ofh, ">", "fh3_hyphens.html")
or die "could not open fh3_hyphens.html for writing";
$result = $conv->txt2html(
system_link_dict=>"txt2html.dict",
default_link_dict=>"",
extract=>1,
infile=>['tfiles/hyphens.txt'],
outhandle=>$ofh,
custom_heading_regexp=>[],
#debug=>1,
#dict_debug=>15,
);
ok($result, 'converted (FH 3) hyphens.txt');
# compare the files
$result = compare('tfiles/good_hyphens.html', 'fh3_hyphens.html');
ok($result, 'test file matches original example exactly');
if ($result) {
unlink('fh3_hyphens.html');
}
package HTML::TextToHTML;
use 5.8.1;
use strict;
#------------------------------------------------------------------------
=head1 NAME
HTML::TextToHTML - convert plain text file to HTML.
=head1 VERSION
This describes version B<2.51> of HTML::TextToHTML.
=cut
our $VERSION = '2.51';
=head1 SYNOPSIS
From the command line:
txt2html I<arguments>
From Scripts:
use HTML::TextToHTML;
# create a new object
my $conv = new HTML::TextToHTML();
# convert a file
$conv->txt2html(infile=>[$text_file],
outfile=>$html_file,
title=>"Wonderful Things",
mailmode=>1,
);
# reset arguments
$conv->args(infile=>[], mailmode=>0);
# convert a string
$newstring = $conv->process_chunk($mystring);
=head1 DESCRIPTION
HTML::TextToHTML converts plain text files to HTML. The txt2html script
uses this module to do the same from the command-line.
It supports headings, tables, lists, simple character markup, and
hyperlinking, and is highly customizable. It recognizes some of the
apparent structure of the source document (mostly whitespace and
typographic layout), and attempts to mark that structure explicitly
using HTML. The purpose of this tool is to provide an easier way of
converting existing text documents to HTML format, giving something nicer
than just whapping the text into a big PRE block.
=head2 History
The original txt2html script was written by Seth Golub (see
http://www.aigeek.com/txt2html/), and converted to a perl module by
Kathryn Andersen (see http://www.katspace.com/tools/text_to_html/) and
made into a sourceforge project by Sun Tong (see
http://sourceforge.net/projects/txt2html/). Earlier versions of the
HTML::TextToHTML module called the included script texthyper so as not
to clash with the original txt2html script, but now the projects have
all been merged.
=head1 OPTIONS
All arguments can be set when the object is created, and further options
can be set when calling the actual txt2html method. Options are
passed to methods as a hash of key=>value pairs.
Note that all option-names must match exactly -- no abbreviations are
allowed. The argument-keys are expected to have values matching those
required for that argument -- whether that be a boolean, a string, a
reference to an array or a reference to a hash. These will replace any
value for that argument that might have been there before.
=over
=item append_file
append_file=>I<filename>
If you want something appended by default, put the filename here.
The appended text will not be processed at all, so make sure it's
plain text or correct HTML. i.e. do not have things like:
Mary Andersen E<lt>kitty@example.comE<gt>
but instead, have:
Mary Andersen &lt;kitty@example.com&gt;
(default: nothing)
=item append_head
append_head=>I<filename>
If you want something appended to the head by default, put the filename here.
The appended text will not be processed at all, so make sure it's
plain text or correct HTML. i.e. do not have things like:
Mary Andersen E<lt>kitty@example.comE<gt>
but instead, have:
Mary Andersen &lt;kitty@example.com&gt;
(default: nothing)
=item body_deco
body_deco=>I<string>
Body decoration string: a string to be added to the BODY tag so that
one can set attributes to the BODY (such as class, style, bgcolor, etc.).
For example, "class='withimage'".
=item bold_delimiter
bold_delimiter=>I<string>
This defines what character (or string) is taken to be the delimiter of
text which is to be interpreted as bold (that is, to be given a STRONG
tag). If this is empty, then no bolding of text will be done.
(default: #)
=item bullets
bullets=>I<string>
This defines what single characters are taken to be "bullet" characters
for unordered lists. Note that because this is used as a character
class, if you use '-' it must come first.
(default:-=o*\267)
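Because the bullets string is used as a regex character class, a hyphen anywhere but first is read as a range. The sketch below assumes the list detector builds a pattern roughly like C<qr/^\s*[$bullets]\s+\S/>; the helper name bullet_regex and the exact pattern are illustrative assumptions, not the module's code.

```perl
use strict;
use warnings;

# Hypothetical helper: treat the bullets string as the body of a regex
# character class and match a line that starts with one of those bullets.
sub bullet_regex {
    my ($bullets) = @_;
    return qr/^\s*[$bullets]\s+\S/;
}

my $good = bullet_regex('-=o*');   # leading '-' is a literal hyphen
my $bad  = bullet_regex('=-o*');   # '=-o' is now a range from '=' to 'o'

for my $line ('- item', 'A item') {
    printf "%-8s good:%s bad:%s\n", $line,
        ($line =~ $good ? 'yes' : 'no'),
        ($line =~ $bad  ? 'yes' : 'no');
}
```

With the misplaced hyphen, capital letters and many other characters between '=' and 'o' would start list items, while a real '-' bullet would no longer match.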
=item bullets_ordered
bullets_ordered=>I<string>
This defines what single characters are taken to be "bullet" placeholder
characters for ordered lists. Ordered lists are normally marked by
a number or letter followed by '.' or ')' or ']' or ':'. If an ordered
bullet is used, then it simply indicates that this is an ordered list,
without giving explicit numbers.
Note that because this is used as a character class, if you use '-' it
must come first.
(default:nothing)
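A sketch of the ordered-list rule described above. It assumes that a marker is either a run of digits or a single letter followed by '.', ')', ']' or ':', and that any bullets_ordered characters are accepted as an alternative. The helper is_ordered_item is hypothetical, and the module's actual pattern may differ.

```perl
use strict;
use warnings;

# Hypothetical sketch: an ordered-list item starts with a number or a
# single letter followed by '.', ')', ']' or ':', or with one of the
# placeholder characters given in bullets_ordered.
sub is_ordered_item {
    my ($line, $placeholders) = @_;
    my $marker = qr/(?:\d+|[A-Za-z])[.)\]:]/;
    # placeholders are used as a character class, like bullets
    $marker = qr/(?:$marker|[$placeholders])/ if length $placeholders;
    return $line =~ /^\s*$marker\s+\S/ ? 1 : 0;
}

for my $line ('1. first', 'b) second', '10] tenth', '# item', 'e.g. this') {
    printf "%-10s plain:%d with '#':%d\n", $line,
        is_ordered_item($line, ''), is_ordered_item($line, '#');
}
```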
=item caps_tag
caps_tag=>I<tag>
Tag to put around all-caps lines
(default: STRONG)
If an empty tag is given, then no tag will be put around all-caps lines.
=item custom_heading_regexp
custom_heading_regexp=>\@custom_headings
Add patterns for headings. Header levels are assigned by regexp in the
order seen in the input text. When a line matches a custom header
regexp, it is tagged as a header. If it's the first time that
particular regexp has matched, the next available header level is
associated with it and applied to the line. Any later matches of that
regexp will use the same header level. Therefore, if you want to match
numbered header lines, you could use something like this:
my @custom_headings = ('^ *\d+\. \w+',
'^ *\d+\.\d+\. \w+',
'^ *\d+\.\d+\.\d+\. \w+');
...
custom_heading_regexp=>\@custom_headings,
...
Then lines like
" 1. Examples "
" 1.1. Things"
and " 4.2.5. Cold Fusion"
would be marked as H1, H2, and H3 (assuming they were found in that
order, and that no other header styles were encountered).
If you prefer that the first one specified always be H1, the second
always be H2, the third H3, etc, then use the "explicit_headings"
option.
This expects a reference to an array of strings.
(default: none)
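The level-assignment rule can be sketched as follows. This illustrates the documented behaviour and is not the module's implementation. The helper heading_levels is hypothetical and ignores all other heading styles, so its levels always start at 1. Its third argument imitates the explicit_headings option.

```perl
use strict;
use warnings;

# Sketch of the documented level-assignment rules (not the module's code).
# Returns one level per input line; 0 means "not a custom heading".
# With $explicit true, a pattern's level is its position in the list,
# imitating explicit_headings; otherwise levels follow first appearance.
sub heading_levels {
    my ($patterns, $lines, $explicit) = @_;
    my %level_for;       # pattern index => level already assigned
    my $next_level = 1;  # next free level in order-of-appearance mode
    my @levels;
    LINE: for my $line (@$lines) {
        for my $i (0 .. $#$patterns) {
            my $re = $patterns->[$i];
            next unless $line =~ /$re/;
            $level_for{$i} = $explicit ? $i + 1 : $next_level++
                unless exists $level_for{$i};
            push @levels, $level_for{$i};
            next LINE;   # first matching pattern wins
        }
        push @levels, 0;
    }
    return @levels;
}

my @custom_headings = ('^ *\d+\. \w+',
                       '^ *\d+\.\d+\. \w+',
                       '^ *\d+\.\d+\.\d+\. \w+');
my @sample = (' 1.1. Things', ' 1. Examples ', ' 4.2.5. Cold Fusion');
print 'order seen: ', join(',', heading_levels(\@custom_headings, \@sample, 0)), "\n";
print 'explicit:   ', join(',', heading_levels(\@custom_headings, \@sample, 1)), "\n";
```

The "1.1." line is seen first, so order-of-appearance numbering makes it level 1. With the explicit flag, the levels follow the pattern list instead.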
=item default_link_dict
default_link_dict=>I<filename>
The name of the default "user" link dictionary.
(default: "$ENV{'HOME'}/.txt2html.dict" -- this is the same as for
the txt2html script. If there is no $ENV{HOME} then it is just '.txt2html.dict')
=item demoronize
demoronize=>1
Convert Microsoft-generated character codes that are non-ISO codes into
something more reasonable.
(default:true)
=item doctype
doctype=>I<doctype>
This gets put in the DOCTYPE field at the top of the document, unless it's
empty.
Default :
'-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd'
If B<xhtml> is true, the contents of this is ignored, unless it's
empty, in which case no DOCTYPE declaration is output.
=item eight_bit_clean
eight_bit_clean=>1
If false, convert Latin-1 characters to HTML entities.
If true, this conversion is disabled; also "demoronize" is set to
false, since this also changes 8-bit characters.
(default: false)
=item escape_HTML_chars
escape_HTML_chars=>1
turn &, E<lt> and E<gt> into &amp;, &lt; and &gt;
(default: true)
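A minimal sketch of this escaping, using a hypothetical helper name. The ordering is the important design choice: '&' has to be replaced first, otherwise the '&' inside the freshly inserted &lt; and &gt; would be escaped again.

```perl
use strict;
use warnings;

# Minimal sketch of the escaping described above (helper name is
# hypothetical). '&' is replaced first so the '&' inside the newly
# created &lt; and &gt; entities is not escaped a second time.
sub escape_html_chars {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

print escape_html_chars('if (a < b && b > c)'), "\n";
```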
=item explicit_headings
explicit_headings=>1
Don't try to find any headings except the ones specified in the
--custom_heading_regexp option.
Also, the custom headings will not be assigned levels in the order they
are encountered in the document, but in the order they are specified on
the custom_heading_regexp option.
(default: false)
=item extract
extract=>1
Extract Mode; don't put HTML headers or footers on the result, just
the plain HTML (thus making the result suitable for inserting into
another document, or as part of the output of a CGI script).
(default: false)
=item hrule_min
hrule_min=>I<n>
Min number of ---s for an HRule.
(default: 4)
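As a rough illustration, assume a rule is a line consisting only of dashes (the module may accept other rule characters too). Under that assumption, the hrule_min check could look like this; is_hrule is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical sketch: a line made only of at least $hrule_min dashes
# (optionally surrounded by whitespace) is taken as a horizontal rule.
sub is_hrule {
    my ($line, $hrule_min) = @_;
    return $line =~ /^\s*-{$hrule_min,}\s*$/ ? 1 : 0;
}

print is_hrule('----', 4), is_hrule('---', 4), is_hrule('  ------  ', 4), "\n";
```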
=item indent_width
indent_width=>I<n>
Indents this many spaces for each level of a list.
(default: 2)
=item indent_par_break
indent_par_break=>1
Treat paragraphs marked solely by indents as breaks with indents.
That is, instead of taking a three-space indent as a new paragraph,
put in a line break (BR tag) and three non-breaking spaces instead.
(see also --preserve_indent)
(default: false)
=item infile
infile=>\@my_files
infile=>['chapter1.txt', 'chapter2.txt']
The name of the input file(s).
This expects a reference to an array of filenames.
The special filename '-' designates STDIN.
See also L</inhandle> and L</instring>.
(default:-)
=item inhandle
inhandle=>\@my_handles
inhandle=>[\*MYINHANDLE, \*STDIN]
An array of input filehandles; use this instead of
L</infile> or L</instring> to use a filehandle or filehandles
as input.
=item instring
instring=>\@my_strings
instring=>[$string1, $string2]
An array of input strings; use this instead of
L</infile> or L</inhandle> to use a string or strings
as input.
=item italic_delimiter
italic_delimiter=>I<string>
This defines what character (or string) is taken to be the delimiter of
text which is to be interpreted as italic (that is, to be given an EM
tag). If this is empty, no italicising of text will be done.
(default: *)
=item underline_delimiter
underline_delimiter=>I<string>
This defines what character (or string) is taken to be the delimiter of
text which is to be interpreted as underlined (that is, to be given a U
tag). If this is empty, no underlining of text will be done.
(default: _)
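The bold_delimiter, italic_delimiter and underline_delimiter options all apply to text enclosed by a pair of delimiters. Here is a hedged sketch of that idea. The helper apply_delimiter and its word-boundary rules are assumptions for illustration, and the module's real rules may handle more edge cases.

```perl
use strict;
use warnings;

# Hypothetical sketch: wrap text found between two copies of a delimiter
# in the given tag. The delimiter must not be glued to a word character
# on the outside, so snake_case_names are left alone.
sub apply_delimiter {
    my ($text, $delim, $tag) = @_;
    return $text unless length $delim;   # empty delimiter: markup disabled
    my $d = quotemeta $delim;
    # (\S(?:.*?\S)??) : shortest span that starts and ends with a
    # non-space character
    $text =~ s{(?<!\w)$d(\S(?:.*?\S)??)$d(?!\w)}{<$tag>$1</$tag>}g;
    return $text;
}

my $line = 'an *important* point, #really# and _underlined_';
$line = apply_delimiter($line, '#', 'strong');
$line = apply_delimiter($line, '*', 'em');
$line = apply_delimiter($line, '_', 'u');
print "$line\n";
```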
=item links_dictionaries
links_dictionaries=>\@my_link_dicts
links_dictionaries=>['url_links.dict', 'format_links.dict']
File(s) to use as a link-dictionary. There can be more than one of
these. These are in addition to the Global Link Dictionary and the User
Link Dictionary. This expects a reference to an array of filenames.
=item link_only
link_only=>1
Do no escaping or marking up at all, except for processing the links
dictionary file and applying it. This is useful if you want to use
the linking feature on an HTML document. If the HTML is a
complete document (includes HTML,HEAD,BODY tags, etc) then you'll
probably want to use the --extract option also.
(default: false)
=item lower_case_tags
lower_case_tags=>1
Force all tags to be in lower-case.
=item mailmode
mailmode=>1
Deal with mail headers & quoted text. The mail header paragraph is
given the class 'mail_header', and mail-quoted text is given the class
'quote_mail'.
(default: false)
=item make_anchors
make_anchors=>0
Should we try to make anchors in headings?
(default: true)
=item make_links
make_links=>0
Should we try to build links? If this is false, then the links
dictionaries are not consulted and only structural text-to-HTML
conversion is done. (default: true)
=item make_tables
make_tables=>1
Should we try to build tables? If true, spots tables and marks them up
appropriately. See L for information on how tables
should be formatted.
This overrides the detection of lists; if something looks like a table,
it is taken as a table, and list-checking is not done for that
paragraph.
(default: false)
=item min_caps_length
min_caps_length=>I<n>
min sequential CAPS for an all-caps line
(default: 3)
=item outfile
outfile=>I<filename>
The name of the output file. If it is "-" then the output goes
to Standard Output.
(default: - )
=item outhandle
The output filehandle; if this is given then the output goes
to this filehandle instead of to the file given in L</outfile>.
=item par_indent
par_indent=>I<n>
Minimum number of spaces indented in first lines of paragraphs.
Only used when there's no blank line
preceding the new paragraph.
(default: 2)
=item preformat_trigger_lines
preformat_trigger_lines=>I<n>
How many lines of preformatted-looking text are needed to switch to preformatted (PRE) mode:
<= 0 : Preformat entire document
1 : one line triggers
>= 2 : two lines trigger
(default: 2)
=item endpreformat_trigger_lines
endpreformat_trigger_lines=>I<n>
How many lines of unpreformatted-looking text are needed to switch out of preformatted (PRE) mode:
<= 0 : Never preformat within document
1 : one line triggers
>= 2 : two lines trigger
(default: 2)
NOTE for preformat_trigger_lines and endpreformat_trigger_lines:
A zero takes precedence. If one is zero, the other is ignored.
If both are zero, entire document is preformatted.
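The precedence rules in this note can be summarised as a small decision function. This sketches the documented behaviour only; preformat_policy and its return strings are made up for illustration.

```perl
use strict;
use warnings;

# Sketch of the precedence rules in the NOTE above (helper name and
# return strings are hypothetical). A value of zero or less wins, and
# preformat_trigger_lines is checked first, so two zeros mean
# "preformat the whole document".
sub preformat_policy {
    my ($trigger, $end_trigger) = @_;
    return 'whole document' if $trigger     <= 0;
    return 'never'          if $end_trigger <= 0;
    # 1 means one line triggers; 2 or more means two lines trigger
    my $lines = sub { $_[0] >= 2 ? 2 : 1 };
    return sprintf 'start after %d, end after %d',
        $lines->($trigger), $lines->($end_trigger);
}

print preformat_policy(2, 2), "\n";
```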
=item preformat_start_marker
preformat_start_marker=>I<regexp>
What flags the start of a preformatted section if --use_preformat_marker
is true.
(default: "^(:?(:?&lt;)|<)PRE(:?(:?&gt;)|>)\$")
=item preformat_end_marker
preformat_end_marker=>I<regexp>
What flags the end of a preformatted section if --use_preformat_marker
is true.
(default: "^(:?(:?&lt;)|<)/PRE(:?(:?&gt;)|>)\$")
=item preformat_whitespace_min
preformat_whitespace_min=>I<n>
Minimum number of consecutive whitespace characters to trigger
normal preformatting.
NOTE: Tabs are expanded to spaces before this check is made.
That means if B<tab_width> is 8 and this is 5, then one tab may be
expanded to 8 spaces, which is enough to trigger preformatting.
(default: 5)
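To see why one tab can be enough, here is a sketch that expands tabs to the next tab stop and then looks for the whitespace run. The helpers expand_tabs and looks_preformatted are hypothetical. The tab-expansion loop is the common Perl idiom, not code taken from the module.

```perl
use strict;
use warnings;

# Sketch: expand tabs to the next multiple of $tab_width, then look for a
# run of at least $min whitespace characters. Helper names are hypothetical.
sub expand_tabs {
    my ($line, $tab_width) = @_;
    # replace the first run of tabs, accounting for the text before it
    1 while $line =~ s{^([^\t]*)(\t+)}
        {$1 . (' ' x (length($2) * $tab_width - length($1) % $tab_width))}e;
    return $line;
}

sub looks_preformatted {
    my ($line, $tab_width, $min) = @_;
    return expand_tabs($line, $tab_width) =~ /\s{$min}/ ? 1 : 0;
}

print looks_preformatted("a\tb", 8, 5), looks_preformatted("abcd\tb", 8, 5), "\n";
```

A tab after one character pads to column 8 and produces 7 spaces, which meets a minimum of 5. The same tab after four characters produces only 4.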
=item prepend_file
prepend_file=>I<filename>
If you want something prepended to the processed body text, put the
filename here. The prepended text will not be processed at all, so make
sure it's plain text or correct HTML.
(default: nothing)
=item preserve_indent
preserve_indent=>1
Preserve the first-line indentation of paragraphs marked with indents
by replacing the spaces of the first line with non-breaking spaces.
(default: false)
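A minimal sketch of what preserve_indent does to the first line of such a paragraph, assuming one &nbsp; per leading space; preserve_first_indent is a hypothetical helper.

```perl
use strict;
use warnings;

# Minimal sketch of preserve_indent as described above (hypothetical
# helper): each leading space of a paragraph's first line becomes &nbsp;
# so the browser does not collapse the indentation.
sub preserve_first_indent {
    my ($first_line) = @_;
    $first_line =~ s/^( +)/'&nbsp;' x length($1)/e;
    return $first_line;
}

print preserve_first_indent('   Once upon a time'), "\n";
```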
=item short_line_length
short_line_length=>I<n>
Lines this short (or shorter) must be intentionally broken and are kept
that short.
(default: 40)
=item style_url
style_url=>I<url>
This gives the URL of a stylesheet; a LINK tag will be added to the
output.
=item tab_width
tab_width=>I<n>
How many spaces equal a tab?
(default: 8)
=item table_type
table_type=>{ ALIGN=>0, PGSQL=>0, BORDER=>1, DELIM=>0 }
This determines which types of tables will be recognised when "make_tables"
is true. The possible types are ALIGN, PGSQL, BORDER and DELIM.
(default: all types are true)
=item title
title=>I<title>
You can specify a title. Otherwise it will use a blank one.
(default: nothing)
=item titlefirst
titlefirst=>1
Use the first non-blank line as the title. (See also "title")
=item underline_length_tolerance
underline_length_tolerance=>I<n>
How much longer or shorter can underlines be and still be underlines?
(default: 1)
=item underline_offset_tolerance
underline_offset_tolerance=>I<n>
How far offset can underlines be and still be underlines?
(default: 1)
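The two tolerances can be read as checks on the underline's length and on its left offset, relative to the text above it. The sketch below is an interpretation of these two options, not the module's code. The helper is_underline is hypothetical, and the accepted underline characters are taken from the list under use_mosaic_header.

```perl
use strict;
use warnings;

# Hypothetical sketch of the two tolerances: an underline (a run of one
# repeated punctuation character) counts if its length and its left
# offset are each within the given tolerance of the text line above it.
sub is_underline {
    my ($text, $under, $len_tol, $off_tol) = @_;
    return 0 unless $under =~ /^(\s*)([-=*+~.])\2*\s*$/;
    my $under_offset = length $1;
    (my $rule = $under) =~ s/^\s+|\s+$//g;
    $text =~ /^(\s*)(.*?)\s*$/;
    my ($text_offset, $text_len) = (length $1, length $2);
    return (abs(length($rule) - $text_len) <= $len_tol
            && abs($under_offset - $text_offset) <= $off_tol) ? 1 : 0;
}

print is_underline('Heading', '========', 1, 1), "\n";
```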
=item unhyphenation
unhyphenation=>0
Enables unhyphenation of text.
(default: true)
=item use_mosaic_header
use_mosaic_header=>1
Use this option if you want to force the heading styles to match what Mosaic
outputs. (Underlined with "***"s is H1,
with "==="s is H2, with "+++" is H3, with "---" is H4, with "~~~" is H5
and with "..." is H6)
This was the behavior of txt2html up to version 1.10.
(default: false)
=item use_preformat_marker
use_preformat_marker=>1
Turn on preformatting when encountering "E<lt>PREE<gt>" on a line by itself, and turn
it off when there's a line containing only "E<lt>/PREE<gt>".
When such preformatted text is detected, the PRE tag will be given the
class 'quote_explicit'.
(default: off)
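The on/off switching can be sketched as a two-state scan over the input lines. The marker patterns here are simplified assumptions that accept either a literal <PRE> or its escaped form &lt;PRE&gt;. The real patterns come from the preformat_start_marker and preformat_end_marker options. mark_explicit_pre is hypothetical, and it does not HTML-escape the preformatted lines.

```perl
use strict;
use warnings;

# Simplified marker patterns (an assumption; the module's defaults are
# configurable via preformat_start_marker / preformat_end_marker).
my $start_re = qr/^(?:&lt;|<)PRE(?:&gt;|>)$/;
my $end_re   = qr/^(?:&lt;|<)\/PRE(?:&gt;|>)$/;

# Hypothetical helper: wrap explicitly marked regions in a PRE tag with
# class "quote_explicit"; all other lines pass through unchanged.
sub mark_explicit_pre {
    my @out;
    my $in_pre = 0;
    for my $line (@_) {
        if (!$in_pre && $line =~ $start_re) {
            push @out, '<pre class="quote_explicit">';
            $in_pre = 1;
        }
        elsif ($in_pre && $line =~ $end_re) {
            push @out, '</pre>';
            $in_pre = 0;
        }
        else {
            push @out, $line;
        }
    }
    push @out, '</pre>' if $in_pre;    # close an unterminated region
    return @out;
}

print "$_\n" for mark_explicit_pre("intro", "<PRE>", "  a  b", "</PRE>", "outro");
```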
=item xhtml
xhtml=>1
Try to make the output conform to the XHTML standard, including
closing all open tags and marking empty tags correctly. This
turns on lower_case_tags and overrides the doctype option.
Note that if you add a header or a footer file, it is up to you
to make it conform; the header/footer isn't touched by this.
Likewise, if you make link-dictionary entries that break XHTML,
then this won't fix them, except to the degree of putting all tags
into lower-case.
(default: true)
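The tag-level effect of this option can be sketched with a small renderer. The tag-kind constants mirror TAG_START, TAG_END and TAG_EMPTY from the module source. render_tag is hypothetical. The sketch covers only lower-casing and empty-tag marking, not the closing of open tags.

```perl
use strict;
use warnings;

use constant {
    TAG_START => 1,
    TAG_END   => 2,
    TAG_EMPTY => 3,
};

# Hypothetical helper: render one tag, honouring xhtml (which implies
# lower-case names) and lower_case_tags as described above.
sub render_tag {
    my ($name, $kind, %opt) = @_;
    $name = lc $name if $opt{xhtml} || $opt{lower_case_tags};
    return "</$name>" if $kind == TAG_END;
    return $opt{xhtml} ? "<$name />" : "<$name>" if $kind == TAG_EMPTY;
    return "<$name>";
}

print render_tag('BR', TAG_EMPTY, xhtml => 1), "\n";
print render_tag('BR', TAG_EMPTY), "\n";
```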
=back
=head1 DEBUGGING
There are global variables for setting types and levels
of debugging. These should only be used by developers.
=over
=item $HTML::TextToHTML::Debug
$HTML::TextToHTML::Debug = 1;
Enable copious debugging output.
(default: false)
=item $HTML::TextToHTML::DictDebug
$HTML::TextToHTML::DictDebug = I;
Debug mode for link dictionaries. Bitwise-OR together the values for what you want to see:
1: The parsing of the dictionary
2: The code that will make the links
4: When each rule matches something
8: When each tag is created
(default: 0)
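This example shows how the bit values combine. Setting the variable to 1 | 4 (that is, 5) enables output for dictionary parsing and for rule matches only. The constant names are hypothetical labels for the documented bits. The module itself defines no such names.

```perl
use strict;
use warnings;

# Hypothetical names for the documented DictDebug bits.
use constant {
    DICT_PARSE => 1,   # the parsing of the dictionary
    DICT_CODE  => 2,   # the code that will make the links
    DICT_MATCH => 4,   # when each rule matches something
    DICT_TAG   => 8,   # when each tag is created
};

# Show dictionary parsing and rule matches only.
$HTML::TextToHTML::DictDebug = DICT_PARSE | DICT_MATCH;

# Each bit can be tested independently with a bitwise AND.
for my $bit (DICT_PARSE, DICT_CODE, DICT_MATCH, DICT_TAG) {
    printf "%d: %s\n", $bit, ($HTML::TextToHTML::DictDebug & $bit) ? 'on' : 'off';
}
```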
=back
=cut
our $Debug = 0;
our $DictDebug = 0;
=head1 METHODS
=cut
#------------------------------------------------------------------------
use YAML::Syck;
our $PROG = 'HTML::TextToHTML';
#------------------------------------------------------------------------
########################################
# Definitions (Don't change these)
#
# These are just constants I use for making bit vectors to keep track
# of what modes I'm in and what actions I've taken on the current and
# previous lines.
our $NONE = 0;
our $LIST = 1;
our $HRULE = 2;
our $PAR = 4;
our $PRE = 8;
our $END = 16;
our $BREAK = 32;
our $HEADER = 64;
our $MAILHEADER = 128;
our $MAILQUOTE = 256;
our $CAPS = 512;
our $LINK = 1024;
our $PRE_EXPLICIT = 2048;
our $TABLE = 4096;
our $IND_BREAK = 8192;
our $LIST_START = 16384;
our $LIST_ITEM = 32768;
# Constants for Link-processing
# bit-vectors for what to do with a particular link-dictionary entry
our $LINK_NOCASE = 1;
our $LINK_EVAL = 2;
our $LINK_HTML = 4;
our $LINK_ONCE = 8;
our $LINK_SECT_ONCE = 16;
# Constants for Ordered Lists and Unordered Lists.
# And Definition Lists.
# I use this in the list stack to keep track of what's what.
our $OL = 1;
our $UL = 2;
our $DL = 3;
# Constants for table types
our $TAB_ALIGN = 1;
our $TAB_PGSQL = 2;
our $TAB_BORDER = 3;
our $TAB_DELIM = 4;
# Constants for tags
use constant {
TAG_START => 1,
TAG_END => 2,
TAG_EMPTY => 3,
};
# Character entity names
# characters to replace with entities
our %char_entities = (
"\241", "&iexcl;", "\242", "&cent;", "\243", "&pound;",
"\244", "&curren;", "\245", "&yen;", "\246", "&brvbar;",
"\247", "&sect;", "\250", "&uml;", "\251", "&copy;",
"\252", "&ordf;", "\253", "&laquo;", "\254", "&not;",
"\255", "&shy;", "\256", "&reg;", "\257", "&hibar;",
"\260", "&deg;", "\261", "&plusmn;", "\262", "&sup2;",
"\263", "&sup3;", "\264", "&acute;", "\265", "&micro;",
"\266", "&para;", "\270", "&cedil;", "\271", "&sup1;",
"\272", "&ordm;", "\273", "&raquo;", "\274", "&frac14;",
"\275", "&frac12;", "\276", "&frac34;", "\277", "&iquest;",
"\300", "&Agrave;", "\301", "&Aacute;", "\302", "&Acirc;",
"\303", "&Atilde;", "\304", "&Auml;", "\305", "&Aring;",
"\306", "&AElig;", "\307", "&Ccedil;", "\310", "&Egrave;",
"\311", "&Eacute;", "\312", "&Ecirc;", "\313", "&Euml;",
"\314", "&Igrave;", "\315", "&Iacute;", "\316", "&Icirc;",
"\317", "&Iuml;", "\320", "&ETH;", "\321", "&Ntilde;",
"\322", "&Ograve;", "\323", "&Oacute;", "\324", "&Ocirc;",
"\325", "&Otilde;", "\326", "&Ouml;", "\327", "&times;",
"\330", "&Oslash;", "\331", "&Ugrave;", "\332", "&Uacute;",
"\333", "&Ucirc;", "\334", "&Uuml;", "\335", "&Yacute;",
"\336", "&THORN;", "\337", "&szlig;", "\340", "&agrave;",
"\341", "&aacute;", "\342", "&acirc;", "\343", "&atilde;",
"\344", "&auml;", "\345", "&aring;", "\346", "&aelig;",
"\347", "&ccedil;", "\350", "&egrave;", "\351", "&eacute;",
"\352", "&ecirc;", "\353", "&euml;", "\354", "&igrave;",
"\355", "&iacute;", "\356", "&icirc;", "\357", "&iuml;",
"\360", "&eth;", "\361", "&ntilde;", "\362", "&ograve;",
"\363", "&oacute;", "\364", "&ocirc;", "\365", "&otilde;",
"\366", "&ouml;", "\367", "&divide;", "\370", "&oslash;",
"\371", "&ugrave;", "\372", "&uacute;", "\373", "&ucirc;",
"\374", "&uuml;", "\375", "&yacute;", "\376", "&thorn;",
"\377", "&yuml;", "\267", "&middot;",
);
# alignments for tables
our @alignments = ('', '', ' ALIGN="RIGHT"', ' ALIGN="CENTER"');
our @lc_alignments = ('', '', ' align="right"', ' align="center"');
our @xhtml_alignments =
('', '', ' style="text-align: right;"', ' style="text-align: center;"');
#---------------------------------------------------------------#
# Object interface
#---------------------------------------------------------------#
=head2 new
$conv = new HTML::TextToHTML()
$conv = new HTML::TextToHTML(titlefirst=>1,
...
);
Create a new object with new. If arguments are given, these arguments
will be used in invocations of other methods.
See L