Document Properties: Links
The Links properties page allows you to specify how
to handle hyperlinks.
Maximum Link Depth
You tell iSiloX how far to follow hyperlinks by specifying
a value for the Maximum link depth. A new installation
of iSiloX has this value initialized to a default of one.
The root source files are considered to be at a depth of zero.
Files to which they link are at a depth of one. Files to which
those files link are at a depth of two, and so on.
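The depth rule above can be sketched as a breadth-first walk. This is only an illustration of the rule as described, not iSiloX's actual implementation; the `links` mapping, the function name, and the sample file names are hypothetical.

```python
from collections import deque

def collect_files(links, roots, max_depth):
    """Return {file: depth} for every file within max_depth of the roots.

    `links` is a hypothetical mapping from a file to the files it links to.
    """
    depth_of = {root: 0 for root in roots}  # root source files are at depth zero
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        if depth_of[current] == max_depth:
            continue  # links from this file would exceed the maximum depth
        for target in links.get(current, []):
            if target not in depth_of:  # first visit is the shallowest depth
                depth_of[target] = depth_of[current] + 1
                queue.append(target)
    return depth_of

# Hypothetical site: index.htm links to a.htm and b.htm, and so on.
site = {
    "index.htm": ["a.htm", "b.htm"],
    "a.htm": ["c.htm"],
    "c.htm": ["d.htm"],
}
print(collect_files(site, ["index.htm"], 1))
```

With a maximum link depth of one, only index.htm, a.htm, and b.htm are collected; raising the depth to two also brings in c.htm.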
Recommendation
If you are creating a document based on a Web site, we
recommend leaving the Maximum link depth value
at one because each additional increment in depth beyond
one will likely cause an exponential increase in the size
of the document. For example, at a link depth of one, if
the converted document is one megabyte in size, at a link
depth of two, it might be ten megabytes, and at a link
depth of three, it could be 100 megabytes.
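The arithmetic behind this example can be written out as a rough estimate. The tenfold growth factor is simply the one implied by the figures above and is an assumption; real sites vary widely, and the function name is hypothetical.

```python
def estimated_size_mb(depth, size_at_depth_one_mb=1.0, growth_factor=10.0):
    """Rough size estimate, assuming each extra level multiplies the size."""
    if depth < 1:
        raise ValueError("depth must be at least one for this estimate")
    return size_at_depth_one_mb * growth_factor ** (depth - 1)

for depth in (1, 2, 3):
    print(depth, estimated_size_mb(depth))
```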
Off-site Links
An off-site link is defined as a link to a target in a different
domain. iSiloX treats all file paths as belonging to the same
domain. For URLs, iSiloX treats the domain as the protocol
(e.g., http://) and the hostname. To tell iSiloX to not
follow links to targets in different domains, uncheck the
Follow off-site links checkbox. This is useful to limit
the amount of irrelevant content brought into the document.
iSiloX performs the off-site link check anew for each root
source file. This means that you can have root source
files in different domains. For example, you can have two
root source files, one with the URL <http://www.iSilo.com> and
another with the URL <http://www.palm.com>. Assuming that
you have unchecked the option to follow off-site links, then
when iSiloX converts the content at <http://www.iSilo.com>,
it only follows links from there with target URLs that begin with
<http://www.iSilo.com>. When iSiloX converts the content
at <http://www.palm.com>, it only follows links from there
with target URLs that begin with <http://www.palm.com>.
If the content at <http://www.palm.com> has a link to
<http://www.iSilo.com/whatsnew.htm>, iSiloX does not follow
that link.
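As a minimal sketch of the off-site check described above, the following compares the protocol and hostname of a link target against those of its root source file. The function names are hypothetical, and comparing hostnames without regard to case is an assumption rather than documented iSiloX behavior.

```python
from urllib.parse import urlsplit

def domain_of(location):
    """Return (protocol, hostname) for a URL, or None for a file path."""
    parts = urlsplit(location)
    if parts.scheme and parts.netloc:
        # urlsplit lowercases both the scheme and the hostname.
        return (parts.scheme, parts.hostname)
    return None  # all file paths are treated as one shared domain

def is_off_site(root_source, target):
    """True if the target is in a different domain than its root source."""
    return domain_of(root_source) != domain_of(target)

print(is_off_site("http://www.palm.com", "http://www.iSilo.com/whatsnew.htm"))
print(is_off_site("http://www.iSilo.com", "http://www.iSilo.com/support/index.htm"))
```

Because the check is made against the root source file that led to the link, the same target can be on-site for one root and off-site for another.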
Maximum off-site link depth
When you enable the option to follow off-site links, you can also
specify how far to follow hyperlinks that go off-site. A value
of zero is equivalent to unchecking the option to follow off-site
links. The depth is relative to the source file containing the off-site
link, rather than relative to the root source files.
If you uncheck the Follow off-site links option,
then the Maximum off-site link depth setting has no effect.
Note that the value for the Maximum link depth setting
still limits the total link depth.
So if the maximum link depth is set to two and the maximum off-site
link depth is set to one, and there is an off-site link from a source
file at depth two, that link is not followed, although it is at a depth
of one relative to the source file with that off-site link.
The maximum off-site link depth option is useful in the case where
you specify a maximum depth value greater than one in order to include
more content from a given site but want to allow links to off-site
articles.
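The interaction of the two depth limits can be sketched as a single decision function. This is an illustration under stated assumptions, not the converter's actual logic: the parameter names are hypothetical, and it assumes that once a chain of links has left the site, every further hop counts toward the off-site depth.

```python
def should_follow(source_depth, hops_off_site, link_is_off_site,
                  max_depth, follow_off_site, max_off_site_depth):
    """Decide whether to follow one link.

    source_depth   depth of the file containing the link, relative to the roots
    hops_off_site  0 while still on-site, otherwise the number of off-site hops
                   that led to the file containing the link (assumed bookkeeping)
    """
    if source_depth + 1 > max_depth:
        return False  # the Maximum link depth always limits the total depth
    if hops_off_site == 0 and not link_is_off_site:
        return True   # ordinary on-site link within the depth limit
    if not follow_off_site:
        return False  # the off-site depth setting has no effect when unchecked
    return hops_off_site + 1 <= max_off_site_depth

# Maximum link depth 2, maximum off-site link depth 1:
print(should_follow(2, 0, True, 2, True, 1))  # off-site link from a depth-two file
print(should_follow(1, 0, True, 2, True, 1))  # off-site link from a depth-one file
```

The first call corresponds to the case described above: the link is only one step off-site, but following it would exceed the total depth of two.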
Following Only Sub-Folder Links
Many websites are structured hierarchically within
folders and sub-folders. The URLs that reference the pages
of such a site are usually organized the same way,
with slashes separating the different
levels of folders. For example, the iSiloX.com website has
all support pages within a folder named "support". Within the
support folder, there are sub-folders for different categories
of support, such as a sub-folder named "manual" where the
manuals are located. However, such sub-folder pages may also
have links to pages outside of the folder. To limit
followed links to the folders and sub-folders of the
root source pages, check the Follow only links that are
sub-folders of the root source paths checkbox.
iSiloX then follows only links whose URLs match one of the
root source URLs up to its last slash.
As an example, if you wanted to get all the support pages
from the iSiloX.com website, you might specify
http://www.iSiloX.com/support/index.htm as the root source
URL and check Follow only links that are sub-folders of the
root source paths. The page http://www.iSiloX.com/support/index.htm
has a reference to the home page of the site http://www.iSiloX.com.
However, because you checked the Follow only links that are
sub-folders of the root source paths option, that link
is not followed. In contrast, a link such as
http://www.iSiloX.com/support/faq.htm to the frequently asked
questions page is followed.
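The "match up to the last slash" rule can be sketched as a simple prefix test. The function names are hypothetical, and the comparison here is case-sensitive, which is an assumption since the text does not say how case is handled.

```python
def folder_prefix(root_url):
    """Return the root source URL up to and including its last slash."""
    return root_url[: root_url.rfind("/") + 1]

def is_under_roots(target_url, root_urls):
    """True if the target matches any root URL up to that root's last slash."""
    return any(target_url.startswith(folder_prefix(root)) for root in root_urls)

roots = ["http://www.iSiloX.com/support/index.htm"]
print(folder_prefix(roots[0]))
print(is_under_roots("http://www.iSiloX.com/support/faq.htm", roots))
print(is_under_roots("http://www.iSiloX.com", roots))
```

Note that in this sketch a root URL with no path, such as one ending at the hostname, has its last slash inside "http://", so the prefix would match every http URL.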
Unresolved Link Detail
In most cases, since you can tell iSiloX to only follow links
up to a given maximum depth and to not follow off-site links,
you end up with a document that has hyperlinks to content
not brought into the document. These hyperlinks are referred
to as unresolved links. You can choose whether to include
the target URLs of these unresolved links in the document or not
by checking or unchecking the Include unresolved link detail
checkbox. Furthermore, with the Generate web links for unresolved
links option, you can have the unresolved links converted into
workable links that open a web browser or mail program for unresolved
http: URLs and mailto: URLs.
Including unresolved link detail
If you choose to include the unresolved link detail, iSiloX
creates a document with an additional page at the end that
lists the URLs of all unresolved links. iSiloX sets the
target of each unresolved link in the document to jump
to its corresponding target URL on this last page. This is
useful for later reference and for finding broken hyperlinks.
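One way to picture the extra page is as an ordered list of distinct unresolved URLs, with each unresolved link pointing at its entry. The sketch below is an assumption about how such a page could be assembled; the function name and data shapes are hypothetical and do not describe the document file format.

```python
def build_unresolved_page(link_targets, included):
    """Return (entries, position) for the unresolved link detail page.

    link_targets  target URLs of all hyperlinks, in document order
    included      set of targets whose content was brought into the document
    """
    entries = []   # URLs listed on the last page, in first-seen order
    position = {}  # URL -> index of its entry on the last page
    for url in link_targets:
        if url in included or url in position:
            continue  # resolved, or already listed
        position[url] = len(entries)
        entries.append(url)
    return entries, position

links = ["a.htm", "http://x.org/", "a.htm", "mailto:me@x.org", "http://x.org/"]
entries, position = build_unresolved_page(links, {"a.htm"})
print(entries)
```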
Not including unresolved link detail
If you choose not to include the unresolved link detail,
the unresolved hyperlinks essentially have no target. When
you view the document in a reader and try to follow
such a hyperlink, the reader tells you that the
hyperlink is unresolved but gives no indication of the
target URL.
Generating web links
If you check the Generate web links for unresolved links
option, iSiloX generates links for unresolved
http: URLs and mailto: URLs. When activated in a
reader that supports web links, these links open the target
in a web browser or mail program, respectively.
Common sources of unresolved links
The most common sources of unresolved links are the following:
Links that are at a depth greater than that specified in the
Maximum link depth setting.
Links that are outdated and thus are broken because the target
has moved.
Links whose targets are specified incorrectly.
URL Filters
Click URL Filters to access the URL Filters dialog,
where you specify patterns for excluding images and for not
following links, based on the image URL or link target URL. URL filters
are useful for excluding unwanted images and content and
for reducing document sizes. A filter is specified using
either a wildcard or regular expression pattern matching string.
If the URL of an image matches against one of the exclusion
patterns, it is not included in the document. If the target URL
of a link matches against one of the exclusion patterns, the
link is not followed and hence the target content
is not included in the document. Exceptions to exclusions can
be specified using inclusion filters.
Adding an exclusion filter
Click Add Exclusion Filter to access the dialog for
specifying a new exclusion filter. In the URL Filter dialog, select
a pattern type of either Wildcard or Regular Expression:
- Wildcard: A wildcard pattern is an easy way
to specify simple patterns. In such a pattern, the character '*'
matches zero or more of any mix of characters and the character '?'
matches exactly one of any character. A URL matches against a wildcard
pattern if the pattern appears anywhere in the URL.
- Regular Expression: Regular expression patterns use a powerful
pattern matching language. This implementation uses the PCRE
(Perl Compatible Regular Expressions) library, version 3.9.
For more information about PCRE and the syntax for regular expressions,
you can consult the PCRE website.
In particular, follow the link there labeled "PCRE man page" and then
go to the section with the heading "REGULAR EXPRESSION DETAILS".
In the Pattern field, enter the pattern to use.
Check Case-sensitive to perform a case-sensitive
match. By default, matching is case-insensitive, with the lowercase
letters 'a' through 'z' matching the uppercase letters 'A' through 'Z'.
Deleting an exclusion filter
Select one or more exclusion filters, then click Delete Selected
Exclusion Filters to delete them. You will be asked for
confirmation before the filters are deleted.
Modifying an exclusion filter
Double-click an exclusion filter to modify it.
Inclusion filters
An inclusion filter serves as an exception to the
exclusion filters. If a given URL matches against an exclusion
filter, the inclusion filters are applied to the URL, and if there
is a match against an inclusion filter, the URL is not excluded.
Click Add Inclusion Filter to access the dialog
for specifying a new inclusion filter. To delete one or more
inclusion filters, select them, then click Delete Selected
Inclusion Filters. To modify an inclusion filter, double-click it.
Example
This example specifies two exclusion filters and one inclusion filter.
- First exclusion filter: A regular expression pattern of
table[1-9].jpg
- Second exclusion filter: A wildcard pattern of
figures*plant?blue
- Inclusion filter: A case-sensitive wildcard pattern of
Table8
The first exclusion filter specifies a regular expression pattern
that is case-insensitive. The pattern matches the text "table" followed
by any digit character from '1' through '9', then any single character
(an unescaped '.' in a regular expression matches any character, including
a period), and then the text "jpg". So the pattern will match against any of the following:
- Table1.jpg
- http://www.acme.org/table3.jpg
- c:\My Documents\table5.jpg
- /home/acme/docs/TABLE9.jpg
But the pattern will not match against any of the following:
- Table1.gif
- http://www.acme.org/table0.jpg
- c:\My Documents\table5
- /home/acme/docs/tables.htm
The second exclusion filter is a wildcard pattern and is also
case-insensitive. The pattern matches
the text "figures" followed by zero or more of any mix
of characters, followed by the text "plant", followed by
any single character, and finally followed by the text "blue".
The pattern will thus match against any of the following:
- http://blueflowers.com/figures/plantablue.htm
- http://blueflowers.com/figuresplant1blue.htm
But the pattern will not match against any of the following:
- http://blueflowers.com/figures/plantabblue.htm
- http://blueflowers.com/figuresplantblue.htm
The first exclusion filter would exclude the URL
"http://www.acme.org/Table8.jpg".
However, because the inclusion filter notes it as an exception,
the URL would actually not be excluded. Note that this inclusion pattern
specifies a case-sensitive match, and so "http://www.acme.org/table8.jpg"
would not be noted as an exception.
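The example above can be reproduced with a short sketch. It uses Python's `re` module as a stand-in for PCRE 3.9, so syntax details may differ for more advanced patterns; the helper names are hypothetical, and the wildcard translation simply follows the rules stated earlier ('*' for zero or more characters, '?' for exactly one, match anywhere in the URL).

```python
import re

def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into an equivalent regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # zero or more of any characters
        elif ch == "?":
            parts.append(".")    # exactly one of any character
        else:
            parts.append(re.escape(ch))
    return "".join(parts)

def make_filter(pattern, kind, case_sensitive=False):
    """Compile a filter; kind is "wildcard" or "regex"."""
    source = pattern if kind == "regex" else wildcard_to_regex(pattern)
    return re.compile(source, 0 if case_sensitive else re.IGNORECASE)

def is_excluded(url, exclusions, inclusions):
    """Excluded if any exclusion matches and no inclusion matches."""
    if not any(f.search(url) for f in exclusions):
        return False
    return not any(f.search(url) for f in inclusions)

exclusions = [
    make_filter("table[1-9].jpg", "regex"),
    make_filter("figures*plant?blue", "wildcard"),
]
inclusions = [make_filter("Table8", "wildcard", case_sensitive=True)]

for url in ("http://www.acme.org/Table8.jpg", "http://www.acme.org/table8.jpg"):
    print(url, is_excluded(url, exclusions, inclusions))
```

Using `search` rather than a full match reflects the rule that a pattern may appear anywhere in the URL.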
External Documents
Click External Documents to access the External Documents dialog
to specify which links are to external documents. A document may have
links to zero or more external documents.
In the External Documents dialog, the External document list
lists the document name and link prefix fields of each external
document specification for the document. Generally you will have
one external document specification for each external document to which
the document will link.
An external document specification consists of five pieces
of information, as shown in the External Document Specification
box of the dialog:
- Document name: This field gives the relative path to the
external document as it will be when the user accesses it using
a link. If the external document is a .pdb file, the .pdb extension
is optional. The reader application will attempt to open the file
with the exact path name you provide first. If the open is unsuccessful,
another attempt is made to open it with the .pdb extension if it was not
provided or without the .pdb extension if it was provided.
Note that on Palm OS®, if a file is stored in the internal
storage memory, the document title serves as the file name,
so when converting external documents, it is best to ensure
that the document title and document file name are the same.
Also, on Palm OS®, when a document is stored in the internal
storage memory, any external documents to which it links must also
be stored in the internal memory, and in this case, the reader
application ignores the directory part of external document paths.
Version 4.3 and later of iSilo™ support searching
for the first of multiple possibilities. You can specify multiple
names to search for by enclosing each name within double-quote
characters and separating each double-quote enclosed name from
the next with a space. When you do this, iSilo™
opens the first document that it finds in the order listed. This
is especially useful on Palm OS®:
when a document is in the internal database storage memory, its
internal database name is used since there is no notion of a file
name, but when a document is on a memory card, its file name is used.
Here is an example of specifying three different possibilities
for the name:
"Gulliver's Travels" "Gulliver_s_Travels.pdb" "Gul. Travels"
- Link prefix: This field gives
the text to match links against for identifying a link as
one to an external document. When performing the match, the
converter takes the URL of the link and removes all leading periods,
forward slashes, and backslashes. Then it performs a case-insensitive
comparison of the prefix string against the beginning of the
remaining URL. A match indicates that the link is to an
external document.
- Keep prefix: This field determines whether the link
prefix is kept for lookup.
As an example of a scenario where the prefix should not be included
in the lookup, consider two documents, call them document A and
document B, that externally link to one another such that each
document's content is wholly contained in its own directory.
Say that the directory containing document A's
content is named DirA and that the directory for document B's
content is named DirB. In order for document A to link to document B,
for document A, you would specify DirB as the prefix for identifying
links as those to document B. For document B, you would specify DirA
as the prefix for identifying links as those to document A.
The target names within a given document are relative
to the first source, which would presumably be some file immediately
within the document's directory. Hence, the directory name would
not be part of the target name and thus the prefix, which would be
the same as the directory name, should not be included for lookup.
As an example of a scenario where the prefix should be included
in the lookup, consider two documents, call them document A and
document B, that externally link to one another such that each
document's content is spread across two directories.
Say that the directories containing document A's content are
DirA1 and DirA2 and that the directories containing document B's
content are named DirB1 and DirB2. Further, say that the
directory containing all four directories is named DirAB.
In addition, say that an index file immediately within DirAB
links to content in all four subdirectories DirA1, DirA2, DirB1,
and DirB2. To create the two documents that link externally
to one another, for document A, you would specify two external
document specifications, both for externally linking to document B.
For the first specification, DirB1 would be the prefix.
For the second specification, DirB2 would be the prefix.
But since the index file is at the same level as those two directories,
you would want to keep the prefix.
- Map file: This field gives
the full path of the map file for the external document.
When using the ID or Offset lookup methods,
the map file is necessary for determining the target ID or target
offset value for links to the external document. The path
must be a full path and can be an HTTP URL. This latter
capability allows the targets of a document to be easily
made public.
When converting the target external document, be sure
to use the option to generate the map file.
If two documents link to one another,
it is necessary to perform two conversion passes. The first
pass generates the map file and the second pass uses the map
files for looking up the associated target IDs or offsets.
- Lookup by: The setting of this option
determines the format in which the link information is stored as
well as how the lookup is performed in the external document. Set it
to one of the following:
- Name: The part of the URL of the link after
the prefix is considered the target name and stored as the value
to use to identify the target location within the external document
when a jump to the target occurs. In order for the links to the target
document to work properly, the target document must have been converted
with the targets lookup option set to Name as well.
- ID: A numeric value, also known as the
target ID, that uniquely identifies the target is stored and used
to identify the target location within the external document when
a jump to the target occurs. A map file
for the external document is needed to look up the target ID values
of the external document during conversion.
- Offset: A numeric value, also known as the
target offset, that represents the location of the target in
the external document is stored and used when a jump to the target
occurs. A map file for the external document
is needed to look up the target offset values of the external document
during conversion.
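The link prefix matching and the Keep prefix setting described in the list above can be sketched as follows. This is a sketch under assumptions: the function name is hypothetical, and dropping the separator that follows a removed prefix is a guess, since the text does not say how the remainder is trimmed.

```python
def match_external_link(url, prefix, keep_prefix):
    """Return the name used for lookup if `url` matches `prefix`, else None."""
    # Remove all leading periods, forward slashes, and backslashes.
    stripped = url.lstrip("./\\")
    # Case-insensitive comparison against the beginning of what remains.
    if stripped[: len(prefix)].lower() != prefix.lower():
        return None
    if keep_prefix:
        return stripped  # the prefix stays part of the lookup name
    # Assumption: the separator after the prefix is dropped as well.
    return stripped[len(prefix):].lstrip("/\\")

print(match_external_link("../DirB/chapter1.htm", "DirB", keep_prefix=False))
print(match_external_link("DirB1/page.htm", "dirb1", keep_prefix=True))
print(match_external_link("./DirA/page.htm", "DirB", keep_prefix=False))
```

The first call mirrors the DirA/DirB scenario, where the directory name is not part of the target name; the second mirrors the DirAB scenario, where the prefix is kept.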
Lookup method tradeoffs
The lookup methods each have their own individual advantages
and disadvantages.
In terms of document storage space,
the Name method requires the
largest amount of storage space in the linking document
as well as in the targeted external document, unless the
target names are few and short. The ID
and Offset methods require approximately the same amount
of storage space as each other in the linking document.
In the targeted external document, the Offset method
requires no additional storage space, while the ID
method requires an amount of storage space that is generally
less than the Name method.
In terms of the speed of performing the lookup when a jump
occurs to an external document, the difference perceived by the
user is probably negligible. But the Name method requires
the most processing. The ID method comes next,
while the Offset method requires the least processing
for lookup.
The other important tradeoff among the methods concerns
synchronization between a document and the external documents
to which it links. For the purposes of this discussion, let us
say that we have a document named DocSource that has links to an
external document named DocTarget and that DocTarget is updated
independently of DocSource. The content and targets in DocTarget
change periodically such that content and targets may be added
and removed. Assume, though, that the targets to which
DocSource links in DocTarget are always there, though the
specific location of the targets within the content of DocTarget
may change.
Given the scenario just described, if the lookup method is
Name, even though DocTarget may undergo many changes and DocSource
stays the same, the links from DocSource to DocTarget will always
work.
If the lookup method is ID, this may not be the case.
The IDs assigned to each target within DocTarget depend to some
extent on all other external targets within DocTarget. If DocTarget
gets a new target or one is removed, the target IDs for the other
targets may change. As a result, the target IDs stored in DocSource
for the targets in DocTarget may become invalid. However,
if only the content in DocTarget changes, the target IDs will still
be valid.
If the lookup method is Offset, then neither the content
nor the targets in DocTarget may change if the links from DocSource
to DocTarget are to remain valid.
The Name lookup method, though requiring the most storage space,
is the best method to use for documents that can change independently
of one another. The Offset lookup method requires the least
amount of storage space and is a good method to use for documents
that will change together. The ID lookup method generally requires
only a modest amount of storage space compared to the Name method
and is a good method to use when only changes to the content,
such as minor corrections, are expected to occur in an external
document.
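To illustrate why stored IDs can go stale while names do not, the sketch below assumes that target IDs are assigned by position in the sorted list of target names. The actual ID assignment scheme is not described here, so this is purely a hypothetical model; the function and target names are invented for the example.

```python
def assign_ids(target_names):
    """Hypothetical scheme: ID = position in the sorted list of names."""
    return {name: index for index, name in enumerate(sorted(target_names))}

# DocTarget before and after a new target named "appendix" is added.
old_ids = assign_ids(["intro", "summary"])
new_ids = assign_ids(["appendix", "intro", "summary"])

stored_id = old_ids["summary"]  # what DocSource recorded at conversion time
name_for_id = {index: name for name, index in new_ids.items()}

print("ID lookup now lands on:", name_for_id[stored_id])
print("Name lookup still finds summary:", "summary" in new_ids)
```

Under this model, adding one target shifts the IDs of the others, so the link by ID lands on the wrong target, while the link by name is unaffected.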
Adding a new external document specification
To add a new external document specification, fill in the fields
in the External Document Specification box and then
click Add.
Modifying an existing external document specification
In the External document list select the specification
to modify. The fields in the External Document Specification
box change to show the values for the selected specification.
Make the modifications, then click Modify.
Deleting one or more external document specifications
To delete individual specifications, select them individually,
then click Delete. To delete all specifications,
click Delete All.
Changing the order of the specifications
The order of the specifications may be important for your
document set. To change the order of a specification,
select it and then use the Move Up and Move Down
buttons to move the specification up and down, respectively,
in the list. Specifications are applied in order from top
to bottom. The first specification that matches a given link
is the one used.
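The effect of ordering can be shown with a small first-match sketch. The dictionary keys and document names are hypothetical; the matching step follows the link prefix rule described earlier (strip leading periods and slashes, then compare case-insensitively).

```python
def first_matching_spec(url, specs):
    """Return the first specification, top to bottom, whose prefix matches."""
    stripped = url.lstrip("./\\")
    for spec in specs:
        prefix = spec["link_prefix"]
        if stripped[: len(prefix)].lower() == prefix.lower():
            return spec
    return None

specs = [
    {"document_name": "Manual", "link_prefix": "docs/manual"},
    {"document_name": "Docs", "link_prefix": "docs"},
]

print(first_matching_spec("docs/manual/a.htm", specs)["document_name"])
print(first_matching_spec("docs/manual/a.htm", specs[::-1])["document_name"])
```

With the more specific prefix listed first, the link goes to Manual; with the order reversed, the broader "docs" prefix captures it instead.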