Methodology
Our methodology was simple: we went to each university and downloaded
journal articles under typical, rather than ideal, local conditions. In
some cases we also tested journal access during late evening or weekend
hours, when the access center would normally be closed, for the sake of
comparison, and achieved better results. However, since actual local users
would not ordinarily enjoy similar access, only the results obtained
during the usual, higher-traffic hours are used to reflect the current
feasibility of journal access from these locations.
The journals we accessed represent the two basic types of online journals
that exist: those formatted as PDF files and those formatted in HTML.
Articles in PDF format need to be downloaded in their entirety, graphics
and all, before they can be viewed, and additional software (e.g.,
Adobe Acrobat Reader) is then needed to view them. PDF files tend to be
large, typically in the 250-700 kilobyte (kb) range and often a megabyte
or more, although smaller articles do exist. By contrast, HTML articles
are downloaded like a web page, with text tending to appear very quickly
and graphics coming in more gradually. The graphics in HTML
articles first appear as small versions, with an option to download larger
versions, so that the total "weight" of an HTML article with its default
graphics is consistently less, and often much less, than that of the same
article in PDF (see Tables 1 and 2). Science
magazine and Nature are HTML-based journals, but they also offer
a PDF download option; none of the IDEAL catalog's PDF-based journals
offered an HTML option. Note that there is much less variation in
overall file size among HTML articles than among PDF files. HTML articles
are consistently in the 80-150 kb range (including graphics), whereas
PDF files range from less than 100 kb to more than 3 megabytes.

Table 1. Typical file sizes in kilobytes: HTML vs. PDF

| Article | HTML version (default) | HTML version (large) | PDF version |
|---|---|---|---|
| 1 | 89 | 143 | 105 |
| 2 | 77 | 129 | 106 |
| 3 | 75 | 120 | 138 |
| 4 | 130 | 314 | 355 |
| 5 | 120 | 312 | 420 |
| 6 | 147 | 455 | 498 |
| 7 | 110 | 237 | 656 |
| 8 | 89 | 143 | 3077 |

Note: HTML files are composites of multiple text
and graphics files; the size given in the first column ("default")
is the sum of the text plus the initial default graphics in the
article, i.e., the article as it appears in its entirety without
the extra step of downloading the larger versions of the same graphics.
The second column ("HTML large") shows the total number
of kilobytes counting the larger, optional graphics. A PDF file
is a single file that includes text and high-quality graphics.

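The claim that HTML sizes vary far less than PDF sizes can be checked directly against Table 1. The following is a minimal sketch in Python; the variable and function names (TABLE_1, spread, pdf_to_html) are our own choices for illustration, and the only data used are the figures from the table.

```python
# File sizes in kilobytes, copied from Table 1.
# Each tuple: (article, HTML default, HTML large, PDF)
TABLE_1 = [
    (1, 89, 143, 105),
    (2, 77, 129, 106),
    (3, 75, 120, 138),
    (4, 130, 314, 355),
    (5, 120, 312, 420),
    (6, 147, 455, 498),
    (7, 110, 237, 656),
    (8, 89, 143, 3077),
]


def spread(values):
    """Return (smallest, largest, largest/smallest) for a list of sizes."""
    lo, hi = min(values), max(values)
    return lo, hi, hi / lo


html_default = [row[1] for row in TABLE_1]
pdf = [row[3] for row in TABLE_1]

html_spread = spread(html_default)
pdf_spread = spread(pdf)

# Per-article ratio of PDF size to default HTML size.
pdf_to_html = {row[0]: row[3] / row[1] for row in TABLE_1}

print("HTML (default): %d-%d kb, largest/smallest = %.1f" % html_spread)
print("PDF:            %d-%d kb, largest/smallest = %.1f" % pdf_spread)
for article, ratio in sorted(pdf_to_html.items()):
    print("Article %d: PDF is %.1f times the default HTML size" % (article, ratio))
```

For the eight articles in Table 1, the largest default HTML article is about twice the size of the smallest, while the largest PDF is about 29 times the size of the smallest.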

Table 2. Download times for same article: HTML vs. PDF

| Article | Size (PDF version) | HTML download time (text only) | HTML download time (with graphics) | PDF download time |
|---|---|---|---|---|
| 1 | 80 kb | 45 seconds | 58 seconds | 2 min. 30 seconds |
| 2 | 83 kb | 53 seconds | 1 min. 22 seconds | 1 min. 47 seconds |
| 3 | 99 kb | 48 seconds | 1 min. 28 seconds | 2 min. 31 seconds |
| 4 | 198 kb | 1 min. 5 seconds | 1 min. 42 seconds | 4 min. 12 seconds |
| 5 | 303 kb | 58 seconds | 1 min. 31 seconds | 9 min. 45 seconds |
| 6 | 530 kb | 52 seconds | 2 min. 20 seconds | 11 min. 4 seconds |
| 7 | 828 kb | 45 seconds | 2 min. 44 seconds | 25 min. 32 seconds |

Note: These figures include experiments conducted
at the University of Ghana and the University of Cheikh Anta Diop
(Dakar, Senegal).

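One way to read Table 2 is to convert each PDF download into an average effective transfer rate (file size divided by total download time). The sketch below does this in Python. It assumes that "kb" in the table means kilobytes, and the names TABLE_2_PDF, to_seconds and rate_kb_per_s are ours, introduced only for illustration. The result is an end-to-end average that includes any connection delays, not a measurement of raw link speed.

```python
# (article, PDF size in kb, PDF download time as (minutes, seconds)),
# copied from Table 2. "kb" is assumed to mean kilobytes.
TABLE_2_PDF = [
    (1, 80, (2, 30)),
    (2, 83, (1, 47)),
    (3, 99, (2, 31)),
    (4, 198, (4, 12)),
    (5, 303, (9, 45)),
    (6, 530, (11, 4)),
    (7, 828, (25, 32)),
]


def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) pair to a total number of seconds."""
    return 60 * minutes + seconds


def rate_kb_per_s(size_kb, duration):
    """Average transfer rate: size divided by total download time."""
    return size_kb / to_seconds(*duration)


rates = {article: rate_kb_per_s(size, t) for article, size, t in TABLE_2_PDF}

for article, rate in sorted(rates.items()):
    print("Article %d: %.2f kb per second" % (article, rate))
```

Under these assumptions, all seven PDF downloads work out to between roughly 0.5 and 0.8 kb per second, which is why PDF download time grows almost in proportion to file size while the HTML text-only times stay near one minute.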
We logged our results both manually, i.e., timing the download and noting
the times of completion, and automatically, using the proxy server's logging
capability. The automatic logging function indicates to the millisecond
how long it took to download a file completely. This function is particularly
useful for evaluating the download time for PDF files, since they are
single files containing entire articles, graphics and all, and they do
not become usable until they are completely downloaded. By contrast, HTML-based
articles, as composites of separate text and graphic files, are logged
by the proxy server as multiple file downloads, with no way to tell which
group of files belongs together as a single article. In addition, the
HTML articles become useful before all the component pieces are fully
downloaded, since the text tends to appear first and can be read while
the graphics are still on the way. Therefore the automatic logging function
is less useful as a measure of how quickly HTML articles download, since
the point of download "usefulness" is different from the point
of download completion. In the case of HTML articles, then, the manually
timed entries are perhaps more enlightening, although less "accurate,"
than the raw transfer data logged by the proxy server.
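The difference between the two kinds of log entries can be illustrated with a short Python sketch. It assumes the proxy is Squid writing its default ("native") access.log layout, in which the second field is the elapsed time in milliseconds and the fifth is the number of bytes delivered; the text above does not say which log format was actually used. The sample lines are synthetic, invented purely for illustration, and are not data from this study. The names Entry, parse_line and pdf_downloads are likewise hypothetical.

```python
from collections import namedtuple

Entry = namedtuple(
    "Entry", "timestamp elapsed_ms client status size_bytes method url"
)


def parse_line(line):
    """Parse one line in Squid's native access.log layout (assumed)."""
    fields = line.split()
    if len(fields) < 7:
        return None  # not a complete log line
    return Entry(
        timestamp=float(fields[0]),   # Unix time, with milliseconds
        elapsed_ms=int(fields[1]),    # time taken by the request, in ms
        client=fields[2],
        status=fields[3],             # e.g. TCP_MISS/200
        size_bytes=int(fields[4]),    # bytes delivered to the client
        method=fields[5],
        url=fields[6],
    )


def pdf_downloads(lines):
    """Yield (url, seconds, kilobytes) for each logged PDF request."""
    for line in lines:
        entry = parse_line(line)
        if entry and entry.url.lower().split("?")[0].endswith(".pdf"):
            yield entry.url, entry.elapsed_ms / 1000.0, entry.size_bytes / 1024.0


# Synthetic lines, invented for illustration only; NOT data from the study.
SAMPLE = [
    "955000000.123 150000 10.0.0.5 TCP_MISS/200 81920 GET "
    "http://example.org/a/article1.pdf - DIRECT/192.0.2.1 application/pdf",
    "955000200.456 4200 10.0.0.5 TCP_MISS/200 20480 GET "
    "http://example.org/a/article1.html - DIRECT/192.0.2.1 text/html",
    "955000203.789 9800 10.0.0.5 TCP_MISS/200 15360 GET "
    "http://example.org/a/fig1.gif - DIRECT/192.0.2.1 image/gif",
]

results = list(pdf_downloads(SAMPLE))
for url, secs, kb in results:
    print("%s: %.1f s, %.0f kb" % (url, secs, kb))
```

The single PDF entry gives the article's full download time directly. The HTML page and its image appear as two unrelated entries, and nothing in the log ties them together as one article or marks the moment the text became readable.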
In addition to testing the Internet connections by downloading journal
articles, we also worked in each case to improve the connections, and
succeeded. We brought with us a FreeBSD Unix system running the Squid proxy
server and transferred it to three of the four universities, with markedly
improved performance as a result. The fourth university (in Senegal) was committed
to using Linux and a different proxy server, but we were able to improve
its authentication server, which likewise enhanced network performance.