Fetching

To fetch pages we are going to use LWP::UserAgent, but any other HTTP client will work (e.g. HTTP::Tiny, HTTP::Lite).

Let's try fetching a page with a GET request.

use strict;
use warnings;
use feature 'say';
use LWP::UserAgent;

my $ua =
  LWP::UserAgent->new(agent => 'MyWebScraper/1.0 <http://example.com>');

my $response = $ua->get('http://example:3000/');

if ($response->is_success) {
    say $response->decoded_content;
} else {
    die $response->status_line;
}

On success the script prints the sample HTML page. Otherwise it dies with the status line, which holds the status code and a short error message.
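
For comparison, here is a rough sketch of the same request made with HTTP::Tiny, one of the alternative clients mentioned above. The helper name fetch_with_http_tiny and the 10 second timeout are assumptions made for this illustration. Note that HTTP::Tiny returns a plain hash reference instead of a response object, and that its content is the raw, undecoded body.

```perl
use strict;
use warnings;
use feature 'say';
use HTTP::Tiny;

# Hypothetical helper: fetch a URL with HTTP::Tiny and return
# the response hash reference.
sub fetch_with_http_tiny {
    my ($url) = @_;

    my $http = HTTP::Tiny->new(
        agent   => 'MyWebScraper/1.0 <http://example.com>',
        timeout => 10,    # seconds; an arbitrary choice for this sketch
    );

    # get() returns a hash reference with the keys
    # success, status, reason, content and headers.
    return $http->get($url);
}

my $response = fetch_with_http_tiny('http://example:3000/');

if ($response->{success}) {
    say $response->{content};
} else {
    # Warn instead of dying so the sketch also runs without the sample server.
    warn "$response->{status} $response->{reason}\n";
}
```

HTTP::Tiny reports errors that happen on the client side, such as a failed connection, with the pseudo status code 599.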

Let's try fetching a page that does not exist.

use strict;
use warnings;
use feature 'say';
use LWP::UserAgent;

my $ua =
  LWP::UserAgent->new(agent => 'MyWebScraper/1.0 <http://example.com>');

my $response = $ua->get('http://example:3000/not_found');

if ($response->is_success) {
    say $response->decoded_content;
} else {
    die $response->status_line;
}
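
The status line printed by the script above starts with a numeric status code, and the first digit of that code tells you the class of the error. The sketch below builds two responses by hand, so it runs without a server. The helper name status_class is an assumption made for this illustration; is_client_error and is_server_error are exported by HTTP::Status on request.

```perl
use strict;
use warnings;
use feature 'say';
use HTTP::Response;
use HTTP::Status qw(is_client_error is_server_error);

# Hypothetical helper: name the class of an HTTP status code.
sub status_class {
    my ($code) = @_;

    return 'client error' if is_client_error($code);    # 4xx
    return 'server error' if is_server_error($code);    # 5xx
    return 'other';
}

# Hand-built responses stand in for real ones.
my $not_found = HTTP::Response->new(404, 'Not Found');
my $broken    = HTTP::Response->new(500, 'Internal Server Error');

for my $response ($not_found, $broken) {
    say $response->status_line, ' is a ', status_class($response->code);
}
```

In both cases the reply was produced by the server. A 4xx code means the server rejected the request, and a 5xx code means the server failed while handling it.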

This time the request reached the server, and the server itself replied with an error (404 Not Found), so the message we see comes from the server. But what happens when we cannot connect to the server at all?

use strict;
use warnings;
use feature 'say';
use LWP::UserAgent;

my $ua =
  LWP::UserAgent->new(agent => 'MyWebScraper/1.0 <http://example.com>');

my $response = $ua->get('http://unknown.server');

if ($response->is_success) {
    say $response->decoded_content;
} else {
    die $response->status_line;
}

As you can see, we got a 500 error. That makes it look as if the server is alive but not working correctly, which is of course false. So that we can tell whether a response came from the server or was generated by LWP itself, LWP sets a special Client-Warning header on the responses it generates internally.

use strict;
use warnings;
use feature 'say';
use LWP::UserAgent;

my $ua =
  LWP::UserAgent->new(agent => 'MyWebScraper/1.0 <http://example.com>');

my $response = $ua->get('http://unknown.server');

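my $client_warning = $response->headers->header('Client-Warning');

if ($response->is_success) {
    say $response->decoded_content;
} elsif ($client_warning && $client_warning eq 'Internal response') {
    # LWP generated this response itself, e.g. because the connection failed
    die 'Internal error: ' . $response->status_line;
} else {
    # The response really came from the server
    die 'Server error: ' . $response->status_line;
}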

Now we know that the error is on our side.
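
The Client-Warning check can be wrapped in a small reusable function. The sketch below is one possible way to do it. The helper name error_origin and the hand-built responses, including the made-up connection error message, are assumptions made for this illustration so that the code runs without a network.

```perl
use strict;
use warnings;
use feature 'say';
use HTTP::Response;

# Hypothetical helper: report where a failed response came from.
sub error_origin {
    my ($response) = @_;

    return 'none' if $response->is_success;

    # LWP marks the responses it generated itself with this header.
    my $client_warning = $response->header('Client-Warning') // '';

    return $client_warning eq 'Internal response' ? 'internal' : 'server';
}

# Hand-built responses stand in for real ones (the messages are made up).
my $from_server = HTTP::Response->new(500, 'Internal Server Error');
my $from_lwp    = HTTP::Response->new(500, "Can't connect to unknown.server:80");
$from_lwp->header('Client-Warning' => 'Internal response');

say error_origin($from_server);    # prints "server"
say error_origin($from_lwp);       # prints "internal"
```

In a scraper, a function like this could be called on the value returned by $ua->get to decide whether retrying later makes sense.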

Exercise

Download a page from http://example:3000 and print out its size in bytes.

use strict;
use warnings;
use feature 'say';
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

...

say ...