Fetching
To fetch pages we are going to use LWP::UserAgent, but any other HTTP client will work (e.g. HTTP::Tiny, HTTP::Lite). Let's try fetching a page with a GET request.
    use feature 'say';    # enables say()
    use LWP::UserAgent;

    # Identify the scraper with a descriptive User-Agent string
    my $ua = LWP::UserAgent->new(agent => 'MyWebScraper/1.0 <http://example.com>');

    my $response = $ua->get('http://example:3000/');
    if ($response->is_success) {
        say $response->decoded_content;
    }
    else {
        die $response->status_line;
    }
On success you'll get a sample HTML page; otherwise the script will die with the status line, which holds the status code and usually an error message.
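Since any HTTP client will do, here is a rough sketch of the same GET request with HTTP::Tiny, assuming the same sample server. HTTP::Tiny returns a plain hash reference rather than a response object, so the `status_line` helper below is a hypothetical stand-in written for this example and not part of the module. Note also that `content` holds the raw body, without the charset decoding that `decoded_content` performs.

```perl
use strict;
use warnings;
use feature 'say';
use HTTP::Tiny;

# Hypothetical helper (not part of HTTP::Tiny): build a "code reason"
# line similar to the one LWP returns from status_line.
sub status_line {
    my ($result) = @_;
    return "$result->{status} $result->{reason}";
}

my $http = HTTP::Tiny->new(
    agent   => 'MyWebScraper/1.0 <http://example.com>',
    timeout => 10,
);

# get() returns a hash reference with success, status, reason and content keys
my $result = $http->get('http://example:3000/');

if ($result->{success}) {
    say $result->{content};
}
else {
    warn status_line($result), "\n";
}
```

The sketch warns instead of dying so that the rest of a longer script can keep running; swap `warn` for `die` to match the LWP example.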
Let's try fetching a nonexistent page.
    use feature 'say';    # enables say()
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new(agent => 'MyWebScraper/1.0 <http://example.com>');

    # Request a path that does not exist on the sample server
    my $response = $ua->get('http://example:3000/not_found');
    if ($response->is_success) {
        say $response->decoded_content;
    }
    else {
        die $response->status_line;
    }
Errors like this one are reported by the server, so what we get is the server's error message. But what happens when we cannot connect to the server at all?
    use feature 'say';    # enables say()
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new(agent => 'MyWebScraper/1.0 <http://example.com>');

    # This host cannot be reached, so no server ever answers
    my $response = $ua->get('http://unknown.server');
    if ($response->is_success) {
        say $response->decoded_content;
    }
    else {
        die $response->status_line;
    }
As you can see, we got a 500 error. That makes it look as if the server is alive but not working correctly, which is of course false. To let us tell whether an error response was generated internally by LWP or came from the server, LWP sets a special Client-Warning header on the responses it generates itself.
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new(agent => 'MyWebScraper/1.0 <http://example.com>');

    my $response = $ua->get('http://unknown.server');

    # LWP marks the responses it generates itself with this header
    my $client_warning = $response->headers->header('Client-Warning');
    if ($client_warning && $client_warning eq 'Internal response') {
        die 'Internal error: ' . $response->status_line;
    }
    else {
        die 'Server error: ' . $response->status_line;
    }
Now we know that the error is on our side.
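The Client-Warning check can be wrapped in a small function so it is easy to reuse. The sketch below is one possible way to do it: `classify_response` is a hypothetical helper, not part of LWP, and the two responses are built by hand with HTTP::Response so the example runs without network access.

```perl
use strict;
use warnings;
use feature 'say';
use HTTP::Response;

# Hypothetical helper (not part of LWP): returns 'ok' for a successful
# response, 'internal' for an error generated by LWP itself, and
# 'server' for an error reported by the remote server.
sub classify_response {
    my ($response) = @_;

    return 'ok' if $response->is_success;

    my $client_warning = $response->header('Client-Warning') // '';
    return $client_warning eq 'Internal response' ? 'internal' : 'server';
}

# Hand-built responses standing in for real ones
my $not_found = HTTP::Response->new(404, 'Not Found');
my $no_connect = HTTP::Response->new(
    500, "Can't connect to unknown.server",
    ['Client-Warning' => 'Internal response'],
);

say classify_response($not_found);     # server
say classify_response($no_connect);    # internal
```

In a real script you would pass the object returned by `$ua->get(...)` to the helper instead of a hand-built response.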
Exercise
Download a page from http://example:3000 and print out its size in bytes.
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new;
    ...
    say ...