Simple Web Crawler Service

Requirements

  • A simple web crawler service that takes a page URL and returns the HTML markup of that page.
  • Only absolute URLs (with the protocol included) are handled.
GET /?url={absolute page URL}
Host: localhost:3000

Response
status: 200 OK
content-type: application/json
body: {
  "data": "<html content>"
}

GET /?url={invalid or relative URL}
Host: localhost:3000

Response
status: 400 Bad Request
text: 'send absolute url with protocol included'
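Because the target page URL travels inside the `url` query parameter, any `?`, `&` or `#` in it has to be percent-encoded; otherwise the service would receive a truncated URL. A minimal client-side sketch using Node's built-in WHATWG `URL` class is below. `buildCrawlRequestUrl` is a hypothetical helper, and the base address assumes the service runs on localhost:3000 as described above.

```javascript
// Hypothetical helper: builds the request URL for the crawler service.
// URLSearchParams percent-encodes the target, so its own query string
// (e.g. "?q=node&page=2") survives as a single `url` parameter.
function buildCrawlRequestUrl(serviceBase, targetUrl) {
  const requestUrl = new URL('/', serviceBase)
  requestUrl.searchParams.set('url', targetUrl)
  return requestUrl.href
}

const requestUrl = buildCrawlRequestUrl('http://localhost:3000', 'https://example.com/search?q=node&page=2')
console.log(requestUrl)
// http://localhost:3000/?url=https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dnode%26page%3D2
```

On the server side, Express decodes the parameter again, so `req.query.url` receives the original target string.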

Installation

npm install
npm start

Libraries

npx express-generator --no-view simple-web-crawler-service
npm install validator axios
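Validate the url query parameter with validator:
const validator = require('validator')

// Inside the route handler: reject a missing url, or one without a host and protocol
if (!req.query || !req.query.url
    || !validator.isURL(req.query.url, { require_host: true, require_protocol: true })) {
  return res.status(400).send('send absolute url with protocol included')
}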
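If you would rather not add the validator dependency, Node's built-in WHATWG `URL` class can enforce a similar "absolute URL with protocol" rule. This is a swapped-in alternative, not what the service above uses, and `isAbsoluteHttpUrl` is a hypothetical name. It is not equivalent to `validator.isURL`: for example, validator requires a top-level domain by default, while this sketch accepts hosts such as `localhost`.

```javascript
// Dependency-free alternative to validator.isURL (a swapped-in technique, not the article's).
// Accepts only absolute http(s) URLs that include a host.
function isAbsoluteHttpUrl(value) {
  // Express yields an array for repeated ?url=...&url=... parameters; accept only a non-empty string
  if (typeof value !== 'string' || value.length === 0) return false
  let parsed
  try {
    // The URL constructor throws a TypeError for relative or malformed input
    parsed = new URL(value)
  } catch {
    return false
  }
  return (parsed.protocol === 'http:' || parsed.protocol === 'https:') && parsed.hostname !== ''
}

console.log(isAbsoluteHttpUrl('https://example.com/page')) // true
console.log(isAbsoluteHttpUrl('example.com/page'))         // false (no protocol)
console.log(isAbsoluteHttpUrl('/relative/path'))           // false
```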
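Fetch the page content with axios:
const axios = require('axios')

// Fetch the page and return its HTML body, or null if the request fails
async function getContent(url) {
  try {
    const response = await axios(url)
    return response.data
  } catch (error) {
    return null
  }
}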
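The validation check and `getContent` above are shown separately; one plausible way to wire them into the `GET /` route is sketched below. `makeCrawlHandler` and `fakeResponse` are hypothetical names. The factory takes the page fetcher and the URL check as arguments so the handler can be exercised with fakes instead of real network calls. It follows the spec above: 400 with plain text for a bad URL, otherwise 200 with `{ data }`. The original does not say how a failed fetch (where `getContent` returns `null`) should be reported, so this sketch simply passes `null` through as `data`.

```javascript
// Hypothetical factory: builds an Express-style (req, res) handler for GET /?url=...
// `fetchPage` resolves to the page HTML (or null); `isValidUrl` decides whether to accept the input.
function makeCrawlHandler(fetchPage, isValidUrl) {
  return async function crawlHandler(req, res) {
    const url = req.query ? req.query.url : undefined
    // Also rejects arrays produced by repeated url parameters
    if (typeof url !== 'string' || !isValidUrl(url)) {
      // Matches the 400 response described in the requirements
      return res.status(400).send('send absolute url with protocol included')
    }
    const html = await fetchPage(url)
    // In Express, res.json sets Content-Type: application/json
    return res.status(200).json({ data: html })
  }
}

// Minimal stand-in for Express's res object, enough to observe what the handler sends
function fakeResponse() {
  return {
    statusCode: null,
    body: undefined,
    status(code) { this.statusCode = code; return this },
    send(body) { this.body = body; return this },
    json(obj) { this.body = obj; return this },
  }
}

// With Express, one might mount it as:
// router.get('/', makeCrawlHandler(getContent,
//   (u) => validator.isURL(u, { require_host: true, require_protocol: true })))
```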
Testing with Jest (add the test script to package.json):
npm install jest --save-dev
"test": "jest --coverage --watchAll"

Senior Software Engineer @Andela

Temitope Omotunde
