QUICK TIP - Crawling Web Pages With PHP Simple HTML DOM

PHP Simple HTML DOM is a one-file library that lets you traverse an HTML document and search it for specific elements. The examples below show how to use this library.

Description, Requirements & Features

  • An HTML DOM parser written in PHP 5+ that lets you manipulate HTML in a very easy way.
  • Requires PHP 5+.
  • Supports invalid HTML.
  • Find tags on an HTML page with selectors just like jQuery.
  • Extract contents from HTML in a single line.
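
The selector and single-line extraction features above can be sketched roughly as follows. This is a minimal illustration, assuming the library file is saved as simple_html_dom.php next to the script; the helper name first_link_text and the sample markup are made up for the example.

```php
<?php
// Assumption: simple_html_dom.php sits in the same directory as this script.
require_once __DIR__ . '/simple_html_dom.php';

// Hypothetical helper: text of the first link inside the element with the given id.
function first_link_text($markup, $id)
{
    $html = str_get_html($markup);            // parse a string of HTML
    if (!$html) {
        return null;                          // e.g. empty input
    }
    // jQuery-like selector; the second argument (0) asks for the first match only.
    $link = $html->find('#' . $id . ' a', 0);
    $text = $link ? trim($link->plaintext) : null;
    $html->clear();                           // release the parser's memory
    return $text;
}

// Made-up markup for illustration.
$sample = '<div id="menu"><a href="/home">Home</a><a href="/about">About</a></div>';
echo first_link_text($sample, 'menu'), PHP_EOL;
```

For a live page, file_get_html() takes a URL or file path and returns the same kind of object as str_get_html().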

How to Use PHP Simple HTML DOM to Crawl Web Pages

 

The examples below show how to call the library with correct syntax and how to scrape different HTML elements.

Scrape abc.com for a basic example

This example scrapes the DevDungeon.com archive page and pulls all the post titles. The page may change in the future, which would break the script. YMMV.
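
A rough sketch of the title-pulling step might look like this. The selector 'h2 a' and the sample markup are assumptions, not the real structure of the archive page, and post_titles is a hypothetical helper; inspect the live page and adjust the selector before relying on it.

```php
<?php
// Assumption: simple_html_dom.php sits in the same directory as this script.
require_once __DIR__ . '/simple_html_dom.php';

// Hypothetical helper: collect the text of every link matched by $selector.
function post_titles($html, $selector = 'h2 a')
{
    $titles = array();
    foreach ($html->find($selector) as $link) {
        $titles[] = trim($link->plaintext);
    }
    return $titles;
}

// Stand-in markup; a live run would use file_get_html() with the archive URL instead.
$sample = '<div class="archive">'
        . '<h2><a href="/post/1">First post</a></h2>'
        . '<h2><a href="/post/2">Second post</a></h2>'
        . '</div>';

$html = str_get_html($sample);
foreach (post_titles($html) as $title) {
    echo $title, PHP_EOL;
}
$html->clear(); // free the parser's memory once done
```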

Scrape digg.com for an advanced example

Digg (digg.com) is a popular news website. This example loads the main page from https://digg.com/, extracts the news items and returns their details in a custom format.
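
The extraction step can be approximated as below. The selectors ('article', 'h2 a', 'p.summary') and the sample markup are illustrative guesses rather than Digg's actual markup, and news_items is a hypothetical helper; the point is the pattern of calling find() on each item to read its fields.

```php
<?php
// Assumption: simple_html_dom.php sits in the same directory as this script.
require_once __DIR__ . '/simple_html_dom.php';

// Hypothetical helper: turn each <article> into a title/url/summary record.
function news_items($html)
{
    $items = array();
    foreach ($html->find('article') as $article) {
        $link = $article->find('h2 a', 0);      // search inside this item only
        if (!$link) {
            continue;                           // skip items without a headline link
        }
        $summary = $article->find('p.summary', 0);
        $items[] = array(
            'title'   => trim($link->plaintext),
            'url'     => $link->href,           // attributes are read as properties
            'summary' => $summary ? trim($summary->plaintext) : '',
        );
    }
    return $items;
}

// Made-up markup; a live run would use file_get_html('https://digg.com/').
$sample = '<article><h2><a href="/story/1">Story one</a></h2><p class="summary">Short text</p></article>'
        . '<article><h2><a href="/story/2">Story two</a></h2></article>';

$html = str_get_html($sample);
foreach (news_items($html) as $item) {
    printf("%s\n  %s\n  %s\n", $item['title'], $item['url'], $item['summary']);
}
$html->clear();
```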

Scrape imdb.com for an advanced example

IMDb (imdb.com) is a well-known movie database that provides detailed information about movies, such as title, release year, cast and so on. This example loads a page from IMDb and displays the most important details in a custom format.
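
One way to pull a few details out of a movie page is sketched here. The class names ('title', 'year', 'cast') and the sample markup are invented for illustration and do not reflect IMDb's real markup; movie_details is a hypothetical helper.

```php
<?php
// Assumption: simple_html_dom.php sits in the same directory as this script.
require_once __DIR__ . '/simple_html_dom.php';

// Hypothetical helper: read title, year and cast from a parsed movie page.
function movie_details($html)
{
    $title = $html->find('h1.title', 0);
    $year  = $html->find('span.year', 0);

    $cast = array();
    foreach ($html->find('ul.cast li') as $member) {
        $cast[] = trim($member->plaintext);
    }

    // find(..., 0) returns null when nothing matches, so guard each field.
    return array(
        'title' => $title ? trim($title->plaintext) : null,
        'year'  => $year ? (int) trim($year->plaintext) : null,
        'cast'  => $cast,
    );
}

// Made-up markup standing in for a real movie page.
$sample = '<div class="movie"><h1 class="title">Example Movie</h1>'
        . '<span class="year">1999</span>'
        . '<ul class="cast"><li>Actor One</li><li>Actor Two</li></ul></div>';

$html = str_get_html($sample);
$movie = movie_details($html);
printf("%s (%d)\nCast: %s\n", $movie['title'], $movie['year'], implode(', ', $movie['cast']));
$html->clear();
```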

Scrape slashdot.org for an advanced example

Slashdot is a website based on and running the Slashdot-Like Automated Story-Telling Homepage software. It was created in 1997 by Rob “CmdrTaco” Malda and today is owned by Slashdot Media. It is run primarily by a handful of editors and coders, with the help of many others. The editors are Beau (“BeauHD”) Hamilton, Manish (“manishs”) Singh, David (“EditorDavid”), and Logan (“whipslash”) Abbott.

This example loads a page from Slashdot, scrapes exactly the parts you want and displays the articles in a custom format.
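
A possible shape for the formatting step is shown below. The selectors ('article.story', 'h2 a', 'span.byline') and the sample markup are assumptions, not Slashdot's actual structure, and format_stories is a hypothetical helper that keeps only the first few stories.

```php
<?php
// Assumption: simple_html_dom.php sits in the same directory as this script.
require_once __DIR__ . '/simple_html_dom.php';

// Hypothetical helper: render up to $limit stories as "N. Title (author)" lines.
function format_stories($html, $limit = 10)
{
    $lines = array();
    $stories = array_slice($html->find('article.story'), 0, $limit);
    foreach ($stories as $i => $story) {
        $title  = $story->find('h2 a', 0);
        $author = $story->find('span.byline', 0);
        if (!$title) {
            continue;                       // ignore blocks without a headline
        }
        $lines[] = sprintf(
            '%d. %s (%s)',
            $i + 1,
            trim($title->plaintext),
            $author ? trim($author->plaintext) : 'unknown'
        );
    }
    return $lines;
}

// Made-up markup; a live run would use file_get_html() with the Slashdot URL.
$sample = '<article class="story"><h2><a href="/a">Story A</a></h2><span class="byline">alice</span></article>'
        . '<article class="story"><h2><a href="/b">Story B</a></h2></article>'
        . '<article class="story"><h2><a href="/c">Story C</a></h2><span class="byline">carol</span></article>';

$html = str_get_html($sample);
echo implode(PHP_EOL, format_stories($html, 2)), PHP_EOL;
$html->clear();
```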

 

