Library for Rapid (Web) Crawler and Scraper Development
This library provides a framework and many ready-to-use building blocks, called steps, that you can combine to build your own crawlers and scrapers.
To give you an overview, here's a list of things that it helps you with:
- Crawler Politeness 😇 (respecting robots.txt, throttling, ...)
- Load URLs using
- Get absolute links from HTML documents 🔗
- Get sitemaps from robots.txt and get all URLs from those sitemaps
- Crawl (load) all pages of a website 🕷
- Use any HTTP method (GET, POST, ...) and send any headers or body
- Iterate over paginated list pages 🔁
- Extract data from:
- Extract schema.org structured data in JSON-LD format from HTML documents
- Keep memory usage low by using PHP Generators 💪
- Cache HTTP responses during development, so you don't have to load pages again and again after every code change
- Get logs about what your crawler is doing (accepts any PSR-3 LoggerInterface)
- And a lot more...
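
To illustrate the point about PHP Generators in the list above, here is a minimal sketch in plain PHP. It does not use this library's API: the functions `paginatedUrls()` and `describeUrls()` and the example.com URL are hypothetical and exist only for this illustration. It shows why chaining generator-based steps keeps memory usage low: each item flows through the whole chain before the next one is produced, so no full result array is ever built.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical first step: yields the URLs of a paginated list one by one,
 * instead of building an array that holds all of them.
 */
function paginatedUrls(string $baseUrl, int $pages): Generator
{
    for ($page = 1; $page <= $pages; $page++) {
        yield $baseUrl . '?page=' . $page;
    }
}

/**
 * Hypothetical second step: consumes the output of a previous step lazily.
 * A real crawler would load and parse each page here; this sketch only
 * splits the URL into its parts so that it runs without network access.
 */
function describeUrls(iterable $urls): Generator
{
    foreach ($urls as $url) {
        yield [
            'url' => $url,
            'query' => parse_url($url, PHP_URL_QUERY),
        ];
    }
}

// Only one item is held in memory at a time while iterating.
foreach (describeUrls(paginatedUrls('https://www.example.com/list', 3)) as $result) {
    echo $result['url'], ' (', $result['query'], ')', PHP_EOL;
}
```

Because both functions return generators, raising the number of pages from 3 to several thousand would not noticeably change peak memory usage in this sketch.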
You can find the documentation at crwlr.software.
If you are considering contributing to this package, please read the [contribution guide (CONTRIBUTING.md)](CONTRIBUTING.md).