Introduction
Pider aims to be an elegant and useful spider framework written in PHP.
PHP is a good web programming language. There are many web frameworks, but few frameworks for scraping or data processing. I believe that PHP, like Python, can do more than web development. So I want to create a scraping and data-processing framework that incorporates crawling, data cleaning, data analysis, and data visualization.
- Templatize
Pider allows you to write a spider and manage its life cycle by customizing a single template.
- Command Line
The Pider framework provides many command-line tools to manage spiders and scraped data.
- Multiple Process
A single process is too slow when you want to scrape an enormous number of pages. So Pider supplies a multi-process module that lets you request and extract data at the same time. This feature can remarkably shorten the runtime of scrapes that cover a large number of pages.
- Group
Sometimes we need to request more than one page to complete a scrape task, and process the data scraped
from the different pages only after all requests are done. We can use the Group feature to bundle different requests into a group,
and the responses to these requests are bundled as well. You can then process these responses together easily.
- Data clean
Most data that we pull from web pages is half-baked for one reason or another. So we often have to do a lot
of work to clean, reorganize, or complement the original data. The Pider framework comes with a data-cleaning model, the ActiveCarbon Model,
that can release you from cumbersome data-cleaning tasks.
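
To make the Group idea above more concrete, here is a minimal sketch in plain PHP. It does not use Pider's API: `runGroup`, the `detail` and `price` keys, and the stub fetchers are hypothetical names chosen for illustration, and the fetchers return canned JSON instead of issuing real HTTP requests. It only shows the pattern of bundling several requests and processing their responses together once all of them are done.

```php
<?php
// Hypothetical illustration only: none of these names come from Pider.

/**
 * Run every request in a group, then hand all responses to one callback.
 *
 * @param array    $requests   Map of name => callable returning a response body.
 * @param callable $onComplete Receives the map of name => response body.
 * @return mixed Whatever the callback returns.
 */
function runGroup(array $requests, callable $onComplete)
{
    $responses = [];
    foreach ($requests as $name => $fetch) {
        // A real crawler would issue an HTTP request here.
        $responses[$name] = $fetch();
    }
    // The callback runs only after every request in the group has finished.
    return $onComplete($responses);
}

// Stub fetchers stand in for two pages that together describe one item.
$group = [
    'detail' => function () { return '{"title":"Widget"}'; },
    'price'  => function () { return '{"price":9.5}'; },
];

// Merge the data from both pages into a single record.
$item = runGroup($group, function (array $responses) {
    $detail = json_decode($responses['detail'], true);
    $price  = json_decode($responses['price'], true);
    return array_merge($detail, $price);
});

print_r($item);
```

The point of the pattern is that the callback never sees a partial result: it receives the responses of the whole group at once, keyed by the names given to the requests.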