Command Line

johans edited this page Sep 29, 2018 · 16 revisions

     The Pider framework ships with several commands for different purposes. Each command accepts its own set of arguments and options.

Configuration settings

     Many default behaviors in the Pider framework are controlled by configuration. Pider uses PHP array-style configuration, read from two files:

  1. src/Config/config.php (framework wide)
  2. Config/config.php (inside a Pider project's root)

Settings from these files are merged in the order listed, so project settings in Config/config.php take priority over the framework-wide settings in src/Config/config.php. The framework-wide file is meant to hold default behaviors and should not be modified unnecessarily, while the project file can be adjusted to suit each project's requirements.

Using the pider tool

     Start by running the pider tool with no arguments. It prints usage help and the list of available commands.

[root@41f16764df90 pider]# ./pider 

.______    __   _______   _______ .______      
|   _  \  |  | |       \ |   ____||   _  \     
|  |_)  | |  | |  .--.  ||  |__   |  |_)  |    
|   ___/  |  | |  |  |  ||   __|  |      /     
|  |      |  | |  '--'  ||  |____ |  |\  \----.
| _|      |__| |_______/ |_______|| _| `._____|
Usage:
 ./pider [command]

Description:
 Project tools for pider

Available commands:
 help
 list
 crawl
 runspider
 rundigest
 checkurl

Get help usage

     If you don't know or can't remember how to use a command, run it with the --help option to see its detailed usage (for example, ./pider list --help or ./pider crawl --help).

[root@41f16764df90 pider]# ./pider list --help
Description:
  list all availabe spiders

Usage:
  list

Options:
  -h, --help            Display this help message
  -q, --quiet           Do not output any message
  -V, --version         Display this application version
      --ansi            Force ANSI output
      --no-ansi         Disable ANSI output
  -n, --no-interaction  Do not ask any interactive question
  -v|vv|vvv, --verbose  Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug
[root@41f16764df90 pider]# ./pider crawl  --help
Description:
  crawl urls supplied

Usage:
  crawl [options] [--] [<url>]

Arguments:
  url                        url to crawled

Options:
  -f, --file=FILE            file contains urls to be crawled
  -s, --spider[=SPIDER]      spider be appointed
  -t, --filetype[=FILETYPE]  filetype specified, defaults: txt
  -a, --attach[=ATTACH]      data will be attached to request,json format
  -l, --loglevel[=LOGLEVEL]  log which matches level option will output
  -h, --help                 Display this help message
  -q, --quiet                Do not output any message
  -V, --version              Display this application version
      --ansi                 Force ANSI output
      --no-ansi              Disable ANSI output
  -n, --no-interaction       Do not ask any interactive question
  -v|vv|vvv, --verbose       Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug

Listing all spiders

     By default, spiders live in the spiders directory under your project root. You can change this location in the configuration.

[root@41f16764df90 pider]# ls -la
lrwxr-xr-x  1 root root    15 Sep 12 12:07 console -> src/bin/console
drwxr-xr-x  3 root root    96 Sep 11 09:41 doc
-rwxr-xr-x  1 root root 11902 Sep 11 11:56 install.sh
lrwxrwxrwx  1 root root    14 Sep 13 11:19 pider -> src/bin/pider2
lrwxrwxrwx  1 root root    14 Sep 12 07:24 piderd -> src/bin/piderd
drwxr-xr-x  3 root root    96 Sep 11 11:51 setup
drwxr-xr-x  4 root root   128 Sep 27 08:29 spiders
drwxr-xr-x 27 root root   864 Sep 21 06:36 src
[root@41f16764df90 pider]# ls -la spiders/
total 4
drwxr-xr-x  4 root root 128 Sep 27 08:29 .
drwxr-xr-x 29 root root 928 Sep 28 10:27 ..
-rw-r--r--  1 root root   0 Sep 11 09:41 .spdierignores
-rw-r--r--  1 root root 558 Sep 27 08:29 ExampleSpider.php
[root@41f16764df90 pider]# ./pider list
All Available Spiders:

 * ExampleSpider

Running a spider

     A spider can be run either by its name or by the path to its file.

  • with just a spider name
./pider runspider ExampleSpider
  • with a specified spider path
./pider runspider spiders/ExampleSpider.php
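
The list and runspider commands can be combined to run every spider in a project. The following is a minimal sketch, not part of the pider tool itself: it assumes that ./pider list prints each spider as a bulleted line such as " * ExampleSpider" (as in the sample output earlier on this page), and the helper names spider_names and run_all_spiders are hypothetical.

```shell
#!/bin/sh
# Extract spider names from `./pider list` output read on stdin.
# Assumption: each spider appears as a bulleted line like " * ExampleSpider".
spider_names() {
    sed -n 's/^[[:space:]]*\*[[:space:]]*//p'
}

# Run every listed spider in turn, by name.
# --no-ansi keeps color codes out of the text being parsed.
run_all_spiders() {
    ./pider list --no-ansi | spider_names | while read -r name; do
        ./pider runspider "$name"
    done
}

# Only attempt the run when the pider tool is present in this directory.
if [ -x ./pider ]; then
    run_all_spiders
fi
```

Header and blank lines in the list output are ignored because only lines that begin with a bullet are kept.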

Checking spiders for a url

     The checkurl command lists the spiders available for a given url.

[root@41f16764df90 pider]# ./pider checkurl  http://www.example.com
URL:

 http://www.example.com

Available spiders:

 * ExampleSpider

Crawling a url

./pider crawl http://www.example.com
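
The crawl help output shown earlier also lists --spider and --attach options. As a hedged sketch of how they might be combined, the script below appoints a spider and attaches JSON data to the request. The payload contents are hypothetical, and passing the spider by name (rather than by path) is an assumption that mirrors the runspider usage.

```shell
#!/bin/sh
# Spider to appoint; assumed to be accepted by name, as with runspider.
spider="ExampleSpider"
# Hypothetical JSON payload; the key and value are for illustration only.
payload='{"source":"wiki"}'
url="http://www.example.com"

if [ -x ./pider ]; then
    ./pider crawl --spider="$spider" --attach="$payload" "$url"
else
    # Show the command instead of failing when pider is not in this directory.
    echo "pider not found; would run: ./pider crawl --spider=$spider --attach=$payload $url"
fi
```

Single quotes around the payload keep the shell from interpreting the double quotes that JSON requires.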

Crawling with a file containing a bunch of urls

./pider crawl -f /path/to/file
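
A small sketch of preparing such a file follows. The help output states that the file type defaults to txt, but the exact layout is not documented on this page, so one url per line is an assumption here. The file location and the second url are hypothetical.

```shell
#!/bin/sh
# Hypothetical location for the url list.
url_file="${TMPDIR:-/tmp}/pider-urls.txt"

# Assumed format: plain text, one url per line.
cat > "$url_file" <<'EOF'
http://www.example.com
http://www.example.com/about
EOF

# Crawl every url in the file when the pider tool is present.
if [ -x ./pider ]; then
    ./pider crawl -f "$url_file"
fi
```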
