Pagodo (Passive Google Dork) - Automate Google Hacking Database scraping and searching

The goal of this project was to develop a passive Google dork script to collect potentially vulnerable web pages and applications on the Internet. There are 2 parts: ghdb_scraper.py, which retrieves Google dorks from the Google Hacking Database (GHDB), and pagodo.py, which uses the dorks gathered by ghdb_scraper.py to run its searches.
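
The hand-off between the two scripts is a plain text file of dorks. The sketch below shows one way such a file could be read before searching. load_dorks is a hypothetical helper, not pagodo's own code, and it assumes one dork per line with blank lines ignored.

```python
import os
import tempfile
from pathlib import Path


def load_dorks(path):
    """Return the dorks in a text file, one per non-blank line.

    Assumes one Google dork per line (an assumed layout, not confirmed
    here); surrounding whitespace on each line is stripped.
    """
    dorks = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        dork = line.strip()
        if dork:  # skip blank lines
            dorks.append(dork)
    return dorks


if __name__ == "__main__":
    # A tiny stand-in for a dork file produced by ghdb_scraper.py.
    sample = 'intitle:"index of" site:example.com\n\nfiletype:pdf site:example.com\n'
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as f:
        f.write(sample)
    try:
        for number, dork in enumerate(load_dorks(f.name), start=1):
            print(number, dork)
    finally:
        os.remove(f.name)  # clean up the temporary sample file
```

Each non-blank line becomes one search, so the size of the dork file directly determines how long a run takes.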

Requirements

Git
Python 3 and pip
Proxychains (optional, only needed if Google starts blocking your searches)

How to install and use

Step 1:

First, download the pagodo package by cloning its repository. Type the following command in your Linux terminal:

git clone https://github.com/opsdisk/pagodo.git

Step 2:

Next, change into the pagodo folder:

cd pagodo

Step 3:

Now install the Python packages that pagodo requires:

pip install -r requirements.txt

Step 4:

Now make all the Python files executable. Note that chmod +x only adds execute permission; it does not change read or write permissions:

chmod +x *.py

Step 5:

Finally, run the following command to display pagodo's help menu and available options:

python3 pagodo.py -h

Google is blocking me

If you start getting HTTP 429 errors, Google has rightfully detected you as a bot and will block your IP for a set period of time. The solution is to use proxychains and a bank of proxies to round robin the lookups.

Install proxychains4 (the apt command below assumes a Debian-based distribution and root privileges, e.g. via sudo):
apt install proxychains4 -y

Edit the /etc/proxychains4.conf configuration file to round robin the lookups through different proxy servers. In the example below, 2 different dynamic SOCKS proxies have been set up with different local listening ports (9050 and 9051). Don’t know how to utilize SSH and dynamic SOCKS proxies? Do yourself a favor and pick up a copy of Cyber Plumber’s Handbook and interactive lab to learn all about Secure Shell (SSH) tunneling, port redirection, and bending traffic like a boss.

vim /etc/proxychains4.conf

Example configuration:
round_robin
chain_len = 1
proxy_dns
remote_dns_subnet 224
tcp_read_time_out 15000
tcp_connect_time_out 8000
[ProxyList]
socks4 127.0.0.1 9050
socks4 127.0.0.1 9051
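
To picture what round_robin with chain_len = 1 does, the sketch below models it in Python: it reads the [ProxyList] entries from a config like the one above and hands each new connection the next proxy in turn, wrapping back to the first. This is a conceptual model under those assumptions, not proxychains4's actual implementation; parse_proxy_list and round_robin are hypothetical helper names.

```python
from itertools import cycle

# Config text modeled on the example above; proxy_dns and
# remote_dns_subnet are written as separate directives.
CONFIG = """\
round_robin
chain_len = 1
proxy_dns
remote_dns_subnet 224
tcp_read_time_out 15000
tcp_connect_time_out 8000
[ProxyList]
socks4 127.0.0.1 9050
socks4 127.0.0.1 9051
"""


def parse_proxy_list(config_text):
    """Return (type, host, port) tuples listed under [ProxyList]."""
    proxies = []
    in_list = False
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("["):
            in_list = line == "[ProxyList]"
            continue
        if in_list:
            kind, host, port = line.split()[:3]
            proxies.append((kind, host, int(port)))
    return proxies


def round_robin(proxies, connections):
    """Pair each connection with the next proxy in the list, wrapping around.

    With chain_len = 1, each connection goes through exactly one proxy.
    """
    return list(zip(connections, cycle(proxies)))


if __name__ == "__main__":
    proxies = parse_proxy_list(CONFIG)
    for conn, (kind, host, port) in round_robin(proxies, ["lookup 1", "lookup 2", "lookup 3"]):
        print(f"{conn} -> {kind} {host}:{port}")
```

With two proxies, consecutive connections alternate between ports 9050 and 9051, which is why adding more proxies spreads the lookups across more source IPs.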

Throw proxychains4 in front of the Python command and each lookup will go through a different proxy (and thus come from a different source IP). Because the lookups are spread across several proxies, you may also be able to lower the -e delay between searches.

proxychains4 python3 pagodo.py -g ALL_dorks.txt -s -e 17.0 -l 700 -j 1.1
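
The -e option sets the delay between searches, and -j is assumed here to be a jitter factor that randomizes it. The sketch below uses a simple illustrative model where each pause is drawn uniformly between the -e delay and that delay multiplied by the -j value. This is an assumption, not pagodo's documented formula, and pause_seconds and estimated_runtime_hours are hypothetical helpers.

```python
import random


def pause_seconds(delay, jitter, rng=random):
    """Return one randomized pause between searches.

    Hypothetical model (not necessarily pagodo's formula): the pause is
    drawn uniformly between `delay` and `delay * jitter` seconds.
    """
    if delay < 0 or jitter < 1:
        raise ValueError("delay must be >= 0 and jitter must be >= 1")
    return rng.uniform(delay, delay * jitter)


def estimated_runtime_hours(num_dorks, delay, jitter):
    """Rough total waiting time, using the midpoint of the pause range."""
    average_pause = delay * (1 + jitter) / 2
    return num_dorks * average_pause / 3600


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the demo is repeatable
    print([round(pause_seconds(17.0, 1.1, rng), 2) for _ in range(3)])
    # Waiting time alone for a hypothetical 1,000-dork file:
    print(f"{estimated_runtime_hours(1000, 17.0, 1.1):.1f} hours")
```

Under that model, the -e 17.0 -j 1.1 settings above average 17.85 seconds per search, so a hypothetical 1,000-dork file would spend roughly 5 hours just waiting between searches.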

ghdb_scraper.py

To start off, pagodo.py needs a list of all the current Google dorks. A timestamped file containing all the Google dorks, along with the individual dork category files, is also provided in the repo. Fortunately, the entire database can be pulled back with a single GET request using ghdb_scraper.py. You can dump all dorks to a single file, the individual dork categories to separate dork files, or the entire JSON blob if you want more contextual data about each dork.

To retrieve all dorks (-s saves all dorks to a single file and -j saves the full JSON blob):

python3 ghdb_scraper.py -j -s
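
If you save the JSON blob and want to work with it directly, here is a minimal sketch of turning it into a single one-dork-per-line file. The JSON shape used (a "data" list of records with "dork" and "category" keys) is a hypothetical assumption; the real GHDB response may use different field names. extract_dorks and write_all_dorks are illustrative helpers, not ghdb_scraper.py's own functions.

```python
import json
import os
import tempfile

# Hypothetical JSON shape; the real GHDB response may use other keys.
SAMPLE_BLOB = json.dumps({
    "data": [
        {"dork": 'intitle:"index of" site:example.com', "category": "Files Containing Juicy Info"},
        {"dork": "inurl:login site:example.com", "category": "Pages Containing Login Portals"},
        {"dork": "   ", "category": "Files Containing Juicy Info"},
    ]
})


def extract_dorks(json_blob):
    """Return the non-empty dork strings from a GHDB-style JSON blob."""
    records = json.loads(json_blob)["data"]
    dorks = []
    for record in records:
        dork = record.get("dork", "").strip()
        if dork:  # skip empty or whitespace-only entries
            dorks.append(dork)
    return dorks


def write_all_dorks(dorks, path):
    """Write one dork per line to a single file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(dorks) + "\n")


if __name__ == "__main__":
    dorks = extract_dorks(SAMPLE_BLOB)
    out_path = os.path.join(tempfile.gettempdir(), "all_dorks_demo.txt")
    write_all_dorks(dorks, out_path)
    print(f"wrote {len(dorks)} dorks to {out_path}")
```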

To retrieve all dorks and write them to individual categories:

python3 ghdb_scraper.py -i
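
Writing dorks to individual categories amounts to grouping records by category and writing one file per group. The sketch below assumes records with hypothetical "dork" and "category" keys and uses a hypothetical file-naming scheme; the actual file names and fields produced by ghdb_scraper.py may differ.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical records; the real GHDB JSON may use other field names.
RECORDS = [
    {"dork": 'intitle:"index of" site:example.com', "category": "Files Containing Juicy Info"},
    {"dork": "inurl:login site:example.com", "category": "Pages Containing Login Portals"},
    {"dork": "filetype:log site:example.com", "category": "Files Containing Juicy Info"},
]


def group_by_category(records):
    """Map each category name to its list of dorks, preserving input order."""
    groups = defaultdict(list)
    for record in records:
        groups[record["category"]].append(record["dork"])
    return dict(groups)


def category_filename(category):
    """Hypothetical naming scheme: lowercase, runs of non-alphanumerics become '_'."""
    slug = re.sub(r"[^a-z0-9]+", "_", category.lower()).strip("_")
    return f"{slug}.txt"


def write_categories(records, out_dir):
    """Write one dork file per category and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for category, dorks in group_by_category(records).items():
        path = out / category_filename(category)
        path.write_text("\n".join(dorks) + "\n", encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    for path in write_categories(RECORDS, tempfile.mkdtemp()):
        print(path.name, len(path.read_text(encoding="utf-8").splitlines()))
```

Per-category files let you point pagodo.py's -g option at a narrower set of dorks instead of the whole database.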