Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL, Second Edition
- 6h 4m
- Michael Schrenk
- No Starch Press
- 2012
There's a wealth of data online, but sorting and gathering it by hand can be tedious and time-consuming. Rather than click through page after endless page, why not let bots do the work for you?
Webbots, Spiders, and Screen Scrapers will show you how to create simple programs with PHP/CURL to mine, parse, and archive online data to help you make informed decisions. Michael Schrenk, a highly regarded webbot developer, teaches you how to develop fault-tolerant designs, how best to launch and schedule the work of your bots, and how to create Internet agents that:
- Send email or SMS notifications to alert you to new information quickly
- Search different data sources and combine the results on one page, making the data easier to interpret and analyze
- Automate purchases, auction bids, and other online activities to save time
Sample projects for automating tasks like price monitoring and news aggregation will show you how to put the concepts you learn into practice.
This second edition of Webbots, Spiders, and Screen Scrapers includes tricks for dealing with sites that are resistant to crawling and scraping, writing stealthy webbots that mimic human search behavior, and using regular expressions to harvest specific data. As you discover the possibilities of web scraping, you'll see how webbots can save you precious time and give you much greater control over the data available on the Web.
About the Author
Michael Schrenk develops webbots and spiders for clients across North America. He has written for Computerworld and Web Techniques magazines and has taught college courses on web usability and Internet marketing. He is also an occasional speaker at DEFCON.
In This Book
- Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL, Second Edition
- What’s in It for You?
- Ideas for Webbot Projects
- Downloading Web Pages
- Basic Parsing Techniques
- Advanced Parsing with Regular Expressions
- Automating Form Submission
- Managing Large Amounts of Data
- Price-Monitoring Webbots
- Image-Capturing Webbots
- Link-Verification Webbots
- Search-Ranking Webbots
- Aggregation Webbots
- FTP Webbots
- Webbots That Read Email
- Webbots That Send Email
- Converting a Website into a Function
- Spiders
- Procurement Webbots and Snipers
- Webbots and Cryptography
- Authentication
- Advanced Cookie Management
- Scheduling Webbots and Spiders
- Scraping Difficult Websites with Browser Macros
- Hacking iMacros
- Deployment and Scaling
- Designing Stealthy Webbots and Spiders
- Proxies
- Writing Fault-Tolerant Webbots
- Designing Webbot-Friendly Websites
- Killing Spiders
- Keeping Webbots out of Trouble