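Last time we scraped text jokes from Qiushibaike. In a "pics or it didn't happen" internet era, images carry more impact and more information than text, so in this lesson we will scrape images.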
Create the project
In the folder where you normally keep your code, create a new folder named image_spider.

Inside it, create another folder named images.

Then create a file named qiushibaike_image.py with the following code.
    import re

    import requests


    def crawl_image(image_url, image_local_path):
        # Download one image and write its raw bytes to a local file.
        r = requests.get(image_url, stream=True)
        with open(image_local_path, "wb") as f:
            f.write(r.content)


    def crawl(page):
        # Fetch one page of the image ranking list.
        url = "http://www.qiushibaike.com/imgrank/page/" + str(page)
        res = requests.get(url)
        # Each post's image sits inside <div class="thumb">...</div>;
        # re.S lets "." match newlines so a block may span several lines.
        content_list = re.findall(r'<div class="thumb">(.*?)</div>', res.content.decode("utf-8"), re.S)
        for content in content_list:
            image_list = re.findall(r'<img src="(.*?)"', content)
            for image_url in image_list:
                # Use the last path segment of the URL as the local file name.
                crawl_image(image_url, "./images/" + image_url.strip().split("/")[-1])


    if __name__ == '__main__':
        crawl(1)
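To see what the script's two regular expressions actually pick out, here is a small offline sketch that runs the same patterns against a hand-written HTML fragment. The fragment, the example.com URLs, and the helper names `extract_image_urls` and `local_name` are made up for illustration, and the real page markup may differ. The `urljoin` step is an extra assumption: it only matters if the page emits protocol-relative or relative `src` values.

```python
import re
from urllib.parse import urljoin

# Hand-written sample markup (an assumption, not copied from the real site).
SAMPLE_HTML = """
<div class="thumb">
<a href="/article/1"><img src="http://pic.example.com/a/cat.jpg" alt="cat"></a>
</div>
<div class="thumb">
<a href="/article/2"><img src="//pic.example.com/b/dog.png" alt="dog"></a>
</div>
"""


def extract_image_urls(html, base_url="http://www.qiushibaike.com/"):
    """Return absolute image URLs found inside <div class="thumb"> blocks."""
    urls = []
    # re.S lets "." match newlines, so a block may span several lines.
    for block in re.findall(r'<div class="thumb">(.*?)</div>', html, re.S):
        for src in re.findall(r'<img src="(.*?)"', block):
            # urljoin resolves protocol-relative ("//host/...") and relative paths.
            urls.append(urljoin(base_url, src.strip()))
    return urls


def local_name(image_url):
    """Use the last path segment of the URL as the local file name."""
    return image_url.strip().split("/")[-1]


if __name__ == "__main__":
    for url in extract_image_urls(SAMPLE_HTML):
        print(url, "->", "./images/" + local_name(url))
```

Regular expressions are fine for a quick script like this one, but they break easily when the markup changes. An HTML parser would be the sturdier choice for anything long-lived.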
Run
python qiushibaike_image.py
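The script writes to ./images relative to the directory it is run from, and `open()` does not create missing folders. Running it from another directory, or without the images folder, therefore fails with `FileNotFoundError`. Below is a minimal sketch of one way to guard against that. The helper name `image_path` is hypothetical and not part of the original script.

```python
from pathlib import Path


def image_path(image_url, images_dir="./images"):
    """Return the local path for image_url, creating images_dir if needed."""
    folder = Path(images_dir)
    # parents=True creates nested folders; exist_ok=True makes repeated calls safe.
    folder.mkdir(parents=True, exist_ok=True)
    # Same naming rule as the script: last path segment of the URL.
    return folder / image_url.strip().split("/")[-1]
```

With this helper, the download call in the script could become `crawl_image(image_url, image_path(image_url))`, since `open()` accepts a `Path` object.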
