# 爬虫教程

\[TOC]

## 参考资料

[Python3爬虫视频学习教程](http://cuiqingcai.com/4320.html)\
[Beautiful Soup 4.4.0 文档](http://beautifulsoup.readthedocs.io/zh_CN/latest/)

## 小实例入手

[Python爬虫实战（4）：抓取淘宝MM照片](http://python.jobbole.com/81359/)

## BeautifulSoup

| 解析器           | 使用方法                                                                 | 优势                              | 劣势                                  |
| ------------- | -------------------------------------------------------------------- | ------------------------------- | ----------------------------------- |
| Python标准库     | BeautifulSoup(markup, "html.parser")                                 | Python的内置标准库 执行速度适中 文档容错能力强     | Python 2.7.3 or 3.2.2)前 的版本中文档容错能力差 |
| lxml HTML 解析器 | BeautifulSoup(markup, "lxml")                                        | 速度快 文档容错能力强                     | 需要安装C语言库                            |
| lxml XML 解析器  | BeautifulSoup(markup, \["lxml", "xml"])；BeautifulSoup(markup, "xml") | 速度快 唯一支持XML的解析器                 | 需要安装C语言库                            |
| html5lib      | BeautifulSoup(markup, "html5lib")                                    | 最好的容错性；以浏览器的方式解析文档；生成HTML5格式的文档 | 速度慢；不依赖外部扩展                         |


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://book.rizon.top/xue-xi-bi-ji/python/pa-chong.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.