Python requests 库完全指南
本文档面向零基础新手,目标是让你真正理解:
- HTTP 协议基础:请求和响应是什么
- requests 库的安装与入门
- GET / POST / PUT / DELETE 等请求方式
- 请求头(Headers)、查询参数、请求体的设置
- 响应对象的所有属性:状态码、文本、JSON、二进制
- 文件上传与下载
- Session 会话(保持登录状态)
- 超时、重试、代理的配置
- 身份认证(Basic Auth、Token、OAuth)
- 错误处理与异常
- 实战案例:天气查询、网页抓取、API 调用
配有大量可运行示例,全部从最基础讲起。
第一部分:HTTP 基础知识
1.1 什么是 HTTP?
HTTP(超文本传输协议)是浏览器与服务器之间"对话"的规则。
你每天浏览网页时发生的事情:
你的浏览器 服务器
────────── ──────
"我要看 baidu.com 首页" ──────► "好的,给你HTML代码"
HTTP 请求(Request) HTTP 响应(Response)
用 Python 写代码也可以做同样的事:
requests.get('https://baidu.com') ──► 返回 HTML 内容
1.2 HTTP 请求的组成
一个 HTTP 请求包含:
┌─────────────────────────────────────────────────────┐
│ 请求行: GET /search?q=python HTTP/1.1 │
│ ↑ ↑ ↑ │
│ 方法 路径 协议版本 │
├─────────────────────────────────────────────────────┤
│ 请求头(Headers): │
│ Host: www.example.com │
│ User-Agent: Mozilla/5.0 ... │
│ Content-Type: application/json │
│ Authorization: Bearer token123 │
├─────────────────────────────────────────────────────┤
│ 请求体(Body): │
│ {"username": "admin", "password": "123456"} │
│ (GET 请求通常没有请求体) │
└─────────────────────────────────────────────────────┘
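上面三个组成部分可以在代码里直接观察:requests 的 PreparedRequest 会把方法、请求头、请求体组装好但并不发送。下面是一个小示意(URL 仅作演示用):

```python
import requests

# 构造一个请求但不发送,观察"请求行 / 请求头 / 请求体"三部分
req = requests.Request(
    'POST', 'https://www.example.com/login',
    headers={'X-Demo': '1'},
    json={'username': 'admin', 'password': '123456'},
)
prepared = req.prepare()

print(prepared.method, prepared.url)     # 请求行:方法 + URL
print(prepared.headers['Content-Type'])  # 请求头:application/json(json= 自动设置)
print(prepared.body)                     # 请求体:序列化后的 JSON(bytes)
```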
1.3 HTTP 方法(动词)
GET ──► 获取资源(查询) 如:搜索商品
POST ──► 提交数据(新建) 如:用户注册、提交表单
PUT ──► 替换资源(全量更新) 如:修改用户全部信息
PATCH ──► 修改资源(局部更新) 如:只修改用户头像
DELETE ──► 删除资源 如:删除一篇文章
HEAD ──► 只获取响应头(不要正文) 如:检查文件是否存在
OPTIONS ──► 查询服务器支持哪些方法
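requests 为上面每个方法都提供了同名的快捷函数,它们内部最终都调用通用的 requests.request(method, url)。一个小示例验证这个对应关系(不发送任何请求):

```python
import requests

# 每个 HTTP 方法对应一个同名函数:
# requests.get / post / put / patch / delete / head / options
for verb in ['get', 'post', 'put', 'patch', 'delete', 'head', 'options']:
    func = getattr(requests, verb)
    print(f'{verb.upper():8} -> requests.{verb}()', callable(func))
```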
1.4 HTTP 状态码
1xx 信息性响应
2xx 成功
200 OK ──► 请求成功
201 Created ──► 创建成功(POST 后常见)
204 No Content ──► 成功但无返回内容(DELETE 后常见)
3xx 重定向
301 Moved Permanently ──► 永久跳转
302 Found ──► 临时跳转
4xx 客户端错误(你的问题)
400 Bad Request ──► 请求格式错误
401 Unauthorized ──► 未认证(需要登录)
403 Forbidden ──► 无权限(已登录但没权限)
404 Not Found ──► 找不到资源
429 Too Many Requests ──► 请求太频繁
5xx 服务器错误(对方的问题)
500 Internal Server Error ──► 服务器内部错误
502 Bad Gateway ──► 网关错误
503 Service Unavailable ──► 服务暂时不可用
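状态码的第一位数字决定类别,代码里既可以直接比较数字,也可以用 requests.codes 提供的可读名字。下面的 describe 是为演示而写的辅助函数(非 requests 自带):

```python
import requests

# requests.codes 把常用状态码映射成了可读名字
print(requests.codes.ok)         # 200
print(requests.codes.not_found)  # 404

def describe(status_code):
    """按第一位数字粗分状态码类别(演示用的辅助函数)"""
    categories = {1: '信息', 2: '成功', 3: '重定向',
                  4: '客户端错误', 5: '服务器错误'}
    return categories[status_code // 100]

print(describe(201))  # 成功
print(describe(503))  # 服务器错误
```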
第二部分:安装与入门
2.1 安装 requests
pip install requests
验证安装:
import requests
print(requests.__version__) # 如:2.31.0
2.2 第一个请求
import requests
# 向一个公开的测试 API 发送 GET 请求
response = requests.get('https://httpbin.org/get')
# 查看响应
print(f"状态码:{response.status_code}") # 200
print(f"内容类型:{response.headers['Content-Type']}")
print(f"响应内容(前200字符):{response.text[:200]}")
解释:
requests.get(url) → 发送 GET 请求,返回 Response 对象
response.status_code → HTTP 状态码(200=成功)
response.text → 响应内容(字符串)
response.headers → 响应头(字典)
2.3 使用公开测试接口练习
本章大量示例使用 https://httpbin.org——这是专门用来测试 HTTP 请求的公开网站:
https://httpbin.org/get ──► 返回你的 GET 请求信息
https://httpbin.org/post ──► 返回你的 POST 请求信息
https://httpbin.org/put ──► 返回你的 PUT 请求信息
https://httpbin.org/delete ──► 返回你的 DELETE 请求信息
https://httpbin.org/status/404 ──► 返回指定状态码
https://httpbin.org/delay/3 ──► 延迟3秒后返回(测试超时)
https://httpbin.org/headers ──► 返回你发送的请求头
https://httpbin.org/ip ──► 返回你的 IP 地址
https://httpbin.org/json ──► 返回一段 JSON 数据
第三部分:GET 请求
3.1 基本 GET 请求
import requests
# 最简单的 GET 请求
response = requests.get('https://httpbin.org/get')
print(f"状态码:{response.status_code}") # 200
print(f"是否成功:{response.ok}") # True(状态码 < 400 时为 True)
print(f"编码:{response.encoding}") # utf-8
print(f"响应时间:{response.elapsed}") # 如:0:00:00.234567
print(f"最终URL:{response.url}") # 经过重定向后的最终 URL
3.2 带查询参数的 GET 请求
查询参数(Query Parameters)是 URL 中 ? 后面的部分,如:
https://api.example.com/search?q=python&page=1&limit=10
import requests
# ===== 方式1:直接在 URL 里写 =====
url = 'https://httpbin.org/get?name=张三&age=25&city=北京'
response = requests.get(url)
print(response.json()['args'])
# {'age': '25', 'city': '北京', 'name': '张三'}
# ===== 方式2:用 params 参数(推荐!自动 URL 编码)=====
params = {
'name': '张三',
'age': 25,
'city': '北京'
}
response = requests.get('https://httpbin.org/get', params=params)
# 查看实际请求的 URL
print(f"实际URL:{response.url}")
# https://httpbin.org/get?name=%E5%BC%A0%E4%B8%89&age=25&city=%E5%8C%97%E4%BA%AC
# (中文被自动编码了!这就是推荐 params 方式的原因)
print(response.json()['args'])
# {'age': '25', 'city': '北京', 'name': '张三'}
# ===== 传递列表参数(多值)=====
params_multi = {
'ids': [1, 2, 3], # 传递多个 id
'tag': ['python', 'web']
}
response = requests.get('https://httpbin.org/get', params=params_multi)
print(response.url)
# ?ids=1&ids=2&ids=3&tag=python&tag=web
# ===== 实际应用:调用搜索 API =====
def search_github_repos(keyword, language='python', per_page=5):
"""搜索 GitHub 仓库"""
url = 'https://api.github.com/search/repositories'
params = {
'q': f'{keyword} language:{language}',
'sort': 'stars',
'order': 'desc',
'per_page': per_page
}
response = requests.get(url, params=params)
if response.status_code == 200:
data = response.json()
repos = data['items']
print(f"找到 {data['total_count']} 个仓库,显示前{per_page}个:\n")
for repo in repos:
print(f" ⭐ {repo['stargazers_count']:>8,} {repo['full_name']}")
print(f" {repo['description']}\n")
else:
print(f"请求失败:{response.status_code}")
search_github_repos('web scraping')
3.3 Response 对象详解
import requests
response = requests.get('https://httpbin.org/json')
# ===== 基本信息 =====
print(f"状态码: {response.status_code}") # 200
print(f"是否成功: {response.ok}") # True
print(f"原因短语: {response.reason}") # 'OK'
print(f"最终 URL: {response.url}")
print(f"响应耗时: {response.elapsed.total_seconds():.3f}秒")
# ===== 响应头 =====
print("\n响应头:")
for key, value in response.headers.items():
print(f" {key}: {value}")
print(f"\nContent-Type:{response.headers.get('Content-Type')}")
print(f"Content-Length:{response.headers.get('Content-Length', '未知')}")
# ===== 响应内容(三种格式)=====
# 1. 文本格式(自动解码)
print(f"\n文本内容(前100字符):{response.text[:100]}")
# 2. JSON 格式(直接解析为 Python 字典/列表)
data = response.json()
print(f"\nJSON 数据:{data}")
# 3. 二进制格式(图片/文件等)
raw_bytes = response.content
print(f"\n二进制数据长度:{len(raw_bytes)} 字节")
# ===== 编码处理 =====
# requests 会自动检测编码,但有时需要手动指定
response2 = requests.get('https://www.baidu.com')
response2.encoding = 'utf-8' # 手动指定编码
print(response2.text[:200])
# ===== 请求历史(重定向链)=====
response3 = requests.get('http://github.com') # http 会跳转到 https
print("\n重定向链:")
for r in response3.history:
print(f" {r.status_code} → {r.url}")
print(f"最终:{response3.status_code} {response3.url}")
第四部分:POST 请求
4.1 发送表单数据(application/x-www-form-urlencoded)
import requests
# 模拟提交 HTML 表单
form_data = {
'username': 'zhangsan',
'password': '123456',
'remember': 'true'
}
response = requests.post('https://httpbin.org/post', data=form_data)
result = response.json()
print(f"状态码:{response.status_code}")
print(f"发送的表单数据:{result['form']}")
# {'password': '123456', 'remember': 'true', 'username': 'zhangsan'}
print(f"Content-Type:{result['headers']['Content-Type']}")
# application/x-www-form-urlencoded
4.2 发送 JSON 数据(application/json)
import requests
# 现代 REST API 通常使用 JSON 格式
json_data = {
'title': '学习 Python requests',
'content': '今天学习了 requests 库的基本用法',
'tags': ['python', 'http', '学习'],
'is_public': True,
'views': 0
}
response = requests.post(
'https://httpbin.org/post',
json=json_data # 用 json= 参数,自动设置 Content-Type: application/json
)
result = response.json()
print(f"发送的 JSON:{result['json']}")
print(f"Content-Type:{result['headers']['Content-Type']}")
# application/json
# ===== 对比:data= vs json= =====
# data=:手动把字典转JSON字符串,需要手动设置 Content-Type
import json
headers = {'Content-Type': 'application/json'}
response_manual = requests.post(
'https://httpbin.org/post',
data=json.dumps(json_data),
headers=headers
)
# json=:自动序列化 + 自动设置 Content-Type(推荐!)
response_auto = requests.post(
'https://httpbin.org/post',
json=json_data
)
# 两者效果完全相同,推荐用 json= 参数
4.3 实战:调用 RESTful API
import requests
BASE_URL = 'https://jsonplaceholder.typicode.com'
# ===== GET:获取数据 =====
def get_post(post_id):
response = requests.get(f'{BASE_URL}/posts/{post_id}')
return response.json()
# ===== POST:创建数据 =====
def create_post(title, body, user_id=1):
response = requests.post(
f'{BASE_URL}/posts',
json={'title': title, 'body': body, 'userId': user_id}
)
return response.json()
# ===== PUT:全量更新 =====
def update_post(post_id, title, body):
response = requests.put(
f'{BASE_URL}/posts/{post_id}',
json={'id': post_id, 'title': title, 'body': body, 'userId': 1}
)
return response.json()
# ===== PATCH:局部更新 =====
def patch_post(post_id, **fields):
response = requests.patch(
f'{BASE_URL}/posts/{post_id}',
json=fields # 只发送需要更新的字段
)
return response.json()
# ===== DELETE:删除数据 =====
def delete_post(post_id):
response = requests.delete(f'{BASE_URL}/posts/{post_id}')
return response.status_code # 成功返回 200
# 测试所有操作
post = get_post(1)
print(f"获取文章:{post['title']}")
new_post = create_post('测试标题', '这是内容')
print(f"创建文章,ID:{new_post['id']}")
updated = update_post(1, '新标题', '新内容')
print(f"更新文章:{updated['title']}")
patched = patch_post(1, title='只改标题')
print(f"局部更新:{patched['title']}")
code = delete_post(1)
print(f"删除文章,状态码:{code}") # 200
第五部分:请求头(Headers)
5.1 为什么要设置请求头?
常见场景:
① 模拟浏览器(User-Agent):某些网站拒绝非浏览器请求
② 身份认证(Authorization):告诉服务器你是谁
③ 指定数据格式(Content-Type / Accept)
④ 通过防盗链/来源校验(Referer)
⑤ 缓存控制(Cache-Control)
import requests
# ===== 设置自定义请求头 =====
headers = {
# 模拟 Chrome 浏览器(最常用!防止被识别为爬虫)
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
'AppleWebKit/537.36 (KHTML, like Gecko) '
'Chrome/120.0.0.0 Safari/537.36',
# 告诉服务器我能接受 JSON 格式
'Accept': 'application/json',
# 告诉服务器我发送的是 JSON
'Content-Type': 'application/json',
# 防盗链(告诉服务器是从哪个页面过来的)
'Referer': 'https://www.example.com',
# 接受压缩数据(加快传输速度)
'Accept-Encoding': 'gzip, deflate, br',
# 接受的语言
'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
}
response = requests.get('https://httpbin.org/headers', headers=headers)
print(response.json()['headers'])
# ===== 查看默认的请求头 =====
response2 = requests.get('https://httpbin.org/headers')
print(f"\n默认 User-Agent:{response2.json()['headers']['User-Agent']}")
# python-requests/2.31.0(requests 默认 UA)
# ===== 实用函数:创建常用请求头 =====
def get_browser_headers(referer=None):
"""返回模拟浏览器的标准请求头"""
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
'AppleWebKit/537.36 (KHTML, like Gecko) '
'Chrome/120.0.0.0 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'zh-CN,zh;q=0.9',
'Accept-Encoding': 'gzip, deflate, br',
'Connection': 'keep-alive',
}
if referer:
headers['Referer'] = referer
return headers
response3 = requests.get('https://www.example.com', headers=get_browser_headers())
print(f"n状态码:{response3.status_code}")
5.2 Authorization 请求头(身份认证)
import requests
# ===== Bearer Token 认证(最常见于现代 API)=====
token = 'your_api_token_here'
headers = {
'Authorization': f'Bearer {token}'
}
response = requests.get('https://httpbin.org/bearer', headers=headers)
# 或者:
response = requests.get(
'https://httpbin.org/bearer',
headers={'Authorization': f'Bearer {token}'}
)
# ===== API Key 认证(不同 API 传递方式不同)=====
# 方式1:放在请求头
api_key_headers = {'X-API-Key': 'my_api_key_123'}
requests.get('https://api.example.com/data', headers=api_key_headers)
# 方式2:放在查询参数
requests.get('https://api.example.com/data', params={'api_key': 'my_api_key_123'})
# 方式3:放在请求体(POST 时)
requests.post('https://api.example.com/data',
json={'api_key': 'my_api_key_123', 'data': '...'})
第六部分:身份认证
6.1 Basic Auth(基本认证)
import requests
from requests.auth import HTTPBasicAuth
# 方式1:直接传元组(最简洁)
response = requests.get(
'https://httpbin.org/basic-auth/user/passwd',
auth=('user', 'passwd')
)
print(f"Basic Auth:{response.status_code} {response.json()}")
# 方式2:使用 HTTPBasicAuth 对象(显式更清晰)
response2 = requests.get(
'https://httpbin.org/basic-auth/user/passwd',
auth=HTTPBasicAuth('user', 'passwd')
)
print(f"HTTPBasicAuth:{response2.status_code}")
# 认证失败的情况(用错误的密码)
response3 = requests.get(
'https://httpbin.org/basic-auth/user/passwd',
auth=('user', 'wrong_password')
)
print(f"错误密码:{response3.status_code}") # 401
6.2 Digest Auth 和 Token Auth
import requests
from requests.auth import HTTPDigestAuth
# Digest 认证(比 Basic 更安全)
response = requests.get(
'https://httpbin.org/digest-auth/auth/user/passwd',
auth=HTTPDigestAuth('user', 'passwd')
)
print(f"Digest Auth:{response.status_code}")
# 自定义 Token 认证(最常见于现代 API)
class TokenAuth(requests.auth.AuthBase):
"""自定义 Token 认证类"""
def __init__(self, token):
self.token = token
def __call__(self, r):
# 在每个请求上自动添加 Authorization 头
r.headers['Authorization'] = f'Bearer {self.token}'
return r
# 使用自定义认证
token_auth = TokenAuth('my_access_token_xyz')
response = requests.get('https://httpbin.org/get', auth=token_auth)
print(response.json()['headers'].get('Authorization'))
# Bearer my_access_token_xyz
第七部分:文件上传与下载
7.1 上传文件
import requests
# ===== 方式1:上传单个文件 =====
with open('/path/to/image.jpg', 'rb') as f:
response = requests.post(
'https://httpbin.org/post',
files={'file': f}
)
print(response.json()['files'])
# ===== 方式2:指定文件名和 Content-Type =====
with open('/path/to/data.csv', 'rb') as f:
response = requests.post(
'https://httpbin.org/post',
files={
'file': ('custom_name.csv', f, 'text/csv')
# ↑文件名 ↑内容 ↑内容类型
}
)
# ===== 方式3:上传多个文件 =====
files = [
('images', ('photo1.jpg', open('photo1.jpg', 'rb'), 'image/jpeg')),
('images', ('photo2.jpg', open('photo2.jpg', 'rb'), 'image/jpeg')),
]
response = requests.post('https://httpbin.org/post', files=files)
# ===== 方式4:文件 + 表单数据同时上传 =====
with open('avatar.png', 'rb') as f:
response = requests.post(
'https://httpbin.org/post',
files={'avatar': f},
data={'username': '张三', 'bio': '这是简介'} # 同时传表单数据
)
print(response.json()['files']) # 文件
print(response.json()['form']) # 表单数据
# ===== 方式5:从内存上传(不需要本地文件)=====
import io
content = b'name,age\n\xe5\xbc\xa0\xe4\xb8\x89,25' # CSV 内容(\xe5\xbc\xa0\xe4\xb8\x89 是"张三"的 UTF-8 字节)
response = requests.post(
'https://httpbin.org/post',
files={'data': ('report.csv', io.BytesIO(content), 'text/csv')}
)
7.2 下载文件
import requests
import os
def download_file(url, save_path, chunk_size=8192):
"""
下载文件到本地,支持大文件(流式下载)
参数:
url - 下载链接
save_path - 本地保存路径
chunk_size - 每次读取的块大小(字节)
"""
response = requests.get(url, stream=True) # stream=True:不立即下载全部内容
response.raise_for_status() # 状态码不是 2xx 时抛出异常
# 获取文件总大小(不是所有服务器都提供)
total_size = int(response.headers.get('Content-Length', 0))
os.makedirs(os.path.dirname(save_path) or '.', exist_ok=True)
downloaded = 0
with open(save_path, 'wb') as f:
for chunk in response.iter_content(chunk_size=chunk_size):
if chunk: # 过滤掉保持连接的空块
f.write(chunk)
downloaded += len(chunk)
# 显示进度
if total_size:
pct = downloaded / total_size * 100
bar = '█' * int(pct / 2)
print(f'\r [{bar:<50}] {pct:.1f}% '
f'{downloaded//1024}KB/{total_size//1024}KB',
end='', flush=True)
print(f'\n✅ 下载完成:{save_path}')
return save_path
# 下载一张图片
download_file(
'https://httpbin.org/image/png',
'downloaded_image.png'
)
# 下载一个文本文件
download_file(
'https://raw.githubusercontent.com/psf/requests/main/README.md',
'requests_readme.md'
)
7.3 下载图片(直接到内存)
import requests
from PIL import Image # pip install Pillow
import io
def download_image(url):
"""下载图片,返回 PIL Image 对象(不保存到磁盘)"""
response = requests.get(url)
response.raise_for_status()
image = Image.open(io.BytesIO(response.content))
return image
# 使用
img = download_image('https://httpbin.org/image/jpeg')
print(f"图片尺寸:{img.size}")
print(f"图片格式:{img.format}")
img.save('downloaded.jpg')
第八部分:Session 会话
8.1 为什么要用 Session?
问题:HTTP 是无状态协议,每次请求都是独立的。
登录后,下一次请求服务器不知道你已经登录!
解决:Session(会话)自动保持 Cookie,
让每次请求都带着"已登录"的信息。
另一个好处:同一个 Session 复用 TCP 连接,减少开销,速度更快。
import requests
# ===== 不用 Session:每次请求独立 =====
# 登录后获取 Cookie,下次请求需要手动带上
r1 = requests.get('https://httpbin.org/cookies/set/user/zhangsan', allow_redirects=False)
print(f"r1 cookies:{r1.cookies.get('user')}") # zhangsan(该接口会重定向,需禁止重定向才能在 r1 上看到 Set-Cookie)
r2 = requests.get('https://httpbin.org/cookies')
print(f"r2 cookies:{r2.json()}") # {} ← 第二次请求没有 Cookie!
# ===== 用 Session:自动保持 Cookie =====
session = requests.Session()
r1 = session.get('https://httpbin.org/cookies/set/user/zhangsan')
print(f"r1 cookies:{session.cookies.get('user')}") # zhangsan
r2 = session.get('https://httpbin.org/cookies')
print(f"r2 cookies:{r2.json()}")
# {'cookies': {'user': 'zhangsan'}} ← Cookie 自动保持!
session.close() # 用完记得关闭
8.2 Session 的完整用法
import requests
# ===== 方式1:手动管理(记得 close)=====
session = requests.Session()
# 设置全局请求头(每次请求都带上)
session.headers.update({
'User-Agent': 'MyApp/1.0',
'Accept': 'application/json',
})
# 设置全局认证(每次请求都带上)
session.auth = ('username', 'password')
# 设置全局参数
session.params = {'api_version': 'v2'}
# 用 Session 发送请求(和普通请求一样的方法)
r = session.get('https://httpbin.org/get')
print(r.json())
session.close()
# ===== 方式2:with 语句(推荐!自动关闭)=====
with requests.Session() as session:
session.headers.update({'User-Agent': 'MyApp/1.0'})
r1 = session.get('https://httpbin.org/get')
r2 = session.post('https://httpbin.org/post', json={'data': 'test'})
print(f"r1 状态:{r1.status_code}")
print(f"r2 状态:{r2.status_code}")
# 退出 with 块后自动关闭
8.3 模拟登录全过程
import requests
def simulate_login():
"""
模拟完整的网站登录流程:
1. 获取登录页面(可能含 CSRF token)
2. 提交登录表单
3. 带着 Cookie 访问需要登录的页面
"""
with requests.Session() as session:
session.headers.update({
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
'AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36'
})
# 第1步:GET 登录页面(Session 自动保存服务器设置的 Cookie)
print("1. 获取登录页面...")
login_page = session.get('https://httpbin.org/cookies/set/csrf_token/abc123')
print(f" CSRF Token:{session.cookies.get('csrf_token')}")
# 第2步:POST 登录请求(携带 CSRF token 和账号密码)
print("2. 提交登录...")
login_data = {
'username': 'testuser',
'password': 'testpass',
'csrf_token': session.cookies.get('csrf_token', ''),
}
response = session.post(
'https://httpbin.org/post',
data=login_data
)
print(f" 登录状态:{response.status_code}")
# 第3步:访问需要登录的页面
print("3. 访问受保护页面...")
protected = session.get('https://httpbin.org/cookies')
print(f" 携带的 Cookie:{protected.json()['cookies']}")
return session.cookies
simulate_login()
第九部分:超时、重试、代理
9.1 超时设置
import requests
from requests.exceptions import Timeout, ConnectionError
# ===== 设置超时(强烈建议!否则可能永远等下去)=====
# 连接超时 + 读取超时(元组形式)
# connect_timeout:建立 TCP 连接的最长时间
# read_timeout:连接建立后,等待服务器返回数据的最长时间
try:
response = requests.get(
'https://httpbin.org/delay/5', # 模拟5秒延迟
timeout=(3, 5) # 连接超时3秒,读取超时5秒
)
except Timeout:
print("❌ 请求超时!")
# 统一设置(连接和读取用同一个超时值)
try:
response = requests.get(
'https://httpbin.org/delay/2',
timeout=1 # 总共只等1秒
)
except Timeout:
print("❌ 1秒内没有响应")
# 不设超时(危险!程序可能卡死)
# response = requests.get('https://example.com') # ❌ 没有 timeout
# ===== 建议的超时值 =====
TIMEOUT_FAST = (3, 10) # 快速接口:连接3秒,读取10秒
TIMEOUT_SLOW = (5, 60) # 慢接口(如文件上传)
TIMEOUT_STRICT = 5 # 严格限制:总共不超过5秒
9.2 自动重试(urllib3)
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
def create_session_with_retry(
retries=3,
backoff_factor=0.5,
status_forcelist=(500, 502, 503, 504)
):
"""
创建带自动重试的 Session
参数:
retries - 最大重试次数
backoff_factor - 退避因子(重试间隔 = backoff_factor * 2^(retry次数-1))
0.5 → 第1次0.5s,第2次1s,第3次2s
status_forcelist - 遇到哪些状态码时触发重试
"""
session = requests.Session()
retry_strategy = Retry(
total=retries,
backoff_factor=backoff_factor,
status_forcelist=status_forcelist,
allowed_methods=["GET", "POST"], # 哪些方法允许重试
raise_on_status=False
)
adapter = HTTPAdapter(max_retries=retry_strategy)
# 对 http 和 https 都启用重试
session.mount('https://', adapter)
session.mount('http://', adapter)
return session
# 使用
session = create_session_with_retry(retries=3)
try:
response = session.get('https://httpbin.org/status/503', timeout=5)
print(f"状态码:{response.status_code}")
except Exception as e:
print(f"多次重试后失败:{e}")
session.close()
9.3 手动重试(更灵活)
import requests
import time
from typing import Optional
def request_with_retry(
method: str,
url: str,
max_retries: int = 3,
retry_delay: float = 1.0,
timeout: float = 10.0,
**kwargs
) -> Optional[requests.Response]:
"""
带手动重试逻辑的请求函数
会在以下情况重试:
- 网络连接错误
- 超时
- 5xx 服务器错误
"""
last_error = None
for attempt in range(1, max_retries + 1):
try:
response = requests.request(method, url, timeout=timeout, **kwargs)
if response.status_code < 500:
return response
# 5xx 错误,记录错误并等待后重试
last_error = requests.exceptions.HTTPError(f"服务器错误 {response.status_code}")
print(f" [第{attempt}次] 服务器错误 {response.status_code},{retry_delay}s 后重试...")
except requests.exceptions.ConnectionError as e:
print(f" [第{attempt}次] 连接错误:{e},{retry_delay}s 后重试...")
last_error = e
except requests.exceptions.Timeout as e:
print(f" [第{attempt}次] 超时,{retry_delay}s 后重试...")
last_error = e
if attempt < max_retries:
time.sleep(retry_delay * (2 ** (attempt - 1))) # 指数退避
raise RuntimeError(f"请求失败,已重试{max_retries}次。最后错误:{last_error}")
# 测试
try:
response = request_with_retry('GET', 'https://httpbin.org/get')
print(f"成功:{response.status_code}")
except RuntimeError as e:
print(f"最终失败:{e}")
9.4 代理设置
import requests
# ===== 设置 HTTP/HTTPS 代理 =====
proxies = {
'http': 'http://proxy_host:8080',
'https': 'http://proxy_host:8080',
}
response = requests.get('https://httpbin.org/ip', proxies=proxies)
print(f"通过代理后的 IP:{response.json()['origin']}")
# ===== 带认证的代理 =====
proxies_auth = {
'http': 'http://username:password@proxy_host:8080',
'https': 'http://username:password@proxy_host:8080',
}
# ===== SOCKS 代理(需要安装 requests[socks])=====
# pip install requests[socks]
proxies_socks = {
'http': 'socks5://127.0.0.1:1080',
'https': 'socks5://127.0.0.1:1080',
}
# ===== 在 Session 中设置全局代理 =====
with requests.Session() as session:
session.proxies.update(proxies)
r = session.get('https://httpbin.org/ip')
print(r.json())
9.5 SSL 证书
import requests
# 默认:验证 SSL 证书(安全)
response = requests.get('https://www.baidu.com') # 正常
# 跳过 SSL 验证(开发/测试时用,生产不推荐)
response = requests.get('https://self-signed.badssl.com/', verify=False)
# 会出现 InsecureRequestWarning 警告
# 静默忽略警告(不推荐在生产使用)
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
response = requests.get('https://self-signed.badssl.com/', verify=False)
# 指定自定义 CA 证书文件
response = requests.get('https://internal-server.com',
verify='/path/to/ca-bundle.crt')
# 客户端证书(双向 TLS)
response = requests.get('https://server.com',
cert=('/path/to/client.crt', '/path/to/client.key'))
第十部分:错误处理
10.1 异常体系
requests 的异常继承关系:
IOError
└── requests.exceptions.RequestException ← 所有 requests 异常的基类
├── ConnectionError ← 网络连接问题
│ ├── ProxyError ← 代理错误
│ └── SSLError ← SSL 证书错误
├── Timeout ← 超时
│ ├── ConnectTimeout ← 连接超时
│ └── ReadTimeout ← 读取超时
├── URLRequired ← 缺少有效的 URL
├── TooManyRedirects ← 重定向太多
├── MissingSchema ← URL 缺少协议头
├── InvalidSchema ← 不支持的协议
└── HTTPError ← HTTP 错误(由 raise_for_status 抛出)
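这棵继承树可以用 issubclass 直接验证——这也说明只捕获基类 RequestException 就能兜底所有 requests 异常:

```python
import requests
from requests import exceptions

# 逐条验证异常继承关系
print(issubclass(exceptions.ConnectTimeout, exceptions.Timeout))      # True
print(issubclass(exceptions.ProxyError, exceptions.ConnectionError))  # True
print(issubclass(exceptions.HTTPError, exceptions.RequestException))  # True
print(issubclass(exceptions.RequestException, IOError))               # True
```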
10.2 完整的错误处理
import requests
from requests.exceptions import (
RequestException,
ConnectionError,
Timeout,
HTTPError,
TooManyRedirects,
MissingSchema,
)
def safe_get(url, **kwargs):
"""
带完整错误处理的 GET 请求
返回:Response 对象,失败时返回 None
"""
try:
response = requests.get(url, timeout=(5, 30), **kwargs)
# 检查 HTTP 错误(4xx, 5xx)
# 不报错时什么都不做,有错误时抛出 HTTPError
response.raise_for_status()
return response
except MissingSchema:
print(f"❌ URL 格式错误(缺少 http:// 或 https://):{url}")
except ConnectionError:
print(f"❌ 无法连接到服务器:{url}")
print(" 可能原因:网络断开、DNS 解析失败、服务器宕机")
except Timeout:
print(f"❌ 请求超时:{url}")
except TooManyRedirects:
print(f"❌ 重定向次数过多:{url}")
except HTTPError as e:
print(f"❌ HTTP 错误:{e.response.status_code}")
if e.response.status_code == 401:
print(" 原因:未认证,需要登录或提供 API Key")
elif e.response.status_code == 403:
print(" 原因:无权限访问该资源")
elif e.response.status_code == 404:
print(" 原因:资源不存在")
elif e.response.status_code == 429:
print(" 原因:请求太频繁,触发了限速")
elif e.response.status_code >= 500:
print(" 原因:服务器内部错误")
except RequestException as e:
print(f"❌ 请求异常:{e}")
return None
# 测试各种错误
test_urls = [
'https://httpbin.org/get', # 正常
'not_a_url', # URL 格式错误
'https://httpbin.org/status/404', # 404
'https://httpbin.org/status/500', # 500
'https://this-domain-does-not-exist-xyz.com', # 连接失败
]
for url in test_urls:
print(f"\n测试:{url}")
result = safe_get(url)
if result:
print(f"✅ 成功,状态码:{result.status_code}")
10.3 raise_for_status() 的用法
import requests
# raise_for_status():遇到 4xx/5xx 时抛出 HTTPError,否则什么都不做
response = requests.get('https://httpbin.org/status/404')
print(f"状态码:{response.status_code}") # 404
try:
response.raise_for_status()
print("请求成功") # 不会到这里
except requests.exceptions.HTTPError as e:
print(f"HTTP错误:{e}")
# HTTPError: 404 Client Error: NOT FOUND for url: ...
# 链式写法(常见模式)
try:
data = requests.get('https://httpbin.org/json').raise_for_status()
# ⚠️ 注意:raise_for_status() 返回 None,不能这样链式获取 json!
except requests.exceptions.HTTPError:
pass
# 正确的链式写法
response = requests.get('https://httpbin.org/json')
response.raise_for_status()
data = response.json()
print(data)
第十一部分:综合实战案例
11.1 案例一:天气查询(OpenWeatherMap API)
import requests
from datetime import datetime
class WeatherClient:
"""
OpenWeatherMap API 客户端
注册地址:https://openweathermap.org/api
免费账号可以调用 Current Weather API
"""
BASE_URL = 'https://api.openweathermap.org/data/2.5'
def __init__(self, api_key: str):
self.api_key = api_key
self.session = requests.Session()
self.session.params = {'appid': api_key, 'units': 'metric', 'lang': 'zh_cn'}
def get_current(self, city: str) -> dict:
"""获取当前天气"""
response = self.session.get(
f'{self.BASE_URL}/weather',
params={'q': city}
)
response.raise_for_status()
return response.json()
def get_forecast(self, city: str, days: int = 5) -> list:
"""获取未来N天天气预报"""
response = self.session.get(
f'{self.BASE_URL}/forecast',
params={'q': city, 'cnt': days * 8} # 每3小时一条,8条=1天
)
response.raise_for_status()
return response.json()['list']
def format_weather(self, data: dict) -> str:
"""格式化天气数据为可读文本"""
city = data['name']
country = data['sys']['country']
temp = data['main']['temp']
feels_like = data['main']['feels_like']
humidity = data['main']['humidity']
desc = data['weather'][0]['description']
wind_speed = data['wind']['speed']
return (f"📍 {city}, {country}\n"
f"🌡️ 温度:{temp:.1f}°C(体感 {feels_like:.1f}°C)\n"
f"🌤️ 天气:{desc}\n"
f"💧 湿度:{humidity}%\n"
f"💨 风速:{wind_speed} m/s")
def close(self):
self.session.close()
# 使用示例
# client = WeatherClient('your_api_key_here')
# try:
# weather = client.get_current('Beijing')
# print(client.format_weather(weather))
# finally:
# client.close()
11.2 案例二:通用 API 客户端基类
import requests
import logging
import time
from typing import Any, Dict, Optional
logger = logging.getLogger(__name__)
class BaseAPIClient:
"""
通用 REST API 客户端基类
封装了常见的功能:认证、重试、错误处理、日志
"""
def __init__(
self,
base_url: str,
api_key: Optional[str] = None,
timeout: tuple = (5, 30),
max_retries: int = 3
):
self.base_url = base_url.rstrip('/')
self.timeout = timeout
self.max_retries = max_retries
self._session = requests.Session()
self._session.headers.update({
'Accept': 'application/json',
'Content-Type': 'application/json',
'User-Agent': 'PythonAPIClient/1.0',
})
if api_key:
self._session.headers['Authorization'] = f'Bearer {api_key}'
def _request(
self,
method: str,
endpoint: str,
**kwargs
) -> requests.Response:
"""底层请求方法,含重试逻辑"""
url = f"{self.base_url}/{endpoint.lstrip('/')}"
for attempt in range(1, self.max_retries + 1):
try:
start = time.perf_counter()
response = self._session.request(
method, url, timeout=self.timeout, **kwargs
)
elapsed = time.perf_counter() - start
logger.debug(
f"{method.upper()} {url} → "
f"{response.status_code} ({elapsed:.3f}s)"
)
response.raise_for_status()
return response
except requests.exceptions.HTTPError as e:
if e.response.status_code < 500:
raise # 4xx 客户端错误,不重试
if attempt == self.max_retries:
raise
wait = 2 ** (attempt - 1)
logger.warning(f"服务器错误,{wait}s 后重试(第{attempt}次)...")
time.sleep(wait)
except (requests.exceptions.ConnectionError,
requests.exceptions.Timeout) as e:
if attempt == self.max_retries:
raise
wait = 2 ** (attempt - 1)
logger.warning(f"网络错误({e}),{wait}s 后重试...")
time.sleep(wait)
def get(self, endpoint: str, params: Optional[Dict] = None, **kwargs) -> Any:
"""GET 请求,自动解析 JSON"""
response = self._request('GET', endpoint, params=params, **kwargs)
return response.json() if response.content else None
def post(self, endpoint: str, data: Any = None, **kwargs) -> Any:
"""POST 请求"""
response = self._request('POST', endpoint, json=data, **kwargs)
return response.json() if response.content else None
def put(self, endpoint: str, data: Any = None, **kwargs) -> Any:
"""PUT 请求"""
response = self._request('PUT', endpoint, json=data, **kwargs)
return response.json() if response.content else None
def patch(self, endpoint: str, data: Any = None, **kwargs) -> Any:
"""PATCH 请求"""
response = self._request('PATCH', endpoint, json=data, **kwargs)
return response.json() if response.content else None
def delete(self, endpoint: str, **kwargs) -> bool:
"""DELETE 请求,返回是否成功"""
response = self._request('DELETE', endpoint, **kwargs)
return response.status_code in (200, 204)
def close(self):
self._session.close()
def __enter__(self):
return self
def __exit__(self, *args):
self.close()
# 基于基类实现具体客户端
class TodoAPIClient(BaseAPIClient):
"""JSONPlaceholder Todo API 客户端"""
def __init__(self):
super().__init__('https://jsonplaceholder.typicode.com')
def list_todos(self, user_id: int = None, completed: bool = None):
params = {}
if user_id is not None: params['userId'] = user_id
if completed is not None: params['completed'] = completed
return self.get('/todos', params=params)
def get_todo(self, todo_id: int):
return self.get(f'/todos/{todo_id}')
def create_todo(self, title: str, user_id: int = 1):
return self.post('/todos', {'title': title, 'completed': False, 'userId': user_id})
def complete_todo(self, todo_id: int):
return self.patch(f'/todos/{todo_id}', {'completed': True})
def delete_todo(self, todo_id: int):
return self.delete(f'/todos/{todo_id}')
# 使用
with TodoAPIClient() as client:
todos = client.list_todos(user_id=1, completed=False)
print(f"未完成的 Todo:{len(todos)} 条")
todo = client.get_todo(1)
print(f"第1条:{todo['title']}")
new = client.create_todo('学习 requests 库')
print(f"新建 Todo ID:{new['id']}")
ok = client.complete_todo(1)
print(f"标记完成:{ok}")
11.3 案例三:网页内容抓取
import requests
from html.parser import HTMLParser
import re
class LinkParser(HTMLParser):
"""简单的 HTML 链接解析器"""
def __init__(self):
super().__init__()
self.links = []
self.title = ''
self._in_title = False
def handle_starttag(self, tag, attrs):
if tag == 'a':
for name, value in attrs:
if name == 'href' and value:
self.links.append(value)
if tag == 'title':
self._in_title = True
def handle_data(self, data):
if self._in_title:
self.title += data
def handle_endtag(self, tag):
if tag == 'title':
self._in_title = False
def scrape_page(url: str) -> dict:
"""
抓取网页基本信息
返回:{title, links, word_count, status_code}
"""
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
'AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36',
'Accept-Language': 'zh-CN,zh;q=0.9',
}
try:
response = requests.get(url, headers=headers, timeout=(5, 15))
response.raise_for_status()
response.encoding = response.apparent_encoding # 自动检测编码
html = response.text
# 解析 HTML
parser = LinkParser()
parser.feed(html)
# 提取纯文本(去掉 HTML 标签)
text = re.sub(r'<[^>]+>', '', html)
text = re.sub(r'\s+', ' ', text).strip()
word_count = len(text)
return {
'url': url,
'status': response.status_code,
'title': parser.title.strip(),
'links': [l for l in parser.links if l.startswith('http')],
'word_count': word_count,
'size': len(response.content),
}
except requests.exceptions.RequestException as e:
return {'url': url, 'error': str(e)}
# 测试
result = scrape_page('https://www.python.org')
print(f"URL: {result['url']}")
print(f"标题: {result.get('title', 'N/A')}")
print(f"状态: {result.get('status', 'N/A')}")
print(f"大小: {result.get('size', 0) / 1024:.1f} KB")
print(f"字符数:{result.get('word_count', 0):,}")
print(f"外链数:{len(result.get('links', []))}")
if result.get('links'):
print(f"前5个链接:")
for link in result['links'][:5]:
print(f" {link}")
第十二部分:常见陷阱与最佳实践
12.1 陷阱1:忘记设置超时
import requests
# ❌ 没有 timeout:如果服务器不响应,程序会一直等待!(勿直接运行)
# response = requests.get('https://httpbin.org/delay/99')
# ✅ 正确:永远要设置 timeout
response = requests.get('https://httpbin.org/get', timeout=(3, 10))
12.2 陷阱2:每次请求都创建新 Session(低效)
import requests
# ❌ 每次都新建连接,效率低
for url in urls:
r = requests.get(url) # 每次都是新的 TCP 连接
# ✅ 用 Session 复用连接(减少握手开销,速度快3-5倍)
with requests.Session() as session:
for url in urls:
r = session.get(url) # 复用 TCP 连接
12.3 陷阱3:直接访问 response.json() 而不检查状态码
import requests
# ❌ 如果请求失败(响应体不是合法 JSON),json() 会抛出 JSONDecodeError
response = requests.get('https://httpbin.org/status/404')
# data = response.json() # JSONDecodeError!404 的响应体不是 JSON
# ✅ 先检查状态码
response = requests.get('https://httpbin.org/status/404')
if response.status_code == 200:
data = response.json()
else:
print(f"请求失败:{response.status_code}")
# ✅ 或者用 raise_for_status
try:
response = requests.get('https://httpbin.org/json')
response.raise_for_status()
data = response.json()
except requests.HTTPError:
data = None
12.4 陷阱4:中文 URL 没有编码
import requests
from urllib.parse import quote
# ❌ 中文 URL 可能导致编码问题
url = 'https://www.example.com/搜索?q=Python教程'
# ✅ 用 params 参数(requests 自动编码)
response = requests.get(
'https://www.example.com/搜索',
params={'q': 'Python教程'}
)
# ✅ 或者手动编码
url = 'https://www.example.com/' + quote('搜索') + '?q=' + quote('Python教程')
12.5 陷阱5:Stream 下载时忘记迭代
import requests
# ❌ stream=True 但没有迭代,内容不会自动下载
response = requests.get('https://httpbin.org/image/png', stream=True)
# 此时只下载了响应头,没有下载正文!
content = response.content # 这里才会触发下载,但 stream=True 的优势消失了
# ✅ 正确:stream=True 配合 iter_content
response = requests.get('https://httpbin.org/image/png', stream=True)
with open('image.png', 'wb') as f:
for chunk in response.iter_content(chunk_size=8192):
if chunk:
f.write(chunk)
第十三部分:完整速查表
📌 发送请求
requests.get(url, params=, headers=, timeout=, auth=)
requests.post(url, data=, json=, files=, headers=, timeout=)
requests.put(url, json=, headers=, timeout=)
requests.patch(url, json=, headers=, timeout=)
requests.delete(url, headers=, timeout=)
requests.head(url, headers=, timeout=)
requests.request(method, url, ...) ← 通用方法
📌 Response 对象
.status_code → 状态码(200, 404...)
.ok → 状态码 < 400 时为 True
.reason → 状态码描述('OK', 'Not Found')
.text → 响应文本(自动解码)
.content → 响应二进制内容
.json() → 解析 JSON,返回 dict/list
.headers → 响应头字典
.cookies → Cookie 字典
.url → 最终 URL(重定向后)
.encoding → 编码(可手动设置)
.apparent_encoding → 自动检测的编码
.elapsed → 响应耗时(timedelta)
.history → 重定向历史
.raise_for_status()→ 4xx/5xx 时抛出 HTTPError
📌 请求参数
params= → URL 查询参数(dict)
data= → 表单数据(dict)或字节
json= → JSON 数据(dict/list,自动序列化)
files= → 上传文件(dict)
headers= → 请求头(dict)
cookies= → Cookie(dict)
auth= → 认证(元组 或 Auth 对象)
timeout= → 超时(秒数 或 (connect, read) 元组)
proxies= → 代理(dict)
verify= → SSL 验证(True/False/证书路径)
stream= → 是否流式下载
allow_redirects= → 是否允许重定向
📌 Session
session = requests.Session()
session.headers.update({}) → 设置全局请求头
session.auth = (user, pass) → 设置全局认证
session.cookies → Cookie Jar
session.params → 全局查询参数
session.proxies → 全局代理
session.close() → 关闭
📌 异常
RequestException → 所有异常的基类
ConnectionError → 网络连接错误
Timeout / ConnectTimeout / ReadTimeout
HTTPError → raise_for_status() 触发
TooManyRedirects → 重定向超限
MissingSchema / InvalidSchema → URL 格式问题
总结
学完本章,你应该掌握:
- HTTP 基础:请求/响应结构、HTTP 方法、状态码的含义
- GET 请求:params= 参数自动编码、Response 对象的所有属性
- POST 请求:data=(表单)vs json=(JSON API)的区别
- 请求头:User-Agent 模拟浏览器、Authorization 认证
- 文件操作:files= 上传、stream=True 流式下载大文件
- Session:保持 Cookie、复用连接提升性能
- 超时与重试:timeout=(5, 30) 防止卡死、自动重试应对网络抖动
- 错误处理:raise_for_status() + 捕获各类异常
- 身份认证:Basic Auth、Token Auth 的使用方式
最常用的模式:
import requests
# 简单请求
response = requests.get('https://api.example.com/data',
params={'key': 'value'},
headers={'Authorization': 'Bearer token'},
timeout=(5, 30))
response.raise_for_status()
data = response.json()
# 发送 JSON
response = requests.post('https://api.example.com/create',
json={'name': '张三', 'age': 25},
headers={'Authorization': 'Bearer token'},
timeout=(5, 30))
response.raise_for_status()
result = response.json()
# Session(多次请求;注意 Session 没有"基础 URL",每次仍需写完整地址)
with requests.Session() as session:
session.headers['Authorization'] = 'Bearer token'
data1 = session.get('https://api.example.com/endpoint1', timeout=10).json()
data2 = session.get('https://api.example.com/endpoint2', timeout=10).json()