Python requests 库完全指南

本文档面向零基础新手,目标是让你真正理解:

  • HTTP 协议基础:请求和响应是什么
  • requests 库的安装与入门
  • GET / POST / PUT / DELETE 等请求方式
  • 请求头(Headers)、查询参数、请求体的设置
  • 响应对象的所有属性:状态码、文本、JSON、二进制
  • 文件上传与下载
  • Session 会话(保持登录状态)
  • 超时、重试、代理的配置
  • 身份认证(Basic Auth、Token、OAuth)
  • 错误处理与异常
  • 实战案例:天气查询、网页抓取、API 调用

配有大量可运行示例,全部从最基础讲起。


第一部分:HTTP 基础知识

1.1 什么是 HTTP?

HTTP(超文本传输协议)是浏览器与服务器之间"对话"的规则。

你每天浏览网页时发生的事情:

  你的浏览器                          服务器
  ──────────                          ──────
  "我要看 baidu.com 首页"  ──────►   "好的,给你HTML代码"
         HTTP 请求(Request)         HTTP 响应(Response)

  用 Python 写代码也可以做同样的事:
  requests.get('https://baidu.com')  ──► 返回 HTML 内容

1.2 HTTP 请求的组成

一个 HTTP 请求包含:

┌─────────────────────────────────────────────────────┐
│ 请求行:  GET /search?q=python HTTP/1.1              │
│           ↑    ↑              ↑                     │
│         方法  路径           协议版本                │
├─────────────────────────────────────────────────────┤
│ 请求头(Headers):                                  │
│   Host: www.example.com                             │
│   User-Agent: Mozilla/5.0 ...                       │
│   Content-Type: application/json                    │
│   Authorization: Bearer token123                    │
├─────────────────────────────────────────────────────┤
│ 请求体(Body):                                     │
│   {"username": "admin", "password": "123456"}       │
│   (GET 请求通常没有请求体)                         │
└─────────────────────────────────────────────────────┘
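上面框图里的各个部分,可以用 requests 自带的 Request / PreparedRequest 在本地直接观察:只构造请求而不真正发送,打印出方法、URL、请求头和请求体(示意代码,URL 仅作演示):

```python
import requests

# 构造一个请求但不发送,观察 requests 生成的各个组成部分
req = requests.Request(
    'POST',
    'https://httpbin.org/post',
    params={'q': 'python'},        # 查询参数 → 拼进请求行的路径部分
    json={'username': 'admin'},    # 请求体(自动序列化为 JSON)
).prepare()

print(req.method, req.url)          # POST https://httpbin.org/post?q=python
print(req.headers['Content-Type'])  # application/json
print(req.body)                     # b'{"username": "admin"}'
```

可以看到:params 进了请求行,json 进了请求体,Content-Type 请求头被自动补上。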

1.3 HTTP 方法(动词)

GET     ──► 获取资源(查询)          如:搜索商品
POST    ──► 提交数据(新建)          如:用户注册、提交表单
PUT     ──► 替换资源(全量更新)      如:修改用户全部信息
PATCH   ──► 修改资源(局部更新)      如:只修改用户头像
DELETE  ──► 删除资源                  如:删除一篇文章
HEAD    ──► 只获取响应头(不要正文)  如:检查文件是否存在
OPTIONS ──► 查询服务器支持哪些方法
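上面每个方法在 requests 里都有同名的顶层函数,它们都是通用入口 requests.request(method, url, **kwargs) 的简写(下面只做本地检查,不发送网络请求):

```python
import requests

# 每个 HTTP 方法对应一个同名函数:requests.get / requests.post / ...
for name in ('get', 'post', 'put', 'patch', 'delete', 'head', 'options'):
    assert callable(getattr(requests, name))

# 也可以用通用入口,把方法名作为第一个参数:
# requests.request('GET', url)  等价于  requests.get(url)
print('7 个方法函数都存在')
```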

1.4 HTTP 状态码

1xx  信息性响应
2xx  成功
  200 OK            ──► 请求成功
  201 Created       ──► 创建成功(POST 后常见)
  204 No Content    ──► 成功但无返回内容(DELETE 后常见)
3xx  重定向
  301 Moved Permanently  ──► 永久跳转
  302 Found              ──► 临时跳转
4xx  客户端错误(你的问题)
  400 Bad Request   ──► 请求格式错误
  401 Unauthorized  ──► 未认证(需要登录)
  403 Forbidden     ──► 无权限(已登录但没权限)
  404 Not Found     ──► 找不到资源
  429 Too Many Requests ──► 请求太频繁
5xx  服务器错误(对方的问题)
  500 Internal Server Error ──► 服务器内部错误
  502 Bad Gateway           ──► 网关错误
  503 Service Unavailable   ──► 服务暂时不可用
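按区间划分状态码大类,可以写成一个小工具函数自己验证一遍(describe_status 是本文为演示虚构的名字,纯本地判断,不发请求):

```python
def describe_status(code: int) -> str:
    """根据状态码所在区间返回大类说明"""
    if 100 <= code < 200:
        return '信息性响应'
    if 200 <= code < 300:
        return '成功'
    if 300 <= code < 400:
        return '重定向'
    if 400 <= code < 500:
        return '客户端错误(你的问题)'
    if 500 <= code < 600:
        return '服务器错误(对方的问题)'
    return '未知状态码'

print(describe_status(200))   # 成功
print(describe_status(404))   # 客户端错误(你的问题)
print(describe_status(503))   # 服务器错误(对方的问题)
```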

第二部分:安装与入门

2.1 安装 requests

pip install requests

验证安装:

import requests
print(requests.__version__)   # 如:2.31.0

2.2 第一个请求

import requests

# 向一个公开的测试 API 发送 GET 请求
response = requests.get('https://httpbin.org/get')

# 查看响应
print(f"状态码:{response.status_code}")   # 200
print(f"内容类型:{response.headers['Content-Type']}")
print(f"响应内容(前200字符):{response.text[:200]}")

解释:

requests.get(url)  → 发送 GET 请求,返回 Response 对象
response.status_code → HTTP 状态码(200=成功)
response.text        → 响应内容(字符串)
response.headers     → 响应头(字典)

2.3 使用公开测试接口练习

本章大量示例使用 https://httpbin.org——这是专门用来测试 HTTP 请求的公开网站:

https://httpbin.org/get         ──► 返回你的 GET 请求信息
https://httpbin.org/post        ──► 返回你的 POST 请求信息
https://httpbin.org/put         ──► 返回你的 PUT 请求信息
https://httpbin.org/delete      ──► 返回你的 DELETE 请求信息
https://httpbin.org/status/404  ──► 返回指定状态码
https://httpbin.org/delay/3     ──► 延迟3秒后返回(测试超时)
https://httpbin.org/headers     ──► 返回你发送的请求头
https://httpbin.org/ip          ──► 返回你的 IP 地址
https://httpbin.org/json        ──► 返回一段 JSON 数据

第三部分:GET 请求

3.1 基本 GET 请求

import requests

# 最简单的 GET 请求
response = requests.get('https://httpbin.org/get')

print(f"状态码:{response.status_code}")    # 200
print(f"是否成功:{response.ok}")           # True(状态码 < 400 时为 True)
print(f"编码:{response.encoding}")         # utf-8
print(f"响应时间:{response.elapsed}")      # 如:0:00:00.234567
print(f"最终URL:{response.url}")           # 经过重定向后的最终 URL

3.2 带查询参数的 GET 请求

查询参数(Query Parameters)是 URL 中 ? 后面的部分,如: https://api.example.com/search?q=python&page=1&limit=10

import requests

# ===== 方式1:直接在 URL 里写 =====
url = 'https://httpbin.org/get?name=张三&age=25&city=北京'
response = requests.get(url)
print(response.json()['args'])
# {'age': '25', 'city': '北京', 'name': '张三'}

# ===== 方式2:用 params 参数(推荐!自动 URL 编码)=====
params = {
    'name':  '张三',
    'age':   25,
    'city':  '北京'
}
response = requests.get('https://httpbin.org/get', params=params)

# 查看实际请求的 URL
print(f"实际URL:{response.url}")
# https://httpbin.org/get?name=%E5%BC%A0%E4%B8%89&age=25&city=%E5%8C%97%E4%BA%AC
# (中文被自动编码了!这就是推荐 params 方式的原因)

print(response.json()['args'])
# {'age': '25', 'city': '北京', 'name': '张三'}

# ===== 传递列表参数(多值)=====
params_multi = {
    'ids': [1, 2, 3],     # 传递多个 id
    'tag': ['python', 'web']
}
response = requests.get('https://httpbin.org/get', params=params_multi)
print(response.url)
# ?ids=1&ids=2&ids=3&tag=python&tag=web

# ===== 实际应用:调用搜索 API =====
def search_github_repos(keyword, language='python', per_page=5):
    """搜索 GitHub 仓库"""
    url    = 'https://api.github.com/search/repositories'
    params = {
        'q':        f'{keyword} language:{language}',
        'sort':     'stars',
        'order':    'desc',
        'per_page': per_page
    }
    response = requests.get(url, params=params)

    if response.status_code == 200:
        data  = response.json()
        repos = data['items']
        print(f"找到 {data['total_count']} 个仓库,显示前{per_page}个:\n")
        for repo in repos:
            print(f"  ⭐ {repo['stargazers_count']:>8,}  {repo['full_name']}")
            print(f"     {repo['description']}\n")
    else:
        print(f"请求失败:{response.status_code}")

search_github_repos('web scraping')

3.3 Response 对象详解

import requests

response = requests.get('https://httpbin.org/json')

# ===== 基本信息 =====
print(f"状态码:    {response.status_code}")   # 200
print(f"是否成功:  {response.ok}")            # True
print(f"原因短语:  {response.reason}")        # 'OK'
print(f"最终 URL:  {response.url}")
print(f"响应耗时:  {response.elapsed.total_seconds():.3f}秒")

# ===== 响应头 =====
print(f"\n响应头:")
for key, value in response.headers.items():
    print(f"  {key}: {value}")

print(f"\nContent-Type:{response.headers.get('Content-Type')}")
print(f"Content-Length:{response.headers.get('Content-Length', '未知')}")

# ===== 响应内容(三种格式)=====

# 1. 文本格式(自动解码)
print(f"\n文本内容(前100字符):{response.text[:100]}")

# 2. JSON 格式(直接解析为 Python 字典/列表)
data = response.json()
print(f"\nJSON 数据:{data}")

# 3. 二进制格式(图片/文件等)
raw_bytes = response.content
print(f"\n二进制数据长度:{len(raw_bytes)} 字节")

# ===== 编码处理 =====
# requests 会自动检测编码,但有时需要手动指定
response2 = requests.get('https://www.baidu.com')
response2.encoding = 'utf-8'   # 手动指定编码
print(response2.text[:200])

# ===== 请求历史(重定向链)=====
response3 = requests.get('http://github.com')   # http 会跳转到 https
print(f"\n重定向链:")
for r in response3.history:
    print(f"  {r.status_code} → {r.url}")
print(f"最终:{response3.status_code} {response3.url}")

第四部分:POST 请求

4.1 发送表单数据(application/x-www-form-urlencoded)

import requests

# 模拟提交 HTML 表单
form_data = {
    'username': 'zhangsan',
    'password': '123456',
    'remember': 'true'
}

response = requests.post('https://httpbin.org/post', data=form_data)
result   = response.json()

print(f"状态码:{response.status_code}")
print(f"发送的表单数据:{result['form']}")
# {'password': '123456', 'remember': 'true', 'username': 'zhangsan'}

print(f"Content-Type:{result['headers']['Content-Type']}")
# application/x-www-form-urlencoded

4.2 发送 JSON 数据(application/json)

import requests

# 现代 REST API 通常使用 JSON 格式
json_data = {
    'title':   '学习 Python requests',
    'content': '今天学习了 requests 库的基本用法',
    'tags':    ['python', 'http', '学习'],
    'is_public': True,
    'views':   0
}

response = requests.post(
    'https://httpbin.org/post',
    json=json_data   # 用 json= 参数,自动设置 Content-Type: application/json
)

result = response.json()
print(f"发送的 JSON:{result['json']}")
print(f"Content-Type:{result['headers']['Content-Type']}")
# application/json

# ===== 对比:data= vs json= =====

# data=:手动把字典转JSON字符串,需要手动设置 Content-Type
import json
headers = {'Content-Type': 'application/json'}
response_manual = requests.post(
    'https://httpbin.org/post',
    data=json.dumps(json_data),
    headers=headers
)

# json=:自动序列化 + 自动设置 Content-Type(推荐!)
response_auto = requests.post(
    'https://httpbin.org/post',
    json=json_data
)

# 两者效果完全相同,推荐用 json= 参数

4.3 实战:调用 RESTful API

import requests

BASE_URL = 'https://jsonplaceholder.typicode.com'

# ===== GET:获取数据 =====
def get_post(post_id):
    response = requests.get(f'{BASE_URL}/posts/{post_id}')
    return response.json()

# ===== POST:创建数据 =====
def create_post(title, body, user_id=1):
    response = requests.post(
        f'{BASE_URL}/posts',
        json={'title': title, 'body': body, 'userId': user_id}
    )
    return response.json()

# ===== PUT:全量更新 =====
def update_post(post_id, title, body):
    response = requests.put(
        f'{BASE_URL}/posts/{post_id}',
        json={'id': post_id, 'title': title, 'body': body, 'userId': 1}
    )
    return response.json()

# ===== PATCH:局部更新 =====
def patch_post(post_id, **fields):
    response = requests.patch(
        f'{BASE_URL}/posts/{post_id}',
        json=fields   # 只发送需要更新的字段
    )
    return response.json()

# ===== DELETE:删除数据 =====
def delete_post(post_id):
    response = requests.delete(f'{BASE_URL}/posts/{post_id}')
    return response.status_code   # 成功返回 200

# 测试所有操作
post = get_post(1)
print(f"获取文章:{post['title']}")

new_post = create_post('测试标题', '这是内容')
print(f"创建文章,ID:{new_post['id']}")

updated = update_post(1, '新标题', '新内容')
print(f"更新文章:{updated['title']}")

patched = patch_post(1, title='只改标题')
print(f"局部更新:{patched['title']}")

code = delete_post(1)
print(f"删除文章,状态码:{code}")   # 200

第五部分:请求头(Headers)

5.1 为什么要设置请求头?

常见场景:
  ① 模拟浏览器(User-Agent):某些网站拒绝非浏览器请求
  ② 身份认证(Authorization):告诉服务器你是谁
  ③ 指定数据格式(Content-Type / Accept)
  ④ 防爬保护绕过(Referer)
  ⑤ 缓存控制(Cache-Control)

import requests

# ===== 设置自定义请求头 =====
headers = {
    # 模拟 Chrome 浏览器(最常用!防止被识别为爬虫)
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/120.0.0.0 Safari/537.36',

    # 告诉服务器我能接受 JSON 格式
    'Accept': 'application/json',

    # 告诉服务器我发送的是 JSON
    'Content-Type': 'application/json',

    # 防盗链(告诉服务器是从哪个页面过来的)
    'Referer': 'https://www.example.com',

    # 接受压缩数据(加快传输速度)
    'Accept-Encoding': 'gzip, deflate, br',

    # 接受的语言
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
}

response = requests.get('https://httpbin.org/headers', headers=headers)
print(response.json()['headers'])

# ===== 查看默认的请求头 =====
response2 = requests.get('https://httpbin.org/headers')
print(f"\n默认 User-Agent:{response2.json()['headers']['User-Agent']}")
# python-requests/2.31.0(requests 默认 UA)

# ===== 实用函数:创建常用请求头 =====
def get_browser_headers(referer=None):
    """返回模拟浏览器的标准请求头"""
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/120.0.0.0 Safari/537.36',
        'Accept':          'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'zh-CN,zh;q=0.9',
        'Accept-Encoding': 'gzip, deflate, br',
        'Connection':      'keep-alive',
    }
    if referer:
        headers['Referer'] = referer
    return headers

response3 = requests.get('https://www.example.com', headers=get_browser_headers())
print(f"\n状态码:{response3.status_code}")

5.2 Authorization 请求头(身份认证)

import requests

# ===== Bearer Token 认证(最常见于现代 API)=====
token = 'your_api_token_here'
headers = {
    'Authorization': f'Bearer {token}'
}
response = requests.get('https://httpbin.org/bearer', headers=headers)
# 或者:
response = requests.get(
    'https://httpbin.org/bearer',
    headers={'Authorization': f'Bearer {token}'}
)

# ===== API Key 认证(不同 API 传递方式不同)=====

# 方式1:放在请求头
api_key_headers = {'X-API-Key': 'my_api_key_123'}
requests.get('https://api.example.com/data', headers=api_key_headers)

# 方式2:放在查询参数
requests.get('https://api.example.com/data', params={'api_key': 'my_api_key_123'})

# 方式3:放在请求体(POST 时)
requests.post('https://api.example.com/data',
              json={'api_key': 'my_api_key_123', 'data': '...'})

第六部分:身份认证

6.1 Basic Auth(基本认证)

import requests
from requests.auth import HTTPBasicAuth

# 方式1:直接传元组(最简洁)
response = requests.get(
    'https://httpbin.org/basic-auth/user/passwd',
    auth=('user', 'passwd')
)
print(f"Basic Auth:{response.status_code} {response.json()}")

# 方式2:使用 HTTPBasicAuth 对象(显式更清晰)
response2 = requests.get(
    'https://httpbin.org/basic-auth/user/passwd',
    auth=HTTPBasicAuth('user', 'passwd')
)
print(f"HTTPBasicAuth:{response2.status_code}")

# 认证失败的情况(用错误的密码)
response3 = requests.get(
    'https://httpbin.org/basic-auth/user/passwd',
    auth=('user', 'wrong_password')
)
print(f"错误密码:{response3.status_code}")   # 401

6.2 Digest Auth 和 Token Auth

import requests
from requests.auth import HTTPDigestAuth

# Digest 认证(比 Basic 更安全)
response = requests.get(
    'https://httpbin.org/digest-auth/auth/user/passwd',
    auth=HTTPDigestAuth('user', 'passwd')
)
print(f"Digest Auth:{response.status_code}")

# 自定义 Token 认证(最常见于现代 API)
class TokenAuth(requests.auth.AuthBase):
    """自定义 Token 认证类"""

    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        # 在每个请求上自动添加 Authorization 头
        r.headers['Authorization'] = f'Bearer {self.token}'
        return r

# 使用自定义认证
token_auth = TokenAuth('my_access_token_xyz')
response = requests.get('https://httpbin.org/get', auth=token_auth)
print(response.json()['headers'].get('Authorization'))
# Bearer my_access_token_xyz

第七部分:文件上传与下载

7.1 上传文件

import requests

# ===== 方式1:上传单个文件 =====
with open('/path/to/image.jpg', 'rb') as f:
    response = requests.post(
        'https://httpbin.org/post',
        files={'file': f}
    )
print(response.json()['files'])

# ===== 方式2:指定文件名和 Content-Type =====
with open('/path/to/data.csv', 'rb') as f:
    response = requests.post(
        'https://httpbin.org/post',
        files={
            'file': ('custom_name.csv', f, 'text/csv')
            #         ↑文件名           ↑内容  ↑内容类型
        }
    )

# ===== 方式3:上传多个文件 =====
files = [
    ('images', ('photo1.jpg', open('photo1.jpg', 'rb'), 'image/jpeg')),
    ('images', ('photo2.jpg', open('photo2.jpg', 'rb'), 'image/jpeg')),
]
response = requests.post('https://httpbin.org/post', files=files)

# ===== 方式4:文件 + 表单数据同时上传 =====
with open('avatar.png', 'rb') as f:
    response = requests.post(
        'https://httpbin.org/post',
        files={'avatar': f},
        data={'username': '张三', 'bio': '这是简介'}  # 同时传表单数据
    )

print(response.json()['files'])  # 文件
print(response.json()['form'])   # 表单数据

# ===== 方式5:从内存上传(不需要本地文件)=====
import io
content = 'name,age\n张三,25'.encode('utf-8')   # CSV 内容(含换行和中文)
response = requests.post(
    'https://httpbin.org/post',
    files={'data': ('report.csv', io.BytesIO(content), 'text/csv')}
)

7.2 下载文件

import requests
import os

def download_file(url, save_path, chunk_size=8192):
    """
    下载文件到本地,支持大文件(流式下载)

    参数:
        url        - 下载链接
        save_path  - 本地保存路径
        chunk_size - 每次读取的块大小(字节)
    """
    response = requests.get(url, stream=True)  # stream=True:不立即下载全部内容
    response.raise_for_status()                # 状态码不是 2xx 时抛出异常

    # 获取文件总大小(不是所有服务器都提供)
    total_size = int(response.headers.get('Content-Length', 0))

    os.makedirs(os.path.dirname(save_path) or '.', exist_ok=True)

    downloaded = 0
    with open(save_path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:   # 过滤掉保持连接的空块
                f.write(chunk)
                downloaded += len(chunk)

                # 显示进度
                if total_size:
                    pct = downloaded / total_size * 100
                    bar = '█' * int(pct / 2)
                    print(f'\r  [{bar:<50}] {pct:.1f}%  '
                          f'{downloaded//1024}KB/{total_size//1024}KB',
                          end='', flush=True)

    print(f'\n✅ 下载完成:{save_path}')
    return save_path

# 下载一张图片
download_file(
    'https://httpbin.org/image/png',
    'downloaded_image.png'
)

# 下载一个文本文件
download_file(
    'https://raw.githubusercontent.com/psf/requests/main/README.md',
    'requests_readme.md'
)

7.3 下载图片(直接到内存)

import requests
from PIL import Image   # pip install Pillow
import io

def download_image(url):
    """下载图片,返回 PIL Image 对象(不保存到磁盘)"""
    response = requests.get(url)
    response.raise_for_status()

    image = Image.open(io.BytesIO(response.content))
    return image

# 使用
img = download_image('https://httpbin.org/image/jpeg')
print(f"图片尺寸:{img.size}")
print(f"图片格式:{img.format}")
img.save('downloaded.jpg')

第八部分:Session 会话

8.1 为什么要用 Session?

问题:HTTP 是无状态协议,每次请求都是独立的。
     登录后,下一次请求服务器不知道你已经登录!

解决:Session(会话)自动保持 Cookie,
     让每次请求都带着"已登录"的信息。

另一个好处:同一个 Session 复用 TCP 连接,减少开销,速度更快。

import requests

# ===== 不用 Session:每次请求独立 =====
# 登录后获取 Cookie,下次请求需要手动带上
r1 = requests.get('https://httpbin.org/cookies/set/user/zhangsan',
                  allow_redirects=False)  # 该接口会 302 跳转;禁止跳转才能在本次响应里看到 Set-Cookie
print(f"r1 cookies:{r1.cookies.get('user')}")  # zhangsan

r2 = requests.get('https://httpbin.org/cookies')
print(f"r2 cookies:{r2.json()}")  # {} ← 第二次请求没有 Cookie!

# ===== 用 Session:自动保持 Cookie =====
session = requests.Session()

r1 = session.get('https://httpbin.org/cookies/set/user/zhangsan')
print(f"r1 cookies:{session.cookies.get('user')}")   # zhangsan

r2 = session.get('https://httpbin.org/cookies')
print(f"r2 cookies:{r2.json()}")
# {'cookies': {'user': 'zhangsan'}}  ← Cookie 自动保持!

session.close()   # 用完记得关闭

8.2 Session 的完整用法

import requests

# ===== 方式1:手动管理(记得 close)=====
session = requests.Session()

# 设置全局请求头(每次请求都带上)
session.headers.update({
    'User-Agent': 'MyApp/1.0',
    'Accept':     'application/json',
})

# 设置全局认证(每次请求都带上)
session.auth = ('username', 'password')

# 设置全局参数
session.params = {'api_version': 'v2'}

# 用 Session 发送请求(和普通请求一样的方法)
r = session.get('https://httpbin.org/get')
print(r.json())

session.close()

# ===== 方式2:with 语句(推荐!自动关闭)=====
with requests.Session() as session:
    session.headers.update({'User-Agent': 'MyApp/1.0'})

    r1 = session.get('https://httpbin.org/get')
    r2 = session.post('https://httpbin.org/post', json={'data': 'test'})

    print(f"r1 状态:{r1.status_code}")
    print(f"r2 状态:{r2.status_code}")
# 退出 with 块后自动关闭

8.3 模拟登录全过程

import requests

def simulate_login():
    """
    模拟完整的网站登录流程:
    1. 获取登录页面(可能含 CSRF token)
    2. 提交登录表单
    3. 带着 Cookie 访问需要登录的页面
    """
    with requests.Session() as session:
        session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                          'AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36'
        })

        # 第1步:GET 登录页面(Session 自动保存服务器设置的 Cookie)
        print("1. 获取登录页面...")
        login_page = session.get('https://httpbin.org/cookies/set/csrf_token/abc123')
        print(f"   CSRF Token:{session.cookies.get('csrf_token')}")

        # 第2步:POST 登录请求(携带 CSRF token 和账号密码)
        print("2. 提交登录...")
        login_data = {
            'username':   'testuser',
            'password':   'testpass',
            'csrf_token': session.cookies.get('csrf_token', ''),
        }
        response = session.post(
            'https://httpbin.org/post',
            data=login_data
        )
        print(f"   登录状态:{response.status_code}")

        # 第3步:访问需要登录的页面
        print("3. 访问受保护页面...")
        protected = session.get('https://httpbin.org/cookies')
        print(f"   携带的 Cookie:{protected.json()['cookies']}")

        return session.cookies

simulate_login()

第九部分:超时、重试、代理

9.1 超时设置

import requests
from requests.exceptions import Timeout, ConnectionError

# ===== 设置超时(强烈建议!否则可能永远等下去)=====

# 连接超时 + 读取超时(元组形式)
# connect_timeout:等待服务器响应的最长时间
# read_timeout:读取响应数据的最长时间
try:
    response = requests.get(
        'https://httpbin.org/delay/5',   # 模拟5秒延迟
        timeout=(3, 5)                    # 连接超时3秒,读取超时5秒
    )
except Timeout:
    print("❌ 请求超时!")

# 统一设置(连接和读取用同一个超时值)
try:
    response = requests.get(
        'https://httpbin.org/delay/2',
        timeout=1   # 总共只等1秒
    )
except Timeout:
    print("❌ 1秒内没有响应")

# 不设超时(危险!程序可能卡死)
# response = requests.get('https://example.com')   # ❌ 没有 timeout

# ===== 建议的超时值 =====
TIMEOUT_FAST   = (3, 10)    # 快速接口:连接3秒,读取10秒
TIMEOUT_SLOW   = (5, 60)    # 慢接口(如文件上传)
TIMEOUT_STRICT = 5          # 严格限制:总共不超过5秒
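如果不想在每次调用时都手写 timeout=,一个常见做法是给 Session 包一层默认超时(TimeoutSession 是示意用的自定义类,并非 requests 自带):

```python
import requests

class TimeoutSession(requests.Session):
    """带默认超时的 Session:调用方没传 timeout 时自动补上"""

    def __init__(self, timeout=(3, 10)):
        super().__init__()
        self.default_timeout = timeout

    def request(self, method, url, **kwargs):
        # 只在调用方没显式传 timeout 时才使用默认值
        kwargs.setdefault('timeout', self.default_timeout)
        return super().request(method, url, **kwargs)

# 使用方式与普通 Session 完全相同:
# with TimeoutSession(timeout=(3, 10)) as s:
#     r = s.get('https://httpbin.org/get')   # 自动带上 timeout=(3, 10)
```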

9.2 自动重试(urllib3)

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_session_with_retry(
    retries=3,
    backoff_factor=0.5,
    status_forcelist=(500, 502, 503, 504)
):
    """
    创建带自动重试的 Session

    参数:
        retries          - 最大重试次数
        backoff_factor   - 退避因子(重试间隔 = backoff_factor * 2^(retry次数-1))
                           0.5 → 第1次0.5s,第2次1s,第3次2s
        status_forcelist - 遇到哪些状态码时触发重试
    """
    session = requests.Session()

    retry_strategy = Retry(
        total=retries,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
        allowed_methods=["GET", "POST"],   # 哪些方法允许重试
        raise_on_status=False
    )

    adapter = HTTPAdapter(max_retries=retry_strategy)

    # 对 http 和 https 都启用重试
    session.mount('https://', adapter)
    session.mount('http://',  adapter)

    return session

# 使用
session = create_session_with_retry(retries=3)

try:
    response = session.get('https://httpbin.org/status/503', timeout=5)
    print(f"状态码:{response.status_code}")
except Exception as e:
    print(f"多次重试后失败:{e}")

session.close()
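backoff_factor 的退避间隔可以按 urllib3(2.x)的公式 backoff_factor * 2**(n-1) 自己算一遍验证(backoff_delays 是演示用的辅助函数):

```python
def backoff_delays(retries: int, backoff_factor: float) -> list:
    """计算每次重试前的等待秒数:backoff_factor * 2**(n-1)"""
    return [backoff_factor * (2 ** (n - 1)) for n in range(1, retries + 1)]

print(backoff_delays(3, 0.5))   # [0.5, 1.0, 2.0]
print(backoff_delays(5, 1.0))   # [1.0, 2.0, 4.0, 8.0, 16.0]
```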

9.3 手动重试(更灵活)

import requests
import time
from typing import Optional

def request_with_retry(
    method: str,
    url: str,
    max_retries: int = 3,
    retry_delay: float = 1.0,
    timeout: float = 10.0,
    **kwargs
) -> Optional[requests.Response]:
    """
    带手动重试逻辑的请求函数

    会在以下情况重试:
    - 网络连接错误
    - 超时
    - 5xx 服务器错误
    """
    last_error = None

    for attempt in range(1, max_retries + 1):
        try:
            response = requests.request(method, url, timeout=timeout, **kwargs)

            if response.status_code < 500:
                return response

            # 5xx 错误,等待后重试
            print(f"  [第{attempt}次] 服务器错误 {response.status_code},{retry_delay}s 后重试...")

        except requests.exceptions.ConnectionError as e:
            print(f"  [第{attempt}次] 连接错误:{e},{retry_delay}s 后重试...")
            last_error = e
        except requests.exceptions.Timeout as e:
            print(f"  [第{attempt}次] 超时,{retry_delay}s 后重试...")
            last_error = e

        if attempt < max_retries:
            time.sleep(retry_delay * (2 ** (attempt - 1)))  # 指数退避

    raise RuntimeError(f"请求失败,已重试{max_retries}次。最后错误:{last_error}")

# 测试
try:
    response = request_with_retry('GET', 'https://httpbin.org/get')
    print(f"成功:{response.status_code}")
except RuntimeError as e:
    print(f"最终失败:{e}")

9.4 代理设置

import requests

# ===== 设置 HTTP/HTTPS 代理 =====
proxies = {
    'http':  'http://proxy_host:8080',
    'https': 'http://proxy_host:8080',
}

response = requests.get('https://httpbin.org/ip', proxies=proxies)
print(f"通过代理后的 IP:{response.json()['origin']}")

# ===== 带认证的代理 =====
proxies_auth = {
    'http':  'http://username:password@proxy_host:8080',
    'https': 'http://username:password@proxy_host:8080',
}

# ===== SOCKS 代理(需要安装 requests[socks])=====
# pip install requests[socks]
proxies_socks = {
    'http':  'socks5://127.0.0.1:1080',
    'https': 'socks5://127.0.0.1:1080',
}

# ===== 在 Session 中设置全局代理 =====
with requests.Session() as session:
    session.proxies.update(proxies)
    r = session.get('https://httpbin.org/ip')
    print(r.json())

9.5 SSL 证书

import requests

# 默认:验证 SSL 证书(安全)
response = requests.get('https://www.baidu.com')   # 正常

# 跳过 SSL 验证(开发/测试时用,生产不推荐)
response = requests.get('https://self-signed.badssl.com/', verify=False)
# 会出现 InsecureRequestWarning 警告

# 静默忽略警告(不推荐在生产使用)
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
response = requests.get('https://self-signed.badssl.com/', verify=False)

# 指定自定义 CA 证书文件
response = requests.get('https://internal-server.com',
                         verify='/path/to/ca-bundle.crt')

# 客户端证书(双向 TLS)
response = requests.get('https://server.com',
                         cert=('/path/to/client.crt', '/path/to/client.key'))

第十部分:错误处理

10.1 异常体系

requests 的异常继承关系:

IOError
└── requests.exceptions.RequestException  ← 所有 requests 异常的基类
    ├── ConnectionError                    ← 网络连接问题
    │   ├── ProxyError                     ← 代理错误
    │   └── SSLError                       ← SSL 证书错误
    ├── Timeout                            ← 超时
    │   ├── ConnectTimeout                 ← 连接超时
    │   └── ReadTimeout                    ← 读取超时
    ├── URLRequired                        ← URL 无效
    ├── TooManyRedirects                   ← 重定向太多
    ├── MissingSchema                      ← URL 缺少协议头
    ├── InvalidSchema                      ← 不支持的协议
    └── HTTPError                          ← HTTP 错误(由 raise_for_status 抛出)
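这棵继承树可以用 issubclass 直接验证;值得注意的是,ConnectTimeout 实际上同时继承了 ConnectionError 和 Timeout,所以两种 except 都能捕获它:

```python
import requests.exceptions as exc

# 用 issubclass 本地验证继承关系,不需要发请求
assert issubclass(exc.RequestException, IOError)        # 基类继承自 IOError
assert issubclass(exc.ProxyError, exc.ConnectionError)
assert issubclass(exc.SSLError, exc.ConnectionError)
assert issubclass(exc.ReadTimeout, exc.Timeout)
assert issubclass(exc.HTTPError, exc.RequestException)

# ConnectTimeout 是多继承:既是连接错误,也是超时
assert issubclass(exc.ConnectTimeout, exc.ConnectionError)
assert issubclass(exc.ConnectTimeout, exc.Timeout)
print('继承关系验证通过')
```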

10.2 完整的错误处理

import requests
from requests.exceptions import (
    RequestException,
    ConnectionError,
    Timeout,
    HTTPError,
    TooManyRedirects,
    MissingSchema,
)

def safe_get(url, **kwargs):
    """
    带完整错误处理的 GET 请求

    返回:Response 对象,失败时返回 None
    """
    try:
        response = requests.get(url, timeout=(5, 30), **kwargs)

        # 检查 HTTP 错误(4xx, 5xx)
        # 不报错时什么都不做,有错误时抛出 HTTPError
        response.raise_for_status()

        return response

    except MissingSchema:
        print(f"❌ URL 格式错误(缺少 http:// 或 https://):{url}")

    except ConnectionError:
        print(f"❌ 无法连接到服务器:{url}")
        print("   可能原因:网络断开、DNS 解析失败、服务器宕机")

    except Timeout:
        print(f"❌ 请求超时:{url}")

    except TooManyRedirects:
        print(f"❌ 重定向次数过多:{url}")

    except HTTPError as e:
        print(f"❌ HTTP 错误:{e.response.status_code}")
        if e.response.status_code == 401:
            print("   原因:未认证,需要登录或提供 API Key")
        elif e.response.status_code == 403:
            print("   原因:无权限访问该资源")
        elif e.response.status_code == 404:
            print("   原因:资源不存在")
        elif e.response.status_code == 429:
            print("   原因:请求太频繁,触发了限速")
        elif e.response.status_code >= 500:
            print("   原因:服务器内部错误")

    except RequestException as e:
        print(f"❌ 请求异常:{e}")

    return None

# 测试各种错误
test_urls = [
    'https://httpbin.org/get',               # 正常
    'not_a_url',                              # URL 格式错误
    'https://httpbin.org/status/404',        # 404
    'https://httpbin.org/status/500',        # 500
    'https://this-domain-does-not-exist-xyz.com',  # 连接失败
]

for url in test_urls:
    print(f"\n测试:{url}")
    result = safe_get(url)
    if result:
        print(f"✅ 成功,状态码:{result.status_code}")

10.3 raise_for_status() 的用法

import requests

# raise_for_status():遇到 4xx/5xx 时抛出 HTTPError,否则什么都不做
response = requests.get('https://httpbin.org/status/404')
print(f"状态码:{response.status_code}")   # 404

try:
    response.raise_for_status()
    print("请求成功")   # 不会到这里
except requests.exceptions.HTTPError as e:
    print(f"HTTP错误:{e}")
    # HTTPError: 404 Client Error: NOT FOUND for url: ...

# ⚠️ 链式写法的陷阱:raise_for_status() 返回 None
data = requests.get('https://httpbin.org/json').raise_for_status()
print(data)   # None ← 拿不到响应对象,更取不到 json!

# 正确的链式写法
response = requests.get('https://httpbin.org/json')
response.raise_for_status()
data = response.json()
print(data)

第十一部分:综合实战案例

11.1 案例一:天气查询(OpenWeatherMap API)

import requests
from datetime import datetime

class WeatherClient:
    """
    OpenWeatherMap API 客户端

    注册地址:https://openweathermap.org/api
    免费账号可以调用 Current Weather API
    """

    BASE_URL = 'https://api.openweathermap.org/data/2.5'

    def __init__(self, api_key: str):
        self.api_key = api_key
        self.session = requests.Session()
        self.session.params = {'appid': api_key, 'units': 'metric', 'lang': 'zh_cn'}

    def get_current(self, city: str) -> dict:
        """获取当前天气"""
        response = self.session.get(
            f'{self.BASE_URL}/weather',
            params={'q': city}
        )
        response.raise_for_status()
        return response.json()

    def get_forecast(self, city: str, days: int = 5) -> list:
        """获取未来N天天气预报"""
        response = self.session.get(
            f'{self.BASE_URL}/forecast',
            params={'q': city, 'cnt': days * 8}   # 每3小时一条,8条=1天
        )
        response.raise_for_status()
        return response.json()['list']

    def format_weather(self, data: dict) -> str:
        """格式化天气数据为可读文本"""
        city       = data['name']
        country    = data['sys']['country']
        temp       = data['main']['temp']
        feels_like = data['main']['feels_like']
        humidity   = data['main']['humidity']
        desc       = data['weather'][0]['description']
        wind_speed = data['wind']['speed']

        return (f"📍 {city}, {country}\n"
                f"🌡️  温度:{temp:.1f}°C(体感 {feels_like:.1f}°C)\n"
                f"🌤️  天气:{desc}\n"
                f"💧 湿度:{humidity}%\n"
                f"💨 风速:{wind_speed} m/s")

    def close(self):
        self.session.close()

# 使用示例
# client = WeatherClient('your_api_key_here')
# try:
#     weather = client.get_current('Beijing')
#     print(client.format_weather(weather))
# finally:
#     client.close()

11.2 案例二:通用 API 客户端基类

import requests
import logging
import time
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)

class BaseAPIClient:
    """
    通用 REST API 客户端基类
    封装了常见的功能:认证、重试、错误处理、日志
    """

    def __init__(
        self,
        base_url: str,
        api_key: Optional[str] = None,
        timeout: tuple = (5, 30),
        max_retries: int = 3
    ):
        self.base_url   = base_url.rstrip('/')
        self.timeout    = timeout
        self.max_retries = max_retries

        self._session = requests.Session()
        self._session.headers.update({
            'Accept':       'application/json',
            'Content-Type': 'application/json',
            'User-Agent':   'PythonAPIClient/1.0',
        })

        if api_key:
            self._session.headers['Authorization'] = f'Bearer {api_key}'

    def _request(
        self,
        method: str,
        endpoint: str,
        **kwargs
    ) -> requests.Response:
        """底层请求方法,含重试逻辑"""
        url = f"{self.base_url}/{endpoint.lstrip('/')}"

        for attempt in range(1, self.max_retries + 1):
            try:
                start    = time.perf_counter()
                response = self._session.request(
                    method, url, timeout=self.timeout, **kwargs
                )
                elapsed = time.perf_counter() - start

                logger.debug(
                    f"{method.upper()} {url} → "
                    f"{response.status_code} ({elapsed:.3f}s)"
                )

                response.raise_for_status()
                return response

            except requests.exceptions.HTTPError as e:
                if e.response.status_code < 500:
                    raise   # 4xx 客户端错误,不重试
                if attempt == self.max_retries:
                    raise
                wait = 2 ** (attempt - 1)
                logger.warning(f"服务器错误,{wait}s 后重试(第{attempt}次)...")
                time.sleep(wait)

            except (requests.exceptions.ConnectionError,
                    requests.exceptions.Timeout) as e:
                if attempt == self.max_retries:
                    raise
                wait = 2 ** (attempt - 1)
                logger.warning(f"网络错误({e}),{wait}s 后重试...")
                time.sleep(wait)

    def get(self, endpoint: str, params: Dict = None, **kwargs) -> Any:
        """GET 请求,自动解析 JSON"""
        response = self._request('GET', endpoint, params=params, **kwargs)
        return response.json() if response.content else None

    def post(self, endpoint: str, data: Any = None, **kwargs) -> Any:
        """POST 请求"""
        response = self._request('POST', endpoint, json=data, **kwargs)
        return response.json() if response.content else None

    def put(self, endpoint: str, data: Any = None, **kwargs) -> Any:
        """PUT 请求"""
        response = self._request('PUT', endpoint, json=data, **kwargs)
        return response.json() if response.content else None

    def patch(self, endpoint: str, data: Any = None, **kwargs) -> Any:
        """PATCH 请求"""
        response = self._request('PATCH', endpoint, json=data, **kwargs)
        return response.json() if response.content else None

    def delete(self, endpoint: str, **kwargs) -> bool:
        """DELETE 请求,返回是否成功"""
        response = self._request('DELETE', endpoint, **kwargs)
        return response.status_code in (200, 204)

    def close(self):
        self._session.close()

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()

# 基于基类实现具体客户端
class TodoAPIClient(BaseAPIClient):
    """JSONPlaceholder Todo API 客户端"""

    def __init__(self):
        super().__init__('https://jsonplaceholder.typicode.com')

    def list_todos(self, user_id: int = None, completed: bool = None):
        params = {}
        if user_id   is not None: params['userId']    = user_id
        if completed is not None: params['completed'] = completed
        return self.get('/todos', params=params)

    def get_todo(self, todo_id: int):
        return self.get(f'/todos/{todo_id}')

    def create_todo(self, title: str, user_id: int = 1):
        return self.post('/todos', {'title': title, 'completed': False, 'userId': user_id})

    def complete_todo(self, todo_id: int):
        return self.patch(f'/todos/{todo_id}', {'completed': True})

    def delete_todo(self, todo_id: int):
        return self.delete(f'/todos/{todo_id}')

# 使用
with TodoAPIClient() as client:
    todos = client.list_todos(user_id=1, completed=False)
    print(f"未完成的 Todo:{len(todos)} 条")

    todo = client.get_todo(1)
    print(f"第1条:{todo['title']}")

    new = client.create_todo('学习 requests 库')
    print(f"新建 Todo ID:{new['id']}")

    ok = client.complete_todo(1)
    print(f"标记完成:{ok}")

11.3 案例三:网页内容抓取

import requests
from html.parser import HTMLParser
import re

class LinkParser(HTMLParser):
    """简单的 HTML 链接解析器"""
    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ''
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)
        if tag == 'title':
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

def scrape_page(url: str) -> dict:
    """
    抓取网页基本信息

    返回:{title, links, word_count, status_code}
    """
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                      'AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36',
        'Accept-Language': 'zh-CN,zh;q=0.9',
    }

    try:
        response = requests.get(url, headers=headers, timeout=(5, 15))
        response.raise_for_status()
        response.encoding = response.apparent_encoding   # 自动检测编码

        html = response.text

        # 解析 HTML
        parser = LinkParser()
        parser.feed(html)

        # 提取纯文本(去掉 HTML 标签)
        text       = re.sub(r'<[^>]+>', '', html)
        text       = re.sub(r'\s+', ' ', text).strip()
        word_count = len(text)

        return {
            'url':        url,
            'status':     response.status_code,
            'title':      parser.title.strip(),
            'links':      [l for l in parser.links if l.startswith('http')],
            'word_count': word_count,
            'size':       len(response.content),
        }

    except requests.exceptions.RequestException as e:
        return {'url': url, 'error': str(e)}

# 测试
result = scrape_page('https://www.python.org')

print(f"URL:   {result['url']}")
print(f"标题:  {result.get('title', 'N/A')}")
print(f"状态:  {result.get('status', 'N/A')}")
print(f"大小:  {result.get('size', 0) / 1024:.1f} KB")
print(f"字符数:{result.get('word_count', 0):,}")
print(f"外链数:{len(result.get('links', []))}")
if result.get('links'):
    print("前5个链接:")
    for link in result['links'][:5]:
        print(f"  {link}")

第十二部分:常见陷阱与最佳实践

12.1 陷阱1:忘记设置超时

import requests

# ❌ 没有 timeout:如果服务器不响应,程序会永远等待!
response = requests.get('https://httpbin.org/delay/99')

# ✅ 正确:永远要设置 timeout
response = requests.get('https://httpbin.org/get', timeout=(3, 10))
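超时触发时 requests 会抛出 requests.exceptions.Timeout。下面是一个带超时保护的封装草稿(safe_get 是假设的函数名,10.255.255.1 是一个不可路由的保留地址,仅用于演示失败路径):

```python
import requests

def safe_get(url, connect_timeout=3, read_timeout=10):
    """带超时保护的 GET:超时或连接失败时返回 None,而不是让程序挂死/崩溃"""
    try:
        return requests.get(url, timeout=(connect_timeout, read_timeout))
    except requests.exceptions.Timeout:
        print(f"请求超时:{url}")
        return None
    except requests.exceptions.ConnectionError:
        print(f"连接失败:{url}")
        return None

# 10.255.255.1 不可路由,连接必然失败,用来演示超时/失败分支
result = safe_get('https://10.255.255.1/', connect_timeout=1)
print(result)   # None
```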

12.2 陷阱2:每次请求都创建新 Session(低效)

import requests

# ❌ 每次都新建连接,效率低
for url in urls:
    r = requests.get(url)   # 每次都是新的 TCP 连接

# ✅ 用 Session 复用连接(省去重复的 TCP/TLS 握手,批量请求明显更快)
with requests.Session() as session:
    for url in urls:
        r = session.get(url)   # 复用 TCP 连接
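Session 的连接复用由内部挂载的 HTTPAdapter 连接池实现。下面的草稿演示如何调大连接池、以及在 Session 上设置全局请求头(pool 参数的数值仅为示例,按实际并发量调整):

```python
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()

# Session 内部用 HTTPAdapter 维护连接池;高并发时可以调大池子(数值仅为示例)
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=20)
session.mount('https://', adapter)
session.mount('http://', adapter)

# 在 Session 上设置的请求头会作用于之后的每一次请求
session.headers.update({'User-Agent': 'my-app/1.0'})
print(session.headers['User-Agent'])   # my-app/1.0
```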

12.3 陷阱3:直接访问 response.json() 而不检查状态码

import requests

# ❌ 如果请求失败(如返回 404 HTML),json() 会报错
response = requests.get('https://httpbin.org/status/404')
data = response.json()   # JSONDecodeError!404 返回的是 HTML,不是 JSON

# ✅ 先检查状态码
response = requests.get('https://httpbin.org/status/404')
if response.status_code == 200:
    data = response.json()
else:
    print(f"请求失败:{response.status_code}")

# ✅ 或者用 raise_for_status
try:
    response = requests.get('https://httpbin.org/json')
    response.raise_for_status()
    data = response.json()
except requests.HTTPError:
    data = None
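另一种防御方式是先检查响应头里的 Content-Type,确认服务器确实返回了 JSON 再解析。下面是一个示意性的辅助函数(parse_json_safely 是假设的名字):

```python
import requests

def parse_json_safely(response):
    """仅当响应声明为 JSON 时才解析;类型不符或解析失败都返回 None"""
    content_type = response.headers.get('Content-Type', '')
    if 'application/json' not in content_type:
        return None
    try:
        return response.json()
    except ValueError:   # json() 解析失败抛出的异常是 ValueError 的子类
        return None

# 用法:data = parse_json_safely(requests.get(url))
# data 为 None 即表示响应内容不是合法 JSON
```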

12.4 陷阱4:中文 URL 没有编码

import requests
from urllib.parse import quote

# ❌ 中文 URL 可能导致编码问题
url = 'https://www.example.com/搜索?q=Python教程'

# ✅ 用 params 参数(requests 自动编码)
response = requests.get(
    'https://www.example.com/搜索',
    params={'q': 'Python教程'}
)

# ✅ 或者手动编码
url = 'https://www.example.com/' + quote('搜索') + '?q=' + quote('Python教程')
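想确认 requests 到底生成了什么 URL,可以只构造 PreparedRequest 而不真正发送请求(example.com 仅为示例域名):

```python
import requests

# 只准备请求、不发送,观察中文被编码成了什么
prepared = requests.Request(
    'GET',
    'https://www.example.com/搜索',
    params={'q': 'Python教程'},
).prepare()

print(prepared.url)
# https://www.example.com/%E6%90%9C%E7%B4%A2?q=Python%E6%95%99%E7%A8%8B
```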

12.5 陷阱5:Stream 下载时忘记迭代

import requests

# ❌ stream=True 但没有迭代,内容不会自动下载
response = requests.get('https://httpbin.org/image/png', stream=True)
# 此时只下载了响应头,没有下载正文!
content = response.content   # 这里才会触发下载,但 stream=True 的优势消失了

# ✅ 正确:stream=True 配合 iter_content
response = requests.get('https://httpbin.org/image/png', stream=True)
with open('image.png', 'wb') as f:
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            f.write(chunk)
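还可以把流式下载封装成函数,并用 with 管理响应对象,确保即使中途出错,连接也能释放回连接池(download_file 是示意性的封装):

```python
import requests

def download_file(url: str, dest: str, chunk_size: int = 8192) -> int:
    """流式下载 url 到本地文件 dest,返回写入的字节数"""
    # 用 with 管理响应:即使迭代中途抛异常,连接也会被正确释放
    with requests.get(url, stream=True, timeout=(5, 30)) as response:
        response.raise_for_status()
        total = 0
        with open(dest, 'wb') as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                if chunk:               # 过滤 keep-alive 产生的空块
                    f.write(chunk)
                    total += len(chunk)
        return total

# 用法:size = download_file('https://httpbin.org/image/png', 'image.png')
```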

第十三部分:完整速查表

📌 发送请求
  requests.get(url, params=, headers=, timeout=, auth=)
  requests.post(url, data=, json=, files=, headers=, timeout=)
  requests.put(url, json=, headers=, timeout=)
  requests.patch(url, json=, headers=, timeout=)
  requests.delete(url, headers=, timeout=)
  requests.head(url, headers=, timeout=)
  requests.request(method, url, ...)    ← 通用方法

📌 Response 对象
  .status_code       → 状态码(200, 404...)
  .ok                → 状态码 < 400 时为 True
  .reason            → 状态码描述('OK', 'Not Found')
  .text              → 响应文本(自动解码)
  .content           → 响应二进制内容
  .json()            → 解析 JSON,返回 dict/list
  .headers           → 响应头字典
  .cookies           → Cookie 字典
  .url               → 最终 URL(重定向后)
  .encoding          → 编码(可手动设置)
  .apparent_encoding → 自动检测的编码
  .elapsed           → 响应耗时(timedelta)
  .history           → 重定向历史
  .raise_for_status()→ 4xx/5xx 时抛出 HTTPError
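这些属性可以不联网就演示一遍:手工构造一个 Response 对象即可(注意 _content 是内部属性,真实请求中由 requests 自动填充,这里仅为离线演示):

```python
import requests

# 离线演示 Response 的常用属性
resp = requests.models.Response()
resp.status_code = 200
resp.url = 'https://api.example.com/data'
resp.headers['Content-Type'] = 'application/json'
resp.encoding = 'utf-8'
resp._content = b'{"name": "test", "count": 3}'   # 内部属性,仅用于演示

print(resp.ok)           # True:状态码 < 400
print(resp.status_code)  # 200
print(resp.text)         # {"name": "test", "count": 3}
print(resp.json())       # {'name': 'test', 'count': 3}
resp.raise_for_status()  # 200 时什么都不做;4xx/5xx 会抛 HTTPError
```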

📌 请求参数
  params=   → URL 查询参数(dict)
  data=     → 表单数据(dict)或字节
  json=     → JSON 数据(dict/list,自动序列化)
  files=    → 上传文件(dict)
  headers=  → 请求头(dict)
  cookies=  → Cookie(dict)
  auth=     → 认证(元组 或 Auth 对象)
  timeout=  → 超时(秒数 或 (connect, read) 元组)
  proxies=  → 代理(dict)
  verify=   → SSL 验证(True/False/证书路径)
  stream=   → 是否流式下载
  allow_redirects= → 是否允许重定向

📌 Session
  session = requests.Session()
  session.headers.update({})    → 设置全局请求头
  session.auth = (user, pwd)    → 设置全局认证
  session.cookies                → Cookie Jar
  session.params                 → 全局查询参数
  session.proxies                → 全局代理
  session.close()                → 关闭

📌 异常
  RequestException               → 所有异常的基类
  ConnectionError                → 网络连接错误
  Timeout / ConnectTimeout / ReadTimeout
  HTTPError                      → raise_for_status() 触发
  TooManyRedirects               → 重定向超限
  MissingSchema / InvalidSchema  → URL 格式问题
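这些异常都继承自 RequestException,所以一个 except requests.RequestException 就能统一兜底。继承关系可以直接验证:

```python
import requests

# 所有 requests 异常的基类都是 RequestException
for exc in (requests.ConnectionError, requests.Timeout,
            requests.exceptions.ConnectTimeout, requests.exceptions.ReadTimeout,
            requests.HTTPError, requests.TooManyRedirects):
    print(exc.__name__, issubclass(exc, requests.RequestException))   # 全部为 True

# ConnectTimeout 同时继承 ConnectionError 和 Timeout,
# 所以多个 except 子句的先后顺序会决定它被哪个分支捕获
print(issubclass(requests.exceptions.ConnectTimeout, requests.ConnectionError))  # True
print(issubclass(requests.exceptions.ConnectTimeout, requests.Timeout))          # True
```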

总结

学完本章,你应该掌握:

  1. HTTP 基础:请求/响应结构、HTTP 方法、状态码的含义
  2. GET 请求:params 参数自动编码、Response 对象的所有属性
  3. POST 请求:data=(表单)vs json=(JSON API)的区别
  4. 请求头:User-Agent 模拟浏览器、Authorization 认证
  5. 文件操作:files= 上传、stream=True 流式下载大文件
  6. Session:保持 Cookie、复用连接提升性能
  7. 超时与重试:timeout=(5,30) 防止卡死、自动重试应对网络抖动
  8. 错误处理:raise_for_status() + 捕获各类异常
  9. 身份认证:Basic Auth、Token Auth 的使用方式

最常用的模式:

import requests

# 简单请求
response = requests.get('https://api.example.com/data',
                         params={'key': 'value'},
                         headers={'Authorization': 'Bearer token'},
                         timeout=(5, 30))
response.raise_for_status()
data = response.json()

# 发送 JSON
response = requests.post('https://api.example.com/create',
                          json={'name': '张三', 'age': 25},
                          headers={'Authorization': 'Bearer token'},
                          timeout=(5, 30))
response.raise_for_status()
result = response.json()

# Session(多次请求)
with requests.Session() as session:
    session.headers['Authorization'] = 'Bearer token'
    data1 = session.get('https://api.example.com/endpoint1', timeout=10).json()
    data2 = session.get('https://api.example.com/endpoint2', timeout=10).json()
