Python Web Scraping in Practice: Scraping Weibo Comments with the requests Module


Preface

This article uses Python to scrape Weibo comment data. Without further ado, let's get started!

Development tools

Python version: 3.6.4

Related modules:

requests module;

re module;

pandas module;

lxml module;

random module;

and some modules that ship with Python.

Environment setup

Install Python and add it to the PATH environment variable, then install the required modules with pip.
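Before running the scraper, it can help to confirm that the modules listed above are actually importable. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical and not part of the original article.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the modules listed above; anything printed here still needs `pip install`.
print(missing_modules(["requests", "re", "pandas", "lxml", "random"]))
```

An empty list means everything is in place. Note that pip package names and import names can differ for other libraries, although for the ones used here they match.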

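Approach

This article uses the Weibo trending topic "亨利手写道歉信" (Henry's handwritten apology letter) as an example to explain how to scrape Weibo comments.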

Address of the comment page to scrape

https://m.weibo.cn/detail/4669040301182509

Page analysis

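Weibo comments are loaded dynamically. Open the browser's developer tools and scroll down the page; the requests that carry the comment data we need will show up in the Network panel.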

Getting the real URL

https://m.weibo.cn/comments/hotflow?id=4669040301182509&mid=4669040301182509&max_id_type=0

https://m.weibo.cn/comments/hotflow?id=4669040301182509&mid=4669040301182509&max_id=3698934781006193&max_id_type=0

The difference between the two URLs is obvious: the first one has no max_id parameter, and max_id only appears from the second URL onward. Its value is the max_id returned in the previous response. In both examples, max_id_type=0.
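As a sketch (not the author's code), both URL shapes above can be produced by one hypothetical helper, `hotflow_url`, which simply leaves out max_id on the first request. It relies on `urllib.parse.urlencode` from the standard library, which keeps parameters in dictionary insertion order.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

HOTFLOW = "https://m.weibo.cn/comments/hotflow"


def hotflow_url(weibo_id: str, max_id: Optional[int] = None, max_id_type: int = 0) -> str:
    """Build a comment-page URL; max_id is omitted for the first page."""
    params: Dict[str, object] = {"id": weibo_id, "mid": weibo_id}
    if max_id is not None:
        # Only pages after the first carry the max_id taken from the previous response
        params["max_id"] = max_id
    params["max_id_type"] = max_id_type
    return f"{HOTFLOW}?{urlencode(params)}"


print(hotflow_url("4669040301182509"))
print(hotflow_url("4669040301182509", max_id=3698934781006193))
```

The two printed URLs reproduce the first and second URLs captured in the browser.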

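One thing to note, however, is the max_id_type parameter: it can change between pages, so we also need to read max_id_type from each response rather than hard-coding 0.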
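The paging rule described above (carry both max_id and max_id_type forward from each response) can be sketched as a generator that is independent of any HTTP library. Here `fetch` is a hypothetical callable: in a real run it would request the hotflow URL with the given max_id and max_id_type and return the response's `data` object. Treating a max_id of 0 (or a missing one) as the last page is an assumption, not something the original article states; `max_pages` is an extra safety limit.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_comments(
    fetch: Callable[[Optional[int], int], Optional[Page]],
    max_pages: int = 50,
) -> Iterator[Dict[str, Any]]:
    """Yield comments page by page, threading max_id and max_id_type through."""
    max_id: Optional[int] = None  # the first request has no max_id
    max_id_type = 0
    for _ in range(max_pages):
        page = fetch(max_id, max_id_type)
        if not page:
            break
        yield from page.get("data", [])
        # Both values for the next request come from this response
        max_id = page.get("max_id")
        max_id_type = page.get("max_id_type", 0)
        if not max_id:  # assumption: 0 or missing max_id marks the last page
            break
```

Keeping the network call outside the generator makes the paging logic easy to check against canned responses before pointing it at the live site.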

Code implementation

```python
import re
import time
import random

import requests
import pandas as pd

WEIBO_ID = "4669040301182509"
UA = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
      "Chrome/38.0.2125.122 UBrowser/4.0.3214.0 Safari/537.36")

rows = []
try:
    a = 1
    while True:
        header = {"User-Agent": UA}
        # Weibo tends to block the crawler after a few dozen pages; refreshing
        # the cookies on every iteration keeps the crawler running longer...
        response = requests.get(f"https://m.weibo.cn/detail/{WEIBO_ID}", headers=header, timeout=10)
        # Collect the cookie values with a list comprehension
        # (the indexes below depend on the order in which the server sets them)
        cookie = [c.value for c in response.cookies]
        headers = {
            # Cookie: fill in SUB with the value from your own logged-in session
            "Cookie": (f"WEIBOCN_FROM={cookie[3]}; SUB=; _T_WM={cookie[4]}; MLOGIN={cookie[1]}; "
                       f"M_WEIBOCN_PARAMS={cookie[2]}; XSRF-TOKEN={cookie[0]}"),
            "Referer": f"https://m.weibo.cn/detail/{WEIBO_ID}",
            "User-Agent": UA,
        }
        if a == 1:
            url = f"https://m.weibo.cn/comments/hotflow?id={WEIBO_ID}&mid={WEIBO_ID}&max_id_type=0"
        else:
            url = (f"https://m.weibo.cn/comments/hotflow?id={WEIBO_ID}&mid={WEIBO_ID}"
                   f"&max_id={max_id}&max_id_type={max_id_type}")
        html = requests.get(url=url, headers=headers, timeout=10).json()
        data = html["data"]
        # Read max_id and max_id_type for the next URL
        max_id = data["max_id"]
        max_id_type = data["max_id_type"]
        for i in data["data"]:
            screen_name = i["user"]["screen_name"]
            i_d = i["user"]["id"]
            like_count = i["like_count"]  # likes
            created_at = i["created_at"]  # time
            text = re.sub(r"<[^>]*>", "", i["text"])  # comment text with HTML tags stripped
            print(text)
            rows.append({"screen_name": screen_name, "i_d": i_d, "like_count": like_count,
                         "created_at": created_at, "text": text})
        time.sleep(random.uniform(2, 7))  # random pause between pages
        a += 1
except Exception as e:
    # The loop ends here when a request fails or a response no longer contains "data"
    print(e)

df = pd.DataFrame(rows)
df.to_csv("微博.csv", encoding="utf-8", mode="a", index=False)
```
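The field extraction inside the loop can also be pulled out into a pure function, which makes it testable without touching the network. This is a sketch, not the author's code: `parse_comments` is a hypothetical name, it assumes the response layout used in the script above (`html["data"]["data"]` holding the comment list), and it uses `.get` so that a missing field yields `None` instead of aborting the whole run.

```python
import re
from typing import Any, Dict, List

TAG_RE = re.compile(r"<[^>]*>")  # same tag-stripping pattern as the script


def parse_comments(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one hotflow JSON response into a list of flat comment rows."""
    rows = []
    for item in payload.get("data", {}).get("data", []):
        user = item.get("user", {})
        rows.append({
            "screen_name": user.get("screen_name"),
            "i_d": user.get("id"),
            "like_count": item.get("like_count"),  # likes
            "created_at": item.get("created_at"),  # time
            "text": TAG_RE.sub("", item.get("text", "")),  # comment without HTML tags
        })
    return rows
```

In the script, `rows.extend(parse_comments(html))` could replace the inner `for` loop, and `pd.DataFrame(rows)` turns the result into a table. Note that stripping tags also drops emoji that Weibo renders as `<img>` elements.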
