Python WeChat functions: Python and WeChat

Reading WeChat storage with Python

Getting the paths of WeChat user data, videos, and files with Python


Python source:

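import os

def wx_path():
    # Get the current user's home path (for example \Users\name)
    users = os.path.expandvars('$HOMEPATH')
    # Locate the 3ebffe94.ini config file
    config = 'C:' + users + r'\AppData\Roaming\Tencent\WeChat\All Users\config\3ebffe94.ini'
    # Read the file and put the resulting path into wx_location
    with open(config) as f:
        content = f.read().strip()
    if content == 'MyDocument:':
        wx_location = 'C:' + users + r'\Documents\WeChat Files'
    else:
        wx_location = content + r'\WeChat Files'
    # Print the path
    print(f'\nWeChat user data path: "{wx_location}"\n')
    return wx_location

# Program entry point
if __name__ == '__main__':
    wx_path()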
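Once the WeChat Files folder is known, the video and file paths mentioned above can be collected by walking its account sub-folders. The following is only a sketch: the function name list_account_media is hypothetical, and the FileStorage\File, FileStorage\Video and FileStorage\Image layout is an assumption that may differ between WeChat versions.

```python
from pathlib import Path

# Assumed sub-folders of <WeChat Files>/<account>/FileStorage.
# This layout is an assumption and may differ between WeChat versions.
MEDIA_DIRS = ("File", "Video", "Image")


def list_account_media(wx_location):
    """Map each account folder name to the media folders that actually exist."""
    root = Path(wx_location)
    result = {}
    for account in sorted(p for p in root.iterdir() if p.is_dir()):
        storage = account / "FileStorage"
        found = {
            name: storage / name
            for name in MEDIA_DIRS
            if (storage / name).is_dir()
        }
        if found:  # skip folders such as "All Users" that hold no media
            result[account.name] = found
    return result
```

Calling list_account_media(wx_path()) would then give one entry per logged-in account that has any of these folders.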

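Replying with Excel data over WeChat in Python

To reply with Excel data from Python, first install a library that can read Excel files. xlrd reads legacy .xls files; the read_excel() function itself belongs to pandas, which uses xlrd for .xls files and openpyxl for .xlsx files. Read the workbook with read_excel(), then send the data you read back through a WeChat interface.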
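A minimal sketch of that flow, assuming pandas is installed: the helper names format_rows and excel_to_reply and the max_rows limit are hypothetical, chosen only for illustration. The formatting step is kept separate so it can be used without any Excel library.

```python
def format_rows(rows, header=None):
    """Render rows (lists of cell values) as tab-separated text lines."""
    lines = []
    if header:
        lines.append("\t".join(str(h) for h in header))
    for row in rows:
        lines.append("\t".join("" if cell is None else str(cell) for cell in row))
    return "\n".join(lines)


def excel_to_reply(path, max_rows=20):
    """Read the first sheet of an Excel file and return it as reply text."""
    # pandas is a third-party package; it is imported lazily so that
    # format_rows stays usable without it.
    import pandas as pd

    frame = pd.read_excel(path).head(max_rows)
    return format_rows(frame.values.tolist(), header=list(frame.columns))
```

The returned string could then be passed as the msg argument of whichever WeChat sending function is in use.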

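How to build a WeChat interface with Python

Lately I have been wanting to see whether the all-purpose Python can do things with WeChat (such as grabbing red packets automatically... hahahaha), so I am writing it down here~~

1. Installation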

sudo pip install itchat

2. Logging in

itchat.auto_login()

Note: itchat.auto_login() logs in by scanning a QR code with WeChat, but this login is short-lived and the login state is not kept, so the next login needs another scan. Passing hotReload=True keeps the login state, so at least the next few logins do not need a scan. This option creates a file, itchat.pkl, that stores the login state.

itchat.auto_login(hotReload=True)

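3. Logging out

Logging out relies on callbacks: the function to run after login is passed as loginCallback, and the function to run after logout is passed as exitCallback. If loginCallback is not set, the QR code image is deleted and the command line is cleared automatically.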

import itchat, time

def lcb():
    print("Login complete!")

def ecb():
    print("Logged out!")

itchat.auto_login(loginCallback=lcb, exitCallback=ecb)  # the source code requires callback functions here

time.sleep(10)

itchat.logout()  # force a logout

4. Sending messages

send()

itchat.send(msg="WeChat message from WANGPC!", toUserName="filehelper")  # returns True or False

Alternatively:

send_msg

send_msg(msg='Text Message', toUserName=None): msg is the text to send and toUserName is the recipient; if it is left empty, the message is sent to yourself. Returns True or False.

send_file

send_file(fileDir, toUserName=None): fileDir is the file path; if the file does not exist, a "no such file" notice is printed. Returns True or False.

send_image

send_image(fileDir, toUserName=None): same parameters as above.

send_video

send_video(fileDir, toUserName=None): same parameters as above.
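Since send_file, send_image and send_video share the same signature, one small helper can pick the right one from the file extension. This is a sketch only: pick_sender, send_path and the extension sets are hypothetical, and client stands for any object exposing the three functions described above (for example the itchat module after logging in).

```python
import os

# Illustrative extension sets; adjust to the file types you actually send.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif"}
VIDEO_EXTS = {".mp4"}


def pick_sender(client, path):
    """Return the client function that matches the file's extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext in IMAGE_EXTS:
        return client.send_image
    if ext in VIDEO_EXTS:
        return client.send_video
    return client.send_file  # fallback for every other file type


def send_path(client, path, to_user="filehelper"):
    """Send a file with the matching function and return its result."""
    return pick_sender(client, path)(path, toUserName=to_user)
```

With itchat logged in, send_path(itchat, "report.xlsx") would go through send_file and deliver to the file helper.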

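How to scrape WeChat reading content with Python

Scraping articles from WeChat official accounts

I. Approach

The scraping methods known so far are:

1. Scraping official-account article links directly from the WeChat app (?mid=2735446906idx=1sn=ece37deaba0c8ebb9badf07e5a5a3bd3scene=0#rd)

2. Scraping indirectly by sending requests to the Sogou search engine, WeChat's partner

With method 1, the links are hard to obtain and their pattern is not very clear.

This article therefore uses method 2: sending live requests to weixin.sogou.com, parsing the results as they arrive, and saving the data locally.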

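II. Scraping process

1. First try it out on Sogou's WeChat search page, which makes the approach clearer

On the search engine, run a "search official accounts" query with the official account's English name. The English name is unique to an account, whereas Chinese names can be duplicated; the name must also be exactly right, otherwise the search may return many results. This cuts down the filtering work: only the one record that matches the unique English name is needed. In other words, send a request to ';query=%sie=utf8_sug_=n_sug_type_= ' % 'python' and parse from the page the homepage link of the matching official account.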
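Building the query string with urlencode rather than by hand avoids escaping mistakes when the account name contains spaces or non-ASCII characters. This is a sketch under assumptions: the host comes from the text above and the query, ie, _sug_ and _sug_type_ parameters appear in the request string, but the "/weixin" path, the "type" parameter and the name build_search_url are illustrative guesses rather than something the article confirms.

```python
from urllib.parse import urlencode

# Assumption: the "/weixin" path is a guess for illustration only.
SEARCH_BASE = "https://weixin.sogou.com/weixin"


def build_search_url(account_name):
    """Build a search URL for an official account name."""
    params = {
        "type": 1,            # assumed: selects "search official accounts"
        "query": account_name,
        "ie": "utf8",
        "_sug_": "n",
        "_sug_type_": "",
    }
    return SEARCH_BASE + "?" + urlencode(params)
```

urlencode percent-encodes the name, so the same function works for English and Chinese account names.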

2. Fetching the entry page content

requests, urllib, urllib2, or webdriver + PhantomJS can all be used for this.

Here the get() method of requests is used to fetch the entry page content.

# Spoofed request headers for the crawler
self.headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0'}

# Timeout for each operation, in seconds
self.timeout = 5

# The crawler does all its work inside one requests.Session
self.s = requests.Session()

# Search entry point: search for the official account using its name as the keyword
def get_search_result_by_keywords(self):
    self.log('Search URL: %s' % self.sogou_search_url)
    return self.s.get(self.sogou_search_url, headers=self.headers, timeout=self.timeout).content

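3. Getting the official account's URL

From the fetched page content, extract the official account's homepage URL. There are many ways to do this: BeautifulSoup, webdriver, plain regular expressions, pyquery, and so on.

Here pyquery is used to find the entry URL of the official account's homepage.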

# Get the official account's homepage URL
def get_wx_url_by_sougou_search_html(self, sougou_search_html):
    doc = pq(sougou_search_html)
    # print(doc('p[class="tit"]')('a').attr('href'))
    # print(doc('div[class=img-box]')('a').attr('href'))
    # Process the page with pyquery, which resembles BeautifulSoup but has a jQuery-like API, to find the homepage URL
    return doc('div[class=txt-box]')('p[class=tit]')('a').attr('href')
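For comparison, the same lookup can be sketched with only the standard library's html.parser instead of pyquery. This is a swapped-in technique, not the article's method: AccountLinkParser and find_account_url are hypothetical names, and the sketch only checks for an a element inside p class="tit", without also requiring the enclosing div class="txt-box".

```python
from html.parser import HTMLParser


class AccountLinkParser(HTMLParser):
    """Collect the href of the first <a> found inside <p class="tit">."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "p" and "tit" in (attrs.get("class") or "").split():
            self._in_title = True
        elif tag == "a" and self._in_title and self.href is None:
            self.href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_title = False


def find_account_url(search_html):
    """Return the homepage link from a search result page, or None."""
    parser = AccountLinkParser()
    parser.feed(search_html)
    return parser.href
```

pyquery stays the shorter option when it is available; the standard-library version mainly helps when no third-party parser can be installed.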

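4. Getting the article list from the official account's homepage

The homepage has to be loaded first. PhantomJS + webdriver is used here because the homepage content is rendered by JS, and the previous method only retrieves the static page content.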

# Load the official account's homepage with webdriver, mainly for the JS-rendered parts
def get_selenium_js_html(self, url):
    browser = webdriver.PhantomJS()
    try:
        browser.get(url)
        time.sleep(3)
        # Run JS to get the whole page content
        html = browser.execute_script("return document.documentElement.outerHTML")
    finally:
        browser.quit()  # always release the browser process
    return html

Once the homepage content is available, get the article list, which holds the content we need.

# Get the official account's articles
def parse_wx_articles_by_html(self, selenium_html):
    doc = pq(selenium_html)
    print('Start looking for msg content')
    return doc('div[class="weui_media_box appmsg"]')
    # Some official accounts have only 10 articles, others may have more
    # return doc('div[class="weui_msg_card"]')  # for accounts with only 10 articles

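5. Parse each entry in the article list and extract the information we need

6. Process the corresponding content

This includes the article title, URL, summary, publication time, and so on.

7. Save the article content

Save it locally in HTML format.

Also save the content from the previous step in Excel format.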
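A minimal sketch of this saving step, using only the standard library. The names safe_filename, save_article_html and save_summary_csv are hypothetical, and a CSV file (which Excel can open) stands in for a real Excel workbook here; the article dictionaries are assumed to carry the title, url, summary, date and content keys described in step 6.

```python
import csv
import os
import re


def safe_filename(title, date, ext=".html"):
    """Build a file name, replacing characters that are invalid in paths."""
    stem = re.sub(r'[\\/:*?"<>|\s]+', "_", "%s_%s" % (title, date)).strip("_")
    return (stem or "untitled") + ext


def save_article_html(folder, article):
    """Write one article's HTML content to folder and return the path."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, safe_filename(article["title"], article["date"]))
    with open(path, "w", encoding="utf-8") as f:
        f.write(article["content"] or "")
    return path


def save_summary_csv(folder, articles, name="articles.csv"):
    """Write title, url, summary and date of each article to a CSV file."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, name)
    # utf-8-sig adds a BOM so that Excel detects the encoding
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["title", "url", "summary", "date"], extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(articles)
    return path
```

Sanitizing the name matters because article titles often contain characters such as "/" or "?" that cannot appear in a file name.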

8. Save the JSON data
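A short sketch of the JSON step. The function names are hypothetical; the one design choice worth noting is ensure_ascii=False, which keeps Chinese titles readable in the file instead of writing them as \u escapes.

```python
import json


def save_articles_json(path, articles):
    """Write the list of article dictionaries to a UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(articles, f, ensure_ascii=False, indent=2)


def load_articles_json(path):
    """Read the article list back from the JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```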

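Broken down into steps like this, scraping an official account's articles is not especially hard.

III. Source code

The first version of the source code is as follows: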

#!/usr/bin/python
# coding: utf-8

from urllib.parse import quote
from pyquery import PyQuery as pq
from selenium import webdriver
import requests
import time
import re
import json
import os


class weixin_spider:

    def __init__(self, kw):
        """Constructor."""
        self.kw = kw
        # Sogou WeChat search URL
        # self.sogou_search_url = ';query=%sie=utf8_sug_=n_sug_type_=' % quote(self.kw)
        self.sogou_search_url = ';query=%sie=utf8s_from=input_sug_=n_sug_type_=' % quote(self.kw)
        # Crawler disguise
        self.headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0 FirePHP/0.7.4.1'}
        # Operation timeout, in seconds
        self.timeout = 5
        self.s = requests.Session()

    def get_search_result_by_kw(self):
        self.log('Search URL: %s' % self.sogou_search_url)
        return self.s.get(self.sogou_search_url, headers=self.headers, timeout=self.timeout).content

    def get_wx_url_by_sougou_search_html(self, sougou_search_html):
        """Extract the official account's homepage link from sougou_search_html."""
        doc = pq(sougou_search_html)
        # print(doc('p[class="tit"]')('a').attr('href'))
        # print(doc('div[class=img-box]')('a').attr('href'))
        # Process the page with pyquery, which resembles BeautifulSoup but has a jQuery-like API, to find the homepage URL
        return doc('div[class=txt-box]')('p[class=tit]')('a').attr('href')

    def get_selenium_js_html(self, wx_url):
        """Run the page's JS and return the rendered HTML."""
        # PhantomJS support was removed in Selenium 4, so this needs an older Selenium
        browser = webdriver.PhantomJS()
        try:
            browser.get(wx_url)
            time.sleep(3)
            # Run JS to get the whole DOM
            html = browser.execute_script("return document.documentElement.outerHTML")
        finally:
            browser.quit()  # always release the browser process
        return html

    def parse_wx_articles_by_html(self, selenium_html):
        """Parse the official account's articles out of selenium_html."""
        doc = pq(selenium_html)
        return doc('div[class="weui_msg_card"]')

    def switch_arctiles_to_list(self, articles):
        """Convert articles into a list of dictionaries."""
        articles_list = []
        if articles:
            for i, article in enumerate(articles.items(), 1):
                self.log('Merging (%d/%d)' % (i, len(articles)))
                articles_list.append(self.parse_one_article(article))
                # break
        return articles_list

    def parse_one_article(self, article):
        """Parse a single article."""
        article = article('.weui_media_box[id]')
        title = article('h4[class="weui_media_title"]').text()
        self.log('Title: %s' % title)
        url = '' + article('h4[class="weui_media_title"]').attr('hrefs')
        self.log('URL: %s' % url)
        summary = article('.weui_media_desc').text()
        self.log('Summary: %s' % summary)
        date = article('.weui_media_extra_info').text()
        self.log('Published: %s' % date)
        pic = self.parse_cover_pic(article)
        content = self.parse_content_by_url(url).html()
        contentfiletitle = self.kw + '/' + title + '_' + date + '.html'
        self.save_content_file(contentfiletitle, content)
        return {
            'title': title,
            'url': url,
            'summary': summary,
            'date': date,
            'pic': pic,
            'content': content
        }

    def parse_cover_pic(self, article):
        """Parse the article's cover image."""
        pic = article('.weui_media_hd').attr('style') or ''
        # Capture the URL between the parentheses of background-image:url(...)
        p = re.compile(r'background-image:url\((.*?)\)')
        rs = p.findall(pic)
        self.log('Cover image: %s ' % rs[0] if len(rs) > 0 else '')
        return rs[0] if len(rs) > 0 else ''

    def parse_content_by_url(self, url):
        """Fetch the article's detail content."""
        page_html = self.get_selenium_js_html(url)
        return pq(page_html)('#js_content')

    def save_content_file(self, title, content):
        """Write the page content to a file."""
        with open(title, 'w', encoding='utf-8') as f:
            f.write(content)

    def save_file(self, content):
        """Write the data to a file."""
        with open(self.kw + '/' + self.kw + '.txt', 'w', encoding='utf-8') as f:
            f.write(content)

    def log(self, msg):
        """Custom log function."""
        print('%s: %s' % (time.strftime('%Y-%m-%d %H:%M:%S'), msg))

    def need_verify(self, selenium_html):
        """Sometimes the site blocks the IP. Check whether the HTML contains an element with id=verify_change; if it does, we were redirected and should retry after a while."""
        return pq(selenium_html)('#verify_change').text() != ''

    def create_dir(self):
        """Create the output folder."""
        if not os.path.exists(self.kw):
            os.makedirs(self.kw)

    def run(self):
        """Crawler entry point."""
        # Step 0: create a folder named after the official account
        self.create_dir()

        # Step 1: send a GET request to Sogou's WeChat search, using the account's English name as the keyword
        self.log('Starting, official account English name: %s' % self.kw)
        self.log('Calling the sougou search engine')
        sougou_search_html = self.get_search_result_by_kw()

        # Step 2: parse the account's homepage link from the search result page
        self.log('Got sougou_search_html, now extracting the account homepage wx_url')
        wx_url = self.get_wx_url_by_sougou_search_html(sougou_search_html)
        self.log('Got wx_url: %s' % wx_url)

        # Step 3: use Selenium + PhantomJS to get the HTML after asynchronous JS rendering
        self.log('Calling selenium to render the html')
        selenium_html = self.get_selenium_js_html(wx_url)

        # Step 4: check whether the target site has blocked us
        if self.need_verify(selenium_html):
            self.log('The crawler was blocked by the target site, please try again later')
        else:
            # Step 5: use PyQuery to parse the article list out of the HTML from Step 3
            self.log('Selenium rendering finished, parsing the account articles')
            articles = self.parse_wx_articles_by_html(selenium_html)
            self.log('Found %d WeChat articles' % len(articles))

            # Step 6: pack the article data into a list of dictionaries
            self.log('Merging the article data into dictionaries')
            articles_list = self.switch_arctiles_to_list(articles)

            # Step 7: convert the dictionary list from Step 6 to JSON
            self.log('Merge finished, converting to json')
            data_json = json.dumps(articles_list)

            # Step 8: write the file
            self.log('Conversion to json finished, saving the json data to a file')
            self.save_file(data_json)
            self.log('Saved, program finished')


# main
if __name__ == '__main__':
    gongzhonghao = input('Enter the official account to scrape: ')
    if not gongzhonghao:
        gongzhonghao = 'python6359'
    weixin_spider(gongzhonghao).run()

Second version of the code:

The code was optimized and reworked, mainly:

1. Added Excel storage

2. Changed the rules for extracting article content

3. Expanded the comments

Known defect of this program: if an official account's article contains video, it may raise an error.

#!/usr/bin/python

# coding: utf-8

