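Python Study Notes (3)

The CET-4 results are not officially released until 9 a.m. tomorrow.
However, the query endpoint on CHSI (学信网) already returns results if you add an X-Forwarded-For: 127.0.0.1 header, probably because a test entry point was never closed.
The request headers also need a Referer. I wrote a crawler with the Python I learned over the holiday; luckily it is just a single GET request, so it was easy to get working.
Along the way I overlooked the difference between readlines and readline, which caused a small problem...
I also had the name list for the whole college, so I scraped the scores for the whole college.
It feels like a real achievement. Making progress is a good thing.
And I passed CET-4. The score is not high, but I am satisfied with it. A pleasant evening.
I was lazy and did not write the part that saves the results to a file. I have a full morning of classes tomorrow, so I am off to bed.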
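
To illustrate the readline/readlines mix-up mentioned above, here is a minimal sketch. The sample data and the helper name `read_pairs` are made up for illustration; only the file layout (one line with the ticket number, one line with the name) comes from the script. `readline()` returns a single string and an empty string at end of file, while `readlines()` returns a list of all remaining lines.

```python
import io

# Hypothetical sample data in the same layout as students.txt:
# a ticket number line followed by a name line.
sample = "110000000000001\nAlice\n110000000000002\nBob\n"


def read_pairs(f):
    """Yield (ticket_number, name) pairs, reading two lines per record."""
    while True:
        num = f.readline().strip()
        if not num:  # readline() returns '' at end of file
            break
        name = f.readline().strip()
        yield num, name


one_line = io.StringIO(sample).readline()    # a single str, newline included
all_lines = io.StringIO(sample).readlines()  # a list of str, one per line
pairs = list(read_pairs(io.StringIO(sample)))
```

Calling string methods such as `strip()` on the result of `readlines()` fails because it is a list, which is probably the kind of problem this mix-up causes.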

import re
import urllib.parse
import urllib.request

url = 'http://www.chsi.com.cn/cet/query'
header = {'Referer': 'http://www.chsi.com.cn/cet/',
          'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0',
          'X-Forwarded-For': '127.0.0.1',
          'Host': 'www.chsi.com.cn'}

# File format: one line with the admission ticket number, then one line with the name
with open('C:/Users/north/Desktop/students.txt', 'r') as data:
    while True:
        num = data.readline().strip()   # admission ticket number
        if not num:                     # readline() returns '' at end of file
            break
        name = data.readline().strip()  # name
        if len(name) > 3:
            name = name[:2]
        values = urllib.parse.urlencode({'zkzh': num, 'xm': name})  # gives zkzh=num&xm=name
        full_url = url + '?' + values
        req = urllib.request.Request(url=full_url, headers=header)
        res = urllib.request.urlopen(req)
        page = res.read().decode('utf-8')
        page = page.replace('\n', '')   # strip all newlines to make searching easier
        try:
            # Grab the digits that follow the colorRed element (the total score)
            score = re.findall(r'colorRed">.*?(\d+)', page)[-1]
            print(name + ' ' + score)
        except IndexError:              # no score found on the page
            print("Error", name)
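
The file-writing step that was skipped could look roughly like the sketch below. It is not part of the original script: the function name `save_scores`, the CSV layout, and the demo rows are all assumptions chosen for illustration, and the demo writes to the system temp directory rather than a real results file.

```python
import csv
import os
import tempfile


def save_scores(rows, path):
    """Write (name, score) pairs to a CSV file (hypothetical helper)."""
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['name', 'score'])  # header row
        writer.writerows(rows)


# Demo with made-up rows; in the crawler, rows would be collected in the loop
# instead of being printed.
demo_path = os.path.join(tempfile.gettempdir(), 'scores_demo.csv')
save_scores([('Alice', '425'), ('Bob', '510')], demo_path)

with open(demo_path, newline='', encoding='utf-8') as f:
    saved = list(csv.reader(f))
```

Collecting the pairs in a list and writing them once at the end keeps the network loop simple; writing each row as it arrives would be safer if the run might be interrupted.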