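The CET-4 (College English Test Band 4) results weren't due to be released until 9 a.m. tomorrow.
But if you add `X-Forwarded-For: 127.0.0.1` to a request to the query page on CHSI (学信网), it already returns results. It's probably a test entry point that was never shut off.
The request also needs a `Referer` header. So I used the Python I picked up over the break to write a scraper. Luckily it's just a GET request, so it was easy.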
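A GET request like this carries all of its parameters in the query string. As a minimal sketch (the ticket number and name below are made-up placeholders; `zkzh` and `xm` are the parameter names used in the script), `urllib.parse.urlencode` builds that string and percent-encodes a Chinese name as UTF-8:

```python
from urllib.parse import urlencode

# Hypothetical values for illustration only: a made-up ticket number (zkzh)
# and a made-up name (xm).
params = {"zkzh": "000000000000000", "xm": "张三"}

query = urlencode(params)  # non-ASCII text is percent-encoded as UTF-8
full_url = "http://www.chsi.com.cn/cet/query" + "?" + query

print(query)  # zkzh=000000000000000&xm=%E5%BC%A0%E4%B8%89
print(full_url)
```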
Along the way I didn't pay attention to the difference between `readlines` and `readline`, and ran into a bit of trouble...
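The post doesn't say exactly what went wrong, but the difference itself is easy to show. In this sketch, a hypothetical in-memory file built with `io.StringIO` stands in for `students.txt` (ticket number and name on alternating lines):

```python
import io

# Hypothetical stand-in for students.txt: ticket number, then name, repeated.
data = "111\nAlice\n222\nBob\n"

# readline() returns ONE line (including its trailing "\n"), or "" at end of file.
f = io.StringIO(data)
first = f.readline()
print(repr(first))  # '111\n'

# readlines() returns ALL remaining lines at once, as a list.
rest = f.readlines()
print(rest)  # ['Alice\n', '222\n', 'Bob\n']

# Pairing lines into (ticket, name) records with readline():
f = io.StringIO(data)
records = []
while True:
    num = f.readline().strip()
    name = f.readline().strip()
    if not num:  # "" means the file is exhausted
        break
    records.append((num, name))
print(records)  # [('111', 'Alice'), ('222', 'Bob')]
```

Calling `readlines()` where `readline()` was intended hands back a list instead of a single string, which is one way a read-two-lines-per-record loop can go wrong.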
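I also had the roster for the whole college, so I scraped everyone's scores.
It feels like a real accomplishment. Making progress is a good thing.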
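And I passed CET-4. The score isn't high, but I'm satisfied with it. A happy evening.
I was lazy and skipped writing the results to a file... Classes all morning tomorrow, so off to bed.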
```python
import re
import urllib.parse
import urllib.request

url = "http://www.chsi.com.cn/cet/query"
header = {
    'Referer': 'http://www.chsi.com.cn/cet/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0',
    'X-Forwarded-For': '127.0.0.1',
    'Host': 'www.chsi.com.cn',
}

# File format: one line with the exam ticket number, then one line with the name
with open('C:/Users/north/Desktop/students.txt', 'r') as students:
    while True:
        num = students.readline().strip()   # read the exam ticket number
        name = students.readline().strip()  # read the name
        if num == "":  # readline() returns "" at end of file
            break
        if len(name) > 3:
            name = name[:2]  # names longer than 3 characters are cut to the first 2
        values = urllib.parse.urlencode({'zkzh': num, 'xm': name})  # -> zkzh=num&xm=name
        full_url = url + '?' + values
        req = urllib.request.Request(url=full_url, headers=header)
        res = urllib.request.urlopen(req)
        page = res.read().decode('utf-8')
        page = page.replace('\n', '')  # remove all newlines to make searching easier
        try:
            # The total score is the number after the last colorRed element
            score = re.findall(r'colorRed">.*?(\d+)', page)[-1]
            print(name + " " + score)
        except IndexError:  # no score found on the page
            print("Error", name)
```
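The `page.replace('\n', '')` step matters because `.` in a Python regex does not match a newline unless `re.DOTALL` is set. A small sketch against a made-up HTML fragment (the real result page's markup isn't shown in the post, so this fragment is an assumption):

```python
import re

# Hypothetical HTML fragment with the score on its own line.
html = '<td><span class="colorRed">\n    425\n</span></td>'
pattern = r'colorRed">.*?(\d+)'

# "." does not match "\n" by default, so the raw text yields no match...
raw = re.findall(pattern, html)
print(raw)  # []

# ...while stripping newlines (as the script does) or passing re.DOTALL works.
stripped = re.findall(pattern, html.replace("\n", ""))
dotall = re.findall(pattern, html, flags=re.DOTALL)
print(stripped)  # ['425']
print(dotall)    # ['425']
```

Either approach works; `re.DOTALL` has the advantage of leaving the page text unchanged.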