Basic XPath Usage - 白红宇



Published: 2019-04-28


1. Import the required libraries

import requests
from lxml import etree

2. Fetch the page with requests

def get_page(url):
    try:
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"}
        res = requests.get(url=url, headers=headers)
        res.encoding = 'utf-8'
        if res.status_code == 200:
            return res.text
        else:
            return None
    except Exception:
        return None
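The function above has no timeout and swallows every exception. As a minimal sketch of a slightly stricter variant, the version below sets a timeout and catches only the errors raised by requests. The name fetch_html, the timeout default, and the shortened User-Agent string are assumptions made for illustration and are not part of the original article.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page text, or None if the request fails.

    Hypothetical helper: the name and the timeout default are assumptions.
    """
    headers = {"User-Agent": "Mozilla/5.0"}  # shortened for the example
    try:
        res = requests.get(url, headers=headers, timeout=timeout)
        # Raise requests.HTTPError for 4xx/5xx responses
        res.raise_for_status()
    except requests.RequestException:
        # Covers connection errors, timeouts, bad URLs and HTTP errors
        return None
    res.encoding = "utf-8"  # same encoding choice as the article
    return res.text
```

A caller can then check for None before passing the result to etree.HTML.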

3. Extract the content

if __name__ == '__main__':
    url = 'https://www.*****.com/case/'
    res = get_page(url)
    tree = etree.HTML(res)
    cons = tree.xpath('//div[@id="case_list"]/div')  # all div children of case_list
    con = tree.xpath('//div[@id="case_list"]/div[1]')  # the first div child of case_list
    con1 = tree.xpath('//div[@id="case_list"]/div[1]/div/a/@href')  # href attribute of the a under the div inside that first div
    for div in cons:
        href = div.xpath('./div/a/@href')  # ./ means relative to the current element
        print(href)
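Because the target URL is masked, the expressions above cannot be tried as-is. The following is a minimal sketch that runs the same four XPath queries against an inline HTML string. The markup in SAMPLE_HTML is hypothetical, written only to match the structure that the expressions imply (a case_list div whose child divs each wrap a div containing a link).

```python
from lxml import etree

# Hypothetical markup standing in for the real page, which the article does not show
SAMPLE_HTML = """
<html><body>
<div id="case_list">
  <div><div><a href="/case/1.html">Case 1</a></div></div>
  <div><div><a href="/case/2.html">Case 2</a></div></div>
  <div><div><a href="/case/3.html">Case 3</a></div></div>
</div>
</body></html>
"""

tree = etree.HTML(SAMPLE_HTML)

# Every div that is a direct child of the case_list div
all_divs = tree.xpath('//div[@id="case_list"]/div')

# XPath positions are 1-based, so div[1] is the first child div
first_div = tree.xpath('//div[@id="case_list"]/div[1]')

# Ending the path with @href returns attribute values instead of elements
first_href = tree.xpath('//div[@id="case_list"]/div[1]/div/a/@href')

# A path starting with ./ is evaluated relative to the element it is called on
hrefs = [div.xpath('./div/a/@href') for div in all_divs]

print(len(all_divs))  # 3
print(first_href)     # ['/case/1.html']
print(hrefs)          # [['/case/1.html'], ['/case/2.html'], ['/case/3.html']]
```

Note that xpath() returns a list for each of these queries, even when only one node matches, so a single value has to be taken with an index such as first_href[0].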

Reposted from: http://zwclf.baihongyu.com/
