{"id":1192,"date":"2024-11-17T19:14:33","date_gmt":"2024-11-17T11:14:33","guid":{"rendered":"https:\/\/gemmartdesign.com\/?p=1192"},"modified":"2024-11-29T20:29:36","modified_gmt":"2024-11-29T12:29:36","slug":"python-post12","status":"publish","type":"post","link":"https:\/\/gemmartdesign.com\/?p=1192","title":{"rendered":"Python Practice #12: Parsing a Website and Scraping Data with the BeautifulSoup Module"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">This Week's Goal<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Learn the basic usage and applications of a web crawler<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Tasks<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Request a website with browser information (a User-Agent header) attached and retrieve the page data<\/li>\n\n\n\n<li>Use BeautifulSoup to scrape the post titles from the site<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Project Practice<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Goal: scrape the titles on the first page of the PTT movie board<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Example of the page source:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&lt;div class=\"title\"&gt;\n\t\t\t\n\t\t\t\t&lt;a href=\"\/bbs\/movie\/M.1731766038.A.B52.html\"&gt;&#91;\u8acb\u76ca] \u9b54\u6cd5\u58de\u5973\u5deb4DX?&lt;\/a&gt;\n\t\t\t\n\t\t\t&lt;\/div&gt;<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Example of a deleted post (the placeholder text means &quot;this post has been deleted&quot;):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&lt;div class=\"title\"&gt;\n\t\t\t\n\t\t\t\t(\u672c\u6587\u5df2\u88ab\u522a\u9664) &#91;NTUIBrother]\n\t\t\t\n\t\t\t&lt;\/div&gt;<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Observing the source:<\/h3>\n\n\n\n<p 
class=\"wp-block-paragraph\">Each post title is wrapped in an &lt;a&gt; tag, which is in turn wrapped in a &lt;div&gt; tag.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">If a post has been deleted, its entry has no &lt;a&gt; tag.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Writing the code:<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Import the BeautifulSoup module to parse the HTML. html.parser, Python's built-in HTML parser, is one of the parsers BeautifulSoup can use.<\/li>\n\n\n\n<li>Find every &lt;div&gt; element in the HTML whose class attribute is &quot;title&quot;. The argument is spelled class_ with a trailing underscore because class is a reserved word in Python, so BeautifulSoup accepts class_ instead to avoid the conflict.<\/li>\n\n\n\n<li>Loop over every &lt;div class=&quot;title&quot;&gt; element with a for loop and check whether it contains an &lt;a&gt; tag, so the None case (a deleted post) is skipped.<\/li>\n\n\n\n<li>Print the title of every entry that has an &lt;a&gt; tag.<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>import urllib.request as req\n\nimport bs4\n\n# Fetch the first page of the PTT movie board.\nurl = \"https:\/\/www.ptt.cc\/bbs\/movie\/index.html\"\n# Attach a browser-like User-Agent header to the request.\nrequest = req.Request(url, headers={\n    \"user-agent\": \"Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/130.0.0.0 Safari\/537.36\"\n})\nwith req.urlopen(request) as response:\n    data = response.read().decode(\"utf-8\")\n\n# Parse the HTML with Python's built-in html.parser.\nroot = bs4.BeautifulSoup(data, \"html.parser\")\n# Find every &lt;div class=\"title\"&gt; element.\ntitles = root.find_all(\"div\", class_=\"title\")\nfor title in titles:\n    # A deleted post has no &lt;a&gt; tag, so title.a is None; skip it.\n    if title.a:\n        print(title.a.string)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Output<\/h3>\n\n\n\n<pre 
class=\"wp-block-code\"><code>&#91;\u65b0\u805e] \u91d1\u99ac61\u5f71\u5e1d\u7d44\u5206\u6790\u300a\u9918\u71fc\u300b\u5f35\u9707\uff1a\u8207\u300a\u7ddd\u9b42\n&#91;\u60c5\u5831] \u7834\u5730\u7344\u7968\u623f\u885d\u78346000\u842c\u6e2f\u5e63\n&#91;\u8a0e\u8ad6] (\u96f7)\u937e\u5b5f\u5b8f\u5766\u8a00\u300a\u9918\u71fc\u300b\u76ee\u6a19\u5f9e\u4e0d\u5728\u548c\u89e3\uff0c\u800c\u662f\n&#91;\u8a0e\u8ad6] \u60e1\u9b54\u6559\u5ba4\u7684\u8001\u5e2b(\u6709\u5287\u900f)\n&#91;\u65b0\u805e] \u300a\u5abd\u5abd\u54aa\u5440\uff013\u300b\u7c4c\u5283\u4e2d \u5e03\u6d1b\u65af\u5357\u6a02\u898b\u5176\u6210\n&#91;\u554f\u7247] \u897f\u6d0b\u6050\u6016\u7247?\n&#91;\u554f\u7247] \u8acb\u554f\u9019\u90e8\u97d3\u570b\u96fb\u5f71 or \u5f71\u96c6\u7684\u540d\u7a31\n&#91;\u8acb\u76ca] \u6709\u89d2\u982d\uff0d\u5927\u6a4b\u982d\uff0c\u6bcf\u4e00\u500b\u5e6b\u6d3e\u7684\u5408\u7167\u55ce??\n&#91;\u65b0\u805e] \u69ae\u8b7d\u5967\u65af\u5361\u81f4\u656c\u6606\u897f\u74ca\u65af\uff01\u7d05\u6bef\u773e\u661f\u96f2\u96c6\n&#91;\u516c\u544a] \u96fb\u5f71\u677f\u677f\u898f 2022\/12\/5\n&#91;\u516c\u544a] \u7981\u653f\u6cbb\u7248\u898f \u53ca \u6295\u7968\u7d50\u679c<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This Week's Goal Learn the basic usage and applications of a web crawler Tasks 
Project&hellip;<\/p>\n","protected":false},"author":1,"featured_media":1231,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","pagelayer_contact_templates":[],"_pagelayer_content":"","_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[9],"tags":[],"class_list":["post-1192","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-python"],"_links":{"self":[{"href":"https:\/\/gemmartdesign.com\/index.php?rest_route=\/wp\/v2\/posts\/1192","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gemmartdesign.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gemmartdesign.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gemmartdesign.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/gemmartdesign.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1192"}],"version-history":[{"count":3,"href":"https:\/\/gemmartdesign.com\/index.php?rest_route=\/wp\/v2\/posts\/1192\/revisions"}],"predecessor-version":[{"id":1236,"href":"https:\/\/gemmartdesign.com\/index.php?rest_route=\/wp\/v2\/posts\/1192\/revisions\/1236"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/gemmartdesign.com\/index.php?rest_route=\/wp\/v2\/media\/1231"}],"wp:attachment":[{"href":"https:\/\/gemmartdesign.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1192"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gemmartdesign.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1192"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gemmartdesign.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1192"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}