Web Crawling | |||
Date | 2024-12-19 | Views | 11 |
---|---|---|---|

    # Run once in a notebook cell to install the parser: !pip install bs4
    import bs4
    from urllib.request import urlopen

    url = "https://news.naver.com"
    html = urlopen(url)
    bs_obj = bs4.BeautifulSoup(html, "html.parser")

    # One card per news outlet on the main page.
    div1 = bs_obj.find_all("div", {"class": "main_brick_item _channel_main_news_card_wrapper"})

    media = []      # outlet names
    contents = []   # headlines, one list per outlet
    urls = []       # article links, one list per outlet (no longer shadows `url`)
    contents2 = []  # article bodies, one list per outlet

    for i in div1:
        media.append(i.find("em", {"class": "cnf_journal_name"}).text)
        links = i.find_all("a", {"class": "_cds_link"})
        # The first link carries the lead headline in a <strong> tag.
        temp = [links[0].find("strong").text]
        for j in i.find_all("li", {"class": "cnf_news_item"}):
            temp.append(j.find("a").text)
        contents.append(temp)
        urls.append([k["href"] for k in links])

    for ii in urls:
        temp4 = []
        for jj in ii:
            try:
                html1 = urlopen(jj)
                bs_obj1 = bs4.BeautifulSoup(html1, "html.parser")
                div2 = bs_obj1.find("div", {"class": "newsct_article _article_body"})
                temp4.append(div2.text.replace("\n", ""))
            except Exception:
                # A failed request or a missing article body both end up here.
                temp4.append("Article not found!")
        contents2.append(temp4)
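
The script above leaves its results in parallel lists: outlet names, and per-outlet lists of headlines, links, and article bodies. As a rough sketch of how those could be combined into one record per outlet, the helper below zips them together. The function name `build_records` and the sample values are hypothetical and not part of the original post. It also assumes that headlines, links, and bodies appear in the same order within each outlet, which the page markup may not guarantee; `zip` silently truncates to the shortest list when the counts differ.

```python
def build_records(media, contents, urls, bodies):
    """Combine the crawler's parallel lists into one dict per outlet.

    Hypothetical helper: assumes the i-th entry of each list belongs to the
    same outlet, and that headlines, links, and bodies line up by position.
    """
    records = []
    for outlet, headlines, links, texts in zip(media, contents, urls, bodies):
        # zip() stops at the shortest list, so unmatched items are dropped.
        articles = [
            {"headline": h, "url": u, "body": b}
            for h, u, b in zip(headlines, links, texts)
        ]
        records.append({"media": outlet, "articles": articles})
    return records


if __name__ == "__main__":
    # Made-up placeholder data, only to show the expected list shapes.
    sample = build_records(
        ["Outlet A"],
        [["Headline 1", "Headline 2"]],
        [["https://example.com/1", "https://example.com/2"]],
        [["Body 1", "Body 2"]],
    )
    for record in sample:
        print(record["media"], len(record["articles"]))
```

With the variables from the crawl, the call would be `build_records(media, contents, urls, contents2)`.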