Pukyong National University | Digital Smart Busan Academy

Web Crawling
Posted	2024-12-19	Views	311
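# Notebook cell (Jupyter/Colab): install BeautifulSoup first.
!pip install bs4

import bs4
from urllib.request import urlopen

# Fetch and parse the Naver News main page.
url = "https://news.naver.com"
html = urlopen(url)
bs_obj = bs4.BeautifulSoup(html, "html.parser")

# One <div> per news-outlet card on the main page.
div1 = bs_obj.find_all("div", {"class": "main_brick_item _channel_main_news_card_wrapper"})

media = []      # outlet name per card
contents = []   # headline texts per card
contents2 = []  # article bodies per card
urls = []       # article links per card (renamed from `url` to avoid reusing the page URL variable)
temp = []
temp3 = []
temp4 = []

for i in div1:
    media.append(i.find("em", {"class": "cnf_journal_name"}).text)
    # Main headline of the card.
    temp1 = i.find_all("a", {"class": "_cds_link"})[0].find("strong").text
    temp.append(temp1)
    # Remaining headlines and all article links in the card.
    temp1 = i.find_all("li", {"class": "cnf_news_item"})
    temp2 = i.find_all("a", {"class": "_cds_link"})
    for j in temp1:
        temp.append(j.find("a").text)
    for k in temp2:
        temp3.append(k["href"])
    contents.append(temp)
    temp = []
    urls.append(temp3)
    temp3 = []

# Visit every collected link and extract the article body.
for ii in urls:
    for jj in ii:
        try:
            html1 = urlopen(jj)
            bs_obj1 = bs4.BeautifulSoup(html1, "html.parser")
            div2 = bs_obj1.find("div", {"class": "newsct_article _article_body"})
            temp4.append(div2.text.replace("\n", ""))
        except Exception:
            # Request failed or the article body <div> was missing.
            temp4.append("Article not found!")
    contents2.append(temp4)
    temp4 = []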
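The crawler leaves its results in three parallel lists (`media`, `contents`, `contents2`), one entry per outlet card. As a rough sketch of how they might be combined afterwards, the helper below builds one dictionary per outlet. The function name `to_records` and the sample data are hypothetical and not part of the original post; the sample values only mimic the shape of the crawler's output.

```python
def to_records(media, contents, contents2):
    """Combine the parallel per-card lists into one dict per news outlet."""
    records = []
    # zip stops at the shortest list, so a partially filled list is tolerated.
    for name, headlines, articles in zip(media, contents, contents2):
        records.append({
            "media": name,
            "headlines": list(headlines),
            "articles": list(articles),
        })
    return records


# Made-up sample data in the same shape the crawler produces.
sample_media = ["Outlet A", "Outlet B"]
sample_contents = [["Headline 1", "Headline 2"], ["Headline 3"]]
sample_contents2 = [["Body 1", "Body 2"], ["Body 3"]]

records = to_records(sample_media, sample_contents, sample_contents2)
print(records[0]["media"], len(records[0]["headlines"]))
```

Grouping by card rather than by article sidesteps the fact that the number of headlines and the number of links inside one card are not guaranteed to match.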
