+ 1

How to filter and modify Beautiful Soup results in Python, before or after writing them to .txt/.csv

Hi. I wrote a Python script that scrapes data from a web page with Beautiful Soup. The code works, but some <div markup shows up in the output. I want to filter those lines out, either by deleting or replacing them. I tried a lot of .replace calls without success. Any help?

python3 data beautiful soup

16th Jul 2018, 7:51 PM

Ahmed ik

4 Answers

+ 3

Beautiful Soup lets you parse an HTML document, so you can dig down the tree to the tags' "clean" values. Just open the source HTML and look at its structure. You shouldn't use .replace unless you are operating on the values themselves or the HTML is garbled beyond comprehension. If you have problems, please share the code you wrote so we can check it out.
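As a rough illustration of digging down to the "clean" values instead of using .replace, here is a minimal sketch. The HTML snippet is made up for the example (the real page was never posted), and it assumes the unwanted <div text comes from printing the tag itself rather than its text.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the page being scraped.
html = """
<div id="information">
  <div class="label">Name:</div>
  <span>  Ahmed   Example </span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
info = soup.find("div", id="information")

# str(info) would include the markup ("<div ..."), while get_text()
# returns only the text nodes found inside the tag.
raw_text = info.get_text(separator=" ", strip=True)

# Collapse any remaining runs of whitespace into single spaces.
clean_text = " ".join(raw_text.split())
print(clean_text)
```

If the text lives in a nested tag, calling find again on info narrows it down further before get_text is called.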

16th Jul 2018, 8:16 PM

Kuba Siekierzyński

+ 1

Can you give the code? We can't help you without the code.

17th Jul 2018, 12:04 PM

Christopher

+ 1

OK, I ran into two problems: the first is this one, and the second is that I want to import links from a CSV file into the for loop in my code. If you can help, this is the code:

import re
import sys
from urllib.request import urlopen
from bs4 import BeautifulSoup

quote_page = ["my link"]
data = []
for pg in quote_page:
    page = urlopen(pg)
    soup = BeautifulSoup(page, "html.parser")
    name_box = soup.find("div", attrs={'id': 'information'})
    name = name_box.text.strip()
    name = "".join(re.split(r"\s+", name, flags=re.UNICODE))
    sys.stdout.write(name)
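For the second problem, one possible approach is to read the links with the standard csv module and loop over them. The sketch below is only an assumption about the file layout (one URL per row, in the first column); read_links and write_rows are hypothetical helper names, and an in-memory file stands in for the real CSV so the example runs without any network access.

```python
import csv
import io


def read_links(csv_file):
    """Return the first column of every non-empty row as a list of URLs."""
    return [row[0].strip() for row in csv.reader(csv_file) if row and row[0].strip()]


def write_rows(csv_file, rows):
    """Write (url, name) pairs to a CSV file with a header row."""
    writer = csv.writer(csv_file)
    writer.writerow(["url", "name"])
    writer.writerows(rows)


# In-memory stand-in for a links file; the URLs are placeholders.
links_file = io.StringIO("https://example.com/a\nhttps://example.com/b\n\n")
quote_page = read_links(links_file)

data = []
for pg in quote_page:
    # In the real script, urlopen(pg) and the BeautifulSoup parsing go here.
    name = "placeholder"
    data.append((pg, name))

out = io.StringIO()
write_rows(out, data)
print(out.getvalue())
```

With real files, the in-memory objects would be replaced by something like open("links.csv", newline="", encoding="utf-8") for reading and open("results.csv", "w", newline="", encoding="utf-8") for writing; both file names are placeholders.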

18th Jul 2018, 7:48 PM

Ahmed ik

+ 1

No. 1: You don't need to write attrs={'id':'information'}; passing {'id':'information'} as the second argument is enough. No. 2: We also need to see the HTML structure that is being parsed.
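To illustrate the first point, here is a small sketch showing that the three spellings select the same element. The HTML is invented for the example, since the real page structure was not shared.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML; the real page structure is unknown.
html = '<div id="information">Some text</div><div id="other">Other</div>'
soup = BeautifulSoup(html, "html.parser")

# Explicit attrs keyword, as in the posted code.
a = soup.find("div", attrs={"id": "information"})

# The same dictionary passed positionally as the second argument.
b = soup.find("div", {"id": "information"})

# Keyword-argument shortcut for a single attribute.
c = soup.find("div", id="information")

print(a.get_text(), b.get_text(), c.get_text())
```

Which form to use is mostly a matter of taste; the dictionary form is handy when an attribute name is not a valid Python keyword argument.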

20th Jul 2018, 10:52 AM

Christopher