+ 2

How to parse certain characters from an HTML page?

I want to parse this string "863 - The Consummate Gentleman" from " 863 - The Consummate Gentleman ". I have the "one_piece" string as a variable. I tried the code below, but it could not parse the required string.
############
a = "one_piece"
html = "http://readms.net/r/one_piece/863/4193/1">863 - The Consummate Gentleman "
chap_name = re.findall(r'{}/\d*/\d*/(.*?)'.format(a), html)
############

python regular-expressions parsing

6th May 2017, 7:41 PM

buggythegret

2 Answers

+ 8

Your question is unclear ^^ What do you mean by "parsing"? (I don't see the difference between the two strings in << I want to parse this string "863 - The Consummate Gentleman" from " 863 - The Consummate Gentleman " >>, apart from the last space character.) What exactly is your input data? (It seems that your 'html' variable is assigned with a quote error.) What is the context of the code? (Do you get your working data from the source of a real web page, or did you write the source code extract yourself?) Anyway, if you are parsing a real web page, you should study Python's html.parser module: https://docs.python.org/3/library/html.parser.html ... and also the re module, at least the findall() function, to know how to handle the result of your regex search :P
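To make the html.parser suggestion concrete, here is a minimal sketch. It rests on a guess: that the real markup was a link of the form <a href="...">title</a> and that the tags were lost when the question was posted. The class name LinkTextParser and the single-quoted html string are made up for illustration; only the URL, the title, and the variable a come from the question. A regex variant using findall() is included for comparison.

```python
import re
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect the text of every <a> whose href contains /<slug>/ (hypothetical helper)."""

    def __init__(self, slug):
        super().__init__()
        self.slug = slug
        self.in_link = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        href = dict(attrs).get("href") or ""
        if tag == "a" and "/{}/".format(self.slug) in href:
            self.in_link = True
            self.titles.append("")

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it
        if self.in_link:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            self.in_link = False
            self.titles[-1] = self.titles[-1].strip()


a = "one_piece"
# Assumed markup. Single quotes on the outside avoid the quote error,
# because the double quotes around the URL no longer end the string.
html = '<a href="http://readms.net/r/one_piece/863/4193/1">863 - The Consummate Gentleman </a>'

parser = LinkTextParser(a)
parser.feed(html)
parser.close()
chap_names = parser.titles

# Regex variant: findall() returns a list of the captured groups.
# The capture is closed by '<', otherwise a lazy (.*?) at the very end
# of a pattern matches the empty string.
pattern = r'{}/\d+/\d+/\d+">(.*?)<'.format(re.escape(a))
chap_names_re = [title.strip() for title in re.findall(pattern, html)]

print(chap_names)
print(chap_names_re)
```

If the page really looks like this, both lists hold the chapter title with the surrounding spaces removed. The parser version is usually the safer choice on a full page, since it does not depend on the exact spacing or attribute order inside the tag.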

7th May 2017, 12:02 AM

visph