I'm not going to give you the code, but I'll give you a few pointers.
You can use the re module to match text within HTML tags.
To match the text inside the first p tag, you can do:
re.search('<p>(.*?)</p>', htmltext)
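Note that re.search returns a match object (or None), not the text itself, and it stops at the first match. Here is a minimal sketch of pulling the text out, assuming a made-up htmltext string and plain <p> tags with no attributes:

```python
import re

# Hypothetical sample input; in practice htmltext would be your page source.
htmltext = "<html><body><p>First paragraph.</p><p>Second one.</p></body></html>"

# re.search finds only the first match; group(1) is the captured text.
match = re.search(r'<p>(.*?)</p>', htmltext)
first_paragraph = match.group(1) if match else None

# re.findall returns the captured text of every non-overlapping match.
all_paragraphs = re.findall(r'<p>(.*?)</p>', htmltext)

print(first_paragraph)
print(all_paragraphs)
```

The pattern only matches a bare <p>, so a tag like <p class="x"> would need a looser pattern.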
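To match everything inside a table, you can do:
re.search('<table>(.*?)</table>', htmltext, re.DOTALL)
re.DOTALL makes . match newlines too, which you need because a table usually spans several lines. re.MULTILINE only changes how ^ and $ behave, so it isn't needed here.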
To match all table headers:
re.findall('<th>(.*?)</th>', htmltext)
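To show how the table and header patterns fit together, here is a rough sketch. The sample htmltext is invented for illustration, and it assumes a single table with bare <table> and <th> tags (no attributes, no nesting):

```python
import re

# Hypothetical sample input spanning several lines.
htmltext = """<table>
<tr><th>Name</th><th>Age</th></tr>
<tr><td>Ada</td><td>36</td></tr>
</table>"""

# Without re.DOTALL, . cannot cross the newlines, so nothing matches.
no_dotall = re.search(r'<table>(.*?)</table>', htmltext)

# With re.DOTALL the whole table body is captured.
table_match = re.search(r'<table>(.*?)</table>', htmltext, re.DOTALL)
table_body = table_match.group(1) if table_match else ""

# Pull the header cells out of the captured table body.
headers = re.findall(r'<th>(.*?)</th>', table_body)

print(no_dotall)
print(headers)
```

Regexes like these are fine for simple, predictable markup. For messy real-world HTML, a proper parser such as the standard library's html.parser is usually the safer choice.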