How to extract all the text from a website to one file | Sololearn
+ 1

I want to extract all the text from a simple website, but I don't want the ids, pictures, and so on in the output. The problem is that there is too much markup to handle by hand: I need to remove the irrelevant tags that don't contribute to the text, like <div> or <table>. How can I get around this? My language is PHP.

24th Sep 2017, 9:36 PM
Victor Cislari
2 Answers
+ 2
Use file_get_contents to fetch the page contents: $contents = file_get_contents('http://www.blogger.com/'); Then build a regex pattern, $pattern, that matches the text you want to keep. Note that preg_match stops after the first match, so use preg_match_all to collect every match: preg_match_all($pattern, $contents, $matches, PREG_OFFSET_CAPTURE, 0); print_r($matches);
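A minimal sketch of this approach, under a few assumptions: instead of a hand-written $pattern it swaps in PHP's built-in strip_tags, with one regex that drops <script> and <style> blocks first. The helper names html_to_text and save_page_text and the output file page.txt are hypothetical, not from this thread, and fetching a URL with file_get_contents requires allow_url_fopen to be enabled.

```php
<?php
// Hypothetical helper (not from the thread): reduce an HTML string to plain text.
function html_to_text(string $html): string
{
    // Drop <script> and <style> blocks entirely, including their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html) ?? $html;
    // Put a space before every tag so words from adjacent elements do not fuse.
    $html = str_replace('<', ' <', $html);
    // Remove all remaining tags (<div>, <table>, <img>, ...), keeping their text.
    $text = strip_tags($html);
    // Turn entities such as &amp; back into the characters they stand for.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text) ?? $text);
}

// Hypothetical helper: fetch a page (or read a local file) and save its text to one file.
function save_page_text(string $source, string $outFile): bool
{
    $html = file_get_contents($source); // URLs need allow_url_fopen=On
    if ($html === false) {
        return false;
    }
    return file_put_contents($outFile, html_to_text($html)) !== false;
}

// Example call (page.txt is an assumed output name):
// save_page_text('http://www.blogger.com/', 'page.txt');
```

Regex-based cleanup like this is only approximate on malformed HTML; for anything beyond a simple page, a real parser such as DOMDocument may be a safer choice.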
25th Sep 2017, 6:51 AM
Calviղ
+ 1
Use cURL.
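As a rough sketch of what that could look like, assuming PHP's cURL extension is installed: the helper names fetch_with_curl and curl_page_to_file, the 15-second timeout, and the output file page.txt are illustrative choices, not something given in this thread.

```php
<?php
// Hypothetical helper (not from the thread): download a page with the cURL extension.
function fetch_with_curl(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 15,   // give up after 15 seconds (arbitrary choice)
    ]);
    $body = curl_exec($ch);
    // curl_exec returns false on failure; map that to null for the caller.
    return $body === false ? null : $body;
}

// Hypothetical helper: fetch a page and write its tag-free text to one file.
function curl_page_to_file(string $url, string $outFile): bool
{
    $html = fetch_with_curl($url);
    if ($html === null) {
        return false;
    }
    // strip_tags removes the markup but keeps the contents of <script> and <style>,
    // so pages with inline scripts or styles need extra cleanup.
    $text = trim(strip_tags($html));
    return file_put_contents($outFile, $text) !== false;
}

// Example call (page.txt is an assumed output name):
// curl_page_to_file('http://www.blogger.com/', 'page.txt');
```

Compared with file_get_contents, cURL does not depend on allow_url_fopen and gives direct control over redirects and timeouts.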
25th Sep 2017, 1:31 AM
Ali Sawari