Need help with code

I want to create a folder and scrape the images from a website into it. The code creates the file but doesn’t download any images. A little help would be great. It needs to be run in an IDE because of the imported modules: https://code.sololearn.com/c9Oy9Gmw7WNK/?ref=app
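For readers following along, the task described here can be sketched roughly as below. This is a minimal sketch using only the standard library, not the code behind the link; the names `ImageSrcParser`, `fetch_bytes`, and `save_images` are made up for illustration, and the downloader is passed in as a parameter so the logic can be tried without network access.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class ImageSrcParser(HTMLParser):
    """Collects the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def fetch_bytes(url):
    """Default downloader: returns the raw bytes found at url."""
    with urlopen(url) as response:
        return response.read()


def save_images(page_url, html, folder, download=fetch_bytes):
    """Save every image referenced in html into folder; return the saved paths."""
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    parser = ImageSrcParser()
    parser.feed(html)
    saved = []
    for index, src in enumerate(parser.sources):
        image_url = urljoin(page_url, src)  # resolve relative paths
        # Fall back to a generated name when the URL path has no file name.
        name = os.path.basename(urlparse(image_url).path) or f"image_{index}"
        path = os.path.join(folder, name)
        with open(path, "wb") as handle:
            handle.write(download(image_url))
        saved.append(path)
    return saved
```

If `save_images` returns an empty list, the HTML that was fetched contained no `<img>` tags with a `src`, which is worth checking before looking for a bug in the download step.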

.py

10th Feb 2020, 11:32 AM

Fancy Bear

14 Answers

Yes, I know about the requests library, but the code doesn’t work when I try it in PyCharm.

10th Feb 2020, 1:38 PM

Fancy Bear

The requests library isn't installed on SoloLearn. The same goes for urllib2, so you can't use either of them there.

10th Feb 2020, 11:47 AM

ΛM!N

Oh, sorry for my mistake. I printed soup to see the page's content, and it looks like Cloudflare is asking for a captcha to prove that a human is sending the request, so the website doesn't return the real content, including the model images you're looking for.
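A check like the one described here can be automated before trying to download anything. The sketch below is only a heuristic: the status codes and the phrases in `CHALLENGE_HINTS` are my assumptions about what a challenge page tends to look like, not official markers, and the function names are made up. A normal page that happens to mention "captcha" would be flagged too.

```python
from html.parser import HTMLParser

# Assumed phrases that often show up on challenge pages (not an official list).
CHALLENGE_HINTS = ("captcha", "attention required", "checking your browser")


class ImgCounter(HTMLParser):
    """Counts <img> tags in a page."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.count += 1


def looks_blocked(status_code, html):
    """Guess whether a response is a challenge page instead of real content."""
    if status_code in (403, 503):  # assumed to be typical for blocked requests
        return True
    text = html.lower()
    return any(hint in text for hint in CHALLENGE_HINTS)


def count_images(html):
    """Return how many <img> tags the HTML contains."""
    counter = ImgCounter()
    counter.feed(html)
    return counter.count
```

With requests, this could be called as `looks_blocked(response.status_code, response.text)`; if it returns True or `count_images` returns 0, the scraper has probably not received the page a browser would show.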

10th Feb 2020, 2:59 PM

ΛM!N

Do you mean the pexels.com website asks for a captcha?

10th Feb 2020, 5:39 PM

Fancy Bear

Yes, at least when using the requests library.

10th Feb 2020, 5:45 PM

ΛM!N

OK, thanks. I’ll try something else.

10th Feb 2020, 5:47 PM

Fancy Bear

OK, I’ve tried other websites and it’s working. My problem now is that it only downloads the images from the first page. I’m guessing I’ll need to set some condition using Selenium WebDriver. Are you familiar with Selenium?

11th Feb 2020, 11:13 AM

Fancy Bear

Yes, but just a bit. Are you sure you need to use Selenium? How many of the website's images do you want to download? (I mean, which other pages do you want to download images from?)

11th Feb 2020, 11:17 AM

ΛM!N

I’m just doing it as a learning exercise, so what I’m looking to do is automate my browser to navigate to all the pages of a website and scrape/download the images.

11th Feb 2020, 1:35 PM

Fancy Bear

As an idea, you can keep a list of the URLs you have already scraped. Then collect the URLs linked from the home page and, for each one that isn't in the list yet, download its images and add the URL to the list. I suppose the script will end up downloading all of the website's images.

11th Feb 2020, 2:21 PM

ΛM!N

Sorry, I’m having a little trouble understanding what you mean.

11th Feb 2020, 3:47 PM

Fancy Bear

I mean, get the link tags from the home page, then fetch each linked page and download its images as you did for the home page. Some URLs may get downloaded more than once, though. To prevent this, create a list: after downloading a URL's images, add that URL to the list, and before downloading from a URL, check that it isn't already in the list.
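Roughly, the idea could look like the sketch below. It is a sketch under a few assumptions of mine: it uses a set instead of a list for the visited URLs (membership checks are faster), it only follows links on the same host, it stops after `max_pages` pages, and the page-fetching function is passed in as `fetch`. The names `LinkAndImageParser` and `crawl_images` are made up for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkAndImageParser(HTMLParser):
    """Collects href values of <a> tags and src values of <img> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])


def crawl_images(start_url, fetch, max_pages=50):
    """Visit same-host pages once each; return (image URLs, visited pages).

    fetch is any callable that takes a URL and returns the page's HTML.
    """
    site = urlparse(start_url).netloc
    visited = set()            # pages already scraped
    queue = deque([start_url])  # pages still to scrape
    images = []
    seen_images = set()
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue  # already handled, skip the duplicate
        visited.add(url)
        parser = LinkAndImageParser()
        parser.feed(fetch(url))
        for src in parser.images:
            image_url = urljoin(url, src)
            if image_url not in seen_images:
                seen_images.add(image_url)
                images.append(image_url)
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url  # drop "#fragment" parts
            if urlparse(link).netloc == site and link not in visited:
                queue.append(link)
    return images, visited
```

With requests, `fetch` could be something like `lambda url: requests.get(url).text`. The returned image URLs can then be downloaded the same way as for the home page.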

11th Feb 2020, 4:08 PM

ΛM!N

OK, now I understand. Thanks.

11th Feb 2020, 5:52 PM

Fancy Bear

You can use Termux on Android.

12th Feb 2020, 7:55 AM

Sarvesh Yadav