How do I use Open Source scrapers? (Selenium, Scrapy, etc.)

Noah@lemmy.dbzer0.com · 1 hour ago

chicken@lemmy.dbzer0.com · 2 hours ago

The reason to use Selenium is if the website you want to scrape uses JavaScript in a way that prevents you from getting the content without a full browser environment. BeautifulSoup is just a parser, so it can’t solve that problem.
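
To illustrate the point about parsers, here is a minimal sketch. It uses Python's built-in `html.parser` as a stand-in for BeautifulSoup (the limitation is the same for any parser), and the page markup is a made-up example, not taken from any real site. The script that fills the page never runs, so the parser only sees the empty container.

```python
from html.parser import HTMLParser

# Hypothetical page: the server sends an empty container plus a script
# that a real browser would execute to fill it in.
RAW_HTML = """
<html><body>
  <div id="content"></div>
  <script>
    document.getElementById("content").textContent = "Loaded by JavaScript";
  </script>
</body></html>
"""


class ContentCollector(HTMLParser):
    """Collects the text found inside <div id="content">."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("id") == "content":
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text.append(data)


def content_text(html):
    """Return the text a parser can see inside the content div."""
    parser = ContentCollector()
    parser.feed(html)
    return "".join(parser.text).strip()


# The parser never runs the script, so the container is still empty.
print(repr(content_text(RAW_HTML)))
```

A browser driven by Selenium would execute the script first, which is why the same lookup can succeed there.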

aMockTie@beehaw.org · 56 minutes ago

In my experience, this scenario typically means that some sort of API (very likely undocumented) is being used on the backend. Finding it requires a bit more investigation and testing with the browser developer tools, the JS console, and often trial and error. But once you overcome that (admittedly very complex and technical) hurdle, you can almost always get away with just using the requests library.

I’ve had to do that kind of thing more times than I’d like to admit, but the juice is almost always worth the squeeze.
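
A rough sketch of that approach, under stated assumptions: the endpoint `https://example.com/api/posts`, the `page` parameter, and the `{"items": [{"title": ...}]}` payload shape are all hypothetical placeholders for whatever shows up in the Network tab of the developer tools. It uses the standard library's `urllib` so it runs without extra installs; with requests, `fetch_json` would be roughly `requests.get(url, headers=headers).json()`.

```python
import json
import urllib.parse
import urllib.request


def build_api_url(base_url, params):
    """Append query parameters to the endpoint seen in the Network tab."""
    return base_url + "?" + urllib.parse.urlencode(params)


def fetch_json(url, headers=None):
    """GET the URL and decode the JSON body (performs a real network call)."""
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_titles(payload):
    """Pull titles out of a payload shaped like {"items": [{"title": ...}]}."""
    return [item["title"] for item in payload.get("items", [])]


# Hypothetical response body, standing in for what the API might return.
sample = json.loads('{"items": [{"title": "First"}, {"title": "Second"}]}')
print(build_api_url("https://example.com/api/posts", {"page": 1}))
print(extract_titles(sample))
```

Against a real site, the flow would be `extract_titles(fetch_json(build_api_url(...)))`, possibly with headers copied from the browser's own request if the server expects them.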

Noah@lemmy.dbzer0.com · 1 hour ago

This was my original plan, but it doesn’t work as well on ‘dynamic’ websites.

chicken@lemmy.dbzer0.com · 51 minutes ago

IIRC it can be made to work, since it does everything a browser does. I found this search result, though it has been a while since I used it myself. Another thing you might try that has worked for me is iMacros; it’s a little simpler and more basic than Selenium, but it should work for what you say you want to do.
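
For reference, a minimal Selenium sketch for a dynamic page, assuming Selenium 4, Firefox, and a geckodriver binary sitting in a local folder. The function names, the `drivers` folder, the URL, and the CSS selector are illustrative assumptions, not something from this thread. The key step is waiting for the element to appear before reading the page source.

```python
import os


def geckodriver_path(driver_dir):
    """Build the path to the geckodriver binary inside driver_dir."""
    name = "geckodriver.exe" if os.name == "nt" else "geckodriver"
    return os.path.join(driver_dir, name)


def fetch_rendered_html(url, driver_dir, wait_css, timeout=10):
    """Load url in headless Firefox and return the HTML after JS has run."""
    # Imported here so the helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.firefox.service import Service
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("-headless")
    service = Service(executable_path=geckodriver_path(driver_dir))
    driver = webdriver.Firefox(service=service, options=options)
    try:
        driver.get(url)
        # Block until the JS-rendered element exists, or raise on timeout.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

A call might look like `fetch_rendered_html("https://example.com", "drivers", "#content")`, and the returned HTML can then be handed to a parser such as BeautifulSoup.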

Noah@lemmy.dbzer0.com · 38 minutes ago

I test in IDLE for Python and use Selenium with a driver directory (geckodriver).

How do I use Open Source scrapers? (Selenium, Scrapy, etc.)

I have been trying for hours to figure this out. From following a tutorial on building one to just trying to find prebuilt ones, I can’t seem to make it click.