this post was submitted on 07 Sep 2022
12 points (100.0% liked)

Piracy

Welcome to /c/piracy

No netflix or streaming services landlubbers allowed, this is pirates territory.

founded 5 years ago
commetaltw@lemmygrad.ml 2 points 2 years ago

my first question would be: what book is it? someone might have already published a pirated copy of it somewhere, and it might just be difficult to find.

i saw someone mention writing a wget script to download each page, which is where i would start too.

the other thing worth trying is wkhtmltopdf. it's a standalone command-line tool built on qt webkit, not something written in python, but there are python wrappers for it on pypi (pdfkit, for example) that install with pip. you still need the wkhtmltopdf binary itself installed for the wrappers to work. not sure what it would take to crawl and paginate the whole book without experimenting with the website tho. if there's authentication involved and api-layer security on their system, scripting the download may become difficult/cumbersome
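a minimal sketch of what calling wkhtmltopdf from python could look like, assuming the binary is already installed and on your PATH. the helper names `build_command` and `render` are made up for illustration, and the only thing it relies on is that wkhtmltopdf accepts one or more input pages followed by the output file. it does nothing about logins or access restrictions.

```python
import shutil
import subprocess


def build_command(page_urls, output_pdf):
    """Return the argv list for one wkhtmltopdf run over several pages."""
    if not page_urls:
        raise ValueError("need at least one page url")
    # wkhtmltopdf takes one or more input pages, then the output file last
    return ["wkhtmltopdf", *page_urls, output_pdf]


def render(page_urls, output_pdf):
    """Run wkhtmltopdf over the pages; fail early if the binary is missing."""
    if shutil.which("wkhtmltopdf") is None:
        raise RuntimeError("wkhtmltopdf binary not found on PATH")
    # check=True raises CalledProcessError if the conversion fails
    subprocess.run(build_command(page_urls, output_pdf), check=True)
```

calling `render([...list of page urls...], "out.pdf")` should give one pdf with the pages in the order you listed them, but i haven't tried it against any particular site.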

crawling and paginating the webpages can be done with python too, using scrapy or python's own urllib / html.parser libraries. correction to what i first thought: scrapy makes plain http requests and doesn't drive a browser. the tools that do drive a real browser are things like selenium or playwright. with those you can log in on the webpage itself without worrying about headers in authentication requests through curl or wget, or about some crud/rest library for interacting with an api
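rough sketch of the standard-library version of the pagination part. it assumes each page marks its successor with a `rel="next"` link, which plenty of sites don't do, so treat that as a placeholder for whatever the real markup looks like. `NextLinkParser`, `find_next_url` and `crawl` are names i made up, and it only does plain unauthenticated GET requests.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class NextLinkParser(HTMLParser):
    """Remember the href of the first <a> or <link> tagged rel="next"."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link") or self.next_href is not None:
            return
        attrs = dict(attrs)
        rel_values = (attrs.get("rel") or "").split()
        if "next" in rel_values and attrs.get("href"):
            self.next_href = attrs["href"]


def find_next_url(page_url, html):
    """Return the absolute url of the next page, or None if there is none."""
    parser = NextLinkParser()
    parser.feed(html)
    if parser.next_href is None:
        return None
    # resolve relative hrefs against the page they were found on
    return urljoin(page_url, parser.next_href)


def crawl(start_url, max_pages=50):
    """Follow next links from start_url and return the urls visited, in order."""
    url, seen = start_url, []
    # stop on a missing link, a loop back to a seen page, or the page cap
    while url and url not in seen and len(seen) < max_pages:
        seen.append(url)
        with urlopen(url) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        url = find_next_url(url, html)
    return seen
```

the `max_pages` cap and the `seen` check are just there so a site whose next links loop doesn't keep the script running forever.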

it's one of those problems with no easy general solution if the website op is using doesn't offer downloading as a feature and heavily drm's/restricts access