Poll of the Day > What is the worst thing that could happen if somebody web-scraped a site without

EclairReturns
06/07/20 2:28:12 AM
#1:


the owner's permission? Of the first two sites I thought of trying on my own, the first returned a bunch of HTML and the second returned a 403 status code, which I learned roughly means that they received my request to query their site by that means, and that they do not authorize it. I am very new to the programming scene, and I initially got a book to learn about this because I thought it would be neat to have a mechanism to fetch Fire Emblem data for instant look-up without entering stuff into the Mozilla Firefox internet browser in order to do so. I imagine that a lot of sites just deter web-scraping through certain counter-measures, followed by an error message explicitly stating that such requests are unsolicited, such as the second site I had attempted this on. I do imagine that if I were to web-scrape a site legally, I would need to ask the owner's permission first. But I am much too afraid of the repercussions that could be quite troublesome for me if I chose to go ahead without his or her expressly written say-so. I know that asking for permission before doing something with someone else's creation is common manners, but at the same time, I am far too afraid of the response I would get should I request permission to--for lack of a better word--"violate" his or her web-site. What if the bloke feels threatened by me, IP-blocks me or something, and bars access completely? What if he feels the need to bring it into court? These possibilities scare me far too much for me to seriously consider continuing the data-gathering project I want to work on. At the same time, I feel that the consequences would be less dire, and could possibly be non-existent, if I just screwed up the courage to ask the owner of, say, FE Wiki (the good one) or Serenes Forest, if it is alright to scrape their databases.
Anyway, if I scraped some bloke's website without his permission, is there a tiny possibility that he would summon legal authorities to investigate the matter, or am I simply being far too paranoid for my own good? I was already afraid of making this topic because I feared that the mods would see it and cite it as a valid reason to terminate my account and possibly discontinue my further usage of the site. Anyway, what would be the worst-case scenario if I chose to query some website without permission, received a 403 message, and got the hint to cease my actions at once? What would be the worst-case scenario if I queried some website without that security measure, without their knowing, and accessed their site through that means?

I must have these answers.

Note: Sorry to those who were already typing a response, for deleting and re-making this topic; I was too paranoid of it being deleted by some moderator reading it.

---
Number XII: Larxene.
The Organization's Savage Nymph.
Zeus
06/07/20 2:31:17 AM
#2:


https://i.imgflip.com/mb5hn.jpg

---
(\/)(\/)|-|
There are precious few at ease / With moral ambiguities / So we act as though they don't exist.
OniRonin
06/07/20 2:31:59 AM
#3:


nobody's going to take legal action and you don't need the owner's permission
just rate-limit your script so that it doesn't make a ton of requests per second and nobody will really care. if they do care the worst thing that will happen is banning your ip. if you want to get around sites disabling scraping, you can probably do it by just setting your user agent
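
a rough sketch of what rate-limiting plus setting a user agent could look like in python, using only the standard library. the class name, the 2-second delay and the user agent string are made-up values for illustration, not anything a particular site asks for. urllib sends "Python-urllib/3.x" as its user agent by default, which is why setting your own can change how a site responds.

```python
import time
import urllib.request

# Hypothetical values for illustration; choose your own.
USER_AGENT = "fe-lookup-script/0.1 (personal project)"
MIN_DELAY_SECONDS = 2.0


class PoliteFetcher:
    """Fetch pages one at a time, leaving a minimum gap between requests."""

    def __init__(self, min_delay=MIN_DELAY_SECONDS, user_agent=USER_AGENT,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self.user_agent = user_agent
        self._clock = clock          # injectable so the timing can be tested
        self._sleep = sleep
        self._last_request = None    # time of the previous request, if any

    def wait_time(self):
        """Seconds still to wait before the next request is allowed."""
        if self._last_request is None:
            return 0.0
        elapsed = self._clock() - self._last_request
        return max(0.0, self.min_delay - elapsed)

    def build_request(self, url):
        """Create a request that carries our own User-Agent header."""
        return urllib.request.Request(url, headers={"User-Agent": self.user_agent})

    def fetch(self, url):
        """Wait if needed, then download the page body as bytes."""
        remaining = self.wait_time()
        if remaining > 0:
            self._sleep(remaining)
        self._last_request = self._clock()
        with urllib.request.urlopen(self.build_request(url), timeout=30) as resp:
            return resp.read()
```

you'd create one PoliteFetcher and call fetch(url) in a loop; every call after the first waits until at least min_delay seconds have passed since the previous one.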

---
god is dumb
#NotAllGamers #YesAllLandlods
OniRonin
06/07/20 2:34:19 AM
#4:


if you want to be respectful and follow a site's guidelines for scraping, you can look at their robots.txt file - for example for the fire emblem site you mentioned
https://serenesforest.net/robots.txt
this is essentially like asking the owner what parts of the site you can scrape

this one means anybody can scrape any part of the website, other than the wp-admin directory. although for some reason it does allow scraping some random php file in that directory.
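
if you'd rather have your script check the rules itself, python's standard library has a robots.txt parser. this is a minimal sketch: the sample text below is an assumed WordPress-style file that resembles what i described, not a copy of the real one, and the example.org URLs and the user agent name are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Assumed sample for illustration; not the actual contents of any site's file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
"""


def build_parser(robots_text):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def may_fetch(parser, url, user_agent="fe-lookup-script"):
    """Return True if the parsed rules let this user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)
print(may_fetch(parser, "https://example.org/characters/"))
print(may_fetch(parser, "https://example.org/wp-admin/options.php"))
# Note: Python's parser applies rules in file order, so its answer for the
# Allow exception above can differ from crawlers that use longest-match rules.
```

to read a live file instead of a string, RobotFileParser also has set_url() and read(), which download and parse it in one go.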

---
god is dumb
#NotAllGamers #YesAllLandlods
Sarcasthma
06/07/20 2:59:16 AM
#5:


Jesus

---
What's the difference between a pickpocket and a peeping tom?
A pickpocket snatches your watch.
Kungfu Kenobi
06/07/20 4:41:05 AM
#6:


Pretty much what OniRonin said.

There's nothing illegal about what you're asking. We're talking about public sites with no reasonable expectation of those data not being accessed. Just keep your rate of access to some sane level, like don't open hundreds of connections.
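
To make "don't open hundreds of connections" concrete, here is a small sketch that caps how many downloads run at once. The cap of 2 and the function names are assumptions picked for illustration, and fetch_one stands in for whatever function you use to download a single page.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONNECTIONS = 2  # assumed cap for illustration; pick a small number


def fetch_all(urls, fetch_one, max_connections=MAX_CONNECTIONS):
    """Call fetch_one(url) for every URL, at most max_connections at a time.

    Results come back in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return list(pool.map(fetch_one, urls))
```

The thread pool never runs more than max_connections calls at the same time, so the site only ever sees a handful of simultaneous requests from you no matter how long the URL list is.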

---
This album is not available to the public.
Even if it were, you wouldn't wanna listen to it!
captpackrat
06/07/20 3:08:21 PM
#7:


EclairReturns posted...




---
Minutus cantorum, minutus balorum,
Minutus carborata descendum pantorum.
EclairReturns
06/07/20 3:14:10 PM
#8:


OniRonin posted...
nobody's going to take legal action and you don't need the owner's permission
just rate-limit your script so that it doesn't make a ton of requests per second and nobody will really care. if they do care the worst thing that will happen is banning your ip. if you want to get around sites disabling scraping, you can probably do it by just setting your user agent


Thank you for putting my worries to rest.

OniRonin posted...
https://serenesforest.net/robots.txt
this is essentially like asking the owner what parts of the site you can scrape

this one means anybody can scrape any part of the website, other than the wp-admin directory. although for some reason it does allow scraping some random php file in that directory.


I am grateful, also, to you for telling me about the robots.txt page, and for explaining what this one in particular meant with regard to scraping the site.

Kungfu Kenobi posted...
Just keep your rate of access to some sane level, like don't open hundreds of connections.


Thank you for the tip.
---
Number XII: Larxene.
The Organization's Savage Nymph.