LogFAQs > #922723051


Topic List	Page List: 1
Topic	I need some python help. Is anyone strong in web scraping with Python?
CelesMyUserName 06/03/19 6:34:36 AM #11:	Eh

Basically, my gfaqr just initializes that basic requests stuff for me. I just ask it to open whatever url and it returns the response, same as the most simplistic requests.get(url) would.

So in my Gfaqr() class's init function (well, showing the general, relevant code):

import requests
from urllib import robotparser

class Gfaqr():
    def __init__(self):
        # one shared session, so every request carries the bot's User-Agent
        self.s = requests.session()
        headers = {"User-Agent": 'Mozilla/5.0 (compatible; Wigsbot/5.0; +https://penisland.net/bot.html)'}
        self.s.headers.update(headers)
        self._rp = robotparser.RobotFileParser()

    def permission(self, host, stem):
        url = host + stem
        # robots.txt gets fetched again on every call
        self._rp.set_url(host + "/robots.txt")
        self._rp.read()
        # has to be allowed under the general rules and for Wigsbot by name
        permission = self._rp.can_fetch("", url)
        permission &= self._rp.can_fetch("Wigsbot", url)
        return permission

    def sget(self, host, stem):
        # returns 1 instead of a response when robots.txt says no
        if not self.permission(host, stem):
            return 1
        r = self.s.get(host + stem)
        return r

And then all you need is to import your class and use its own get method instead of requests.get or even s.get. Like I use Gfaqr in that earlier screencap, assigning it to something like fq = Gfaqr() and using it with fq.sget(url)
...
Of course, mine has other stuff as well, plus it defaults the host to gamefaqs since that's all it's for, and it has some specific functions for browsing gamefaqs pages built in, but yeah
---
https://imgtc.com/i/1LkkaGU.jpg
somethin somethin hung somethin horse somethin
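If you want to poke at the permission check without hitting a real site, here's a minimal sketch. It assumes you feed robots.txt lines straight into RobotFileParser.parse() instead of calling read(). The make_parser and allowed helpers, the sample rules and the example.com host are all made up for illustration, and it passes "*" for the general rules where the post passes an empty string. It keeps the same idea as the post, though: a URL only counts as allowed if both the general rules and the Wigsbot-specific rules permit it.

```python
from urllib import robotparser


def make_parser(robots_lines):
    """Build a RobotFileParser from in-memory robots.txt lines (no network)."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_lines)
    return rp


def allowed(rp, url, agent="Wigsbot"):
    """True only if both the wildcard rules and the named bot's rules allow url."""
    return rp.can_fetch("*", url) and rp.can_fetch(agent, url)


# Hypothetical robots.txt content, invented for this example.
SAMPLE_ROBOTS = [
    "User-agent: *",
    "Disallow: /private/",
    "",
    "User-agent: Wigsbot",
    "Disallow: /boards/",
]

rp = make_parser(SAMPLE_ROBOTS)
host = "https://example.com"  # placeholder host, not a real target

for stem in ("/faqs/index.html", "/private/page", "/boards/1"):
    print(stem, allowed(rp, host + stem))
```

Checking both agents is stricter than a plain can_fetch("Wigsbot", url), since a bot with its own section would normally skip the wildcard section entirely. Whether you want that extra caution is a design choice.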