LogFAQs > #922731406

Topic: I need some python help. Is anyone strong in web scraping with Python?
foolm0r0n
06/03/19 12:09:32 PM
#14:


WiggumFan267 posted...
that is helpful thanks. I wound up mostly finishing up the code last night in BeautifulSoup, my main issue coming with when there wasn't a good way to distinguish some bits of the HTML from another, like when the same tags are used. there usually happened to be a workaround but there wasn't always.

There's always some way to distinguish elements. At the very least, there's the element's position in the tree (first child, second child, etc.). If that position isn't useful for whatever reason, then you can look at the element's children. Maybe you want the element that has 1 text node and 2 span children.
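Rough sketch of both ideas in BeautifulSoup, since that's what you're already using. The markup, the `card` id, and the `child_shape` helper are all made up for illustration, so adapt them to whatever your page actually looks like.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Made-up markup: three <li> elements with the same tag and no classes.
HTML = """
<ul id="card">
  <li>Alice</li>
  <li>555-1234</li>
  <li>note <span>a</span><span>b</span></li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")
card = soup.find(id="card")

# 1) Distinguish by position: direct <li> children, in document order.
items = card.find_all("li", recursive=False)
second = items[1]  # the second child


def child_shape(el):
    """Return (non-blank text node count, <span> child count) for an element."""
    texts = sum(
        1 for c in el.children if isinstance(c, NavigableString) and c.strip()
    )
    spans = sum(1 for c in el.children if isinstance(c, Tag) and c.name == "span")
    return texts, spans


# 2) Distinguish by structure: the <li> with 1 text node and 2 span children.
by_shape = [li for li in items if child_shape(li) == (1, 2)]

print(second.get_text(strip=True))
print(len(by_shape))
```

Whitespace-only text nodes are skipped on purpose, otherwise the indentation between tags gets counted as text.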

If the elements are all structured exactly the same and that's not possible, then you can go even further by checking the text itself. So if you have 2 text elements that are in randomized order, one for name and one for phone number, then you check which element's text looks like a phone number and which looks like a name.
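Something like this for the name vs. phone case. The regex is just a loose guess at what a phone number looks like, and `looks_like_phone` and `label_fields` are hypothetical helper names, so tune them for your data. It takes plain strings, so you'd feed it the text you already pulled out of each element.

```python
import re

# Loose assumption: a phone number is mostly digits plus common separators.
PHONE_RE = re.compile(r"^\+?[\d\s().-]{7,}$")


def looks_like_phone(text):
    """True if the text is phone-shaped and contains at least 7 digits."""
    text = text.strip()
    digits = sum(ch.isdigit() for ch in text)
    return bool(PHONE_RE.match(text)) and digits >= 7


def label_fields(texts):
    """Label each string as 'phone' or 'name', whatever order they arrive in."""
    labeled = {}
    for t in texts:
        key = "phone" if looks_like_phone(t) else "name"
        labeled[key] = t.strip()
    return labeled


# Order of the two elements doesn't matter.
print(label_fields(["Jane Doe", "555-867-5309"]))
print(label_fields(["555-867-5309", "Jane Doe"]))
```

Anything that isn't phone-shaped falls through to "name" here, which is only safe when you know there are exactly those two kinds of fields.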

If that's not enough to distinguish the elements, then they are logically equivalent, so it doesn't make sense to distinguish them anyway.
---
_foolmo_
2 + 2 = 4