br tag #2
Reference: thereadme/web-scraper#2
Dear Mdibaiee!
I have some issues that I can't fix.
Now my tags are looking like this:
tags = ['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'ul', 'li', 'span', 'a', 'img', 'br']
I have added the br tag to it.
When the scraper runs this way, it finds every br tag that is not inside another tag, for example a p tag.
But when a br tag is inside a p tag, it doesn't find the text.
In the case shown in the picture, I can't get any of the text inside the br.
Is there any chance that you have an easy workaround for this?
Thank You!
Csemid
Dear Mdibaiee!
Do you think you will have time for this issue in the near future?
Thank You!
Csemid
Hi @Csemid.
You don't need to add the `br` tag to the list, as `br` tags themselves don't contain text.
The change necessary in the code is around this line:
https://github.com/mdibaiee/web-scraper/blob/master/index.py#L62
`el.string` does not contain all of the text inside the `p`, only the first piece.
To get all of the text, we need something like this:
You might have to import these:
Please try it and let me know if it works. If it works for you, please open a pull request.
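
For illustration, here is a minimal sketch of one way to collect all of the text in a `p` that contains `br` tags. It assumes the scraper parses pages with BeautifulSoup (`bs4`), which is where an `el.string` attribute normally comes from; the helper names `text_pieces` and `text_with_breaks` and the sample HTML are hypothetical and are not taken from `index.py`. In BeautifulSoup, `.string` is `None` when a tag has more than one child, so a `p` split up by `br` tags needs `stripped_strings` or `get_text()` instead.

```python
from bs4 import BeautifulSoup


def text_pieces(el):
    """Return every non-empty text fragment inside el, in document order."""
    # stripped_strings walks all descendants and skips whitespace-only text
    return list(el.stripped_strings)


def text_with_breaks(el, separator="\n"):
    """Join all text inside el, placing separator between the fragments."""
    return el.get_text(separator=separator, strip=True)


# Hypothetical sample: a paragraph whose text is split up by br tags
html = "<p>first line<br>second line<br/>third line</p>"
soup = BeautifulSoup(html, "html.parser")
p = soup.find("p")

pieces = text_pieces(p)
joined = text_with_breaks(p)
```

With a helper like this, `br` would not need to be in the `tags` list: the text on both sides of each `br` is read from the enclosing `p`.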