Quote:
Originally Posted by moldy
To counteract this I tried wrapping John in \b anchors in the function
|
It should have worked in a regex, but not with Python's str.replace(), which treats \b as literal text rather than as a word boundary.
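To make the difference concrete, here is a minimal sketch using the standard `re` module (the sample string is made up for illustration). str.replace() looks for the literal characters `\bJohn\b`, whereas a regex reads `\b` as a word boundary:

```python
import re

text = "Johnson and John"  # hypothetical sample text

# str.replace() searches for the literal characters \bJohn\b,
# which do not occur in the text, so nothing is replaced
plain = text.replace(r'\bJohn\b', 'Mick')

# re.sub() interprets \b as a word boundary, so only the whole word matches
with_regex = re.sub(r'\bJohn\b', 'Mick', text)

print(plain)
print(with_regex)
```

The first result is the unchanged text; in the second, only the standalone John is replaced.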
Quote:
I would like to go back to the dict method again (as described in lomkiri’s suggestion above).
|
Try this:
Code:
# insert here the code to load the json file into the dict "equiv"
# (see my post #12 for this code)
import regex
m = match.group()
for key in equiv:
    m = regex.sub(rf'\b{key}\b', equiv[key], m)
return m
It works, I have tested it:
Johnson, Johnjo LongJohn and so on John and Ringo, and also john ==>
Johnson, Johnjo LongJohn and so on Mick and Charlie, and also john
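If you want to try the same logic outside the editor, here is a rough standalone sketch. It makes several assumptions: it uses the standard `re` module instead of `regex`, the dict `equiv` is a hand-written stand-in for the one loaded from the JSON file, and the function name `replace_words` is only illustrative. The call to re.escape() is an extra precaution that is not in the code above.

```python
import re

# Hypothetical stand-in for the dict loaded from the JSON file
equiv = {"John": "Mick", "Ringo": "Charlie"}

def replace_words(text, mapping):
    """Replace each key of mapping by its value, matching whole words only."""
    for key, value in mapping.items():
        # re.escape() guards against keys containing regex metacharacters
        text = re.sub(rf'\b{re.escape(key)}\b', value, text)
    return text

sample = "Johnson, Johnjo LongJohn and so on John and Ringo, and also john"
result = replace_words(sample, equiv)
print(result)
```

Because the replacements run one key after another, a value that is itself a key of the dict would be replaced again on a later pass, so keep keys and values distinct.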
Note: rf'\b{key}\b' is the same as r'\b{}\b'.format(key) and expands to r'\bJohn\b' if key == 'John'.
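You can check this equivalence in a Python console with a small sketch like this (the variable names are only illustrative):

```python
key = 'John'

# raw f-string: the backslashes stay literal and {key} is substituted
pattern_f = rf'\b{key}\b'

# same result with a raw string and str.format()
pattern_format = r'\b{}\b'.format(key)

print(pattern_f == pattern_format)
print(pattern_f)
```

Both patterns contain a real backslash followed by b, which is what the regex engine needs in order to see a word boundary.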
It works with either
<body[^>]*>\K(.+)</body> (with "dot all" checked) or
>\K([^>]+)(?![^<>{}]*[>}])
The 1st form will be quicker, since it processes one whole HTML file per iteration, but only on the condition, as I said above, that none of your keys matches something inside an HTML tag. The 2nd form selects only the text between tags and so avoids the part inside the tags.