MobileRead Forums - View Single Post

lomkiri · 03-20-2024, 04:33 PM

Quote:

Originally Posted by moldy

To counteract this I tried wrapping John in \b anchors in the function

It should have worked in a regex. It can't work with python's str.replace(), which treats \b as literal text and not as a word boundary.
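To illustrate the difference, here is a minimal sketch using the standard re module (the sample text is made up for the example):

```python
import re

text = "Johnson and John"

# str.replace() searches for the literal characters \ b J o h n \ b,
# which do not occur in the text, so nothing is changed
literal = text.replace(r"\bJohn\b", "Mick")

# a regex engine reads \b as a word boundary
with_regex = re.sub(r"\bJohn\b", "Mick", text)

print(literal)     # Johnson and John
print(with_regex)  # Johnson and Mick
```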

Quote:

I would like to go back to the dict method again (as described in lomkiri’s suggestion above).

Try this:

Code:

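    # insert here the code to load the json file into the dict "equiv"
    # (see my post #12 for this code)
    import regex
    m = match.group()  # the text matched by the search expression
    for key in equiv:
        # \b on each side: replace the key only when it is a whole word
        m = regex.sub(rf'\b{key}\b', equiv[key], m)
    return m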

It works, I have tested it:
Johnson, Johnjo LongJohn and so on John and Ringo, and also john ==>
Johnson, Johnjo LongJohn and so on Mick and Charlie, and also john
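If you want to try the same logic outside calibre, here is a stand-alone sketch. It makes a few assumptions: the dict content is guessed from the test above (in the real function it comes from the json file), the function name is made up, and it uses the standard re module instead of regex. It also escapes each key with re.escape() and passes the replacement through a lambda so it is inserted as plain text, which is a precaution in case a key or a value contains special characters.

```python
import re

# Hypothetical mapping, guessed from the test above;
# in the calibre function it is loaded from the json file
equiv = {"John": "Mick", "Ringo": "Charlie"}

def replace_whole_words(text, mapping):
    """Replace each key of mapping only where it stands as a whole word."""
    for key, value in mapping.items():
        # re.escape() neutralises regex metacharacters in the key;
        # the lambda inserts the value as plain text
        pattern = rf"\b{re.escape(key)}\b"
        text = re.sub(pattern, lambda _m, v=value: v, text)
    return text

sample = "Johnson, Johnjo LongJohn and so on John and Ringo, and also john"
print(replace_whole_words(sample, equiv))
# Johnson, Johnjo LongJohn and so on Mick and Charlie, and also john
```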

Note: rf'\b{key}\b' is the same as r'\b{}\b'.format(key) and will be expanded to r'\bJohn\b' if key == 'John'.
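A quick check of that equivalence, and of why the r prefix matters (the variable names are only for the example):

```python
key = 'John'

with_fstring = rf'\b{key}\b'
with_format = r'\b{}\b'.format(key)

print(with_fstring == with_format)  # True
print(with_fstring)                 # \bJohn\b

# Without the r prefix, '\b' is a backspace character, not a word boundary
not_raw = '\bJohn\b'
print(len(with_fstring), len(not_raw))  # 8 6
```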

It works with either <body[^>]*>\K(.+)</body> (with "dot all" checked) or >\K([^>]+)(?![^<>{}]*[>}]). The 1st form will be quicker, since it treats one whole html file at each iteration, but it is only safe on the condition, as I said above, that none of your keys matches something inside an html tag. The 2nd form selects only the text between tags and skips whatever is inside a tag.
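Here is a rough sketch of the difference between the two forms, runnable outside calibre. Assumptions: the html sample and the one-entry dict are invented, and since the standard re module does not know \K, the 1st form is imitated with capturing groups and the 2nd with a lookbehind, so these are approximations of the search expressions above, not the same expressions.

```python
import re

equiv = {"John": "Mick"}  # hypothetical mapping, for illustration only
html = '<body><p><a href="John.html">John</a> and Ringo</p></body>'

def swap(text):
    """Replace every key that appears as a whole word in text."""
    for key, value in equiv.items():
        text = re.sub(rf"\b{re.escape(key)}\b", lambda _m, v=value: v, text)
    return text

# 1st form: the whole body is handed to the function in one go
form1 = re.sub(
    r"(<body[^>]*>)(.+)(</body>)",
    lambda m: m.group(1) + swap(m.group(2)) + m.group(3),
    html,
    flags=re.DOTALL,
)

# 2nd form: only the text between tags is handed to the function
form2 = re.sub(
    r"(?<=>)([^>]+)(?![^<>{}]*[>}])",
    lambda m: swap(m.group()),
    html,
)

print(form1)  # <body><p><a href="Mick.html">Mick</a> and Ringo</p></body>
print(form2)  # <body><p><a href="John.html">Mick</a> and Ringo</p></body>
```

With the 1st form the href is changed too, because the key also matches inside the tag. With the 2nd form only the visible text is touched.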