Elegant way to replace substring in a regex with optional groups in Python?

Given a string taken from the following set:

strings = [
    "The sky is blue and I like it",
    "The tree is green and I love it",
    "A lemon is yellow"
]

I would like to construct a function which replaces the subject, color and optional verb in this string with other values.

All strings match a certain regex pattern, as follows:

regex = r"(?:The|A) (?P<subject>\w+) is (?P<color>\w+)(?: and I (?P<verb>\w+) it)?"

The expected output of such a function would look like this:

repl("The sea is blue", "moon", "white", "hate")

# => "The moon is white"

Here is the solution I came up with (I can't use .replace() because there are edge cases, for example if the string contains the subject twice):

import re

def repl(sentence, subject, color, verb):
    m = re.match(regex, sentence)
    s = sentence
    # Keep everything outside the "subject" and "color" spans, swapping in the new words
    new_string = s[:m.start("subject")] + subject + s[m.end("subject"):m.start("color")] + color
    if m.group("verb") is None:
        # Optional part absent: copy the rest of the sentence unchanged
        new_string += s[m.end("color"):]
    else:
        new_string += s[m.end("color"):m.start("verb")] + verb + s[m.end("verb"):]
    return new_string

Do you think there is a more straightforward way to implement this?
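The edge case behind the remark about .replace() can be reproduced in a few lines. The sketch below is illustrative only: naive_repl and span_repl are hypothetical helper names that do not appear in the question, and the pattern is the question's regex with its \w escapes written out. Substituting by value touches every occurrence of a word, whereas substituting by match position touches only the captured span.

```python
import re

# The question's pattern, with the \w escapes written out.
PATTERN = re.compile(
    r"(?:The|A) (?P<subject>\w+) is (?P<color>\w+)(?: and I (?P<verb>\w+) it)?"
)


def naive_repl(sentence, subject, color):
    """Hypothetical helper: swap words by value with str.replace()."""
    m = PATTERN.match(sentence)
    out = sentence.replace(m.group("subject"), subject)
    return out.replace(m.group("color"), color)


def span_repl(sentence, subject, color, verb):
    """Hypothetical helper: swap words by position, working right to left."""
    m = PATTERN.match(sentence)
    out = sentence
    # Splicing the rightmost span first keeps the earlier offsets valid.
    for name, new in (("verb", verb), ("color", color), ("subject", subject)):
        if m.group(name) is None:
            continue  # the optional group did not take part in the match
        start, end = m.span(name)
        out = out[:start] + new + out[end:]
    return out


print(naive_repl("A letter is A", "letter", "B"))        # every "A" is replaced
print(span_repl("A letter is A", "letter", "B", "fail"))  # only the captured "A"
```

Iterating over the group names in reverse order is a design choice made for this sketch; it avoids having to track how much each splice shifts the remaining offsets.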

asked Mar 29 at 13:18 by Delgan
edited Mar 29 at 14:10 by Reinderien

Do you have to use a regex? If not, split(" ") the string into words, replace words 1, 3, and possibly 6, then " ".join(...) it back into a sentence.
– AJNeufeld, Mar 29 at 14:06

What do you mean by 'string contains subject twice'? That doesn't seem like it would match your regex.
– Reinderien, Mar 29 at 14:18

@AJNeufeld This is not possible, actually the sentences are even more dynamic than the examples here and may contain an indefinite number of spaces.
– Delgan, Mar 29 at 14:35

@Reinderien For example, repl("The meloon is orange", "orange", "great", "like") or simply repl("A letter is A", "letter", "B", "fail")
– Delgan, Mar 29 at 14:40


3 Answers

import re

# Groups capture the text to KEEP; the words to replace are left uncaptured.
regex = re.compile(
    r'(The|A) '
    r'\w+'
    r'( is )'
    r'\w+'
    r'(?:'
        r'( and I )'
        r'\w+'
        r'( it)'
    r')?'
)


def repl(sentence, subject, colour, verb=None):
    m = regex.match(sentence)
    # \1 and \2 are filled in from the kept fragments of the match
    new = m.expand(rf'\1 {subject}\2{colour}')
    if m[3]:
        # The optional " and I ... it" part was present
        new += m.expand(rf'\3{verb}\4')
    return new


def test():
    assert repl('The sky is blue and I like it', 'bathroom', 'smelly', 'distrust') == \
        'The bathroom is smelly and I distrust it'
    assert repl('The tree is green and I love it', 'pinata', 'angry', 'fear') == \
        'The pinata is angry and I fear it'
    assert repl('A lemon is yellow', 'population', 'dumbfounded') == \
        'A population is dumbfounded'

Essentially, invert the sections of the regex around which you put groups; they're the things you want to save.
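As a small illustration of how Match.expand() fills a template from the kept groups, here is a reduced sketch. The names KEEP and swap are made up for this example, and the pattern is a shortened version without the optional part. It uses the \g<n> spelling of a backreference, which the re module documents as the unambiguous form; with a bare \2, a replacement word that begins with a digit would presumably be read as part of the group number.

```python
import re

# Reduced pattern: groups wrap the text to keep, the words to swap are uncaptured.
KEEP = re.compile(r"(The|A) \w+( is )\w+")


def swap(sentence, subject, colour):
    """Hypothetical helper: rebuild the sentence from the kept fragments."""
    m = KEEP.match(sentence)
    # \g<1> and \g<2> refer to groups 1 and 2 without risk of merging
    # with digits that happen to follow in the replacement text.
    return m.expand(rf"\g<1> {subject}\g<2>{colour}")


m = KEEP.match("A lemon is yellow")
print(m.groups())                               # the two kept fragments
print(swap("A lemon is yellow", "kiwi", "green"))
print(swap("The sky is blue", "sea", "7up"))    # replacement starting with a digit
```

Replacement words containing backslashes would still be interpreted by the template, so this sketch assumes plain words as in the question.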

answered Mar 29 at 14:34 by Reinderien

I did not know expand(), this seems very useful. Thanks!
– Delgan, Mar 29 at 15:12

You might want to experiment with NLTK, a leading platform for building Python programs to work with human language data:

You could import it, tag the words (NOUN, ADJ, ...) and replace words in the original sentence according to their tags:

import nltk
from collections import defaultdict
from nltk.tag import pos_tag, map_tag


def simple_tags(words):
    # see https://stackoverflow.com/a/5793083/6419007
    return [(word, map_tag('en-ptb', 'universal', tag)) for (word, tag) in nltk.pos_tag(words)]


def repl(sentence, *new_words):
    new_words_by_tag = defaultdict(list)

    for new_word, tag in simple_tags(new_words):
        new_words_by_tag[tag].append(new_word)

    new_sentence = []

    for word, tag in simple_tags(nltk.word_tokenize(sentence)):
        possible_replacements = new_words_by_tag.get(tag)
        if possible_replacements:
            new_sentence.append(possible_replacements.pop(0))
        else:
            new_sentence.append(word)

    return ' '.join(new_sentence)


repl("The sea is blue", "moon", "white", "hate")
# 'The moon is white'

repl("The sea is blue", "yellow", "elephant")
# 'The elephant is yellow'

This version is brittle though, because some verbs appear to be nouns or vice-versa.

I guess someone with more NLTK experience could find a more robust way to replace the words.

answered Mar 29 at 20:46 by Eric Duminil

Here is a solution using the original format string, instead of the inverted format string suggested by Reinderien.

Your difficulty comes from manually building up the parts of the original string from its spans. If you maintain a list of the starting points (the start of the string and the end of every group) and a list of the ending points (the start of every group and the end of the string), you can use these to retrieve the parts of the original string you want to keep:

start = [0] + [m.end(i+1) for i in range(m.lastindex)]
end = [m.start(i+1) for i in range(m.lastindex)] + [None]

We can glue these parts together with a placeholder into which we will substitute the desired values:

fmt = "{}".join(sentence[s:e] for s, e in zip(start, end))

Using "{}" as the joiner will create a string like The {} is {} and I {} it, which makes a perfect .format() string to substitute in the desired replacements:

def repl(sentence, subject, color, verb=None):
    m = re.match(regex, sentence)
    start = [0] + [m.end(i+1) for i in range(m.lastindex)]
    end = [m.start(i+1) for i in range(m.lastindex)] + [None]
    fmt = "{}".join(sentence[s:e] for s, e in zip(start, end))
    return fmt.format(subject, color, verb)
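To make the intermediate values concrete, the following sketch traces the start list, the end list and the resulting format string for two of the question's sentences. It assumes the question's regex with the \w escapes restored; build_format is a hypothetical name used only here.

```python
import re

# The question's pattern, with the \w escapes written out.
regex = r"(?:The|A) (?P<subject>\w+) is (?P<color>\w+)(?: and I (?P<verb>\w+) it)?"


def build_format(sentence):
    """Hypothetical helper: return (start, end, fmt) for inspection."""
    m = re.match(regex, sentence)
    # m.lastindex is the index of the last group that matched, so an
    # unmatched optional verb group is simply left out of both lists.
    start = [0] + [m.end(i + 1) for i in range(m.lastindex)]
    end = [m.start(i + 1) for i in range(m.lastindex)] + [None]
    fmt = "{}".join(sentence[s:e] for s, e in zip(start, end))
    return start, end, fmt


print(build_format("The sky is blue and I like it"))
# ([0, 7, 15, 26], [4, 11, 22, None], 'The {} is {} and I {} it')

print(build_format("A lemon is yellow"))
# ([0, 7, 17], [2, 11, None], 'A {} is {}')
```

Because str.format() ignores surplus positional arguments, passing verb to the two-placeholder format string is harmless. A sentence that itself contains braces would need them escaped first, which this sketch does not attempt.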

If you don't mind being a little cryptic, we can even make this into a shorter 3-line function:

def repl(sentence, subject, color, verb=None):
    m = re.match(regex, sentence)
    idx = [0] + [pos for i in range(m.lastindex) for pos in m.span(i+1)] + [None]
    return "{}".join(sentence[s:e] for s, e in zip(*[iter(idx)]*2)).format(subject, color, verb)
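The cryptic part is zip(*[iter(idx)]*2), which walks a flat list two items at a time. A minimal sketch of that idiom on its own follows; take_pairs is a hypothetical name, and the sample list is the idx that the sentence "The sky is blue and I like it" would produce.

```python
def take_pairs(flat):
    """Hypothetical helper: group a flat sequence into consecutive pairs."""
    it = iter(flat)
    # Both arguments are the SAME iterator, so each tuple consumes two items.
    # This is equivalent to zip(*[iter(flat)] * 2).
    return list(zip(it, it))


idx = [0, 4, 7, 11, 15, 22, 26, None]
print(take_pairs(idx))
# [(0, 4), (7, 11), (15, 22), (26, None)]
```

Each pair is a (start, end) slice of text to keep, so the flat list plays the role of the separate start and end lists in the longer version.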

answered Mar 29 at 22:07 by AJNeufeld
