What difference does it make matching a word with/without a trailing whitespace?


I am learning shell-scripting and for that I am using HackerRank. There is a question related to sed on the same site: 'Sed' command #1:

For each line in a given input file, transform the first occurrence of the word 'the' with 'this'. The search and transformation should be strictly case sensitive.

First of all I tried,

sed 's/the/this/'

but with that, the sample test case failed. Then I tried

sed 's/the /this /'

and it worked. So the question arises: what difference did the whitespace make? Am I missing something here?

edited Apr 1 at 11:25 by Kusalananda♦

asked Mar 31 at 20:33 by JHA

I assume the first version also "worked", but not as you expected. It should have replaced the first occurrence of the letter sequence "the", but you probably looked at the first occurrence of the word " the ".

– Dubu
Apr 1 at 8:40

Well, in thiseory, yes, in practice, no.

– Rolf
Apr 2 at 10:40
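A small sketch of what Dubu describes. The input line is made up for illustration and is not taken from the HackerRank test data; the point is only that the first "the" on a line may sit inside a longer word.

```shell
# Hypothetical input line (an assumption, not HackerRank's data).
line='Another day, and then the rain came'

# Plain substring match: the first "the" is the one inside "Another".
substring=$(printf '%s\n' "$line" | sed 's/the/this/')

# Requiring a trailing space skips "Another" and "then".
with_space=$(printf '%s\n' "$line" | sed 's/the /this /')

printf '%s\n' "$substring"   # Anothisr day, and then the rain came
printf '%s\n' "$with_space"  # Another day, and then this rain came
```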


3 Answers

The difference is whether there is a space after "the" in the input text.

For instance:

With a sentence without a space, no replacement:

$ echo 'theman' | sed 's/the /this /'

theman

With a sentence with a space, works as expected:

$ echo 'the man' | sed 's/the /this /'

this man

With a sentence with another whitespace character (here a tab),
no replacement will occur:

$ echo -e 'the\tman' | sed 's/the /this /'

the     man
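If the aim were to accept any whitespace character after "the", one possible sketch (not part of the original answer) is to use the POSIX character class [[:space:]] and put the matched character back with a back-reference:

```shell
# Match "the" followed by any whitespace character and keep that
# character unchanged via the \1 back-reference (POSIX basic regex).
tabbed=$(printf 'the\tman\n' | sed 's/the\([[:space:]]\)/this\1/')
spaced=$(printf 'the man\n' | sed 's/the\([[:space:]]\)/this\1/')

printf '%s\n' "$tabbed"   # "this", a tab, then "man"
printf '%s\n' "$spaced"   # this man
```

This still shares the other weaknesses of whitespace-based matching, such as missing "the" at the end of a line.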

edited Mar 31 at 21:31 by G-Man

answered Mar 31 at 20:44 by BDR

I missed that. I had to take "the" as a string. Not a substring.

– JHA
Mar 31 at 20:53

@JHA: It also matters at the end of a line, e.g. the word "the" could appear at the end of a line as part of a file with line wrapping, but still be in the middle of a paragraph and thus still be a normal word in an English sentence. the( |$) might be closer to working, if that extended regex works. Anyway, IDK what you mean by "as a string" vs. substring. In both cases it's a substring of the whole line, and your test cases are insufficient to detect the cases where "the " fails. Kusalananda's answer is significantly better; I'd recommend accepting it.

– Peter Cordes
Apr 1 at 16:27
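A quick sketch of the the( |$) idea from this comment, assuming a sed that accepts -E for extended regular expressions (GNU and BSD sed both do). The sample inputs are made up.

```shell
# "the" followed by either a space or the end of the line; \1 puts back
# whatever the group matched (a space, or nothing at end of line).
end_of_line=$(printf 'over the\n' | sed -E 's/the( |$)/this\1/')
mid_line=$(printf 'the man\n' | sed -E 's/the( |$)/this\1/')

# Still fooled by words that merely end in "the".
still_wrong=$(printf 'bathe now\n' | sed -E 's/the( |$)/this\1/')

printf '%s\n' "$end_of_line"  # over this
printf '%s\n' "$mid_line"     # this man
printf '%s\n' "$still_wrong"  # bathis now
```

So the alternation handles the end-of-line case, but it is only "closer to working", as the comment says.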


It's a cheap and error-prone way of doing word matching.

Note that "the" with a space after it does not match the word "thereby", so requiring a space after "the" avoids matching that string at the start of longer words. However, it still matches "bathe" (if followed by a space), and it does not match "the" at the end of a line.

To match the word "the" properly (or any other word), you should not use spaces around the word, as that would prevent you from matching it at the start or end of lines, or when it is flanked by any other non-word character, such as punctuation or a tab.

Instead, use a zero-width word boundary pattern:

sed 's/\<the\>/this/'

The \< and \> match the boundaries before and after the word, i.e. the zero-width position between a word character and a non-word character. A word character is generally any character matching [[:alnum:]_] (or [A-Za-z0-9_] in the POSIX locale).

With GNU sed, you could also use \b in place of \< and \>:

sed 's/\bthe\b/this/'
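As a rough check of the word-boundary pattern, here is a sketch that assumes GNU sed (other implementations may not support \< and \>). The helper name replace_word and the sample lines are invented for illustration.

```shell
# replace_word is a hypothetical helper: it replaces the first whole
# word "the" on each input line, using GNU sed's \< and \> boundaries.
replace_word() {
    sed 's/\<the\>/this/'
}

result=$(printf '%s\n' \
    'thereby hangs a tale' \
    'bathe the cat' \
    'at the' \
    'the, then' | replace_word)

printf '%s\n' "$result"
# thereby hangs a tale   (no whole word "the", unchanged)
# bathe this cat         ("bathe" skipped, the real word replaced)
# at this                (match at end of line)
# this, then             (match before punctuation)
```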

edited Apr 1 at 15:49

answered Mar 31 at 20:53 by Kusalananda♦

sed works with regular expressions.
With sed 's/the /this /', the space after "the" is part of the pattern that has to match.

Using sed 's/the/this/' you replace all occurrences of "the" with "this" (given the g flag; without it, only the first occurrence on each line), no matter whether a space follows "the".

In the HackerRank exercise, the result is the same because replacing "the" with "this" is natural there: you replace just an article, which is normally followed by a space (grammar rules).

You can see the difference if you try, for example, to capitalize "the" in the phrase "the theater":

echo 'the theater' | sed 's/the /THE /g'

THE theater

# "theater" is left alone, since its "the" is not followed by a space

echo 'the theater' | sed 's/the/THE/g'

THE THEater

# both occurrences of "the" are capitalized

edited Mar 31 at 20:57 by JHA

answered Mar 31 at 20:54 by George Vasiliou

Thank you for the answer. Appreciated :)

– JHA
Mar 31 at 21:02

"you replace all occurrences" To be clear: Without the g after the replacement text, you replace only the first occurrence.

– Dubu
Apr 1 at 8:41
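To illustrate Dubu's point about the g flag, a short sketch with a made-up input line. The numeric flag in the last command is standard sed behaviour (replace only the Nth match) and is added here purely for contrast.

```shell
# Hypothetical input line with three occurrences of "the".
line='the theater on the hill'

first_only=$(printf '%s\n' "$line" | sed 's/the/THE/')    # no flag
every_match=$(printf '%s\n' "$line" | sed 's/the/THE/g')  # g: all matches
second_only=$(printf '%s\n' "$line" | sed 's/the/THE/2')  # 2: second match only

printf '%s\n' "$first_only"   # THE theater on the hill
printf '%s\n' "$every_match"  # THE THEater on THE hill
printf '%s\n' "$second_only"  # the THEater on the hill
```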
