Remove excessive whitespace from string [on hold]Remove useless whitespace from styled stringRemove last word...

Are there any modern advantages of a fire piston?

Why are the books in the Game of Thrones citadel library shelved spine inwards?

Can an insurance company drop you after receiving a bill and refusing to pay?

Why zero tolerance on nudity in space?

Is it a fallacy if someone claims they need an explanation for every word of your argument to the point where they don't understand common terms?

Caruana vs Carlsen game 10 (WCC) why not 18...Nxb6?

How can I install sudo without using su?

Does static make a difference for a const local variable?

Citing paywalled articles accessed via illegal web sharing

Early credit roll before the end of the film

Why do I have multiple (unassociated) temporal history tables?

Is there some relative to Dutch word "kijken" in German?

Lick explanation

Every character has a name

What is the most triangles you can make from a capital "H" and 3 straight lines?

Why did other German political parties disband so fast when Hitler was appointed chancellor?

Word or phrase for showing great skill at something without formal training in it

Why would the Pakistan airspace closure cancel flights not headed to Pakistan itself?

How would an AI self awareness kill switch work?

Program that converts a number to a letter of the alphabet

We are very unlucky in my court

Explain the objections to these measures against human trafficking

Injecting creativity into a cookbook

Strange blocking on readable secondary after reboot

Remove excessive whitespace from string [on hold]

Remove useless whitespace from styled stringRemove last word from stringEffective way to remove white spaces from stringRemove (not extract) timestamp from HTML stringRemove unwanted characters from a stringRemove duplicates from string without additional bufferRemove a query string parameter from a URLRemove duplications from a Java StringRemove duplicate chars from StringC++ program to remove duplicate characters from string

Is that standards-compliant?

#include <string>

#include <algorithm>

#include <iostream>

#include <cctype>



std::string remove_excessive_ws(std::string const &str)

{

    std::string result{ str };

    auto end{ std::remove_if(std::begin(result), std::end(result),

                             [](std::string::value_type const &ch) {

                                  return std::isspace(static_cast<unsigned>(ch)) &&

                                         (std::isspace(static_cast<unsigned>((&ch)[1])) || (&ch)[1] == '');

                             }

              )

    };

    result.erase(end, std::end(result));

    return result;

}



int main()

{

    char const *foo{ "Hello              World!       " };

    std::cout << '"' << remove_excessive_ws(foo) << ""n";

}

I'm unsure whether I should be accessing (&ch)[1]. I think it should be legal, but I am not sure. After all, std::remove_if() could copy the character and pass it to the function, so the pointer (&ch) + 1 might not be valid.

edited 2 days ago

Toby Speight

25.2k741116

asked Feb 26 at 7:39

Swordfish

1669

put on hold as off-topic by user673679, papagaga, t3chb0t, ferada, Zeta yesterday

This question appears to be off-topic. The users who voted to close gave this specific reason:

"Code not implemented or not working as intended: Code Review is a community where programmers peer-review your working code to address issues such as security, maintainability, performance, and scalability. We require that the code be working correctly, to the best of the author's knowledge, before proceeding with a review." – user673679, papagaga, t3chb0t, ferada

If this question can be reworded to fit the rules in the help center, please edit the question.

$begingroup$
It's not standard (but you should have asked this question on stack overflow instead, code review is for working code). std:string is not required to end with a null character and dereferencing the end iterator is undefined behavior; if your input ends with a space, it's precisely what you'll end up doing.
$endgroup$
– papagaga
Feb 26 at 8:25

$begingroup$
I don't think std::remove_if() is allowed to make a copy of each element - if it did, it couldn't (sensibly) work with containers of move-only values.
$endgroup$
– Toby Speight
2 days ago

$begingroup$
@TobySpeight Yay, that's reasonable! :)
$endgroup$
– Swordfish
yesterday

1

$begingroup$
Please clarify, does or doesn't the current code work as intended?
$endgroup$
– Mast
yesterday

add a comment |

Is that standards-compliant?

#include <string>

#include <algorithm>

#include <iostream>

#include <cctype>



std::string remove_excessive_ws(std::string const &str)

{

    std::string result{ str };

    auto end{ std::remove_if(std::begin(result), std::end(result),

                             [](std::string::value_type const &ch) {

                                  return std::isspace(static_cast<unsigned>(ch)) &&

                                         (std::isspace(static_cast<unsigned>((&ch)[1])) || (&ch)[1] == '');

                             }

              )

    };

    result.erase(end, std::end(result));

    return result;

}



int main()

{

    char const *foo{ "Hello              World!       " };

    std::cout << '"' << remove_excessive_ws(foo) << ""n";

}

edited 2 days ago

Toby Speight

25.2k741116

asked Feb 26 at 7:39

Swordfish

1669

put on hold as off-topic by user673679, papagaga, t3chb0t, ferada, Zeta yesterday

This question appears to be off-topic. The users who voted to close gave this specific reason:

"Code not implemented or not working as intended: Code Review is a community where programmers peer-review your working code to address issues such as security, maintainability, performance, and scalability. We require that the code be working correctly, to the best of the author's knowledge, before proceeding with a review." – user673679, papagaga, t3chb0t, ferada

If this question can be reworded to fit the rules in the help center, please edit the question.

$begingroup$
It's not standard (but you should have asked this question on stack overflow instead, code review is for working code). std:string is not required to end with a null character and dereferencing the end iterator is undefined behavior; if your input ends with a space, it's precisely what you'll end up doing.
$endgroup$
– papagaga
Feb 26 at 8:25

$begingroup$
I don't think std::remove_if() is allowed to make a copy of each element - if it did, it couldn't (sensibly) work with containers of move-only values.
$endgroup$
– Toby Speight
2 days ago

$begingroup$
@TobySpeight Yay, that's reasonable! :)
$endgroup$
– Swordfish
yesterday

1

$begingroup$
Please clarify, does or doesn't the current code work as intended?
$endgroup$
– Mast
yesterday

add a comment |

Is that standards-compliant?

#include <string>

#include <algorithm>

#include <iostream>

#include <cctype>



std::string remove_excessive_ws(std::string const &str)

{

    std::string result{ str };

    auto end{ std::remove_if(std::begin(result), std::end(result),

                             [](std::string::value_type const &ch) {

                                  return std::isspace(static_cast<unsigned>(ch)) &&

                                         (std::isspace(static_cast<unsigned>((&ch)[1])) || (&ch)[1] == '');

                             }

              )

    };

    result.erase(end, std::end(result));

    return result;

}



int main()

{

    char const *foo{ "Hello              World!       " };

    std::cout << '"' << remove_excessive_ws(foo) << ""n";

}

edited 2 days ago

Toby Speight

25.2k741116

asked Feb 26 at 7:39

Swordfish

1669

Is that standards-compliant?

#include <string>

#include <algorithm>

#include <iostream>

#include <cctype>



std::string remove_excessive_ws(std::string const &str)

{

    std::string result{ str };

    auto end{ std::remove_if(std::begin(result), std::end(result),

                             [](std::string::value_type const &ch) {

                                  return std::isspace(static_cast<unsigned>(ch)) &&

                                         (std::isspace(static_cast<unsigned>((&ch)[1])) || (&ch)[1] == '');

                             }

              )

    };

    result.erase(end, std::end(result));

    return result;

}



int main()

{

    char const *foo{ "Hello              World!       " };

    std::cout << '"' << remove_excessive_ws(foo) << ""n";

}

c++ strings

edited 2 days ago

Toby Speight

25.2k741116

asked Feb 26 at 7:39

Swordfish

1669

edited 2 days ago

Toby Speight

25.2k741116

asked Feb 26 at 7:39

Swordfish

1669

edited 2 days ago

Toby Speight

25.2k741116

edited 2 days ago

Toby Speight

25.2k741116

edited 2 days ago

Toby Speight

25.2k741116

asked Feb 26 at 7:39

Swordfish

1669

asked Feb 26 at 7:39

Swordfish

1669

asked Feb 26 at 7:39

Swordfish

1669

put on hold as off-topic by user673679, papagaga, t3chb0t, ferada, Zeta yesterday

This question appears to be off-topic. The users who voted to close gave this specific reason:

"Code not implemented or not working as intended: Code Review is a community where programmers peer-review your working code to address issues such as security, maintainability, performance, and scalability. We require that the code be working correctly, to the best of the author's knowledge, before proceeding with a review." – user673679, papagaga, t3chb0t, ferada

If this question can be reworded to fit the rules in the help center, please edit the question.

put on hold as off-topic by user673679, papagaga, t3chb0t, ferada, Zeta yesterday

This question appears to be off-topic. The users who voted to close gave this specific reason:

"Code not implemented or not working as intended: Code Review is a community where programmers peer-review your working code to address issues such as security, maintainability, performance, and scalability. We require that the code be working correctly, to the best of the author's knowledge, before proceeding with a review." – user673679, papagaga, t3chb0t, ferada

If this question can be reworded to fit the rules in the help center, please edit the question.

$begingroup$
It's not standard (but you should have asked this question on stack overflow instead, code review is for working code). std:string is not required to end with a null character and dereferencing the end iterator is undefined behavior; if your input ends with a space, it's precisely what you'll end up doing.
$endgroup$
– papagaga
Feb 26 at 8:25

$begingroup$
I don't think std::remove_if() is allowed to make a copy of each element - if it did, it couldn't (sensibly) work with containers of move-only values.
$endgroup$
– Toby Speight
2 days ago

$begingroup$
@TobySpeight Yay, that's reasonable! :)
$endgroup$
– Swordfish
yesterday

1

$begingroup$
Please clarify, does or doesn't the current code work as intended?
$endgroup$
– Mast
yesterday

add a comment |

$begingroup$
It's not standard (but you should have asked this question on stack overflow instead, code review is for working code). std:string is not required to end with a null character and dereferencing the end iterator is undefined behavior; if your input ends with a space, it's precisely what you'll end up doing.
$endgroup$
– papagaga
Feb 26 at 8:25

$begingroup$
I don't think std::remove_if() is allowed to make a copy of each element - if it did, it couldn't (sensibly) work with containers of move-only values.
$endgroup$
– Toby Speight
2 days ago

$begingroup$
@TobySpeight Yay, that's reasonable! :)
$endgroup$
– Swordfish
yesterday

1

$begingroup$
Please clarify, does or doesn't the current code work as intended?
$endgroup$
– Mast
yesterday

It's not standard (but you should have asked this question on stack overflow instead, code review is for working code). std:string is not required to end with a null character and dereferencing the end iterator is undefined behavior; if your input ends with a space, it's precisely what you'll end up doing.

– papagaga
Feb 26 at 8:25

I don't think std::remove_if() is allowed to make a copy of each element - if it did, it couldn't (sensibly) work with containers of move-only values.

– Toby Speight
2 days ago

@TobySpeight Yay, that's reasonable! :)

– Swordfish
yesterday

Please clarify, does or doesn't the current code work as intended?

– Mast
yesterday

add a comment |

3 Answers
3

active

oldest

votes

Define the problem

It's not clear from the description what's considered "excessive" whitespace. From experimenting, it seems that the idea is to collapse multiple whitespace to a single whitespace character, except at the end of the string, where whitespace is to be completely removed. (Whitespace at the beginning of string seems to treated the same as inner whitespace). It's also not clear which whitespace character should be kept - the example code keeps the last one, but is that a requirement, or just an implementation choice?

Missing include

std::begin() and std::end() are declared in <iterator>. However, there's no reason not to use the begin() and end() member functions of std::string here, as we're not operating on generic values.

When an argument is to be copied, pass by value

We don't need to copy str into result:

std::string remove_excessive_ws(std::string str)

Bug

Like all the <cctype> functions, std::isspace() requires that its argument be either EOF or representable as unsigned char. Converting a (possibly signed) char direct to unsigned int can sign-extend to an out-of-range value. We need to convert char to unsigned char before widening to unsigned int:

static_cast<unsigned char>(ch)

Bug

Prior to C++11, accessing the character after the end of the string is undefined behaviour (C++11 requires an extra null to follow the string data). Thankfully, it's easy to avoid this bug by simply remembering whether the last character seen was a space.

Here's a C++11 version:

#include <algorithm>

#include <cctype>

#include <string>

#include <utility>



std::string remove_excessive_ws(std::string str)

{

    bool seen_space = false;

    auto end{ std::remove_if(str.begin(), str.end(),

                             [&seen_space](unsigned char ch) {

                                 bool is_space = std::isspace(ch);

                                 std::swap(seen_space, is_space);

                                 return seen_space && is_space;

                             })};

    // adjust end to remove end whitespace

    if (end != str.begin() && std::isspace(static_cast<unsigned char>(end[-1]))) {

        --end;

    }

    str.erase(end, str.end());

    return str;

}

We might want to move seen_space into the lambda expression in later C++ versions that allow that.

This also is more readable, as we can perform the widening to unsigned int when calling the lambda, rather than having to write a cast.

Style-wise, I'd normally prefer to name the lambda, and keep the erase-remove call on a single line so that the idiom is obvious:

// Assuming C++17 now

std::string remove_excessive_ws(std::string s)

{

    auto is_doubled_space =

        [seen_space=false](unsigned char c) mutable {

            return std::exchange(seen_space, std::isspace(c))

                && seen_space;

        };

    s.erase(std::remove_if(s.begin(), s.end(), is_doubled_space), s.end());

    // remove trailing whitespace

    if (!s.empty() && std::isspace(static_cast<unsigned char>(s.back()))) {

        s.pop_back();

    }

    // convert all whitespace into ordinary space character

    std::replace_if(s.begin(), s.end(),

                    [](unsigned char c) { return std::isspace(c); }, ' ');

    return s;

}

edited 2 days ago

answered Feb 26 at 8:46

Toby Speight

25.2k741116

$begingroup$
Personally I try really hard to avoid mutating lambdas. Unfortunately for this problem this would preclude using std::remove_if. Instead, some kind of reduction or prefix scan would need to be used, and is probably not as efficient.
$endgroup$
– Konrad Rudolph
Feb 26 at 17:07

$begingroup$
Yes, that's a good point @Konrad - mutable lambdas are harder to reason about. It might be that a plain old for (auto it = str.begin(); it != str.end(); ++it) loop (followed by erase) may actually be the clearest code. But it's good to see a few options to choose from!
$endgroup$
– Toby Speight
2 days ago

add a comment |

std::string remove_excessive_ws(std::string const &str)

{

    std::string result{ str };

As Toby mentioned, if you plan on copying and mutating the copy locally, you should pass the parameter str by value. It should also be noted that result will have the same capacity as str and won't shrink to fit automatically (or take advantage of SSO if allocated).

std::isspace(static_cast<unsigned>(ch))

Be aware that std::isspace removes spaces (' '), whitespaces ('n', 'v', 'f', 'r'), and tabs ('t').

You should cast to unsigned char.

(std::isspace(static_cast<unsigned>((&ch)[1])) || (&ch)[1] == '')

Is that standards-compliant?

Using the built-in subscript operator on a pointer is standards compliant. From the C++17 standard (n4659), Postfix Expressions § 8.2.1 Subscripting:

A postfix expression followed by an expression in square brackets is a postfix expression. One of the expressions shall be a glvalue of type “array of T” or a prvalue of type “pointer to T” and the other shall be a prvalue of unscoped enumeration or integral type.

When accessing memory out of bounds via the built-in subscript operator, the behavior is undefined. A well-defined approach would be to track the next index and access the next element using std::string::operator[] (element at size() returns CharT{}). std::string is not a null-terminated sequence and considers the null character (CharT{}) to be a valid character within a sequence.

using namespace std::string_literals;

std::string str = "ab"s;

std::cout << str << 'n'; // prints "ab"

For a standard library solution on removing duplicates, I would simply pass the predicate to std::unique. No pointer arithmetic is necessary. Just pass it a binary predicate that checks if both characters are whitespaces:

#include <algorithm>

#include <cctype>

#include <string>



std::string remove_excessive_ws(std::string s)

{

    static auto const space_space =

        [](unsigned char a, unsigned char b) {

            return std::isspace(a) && std::isspace(b);

        };



    s.erase(std::unique(s.begin(), s.end(), space_space), s.end());



    // trim final space

    if (!s.empty() && std::isspace(static_cast<unsigned char>(s.back()))) {

        s.pop_back();

    }



    return s;

}

Your function leaves a leading whitespace if one or more exists. Is this intended? Should there be a common character (single space) to merge the different characters std::isspace catches? If the ultimate goal was to trim all outer whitespace and join non-whitespace tokens with single spaces, I would use abseil's absl::StrSplit() and absl::StrJoin(). The resulting string would either take advantage of SSO if small enough or use a more appropriate capacity.

// remove_excess_whitespace

//

// Trims leading and trailing space, whitespace, and tab characters

// such that the resulting string is single space separated.

std::string remove_excess_whitespace(absl::string_view sv) {

    return absl::StrJoin(absl::StrSplit(sv, ' ', absl::SkipWhitespace{}), " ");

}

edited 2 days ago

Toby Speight

25.2k741116

answered 2 days ago

Snowhawk

5,31211128

$begingroup$
Be aware that std::isspace removes spaces (' '), whitespaces ('n', 'v', 'f', 'r'), and tabs ('t') – that's intended.
$endgroup$
– Swordfish
2 days ago

$begingroup$
As for std::string, this behavior is well-defined – yes, but what about remove_if(), is it guaranteed that it won't copy?
$endgroup$
– Swordfish
2 days ago

$begingroup$
std::remove_if copies individual characters that do not satisfy the predicate. The well-defined behavior I was talking about is accessing str[str.size()]. So, instead of checking &ch[1] == '', keep an index to the next variable and use str[index] (or better, use std::unique and don't worry about subscript access for adjacent elements).
$endgroup$
– Snowhawk
2 days ago

$begingroup$
I was talking about remove_if() probably copying the element before passing it to the predicate.
$endgroup$
– Swordfish
2 days ago

$begingroup$
Compilers are pretty good at inlining arguments and lambda predicates. I would be concerned with unnecessary heap allocations.
$endgroup$
– Snowhawk
2 days ago

|
show 1 more comment

It's about accessing (&ch)[1]. I think it should be legal, but I am not sure.

Yes, it is legal as long as you have null terminated string, i.e. the size-th character of the string is the null character. The reason is that std::string::end() returns an iterator that is one short of the element that holds the terminating null character when the string object does have a terminating null character. Hence, (&ch)[1] will not access anything beyond the terminating null character. You can verify that by printing the value of &ch as a debugging guide.

Here's a updated version of your posted code that prints additional debugging info.

#include <string>

#include <algorithm>

#include <iostream>

#include <cctype>



std::string remove_excessive_ws(std::string const &str)

{

   std::string result{ str };



   std::for_each(std::begin(result), std::end(result),

                 [](std::string::value_type const &ch)

                 {

                    std::cout << "pointer value: " << reinterpret_cast<void const*>(&ch) << std::endl;

                 });



   std::cout << "==========================n";



   auto end{ std::remove_if(std::begin(result), std::end(result),

                            [](std::string::value_type const &ch)

                            {

                               std::cout << "pointer value: " << reinterpret_cast<void const*>(&ch) << std::endl;

                               return std::isspace(static_cast<unsigned>(ch)) &&

                                      (std::isspace(static_cast<unsigned>((&ch)[1])) || (&ch)[1] == '');

                            })

   };



   result.erase(end, std::end(result));

   return result;

}



int main()

{

   char const *foo{ "Hello   World!   "};

   std::string res = remove_excessive_ws(foo);

   std::cout << "n"" << res << ""n";

}

Here's its output.

pointer value: 0x600012be8

pointer value: 0x600012be9

pointer value: 0x600012bea

pointer value: 0x600012beb

pointer value: 0x600012bec

pointer value: 0x600012bed

pointer value: 0x600012bee

pointer value: 0x600012bef

pointer value: 0x600012bf0

pointer value: 0x600012bf1

pointer value: 0x600012bf2

pointer value: 0x600012bf3

pointer value: 0x600012bf4

pointer value: 0x600012bf5

pointer value: 0x600012bf6

pointer value: 0x600012bf7

pointer value: 0x600012bf8

==========================

pointer value: 0x600012be8

pointer value: 0x600012be9

pointer value: 0x600012bea

pointer value: 0x600012beb

pointer value: 0x600012bec

pointer value: 0x600012bed

pointer value: 0x600012bee

pointer value: 0x600012bef

pointer value: 0x600012bf0

pointer value: 0x600012bf1

pointer value: 0x600012bf2

pointer value: 0x600012bf3

pointer value: 0x600012bf4

pointer value: 0x600012bf5

pointer value: 0x600012bf6

pointer value: 0x600012bf7

pointer value: 0x600012bf8



"Hello World!"

Response to OP's comment

std::string::end() returns std::string::iterator(), which is a LegacyRandomAccessIterator. Please see the Member types of std::string.

LegacyRandomAccessIterator satisfies the requirements of LegacyBidirectionalIterator.

LegacyBidirectionalIterator satisfies the requirments of LegacyForwardIterator.

ForwardIterator satisfies the requirements of LegacyInputIterator.

When a LegacyInputIterator is dereferenced, it evaluates to a reference. Now, you have to go back to std::string to see what that means. As you can expect, it is a reference and not a copy.

You can also use a non-const reference to be doubly sure that you get a reference and not a copy.

auto end{ std::remove_if(std::begin(result), std::end(result),

                         [](std::string::value_type &ch)

                         {

                            std::cout << "pointer value: " << reinterpret_cast<void*>(&ch) << std::endl;

                            return std::isspace(static_cast<unsigned>(ch)) &&

                                   (std::isspace(static_cast<unsigned>((&ch)[1])) || (&ch)[1] == '');

                         })

edited yesterday

answered 2 days ago

R Sahu

3,274617

1

$begingroup$
You can verify that by [...] – Standard reference? Also, Where is written that find_if() won't hand me a copy of the char instead a reference to it?
$endgroup$
– Swordfish
2 days ago

1

$begingroup$
Whether it's legal to dereference str.end() depends on the standards version; only since C++11 has str.data() been required to have a null-terminator.
$endgroup$
– Toby Speight
2 days ago

$begingroup$
@TobySpeight, agree in principle. However, if the string object is a null terminated string, my answer is valid. I update the answer to make that clearer.
$endgroup$
– R Sahu
yesterday

add a comment |

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

Define the problem

Missing include

When an argument is to be copied, pass by value

We don't need to copy str into result:

std::string remove_excessive_ws(std::string str)

Bug

static_cast<unsigned char>(ch)

Bug

Here's a C++11 version:

#include <algorithm>

#include <cctype>

#include <string>

#include <utility>



std::string remove_excessive_ws(std::string str)

{

    bool seen_space = false;

    auto end{ std::remove_if(str.begin(), str.end(),

                             [&seen_space](unsigned char ch) {

                                 bool is_space = std::isspace(ch);

                                 std::swap(seen_space, is_space);

                                 return seen_space && is_space;

                             })};

    // adjust end to remove end whitespace

    if (end != str.begin() && std::isspace(static_cast<unsigned char>(end[-1]))) {

        --end;

    }

    str.erase(end, str.end());

    return str;

}

We might want to move seen_space into the lambda expression in later C++ versions that allow that.

This also is more readable, as we can perform the widening to unsigned int when calling the lambda, rather than having to write a cast.

Style-wise, I'd normally prefer to name the lambda, and keep the erase-remove call on a single line so that the idiom is obvious:

// Assuming C++17 now

std::string remove_excessive_ws(std::string s)

{

    auto is_doubled_space =

        [seen_space=false](unsigned char c) mutable {

            return std::exchange(seen_space, std::isspace(c))

                && seen_space;

        };

    s.erase(std::remove_if(s.begin(), s.end(), is_doubled_space), s.end());

    // remove trailing whitespace

    if (!s.empty() && std::isspace(static_cast<unsigned char>(s.back()))) {

        s.pop_back();

    }

    // convert all whitespace into ordinary space character

    std::replace_if(s.begin(), s.end(),

                    [](unsigned char c) { return std::isspace(c); }, ' ');

    return s;

}

edited 2 days ago

answered Feb 26 at 8:46

Toby Speight

25.2k741116

$begingroup$
Personally I try really hard to avoid mutating lambdas. Unfortunately for this problem this would preclude using std::remove_if. Instead, some kind of reduction or prefix scan would need to be used, and is probably not as efficient.
$endgroup$
– Konrad Rudolph
Feb 26 at 17:07

$begingroup$
Yes, that's a good point @Konrad - mutable lambdas are harder to reason about. It might be that a plain old for (auto it = str.begin(); it != str.end(); ++it) loop (followed by erase) may actually be the clearest code. But it's good to see a few options to choose from!
$endgroup$
– Toby Speight
2 days ago

add a comment |

Define the problem

Missing include

When an argument is to be copied, pass by value

We don't need to copy str into result:

std::string remove_excessive_ws(std::string str)

Bug

static_cast<unsigned char>(ch)

Bug

Here's a C++11 version:

#include <algorithm>

#include <cctype>

#include <string>

#include <utility>



std::string remove_excessive_ws(std::string str)

{

    bool seen_space = false;

    auto end{ std::remove_if(str.begin(), str.end(),

                             [&seen_space](unsigned char ch) {

                                 bool is_space = std::isspace(ch);

                                 std::swap(seen_space, is_space);

                                 return seen_space && is_space;

                             })};

    // adjust end to remove end whitespace

    if (end != str.begin() && std::isspace(static_cast<unsigned char>(end[-1]))) {

        --end;

    }

    str.erase(end, str.end());

    return str;

}

We might want to move seen_space into the lambda expression in later C++ versions that allow that.

This also is more readable, as we can perform the widening to unsigned int when calling the lambda, rather than having to write a cast.

Style-wise, I'd normally prefer to name the lambda, and keep the erase-remove call on a single line so that the idiom is obvious:

// Assuming C++17 now

std::string remove_excessive_ws(std::string s)

{

    auto is_doubled_space =

        [seen_space=false](unsigned char c) mutable {

            return std::exchange(seen_space, std::isspace(c))

                && seen_space;

        };

    s.erase(std::remove_if(s.begin(), s.end(), is_doubled_space), s.end());

    // remove trailing whitespace

    if (!s.empty() && std::isspace(static_cast<unsigned char>(s.back()))) {

        s.pop_back();

    }

    // convert all whitespace into ordinary space character

    std::replace_if(s.begin(), s.end(),

                    [](unsigned char c) { return std::isspace(c); }, ' ');

    return s;

}

edited 2 days ago

answered Feb 26 at 8:46

Toby Speight

25.2k741116

$begingroup$
Personally I try really hard to avoid mutating lambdas. Unfortunately for this problem this would preclude using std::remove_if. Instead, some kind of reduction or prefix scan would need to be used, and is probably not as efficient.
$endgroup$
– Konrad Rudolph
Feb 26 at 17:07

$begingroup$
Yes, that's a good point @Konrad - mutable lambdas are harder to reason about. It might be that a plain old for (auto it = str.begin(); it != str.end(); ++it) loop (followed by erase) may actually be the clearest code. But it's good to see a few options to choose from!
$endgroup$
– Toby Speight
2 days ago

add a comment |

Define the problem

Missing include

When an argument is to be copied, pass by value

We don't need to copy str into result:

std::string remove_excessive_ws(std::string str)

Bug

static_cast<unsigned char>(ch)

Bug

Here's a C++11 version:

#include <algorithm>

#include <cctype>

#include <string>

#include <utility>



std::string remove_excessive_ws(std::string str)

{

    bool seen_space = false;

    auto end{ std::remove_if(str.begin(), str.end(),

                             [&seen_space](unsigned char ch) {

                                 bool is_space = std::isspace(ch);

                                 std::swap(seen_space, is_space);

                                 return seen_space && is_space;

                             })};

    // adjust end to remove end whitespace

    if (end != str.begin() && std::isspace(static_cast<unsigned char>(end[-1]))) {

        --end;

    }

    str.erase(end, str.end());

    return str;

}

We might want to move seen_space into the lambda expression in later C++ versions that allow that.

This also is more readable, as we can perform the widening to unsigned int when calling the lambda, rather than having to write a cast.

Style-wise, I'd normally prefer to name the lambda, and keep the erase-remove call on a single line so that the idiom is obvious:

// Assuming C++17 now

std::string remove_excessive_ws(std::string s)

{

    auto is_doubled_space =

        [seen_space=false](unsigned char c) mutable {

            return std::exchange(seen_space, std::isspace(c))

                && seen_space;

        };

    s.erase(std::remove_if(s.begin(), s.end(), is_doubled_space), s.end());

    // remove trailing whitespace

    if (!s.empty() && std::isspace(static_cast<unsigned char>(s.back()))) {

        s.pop_back();

    }

    // convert all whitespace into ordinary space character

    std::replace_if(s.begin(), s.end(),

                    [](unsigned char c) { return std::isspace(c); }, ' ');

    return s;

}

edited 2 days ago

answered Feb 26 at 8:46

Toby Speight

25.2k741116

Define the problem

Missing include

When an argument is to be copied, pass by value

We don't need to copy str into result:

std::string remove_excessive_ws(std::string str)

Bug

static_cast<unsigned char>(ch)

Bug

Here's a C++11 version:

#include <algorithm>

#include <cctype>

#include <string>

#include <utility>



std::string remove_excessive_ws(std::string str)

{

    bool seen_space = false;

    auto end{ std::remove_if(str.begin(), str.end(),

                             [&seen_space](unsigned char ch) {

                                 bool is_space = std::isspace(ch);

                                 std::swap(seen_space, is_space);

                                 return seen_space && is_space;

                             })};

    // adjust end to remove end whitespace

    if (end != str.begin() && std::isspace(static_cast<unsigned char>(end[-1]))) {

        --end;

    }

    str.erase(end, str.end());

    return str;

}

We might want to move seen_space into the lambda expression in later C++ versions that allow that.

This also is more readable, as we can perform the widening to unsigned int when calling the lambda, rather than having to write a cast.

Style-wise, I'd normally prefer to name the lambda, and keep the erase-remove call on a single line so that the idiom is obvious:

// Assuming C++17 now

std::string remove_excessive_ws(std::string s)

{

    auto is_doubled_space =

        [seen_space=false](unsigned char c) mutable {

            return std::exchange(seen_space, std::isspace(c))

                && seen_space;

        };

    s.erase(std::remove_if(s.begin(), s.end(), is_doubled_space), s.end());

    // remove trailing whitespace

    if (!s.empty() && std::isspace(static_cast<unsigned char>(s.back()))) {

        s.pop_back();

    }

    // convert all whitespace into ordinary space character

    std::replace_if(s.begin(), s.end(),

                    [](unsigned char c) { return std::isspace(c); }, ' ');

    return s;

}

edited 2 days ago

answered Feb 26 at 8:46

Toby Speight

25.2k741116

edited 2 days ago

answered Feb 26 at 8:46

Toby Speight

25.2k741116

answered Feb 26 at 8:46

Toby Speight

25.2k741116

answered Feb 26 at 8:46

Toby Speight

25.2k741116

$begingroup$
Personally I try really hard to avoid mutating lambdas. Unfortunately for this problem this would preclude using std::remove_if. Instead, some kind of reduction or prefix scan would need to be used, and is probably not as efficient.
$endgroup$
– Konrad Rudolph
Feb 26 at 17:07

$begingroup$
Yes, that's a good point @Konrad - mutable lambdas are harder to reason about. It might be that a plain old for (auto it = str.begin(); it != str.end(); ++it) loop (followed by erase) may actually be the clearest code. But it's good to see a few options to choose from!
$endgroup$
– Toby Speight
2 days ago

add a comment |

$begingroup$
Personally I try really hard to avoid mutating lambdas. Unfortunately for this problem this would preclude using std::remove_if. Instead, some kind of reduction or prefix scan would need to be used, and is probably not as efficient.
$endgroup$
– Konrad Rudolph
Feb 26 at 17:07

$begingroup$
Yes, that's a good point @Konrad - mutable lambdas are harder to reason about. It might be that a plain old for (auto it = str.begin(); it != str.end(); ++it) loop (followed by erase) may actually be the clearest code. But it's good to see a few options to choose from!
$endgroup$
– Toby Speight
2 days ago

Personally I try really hard to avoid mutating lambdas. Unfortunately for this problem this would preclude using std::remove_if. Instead, some kind of reduction or prefix scan would need to be used, and is probably not as efficient.

– Konrad Rudolph
Feb 26 at 17:07

Yes, that's a good point @Konrad - mutable lambdas are harder to reason about. It might be that a plain old for (auto it = str.begin(); it != str.end(); ++it) loop (followed by erase) may actually be the clearest code. But it's good to see a few options to choose from!

– Toby Speight
2 days ago

add a comment |

std::string remove_excessive_ws(std::string const &str)

{

    std::string result{ str };

std::isspace(static_cast<unsigned>(ch))

Be aware that std::isspace removes spaces (' '), whitespaces ('n', 'v', 'f', 'r'), and tabs ('t').

You should cast to unsigned char.

(std::isspace(static_cast<unsigned>((&ch)[1])) || (&ch)[1] == '')

Is that standards-compliant?

Using the built-in subscript operator on a pointer is standards compliant. From the C++17 standard (n4659), Postfix Expressions § 8.2.1 Subscripting:

A postfix expression followed by an expression in square brackets is a postfix expression. One of the expressions shall be a glvalue of type “array of T” or a prvalue of type “pointer to T” and the other shall be a prvalue of unscoped enumeration or integral type.

using namespace std::string_literals;

std::string str = "ab"s;

std::cout << str << 'n'; // prints "ab"

#include <algorithm>

#include <cctype>

#include <string>



std::string remove_excessive_ws(std::string s)

{

    static auto const space_space =

        [](unsigned char a, unsigned char b) {

            return std::isspace(a) && std::isspace(b);

        };



    s.erase(std::unique(s.begin(), s.end(), space_space), s.end());



    // trim final space

    if (!s.empty() && std::isspace(static_cast<unsigned char>(s.back()))) {

        s.pop_back();

    }



    return s;

}

// remove_excess_whitespace

//

// Trims leading and trailing space, whitespace, and tab characters

// such that the resulting string is single space separated.

std::string remove_excess_whitespace(absl::string_view sv) {

    return absl::StrJoin(absl::StrSplit(sv, ' ', absl::SkipWhitespace{}), " ");

}

edited 2 days ago

Toby Speight

25.2k741116

answered 2 days ago

Snowhawk

5,31211128

$begingroup$
Be aware that std::isspace removes spaces (' '), whitespaces ('n', 'v', 'f', 'r'), and tabs ('t') – that's intended.
$endgroup$
– Swordfish
2 days ago

$begingroup$
As for std::string, this behavior is well-defined – yes, but what about remove_if(), is it guaranteed that it won't copy?
$endgroup$
– Swordfish
2 days ago

$begingroup$
std::remove_if copies individual characters that do not satisfy the predicate. The well-defined behavior I was talking about is accessing str[str.size()]. So, instead of checking &ch[1] == '', keep an index to the next variable and use str[index] (or better, use std::unique and don't worry about subscript access for adjacent elements).
$endgroup$
– Snowhawk
2 days ago

$begingroup$
I was talking about remove_if() probably copying the element before passing it to the predicate.
$endgroup$
– Swordfish
2 days ago

$begingroup$
Compilers are pretty good at inlining arguments and lambda predicates. I would be concerned with unnecessary heap allocations.
$endgroup$
– Snowhawk
2 days ago

|
show 1 more comment

std::string remove_excessive_ws(std::string const &str)

{

    std::string result{ str };

std::isspace(static_cast<unsigned>(ch))

Be aware that std::isspace removes spaces (' '), whitespaces ('n', 'v', 'f', 'r'), and tabs ('t').

You should cast to unsigned char.

(std::isspace(static_cast<unsigned>((&ch)[1])) || (&ch)[1] == '')

Is that standards-compliant?

Using the built-in subscript operator on a pointer is standards compliant. From the C++17 standard (n4659), Postfix Expressions § 8.2.1 Subscripting:

A postfix expression followed by an expression in square brackets is a postfix expression. One of the expressions shall be a glvalue of type “array of T” or a prvalue of type “pointer to T” and the other shall be a prvalue of unscoped enumeration or integral type.

using namespace std::string_literals;

std::string str = "ab"s;

std::cout << str << 'n'; // prints "ab"

#include <algorithm>

#include <cctype>

#include <string>



std::string remove_excessive_ws(std::string s)

{

    static auto const space_space =

        [](unsigned char a, unsigned char b) {

            return std::isspace(a) && std::isspace(b);

        };



    s.erase(std::unique(s.begin(), s.end(), space_space), s.end());



    // trim final space

    if (!s.empty() && std::isspace(static_cast<unsigned char>(s.back()))) {

        s.pop_back();

    }



    return s;

}

// remove_excess_whitespace

//

// Trims leading and trailing space, whitespace, and tab characters

// such that the resulting string is single space separated.

std::string remove_excess_whitespace(absl::string_view sv) {

    return absl::StrJoin(absl::StrSplit(sv, ' ', absl::SkipWhitespace{}), " ");

}

edited 2 days ago

Toby Speight

25.2k741116

answered 2 days ago

Snowhawk

5,31211128

$begingroup$
Be aware that std::isspace removes spaces (' '), whitespaces ('n', 'v', 'f', 'r'), and tabs ('t') – that's intended.
$endgroup$
– Swordfish
2 days ago

$begingroup$
As for std::string, this behavior is well-defined – yes, but what about remove_if(), is it guaranteed that it won't copy?
$endgroup$
– Swordfish
2 days ago

$begingroup$
std::remove_if copies individual characters that do not satisfy the predicate. The well-defined behavior I was talking about is accessing str[str.size()]. So, instead of checking &ch[1] == '', keep an index to the next variable and use str[index] (or better, use std::unique and don't worry about subscript access for adjacent elements).
$endgroup$
– Snowhawk
2 days ago

$begingroup$
I was talking about remove_if() probably copying the element before passing it to the predicate.
$endgroup$
– Swordfish
2 days ago

$begingroup$
Compilers are pretty good at inlining arguments and lambda predicates. I would be concerned with unnecessary heap allocations.
$endgroup$
– Snowhawk
2 days ago

|
show 1 more comment

std::string remove_excessive_ws(std::string const &str)

{

    std::string result{ str };

std::isspace(static_cast<unsigned>(ch))

Be aware that std::isspace removes spaces (' '), whitespaces ('n', 'v', 'f', 'r'), and tabs ('t').

You should cast to unsigned char.

(std::isspace(static_cast<unsigned>((&ch)[1])) || (&ch)[1] == '')

Is that standards-compliant?

Using the built-in subscript operator on a pointer is standards compliant. From the C++17 standard (n4659), Postfix Expressions § 8.2.1 Subscripting:

A postfix expression followed by an expression in square brackets is a postfix expression. One of the expressions shall be a glvalue of type “array of T” or a prvalue of type “pointer to T” and the other shall be a prvalue of unscoped enumeration or integral type.

using namespace std::string_literals;

std::string str = "ab"s;

std::cout << str << 'n'; // prints "ab"

#include <algorithm>

#include <cctype>

#include <string>



std::string remove_excessive_ws(std::string s)

{

    static auto const space_space =

        [](unsigned char a, unsigned char b) {

            return std::isspace(a) && std::isspace(b);

        };



    s.erase(std::unique(s.begin(), s.end(), space_space), s.end());



    // trim final space

    if (!s.empty() && std::isspace(static_cast<unsigned char>(s.back()))) {

        s.pop_back();

    }



    return s;

}

// remove_excess_whitespace

//

// Trims leading and trailing space, whitespace, and tab characters

// such that the resulting string is single space separated.

std::string remove_excess_whitespace(absl::string_view sv) {

    return absl::StrJoin(absl::StrSplit(sv, ' ', absl::SkipWhitespace{}), " ");

}

edited 2 days ago

Toby Speight

25.2k741116

answered 2 days ago

Snowhawk

5,31211128

std::string remove_excessive_ws(std::string const &str)

{

    std::string result{ str };

std::isspace(static_cast<unsigned>(ch))

Be aware that std::isspace removes spaces (' '), whitespaces ('n', 'v', 'f', 'r'), and tabs ('t').

You should cast to unsigned char.

(std::isspace(static_cast<unsigned>((&ch)[1])) || (&ch)[1] == '')

Is that standards-compliant?

Using the built-in subscript operator on a pointer is standards compliant. From the C++17 standard (n4659), Postfix Expressions § 8.2.1 Subscripting:

A postfix expression followed by an expression in square brackets is a postfix expression. One of the expressions shall be a glvalue of type “array of T” or a prvalue of type “pointer to T” and the other shall be a prvalue of unscoped enumeration or integral type.

using namespace std::string_literals;

std::string str = "ab"s;

std::cout << str << 'n'; // prints "ab"

#include <algorithm>

#include <cctype>

#include <string>



std::string remove_excessive_ws(std::string s)

{

    static auto const space_space =

        [](unsigned char a, unsigned char b) {

            return std::isspace(a) && std::isspace(b);

        };



    s.erase(std::unique(s.begin(), s.end(), space_space), s.end());



    // trim final space

    if (!s.empty() && std::isspace(static_cast<unsigned char>(s.back()))) {

        s.pop_back();

    }



    return s;

}

// remove_excess_whitespace

//

// Trims leading and trailing space, whitespace, and tab characters

// such that the resulting string is single space separated.

std::string remove_excess_whitespace(absl::string_view sv) {

    return absl::StrJoin(absl::StrSplit(sv, ' ', absl::SkipWhitespace{}), " ");

}

edited 2 days ago

Toby Speight

25.2k741116

answered 2 days ago

Snowhawk

5,31211128

edited 2 days ago

Toby Speight

25.2k741116

edited 2 days ago

Toby Speight

25.2k741116

edited 2 days ago

Toby Speight

25.2k741116

answered 2 days ago

Snowhawk

5,31211128

answered 2 days ago

Snowhawk

5,31211128

answered 2 days ago

Snowhawk

5,31211128

$begingroup$
Be aware that std::isspace removes spaces (' '), whitespaces ('n', 'v', 'f', 'r'), and tabs ('t') – that's intended.
$endgroup$
– Swordfish
2 days ago

$begingroup$
As for std::string, this behavior is well-defined – yes, but what about remove_if(), is it guaranteed that it won't copy?
$endgroup$
– Swordfish
2 days ago

$begingroup$
std::remove_if copies individual characters that do not satisfy the predicate. The well-defined behavior I was talking about is accessing str[str.size()]. So, instead of checking &ch[1] == '', keep an index to the next variable and use str[index] (or better, use std::unique and don't worry about subscript access for adjacent elements).
$endgroup$
– Snowhawk
2 days ago

$begingroup$
I was talking about remove_if() probably copying the element before passing it to the predicate.
$endgroup$
– Swordfish
2 days ago

$begingroup$
Compilers are pretty good at inlining arguments and lambda predicates. I would be concerned with unnecessary heap allocations.
$endgroup$
– Snowhawk
2 days ago

|
show 1 more comment

$begingroup$
Be aware that std::isspace removes spaces (' '), whitespaces ('n', 'v', 'f', 'r'), and tabs ('t') – that's intended.
$endgroup$
– Swordfish
2 days ago

$begingroup$
As for std::string, this behavior is well-defined – yes, but what about remove_if(), is it guaranteed that it won't copy?
$endgroup$
– Swordfish
2 days ago

$begingroup$
std::remove_if copies individual characters that do not satisfy the predicate. The well-defined behavior I was talking about is accessing str[str.size()]. So, instead of checking &ch[1] == '', keep an index to the next variable and use str[index] (or better, use std::unique and don't worry about subscript access for adjacent elements).
$endgroup$
– Snowhawk
2 days ago

$begingroup$
I was talking about remove_if() probably copying the element before passing it to the predicate.
$endgroup$
– Swordfish
2 days ago

$begingroup$
Compilers are pretty good at inlining arguments and lambda predicates. I would be concerned with unnecessary heap allocations.
$endgroup$
– Snowhawk
2 days ago

Be aware that std::isspace removes spaces (' '), whitespaces ('n', 'v', 'f', 'r'), and tabs ('t') – that's intended.

– Swordfish
2 days ago

As for std::string, this behavior is well-defined – yes, but what about remove_if(), is it guaranteed that it won't copy?

– Swordfish
2 days ago

std::remove_if copies individual characters that do not satisfy the predicate. The well-defined behavior I was talking about is accessing str[str.size()]. So, instead of checking &ch[1] == '', keep an index to the next variable and use str[index] (or better, use std::unique and don't worry about subscript access for adjacent elements).

– Snowhawk
2 days ago

I was talking about remove_if() probably copying the element before passing it to the predicate.

– Swordfish
2 days ago

Compilers are pretty good at inlining arguments and lambda predicates. I would be concerned with unnecessary heap allocations.

– Snowhawk
2 days ago

|
show 1 more comment

It's about accessing (&ch)[1]. I think it should be legal, but I am not sure.

Here's a updated version of your posted code that prints additional debugging info.

#include <string>

#include <algorithm>

#include <iostream>

#include <cctype>



std::string remove_excessive_ws(std::string const &str)

{

   std::string result{ str };



   std::for_each(std::begin(result), std::end(result),

                 [](std::string::value_type const &ch)

                 {

                    std::cout << "pointer value: " << reinterpret_cast<void const*>(&ch) << std::endl;

                 });



   std::cout << "==========================n";



   auto end{ std::remove_if(std::begin(result), std::end(result),

                            [](std::string::value_type const &ch)

                            {

                               std::cout << "pointer value: " << reinterpret_cast<void const*>(&ch) << std::endl;

                               return std::isspace(static_cast<unsigned>(ch)) &&

                                      (std::isspace(static_cast<unsigned>((&ch)[1])) || (&ch)[1] == '');

                            })

   };



   result.erase(end, std::end(result));

   return result;

}



int main()

{

   char const *foo{ "Hello   World!   "};

   std::string res = remove_excessive_ws(foo);

   std::cout << "n"" << res << ""n";

}

Here's its output.

pointer value: 0x600012be8

pointer value: 0x600012be9

pointer value: 0x600012bea

pointer value: 0x600012beb

pointer value: 0x600012bec

pointer value: 0x600012bed

pointer value: 0x600012bee

pointer value: 0x600012bef

pointer value: 0x600012bf0

pointer value: 0x600012bf1

pointer value: 0x600012bf2

pointer value: 0x600012bf3

pointer value: 0x600012bf4

pointer value: 0x600012bf5

pointer value: 0x600012bf6

pointer value: 0x600012bf7

pointer value: 0x600012bf8

==========================

pointer value: 0x600012be8

pointer value: 0x600012be9

pointer value: 0x600012bea

pointer value: 0x600012beb

pointer value: 0x600012bec

pointer value: 0x600012bed

pointer value: 0x600012bee

pointer value: 0x600012bef

pointer value: 0x600012bf0

pointer value: 0x600012bf1

pointer value: 0x600012bf2

pointer value: 0x600012bf3

pointer value: 0x600012bf4

pointer value: 0x600012bf5

pointer value: 0x600012bf6

pointer value: 0x600012bf7

pointer value: 0x600012bf8



"Hello World!"

Response to OP's comment

std::string::end() returns std::string::iterator(), which is a LegacyRandomAccessIterator. Please see the Member types of std::string.

LegacyRandomAccessIterator satisfies the requirements of LegacyBidirectionalIterator.

LegacyBidirectionalIterator satisfies the requirments of LegacyForwardIterator.

ForwardIterator satisfies the requirements of LegacyInputIterator.

When a LegacyInputIterator is dereferenced, it evaluates to a reference. Now, you have to go back to std::string to see what that means. As you can expect, it is a reference and not a copy.

You can also use a non-const reference to be doubly sure that you get a reference and not a copy.

auto end{ std::remove_if(std::begin(result), std::end(result),

                         [](std::string::value_type &ch)

                         {

                            std::cout << "pointer value: " << reinterpret_cast<void*>(&ch) << std::endl;

                            return std::isspace(static_cast<unsigned>(ch)) &&

                                   (std::isspace(static_cast<unsigned>((&ch)[1])) || (&ch)[1] == '');

                         })

edited yesterday

answered 2 days ago

R Sahu

3,274617

1

$begingroup$
You can verify that by [...] – Standard reference? Also, Where is written that find_if() won't hand me a copy of the char instead a reference to it?
$endgroup$
– Swordfish
2 days ago

1

$begingroup$
Whether it's legal to dereference str.end() depends on the standards version; only since C++11 has str.data() been required to have a null-terminator.
$endgroup$
– Toby Speight
2 days ago

$begingroup$
@TobySpeight, agree in principle. However, if the string object is a null terminated string, my answer is valid. I update the answer to make that clearer.
$endgroup$
– R Sahu
yesterday

add a comment |

It's about accessing (&ch)[1]. I think it should be legal, but I am not sure.

Here's a updated version of your posted code that prints additional debugging info.

#include <string>

#include <algorithm>

#include <iostream>

#include <cctype>



std::string remove_excessive_ws(std::string const &str)

{

   std::string result{ str };



   std::for_each(std::begin(result), std::end(result),

                 [](std::string::value_type const &ch)

                 {

                    std::cout << "pointer value: " << reinterpret_cast<void const*>(&ch) << std::endl;

                 });



   std::cout << "==========================n";



   auto end{ std::remove_if(std::begin(result), std::end(result),

                            [](std::string::value_type const &ch)

                            {

                               std::cout << "pointer value: " << reinterpret_cast<void const*>(&ch) << std::endl;

                               return std::isspace(static_cast<unsigned>(ch)) &&

                                      (std::isspace(static_cast<unsigned>((&ch)[1])) || (&ch)[1] == '');

                            })

   };



   result.erase(end, std::end(result));

   return result;

}



int main()

{

   char const *foo{ "Hello   World!   "};

   std::string res = remove_excessive_ws(foo);

   std::cout << "n"" << res << ""n";

}

Here's its output.

pointer value: 0x600012be8

pointer value: 0x600012be9

pointer value: 0x600012bea

pointer value: 0x600012beb

pointer value: 0x600012bec

pointer value: 0x600012bed

pointer value: 0x600012bee

pointer value: 0x600012bef

pointer value: 0x600012bf0

pointer value: 0x600012bf1

pointer value: 0x600012bf2

pointer value: 0x600012bf3

pointer value: 0x600012bf4

pointer value: 0x600012bf5

pointer value: 0x600012bf6

pointer value: 0x600012bf7

pointer value: 0x600012bf8

==========================

pointer value: 0x600012be8

pointer value: 0x600012be9

pointer value: 0x600012bea

pointer value: 0x600012beb

pointer value: 0x600012bec

pointer value: 0x600012bed

pointer value: 0x600012bee

pointer value: 0x600012bef

pointer value: 0x600012bf0

pointer value: 0x600012bf1

pointer value: 0x600012bf2

pointer value: 0x600012bf3

pointer value: 0x600012bf4

pointer value: 0x600012bf5

pointer value: 0x600012bf6

pointer value: 0x600012bf7

pointer value: 0x600012bf8



"Hello World!"

Response to OP's comment

std::string::end() returns std::string::iterator(), which is a LegacyRandomAccessIterator. Please see the Member types of std::string.

LegacyRandomAccessIterator satisfies the requirements of LegacyBidirectionalIterator.

LegacyBidirectionalIterator satisfies the requirments of LegacyForwardIterator.

ForwardIterator satisfies the requirements of LegacyInputIterator.

When a LegacyInputIterator is dereferenced, it evaluates to a reference. Now, you have to go back to std::string to see what that means. As you can expect, it is a reference and not a copy.

You can also use a non-const reference to be doubly sure that you get a reference and not a copy.

auto end{ std::remove_if(std::begin(result), std::end(result),

                         [](std::string::value_type &ch)

                         {

                            std::cout << "pointer value: " << reinterpret_cast<void*>(&ch) << std::endl;

                            return std::isspace(static_cast<unsigned>(ch)) &&

                                   (std::isspace(static_cast<unsigned>((&ch)[1])) || (&ch)[1] == '');

                         })

edited yesterday

answered 2 days ago

R Sahu

3,274617

1

$begingroup$
You can verify that by [...] – Standard reference? Also, Where is written that find_if() won't hand me a copy of the char instead a reference to it?
$endgroup$
– Swordfish
2 days ago

1

$begingroup$
Whether it's legal to dereference str.end() depends on the standards version; only since C++11 has str.data() been required to have a null-terminator.
$endgroup$
– Toby Speight
2 days ago

$begingroup$
@TobySpeight, agree in principle. However, if the string object is a null terminated string, my answer is valid. I update the answer to make that clearer.
$endgroup$
– R Sahu
yesterday

add a comment |

It's about accessing (&ch)[1]. I think it should be legal, but I am not sure.

Here's a updated version of your posted code that prints additional debugging info.

#include <string>

#include <algorithm>

#include <iostream>

#include <cctype>



std::string remove_excessive_ws(std::string const &str)

{

   std::string result{ str };



   std::for_each(std::begin(result), std::end(result),

                 [](std::string::value_type const &ch)

                 {

                    std::cout << "pointer value: " << reinterpret_cast<void const*>(&ch) << std::endl;

                 });



   std::cout << "==========================n";



   auto end{ std::remove_if(std::begin(result), std::end(result),

                            [](std::string::value_type const &ch)

                            {

                               std::cout << "pointer value: " << reinterpret_cast<void const*>(&ch) << std::endl;

                               return std::isspace(static_cast<unsigned>(ch)) &&

                                      (std::isspace(static_cast<unsigned>((&ch)[1])) || (&ch)[1] == '');

                            })

   };



   result.erase(end, std::end(result));

   return result;

}



int main()

{

   char const *foo{ "Hello   World!   "};

   std::string res = remove_excessive_ws(foo);

   std::cout << "n"" << res << ""n";

}

Here's its output.

pointer value: 0x600012be8

pointer value: 0x600012be9

pointer value: 0x600012bea

pointer value: 0x600012beb

pointer value: 0x600012bec

pointer value: 0x600012bed

pointer value: 0x600012bee

pointer value: 0x600012bef

pointer value: 0x600012bf0

pointer value: 0x600012bf1

pointer value: 0x600012bf2

pointer value: 0x600012bf3

pointer value: 0x600012bf4

pointer value: 0x600012bf5

pointer value: 0x600012bf6

pointer value: 0x600012bf7

pointer value: 0x600012bf8

==========================

pointer value: 0x600012be8

pointer value: 0x600012be9

pointer value: 0x600012bea

pointer value: 0x600012beb

pointer value: 0x600012bec

pointer value: 0x600012bed

pointer value: 0x600012bee

pointer value: 0x600012bef

pointer value: 0x600012bf0

pointer value: 0x600012bf1

pointer value: 0x600012bf2

pointer value: 0x600012bf3

pointer value: 0x600012bf4

pointer value: 0x600012bf5

pointer value: 0x600012bf6

pointer value: 0x600012bf7

pointer value: 0x600012bf8



"Hello World!"

Response to OP's comment

std::string::end() returns std::string::iterator(), which is a LegacyRandomAccessIterator. Please see the Member types of std::string.

LegacyRandomAccessIterator satisfies the requirements of LegacyBidirectionalIterator.

LegacyBidirectionalIterator satisfies the requirments of LegacyForwardIterator.

ForwardIterator satisfies the requirements of LegacyInputIterator.

When a LegacyInputIterator is dereferenced, it evaluates to a reference. Now, you have to go back to std::string to see what that means. As you can expect, it is a reference and not a copy.

You can also use a non-const reference to be doubly sure that you get a reference and not a copy.

auto end{ std::remove_if(std::begin(result), std::end(result),

                         [](std::string::value_type &ch)

                         {

                            std::cout << "pointer value: " << reinterpret_cast<void*>(&ch) << std::endl;

                            return std::isspace(static_cast<unsigned>(ch)) &&

                                   (std::isspace(static_cast<unsigned>((&ch)[1])) || (&ch)[1] == '');

                         })

edited yesterday

answered 2 days ago

R Sahu

3,274617

It's about accessing (&ch)[1]. I think it should be legal, but I am not sure.

Here's a updated version of your posted code that prints additional debugging info.

#include <string>

#include <algorithm>

#include <iostream>

#include <cctype>



std::string remove_excessive_ws(std::string const &str)

{

   std::string result{ str };



   std::for_each(std::begin(result), std::end(result),

                 [](std::string::value_type const &ch)

                 {

                    std::cout << "pointer value: " << reinterpret_cast<void const*>(&ch) << std::endl;

                 });



   std::cout << "==========================n";



   auto end{ std::remove_if(std::begin(result), std::end(result),

                            [](std::string::value_type const &ch)

                            {

                               std::cout << "pointer value: " << reinterpret_cast<void const*>(&ch) << std::endl;

                               return std::isspace(static_cast<unsigned>(ch)) &&

                                      (std::isspace(static_cast<unsigned>((&ch)[1])) || (&ch)[1] == '');

                            })

   };



   result.erase(end, std::end(result));

   return result;

}



int main()

{

   char const *foo{ "Hello   World!   "};

   std::string res = remove_excessive_ws(foo);

   std::cout << "n"" << res << ""n";

}

Here's its output.

pointer value: 0x600012be8

pointer value: 0x600012be9

pointer value: 0x600012bea

pointer value: 0x600012beb

pointer value: 0x600012bec

pointer value: 0x600012bed

pointer value: 0x600012bee

pointer value: 0x600012bef

pointer value: 0x600012bf0

pointer value: 0x600012bf1

pointer value: 0x600012bf2

pointer value: 0x600012bf3

pointer value: 0x600012bf4

pointer value: 0x600012bf5

pointer value: 0x600012bf6

pointer value: 0x600012bf7

pointer value: 0x600012bf8

==========================

pointer value: 0x600012be8

pointer value: 0x600012be9

pointer value: 0x600012bea

pointer value: 0x600012beb

pointer value: 0x600012bec

pointer value: 0x600012bed

pointer value: 0x600012bee

pointer value: 0x600012bef

pointer value: 0x600012bf0

pointer value: 0x600012bf1

pointer value: 0x600012bf2

pointer value: 0x600012bf3

pointer value: 0x600012bf4

pointer value: 0x600012bf5

pointer value: 0x600012bf6

pointer value: 0x600012bf7

pointer value: 0x600012bf8



"Hello World!"

Response to OP's comment

std::string::end() returns std::string::iterator(), which is a LegacyRandomAccessIterator. Please see the Member types of std::string.

LegacyRandomAccessIterator satisfies the requirements of LegacyBidirectionalIterator.

LegacyBidirectionalIterator satisfies the requirments of LegacyForwardIterator.

ForwardIterator satisfies the requirements of LegacyInputIterator.

When a LegacyInputIterator is dereferenced, it evaluates to a reference. Now, you have to go back to std::string to see what that means. As you can expect, it is a reference and not a copy.

You can also use a non-const reference to be doubly sure that you get a reference and not a copy.

auto end{ std::remove_if(std::begin(result), std::end(result),

                         [](std::string::value_type &ch)

                         {

                            std::cout << "pointer value: " << reinterpret_cast<void*>(&ch) << std::endl;

                            return std::isspace(static_cast<unsigned>(ch)) &&

                                   (std::isspace(static_cast<unsigned>((&ch)[1])) || (&ch)[1] == '');

                         })

edited yesterday

answered 2 days ago

R Sahu

3,274617

edited yesterday

answered 2 days ago

R Sahu

3,274617

answered 2 days ago

R Sahu

3,274617

answered 2 days ago

R Sahu

3,274617

1

$begingroup$
You can verify that by [...] – Standard reference? Also, Where is written that find_if() won't hand me a copy of the char instead a reference to it?
$endgroup$
– Swordfish
2 days ago

1

$begingroup$
Whether it's legal to dereference str.end() depends on the standards version; only since C++11 has str.data() been required to have a null-terminator.
$endgroup$
– Toby Speight
2 days ago

$begingroup$
@TobySpeight, agree in principle. However, if the string object is a null terminated string, my answer is valid. I update the answer to make that clearer.
$endgroup$
– R Sahu
yesterday

add a comment |

1

$begingroup$
You can verify that by [...] – Standard reference? Also, Where is written that find_if() won't hand me a copy of the char instead a reference to it?
$endgroup$
– Swordfish
2 days ago

1

$begingroup$
Whether it's legal to dereference str.end() depends on the standards version; only since C++11 has str.data() been required to have a null-terminator.
$endgroup$
– Toby Speight
2 days ago

$begingroup$
@TobySpeight, agree in principle. However, if the string object is a null terminated string, my answer is valid. I update the answer to make that clearer.
$endgroup$
– R Sahu
yesterday

You can verify that by [...] – Standard reference? Also, Where is written that find_if() won't hand me a copy of the char instead a reference to it?

– Swordfish
2 days ago

Whether it's legal to dereference str.end() depends on the standards version; only since C++11 has str.data() been required to have a null-terminator.

– Toby Speight
2 days ago

@TobySpeight, agree in principle. However, if the string object is a null terminated string, my answer is valid. I update the answer to make that clearer.

– R Sahu
yesterday

add a comment |

This page is only for reference, If you need detailed information, please check here

Remove excessive whitespace from string [on hold]Remove useless whitespace from styled stringRemove last word...

put on hold as off-topic by user673679, papagaga, t3chb0t, ferada, Zeta yesterday

put on hold as off-topic by user673679, papagaga, t3chb0t, ferada, Zeta yesterday

put on hold as off-topic by user673679, papagaga, t3chb0t, ferada, Zeta yesterday

put on hold as off-topic by user673679, papagaga, t3chb0t, ferada, Zeta yesterday

3 Answers 3

Define the problem

Missing include

When an argument is to be copied, pass by value

Bug

Bug

Response to OP's comment

3 Answers 3

3 Answers 3

Define the problem

Missing include

When an argument is to be copied, pass by value

Bug

Bug

Define the problem

Missing include

When an argument is to be copied, pass by value

Bug

Bug

Define the problem

Missing include

When an argument is to be copied, pass by value

Bug

Bug

Define the problem

Missing include

When an argument is to be copied, pass by value

Bug

Bug

Response to OP's comment

Response to OP's comment

Response to OP's comment

Response to OP's comment

Popular posts from this blog

How do i solve the “ No module named 'mlxtend' ” issue on Jupyter?

St. Wolfgang (Mickhausen) Inhaltsverzeichnis Geschichte | Beschreibung | Ausstattung | Literatur |...

PTIJ: Mordechai mourningParashat PekudeiPurim and Shushan PurimIs wearing masks on Purim a Biblical...

3 Answers
3

3 Answers
3

3 Answers
3