Rule of five and move semantics for a string classCanonical Implementation of Move SemanticsYet another 'any'...

Is it true that real estate prices mainly go up?

How did Alan Turing break the enigma code using the hint given by the lady in the bar?

How can The Temple of Elementary Evil reliably protect itself against kinetic bombardment?

Are babies of evil humanoid species inherently evil?

Why would one plane in this picture not have gear down yet?

Good for you! in Russian

Is "history" a male-biased word ("his+story")?

Built-In Shelves/Bookcases - IKEA vs Built

Are there historical instances of the capital of a colonising country being temporarily or permanently shifted to one of its colonies?

Is Gradient Descent central to every optimizer?

Reverse string, can I make it faster?

Force user to remove USB token

Finding algorithms of QGIS commands?

Why was Goose renamed from Chewie for the Captain Marvel film?

What Happens when Passenger Refuses to Fly Boeing 737 Max?

infinitive telling the purpose

Can one live in the U.S. and not use a credit card?

They call me Inspector Morse

Signed and unsigned numbers

Should I take out a loan for a friend to invest on my behalf?

How much stiffer are 23c tires over 28c?

How do you like my writing?

Can someone explain what is being said here in color publishing in the American Mathematical Monthly?

PTIJ: where are Tzafra and Urta located?



Rule of five and move semantics for a string class


Canonical Implementation of Move SemanticsYet another 'any' class implementation, named 'some'Rule Of Three for a Coordinate classStack implementation with move semanticsBenchmarking copy semantics vs. move semanticsTemplated double ended queue (deque) move semantics edge cases C++C++ String classWriting order and move assignment when following the Rule of FiveString class in C++C++ Updated Stack Code













3












$begingroup$


I have been learning about the rule of five and I tried to implement an example myself. It all works exactly as I'd expect, but I was wondering if I've missed any obvious issues or easier/faster ways of implementing the constructors and operators.



class string {
public:
string() : data(nullptr) {
std::cout << "Default ctor" << std::endl;
}
string(const char * p, size_t size) : m_size(size) { //default constructor
std::cout << "Special ctor" << std::endl;
data = new char[m_size];
memcpy(data, p, m_size);
}
~string() noexcept { //default destructor
std::cout << "Default dtorrn";
delete[] data;
}
string(const string & str) { //copy constructor
std::cout << "Copy constructor" << std::endl;
m_size = str.m_size;
data = new char[m_size];
memcpy(data,str.data,m_size);
}
string(string && str) { //move constructor
std::cout << "Move constructor" << std::endl;
m_size = str.m_size;
data = str.data; //swap the values
str.data = nullptr;
}
string& operator=(const string & str) { //copy operator
std::cout << "Copy assignment" << std::endl;
if (this == &str) return * this;
delete[] data; //can safely delete a nullptr
m_size = str.m_size;
char * new_data = new char[m_size]; //we have to make a copy as this is not the move operator
memcpy(new_data,str.data,m_size);
data = new_data;
return * this;
}
string& operator=(string && str) { //move assignment operator
std::cout << "Move assignment" << std::endl;
if (this == &str) return * this;
delete[] data; //delete the old data, will be ok because data is always initialised to nullptr
m_size = str.m_size;
data = str.data; //swap over the pointers
str.data = nullptr;
return * this;
}

string& operator+=(const string & str) {
size_t new_size = m_size + str.m_size; //we need to create a new block of memory with the new appropriate size
char * new_data = new char[new_size];
memmove(new_data,data,m_size); //move the old memory
strcat(new_data,str.data); //concat the new memory
delete[] data; //delete the old data
m_size = new_size;
data = new_data;
return *this;
}

string operator+(const string & str) {
return static_cast<string&&>(string(*this) += str); //could be replaced with std::move() but I use static cast for education purposes
}

private:
char * data;
size_t m_size;
friend std::ostream& operator<<(std::ostream& os, const string& s) {
os << s.data << "rn";
return os;
}
};

int main() {
const char t1[] = "text";
const char t2[] = "more text";
const char * p1 = t1;
const char * p2 = t2;
string a(p1,sizeof(t1));
string b(p2,sizeof(t2));
string c(a);
string d(a + b);
string e;
a += b;
e = d;
c = b + a;

std::cout << "a:t" << a;
std::cout << "b:t" << b;
std::cout << "c:t" << c;
std::cout << "d:t" << d;
std::cout << "e:t" << e;


return 0;
}


I ran it with valgrind and recieved the following output: I am concerned that there are 11 uses of alloc and only 10 frees? I am vaguely familiar with the difference but not confident enough to tell if it's a mistake I've made or expected output?



==24703== HEAP SUMMARY:
==24703== in use at exit: 72,704 bytes in 1 blocks
==24703== total heap usage: 11 allocs, 10 frees, 73,833 bytes allocated
==24703==
==24703== LEAK SUMMARY:
==24703== definitely lost: 0 bytes in 0 blocks
==24703== indirectly lost: 0 bytes in 0 blocks
==24703== possibly lost: 0 bytes in 0 blocks
==24703== still reachable: 72,704 bytes in 1 blocks
==24703== suppressed: 0 bytes in 0 blocks
==24703== Rerun with --leak-check=full to see details of leaked memory
==24703==
==24703== For counts of detected and suppressed errors, rerun with: -v
==24703== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)









share|improve this question











$endgroup$












  • $begingroup$
    In your default constructor, you forgot to initialize m_size. Don't assume it will be automatically initialized to 0.
    $endgroup$
    – ncalmbeblpaicr0011
    Dec 15 '16 at 19:24










  • $begingroup$
    One thing that surprised me is that the ostream output operator is in a private: section, but is still accessible from main() - presumably being declared friend makes it visible? I'm still learning here...
    $endgroup$
    – Toby Speight
    Dec 15 '16 at 22:40
















3












$begingroup$


I have been learning about the rule of five and I tried to implement an example myself. It all works exactly as I'd expect, but I was wondering if I've missed any obvious issues or easier/faster ways of implementing the constructors and operators.



class string {
public:
string() : data(nullptr) {
std::cout << "Default ctor" << std::endl;
}
string(const char * p, size_t size) : m_size(size) { //default constructor
std::cout << "Special ctor" << std::endl;
data = new char[m_size];
memcpy(data, p, m_size);
}
~string() noexcept { //default destructor
std::cout << "Default dtorrn";
delete[] data;
}
string(const string & str) { //copy constructor
std::cout << "Copy constructor" << std::endl;
m_size = str.m_size;
data = new char[m_size];
memcpy(data,str.data,m_size);
}
string(string && str) { //move constructor
std::cout << "Move constructor" << std::endl;
m_size = str.m_size;
data = str.data; //swap the values
str.data = nullptr;
}
string& operator=(const string & str) { //copy operator
std::cout << "Copy assignment" << std::endl;
if (this == &str) return * this;
delete[] data; //can safely delete a nullptr
m_size = str.m_size;
char * new_data = new char[m_size]; //we have to make a copy as this is not the move operator
memcpy(new_data,str.data,m_size);
data = new_data;
return * this;
}
string& operator=(string && str) { //move assignment operator
std::cout << "Move assignment" << std::endl;
if (this == &str) return * this;
delete[] data; //delete the old data, will be ok because data is always initialised to nullptr
m_size = str.m_size;
data = str.data; //swap over the pointers
str.data = nullptr;
return * this;
}

string& operator+=(const string & str) {
size_t new_size = m_size + str.m_size; //we need to create a new block of memory with the new appropriate size
char * new_data = new char[new_size];
memmove(new_data,data,m_size); //move the old memory
strcat(new_data,str.data); //concat the new memory
delete[] data; //delete the old data
m_size = new_size;
data = new_data;
return *this;
}

string operator+(const string & str) {
return static_cast<string&&>(string(*this) += str); //could be replaced with std::move() but I use static cast for education purposes
}

private:
char * data;
size_t m_size;
friend std::ostream& operator<<(std::ostream& os, const string& s) {
os << s.data << "rn";
return os;
}
};

int main() {
const char t1[] = "text";
const char t2[] = "more text";
const char * p1 = t1;
const char * p2 = t2;
string a(p1,sizeof(t1));
string b(p2,sizeof(t2));
string c(a);
string d(a + b);
string e;
a += b;
e = d;
c = b + a;

std::cout << "a:t" << a;
std::cout << "b:t" << b;
std::cout << "c:t" << c;
std::cout << "d:t" << d;
std::cout << "e:t" << e;


return 0;
}


I ran it with valgrind and recieved the following output: I am concerned that there are 11 uses of alloc and only 10 frees? I am vaguely familiar with the difference but not confident enough to tell if it's a mistake I've made or expected output?



==24703== HEAP SUMMARY:
==24703== in use at exit: 72,704 bytes in 1 blocks
==24703== total heap usage: 11 allocs, 10 frees, 73,833 bytes allocated
==24703==
==24703== LEAK SUMMARY:
==24703== definitely lost: 0 bytes in 0 blocks
==24703== indirectly lost: 0 bytes in 0 blocks
==24703== possibly lost: 0 bytes in 0 blocks
==24703== still reachable: 72,704 bytes in 1 blocks
==24703== suppressed: 0 bytes in 0 blocks
==24703== Rerun with --leak-check=full to see details of leaked memory
==24703==
==24703== For counts of detected and suppressed errors, rerun with: -v
==24703== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)









share|improve this question











$endgroup$












  • $begingroup$
    In your default constructor, you forgot to initialize m_size. Don't assume it will be automatically initialized to 0.
    $endgroup$
    – ncalmbeblpaicr0011
    Dec 15 '16 at 19:24










  • $begingroup$
    One thing that surprised me is that the ostream output operator is in a private: section, but is still accessible from main() - presumably being declared friend makes it visible? I'm still learning here...
    $endgroup$
    – Toby Speight
    Dec 15 '16 at 22:40














3












3








3


1



$begingroup$


I have been learning about the rule of five and I tried to implement an example myself. It all works exactly as I'd expect, but I was wondering if I've missed any obvious issues or easier/faster ways of implementing the constructors and operators.



class string {
public:
string() : data(nullptr) {
std::cout << "Default ctor" << std::endl;
}
string(const char * p, size_t size) : m_size(size) { //default constructor
std::cout << "Special ctor" << std::endl;
data = new char[m_size];
memcpy(data, p, m_size);
}
~string() noexcept { //default destructor
std::cout << "Default dtorrn";
delete[] data;
}
string(const string & str) { //copy constructor
std::cout << "Copy constructor" << std::endl;
m_size = str.m_size;
data = new char[m_size];
memcpy(data,str.data,m_size);
}
string(string && str) { //move constructor
std::cout << "Move constructor" << std::endl;
m_size = str.m_size;
data = str.data; //swap the values
str.data = nullptr;
}
string& operator=(const string & str) { //copy operator
std::cout << "Copy assignment" << std::endl;
if (this == &str) return * this;
delete[] data; //can safely delete a nullptr
m_size = str.m_size;
char * new_data = new char[m_size]; //we have to make a copy as this is not the move operator
memcpy(new_data,str.data,m_size);
data = new_data;
return * this;
}
string& operator=(string && str) { //move assignment operator
std::cout << "Move assignment" << std::endl;
if (this == &str) return * this;
delete[] data; //delete the old data, will be ok because data is always initialised to nullptr
m_size = str.m_size;
data = str.data; //swap over the pointers
str.data = nullptr;
return * this;
}

string& operator+=(const string & str) {
size_t new_size = m_size + str.m_size; //we need to create a new block of memory with the new appropriate size
char * new_data = new char[new_size];
memmove(new_data,data,m_size); //move the old memory
strcat(new_data,str.data); //concat the new memory
delete[] data; //delete the old data
m_size = new_size;
data = new_data;
return *this;
}

string operator+(const string & str) {
return static_cast<string&&>(string(*this) += str); //could be replaced with std::move() but I use static cast for education purposes
}

private:
char * data;
size_t m_size;
friend std::ostream& operator<<(std::ostream& os, const string& s) {
os << s.data << "rn";
return os;
}
};

int main() {
const char t1[] = "text";
const char t2[] = "more text";
const char * p1 = t1;
const char * p2 = t2;
string a(p1,sizeof(t1));
string b(p2,sizeof(t2));
string c(a);
string d(a + b);
string e;
a += b;
e = d;
c = b + a;

std::cout << "a:t" << a;
std::cout << "b:t" << b;
std::cout << "c:t" << c;
std::cout << "d:t" << d;
std::cout << "e:t" << e;


return 0;
}


I ran it with valgrind and recieved the following output: I am concerned that there are 11 uses of alloc and only 10 frees? I am vaguely familiar with the difference but not confident enough to tell if it's a mistake I've made or expected output?



==24703== HEAP SUMMARY:
==24703== in use at exit: 72,704 bytes in 1 blocks
==24703== total heap usage: 11 allocs, 10 frees, 73,833 bytes allocated
==24703==
==24703== LEAK SUMMARY:
==24703== definitely lost: 0 bytes in 0 blocks
==24703== indirectly lost: 0 bytes in 0 blocks
==24703== possibly lost: 0 bytes in 0 blocks
==24703== still reachable: 72,704 bytes in 1 blocks
==24703== suppressed: 0 bytes in 0 blocks
==24703== Rerun with --leak-check=full to see details of leaked memory
==24703==
==24703== For counts of detected and suppressed errors, rerun with: -v
==24703== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)









share|improve this question











$endgroup$




I have been learning about the rule of five and I tried to implement an example myself. It all works exactly as I'd expect, but I was wondering if I've missed any obvious issues or easier/faster ways of implementing the constructors and operators.



class string {
public:
string() : data(nullptr) {
std::cout << "Default ctor" << std::endl;
}
string(const char * p, size_t size) : m_size(size) { //default constructor
std::cout << "Special ctor" << std::endl;
data = new char[m_size];
memcpy(data, p, m_size);
}
~string() noexcept { //default destructor
std::cout << "Default dtorrn";
delete[] data;
}
string(const string & str) { //copy constructor
std::cout << "Copy constructor" << std::endl;
m_size = str.m_size;
data = new char[m_size];
memcpy(data,str.data,m_size);
}
string(string && str) { //move constructor
std::cout << "Move constructor" << std::endl;
m_size = str.m_size;
data = str.data; //swap the values
str.data = nullptr;
}
string& operator=(const string & str) { //copy operator
std::cout << "Copy assignment" << std::endl;
if (this == &str) return * this;
delete[] data; //can safely delete a nullptr
m_size = str.m_size;
char * new_data = new char[m_size]; //we have to make a copy as this is not the move operator
memcpy(new_data,str.data,m_size);
data = new_data;
return * this;
}
string& operator=(string && str) { //move assignment operator
std::cout << "Move assignment" << std::endl;
if (this == &str) return * this;
delete[] data; //delete the old data, will be ok because data is always initialised to nullptr
m_size = str.m_size;
data = str.data; //swap over the pointers
str.data = nullptr;
return * this;
}

string& operator+=(const string & str) {
size_t new_size = m_size + str.m_size; //we need to create a new block of memory with the new appropriate size
char * new_data = new char[new_size];
memmove(new_data,data,m_size); //move the old memory
strcat(new_data,str.data); //concat the new memory
delete[] data; //delete the old data
m_size = new_size;
data = new_data;
return *this;
}

string operator+(const string & str) {
return static_cast<string&&>(string(*this) += str); //could be replaced with std::move() but I use static cast for education purposes
}

private:
char * data;
size_t m_size;
friend std::ostream& operator<<(std::ostream& os, const string& s) {
os << s.data << "rn";
return os;
}
};

int main() {
const char t1[] = "text";
const char t2[] = "more text";
const char * p1 = t1;
const char * p2 = t2;
string a(p1,sizeof(t1));
string b(p2,sizeof(t2));
string c(a);
string d(a + b);
string e;
a += b;
e = d;
c = b + a;

std::cout << "a:t" << a;
std::cout << "b:t" << b;
std::cout << "c:t" << c;
std::cout << "d:t" << d;
std::cout << "e:t" << e;


return 0;
}


I ran it with valgrind and recieved the following output: I am concerned that there are 11 uses of alloc and only 10 frees? I am vaguely familiar with the difference but not confident enough to tell if it's a mistake I've made or expected output?



==24703== HEAP SUMMARY:
==24703== in use at exit: 72,704 bytes in 1 blocks
==24703== total heap usage: 11 allocs, 10 frees, 73,833 bytes allocated
==24703==
==24703== LEAK SUMMARY:
==24703== definitely lost: 0 bytes in 0 blocks
==24703== indirectly lost: 0 bytes in 0 blocks
==24703== possibly lost: 0 bytes in 0 blocks
==24703== still reachable: 72,704 bytes in 1 blocks
==24703== suppressed: 0 bytes in 0 blocks
==24703== Rerun with --leak-check=full to see details of leaked memory
==24703==
==24703== For counts of detected and suppressed errors, rerun with: -v
==24703== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)






c++ strings reinventing-the-wheel memory-management






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Dec 15 '16 at 21:40









200_success

130k17153419




130k17153419










asked Dec 14 '16 at 20:34









TomJTomJ

1516




1516












  • $begingroup$
    In your default constructor, you forgot to initialize m_size. Don't assume it will be automatically initialized to 0.
    $endgroup$
    – ncalmbeblpaicr0011
    Dec 15 '16 at 19:24










  • $begingroup$
    One thing that surprised me is that the ostream output operator is in a private: section, but is still accessible from main() - presumably being declared friend makes it visible? I'm still learning here...
    $endgroup$
    – Toby Speight
    Dec 15 '16 at 22:40


















  • $begingroup$
    In your default constructor, you forgot to initialize m_size. Don't assume it will be automatically initialized to 0.
    $endgroup$
    – ncalmbeblpaicr0011
    Dec 15 '16 at 19:24










  • $begingroup$
    One thing that surprised me is that the ostream output operator is in a private: section, but is still accessible from main() - presumably being declared friend makes it visible? I'm still learning here...
    $endgroup$
    – Toby Speight
    Dec 15 '16 at 22:40
















$begingroup$
In your default constructor, you forgot to initialize m_size. Don't assume it will be automatically initialized to 0.
$endgroup$
– ncalmbeblpaicr0011
Dec 15 '16 at 19:24




$begingroup$
In your default constructor, you forgot to initialize m_size. Don't assume it will be automatically initialized to 0.
$endgroup$
– ncalmbeblpaicr0011
Dec 15 '16 at 19:24












$begingroup$
One thing that surprised me is that the ostream output operator is in a private: section, but is still accessible from main() - presumably being declared friend makes it visible? I'm still learning here...
$endgroup$
– Toby Speight
Dec 15 '16 at 22:40




$begingroup$
One thing that surprised me is that the ostream output operator is in a private: section, but is still accessible from main() - presumably being declared friend makes it visible? I'm still learning here...
$endgroup$
– Toby Speight
Dec 15 '16 at 22:40










2 Answers
2






active

oldest

votes


















4












$begingroup$

Comments



Let's consider one of your constructors, and its accompanying comment:



string(const char * p, size_t size) : m_size(size) { //default constructor


At least in my opinion, this has at least two obvious problems. This first problem is that the comment is just wrong. A default constructor is one that can be invoked without supplying any arguments. In this case, two arguments must be supplied, so it's not a default ctor.



The second problem is almost simpler. As it stands right now, the comment is pointless. If the comment were corrected (or the code changed so it really was a default constructor), it would still wouldn't give any useful information. It's just telling me what's already obvious from the function's parameter list--it's a default ctor if and only if it can be invoked without supplying any arguments.



Comments should convey information that's not immediately obvious from the code itself. Most often, this is things like why you wrote the code the way it is.



The same applies to most of the other comments as well--everything they try to convey is either wrong or obvious (or, as above, both).



Non-templated string type



The days of strings supporting only char as the data type are (IMO) long past. Whether you like Unicode or not, it's better to support it (and for a string, it's much cleaner to manipulate UTF-32 than UTF-8).



Array new



The array form of new should generally be avoided. In the specific case of char (or unsigned char) it's marginally less evil than usual, but I'd advise avoiding it in general. For containers (and IMO, string is a container) you want to use operator new to allocate "raw" memory, then use placement new to construct objects in that raw memory. It's kind of overkill in the specific case of char, but at worst it's harmless, and at best it's a drastic improvement (it leaves the unused part of the memory as raw memory, where new [] always creates objects in all the memory you allocate).



operator+



I'm not particularly fond of your current implementation of operator+. It works, but (IMO) is fairly easy for somebody to misinterpret in subtle ways that could lead to problems down the road.



I'd do pretty much the same things, but split it into smaller pieces that made the intent a little more obvious:



string operator+(const string & str) const {
string r(*this);
r += str;
return r;
}


I'd have to think things through in detail to be sure, there's at least a reasonable chance that this is actually more efficient. In particular, it allows copy elision of the return value, where casting to an rvalue (whether done explicitly or in the form of std::move) can inhibit that.



Alternative Rule



In most case, you want to follow the rule of zero rather than the rule of five. In simple form, this says you should delegate essentially all the resource management to a class (most often unique_ptr or shared_ptr) devoted solely to that purpose, so your class only needs to deal with whatever it's really supposed to do. In the case of a (non-COW) string, each string representation has only a single owner, so this should be std::unique_ptr.



Copy assignment



Your current copy assignment operator isn't safe in the face of exceptions. Right now you have a sequence of deleting the existing data, then allocating new space, and then copying the new data into the new space. If (for example) that new throws an exception, your string won't contain either the old data or the new data. Since it's a constructed string, its destructor will execute when that scope is exited. That will attempt to delete the buffer pointed to by its data--which you just deleted, so you get a double delete, giving undefined behavior.



One easy way to get defined behavior is the copy and swap idiom, which could look something like this:



string& operator=(string str) {
std::cout << "Copy assignmentn";
using std::swap;
swap(data, str.data);
swap(m_size, str.m_size);
return *this;
}


Since we pass the parameter by value, the copy constructor is invoked to create a string for the parameter. We then essentially do a move from there to the target. To keep things as simple as possible, we swap the guts of the two strings, so when the function returns, the parameter goes out of scope, and gets destroyed automatically.



In this case, the allocation happens in the ctor, so if it throws, this function is never invoked at all. The function itself contains only operations that we know can't throw, so we don't have to worry about what would happen if an exception were thrown in the middle of it.



This isn't the only possible way, of course, but it's one way that's pretty easy to get right. More important, perhaps, is establishing the basic pattern:



Phase 1: carry out things that might throw, but don't affect any existing data if they do.
Phase 2: carry out operation that affect existing data, but can't throw.



This way, if anything throws, Phase 2 never happens at all, so existing data can never be affected. If phase 2 starts, it always executes completely. Therefore, we always get either the state from before we started this operation at all, or else the state after it finished completely. The outside world will never see an intermediate state.



Of course, move assignment can be handled much the same way, except taking an rvalue reference as the parameter. We can still swap the "guts" of the current string with those of the parameter, and then let the dtor take care of destroying the parameter when it goes out of scope:



string& operator=(string && str) {
std::cout << "Move assignmentn";
using std::swap;
swap(data, str.data);
swap(m_size, str.m_size);
return *this;
}


Use of strcat



As it stands right now, I'd call your use of strcat (e.g., in operator+=) basically a bug. To use strcat, you'd need to guarantee that every string contains exactly one NUL character at its very end.



As it stands now, a string can be created that contains NUL characters in its middle. A string can also be created that doesn't contain any NUL character at its end.



To work correctly, you could use memcpy for both operations:



string& operator+=(const string & str) {
size_t new_size = m_size + str.m_size;
char * new_data = new char[new_size];
mempy(new_data, data, m_size);
memcpy(new_data+m_size, str.data, str.m_size);
delete[] data;
m_size = new_size;
data = new_data;
return *this;
}


Note that your use of memmove would work correctly, but was overkill. memmove guarantees correct behavior even if the old and new locations might overlap. To do that, it has to check for overlap, and copy from beginning to end for overlap in one direction, and end to beginning for overlap in the other direction. In this case, we know there will never be any overlap, so we can use memcpy which only supports non-overlapping buffers (so it avoids checking for overlap).






share|improve this answer











$endgroup$





















    1












    $begingroup$

    When I run your code, Valgrind reports 11 allocations and a matching 11 frees, with no leaks. So I don't see why your run indicated a non-freed block.



    I did have to add includes for <cstring> and <iostream> to get it to compile - please don't omit your includes, even if the code is long! Other than that, this code was complete; it's clear and easy to read.



    Some observations:



    Initializers



    g++ -Weffc++ turns up some missing initialisers that are worth addressing. For example:



    string() : data(nullptr) {
    std::cout << "Default ctor" << std::endl;
    }


    Would be better if it also initialized m_size:



    string()
    : data(nullptr),
    m_size(0)
    {
    std::cout << "Default ctor" << std::endl;
    }


    And this one (the comment lies):



    string(const char * p, size_t size) : m_size(size) { //default constructor
    std::cout << "Special ctor" << std::endl;
    data = new char[m_size];
    memcpy(data, p, m_size);
    }


    could be



    string(const char *p, size_t size)
    : data(new char[size]),
    m_size(size)
    {
    std::cout << "Converting ctor" << std::endl;
    std::memcpy(data, p, m_size);
    }


    I prefer one initialiser per line, as I do for assignments - but you're welcome to disagree.



    Constructors - general



    I'd add a constructor that takes a null-terminated string:



    string(const char *p) : string(p, strlen(p)) {}


    Assignment operator



    You can simplify your assignment operator using the copy and swap idiom:



    friend void swap(string &a, string& b)
    {
    using std::swap;
    swap(a.data, b.data);
    swap(a.m_size, b.m_size);
    }

    string& operator=(string str) {
    std::cout << "Copy assignment" << std::endl;
    swap(*this, str);
    return *this;
    }


    Writing it like this means that you don't need to overload it on const string& and string&& - passing by value will construct str using the appropriate constructor. See the linked answer for details.



    Copy-and-swap does have some performance cost to gain that simplicity and safety. See Scott Meyers's article Drawbacks of Implementing Move Assignment in Terms of Swap for a full explanation. If that's a concern, then you should keep the overload for move-assignment.



    Append operator



    You use strcat() to append the second string to the first, but that assumes that both are null-terminated. You need to use memmove() (preferably memcpy) for both:



    string& operator+=(const string & str) {
    size_t new_size = m_size + str.m_size;
    char * new_data = new char[new_size];
    memcpy(new_data, data, m_size);
    memcpy(new_data+m_size, str.data, str.m_size);
    delete[] data;
    m_size = new_size;
    data = new_data;
    return *this;
    }


    The ostream operator also depends on a trailing NUL, so here's a safe replacement:



    friend std::ostream& operator<<(std::ostream& os, const string& s) {
    for (size_t i = 0; i < s.m_size; ++i)
    os << s.data[i];
    return os << "n";
    }


    Addition operator



    Addition is conventionally a const method; I'm not very fond of casts:



    string operator+(const string & str) const {
    string s{*this};
    return s += str;
    }


    You may prefer to add a constructor taking two (or more - you might employ std::initializer_list or C++17 fold expressions) string objects, to save allocating temporary space which is immediately freed and reallocated:



    public:
    string operator+(const string & str) const {
    return {*this, str};
    }

    private:
    string(const string& a, const string& b)
    : data{new char[a.m_size + b.m_size]},
    m_size{a.m_size + b.m_size}
    {
    if (a.data)
    memcpy(data, a.data, a.m_size);
    if (b.data)
    memcpy(data+a.m_size, b.data, b.m_size);
    }


    That concatenation constructor could be more concise if you swap the order of m_size and data in the class.



    Tests



    It's great that you include the tests in the review. It helps show how you expect your class to be used, and it helps identify tests you've missed. In my opinion, all review requests should include tests.



    Here I changed the initializer of a to not include its NUL and that of b to use my additional constructor:



    string a(p1,sizeof t1-1);
    string b(p2);


    I took out the assignment e=d; so that I could test operator+() with the empty string:



    std::cout << "a+e:t" << a+e;
    std::cout << "e+e:t" << e+e;
    std::cout << "e+a:t" << e+a;


    It would be worth implementing the == operator soon, and then you could make the tests self-checking (affecting the exit status, rather than requiring manual inspection).






    share|improve this answer











    $endgroup$













    • $begingroup$
      An additional observation: operator+=() can avoid some allocations. Check if *this is empty (return *this = str;) or if str is empty - return *this;.
      $endgroup$
      – Toby Speight
      Jan 19 '17 at 7:18











    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "196"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f149917%2frule-of-five-and-move-semantics-for-a-string-class%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    4












    $begingroup$

    Comments



    Let's consider one of your constructors, and its accompanying comment:



    string(const char * p, size_t size) : m_size(size) { //default constructor


    At least in my opinion, this has at least two obvious problems. This first problem is that the comment is just wrong. A default constructor is one that can be invoked without supplying any arguments. In this case, two arguments must be supplied, so it's not a default ctor.



    The second problem is almost simpler. As it stands right now, the comment is pointless. If the comment were corrected (or the code changed so it really was a default constructor), it would still wouldn't give any useful information. It's just telling me what's already obvious from the function's parameter list--it's a default ctor if and only if it can be invoked without supplying any arguments.



    Comments should convey information that's not immediately obvious from the code itself. Most often, this is things like why you wrote the code the way it is.



    The same applies to most of the other comments as well--everything they try to convey is either wrong or obvious (or, as above, both).



    Non-templated string type



    The days of strings supporting only char as the data type are (IMO) long past. Whether you like Unicode or not, it's better to support it (and for a string, it's much cleaner to manipulate UTF-32 than UTF-8).



    Array new



    The array form of new should generally be avoided. In the specific case of char (or unsigned char) it's marginally less evil than usual, but I'd advise avoiding it in general. For containers (and IMO, string is a container) you want to use operator new to allocate "raw" memory, then use placement new to construct objects in that raw memory. It's kind of overkill in the specific case of char, but at worst it's harmless, and at best it's a drastic improvement (it leaves the unused part of the memory as raw memory, where new [] always creates objects in all the memory you allocate).



    operator+



    I'm not particularly fond of your current implementation of operator+. It works, but (IMO) is fairly easy for somebody to misinterpret in subtle ways that could lead to problems down the road.



    I'd do pretty much the same things, but split it into smaller pieces that made the intent a little more obvious:



    string operator+(const string & str) const {
    string r(*this);
    r += str;
    return r;
    }


    I'd have to think things through in detail to be sure, there's at least a reasonable chance that this is actually more efficient. In particular, it allows copy elision of the return value, where casting to an rvalue (whether done explicitly or in the form of std::move) can inhibit that.



    Alternative Rule



    In most case, you want to follow the rule of zero rather than the rule of five. In simple form, this says you should delegate essentially all the resource management to a class (most often unique_ptr or shared_ptr) devoted solely to that purpose, so your class only needs to deal with whatever it's really supposed to do. In the case of a (non-COW) string, each string representation has only a single owner, so this should be std::unique_ptr.



    Copy assignment



    Your current copy assignment operator isn't safe in the face of exceptions. Right now you have a sequence of deleting the existing data, then allocating new space, and then copying the new data into the new space. If (for example) that new throws an exception, your string won't contain either the old data or the new data. Since it's a constructed string, its destructor will execute when that scope is exited. That will attempt to delete the buffer pointed to by its data--which you just deleted, so you get a double delete, giving undefined behavior.



    One easy way to get defined behavior is the copy and swap idiom, which could look something like this:



    string& operator=(string str) {
    std::cout << "Copy assignmentn";
    using std::swap;
    swap(data, str.data);
    swap(m_size, str.m_size);
    return *this;
    }


    Since we pass the parameter by value, the copy constructor is invoked to create a string for the parameter. We then essentially do a move from there to the target. To keep things as simple as possible, we swap the guts of the two strings, so when the function returns, the parameter goes out of scope, and gets destroyed automatically.



    In this case, the allocation happens in the ctor, so if it throws, this function is never invoked at all. The function itself contains only operations that we know can't throw, so we don't have to worry about what would happen if an exception were thrown in the middle of it.



    This isn't the only possible way, of course, but it's one way that's pretty easy to get right. More important, perhaps, is establishing the basic pattern:



    Phase 1: carry out things that might throw, but don't affect any existing data if they do.
    Phase 2: carry out operation that affect existing data, but can't throw.



    This way, if anything throws, Phase 2 never happens at all, so existing data can never be affected. If phase 2 starts, it always executes completely. Therefore, we always get either the state from before we started this operation at all, or else the state after it finished completely. The outside world will never see an intermediate state.



    Of course, move assignment can be handled much the same way, except taking an rvalue reference as the parameter. We can still swap the "guts" of the current string with those of the parameter, and then let the dtor take care of destroying the parameter when it goes out of scope:



    string& operator=(string && str) {
    std::cout << "Move assignmentn";
    using std::swap;
    swap(data, str.data);
    swap(m_size, str.m_size);
    return *this;
    }


    Use of strcat



    As it stands right now, I'd call your use of strcat (e.g., in operator+=) basically a bug. To use strcat, you'd need to guarantee that every string contains exactly one NUL character at its very end.



    As it stands now, a string can be created that contains NUL characters in its middle. A string can also be created that doesn't contain any NUL character at its end.



    To work correctly, you could use memcpy for both operations:



    string& operator+=(const string & str) {
    size_t new_size = m_size + str.m_size;
    char * new_data = new char[new_size];
    mempy(new_data, data, m_size);
    memcpy(new_data+m_size, str.data, str.m_size);
    delete[] data;
    m_size = new_size;
    data = new_data;
    return *this;
    }


    Note that your use of memmove would work correctly, but was overkill. memmove guarantees correct behavior even if the old and new locations might overlap. To do that, it has to check for overlap, and copy from beginning to end for overlap in one direction, and end to beginning for overlap in the other direction. In this case, we know there will never be any overlap, so we can use memcpy which only supports non-overlapping buffers (so it avoids checking for overlap).






    share|improve this answer











    $endgroup$


















      4












      $begingroup$

      Comments



      Let's consider one of your constructors, and its accompanying comment:



      string(const char * p, size_t size) : m_size(size) { //default constructor


      At least in my opinion, this has at least two obvious problems. This first problem is that the comment is just wrong. A default constructor is one that can be invoked without supplying any arguments. In this case, two arguments must be supplied, so it's not a default ctor.



      The second problem is almost simpler. As it stands right now, the comment is pointless. If the comment were corrected (or the code changed so it really was a default constructor), it would still wouldn't give any useful information. It's just telling me what's already obvious from the function's parameter list--it's a default ctor if and only if it can be invoked without supplying any arguments.



      Comments should convey information that's not immediately obvious from the code itself. Most often, this is things like why you wrote the code the way it is.



      The same applies to most of the other comments as well--everything they try to convey is either wrong or obvious (or, as above, both).



      Non-templated string type



      The days of strings supporting only char as the data type are (IMO) long past. Whether you like Unicode or not, it's better to support it (and for a string, it's much cleaner to manipulate UTF-32 than UTF-8).



      Array new



      The array form of new should generally be avoided. In the specific case of char (or unsigned char) it's marginally less evil than usual, but I'd advise avoiding it in general. For containers (and IMO, string is a container) you want to use operator new to allocate "raw" memory, then use placement new to construct objects in that raw memory. It's kind of overkill in the specific case of char, but at worst it's harmless, and at best it's a drastic improvement (it leaves the unused part of the memory as raw memory, where new [] always creates objects in all the memory you allocate).



      operator+



      I'm not particularly fond of your current implementation of operator+. It works, but (IMO) is fairly easy for somebody to misinterpret in subtle ways that could lead to problems down the road.



      I'd do pretty much the same things, but split it into smaller pieces that made the intent a little more obvious:



      string operator+(const string & str) const {
      string r(*this);
      r += str;
      return r;
      }


      I'd have to think things through in detail to be sure, there's at least a reasonable chance that this is actually more efficient. In particular, it allows copy elision of the return value, where casting to an rvalue (whether done explicitly or in the form of std::move) can inhibit that.



      Alternative Rule



      In most case, you want to follow the rule of zero rather than the rule of five. In simple form, this says you should delegate essentially all the resource management to a class (most often unique_ptr or shared_ptr) devoted solely to that purpose, so your class only needs to deal with whatever it's really supposed to do. In the case of a (non-COW) string, each string representation has only a single owner, so this should be std::unique_ptr.



      Copy assignment



      Your current copy assignment operator isn't safe in the face of exceptions. Right now you have a sequence of deleting the existing data, then allocating new space, and then copying the new data into the new space. If (for example) that new throws an exception, your string won't contain either the old data or the new data. Since it's a constructed string, its destructor will execute when that scope is exited. That will attempt to delete the buffer pointed to by its data--which you just deleted, so you get a double delete, giving undefined behavior.



      One easy way to get defined behavior is the copy and swap idiom, which could look something like this:



      string& operator=(string str) {
      std::cout << "Copy assignmentn";
      using std::swap;
      swap(data, str.data);
      swap(m_size, str.m_size);
      return *this;
      }


      Since we pass the parameter by value, the copy constructor is invoked to create a string for the parameter. We then essentially do a move from there to the target. To keep things as simple as possible, we swap the guts of the two strings, so when the function returns, the parameter goes out of scope, and gets destroyed automatically.



      In this case, the allocation happens in the ctor, so if it throws, this function is never invoked at all. The function itself contains only operations that we know can't throw, so we don't have to worry about what would happen if an exception were thrown in the middle of it.



      This isn't the only possible way, of course, but it's one way that's pretty easy to get right. More important, perhaps, is establishing the basic pattern:



      Phase 1: carry out things that might throw, but don't affect any existing data if they do.
      Phase 2: carry out operation that affect existing data, but can't throw.



      This way, if anything throws, Phase 2 never happens at all, so existing data can never be affected. If phase 2 starts, it always executes completely. Therefore, we always get either the state from before we started this operation at all, or else the state after it finished completely. The outside world will never see an intermediate state.



      Of course, move assignment can be handled much the same way, except taking an rvalue reference as the parameter. We can still swap the "guts" of the current string with those of the parameter, and then let the dtor take care of destroying the parameter when it goes out of scope:



      string& operator=(string && str) {
      std::cout << "Move assignmentn";
      using std::swap;
      swap(data, str.data);
      swap(m_size, str.m_size);
      return *this;
      }


      Use of strcat



      As it stands right now, I'd call your use of strcat (e.g., in operator+=) basically a bug. To use strcat, you'd need to guarantee that every string contains exactly one NUL character at its very end.



      As it stands now, a string can be created that contains NUL characters in its middle. A string can also be created that doesn't contain any NUL character at its end.



      To work correctly, you could use memcpy for both operations:



      string& operator+=(const string & str) {
      size_t new_size = m_size + str.m_size;
      char * new_data = new char[new_size];
      mempy(new_data, data, m_size);
      memcpy(new_data+m_size, str.data, str.m_size);
      delete[] data;
      m_size = new_size;
      data = new_data;
      return *this;
      }


      Note that your use of memmove would work correctly, but was overkill. memmove guarantees correct behavior even if the old and new locations might overlap. To do that, it has to check for overlap, and copy from beginning to end for overlap in one direction, and end to beginning for overlap in the other direction. In this case, we know there will never be any overlap, so we can use memcpy which only supports non-overlapping buffers (so it avoids checking for overlap).






      share|improve this answer











      $endgroup$
















        4












        4








        4





        $begingroup$

        Comments



        Let's consider one of your constructors, and its accompanying comment:



        string(const char * p, size_t size) : m_size(size) { //default constructor


        At least in my opinion, this has at least two obvious problems. This first problem is that the comment is just wrong. A default constructor is one that can be invoked without supplying any arguments. In this case, two arguments must be supplied, so it's not a default ctor.



        The second problem is almost simpler. As it stands right now, the comment is pointless. If the comment were corrected (or the code changed so it really was a default constructor), it would still wouldn't give any useful information. It's just telling me what's already obvious from the function's parameter list--it's a default ctor if and only if it can be invoked without supplying any arguments.



        Comments should convey information that's not immediately obvious from the code itself. Most often, this is things like why you wrote the code the way it is.



        The same applies to most of the other comments as well--everything they try to convey is either wrong or obvious (or, as above, both).



        Non-templated string type



        The days of strings supporting only char as the data type are (IMO) long past. Whether you like Unicode or not, it's better to support it (and for a string, it's much cleaner to manipulate UTF-32 than UTF-8).



        Array new



        The array form of new should generally be avoided. In the specific case of char (or unsigned char) it's marginally less evil than usual, but I'd advise avoiding it in general. For containers (and IMO, string is a container) you want to use operator new to allocate "raw" memory, then use placement new to construct objects in that raw memory. It's kind of overkill in the specific case of char, but at worst it's harmless, and at best it's a drastic improvement (it leaves the unused part of the memory as raw memory, where new [] always creates objects in all the memory you allocate).



        operator+



        I'm not particularly fond of your current implementation of operator+. It works, but (IMO) is fairly easy for somebody to misinterpret in subtle ways that could lead to problems down the road.



        I'd do pretty much the same things, but split it into smaller pieces that made the intent a little more obvious:



        string operator+(const string & str) const {
        string r(*this);
        r += str;
        return r;
        }


        I'd have to think things through in detail to be sure, there's at least a reasonable chance that this is actually more efficient. In particular, it allows copy elision of the return value, where casting to an rvalue (whether done explicitly or in the form of std::move) can inhibit that.



        Alternative Rule



        In most case, you want to follow the rule of zero rather than the rule of five. In simple form, this says you should delegate essentially all the resource management to a class (most often unique_ptr or shared_ptr) devoted solely to that purpose, so your class only needs to deal with whatever it's really supposed to do. In the case of a (non-COW) string, each string representation has only a single owner, so this should be std::unique_ptr.



        Copy assignment



        Your current copy assignment operator isn't safe in the face of exceptions. Right now you have a sequence of deleting the existing data, then allocating new space, and then copying the new data into the new space. If (for example) that new throws an exception, your string won't contain either the old data or the new data. Since it's a constructed string, its destructor will execute when that scope is exited. That will attempt to delete the buffer pointed to by its data--which you just deleted, so you get a double delete, giving undefined behavior.



        One easy way to get defined behavior is the copy and swap idiom, which could look something like this:



        string& operator=(string str) {
        std::cout << "Copy assignmentn";
        using std::swap;
        swap(data, str.data);
        swap(m_size, str.m_size);
        return *this;
        }


        Since we pass the parameter by value, the copy constructor is invoked to create a string for the parameter. We then essentially do a move from there to the target. To keep things as simple as possible, we swap the guts of the two strings, so when the function returns, the parameter goes out of scope, and gets destroyed automatically.



        In this case, the allocation happens in the ctor, so if it throws, this function is never invoked at all. The function itself contains only operations that we know can't throw, so we don't have to worry about what would happen if an exception were thrown in the middle of it.



        This isn't the only possible way, of course, but it's one way that's pretty easy to get right. More important, perhaps, is establishing the basic pattern:



        Phase 1: carry out things that might throw, but don't affect any existing data if they do.
        Phase 2: carry out operation that affect existing data, but can't throw.



        This way, if anything throws, Phase 2 never happens at all, so existing data can never be affected. If phase 2 starts, it always executes completely. Therefore, we always get either the state from before we started this operation at all, or else the state after it finished completely. The outside world will never see an intermediate state.



        Of course, move assignment can be handled much the same way, except taking an rvalue reference as the parameter. We can still swap the "guts" of the current string with those of the parameter, and then let the dtor take care of destroying the parameter when it goes out of scope:



        string& operator=(string && str) {
        std::cout << "Move assignmentn";
        using std::swap;
        swap(data, str.data);
        swap(m_size, str.m_size);
        return *this;
        }


        Use of strcat



        As it stands right now, I'd call your use of strcat (e.g., in operator+=) basically a bug. To use strcat, you'd need to guarantee that every string contains exactly one NUL character at its very end.



        As it stands now, a string can be created that contains NUL characters in its middle. A string can also be created that doesn't contain any NUL character at its end.



        To work correctly, you could use memcpy for both operations:



        string& operator+=(const string & str) {
        size_t new_size = m_size + str.m_size;
        char * new_data = new char[new_size];
        mempy(new_data, data, m_size);
        memcpy(new_data+m_size, str.data, str.m_size);
        delete[] data;
        m_size = new_size;
        data = new_data;
        return *this;
        }


        Note that your use of memmove would work correctly, but was overkill. memmove guarantees correct behavior even if the old and new locations might overlap. To do that, it has to check for overlap, and copy from beginning to end for overlap in one direction, and end to beginning for overlap in the other direction. In this case, we know there will never be any overlap, so we can use memcpy which only supports non-overlapping buffers (so it avoids checking for overlap).






        share|improve this answer











        $endgroup$



        Comments



        Let's consider one of your constructors, and its accompanying comment:



        string(const char * p, size_t size) : m_size(size) { //default constructor


        At least in my opinion, this has at least two obvious problems. This first problem is that the comment is just wrong. A default constructor is one that can be invoked without supplying any arguments. In this case, two arguments must be supplied, so it's not a default ctor.



        The second problem is almost simpler. As it stands right now, the comment is pointless. If the comment were corrected (or the code changed so it really was a default constructor), it would still wouldn't give any useful information. It's just telling me what's already obvious from the function's parameter list--it's a default ctor if and only if it can be invoked without supplying any arguments.



        Comments should convey information that's not immediately obvious from the code itself. Most often, this is things like why you wrote the code the way it is.



        The same applies to most of the other comments as well--everything they try to convey is either wrong or obvious (or, as above, both).



        Non-templated string type



        The days of strings supporting only char as the data type are (IMO) long past. Whether you like Unicode or not, it's better to support it (and for a string, it's much cleaner to manipulate UTF-32 than UTF-8).



        Array new



        The array form of new should generally be avoided. In the specific case of char (or unsigned char) it's marginally less evil than usual, but I'd advise avoiding it in general. For containers (and IMO, string is a container) you want to use operator new to allocate "raw" memory, then use placement new to construct objects in that raw memory. It's kind of overkill in the specific case of char, but at worst it's harmless, and at best it's a drastic improvement (it leaves the unused part of the memory as raw memory, where new [] always creates objects in all the memory you allocate).



        operator+



        I'm not particularly fond of your current implementation of operator+. It works, but (IMO) is fairly easy for somebody to misinterpret in subtle ways that could lead to problems down the road.



        I'd do pretty much the same things, but split it into smaller pieces that made the intent a little more obvious:



        string operator+(const string & str) const {
        string r(*this);
        r += str;
        return r;
        }


        I'd have to think things through in detail to be sure, there's at least a reasonable chance that this is actually more efficient. In particular, it allows copy elision of the return value, where casting to an rvalue (whether done explicitly or in the form of std::move) can inhibit that.



        Alternative Rule



        In most case, you want to follow the rule of zero rather than the rule of five. In simple form, this says you should delegate essentially all the resource management to a class (most often unique_ptr or shared_ptr) devoted solely to that purpose, so your class only needs to deal with whatever it's really supposed to do. In the case of a (non-COW) string, each string representation has only a single owner, so this should be std::unique_ptr.



        Copy assignment



        Your current copy assignment operator isn't safe in the face of exceptions. Right now you have a sequence of deleting the existing data, then allocating new space, and then copying the new data into the new space. If (for example) that new throws an exception, your string won't contain either the old data or the new data. Since it's a constructed string, its destructor will execute when that scope is exited. That will attempt to delete the buffer pointed to by its data--which you just deleted, so you get a double delete, giving undefined behavior.



        One easy way to get defined behavior is the copy and swap idiom, which could look something like this:



        string& operator=(string str) {
        std::cout << "Copy assignmentn";
        using std::swap;
        swap(data, str.data);
        swap(m_size, str.m_size);
        return *this;
        }


        Since we pass the parameter by value, the copy constructor is invoked to create a string for the parameter. We then essentially do a move from there to the target. To keep things as simple as possible, we swap the guts of the two strings, so when the function returns, the parameter goes out of scope, and gets destroyed automatically.



        In this case, the allocation happens in the ctor, so if it throws, this function is never invoked at all. The function itself contains only operations that we know can't throw, so we don't have to worry about what would happen if an exception were thrown in the middle of it.



        This isn't the only possible way, of course, but it's one way that's pretty easy to get right. More important, perhaps, is establishing the basic pattern:



        Phase 1: carry out things that might throw, but don't affect any existing data if they do.
        Phase 2: carry out operation that affect existing data, but can't throw.



        This way, if anything throws, Phase 2 never happens at all, so existing data can never be affected. If phase 2 starts, it always executes completely. Therefore, we always get either the state from before we started this operation at all, or else the state after it finished completely. The outside world will never see an intermediate state.



        Of course, move assignment can be handled much the same way, except taking an rvalue reference as the parameter. We can still swap the "guts" of the current string with those of the parameter, and then let the dtor take care of destroying the parameter when it goes out of scope:



        string& operator=(string && str) {
        std::cout << "Move assignmentn";
        using std::swap;
        swap(data, str.data);
        swap(m_size, str.m_size);
        return *this;
        }


        Use of strcat



        As it stands right now, I'd call your use of strcat (e.g., in operator+=) basically a bug. To use strcat, you'd need to guarantee that every string contains exactly one NUL character at its very end.



        As it stands now, a string can be created that contains NUL characters in its middle. A string can also be created that doesn't contain any NUL character at its end.



        To work correctly, you could use memcpy for both operations:



        string& operator+=(const string & str) {
        size_t new_size = m_size + str.m_size;
        char * new_data = new char[new_size];
        mempy(new_data, data, m_size);
        memcpy(new_data+m_size, str.data, str.m_size);
        delete[] data;
        m_size = new_size;
        data = new_data;
        return *this;
        }


        Note that your use of memmove would work correctly, but was overkill. memmove guarantees correct behavior even if the old and new locations might overlap. To do that, it has to check for overlap, and copy from beginning to end for overlap in one direction, and end to beginning for overlap in the other direction. In this case, we know there will never be any overlap, so we can use memcpy which only supports non-overlapping buffers (so it avoids checking for overlap).







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited 4 mins ago

























        answered Dec 15 '16 at 0:17









        Jerry CoffinJerry Coffin

        28.4k462128




        28.4k462128

























            1












            $begingroup$

            When I run your code, Valgrind reports 11 allocations and a matching 11 frees, with no leaks. So I don't see why your run indicated a non-freed block.



            I did have to add includes for <cstring> and <iostream> to get it to compile - please don't omit your includes, even if the code is long! Other than that, this code was complete; it's clear and easy to read.



            Some observations:



            Initializers



            g++ -Weffc++ turns up some missing initialisers that are worth addressing. For example:



            string() : data(nullptr) {
            std::cout << "Default ctor" << std::endl;
            }


            Would be better if it also initialized m_size:



            string()
            : data(nullptr),
            m_size(0)
            {
            std::cout << "Default ctor" << std::endl;
            }


            And this one (the comment lies):



            string(const char * p, size_t size) : m_size(size) { //default constructor
            std::cout << "Special ctor" << std::endl;
            data = new char[m_size];
            memcpy(data, p, m_size);
            }


            could be



            string(const char *p, size_t size)
            : data(new char[size]),
            m_size(size)
            {
            std::cout << "Converting ctor" << std::endl;
            std::memcpy(data, p, m_size);
            }


            I prefer one initialiser per line, as I do for assignments - but you're welcome to disagree.



            Constructors - general



            I'd add a constructor that takes a null-terminated string:



            string(const char *p) : string(p, strlen(p)) {}


            Assignment operator



            You can simplify your assignment operator using the copy and swap idiom:



            friend void swap(string &a, string& b)
            {
            using std::swap;
            swap(a.data, b.data);
            swap(a.m_size, b.m_size);
            }

            string& operator=(string str) {
            std::cout << "Copy assignment" << std::endl;
            swap(*this, str);
            return *this;
            }


            Writing it like this means that you don't need to overload it on const string& and string&& - passing by value will construct str using the appropriate constructor. See the linked answer for details.



            Copy-and-swap does have some performance cost to gain that simplicity and safety. See Scott Meyers's article Drawbacks of Implementing Move Assignment in Terms of Swap for a full explanation. If that's a concern, then you should keep the overload for move-assignment.



            Append operator



            You use strcat() to append the second string to the first, but that assumes that both are null-terminated. You need to use memmove() (preferably memcpy) for both:



            string& operator+=(const string & str) {
            size_t new_size = m_size + str.m_size;
            char * new_data = new char[new_size];
            memcpy(new_data, data, m_size);
            memcpy(new_data+m_size, str.data, str.m_size);
            delete[] data;
            m_size = new_size;
            data = new_data;
            return *this;
            }


            The ostream operator also depends on a trailing NUL, so here's a safe replacement:



            friend std::ostream& operator<<(std::ostream& os, const string& s) {
            for (size_t i = 0; i < s.m_size; ++i)
            os << s.data[i];
            return os << "n";
            }


            Addition operator



            Addition is conventionally a const method; I'm not very fond of casts:



            string operator+(const string & str) const {
            string s{*this};
            return s += str;
            }


            You may prefer to add a constructor taking two (or more - you might employ std::initializer_list or C++17 fold expressions) string objects, to save allocating temporary space which is immediately freed and reallocated:



            public:
            string operator+(const string & str) const {
            return {*this, str};
            }

            private:
            string(const string& a, const string& b)
            : data{new char[a.m_size + b.m_size]},
            m_size{a.m_size + b.m_size}
            {
            if (a.data)
            memcpy(data, a.data, a.m_size);
            if (b.data)
            memcpy(data+a.m_size, b.data, b.m_size);
            }


            That concatenation constructor could be more concise if you swap the order of m_size and data in the class.



            Tests



            It's great that you include the tests in the review. It helps show how you expect your class to be used, and it helps identify tests you've missed. In my opinion, all review requests should include tests.



            Here I changed the initializer of a to not include its NUL and that of b to use my additional constructor:



            string a(p1,sizeof t1-1);
            string b(p2);


            I took out the assignment e=d; so that I could test operator+() with the empty string:



            std::cout << "a+e:t" << a+e;
            std::cout << "e+e:t" << e+e;
            std::cout << "e+a:t" << e+a;


            It would be worth implementing the == operator soon, and then you could make the tests self-checking (affecting the exit status, rather than requiring manual inspection).






            share|improve this answer











            $endgroup$













            • $begingroup$
              An additional observation: operator+=() can avoid some allocations. Check if *this is empty (return *this = str;) or if str is empty - return *this;.
              $endgroup$
              – Toby Speight
              Jan 19 '17 at 7:18
















            1












            $begingroup$

            When I run your code, Valgrind reports 11 allocations and a matching 11 frees, with no leaks. So I don't see why your run indicated a non-freed block.



            I did have to add includes for <cstring> and <iostream> to get it to compile - please don't omit your includes, even if the code is long! Other than that, this code was complete; it's clear and easy to read.



            Some observations:



            Initializers



            g++ -Weffc++ turns up some missing initialisers that are worth addressing. For example:



            string() : data(nullptr) {
            std::cout << "Default ctor" << std::endl;
            }


            Would be better if it also initialized m_size:



            string()
            : data(nullptr),
            m_size(0)
            {
            std::cout << "Default ctor" << std::endl;
            }


            And this one (the comment lies):



            string(const char * p, size_t size) : m_size(size) { //default constructor
            std::cout << "Special ctor" << std::endl;
            data = new char[m_size];
            memcpy(data, p, m_size);
            }


            could be



            string(const char *p, size_t size)
            : data(new char[size]),
            m_size(size)
            {
            std::cout << "Converting ctor" << std::endl;
            std::memcpy(data, p, m_size);
            }


            I prefer one initialiser per line, as I do for assignments - but you're welcome to disagree.



            Constructors - general



            I'd add a constructor that takes a null-terminated string:



            string(const char *p) : string(p, strlen(p)) {}


            Assignment operator



            You can simplify your assignment operator using the copy and swap idiom:



            friend void swap(string &a, string& b)
            {
            using std::swap;
            swap(a.data, b.data);
            swap(a.m_size, b.m_size);
            }

            string& operator=(string str) {
            std::cout << "Copy assignment" << std::endl;
            swap(*this, str);
            return *this;
            }


            Writing it like this means that you don't need to overload it on const string& and string&& - passing by value will construct str using the appropriate constructor. See the linked answer for details.



            Copy-and-swap does have some performance cost to gain that simplicity and safety. See Scott Meyers's article Drawbacks of Implementing Move Assignment in Terms of Swap for a full explanation. If that's a concern, then you should keep the overload for move-assignment.



            Append operator



            You use strcat() to append the second string to the first, but that assumes that both are null-terminated. You need to use memmove() (preferably memcpy) for both:



            string& operator+=(const string & str) {
            size_t new_size = m_size + str.m_size;
            char * new_data = new char[new_size];
            memcpy(new_data, data, m_size);
            memcpy(new_data+m_size, str.data, str.m_size);
            delete[] data;
            m_size = new_size;
            data = new_data;
            return *this;
            }


            The ostream operator also depends on a trailing NUL, so here's a safe replacement:



            friend std::ostream& operator<<(std::ostream& os, const string& s) {
            for (size_t i = 0; i < s.m_size; ++i)
            os << s.data[i];
            return os << "n";
            }


            Addition operator



            Addition is conventionally a const method; I'm not very fond of casts:



            string operator+(const string & str) const {
            string s{*this};
            return s += str;
            }


            You may prefer to add a constructor taking two (or more - you might employ std::initializer_list or C++17 fold expressions) string objects, to save allocating temporary space which is immediately freed and reallocated:



            public:
            string operator+(const string & str) const {
            return {*this, str};
            }

            private:
            string(const string& a, const string& b)
            : data{new char[a.m_size + b.m_size]},
            m_size{a.m_size + b.m_size}
            {
            if (a.data)
            memcpy(data, a.data, a.m_size);
            if (b.data)
            memcpy(data+a.m_size, b.data, b.m_size);
            }


            That concatenation constructor could be more concise if you swap the order of m_size and data in the class.



            Tests



            It's great that you include the tests in the review. It helps show how you expect your class to be used, and it helps identify tests you've missed. In my opinion, all review requests should include tests.



            Here I changed the initializer of a to not include its NUL and that of b to use my additional constructor:



            string a(p1,sizeof t1-1);
            string b(p2);


            I took out the assignment e=d; so that I could test operator+() with the empty string:



            std::cout << "a+e:t" << a+e;
            std::cout << "e+e:t" << e+e;
            std::cout << "e+a:t" << e+a;


            It would be worth implementing the == operator soon, and then you could make the tests self-checking (affecting the exit status, rather than requiring manual inspection).






            share|improve this answer











            $endgroup$













            • $begingroup$
              An additional observation: operator+=() can avoid some allocations. Check if *this is empty (return *this = str;) or if str is empty - return *this;.
              $endgroup$
              – Toby Speight
              Jan 19 '17 at 7:18














            1












            1








            1





            $begingroup$

            When I run your code, Valgrind reports 11 allocations and a matching 11 frees, with no leaks. So I don't see why your run indicated a non-freed block.



            I did have to add includes for <cstring> and <iostream> to get it to compile - please don't omit your includes, even if the code is long! Other than that, this code was complete; it's clear and easy to read.



            Some observations:



            Initializers



            g++ -Weffc++ turns up some missing initialisers that are worth addressing. For example:



            string() : data(nullptr) {
            std::cout << "Default ctor" << std::endl;
            }


            Would be better if it also initialized m_size:



            string()
            : data(nullptr),
            m_size(0)
            {
            std::cout << "Default ctor" << std::endl;
            }


            And this one (the comment lies):



            string(const char * p, size_t size) : m_size(size) { //default constructor
            std::cout << "Special ctor" << std::endl;
            data = new char[m_size];
            memcpy(data, p, m_size);
            }


            could be



            string(const char *p, size_t size)
            : data(new char[size]),
            m_size(size)
            {
            std::cout << "Converting ctor" << std::endl;
            std::memcpy(data, p, m_size);
            }


            I prefer one initialiser per line, as I do for assignments - but you're welcome to disagree.



            Constructors - general



            I'd add a constructor that takes a null-terminated string:



            string(const char *p) : string(p, strlen(p)) {}


            Assignment operator



            You can simplify your assignment operator using the copy and swap idiom:



            friend void swap(string &a, string& b)
            {
            using std::swap;
            swap(a.data, b.data);
            swap(a.m_size, b.m_size);
            }

            string& operator=(string str) {
            std::cout << "Copy assignment" << std::endl;
            swap(*this, str);
            return *this;
            }


            Writing it like this means that you don't need to overload it on const string& and string&& - passing by value will construct str using the appropriate constructor. See the linked answer for details.



            Copy-and-swap does have some performance cost to gain that simplicity and safety. See Scott Meyers's article Drawbacks of Implementing Move Assignment in Terms of Swap for a full explanation. If that's a concern, then you should keep the overload for move-assignment.



            Append operator



            You use strcat() to append the second string to the first, but that assumes that both are null-terminated. You need to use memmove() (preferably memcpy) for both:



            string& operator+=(const string & str) {
            size_t new_size = m_size + str.m_size;
            char * new_data = new char[new_size];
            memcpy(new_data, data, m_size);
            memcpy(new_data+m_size, str.data, str.m_size);
            delete[] data;
            m_size = new_size;
            data = new_data;
            return *this;
            }


            The ostream operator also depends on a trailing NUL, so here's a safe replacement:



            friend std::ostream& operator<<(std::ostream& os, const string& s) {
            for (size_t i = 0; i < s.m_size; ++i)
            os << s.data[i];
            return os << "n";
            }


            Addition operator



            Addition is conventionally a const method; I'm not very fond of casts:



            string operator+(const string & str) const {
            string s{*this};
            return s += str;
            }


            You may prefer to add a constructor taking two (or more - you might employ std::initializer_list or C++17 fold expressions) string objects, to save allocating temporary space which is immediately freed and reallocated:



            public:
            string operator+(const string & str) const {
            return {*this, str};
            }

            private:
            string(const string& a, const string& b)
            : data{new char[a.m_size + b.m_size]},
            m_size{a.m_size + b.m_size}
            {
            if (a.data)
            memcpy(data, a.data, a.m_size);
            if (b.data)
            memcpy(data+a.m_size, b.data, b.m_size);
            }


            That concatenation constructor could be more concise if you swap the order of m_size and data in the class.



            Tests



            It's great that you include the tests in the review. It helps show how you expect your class to be used, and it helps identify tests you've missed. In my opinion, all review requests should include tests.



            Here I changed the initializer of a to not include its NUL and that of b to use my additional constructor:



            string a(p1,sizeof t1-1);
            string b(p2);


            I took out the assignment e=d; so that I could test operator+() with the empty string:



            std::cout << "a+e:t" << a+e;
            std::cout << "e+e:t" << e+e;
            std::cout << "e+a:t" << e+a;


            It would be worth implementing the == operator soon, and then you could make the tests self-checking (affecting the exit status, rather than requiring manual inspection).






            share|improve this answer











            $endgroup$



            When I run your code, Valgrind reports 11 allocations and a matching 11 frees, with no leaks. So I don't see why your run indicated a non-freed block.



            I did have to add includes for <cstring> and <iostream> to get it to compile - please don't omit your includes, even if the code is long! Other than that, this code was complete; it's clear and easy to read.



            Some observations:



            Initializers



            g++ -Weffc++ turns up some missing initialisers that are worth addressing. For example:



            string() : data(nullptr) {
            std::cout << "Default ctor" << std::endl;
            }


            Would be better if it also initialized m_size:



            string()
            : data(nullptr),
            m_size(0)
            {
            std::cout << "Default ctor" << std::endl;
            }


            And this one (the comment lies):



            string(const char * p, size_t size) : m_size(size) { //default constructor
            std::cout << "Special ctor" << std::endl;
            data = new char[m_size];
            memcpy(data, p, m_size);
            }


            could be



            string(const char *p, size_t size)
            : data(new char[size]),
            m_size(size)
            {
            std::cout << "Converting ctor" << std::endl;
            std::memcpy(data, p, m_size);
            }


            I prefer one initialiser per line, as I do for assignments - but you're welcome to disagree.



            Constructors - general



            I'd add a constructor that takes a null-terminated string:



            string(const char *p) : string(p, strlen(p)) {}


            Assignment operator



            You can simplify your assignment operator using the copy and swap idiom:



            friend void swap(string &a, string& b)
            {
            using std::swap;
            swap(a.data, b.data);
            swap(a.m_size, b.m_size);
            }

            string& operator=(string str) {
            std::cout << "Copy assignment" << std::endl;
            swap(*this, str);
            return *this;
            }


            Writing it like this means that you don't need to overload it on const string& and string&& - passing by value will construct str using the appropriate constructor. See the linked answer for details.



            Copy-and-swap does have some performance cost to gain that simplicity and safety. See Scott Meyers's article Drawbacks of Implementing Move Assignment in Terms of Swap for a full explanation. If that's a concern, then you should keep the overload for move-assignment.



            Append operator



            You use strcat() to append the second string to the first, but that assumes that both are null-terminated. You need to use memmove() (preferably memcpy) for both:



            string& operator+=(const string & str) {
            size_t new_size = m_size + str.m_size;
            char * new_data = new char[new_size];
            memcpy(new_data, data, m_size);
            memcpy(new_data+m_size, str.data, str.m_size);
            delete[] data;
            m_size = new_size;
            data = new_data;
            return *this;
            }


            The ostream operator also depends on a trailing NUL, so here's a safe replacement:



            friend std::ostream& operator<<(std::ostream& os, const string& s) {
            for (size_t i = 0; i < s.m_size; ++i)
            os << s.data[i];
            return os << "n";
            }


            Addition operator



            Addition is conventionally a const method; I'm not very fond of casts:



            string operator+(const string & str) const {
            string s{*this};
            return s += str;
            }


            You may prefer to add a constructor taking two (or more - you might employ std::initializer_list or C++17 fold expressions) string objects, to save allocating temporary space which is immediately freed and reallocated:



            public:
            string operator+(const string & str) const {
            return {*this, str};
            }

            private:
            string(const string& a, const string& b)
            : data{new char[a.m_size + b.m_size]},
            m_size{a.m_size + b.m_size}
            {
            if (a.data)
            memcpy(data, a.data, a.m_size);
            if (b.data)
            memcpy(data+a.m_size, b.data, b.m_size);
            }


            That concatenation constructor could be more concise if you swap the order of m_size and data in the class.



            Tests



            It's great that you include the tests in the review. It helps show how you expect your class to be used, and it helps identify tests you've missed. In my opinion, all review requests should include tests.



            Here I changed the initializer of a to not include its NUL and that of b to use my additional constructor:



            string a(p1,sizeof t1-1);
            string b(p2);


            I took out the assignment e=d; so that I could test operator+() with the empty string:



            std::cout << "a+e:t" << a+e;
            std::cout << "e+e:t" << e+e;
            std::cout << "e+a:t" << e+a;


            It would be worth implementing the == operator soon, and then you could make the tests self-checking (affecting the exit status, rather than requiring manual inspection).







            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Dec 16 '16 at 9:51

























            answered Dec 15 '16 at 18:51









            Toby SpeightToby Speight

            26.1k742118




            26.1k742118












            • $begingroup$
              An additional observation: operator+=() can avoid some allocations. Check if *this is empty (return *this = str;) or if str is empty - return *this;.
              $endgroup$
              – Toby Speight
              Jan 19 '17 at 7:18


















            • $begingroup$
              An additional observation: operator+=() can avoid some allocations. Check if *this is empty (return *this = str;) or if str is empty - return *this;.
              $endgroup$
              – Toby Speight
              Jan 19 '17 at 7:18
















            $begingroup$
            An additional observation: operator+=() can avoid some allocations. Check if *this is empty (return *this = str;) or if str is empty - return *this;.
            $endgroup$
            – Toby Speight
            Jan 19 '17 at 7:18




            $begingroup$
            An additional observation: operator+=() can avoid some allocations. Check if *this is empty (return *this = str;) or if str is empty - return *this;.
            $endgroup$
            – Toby Speight
            Jan 19 '17 at 7:18


















            draft saved

            draft discarded




















































            Thanks for contributing an answer to Code Review Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            Use MathJax to format equations. MathJax reference.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f149917%2frule-of-five-and-move-semantics-for-a-string-class%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Webac Holding Inhaltsverzeichnis Geschichte | Organisationsstruktur | Tochterfirmen |...

            What's the meaning of a knight fighting a snail in medieval book illustrations?What is the meaning of a glove...

            Salamanca Inhaltsverzeichnis Lage und Klima | Bevölkerungsentwicklung | Geschichte | Kultur und...