Basic substring algorithms + auxiliary string-generating functionsRecursive and iterative Euclidean...

Can a dragon be stuck looking like a human?

Placing an adverb between a verb and an object?

What is the purpose of easy combat scenarios that don't need resource expenditure?

Dilemma of explaining to interviewer that he is the reason for declining second interview

Chess tournament winning streaks

Notes in a lick that don't fit in the scale associated with the chord

Does fast page mode apply to ROM?

Parsing a string of key-value pairs as a dictionary

Do authors have to be politically correct in article-writing?

Disable the ">" operator in Rstudio linux terminal

Would a National Army of mercenaries be a feasible idea?

What's the most convenient time of year to end the world?

Check if the digits in the number are in increasing sequence in python

It took me a lot of time to make this, pls like. (YouTube Comments #1)

Cryptic with missing capitals

Isn't using the Extrusion Multiplier like cheating?

Word or phrase for showing great skill at something without formal training in it

Is there some relative to Dutch word "kijken" in German?

why a subspace is closed?

Program that converts a number to a letter of the alphabet

What does Cypher mean when he says Neo is "gonna pop"?

How would an AI self awareness kill switch work?

Why did this image turn out darker?

Does Improved Divine Smite trigger when a paladin makes an unarmed strike?



Basic substring algorithms + auxiliary string-generating functions


Recursive and iterative Euclidean algorithmsBinary tree traversal algorithmsString algorithms and localeSorting algorithms with utility functions and iteratorsPrefix- and z-functions in C++ (string algorithms)Minimal length of substring that forms cyclic stringVisualization of sorting algorithmsExtending functions to tuplesGenerating maze for complicated Hunt the Wumpus gameSTL-Style algorithms on tuples













5












$begingroup$


Summary: a bunch of algorithms to find if particular substring exists in the string. Meanwhile there's a lot to learn and I am looking forward to implementing more complex ones, I suppose there's some crucial insight regarding style, syntax, or design, which I might be missing. I have also managed to implement the auxiliary functions which potentially would assist me in the future.



Note: the question I have is how to avoid the boilerplate part in the main function. There should have been a some kind of "framework", which would accept the vector of functions and perform the tests. However, for me it's not quite clear how the architecture should be designed.



#include <algorithm>
#include <cassert>
#include <functional>
#include <iostream>
#include <random>
#include <string_view>
#include <utility>
#include <vector>

template <typename T>
std::ostream &operator<<(std::ostream &o, std::vector<T> const &vector)
{ for (auto const &element: vector) o << element << " "; return o; }

namespace string_generator_utility {

std::string generate_string(std::string::size_type length,
std::vector<char>&& alphabet = {'a', 'b', 'c', 'd'}) {
std::string result;
static std::random_device rd;
static std::mt19937_64 mersenne_twister_generator {rd ()};
const auto supremum = alphabet.size() - 1;
std::uniform_int_distribution<std::size_t> range {0, supremum};
for (std::size_t index = 0; index < length; ++index) {
const auto letter = range(mersenne_twister_generator);
result += alphabet[letter];
}
return result;
}

std::vector<std::string> generate_n_strings(std::size_t vector_size, std::size_t string_length,
std::vector<char>&& alphabet = {'a', 'b', 'c', 'd'}) {
std::vector<std::string> generated;
generated.reserve(vector_size);
std::generate_n(std::back_inserter(generated), vector_size,
[&]() { return generate_string(string_length, std::move(alphabet)); });
return generated;
}

} // namespace string_generator_utility

namespace algo {

std::vector<std::int64_t> naive_substring(std::string_view haystack, std::string_view needle) {

const auto haystack_size = haystack.size();
const auto needle_size = needle.size();
assert(haystack_size >= needle_size);

std::vector<std::int64_t> result;
for (std::size_t index = 0; index < haystack_size - needle_size + 1; ++index) {
std::size_t offset = 0;
for (; offset < needle_size; ++offset) {
if (haystack[index + offset] != needle[offset])
break;
}
if (offset == needle_size)
result.push_back(index);
}
return result;
}

std::vector<std::int64_t> rabin_karp_hash(std::string_view haystack, std::string_view needle) {

const auto haystack_size = haystack.size();
const auto needle_size = needle.size();
assert(haystack_size >= needle_size);

std::vector<std::int64_t> matches;
static const auto hash_function = [](std::string_view::iterator begin, std::string_view::iterator end) {
std::int64_t hash = 5381;
for (; begin != end; ++begin)
hash = ((hash << 5) + hash) + *begin;
return hash;
};

const auto needle_hashed = hash_function(std::begin(needle), std::end(needle));
for (std::size_t index = 0; index < haystack_size - needle_size + 1; ++index) {
const auto substring_hash = hash_function
(
std::begin(haystack) + index, std::begin(haystack) + index + needle_size
);
if (substring_hash == needle_hashed)
matches.push_back(index);
}
return matches;
}
} // namespace algo

int main() {
auto vector = string_generator_utility::generate_n_strings(25, 50);
std::cout << "naive substring:n";
for (std::size_t index = 0; index < vector.size(); ++index)
{
std::cout << vector[index] << ": ";
auto shift = algo::naive_substring(vector[index], "ab");
std::cout << shift << "n";
}
std::cout << "rabin-karp-substring:n";
for (std::size_t index = 0; index < vector.size(); ++index)
{
std::cout << vector[index] << ": ";
auto shift = algo::rabin_karp_hash(vector[index], "ab");
std::cout << shift << "n";
}
return 0;
}









share|improve this question











$endgroup$





This question has an open bounty worth +50
reputation from Inter Veridium ending in 7 days.


The current answers do not contain enough detail.





















    5












    $begingroup$


    Summary: a bunch of algorithms to find if particular substring exists in the string. Meanwhile there's a lot to learn and I am looking forward to implementing more complex ones, I suppose there's some crucial insight regarding style, syntax, or design, which I might be missing. I have also managed to implement the auxiliary functions which potentially would assist me in the future.



    Note: the question I have is how to avoid the boilerplate part in the main function. There should have been a some kind of "framework", which would accept the vector of functions and perform the tests. However, for me it's not quite clear how the architecture should be designed.



    #include <algorithm>
    #include <cassert>
    #include <functional>
    #include <iostream>
    #include <random>
    #include <string_view>
    #include <utility>
    #include <vector>

    template <typename T>
    std::ostream &operator<<(std::ostream &o, std::vector<T> const &vector)
    { for (auto const &element: vector) o << element << " "; return o; }

    namespace string_generator_utility {

    std::string generate_string(std::string::size_type length,
    std::vector<char>&& alphabet = {'a', 'b', 'c', 'd'}) {
    std::string result;
    static std::random_device rd;
    static std::mt19937_64 mersenne_twister_generator {rd ()};
    const auto supremum = alphabet.size() - 1;
    std::uniform_int_distribution<std::size_t> range {0, supremum};
    for (std::size_t index = 0; index < length; ++index) {
    const auto letter = range(mersenne_twister_generator);
    result += alphabet[letter];
    }
    return result;
    }

    std::vector<std::string> generate_n_strings(std::size_t vector_size, std::size_t string_length,
    std::vector<char>&& alphabet = {'a', 'b', 'c', 'd'}) {
    std::vector<std::string> generated;
    generated.reserve(vector_size);
    std::generate_n(std::back_inserter(generated), vector_size,
    [&]() { return generate_string(string_length, std::move(alphabet)); });
    return generated;
    }

    } // namespace string_generator_utility

    namespace algo {

    std::vector<std::int64_t> naive_substring(std::string_view haystack, std::string_view needle) {

    const auto haystack_size = haystack.size();
    const auto needle_size = needle.size();
    assert(haystack_size >= needle_size);

    std::vector<std::int64_t> result;
    for (std::size_t index = 0; index < haystack_size - needle_size + 1; ++index) {
    std::size_t offset = 0;
    for (; offset < needle_size; ++offset) {
    if (haystack[index + offset] != needle[offset])
    break;
    }
    if (offset == needle_size)
    result.push_back(index);
    }
    return result;
    }

    std::vector<std::int64_t> rabin_karp_hash(std::string_view haystack, std::string_view needle) {

    const auto haystack_size = haystack.size();
    const auto needle_size = needle.size();
    assert(haystack_size >= needle_size);

    std::vector<std::int64_t> matches;
    static const auto hash_function = [](std::string_view::iterator begin, std::string_view::iterator end) {
    std::int64_t hash = 5381;
    for (; begin != end; ++begin)
    hash = ((hash << 5) + hash) + *begin;
    return hash;
    };

    const auto needle_hashed = hash_function(std::begin(needle), std::end(needle));
    for (std::size_t index = 0; index < haystack_size - needle_size + 1; ++index) {
    const auto substring_hash = hash_function
    (
    std::begin(haystack) + index, std::begin(haystack) + index + needle_size
    );
    if (substring_hash == needle_hashed)
    matches.push_back(index);
    }
    return matches;
    }
    } // namespace algo

    int main() {
    auto vector = string_generator_utility::generate_n_strings(25, 50);
    std::cout << "naive substring:n";
    for (std::size_t index = 0; index < vector.size(); ++index)
    {
    std::cout << vector[index] << ": ";
    auto shift = algo::naive_substring(vector[index], "ab");
    std::cout << shift << "n";
    }
    std::cout << "rabin-karp-substring:n";
    for (std::size_t index = 0; index < vector.size(); ++index)
    {
    std::cout << vector[index] << ": ";
    auto shift = algo::rabin_karp_hash(vector[index], "ab");
    std::cout << shift << "n";
    }
    return 0;
    }









    share|improve this question











    $endgroup$





    This question has an open bounty worth +50
    reputation from Inter Veridium ending in 7 days.


    The current answers do not contain enough detail.



















      5












      5








      5





      $begingroup$


      Summary: a bunch of algorithms to find if particular substring exists in the string. Meanwhile there's a lot to learn and I am looking forward to implementing more complex ones, I suppose there's some crucial insight regarding style, syntax, or design, which I might be missing. I have also managed to implement the auxiliary functions which potentially would assist me in the future.



      Note: the question I have is how to avoid the boilerplate part in the main function. There should have been a some kind of "framework", which would accept the vector of functions and perform the tests. However, for me it's not quite clear how the architecture should be designed.



      #include <algorithm>
      #include <cassert>
      #include <functional>
      #include <iostream>
      #include <random>
      #include <string_view>
      #include <utility>
      #include <vector>

      template <typename T>
      std::ostream &operator<<(std::ostream &o, std::vector<T> const &vector)
      { for (auto const &element: vector) o << element << " "; return o; }

      namespace string_generator_utility {

      std::string generate_string(std::string::size_type length,
      std::vector<char>&& alphabet = {'a', 'b', 'c', 'd'}) {
      std::string result;
      static std::random_device rd;
      static std::mt19937_64 mersenne_twister_generator {rd ()};
      const auto supremum = alphabet.size() - 1;
      std::uniform_int_distribution<std::size_t> range {0, supremum};
      for (std::size_t index = 0; index < length; ++index) {
      const auto letter = range(mersenne_twister_generator);
      result += alphabet[letter];
      }
      return result;
      }

      std::vector<std::string> generate_n_strings(std::size_t vector_size, std::size_t string_length,
      std::vector<char>&& alphabet = {'a', 'b', 'c', 'd'}) {
      std::vector<std::string> generated;
      generated.reserve(vector_size);
      std::generate_n(std::back_inserter(generated), vector_size,
      [&]() { return generate_string(string_length, std::move(alphabet)); });
      return generated;
      }

      } // namespace string_generator_utility

      namespace algo {

      std::vector<std::int64_t> naive_substring(std::string_view haystack, std::string_view needle) {

      const auto haystack_size = haystack.size();
      const auto needle_size = needle.size();
      assert(haystack_size >= needle_size);

      std::vector<std::int64_t> result;
      for (std::size_t index = 0; index < haystack_size - needle_size + 1; ++index) {
      std::size_t offset = 0;
      for (; offset < needle_size; ++offset) {
      if (haystack[index + offset] != needle[offset])
      break;
      }
      if (offset == needle_size)
      result.push_back(index);
      }
      return result;
      }

      std::vector<std::int64_t> rabin_karp_hash(std::string_view haystack, std::string_view needle) {

      const auto haystack_size = haystack.size();
      const auto needle_size = needle.size();
      assert(haystack_size >= needle_size);

      std::vector<std::int64_t> matches;
      static const auto hash_function = [](std::string_view::iterator begin, std::string_view::iterator end) {
      std::int64_t hash = 5381;
      for (; begin != end; ++begin)
      hash = ((hash << 5) + hash) + *begin;
      return hash;
      };

      const auto needle_hashed = hash_function(std::begin(needle), std::end(needle));
      for (std::size_t index = 0; index < haystack_size - needle_size + 1; ++index) {
      const auto substring_hash = hash_function
      (
      std::begin(haystack) + index, std::begin(haystack) + index + needle_size
      );
      if (substring_hash == needle_hashed)
      matches.push_back(index);
      }
      return matches;
      }
      } // namespace algo

      int main() {
      auto vector = string_generator_utility::generate_n_strings(25, 50);
      std::cout << "naive substring:n";
      for (std::size_t index = 0; index < vector.size(); ++index)
      {
      std::cout << vector[index] << ": ";
      auto shift = algo::naive_substring(vector[index], "ab");
      std::cout << shift << "n";
      }
      std::cout << "rabin-karp-substring:n";
      for (std::size_t index = 0; index < vector.size(); ++index)
      {
      std::cout << vector[index] << ": ";
      auto shift = algo::rabin_karp_hash(vector[index], "ab");
      std::cout << shift << "n";
      }
      return 0;
      }









      share|improve this question











      $endgroup$




      Summary: a bunch of algorithms to find if particular substring exists in the string. Meanwhile there's a lot to learn and I am looking forward to implementing more complex ones, I suppose there's some crucial insight regarding style, syntax, or design, which I might be missing. I have also managed to implement the auxiliary functions which potentially would assist me in the future.



      Note: the question I have is how to avoid the boilerplate part in the main function. There should have been a some kind of "framework", which would accept the vector of functions and perform the tests. However, for me it's not quite clear how the architecture should be designed.



      #include <algorithm>
      #include <cassert>
      #include <functional>
      #include <iostream>
      #include <random>
      #include <string_view>
      #include <utility>
      #include <vector>

      template <typename T>
      std::ostream &operator<<(std::ostream &o, std::vector<T> const &vector)
      { for (auto const &element: vector) o << element << " "; return o; }

      namespace string_generator_utility {

      std::string generate_string(std::string::size_type length,
      std::vector<char>&& alphabet = {'a', 'b', 'c', 'd'}) {
      std::string result;
      static std::random_device rd;
      static std::mt19937_64 mersenne_twister_generator {rd ()};
      const auto supremum = alphabet.size() - 1;
      std::uniform_int_distribution<std::size_t> range {0, supremum};
      for (std::size_t index = 0; index < length; ++index) {
      const auto letter = range(mersenne_twister_generator);
      result += alphabet[letter];
      }
      return result;
      }

      std::vector<std::string> generate_n_strings(std::size_t vector_size, std::size_t string_length,
      std::vector<char>&& alphabet = {'a', 'b', 'c', 'd'}) {
      std::vector<std::string> generated;
      generated.reserve(vector_size);
      std::generate_n(std::back_inserter(generated), vector_size,
      [&]() { return generate_string(string_length, std::move(alphabet)); });
      return generated;
      }

      } // namespace string_generator_utility

      namespace algo {

      std::vector<std::int64_t> naive_substring(std::string_view haystack, std::string_view needle) {

      const auto haystack_size = haystack.size();
      const auto needle_size = needle.size();
      assert(haystack_size >= needle_size);

      std::vector<std::int64_t> result;
      for (std::size_t index = 0; index < haystack_size - needle_size + 1; ++index) {
      std::size_t offset = 0;
      for (; offset < needle_size; ++offset) {
      if (haystack[index + offset] != needle[offset])
      break;
      }
      if (offset == needle_size)
      result.push_back(index);
      }
      return result;
      }

      std::vector<std::int64_t> rabin_karp_hash(std::string_view haystack, std::string_view needle) {

      const auto haystack_size = haystack.size();
      const auto needle_size = needle.size();
      assert(haystack_size >= needle_size);

      std::vector<std::int64_t> matches;
      static const auto hash_function = [](std::string_view::iterator begin, std::string_view::iterator end) {
      std::int64_t hash = 5381;
      for (; begin != end; ++begin)
      hash = ((hash << 5) + hash) + *begin;
      return hash;
      };

      const auto needle_hashed = hash_function(std::begin(needle), std::end(needle));
      for (std::size_t index = 0; index < haystack_size - needle_size + 1; ++index) {
      const auto substring_hash = hash_function
      (
      std::begin(haystack) + index, std::begin(haystack) + index + needle_size
      );
      if (substring_hash == needle_hashed)
      matches.push_back(index);
      }
      return matches;
      }
      } // namespace algo

      int main() {
      auto vector = string_generator_utility::generate_n_strings(25, 50);
      std::cout << "naive substring:n";
      for (std::size_t index = 0; index < vector.size(); ++index)
      {
      std::cout << vector[index] << ": ";
      auto shift = algo::naive_substring(vector[index], "ab");
      std::cout << shift << "n";
      }
      std::cout << "rabin-karp-substring:n";
      for (std::size_t index = 0; index < vector.size(); ++index)
      {
      std::cout << vector[index] << ": ";
      auto shift = algo::rabin_karp_hash(vector[index], "ab");
      std::cout << shift << "n";
      }
      return 0;
      }






      c++ algorithm c++17






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited 10 hours ago







      Inter Veridium

















      asked Feb 7 at 10:16









      Inter VeridiumInter Veridium

      355




      355






      This question has an open bounty worth +50
      reputation from Inter Veridium ending in 7 days.


      The current answers do not contain enough detail.








      This question has an open bounty worth +50
      reputation from Inter Veridium ending in 7 days.


      The current answers do not contain enough detail.
























          2 Answers
          2






          active

          oldest

          votes


















          2












          $begingroup$

          From a readability viewpoint, your operator<< for std::vector declared at the top of your file should be spread out onto multiple lines so that it is easier to read and understand.



          Also, the calculation of ((hash << 5) + hash) should be rewritten as the simpler hash * 33. The compiler will know the best way to multiply a number by 33. This could be a multiply, a shift-and-add like you've coded, or some sequence involving the address calculation instructions.



          Rather than using an assert to verify that the needle is not longer than the haystack (which will only check the condition if the NDEBUG macro is not defined), just check the condition and return an empty collection.



          In rabin_karp_hash you assume that two strings match if their hash values are the same. This is not necessarily the case. It is possible, however unlikely, that two different strings will have the same hash value. This is a hash collision. To ensure that your potential match are identical strings, you still need to compare both strings when the hashes match.



          To simplify the code in main and eliminate the duplication, you can create a class with a virtual compare member. Then derive two classes from it, one for the naive comparison, the other for the Rabin-Karp one. Put your loop into another function, and pass instances of the appropriate derived class to use the specific comparison you want to test.






          share|improve this answer









          $endgroup$





















            1












            $begingroup$


            1. I think the assertion in the rabin_karp_hash is redundant. There is nothing wrong with trying to find a substring with a size exceeding that of a text (although the result is self-evident), besides you can just return an empty vector there.



            2. I'd replace



              for (; offset < needle_size; ++offset) {
              if (haystack[index + offset] != needle[offset])
              break;
              }


              with something like



              while (offset < needle_size && haystack[index + offset] == needle[offset])
              ++offset;







            share|improve this answer











            $endgroup$













              Your Answer





              StackExchange.ifUsing("editor", function () {
              return StackExchange.using("mathjaxEditing", function () {
              StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
              StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
              });
              });
              }, "mathjax-editing");

              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "196"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: false,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: null,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f213023%2fbasic-substring-algorithms-auxiliary-string-generating-functions%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              2 Answers
              2






              active

              oldest

              votes








              2 Answers
              2






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              2












              $begingroup$

              From a readability viewpoint, your operator<< for std::vector declared at the top of your file should be spread out onto multiple lines so that it is easier to read and understand.



              Also, the calculation of ((hash << 5) + hash) should be rewritten as the simpler hash * 33. The compiler will know the best way to multiply a number by 33. This could be a multiply, a shift-and-add like you've coded, or some sequence involving the address calculation instructions.



              Rather than using an assert to verify that the needle is not longer than the haystack (which will only check the condition if the NDEBUG macro is not defined), just check the condition and return an empty collection.



              In rabin_karp_hash you assume that two strings match if their hash values are the same. This is not necessarily the case. It is possible, however unlikely, that two different strings will have the same hash value. This is a hash collision. To ensure that your potential match are identical strings, you still need to compare both strings when the hashes match.



              To simplify the code in main and eliminate the duplication, you can create a class with a virtual compare member. Then derive two classes from it, one for the naive comparison, the other for the Rabin-Karp one. Put your loop into another function, and pass instances of the appropriate derived class to use the specific comparison you want to test.






              share|improve this answer









              $endgroup$


















                2












                $begingroup$

                From a readability viewpoint, your operator<< for std::vector declared at the top of your file should be spread out onto multiple lines so that it is easier to read and understand.



                Also, the calculation of ((hash << 5) + hash) should be rewritten as the simpler hash * 33. The compiler will know the best way to multiply a number by 33. This could be a multiply, a shift-and-add like you've coded, or some sequence involving the address calculation instructions.



                Rather than using an assert to verify that the needle is not longer than the haystack (which will only check the condition if the NDEBUG macro is not defined), just check the condition and return an empty collection.



                In rabin_karp_hash you assume that two strings match if their hash values are the same. This is not necessarily the case. It is possible, however unlikely, that two different strings will have the same hash value. This is a hash collision. To ensure that your potential match are identical strings, you still need to compare both strings when the hashes match.



                To simplify the code in main and eliminate the duplication, you can create a class with a virtual compare member. Then derive two classes from it, one for the naive comparison, the other for the Rabin-Karp one. Put your loop into another function, and pass instances of the appropriate derived class to use the specific comparison you want to test.






                share|improve this answer









                $endgroup$
















                  2












                  2








                  2





                  $begingroup$

                  From a readability viewpoint, your operator<< for std::vector declared at the top of your file should be spread out onto multiple lines so that it is easier to read and understand.



                  Also, the calculation of ((hash << 5) + hash) should be rewritten as the simpler hash * 33. The compiler will know the best way to multiply a number by 33. This could be a multiply, a shift-and-add like you've coded, or some sequence involving the address calculation instructions.



                  Rather than using an assert to verify that the needle is not longer than the haystack (which will only check the condition if the NDEBUG macro is not defined), just check the condition and return an empty collection.



                  In rabin_karp_hash you assume that two strings match if their hash values are the same. This is not necessarily the case. It is possible, however unlikely, that two different strings will have the same hash value. This is a hash collision. To ensure that your potential match are identical strings, you still need to compare both strings when the hashes match.



                  To simplify the code in main and eliminate the duplication, you can create a class with a virtual compare member. Then derive two classes from it, one for the naive comparison, the other for the Rabin-Karp one. Put your loop into another function, and pass instances of the appropriate derived class to use the specific comparison you want to test.






                  share|improve this answer









                  $endgroup$



                  From a readability viewpoint, your operator<< for std::vector declared at the top of your file should be spread out onto multiple lines so that it is easier to read and understand.



                  Also, the calculation of ((hash << 5) + hash) should be rewritten as the simpler hash * 33. The compiler will know the best way to multiply a number by 33. This could be a multiply, a shift-and-add like you've coded, or some sequence involving the address calculation instructions.



                  Rather than using an assert to verify that the needle is not longer than the haystack (which will only check the condition if the NDEBUG macro is not defined), just check the condition and return an empty collection.



                  In rabin_karp_hash you assume that two strings match if their hash values are the same. This is not necessarily the case. It is possible, however unlikely, that two different strings will have the same hash value. This is a hash collision. To ensure that your potential match are identical strings, you still need to compare both strings when the hashes match.



                  To simplify the code in main and eliminate the duplication, you can create a class with a virtual compare member. Then derive two classes from it, one for the naive comparison, the other for the Rabin-Karp one. Put your loop into another function, and pass instances of the appropriate derived class to use the specific comparison you want to test.







                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered 10 hours ago









                  1201ProgramAlarm1201ProgramAlarm

                  3,2532923




                  3,2532923

























                      1












                      $begingroup$


                      1. I think the assertion in the rabin_karp_hash is redundant. There is nothing wrong with trying to find a substring with a size exceeding that of a text (although the result is self-evident), besides you can just return an empty vector there.



                      2. I'd replace



                        for (; offset < needle_size; ++offset) {
                        if (haystack[index + offset] != needle[offset])
                        break;
                        }


                        with something like



                        while (offset < needle_size && haystack[index + offset] == needle[offset])
                        ++offset;







                      share|improve this answer











                      $endgroup$


















                        1












                        $begingroup$


                        1. I think the assertion in the rabin_karp_hash is redundant. There is nothing wrong with trying to find a substring with a size exceeding that of a text (although the result is self-evident), besides you can just return an empty vector there.



                        2. I'd replace



                          for (; offset < needle_size; ++offset) {
                          if (haystack[index + offset] != needle[offset])
                          break;
                          }


                          with something like



                          while (offset < needle_size && haystack[index + offset] == needle[offset])
                          ++offset;







                        share|improve this answer











                        $endgroup$
















                          1












                          1








                          1





                          $begingroup$


                          1. I think the assertion in the rabin_karp_hash is redundant. There is nothing wrong with trying to find a substring with a size exceeding that of a text (although the result is self-evident), besides you can just return an empty vector there.



                          2. I'd replace



                            for (; offset < needle_size; ++offset) {
                            if (haystack[index + offset] != needle[offset])
                            break;
                            }


                            with something like



                            while (offset < needle_size && haystack[index + offset] == needle[offset])
                            ++offset;







                          share|improve this answer











                          $endgroup$




                          1. I think the assertion in the rabin_karp_hash is redundant. There is nothing wrong with trying to find a substring with a size exceeding that of a text (although the result is self-evident), besides you can just return an empty vector there.



                          2. I'd replace



                            for (; offset < needle_size; ++offset) {
                            if (haystack[index + offset] != needle[offset])
                            break;
                            }


                            with something like



                            while (offset < needle_size && haystack[index + offset] == needle[offset])
                            ++offset;








                          share|improve this answer














                          share|improve this answer



                          share|improve this answer








                          edited Feb 8 at 4:24









                          Jamal

                          30.3k11119227




                          30.3k11119227










                          answered Feb 7 at 16:13









                          MikhailMikhail

                          111




                          111






























                              draft saved

                              draft discarded




















































                              Thanks for contributing an answer to Code Review Stack Exchange!


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              Use MathJax to format equations. MathJax reference.


                              To learn more, see our tips on writing great answers.




                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function () {
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f213023%2fbasic-substring-algorithms-auxiliary-string-generating-functions%23new-answer', 'question_page');
                              }
                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown







                              Popular posts from this blog

                              is 'sed' thread safeWhat should someone know about using Python scripts in the shell?Nexenta bash script uses...

                              How do i solve the “ No module named 'mlxtend' ” issue on Jupyter?

                              Pilgersdorf Inhaltsverzeichnis Geografie | Geschichte | Bevölkerungsentwicklung | Politik | Kultur...