Verify two numbers from a list will add up to a number in the same list using Python


I am trying to solve this question:




  • Given a list of integers, verify whether any two numbers from this list add up to another number in the same list.


I came up with:



def func(l):
    sum_set = set()  # using a set to store sums of two numbers
    for index in range(len(l)-1):
        counter = 1
        while counter + index < len(l):
            sum = l[index] + l[index+counter]
            sum_set.add(sum)
            counter += 1
    for number in l:
        if number in sum_set:
            return True
    return False

print(func([1,2,3,4]))
print(func([1,2,1,4]))
print(func([1,1,1]))
print(func([1]))
print(func([1,2,5,9]))
print(func([2,1,1]))
print(func([1,-1,0]))


From the tests I ran above, my function appears to work. But there is at least one way in which I can see my approach is lacking:




  • I should filter the original list to get rid of redundant duplicates. If a number appears more than twice, the extra occurrences can be disregarded; e.g. l = [1,1,1,1,2] could be reduced to l = [1,1,2].


Are there any other aspects of my code snippet I can improve on to make it more Pythonic or more efficient?
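For reference, a more compact version of the same pairwise idea (a sketch only; the name func_compact is just illustrative), using itertools.combinations and a set so it can exit as soon as a matching sum is found:

from itertools import combinations

def func_compact(numbers):
    targets = set(numbers)  # O(1) average membership checks
    # stop at the first pair whose sum is also present in the list
    return any(a + b in targets for a, b in combinations(numbers, 2))

print(func_compact([1, 2, 3, 4]))  # True, since 1 + 2 == 3
print(func_compact([1, 2, 5, 9]))  # False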










Tagged: python. Asked Apr 2 at 5:45 by Yu Zhang.

1 Answer



















The thing to be concerned with when writing functions like this is the size of the data they will be processing. If you're only ever going to be dealing with small sets of data (say, fewer than 100 items), there really isn't an issue with the way this is written. At the scale of tens of thousands of items, it starts to become a problem.



I'm not a data scientist, just a web developer, and I very rarely delve into optimising code, so the advice I'm giving is based on my personal experience plus university, and it doesn't use maths libraries in Python like numpy, which you might want to look into depending on what this code is for. Another answerer could likely help more with that specific angle; my answer is going to be a bit more general, assuming you're going to be dealing with large amounts of data (otherwise, premature optimisation is sometimes said to be the root of all evil):





          EDIT: I just noticed that one of your examples included negative numbers, so the example code below will need to be adapted with that in mind since it affects the movement of i and j; sorry about that, but the code provided can still be run as an example with positive numbers.



Ignoring the data types used for the most part, and how Pythonic the code is, I see two main areas for improvement. The first I'll touch on at the end in a code example, and how much it helps may depend on your particular data set; but as it stands, the code has no chance of finishing early (with random inputs, an early exit would give you a substantial speedup, especially if your data is likely to contain a match among the early items of the list).



However, the primary areas for improvement can be identified by analysing your code with Big-O notation. I'd suggest reading up on the concept if you're unfamiliar with it, but according to Wikipedia:




          big O notation is used to classify algorithms according to how their running time or space requirements grow as the input size grows




          Given the following section of your code:



for index in range(len(l)-1):          # (1)
    ...
    while counter + index < len(l):    # (2)
        ...
        sum_set.add(sum)               # (3)
        ...
for number in l:                       # (4)


If we look at, for instance, a 100,000-element array containing the numbers between 1 and 100,000 and run your code as written, we end up running the sum_set.add(sum) statement (3) about 4.99 billion times. At just 10,000 elements, on my machine, the code as written takes multiple seconds to finish.



Looking at (1), we see this statement runs through all elements in the list and is therefore O(N): the time taken by the outer loop grows linearly with the input, i.e. an input array of 200 elements, ignoring any constant-time overhead, should take roughly 200x longer than an array with 1 element. Line (2) passes through N-1 elements on the first iteration, then N-2, and so on down to 1 item on the last; that averages out to half as many iterations as the outer loop, but it is still O(N), as it is linearly related to the number of items in the list. Since you have an O(N) loop inside an O(N) loop, this gives the code O(N^2) performance overall.



(4) is trickier to estimate, as it depends on the data passed in. Iterating through the list l is O(N) again, and each membership test against the set is O(1) on average, so the loop itself stays linear. The real cost is in building sum_set: in the worst case every pairwise sum is unique (e.g. an input like [1, 11, 111, ...]), so sum_set ends up holding ~N^2 entries, which is expensive in memory as well as in the time spent adding them. In the best case sum_set stays very small; since it is a set there are no duplicates, so its size depends entirely on the input data (e.g. 100,000 elements whose values range between 0 and 100 give a sum_set whose values range only between 0 and 200).



          I'd say your suggestion of pre-filtering to remove duplicates is a good idea, ideally something O(N) like the following (though there are likely more optimal approaches):



>>> import collections
>>> input_list = [1,1,1,1,2,2,2,3,3,3]
>>> filtering_dict = collections.defaultdict(int)
>>> for item in input_list:
...     filtering_dict[item] += 1
...
>>> newList = []
>>> for key, value in filtering_dict.items():
...     newList.append(key)
...     if value > 1:
...         newList.append(key)
...
>>> newList
[1, 1, 2, 2, 3, 3]
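The same filtering could also be written with collections.Counter; the sketch below is just a variant of the idea above (the helper name keep_at_most_two is illustrative), capping each value at two occurrences:

from collections import Counter

def keep_at_most_two(values):
    counts = Counter(values)                      # value -> number of occurrences
    filtered = []
    for value, count in counts.items():
        filtered.extend([value] * min(count, 2))  # keep each value at most twice
    return filtered

print(keep_at_most_two([1, 1, 1, 1, 2, 2, 2, 3, 3, 3]))  # [1, 1, 2, 2, 3, 3]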


I'd then take advantage of sorting the array, using an O(N log N) sort like mergesort or quicksort, or, depending on your data, an O(N) sort like counting sort. If you know your data will already be ordered, you can skip this step. With sorted data we don't need the sum_set at all; instead we can pick an index in the array and determine whether it is the total of two other elements. Any element we suspect to be our sum can only be made up of elements at lower indexes in the list: in [1, 2, 3, 4, 5], if we are looking at 3, we don't need to consider 4 or 5, as they are larger than 3 and so couldn't possibly sum to it. The halfway point of a number is also relevant: in [1, 3, 5, 7, 9, 11, 99, 117], when looking at 99 we start by adding the largest remaining candidate to the smallest; since 11 < 99/2, no two of the remaining candidates can add up to 99, so we can stop early. On average this gives another speedup, assuming the data isn't perfectly uniform.



Finally, since we are no longer storing sums in sum_set and instead check each candidate total on the fly, there will be some repeated work in the search. However, because we can return immediately upon finding a match, the best and average cases get a lot better.



def func2(l):
    # l = filter_inputs(l)
    # l.sort()
    for index in range(2, len(l)):
        i = 0
        j = index - 1
        half_val = l[index] / 2
        while i < j and l[i] <= half_val and l[j] >= half_val:
            if l[index] > l[i] + l[j]:
                i = i + 1
            elif l[index] < l[i] + l[j]:
                j = j - 1
            else:
                print(str(l[i]) + " + " + str(l[j]) + " = " + str(l[index]))
                return True
    return False


Using timeit to compare func and func2, with code like the following:



from timeit import timeit
timeit('func2(<TEST ARRAY>)', setup="from __main__ import func2; import random", number=20)
# Use the function that's been predefined in the python interpreter,
# pass the array, repeat the test 20 times, and output how many seconds were taken

# All times listed are in seconds for 20 repeats

# List of all positive odd numbers up to 9,999 == [x for x in range(1, 10000) if x % 2 == 1]
# (absolute worst case for func2; behaves nearly identically to your original function,
# because i, j, and half_val give 0% improvement and every value is checked)
# func2              # func
>>> 73.89            >>> 73.86

# All integers between 1 & 9,999 == [x for x in range(1, 10000)]
# func2              # func
>>> 0.02             >>> 297.54

# 9,999 random integers between 1 & 10,000 == [random.randint(0, 10000) for x in range(1, 10000)]
# Re-ran this one about 5 times for func2 since it was so quick;
# over 20 loops its lowest was 0.25 and its highest 0.32 seconds
# You'll also need to sort 'l' for this one to work with func2
# func2              # func
>>> ~0.3             >>> 312.83
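As a concrete, runnable version of the same measurement (a sketch: the variable name test_data is mine, and it assumes func2 from above is defined in the same module):

import random
from timeit import timeit

# 9,999 random integers, sorted as func2 expects;
# assumes func2 from the answer above is already defined in this module
test_data = sorted(random.randint(0, 10000) for _ in range(9999))

elapsed = timeit('func2(test_data)',
                 setup='from __main__ import func2, test_data',
                 number=20)
print('func2: %.2f seconds for 20 runs' % elapsed)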



          Again, with a low number of entries in l, the cost of removing duplicates & sorting the array would probably cause my function to run slower than yours. None of these speedups change the fact that the overall operation is worst-case O(N^2); however, they should drastically improve the best/average-case scenarios. Additionally, getting a large average speedup with an O(N^2) operation is huge when it comes to a big dataset, as it will be the limiting factor:



E.g. for 100,000 items:

Filtering = ~2 x O(N)   = ~200,000 ops
Mergesort = O(N log N)  = ~1.6 million ops
Main loop = O(N^2) / 2  = ~5 billion ops
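A quick back-of-the-envelope check of those estimates (just a sketch of the arithmetic):

import math

N = 100_000
print('Filtering ~ %s ops' % format(2 * N, ','))                    # ~200,000
print('Mergesort ~ %s ops' % format(round(N * math.log2(N)), ','))  # ~1.66 million
print('Main loop ~ %s ops' % format(N * N // 2, ','))               # ~5 billion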


If you can come up with a better way to take advantage of the data, such that you can get the O(N^2) down to O(N log N) or similar, I think that would be the key to optimising the worst-case scenario.






Answered Apr 2 at 15:31, edited Apr 2 at 16:13, by Lovethenakedgun (new contributor).



• Thank you so much for your huge effort. Much appreciated. – Yu Zhang, Apr 3 at 10:49











