Function figuring out which close-by value comes from which input valueUpdating list of Dictionaries from...

How do you conduct xenoanthropology after first contact?

XeLaTeX and pdfLaTeX ignore hyphenation

Chess with symmetric move-square

Should I join office cleaning event for free?

If Manufacturer spice model and Datasheet give different values which should I use?

Can a German sentence have two subjects?

Calculus Optimization - Point on graph closest to given point

How is it possible for user's password to be changed after storage was encrypted? (on OS X, Android)

Why doesn't Newton's third law mean a person bounces back to where they started when they hit the ground?

Concept of linear mappings are confusing me

Why is an old chain unsafe?

The magic money tree problem

Is it possible to do 50 km distance without any previous training?

"which" command doesn't work / path of Safari?

Is there a minimum number of transactions in a block?

Are there any consumables that function as addictive (psychedelic) drugs?

Can I interfere when another PC is about to be attacked?

Example of a relative pronoun

Circuitry of TV splitters

Why has Russell's definition of numbers using equivalence classes been finally abandoned? ( If it has actually been abandoned).

Why don't electron-positron collisions release infinite energy?

Banach space and Hilbert space topology

What typically incentivizes a professor to change jobs to a lower ranking university?

Why CLRS example on residual networks does not follows its formula?

Function figuring out which close-by value comes from which input value

Updating list of Dictionaries from other such listCompute the box covering on a graph using CPythonReplacing nodal values in a mesh with >1e6 inputs selectively using a polygonImplement LSD radix sort in pythonRed-black tree appears to be slower than std::multimapFind the longest rectilinear path in a matrix with descending entriesUsing dictionary.get() in if conditionFilling values for nested dictionariesCombine data from a list and map based on criteriaOptimization of BFS for finding minimum distances between elements of a grid

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ margin-bottom:0;
}

I would like to optimize the below traceback function. This function is part of an elementary step of the program and is called a lot...

import numpy as np



def traceback(tuple_node, tuple_node_alt):

    """

    Compute which value from tuple_node_alt comes from which value from tuple_node.

    Return a dictionnary where the key are the values from tuple_node and the values are the idx at which

    the value may be located in tuple_node_alt.

    """

    # Compute the tolerances based on the node

    tolerances = [0.1 if x <= 100 else 0.2 for x in tuple_node]



    # Traceback

    distance = dict()

    alt_identification = dict()

    for k, x in enumerate(tuple_node):

        distance[k] = [abs(elt-1) for elt in [alt_x/x for alt_x in tuple_node_alt]]

        alt_identification[x] = list(np.where([elt <= tolerances[k]+0.00001 for elt in distance[k]])[0])



    # Controls the identification and corrects it

    len_values = {key: len(val) for key, val in alt_identification.items()}



    if all([x <= 1 for x in len_values.values()]):

        return alt_identification

    else:

        for key, value in alt_identification.items():

            if len(value) <= 1:

                continue

            else:

                other_values = [val for k, val in alt_identification.items() if k != key]

                if value in other_values:

                   continue

                else:

                    for val in other_values:

                        set1 = set(value)

                        intersec = set1.intersection(set(val))

                        if len(intersec) == 0:

                            continue

                        else:

                            alt_identification[key] = [v for v in value if v not in intersec]



    return alt_identification

The input is composed of 2 tuples which do not need to have the same size. e.g.

tuple_node = (40, 50, 60, 80)

tuple_node_alt = (87, 48, 59, 39)

The goal is to figure out which value from tuple_node_alt may come from which value from tuple_node. If the value from tuple_node_alt is within a 10% margin from a value from tuple_node, it is considered that it comes from this value.

e.g. 39 is within a 10% margin of 40. It comes from 40.
This aprt is perform in the "Traceback" section, where a distance dictionnary is computed and where the idx are computed. With the example above, the output is:

Out[67]: {40: [3], 50: [1], 60: [2], 80: [0]}

However, because of a potential overlapping of the tolerance band, 3 scenarios exists:

Scenario 1: each value has been identified to one alternative value. That's the case above.

Scenario 2:

tuple_node = (40, 50, 60, 80)

tuple_node_alt = (42, 55, 54)

55 and 54 are both in the tolerance band of both 50 and 60. Thus, the output is:

Out[66]: {40: [0], 50: [1, 2], 60: [1, 2], 80: []}

Scenario 3:

tuple_node = (40, 50, 60)

tuple_node_alt = (42, 55, 59)

This is when the control part comes in play. With this input, alt_identification becomes: Out[66]: {40: [0], 50: [1], 60: [1, 2], 80: []}. However, the 55 can not come from 60 since 50 only has one possibility: 55. Thus, this number being already taken, the correct output which is provided through the control & correct section is:

Out[66]: {40: [0], 50: [1], 60: [2], 80: []}

I would really like to optimize this part and to make it a lot more quicker. At the moment, it takes:

# With an input which does not enter the control & correct part.

node = (40, 50, 60, 80)

node_alt = (39, 48, 59, 87)

%timeit traceback(node, node_alt)

22.6 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)



# With an input which need correction

node = (40, 50, 60, 100)

node_alt = (42, 55, 59, 89)

%timeit traceback(node, node_alt)

28.1 µs ± 1.88 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

edited Apr 1 at 16:17

200_success

131k17157422

asked Apr 1 at 11:23

Mathieu

1507

add a comment |

I would like to optimize the below traceback function. This function is part of an elementary step of the program and is called a lot...

import numpy as np



def traceback(tuple_node, tuple_node_alt):

    """

    Compute which value from tuple_node_alt comes from which value from tuple_node.

    Return a dictionnary where the key are the values from tuple_node and the values are the idx at which

    the value may be located in tuple_node_alt.

    """

    # Compute the tolerances based on the node

    tolerances = [0.1 if x <= 100 else 0.2 for x in tuple_node]



    # Traceback

    distance = dict()

    alt_identification = dict()

    for k, x in enumerate(tuple_node):

        distance[k] = [abs(elt-1) for elt in [alt_x/x for alt_x in tuple_node_alt]]

        alt_identification[x] = list(np.where([elt <= tolerances[k]+0.00001 for elt in distance[k]])[0])



    # Controls the identification and corrects it

    len_values = {key: len(val) for key, val in alt_identification.items()}



    if all([x <= 1 for x in len_values.values()]):

        return alt_identification

    else:

        for key, value in alt_identification.items():

            if len(value) <= 1:

                continue

            else:

                other_values = [val for k, val in alt_identification.items() if k != key]

                if value in other_values:

                   continue

                else:

                    for val in other_values:

                        set1 = set(value)

                        intersec = set1.intersection(set(val))

                        if len(intersec) == 0:

                            continue

                        else:

                            alt_identification[key] = [v for v in value if v not in intersec]



    return alt_identification

The input is composed of 2 tuples which do not need to have the same size. e.g.

tuple_node = (40, 50, 60, 80)

tuple_node_alt = (87, 48, 59, 39)

Out[67]: {40: [3], 50: [1], 60: [2], 80: [0]}

However, because of a potential overlapping of the tolerance band, 3 scenarios exists:

Scenario 1: each value has been identified to one alternative value. That's the case above.

Scenario 2:

tuple_node = (40, 50, 60, 80)

tuple_node_alt = (42, 55, 54)

55 and 54 are both in the tolerance band of both 50 and 60. Thus, the output is:

Out[66]: {40: [0], 50: [1, 2], 60: [1, 2], 80: []}

Scenario 3:

tuple_node = (40, 50, 60)

tuple_node_alt = (42, 55, 59)

Out[66]: {40: [0], 50: [1], 60: [2], 80: []}

I would really like to optimize this part and to make it a lot more quicker. At the moment, it takes:

# With an input which does not enter the control & correct part.

node = (40, 50, 60, 80)

node_alt = (39, 48, 59, 87)

%timeit traceback(node, node_alt)

22.6 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)



# With an input which need correction

node = (40, 50, 60, 100)

node_alt = (42, 55, 59, 89)

%timeit traceback(node, node_alt)

28.1 µs ± 1.88 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

edited Apr 1 at 16:17

200_success

131k17157422

asked Apr 1 at 11:23

Mathieu

1507

add a comment |

I would like to optimize the below traceback function. This function is part of an elementary step of the program and is called a lot...

import numpy as np



def traceback(tuple_node, tuple_node_alt):

    """

    Compute which value from tuple_node_alt comes from which value from tuple_node.

    Return a dictionnary where the key are the values from tuple_node and the values are the idx at which

    the value may be located in tuple_node_alt.

    """

    # Compute the tolerances based on the node

    tolerances = [0.1 if x <= 100 else 0.2 for x in tuple_node]



    # Traceback

    distance = dict()

    alt_identification = dict()

    for k, x in enumerate(tuple_node):

        distance[k] = [abs(elt-1) for elt in [alt_x/x for alt_x in tuple_node_alt]]

        alt_identification[x] = list(np.where([elt <= tolerances[k]+0.00001 for elt in distance[k]])[0])



    # Controls the identification and corrects it

    len_values = {key: len(val) for key, val in alt_identification.items()}



    if all([x <= 1 for x in len_values.values()]):

        return alt_identification

    else:

        for key, value in alt_identification.items():

            if len(value) <= 1:

                continue

            else:

                other_values = [val for k, val in alt_identification.items() if k != key]

                if value in other_values:

                   continue

                else:

                    for val in other_values:

                        set1 = set(value)

                        intersec = set1.intersection(set(val))

                        if len(intersec) == 0:

                            continue

                        else:

                            alt_identification[key] = [v for v in value if v not in intersec]



    return alt_identification

The input is composed of 2 tuples which do not need to have the same size. e.g.

tuple_node = (40, 50, 60, 80)

tuple_node_alt = (87, 48, 59, 39)

Out[67]: {40: [3], 50: [1], 60: [2], 80: [0]}

However, because of a potential overlapping of the tolerance band, 3 scenarios exists:

Scenario 1: each value has been identified to one alternative value. That's the case above.

Scenario 2:

tuple_node = (40, 50, 60, 80)

tuple_node_alt = (42, 55, 54)

55 and 54 are both in the tolerance band of both 50 and 60. Thus, the output is:

Out[66]: {40: [0], 50: [1, 2], 60: [1, 2], 80: []}

Scenario 3:

tuple_node = (40, 50, 60)

tuple_node_alt = (42, 55, 59)

Out[66]: {40: [0], 50: [1], 60: [2], 80: []}

I would really like to optimize this part and to make it a lot more quicker. At the moment, it takes:

# With an input which does not enter the control & correct part.

node = (40, 50, 60, 80)

node_alt = (39, 48, 59, 87)

%timeit traceback(node, node_alt)

22.6 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)



# With an input which need correction

node = (40, 50, 60, 100)

node_alt = (42, 55, 59, 89)

%timeit traceback(node, node_alt)

28.1 µs ± 1.88 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

edited Apr 1 at 16:17

200_success

131k17157422

asked Apr 1 at 11:23

Mathieu

1507

I would like to optimize the below traceback function. This function is part of an elementary step of the program and is called a lot...

import numpy as np



def traceback(tuple_node, tuple_node_alt):

    """

    Compute which value from tuple_node_alt comes from which value from tuple_node.

    Return a dictionnary where the key are the values from tuple_node and the values are the idx at which

    the value may be located in tuple_node_alt.

    """

    # Compute the tolerances based on the node

    tolerances = [0.1 if x <= 100 else 0.2 for x in tuple_node]



    # Traceback

    distance = dict()

    alt_identification = dict()

    for k, x in enumerate(tuple_node):

        distance[k] = [abs(elt-1) for elt in [alt_x/x for alt_x in tuple_node_alt]]

        alt_identification[x] = list(np.where([elt <= tolerances[k]+0.00001 for elt in distance[k]])[0])



    # Controls the identification and corrects it

    len_values = {key: len(val) for key, val in alt_identification.items()}



    if all([x <= 1 for x in len_values.values()]):

        return alt_identification

    else:

        for key, value in alt_identification.items():

            if len(value) <= 1:

                continue

            else:

                other_values = [val for k, val in alt_identification.items() if k != key]

                if value in other_values:

                   continue

                else:

                    for val in other_values:

                        set1 = set(value)

                        intersec = set1.intersection(set(val))

                        if len(intersec) == 0:

                            continue

                        else:

                            alt_identification[key] = [v for v in value if v not in intersec]



    return alt_identification

The input is composed of 2 tuples which do not need to have the same size. e.g.

tuple_node = (40, 50, 60, 80)

tuple_node_alt = (87, 48, 59, 39)

Out[67]: {40: [3], 50: [1], 60: [2], 80: [0]}

However, because of a potential overlapping of the tolerance band, 3 scenarios exists:

Scenario 1: each value has been identified to one alternative value. That's the case above.

Scenario 2:

tuple_node = (40, 50, 60, 80)

tuple_node_alt = (42, 55, 54)

55 and 54 are both in the tolerance band of both 50 and 60. Thus, the output is:

Out[66]: {40: [0], 50: [1, 2], 60: [1, 2], 80: []}

Scenario 3:

tuple_node = (40, 50, 60)

tuple_node_alt = (42, 55, 59)

Out[66]: {40: [0], 50: [1], 60: [2], 80: []}

I would really like to optimize this part and to make it a lot more quicker. At the moment, it takes:

# With an input which does not enter the control & correct part.

node = (40, 50, 60, 80)

node_alt = (39, 48, 59, 87)

%timeit traceback(node, node_alt)

22.6 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)



# With an input which need correction

node = (40, 50, 60, 100)

node_alt = (42, 55, 59, 89)

%timeit traceback(node, node_alt)

28.1 µs ± 1.88 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

python performance numpy

edited Apr 1 at 16:17

200_success

131k17157422

asked Apr 1 at 11:23

Mathieu

1507

edited Apr 1 at 16:17

200_success

131k17157422

asked Apr 1 at 11:23

Mathieu

1507

edited Apr 1 at 16:17

200_success

131k17157422

edited Apr 1 at 16:17

200_success

131k17157422

edited Apr 1 at 16:17

200_success

131k17157422

asked Apr 1 at 11:23

Mathieu

1507

asked Apr 1 at 11:23

Mathieu

1507

asked Apr 1 at 11:23

Mathieu

1507

add a comment |

1 Answer
1

active

oldest

votes

A couple of low-hanging fruit inefficiencies:

distance = dict(). The distance[k] value is computed in a loop, and only every used in the next statement of the loop. It does not need to be stored in a dictionary.

all([ ...list comprehension... ]): You are using list comprehension to build up a list, which you immediately pass to all(...). There is no need to actually create the list. Just use all(...list comprehension...).

set1 = set(value). This is inside a for val in other_values: loop, where value and set1 are not changed. Move the statement out of the for loop, to avoid recreating the same set each iteration.

len_values is only used in the afore mentioned all(...), and only the the values of len_values dictionary are used. As such, the len_value dictionary construction is also unnecessary, and the if statement can be written:
```
if all(len(val) <= 1 for val in alt_identification.values()):
```

Since you are returning alt_identification from the if statement, and after the if...else statement, you can invert the test, and remove one return statement:

if any(len(val) > 1 for val in alt_identification.values()):

    for key, value in alt_identification.items():

        # ... omitted for brevity ...



return alt_identification

Similarly, the two if condition: continue else: could be re-written if not condition:.

Other possible improvements:

tolerances[k] is only used in next for k loop. The list can be removed and the calculations move into the loop.

numpy is only used for a list(np.where([...])[0]) operation, which is fairly obfuscated. A simple list comprehension can be used instead.

The values of alt_identification are of type list, and converted (repeatedly) into a set() in the "control & correct" code. They could be stored as set() to avoid repeated conversions.

Here is my rework of the code, with the changes based on above comments:

def traceback(tuple_node, tuple_node_alt):



    def close_alternates(x):

        tolerance = (0.1 if x <= 100 else 0.2) + 0.00001

        return set( k for k, alt_x in enumerate(tuple_node_alt)

                    if abs(alt_x/x - 1) <= tolerance )



    alt_identification = { x: close_alternates(x) for x in tuple_node }   



    if any(len(val) > 1 for val in alt_identification.values()):

        for key, values in alt_identification.items():

            if len(values) > 1:

                other_values = [val for k, val in alt_identification.items() if k != key]

                if values not in other_values:

                    for other in other_values:

                        alt_identification[key] -= other



    return alt_identification

I'm getting up to a 2.8x speedup with the above code, on your test data set that require correction.

edited Apr 2 at 22:26

answered Apr 1 at 17:37

AJNeufeld

6,6441722

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f216646%2ffunction-figuring-out-which-close-by-value-comes-from-which-input-value%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

A couple of low-hanging fruit inefficiencies:

distance = dict(). The distance[k] value is computed in a loop, and only every used in the next statement of the loop. It does not need to be stored in a dictionary.

all([ ...list comprehension... ]): You are using list comprehension to build up a list, which you immediately pass to all(...). There is no need to actually create the list. Just use all(...list comprehension...).

set1 = set(value). This is inside a for val in other_values: loop, where value and set1 are not changed. Move the statement out of the for loop, to avoid recreating the same set each iteration.

len_values is only used in the afore mentioned all(...), and only the the values of len_values dictionary are used. As such, the len_value dictionary construction is also unnecessary, and the if statement can be written:
```
if all(len(val) <= 1 for val in alt_identification.values()):
```

Since you are returning alt_identification from the if statement, and after the if...else statement, you can invert the test, and remove one return statement:

if any(len(val) > 1 for val in alt_identification.values()):

    for key, value in alt_identification.items():

        # ... omitted for brevity ...



return alt_identification

Similarly, the two if condition: continue else: could be re-written if not condition:.

Other possible improvements:

tolerances[k] is only used in next for k loop. The list can be removed and the calculations move into the loop.

numpy is only used for a list(np.where([...])[0]) operation, which is fairly obfuscated. A simple list comprehension can be used instead.

The values of alt_identification are of type list, and converted (repeatedly) into a set() in the "control & correct" code. They could be stored as set() to avoid repeated conversions.

Here is my rework of the code, with the changes based on above comments:

def traceback(tuple_node, tuple_node_alt):



    def close_alternates(x):

        tolerance = (0.1 if x <= 100 else 0.2) + 0.00001

        return set( k for k, alt_x in enumerate(tuple_node_alt)

                    if abs(alt_x/x - 1) <= tolerance )



    alt_identification = { x: close_alternates(x) for x in tuple_node }   



    if any(len(val) > 1 for val in alt_identification.values()):

        for key, values in alt_identification.items():

            if len(values) > 1:

                other_values = [val for k, val in alt_identification.items() if k != key]

                if values not in other_values:

                    for other in other_values:

                        alt_identification[key] -= other



    return alt_identification

I'm getting up to a 2.8x speedup with the above code, on your test data set that require correction.

edited Apr 2 at 22:26

answered Apr 1 at 17:37

AJNeufeld

6,6441722

add a comment |

A couple of low-hanging fruit inefficiencies:

distance = dict(). The distance[k] value is computed in a loop, and only every used in the next statement of the loop. It does not need to be stored in a dictionary.

all([ ...list comprehension... ]): You are using list comprehension to build up a list, which you immediately pass to all(...). There is no need to actually create the list. Just use all(...list comprehension...).

set1 = set(value). This is inside a for val in other_values: loop, where value and set1 are not changed. Move the statement out of the for loop, to avoid recreating the same set each iteration.

len_values is only used in the afore mentioned all(...), and only the the values of len_values dictionary are used. As such, the len_value dictionary construction is also unnecessary, and the if statement can be written:
```
if all(len(val) <= 1 for val in alt_identification.values()):
```

Since you are returning alt_identification from the if statement, and after the if...else statement, you can invert the test, and remove one return statement:

if any(len(val) > 1 for val in alt_identification.values()):

    for key, value in alt_identification.items():

        # ... omitted for brevity ...



return alt_identification

Similarly, the two if condition: continue else: could be re-written if not condition:.

Other possible improvements:

tolerances[k] is only used in next for k loop. The list can be removed and the calculations move into the loop.

numpy is only used for a list(np.where([...])[0]) operation, which is fairly obfuscated. A simple list comprehension can be used instead.

The values of alt_identification are of type list, and converted (repeatedly) into a set() in the "control & correct" code. They could be stored as set() to avoid repeated conversions.

Here is my rework of the code, with the changes based on above comments:

def traceback(tuple_node, tuple_node_alt):



    def close_alternates(x):

        tolerance = (0.1 if x <= 100 else 0.2) + 0.00001

        return set( k for k, alt_x in enumerate(tuple_node_alt)

                    if abs(alt_x/x - 1) <= tolerance )



    alt_identification = { x: close_alternates(x) for x in tuple_node }   



    if any(len(val) > 1 for val in alt_identification.values()):

        for key, values in alt_identification.items():

            if len(values) > 1:

                other_values = [val for k, val in alt_identification.items() if k != key]

                if values not in other_values:

                    for other in other_values:

                        alt_identification[key] -= other



    return alt_identification

I'm getting up to a 2.8x speedup with the above code, on your test data set that require correction.

edited Apr 2 at 22:26

answered Apr 1 at 17:37

AJNeufeld

6,6441722

add a comment |

A couple of low-hanging fruit inefficiencies:

distance = dict(). The distance[k] value is computed in a loop, and only every used in the next statement of the loop. It does not need to be stored in a dictionary.

all([ ...list comprehension... ]): You are using list comprehension to build up a list, which you immediately pass to all(...). There is no need to actually create the list. Just use all(...list comprehension...).

set1 = set(value). This is inside a for val in other_values: loop, where value and set1 are not changed. Move the statement out of the for loop, to avoid recreating the same set each iteration.

len_values is only used in the afore mentioned all(...), and only the the values of len_values dictionary are used. As such, the len_value dictionary construction is also unnecessary, and the if statement can be written:
```
if all(len(val) <= 1 for val in alt_identification.values()):
```

Since you are returning alt_identification from the if statement, and after the if...else statement, you can invert the test, and remove one return statement:

if any(len(val) > 1 for val in alt_identification.values()):

    for key, value in alt_identification.items():

        # ... omitted for brevity ...



return alt_identification

Similarly, the two if condition: continue else: could be re-written if not condition:.

Other possible improvements:

tolerances[k] is only used in next for k loop. The list can be removed and the calculations move into the loop.

numpy is only used for a list(np.where([...])[0]) operation, which is fairly obfuscated. A simple list comprehension can be used instead.

The values of alt_identification are of type list, and converted (repeatedly) into a set() in the "control & correct" code. They could be stored as set() to avoid repeated conversions.

Here is my rework of the code, with the changes based on above comments:

def traceback(tuple_node, tuple_node_alt):



    def close_alternates(x):

        tolerance = (0.1 if x <= 100 else 0.2) + 0.00001

        return set( k for k, alt_x in enumerate(tuple_node_alt)

                    if abs(alt_x/x - 1) <= tolerance )



    alt_identification = { x: close_alternates(x) for x in tuple_node }   



    if any(len(val) > 1 for val in alt_identification.values()):

        for key, values in alt_identification.items():

            if len(values) > 1:

                other_values = [val for k, val in alt_identification.items() if k != key]

                if values not in other_values:

                    for other in other_values:

                        alt_identification[key] -= other



    return alt_identification

I'm getting up to a 2.8x speedup with the above code, on your test data set that require correction.

edited Apr 2 at 22:26

answered Apr 1 at 17:37

AJNeufeld

6,6441722

A couple of low-hanging fruit inefficiencies:

distance = dict(). The distance[k] value is computed in a loop, and only every used in the next statement of the loop. It does not need to be stored in a dictionary.

all([ ...list comprehension... ]): You are using list comprehension to build up a list, which you immediately pass to all(...). There is no need to actually create the list. Just use all(...list comprehension...).

set1 = set(value). This is inside a for val in other_values: loop, where value and set1 are not changed. Move the statement out of the for loop, to avoid recreating the same set each iteration.

len_values is only used in the afore mentioned all(...), and only the the values of len_values dictionary are used. As such, the len_value dictionary construction is also unnecessary, and the if statement can be written:
```
if all(len(val) <= 1 for val in alt_identification.values()):
```

Since you are returning alt_identification from the if statement, and after the if...else statement, you can invert the test, and remove one return statement:

if any(len(val) > 1 for val in alt_identification.values()):

    for key, value in alt_identification.items():

        # ... omitted for brevity ...



return alt_identification

Similarly, the two if condition: continue else: could be re-written if not condition:.

Other possible improvements:

tolerances[k] is only used in next for k loop. The list can be removed and the calculations move into the loop.

numpy is only used for a list(np.where([...])[0]) operation, which is fairly obfuscated. A simple list comprehension can be used instead.

The values of alt_identification are of type list, and converted (repeatedly) into a set() in the "control & correct" code. They could be stored as set() to avoid repeated conversions.

Here is my rework of the code, with the changes based on above comments:

def traceback(tuple_node, tuple_node_alt):



    def close_alternates(x):

        tolerance = (0.1 if x <= 100 else 0.2) + 0.00001

        return set( k for k, alt_x in enumerate(tuple_node_alt)

                    if abs(alt_x/x - 1) <= tolerance )



    alt_identification = { x: close_alternates(x) for x in tuple_node }   



    if any(len(val) > 1 for val in alt_identification.values()):

        for key, values in alt_identification.items():

            if len(values) > 1:

                other_values = [val for k, val in alt_identification.items() if k != key]

                if values not in other_values:

                    for other in other_values:

                        alt_identification[key] -= other



    return alt_identification

I'm getting up to a 2.8x speedup with the above code, on your test data set that require correction.

edited Apr 2 at 22:26

answered Apr 1 at 17:37

AJNeufeld

6,6441722

edited Apr 2 at 22:26

answered Apr 1 at 17:37

AJNeufeld

6,6441722

answered Apr 1 at 17:37

AJNeufeld

6,6441722

answered Apr 1 at 17:37

AJNeufeld

6,6441722

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Code Review Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

nEiIRK4nn08HlohKlx7DdzhN,4LOeLl,pL0R9gj7qe,fNuKNEJa,kVI QT,GijW1r mMkmAZXrNR

搜尋此網誌

Ggthjy