Function figuring out which close-by value comes from which input valueUpdating list of Dictionaries from...
How do you conduct xenoanthropology after first contact?
XeLaTeX and pdfLaTeX ignore hyphenation
Chess with symmetric move-square
Should I join office cleaning event for free?
If Manufacturer spice model and Datasheet give different values which should I use?
Can a German sentence have two subjects?
Calculus Optimization - Point on graph closest to given point
How is it possible for user's password to be changed after storage was encrypted? (on OS X, Android)
Why doesn't Newton's third law mean a person bounces back to where they started when they hit the ground?
Concept of linear mappings are confusing me
Why is an old chain unsafe?
The magic money tree problem
Is it possible to do 50 km distance without any previous training?
"which" command doesn't work / path of Safari?
Is there a minimum number of transactions in a block?
Are there any consumables that function as addictive (psychedelic) drugs?
Can I interfere when another PC is about to be attacked?
Example of a relative pronoun
Circuitry of TV splitters
Why has Russell's definition of numbers using equivalence classes been finally abandoned? ( If it has actually been abandoned).
Why don't electron-positron collisions release infinite energy?
Banach space and Hilbert space topology
What typically incentivizes a professor to change jobs to a lower ranking university?
Why CLRS example on residual networks does not follows its formula?
Function figuring out which close-by value comes from which input value
Updating list of Dictionaries from other such listCompute the box covering on a graph using CPythonReplacing nodal values in a mesh with >1e6 inputs selectively using a polygonImplement LSD radix sort in pythonRed-black tree appears to be slower than std::multimapFind the longest rectilinear path in a matrix with descending entriesUsing dictionary.get() in if conditionFilling values for nested dictionariesCombine data from a list and map based on criteriaOptimization of BFS for finding minimum distances between elements of a grid
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ margin-bottom:0;
}
$begingroup$
I would like to optimize the below traceback
function. This function is part of an elementary step of the program and is called a lot...
import numpy as np
def traceback(tuple_node, tuple_node_alt):
"""
Compute which value from tuple_node_alt comes from which value from tuple_node.
Return a dictionnary where the key are the values from tuple_node and the values are the idx at which
the value may be located in tuple_node_alt.
"""
# Compute the tolerances based on the node
tolerances = [0.1 if x <= 100 else 0.2 for x in tuple_node]
# Traceback
distance = dict()
alt_identification = dict()
for k, x in enumerate(tuple_node):
distance[k] = [abs(elt-1) for elt in [alt_x/x for alt_x in tuple_node_alt]]
alt_identification[x] = list(np.where([elt <= tolerances[k]+0.00001 for elt in distance[k]])[0])
# Controls the identification and corrects it
len_values = {key: len(val) for key, val in alt_identification.items()}
if all([x <= 1 for x in len_values.values()]):
return alt_identification
else:
for key, value in alt_identification.items():
if len(value) <= 1:
continue
else:
other_values = [val for k, val in alt_identification.items() if k != key]
if value in other_values:
continue
else:
for val in other_values:
set1 = set(value)
intersec = set1.intersection(set(val))
if len(intersec) == 0:
continue
else:
alt_identification[key] = [v for v in value if v not in intersec]
return alt_identification
The input is composed of 2 tuples which do not need to have the same size. e.g.
tuple_node = (40, 50, 60, 80)
tuple_node_alt = (87, 48, 59, 39)
The goal is to figure out which value from tuple_node_alt
may come from which value from tuple_node
. If the value from tuple_node_alt
is within a 10% margin from a value from tuple_node
, it is considered that it comes from this value.
e.g. 39 is within a 10% margin of 40. It comes from 40.
This aprt is perform in the "Traceback" section, where a distance dictionnary is computed and where the idx are computed. With the example above, the output is:
Out[67]: {40: [3], 50: [1], 60: [2], 80: [0]}
However, because of a potential overlapping of the tolerance band, 3 scenarios exists:
Scenario 1: each value has been identified to one alternative value. That's the case above.
Scenario 2:
tuple_node = (40, 50, 60, 80)
tuple_node_alt = (42, 55, 54)
55 and 54 are both in the tolerance band of both 50 and 60. Thus, the output is:
Out[66]: {40: [0], 50: [1, 2], 60: [1, 2], 80: []}
Scenario 3:
tuple_node = (40, 50, 60)
tuple_node_alt = (42, 55, 59)
This is when the control part comes in play. With this input, alt_identification
becomes: Out[66]: {40: [0], 50: [1], 60: [1, 2], 80: []}
. However, the 55 can not come from 60 since 50 only has one possibility: 55. Thus, this number being already taken, the correct output which is provided through the control & correct section is:
Out[66]: {40: [0], 50: [1], 60: [2], 80: []}
I would really like to optimize this part and to make it a lot more quicker. At the moment, it takes:
# With an input which does not enter the control & correct part.
node = (40, 50, 60, 80)
node_alt = (39, 48, 59, 87)
%timeit traceback(node, node_alt)
22.6 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# With an input which need correction
node = (40, 50, 60, 100)
node_alt = (42, 55, 59, 89)
%timeit traceback(node, node_alt)
28.1 µs ± 1.88 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
python performance numpy
$endgroup$
add a comment |
$begingroup$
I would like to optimize the below traceback
function. This function is part of an elementary step of the program and is called a lot...
import numpy as np
def traceback(tuple_node, tuple_node_alt):
"""
Compute which value from tuple_node_alt comes from which value from tuple_node.
Return a dictionnary where the key are the values from tuple_node and the values are the idx at which
the value may be located in tuple_node_alt.
"""
# Compute the tolerances based on the node
tolerances = [0.1 if x <= 100 else 0.2 for x in tuple_node]
# Traceback
distance = dict()
alt_identification = dict()
for k, x in enumerate(tuple_node):
distance[k] = [abs(elt-1) for elt in [alt_x/x for alt_x in tuple_node_alt]]
alt_identification[x] = list(np.where([elt <= tolerances[k]+0.00001 for elt in distance[k]])[0])
# Controls the identification and corrects it
len_values = {key: len(val) for key, val in alt_identification.items()}
if all([x <= 1 for x in len_values.values()]):
return alt_identification
else:
for key, value in alt_identification.items():
if len(value) <= 1:
continue
else:
other_values = [val for k, val in alt_identification.items() if k != key]
if value in other_values:
continue
else:
for val in other_values:
set1 = set(value)
intersec = set1.intersection(set(val))
if len(intersec) == 0:
continue
else:
alt_identification[key] = [v for v in value if v not in intersec]
return alt_identification
The input is composed of 2 tuples which do not need to have the same size. e.g.
tuple_node = (40, 50, 60, 80)
tuple_node_alt = (87, 48, 59, 39)
The goal is to figure out which value from tuple_node_alt
may come from which value from tuple_node
. If the value from tuple_node_alt
is within a 10% margin from a value from tuple_node
, it is considered that it comes from this value.
e.g. 39 is within a 10% margin of 40. It comes from 40.
This aprt is perform in the "Traceback" section, where a distance dictionnary is computed and where the idx are computed. With the example above, the output is:
Out[67]: {40: [3], 50: [1], 60: [2], 80: [0]}
However, because of a potential overlapping of the tolerance band, 3 scenarios exists:
Scenario 1: each value has been identified to one alternative value. That's the case above.
Scenario 2:
tuple_node = (40, 50, 60, 80)
tuple_node_alt = (42, 55, 54)
55 and 54 are both in the tolerance band of both 50 and 60. Thus, the output is:
Out[66]: {40: [0], 50: [1, 2], 60: [1, 2], 80: []}
Scenario 3:
tuple_node = (40, 50, 60)
tuple_node_alt = (42, 55, 59)
This is when the control part comes in play. With this input, alt_identification
becomes: Out[66]: {40: [0], 50: [1], 60: [1, 2], 80: []}
. However, the 55 can not come from 60 since 50 only has one possibility: 55. Thus, this number being already taken, the correct output which is provided through the control & correct section is:
Out[66]: {40: [0], 50: [1], 60: [2], 80: []}
I would really like to optimize this part and to make it a lot more quicker. At the moment, it takes:
# With an input which does not enter the control & correct part.
node = (40, 50, 60, 80)
node_alt = (39, 48, 59, 87)
%timeit traceback(node, node_alt)
22.6 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# With an input which need correction
node = (40, 50, 60, 100)
node_alt = (42, 55, 59, 89)
%timeit traceback(node, node_alt)
28.1 µs ± 1.88 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
python performance numpy
$endgroup$
add a comment |
$begingroup$
I would like to optimize the below traceback
function. This function is part of an elementary step of the program and is called a lot...
import numpy as np
def traceback(tuple_node, tuple_node_alt):
"""
Compute which value from tuple_node_alt comes from which value from tuple_node.
Return a dictionnary where the key are the values from tuple_node and the values are the idx at which
the value may be located in tuple_node_alt.
"""
# Compute the tolerances based on the node
tolerances = [0.1 if x <= 100 else 0.2 for x in tuple_node]
# Traceback
distance = dict()
alt_identification = dict()
for k, x in enumerate(tuple_node):
distance[k] = [abs(elt-1) for elt in [alt_x/x for alt_x in tuple_node_alt]]
alt_identification[x] = list(np.where([elt <= tolerances[k]+0.00001 for elt in distance[k]])[0])
# Controls the identification and corrects it
len_values = {key: len(val) for key, val in alt_identification.items()}
if all([x <= 1 for x in len_values.values()]):
return alt_identification
else:
for key, value in alt_identification.items():
if len(value) <= 1:
continue
else:
other_values = [val for k, val in alt_identification.items() if k != key]
if value in other_values:
continue
else:
for val in other_values:
set1 = set(value)
intersec = set1.intersection(set(val))
if len(intersec) == 0:
continue
else:
alt_identification[key] = [v for v in value if v not in intersec]
return alt_identification
The input is composed of 2 tuples which do not need to have the same size. e.g.
tuple_node = (40, 50, 60, 80)
tuple_node_alt = (87, 48, 59, 39)
The goal is to figure out which value from tuple_node_alt
may come from which value from tuple_node
. If the value from tuple_node_alt
is within a 10% margin from a value from tuple_node
, it is considered that it comes from this value.
e.g. 39 is within a 10% margin of 40. It comes from 40.
This aprt is perform in the "Traceback" section, where a distance dictionnary is computed and where the idx are computed. With the example above, the output is:
Out[67]: {40: [3], 50: [1], 60: [2], 80: [0]}
However, because of a potential overlapping of the tolerance band, 3 scenarios exists:
Scenario 1: each value has been identified to one alternative value. That's the case above.
Scenario 2:
tuple_node = (40, 50, 60, 80)
tuple_node_alt = (42, 55, 54)
55 and 54 are both in the tolerance band of both 50 and 60. Thus, the output is:
Out[66]: {40: [0], 50: [1, 2], 60: [1, 2], 80: []}
Scenario 3:
tuple_node = (40, 50, 60)
tuple_node_alt = (42, 55, 59)
This is when the control part comes in play. With this input, alt_identification
becomes: Out[66]: {40: [0], 50: [1], 60: [1, 2], 80: []}
. However, the 55 can not come from 60 since 50 only has one possibility: 55. Thus, this number being already taken, the correct output which is provided through the control & correct section is:
Out[66]: {40: [0], 50: [1], 60: [2], 80: []}
I would really like to optimize this part and to make it a lot more quicker. At the moment, it takes:
# With an input which does not enter the control & correct part.
node = (40, 50, 60, 80)
node_alt = (39, 48, 59, 87)
%timeit traceback(node, node_alt)
22.6 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# With an input which need correction
node = (40, 50, 60, 100)
node_alt = (42, 55, 59, 89)
%timeit traceback(node, node_alt)
28.1 µs ± 1.88 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
python performance numpy
$endgroup$
I would like to optimize the below traceback
function. This function is part of an elementary step of the program and is called a lot...
import numpy as np
def traceback(tuple_node, tuple_node_alt):
"""
Compute which value from tuple_node_alt comes from which value from tuple_node.
Return a dictionnary where the key are the values from tuple_node and the values are the idx at which
the value may be located in tuple_node_alt.
"""
# Compute the tolerances based on the node
tolerances = [0.1 if x <= 100 else 0.2 for x in tuple_node]
# Traceback
distance = dict()
alt_identification = dict()
for k, x in enumerate(tuple_node):
distance[k] = [abs(elt-1) for elt in [alt_x/x for alt_x in tuple_node_alt]]
alt_identification[x] = list(np.where([elt <= tolerances[k]+0.00001 for elt in distance[k]])[0])
# Controls the identification and corrects it
len_values = {key: len(val) for key, val in alt_identification.items()}
if all([x <= 1 for x in len_values.values()]):
return alt_identification
else:
for key, value in alt_identification.items():
if len(value) <= 1:
continue
else:
other_values = [val for k, val in alt_identification.items() if k != key]
if value in other_values:
continue
else:
for val in other_values:
set1 = set(value)
intersec = set1.intersection(set(val))
if len(intersec) == 0:
continue
else:
alt_identification[key] = [v for v in value if v not in intersec]
return alt_identification
The input is composed of 2 tuples which do not need to have the same size. e.g.
tuple_node = (40, 50, 60, 80)
tuple_node_alt = (87, 48, 59, 39)
The goal is to figure out which value from tuple_node_alt
may come from which value from tuple_node
. If the value from tuple_node_alt
is within a 10% margin from a value from tuple_node
, it is considered that it comes from this value.
e.g. 39 is within a 10% margin of 40. It comes from 40.
This aprt is perform in the "Traceback" section, where a distance dictionnary is computed and where the idx are computed. With the example above, the output is:
Out[67]: {40: [3], 50: [1], 60: [2], 80: [0]}
However, because of a potential overlapping of the tolerance band, 3 scenarios exists:
Scenario 1: each value has been identified to one alternative value. That's the case above.
Scenario 2:
tuple_node = (40, 50, 60, 80)
tuple_node_alt = (42, 55, 54)
55 and 54 are both in the tolerance band of both 50 and 60. Thus, the output is:
Out[66]: {40: [0], 50: [1, 2], 60: [1, 2], 80: []}
Scenario 3:
tuple_node = (40, 50, 60)
tuple_node_alt = (42, 55, 59)
This is when the control part comes in play. With this input, alt_identification
becomes: Out[66]: {40: [0], 50: [1], 60: [1, 2], 80: []}
. However, the 55 can not come from 60 since 50 only has one possibility: 55. Thus, this number being already taken, the correct output which is provided through the control & correct section is:
Out[66]: {40: [0], 50: [1], 60: [2], 80: []}
I would really like to optimize this part and to make it a lot more quicker. At the moment, it takes:
# With an input which does not enter the control & correct part.
node = (40, 50, 60, 80)
node_alt = (39, 48, 59, 87)
%timeit traceback(node, node_alt)
22.6 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# With an input which need correction
node = (40, 50, 60, 100)
node_alt = (42, 55, 59, 89)
%timeit traceback(node, node_alt)
28.1 µs ± 1.88 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
python performance numpy
python performance numpy
edited Apr 1 at 16:17
200_success
131k17157422
131k17157422
asked Apr 1 at 11:23
MathieuMathieu
1507
1507
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
$begingroup$
A couple of low-hanging fruit inefficiencies:
distance = dict()
. Thedistance[k]
value is computed in a loop, and only every used in the next statement of the loop. It does not need to be stored in a dictionary.
all([ ...list comprehension... ])
: You are using list comprehension to build up a list, which you immediately pass toall(...)
. There is no need to actually create the list. Just useall(...list comprehension...)
.
set1 = set(value)
. This is inside afor val in other_values:
loop, wherevalue
andset1
are not changed. Move the statement out of thefor
loop, to avoid recreating the same set each iteration.
len_values
is only used in the afore mentionedall(...)
, and only the the values oflen_values
dictionary are used. As such, thelen_value
dictionary construction is also unnecessary, and theif
statement can be written:
if all(len(val) <= 1 for val in alt_identification.values()):
Since you are returning alt_identification
from the if
statement, and after the if...else
statement, you can invert the test, and remove one return statement:
if any(len(val) > 1 for val in alt_identification.values()):
for key, value in alt_identification.items():
# ... omitted for brevity ...
return alt_identification
Similarly, the two if condition: continue else:
could be re-written if not condition:
.
Other possible improvements:
tolerances[k]
is only used in nextfor k
loop. The list can be removed and the calculations move into the loop.
numpy
is only used for alist(np.where([...])[0])
operation, which is fairly obfuscated. A simple list comprehension can be used instead.- The values of
alt_identification
are of typelist
, and converted (repeatedly) into aset()
in the "control & correct" code. They could be stored asset()
to avoid repeated conversions.
Here is my rework of the code, with the changes based on above comments:
def traceback(tuple_node, tuple_node_alt):
def close_alternates(x):
tolerance = (0.1 if x <= 100 else 0.2) + 0.00001
return set( k for k, alt_x in enumerate(tuple_node_alt)
if abs(alt_x/x - 1) <= tolerance )
alt_identification = { x: close_alternates(x) for x in tuple_node }
if any(len(val) > 1 for val in alt_identification.values()):
for key, values in alt_identification.items():
if len(values) > 1:
other_values = [val for k, val in alt_identification.items() if k != key]
if values not in other_values:
for other in other_values:
alt_identification[key] -= other
return alt_identification
I'm getting up to a 2.8x speedup with the above code, on your test data set that require correction.
$endgroup$
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f216646%2ffunction-figuring-out-which-close-by-value-comes-from-which-input-value%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
A couple of low-hanging fruit inefficiencies:
distance = dict()
. Thedistance[k]
value is computed in a loop, and only every used in the next statement of the loop. It does not need to be stored in a dictionary.
all([ ...list comprehension... ])
: You are using list comprehension to build up a list, which you immediately pass toall(...)
. There is no need to actually create the list. Just useall(...list comprehension...)
.
set1 = set(value)
. This is inside afor val in other_values:
loop, wherevalue
andset1
are not changed. Move the statement out of thefor
loop, to avoid recreating the same set each iteration.
len_values
is only used in the afore mentionedall(...)
, and only the the values oflen_values
dictionary are used. As such, thelen_value
dictionary construction is also unnecessary, and theif
statement can be written:
if all(len(val) <= 1 for val in alt_identification.values()):
Since you are returning alt_identification
from the if
statement, and after the if...else
statement, you can invert the test, and remove one return statement:
if any(len(val) > 1 for val in alt_identification.values()):
for key, value in alt_identification.items():
# ... omitted for brevity ...
return alt_identification
Similarly, the two if condition: continue else:
could be re-written if not condition:
.
Other possible improvements:
tolerances[k]
is only used in nextfor k
loop. The list can be removed and the calculations move into the loop.
numpy
is only used for alist(np.where([...])[0])
operation, which is fairly obfuscated. A simple list comprehension can be used instead.- The values of
alt_identification
are of typelist
, and converted (repeatedly) into aset()
in the "control & correct" code. They could be stored asset()
to avoid repeated conversions.
Here is my rework of the code, with the changes based on above comments:
def traceback(tuple_node, tuple_node_alt):
def close_alternates(x):
tolerance = (0.1 if x <= 100 else 0.2) + 0.00001
return set( k for k, alt_x in enumerate(tuple_node_alt)
if abs(alt_x/x - 1) <= tolerance )
alt_identification = { x: close_alternates(x) for x in tuple_node }
if any(len(val) > 1 for val in alt_identification.values()):
for key, values in alt_identification.items():
if len(values) > 1:
other_values = [val for k, val in alt_identification.items() if k != key]
if values not in other_values:
for other in other_values:
alt_identification[key] -= other
return alt_identification
I'm getting up to a 2.8x speedup with the above code, on your test data set that require correction.
$endgroup$
add a comment |
$begingroup$
A couple of low-hanging fruit inefficiencies:
distance = dict()
. Thedistance[k]
value is computed in a loop, and only every used in the next statement of the loop. It does not need to be stored in a dictionary.
all([ ...list comprehension... ])
: You are using list comprehension to build up a list, which you immediately pass toall(...)
. There is no need to actually create the list. Just useall(...list comprehension...)
.
set1 = set(value)
. This is inside afor val in other_values:
loop, wherevalue
andset1
are not changed. Move the statement out of thefor
loop, to avoid recreating the same set each iteration.
len_values
is only used in the afore mentionedall(...)
, and only the the values oflen_values
dictionary are used. As such, thelen_value
dictionary construction is also unnecessary, and theif
statement can be written:
if all(len(val) <= 1 for val in alt_identification.values()):
Since you are returning alt_identification
from the if
statement, and after the if...else
statement, you can invert the test, and remove one return statement:
if any(len(val) > 1 for val in alt_identification.values()):
for key, value in alt_identification.items():
# ... omitted for brevity ...
return alt_identification
Similarly, the two if condition: continue else:
could be re-written if not condition:
.
Other possible improvements:
tolerances[k]
is only used in nextfor k
loop. The list can be removed and the calculations move into the loop.
numpy
is only used for alist(np.where([...])[0])
operation, which is fairly obfuscated. A simple list comprehension can be used instead.- The values of
alt_identification
are of typelist
, and converted (repeatedly) into aset()
in the "control & correct" code. They could be stored asset()
to avoid repeated conversions.
Here is my rework of the code, with the changes based on above comments:
def traceback(tuple_node, tuple_node_alt):
def close_alternates(x):
tolerance = (0.1 if x <= 100 else 0.2) + 0.00001
return set( k for k, alt_x in enumerate(tuple_node_alt)
if abs(alt_x/x - 1) <= tolerance )
alt_identification = { x: close_alternates(x) for x in tuple_node }
if any(len(val) > 1 for val in alt_identification.values()):
for key, values in alt_identification.items():
if len(values) > 1:
other_values = [val for k, val in alt_identification.items() if k != key]
if values not in other_values:
for other in other_values:
alt_identification[key] -= other
return alt_identification
I'm getting up to a 2.8x speedup with the above code, on your test data set that require correction.
$endgroup$
add a comment |
$begingroup$
A couple of low-hanging fruit inefficiencies:
distance = dict()
. Thedistance[k]
value is computed in a loop, and only every used in the next statement of the loop. It does not need to be stored in a dictionary.
all([ ...list comprehension... ])
: You are using list comprehension to build up a list, which you immediately pass toall(...)
. There is no need to actually create the list. Just useall(...list comprehension...)
.
set1 = set(value)
. This is inside afor val in other_values:
loop, wherevalue
andset1
are not changed. Move the statement out of thefor
loop, to avoid recreating the same set each iteration.
len_values
is only used in the afore mentionedall(...)
, and only the the values oflen_values
dictionary are used. As such, thelen_value
dictionary construction is also unnecessary, and theif
statement can be written:
if all(len(val) <= 1 for val in alt_identification.values()):
Since you are returning alt_identification
from the if
statement, and after the if...else
statement, you can invert the test, and remove one return statement:
if any(len(val) > 1 for val in alt_identification.values()):
for key, value in alt_identification.items():
# ... omitted for brevity ...
return alt_identification
Similarly, the two if condition: continue else:
could be re-written if not condition:
.
Other possible improvements:
tolerances[k]
is only used in nextfor k
loop. The list can be removed and the calculations move into the loop.
numpy
is only used for alist(np.where([...])[0])
operation, which is fairly obfuscated. A simple list comprehension can be used instead.- The values of
alt_identification
are of typelist
, and converted (repeatedly) into aset()
in the "control & correct" code. They could be stored asset()
to avoid repeated conversions.
Here is my rework of the code, with the changes based on above comments:
def traceback(tuple_node, tuple_node_alt):
def close_alternates(x):
tolerance = (0.1 if x <= 100 else 0.2) + 0.00001
return set( k for k, alt_x in enumerate(tuple_node_alt)
if abs(alt_x/x - 1) <= tolerance )
alt_identification = { x: close_alternates(x) for x in tuple_node }
if any(len(val) > 1 for val in alt_identification.values()):
for key, values in alt_identification.items():
if len(values) > 1:
other_values = [val for k, val in alt_identification.items() if k != key]
if values not in other_values:
for other in other_values:
alt_identification[key] -= other
return alt_identification
I'm getting up to a 2.8x speedup with the above code, on your test data set that require correction.
$endgroup$
A couple of low-hanging fruit inefficiencies:
distance = dict()
. Thedistance[k]
value is computed in a loop, and only every used in the next statement of the loop. It does not need to be stored in a dictionary.
all([ ...list comprehension... ])
: You are using list comprehension to build up a list, which you immediately pass toall(...)
. There is no need to actually create the list. Just useall(...list comprehension...)
.
set1 = set(value)
. This is inside afor val in other_values:
loop, wherevalue
andset1
are not changed. Move the statement out of thefor
loop, to avoid recreating the same set each iteration.
len_values
is only used in the afore mentionedall(...)
, and only the the values oflen_values
dictionary are used. As such, thelen_value
dictionary construction is also unnecessary, and theif
statement can be written:
if all(len(val) <= 1 for val in alt_identification.values()):
Since you are returning alt_identification
from the if
statement, and after the if...else
statement, you can invert the test, and remove one return statement:
if any(len(val) > 1 for val in alt_identification.values()):
for key, value in alt_identification.items():
# ... omitted for brevity ...
return alt_identification
Similarly, the two if condition: continue else:
could be re-written if not condition:
.
Other possible improvements:
tolerances[k]
is only used in nextfor k
loop. The list can be removed and the calculations move into the loop.
numpy
is only used for alist(np.where([...])[0])
operation, which is fairly obfuscated. A simple list comprehension can be used instead.- The values of
alt_identification
are of typelist
, and converted (repeatedly) into aset()
in the "control & correct" code. They could be stored asset()
to avoid repeated conversions.
Here is my rework of the code, with the changes based on above comments:
def traceback(tuple_node, tuple_node_alt):
def close_alternates(x):
tolerance = (0.1 if x <= 100 else 0.2) + 0.00001
return set( k for k, alt_x in enumerate(tuple_node_alt)
if abs(alt_x/x - 1) <= tolerance )
alt_identification = { x: close_alternates(x) for x in tuple_node }
if any(len(val) > 1 for val in alt_identification.values()):
for key, values in alt_identification.items():
if len(values) > 1:
other_values = [val for k, val in alt_identification.items() if k != key]
if values not in other_values:
for other in other_values:
alt_identification[key] -= other
return alt_identification
I'm getting up to a 2.8x speedup with the above code, on your test data set that require correction.
edited Apr 2 at 22:26
answered Apr 1 at 17:37
AJNeufeldAJNeufeld
6,6441722
6,6441722
add a comment |
add a comment |
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f216646%2ffunction-figuring-out-which-close-by-value-comes-from-which-input-value%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown