GUI that searches files using user inputted search strings
$begingroup$
I have a GUI where the user can select text files, enter strings (which may or may not contain wildcards), and search those files for those strings.
Currently, I take the user-supplied string(s) and divide them into two groups: regularly searchable strings, and strings that contain wildcards (either * for one wildcard character, or .* for any number of characters).
If they are regularly searchable strings, then I use the normal str.find function (I tested this against regex_search and it is faster); otherwise I use the regex_search function.
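The split described above can be sketched roughly as follows. This is only an illustration under assumptions: the SearchTerms struct and the lineMatches function are hypothetical names that do not appear in the program, and unlike the posted code the sketch stops at the first match instead of recording one hit per matching string.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical container for the two groups of search strings.
struct SearchTerms {
    std::vector<std::string> literals;  // plain strings, searched with std::string::find
    std::vector<std::regex>  patterns;  // wildcard strings, compiled once up front
};

// Returns true if the line contains any literal or matches any pattern.
bool lineMatches(const std::string& line, const SearchTerms& terms) {
    for (const std::string& lit : terms.literals) {
        assert(!lit.empty());  // an empty literal would match every line
        if (line.find(lit) != std::string::npos) {
            return true;
        }
    }
    for (const std::regex& re : terms.patterns) {
        if (std::regex_search(line, re)) {
            return true;
        }
    }
    return false;
}
```

The literals are checked first because, as noted above, std::string::find measured faster than regex_search, so a line that matches a plain string never reaches the regex loop.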
The main problem is performance. As a benchmark, my program takes roughly 47 minutes to search through 5,028,712 lines. From what I have found on Google, it seems this entire search should take well under a minute...
searchAllFilesForAllStrings -> bool checkbox on the GUI. If it is set to true, the program searches every file for every string; if false, it only searches each "batch" of files (if the file path includes a wildcard, such as "Read*.txt", all files that start with Read and are .txt files are chosen).
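To make the "Read*.txt" example concrete, here is a minimal sketch of matching a file name against such a pattern. The names globToRegex and fileNameMatches are hypothetical and not part of the program. The sketch also differs from the posted code in three ways that are assumptions on my part: it converts every * (not only the first), it escapes the other regex metacharacters so that the . in .txt is literal, and it uses std::regex_match so the whole file name has to match.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Converts a file-name pattern in which '*' means "any run of characters"
// into a case-insensitive regex. All other characters are matched literally.
std::regex globToRegex(const std::string& glob) {
    static const std::string special = R"re(\^$.|?+()[]{})re";
    std::string pattern;
    for (char c : glob) {
        if (c == '*') {
            pattern += ".*";
        } else {
            if (special.find(c) != std::string::npos) {
                pattern += '\\';  // escape regex metacharacters such as '.'
            }
            pattern += c;
        }
    }
    return std::regex(pattern, std::regex_constants::icase);
}

// Returns true if the whole file name matches the wildcard pattern.
bool fileNameMatches(const std::string& name, const std::string& glob) {
    assert(!glob.empty());
    return std::regex_match(name, globToRegex(glob));
}
```

For example, fileNameMatches("ReadMe.txt", "Read*.txt") is true, while "ReadMe.txt.bak" is rejected because the match must cover the entire name.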
Global variables:
vector<string> regex_patterns;
vector<string> excelFiles;
vector<string> nonWildCardSearchStrings;
vector<string> searchStrings;
vector<string> searchFiles;
vector<bool> regexIndex;
This function gets called from a GUI button using an editable text box that contains the path to the file. It essentially grabs strings and files from an Excel file formatted with two columns: one column of search strings and one column of search files. In search strings, XYY denotes one wildcard character (because the user should be able to search for a literal * if they desire) and XYZ denotes any number of wildcard characters:
ifstream excelFile;
string line;
string delimiter = ",";
int bIdxStrings = 0;
int bIdxFiles = 0;
for (int i = 0; i < excelPath.size(); i++)
{
excelFile.open(excelPath.at(i));
if (excelFile.is_open()) {}
else { return false; }
int index = 0;
while (getline(excelFile, line))
{
searchStrings.push_back(line.substr(0, line.find(delimiter)));
searchFiles.push_back(line.substr(line.find(delimiter) + 1));
index = index + 1;
}
searchStrings.erase(searchStrings.begin() + bIdxStrings);
searchFiles.erase(searchFiles.begin() + bIdxFiles);
bIdxStrings = searchStrings.size() + 1;
bIdxFiles = searchFiles.size() + 1;
for (int i = 0; i < searchStrings.size(); i++)
{
searchStrings.at(i) = addEscapes(searchStrings.at(i));
}
}
excelFile.close();
string key = "\\";
size_t foundLast = 0;
string wcPath = "";
tuple<vector<string>, string> addWildCardFiles;
vector<string>test;
string holdTemp = "";
string regSearch = "";
string regtemp = "";
string fullPath = "";
vector<string>tempFiles;
vector<string>tempStrings;
// Search for wildcard paths. if any exist, find files based on their main directory. will not recursively search.
for (int i = 0; i < searchFiles.size();i++)
{
size_t found = searchFiles.at(i).find("*");
if (found != string::npos)
{
// temporarily hold the search string corresponding to this entry.
holdTemp = searchStrings.at(i);
foundLast = searchFiles.at(i).rfind(key);
wcPath = searchFiles.at(i).substr(0, foundLast);
regSearch = searchFiles.at(i).substr(foundLast + 1, string::npos);
regtemp = regSearch.substr(0, regSearch.find("*"));
regtemp.append(".*");
regtemp.append(regSearch.substr(regSearch.find("*") + 1, string::npos));
regSearch = regtemp;
smatch matchez;
// Should make regex search case insensitive.
regex e(regSearch, regex_constants::icase);
//searchFiles.erase(searchFiles.begin() + i);
//searchStrings.erase(searchStrings.begin() + i);
// All files in the directory:
addWildCardFiles = read_directory(wcPath, test);
for (int m = 0; m < get<0>(addWildCardFiles).size(); m++)
{
size_t wcBool = regex_search(get<0>(addWildCardFiles)[m], matchez, e);
if (wcBool == 1)
{
fullPath.append(wcPath); fullPath.append("\\");
fullPath.append(get<0>(addWildCardFiles)[m]);
tempFiles.push_back(fullPath);
tempStrings.push_back(holdTemp);
}
fullPath = "";
}
}
}
searchStrings.insert(searchStrings.end(), tempStrings.begin(), tempStrings.end());
searchFiles.insert(searchFiles.end(), tempFiles.begin(), tempFiles.end());
sort(searchStrings.begin(), searchStrings.end());
sort(searchFiles.begin(), searchFiles.end());
searchFiles.erase(unique(searchFiles.begin(), searchFiles.end()), searchFiles.end());
if (searchAllFilesForAllStrings == true)
{
searchStrings.erase(unique(searchStrings.begin(), searchStrings.end()), searchStrings.end());
}
int setNext = -1;
vector<int> filesRepeat;
vector<int> stringsRepeat;
size_t stringsCount = 0;
size_t filesCount = 0;
// Loops to get rid of duplicate search strings + duplicate files.
// Dont get rid of duplicates if only searching each file for each subsequent string because of how the code is structured;
for (int i = 0; i < searchStrings.size(); i++) { if (searchStrings.at(i).compare("Search Strings") == 0) { searchStrings.erase(searchStrings.begin() + i); } }
for (int i = 0; i < searchFiles.size(); i++) { if (searchFiles.at(i).compare("Search Files") == 0) { searchFiles.erase(searchFiles.begin() + i); } }
// Loops to get rid of wildstar patterns that are included (these can't be searched)
int idx = 0;
int startCount = searchFiles.size();
while (idx < startCount)
{
if (contains(searchFiles.at(idx), "*", 1) == 1)
{
searchFiles.erase(searchFiles.begin() + idx);
startCount = startCount - 1;
idx = 0;
}
idx = idx + 1;
}
// Loop to deal with each search string and format it for regex searching later.
// only pull strings that are non wildcard containing. everything else can be normally searched which will save time.
for (unsigned int jj = 0; jj < searchStrings.size(); jj++)
{
if (contains(searchStrings.at(jj), "XYY", 0) == 1 || contains(searchStrings.at(jj), "XYZ", 0) == 1)
{
regex_patterns.push_back(replaceWildCards(searchStrings.at(jj)));
regexIndex.push_back(true);
}
else
{
nonWildCardSearchStrings.push_back(searchStrings.at(jj));
regexIndex.push_back(false);
}
}
return true;
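The body of replaceWildCards is not shown in this post. A minimal sketch of what such a translation could look like, assuming the input has already been regex-escaped (as addEscapes is meant to do) and using the hypothetical names replaceAll and replaceWildCardsSketch:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replaces every occurrence of `from` in `s` with `to`.
void replaceAll(std::string& s, const std::string& from, const std::string& to) {
    assert(!from.empty());  // an empty token would never advance the search
    std::size_t pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();  // continue after the inserted text
    }
}

// Turns the placeholder tokens into regex syntax:
//   XYY -> "."   (exactly one character)
//   XYZ -> ".*"  (any number of characters)
// Assumes every other regex metacharacter in `s` was escaped beforehand.
std::string replaceWildCardsSketch(std::string s) {
    replaceAll(s, "XYY", ".");
    replaceAll(s, "XYZ", ".*");
    return s;
}
```

With this sketch, "fooXYYbar" becomes the pattern "foo.bar" and "aXYZb" becomes "a.*b".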
found and nonwcfound are the variables used to check whether a match was found and, if so, to save the line text and line number in vectors. fileName and folderName are variables that are also saved so they can be written to the output file.
size_t found;
bool nonwcfound = false;
smatch matches;
vector<regex> expressions;
for (int i = 0; i < regex_patterns.size(); i++) { expressions.emplace_back(regex_patterns.at(i)); }
if (searchAllFilesForAllStrings == true)
{
ofstream myOutPut;
myOutPut.open(outputFilePath, std::ofstream::out | std::ofstream::app);
myOutPut << "Line Text, Line Number, File Name, Folder Path," << "\n";
myOutPut.close();
for (size_t j = 0; j < searchFiles.size();j++)
{
// Initialize variables for line number + text
vector<int> lineNumber;
vector<string> lineText;
vector<string>lineStrings;
string entireFile;
// Get file and folder name for storage.
string fileName;
string folderName;
fileName = searchFiles.at(j);
int fileNameSlashIdx = fileName.rfind("\\");
folderName = fileName.substr(0, fileNameSlashIdx);
fileName = fileName.substr(fileNameSlashIdx + 1, string::npos);
// File ifstream definition/opening
ifstream file;
file.open(searchFiles.at(j), ios::in | ios::ate);
// Fill and close file
if (file)
{
ifstream::streampos filesize = file.tellg();
entireFile.reserve(filesize);
file.seekg(0);
while (!file.eof())
{
entireFile += file.get();
}
}
file.close();
int linecount = 0;
stringstream stream(entireFile);
while (1)
{
string line;
getline(stream, line);
if (!stream.good())
break;
for (size_t r = 0; r < expressions.size(); r++)
{
found = regex_search(line, matches, expressions.at(r));
if (found == 1)
{
lineNumber.push_back(linecount);
lineText.push_back(line);
}
}
for (size_t rr = 0; rr < nonWildCardSearchStrings.size(); rr++)
{
nonwcfound = contains(line, nonWildCardSearchStrings.at(rr), 0);
if (nonwcfound == true)
{
lineNumber.push_back(linecount);
lineText.push_back(line);
}
}
linecount = linecount + 1;
}
entireFile.clear();
ofstream myOutPut;
myOutPut.open(outputFilePath, std::ofstream::out | std::ofstream::app);
{
tuple<vector<string>, vector<int>, string, string>result = make_tuple(lineText, lineNumber, fileName, folderName);
writeResultsToFile(result, outputFilePath);
}
myOutPut.close();
}
}
if (searchAllFilesForAllStrings == false)
{
// Do the same thing as above, except that it will search each file/batch of files only with the
// subsequent search string in the same row of the excel file that is read in using the above function.
}
MessageBox::Show("Finished execution. Your file is now available for viewing!", "Output Excel File Written");
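For comparison, here is a minimal sketch of the per-file loop that reads lines straight from the stream instead of first copying the file character by character into a string. The names Hit, searchStream and searchText are hypothetical, only the plain (non-wildcard) strings are handled, and each matching line is recorded once. Line numbers start at 0, as in the code above. One behavioural difference to be aware of: because the loop condition is the result of std::getline, a last line without a trailing newline is still searched.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One recorded match: zero-based line number plus the line text.
struct Hit {
    int lineNumber;
    std::string text;
};

// Scans the stream line by line and records every line that contains
// at least one of the plain search strings.
std::vector<Hit> searchStream(std::istream& in,
                              const std::vector<std::string>& literals) {
    std::vector<Hit> hits;
    std::string line;
    int lineNumber = 0;
    while (std::getline(in, line)) {  // stops cleanly at end of input
        for (const std::string& lit : literals) {
            assert(!lit.empty());  // an empty string would match every line
            if (line.find(lit) != std::string::npos) {
                hits.push_back({lineNumber, line});
                break;  // record each line only once
            }
        }
        ++lineNumber;
    }
    return hits;
}

// Convenience wrapper for text that is already in memory.
std::vector<Hit> searchText(const std::string& text,
                            const std::vector<std::string>& literals) {
    std::istringstream in(text);
    return searchStream(in, literals);
}
```

Since searchStream takes a std::istream&, an opened std::ifstream can be passed in directly for a real file.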
c++ performance beginner
$endgroup$
$begingroup$
1. Did you use a profiler to profile the program? We can guess about performance, but since this isn't a complete, compilable program it's just guessing. You have the power to actually measure it and see. 2. Please add the function declarations. At the moment we have to guess the types of the function parameters (e.g. excelPath).
$endgroup$
– user673679
8 hours ago
$begingroup$
Have to vote to close as it stands. There are no functions here just raw code so impossible to review as it stands (no context). But there is a lot to talk about (efficiency and improvements and design) with each function. Please resubmit each function with a test harness to show that that function works as expected and can be tested by us.
$endgroup$
– Martin York
3 hours ago
$begingroup$
Please submit each function separately for its own review.
$endgroup$
– Martin York
3 hours ago
add a comment |
$begingroup$
Basically, I've got a GUI where the user can select text files, enter in strings (that may or may not contain wildcards) and search those files for those strings.
Currently, I take the user inputted string(s) and divide them into two groups: regularly searchable strings, and strings that have wildcards (either * for 1 wildcard character, or .* for any amount).
If they are regularly searchable strings, then I use the normal str.find
function (as I tested this vs regex_search and this is faster), otherwise I use the regex_search
function.
The main problem is performance. As a benchmark comparison, it takes my program roughly 47 minutes and it is searching through 5,028,712 lines. From trying to figure this problem out on google it seems this entire search should take me well under a minute...
searchAllFilesForAllStrings
-> bool checkbox on the GUI where if it is set to true, the program will just search every file for every string, and false, will only search each "batch" of files (if the filepath includes a wildcard, such as "Read*.txt -> all files starting with Read and that are .txt files will be chosen)
Global variables:
vector<string> regex_patterns;
vector<string> excelFiles;
vector<string> nonWildCardSearchStrings;
vector<string> searchStrings;
vector<string> searchFiles;
vector<bool> regexIndex;
This function gets called from a GUI button using an editable text box that contains the path to the file. It essentially grabs strings and files from an excel file formatted with 2 columns, one column search strings, and one column search files. In search strings, XYY denotes 1 wildcard (because the user should be able to search for a * if they desire; XYZ denotes any # of wildcards):
ifstream excelFile;
string line;
string delimiter = ",";
int bIdxStrings = 0;
int bIdxFiles = 0;
for (int i = 0; i < excelPath.size(); i++)
{
excelFile.open(excelPath.at(i));
if (excelFile.is_open()) {}
else { return false; }
int index = 0;
while (getline(excelFile, line))
{
searchStrings.push_back(line.substr(0, line.find(delimiter)));
searchFiles.push_back(line.substr(line.find(delimiter) + 1));
index = index + 1;
}
searchStrings.erase(searchStrings.begin() + bIdxStrings);
searchFiles.erase(searchFiles.begin() + bIdxFiles);
bIdxStrings = searchStrings.size() + 1;
bIdxFiles = searchFiles.size() + 1;
for (int i = 0; i < searchStrings.size(); i++)
{
searchStrings.at(i) = addEscapes(searchStrings.at(i));
}
}
excelFile.close();
string key = "\";
size_t foundLast = 0;
string wcPath = "";
tuple<vector<string>, string> addWildCardFiles;
vector<string>test;
string holdTemp = "";
string regSearch = "";
string regtemp = "";
string fullPath = "";
vector<string>tempFiles;
vector<string>tempStrings;
// Search for wildcard paths. if any exist, find files based on their main directory. will not recursively search.
for (int i = 0; i < searchFiles.size();i++)
{
size_t found = searchFiles.at(i).find("*");
if (found != string::npos)
{
// temporarily hold the search string corresponding to this entry.
holdTemp = searchStrings.at(i);
foundLast = searchFiles.at(i).rfind(key);
wcPath = searchFiles.at(i).substr(0, foundLast);
regSearch = searchFiles.at(i).substr(foundLast + 1, string::npos);
regtemp = regSearch.substr(0, regSearch.find("*"));
regtemp.append(".*");
regtemp.append(regSearch.substr(regSearch.find("*") + 1, string::npos));
regSearch = regtemp;
smatch matchez;
// Should make regex search case insensitive.
regex e(regSearch, regex_constants::icase);
//searchFiles.erase(searchFiles.begin() + i);
//searchStrings.erase(searchStrings.begin() + i);
// All files in the directory:
addWildCardFiles = read_directory(wcPath, test);
for (int m = 0; m < get<0>(addWildCardFiles).size(); m++)
{
size_t wcBool = regex_search(get<0>(addWildCardFiles)[m], matchez, e);
if (wcBool == 1)
{
fullPath.append(wcPath); fullPath.append("\");
fullPath.append(get<0>(addWildCardFiles)[m]);
tempFiles.push_back(fullPath);
tempStrings.push_back(holdTemp);
}
fullPath = "";
}
}
}
searchStrings.insert(searchStrings.end(), tempStrings.begin(), tempStrings.end());
searchFiles.insert(searchFiles.end(), tempFiles.begin(), tempFiles.end());
sort(searchStrings.begin(), searchStrings.end());
sort(searchFiles.begin(), searchFiles.end());
searchFiles.erase(unique(searchFiles.begin(), searchFiles.end()), searchFiles.end());
if (searchAllFilesForAllStrings == true)
{
searchStrings.erase(unique(searchStrings.begin(), searchStrings.end()), searchStrings.end());
}
int setNext = -1;
vector<int> filesRepeat;
vector<int> stringsRepeat;
size_t stringsCount = 0;
size_t filesCount = 0;
// Loops to get rid of duplicate search strings + duplicate files.
// Dont get rid of duplicates if only searching each file for each subsequent string because of how the code is structured;
for (int i = 0; i < searchStrings.size(); i++) { if (searchStrings.at(i).compare("Search Strings") == 0) { searchStrings.erase(searchStrings.begin() + i); } }
for (int i = 0; i < searchFiles.size(); i++) { if (searchFiles.at(i).compare("Search Files") == 0) { searchFiles.erase(searchFiles.begin() + i); } }
// Loops to get rid of wildstar patterns that are included (these can't be searched)
int idx = 0;
int startCount = searchFiles.size();
while (idx < startCount)
{
if (contains(searchFiles.at(idx), "*", 1) == 1)
{
searchFiles.erase(searchFiles.begin() + idx);
startCount = startCount - 1;
idx = 0;
}
idx = idx + 1;
}
// Loop to deal with each search string and format it for regex searching later.
// only pull strings that are non wildcard containing. everything else can be normally searched which will save time.
for (unsigned int jj = 0; jj < searchStrings.size(); jj++)
{
if (contains(searchStrings.at(jj), "XYY", 0) == 1 || contains(searchStrings.at(jj), "XYZ", 0) == 1)
{
regex_patterns.push_back(replaceWildCards(searchStrings.at(jj)));
regexIndex.push_back(true);
}
else
{
nonWildCardSearchStrings.push_back(searchStrings.at(jj));
regexIndex.push_back(false);
}
}
return true;
found
and nonwcfound
are the variables used to check if a match was found and subsequently to save the line text and line number in vectors. filename
and foldername
are variables that will also be saved to be outputted to the file.
size_t found;
bool nonwcfound = false;
smatch matches;
vector<regex> expressions;
for (int i = 0; i < regex_patterns.size(); i++) { expressions.emplace_back(regex_patterns.at(i)); }
if (searchAllFilesForAllStrings == true)
{
ofstream myOutPut;
myOutPut.open(outputFilePath, std::ofstream::out | std::ofstream::app);
myOutPut << "Line Text, Line Number, File Name, Folder Path," << "n";
myOutPut.close();
for (size_t j = 0; j < searchFiles.size();j++)
{
// Initialize variables for line number + text
vector<int> lineNumber;
vector<string> lineText;
vector<string>lineStrings;
string entireFile;
// Get file and folder name for storage.
string fileName;
string folderName;
fileName = searchFiles.at(j);
int fileNameSlashIdx = fileName.rfind("\");
folderName = fileName.substr(0, fileNameSlashIdx);
fileName = fileName.substr(fileNameSlashIdx + 1, string::npos);
// File ifstream definition/opening
ifstream file;
file.open(searchFiles.at(j), ios::in | ios::ate);
// Fill and close file
if (file)
{
ifstream::streampos filesize = file.tellg();
entireFile.reserve(filesize);
file.seekg(0);
while (!file.eof())
{
entireFile += file.get();
}
}
file.close();
int linecount = 0;
stringstream stream(entireFile);
while (1)
{
string line;
getline(stream, line);
if (!stream.good())
break;
for (size_t r = 0; r < expressions.size(); r++)
{
found = regex_search(line, matches, expressions.at(r));
if (found == 1)
{
lineNumber.push_back(linecount);
lineText.push_back(line);
}
}
for (size_t rr = 0; rr < nonWildCardSearchStrings.size(); rr++)
{
nonwcfound = contains(line, nonWildCardSearchStrings.at(rr), 0);
if (nonwcfound == true)
{
lineNumber.push_back(linecount);
lineText.push_back(line);
}
}
linecount = linecount + 1;
}
entireFile.clear();
ofstream myOutPut;
myOutPut.open(outputFilePath, std::ofstream::out | std::ofstream::app);
{
tuple<vector<string>, vector<int>, string, string>result = make_tuple(lineText, lineNumber, fileName, folderName);
writeResultsToFile(result, outputFilePath);
}
myOutPut.close();
}
}
if (searchAllFilesForAllStrings == false)
{
// Do the same thing as above, except that it will search each file/batch of files only with the
// subsequent search string in the same row of the excel file that is read in using the above function.
}
MessageBox::Show("Finished execution. Your file is now available for viewing!", "Output Excel File Written");
c++ performance beginner
New contributor
$endgroup$
Basically, I've got a GUI where the user can select text files, enter in strings (that may or may not contain wildcards) and search those files for those strings.
Currently, I take the user inputted string(s) and divide them into two groups: regularly searchable strings, and strings that have wildcards (either * for 1 wildcard character, or .* for any amount).
If they are regularly searchable strings, then I use the normal str.find
function (as I tested this vs regex_search and this is faster), otherwise I use the regex_search
function.
The main problem is performance. As a benchmark comparison, it takes my program roughly 47 minutes and it is searching through 5,028,712 lines. From trying to figure this problem out on google it seems this entire search should take me well under a minute...
searchAllFilesForAllStrings
-> bool checkbox on the GUI where if it is set to true, the program will just search every file for every string, and false, will only search each "batch" of files (if the filepath includes a wildcard, such as "Read*.txt -> all files starting with Read and that are .txt files will be chosen)
Global variables:
vector<string> regex_patterns;
vector<string> excelFiles;
vector<string> nonWildCardSearchStrings;
vector<string> searchStrings;
vector<string> searchFiles;
vector<bool> regexIndex;
This function gets called from a GUI button using an editable text box that contains the path to the file. It essentially grabs strings and files from an excel file formatted with 2 columns, one column search strings, and one column search files. In search strings, XYY denotes 1 wildcard (because the user should be able to search for a * if they desire; XYZ denotes any # of wildcards):
ifstream excelFile;
string line;
string delimiter = ",";
int bIdxStrings = 0;
int bIdxFiles = 0;
for (int i = 0; i < excelPath.size(); i++)
{
excelFile.open(excelPath.at(i));
if (excelFile.is_open()) {}
else { return false; }
int index = 0;
while (getline(excelFile, line))
{
searchStrings.push_back(line.substr(0, line.find(delimiter)));
searchFiles.push_back(line.substr(line.find(delimiter) + 1));
index = index + 1;
}
searchStrings.erase(searchStrings.begin() + bIdxStrings);
searchFiles.erase(searchFiles.begin() + bIdxFiles);
bIdxStrings = searchStrings.size() + 1;
bIdxFiles = searchFiles.size() + 1;
for (int i = 0; i < searchStrings.size(); i++)
{
searchStrings.at(i) = addEscapes(searchStrings.at(i));
}
}
excelFile.close();
string key = "\";
size_t foundLast = 0;
string wcPath = "";
tuple<vector<string>, string> addWildCardFiles;
vector<string>test;
string holdTemp = "";
string regSearch = "";
string regtemp = "";
string fullPath = "";
vector<string>tempFiles;
vector<string>tempStrings;
// Search for wildcard paths. if any exist, find files based on their main directory. will not recursively search.
for (int i = 0; i < searchFiles.size();i++)
{
size_t found = searchFiles.at(i).find("*");
if (found != string::npos)
{
// temporarily hold the search string corresponding to this entry.
holdTemp = searchStrings.at(i);
foundLast = searchFiles.at(i).rfind(key);
wcPath = searchFiles.at(i).substr(0, foundLast);
regSearch = searchFiles.at(i).substr(foundLast + 1, string::npos);
regtemp = regSearch.substr(0, regSearch.find("*"));
regtemp.append(".*");
regtemp.append(regSearch.substr(regSearch.find("*") + 1, string::npos));
regSearch = regtemp;
smatch matchez;
// Should make regex search case insensitive.
regex e(regSearch, regex_constants::icase);
//searchFiles.erase(searchFiles.begin() + i);
//searchStrings.erase(searchStrings.begin() + i);
// All files in the directory:
addWildCardFiles = read_directory(wcPath, test);
for (int m = 0; m < get<0>(addWildCardFiles).size(); m++)
{
size_t wcBool = regex_search(get<0>(addWildCardFiles)[m], matchez, e);
if (wcBool == 1)
{
fullPath.append(wcPath); fullPath.append("\\");
fullPath.append(get<0>(addWildCardFiles)[m]);
tempFiles.push_back(fullPath);
tempStrings.push_back(holdTemp);
}
fullPath = "";
}
}
}
searchStrings.insert(searchStrings.end(), tempStrings.begin(), tempStrings.end());
searchFiles.insert(searchFiles.end(), tempFiles.begin(), tempFiles.end());
sort(searchStrings.begin(), searchStrings.end());
sort(searchFiles.begin(), searchFiles.end());
searchFiles.erase(unique(searchFiles.begin(), searchFiles.end()), searchFiles.end());
if (searchAllFilesForAllStrings == true)
{
searchStrings.erase(unique(searchStrings.begin(), searchStrings.end()), searchStrings.end());
}
int setNext = -1;
vector<int> filesRepeat;
vector<int> stringsRepeat;
size_t stringsCount = 0;
size_t filesCount = 0;
// Duplicate search strings and files were removed above. Duplicate strings are kept when each
// file is only searched for the string in its own row, because of how the code is structured.
// Remove the header entries that came from the first row of the Excel file.
searchStrings.erase(std::remove(searchStrings.begin(), searchStrings.end(), string("Search Strings")), searchStrings.end());
searchFiles.erase(std::remove(searchFiles.begin(), searchFiles.end(), string("Search Files")), searchFiles.end());
// Remove the wildcard patterns that are still in the list (these can't be searched).
int idx = 0;
while (idx < static_cast<int>(searchFiles.size()))
{
if (contains(searchFiles.at(idx), "*", 1) == 1)
{
// Erasing shifts the next element into idx, so do not advance here.
searchFiles.erase(searchFiles.begin() + idx);
}
else
{
idx = idx + 1;
}
}
// Loop to deal with each search string and format it for regex searching later.
// only pull strings that are non wildcard containing. everything else can be normally searched which will save time.
for (unsigned int jj = 0; jj < searchStrings.size(); jj++)
{
if (contains(searchStrings.at(jj), "XYY", 0) == 1 || contains(searchStrings.at(jj), "XYZ", 0) == 1)
{
regex_patterns.push_back(replaceWildCards(searchStrings.at(jj)));
regexIndex.push_back(true);
}
else
{
nonWildCardSearchStrings.push_back(searchStrings.at(jj));
regexIndex.push_back(false);
}
}
return true;
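The helpers addEscapes and replaceWildCards are not shown in the question, so the following is only a guess at what the XYY/XYZ translation could look like. It assumes that XYY stands for exactly one arbitrary character and XYZ for any number of characters, and it does the escaping and the token replacement in a single pass so that neither step can disturb the other. The name `toRegexPattern` is hypothetical.

```cpp
#include <regex>
#include <string>

// Assumed token meanings (the post does not show addEscapes/replaceWildCards):
//   XYY -> exactly one arbitrary character, XYZ -> any number of characters.
std::string toRegexPattern(const std::string& searchString)
{
    const std::string special = "\\^$.|?*+()[]{}";
    std::string out;
    std::size_t i = 0;
    while (i < searchString.size())
    {
        if (searchString.compare(i, 3, "XYY") == 0)
        {
            out += ".";         // one arbitrary character
            i += 3;
        }
        else if (searchString.compare(i, 3, "XYZ") == 0)
        {
            out += ".*";        // any number of characters
            i += 3;
        }
        else
        {
            const char c = searchString[i];
            if (special.find(c) != std::string::npos)
            {
                out += '\\';    // keep characters such as '*' literal
            }
            out += c;
            ++i;
        }
    }
    return out;
}
```

With this sketch a search string such as fooXYZbar would become the pattern foo.*bar, while a*b would become a\*b and only match a literal asterisk.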
found and nonwcfound are the variables used to check whether a match was found and, if so, to save the line text and line number in vectors. fileName and folderName are variables that are also saved so that they can be written to the output file.
size_t found;
bool nonwcfound = false;
smatch matches;
vector<regex> expressions;
for (int i = 0; i < regex_patterns.size(); i++) { expressions.emplace_back(regex_patterns.at(i)); }
if (searchAllFilesForAllStrings == true)
{
ofstream myOutPut;
myOutPut.open(outputFilePath, std::ofstream::out | std::ofstream::app);
myOutPut << "Line Text, Line Number, File Name, Folder Path," << "\n";
myOutPut.close();
for (size_t j = 0; j < searchFiles.size();j++)
{
// Initialize variables for line number + text
vector<int> lineNumber;
vector<string> lineText;
vector<string>lineStrings;
string entireFile;
// Get file and folder name for storage.
string fileName;
string folderName;
fileName = searchFiles.at(j);
int fileNameSlashIdx = fileName.rfind("\\");
folderName = fileName.substr(0, fileNameSlashIdx);
fileName = fileName.substr(fileNameSlashIdx + 1, string::npos);
// File ifstream definition/opening
ifstream file;
file.open(searchFiles.at(j), ios::in | ios::ate);
// Fill and close file
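if (file)
{
ifstream::streampos filesize = file.tellg();
entireFile.reserve(static_cast<size_t>(filesize));
file.seekg(0);
// Read until get() fails; testing eof() before reading would append a stray EOF character.
char c;
while (file.get(c))
{
entireFile += c;
}
}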
file.close();
int linecount = 0;
stringstream stream(entireFile);
while (1)
{
string line;
// Stop only when nothing was read, so a final line without a trailing newline is still searched.
if (!getline(stream, line))
break;
for (size_t r = 0; r < expressions.size(); r++)
{
found = regex_search(line, matches, expressions.at(r));
if (found == 1)
{
lineNumber.push_back(linecount);
lineText.push_back(line);
}
}
for (size_t rr = 0; rr < nonWildCardSearchStrings.size(); rr++)
{
nonwcfound = contains(line, nonWildCardSearchStrings.at(rr), 0);
if (nonwcfound == true)
{
lineNumber.push_back(linecount);
lineText.push_back(line);
}
}
linecount = linecount + 1;
}
entireFile.clear();
ofstream myOutPut;
myOutPut.open(outputFilePath, std::ofstream::out | std::ofstream::app);
{
tuple<vector<string>, vector<int>, string, string>result = make_tuple(lineText, lineNumber, fileName, folderName);
writeResultsToFile(result, outputFilePath);
}
myOutPut.close();
}
}
if (searchAllFilesForAllStrings == false)
{
// Do the same thing as above, except that it will search each file/batch of files only with the
// subsequent search string in the same row of the excel file that is read in using the above function.
}
MessageBox::Show("Finished execution. Your file is now available for viewing!", "Output Excel File Written");
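As a point of comparison for the per-file loop above, here is a minimal sketch of the same line scan written against a generic input stream. It is not the code from the post: `Match` and `searchLines` are hypothetical names, literal strings are checked with `std::string::find` (the behaviour of the post's contains helper is not shown), line numbers are 1-based, and each line is recorded at most once even if several patterns match it.

```cpp
#include <istream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// One matching line: 1-based line number plus the line's text.
struct Match
{
    int lineNumber;
    std::string text;
};

// Hypothetical helper: scan a stream once, recording each line that matches
// any precompiled regex or contains any literal string.
std::vector<Match> searchLines(std::istream& in,
                               const std::vector<std::regex>& expressions,
                               const std::vector<std::string>& literals)
{
    std::vector<Match> matches;
    std::string line;
    int lineNumber = 0;
    while (std::getline(in, line))   // also handles a last line without '\n'
    {
        ++lineNumber;
        bool hit = false;
        for (const std::string& literal : literals)   // cheap checks first
        {
            if (line.find(literal) != std::string::npos) { hit = true; break; }
        }
        for (std::size_t r = 0; !hit && r < expressions.size(); ++r)
        {
            hit = std::regex_search(line, expressions[r]);
        }
        if (hit)
        {
            matches.push_back({lineNumber, line});
        }
    }
    return matches;
}
```

Because the function takes a `std::istream&`, an open `std::ifstream` could be passed to it directly, which would avoid copying the whole file into a string one character at a time before searching it. Whether that is actually faster for these files would have to be measured.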
c++ performance beginner
edited 6 mins ago by Jamal♦
asked 11 hours ago by ace1eric (new contributor)
1. Did you use a profiler to profile the program? We can guess about performance, but since this isn't a complete, compilable program it's just guessing. You have the power to actually measure it and see. 2. Please add the function declarations. At the moment we have to guess the types of the function parameters (e.g. excelPath).
– user673679 8 hours ago
Have to vote to close as it stands. There are no functions here, just raw code, so it is impossible to review (no context). But there is a lot to talk about (efficiency, improvements, and design) with each function. Please resubmit each function with a test harness to show that the function works as expected and can be tested by us.
– Martin York 3 hours ago
Please submit each function separately for its own review.
– Martin York 3 hours ago
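On the measuring point raised in the first comment: short of running a real profiler, a rough first number can be obtained by timing a call with `std::chrono`. The sketch below is only an illustration; `elapsedMilliseconds` is a hypothetical helper, and a wall-clock figure like this says how long a step took, not where inside it the time went.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: run any callable and return the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <typename Callable>
double elapsedMilliseconds(Callable&& work)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Callable>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping the file-reading step and the line-searching step in separate calls, for example `elapsedMilliseconds([&] { /* read one file */ })`, would at least show which of the two dominates before any code is changed.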