

GUI that searches files using user inputted search strings



$begingroup$


I have a GUI where the user can select text files, enter search strings (which may or may not contain wildcards), and search those files for those strings.



Currently, I take the user-supplied strings and divide them into two groups: plain strings that can be searched literally, and strings that contain wildcards (either * for one wildcard character, or .* for any number of characters).



For plain strings I use the normal str.find function (I tested it against regex_search and it is faster); for wildcard strings I use regex_search.
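The split described above can be sketched roughly as follows. This is only an illustration under assumptions: `SearchPatterns` and `lineMatches` are hypothetical names that do not appear in the code below, and the sketch stops at the first match on a line, whereas the posted code records a line once for every pattern that matches it.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical container: patterns are split once, up front, into two groups.
struct SearchPatterns {
    std::vector<std::string> literals;   // searched with std::string::find
    std::vector<std::regex>  wildcards;  // compiled once, reused for every line
};

// Returns true if the line contains any literal or matches any wildcard pattern.
bool lineMatches(const std::string& line, const SearchPatterns& patterns) {
    for (const std::string& literal : patterns.literals) {
        if (line.find(literal) != std::string::npos) return true;
    }
    for (const std::regex& re : patterns.wildcards) {
        if (std::regex_search(line, re)) return true;
    }
    return false;
}
```

Checking the cheap literal group first means the regex engine only runs on lines that no plain string has already matched.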



The main problem is performance. As a benchmark, my program takes roughly 47 minutes to search through 5,028,712 lines. From what I could find on Google, this entire search should take well under a minute...



searchAllFilesForAllStrings is a bool bound to a checkbox on the GUI. If it is true, the program searches every file for every string. If it is false, it searches each "batch" of files only for the search string on the same row (a file path may include a wildcard, such as "Read*.txt", in which case all .txt files whose names start with Read are chosen).
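To make the "Read*.txt" batch selection concrete, here is a minimal sketch of turning such a file-name wildcard into a case-insensitive regex. `globToRegex` and `fileNameMatches` are hypothetical helpers, not part of the posted code. Unlike the posted code, the sketch escapes regex metacharacters such as the dot, replaces every `*` rather than only the first, and anchors the match with `regex_match`.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: turns a file-name wildcard such as "Read*.txt"
// into a case-insensitive regex. Only '*' is treated as a wildcard.
std::regex globToRegex(const std::string& glob) {
    static const std::string meta = "\\^$.|?+()[]{}";
    std::string pattern;
    for (char c : glob) {
        if (c == '*') {
            pattern += ".*";              // '*' matches any run of characters
        } else if (meta.find(c) != std::string::npos) {
            pattern += '\\';              // escape regex metacharacters
            pattern += c;
        } else {
            pattern += c;
        }
    }
    return std::regex(pattern, std::regex_constants::icase);
}

// The whole file name must match, so "ReadMe.txt.bak" is rejected.
bool fileNameMatches(const std::string& fileName, const std::regex& re) {
    return std::regex_match(fileName, re);
}
```

Whether anchoring is wanted depends on the intended semantics; with `regex_search`, as in the posted code, a pattern like "Read*.txt" also accepts names that merely contain a matching substring.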



Global variables:



vector<string> regex_patterns;
vector<string> excelFiles;
vector<string> nonWildCardSearchStrings;
vector<string> searchStrings;
vector<string> searchFiles;
vector<bool> regexIndex;


This function is called from a GUI button, using an editable text box that contains the path to the file. It reads search strings and search files from an Excel file formatted with two columns: one for search strings and one for search files. In search strings, XYY denotes a single wildcard character (because the user should be able to search for a literal * if they desire) and XYZ denotes any number of wildcard characters:



ifstream excelFile;
string line;
string delimiter = ",";
int bIdxStrings = 0;
int bIdxFiles = 0;
for (int i = 0; i < excelPath.size(); i++)
{
    excelFile.open(excelPath.at(i));
    if (excelFile.is_open()) {}
    else { return false; }
    int index = 0;
    while (getline(excelFile, line))
    {
        searchStrings.push_back(line.substr(0, line.find(delimiter)));
        searchFiles.push_back(line.substr(line.find(delimiter) + 1));
        index = index + 1;
    }
    searchStrings.erase(searchStrings.begin() + bIdxStrings);
    searchFiles.erase(searchFiles.begin() + bIdxFiles);
    bIdxStrings = searchStrings.size() + 1;
    bIdxFiles = searchFiles.size() + 1;
    for (int i = 0; i < searchStrings.size(); i++)
    {
        searchStrings.at(i) = addEscapes(searchStrings.at(i));
    }
}
excelFile.close();
string key = "\\";
size_t foundLast = 0;
string wcPath = "";
tuple<vector<string>, string> addWildCardFiles;
vector<string> test;
string holdTemp = "";
string regSearch = "";
string regtemp = "";
string fullPath = "";
vector<string> tempFiles;
vector<string> tempStrings;
// Search for wildcard paths. if any exist, find files based on their main directory. will not recursively search.
for (int i = 0; i < searchFiles.size(); i++)
{
    size_t found = searchFiles.at(i).find("*");
    if (found != string::npos)
    {
        // temporarily hold the search string corresponding to this entry.
        holdTemp = searchStrings.at(i);
        foundLast = searchFiles.at(i).rfind(key);
        wcPath = searchFiles.at(i).substr(0, foundLast);
        regSearch = searchFiles.at(i).substr(foundLast + 1, string::npos);

        regtemp = regSearch.substr(0, regSearch.find("*"));
        regtemp.append(".*");
        regtemp.append(regSearch.substr(regSearch.find("*") + 1, string::npos));
        regSearch = regtemp;
        smatch matchez;
        // Should make regex search case insensitive.
        regex e(regSearch, regex_constants::icase);
        //searchFiles.erase(searchFiles.begin() + i);
        //searchStrings.erase(searchStrings.begin() + i);
        // All files in the directory:
        addWildCardFiles = read_directory(wcPath, test);
        for (int m = 0; m < get<0>(addWildCardFiles).size(); m++)
        {
            size_t wcBool = regex_search(get<0>(addWildCardFiles)[m], matchez, e);
            if (wcBool == 1)
            {
                fullPath.append(wcPath); fullPath.append("\\");
                fullPath.append(get<0>(addWildCardFiles)[m]);
                tempFiles.push_back(fullPath);
                tempStrings.push_back(holdTemp);
            }
            fullPath = "";
        }
    }
}

searchStrings.insert(searchStrings.end(), tempStrings.begin(), tempStrings.end());
searchFiles.insert(searchFiles.end(), tempFiles.begin(), tempFiles.end());
sort(searchStrings.begin(), searchStrings.end());
sort(searchFiles.begin(), searchFiles.end());
searchFiles.erase(unique(searchFiles.begin(), searchFiles.end()), searchFiles.end());
if (searchAllFilesForAllStrings == true)
{
    searchStrings.erase(unique(searchStrings.begin(), searchStrings.end()), searchStrings.end());
}

int setNext = -1;
vector<int> filesRepeat;
vector<int> stringsRepeat;
size_t stringsCount = 0;
size_t filesCount = 0;

// Loops to get rid of duplicate search strings + duplicate files.
// Dont get rid of duplicates if only searching each file for each subsequent string because of how the code is structured;
for (int i = 0; i < searchStrings.size(); i++) { if (searchStrings.at(i).compare("Search Strings") == 0) { searchStrings.erase(searchStrings.begin() + i); } }
for (int i = 0; i < searchFiles.size(); i++) { if (searchFiles.at(i).compare("Search Files") == 0) { searchFiles.erase(searchFiles.begin() + i); } }

// Loops to get rid of wildstar patterns that are included (these can't be searched)
int idx = 0;
int startCount = searchFiles.size();
while (idx < startCount)
{
    if (contains(searchFiles.at(idx), "*", 1) == 1)
    {
        searchFiles.erase(searchFiles.begin() + idx);
        startCount = startCount - 1;
        idx = 0;
    }
    idx = idx + 1;
}

// Loop to deal with each search string and format it for regex searching later.
// only pull strings that are non wildcard containing. everything else can be normally searched which will save time.
for (unsigned int jj = 0; jj < searchStrings.size(); jj++)
{
    if (contains(searchStrings.at(jj), "XYY", 0) == 1 || contains(searchStrings.at(jj), "XYZ", 0) == 1)
    {
        regex_patterns.push_back(replaceWildCards(searchStrings.at(jj)));
        regexIndex.push_back(true);
    }
    else
    {
        nonWildCardSearchStrings.push_back(searchStrings.at(jj));
        regexIndex.push_back(false);
    }
}
return true;
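Since addEscapes and replaceWildCards are not shown, the following is only a guess at what the XYY/XYZ translation might look like. `wildcardTokensToRegex` is a hypothetical stand-in that does both jobs in one pass: it escapes regex metacharacters in the literal parts and replaces XYY with `.` and XYZ with `.*`.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for the unshown addEscapes/replaceWildCards pair.
// XYY becomes "." (one character), XYZ becomes ".*" (any number of characters);
// every other regex metacharacter is escaped so it is matched literally.
std::string wildcardTokensToRegex(const std::string& input) {
    static const std::string meta = "\\^$.|?*+()[]{}";
    std::string pattern;
    std::size_t i = 0;
    while (i < input.size()) {
        if (input.compare(i, 3, "XYY") == 0) {
            pattern += '.';
            i += 3;
        } else if (input.compare(i, 3, "XYZ") == 0) {
            pattern += ".*";
            i += 3;
        } else {
            if (meta.find(input[i]) != std::string::npos) pattern += '\\';
            pattern += input[i];
            ++i;
        }
    }
    return pattern;
}
```

With this approach a user-typed `*` stays a literal asterisk, which is the reason given above for using the XYY and XYZ tokens in the first place.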


found and nonwcfound are used to check whether a match was found, and subsequently to save the line text and line number in vectors. fileName and folderName are also saved so they can be written to the output file.



size_t found;
bool nonwcfound = false;
smatch matches;
vector<regex> expressions;
for (int i = 0; i < regex_patterns.size(); i++) { expressions.emplace_back(regex_patterns.at(i)); }

if (searchAllFilesForAllStrings == true)
{
    ofstream myOutPut;
    myOutPut.open(outputFilePath, std::ofstream::out | std::ofstream::app);
    myOutPut << "Line Text, Line Number, File Name, Folder Path," << "\n";
    myOutPut.close();
    for (size_t j = 0; j < searchFiles.size(); j++)
    {
        // Initialize variables for line number + text
        vector<int> lineNumber;
        vector<string> lineText;
        vector<string> lineStrings;
        string entireFile;

        // Get file and folder name for storage.
        string fileName;
        string folderName;
        fileName = searchFiles.at(j);
        int fileNameSlashIdx = fileName.rfind("\\");
        folderName = fileName.substr(0, fileNameSlashIdx);
        fileName = fileName.substr(fileNameSlashIdx + 1, string::npos);

        // File ifstream definition/opening
        ifstream file;
        file.open(searchFiles.at(j), ios::in | ios::ate);

        // Fill and close file
        if (file)
        {
            ifstream::streampos filesize = file.tellg();
            entireFile.reserve(filesize);
            file.seekg(0);
            while (!file.eof())
            {
                entireFile += file.get();
            }
        }
        file.close();
        int linecount = 0;

        stringstream stream(entireFile);
        while (1)
        {
            string line;
            getline(stream, line);
            if (!stream.good())
                break;
            for (size_t r = 0; r < expressions.size(); r++)
            {
                found = regex_search(line, matches, expressions.at(r));
                if (found == 1)
                {
                    lineNumber.push_back(linecount);
                    lineText.push_back(line);
                }
            }
            for (size_t rr = 0; rr < nonWildCardSearchStrings.size(); rr++)
            {
                nonwcfound = contains(line, nonWildCardSearchStrings.at(rr), 0);
                if (nonwcfound == true)
                {
                    lineNumber.push_back(linecount);
                    lineText.push_back(line);
                }
            }
            linecount = linecount + 1;
        }

        entireFile.clear();
        ofstream myOutPut;
        myOutPut.open(outputFilePath, std::ofstream::out | std::ofstream::app);
        {
            tuple<vector<string>, vector<int>, string, string> result = make_tuple(lineText, lineNumber, fileName, folderName);
            writeResultsToFile(result, outputFilePath);
        }
        myOutPut.close();
    }
}
if (searchAllFilesForAllStrings == false)
{
    // Do the same thing as above, except that it will search each file/batch of files only with the
    // subsequent search string in the same row of the excel file that is read in using the above function.
}
MessageBox::Show("Finished execution. Your file is now available for viewing!", "Output Excel File Written");
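For comparison, here is a minimal sketch of the same per-line search that reads with `std::getline` directly from a stream instead of copying the file one character at a time into a string first. The names `Hit`, `searchStream` and `searchText` are hypothetical and not part of the posted code. It is not a measured fix for the 47 minutes, only a simpler baseline to profile against. Two behaviours differ from the posted loop: a matching line is recorded once rather than once per matching pattern, and a final line without a trailing newline is still searched.

```cpp
#include <istream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result record: one entry per matching line.
struct Hit {
    int lineNumber;    // zero-based, like linecount in the posted code
    std::string text;
};

// Reads the stream line by line and records each line that contains a
// literal or matches one of the precompiled regular expressions.
std::vector<Hit> searchStream(std::istream& in,
                              const std::vector<std::string>& literals,
                              const std::vector<std::regex>& expressions) {
    std::vector<Hit> hits;
    std::string line;
    int lineNumber = 0;
    while (std::getline(in, line)) {
        bool matched = false;
        for (const std::string& literal : literals) {
            if (line.find(literal) != std::string::npos) { matched = true; break; }
        }
        for (std::size_t r = 0; !matched && r < expressions.size(); ++r) {
            matched = std::regex_search(line, expressions[r]);
        }
        if (matched) hits.push_back({lineNumber, line});
        ++lineNumber;
    }
    return hits;
}

// Convenience wrapper for content that is already in memory.
std::vector<Hit> searchText(const std::string& text,
                            const std::vector<std::string>& literals,
                            const std::vector<std::regex>& expressions) {
    std::istringstream in(text);
    return searchStream(in, literals, expressions);
}
```

For a file on disk, an opened `std::ifstream` can be passed straight to `searchStream`, so the intermediate `entireFile` string and `stringstream` are not needed.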
















$endgroup$












  • 1. Did you use a profiler to profile the program? We can guess about performance, but since this isn't a complete, compilable program it's just guessing. You have the power to actually measure it and see. 2. Please add the function declarations. At the moment we have to guess the types of the function parameters (e.g. excelPath).
    – user673679, 8 hours ago

  • Have to vote to close as it stands. There are no functions here just raw code so impossible to review as it stands (no context). But there is a lot to talk about (efficiency and improvements and design) with each function. Please resubmit each function with a test harness to show that that function works as expected and can be tested by us.
    – Martin York, 3 hours ago

  • Please submit each function separately for its own review.
    – Martin York, 3 hours ago




































Basically, I've got a GUI where the user can select text files, enter in strings (that may or may not contain wildcards) and search those files for those strings.



Currently, I take the user inputted string(s) and divide them into two groups: regularly searchable strings, and strings that have wildcards (either * for 1 wildcard character, or .* for any amount).



If they are regularly searchable strings, then I use the normal str.find function (as I tested this vs regex_search and this is faster), otherwise I use the regex_search function.



The main problem is performance. As a benchmark comparison, it takes my program roughly 47 minutes and it is searching through 5,028,712 lines. From trying to figure this problem out on google it seems this entire search should take me well under a minute...



searchAllFilesForAllStrings -> bool checkbox on the GUI where if it is set to true, the program will just search every file for every string, and false, will only search each "batch" of files (if the filepath includes a wildcard, such as "Read*.txt -> all files starting with Read and that are .txt files will be chosen)



Global variables:



vector<string> regex_patterns;
vector<string> excelFiles;
vector<string> nonWildCardSearchStrings;
vector<string> searchStrings;
vector<string> searchFiles;
vector<bool> regexIndex;


This function is called from a GUI button, using an editable text box that contains the path to the file. It reads search strings and files from an Excel file formatted with two columns: one for search strings and one for search files. In search strings, XYY denotes one wildcard character (because the user should still be able to search for a literal * if they wish) and XYZ denotes any number of wildcard characters:



    ifstream excelFile;
    string line;
    string delimiter = ",";
    int bIdxStrings = 0;
    int bIdxFiles = 0;
    for (size_t i = 0; i < excelPath.size(); i++)
    {
        excelFile.open(excelPath.at(i));
        if (!excelFile.is_open()) { return false; }
        int index = 0;
        while (getline(excelFile, line))
        {
            searchStrings.push_back(line.substr(0, line.find(delimiter)));
            searchFiles.push_back(line.substr(line.find(delimiter) + 1));
            index = index + 1;
        }
        searchStrings.erase(searchStrings.begin() + bIdxStrings);
        searchFiles.erase(searchFiles.begin() + bIdxFiles);
        bIdxStrings = searchStrings.size() + 1;
        bIdxFiles = searchFiles.size() + 1;
        for (size_t k = 0; k < searchStrings.size(); k++)
        {
            searchStrings.at(k) = addEscapes(searchStrings.at(k));
        }
        // Close and reset the stream so that the next file can be opened.
        excelFile.close();
        excelFile.clear();
    }
    string key = "\\";
    size_t foundLast = 0;
    string wcPath = "";
    tuple<vector<string>, string> addWildCardFiles;
    vector<string> test;
    string holdTemp = "";
    string regSearch = "";
    string regtemp = "";
    string fullPath = "";
    vector<string> tempFiles;
    vector<string> tempStrings;
    // Search for wildcard paths. If any exist, find files based on their main directory. Will not search recursively.
    for (size_t i = 0; i < searchFiles.size(); i++)
    {
        size_t found = searchFiles.at(i).find("*");
        if (found != string::npos)
        {
            // Temporarily hold the search string corresponding to this entry.
            holdTemp = searchStrings.at(i);
            foundLast = searchFiles.at(i).rfind(key);
            wcPath = searchFiles.at(i).substr(0, foundLast);
            regSearch = searchFiles.at(i).substr(foundLast + 1, string::npos);

            regtemp = regSearch.substr(0, regSearch.find("*"));
            regtemp.append(".*");
            regtemp.append(regSearch.substr(regSearch.find("*") + 1, string::npos));
            regSearch = regtemp;
            smatch matchez;
            // Should make regex search case insensitive.
            regex e(regSearch, regex_constants::icase);
            //searchFiles.erase(searchFiles.begin() + i);
            //searchStrings.erase(searchStrings.begin() + i);
            // All files in the directory:
            addWildCardFiles = read_directory(wcPath, test);
            for (size_t m = 0; m < get<0>(addWildCardFiles).size(); m++)
            {
                bool wcBool = regex_search(get<0>(addWildCardFiles)[m], matchez, e);
                if (wcBool)
                {
                    fullPath.append(wcPath); fullPath.append("\\");
                    fullPath.append(get<0>(addWildCardFiles)[m]);
                    tempFiles.push_back(fullPath);
                    tempStrings.push_back(holdTemp);
                }
                fullPath = "";
            }
        }
    }

    searchStrings.insert(searchStrings.end(), tempStrings.begin(), tempStrings.end());
    searchFiles.insert(searchFiles.end(), tempFiles.begin(), tempFiles.end());
    sort(searchStrings.begin(), searchStrings.end());
    sort(searchFiles.begin(), searchFiles.end());
    searchFiles.erase(unique(searchFiles.begin(), searchFiles.end()), searchFiles.end());
    if (searchAllFilesForAllStrings == true)
    {
        searchStrings.erase(unique(searchStrings.begin(), searchStrings.end()), searchStrings.end());
    }

    int setNext = -1;
    vector<int> filesRepeat;
    vector<int> stringsRepeat;
    size_t stringsCount = 0;
    size_t filesCount = 0;

    // Loops to get rid of duplicate search strings + duplicate files.
    // Don't get rid of duplicates if only searching each file for each subsequent string, because of how the code is structured.
    // Also drop the header cells ("Search Strings" / "Search Files"); erase-remove does not skip consecutive entries.
    searchStrings.erase(remove(searchStrings.begin(), searchStrings.end(), "Search Strings"), searchStrings.end());
    searchFiles.erase(remove(searchFiles.begin(), searchFiles.end(), "Search Files"), searchFiles.end());

    // Loop to get rid of wildcard patterns that are still included (these can't be searched).
    int idx = 0;
    int startCount = searchFiles.size();
    while (idx < startCount)
    {
        if (contains(searchFiles.at(idx), "*", 1) == 1)
        {
            searchFiles.erase(searchFiles.begin() + idx);
            startCount = startCount - 1;
            // Stay on the same index: the next element has shifted into it.
        }
        else
        {
            idx = idx + 1;
        }
    }

    // Loop over each search string and format it for regex searching later.
    // Only strings that contain wildcards become regex patterns; everything else can be searched normally, which saves time.
    for (unsigned int jj = 0; jj < searchStrings.size(); jj++)
    {
        if (contains(searchStrings.at(jj), "XYY", 0) == 1 || contains(searchStrings.at(jj), "XYZ", 0) == 1)
        {
            regex_patterns.push_back(replaceWildCards(searchStrings.at(jj)));
            regexIndex.push_back(true);
        }
        else
        {
            nonWildCardSearchStrings.push_back(searchStrings.at(jj));
            regexIndex.push_back(false);
        }
    }
    return true;
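Since addEscapes and replaceWildCards are not shown, the following sketch illustrates one way the XYY and XYZ tokens could be turned into a regex pattern. `wildcardTokensToRegex` is a hypothetical name and the escaping rules are an assumption, not the actual implementation used above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for addEscapes + replaceWildCards (neither is shown
// in the post): escapes regex metacharacters, and maps the token "XYY" to
// '.' (exactly one character) and "XYZ" to ".*" (any number of characters).
std::string wildcardTokensToRegex(const std::string& search)
{
    static const std::string special = "\\^$.|?*+()[]{}";
    std::string pattern;
    std::size_t i = 0;
    while (i < search.size())
    {
        if (search.compare(i, 3, "XYY") == 0)
        {
            pattern += ".";
            i += 3;
        }
        else if (search.compare(i, 3, "XYZ") == 0)
        {
            pattern += ".*";
            i += 3;
        }
        else
        {
            if (special.find(search[i]) != std::string::npos)
            {
                pattern += '\\';
            }
            pattern += search[i];
            ++i;
        }
    }
    return pattern;
}
```

Doing the escaping and the token replacement in a single pass avoids escaping a string twice, and a literal * typed by the user stays searchable as `\*`.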


found and nonwcfound record whether a match was found; when one is, the line text and line number are saved in vectors. fileName and folderName are also saved so that they can be written to the output file.



    size_t found;
    bool nonwcfound = false;
    smatch matches;
    vector<regex> expressions;
    for (size_t i = 0; i < regex_patterns.size(); i++) { expressions.emplace_back(regex_patterns.at(i)); }

    if (searchAllFilesForAllStrings == true)
    {
        ofstream myOutPut;
        myOutPut.open(outputFilePath, std::ofstream::out | std::ofstream::app);
        myOutPut << "Line Text, Line Number, File Name, Folder Path," << "\n";
        myOutPut.close();
        for (size_t j = 0; j < searchFiles.size(); j++)
        {
            // Initialize variables for line number + text
            vector<int> lineNumber;
            vector<string> lineText;
            vector<string> lineStrings;
            string entireFile;

            // Get file and folder name for storage.
            string fileName;
            string folderName;
            fileName = searchFiles.at(j);
            int fileNameSlashIdx = fileName.rfind("\\");
            folderName = fileName.substr(0, fileNameSlashIdx);
            fileName = fileName.substr(fileNameSlashIdx + 1, string::npos);

            // File ifstream definition/opening
            ifstream file;
            file.open(searchFiles.at(j), ios::in | ios::ate);

            // Fill and close file
            if (file)
            {
                ifstream::streampos filesize = file.tellg();
                entireFile.reserve(filesize);
                file.seekg(0);
                // Read with get(c) so that the end-of-file marker is never appended to the string.
                char c;
                while (file.get(c))
                {
                    entireFile += c;
                }
            }
            file.close();
            int linecount = 0;

            stringstream stream(entireFile);
            string line;
            // Loop on getline itself so that a final line without a trailing newline is still searched.
            while (getline(stream, line))
            {
                for (size_t r = 0; r < expressions.size(); r++)
                {
                    found = regex_search(line, matches, expressions.at(r));
                    if (found == 1)
                    {
                        lineNumber.push_back(linecount);
                        lineText.push_back(line);
                    }
                }
                for (size_t rr = 0; rr < nonWildCardSearchStrings.size(); rr++)
                {
                    nonwcfound = contains(line, nonWildCardSearchStrings.at(rr), 0);
                    if (nonwcfound == true)
                    {
                        lineNumber.push_back(linecount);
                        lineText.push_back(line);
                    }
                }
                linecount = linecount + 1;
            }

            entireFile.clear();
            ofstream myOutPut;
            myOutPut.open(outputFilePath, std::ofstream::out | std::ofstream::app);
            {
                tuple<vector<string>, vector<int>, string, string> result = make_tuple(lineText, lineNumber, fileName, folderName);
                writeResultsToFile(result, outputFilePath);
            }
            myOutPut.close();
        }
    }
    if (searchAllFilesForAllStrings == false)
    {
        // Do the same thing as above, except that it will search each file/batch of files only with the
        // subsequent search string in the same row of the excel file that is read in using the above function.
    }
    MessageBox::Show("Finished execution. Your file is now available for viewing!", "Output Excel File Written");
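The per-file loop above can be condensed into one function that reads line by line and records each hit. This is a hedged sketch rather than a drop-in replacement: `Hit`, `searchStream` and `searchText` are hypothetical names, the function accepts any `std::istream` so that it can be tried on an in-memory string, and unlike the code above it records a matching line only once even when several search strings match it.

```cpp
#include <cstddef>
#include <istream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result record: zero-based line number plus the line text.
struct Hit
{
    std::size_t lineNumber;
    std::string text;
};

// Scans `input` line by line. Plain strings are checked with
// std::string::find first; the precompiled regexes are only tried when no
// plain string matched. A line is recorded at most once.
std::vector<Hit> searchStream(std::istream& input,
                              const std::vector<std::string>& plainStrings,
                              const std::vector<std::regex>& expressions)
{
    std::vector<Hit> hits;
    std::string line;
    std::size_t lineNumber = 0;
    while (std::getline(input, line))
    {
        bool matched = false;
        for (const std::string& needle : plainStrings)
        {
            if (line.find(needle) != std::string::npos)
            {
                matched = true;
                break;
            }
        }
        for (std::size_t r = 0; !matched && r < expressions.size(); ++r)
        {
            matched = std::regex_search(line, expressions[r]);
        }
        if (matched)
        {
            hits.push_back({lineNumber, line});
        }
        ++lineNumber;
    }
    return hits;
}

// Convenience overload for text that is already held in memory.
std::vector<Hit> searchText(const std::string& text,
                            const std::vector<std::string>& plainStrings,
                            const std::vector<std::regex>& expressions)
{
    std::istringstream stream(text);
    return searchStream(stream, plainStrings, expressions);
}
```

Passing an `std::ifstream` straight to `searchStream` would avoid building the whole file in a string first; whether that is actually faster for these files is something only a measurement can confirm.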






c++ performance beginner














edited 6 mins ago









Jamal










asked 11 hours ago









ace1eric













  • 1. Did you use a profiler to profile the program? We can guess about performance, but since this isn't a complete, compilable program, it's just guessing. You have the power to actually measure it and see. 2. Please add the function declarations. At the moment we have to guess the types of the function parameters (e.g. excelPath).
    – user673679, 8 hours ago

  • Have to vote to close as it stands. There are no functions here, just raw code, so it is impossible to review (no context). But there is a lot to talk about (efficiency, improvements and design) with each function. Please resubmit each function with a test harness to show that the function works as expected and can be tested by us.
    – Martin York, 3 hours ago

  • Please submit each function separately for its own review.
    – Martin York, 3 hours ago







































