


Find all strings matching a given regex pattern in files in a directory (including all subdirectories)














$begingroup$


#! python3
# `regexSearch`: Finds all lines matching a given regex in each file in a given folder.
# Usage:
# The directory to search and regex to be searched for are provided as a command line arguments.
# The 1st and 2nd command line arguments are the directory and regex pattern respectively.
# Script prompts the user to enter the regex.
# After completion, the user is prompted to continue

import re, sys
from os import path, listdir

def regex_search(regex, directory):
    res, lst = {}, listdir(directory)
    for itm in lst:
        pth = path.join(path.abspath(directory), itm)
        if path.isdir(pth): res.update(regex_search(regex, pth)) #Recursively traverse all sub directories.
        else:
            print(pth)
            with open(pth) as file:
                tmp = []
                for idx, line in enumerate(file.readlines()):
                    results = regex.findall(line)
                    if results: tmp.extend([f"Line {idx+1}: {results}"])
            res[pth] = tmp
    return res

if __name__ == "__main__":
    directory, pattern = sys.argv[1:3]
    while not path.isdir(directory):
        print("Error: Please input a valid path for an existing directory:", end = "\t")
        directory = input()
    while True:
        try:
            regex = re.compile(pattern)
            break
        except TypeError:
            print("Error: Please input a valid regex:", end = "\t")
            pattern = input()
        except re.error:
            print("Error: Please input a valid regex:", end = "\t")
            pattern = input()
    matches = regex_search(regex, directory)
    for key in matches: print(key, "\n".join(matches[key]), sep="\n", end="\n\n")




















$endgroup$























      python python-3.x regex file-system














      edited yesterday by Ludisposed

      asked yesterday by Tobi Alafin






















          1 Answer












          $begingroup$

          Some improvements

          • Style

            Please indent your code properly, since indentation is significant in Python. One-line compound statements like

            if path.isdir(pth):   res.update(regex_search(regex, pth))

            are frowned upon (PEP 8 discourages them); instead write

            if path.isdir(pth):
                res.update(regex_search(regex, pth))

          • Use glob for listing files in a directory

            With Python 3.5+, glob with recursive=True is the easiest way to list all files in a directory and its subdirectories; on earlier versions you should use os.walk().

          • Use generators when appropriate

            This saves memory, because each match is yielded as it is found instead of being appended to a temporary list.

          • Use argparse over sys.argv[]

            argparse is the standard library module for CLI input. It is easy to use and has a ton of features; I definitely recommend it!
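          To make the generator point concrete, here is a minimal sketch contrasting the two styles. The function names `matches_as_list` and `matches_lazy` and the sample lines are hypothetical, chosen only for illustration; they are not part of the reviewed script.

```python
import re

def matches_as_list(regex, lines):
    """Eager version: builds the complete result list before returning it."""
    result = []
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            result.append((number, line.rstrip()))
    return result

def matches_lazy(regex, lines):
    """Generator version: yields one match at a time, with no temporary list."""
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            yield number, line.rstrip()

if __name__ == "__main__":
    # Hypothetical sample data standing in for the lines of a file.
    sample = ["alpha\n", "beta 42\n", "gamma\n", "delta 7\n"]
    digits = re.compile(r"\d+")
    for number, text in matches_lazy(digits, sample):
        print(number, text)
```

          Both functions find the same matches; the generator only does the work as the caller iterates, which matters when the input is a large file rather than a short list.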




          Code

          import argparse
          import glob
          import re
          import os
          import pathlib

          def regex_search(regex, directory):
              # "**" must be its own path component for recursive=True to descend
              # into subdirectories; "*" also matches files without an extension.
              pattern = os.path.join(directory, "**", "*")
              for f in glob.glob(pattern, recursive=True):
                  if not os.path.isfile(f):
                      continue  # the pattern also matches directories
                  with open(f) as _file:
                      # Iterate the file lazily; line numbers start at 1.
                      for i, line in enumerate(_file, start=1):
                          if regex.search(line):
                              yield f"In file {f} matched: {line.rstrip()} at line: {i}"

          def parse_args():
              parser = argparse.ArgumentParser(
                  usage='%(prog)s [options] <regex> <directory>',
                  formatter_class=argparse.RawDescriptionHelpFormatter
              )
              parser.add_argument('regex', type=str)
              parser.add_argument('directory', type=str)
              args = parser.parse_args()

              try:
                  rgx = re.compile(args.regex)
              except re.error:
                  parser.error('Regex does not compile')
              directory = pathlib.Path(args.directory)
              if not os.path.isdir(directory):
                  parser.error('Directory is not valid')
              return rgx, directory

          if __name__ == '__main__':
              regex, directory = parse_args()
              for match in regex_search(regex, directory):
                  print(match)
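          For interpreters where glob has no recursive option, the os.walk() fallback mentioned in the glob point might look like the sketch below. The name `regex_search_walk` is hypothetical, and opening files with errors="replace" is an assumption added so that binary or oddly encoded files do not abort the search; neither comes from the original post.

```python
import os
import re

def regex_search_walk(regex, directory):
    """Yield (path, line_number, line) for each matching line under directory."""
    # os.walk visits the directory itself and every subdirectory below it.
    for root, _dirs, files in os.walk(directory):
        for name in files:
            file_path = os.path.join(root, name)
            # Assumption: replace undecodable bytes instead of raising an error.
            with open(file_path, errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if regex.search(line):
                        yield file_path, number, line.rstrip()

# Example call (hypothetical pattern and path):
#     for path, number, text in regex_search_walk(re.compile(r"TODO"), "."):
#         print("{}:{}: {}".format(path, number, text))
```

          Yielding tuples rather than preformatted strings leaves the output format to the caller, which keeps the traversal reusable.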


          Bonus Round!

          grep is a Unix tool that can basically do this by default

          grep -Hrn 'search term' path/to/dir

          Where:

          • -H prints the file name for each match
          • -r does a recursive search
          • -n prints the line number

















          $endgroup$





















                edited yesterday

                answered yesterday by Ludisposed





























