Using Python glob
A Basic Guide
glob is described in the docs as:
“Unix style pathname pattern expansion”…So what is that?
Basically, anytime your program needs to look for a list of files on the filesystem, and the
names of these files match a pattern, then
glob will help you get the task done.
The API for using
glob is brief, with three main methods:
glob.glob(pathname, *, recursive=False)
The main guy: Returns a list of pathnames depending on the search criteria passed in with “pathname”
glob.glob, except that it returns an iterator, meaning that not all values get stored in memory - so
can be much more efficient.
Escapes special characters in the passed in “pathname”.
That’s it! This is a succinct yet powerful module.
The pattern rules for glob are not regular expressions. You’ve got the asterisk (*) wildcard for searching all files within a given directory without dropping into subdirectories.
If you have a directory structure like this
. +-- glob_post_dir | filea.txt | fileb.txt | filec.txt | readme.txt | glob_post.py |-------subdirectory | filed.txt | filee.txt
glob_post.py looks like this:
import glob import os CWD = os.getcwd() for name in glob.glob(CWD+'/*'): print(name)
The output will be the full file paths of the five files directly under the
glob_post_dir directory, but not the files
in the subdirectory. To get the paths for the subdirectory files, we will need to adjust the code as follows:
# ... for name in glob.glob(CWD+'/subdirectory/*'): print(name) # or for name in glob.glob(CWD+'/*/*'): print(name)
Note that neither of these return the file paths of those files in the parent directory.
If we wanted to get just
filec.txt then the single character wildcard, the question mark (
is the way to go:
# ... for name in glob.glob(CWD+'/file?.txt'): print(name)
More Advanced Usage
Use character ranges to match a specific character. A dash represents an unbroken range.
If we have this directory structure:
. +-- glob_post_dir | file1.txt | file2.txt | file5.txt | glob_post.py |-------subdirectory | filed.txt | filee.txt
import glob for name in glob.glob(CWD+'/*[1-2].*'): print name # or for name in glob.glob(CWD+'/*.*'): print name
The above code will return only the paths for
file2.txt due to the character range specified.
Starting with Python version 3.5, the glob module supports the “**” directive (which is parsed only if you pass recursive flag).
In our directory structure above, we could code the following in
import glob import os CWD = os.getcwd() for name in glob.glob(CWD+'/**/*', recursive=True): print(name)
This code will return the paths for all files in both the parent and its subdirectory.
fnmatch.fnmatch under the hood.
For recursive file selection in older versions of Python, see this discussion.