Stop wasting your time with boring file manipulation

Rafaela Pinter
4 min read · Oct 23, 2021

Python is a high-level, general-purpose programming language that lets you create very useful programs. Over the last decade, it has become well known for its applications in Data Science, but it goes way beyond that.

Photo by Maksym Kaharlytskyi on Unsplash

On the day this article was written, there were 334,904 projects on PyPI, so you can imagine that there is a Python package for almost everything.

One very handy module is called “os”. It is part of the Python Standard Library and provides a portable way of using operating-system-dependent functionality. Another useful module, also in the standard library, is “shutil”. Its functions offer high-level operations on files and collections of files, such as copying and removal.
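
As a quick taste of both modules, here is a minimal sketch that works inside a throwaway temporary directory. The file and folder names (`notes.txt`, `backup`) and the function name `demo_os_and_shutil` are made up for illustration.

```python
import os
import shutil
import tempfile

def demo_os_and_shutil():
    """Create, copy, and list files inside a temporary directory."""
    with tempfile.TemporaryDirectory() as workdir:
        # os.path.join builds a path with the right separator for the OS
        source = os.path.join(workdir, "notes.txt")
        with open(source, "w") as handle:
            handle.write("hello")

        # os.makedirs creates the folder (and any missing parents)
        backup_dir = os.path.join(workdir, "backup")
        os.makedirs(backup_dir, exist_ok=True)

        # shutil.copy copies the file into the target folder
        shutil.copy(source, backup_dir)

        # os.listdir returns the entries of a folder (order is not guaranteed)
        return sorted(os.listdir(workdir)), os.listdir(backup_dir)

top_level, backed_up = demo_os_and_shutil()
print(top_level)
print(backed_up)
```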

Indeed, knowing how to automate some OS tasks can save us lots of time, so we can focus on what really matters. So, my goal here is to give you an example of using Python to automate file manipulation. This article will cover:

  1. Unzipping files
  2. Moving files
  3. Renaming files
  4. Removing files

Without further ado, let’s get our hands dirty with some examples!

Let's say you are studying a new subject and a friend of yours sends you all the books they have about it. All of a sudden you have 100 different zip files, each of them containing some books inside:

  • 1.zip
  • 2.zip
  • 3.zip
  • 4.zip
  • …and so on…
  • 100.zip

Yup, they love this subject. Let's say that you are just like me: a little lazy but super into automating the boring stuff. The first step is:

1. Unzipping files

The good news is that it is simple. All you have to do is save all your zip files in one single folder (here I called it 'your/path') and run the script below:

import zipfile
import os

## Path to your 100 zip files
path = 'your/path'
## Folder we extract the files into
extraction_path = 'unziped_files'

## Loop through all your folder's files
for zip_file in os.listdir(path):
    ## Checking if the file has the .zip extension
    if zip_file.endswith(".zip"):
        ## Performing the extraction (os.path.join adds the path separator)
        with zipfile.ZipFile(os.path.join(path, zip_file), 'r') as zip_ref:
            zip_ref.extractall(extraction_path)
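
If your archives do not each contain their own top-level folder, files with the same name coming from different archives would overwrite each other in the extraction folder. One possible variation, sketched here with `pathlib`, extracts every archive into its own subfolder named after the zip file. The function name `unzip_all` is just an illustrative choice, not something provided by `zipfile`.

```python
import zipfile
from pathlib import Path

def unzip_all(source_dir, target_dir):
    """Extract every .zip in source_dir into target_dir/<zip name without extension>."""
    extracted = []
    for zip_path in sorted(Path(source_dir).glob("*.zip")):
        # 1.zip goes to target_dir/1, 2.zip to target_dir/2, and so on
        destination = Path(target_dir) / zip_path.stem
        with zipfile.ZipFile(zip_path, "r") as zip_ref:
            # extractall creates the destination folder if it is missing
            zip_ref.extractall(destination)
        extracted.append(destination)
    return extracted
```

With the folders used in this article, the call would be `unzip_all('your/path', 'unziped_files')`.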

You have just unzipped all 100 files at once. Pretty nice. But wait. If you now see something like this:

1/
---abc.txt
---def.txt
---A/
------ghi.txt
2/
---jkl (1).txt
---mno.txt
---B/
------abc.txt
3/
---qrs.txt
---tuv.txt
---B/
...

you now have 100 folders, each containing subfolders with your books. It is more practical to have all the text files in the same main folder. To accomplish that, our second step is:

2. Moving files

There are different strategies here. If you have extracted all 100 zip files and each one of them also has subfolders containing the books, you need to flatten the directory structure so that all the books end up together in one primary folder. To do this, we use os.walk and shutil.move:

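import os
import shutil

## Folder where the zip files were extracted
extraction_path = 'unziped_files'
## New location
my_new_path = 'new/path/'
## Make sure the new location exists
os.makedirs(my_new_path, exist_ok=True)

## Function to move files
def moving(folder, new_path):
    ## os.walk visits the folder and every subfolder inside it
    for folderName, subfolders, filenames in os.walk(folder):
        for filename in filenames:
            shutil.move(os.path.join(folderName, filename),
                        os.path.join(new_path, filename))

## Iterating through our extracted folders (named 1 to 100)
for i in range(1, 101):
    moving(os.path.join(extraction_path, str(i)), my_new_path)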
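
One caveat of moving everything into a single folder: two books with the same name in different subfolders will collide, and depending on the platform `shutil.move` may overwrite the earlier file or raise an error. Here is a possible sketch that sidesteps this by adding a counter such as " (1)" to the later file. The helper names `unique_destination` and `flatten` are hypothetical, and the sketch assumes the new location is not inside the folder being walked.

```python
import os
import shutil

def unique_destination(folder, filename):
    """Return a path in folder that does not exist yet, adding ' (n)' if needed."""
    stem, extension = os.path.splitext(filename)
    candidate = os.path.join(folder, filename)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(folder, f"{stem} ({counter}){extension}")
        counter += 1
    return candidate

def flatten(folder, new_path):
    """Move every file found under folder into new_path without overwriting."""
    os.makedirs(new_path, exist_ok=True)
    moved = []
    for folder_name, subfolders, filenames in os.walk(folder):
        for filename in filenames:
            destination = unique_destination(new_path, filename)
            shutil.move(os.path.join(folder_name, filename), destination)
            moved.append(destination)
    return moved
```

Keep in mind that files renamed this way are not necessarily true duplicates, so it is worth comparing their contents before deleting anything.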

Ok, finally we have all our books in one single folder. But you might notice here that we have three situations:

  • Some files appear only once;
  • Some files are duplicated, and the copy ends with "(1)";
  • Some files are not duplicated and end with "(1)".

abc.txt
abc (1).txt
def.txt
ghi.txt
jkl (1).txt
mno.txt
qrs.txt
tuv.txt

For our final step:

3. Renaming and removing files

Alright. This is our pseudo-algorithm:

  • Get a list of all the files in the directory
  • Check if the file name contains a space (" ")
  • If so, check if it has a duplicate (the same name without " (1)")
  • If it has a duplicate, remove the file
  • If it does not have a duplicate, rename the file (we do not want the " (1)" part)

The code for this last step is, then:

import os

## Location of the files
path = "new/path/"
## Creating a list for our files
list_txt = []
## Iterating through the folder
for txt_file in os.listdir(path):
    ## Adding only the text files to our list
    if txt_file.endswith(".txt"):
        list_txt.append(txt_file)

## Now we do the magic
for txt_file in list_txt:
    ## Splitting the file name so we know which one has ' (1)'
    ## (this assumes the file names contain no other spaces)
    splitted = txt_file.split(' ')
    ## If it does have a ' (1)', the length is bigger than one
    if len(splitted) > 1:
        if splitted[0] + '.txt' in list_txt:
            ## A copy without ' (1)' exists, so this one is a duplicate
            os.remove(os.path.join(path, txt_file))
        else:
            ## No duplicate: we just drop the ' (1)' from the name
            os.rename(os.path.join(path, txt_file),
                      os.path.join(path, splitted[0] + '.txt'))
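
Because os.remove is permanent, it can be reassuring to preview what the script would do before running it. Below is a minimal dry-run sketch of the same rule that works on a plain list of names and never touches the disk. `plan_cleanup` is a hypothetical helper, and like the script above it assumes the only space in a file name is the one before "(1)".

```python
def plan_cleanup(filenames):
    """Return (to_remove, to_rename) without touching the disk."""
    names = set(filenames)
    to_remove = []
    to_rename = {}
    for name in filenames:
        parts = name.split(' ')
        if len(parts) > 1:
            original = parts[0] + '.txt'
            if original in names:
                # The copy without ' (1)' exists, so this one is a duplicate
                to_remove.append(name)
            else:
                # No copy without ' (1)': keep the file under the clean name
                to_rename[name] = original
    return to_remove, to_rename

files = ['abc.txt', 'abc (1).txt', 'def.txt', 'ghi.txt',
         'jkl (1).txt', 'mno.txt', 'qrs.txt', 'tuv.txt']
to_remove, to_rename = plan_cleanup(files)
print(to_remove)   # ['abc (1).txt']
print(to_rename)   # {'jkl (1).txt': 'jkl.txt'}
```

Feeding it `os.listdir(path)` gives you the list of removals and renames to review before committing to them.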

Coooool!

We have accomplished unzipping, moving, removing duplicates and renaming our files. Our main folder should look like this and we are all set to start studying!

abc.txt
def.txt
ghi.txt
jkl.txt
mno.txt
qrs.txt
tuv.txt

Thank you for reading this article! Stay tuned for more content ❤

Rafaela Pinter

Bioprocess Engineer | Data Science enthusiastic | Bassist