Python 3.4 code example - scripts to import image files and remove duplicates

Python 3.4

Importing images

The exifread library makes it easy to read the Exif data from an image file.

The goal is to list all images in a source directory and import them on the target system into one folder per day, according to each image's date.

If no Exif data is available, the file's last-modification timestamp is used instead.
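The date handling described above can be sketched with the standard library alone. This is a minimal illustration, not the script itself: the function name day_folder and its exif_value parameter are hypothetical, and the Exif string is assumed to have already been read from the image (for example with exifread).

```python
import datetime
import os

# Format of an Exif date string, e.g. "2010:08:22 14:13:42"
EXIF_DATE_FORMAT = "%Y:%m:%d %H:%M:%S"


def day_folder(path, exif_value=None):
    """Return the YYYYMMDD folder name for an image file (hypothetical helper).

    exif_value is the Exif date string if one was found, otherwise None.
    If it is missing or cannot be parsed, the file's mtime is used instead.
    """
    try:
        taken = datetime.datetime.strptime(str(exif_value), EXIF_DATE_FORMAT)
    except ValueError:
        # No usable Exif date: fall back to the last modification time
        taken = datetime.datetime.fromtimestamp(os.path.getmtime(path))
    return "{0:%Y%m%d}".format(taken)
```

With a valid Exif string the file system is not touched at all; only the fallback path reads the modification time.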

The script is called with the source directory, the destination directory, and the subfolder depth within the source directory.

Example:

D:\Python34\python.exe .\importImg.py -h
 
usage: importImg.py  -s <src> -d <dest> -r <recursive Level>
 
# for example
 
PS D:\entwicklung\work\python\ImageImp> D:\Python34\python.exe .\importImg.py -s "F:\DCIM" -d D:\data\bilder -r 1
--========================================
-- Info  :: Read all files from F:\DCIM\*\*.m*
-- Info  :: Copy files to       D:\data\bilder
...
-- Info  :: Copy Image P1070277.JPG   to directory D:\data\bilder\20150415
...
 
-- Info  :: Directory still exits :: D:\data\bilder\20150524
-- Info  :: File P1080259.JPG exits with the same content
--========================================
-- Finish with           :: 1549 files in 0 new directories
-- Found duplicate files :: 1541
-- The run needs         :: 618.5560 seconds
-- Copy size             :: 142.031 MB
--========================================
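The -r value controls how many wildcard directory levels are placed between the source directory and the file pattern. A minimal sketch of that idea follows; build_pattern and its file_glob parameter are hypothetical names, not part of importImg.py.

```python
import os


def build_pattern(src, level, file_glob="*.*"):
    """Build a glob pattern that looks `level` directories below src.

    Hypothetical helper: one "*" path component is added per level,
    followed by the file pattern itself.
    """
    # Strip a trailing separator so it is not doubled when joining
    parts = [src.rstrip("/\\")] + ["*"] * level + [file_glob]
    return os.sep.join(parts)
```

On Windows, build_pattern("F:\\DCIM", 1) yields F:\DCIM\*\*.*, which can be passed directly to glob.glob.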

How it works

The important steps in the script:

  • Parse the parameters ( opts, args = getopt.getopt(argv, "hs:d:r:", ["src=", "dest=", "rec="]))
  • Read all files in the source into a list ( fileList = glob.glob(path_name) )
  • Filter unwanted elements such as Thumbs.db out of the list ( file.endswith(thumbsDBFile) )
  • Iterate over the list of files (for file in fileList:)
  • Extract only the name of the file ( imgFilename = ntpath.basename(file))
  • Open the file read-only ( imgFile = open(file, 'rb'))
  • Read the Exif tag "EXIF DateTimeDigitized" (tags = exifread.process_file(imgFile, stop_tag='EXIF DateTimeDigitized'))
  • Convert the Exif string into a date ( createDate = datetime.datetime.strptime(str(tags['EXIF DateTimeDigitized']), '%Y:%m:%d %H:%M:%S'))
  • Check whether the directory has already been created in the current run, otherwise create it ( os.makedirs(dirPath))
  • Check whether the file already exists in the destination with the same content ( compare = filecmp.cmp(imgFile.name, newFileName, shallow=False) )
  • If the file does not exist yet, copy it ( shutil.copy2(imgFile.name, dirPath) )
  • If it exists with different content, build a new name (recursive function createNewName) and copy the file under the new name, unless the file already exists in the directory under a similar name
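The last three steps can be sketched as a single loop. This is an iterative alternative to the recursive createNewName, not a copy of it: the helper name copy_unique and its numbering scheme (name-0, name-1, ...) are assumptions made for illustration.

```python
import filecmp
import os
import shutil


def copy_unique(src, dest_dir):
    """Copy src into dest_dir without overwriting different content.

    Hypothetical helper. Returns the path of the new copy, or None if a
    file with identical content is already present under a candidate name.
    """
    stem, ext = os.path.splitext(os.path.basename(src))
    candidate = os.path.join(dest_dir, stem + ext)
    counter = 0
    while os.path.isfile(candidate):
        # shallow=False compares the file contents, not only os.stat() data
        if filecmp.cmp(src, candidate, shallow=False):
            return None
        candidate = os.path.join(dest_dir, "{0}-{1}{2}".format(stem, counter, ext))
        counter += 1
    # copy2 also preserves the modification and access times
    shutil.copy2(src, candidate)
    return candidate
```

A loop avoids deep recursion when many files share the same name, and os.path.splitext keeps file names with more than one dot intact.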

The code

importImg.py
__author__ = 'gpipperr'
 
import datetime, time
import glob, filecmp, ntpath, shutil
import os, errno, sys, getopt
 
# Library to read the exif information
# load with .\python -m pip install exifread  --upgrade
# from https://pypi.python.org/pypi/ExifRead
import exifread
 
 
# Get the change date of a file
def modification_date(filename):
    t = os.path.getmtime(filename)
    print("-- Info  :: EXIF data not available - use  mTime {0} of the file".format(datetime.datetime.fromtimestamp(t)))
    return datetime.datetime.fromtimestamp(t)
 
 
# check for a unique filename in the import directory
# if the new unique name already exists, check if this file is the same as the original file
# if yes do not copy; otherwise search the next unique file name until a free name is found
def createNewName(filename, importDir, fileNo, origFile):
    # os.path.splitext also handles file names that contain more than one dot
    firstPart, extension = os.path.splitext(filename)
    newFName = firstPart + "-" + str(fileNo) + extension
    # If the target name already exists
    if os.path.isfile(importDir + os.path.sep + newFName):
        # Check if this new Name is still the original file
        # if shallow is true, files with identical os.stat() signatures are taken to be equal. Otherwise, the contents of the files are compared.
        fcompare = filecmp.cmp(origFile, importDir + os.path.sep + newFName,shallow=False)
        if fcompare:
            print("-- Info  :: File {0} still exits with same content as original file {1}".format(newFName, origFile))
            return newFName
        else:
            fileNo += 1
            # call again to find the next possible name
            return createNewName(newFName, importDir, fileNo, origFile)
    else:
        # Copy the new file and return the name of the new file
        shutil.copy2(origFile, importDir + os.path.sep + newFName)
        setStatisticTotalSize(os.path.getsize(importDir + os.path.sep + newFName))
        print("-- Info  :: File {0} exits but with other content, create new File {1}".format(origFile,
                                                                                              importDir + os.path.sep + newFName))
        return newFName
 
 
# Remember the global Size of all copied files
def setStatisticTotalSize(size):
    global totalFileSize
    totalFileSize += size
 
 
# global for the total filesize
totalFileSize = 0
 
 
# Main Script part
def main(argv):
 
    # Parameter 1   - Import Directory
    # Parameter 2   - Image Main Folder
    # Parameter 3   - Subfolder Level
 
    path_name = '-'
    dest_name = '-'
    recursiveLevel = 0
 
    try:
        opts, args = getopt.getopt(argv, "hs:d:r:", ["src=", "dest=", "rec="])
    except getopt.GetoptError:
        print("usage: importImg.py  -s <src> -d <dest> -r <recursive Level>")
        sys.exit(2)
 
    for opt, arg in opts:
        if opt == '-h':
            print("usage: importImg.py  -s <src> -d <dest> -r <recursive Level>")
            sys.exit()
        elif opt in ("-s", "--src"):
            path_name = arg
        elif opt in ("-d", "--dest"):
            dest_name = arg
        elif opt in ("-r", "--rec"):
            recursiveLevel = int(arg)
 
    # check if Directory exists and if the * is necessary
    # Source
    if os.path.isdir(path_name):
        if path_name.endswith(os.path.sep):
            path_name += ("*" + os.path.sep) * recursiveLevel
            path_name += "*.*"
        else:
            path_name += os.path.sep
            path_name += ("*" + os.path.sep) * recursiveLevel
            path_name += "*.*"
    else:
        print("-- Error :: 05 Source Directory (-s) {0} not found".format(path_name))
        print("usage: importImg.py  -s <src> -d <dest> ")
        sys.exit(2)
 
    # Destination
    # check and strip last / if necessary
    if not os.path.isdir(dest_name):
        print("-- Error :: 04 Destination Directory (-d) {0} not found".format(dest_name))
        print("usage: importImg.py  -s <src> -d <dest> ")
        sys.exit(2)
    else:
        if dest_name.endswith(os.path.sep):
            dest_name = dest_name[:-1]
 
    # Remember the start time of the program
    # (time.clock() was deprecated in Python 3.3 and removed in 3.8; time.perf_counter() replaces it)
    start_time = time.perf_counter()

    print("--" + 40 * "=")
    print("-- Info  :: Read all files from {0}".format(path_name))
    print("-- Info  :: Copy files to       {0}".format(dest_name))
    print("--" + 40 * "=")

    fileCount = 0
    fileExistsCount = 0
    dirCount = 0
    dirPathList = []

    # Get the list of all files
    fileList = glob.glob(path_name)

    # Drop Thumbs.db from the list: internal Windows file, no need to copy it
    # (build a new list instead of removing entries while iterating over the list)
    thumbsDBFile = "Thumbs.db"
    fileList = [file for file in fileList if not file.endswith(thumbsDBFile)]

    # Loop over the files read from the import directory
    for file in fileList:
        fileCount += 1
        createDate = datetime.datetime.now()
        imgFilename = '-'
        newFileName = '-'
        try:
            # get only the filename without the path
            imgFilename = ntpath.basename(file)

            # Open image file for reading (binary mode)
            imgFile = open(file, 'rb')

            # Read the image tags, if not possible read last change date
            try:
                tags = exifread.process_file(imgFile, stop_tag='EXIF DateTimeDigitized')
                # Transform to a real date
                # https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
                # 2010:08:22 14:13:42  %Y:%m:%d %H:%M:%S
                createDate = datetime.datetime.strptime(str(tags['EXIF DateTimeDigitized']), '%Y:%m:%d %H:%M:%S')
            except Exception:
                # if no exif tag use the last modification date
                createDate = modification_date(file)

            # Create the import directory if it does not exist
            # Remember the directory after the first create
            # to avoid exceptions with already existing directories
            dirPath = dest_name + os.path.sep + "{0:%Y%m%d}".format(createDate)
            try:
                if dirPath not in dirPathList:
                    dirPathList.append(dirPath)
                    os.makedirs(dirPath)
                    dirCount += 1
                    print("-- Info  :: Create Directory :: {0}".format(dirPath))
            except OSError as exception:
                if exception.errno != errno.EEXIST:
                    print(
                        "-- Error :: 03 Directory {0} creation error :: see error {1}".format(dirPath,
                                                                                              sys.exc_info()[0]))
                else:
                    print("-- Info  :: Directory still exits :: {0}".format(dirPath))

            # Copy the file to the new directory
            newFileName = dirPath + os.path.sep + imgFilename
            try:
                # Check if the same filename already exists
                if os.path.isfile(newFileName):
                    # if shallow is true, files with identical os.stat() signatures are taken to be equal. Otherwise, the contents of the files are compared.
                    compare = filecmp.cmp(imgFile.name, newFileName, shallow=False)
                    if compare:
                        print("-- Info  :: File {0} exits with the same content".format(imgFilename))
                        fileExistsCount += 1
                    else:
                        newUniqueFileName = createNewName(imgFilename, dirPath, 0, imgFile.name)
                else:
                    # copy2 preserves the original modification and access info (mtime and atime) in the file metadata.
                    shutil.copy2(imgFile.name, dirPath)
                    setStatisticTotalSize(os.path.getsize(newFileName))
                    print("-- Info  :: Copy Image {0:50} to directory {1}".format(imgFilename, dirPath))
            except OSError as exception:
                print("-- Error :: 02 File {0} in directory {1} :: error {2}".format(imgFile.name, dirPath,
                                                                                     sys.exc_info()[0]))

            if not imgFile.closed:
                imgFile.close()
        except Exception:
            # report the path from the loop: imgFile is undefined if open() itself failed
            print("-- Error :: 01 Error with {0} in {1} :: error {2}".format(file, path_name, sys.exc_info()))

    # print statistics
    print("--" + 40 * "=")
    print("-- Finish with           :: {0} files in {1} new directories".format(fileCount, dirCount))
    print("-- Found duplicate files :: {0}".format(fileExistsCount))
    print("-- The run needs         :: {0:5.4f} seconds".format(time.perf_counter() - start_time))
    print("-- Copy size             :: {0:5.3f} MB".format(totalFileSize / 1024 / 1024))
    print("--" + 40 * "=")

if __name__ == "__main__":
    main(sys.argv[1:])

Finding duplicates

As a first step, a single directory is searched for duplicates by comparing the files with each other.

D:\Python34\python.exe .\removeDuplicateFiles.py -h
 
usage: removeDuplicateFiles.py  -s <src> -t <temp path>
 
 
D:\Python34\python.exe .\removeDuplicateFiles.py -s D:\data\bilder\20110915 -t D:\temp\saveimg
 
 
D:\Python34\python.exe .\removeDuplicateFiles.py -s D:\temp\20140818\ -t D:\temp\saveimg
 
--========================================
-- Info  :: Read all files in  :: D:\temp\20140818\*.*
-- Info  :: Copy duplicates to :: D:\temp\saveimg
--========================================
-- Info  :: Check File "D:\temp\20140818\DSCN0025.TIF"
-- Info  :: File "D:\temp\20140818\DSCN9301 - Copy.TIF    " exits with the same content as file D:\temp\20140818\DSCN9301.TIF
 
...
 
--========================================
 
...
 
-- Info  :: Move Duplicate File "D:\temp\20140818\DSCN9301 - Copy.TIF    " to "D:\temp\saveimg\DSCN9301 - Copy.TIF"
--========================================
--========================================
-- Finish with           :: 17 files in directorie D:\temp\20140818\*.*
-- Found duplicate files :: 2
-- The run needs         :: 0.2044 seconds
--========================================

In the next solution, a complete directory tree is read, the hashes of all files are computed, and the duplicates are then sorted out; see also "Dateien in Python hashen" (hashing files in Python).
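The hash-based approach could look roughly like the sketch below. It is not the script from the linked page: the function names file_digest and find_duplicates, the choice of SHA-256, and the chunk size are all assumptions made for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Walk the tree below root and group files with identical content.

    Hypothetical helper: returns a list of groups, each a sorted list of
    paths whose content hashes are equal.
    """
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            by_hash[file_digest(full_path)].append(full_path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Each file is read once, so the work grows linearly with the number of files instead of quadratically as with pairwise comparison.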

How removeDuplicateFiles.py works:

  • Parse the parameters ( opts, args = getopt.getopt(argv, "hs:t:", ["src=", "tmp="]))
  • Read all files in the source into a list ( masterFileList = glob.glob(path_name))
  • Iterate over the list of files (for masterfile in masterFileList:)
  • For each file, iterate over the files in the directory again (for cfile in slaveFileList:)
  • Compare the files (compare = filecmp.cmp(masterfile, cfile, shallow=False))
  • If a duplicate is found, move it to the temp directory with moveDuplicateFile, which also checks whether the file already exists in the destination
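The comparison steps above can be condensed into a short sketch. It assumes a hypothetical helper duplicate_candidates and uses itertools.combinations so that each pair is compared only once; like the script, it treats the file with the longer name as the duplicate.

```python
import filecmp
import itertools


def duplicate_candidates(paths):
    """Return the files considered duplicates among the given paths.

    Hypothetical helper: for every pair with identical content, the path
    with the longer name is chosen (the second one if the lengths are equal).
    """
    candidates = []
    for first, second in itertools.combinations(paths, 2):
        # shallow=False forces a comparison of the file contents
        if filecmp.cmp(first, second, shallow=False):
            longest = first if len(first) > len(second) else second
            if longest not in candidates:
                candidates.append(longest)
    return candidates
```

Choosing the longer name is a heuristic: copies made by Windows Explorer usually carry a suffix such as " - Copy" and are therefore longer than the original.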

Code

removeDuplicateFiles.py
__author__ = 'gpipperr'
 
import datetime, time
import glob, filecmp, ntpath, shutil
import os, errno, sys, getopt
 
 
# check for a unique filename in the temp directory and move the file to the temp directory
# if the new unique name already exists, check if this file is the same as the original file
# if yes do not copy; otherwise search the next unique file name until a free name is found
def moveDuplicateFile(filename, tempDir, fileNo, origFile):
    # os.path.splitext also handles file names that contain more than one dot
    firstPart, extension = os.path.splitext(filename)
    newFName = firstPart + ("-" + str(fileNo) if fileNo > 0 else "") + extension
    # If the target name already exists
    if os.path.isfile(tempDir + os.path.sep + newFName):
        # Check if this new Name is still the original file
        fcompare = filecmp.cmp(origFile, tempDir + os.path.sep + newFName, shallow=False)
        if fcompare:
            # an identical copy is already saved in the temp directory - delete the original one
            os.remove(origFile)
            print("-- Info  :: Delete File \"{0:40}\" - still exits with the same content in \"{1}\"".format(origFile,
                                                                                                          newFName))
            return newFName
        else:
            fileNo += 1
            # call again to find the next possible name
            return moveDuplicateFile(newFName, tempDir, fileNo, origFile)
    else:
        # Move the file and return its new name
        shutil.move(origFile, tempDir + os.path.sep + newFName)
        print("-- Info  :: Move Duplicate File \"{0:40}\" to \"{1}\"".format(origFile, tempDir + os.path.sep + newFName))
        return newFName
 
 
# Main Script part
def main(argv):
    # Parameter 1   - Image Main Folder
    path_name = '-'
    temp_path = 'd:\\temp'
    recursiveLevel = 0
    usageString = "usage: removeDuplicateFiles.py  -s <src> -t <temp path>"
 
    try:
        opts, args = getopt.getopt(argv, "hs:t:", ["src=", "tmp="])
    except getopt.GetoptError:
        print(usageString)
        sys.exit(2)
 
    for opt, arg in opts:
        if opt == '-h':
            print(usageString)
            sys.exit()
        elif opt in ("-s", "--src"):
            path_name = arg
        elif opt in ("-t", "--tmp"):
            temp_path = arg
 
    # check if Directory exists and if the * is necessary
    # Source
    if os.path.isdir(path_name):
        if path_name.endswith(os.path.sep):
            path_name += ("*" + os.path.sep) * recursiveLevel
            path_name += "*.*"
        else:
            path_name += os.path.sep
            path_name += ("*" + os.path.sep) * recursiveLevel
            path_name += "*.*"
    else:
        print("-- Error :: 03 Source Directory (-s) {0} not found".format(path_name))
        print(usageString)
        sys.exit(2)
 
    # Temp Destination
    # check and strip last / if necessary
    if not os.path.isdir(temp_path):
        print("-- Error :: 02 temp Directory (-t) {0} not found".format(temp_path))
        print(usageString)
        sys.exit(2)
    else:
        if temp_path.endswith(os.path.sep):
            temp_path = temp_path[:-1]
 
    # Remember the start time of the program
    # (time.clock() was deprecated in Python 3.3 and removed in 3.8; time.perf_counter() replaces it)
    start_time = time.perf_counter()

    print("--" + 40 * "=")
    print("-- Info  :: Read all files in  :: {0}".format(path_name))
    print("-- Info  :: Copy duplicates to :: {0}".format(temp_path))
    print("--" + 40 * "=")

    fileCount = 0
    fileExistsCount = 0

    # Get the list of all files
    masterFileList = glob.glob(path_name)
    slaveFileList = glob.glob(path_name)
    candiateFile = []

    # Loop over the files read from the source directory
    for masterfile in masterFileList:
        fileCount += 1

        # Loop again over all files
        # compare the files, if a match is found remove the master file from the list
        print("-- Info  :: Check File \"{0}\"".format(masterfile))
        # iterate over a copy, because entries are removed from slaveFileList inside the loop
        for cfile in list(slaveFileList):
            # only if not the same file
            if masterfile != cfile:
                # if shallow is true, files with identical os.stat() signatures are taken to be equal. Otherwise, the contents of the files are compared.
                compare = filecmp.cmp(masterfile, cfile, shallow=False)
                if compare:
                    print("-- Info  :: File \"{0:40}\" exits with the same content as file {1}".format(masterfile, cfile))

                    # remove only if it is still in the list
                    # if more than one file is identical
                    # you need more than one run
                    if masterfile in slaveFileList:
                        slaveFileList.remove(masterfile)

                    # Add the file with the longest name to the duplicate file list
                    longestFileName = masterfile if len(masterfile) > len(cfile) else cfile
                    # Avoid duplicate entries
                    if longestFileName not in candiateFile:
                        candiateFile.append(longestFileName)
                        fileExistsCount += 1

    # Do something with the duplicates
    print("--" + 40 * "=")

    for file in candiateFile:
        # move the files to temp
        try:
            imgFilename = ntpath.basename(file)
            moveDuplicateFile(filename=imgFilename, tempDir=temp_path, fileNo=0, origFile=file)
        except Exception:
            print("-- Error :: 01 - Move File {0} :: error {1}:".format(file, sys.exc_info()))

    if fileExistsCount < 1:
        print("-- Found no duplicate files in directory {0}".format(path_name))

    print("--" + 40 * "=")

    # print statistics

    print("--" + 40 * "=")
    print("-- Finish with           :: {0} files in directorie {1}".format(fileCount, path_name))
    print("-- Found duplicate files :: {0}".format(fileExistsCount))
    print("-- The run needs         :: {0:5.4f} seconds".format(time.perf_counter() - start_time))
    print("--" + 40 * "=")


if __name__ == "__main__":
    main(sys.argv[1:])

Converting to an EXE file

For Python 2.x see http://www.py2exe.org/; for later Python versions see http://cx-freeze.sourceforge.net/

Installing cx_Freeze

 .\python -m pip install cx_Freeze --upgrade

Creating an EXE file:

D:\Python34\python.exe D:\Python34\Scripts\cxfreeze  .\importImg.py --target-dir dist

The "dist" subdirectory now contains everything needed to start the script as an EXE, even without an installed Python environment.

Connecting an Android mobile phone

As a next step, the image data should also be imported from an Android mobile phone.

The problem, however, is that Windows does not assign a drive letter to the phone's storage.

Example solution for PowerShell

Python

But how can this be solved in Python as well? Ultimately there must be some kind of device pointer under Windows that can be addressed directly, since the device is also visible in Explorer.

Ideas:

"Author: Gunther Pipperr"
python/python_copy_image_files.txt · Last modified: 2015/10/25 20:00 by Gunther Pippèrr