===== Python 3.4 Code Example - Scripts to Import Image Files and Delete Duplicates =====

**Python 3.4**

==== Import images ====

With the [[https://pypi.python.org/pypi/ExifRead|exifread]] library, the [[https://de.wikipedia.org/wiki/Exchangeable_Image_File_Format|Exif data]] of an image file can be read very easily.

The goal is to list all images in a source directory and import them on the target system into one folder per day, based on the date of each image. If no Exif data is available, the timestamp of the file's last modification is used.

The script is called with the source directory, the destination directory, and the subfolder depth within the source directory.

Example:

  D:\Python34\python.exe .\importImg.py -h
  usage: importImg.py -s -d -r
  # for example
  PS D:\entwicklung\work\python\ImageImp> D:\Python34\python.exe .\importImg.py -s "F:\DCIM" -d D:\data\bilder -r 1
  --========================================
  -- Info :: Read all files from F:\DCIM\*\*.m*
  -- Info :: Copy files to D:\data\bilder
  ...
  -- Info :: Copy Image P1070277.JPG to directory D:\data\bilder\20150415
  ...
  -- Info :: Directory still exits :: D:\data\bilder\20150524
  -- Info :: File P1080259.JPG exits with the same content
  --========================================
  -- Finish with :: 1549 files in 0 new directories
  -- Found duplicate files :: 1541
  -- The run needs :: 618.5560 seconds
  -- Copy size :: 142.031 MB
  --========================================

=== How it works ===

The important steps in the script:

  * Parse the parameters ( opts, args = getopt.getopt(argv, "hs:d:r:", ["src=", "dest=", "rec="]) )
  * Read all files in the source into a list ( fileList = glob.glob(path_name) )
  * Filter unwanted entries such as Thumbs.db out of the list ( file.endswith(thumbsDBFile) )
  * Iterate over the list of files ( for file in fileList: )
  * Extract only the name of the file ( imgFilename = ntpath.basename(file) )
  * Open the file read-only ( imgFile = open(file, 'rb') )
  * Read the Exif tag "EXIF DateTimeDigitized" ( tags = exifread.process_file(imgFile, stop_tag='EXIF DateTimeDigitized') )
  * Convert the Exif string into a date ( createDate = datetime.datetime.strptime(str(tags['EXIF DateTimeDigitized']), '%Y:%m:%d %H:%M:%S') )
  * Check whether the directory was already created in the current run, otherwise create it ( os.makedirs(dirPath) )
  * Check whether the file already exists in the destination ( compare = filecmp.cmp(imgFile.name, newFileName) )
  * If it does not exist, copy the file ( shutil.copy2(imgFile.name, dirPath) )
  * If a file with the same name but different content exists, build a new name (recursive function createNewName) and copy the file under that name, unless the file is already stored in the directory under a similar name

=== The code ===

  __author__ = 'gpipperr'
  import datetime, time
  import glob, filecmp, ntpath, shutil
  import os, errno, sys, getopt
  # Library to read the Exif information
  # install with .\python -m pip install exifread --upgrade
  # from https://pypi.python.org/pypi/ExifRead
  import exifread
  # Get the modification date of a file
  def modification_date(filename):
      t = os.path.getmtime(filename)
      print("-- Info :: EXIF data not available - use mTime {0} of the file".format(datetime.datetime.fromtimestamp(t)))
      return datetime.datetime.fromtimestamp(t)
  # check for a unique filename in the import directory
  # if the new unique name already exists check if this file is the same as the original file
  # if yes do not copy, otherwise search the next unique file name until a free name is found
  def createNewName(filename, importDir, fileNo, origFile):
      # os.path.splitext also handles file names that contain more than one dot
      firstPart, extension = os.path.splitext(filename)
      newFName = firstPart + "-" + str(fileNo) + extension
      newPath = importDir + os.path.sep + newFName
      # If the name is already taken
      if os.path.isfile(newPath):
          # Check if the file under the new name is the same as the original file
          # if shallow is true, files with identical os.stat() signatures are taken to be equal. Otherwise, the contents of the files are compared.
          fcompare = filecmp.cmp(origFile, newPath, shallow=False)
          if fcompare:
              print("-- Info :: File {0} still exits with same content as original file {1}".format(newFName, origFile))
              return newFName
          else:
              fileNo += 1
              # call again with the original name so that the suffixes do not pile up
              return createNewName(filename, importDir, fileNo, origFile)
      else:
          # Copy the new file and return the name of the new file
          shutil.copy2(origFile, newPath)
          setStatisticTotalSize(os.path.getsize(newPath))
          print("-- Info :: File {0} exits but with other content, create new File {1}".format(origFile, newPath))
          return newFName
  # Remember the total size of all copied files
  def setStatisticTotalSize(size):
      global totalFileSize
      totalFileSize += size
  # global for the total file size
  totalFileSize = 0
  # Main script part
  def main(argv):
      # Parameter 1 - Import Directory
      # Parameter 2 - Image Main Folder
      # Parameter 3 - Subfolder Level
      path_name = '-'
      dest_name = '-'
      recursiveLevel = 0
      try:
          opts, args = getopt.getopt(argv, "hs:d:r:", ["src=", "dest=", "rec="])
      except getopt.GetoptError:
          print("usage: importImg.py -s -d -r ")
          sys.exit(2)
      for opt, arg in opts:
          if opt == '-h':
              print("usage: importImg.py -s -d -r ")
              sys.exit()
          elif opt in ("-s", "--src"):
              path_name = arg
          elif opt in ("-d", "--dest"):
              dest_name = arg
          elif opt in ("-r", "--rec"):
              recursiveLevel = int(arg)
      # check if the directory exists and append the glob pattern
      # Source
      if os.path.isdir(path_name):
          if not path_name.endswith(os.path.sep):
              path_name += os.path.sep
          # one "*" path element per subfolder level, then all files
          path_name += ("*" + os.path.sep) * recursiveLevel
          path_name += "*.*"
      else:
          print("-- Error :: 05 Source Directory (-s) {0} not found".format(path_name))
          print("usage: importImg.py -s -d ")
          sys.exit(2)
      # Destination
      # check and strip the last path separator if necessary
      if not os.path.isdir(dest_name):
          print("-- Error :: 04 Destination Directory (-d) {0} not found".format(dest_name))
          print("usage: importImg.py -s -d ")
          sys.exit(2)
      else:
          if dest_name.endswith(os.path.sep):
              dest_name = dest_name[:-1]
      # Remember the start time of the program
      # note: time.clock() works in Python 3.4 but was removed in Python 3.8;
      # on newer versions use time.perf_counter() here and in the statistics output
      start_time = time.clock()
      print("--" + 40 * "=")
      print("-- Info :: Read all files from {0}".format(path_name))
      print("-- Info :: Copy files to {0}".format(dest_name))
      print("--" + 40 * "=")
      fileCount = 0
      fileExistsCount = 0
      dirCount = 0
      dirPathList = []
      # Get the list of all files
      fileList = glob.glob(path_name)
      # remove Thumbs.db from the list if it exists
      # internal Windows file, no need to copy it
      # (build a new list instead of removing entries while iterating)
      thumbsDBFile = "Thumbs.db"
      fileList = [file for file in fileList if not file.endswith(thumbsDBFile)]
      # Loop over the files read from the import directory
      for file in fileList:
          fileCount += 1
          createDate = datetime.datetime.now()
          imgFilename = '-'
          newFileName = '-'
          try:
              # get only the filename without the path
              imgFilename = ntpath.basename(file)
              # Open image file for reading (binary mode)
              imgFile = open(file, 'rb')
              # Read the image tags, if not possible read the last change date
              try:
                  tags = exifread.process_file(imgFile, stop_tag='EXIF DateTimeDigitized')
                  # Transform to a real date
                  # https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
                  # 2010:08:22 14:13:42 %Y:%m:%d %H:%M:%S
                  createDate = datetime.datetime.strptime(str(tags['EXIF DateTimeDigitized']), '%Y:%m:%d %H:%M:%S')
              except Exception:
                  # if there is no Exif tag use the last modification date
                  # print("file with not exif information ::{0}".format(imgfile.name))
                  createDate = modification_date(file)
              # Create the import directory if it does not exist
              # Remember the directory after the first create
              # to avoid exceptions for already existing directories
              dirPath = dest_name + os.path.sep + "{0:%Y%m%d}".format(createDate)
              try:
                  if dirPath not in dirPathList:
                      dirPathList.append(dirPath)
                      os.makedirs(dirPath)
                      dirCount += 1
                      print("-- Info :: Create Directory :: {0}".format(dirPath))
              except OSError as exception:
                  if exception.errno != errno.EEXIST:
                      print("-- Error :: 03 Directory {0} creation error :: see error {1}".format(dirPath, sys.exc_info()[0]))
                  else:
                      print("-- Info :: Directory still exits :: {0}".format(dirPath))
              # Copy the file to the new directory
              newFileName = dirPath + os.path.sep + imgFilename
              try:
                  # Check if the same filename already exists
                  if os.path.isfile(newFileName):
                      # if shallow is true, files with identical os.stat() signatures are taken to be equal. Otherwise, the contents of the files are compared.
                      compare = filecmp.cmp(imgFile.name, newFileName, shallow=False)
                      if compare:
                          print("-- Info :: File {0} exits with the same content".format(imgFilename))
                          fileExistsCount += 1
                      else:
                          newUniqueFileName = createNewName(imgFilename, dirPath, 0, imgFile.name)
                  else:
                      # copy2 preserves the original modification and access info (mtime and atime) in the file metadata.
                      shutil.copy2(imgFile.name, dirPath)
                      setStatisticTotalSize(os.path.getsize(newFileName))
                      print("-- Info :: Copy Image {0:50} to directory {1}".format(imgFilename, dirPath))
              except OSError as exception:
                  print("-- Error :: 02 File {0} in directory {1} :: error {2}".format(imgFile.name, dirPath, sys.exc_info()[0]))
              if not imgFile.closed:
                  imgFile.close()
          except:
              # use the path from the loop, imgFile may not exist if open() failed
              print("-- Error :: 01 Error with {0} in {1} :: error {2}".format(file, path_name, sys.exc_info()))
      # print statistics
      print("--" + 40 * "=")
      print("-- Finish with :: {0} files in {1} new directories".format(fileCount, dirCount))
      print("-- Found duplicate files :: {0}".format(fileExistsCount))
      # note: time.clock() was removed in Python 3.8; if the start time is taken
      # with time.perf_counter(), use time.perf_counter() here as well
      print("-- The run needs :: {0:5.4f} seconds".format(time.clock() - start_time))
      print("-- Copy size :: {0:5.3f} MB".format(totalFileSize / 1024 / 1024))
      print("--" + 40 * "=")
  if __name__ == "__main__":
      main(sys.argv[1:])

==== Find duplicates ====

In the first step, a single directory is searched for duplicates by comparing the files with each other.

  D:\Python34\python.exe .\removeDuplicateFiles.py -h
  usage: removeDuplicateFiles.py -s -t
  D:\Python34\python.exe .\removeDuplicateFiles.py -s D:\data\bilder\20110915 -t D:\temp\saveimg
  D:\Python34\python.exe .\removeDuplicateFiles.py -s D:\temp\20140818\ -t D:\temp\saveimg
  --========================================
  -- Info :: Read all files in :: D:\temp\20140818\*.*
  -- Info :: Copy duplicates to :: D:\temp\saveimg
  --========================================
  -- Info :: Check File "D:\temp\20140818\DSCN0025.TIF"
  -- Info :: File "D:\temp\20140818\DSCN9301 - Copy.TIF " exits with the same content as file D:\temp\20140818\DSCN9301.TIF
  ...
  --========================================
  ...
  -- Info :: Move Duplicate File "D:\temp\20140818\DSCN9301 - Copy.TIF " to "D:\temp\saveimg\DSCN9301 - Copy.TIF"
  --========================================
  --========================================
  -- Finish with :: 17 files in directorie D:\temp\20140818\*.*
  -- Found duplicate files :: 2
  -- The run needs :: 0.2044 seconds
  --========================================

In the next solution a complete directory tree is read, the hashes of all files are computed, and the duplicates are then sorted out; see also [[python:python_hash_image_files|Hashing files in Python]].

How it works:

  * Parse the parameters ( opts, args = getopt.getopt(argv, "hs:t:", ["src=", "tmp="]) )
  * Read all files in the source into a list ( masterFileList = glob.glob(path_name) )
  * Iterate over the list of files ( for masterfile in masterFileList: )
  * For each file, iterate over the files in the directory ( for cfile in slaveFileList: )
  * Compare the files ( compare = filecmp.cmp(masterfile, cfile, shallow=False) )
  * If a duplicate is found, move it to the temp directory; moveDuplicateFile checks whether the file already exists in the destination

=== Code ===

  __author__ = 'gpipperr'
  import datetime, time
  import glob, filecmp, ntpath, shutil
  import os, errno, sys, getopt
  # check for a unique filename in the temp directory and move the file to the temp directory
  # if the new unique name already exists check if this file is the same as the original file
  # if yes delete the original, otherwise search the next unique file name until a free name is found
  def moveDuplicateFile(filename, tempDir, fileNo, origFile):
      # os.path.splitext also handles file names that contain more than one dot
      firstPart, extension = os.path.splitext(filename)
      newFName = firstPart + ("-" + str(fileNo) if fileNo > 0 else "") + extension
      newPath = tempDir + os.path.sep + newFName
      # If the name is already taken
      if os.path.isfile(newPath):
          # Check if the file under the new name is the same as the original file
          fcompare = filecmp.cmp(origFile, newPath, shallow=False)
          if fcompare:
              # an identical copy is already saved in the temp directory - delete the original one
              os.remove(origFile)
              print("-- Info :: Delete File \"{0:40}\" - still exits with the same content in \"{1}\"".format(origFile, newFName))
              return newFName
          else:
              fileNo += 1
              # call again with the original name so that the suffixes do not pile up
              return moveDuplicateFile(filename, tempDir, fileNo, origFile)
      else:
          # Move the file and return the name of the new file
          shutil.move(origFile, newPath)
          print("-- Info :: Move Duplicate File \"{0:40}\" to \"{1}\"".format(origFile, newPath))
          return newFName
  # Main script part
  def main(argv):
      # Parameter 1 - Image Main Folder
      path_name = '-'
      temp_path = 'd:\\temp'
      recursiveLevel = 0
      usageString = "usage: removeDuplicateFiles.py -s -t "
      try:
          opts, args = getopt.getopt(argv, "hs:t:", ["src=", "tmp="])
      except getopt.GetoptError:
          print(usageString)
          sys.exit(2)
      for opt, arg in opts:
          if opt == '-h':
              print(usageString)
              sys.exit()
          elif opt in ("-s", "--src"):
              path_name = arg
          elif opt in ("-t", "--tmp"):
              temp_path = arg
      # check if the directory exists and append the glob pattern
      # Source
      if os.path.isdir(path_name):
          if not path_name.endswith(os.path.sep):
              path_name += os.path.sep
          path_name += ("*" + os.path.sep) * recursiveLevel
          path_name += "*.*"
      else:
          print("-- Error :: 03 Source Directory (-s) {0} not found".format(path_name))
          print(usageString)
          sys.exit(2)
      # Temp destination
      # check and strip the last path separator if necessary
      if not os.path.isdir(temp_path):
          print("-- Error :: 02 temp Directory (-t) {0} not found".format(temp_path))
          print(usageString)
          sys.exit(2)
      else:
          if temp_path.endswith(os.path.sep):
              # the stripped path has to be stored in temp_path, which is used below
              temp_path = temp_path[:-1]
      # Remember the start time of the program
      # note: time.clock() works in Python 3.4 but was removed in Python 3.8;
      # on newer versions use time.perf_counter() for the start time and the statistics
      start_time = time.clock()
      print("--" + 40 * "=")
      print("-- Info :: Read all files in :: {0}".format(path_name))
      print("-- Info :: Copy duplicates to :: {0}".format(temp_path))
      print("--" + 40 * "=")
      fileCount = 0
      fileExistsCount = 0
      # Get the list of all files
      masterFileList = glob.glob(path_name)
      slaveFileList = glob.glob(path_name)
      candiateFile = []
      # Loop over the files read from the source directory
      for masterfile in masterFileList:
          fileCount += 1
          createDate = datetime.datetime.now()
          # Loop again over all files
          # compare the files, if a match is found remove the file from the list
          print("-- Info :: Check File \"{0}\"".format(masterfile))
          # iterate over a copy, because entries are removed from slaveFileList inside the loop
          for cfile in list(slaveFileList):
              # only if not the same file
              if masterfile != cfile:
                  # if shallow is true, files with identical os.stat() signatures are taken to be equal. Otherwise, the contents of the files are compared.
                  compare = filecmp.cmp(masterfile, cfile, shallow=False)
                  if compare:
                      print("-- Info :: File \"{0:40}\" exits with the same content as file {1}".format(masterfile, cfile))
                      # remove only if still in the list
                      # if more than two files are identical
                      # a second run may still be needed
                      if masterfile in slaveFileList:
                          slaveFileList.remove(masterfile)
                      # Add the file with the longest name to the duplicate file list
                      longestFileName = masterfile if len(masterfile) > len(cfile) else cfile
                      # Avoid duplicate entries
                      if longestFileName not in candiateFile:
                          candiateFile.append(longestFileName)
                          fileExistsCount += 1
      # Do something with the duplicates
      print("--" + 40 * "=")
      for file in candiateFile:
          # move the files to temp
          try:
              imgFilename = ntpath.basename(file)
              moveDuplicateFile(filename=imgFilename, tempDir=temp_path, fileNo=0, origFile=file)
          except:
              print("-- Error :: 01 - Move File {0} :: error {1}:".format(file, sys.exc_info()))
      if fileExistsCount < 1:
          print("-- Found no duplicate files in directory {0}".format(path_name))
      print("--" + 40 * "=")
      # print statistics
      print("--" + 40 * "=")
      print("-- Finish with :: {0} files in directorie {1}".format(fileCount, path_name))
      print("-- Found duplicate files :: {0}".format(fileExistsCount))
      print("-- The run needs :: {0:5.4f} seconds".format(time.clock() - start_time))
      print("--" + 40 * "=")
  if __name__ == "__main__":
      main(sys.argv[1:])

==== Convert to an exe file ====

For Python 2.x see http://www.py2exe.org/, for newer Python versions see http://cx-freeze.sourceforge.net/

Installing cx_Freeze:

  .\python -m pip install cx_Freeze --upgrade

Creating an exe file:

  D:\Python34\python.exe D:\Python34\Scripts\cxfreeze .\importImg.py --target-dir dist

The "dist" subdirectory now contains everything that is needed to start the script as an EXE, even without an installed Python environment.

==== Connecting an Android mobile phone ====

Next, the image data should also be imported from an Android mobile phone.

The problem, however, is that Windows does not assign a drive letter to the phone's storage.

=== Example solution for PowerShell ===

See [[https://gist.github.com/cveld/8fa339306f8504095815|Crawl your Android device attached via usb with PowerShell]]

=== Python ===

How can this be solved in Python as well? In the end there must also be some kind of device pointer under Windows that can be addressed directly, since the device is visible in Explorer too.

Ideas:

  * http://stackoverflow.com/questions/827371/is-there-a-way-to-list-all-the-available-drive-letters-in-python
  * http://timgolden.me.uk/python/wmi/cookbook.html
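
As a starting point for the first idea, here is a minimal sketch that is not part of the scripts above. It lists the drive letters Windows currently knows by probing A:\ to Z:\ with os.path.exists. The function name list_drive_letters and its exists parameter are assumptions made for this illustration; the parameter only exists so that the function can be tried out without real drives. A phone without a drive letter, which is the problem described above, will not appear in the result, so this only helps for devices that are mounted as a normal drive.

```python
import os
import string


def list_drive_letters(exists=os.path.exists):
    """Return the drive roots (for example 'C:\\') that `exists` reports as present.

    `exists` is a hypothetical hook for this sketch: by default the real
    file system is probed, a custom callable can be passed for testing.
    """
    roots = []
    for letter in string.ascii_uppercase:
        root = letter + ":\\"
        if exists(root):
            roots.append(root)
    return roots


if __name__ == "__main__":
    # On Windows this prints the roots of all drives that are currently available
    print(list_drive_letters())
```

A source directory for importImg.py could then be picked from this list, for example by checking which root contains a DCIM folder.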