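As seen elsewhere, docx, xlsx and pptx files are ZIP archives. When they are uploaded to my web application, file (via libmagic and python-magic) detects them as ZIP.
I store the file contents as a blob in the database, but naturally I don't want to trust the user about the file type. So I would like to trust file instead and generate a file name automatically on download.
I know that /etc/magic can be modified, but the format (magic(5)) is too complicated for me. I found a bug report about this problem in the Debian bug tracker, but it dates from 2008 and the bug does not look like it will be fixed any time soon.
I think my only other option is to trust the user after all (while still storing the content as a blob) and only check the file extension taken from the file name. That way I can forbid some extensions and allow others. When a user downloads his own file again, he simply gets back whatever he uploaded. But if files are shared with other people, this solution is insecure, because anyone can rename a file to get it past the upload check.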
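To make the weakness of the extension-only fallback concrete, here is a minimal Python sketch of such a check. The allow-list contents and the function name `extension_allowed` are hypothetical and only serve as an illustration.

```python
from pathlib import PurePath

# Hypothetical allow-list; adjust it to whatever the application accepts.
ALLOWED_EXTENSIONS = {".docx", ".xlsx", ".pptx"}


def extension_allowed(filename: str) -> bool:
    """Return True if the last extension of the name is on the allow-list.

    Only the user-supplied name is inspected, never the file contents.
    """
    return PurePath(filename).suffix.lower() in ALLOWED_EXTENSIONS
```

Because only the name is looked at, `extension_allowed("tool.exe.docx")` is accepted just like a genuine document, which is exactly the renaming problem described above.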
Any ideas?
Finally, I found a list of magic numbers for docx etc., but I was unable to convert them into the magic(5) format.
You can use
0 string PK\x03\x04\x14\x00\x06\x00 Microsoft Office Open XML Format
in /etc/magic to identify the general file type, based on the information you provided.
(However, this may not work universally: PK\x03\x04\x00\x14\x08\x08 has been observed at the start of XLSX files generated by LibreOffice.)
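The reason a fixed eight-byte prefix is fragile is that only the first four bytes are the ZIP local file header signature. Bytes 4-5 hold the "version needed to extract" and bytes 6-7 the general purpose flags, and both depend on the program that wrote the archive. The Python sketch below illustrates this with an archive written by the standard `zipfile` module; the helper names are made up for this example.

```python
import io
import zipfile

ZIP_LOCAL_HEADER = b"PK\x03\x04"
# Eight-byte prefix taken from the magic rule above.
MS_OOXML_PREFIX = b"PK\x03\x04\x14\x00\x06\x00"


def looks_like_zip(data: bytes) -> bool:
    """True if the data starts with a ZIP local file header signature."""
    return data.startswith(ZIP_LOCAL_HEADER)


def matches_ms_prefix(data: bytes) -> bool:
    """True only if version and flag bytes also match the fixed prefix."""
    return data.startswith(MS_OOXML_PREFIX)


def make_sample_archive() -> bytes:
    """Build a tiny OOXML-shaped archive in memory (not a valid document)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<document/>")
    return buf.getvalue()


sample = make_sample_archive()
print(looks_like_zip(sample), matches_ms_prefix(sample))
```

The sample passes the four-byte ZIP check but not the eight-byte prefix check, because `zipfile` does not set the same flag bits. A rule that looks at the member names inside the archive is more robust.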
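Later versions of Ubuntu identify .docx, .pptx and .xlsx files correctly. Digging around in the source code of the file utility, I found the file that does the identification: ~/file-5.09/magic/Magdir/msooxml. You can grab a copy of that file and add its contents to your /etc/magic file.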
Here is a copy of the file, updated to v1.5:
# $File: msooxml,v 1.5 2014/08/05 07:38:45 christos Exp $
# msooxml: file(1) magic for Microsoft Office XML
# From: Ralf Brown <[email protected]>

# .docx, .pptx, and .xlsx are XML plus other files inside a ZIP
# archive. The first member file is normally "[Content_Types].xml".
# but some libreoffice generated files put this later. Perhaps skip
# the "[Content_Types].xml" test?
# Since MSOOXML doesn't have anything like the uncompressed "mimetype"
# file of ePub or OpenDocument, we'll have to scan for a filename
# which can distinguish between the three types

# start by checking for ZIP local file header signature
0               string          PK\003\004
!:strength +10
# make sure the first file is correct
>0x1E           regex           \\[Content_Types\\]\\.xml|_rels/\\.rels
# skip to the second local file header
# since some documents include a 520-byte extra field following the file
# header, we need to scan for the next header
>>(18.l+49)     search/2000     PK\003\004
# now skip to the *third* local file header; again, we need to scan due to a
# 520-byte extra field following the file header
>>>&26          search/1000     PK\003\004
# and check the subdirectory name to determine which type of OOXML
# file we have. Correct the mimetype with the registered ones:
# http://technet.microsoft.com/en-us/library/cc179224.aspx
>>>>&26         string          word/           Microsoft Word 2007+
!:mime application/vnd.openxmlformats-officedocument.wordprocessingml.document
>>>>&26         string          ppt/            Microsoft PowerPoint 2007+
!:mime application/vnd.openxmlformats-officedocument.presentationml.presentation
>>>>&26         string          xl/             Microsoft Excel 2007+
!:mime application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
>>>>&26         default         x               Microsoft OOXML
I am leaving v1.2 here for posterity, though.
The copy is included in case the link above goes stale as the file package is updated.
#------------------------------------------------------------------------------
# $File: msooxml,v 1.2 2013/01/25 23:04:37 christos Exp $
# msooxml: file(1) magic for Microsoft Office XML
# From: Ralf Brown <[email protected]>

# .docx, .pptx, and .xlsx are XML plus other files inside a ZIP
# archive. The first member file is normally "[Content_Types].xml".
# Since MSOOXML doesn't have anything like the uncompressed "mimetype"
# file of ePub or OpenDocument, we'll have to scan for a filename
# which can distinguish between the three types

# start by checking for ZIP local file header signature
0               string          PK\003\004
# make sure the first file is correct
>0x1E           string          [Content_Types].xml
# skip to the second local file header
# since some documents include a 520-byte extra field following the file
# header, we need to scan for the next header
>>(18.l+49)     search/2000     PK\003\004
# now skip to the *third* local file header; again, we need to scan due to a
# 520-byte extra field following the file header
>>>&26          search/1000     PK\003\004
# and check the subdirectory name to determine which type of OOXML
# file we have
# Correct the mimetype with the registered ones:
# http://technet.microsoft.com/en-us/library/cc179224.aspx
>>>>&26         string          word/           Microsoft Word 2007+
!:mime application/vnd.openxmlformats-officedocument.wordprocessingml.document
>>>>&26         string          ppt/            Microsoft PowerPoint 2007+
!:mime application/vnd.openxmlformats-officedocument.presentationml.presentation
>>>>&26         string          xl/             Microsoft Excel 2007+
!:mime application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
>>>>&26         default         x               Microsoft OOXML
!:strength +10
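The offsets in these rules follow from the layout of a ZIP local file header. Its fixed part is 30 (0x1E) bytes long, so the member name starts at offset 0x1E. The compressed size sits at offset 18, and 49 is 30 plus the 19 characters of "[Content_Types].xml", so (18.l+49) points at the second header when the first member has no extra field. The Python sketch below decodes the same fields; `parse_local_header` and `make_sample_archive` are names invented for this illustration.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a ZIP local file header (all fields little-endian):
# signature, version, flags, method, time, date, crc, csize, usize,
# name length, extra field length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def parse_local_header(data: bytes, offset: int = 0) -> dict:
    """Decode one local file header and compute where the next one starts."""
    (sig, _version, _flags, _method, _mtime, _mdate,
     _crc, csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, offset)
    if sig != b"PK\x03\x04":
        raise ValueError("not a ZIP local file header")
    name_start = offset + LOCAL_HEADER.size  # offset 0x1E for the first member
    name = data[name_start:name_start + name_len].decode("cp437")
    # Only correct if the sizes are stored in the header (no data descriptor).
    next_offset = name_start + name_len + extra_len + csize
    return {"name": name, "compressed_size": csize,
            "extra_len": extra_len, "next_offset": next_offset}


def make_sample_archive() -> bytes:
    """Build a tiny OOXML-shaped archive in memory (not a valid document)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:  # default is ZIP_STORED
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<document/>")
    return buf.getvalue()


archive = make_sample_archive()
first = parse_local_header(archive)
second = parse_local_header(archive, first["next_offset"])
print(first["name"], second["name"])
```

Real documents may carry an extra field or defer the sizes to a data descriptor, in which case the computed offset is off. That is why the magic rules use search/2000 and search/1000 instead of a fixed jump.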
Versions of file before 5.13 truncate MIME types to 64 characters. With the msooxml rules above, the MIME type reported by file -bi therefore becomes "application/vnd.openxmlformats-officedocument.wordprocessingml.d; charset=binary".
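If upgrading file is not an option, one possible workaround is to map the truncated value back to the full MIME type in the application. The following Python sketch assumes the three OOXML types from the rules above are the only long types of interest; the function names are hypothetical.

```python
KNOWN_OOXML_MIME_TYPES = (
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
)


def truncate_like_old_file(mime: str, limit: int = 64) -> str:
    """Mimic the 64-character cut-off described above."""
    return mime[:limit]


def restore_mime(file_bi_output: str) -> str:
    """Expand a truncated OOXML MIME type from `file -bi` style output.

    The part after ';' (for example 'charset=binary') is dropped. If the
    value is not an unambiguous prefix of a known type, it is returned as is.
    """
    mime = file_bi_output.split(";", 1)[0].strip()
    matches = [known for known in KNOWN_OOXML_MIME_TYPES if known.startswith(mime)]
    return matches[0] if len(matches) == 1 else mime


reported = "application/vnd.openxmlformats-officedocument.wordprocessingml.d; charset=binary"
print(restore_mime(reported))
```

The three types diverge well before character 64, so the truncated prefix still identifies the type unambiguously.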
If you are dealing with docx files produced by LibreOffice, you can add the following to /etc/magic:
# start by checking for ZIP local file header signature
0               string          PK\003\004
!:strength +10
>1104           search/300      PK\003\004
# and check the subdirectory name to determine which type of OOXML
# file we have. Correct the mimetype with the registered ones:
# http://technet.microsoft.com/en-us/library/cc179224.aspx
>>&26           string          word/           Microsoft Word 2007+
!:mime application/vnd.openxmlformats-officedocument.wordprocessingml.document
>>&26           string          ppt/            Microsoft PowerPoint 2007+
!:mime application/vnd.openxmlformats-officedocument.presentationml.presentation
>>&26           string          xl/             Microsoft Excel 2007+
!:mime application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
>>&26           default         x               Microsoft OOXML
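Since the web application in the question is written in Python, the same "look at the subdirectory name" idea can also be applied there without touching /etc/magic. The sketch below is an alternative to the magic rules, not something proposed in the answers above: it reads the archive's member list with the standard `zipfile` module, so it does not depend on member order or fixed offsets. The function names are hypothetical.

```python
import io
import zipfile
from typing import Iterable, Optional

# Directory prefix -> registered MIME type, as in the magic rules above.
OOXML_DIRECTORIES = {
    "word/": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "ppt/": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "xl/": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def classify_ooxml(data: bytes) -> Optional[str]:
    """Return the OOXML MIME type of an uploaded blob, or None.

    Only the member names are read; nothing is decompressed.
    """
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = zf.namelist()
    except zipfile.BadZipFile:
        return None
    if "[Content_Types].xml" not in names:
        return None
    for prefix, mime in OOXML_DIRECTORIES.items():
        if any(name.startswith(prefix) for name in names):
            return mime
    return None


def make_archive(member_names: Iterable[str]) -> bytes:
    """Build an in-memory ZIP with empty members, for demonstration only."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in member_names:
            zf.writestr(name, "")
    return buf.getvalue()


print(classify_ooxml(make_archive(["[Content_Types].xml", "xl/workbook.xml"])))
```

Like the magic rules, this only checks the container layout. It does not prove that the document is well-formed or harmless, so it complements rather than replaces other upload checks.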