> the OS doesn't make it easy for the average user to concatenate files
Bwah! You are probably thinking too much GUI.
    X301 c:\Users\justsomehnguy>copy /?
    Copies one or more files to another location.

    COPY [/D] [/V] [/N] [/Y | /-Y] [/Z] [/L] [/A | /B ] source [/A | /B]
         [+ source [/A | /B] [+ ...]] [destination [/A | /B]]

    [skipped]

    To append files, specify a single file for destination, but multiple files
    for source (using wildcards or file1+file2+file3 format).
> Remember, secure encryption, good compression, and truly random data are indistinguishable.
Yes, and the only reason the bad guys get away with this is the people who trust signature-based scanning at the perimeter to detect all threats.
1000 is an exaggeration, but it is not just 2 standards.
xls morphed with every version of Microsoft Excel. MS Excel has pretty good backwards compatibility, but making an xls parser is notoriously hard because of the many differences between versions.
A modern Word document (.docx) is literally just a ZIP archive with a special folder structure, so unless your company is scanning Word document contents I can’t imagine there’s any issue.
Pretty common in the corporate world. The email scanner will helpfully drop all manner of useful files sent between staff, so you make an encrypted ZIP with a simple password.
I've done similar stuff. Concat a zip (that keeps throwing false positives) to a jpg and scanners will treat it like a jpg. Then write a script that chops off the jpg to access the zip. All this so I could automate a web app deploy.
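A minimal sketch of that trick with hypothetical file names; as it happens, a central-directory-honoring reader like Python's zipfile finds the appended archive without any chopping, because it locates the end-of-central-directory record from the back of the file:

    # Sketch: append a ZIP to a JPEG and read it back (hypothetical file names).
    import shutil
    import zipfile

    with open("polyglot.jpg", "wb") as out:
        for part in ("cover.jpg", "payload.zip"):   # image first, archive second
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)

    # Readers that honor the central directory search backwards from the end
    # of the file, so the appended archive is reachable as-is:
    with zipfile.ZipFile("polyglot.jpg") as zf:
        print(zf.namelist())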
Quote: "To defend against concatenated ZIP files, Perception Point suggests that users and organizations use security solutions that support recursive unpacking."
That's the worst advice, actually. You want the hidden shit to stay there, unable to be seen by default programs. That's how you got all the crap in Windows mail starting from the '90s, when Outlook started trying to be "smart" and automatically detect and run additional content. Be dumb and don't discover anything; let it rot in there. The only one that should do this is the antivirus; the rest of the unpackers/readers/whatever should stay dumb.
I agree. The ZIP definition is extremely clear that the contents of the ZIP are defined by the single Central Directory at the end of the file. Local headers are only valid if pointed to by the Central Directory. Any other local headers are just supposed to be treated as garbage, except by software that is specifically meant to recover a corrupted ZIP archive's contents.
VirusTotal just passes on the file to the actual virus scanners (a for-loop with an api as a front-end) - it's up to each individual virus scanner to scan as they see fit (including scanning any unreferenced holes and compressed zip entries in a zip archive).
I have no idea why those virus scanners don't check nested archives. Maybe time/cpu constraints?
Why do scanners need to look inside compressed archives at all? If the malicious file is extracted, it can be detected then. If it’s not extracted and instead used directly from the archive, then the offending code that executes malicious payloads from inside archives should be detectable?
Is that the role of AutoIt in this scenario?
Yes, that was my question: "why do you want to do that when it sounds like a futile job?"
If you have any kind of transformation of the payload (however trivial a ROT or XOR), then there is no chance of detecting it via pattern matching, and if you need a "program" to process it into its malicious form, then how does detection work?
I understand you want to keep malicious payload from reaching endpoints in a form that would risk being consumed (malformed documents, images causing buffer overflows, executable and script files). But beyond that?
> To defend against concatenated ZIP files, Perception Point suggests that users and organizations use security solutions that support recursive unpacking
Yeah, or, you know, just outright reject any ZIP file that doesn't start with a file entry, or where a forward scan of the file entries doesn't match the result of the central-directory-based walk.
There is just so much malicious crud coming in via email that you just want to instantly reject anything that doesn't look 'normal', and you definitely don't want to descend into the madness of recursive unpacking, 'cuz that enables another class of well-known attacks.
And no, "but my precious use-case" simply doesn't apply, as you're practically limited to a whole 50MB per attachment anyway. Sure, "this ZIP file is also a PDF is also a PNG is also a NES cartridge which displays its own MD5" (viz https://github.com/angea/pocorgtfo/tree/master/writeups/19) has a place (and should definitely be required study material for anyone writing mail filters!), but business email ain't it.
That's fair, but do realize that sometimes people do have to send around archives from the last century (they got archived for a reason!) or created by eldritch-horror tools that just make weird files (which, sometimes, are the gold masters for certain very important outputs...). And it's kind of annoying when these weird but standard files get silently dropped. Especially when that same file went through just fine yesterday, before the duly zealous security settings changed for whatever reason.
All I'm saying is, don't drop my stuff silently because your code couldn't be arsed to deal with (ugly) standard formats. At least give me a warning ("file of type not scannable" or whatever, the actual words are not so important). And then when I have to yell at the Shanghai people I can yell at them for the correct reasons.
Not really a new technique. A long time ago in a galaxy far far away, I needed to get libraries from the internet onto an air-gapped network, and the supported way was to burn them to a disk and bring the disk to be scanned. The scanner never allowed executables, so it would always reject the libraries. Can't make this up, but the supported way that InfoSec (!!) explained to us to get past the scanner was to take advantage of WinRAR being available on the network: split the RAR archive into a bunch of parts (foo.r01, foo.r02, ...) and the scanner, being unable to parse or execute them, would happily rubber-stamp them and pass them along. As long as the process was followed, InfoSec was happy. Thankfully this was before the days when people were really worried about supply chain security.
Glad to see this bit of security theater recognized as such.
From a security perspective, and as a programmer, I've never liked ZIP files precisely because there are two mechanisms to identify the contents: the per-file header and the central directory. When you're defining a format, protocol, or whatever, ideally there should be a single source of truth, a single valid & usable parse, etc.; basically, the structure of the data or process should be intrinsically constraining. There shouldn't be a pathway for multiple implementations to produce different functional results, and ZIP archives are in my mind the archetype for getting this wrong. tar files aren't ideal, but in the abstract (ignoring issues with long file names) they don't have this problem. (tar files don't support random access, either, but better to rely on something suboptimal than something that's fundamentally broken.)
A similar security problem, though not as fundamentally baked into the format, is MIME parsing. The header section is supposed to be delimited from the body by an empty line (likewise for nested entities). But what if it's not? For better or worse, Sendmail was tolerant of the absence of an empty line and treated as headers everything up to the first line that didn't parse as a header or header continuation.[1] Later systems, like Postfix, originally copied this behavior. But Microsoft Exchange and Outlook are even more tolerant, yet in a much more horrendous way, by parsing as a header anything that looks like a Content-Type or related header immediately after the first empty line. They have similar hacks for other, similar violations. So today, depending on the receiving software, you can send messages that appear differently, including having different attachments. It's a security nightmare!
I'm not a Postel's Law hater, but ZIP archives and Microsoft's MIME parsing behaviors are just egregiously wrong and indefensible. And even if you think the Robustness Principle is inherently bad policy, you still have to design your formats, protocols, and systems to be as intrinsically constraining as possible. You can't rely on vendors adhering to a MUST rule in an RFC unless it's unquestionably crystal clear what the repercussions will be: everybody else will (because it's the natural and convenient thing to do) reject your output as trash and drop it on the floor immediately, so violations never have a chance to gain a foothold.
[1] MTAs don't necessarily need to care about MIME parsing, but Sendmail eventually gained features where parsing message contents mattered, setting the de facto norm (for those paying attention) until Microsoft came along.
The central directory allows zip archives to be split across multiple files on separate media without needing to read them all in for selective extraction. Not particularly useful today but invaluable in the sneakernet era with floppies.
Still useful today.
Trying to transmit a 100 GB file through any service is usually a pain, especially if one end has an unstable Internet connection.
That's a very bad way of solving that issue. If transmission is a problem, either use a proper retry-friendly protocol (such as bittorrent) or split the file. Using hacks on the data format just leads to additional pain
> or split the file
Wait, I'm confused. Isn't this what OP was talking about?
Splitting the file doesn’t need to be part of the file format itself. I could split a file into N parts, then concatenate the parts together at a later time, regardless of what is actually in the file.
The OP was saying that zip files can specify their own special type of splitting, done within the format itself, rather than operating on the raw bytes of a saved file.
> Splitting the file doesn’t need to be part of the file format itself. I could split a file into N parts, then concatenate the parts together at a later time, regardless of what is actually in the file.
I'm inclined to agree with you.
You can see good examples of this with the various multi-part upload APIs used by cloud object storage platforms like S3. There's nothing particularly fancy about it. Each part is individually retry-able, with checksumming of parts and the whole, so you get nice and reliable approaches.
On the *nix side, you can just run split over a file, to the desired size, and you can just cat all the parts together, super simple. It would be simple to have a CLI or full UI tool that would handle the pause between `cat`s as you swapped in and out various media, if we hark back to the zip archive across floppy disks days.
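In the same spirit, a small sketch of the split-plus-checksum idea (hypothetical helper functions, not any particular cloud API):

    # Sketch: format-agnostic split/join with per-part checksums.
    import hashlib

    PART_SIZE = 100 * 1024 * 1024          # arbitrary part size

    def split_file(path):
        """Write path.000, path.001, ... and return each part's SHA-256."""
        digests = []
        with open(path, "rb") as src:
            index = 0
            while chunk := src.read(PART_SIZE):
                with open(f"{path}.{index:03d}", "wb") as part:
                    part.write(chunk)
                digests.append(hashlib.sha256(chunk).hexdigest())
                index += 1
        return digests

    def join_file(path, digests, out_path):
        """Concatenate the parts in order, verifying each one as it goes."""
        with open(out_path, "wb") as dst:
            for index, digest in enumerate(digests):
                chunk = open(f"{path}.{index:03d}", "rb").read()
                if hashlib.sha256(chunk).hexdigest() != digest:
                    raise ValueError(f"part {index} is corrupt")
                dst.write(chunk)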
Without knowing the specifics of what's being talked about, I guess it makes sense that zip did that because the OS doesn't make it easy for the average user to concatenate files, and it would be hard to concatenate 10+ files in the right order. If you have to use a cli then it's not really a solution for most people, nor is it something I want to have to do anyways.
The OS level solution might be a naming convention like "{filename}.{ext}.{n}" like "videos.zip.1" where you right-click it and choose "concatenate {n} files" and turns them into "{filename}.{ext}".
> the OS doesn't make it easy for the average user to concatenate files
Bwah! You are probably thinking too much GUI.
Try to concatenate 1000 files with natural sorting names using `copy`. I did this regularly and I had to write a Python script to make it easier.
It's much easier to just right-click any of the zip part files and let 7-Zip unzip it, and it will tell me if any part is missing or corrupt.
Why would you use manual tools to achieve what a ZIP archive can give you out of the box? E.g. if you do this manually you’d need to worry about file checksums to ensure you put it together correctly.
Because, as said before, having ZIP manage splits ends up with two sources of truth in the file format that can differ while the whole file is still valid.
I've had good luck piping large files, such as ZFS snapshots, through `mbuffer`[1], and it's worked like a charm.
[1] https://man.freebsd.org/cgi/man.cgi?query=mbuffer&sektion=1&...
couldn't agree more!
We need to separate and design modules as unitary as possible:
- zip should ARCHIVE/COMPRESS, i.e. reduce the file size and create a single file from the file system point of view. The complexity should go in the compression algorithm.
- Sharding/sending multiple coherent pieces of the same file (zip or not) is a different module and should be handled by specialized and agnostic protocols that do this like the ones you mentioned.
People are always building tools that handle 2 or more use cases instead of following the UNIX principle of creating generic, good, single respectability tools that can be combined together (thus allowing a 'whitelist' of combinations which is safe). Quite frankly it's annoying, and it very often leads to issues such as this that weren't even thought of in the original design, because of the exponential problem of combining tools together.
Well, 1) is zip with compression into single file, 2) is zip without compression into multiple files. You can also combine the two. And in all cases, you need a container format.
The tasks are related enough that I don't really see the problem here.
I meant that they should be separate tools that can be piped together. For example: you have 1 directory of many files (1 GB in total)
`zip out.zip dir/`
This results in a single out.zip file that is, let's say, 500 MB (2:1 compression)
If you want to shard it, you have a separate tool, let's call it `shard` that works on any type of byte streams:
`shard -I out.zip -O out_shards/ --shard_size 100Mb`
This results in `out_shards/1.shard, ..., out_shards/5.shard`, each 100 MB.
And then you have the opposite: `unshard` (back into 1 zip file) and `unzip`.
No need for 'sharding' to exist as a feature in the zip utility.
And... if you want only the shards from the get-go, without the original single-file archive, you can do something like:
`zip dir/ | shard -O out_shards/`
Now, these can be copied to the floppy disks (as discussed above) or sent via the network etc. The main thing here is that the sharding tool works on bytes only (doesn't know if it's an mp4 file, a zip file, a txt file etc.) and does no compression and the zip tool does no sharding but optimizes compression.
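A sketch of what that hypothetical `shard` end of the pipe could look like; it sees only bytes, exactly as described:

    # Sketch: a format-agnostic "shard" that chunks whatever arrives on stdin
    # into numbered files (mirroring the hypothetical pipeline above).
    import os
    import sys

    SHARD_SIZE = 100 * 1024 * 1024               # the imagined --shard_size 100MB

    def shard(out_dir):
        os.makedirs(out_dir, exist_ok=True)
        index = 1
        while chunk := sys.stdin.buffer.read(SHARD_SIZE):
            with open(os.path.join(out_dir, f"{index}.shard"), "wb") as out:
                out.write(chunk)
            index += 1

    if __name__ == "__main__":
        shard(sys.argv[1])                       # e.g.  <archiver to stdout> | python shard.py out_shards/

The matching `unshard` is then just reading `1.shard`, `2.shard`, ... in order and writing them back out as one stream.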
In unix, that is split https://en.wikipedia.org/wiki/Split_(Unix) (and its companion cat).
The problem is that DOS (and Windows) didn't have the Unix philosophy of a tool that does one thing well, and you couldn't depend on the necessary small tools being available. Thus, each compression tool also included its own file-spanning system.
https://en.wikipedia.org/wiki/File_spanning
The key thing you get by integrating the two tools is the ability to more easily extract a single file from a multipart archive: instead of having to reconstruct the entire file, you can look in the part/diskette with the index to find out which other part/diskette you need to get at the file you want.
Don't forget that with this two-step method, you also require enough diskspace to hold the entire ZIP archive before it's sharded.
AFAIK you can create a ZIP archive saved to floppy disks even if your source hard disk has low/almost no free space.
Phil Katz (creator of the ZIP file format) had a different set of design constraints.
The problem seems to be that each individual split part is valid in itself. This means that the entire file, with the central directory at the end, can diverge from the individual entries. This is the original issue.
Why do you believe that archiving and compressing belong in the same layer more than sharding does? The unixy tool isn't zip, it's tar | gzip.
tar|gzip does not allow random access to files. You have to decompress the whole tarball (up to the file you want).
Even worse, in the general case, you should really decompress the whole tarball up to the end because the traditional mechanism for efficiently overwriting a file in a tarball is to append another copy of it to the end. (This is similar to why you should only trust the central directory for zip files.)
I agree!
Also, I enjoyed your Freudian slip:
single respectability tools
->
single responsibility tools
If the point is being able to access some files even if the whole archive isn’t uploaded, why not create 100 separate archives each with a partial set of files?
Or use a protocol that supports resume of partial transmits.
Because sometimes your files are very large, it's not easy to create separate archives of (roughly) even size.
A single video can easily be over 20GB, for example.
This carries the information that all those files are a pack in an inseparable and immutable way, contrary to encoding that in the archive's name or via some parallel channel.
Presumably it compresses better if it's all one archive?
nncp, bittorrent...
I recently had to do this with about 700 GB, and yeah, OneDrive hated that. I ended up concatenating tars together.
>there are two mechanisms to identify the contents, the per-file header and the central directory
There is only one right, standard-mandated way to identify the contents (the central directory). For one reason or another many implementations ignore it, but I don't think it's fair to say that the ZIP format is ambiguous.
Sometimes you want to read the file front-to-back in a streaming fashion.
That doesn't change anything wrt what the parent commenter said.
Imagine—
Officer: The reason why I pulled you over is that you were doing 45, but this is a 25 mph school zone right now, and even aside from that the posted speed when this is not a school zone is only 35. So you shouldn't be going faster than that, like you were just now.
Motorist: But sometimes you want to go faster than that.
I don't think you understand the reason for the ZIP archive file design.
Back in the late 1980s, backup media for consumers was mostly limited to floppy disks; some users had tape or another hard disk.
Say you had a variable number of files to compress and write out to a ZIP archive.
IF you wrote out the central directory first, followed by all the individually (possibly) compressed and/or encrypted files, you'd have to enumerate all the files to be archived, process them (compress and/or encrypt), write them out, then go back and update the directory with the actual compressed sizes and offsets of the ZIP local entries.
Now if you wanted to add files to the ZIP archive, the central directory would grow and push the following compressed/encrypted files further out, and you'd have to update ALL the central directory entries, since each entry includes an offset from the beginning of the disk (if the archive does not span multiple disks, this offset is from the start of the ZIP archive file).
So that's one reason for why the ZIP central directory is placed at the end of the ZIP archive file. If you're streaming the output from a ZIP program, then placing the ZIP central dir at the start of the file is a non-starter since you can't rewind a stream to update the ZIP central directory entries.
Why do some programs ignore the ZIP central directory as the ONE source of truth?
Before SSDs and their minimal seek latency, coders discovered that scanning the ZIP local entries was a faster way to build up the list of ZIP archive entries; otherwise you're forced to seek all the way to the end of a ZIP archive and work backwards to locate the central directory and proceed accordingly.
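(For illustration, a bare-bones sketch of that seek-to-the-end dance, ignoring ZIP64 and oversized archive comments:)

    # Sketch: find the End of Central Directory record and read the
    # central directory's size and offset.
    import struct

    EOCD_SIG = b"PK\x05\x06"

    def central_directory_location(path):
        with open(path, "rb") as f:
            f.seek(0, 2)
            size = f.tell()
            tail_len = min(size, 22 + 65535)      # EOCD is 22 bytes + up to 64 KiB comment
            f.seek(size - tail_len)
            tail = f.read()
        pos = tail.rfind(EOCD_SIG)                # last EOCD wins
        if pos < 0:
            raise ValueError("no end-of-central-directory record")
        total_entries, cd_size, cd_offset = struct.unpack("<HII", tail[pos + 10:pos + 20])
        return total_entries, cd_size, cd_offset  # offset measured from the start of the archive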
If the central directory in the ZIP archive is corrupted or missing, the user could still recover the data for the individual files (if all the ZIP local entries are intact). In this case, ignoring the ZIP central dir and scanning sequentially for ZIP local entries is REQUIRED.
The fault here is the security scanners. There's never been any guarantee that the data in a ZIP archive consists only of valid ZIP local file entries followed by the ZIP central directory. Between ZIP local file entries, one can place any data. Unzip programs don't care.
The more general principle is that single source of truth is not ideal for data storage where you're worried about corruption. There's a backup MBR on your hard disk at the end, your ext4 filesystem has many backups of your superblock.
When it comes to user data, the natural programmer instinct of "it is exactly what I expect, or fail", which is typically good design, gives way to pragmatism: try your hardest not to lose data, because partial results are better than nothing.
Having a backup copy isn't quite the same thing though. It is just a copy of the single source of truth. Not a different implementation or used for a different use case. Also trivial to verify.
> coders discovered that scanning the ZIP local entries to be a faster way to build up the ZIP archive entries, otherwise you're forced to seek all the way to the end of a ZIP archive and work backwards to locate the central directory
Would this have worked? Reserve a few bytes at the beginning of the archive, at a fixed offset from the start, and say "this is where we will write the offset at which the central directory starts." Then build the whole archive, writing the central directory at the end. Then seek back to that known offset at the start of the file and write the offset to the central directory. When creating the archive, we can write the central directory to a temp file, then append it to the end of the file we're building, and fix up the offset.
Seems like this strategy would enable us to both have a number of files in the archive that are known at the beginning, and also allow us to do a high-speed seek to the central directory when reading the archive.
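A minimal sketch of that reserve-then-patch idea, for a purely hypothetical container format (not real ZIP):

    # Sketch: reserve 8 bytes up front, write the body, then patch the offset.
    import struct

    with open("archive.bin", "wb") as f:           # hypothetical container
        f.write(b"MAG1")                           # magic
        placeholder_at = f.tell()
        f.write(struct.pack("<Q", 0))              # reserved: directory offset, patched later
        f.write(b"...entry data streams out here...")
        directory_at = f.tell()
        f.write(b"...directory written last...")
        f.seek(placeholder_at)                     # one small backwards seek
        f.write(struct.pack("<Q", directory_at))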
I imagine people thought about this idea and didn't do it for one reason or another. I can imagine why we didn't do that for Unix TAR-- most tape devices are a one-way write stream and don't have random access. But ZIP was designed for disks; I'm curious why this idea wouldn't have solved both problems.
You forgot about the streaming case. ZIP creators can stream the archive out and never seek back earlier in the stream (send the data to another tool or a write-only medium).
The central directory at the end of the archive fits that constraint. Any design where a placeholder needs to be updated later won't.
In a similar vein, HTTP header smuggling attacks exploit differences in header parsing. For instance, a reverse proxy and a web server might handle repetition of headers or the presence of whitespace differently.
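A toy illustration of that kind of divergence, with two hand-rolled parsers rather than any particular proxy or server:

    # Sketch: two naive parsers disagreeing about a repeated header.
    RAW = "Content-Length: 10\r\nContent-Length: 0\r\n"

    def parse(text, last_wins):
        headers = {}
        for line in text.split("\r\n"):
            if ": " not in line:
                continue
            name, value = line.split(": ", 1)
            if last_wins or name.lower() not in headers:
                headers[name.lower()] = value
        return headers

    print(parse(RAW, last_wins=False)["content-length"])  # "10"
    print(parse(RAW, last_wins=True)["content-length"])   # "0": the two components
                                                          # now disagree on the body length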
This attack vector has been known for at least 17 years, if not longer.
https://gnucitizen.org/blog/java-jar-attacks-and-features/ https://gnucitizen.org/blog/gifars-and-other-issues/
I'm with you.
I've evaded all sorts of scanning tools by base64 encoding data (i.e. binary data) to and copy pasting the text from insecure to highly secured environments.
At the end of the day, these malware databases rely on hashing and detecting for known bad hashes and there are lots of command line tools to help get over that sort of thing like zip/tar etc.
I used to have a workflow for updating code inside a very highly secure environment that relied on exactly this:
Run build of prior version, run build of current version, run diff against them, compress with xz -9, base64 encode, generate output, base64 encode, e-mail it to myself, copy text of email, type "openssl base64 -d | unxz | bash", right click.
E-mailing this was completely fine according to the stringent security protocols but e-mailing a zip of the code, etc. was absolutely 100% not. That would have to go on the vendor's formal portal.
(Eventually I just opened my own "portal" to upload binaries to, put the vendor I worked for's logo on it, and issued a statement saying it was an official place to download binaries from the vendor. But sometimes their WAF would still mangle downloads or flag them as a risk, so I made sure builds had options of coming in an obfuscated base64 format.)
rot13 must be outlawed for its use by cyber-criminals!
17 years? We played tricks with zip bombs that used this approach back in the '90s.
Yeah the 90s are just 17 ye… oh no I’m old
This is sometimes used non-maliciously to concatenate zipped eBooks to a JPEG of the cover art. 4Chan's /lit/ board used to do this, but I can't find any reference to it anymore.
https://entropymine.wordpress.com/2018/11/01/about-that-jpeg...
https://github.com/Anti-Forensics/tucker
I think the reason it’s not used anymore is because it was used maliciously to share CSAM on other boards and 4chan banned uploading anything that looks like a concatenated zip
:/
That first read should be an HN post in its own right.
https://news.ycombinator.com/item?id=18342042
FYI, the blog post describes a zip file embedded in the ICC profile data of a JPEG in order to survive web image transformations, whereas the linked Tucker script is just appending the zip to the image.
WinRAR does it right, 7zip and Windows Explorer do it wrong according to https://en.m.wikipedia.org/wiki/ZIP_(file_format)
> only files specified in the central directory at the end of the file are valid. Scanning a ZIP file for local file headers is invalid (except in the case of corrupted archives)
This contradicts the specification, which explicitly supports stream-processing ZIP files, which necessarily can't happen if your source of truth is the central directory record. Unless you can wrap your stream processing in some kind of transaction that you can drop once you discover foul play.
Source what you're referring to / explanation?
In ZIP, later info wins. I don't see how that isn't always streamable.
Hmm, it appears that you are right. I vaguely remembered that zip was streamable, but it appears that it only means that it's stream writable, as in you can write zip file contents from a datastream of unknown size, and append the checksum and file size later in the zip file stream.
However such a zip file is definitely not stream readable, as the local file header no longer contains the size of the following file data, so you can't know where to end a given file. So for reading you definitely have to locate the central directory record.
In my defense, the spec says[1]:
> 4.3.2 Each file placed into a ZIP file MUST be preceded by a "local file header" record for that file. Each "local file header" MUST be accompanied by a corresponding "central directory header" record within the central directory section of the ZIP file.
Then in 4.3.6 it describes the file format, which seems to be fundamentally incompatible with altering zip files by appending data, as the resulting file would not conform to this format.
So basically some implementations (maybe opportunistically, relying on compressed sizes being available in the local file headers) stream read from the zip file, assuming that it's a valid zip file, but not validating. Some other implementations only use the central directory record at the end, but don't validate the file format either.
A validating zip file parser should be possible, by locating the central directory record and checking that the referenced files, with their metadata, fully cover the contents of the zip file, without gaps and overlaps. But this probably won't win any benchmarks.
[1] https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
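For what it's worth, a sketch of that validating check, using the offsets Python's zipfile already exposes (simplified: no ZIP64 handling, and it trusts the sizes recorded in the central directory):

    # Sketch: check that the entries named by the central directory tile the
    # file with no unexplained gaps or overlaps.
    import struct
    import zipfile

    def tiles_cleanly(path):
        with zipfile.ZipFile(path) as zf:
            infos = sorted(zf.infolist(), key=lambda i: i.header_offset)
        expected = 0
        with open(path, "rb") as f:
            for info in infos:
                if info.header_offset != expected:
                    return False                    # gap or overlap before this entry
                f.seek(info.header_offset + 26)     # local header: name and extra lengths
                name_len, extra_len = struct.unpack("<HH", f.read(4))
                expected = info.header_offset + 30 + name_len + extra_len + info.compress_size
                if info.flag_bits & 0x08:           # a data descriptor follows the data
                    f.seek(expected)
                    expected += 16 if f.read(4) == b"PK\x07\x08" else 12
        # A complete check would also require `expected` to land exactly on the
        # central directory's own offset.
        return True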
Good observations!
Indeed: ZIP files are stream-writable, and some ZIP files are stream-readable, but not both: ZIP files that were stream-written are not stream-readable.
Also, streaming unzipping always requires that by the time you arrive at the central directory, you delete any so-far-unzipped files that don't have entries in it, as those were "deleted".
> Then in 4.3.6 it describes the file format, which seems to be fundamentally incompatible with altering zip files by appending data, as the resulting file would not conform to this format.
My interpretation of the spec and specifically of 4.3.6 is that it is informational for how ZIP files usually look, and that you may store arbitrary data in between files; such data then doesn't count as "files". This reading then does allow appending and concatenation.
Unfortunately 4.3.6 does not have MUST/MAY wording, so we don't really know if this reading was intended by the authors (maybe they will clarify it in the future), but allowing this reading seems rather useful because it permits append-only modification of existing ZIP files. (The wording /'A ZIP file MUST have only one "end of central directory record"'/ somewhat suggests that the authors didn't intend this, but again one could argue that it is not necessary to state this, that there is only one EOCDR by definition, and that any previous ones are just garbage data that is to be ignored.)
> Unfortunately 4.3.6 does not have MUST/MAY wording so we don't really know if this reading was intended by the authors
It's not explicit from the APPNOTE, but that's not the same as saying "we don't really know". We do know—islands of opaque data are allowed and that's the entire reason the format is designed the way it is. Katz designed ZIP for use on machines with floppy drives, and append-only modification of archives spanning multiple media was therefore baked into the scheme from the very roots. He also produced an implementation that we happen to be able to check, and we know it works that way.
The only way to arrive at another interpretation is to look at APPNOTE in isolation and devoid of context.
> one could argue that it is not necessary to state this, that there is only one EOCDR by definition, and that any previous ones are just garbage data that is to be ignored
That is the correct interpretation; the wording doesn't suggest otherwise.
The article says that WinRAR "displays both ZIP structures", so, no, it doesn't do it right. Of the three, only Windows Explorer is close to the correct behavior (showing the contents of the ZIP that was appended closest to the end, and ignoring everything else). The exception to its correctness is that they report it may fail to process the ZIP and identify it as being corrupt, which shouldn't happen so long as the endmost ZIP is well-formed.
> The article says that WinRAR "displays both ZIP structures", so, no, it doesn't do it right.
That would be true, but the bleepingcomputer article seems to be misquoting that fact.
The original research they reference (which I also find to be a more sensible article),
https://perception-point.io/blog/evasive-concatenated-zip-tr...
says that WinRAR only shows the last archive's contents, and shows a screenshot of that:
> WinRAR, on the other hand, reads the second central directory and displays the contents of the second archive
Thanks, that's a much better article (and published on a site that, while slightly annoying on its own, is infested with fewer rudely intrusive and resource-consuming ads).
Reading both, it's clear that (a) you are correct, and (b) the submitted link, besides being materially inaccurate, is shameless reblog spam and should be changed.
Related, my two favourite ZIP parser issues:
https://bugzilla.mozilla.org/show_bug.cgi?id=1534483 "Ambiguous zip parsing allows hiding add-on files from linter and reviewers"
https://issues.chromium.org/issues/40082940 "Security: Crazy Linker on Android allows modification of Chrome APK without breaking signature"
The big problem with the ZIP format is that although the "spec" says what a ZIP file looks like, it does not tell you in concrete terms how to parse it, leading to all sorts of ambiguities and divergent implementations. Someone needs to write a "strict ZIP" spec that has explicit and well-defined parsing rules, and then we need to get every existing ZIP implementation to agree to follow said spec.
Or: better yet, just use an archive format for archival and a compression layer for compression. Don't use zip at all.
What non-compressing archive format would you suggest? tar doesn't support random access which is a non-starter for many use cases.
DAR (Disk ARchiver)[1] looks to be a good alternative. It supports random access, encryption, and individual file compression within the archive.
[1] http://dar.linux.free.fr/
That seems counter to GP's suggestion of doing compression at a separate layer
Not really. There's no "dar compression" format. It calls different compression tools just like tar.
You could say the same about ZIP (it uses deflate by default but optionally supports things like zstd)
Remember, secure encryption, good compression, and truly random data are indistinguishable.
It's best to paste that encrypted payload into a JPG with some bullshit magic headers and upload that to a trusted Exfil pivot instead.
Or, to get SuperMarioKart.rom to work with your chromeApp-XEMU emulator to play during down-time at work, just rename it to SMB.png and email it to yourself.
One of the hacks I'm most proud of in my whole career was when we were doing a proof of concept at an enterprise client and were being deliberately obstructed by the internal IT group due to politics between their boss and the boss who sponsored our POC. For unrelated trademark reasons we were prevented by a third party from having the software on physical media, but we had a specific contractual clause agreeing to let us download it for install. So while we had been contractually engaged to provide this software and had a strict deadline to prove value, the enterprise IT group was preventing us from actually getting it through the virus-scanning firewall to install it. What to do?
The scanner looked for the signature of executable or zipped files and blocked them. It would also block any files larger than a certain size. So what I did was write two shell scripts called "shred" and "unshred". "Shred" would take any files you gave it as input, make them into a tarball, encrypt that to confuse the virus scanner and then split it up into chunks small enough to get through the firewall, and "unshred" would reverse this. This almost worked, but I found that the first chunk was always failing to transmit through the firewall. The scanner noticed some signature that openssl was putting at the front of the file when encrypting it. The solution? Change shred to add 1k of random noise to the front of the file and unshred to remove it.
Job done. Our files were transmitted perfectly (I got the scripts to check the md5sum on both sides to be sure), and though the process was slow, we could continue.
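For the curious, the whole thing is only a few lines. Here's a rough Python re-imagining of the idea (names are hypothetical; the real thing was a pair of shell scripts driving tar and openssl, and here the encryption step is swapped for the third-party cryptography package's Fernet):

    import hashlib
    import io
    import os
    import tarfile
    from pathlib import Path
    from cryptography.fernet import Fernet  # key = Fernet.generate_key(), shared out of band

    CHUNK = 512 * 1024   # stay under the firewall's size limit
    NOISE = 1024         # random prefix hides the ciphertext's telltale leading bytes

    def shred(paths, outdir, key):
        payload = io.BytesIO()
        with tarfile.open(fileobj=payload, mode="w:gz") as tar:
            for p in paths:
                tar.add(p)
        blob = os.urandom(NOISE) + Fernet(key).encrypt(payload.getvalue())
        outdir = Path(outdir)
        outdir.mkdir(parents=True, exist_ok=True)
        for i in range(0, len(blob), CHUNK):
            (outdir / f"part{i // CHUNK:04d}").write_bytes(blob[i:i + CHUNK])
        return hashlib.md5(blob).hexdigest()   # compare this on the far side

    def unshred(indir, dest, key, expected_md5):
        blob = b"".join(p.read_bytes() for p in sorted(Path(indir).glob("part*")))
        assert hashlib.md5(blob).hexdigest() == expected_md5, "transfer corrupted"
        data = Fernet(key).decrypt(blob[NOISE:])   # strip the random prefix first
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
            tar.extractall(dest)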
The funny thing was the POC was a bake-off versus another (more established) vendor and they couldn't get their software installed until they had done a couple of weeks of trench warfare with enterprise IT. "To keep things fair" the people organising the POC decided to delay to let them have time to install, and eventually the person blocking us from installing was persuaded to change their mind (by being fired), so "shred" and "unshred" could be retired.
I did basically the same, to get some important CLI tools past the company firewall, just a few months back.
Crazy that this is easier than dealing with the bullshit politics just to get some essential tools to do my job. German public service is a joke. I've since quit.
You could have just set up an ssh reverse shell to a public jump server you control? Might have been easier.
Good compression should still be cryptographically distinguishable from true randomness right?
Sure the various measures of entropy should be high, but I always just assumed that compression wouldn't pass almost any cryptographic randomness test.
Not 'good' but maximum compression, yes.
Maximum compression pushes ALL of the 'work' onto the compression algorithm, leaving essentially 'random' data (no correlated bits) for the decompressor.
Good compression, by definition, would leave small artifacts in the data, biasing it away from "true" randomness.
https://en.wikipedia.org/wiki/Kolmogorov_complexity
https://en.wikipedia.org/wiki/Shannon%E2%80%93Hartley_theore...
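For the easy cases you don't even need statistics; compressed formats announce themselves with fixed framing that truly random data never has. A tiny sketch:

    import gzip
    import os
    import zlib

    text = b"the quick brown fox jumps over the lazy dog " * 1000

    print(zlib.compress(text)[:2].hex())   # '789c' -- standard zlib header
    print(gzip.compress(text)[:2].hex())   # '1f8b' -- gzip magic bytes
    print(os.urandom(2).hex())             # no fixed prefix, ever

    # Even with the framing stripped, a raw DEFLATE stream still has Huffman
    # block headers and length/distance codes in predictable places; it's
    # high-entropy, but nowhere near cryptographically random.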
Related:
https://isc.sans.edu/forums/diary/zipdump+Evasive+ZIP+Concat...
https://isc.sans.edu/forums/diary/zipdump+PKZIP+Records/3142...
Encrypted ZIP files have long been a way to evade any sort of malware detection during transmission.
You don't even need to encrypt the zip, since an encrypted ZIP file can trigger tripwires during transmission.
An unencrypted zip using the .docx or .xlsx format is the way to go (the best spot to hide is inside one of the OpenXML tags or XML comments).
Encode it with enough filler to reduce its "entropy" :)
I doubt it still works, but I used to embed things I needed to get through email in Word documents.
Would probably still work. There's just too many formats which makes it very hard for a content blocker to really stop.
I pity the programmer that has to decode the 1000 versions of xls to find the binary blob that could be a virus.
1000? No. There are two: OpenXML and the original xls. OpenXML can be scanned for issues like any other XML file.
Alas, it's more difficult to get excel to accept that it shouldn't delete leading zeros than it is to check a spreadsheet's sus-o-scale.
1000 is an exaggeration, but it's not just 2 standards.
xls morphed with every version of Microsoft Excel. MS Excel has pretty good backwards compatibility, but making an xls parser is notoriously hard because of the many differences between versions.
A modern Word document file (.docx) is literally just a ZIP archive with a special folder structure, so unless your company is scanning Word document contents I can't imagine there's any issue.
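Easy to confirm (the filename here is hypothetical):

    import zipfile

    with zipfile.ZipFile("report.docx") as zf:
        print(zf.namelist()[:4])
        # typically something like:
        # ['[Content_Types].xml', '_rels/.rels', 'word/document.xml', ...]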
Pretty common in the corporate world. The email scanner will helpfully drop all manner of useful files sent between staff, so people make an encrypted zip with a simple password.
I've done similar stuff. Concat a zip (that keeps throwing false positives) to a jpg and scanners will treat it like a jpg. Then write a script that chops off the jpg to access the zip. All this so I could automate a web app deploy.
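The trick in sketch form (filenames are made up): append the ZIP to the JPEG; anything keying off the JPEG magic bytes sees an image, while ZIP readers, which locate the central directory from the end, still find the archive. Many readers don't even require chopping the JPEG off first, since they tolerate prepended data:

    import zipfile

    with open("photo.jpg", "rb") as img, open("payload.zip", "rb") as z:
        combined = img.read() + z.read()
    with open("photo_with_payload.jpg", "wb") as out:
        out.write(combined)

    # Most ZIP implementations scan backwards for the end-of-central-directory
    # record, so the prepended image bytes are simply ignored:
    print(zipfile.ZipFile("photo_with_payload.jpg").namelist())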
Or attach a zip through Exchange
Quote: "To defend against concatenated ZIP files, Perception Point suggests that users and organizations use security solutions that support recursive unpacking."
That's the worst advice, actually. You want the hidden shit to stay there, unable to be seen by default programs. That's how you got all the crap in Windows mail starting in the '90s, when Outlook started trying to be "smart" and automatically detect and run additional content. Be dumb and don't discover anything; let it rot in there. The only thing that should do this is the antivirus; the rest of the unpackers/readers/whatever should stay dumb.
I agree. The ZIP definition is extremely clear that the contents of the ZIP are defined by the single Central Directory at the end of the file. Local headers are only valid if pointed to by the Central Directory. Any other local headers are just supposed to be treated as garbage, except by software specifically meant to recover the contents of corrupted ZIP archives.
Nice remix of an old technique.
I remember file packing exes together for fun and profit back in the day.
Last I checked, VirusTotal doesn't test nested archives for viruses, even though it's an issue as old as modern computing.
VirusTotal just passes on the file to the actual virus scanners (a for-loop with an api as a front-end) - it's up to each individual virus scanner to scan as they see fit (including scanning any unreferenced holes and compressed zip entries in a zip archive).
I have no idea why those virus scanners don't check nested archives. Maybe time/cpu constraints?
Why do scanners need to look inside compressed archives at all? If the malicious file is extracted, it can be detected then. If it’s not extracted and instead used directly from the archive, then the offending code that executes malicious payloads from inside archives should be detectable? Is that the role of AutoIt in this scenario?
Because they want to check on the wire, before it hits an endpoint.
Common situation is detecting malware sent through a phishing mail. You want to intercept those before a user can unpack the file.
Yes, that was my question: why would you want to do that when it sounds like a futile job?
If you apply any kind of transformation to the payload (even a trivial ROT or XOR), then there is no chance of detecting it via pattern matching (see the sketch below); and if a "program" is needed to process it into its malicious form anyway, then how does detection work?
I understand you want to keep malicious payload from reaching endpoints in a form that would risk being consumed (malformed documents, images causing buffer overflows, executable and script files). But beyond that?
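To make the point concrete: even a one-byte XOR is enough to make a byte-pattern signature miss, despite being worthless as encryption (the marker and key below are made up):

    signature = b"EVIL_PAYLOAD_MARKER"
    payload = b"...." + signature + b"...."

    encoded = bytes(b ^ 0x5A for b in payload)   # what travels over the wire
    print(signature in payload)    # True  -- an at-rest scan would match
    print(signature in encoded)    # False -- an on-the-wire pattern match fails
    decoded = bytes(b ^ 0x5A for b in encoded)   # the dropper restores it later
    print(decoded == payload)      # True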
And "Defense In Depth" - the more layers that bad actors have to avoid/circumvent reduces the chance of their success.
see https://www.fortinet.com/resources/cyberglossary/defense-in-...
I checked the time to make sure today is in the year of 2024.
I swear this has been widely known at least since the Win 98 era.
> To defend against concatenated ZIP files, Perception Point suggests that users and organizations use security solutions that support recursive unpacking
Yeah, or, you know, just outright reject any ZIP file that doesn't start with a file entry, or where a forward scan of the file entries doesn't match the result of the central-directory-based walk.
There is just so much malicious crud coming in via email that you just want to instantly reject anything that doesn't look 'normal', and you definitely don't want to descend into the madness of recursive unpacking, 'cuz that enables another class of well-known attacks.
And no, "but my precious use-case" simply doesn't apply, as you're practically limited to a whole 50MB per attachment anyway. Sure, "this ZIP file is also a PDF is also a PNG is also a NES cartridge which displays its own MD5" (viz https://github.com/angea/pocorgtfo/tree/master/writeups/19) has a place (and should definitely be required study material for anyone writing mail filters!), but business email ain't it.
That's fair, but do realize that sometimes people do have to send around archives from the last century (they got archived for a reason!) or created by eldritch-horror tools that just make weird files (which, sometimes, are the gold masters for certain very important outputs...). And it's kind of annoying when these weird but standard files get silently dropped. Especially when that same file went through just fine yesterday, before the duly zealous security settings changed for whatever reason.
All I'm saying is, don't drop my stuff silently because your code couldn't be arsed to deal with (ugly) standard formats. At least give me a warning ("file of type not scannable" or whatever, the actual words are not so important). And then when I have to yell at the Shanghai people I can yell at them for the correct reasons.
Oh, nothing gets dropped silently, but bounced right back with `550 5.7.1 Message rejected due to content (Attachment refused: MATCH-code)`.
And for anything oversized, funny or otherwise non-standard, we offer a very convenient file transfer service.
The right way to do it!
I wish our infrastructure had been so thoughtful.
Not really a new technique. A long time ago in a galaxy far far away, I needed to get libraries from the internet onto an air-gapped network, and the supported way was to burn them to a disk and bring the disk to be scanned. The scanner never allowed executables, so it would always reject the libraries. Can't make this up, but the supported way that InfoSec (!!) explained to us to get past the scanner was to take advantage of WinRAR being available on the network: split the rar archive into a bunch of parts (foo.r01, foo.r02, ...) and the scanner, being unable to parse or execute them, would happily rubber-stamp them and pass them along. As long as the process was followed, InfoSec was happy. Thankfully this was before the days when people were really worried about supply chain security.
Glad to see this bit of security theater recognized as such.
What year is it
Now?
Kakakakakak