

Using ZIP files directly in Power BI is an attractive idea. In your data source you can store your raw data efficiently (the CSV files inside a ZIP are often very suitable for compression), and by accessing the ZIP rather than the CSV you can reduce the network traffic as well (let's ignore the scenario where the transfer uses compression too) and improve the dataset refresh times.

Here's where the ZIP ecosystem throws a wrench into your plans. ZIP files come in all kinds of formats, and while most adhere to the basic standards of how a ZIP file should be structured, many legacy systems produce ZIP files that are decidedly non-standard.

For this article let's assume that we are working with a very simple scenario: one ZIP file that contains only one compressed file. (For a full solution that will work with multiple compressed files refer to.

(A quick excursion into the standard ZIP structure: (file_format)#Structure)

The compressed files are stored in the ZIP one after the other. Each file has its own local header (the File Entry Header), and at the end of the ZIP file a Central Directory repeats the file header information.

Both headers are supposed to contain information about the size of the compressed file. This is where the problem comes in with some ZIP files. The tools that produce them ignore the local header (the file sizes are recorded as zero) and only write the sizes to the central directory. This means that Power Query code that reads the ZIP file from the beginning (the majority of code examples do it that way) will be unable to process ZIP files that do not have the proper local header information.

Let's explore the alternative option: reading the ZIP file from the end and using the central directory information.

Admittedly the following approach assumes that no comments are entered at the very end of the ZIP file. Hopefully that is true for your ZIP files.

Here is a ZIP file where the local header is missing the file size information:

In order to get to that file size information, we have to do some pointer math. First, we have to load the entire ZIP file into memory. Since that is a costly operation, we had better make sure we only do it once. Let's stuff the data into a binary buffer:

Source = Binary.Buffer(File.Contents(ZIPFile)),

Then we need to get the start location of the central directory. Again, assuming there are no comments at the end of the ZIP file, the start location is stored in a four-byte pointer located in bytes 6 to 3 from the end of the ZIP file. To get these we create a record that basically skips the entire buffer except for the last six bytes, and then reads the first four of those last six bytes as a little-endian unsigned integer:

Start=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian)
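To make the pointer math concrete, here is a minimal sketch of a complete query that buffers the ZIP file once and reads the central directory start offset from the last six bytes. The file path and the names `EndFormat`, `Skip` and `CentralDirectoryStart` are illustrative assumptions and do not come from the article's own code. Like the rest of the approach, the sketch assumes the ZIP file has no trailing comment.

```powerquery
let
    // Hypothetical path, replace with your own ZIP file
    ZIPFile = "C:\Data\example.zip",

    // Load the whole file into memory once, since re-reading it is costly
    Source = Binary.Buffer(File.Contents(ZIPFile)),

    // Assumption: no ZIP comment, so the last six bytes hold the
    // 4-byte central directory offset followed by the 2-byte comment length
    EndFormat = BinaryFormat.Record([
        // Skip everything except the last six bytes
        Skip = BinaryFormat.Binary(Binary.Length(Source) - 6),
        // Read the first four of those six bytes as a little-endian UInt32
        Start = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian)
    ]),

    // Apply the format to the buffer and keep only the offset
    CentralDirectoryStart = EndFormat(Source)[Start]
in
    CentralDirectoryStart
```

The result is the byte offset, counted from the beginning of the file, at which the central directory begins. If the ZIP file does carry a trailing comment, the six-byte assumption no longer holds and the value returned will not be a valid offset.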
