@nerycordova
Last active March 14, 2024 18:10
Unzip large files in AWS S3 using Lambda and Node.js
// Dev.to article: https://dev.to/nerycordova/unzip-large-files-in-aws-using-lambda-and-node-js-cpp
const AWS = require("aws-sdk");
const s3 = new AWS.S3({ apiVersion: "2006-03-01" });
const unzipper = require("unzipper");

exports.handler = async (event) => {
  //...initialize bucket, filename and target_filename here
  try {
    /**
     * Step 1: Get stream of the file to be extracted from the zip
     */
    const file_stream = s3
      .getObject({ Bucket: bucket, Key: filename })
      .createReadStream()
      .on("error", (e) => console.log(`Error extracting file: `, e))
      .pipe(
        unzipper.ParseOne("file_name_inside_zip.ext", {
          forceStream: true,
        })
      );

    /**
     * Step 2: upload extracted stream back to S3: this method supports a readable stream in the Body param as per
     * https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#upload-property
     */
    await s3
      .upload({ Bucket: bucket, Key: target_filename, Body: file_stream })
      .promise();
  } catch (error) {
    console.log("Error: ", error.message, error.stack);
  }
};
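
The handler above leaves the initialization of bucket and filename to the reader. One possible way to fill that in, assuming (the gist does not say so) that the Lambda is triggered by an S3 "ObjectCreated" notification, is sketched below. The helper name paramsFromS3Event is hypothetical; target_filename is left to the caller because it depends on your own naming scheme.

```javascript
// Sketch only: assumes the Lambda is invoked by an S3 event notification.
// `paramsFromS3Event` is a hypothetical helper, not part of the original gist.
function paramsFromS3Event(event) {
  const record = event.Records[0];
  const bucket = record.s3.bucket.name;
  // Object keys in S3 notifications are URL-encoded, with spaces sent as "+".
  const filename = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
  return { bucket, filename };
}
```

Inside the handler this could be used as `const { bucket, filename } = paramsFromS3Event(event);` before Step 1.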
@nerycordova
Author

Hello, I'm new to this and need some help. How do I get the name of the file inside the zip? unzipper.ParseOne("file_name_inside_zip.ext") expects it, but I won't know the name ahead of time. I then want to stream that file to another S3 bucket. This code does work, but only because I hard-coded unzipped_filename:

const unzipped_filename = "test.csv";
.pipe(unzipper.ParseOne(unzipped_filename, { forceStream: true }));
.upload({ Bucket: target_bucket, Key: unzipped_filename, Body: file_stream })

@tdough21, I hope you were able to solve this back in May. In case not, and for the record: most of the examples in the docs include the condition if (fileName === "this IS the file I'm looking for"). See this one, for example:

const zip = fs.createReadStream('path/to/archive.zip').pipe(unzipper.Parse({forceStream: true}));
for await (const entry of zip) {
  const fileName = entry.path;
  const type = entry.type; // 'Directory' or 'File'
  const size = entry.vars.uncompressedSize; // There is also compressedSize;
  if (fileName === "this IS the file I'm looking for") {
    entry.pipe(fs.createWriteStream('output/path'));
  } else {
    entry.autodrain();
  }
}

So, by design, the library allows you to get all file names inside the .zip package.
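
Applied to the S3 case from the question, the same loop could upload every file in the archive under its own name. The following is a minimal sketch rather than a tested Lambda: uploadAllEntries and its uploadEntry callback are hypothetical names, and the commented wiring assumes aws-sdk v2 and unzipper as used in the gist, plus the target_bucket variable from the question.

```javascript
// Sketch only: uploads every file entry of a zip, using entry.path as the key.
// `entries` is any async iterable of unzipper-style entries (path, type, autodrain()).
// `uploadEntry(key, stream)` is a hypothetical callback that returns a promise.
async function uploadAllEntries(entries, uploadEntry) {
  const uploadedKeys = [];
  for await (const entry of entries) {
    if (entry.type === "File") {
      // Awaiting the upload consumes this entry before the next one is read.
      await uploadEntry(entry.path, entry);
      uploadedKeys.push(entry.path);
    } else {
      // Directory entries carry no data; drain them so parsing can continue.
      entry.autodrain();
    }
  }
  return uploadedKeys;
}

// Possible wiring inside the handler (assumes aws-sdk v2 and unzipper are installed):
// const entries = s3
//   .getObject({ Bucket: bucket, Key: filename })
//   .createReadStream()
//   .pipe(unzipper.Parse({ forceStream: true }));
// await uploadAllEntries(entries, (key, body) =>
//   s3.upload({ Bucket: target_bucket, Key: key, Body: body }).promise()
// );
```

Passing the upload step in as a callback keeps the loop independent of S3, so it can be exercised locally with fake entries before deploying.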
