Let’s challenge ourselves and build a server that can upload and download files of hundreds of GBs.
Coping with limited server resources while managing massive file uploads is a challenge many of us face. In this guide, we walk through a project that handles the seemingly impossible: uploading a staggering 100 GB of files to a Document Management System (DMS) using a server equipped with just 512 MB of memory. Join me as we explore the strategies, streaming techniques, and optimizations that push the boundaries of file upload capabilities.
For more context, please visit my previous blogs:
1. SAP Document Management Service Node Js Client
2. Best way to upload & download documents using SAP Document Management Service
To upload 100 GB of data, we first need that data on our local system, i.e. a laptop or desktop.
Let’s create a proper file generator that can produce a specific number of files adding up to a specific total size.
import crypto from "crypto";
import fs from "fs";
import * as path from "path";
// total no of files
const totalFiles = 20;
// combined size of all files
const totalSize = 1 * 1024 * 1024 * 1024; // 1 GB
// size of each write, kept small so memory usage stays low
const writeChunkSize = 16 * 1024 * 1024; // 16 MB
// generate a random string of a specific length
function generateRandomString(length) {
    const characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    let randomString = '';
    // randomizing to eliminate server-side caching
    for (let i = 0; i < length; i++) {
        const randomIndex = crypto.randomInt(characters.length);
        randomString += characters.charAt(randomIndex);
    }
    return randomString;
}
// generate a file with the given name and a fixed size
async function writeInChunks(filename, fileSize) {
    // writing in chunks to reduce memory usage
    const chunkSize = Math.min(writeChunkSize, fileSize);
    for (let j = 0; j < fileSize; j += chunkSize) {
        // the last chunk may be smaller than chunkSize
        const size = Math.min(chunkSize, fileSize - j);
        fs.appendFileSync(filename, generateRandomString(size));
        console.log(`${j + size} bytes written to ${filename}`);
    }
}
async function generateFiles() {
    fs.mkdirSync("file-content", { recursive: true });
    // calculate each file's size
    const fileSize = Math.ceil(totalSize / totalFiles);
    for (let i = 0; i < totalFiles; i++) {
        const filename = path.join("file-content", `test-file-${i}.txt`);
        // write each file in chunks
        writeInChunks(filename, fileSize);
    }
}
generateFiles();
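With the defaults above, running the script (Node.js 14.10+ is assumed, since it uses crypto.randomInt) creates 20 files of roughly 50 MB each under file-content/.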
Let’s create a client script that breaks the files into chunks and uses our optimized server to upload them to the DMS. If the server is busy, the client retries until the chunk is accepted.
const axios = require("axios");
const fs = require("fs");
const {join} = require("path");
const chunkSize = 8 * 1024 * 1024; // 8 MB
// simple delay helper used between retries
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
async function uploadFileInChunks(file, stream) {
    let i = 0;
    // loop chunk by chunk, only loading 1 chunk at a time into memory
    for await (const chunk of stream) {
        console.log('Chunk :', file.name, i, chunk.length);
        // first we create the file, then append each subsequent chunk
        const operation = i === 0 ? "create" : "append";
        let config = {
            method: 'post',
            url: 'http://localhost:3000/upload-optimised/',
            headers: {
                'cs-filename': file.name,
                'cs-operation': operation,
                'Content-Type': file.type
            },
            data: chunk
        };
        let status = 0;
        let response;
        // loop with a delay until the server accepts the chunk
        while (status !== 200) {
            try {
                response = await axios.request(config);
                status = response.status;
            } catch (e) {
                // a busy server answers 429; network errors have no response at all
                status = e.response ? e.response.status : 0;
            }
            if (status !== 200) {
                // wait for 3s then try again
                await sleep(3000);
            }
        }
        console.log(response.data);
        i++;
    }
}
async function uploadFiles() {
    for (const name of fs.readdirSync("file-content")) {
        const filePath = join("file-content", name);
        const file = {name: name, type: "text/plain"};
        // create a read stream and set the chunk size via highWaterMark
        const stream = fs.createReadStream(filePath, {highWaterMark: chunkSize});
        // upload all files concurrently
        uploadFileInChunks(file, stream);
    }
}
uploadFiles();
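Running this script kicks off an upload stream for every file in file-content/ at once; chunks rejected by a busy server simply keep retrying every 3 seconds until they are accepted.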
On the other side sits a simple Express Node.js server that streams the incoming request straight to the DMS, which makes it extremely memory efficient; it creates or appends to the file depending on the request.
The server also limits the number of API calls made to the DMS in parallel.
// configurable parameter: allow only a specific no of parallel API calls to DMS
const MAX_DMS_API_CALLS_ALLOWED = 22;
let docStoreCalls = 0;
app.post('/upload-optimised', async (req, res) => {
    console.log("new request ", docStoreCalls);
    // at max it allows 22 parallel DMS upload requests
    if (docStoreCalls < MAX_DMS_API_CALLS_ALLOWED) {
        // increment as an API call is about to be made;
        // this is safe as Node.js runs JavaScript on a single thread
        docStoreCalls++;
        try {
            const fileName = req.headers["cs-filename"];
            const opType = req.headers["cs-operation"];
            const mimeType = req.headers["content-type"];
            let session = await sm.getOrCreateConnection(REPOSITORY_ID, "provider");
            let response = {success: "false"};
            if (opType === "create") {
                // stream the request body straight to the DMS as a new document
                response = await session.createDocumentFromStream("/temp", req, fileName);
            }
            if (opType === "append") {
                // look up the existing document and append the streamed chunk
                const obj = await session.getObjectByPath("/temp/" + fileName);
                const objId = obj.succinctProperties["cmis:objectId"];
                response = await session.appendContentFromStream(objId, req);
            }
            res.json(response);
        } catch (e) {
            console.log(e);
            // let the client know so it can retry this chunk
            res.status(500).send("upload failed, try again");
        } finally {
            // decrement once the DMS call has finished (single-threaded, so no race)
            docStoreCalls--;
        }
    } else {
        res.status(429).send("try again later");
    }
});
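For completeness, here is a minimal sketch of the wiring around this route. The Express setup and port are standard; sm is the DMS session manager from the Node.js client covered in my previous blogs, and the module path and REPOSITORY_ID value shown here are placeholders for your own setup. Note that no body-parsing middleware is registered, so req stays an unconsumed stream that can be piped directly to the DMS.
const express = require("express");
// DMS session manager from the previous blogs; the path and repository id are placeholders
const sm = require("./session-manager");
const REPOSITORY_ID = process.env.REPOSITORY_ID;
const app = express();
// no express.json() / body-parser here: the raw request stream is forwarded to the DMS

// ... the /upload-optimised route from above is registered here ...

app.listen(3000, () => console.log("upload server listening on port 3000"));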
The file generator, the upload client, and the server are ready. Let’s upload some documents.
Let’s test with 1 GB of data first.
-> Generate 1 GB across 20 files; individual file size is ≈ 50 MB.
Uploading all the files in parallel, in 8 MB chunks.
In-flight upload size: 8 MB × 20 = 160 MB uploaded in parallel.
Memory usage: 400 MB (Node.js server)
Total time taken: 7 mins (depends on the DMS server)
Upload speed: 1 GB across 20 files in 7 mins ≈ 2.4 MB/s aggregate (≈ 51 MB per file in 420 s ≈ 125 KB/s per file)
Let’s go up to 12 GB.
-> Generate 12 GB across 20 files; individual file size is ≈ 600 MB.
Uploading all the files in parallel, in 8 MB chunks.
In-flight upload size: 8 MB × 20 = 160 MB uploaded in parallel.
Memory usage: 400 MB (Node.js server)
Total time taken: 44 mins (depends on the DMS server)
Upload speed: 12 GB across 20 files in 44 mins ≈ 4.7 MB/s aggregate (≈ 614 MB per file in 2640 s ≈ 240 KB/s per file)
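A quick back-of-the-envelope check of those numbers (just the arithmetic, nothing from the project itself):
// sanity check on the 12 GB / 44 min run
const totalMB = 12 * 1024;               // 12 GB expressed in MB
const seconds = 44 * 60;                 // 44 minutes
console.log((totalMB / seconds).toFixed(1) + " MB/s aggregate");              // ≈ 4.7 MB/s
console.log(((totalMB / 20 / seconds) * 1024).toFixed(0) + " KB/s per file"); // ≈ 238 KB/s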
We can go up to 100 GB as well, but there is a catch: since the number of parallel API calls is restricted, the server will not allow all 200 files to be uploaded concurrently. Some files will have to wait (and retry) until an upload slot frees up.
-> Generate 100 GB across 200 files; individual file size is ≈ 500 MB.
Uploading all the files in parallel, in 8 MB chunks,
but the server serves only 22 upload requests in parallel, so:
In-flight upload size: 8 MB × 22 = 176 MB uploaded in parallel.
Memory usage: ≈ 400 MB (Node.js server)
Total time taken: ≈ 2 hours (depends on the DMS server)
After analyzing the results, we can derive the following scaling characteristics.
The number of parallel uploads can be increased by giving the server more memory (or by reducing the chunk size).
The memory footprint can be reduced by lowering the chunk size or the number of parallel DMS calls allowed.
The upload time can be reduced by a combination of increasing the memory and allowing more concurrent connections.
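To make the memory relationship concrete, here is a rough sizing sketch based purely on the numbers observed above; the per-chunk and baseline figures are assumptions derived from the ≈ 400 MB measured with 22 parallel 8 MB chunks, not exact values.
// rough sizing heuristic derived from the measurements above
const chunkSizeMB = 8;                      // chunk size used by the upload client
const observedMemoryMB = 400;               // measured with 22 parallel uploads
const baselineMB = observedMemoryMB - 22 * chunkSizeMB; // ≈ 224 MB of Node.js + library overhead

// given a memory budget, estimate how many parallel DMS calls should fit
const memoryBudgetMB = 512;
const maxParallelCalls = Math.floor((memoryBudgetMB - baselineMB) / chunkSizeMB);
console.log(`MAX_DMS_API_CALLS_ALLOWED ≈ ${maxParallelCalls}`); // ≈ 36 with these numbers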
Throughout our tests, ranging from 1 GB to an impressive 100 GB upload, the server exhibited consistent efficiency despite its resource limitations. Its ability to manage diverse file sizes and simultaneous uploads while maintaining a respectable speed highlights its resilience.