Background

Hi there!

I'm Hitesh, a self-motivated Software Engineer passionate about solving real problems and building scalable web apps and data platforms. I specialize in C++, JavaScript, TypeScript, Python, React, Next.js, Node.js, and FastAPI, with hands-on experience in cloud and data engineering tools such as AWS, Snowflake, Kafka, and PySpark. I have 2+ years of experience working at wealth management and fintech firms.

Recent


In a recent podcast episode, the creators of TypeScript, Anders Hejlsberg and Daniel Rosenwasser, shared major news about the future of the TypeScript compiler: they are porting the compiler and toolset to native code using the Go programming language, promising a significant performance boost of up to 10 times over the current JavaScript implementation.

JavaScript, while powerful and widely used, has inherent limitations that affect performance, especially for compute-intensive tasks like compiling and type checking. The TypeScript compiler, originally built in JavaScript, has faced challenges due to:

Single-threaded Execution: JavaScript's runtime environment is primarily single-threaded, so it cannot efficiently utilize modern multi-core processors. This leads to longer compile times, especially for large codebases.

Garbage Collection Overhead: JavaScript's garbage collection can introduce latency, as it periodically pauses execution to reclaim memory. This can slow down compilation, particularly in large projects.

Inefficient Memory Management: JavaScript's dynamic nature means every object allocation carries overhead, and the lack of control over data layout can result in inefficient memory usage.

Complex Type Checking: TypeScript's structural type system, while powerful, can be computationally expensive. Recursively checking types across large, interconnected codebases can lead to slow performance.

The Go port of TypeScript aims to address these issues by leveraging the strengths of the Go programming language:

Native Code Execution: By compiling to native code, the new compiler can run significantly faster than its JavaScript counterpart and make better use of multi-core processors.

Efficient Memory Management: Go's support for structs allows a more compact data representation, reducing the overhead associated with object allocations in JavaScript and improving memory usage.

Concurrency: The Go port takes advantage of Go's built-in concurrency features, allowing multiple parsing and type-checking operations to run simultaneously. This is particularly beneficial for large projects, where work can be distributed across available CPU cores.

Improved Type Checking: The new compiler will keep the same error messages and behavior as the existing TypeScript compiler while improving performance, in part by allowing multiple type checkers to operate in parallel across large codebases.

Future-Proofing with AI: The TypeScript team is also looking to integrate AI capabilities into the language service, enhancing features like refactoring and code suggestions, so the compiler not only checks types but also assists developers in writing better code.

The Go port of TypeScript represents a significant step toward addressing the performance limitations of the current JavaScript-based compiler. By harnessing native code execution, efficient memory management, and concurrency, the TypeScript team aims to give developers a faster, more responsive tool for building large-scale applications. As the project progresses, the team encourages the community to engage with the new compiler, provide feedback, and contribute to its development.

Sources: "A 10x Faster TypeScript" (TypeScript blog, Microsoft); GitHub: microsoft/typescript-go, the staging repo for development of the native port of TypeScript.
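As a side illustration of the single-threaded limitation described above (this is not code from the TypeScript project, and the batch sizes are arbitrary), here is a minimal Node.js sketch of the workaround JavaScript needs: CPU-bound work only spreads across cores when it is explicitly handed to worker_threads, whereas Go's runtime schedules goroutines across cores by default.

const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

// CPU-bound stand-in for work such as parsing or type checking a batch of files.
function heavyTask(iterations) {
  let sum = 0;
  for (let i = 0; i < iterations; i++) sum += Math.sqrt(i);
  return sum;
}

if (isMainThread) {
  // Run four batches in parallel workers instead of sequentially on one thread.
  const jobs = [1e8, 1e8, 1e8, 1e8].map(
    (iterations) =>
      new Promise((resolve, reject) => {
        const worker = new Worker(__filename, { workerData: iterations });
        worker.on('message', resolve);
        worker.on('error', reject);
      })
  );
  Promise.all(jobs).then((results) => console.log('Finished', results.length, 'batches'));
} else {
  // Worker: compute one batch and report the result back to the main thread.
  parentPort.postMessage(heavyTask(workerData));
}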

Mar 14, 2025

Streams are a fundamental concept in Node.js, allowing data to be read or written in chunks instead of loading everything into memory at once. This makes them especially useful for handling large files or continuous data. Examples of streams in Node.js include:

HTTP requests and responses: When a server processes an HTTP request, the request arrives as a readable stream and the response it sends back is a writable stream.
File operations: Reading from or writing to files with the fs module can be handled as streams.
Network communications: Sockets use streams to send and receive data.

How Many Things in Node.js Are Streams?

Node.js provides a variety of streams, which are core building blocks for handling data flow. They fall into four main types: Readable, Writable, Duplex, and Transform.

1. Readable Streams: Streams from which data can be read.
- fs.createReadStream() for reading files.
- HTTP requests (http.IncomingMessage).
- Process standard input (process.stdin).
- Network sockets (net.Socket) in read mode.

2. Writable Streams: Streams to which data can be written.
- fs.createWriteStream() for writing to files.
- HTTP responses (http.ServerResponse).
- Process standard output and error (process.stdout and process.stderr).
- Network sockets (net.Socket) in write mode.

3. Duplex Streams: Streams that are both readable and writable.
- net.Socket (TCP socket connections).
- zlib compression streams (e.g., zlib.createGzip()).
- stream.Duplex for custom implementations.

4. Transform Streams: Special duplex streams that can modify or transform data as it is written and read.
- zlib.createGzip() and zlib.createGunzip() for compression and decompression.
- crypto streams such as crypto.createCipheriv() and crypto.createDecipheriv() (the older crypto.createCipher()/createDecipher() are deprecated).
- stream.Transform for custom transformations.

Other Notable Stream Implementations

File System (fs): Readable and writable streams for file operations.
HTTP: Incoming requests (readable streams) and server responses (writable streams).
Child Processes: child_process.spawn() and related methods provide streams for stdin, stdout, and stderr.
Streams in Libraries: Third-party libraries such as axios or request expose streams for handling data.
WebSocket Streams: Libraries like ws or Socket.io use streams for real-time communication.

While there is no single definitive number, because streams can be custom-implemented, the core Node.js API includes several dozen stream implementations across its modules.

Why Use Streams for Large CSV Files?

Processing large CSV files can be memory-intensive if the entire file is read into memory at once. By using streams, you can process the file line by line or chunk by chunk, keeping memory usage low and improving performance.

Reading a Large CSV File

Here is an example of how to read a large CSV file using streams:

const fs = require('fs');
const readline = require('readline');

const readStream = fs.createReadStream('largefile.csv');
const rl = readline.createInterface({ input: readStream });

rl.on('line', (line) => {
  console.log(`Line: ${line}`);
});

rl.on('close', () => {
  console.log('Finished reading the file.');
});

In this example, the fs.createReadStream method reads the file in chunks, and the readline module processes each line.
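Building on the reading example above, here is a small sketch (the file name largefile.csv is carried over from that example, and the naive comma split does not handle quoted fields) that captures the header and counts data rows without ever holding the whole file in memory:

const fs = require('fs');
const readline = require('readline');

const rl = readline.createInterface({ input: fs.createReadStream('largefile.csv') });

let header = null;
let rowCount = 0;

rl.on('line', (line) => {
  // Naive split; a real CSV parser is needed for quoted or escaped fields.
  const fields = line.split(',');
  if (!header) {
    header = fields;
  } else {
    rowCount++;
  }
});

rl.on('close', () => {
  console.log(`Columns: ${header ? header.length : 0}, data rows: ${rowCount}`);
});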
Writing a Large CSV File

Here is how you can write to a CSV file using streams:

const fs = require('fs');

const writeStream = fs.createWriteStream('output.csv');

writeStream.write('Name,Age,Location\n');
writeStream.write('John,30,New York\n');
writeStream.write('Jane,25,London\n');

writeStream.end(() => {
  console.log('Finished writing to the file.');
});

The fs.createWriteStream method allows data to be written to the file in chunks.

Transforming Data with Streams

Sometimes you may want to transform data while reading or writing. This can be done using transform streams:

const fs = require('fs');
const { Transform } = require('stream');

const readStream = fs.createReadStream('largefile.csv');
const writeStream = fs.createWriteStream('output.csv');

const transformStream = new Transform({
  transform(chunk, encoding, callback) {
    const modifiedChunk = chunk.toString().toUpperCase();
    callback(null, modifiedChunk);
  }
});

readStream.pipe(transformStream).pipe(writeStream);

In this example, the transform stream converts all data to uppercase before writing it to the output file.

Benefits of Streams

- Efficient memory usage
- Faster processing for large data
- Allows for real-time data processing
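One practical note on the pipe() chains above: they do not forward errors between stages. Here is a short sketch using the built-in stream.pipeline helper (the file names are just placeholders), which wires the same kind of stages together and reports a failure from any of them in a single callback:

const fs = require('fs');
const zlib = require('zlib');
const { pipeline } = require('stream');

// Compress a large CSV while streaming it, handling errors from every stage in one place.
pipeline(
  fs.createReadStream('largefile.csv'),
  zlib.createGzip(),
  fs.createWriteStream('largefile.csv.gz'),
  (err) => {
    if (err) {
      console.error('Pipeline failed:', err);
    } else {
      console.log('Pipeline succeeded.');
    }
  }
);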

Dec 21, 2024

Columnar vs Row-Based File Formats

When working with large datasets, choosing the right file format can have a significant impact on performance and storage efficiency. The two common types of data formats are row-based and columnar formats. Each has its own strengths depending on your use case, whether that is transaction processing or analytical queries.

What Are Row-Based and Columnar File Formats?

Row-based file formats (like CSV or JSON) store data by rows, meaning each record is stored sequentially. Columnar file formats (like Parquet or ORC) store data by columns, grouping values from the same column together.

Key Differences Between Row-Based and Columnar Formats

1. Data Storage Layout
Row-based: Data is stored as complete rows, so all fields of a record are stored together.
Columnar: Data is stored by columns, so values for a particular column are grouped together.

2. Use Case
Row-based: Best for transactional systems where entire records are written or read at once, as in OLTP (Online Transaction Processing).
Columnar: Ideal for analytical queries that focus on specific columns, as in OLAP (Online Analytical Processing).

3. Read/Write Performance
Row-based: Fast writes, since entire records are stored together, but slower analytical reads, since all columns are read even when only a few are needed.
Columnar: Fast reads for analytics, since only the necessary columns are read; writes are slower because columns are written separately.

4. Data Compression
Row-based: Less efficient compression, since rows mix diverse data types.
Columnar: Highly compressible, because each column typically contains similar data.

5. Storage Efficiency
Row-based: More efficient for small datasets or systems where full records are accessed at once.
Columnar: More efficient for large datasets, especially when only a few columns are frequently accessed.

6. Common Usage Scenarios
Row-based: Real-time applications, transactional databases, and systems that require fast row-level access.
Columnar: Data warehouses and big data systems that run heavy analytical queries over specific columns.

Example: Reading Data

Say you have a dataset with 1 million rows and 50 columns, but you only need to analyze two columns:
Row-based: The system reads all 50 columns for each of the 1 million rows, even though only two are needed. This is inefficient for analytical queries.
Columnar: The system reads only the two required columns, resulting in faster query times and lower I/O costs.

Conclusion

Row-based formats are best suited for applications that frequently access entire records, such as transactional databases. Columnar formats excel in analytical environments where queries aggregate or filter on specific columns. Understanding these differences will help you choose the right format, depending on whether you need fast transactional processing or efficient data analysis.
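To make the layout difference concrete, here is a minimal in-memory sketch (plain JavaScript, not tied to Parquet, ORC, or any real file format) of the same two records stored row-wise and column-wise, and an analytical query that only needs one column:

// Row-oriented: each record keeps all of its fields together.
const rows = [
  { name: 'John', age: 30, location: 'New York' },
  { name: 'Jane', age: 25, location: 'London' },
];

// Column-oriented: values of the same field are grouped together.
const columns = {
  name: rows.map((r) => r.name),
  age: rows.map((r) => r.age),
  location: rows.map((r) => r.location),
};

// An analytical query that touches only one column can ignore all the others.
const averageAge = columns.age.reduce((a, b) => a + b, 0) / columns.age.length;
console.log(averageAge); // 27.5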

Oct 15, 2024