Working with Apache Spark on data engineering tasks can be tedious because it runs in a distributed manner and processes large-scale data. You can write the logic, run the unit tests, and see everything work fine on the development cluster. The problems start once you deploy the jobs to production. Because of the huge volume of data, you often run into performance issues and various errors and exceptions. Out-of-memory errors are among the most common exceptions in a Spark application. I have faced this issue multiple times myself. …


Hello everyone! Welcome to this article, in which I will try to explain functional programming concepts using Scala.

The name Scala stands for “scalable language”. The language is so named because it was designed to grow with the demands of its users: with Scala, you can do a variety of tasks, from writing small scripts to building large systems. Scala blends object-oriented and functional programming concepts, so it has the advantages of both paradigms.

Photo by Xiaole Tao on Unsplash

Why Scala?

Scala is a concise, high-level, and expressive language that follows a functional programming paradigm…


Hello everyone! Welcome to this article, in which I am going to talk about how to improve the performance of Spark jobs. Spark is a very fast data processing framework in the data engineering world. With the tremendous growth in data over the last few years, a need for a fast data processing framework has emerged. Apache Spark addresses that need by providing fast in-memory computation. Nowadays, many big data engineers use Apache Spark for processing batch as well as real-time streaming data.

Photo by Paul Carmona on Unsplash

As Apache Spark works in a distributed manner, it needs a lot…


Welcome to another novel CNN architecture. In this article, we are going to discuss Inception, also known as GoogLeNet. It was developed at Google and won the ILSVRC 2014 image classification challenge.

Photo by Christophe Hautier on Unsplash

Motivation

Choosing hyperparameters is hard because we don’t actually know in advance which filter size is best for a given input image. The idea behind Inception was: why not include all of them in a single layer and concatenate the results? With this idea, the authors used all of the filter sizes (1 x 1, 3 x 3, and 5 x 5) and max…


ResNet is a network architecture proposed in 2015 by Kaiming He, Jian Sun, and others at Microsoft Research Asia. It won first place in the ILSVRC-2015 classification task, and also took first place in the ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation tasks.

Photo by Marc-Olivier Jodoin on Unsplash

To understand ResNet, we must first understand what kind of problems will occur when the network becomes deeper.

The Problem caused by increasing depth

Very deep neural networks are difficult to train because of the vanishing/exploding gradient problem. What does that mean? When we stack more layers in a convolutional neural network, then in theory the training error should…


Apache Hive is a data warehouse software project built on top of Apache Hadoop that provides data query and analysis. It gives a SQL-like interface for querying data stored in HDFS and in the other databases and file systems that integrate with Hadoop.

Photo by Dominic Hampton on Unsplash

In this article, I am going to explain a few performance optimization techniques that you can apply while developing your Hive queries to improve their performance.

1. Use of Single Scan

By default, Hive does a complete table scan, so if you are performing multiple operations on a Hive table, it is recommended to use a single scan and…


Hello everyone! Welcome to the Deep Learning series. In this chapter, I am going to explain optimizers, which play a very important role in deep learning. Whenever you build a neural network, whether a CNN, an RNN, or any other type, you need an optimizer to update the weights and biases. I assume you are familiar with the basic terminology of deep learning. If you are, this chapter should help you choose an optimizer for your neural networks.

Image source: Deep Ideas

What is an Optimizer?

Optimizers are…


Exception Handling and File Handling in Python

Welcome to part 3 of the advanced Python series. In the second chapter, we learned important concepts related to object-oriented programming. In this chapter, we are going to work with files, exception handling, and a few other concepts. Let’s start.

Photo by mostafa meraji on Unsplash

What does __name__ == '__main__' mean?

We see this statement in almost every Python project, so let’s understand what exactly it does. In simple words, __name__ is a special variable in Python that holds the name of the current module. Whenever you run a Python file directly, Python sets a few special variables before executing the actual…
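
As a rough illustration of the __name__ check described above, here is a minimal sketch. The function names main and describe_module are hypothetical and chosen only for this example; they are not taken from the original article.

```python
def main():
    # Hypothetical entry point for this sketch.
    return f"running as {__name__}"


def describe_module():
    # __name__ is "__main__" when the file is run directly,
    # and the module's own name when the file is imported.
    return "script" if __name__ == "__main__" else "imported module"


if __name__ == "__main__":
    # This branch is skipped when the file is imported by another module.
    print(main())
```

Saved to a file and run directly, the guarded block executes; imported from another file, only the function definitions are loaded and nothing is printed.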


Welcome to part 2 of the advanced Python series. In the first chapter, we covered important Python concepts such as iterators, generators, and decorators. In this chapter, we are going to learn all about object-oriented programming.

Image By Author

Object-Oriented Programming in Python

Object-oriented programming is very important because, in every programming language, it is essential to modularize code so that it can be reused. We don’t want to write similar code in multiple places, and we want to give the code a proper structure so that it’s easier to understand. To solve these and many other problems, we use object-oriented programming. It is not specific to any one language; it is a programming paradigm that many languages follow.

Classes and Instance Variables

Why should we use classes? Well, classes are used in most modern programming languages. They provide a way to logically group data…
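
A small sketch of what grouping data in a class can look like. The Employee class, its attributes, and the sample values are hypothetical and used only for illustration; the original article may use a different example.

```python
class Employee:
    """Hypothetical class grouping the data that belongs to one employee."""

    def __init__(self, first_name, last_name, pay):
        # Instance variables: every object keeps its own values.
        self.first_name = first_name
        self.last_name = last_name
        self.pay = pay

    def full_name(self):
        # A method that works on the instance's own data.
        return f"{self.first_name} {self.last_name}"


emp_1 = Employee("Ada", "Lovelace", 50000)
emp_2 = Employee("Alan", "Turing", 60000)

# Changing one instance leaves the other untouched.
emp_1.pay = 55000
```

Each object created from the class carries its own instance variables, which is why updating emp_1.pay does not change emp_2.pay.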


Iterators, Generators and Decorators in Python

Welcome to Part 1 of the Python: More to the Basics series. In this series, I am going to talk about advanced Python concepts that are very important to understand. Without understanding them, it’s very difficult to apply them in the real world and, more importantly, in the data science world.

Photo by Sean Lim on Unsplash

So, let’s start with the first concepts: iterators, generators, and decorators in Python.

Iterators

Iterators are simply Python objects that can be iterated upon and used to get elements one by one from a collection. Iterators are everywhere in Python: for loops, generators, and comprehensions all rely on them.
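
To make the idea above concrete, here is a minimal sketch of the iterator protocol. It uses the built-in iter() and next() functions on a list, and a hand-written iterator class. The Countdown class and the sample values are hypothetical and only for illustration.

```python
class Countdown:
    """Hypothetical iterator that yields start, start - 1, ..., 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Raising StopIteration tells the caller there are no more elements.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


# Pulling elements one by one from a built-in collection.
numbers = iter([10, 20, 30])
first = next(numbers)
second = next(numbers)

# A comprehension (like a for loop) calls iter() and next() behind the scenes.
countdown_values = [n for n in Countdown(3)]
```

Here first and second hold the first two list elements, and countdown_values collects everything the custom iterator produces before it raises StopIteration.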

If…

Vishal Mishra

Big Data Engineer at JP Morgan Singapore | AWS Certified Solution Architect | Aspiring Data Scientist
