
Hello everyone! Welcome to this article, in which I will try to explain functional programming concepts using Scala as the programming language.

The name Scala stands for “scalable language”. The language is so named because it was designed to grow with the demands of its users: with Scala, you can do anything from writing small scripts to building large systems. Scala blends object-oriented and functional programming concepts, so it has the advantages of both paradigms.

Photo by Xiaole Tao on Unsplash

Why Scala?

Scala is a concise, high-level, and expressive language that supports the functional programming paradigm and a scripting approach, both of which are useful for data transformation. Scala is widely used by big data engineers because Apache Spark, one of the most powerful data processing frameworks, is written in Scala. …


Hello everyone! Welcome to this article, in which I am going to talk about how to improve the performance of Spark jobs. Spark is a very fast data processing framework in the data engineering world. With the tremendous increase in data over the last few years, a need for a fast data processing framework has emerged. Apache Spark addresses that need by providing fast in-memory computation. Nowadays, almost every big data engineer uses Apache Spark for processing batch as well as real-time streaming data.

Photo by Paul Carmona on Unsplash

Because Apache Spark works in a distributed manner, it needs a lot of resources, such as memory and CPU cores, to complete a job; exactly how much depends on the input data and several other factors. So tuning a Spark job is not that easy. Sometimes you don't know whether your Spark job is using the right amount of resources. You check the Spark UI and wonder why your application is running so long. There are, however, certain techniques that can help improve the performance of Spark jobs. In this article, we are going to look at 10 of them. …


Welcome to another novel CNN architecture. In this article, we are going to discuss Inception, also known as GoogLeNet. InceptionNet was developed by Google and won the ILSVRC 2014 image classification challenge.

Photo by Christophe Hautier on Unsplash

Motivation

Choosing hyperparameters is hard because we don't know in advance which filter size is best for a given input image. The idea behind Inception was: why not include all of them in a single layer and concatenate the results? Following this idea, the authors used all of the filter sizes (1 x 1, 3 x 3, and 5 x 5) along with max pooling. But a new problem arose: computational cost, because the 5 x 5 filter increased the number of parameters. To solve this problem, they used the 1 x 1 convolution (Network-in-Network) concept. …
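To get a feel for why a 1 x 1 convolution helps, here is a rough back-of-the-envelope sketch in Python. The channel counts (192 in, 16 in the middle, 32 out) are assumptions picked only for illustration, and `conv_params` is a hypothetical helper that counts weights while ignoring biases.

```python
def conv_params(kernel, in_channels, out_channels):
    """Number of weights in a convolution layer (biases ignored)."""
    return kernel * kernel * in_channels * out_channels

# Hypothetical channel counts, chosen only to illustrate the effect.
in_ch, mid_ch, out_ch = 192, 16, 32

# Option 1: apply the 5 x 5 filter directly to all input channels.
direct = conv_params(5, in_ch, out_ch)

# Option 2: shrink the channels with a 1 x 1 convolution first,
# then apply the 5 x 5 filter to the smaller volume.
bottleneck = conv_params(1, in_ch, mid_ch) + conv_params(5, mid_ch, out_ch)

print(direct)      # 153600
print(bottleneck)  # 15872
```

With these assumed sizes, the 1 x 1 step cuts the weight count of the 5 x 5 branch by roughly a factor of ten.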


ResNet is a network structure proposed in 2015 by He Kaiming, Sun Jian, and others at Microsoft Research Asia. It won first place in the ILSVRC-2015 classification task, and it also took first place in the ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation tasks.

Photo by Marc-Olivier Jodoin on Unsplash

To understand ResNet, we must first understand what kind of problems will occur when the network becomes deeper.

The Problem caused by increasing depth

Very deep neural networks are difficult to train because of the vanishing/exploding gradient problem. What does that mean? When we stack more layers in a convolutional neural network, the training error should in theory decrease, but in practice adding more layers (making the CNN deeper) causes the training error to increase instead. …


Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis. Hive gives you a SQL-like interface for querying data stored in HDFS and in the various databases and file systems that integrate with Hadoop.

Photo by Dominic Hampton on Unsplash

In this article, I am going to explain a few performance optimization techniques that you can apply while developing your Hive queries to improve their performance.

1. Use of Single Scan

By default, Hive does a complete table scan, so if you are performing multiple operations on the same Hive table, it's recommended to scan the table once and reuse that single scan for all of the operations. …


Hello everyone! Welcome to the Deep Learning series. In this chapter, I am going to explain all about optimizers. Optimizers play a very important role in deep learning. Whenever you build a neural network, whether a CNN, an RNN, or any other type, you need an optimizer to update the weights and biases. I am assuming you are familiar with the basic terminology of deep learning. If you are, this chapter will help you choose optimizers for your neural networks.

Image source: Deep Ideas

What is an Optimizer?

Optimizers are used to change the attributes of your neural network, such as the weights and biases (w, b), in order to reduce the loss. …
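As a minimal sketch of that idea, the snippet below uses plain gradient descent, which is just one possible optimizer, to fit a one-weight model `w * x + b` to a single made-up data point. The function names, the learning rate of 0.1, and the data values are all assumptions for illustration.

```python
def loss_and_grads(w, b, x, y):
    """Squared error of the model w * x + b, plus its gradients."""
    error = (w * x + b) - y
    return error ** 2, 2 * error * x, 2 * error

def gradient_descent_step(w, b, grad_w, grad_b, learning_rate=0.1):
    """Move each parameter a small step against its gradient."""
    return w - learning_rate * grad_w, b - learning_rate * grad_b

w, b = 0.0, 0.0   # initial weight and bias
x, y = 1.0, 2.0   # a single made-up training example
losses = []
for _ in range(20):
    loss, grad_w, grad_b = loss_and_grads(w, b, x, y)
    losses.append(loss)
    w, b = gradient_descent_step(w, b, grad_w, grad_b)

# The loss shrinks as the optimizer keeps updating w and b.
print(losses[0], losses[-1])
```

Each pass computes the loss, asks how the loss changes with respect to `w` and `b`, and nudges both in the direction that lowers it.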


Exception Handling and File Handling in Python

Welcome to part 3 of the advanced Python series. In the second chapter, we learned important concepts related to object-oriented programming. In this chapter, we are going to work with files, exception handling, and a few other concepts. Let’s start.

Photo by mostafa meraji on Unsplash

What does __name__ == '__main__' mean?

We see the statement above in almost every Python project. So what exactly does it do? In simple words, __name__ is a special variable in Python that holds the name of the current module. Whenever you run a Python file directly, the interpreter sets a few special variables before executing the actual code, and __name__ is one of them. …
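Here is a small sketch of how this is typically used. The function name `describe_module` is made up for illustration; the behaviour of `__name__` itself is standard Python.

```python
def describe_module():
    """Report how this file is being used, based on __name__."""
    if __name__ == "__main__":
        return "run directly as a script"
    return f"loaded under the name {__name__!r}"

# This block runs only when the file is executed directly
# (for example with `python some_file.py`), not when it is imported.
if __name__ == "__main__":
    print(describe_module())
```

When the file is run directly, `__name__` is set to `"__main__"`, so the guarded block executes. When another file imports it, `__name__` is the module's own name and the block is skipped.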


Welcome to part 2 of the advanced Python series. In the first chapter, we covered important Python concepts like iterators, generators, and decorators. In this chapter, we are going to learn all about object-oriented programming.

Image By Author

Object-Oriented Programming in Python

Object-oriented programming is very important because, in every programming language, it is essential to modularize code so that it can be reused. We don’t want to write similar code in multiple places, and we also want to give the code a proper structure so that it’s easier to understand. To solve these and many other problems, we use object-oriented programming. It’s not specific to any one language; it’s a programming paradigm that many languages follow.

Classes and Instance Variables

Why should we use classes? Classes are used in most modern programming languages. They provide a way to logically group data and functions. …
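As a quick sketch of what "grouping data and functions" looks like, here is a small class. The `Employee` class and its fields are hypothetical examples, not taken from the original article.

```python
class Employee:
    """Groups an employee's data with the functions that use it."""

    def __init__(self, first, last, pay):
        # Instance variables: each Employee object gets its own copy.
        self.first = first
        self.last = last
        self.pay = pay

    def full_name(self):
        return f"{self.first} {self.last}"

    def apply_raise(self, percent):
        """Increase pay by the given percentage."""
        self.pay = self.pay * (100 + percent) / 100


emp_1 = Employee("Ada", "Lovelace", 50000)
emp_2 = Employee("Alan", "Turing", 60000)

emp_1.apply_raise(10)
print(emp_1.full_name(), emp_1.pay)  # Ada Lovelace 55000.0
print(emp_2.full_name(), emp_2.pay)  # Alan Turing 60000
```

Changing `emp_1` leaves `emp_2` untouched, because instance variables belong to each object rather than to the class as a whole.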


Iterators, Generators and Decorators in Python

Welcome to Part 1 of the Python: More to the Basics series. In this series, I am going to talk about advanced Python concepts that are very important to understand. Without understanding these concepts, it’s very difficult to apply them in the real world and, more importantly, in the data science world.

Photo by Sean Lim on Unsplash

So let’s start with the first set of concepts: iterators, generators, and decorators in Python.

Iterators

Iterators are simply Python objects that can be iterated upon and that return the elements of a collection one at a time. Iterators are everywhere in Python: for loops, generators, and comprehensions all rely on them.

If you write a simple for loop in Python, then during execution Python obtains an iterator from the object being looped over and uses it to fetch the elements one by one. …
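To make that concrete, here is a small sketch comparing an ordinary for loop with roughly what Python does behind the scenes using the built-in `iter()` and `next()` functions. The variable names are just for illustration.

```python
numbers = [1, 2, 3]

# The usual way: a plain for loop.
from_for_loop = []
for n in numbers:
    from_for_loop.append(n)

# Roughly what the for loop does under the hood.
from_manual_iteration = []
iterator = iter(numbers)          # ask the list for an iterator
while True:
    try:
        n = next(iterator)        # fetch the next element
    except StopIteration:         # raised when nothing is left
        break
    from_manual_iteration.append(n)

print(from_for_loop)          # [1, 2, 3]
print(from_manual_iteration)  # [1, 2, 3]
```

Both versions visit the same elements in the same order; the for loop simply hides the `iter()`, `next()`, and `StopIteration` handling.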


Photo by Franki Chamaki on Unsplash

What is Data Engineering?

Data engineering is one of the most critical and foundational skills in today’s world. It’s a continuous process of collecting data from multiple sources and creating a data lake that can be exposed to the data science and data analytics teams. Data engineers thus provide a data platform to the other data teams: data analysts and data scientists.

Who is a Data Engineer?

A Data Engineer is somebody responsible for collecting the data from various sources, transforming it into a usable format, and storing it into a common data lake. …

About

Vishal Mishra

Big Data Engineer at JP Morgan Singapore | AWS Certified Solution Architect | Aspiring Data Scientist
