Home » Posts tagged 'Java'

Tag Archives: Java

Spring Boot + Apache Spark

This post will guide you to create a simple web application using Spring Boot and Apache Spark.

For the demonstration we are going to build a maven project with Spring Boot 2.1.2 using the Spring Initializr web-based interface.

Cheers to the beginning đź™‚

Please follow the steps below to create the classic Apache Spark’s WordCount example with Spring Boot :

1) Creating the Web Application template:

We’ll be using Spring Initializr to create the web application project structure.

Spring Initializr is a web application used to generate a Spring Boot project structure either in Maven or Gradle project specification.

Spring Initializr can be used in several ways, including:

  1. A web-based interface
  2. Using Spring Tool Suite
  3. Using the Spring Boot CLI

For brevity we’ll be using the Spring initializr web interface.

  1. Go to https://start.spring.io/.

Note: By default, the project type is Maven Project and if you wish to select Gradle then just click on the Maven Project drop down and select Gradle Project.

2. Enter Group and Artifact details:

3. Type Web in Search for dependencies and select the Web option.

4. Now click on Generate Project:

This will generate and download the spring-spark-word-count.zip file which is your maven project structure.

5. Unzip the file and then import it in your favourite IDE.

After you’ve imported the project in your IDE (in my case Eclipse) the project structure looks as follows:

The package names are automatically generated with the combination of group and artifact details.

Moving forward I’ve changed the package names from com.technocratsid.spring.spark.springsparkwordcount to com.technocratsid for brevity.

You can even do this while generating the project using Spring Initializr web interface. You just have to switch to full version and there you’ll find the option to change the package name.

2) Adding the required dependencies in pom.xml:

Add the following dependencies in your project’s pom.xml

<dependency>
	<groupId>com.thoughtworks.paranamer</groupId>
	<artifactId>paranamer</artifactId>
	<version>2.8</version>
</dependency>
<dependency>
	<groupId>org.apache.spark</groupId>
	<artifactId>spark-core_2.12</artifactId>
	<version>2.4.0</version>
</dependency>

Note: You might be thinking why we need to add the paranamer dependency as spark core dependency already has it? This is because JDK8 is compatible with paranamer version 2.8 or above and spark 2.4.0 uses paranamer version 2.7. So, if you won’t add the 2.8 version, you’ll get an error like this:

Request processing failed; nested exception is java.lang.ArrayIndexOutOfBoundsException: 10582

After this your complete pom.xml should look as follows:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>2.1.2.RELEASE</version>
		<relativePath /> <!-- lookup parent from repository -->
	</parent>
	<groupId>com.technocratsid.spring.spark</groupId>
	<artifactId>spring-spark-word-count</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<name>Spring Spark Word Count</name>
	<description>Demo project for Spring Boot</description>

	<properties>
		<java.version>1.8</java.version>
	</properties>

	<dependencies>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-web</artifactId>
		</dependency>
		<dependency>
			<groupId>com.thoughtworks.paranamer</groupId>
			<artifactId>paranamer</artifactId>
			<version>2.8</version>
		</dependency>
		<dependency>
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-core_2.12</artifactId>
			<version>2.4.0</version>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
		</dependency>
	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-maven-plugin</artifactId>
			</plugin>
		</plugins>
	</build>

</project>

3) Adding the Spark Config:

Create a class SparkConfig.java in package com.technocratsid.config.

Add the following content to SparkConfig.java:

@Configuration
public class SparkConfig {

	@Value("${spark.app.name}")
	private String appName;
	@Value("${spark.master}")
	private String masterUri;

	@Bean
	public SparkConf conf() {
		return new SparkConf().setAppName(appName).setMaster(masterUri);
	}

	@Bean
	public JavaSparkContext sc() {
		return new JavaSparkContext(conf());
	}

}

Import the packages.

Note: Here we are declaring the JavaSparkContext and SparkConf as beans (using @Bean annotation) this tell the spring container to manage them for us.

@Configuration is used to tell Spring that this is a Java-based configuration file and contains the bean definitions.

@Value annotation is used to inject value from a properties file based on the the property name.

The application.properties file for properties spark.app.name and spark.master is inside src/main/resources and looks like this:

spark.app.name=Spring Spark Word Count Application
spark.master=local[2]

local[2] indicates to run spark locally with 2 worker threads.

If you wish to run the application with your remote spark cluster then edit spark.master pointing to your remote cluster.

4) Creating a service for Word Count:

Create a class WordCountService.java in package com.technocratsid.service and add the following content:

@Service
public class WordCountService {

	@Autowired
	JavaSparkContext sc;

	public Map<String, Long> getCount(List<String> wordList) {
		JavaRDD<String> words = sc.parallelize(wordList);
		Map<String, Long> wordCounts = words.countByValue();
		return wordCounts;
	}

}

Import the packages.

Note: This class holds our business logic which is converting the list of words into a JavaRDD and then counting them by value by calling countByValue() and returning the results.

@Service tells Spring that this file performs a business service.

@Autowired tells Spring to automatically wire or inject the value of variable from the beans which are managed by the the spring container.

5) Register a REST Controller with an endpoint:

Create a class WordCountController.java in package com.technocratsid.controller and add the following content:

@RestController
public class WordCountController {

	@Autowired
	WordCountService service;

	@RequestMapping(method = RequestMethod.POST, path = "/wordcount")
	public Map<String, Long> count(@RequestParam(required = true) String words) {
		List<String> wordList = Arrays.asList(words.split("\\|"));
		return service.getCount(wordList);
	}
}

Import the packages.

Note: This class registers an endpoint /wordcount for a POST request with a mandatory query parameter words which is basically a string like (“abc|pqr|xyz”) and we are splitting the words on pipes (|) to generate a list of words and then using our business service’s count() method with the list of words to get the word count.

6) Run the application:

Either run the SpringSparkWordCountApplication class as a Java Application from your IDE or use the following command:

mvn spring-boot:run

7) Test your application from a REST client:

For this demo I am using Insomnia REST Client which is quite handy with simple interface. You can use any REST client you want like Postman and Paw etc.

Once your application is up and running perform a POST request to the URL http://localhost:8080/wordcount with query parameter words=”Siddhant|Agnihotry|Technocrat|Siddhant|Sid”.

The response you’ll get:

You’ve just created your first Spring Boot Application and integrated Apache Spark with it.

If you want to hack into the code check out the github link.

What is Type Safety ?

Definition

Type safety is prevention of typed errors in a programming language.

A type error occurs when someone attempts to perform an operation on a value that doesn’t support that operation.

In simple words, type safety makes sure that an operation o which is meant to be performed on a data type x cannot be performed on data type y which does not support operation o.

That is, the language will not allow you to to execute o(y).

Example: 

Let’s consider JavaScript which is not type safe:

<!DOCTYPE html>
<html>
<body>
<script>
var number = 10; // numeric value
var string = "10"; // string value
var sum = number + string; // numeric + string
document.write(sum);
</script>
</body>
</html>

Output:

1010

The output is the concatenation of number and string.

Important point to note here is that JavaScript is allowing you to perform an arithmetic operation between an int and string.

As JavaScript is not type safe, you can add a numeric and string without restriction. This can lead to typed errors in type safe programming languages.

Let’s consider java which is type safe:

You can clearly observe that in java the compiler validates the types while compiling and throwing a compile time exception:

Type mismatch: cannot convert from String to int

 

As java is type safe, you cannot perform an arithmetic operation between an int and string.

Take away

Type-safe code won’t allow any invalid operation on an object and the operation’s validity depends on the type of the object.

Example of Java 8 Streams groupingBy feature

Statement: Let’s say you have a list of integers which you want to group into even and odd numbers.

Create a list of integers with four values 1,2,3 and 4:

List<Integer> numbers = new ArrayList<>();
numbers.add(1);
numbers.add(2);
numbers.add(3);
numbers.add(4);

Now group the list into odd and even numbers:

Map<String, List<Integer>> numberGroups= 
numbers.stream().collect(Collectors.groupingBy(i -> i%2 != 0 ? "ODD" : "EVEN"));

This returns a map of (“ODD/EVEN” -> numbers).

Printing the segregated list along with its offset (ODD/EVEN):

for (String offset : numberGroups.keySet()) {
  for (Integer i : numberGroups.get(offset)) {
    System.out.println(offset +":"+i);
  }
}

Outputs:

EVEN:2
EVEN:4
ODD:1
ODD:3

Refer Github for complete program.

How to re-index an index in Elasticsearch using Java ?

To re-index an index using java, build a re-index request using ReindexRequestBuilder API like:

ReindexRequestBuilder reindexRequest = 
new ReindexRequestBuilder(client,ReindexAction.INSTANCE)
    .source("source_index")
    .destination("destination_index")
    .refresh(true);

After creating a request execute the request:

reindexRequest.execute();

To validate whether the request is executed or not add a validation check:

if(copy.execute().isDone()) {
System.out.println("Request is executed");
}

Bingo! Your index is re-indexed.

Reflections in java

Reflection is a powerful feature of Java which provides the ability to inspect & modify the code at run time (manipulate internal properties of the program).

For example: It’s possible for a Java class to obtain the names of all its members and display them. Even we can also use reflection to instantiate an object, invoke it’s methods and change field values.

 

How it is done?

For every object JVM creates an immutable Class object which is used by reflection to get the run time properties of that object and once it has access we can change the properties. Reflection is not something which is used in daily programming tasks as it has some cons as well, one being a security threat, as using reflection we can get access to the private variables of a class and then can change it’s value.

 

How do we get access to the class object?

object.getClass();

 

After having the access we can get the methods, variables and constructors etc.

 

Stop the world phase

Garbage Collection literally stops the world.

When a GC occurs in young generation space, it is completed quickly as the young generation space is small.

Young generation space is the space where newly instantiated objects are stored. Internally, this space has two survivor spaces which are used when GC occurs and the objects which still have references are shifted to a survivor space. If an object survives many cycles of GC, it is shifted to old generation space.

Problem is when GC occurs in Old generation space which contains long lived objects. This space uses a lot more memory than the young generation and when GC occurs in old generation, it literally halts all the requests made to that JVM process.

So, the world literally stops !!

Why Java 8 ?

In simple words java 8 allows us to write code more precisely and concisely, which is better than writing verbose code in the java versions prior to java 8.

Example: Let’s sort a collection of cars based on their speed.

Java versions prior to java 8 :

Collections.sort(fleet, new Comparator() {
  @Override
  public int compare (Car c1, Car c2) { 
  return c1.getSpeed().compareTo(c2.getSpeed());
  }
}

Instead of writing a verbose code like above, using java 8 we can write the same code as:

Java 8 :

fleet.sort(Comparator.comparing(Car::getSpeed));

The above code is more concise and could be read as “sort fleet comparing Car’s speed”.

So why write a boilerplate code which is not related to the problem statement. Instead you can write concise code which is related to the problem statement and has SQL like readability.