An Elasticsearch analyzer is a wrapper around three components:
Character filter: Strips out unwanted characters or replaces characters in the raw text before it is tokenized.
Tokenizer: Breaks the text into individual tokens (or words) based on a rule such as whitespace or n-grams.
Token filter: Receives the individual tokens from the tokenizer and applies filters to them (for example, changing uppercase terms to lowercase).
In a nutshell, an analyzer tells Elasticsearch how a text or phrase should be indexed and searched.
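The three stages above can be sketched in plain Java. This is only an illustration of the pipeline idea, not Elasticsearch's own implementation: the class name AnalyzerSketch, its methods, and the choice of an HTML-stripping character filter, a whitespace tokenizer, and a lowercase token filter are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for an analyzer: char filter -> tokenizer -> token filter.
public class AnalyzerSketch {

    // Character filter: replace anything that looks like an HTML tag with a space.
    public static String charFilter(String text) {
        return text.replaceAll("<[^>]*>", " ");
    }

    // Tokenizer: split the filtered text on runs of whitespace.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Token filter: lowercase every token.
    public static List<String> tokenFilter(List<String> tokens) {
        List<String> filtered = new ArrayList<>();
        for (String token : tokens) {
            filtered.add(token.toLowerCase(Locale.ROOT));
        }
        return filtered;
    }

    // The "analyzer" simply chains the three stages in order.
    public static List<String> analyze(String text) {
        return tokenFilter(tokenize(charFilter(text)));
    }

    public static void main(String[] args) {
        System.out.println(analyze("<b>Quick</b> Brown FOX")); // prints [quick, brown, fox]
    }
}
```

The order matters: the character filter sees the raw string, the tokenizer sees the cleaned string, and the token filter only ever sees individual tokens.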
Why do we need analyzers?
Analyzers are generally used when you want to index a text or phrase. Breaking the text into words lets you search on individual terms to find the document.
Example: Let’s say you have an index (my_index) with a field “intro” and you index a document:
Before Elasticsearch 6.x, the analogy with relational databases was:
Relational DB ⇒ Databases ⇒ Tables ⇒ Rows ⇒ Columns
Elasticsearch ⇒ Indices ⇒ Types ⇒ Documents ⇒ Fields
which led to incorrect assumptions.
SQL tables are independent of each other. If two tables have a column with the same name, the columns are stored separately and can even have different definitions (e.g. Table_1 and Table_2 both have a column named “date”, which can mean something different in each table). This is not the case for Elasticsearch mapping types. Internally, fields that have the same name in different mapping types of one index are stored as the same Lucene field, which implies that both fields must have the same mapping definition. This breaks the analogy mentioned above.
So, to move away from this analogy, Elasticsearch 6.x doesn’t allow more than one mapping type per index. There are also plans to remove _type in upcoming versions.
Question: How are you going to differentiate documents in the same index then?
Reflection is a powerful feature of Java that provides the ability to inspect and modify a program at run time (manipulate its internal properties).
For example: A Java class can obtain the names of all its members and display them. We can also use reflection to instantiate an object, invoke its methods and change field values.
How is it done?
For every class it loads, the JVM creates an immutable Class object, which reflection uses to read the run-time properties of objects of that class; once we have access, we can change those properties. Reflection is not something used in daily programming tasks, as it has some cons as well. One is that it can be a security threat: using reflection, we can get access to the private variables of a class and then change their values.
How do we get access to the class object?
object.getClass();
Once we have the Class object, we can get its methods, variables, constructors and so on.
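A minimal sketch of these steps, using only the standard java.lang.reflect API. The class ReflectionSketch, the nested Car class and its private speed field are hypothetical names invented for this illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class ReflectionSketch {

    // Hypothetical class used only for this illustration.
    public static class Car {
        private int speed = 100;

        public int getSpeed() {
            return speed;
        }
    }

    // Overwrites the private "speed" field through reflection and
    // returns the value reported by the getter afterwards.
    public static int changeSpeed(Car car, int newSpeed) {
        try {
            Class<?> type = car.getClass();               // entry point to reflection
            Field field = type.getDeclaredField("speed"); // look up the private field by name
            field.setAccessible(true);                    // bypass the private access check
            field.setInt(car, newSpeed);                  // change the value at run time

            Method getter = type.getMethod("getSpeed");   // look up a public method by name
            return (Integer) getter.invoke(car);          // call it reflectively
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Reflection failed", e);
        }
    }

    public static void main(String[] args) {
        Car car = new Car();
        // List the methods declared by Car itself.
        for (Method m : car.getClass().getDeclaredMethods()) {
            System.out.println("method: " + m.getName());
        }
        System.out.println(changeSpeed(car, 250)); // prints 250
    }
}
```

The setAccessible(true) call is exactly the point where the security concern mentioned above comes in: it lets code outside Car rewrite a field the class declared private.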
When a GC occurs in the young generation space, it completes quickly because the young generation space is small.
The young generation is the space where newly instantiated objects are stored. Internally, it has two survivor spaces that are used during GC: objects that are still referenced are moved to a survivor space. If an object survives many GC cycles, it is moved to the old generation space.
The problem arises when GC occurs in the old generation space, which contains long-lived objects. This space is usually much larger than the young generation, and a GC there can pause every request being handled by that JVM process until it finishes.
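One way to observe how often each kind of collection runs is the standard java.lang.management API, which exposes one bean per collector. The sketch below is only an assumption about how you might inspect this; the class name GcStatsSketch is hypothetical, and the collector names printed depend on the JVM and the garbage collector in use (for example, G1 typically reports separate young and old generation collectors).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcStatsSketch {

    // Builds one summary line per collector registered in this JVM.
    public static List<String> collectorSummaries() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": count=" + gc.getCollectionCount()     // collections so far
                    + ", timeMs=" + gc.getCollectionTime());   // approximate accumulated time
        }
        return lines;
    }

    public static void main(String[] args) {
        // Allocate some short-lived objects so young collections are likely to occur.
        for (int i = 0; i < 1_000_000; i++) {
            byte[] garbage = new byte[1024];
        }
        for (String line : collectorSummaries()) {
            System.out.println(line);
        }
    }
}
```

On a typical run the young-generation collector shows many quick collections, while the old-generation collector shows few or none.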
In simple words, Java 8 allows us to write code more precisely and concisely than the verbose code required in earlier Java versions.
Example: Let’s sort a collection of cars based on their speed.
Java versions prior to Java 8:
// Assumes getSpeed() returns a Comparable type such as Integer.
Collections.sort(fleet, new Comparator<Car>() {
    @Override
    public int compare(Car c1, Car c2) {
        return c1.getSpeed().compareTo(c2.getSpeed());
    }
});
Instead of writing verbose code like the above, with Java 8 we can write the same thing as:
Java 8:
fleet.sort(Comparator.comparing(Car::getSpeed));
The above code is more concise and can be read as “sort fleet comparing Car’s speed”.
So why write boilerplate code that is not related to the problem statement? Instead, you can write concise code that maps directly to the problem statement and has SQL-like readability.
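For completeness, here is a self-contained sketch that can be compiled and run. The Car class, its name and speed fields, and the FleetSortSketch wrapper are hypothetical; the original snippets only tell us that Car has a getSpeed() method whose result supports compareTo.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class FleetSortSketch {

    // Hypothetical Car; speed is an Integer so that compareTo is available.
    public static class Car {
        private final String name;
        private final Integer speed;

        public Car(String name, Integer speed) {
            this.name = name;
            this.speed = speed;
        }

        public String getName() {
            return name;
        }

        public Integer getSpeed() {
            return speed;
        }
    }

    // Returns the car names ordered from slowest to fastest.
    public static List<String> namesBySpeed(List<Car> fleet) {
        List<Car> sorted = new ArrayList<>(fleet);         // copy so the caller's list is untouched
        sorted.sort(Comparator.comparing(Car::getSpeed));  // Java 8 style: key extractor only
        List<String> names = new ArrayList<>();
        for (Car car : sorted) {
            names.add(car.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        List<Car> fleet = Arrays.asList(
                new Car("coupe", 200), new Car("van", 120), new Car("sedan", 160));
        System.out.println(namesBySpeed(fleet)); // prints [van, sedan, coupe]
    }
}
```

Sorting from fastest to slowest only needs Comparator.comparing(Car::getSpeed).reversed(), which keeps the same declarative reading.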