The following software versions were used while working through these notes.
Elasticsearch version: 5.4.0
Java version: 8
Overview
- Download and install
- TFIDF
- Building an index
- Adding documents to index, individually and in bulk
- Search queries - query DSL
- Analysis of data, aggregations
- Built on Lucene, written in Java
- Distributed - scales to many nodes
- Highly available - multiple copies of data
- RESTful APIs - CRUD, monitoring and other operations via simple JSON-based HTTP calls
- Powerful query DSL - schemaless
- Can be installed on a local machine, and hosted cloud instances are also available
Download
- Download the latest version from www.elastic.co
- Unzip and start
- By default it will start as a single node cluster
- cluster and node concept
CRUD operations
- cURL (https://curl.haxx.se/download.html)
- create
- read/retrieve
- update
- delete
- Bulk operations on indexed documents
- Bulk creation of indices from JSON data
create a new index called products
curl -XPUT "localhost:9200/products?&pretty"
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "products"
}
Requests:
curl -XPUT "localhost:9200/customers?&pretty"
curl -XPUT "localhost:9200/orders?&pretty"
- check the indices
Request:
curl -XGET "localhost:9200/_cat/indices?v&pretty"
Response:
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   products BkpL7fogS0uFYMkzV8TYZA   1   1          0            0        230           230b
Add documents to existing indices
-- Request to add Iphone7 Phone
curl -XPUT "localhost:9200/products/mobiles/1?pretty" -H'Content-Type: application/json' -d'
{ "name": "Iphone 7",
"camera": "12MP",
"storage": "256GB",
"display": "4.7inch",
"battery": "1960mAh",
"reviews": ["Incredibly happy after having used it for one week", "Best phone so far", "Very expensive"]
}
'
- Here products is the index
- mobiles is the document type
- A document ID (1) can be passed to identify the document being created
- PUT with an ID creates the document, or replaces it entirely if it already exists
- POST is used for partial updates through the _update endpoint
-- Response:
{
"_index" : "products",
"_type" : "mobiles",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
-- Request to add Samsung Galaxy Phone
curl -XPUT "localhost:9200/products/mobiles/2?pretty" -H'Content-Type: application/json' -d'
{ "name": "Samsung Galaxy",
"camera": "8MP",
"storage": "128GB",
"display": "5.2inch",
"battery": "1500mAh",
"reviews": ["Best phone ever", "Love the screen size", "Awesome"]
}
'
-- Request to add Pixel 3
curl -XPUT "localhost:9200/products/mobiles/3?pretty" -H'Content-Type: application/json' -d'
{ "name": "Pixel 3",
"camera": "12.2MP",
"storage": "128GB",
"display": "5.5inch",
"battery": "2950mAh",
"reviews": ["I Love the camera on this phone", "Awesome google phone"]
}
'
-- Request to add Macbook pro Laptop (Doctype is different)
curl -XPUT "localhost:9200/products/laptops/1?pretty" -H'Content-Type: application/json' -d'
{ "name": "Macbook Pro",
"storage": "500GB",
"RAM" : "8GB",
"display": "13inch",
"os": "El capitan",
"reviews": ["Size is sleek compared to other laptops", "Storage capacity is great"]
}
'
NOTE:
This request fails because, as of Elasticsearch 6.x, an index cannot contain more than one document type.
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "Rejecting mapping update to [products] as the final mapping would have more than 1 type: [mobiles, laptops]"
}
],
"type" : "illegal_argument_exception",
"reason" : "Rejecting mapping update to [products] as the final mapping would have more than 1 type: [mobiles, laptops]"
},
"status" : 400
}
Retrieving Documents
-- retrieve a document by id
curl -XGET "localhost:9200/products/mobiles/1?pretty"
{
"_index" : "products",
"_type" : "mobiles",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"name" : "Iphone 7",
"camera" : "12MP",
"storage" : "256GB",
"display" : "4.7inch",
"battery" : "1960mAh",
"reviews" : [
"Incredibly happy after having used it for one week",
"Best phone so far",
"Very expensive"
]
}
}
-- check whether a document exists without retrieving the source
curl -XGET "localhost:9200/products/mobiles/1?pretty&_source=false"
{
"_index" : "products",
"_type" : "mobiles",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true
}
-- to fetch certain fields only in the json document
curl -XGET "localhost:9200/products/mobiles/1?pretty&_source=name,reviews"
{
"_index" : "products",
"_type" : "mobiles",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"reviews" : [
"Incredibly happy after having used it for one week",
"Best phone so far",
"Very expensive"
],
"name" : "Iphone 7"
}
}
Update
- Update document by id
- Whole document
- Partial document
-- update of a document can be done via a put request (whole document)
curl -XPUT "localhost:9200/products/mobiles/1?pretty" -H'Content-Type: application/json' -d'
{
"name" : "Iphone 7",
"camera" : "12MP",
"storage" : "256GB",
"display" : "4.7inch",
"battery" : "1960mAh",
"reviews" : [
"Incredibly happy after having used it for one week",
"Best phone so far",
"Very expensive",
"Much better than android phones"
]
}
'
Response:
{
"_index" : "products",
"_type" : "mobiles",
"_id" : "1",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 3,
"_primary_term" : 1
}
- partial update of a document can be done using the _update endpoint, use the POST command with a doc field
Request: add a new field color in the mobile 1
curl -XPOST "localhost:9200/products/mobiles/1/_update?pretty" -H'Content-Type: application/json' -d'
{
"doc": {
"color": "black"
}
}'
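To make the partial update semantics concrete, here is a minimal Python sketch of how the doc field is merged into the stored source: nested objects are merged recursively, and any other value (including a list) replaces the old one. The function name apply_partial_update and the sample document are made up for illustration; this only imitates the behaviour locally and does not call Elasticsearch.

```python
import copy


def apply_partial_update(source, doc):
    """Return a new document with `doc` merged into `source`.

    Nested objects (dicts) are merged recursively; any other value,
    including a list, replaces the existing one.
    """
    merged = copy.deepcopy(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_partial_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical stored document, shortened from the Iphone 7 example.
phone = {"name": "Iphone 7", "camera": "12MP", "reviews": ["Best phone so far"]}
updated = apply_partial_update(phone, {"color": "black"})
print(updated)
```

The existing fields are untouched and only color is added, which is what the _update request above is expected to do.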
- the script field can be used to update a field of a document
-- Request: increment the size field by 2 (this assumes the document has a numeric size field, as the shoe documents indexed below do)
curl -XPOST "localhost:9200/products/mobiles/1/_update?pretty" -H'Content-Type: application/json' -d'
{
"script": "ctx._source.size += 2"
}'
Deletes
- delete a document from an index
curl -XDELETE "localhost:9200/products/mobiles/1?pretty"
- delete an entire index
curl -XDELETE "localhost:9200/products?pretty"
Bulk operations
- retrieve multiple documents
- the _mget API allows us to get multiple documents in one command
curl -XGET "localhost:9200/_mget?pretty" -H'Content-Type: application/json' -d'
{
"docs": [
{
"_index": "products",
"_type": "laptops",
"_id": "1"
},
{
"_index": "products",
"_type": "laptops",
"_id": "2"
}
]
}'
-- If all the requested documents are in the same index and type, these can be put in the URL itself
curl -XGET "localhost:9200/products/mobiles/_mget?pretty" -H'Content-Type: application/json' -d'{"docs": [{"_id": "1"}, {"_id": "2"}]}'
Index multiple documents
- The _bulk API allows multiple operations to be specified in one request.
curl -XPOST "localhost:9200/_bulk?pretty" -H'Content-Type: application/json' -d'
{ "index": {"_index": "products", "_type": "mobiles", "_id": "3" } }
{ "name": "Puma", "size": 9, "color": "black" }
{ "index": {"_index": "products", "_type": "mobiles", "_id": "4" } }
{ "name": "New Balance", "size": 9, "color": "White" }
'
Multiple operations in one command
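- Multiple operations can be combined in one request using the _bulk API.
- The create keyword can be used instead of index to add a document; unlike index, it fails if a document with that ID already exists.
- For index, create and update operations, the action line must be followed by a line containing the JSON document (or partial document) itself; delete takes no second line.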
curl -XPOST "localhost:9200/products/shoes/_bulk?pretty" -H'Content-Type: application/json' -d'
{ "index": { "_id": "3" } }
{ "name": "Puma", "size": 9, "color": "black" }
{ "index": {"_id": "4" } }
{ "name": "New Balance", "size": 8, "color": "White" }
{"delete": { "_id": "2"}}
{ "create": {"_id": "5" } }
{ "name": "Nike Power", "size": 11, "color": "red" }
{ "update": {"_id": "1" } }
{ "doc": {"color": "orange" } }
'
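The body of a _bulk request is newline-delimited JSON: one action line, followed by a document line for everything except delete, and a newline at the very end. A small Python sketch of building such a body is shown here; build_bulk_body is a made-up helper name, and the result is only printed, not sent anywhere.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, document) triples as newline-delimited JSON.

    `document` is ignored for delete, which has no second line.
    """
    lines = []
    for action, metadata, document in operations:
        lines.append(json.dumps({action: metadata}))
        if action == "delete":
            continue
        if action == "update":
            # A partial update wraps the changed fields in a "doc" object.
            lines.append(json.dumps({"doc": document}))
        else:  # index or create
            lines.append(json.dumps(document))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_id": "3"}, {"name": "Puma", "size": 9, "color": "black"}),
    ("delete", {"_id": "2"}, None),
    ("create", {"_id": "5"}, {"name": "Nike Power", "size": 11, "color": "red"}),
    ("update", {"_id": "1"}, {"color": "orange"}),
])
print(body, end="")
```

Generating the body from code like this avoids the easy mistake of an unbalanced brace on one of the lines.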
Bulk index documents from a json file
Searching and filtering
Random JSON generator: www.json-generator.com
-- Generate 1000 customer records and save them in JSON format
Schema:
[
'{{repeat(1000, 1000)}}',
{
name: '{{firstName()}} {{surname()}}',
age: '{{integer(18, 75)}}',
gender: '{{gender()}}',
email: '{{email()}}',
phone: '+1 {{phone()}}',
street: '{{integer(100, 999)}} {{street()}}',
city: '{{city()}}',
state: '{{state()}}, {{integer(100, 10000)}}'
}
]
Two contexts of search
- Query context
- Every document has a relevance score which tells how well the document matches the search term
- Search term can be specified as
- URL query parameter
- URL request body
- use of the _search API
curl -XGET "localhost:9200/customers/_search?q=wyoming&pretty"
curl -XGET "localhost:9200/customers/_search?q=wyoming&sort=age:desc&pretty"
- from and size can be appended in the same way to page through results, e.g. from=10&size=2 skips the first 10 hits and returns the next 2
- Filter context
-- the same options can be sent in the request body
curl -XGET "localhost:9200/customers/_search?pretty" -d'
{
"query": {"match_all": {} },
"size": 3,
"from": 2,
"sort": { "age": { "order": "desc" } }
}'
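The effect of sort, from and size together can be pictured as sorting the hits, skipping from of them and returning the next size. Here is a rough Python sketch under that assumption; paginate and the generated customer list are hypothetical and run purely in memory. The defaults mirror Elasticsearch's own (from 0, size 10).

```python
def paginate(hits, sort_field, descending=True, from_=0, size=10):
    """Sort hits by one field, skip `from_` of them, and return the next `size`."""
    ordered = sorted(hits, key=lambda hit: hit[sort_field], reverse=descending)
    return ordered[from_:from_ + size]


# Ten made-up customers aged 20 to 29.
customers = [{"name": f"customer-{i}", "age": 20 + i} for i in range(10)]

# Same options as the request body above: size 3, from 2, sort by age descending.
page = paginate(customers, "age", descending=True, from_=2, size=3)
print([c["age"] for c in page])
```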
- Can search multiple indices
curl -XGET "localhost:9200/customers,products/_search?pretty"
curl -XGET "localhost:9200/products/mobiles,laptops/_search?pretty"
- We can search on a specific field we are interested in using a "term" query
curl "localhost:9200/customers/_search?pretty" -d'
{
"query": {
"term": {"name": "gates"}
}
}'
- we can add "_source": false to the above request body to leave the document source out of the response.
- the _source field is flexible: field names can be given as wildcard patterns
{
"_source": ["st*", "*n*"],
"query": {
"term": { "state": "washington"}
}
}
- we can specify to include or exclude some pattern from the source fields
{
"_source": {
"includes": ["st*", "*n*"],
"excludes": [ "*der"]
},
"query": {
"term": { "state": "washington"}
}
}
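To see which fields survive a given includes/excludes combination, the wildcard matching can be imitated locally. The Python sketch below uses fnmatch as a stand-in for Elasticsearch's pattern matching and only looks at top-level fields; filter_source and the sample customer are made up for illustration.

```python
from fnmatch import fnmatchcase


def filter_source(source, includes=None, excludes=None):
    """Keep top-level fields matching any include pattern and no exclude pattern."""
    includes = includes or ["*"]
    excludes = excludes or []
    return {
        field: value
        for field, value in source.items()
        if any(fnmatchcase(field, pattern) for pattern in includes)
        and not any(fnmatchcase(field, pattern) for pattern in excludes)
    }


# Hypothetical customer document with the fields from the generator schema.
customer = {
    "name": "Jane Gates", "age": 41, "gender": "female",
    "street": "123 Ditmas Avenue", "city": "Seattle", "state": "washington",
}
result = filter_source(customer, includes=["st*", "*n*"], excludes=["*der"])
print(sorted(result))
```

Here gender matches the *n* include pattern but is dropped again by the *der exclude pattern.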
Full text queries
- match
- match_phrase
- match_phrase_prefix
curl "localhost:9200/customers/_search?pretty" -d'
{
"query": {
"match": {
"name": "webster"
}
}
}'
-- the match query above is not limited to matching a single term; it accepts other parameters, such as operator
{
"query": {
"match": {
"name": {
"query": "frank morris",
"operator": "or"
}
}
}
}
-- logical OR: matches all documents having frank or morris in the name field
-- the default operator is OR
{
"query": {
"match_phrase": {
"name": "frank morris"
}
}
}
-- entire phrase has to match
{
"query": {
"match_phrase_prefix": {
"name": "fr"
}
}
}
-- all names that begin with the prefix fr
-- this can be used for autocomplete
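The difference between the three full text queries is easier to see side by side. The following Python sketch imitates them on a handful of made-up names, assuming a very simple analyzer (lowercase, split on whitespace); the real analysis chain and scoring are more involved, so treat this as an illustration of the matching rules only.

```python
def analyze(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match(field_value, query, operator="or"):
    """Any query term (or) or every query term (and) appears in the field."""
    terms, tokens = analyze(query), set(analyze(field_value))
    if not terms:
        return False
    hits = [term in tokens for term in terms]
    return all(hits) if operator == "and" else any(hits)


def match_phrase(field_value, query):
    """All query terms appear next to each other, in the same order."""
    terms, tokens = analyze(query), analyze(field_value)
    if not terms:
        return False
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))


def match_phrase_prefix(field_value, query):
    """Like match_phrase, but the last query term only needs to be a prefix."""
    terms, tokens = analyze(query), analyze(field_value)
    if not terms:
        return False
    n = len(terms)
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if window[:-1] == terms[:-1] and window[-1].startswith(terms[-1]):
            return True
    return False


names = ["Frank Morris", "Morris Frank", "Fred Webster", "Anna Morris"]
print([n for n in names if match(n, "frank morris")])
print([n for n in names if match(n, "frank morris", operator="and")])
print([n for n in names if match_phrase(n, "frank morris")])
print([n for n in names if match_phrase_prefix(n, "fr")])
```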
TFIDF
{
"query": {
"common": {
"reviews": {
"query": "this is great",
"cutoff_frequency": 0.001
}
}
}
}
- some of the terms in the query may be common words (stop words). With cutoff_frequency set to 0.001, any word that appears in more than 0.1% of documents is treated as a common word while searching
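The idea behind TF-IDF is that a term counts for more when it is frequent in one document but rare across the whole collection. Here is a textbook version in Python on a few made-up reviews; note that Elasticsearch 5.x scores with BM25 by default, which refines the same idea, so the numbers below are not what the cluster would return.

```python
import math


def tf_idf(term, document, corpus):
    """Textbook TF-IDF: term frequency times log(N / document frequency)."""
    tokens = document.lower().split()
    tf = tokens.count(term) / len(tokens)
    df = sum(1 for doc in corpus if term in doc.lower().split())
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


# Made-up review snippets; "this" appears in every one of them.
reviews = [
    "this is great",
    "this phone is expensive",
    "this camera is great",
    "love this screen",
]
for term in ("this", "great", "expensive"):
    print(term, round(tf_idf(term, reviews[1], reviews), 3))
```

A word that occurs in every document gets a weight of zero, which is the intuition behind treating very frequent words as common terms.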
Compound queries
- Boolean query
- Matches documents by combining multiple queries using boolean operators such as AND, OR
- Must clause
curl "localhost:9200/customers/_search?pretty" -d'
{
"query": {
"bool": {
"must": [
{"match": { "street": "ditmas" } },
{"match": { "street": "avenue" } }
]
}
}
}
'
- Should clause
curl "localhost:9200/customers/_search?pretty" -d'
{
"query": {
"bool": {
"should": [
{"match": { "street": "ditmas" } },
{"match": { "street": "avenue" } }
]
}
}
}
'
- must_not clause
curl "localhost:9200/customers/_search?pretty" -d'
{
"query": {
"bool": {
"must_not": [
{"match": { "state": "california texas" } },
{"match": { "street": "lane street" } }
]
}
}
}
'
- filter clause
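How the clauses of a bool query combine can be sketched in plain Python. The version below is a simplification: each clause is an OR-style match on one field, and the "score" is just the number of matching must and should clauses, which is an assumption made for illustration and not the real relevance formula. bool_query and the sample customers are hypothetical.

```python
def tokens(text):
    return set(text.lower().split())


def matches(doc, field, query):
    """OR-style match: true if any query word appears in the field."""
    return bool(tokens(doc.get(field, "")) & tokens(query))


def bool_query(doc, must=(), should=(), must_not=(), filter_=()):
    """Return (included, score) for one document."""
    if any(matches(doc, f, q) for f, q in must_not):
        return False, 0
    if not all(matches(doc, f, q) for f, q in list(must) + list(filter_)):
        return False, 0
    should_hits = sum(matches(doc, f, q) for f, q in should)
    # With no must or filter clause, at least one should clause has to match.
    if should and not must and not filter_ and should_hits == 0:
        return False, 0
    # Filter clauses decide inclusion only; they add nothing to the score.
    return True, len(must) + should_hits


customers = [
    {"street": "101 Ditmas Avenue", "state": "texas"},
    {"street": "55 Ditmas Lane", "state": "ohio"},
    {"street": "9 Ocean Avenue", "state": "california"},
    {"street": "7 Hill Road", "state": "ohio"},
]
clauses = [("street", "ditmas"), ("street", "avenue")]
print([bool_query(c, must=clauses) for c in customers])
print([bool_query(c, should=clauses) for c in customers])
```

With must, only the first customer is included; with should, the first three are included and the one matching both clauses ranks highest.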
Term queries
The exact term needs to be found in the inverted index for indexed documents.
The terms found in the index may vary based on how you analyze them.
- simple term queries
curl "localhost:9200/customers/_search?pretty" -d'
{
"query": {
"bool": {
"should": [
{"term": { "state": {"value": "california"} } },
{"term": { "street": {"value": "idaho"} } }
]
}
}
}
'
- Boost some terms over others
curl "localhost:9200/customers/_search?pretty" -d'
{
"query": {
"bool": {
"should": [
{
"term": {
"state":
{
"value": "california",
"boost": 2.0
}
}
},
{
"term":
{
"street":
{
"value": "idaho"
}
}
}
]
}
}
}
'
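Why a term query can miss a document that "obviously" contains the word comes down to analysis: the indexed terms are usually lowercased, while the term query value is compared as-is. A minimal Python sketch of that behaviour follows, with a made-up analyzer that lowercases and splits on non-alphanumeric characters (an approximation, not the actual standard analyzer) and two hypothetical customers.

```python
import re


def simple_analyze(text):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def build_inverted_index(docs, field):
    """Map each analyzed term to the set of document ids containing it."""
    index = {}
    for doc_id, doc in docs.items():
        for term in simple_analyze(doc[field]):
            index.setdefault(term, set()).add(doc_id)
    return index


def term_query(index, value):
    """Exact lookup: the value is NOT analyzed before it is compared."""
    return index.get(value, set())


docs = {1: {"name": "Alice Gates"}, 2: {"name": "Bob Webster"}}
index = build_inverted_index(docs, "name")
print(term_query(index, "gates"))        # lowercase term is in the index
print(term_query(index, "Gates"))        # capitalised value is not
print(term_query(index, "Alice Gates"))  # the whole string is not a single term
```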
Filters
- the documents in the result are not scored.
- just checks if the document should be included in the result or not.
-- the most common filter is the range filter
-- term and filters could be combined
curl "localhost:9200/customers/_search?pretty" -d'
{
"query": {
"bool": {
"must": { "match_all": {} },
"filter": [
{
"term": {
"gender": "female"
}
},
{
"range": {
"age": {
"gte": 20,
"lte": 30
}
}
}
]
}
}
}
'
Analytics and Aggregations
- Different kinds of aggregations that can be performed
- Implement queries for metrics and bucketing aggregations
- Work with multi level nesting of aggregations
Four kinds of Aggregations
- Metric
- Bucketing
- Matrix
- Pipeline
Metric Aggregations
- Aggregations over a set of documents
- All documents in a search result
- Documents within a logical group
Bucketing Aggregations
- Logically group documents based on search query
- A document falls into a bucket if the criteria matches
- Each bucket associated with a key
Matrix Aggregations
- Operates on multiple fields and produces a matrix result
- Experimental and may change in the future releases
- Not covered
Pipeline Aggregations
- Aggregations that work on the output of other aggregations
- Experimental and may change in the future releases
- Not covered
Metric Aggregations
- numeric aggregations like sum, average, count, min, etc
- multi value stats aggregations
- aggregations are done by using the same _search api
- aggregations are done by using aggs keyword in the request body
- provide a name that you want to be assigned to the result - "avg_age"
- avg is the keyword for average aggregations
- field keyword specifies the field over which this aggregation is going to be performed
- size = 0, means we do not want any documents to be returned, we just want the final aggregate value
curl -XPOST "localhost:9200/customers/_search?&pretty" -d'
{
"size": 0,
"aggs": {
"avg_age": {
"avg": {
"field": "age"
}
}
}
}
'
- metric aggregations become more powerful when combined with search or filter queries
- the below query calculates the average age of all the customers who live in minnesota
curl -XPOST "localhost:9200/customers/_search?&pretty" -d'
{
"size": 0,
"query": {
"bool": {
"filter": {
"match": { "state": "minnesota"}
}
}
},
"aggs": {
"avg_age": {
"avg": {
"field": "age"
}
}
}
}
'
- Elasticsearch can also calculate a whole range of statistics in one go
- specify the "stats" aggregation keyword within the "aggs" field
- "age_stats" is the field name that will appear in the response
- "stats" calculates the count, min, max, avg, sum of the age field
curl -XPOST "localhost:9200/customers/_search?&pretty" -d'
{
"size": 0,
"aggs": {
"age_stats": {
"stats": {
"field": "age"
}
}
}
}
'
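For reference, the five values returned by the stats aggregation correspond to the following plain Python computation. The ages list is made up, and the stats function here is a local sketch, not a client call.

```python
def stats(values):
    """Compute the same five figures the stats aggregation reports."""
    values = list(values)
    count = len(values)
    total = sum(values)
    return {
        "count": count,
        "min": min(values) if values else None,
        "max": max(values) if values else None,
        "avg": total / count if count else None,
        "sum": total,
    }


ages = [23, 35, 41, 29, 52]  # hypothetical customer ages
print(stats(ages))
```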
Cardinality
- the number of unique values in a field across all documents
- enabling cardinality aggregations on text fields requires fielddata to be enabled for the field
curl -XPOST "localhost:9200/customers/_search?&pretty" -d'
{
"size": 0,
"aggs": {
"age_count": {
"cardinality": {
"field": "age"
}
}
}
}
'
-- since age is an integer value, the above query works directly.
-- for a text field, the above query will not work by default
-- fielddata has to be enabled for the text field
curl -XPUT "localhost:9200/customers/_mapping/personal?pretty" -d'
{
"properties": {
"gender": {
"type": "text",
"fielddata": true
}
}
}
'
-- now you can run the cardinality aggregation on the gender field
curl -XPOST "localhost:9200/customers/_search?&pretty" -d'
{
"size": 0,
"aggs": {
"gender_count": {
"cardinality": {
"field": "gender"
}
}
}
}
'
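Conceptually, cardinality is a distinct count. The Python sketch below computes it exactly with a set on a few made-up customers; Elasticsearch itself uses an approximate algorithm (HyperLogLog++), so on large data sets its answer may differ slightly from an exact count like this one.

```python
def exact_cardinality(docs, field):
    """Number of distinct values of `field` across the documents that have it."""
    return len({doc[field] for doc in docs if field in doc})


# Hypothetical customers.
customers = [
    {"gender": "female", "age": 30},
    {"gender": "male", "age": 30},
    {"gender": "female", "age": 45},
]
print(exact_cardinality(customers, "age"))
print(exact_cardinality(customers, "gender"))
```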
Bucketing
- similar to the GROUP BY operation in sql
curl -XPOST "localhost:9200/customers/_search?&pretty" -d'
{
"size": 0,
"aggs": {
"gender_bucket": {
"terms": {
"field": "gender"
}
}
}
}
'
-- we can also bucket by range
curl -XPOST "localhost:9200/customers/_search?&pretty" -d'
{
"size": 0,
"aggs": {
"age_range": {
"range": {
"field": "age",
"ranges": [
{ "to": 30},
{ "from": 30, "to": 40},
{ "from": 40, "to": 55},
{ "from": 55 }
]
}
}
}
}
'
- "keyed": true can be specified, which returns the buckets as an object keyed by range name instead of an array
- a key can also be specified for each range
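One detail worth knowing about range buckets is that from is inclusive and to is exclusive, so an age of exactly 30 lands in the 30 to 40 bucket and not in the one below. A small Python sketch of the bucketing follows; range_buckets and the ages are made up, and the generated key format (for example *-30.0) is an assumption about how unnamed buckets are labelled.

```python
def bound(value):
    """Format a range bound for a generated bucket key."""
    return "*" if value is None else str(float(value))


def range_buckets(values, ranges):
    """Count values per range; `from` is inclusive and `to` is exclusive."""
    buckets = {}
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        key = r.get("key") or f"{bound(lo)}-{bound(hi)}"
        buckets[key] = sum(
            1 for v in values
            if (lo is None or v >= lo) and (hi is None or v < hi)
        )
    return buckets


ages = [18, 30, 39, 40, 54, 55, 70]  # hypothetical customer ages
print(range_buckets(ages, [
    {"to": 30},
    {"from": 30, "to": 40},
    {"from": 40, "to": 55},
    {"from": 55},
]))
print(range_buckets(ages, [
    {"key": "young", "to": 30},
    {"key": "middle-aged", "from": 30, "to": 55},
    {"key": "senior", "from": 55},
]))
```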
Multi level nested aggregations
- example of a metric aggregation nested inside a bucketing aggregation
- returns the average age of males and females
curl -XPOST "localhost:9200/customers/_search?&pretty" -d'
{
"size": 0,
"aggs": {
"gender_bucket": {
"terms": {
"field": "gender"
},
"aggs": {
"average_age": {
"avg": {
"field": "age"
}
}
}
}
}
}
'
- multi layer nesting of aggregations
curl -XPOST "localhost:9200/customers/_search?&pretty" -d'
{
"size": 0,
"aggs": {
"gender_bucket": {
"terms": {
"field": "gender"
},
"aggs": {
"age_ranges": {
"range": {
"field": "age",
"keyed": true,
"ranges": [
{ "key": "young", "to": 30},
{ "key": "middle-aged","from": 30, "to": 55},
{ "key": "senior","from": 55 }
]
},
"aggs": {
"average_age": {
"avg": {
"field": "age"
}
}
}
}
}
}
}
}
'
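The three-level request above is roughly a GROUP BY gender, then by age range, with an average per group. As a sanity check on what to expect, here is an in-memory Python sketch of the same grouping on made-up customers. Unlike Elasticsearch, this sketch simply omits empty buckets, and the function names are hypothetical.

```python
from collections import defaultdict

# (key, from, to) with `from` inclusive and `to` exclusive, as in the request.
AGE_RANGES = [("young", None, 30), ("middle-aged", 30, 55), ("senior", 55, None)]


def range_key(age, ranges=AGE_RANGES):
    """Return the key of the first range containing `age`, or None."""
    for key, lo, hi in ranges:
        if (lo is None or age >= lo) and (hi is None or age < hi):
            return key
    return None


def average_age_by_gender_and_range(customers):
    """Bucket by gender, then by age range, then average the ages per bucket."""
    ages = defaultdict(lambda: defaultdict(list))
    for customer in customers:
        key = range_key(customer["age"])
        if key is not None:
            ages[customer["gender"]][key].append(customer["age"])
    return {
        gender: {key: sum(values) / len(values) for key, values in by_range.items()}
        for gender, by_range in ages.items()
    }


customers = [
    {"gender": "female", "age": 25}, {"gender": "female", "age": 29},
    {"gender": "female", "age": 40}, {"gender": "male", "age": 33},
    {"gender": "male", "age": 60}, {"gender": "male", "age": 62},
]
print(average_age_by_gender_and_range(customers))
```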
Filter aggregation and filters keyword
- average age of customers from the state of texas
curl -XPOST "localhost:9200/customers/_search?&pretty" -d'
{
"size": 0,
"aggs": {
"state": {
"filter": { "term": { "state": "texas" } },
"aggs": {
"average_age": {
"avg": {
"field": "age"
}
}
}
}
}
}
'
-- the "filters" keyword accepts several named filters instead of just one; each named filter becomes its own bucket
curl -XPOST "localhost:9200/customers/_search?&pretty" -d'
{
"size": 0,
"aggs": {
"state": {
"filters": {
"filters": {
"washington" : { "match": { "state": "washington" } },
"north carolina" : { "match": { "state": "north carolina" } },
"south dakota" : { "match": { "state": "south dakota" } }
}
},
"aggs": {
"average_age": {
"avg": {
"field": "age"
}
}
}
}
}
}
'