The following software versions were used while working through the items below.
Elasticsearch version: 5.4.0
Java version: 8
Overview
- Download and install
- TF-IDF
- Building an index
- Adding documents to index, individually and in bulk
- Search queries - query DSL
- Analysis of data, aggregations
- Built on Lucene (Java)
- Distributed - scales to many nodes
- Highly available - multiple copies of data
- RESTful APIs - CRUD, monitoring and other operations via simple JSON-based HTTP calls
- Powerful query DSL - schemaless
- Can be installed on your own machine; a hosted cloud instance is also available
Download
- download the latest version from www.elastic.co
- unzip and start
- By default it will start as a single node cluster
- cluster and node concept
CRUD operations
- cURL (https://curl.haxx.se/download.html)
- create
- read/retrieve
- update
- delete
- Bulk operations on indexed documents
- Bulk creation of indices from json data
create a new index called products
Request:
curl -XPUT "localhost:9200/products?pretty"
Response:
{ "acknowledged" : true, "shards_acknowledged" : true, "index" : "products" }
Requests to create two more indices:
curl -XPUT "localhost:9200/customers?pretty"
curl -XPUT "localhost:9200/orders?pretty"
- check the indices
Request:
curl -XGET "localhost:9200/_cat/indices?v&pretty"
Response:
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   products BkpL7fogS0uFYMkzV8TYZA 1   1   0          0            230        230b
Add documents to existing indices
-- Request to add the Iphone 7 phone
curl -XPUT "localhost:9200/products/mobiles/1?pretty" -H'Content-Type: application/json' -d' { "name": "Iphone 7", "camera": "12MP", "storage": "256GB", "display": "4.7inch", "battery": "1960mAh", "reviews": ["Incredibly happy after having used it for one week", "Best phone so far", "Very expensive"] } '
- here products is the index
- mobiles is the document type
- a document ID (1) can be passed to identify the document being created
- PUT with an ID creates the document, or replaces it if it already exists
- POST is used for partial updates through the _update endpoint
-- Response:
{ "_index" : "products", "_type" : "mobiles", "_id" : "1", "_version" : 1, "result" : "created", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 0, "_primary_term" : 1 }
-- Request to add Samsung Galaxy Phone
curl -XPUT "localhost:9200/products/mobiles/2?pretty" -H'Content-Type: application/json' -d' { "name": "Samsung Galaxy", "camera": "8MP", "storage": "128GB", "display": "5.2inch", "battery": "1500mAh", "reviews": ["Best phone ever", "Love the screen size", "Awesome"] } '
-- Request to add Pixel 3
curl -XPUT "localhost:9200/products/mobiles/3?pretty" -H'Content-Type: application/json' -d' { "name": "Pixel 3", "camera": "12.2MP", "storage": "128GB", "display": "5.5inch", "battery": "2950mAh", "reviews": ["I Love the camera on this phone", "Awesome google phone"] } '
-- Request to add Macbook pro Laptop (Doctype is different)
curl -XPUT "localhost:9200/products/laptops/1?pretty" -H'Content-Type: application/json' -d' { "name": "Macbook Pro", "storage": "500GB", "RAM" : "8GB", "display": "13inch", "os": "El capitan", "reviews": ["Size is sleek compared to other laptops", "Storage capacity is great"] } '
NOTE:
This request fails because, as of Elasticsearch 6.x, multiple document types in a single index are not supported.
{ "error" : { "root_cause" : [ { "type" : "illegal_argument_exception", "reason" : "Rejecting mapping update to [products] as the final mapping would have more than 1 type: [mobiles, laptops]" } ], "type" : "illegal_argument_exception", "reason" : "Rejecting mapping update to [products] as the final mapping would have more than 1 type: [mobiles, laptops]" }, "status" : 400 }
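The indexing requests above can also be built from a script. The following is a minimal sketch using only the Python standard library, assuming a node at localhost:9200 as in the curl examples. The helper name build_index_request is made up for illustration, and the request is only constructed here, not sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: same local node as the curl examples


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that indexes one document."""
    url = f"{ES_URL}/{index}/{doc_type}/{doc_id}?pretty"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request(
    "products", "mobiles", 1,
    {"name": "Iphone 7", "camera": "12MP", "storage": "256GB"},
)
print(request.get_method(), request.full_url)
# To actually send it against a running node:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```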
Retrieving Documents
-- retrieve a document by id
curl -XGET "localhost:9200/products/mobiles/1?pretty"
{ "_index" : "products", "_type" : "mobiles", "_id" : "1", "_version" : 1, "_seq_no" : 0, "_primary_term" : 1, "found" : true, "_source" : { "name" : "Iphone 7", "camera" : "12MP", "storage" : "256GB", "display" : "4.7inch", "battery" : "1960mAh", "reviews" : [ "Incredibly happy after having used it for one week", "Best phone so far", "Very expensive" ] } }
-- check if a document exists without retrieving the source
curl -XGET "localhost:9200/products/mobiles/1?pretty&_source=false"
{ "_index" : "products", "_type" : "mobiles", "_id" : "1", "_version" : 1, "_seq_no" : 0, "_primary_term" : 1, "found" : true }
-- fetch only certain fields of the json document
curl -XGET "localhost:9200/products/mobiles/1?pretty&_source=name,reviews"
{ "_index" : "products", "_type" : "mobiles", "_id" : "1", "_version" : 1, "_seq_no" : 0, "_primary_term" : 1, "found" : true, "_source" : { "reviews" : [ "Incredibly happy after having used it for one week", "Best phone so far", "Very expensive" ], "name" : "Iphone 7" } }
Update
- Update document by id
- Whole document
- Partial document
-- update of a document can be done via a PUT request (whole document)
curl -XPUT "localhost:9200/products/mobiles/1?pretty" -H'Content-Type: application/json' -d' { "name" : "Iphone 7", "camera" : "12MP", "storage" : "256GB", "display" : "4.7inch", "battery" : "1960mAh", "reviews" : [ "Incredibly happy after having used it for one week", "Best phone so far", "Very expensive", "Much better than android phones" ] } '
Response:
{ "_index" : "products", "_type" : "mobiles", "_id" : "1", "_version" : 2, "result" : "updated", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 3, "_primary_term" : 1 }
- partial update of a document can be done using the _update endpoint, use the POST command with a doc field
Request: add a new field color in the mobile 1
curl -XPOST "localhost:9200/products/mobiles/1/_update?pretty" -H'Content-Type: application/json' -d' { "doc": { "color": "black" } }'
- the script field can be used to update a field of a document
-- Request: increment the size field by 2 (this assumes the target document already has a numeric size field)
curl -XPOST "localhost:9200/products/mobiles/1/_update?pretty" -H'Content-Type: application/json' -d' { "script": "ctx._source.size += 2" }'
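To make the effect of a partial update concrete, here is a small local sketch of how a "doc" body is merged into the stored source: nested objects are merged, while scalars and arrays are replaced. It runs on plain dictionaries and does not contact Elasticsearch; the function name apply_partial_update is an assumption for illustration.

```python
import copy


def apply_partial_update(source, doc):
    """Return a copy of `source` with `doc` merged in, mimicking a "doc" update."""
    merged = copy.deepcopy(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged key by key.
            merged[key] = apply_partial_update(merged[key], value)
        else:
            # Scalars and arrays replace whatever was there before.
            merged[key] = copy.deepcopy(value)
    return merged


mobile = {"name": "Iphone 7", "camera": "12MP", "reviews": ["Best phone so far"]}
updated = apply_partial_update(mobile, {"color": "black"})
print(updated)
```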
Deletes
- delete a document from an index
curl -XDELETE "localhost:9200/products/mobiles/1?pretty"
- delete an entire index
curl -XDELETE "localhost:9200/products?pretty"
Bulk operations
- retrieve multiple documents
- the _mget API allows us to get multiple documents in one request
curl "localhost:9200/_mget?pretty" -H'Content-Type: application/json' -d' { "docs": [ { "_index": "products", "_type": "laptops", "_id": "1" }, { "_index": "products", "_type": "laptops", "_id": "2" } ] }'
-- If all the documents being fetched are in the same index and type, these can be put in the URL itself
curl -XGET "localhost:9200/products/mobiles/_mget?pretty" -H'Content-Type: application/json' -d'{"docs": [{"_id": "1"}, {"_id": "2"}]}'
Index multiple documents
- The _bulk API allows multiple operations to be specified in one request. Each action and each document must be on its own line.
curl -XPOST "localhost:9200/_bulk?pretty" -H'Content-Type: application/json' -d'
{ "index": {"_index": "products", "_type": "mobiles", "_id": "3" } }
{ "name": "Puma", "size": 9, "color": "black" }
{ "index": {"_index": "products", "_type": "mobiles", "_id": "4" } }
{ "name": "New Balance", "size": 9, "color": "White" }
'
Multiple operations in one command
- Multiple operations can be done using the _bulk api.
- the create keyword can be used instead of index to add a document to the index
- for index, create and update operations, the action line has to be followed by a line with the actual json document to be created or updated; delete takes no document line.
curl -XPOST "localhost:9200/products/shoes/_bulk?pretty" -H'Content-Type: application/json' -d'
{ "index": { "_id": "3" } }
{ "name": "Puma", "size": 9, "color": "black" }
{ "index": {"_id": "4" } }
{ "name": "New Balance", "size": 8, "color": "White" }
{ "delete": { "_id": "2"} }
{ "create": {"_id": "5" } }
{ "name": "Nike Power", "size": 11, "color": "red" }
{ "update": {"_id": "1" } }
{ "doc": {"color": "orange" } }
'
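The line-by-line layout of a _bulk body is easy to get wrong by hand. The following is a minimal sketch that serializes a list of operations into that layout; the function name build_bulk_body and the tuple format are assumptions made for this example, and nothing is sent to a server.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, document) tuples into a _bulk request body.

    `document` is None for delete, which has no document line.
    """
    lines = []
    for action, metadata, document in operations:
        lines.append(json.dumps({action: metadata}))
        if document is not None:
            lines.append(json.dumps(document))
    return "\n".join(lines) + "\n"  # the body must end with a newline


body = build_bulk_body([
    ("index", {"_id": "3"}, {"name": "Puma", "size": 9, "color": "black"}),
    ("delete", {"_id": "2"}, None),
    ("create", {"_id": "5"}, {"name": "Nike Power", "size": 11, "color": "red"}),
    ("update", {"_id": "1"}, {"doc": {"color": "orange"}}),
])
print(body, end="")
```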
Bulk index documents from a json file
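Generated test data usually arrives as one json array, while the _bulk API expects one action line followed by one document line per record. The sketch below converts such an array into a bulk file. The sample records, the file name customers_bulk.json and the sequential IDs are made up for illustration.

```python
import json
import os
import tempfile


def json_array_to_bulk(records, start_id=1):
    """Turn a list of documents into _bulk "index" lines with sequential IDs."""
    lines = []
    for offset, record in enumerate(records):
        lines.append(json.dumps({"index": {"_id": str(start_id + offset)}}))
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


# Stand-in for a generated data file; these records are made up.
records = [{"name": "Ann Lee", "age": 31}, {"name": "Bob Roy", "age": 45}]
bulk_path = os.path.join(tempfile.mkdtemp(), "customers_bulk.json")
with open(bulk_path, "w", encoding="utf-8") as handle:
    handle.write(json_array_to_bulk(records))

with open(bulk_path, encoding="utf-8") as handle:
    bulk_lines = handle.read().splitlines()
print(len(bulk_lines), "lines written to", bulk_path)
```

A file like this could then be posted to the _bulk endpoint of the target index, for example with curl's --data-binary @file option, which keeps the newlines intact.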
Searching and filtering
Random json generator: www.json-generator.com
-- Generate 1000 customer records and save them in json format
Schema:
[
  '{{repeat(1000, 1000)}}',
  {
    name: '{{firstName()}} {{surname()}}',
    age: '{{integer(18, 75)}}',
    gender: '{{gender()}}',
    email: '{{email()}}',
    phone: '+1 {{phone()}}',
    street: '{{integer(100, 999)}} {{street()}}',
    city: '{{city()}}',
    state: '{{state()}}, {{integer(100, 10000)}}'
  }
]
Two contexts of search
- Query context
- Every document has a relevance score which tells how well the document matches the search term
- Search term can be specified as
- URL query parameter
- Request body
- use of the _search api
curl -XGET "localhost:9200/customers/_search?q=wyoming&pretty"
curl -XGET "localhost:9200/customers/_search?q=wyoming&sort=age:desc&pretty"
- paging parameters can be appended to the URL in the same way: from=10&size=2
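The URL form of the search can be assembled programmatically so that values are escaped correctly. This is a minimal sketch with the standard library; the function name build_search_url and its argument names are assumptions, and the host is the same local node used in the curl examples.

```python
from urllib.parse import urlencode


def build_search_url(index, query, sort=None, start=None, size=None):
    """Build a URI search URL; only parameters that are set are included."""
    params = {"q": query}
    if sort is not None:
        params["sort"] = sort
    if start is not None:
        params["from"] = start  # "from" is a Python keyword, hence the name `start`
    if size is not None:
        params["size"] = size
    return f"http://localhost:9200/{index}/_search?{urlencode(params)}"


url = build_search_url("customers", "wyoming", sort="age:desc", start=10, size=2)
print(url)
```

Note that urlencode percent-encodes the colon in age:desc; the server decodes it back before parsing the sort.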
- Filter context
curl -XGET "localhost:9200/customers/_search?pretty" -H'Content-Type: application/json' -d' { "query": {"match_all": {} }, "size": 3, "from": 2, "sort": { "age": { "order": "desc" } } }'
- Can search multiple indices
curl -XGET "localhost:9200/customers,products/_search?pretty"
curl -XGET "localhost:9200/products/mobiles,laptops/_search?pretty"
- We can search on the specific fields we are interested in with a "term" query
curl "localhost:9200/customers/_search?pretty" -H'Content-Type: application/json' -d' { "query": { "term": {"name": "gates"} } }'
- we can add "_source": false to the above request body to leave the document source out of the response.
- the _source field is very powerful; it also accepts wildcard patterns for field names
{ "_source": ["st*", "*n*"], "query": { "term": { "state": "washington"} } }
- we can specify to include or exclude some pattern from the source fields
{ "_source": { "includes": ["st*", "*n*"], "excludes": [ "*der"] }, "query": { "term": { "state": "washington"} } }
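To see which fields the include and exclude patterns above would keep, the selection can be approximated locally. This sketch uses Python's fnmatch as a stand-in for the wildcard matching and only looks at top-level field names; the sample customer and the function name filter_source are made up for illustration.

```python
from fnmatch import fnmatchcase


def filter_source(source, includes=None, excludes=None):
    """Keep top-level fields matching any include pattern and no exclude pattern."""
    includes = includes or ["*"]
    excludes = excludes or []
    return {
        field: value
        for field, value in source.items()
        if any(fnmatchcase(field, pattern) for pattern in includes)
        and not any(fnmatchcase(field, pattern) for pattern in excludes)
    }


customer = {"name": "Ann Lee", "age": 31, "gender": "female",
            "email": "ann@example.com", "phone": "+1 555 0100",
            "street": "100 Main Street", "city": "Seattle", "state": "washington"}
kept = filter_source(customer, includes=["st*", "*n*"], excludes=["*der"])
print(sorted(kept))
```

Here gender matches the include pattern *n* but is dropped again by the exclude pattern *der.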
Full text queries
- match
- match_phrase
- match_phrase_prefix
curl "localhost:9200/customers/_search?pretty" -d' { "query": { "match": { "name": "webster" } } }'
-- the match keyword is not limited to an exact term match; it also accepts other parameters
{ "query": { "match": { "name": { "query": "frank morris", "operator": "or" } } } }
-- logical OR: matches all documents having frank or morris in the name field
-- the default operator is OR
{ "query": { "match_phrase": { "name": "frank morris" } } }
-- the entire phrase has to match
{ "query": { "match_phrase_prefix": { "name": "fr" } } }
-- all names that begin with the prefix fr
-- this can be used for autocomplete
TF-IDF
{ "common": { "reviews": { "query": "this is great", "cutoff_frequency": 0.001 } } }
- some of the terms in the query may be very common words (stop words). With cutoff_frequency set to 0.001, any term that appears in more than 0.1% of the documents is treated as a common word while searching
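The idea behind cutoff_frequency can be illustrated locally: each query term is classified as common or rare by the fraction of documents that contain it. This is a simplified sketch, not how Elasticsearch implements it: it tokenizes on whitespace, and it uses a cutoff of 0.5 rather than 0.001 because the made-up corpus has only four reviews.

```python
def split_by_cutoff(query, documents, cutoff_frequency):
    """Split query terms into (common, rare) by the fraction of documents containing them."""
    tokenized = [set(text.lower().split()) for text in documents]
    common, rare = [], []
    for term in query.lower().split():
        doc_freq = sum(term in tokens for tokens in tokenized) / len(tokenized)
        (common if doc_freq > cutoff_frequency else rare).append(term)
    return common, rare


reviews = [
    "this phone is great",
    "this screen is awesome",
    "this is very expensive",
    "best phone so far",
]
common, rare = split_by_cutoff("this is great", reviews, cutoff_frequency=0.5)
print(common, rare)
```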
Compound queries
- Boolean query
- Matches documents by combining multiple queries using boolean operators such as AND, OR
- Must clause
curl "localhost:9200/customers/_search?pretty" -d' { "query": { "bool": { "must": [ {"match": { "street": "ditmas" } }, {"match": { "street": "avenue" } } ] } } } '
- Should clause
curl "localhost:9200/customers/_search?pretty" -d' { "query": { "bool": { "should": [ {"match": { "street": "ditmas" } }, {"match": { "street": "avenue" } } ] } } } '
- must_not clause
curl "localhost:9200/customers/_search?pretty" -d' { "query": { "bool": { "must_not": [ {"match": { "state": "california texas" } }, {"match": { "street": "lane street" } } ] } } } '
- filter clause
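The four clauses can be combined in a single bool query. The following is a minimal sketch of a helper that assembles such a request body and leaves out empty clauses; the helper name bool_query and the chosen sub-queries are assumptions for illustration, reusing fields from the customers examples.

```python
import json


def bool_query(must=(), should=(), must_not=(), filters=()):
    """Assemble a bool query body, leaving out clauses that are empty."""
    clauses = {"must": must, "should": should, "must_not": must_not, "filter": filters}
    return {"query": {"bool": {name: list(items) for name, items in clauses.items() if items}}}


body = bool_query(
    must=[{"match": {"street": "avenue"}}],
    must_not=[{"match": {"state": "california texas"}}],
    filters=[{"range": {"age": {"gte": 20, "lte": 30}}}],
)
print(json.dumps(body, indent=2))
```

The printed json can be passed as the -d body of a _search request.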
Term queries
The exact term needs to be found in the inverted index for the indexed documents.
The terms found in the index may vary based on how you analyze them.
- simple term queries
curl "localhost:9200/customers/_search?pretty" -d' { "query": { "bool": { "should": [ {"term": { "state": {"value": "california"} } }, {"term": { "street": {"value": "idaho"} } } ] } } } '
- Boost some terms over others
curl "localhost:9200/customers/_search?pretty" -d' { "query": { "bool": { "should": [ { "term": { "state": { "value": "california", "boost": 2.0 } } }, { "term": { "street": { "value": "idaho" } } } ] } } } '
Filters
- the documents in the result are not scored.
- just checks if the document should be included in the result or not.
-- the most common filter is the range filter
-- term and filters could be combined
curl "localhost:9200/customers/_search?pretty" -d' { "query": { "bool": { "must": { "match_all": {} }, "filter": [ { "term": { "gender": "female" } }, { "range": { "age": { "gte": 20, "lte": 30 } } } ] } } } '
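A filter is a plain yes/no test per document, with no scoring involved. The sketch below applies the same two conditions as the request above (gender equals female, age between 20 and 30 inclusive) to a few made-up in-memory records; the function name matches_filters is an assumption.

```python
def matches_filters(document, gender, min_age, max_age):
    """Yes/no check equivalent to a term filter on gender plus a range filter on age."""
    return document["gender"] == gender and min_age <= document["age"] <= max_age


customers = [
    {"name": "Ann", "gender": "female", "age": 25},
    {"name": "Bea", "gender": "female", "age": 41},
    {"name": "Carl", "gender": "male", "age": 28},
    {"name": "Dee", "gender": "female", "age": 30},
]
selected = [c["name"] for c in customers if matches_filters(c, "female", 20, 30)]
print(selected)
```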
Analytics and Aggregations
- Different kinds of aggregations that can be performed
- Implement queries for metrics and bucketing aggregations
- Work with multi level nesting of aggregations
Four kinds of Aggregations
- Metric
- Bucketing
- Matrix
- Pipeline
Metric Aggregations
- Aggregations over a set of documents
- All documents in a search result
- Documents within a logical group
Bucketing Aggregations
- Logically group documents based on search query
- A document falls into a bucket if the criteria matches
- Each bucket associated with a key
Matrix Aggregations
- Operates on multiple fields and produces a matrix result
- Experimental and may change in the future releases
- Not covered
Pipeline Aggregations
- Aggregations that work on the output of other aggregations
- Experimental and may change in the future releases
- Not covered
Metric Aggregations
- numeric aggregations like sum, average, count, min, etc
- multi value stats aggregations
- aggregations are done by using the same _search api
- aggregations are done by using aggs keyword in the request body
- provide a name that you want to be assigned to the result - "avg_age"
- avg is the keyword for average aggregations
- field keyword specifies the field over which this aggregation is going to be performed
- size = 0, means we do not want any documents to be returned, we just want the final aggregate value
curl -XPOST "localhost:9200/customers/_search?&pretty" -d' { "size": 0, "aggs": { "avg_age": { "avg": { "field": "age" } } } } '
- metric aggregations become more powerful when combined with search or filter queries
- the below query calculates the average age of all the customers who live in minnesota
curl -XPOST "localhost:9200/customers/_search?&pretty" -d' { "size": 0, "query": { "bool": { "filter": { "match": { "state": "minnesota"} } } }, "aggs": { "avg_age": { "avg": { "field": "age" } } } } '
- Elasticsearch can also calculate a whole range of statistics in one go
- specify the "stats" aggregation keyword within the "aggs" field
- "age_stats" is the name under which the result will appear in the response
- "stats" calculates the count, min, max, avg, sum of the age field
curl -XPOST "localhost:9200/customers/_search?&pretty" -d' { "size": 0, "aggs": { "age_stats": { "stats": { "field": "age" } } } } '
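For reference, the five values reported by the stats aggregation correspond to the following local computation. This is only a sketch on a made-up list of ages, and it assumes the list is not empty.

```python
def stats(values):
    """Compute the five numbers that the stats aggregation reports."""
    values = list(values)
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


ages = [18, 25, 40, 57]
age_stats = stats(ages)
print(age_stats)
```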
Cardinality
- the number of unique values in a field across all documents
- enabling cardinality aggregations on text fields requires some special setup: fielddata has to be turned on for the field
curl -XPOST "localhost:9200/customers/_search?&pretty" -d' { "size": 0, "aggs": { "age_count": { "cardinality": { "field": "age" } } } } '
-- since age is an integer value, the above query will directly work.
-- for a text field, the above query will not work by default
-- fielddata has to be enabled for the text field
curl -XPUT "localhost:9200/customers/_mapping/personal?pretty" -d' { "properties": { "gender": { "type": "text", "fielddata": true } } } '
-- now you can run the cardinality aggregation on the gender field
curl -XPOST "localhost:9200/customers/_search?&pretty" -d' { "size": 0, "aggs": { "gender_count": { "cardinality": { "field": "gender" } } } } '
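Conceptually, cardinality is a distinct count. The sketch below computes it exactly on a few made-up records; keep in mind that the Elasticsearch cardinality aggregation is an approximation and may differ slightly from an exact count on large data sets. The function name exact_cardinality is an assumption.

```python
def exact_cardinality(documents, field):
    """Count distinct values of `field`, skipping documents that lack it."""
    return len({doc[field] for doc in documents if field in doc})


customers = [
    {"gender": "female", "age": 25},
    {"gender": "male", "age": 25},
    {"gender": "female", "age": 41},
    {"age": 33},
]
gender_count = exact_cardinality(customers, "gender")
age_count = exact_cardinality(customers, "age")
print(gender_count, age_count)
```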
Bucketing
- similar to the GROUP BY operation in sql
curl -XPOST "localhost:9200/customers/_search?&pretty" -d' { "size": 0, "aggs": { "gender_bucket": { "terms": { "field": "gender" } } } } '
-- we can also bucket by range
curl -XPOST "localhost:9200/customers/_search?&pretty" -d' { "size": 0, "aggs": { "age_range": { "range": { "field": "age", "ranges": [ { "to": 30}, { "from": 30, "to": 40}, { "from": 40, "to": 55}, { "from": 55 } ] } } } } '
- "keyed": true can be specified in the range aggregation, so that the buckets come back as an object keyed by name instead of an array
- a "key" can also be given to each entry in ranges to name its bucket
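The bucket boundaries of a range aggregation are easy to misread: the from value is included and the to value is excluded. The sketch below counts made-up ages per range and returns the result keyed by bucket name, similar in shape to a keyed response. The key names and the function name range_buckets are assumptions for illustration.

```python
def range_buckets(values, ranges):
    """Count values per range; `from` is inclusive and `to` is exclusive."""
    buckets = {}
    for spec in ranges:
        low = spec.get("from", float("-inf"))
        high = spec.get("to", float("inf"))
        buckets[spec["key"]] = sum(low <= value < high for value in values)
    return buckets


ages = [18, 29, 30, 42, 55, 70]
age_ranges = [
    {"key": "young", "to": 30},
    {"key": "middle-aged", "from": 30, "to": 55},
    {"key": "senior", "from": 55},
]
counts = range_buckets(ages, age_ranges)
print(counts)
```

With this data, the age 30 falls into middle-aged and the age 55 into senior, because the upper bound of each range is exclusive.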
Multi level nested aggregations
- example of a metric aggregation nested inside a bucketing aggregation
- returns the average age of males and females
curl -XPOST "localhost:9200/customers/_search?&pretty" -d' { "size": 0, "aggs": { "gender_bucket": { "terms": { "field": "gender" }, "aggs": { "average_age": { "avg": { "field": "age" } } } } } } '
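The nested request above is the equivalent of a GROUP BY with an average per group. A minimal local sketch on made-up records, with the function name average_age_by chosen for illustration:

```python
from collections import defaultdict


def average_age_by(documents, field):
    """Group documents by `field` (terms bucket) and average `age` in each group."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc[field]].append(doc["age"])
    return {key: sum(ages) / len(ages) for key, ages in groups.items()}


customers = [
    {"gender": "female", "age": 20},
    {"gender": "female", "age": 40},
    {"gender": "male", "age": 33},
]
averages = average_age_by(customers, "gender")
print(averages)
```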
- multi layer nesting of aggregations
curl -XPOST "localhost:9200/customers/_search?&pretty" -d' { "size": 0, "aggs": { "gender_bucket": { "terms": { "field": "gender" }, "aggs": { "age_ranges": { "range": { "field": "age", "keyed": true, "ranges": [ { "key": "young", "to": 30}, { "key": "middle-aged","from": 30, "to": 55}, { "key": "senior","from": 55 } ] }, "aggs": { "average_age": { "avg": { "field": "age" } } } } } } } } '
Filter aggregation and filters keyword
- average age of customers from the state of texas
curl -XPOST "localhost:9200/customers/_search?&pretty" -d' { "size": 0, "aggs": { "state": { "filter": { "term": { "state": "texas" } }, "aggs": { "average_age": { "avg": { "field": "age" } } } } } } '
-- with the "filters" keyword you can use multiple named filters instead of just one
curl -XPOST "localhost:9200/customers/_search?&pretty" -d' { "size": 0, "aggs": { "state": { "filters": { "filters": { "washington" : { "match": { "state": "washington" } }, "north carolina" : { "match": { "state": "north carolina" } }, "south dakota" : { "match": { "state": "south dakota" } } } }, "aggs": { "average_age": { "avg": { "field": "age" } } } } } } '