Some Basic to advanced Kusto Queries
Here am inclusing some basic and advanced Azure Data Explorer Queries (ADX) or Kusto Queries which i met during my development journey.
Normal Tables
Create table in ADX
.create table tableName(ColumnName:datatype)
Alter table in ADX
.create-or-alter table tableName(ColumnName:datatype)
Create JSON Mapping, This will be used to ingest JSON data to ADX
.create table tableName ingestion json mapping “mappingName”
‘[‘
‘ { “column” : “tableColumnName”, “Properties”:{“Path”:”$.jsonproperty”}},’
‘ { “column” : “tableColumnName”, “Properties”:{“Path”:”$.jsonproperty.innerjsonproperty”}},’
‘ { “column” : “full_text”, “Properties”:{“Path”:”$”}}’
‘]’
Drop Table
.drop table tableName
Clear Table Data
.clear table tableName data
Alter data type of the column
.alter column [‘tableName’].[‘tableColumnName’] type=datatype
Ingest directly into tables, using tags while ingesting will help to remove the data based on tag, we can inset the same without tag also
.ingest inline into table tableName with(tags=’[“tagName”]’) <|
column1_data1,column2_data1
column1_data2,column2_data2
delete ingested data using tag
.drop extents <| .show table tableName extents where tags has “tagName”
ADX use one default ingestion policy, to alter ingestion latency. We can update the conditions based on the need
.alter table tableName policy ingestionbatching
‘{“MaximumBatchingTimeSpan”: “00:00:10”, “MaximumNumberOfItems”: 10, “MaximumRawDataSizeMB”:100}’
View Ingestion mapping of the database
.show database databaseName policy ingestionbatching
View Ingestion mapping of the table
.show table databaseName.tableName policy ingestionbatching
External Tables
Create External Table. We can use blob/adls as the storage kind also we can change partition based on the need
.create external table externalTableName(ColumnName:datatype)
kind=blob
partition by (DateTimeSource:datetime = bin(DateTimeSource, 1d))
dataformat=csv
(
h@’blobConnectionString'
)
Continuous Export
Create continuously export for a table. This will export the data from a table to another table
.create-or-alter continuous-export continousExportName
over (tableName)
to table externalTableName
with
(intervalBetweenRuns=15m,
forcedLatency=5m)
<| tableName
disable continuous export
.disable continuous-export continousExportName
Queries
One important note on the kusto queries as these conditions will run as chained queries.
Get count
tableName | count;
Take rows from entire list
tableName | take 10
Get the 100 rows after sorting the table with the specified column name
tableName | order by columnName desc | take 100
Prepare timeseries from data. Will count the data between the time range and prepare time chart with one hour bin interval
let min_t = toscalar(tableName| summarize min(columnName));
let max_t = toscalar(tableName| summarize max(columnName));
tableName | make-series num=count() default=0 on timeStampColumn in range(min_t, max_t, 1h) | render timechart
Prepare timeseries with average of column values
let min_t = toscalar(tableName| summarize min(columnName));
let max_t = toscalar(tableName| summarize max(columnName));
tableName | make-series num=avg(todouble(columnName)) default=0 on timeStampColumn in range(min_t, max_t, 1h) | render timechart
Get data between time range with string c, and count it
tableName
| where timeStampColumn between (todatetime(“2020–05–21T14:00:00.000Z”) .. todatetime(“2020–05–22T02:00:00.000Z”))
| count ;
Get data between time raneg with datatime columns
tableName
| where timeStampColumn between (datetime(2020–07–16 14:15:50) .. datetime(2020–07–16 14:15:55))
Group the data based on column with highest timestamp
tableName
| where timeStampColumn > ago(15m)
| summarize arg_max(timeStampColumn, *) by groupByColumnName
Get The maximum valued row
tableName
| where columnName== toscalar(tableName| summarize max(columnName));
Get Current user email who is using the ADX
current_principal_details()[“UserPrincipalName”]
Using “in” operator to check matching data from a list
Using “extend” operator to create additional column with calculated information. Columns added by “extend” operator will only be avaiable with the result set.
Using “prev” operator to find the previous row value of the column.
tableName
| where columnName in (‘value1’, ‘value2’)
| extend NewColumn= DateTimeSource — prev(DateTimeSource),
NewColumn2= prev(DateTimeSource, 2) — prev(DateTimeSource)
Joining Multiple Table Contions and Calculate the count of all
union withsource=’TempTableName’
(tableName1| where condition),(tableName2| where condition)
| count
Calculate time difference
tableName
| project diff=datetime_diff(‘minute’, timeStampColumn1, timeStampColumn2)
Dynamically taking the bin size based on the date time gap
let [‘_startTime’]=datetime(‘2021–02–16T08:51:11Z’);
let [‘_endTime’]=datetime(‘2021–02–17T09:51:11Z’);
tableName
| where timeStampColumn between ([‘_startTime’] .. [‘_endTime’])
| summarize Count = count() by summerizeColumn, bin(timeStampColumn, case( datetime_diff(‘hour’,[‘_endTime’],[‘_startTime’])<= 24, 10m,
datetime_diff(‘hour’,[‘_endTime’],[‘_startTime’]) <= 480, 1h,
1d))
| render timechart
Get data size of your table
tableName
| summarize Total=sum(estimate_data_size(*))
Get data size as chart for a period of days
tableName
| where timeStampColumn > ago(14d)
| make-series num=sum(estimate_data_size(*)) default=0 on timeStampColumn in range(ago(14d), ago(1m), 1d)
| render timechart
Get Duplicate count fom the data
tableName
| summarize count() by Columns
| where count_ >1
Will add more in future..