Performance Analysis Results

Notes

  • These performance statistics were recorded while the load average was below 3.8 on the 4-core instance.

  • Most of the examples given below use log sinks to log statistics for performance-monitoring purposes. However, note that log sinks incur a high system overhead and can reduce performance by more than 50%.

Performance analysis results summary

The recommended CPU and memory specifications for Docker containers are as follows (an example container configuration is shown after this list):

  • CPU: 4 cores

  • Memory: 8GB
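
For example, these limits could be applied when starting an SI container with a command similar to the following. This is a minimal sketch: the image name, tag, and published port are illustrative placeholders and should be adjusted to match your deployment.

    # Sketch only: run an SI container with the recommended resource limits.
    # The image name/tag and port mapping below are illustrative placeholders.
    docker run -d \
        --cpus=4 \
        --memory=8g \
        -p 9443:9443 \
        wso2/streaming-integrator:latest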

The recommended memory settings for the SI server are as follows. These are configured in the <SI_Home>/wso2/server/bin/carbon.sh file (see the excerpt after this list).

  • Xms: 2GB

  • Xmx: 4GB
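
As a minimal sketch, this corresponds to adjusting the JVM heap options passed to the java command in that file; the exact default values and surrounding lines vary by release.

    # Illustrative excerpt from <SI_Home>/wso2/server/bin/carbon.sh: the heap
    # options passed to the java command are raised to the recommended values.
    -Xms2g \
    -Xmx4g \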

The exact specifications used in the use cases listed in this section are summarised in the table below:

| Scenario | CPU | Memory | SI Memory Allocation | Input TPS | Input Message Size | Output TPS |
|----------|-----|--------|----------------------|-----------|--------------------|------------|
| Consuming events using a Kafka source | 4 cores | 8GB | Xms: 2g, Xmx: 4g | 180K | 60 bytes | Not available |
| Consuming messages from an HTTP Source | 4 cores | 8GB | Xms: 2g, Xmx: 4g | 30K | 60 bytes | Not available |
| Sending HTTP requests and consuming the responses | 4 cores | 8GB | Xms: 2g, Xmx: 4g | 29K | Sent: 60 bytes; Received: 60 bytes | To HTTP Source: 29K; To HTTP Request Sink: 29K |
| Performing ETL tasks | 4 cores | 16GB | Xms: 2g, Xmx: 4g | 29K | Read: 100 bytes; Stored: 200 bytes | To Oracle Store: 72K |
| Consuming messages from a Kafka source and publishing to an HTTP endpoint | 2 cores | Docker Memory: 3GB | Xms: 256m, Xmx: 1g | 10K | Consumed: 400 bytes; Published: 600 bytes | 10K |
| Consuming messages from a CSV file and publishing to a MySQL table | 4 cores | Docker Memory: 8GB | Xms: 2g, Xmx: 4g | 9K | Read: 300 bytes; Published: 300 bytes | 9K |
| Monitoring a database table in MySQL and publishing data to a Kafka topic | 4 cores | Docker Memory: 8GB | Xms: 2g, Xmx: 4g | 13K | Read: 300 bytes; Published: 300 bytes | 13K |
| Reading an XML file and mapping it to a stream | 4 cores | Docker Memory: 8GB | Xms: 2g, Xmx: 4g | 40K | Read: 350 bytes | 40K |
| Reading an XML file and publishing to a Kafka topic | 4 cores | Docker Memory: 8GB | Xms: 2g, Xmx: 4g | 38K | Read: 350 bytes; Published: 350 bytes | 38K |

Consuming events using a Kafka source

Specifications of EC2 Instances

  • Stream Processor: c5.xlarge
  • Kafka server: c5.xlarge
  • Kafka publisher: c5.xlarge

Siddhi Application

@App:name("HelloKafka")

@App:description('Consume events from a Kafka topic and log the consumption throughput')

@source(type='kafka',
        topic.list='kafka_topic',
        partition.no.list='0',
        threading.option='single.thread',
        group.id="group",
        bootstrap.servers='172.31.0.135:9092',
        @map(type='json'))
define stream SweetProductionStream (name string, amount double);

@sink(type='log')
define stream KafkaSourceThroughputStream(count long);

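-- Log the average number of events consumed per second, computed over 5-second batches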
from SweetProductionStream#window.timeBatch(5 sec)
select count()/5 as count
insert into KafkaSourceThroughputStream;

Results

Average Consuming TPS from Kafka: 180K

Consuming messages from an HTTP Source

Specifications of EC2 Instances

  • Stream Processor: c5.xlarge
  • JMeter: c5.xlarge

Siddhi Application

@App:name("HttpSource")

@App:description('Consume events from http clients')

@source(type='http', worker.count='20', receiver.url='http://172.31.2.99:8081/service',
@map(type='json'))
define stream SweetProductionStream (name string, amount double);

@sink(type='log')
define stream HttpSourceThroughputStream(count long);

from SweetProductionStream#window.timeBatch(5 sec)
select count()/5 as count
insert into HttpSourceThroughputStream;

Results

Average Consuming TPS from the HTTP Source: 30K

Sending HTTP requests and consuming the responses

Specifications of EC2 Instances

  • Stream Processor: c5.xlarge
  • JMeter: c5.xlarge
  • Web server: c5.xlarge

Siddhi Application

@App:name("HttpRequestResponse")

@App:description('Consume events from an HTTP source, send HTTP requests, and consume the responses')

@source(type='http', worker.count='20', receiver.url='http://172.31.2.99:8081/service',
@map(type='json'))
define stream SweetProductionStream (name string, amount double);

@sink(type='http-request', sink.id='production-request', publisher.url='http://172.17.0.1:8688/netty_echo_server', @map(type='json'))
define stream HttpRequestStream (batchNumber double, lowTotal double);

@source(type='http-response' , sink.id='production-request', http.status.code='200',
@map(type='json'))
define stream HttpResponseStream(batchNumber double, lowTotal double);

@sink(type='log')
define stream FinalThroughputStream(count long);

@sink(type='log')
define stream InputThroughputStream(count long);

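-- For every consumed event, send a request with a constant payload through the http-request sink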
from SweetProductionStream
select 1D as batchNumber, 1200D as lowTotal
insert into HttpRequestStream;

from SweetProductionStream#window.timeBatch(5 sec)
select count()/5 as count
insert into InputThroughputStream;

from HttpResponseStream#window.timeBatch(5 sec)
select count()/5 as count
insert into FinalThroughputStream;

Results

  • Average Consuming TPS at the HTTP Source: 29K
  • Average Publishing TPS from the HTTP Request Sink: 29K
  • Average Consuming TPS from the HTTP Response Source: 29K

Performing ETL tasks

Specifications of EC2 Instances

  • Stream Processor: m4.xlarge
  • JMeter: m4.xlarge
  • Web server: m4.xlarge

Siddhi Application

This scenario was tested using two Siddhi applications that together execute the ETL process: incoming files are first validated against the expected row count declared in their header line, and the records of the accurate files are then copied to a database store.

The two Siddhi applications are as follows:

ETLFileRecordsCopier.siddhi

@App:name('ETLFileRecordsCopier')
@App:description('This sample demonstrates integrating a file in a particular location with a database.')

@source(type='file', mode='LINE',
    dir.uri='file:/Users/wso2/demo/accurate-files',
    action.after.process='MOVE',
    move.after.process='file:/Users/wso2/demo/moved',
    tailing='false',
    header.present='true',
    @map(
        type='csv',
        delimiter='|',
        @attributes(code = '0', serialNo = '1', amount = '2', fileName = 'trp:file.path', eof = 'trp:eof')))
define stream FileReaderStream (code string, serialNo string, amount double, fileName string, eof string); -- Reads from file

@Store(type="rdbms",
      jdbc.url="jdbc:mysql://localhost:3306/batchInformation?useSSL=false",
      username="root",
      password="root" ,
      jdbc.driver.name="com.mysql.jdbc.Driver",
      isAutoCommit = 'true')
define table AccurateBatchTable(serialNo string, amount double, fileName string, status string, timestamp long);

@sink(type='log', prefix='File to DB copying has Started: ')
define stream FileReadingStartStream(fileName string);

@sink(type='log', prefix='File to DB copying has Finished: ')
define stream FileReadingEndStream(fileName string);


from FileReaderStream
select serialNo, amount, fileName, "test" as status, eventTimestamp() as timestamp, count() as rowNumber, eof
insert into DataStream;

from DataStream
select *
insert into DataStreamPassthrough;

-- Write to DB Passthrough
from DataStreamPassthrough#window.externalTimeBatch(timestamp, 5 sec, timestamp, 10 sec)
select serialNo, amount, fileName, status, timestamp, rowNumber, eof
insert into TemporaryTablePassthroughStream;

-- Log First Record
from TemporaryTablePassthroughStream[rowNumber == 1]
select fileName
insert into FileReadingStartStream;

-- Log Every 100000th Record
from TemporaryTablePassthroughStream
select fileName, rowNumber as rows
insert into FileReadingInProgressStream;

-- Log Last Record
from TemporaryTablePassthroughStream[eof == 'true']
select fileName
insert into FileReadingEndStream;

-- Write to DB
from TemporaryTablePassthroughStream#window.batch()
select serialNo, amount, fileName, status, timestamp
insert into AccurateBatchTable;

ETLFileAnalyzer.siddhi

@App:name('ETLFileAnalyzer')
@App:description('This sample demonstrates moving files to a specific location after comparing their content with the header values.')

@source(type='file', mode='REGEX',
    dir.uri='file:/Users/wso2/demo/new',
    action.after.process='MOVE',
    move.after.process='file:/Users/wso2/demo/header-processed',
    tailing='false',
    @map(
        type='text',
        fail.on.missing.attribute = 'false',
        regex.A='HDprod-[a-zA-Z]*-[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-([0-9]+)',
        @attributes(
            expectedRowCount = 'A[1]',
            fileName = 'trp:file.path')))
define stream HeaderReaderStream (fileName string, expectedRowCount long);

@source(type='file', mode='LINE',
    dir.uri='file:/Users/wso2/demo/header-processed',
    tailing='false',
    header.present='true',
    @map(
        type='csv',
        delimiter='|',
        @attributes(code = '0', serialNo = '1', amount = '2', fileName = 'trp:file.path', eof = 'trp:eof')))
define stream FileReaderStream (code string, serialNo string, amount double, fileName string, eof string);

@sink(type='log', prefix='Accurate Batch: ')
define stream AccurateFileNotificationStream (fromPath string);

@sink(type='log', prefix='Inaccurate Batch: ')
define stream InaccurateFileNotificationStream (fromPath string);

@sink(type='log', prefix='Batch checking started: ')
define stream ExpectedRowCountsStream (fileName string, expectedRowCount long);

define stream AnalyzingLogStream (fileName string, rowCount long);

define table ExpectedRowCountsTable (fileName string, expectedRowCount long, existingRowCount long);

@sink(type='log', prefix='Batch checking finished: ')
define stream ExistingRowCountsStream (fileName string, existingRowCount long);

-- Expected Row Count reader. Moves file from 'new' to 'header-processed'
from HeaderReaderStream[NOT(expectedRowCount is null) and NOT(fileName is null)]
select *
insert into ExpectedRowCountsStream;

from ExpectedRowCountsStream
select fileName, expectedRowCount, -1L as existingRowCount
insert into ExpectedRowCountsTable;

-- Existing Row Count calculator. Moves file from 'header-processed' to 'rows-counted'
from FileReaderStream
select *
insert into FileDataStream;

partition with (fileName of FileDataStream)
begin
    from FileDataStream
    select fileName, count() as rowCount, eof
    insert into #ThisFileRowCounts;

    from #ThisFileRowCounts
    select fileName, rowCount
    insert into AnalyzingLogStream;

    from #ThisFileRowCounts[eof == 'true']
    select fileName, rowCount as existingRowCount
    insert into ExistingRowCountsStream;
end;

-- Existing vs. Expected Row Counts comparer
from ExistingRowCountsStream as S inner join ExpectedRowCountsTable as T on str:replaceFirst(S.fileName, 'header-processed', 'new') == T.fileName
select S.fileName as fromPath, T.expectedRowCount as expectedRowCount, S.existingRowCount as existingRowCount
insert into FileInfoMatcherStream;

from FileInfoMatcherStream
select fromPath, existingRowCount
update ExpectedRowCountsTable
    set ExpectedRowCountsTable.existingRowCount = existingRowCount
    on ExpectedRowCountsTable.fileName == fromPath;

-- Accurate file mover
from FileInfoMatcherStream[expectedRowCount == existingRowCount]
select fromPath
insert into AccurateFileStream;

from AccurateFileStream#file:move(fromPath, '/Users/wso2/demo/accurate-files/')
select fromPath
insert into AccurateFileNotificationStream;

-- Inaccurate batch file mover
from FileInfoMatcherStream[expectedRowCount != existingRowCount]
select fromPath
insert into InaccurateFileStream;

from InaccurateFileStream#file:move(fromPath, '/Users/wso2/demo/inaccurate-files/')
select fromPath
insert into InaccurateFileNotificationStream;

For a detailed description of this scenario, see the Streaming ETL with WSO2 Streaming Integrator article.

Results

The performance statistics of this scenario are as follows:

  • Lines: 6,140,031
  • Size: 124MB
  • Database: AWS RDS instance with oracle-ee 12.1.0.2.v15
  • Duration: 1.422 minutes (85,373 ms), i.e., roughly 72K records per second written to the store, which matches the output TPS reported in the summary table

Consuming messages from a Kafka source and publishing to an HTTP endpoint

Specifications of EC2 Instances

Docker resource allocation

  • Memory: 3GB
  • CPU: 2 cores

Server memory allocation

  • Xms: 256m
  • Xmx: 1g

Siddhi applications

The following Siddhi applications were used in this scenario:

  • To read messages from a Kafka topic, apply a transformation, and insert the results into an in-memory topic:

    @App:name('kafka-consumer')
    
    @App:description('Reads messages from kafka topics and puts into in-memory-input topic')
    
    @sink(type = 'inMemory', topic = "in-memory-input",
      @map(type = 'passThrough'))
    define stream ToInMemoryInput (kafkaConsumerInTS long, kafkaConsumerOutTS long, locations string, material string, createdDate string, sid string, headline string, body string, publishTS long, id string);
    
    @source(type = 'kafka', topic.list = "test3", threading.option = "single.thread", group.id = "group1", 
      bootstrap.servers = "172.31.39.91:9092", optional.configuration = "auto.offset.reset:latest",
      @map(type = 'json', fail.on.missing.attribute = "false", enclosing.element = "$"))
    define stream FromKafkaMessage (locations string, material string, createdDate string, sid string, headline string, body string, publishTS string, id string, updatedDate string);
    
    @sink(type = 'log', prefix = '----------------------Kafka Consumer Throughput per second: ',
      @map(type = 'json'))
    define stream LogSink (totalEventsPerSec long);
    
    @info(name = 'Kafka Consumer Event Timestamp')
    from FromKafkaMessage
    select eventTimestamp() as kafkaConsumerInTS, time:timestampInMilliseconds() as kafkaConsumerOutTS, locations, material, 
    str:replaceFirst(createdDate, 'Z', 'GMT') as createdDate, 
    sid, headline, body, time:timestampInMilliseconds(str:replaceFirst(ifThenElse(publishTS is null, updatedDate, publishTS), 'Z', 'GMT'), "yyyy-MM-dd'T'HH:mm:ss.SSSZ") as publishTS, ifThenElse( id is null, 'null', id) as id
    insert into ToInMemoryInput;
    
    from FromKafkaMessage#window.timeBatch(1 sec)
    select count() as totalEventsPerSec
    insert into LogSink;
  • To filter dynamic headers from the incoming data stream:

    @App:name('Intermediate-process')
    @App:description('Filter dynamic headers from incoming data stream')
    
    @sink(type = "inMemory", topic = "in-memory-output", @map(type = "passThrough"))
    define stream ToInMemoryOutput (sid string, connectionId string, headers string, data string);
    
    @source(type = 'inMemory', topic = "in-memory-input", @map(type = 'passThrough'))
    define stream FromInMemoryInput (kafkaConsumerInTS long, kafkaConsumerOutTS long, locations object, material object, createdDate string, sid string, headline string, body string, publishTS long, id string);
    
    
    @info(name = 'Filter Heards Messages')
    from FromInMemoryInput
    select  sid, "test_connectionId" as connectionId, "'connectionId:test_connectionId','appKey:workManWork','Content-type:application/json'" as headers,
     str:fillTemplate("""
        {
            "type": "heards_sub_resp",
            "publishTS": {{publishTS}},
            "dynamicAppInTS": {{dynamicAppInTS}},
            "dynamicAppOutTS": {{dynamicAppOutTS}},
            "kafkaConsumerInTS": {{kafkaConsumerInTS}},
            "kafkaConsumerOutTS": {{kafkaConsumerOutTS}},
            "headline":"{{headline}}",
            "body":"{{body}}",
            "id": "{{id}}",
            "material": {{material}},
            "locations": {{locations}},
            "createdDate": "{{createdDate}}",
            "sid":"{{sid}}",
            "correlationId":"{{correlationId}}" 
        }""", 
        map:create(
        'headline', headline, 
        'body', body, 
        'id', id, 
        'material', json:getString(material, '$'), 
        'locations', json:getString(locations, '$'), 
        'createdDate', str:replaceFirst(createdDate, 'GMT', 'Z'), 
        'sid', sid, 
        'publishTS', publishTS, 
        'dynamicAppOutTS', time:timestampInMilliseconds(), 
        'dynamicAppInTS', eventTimestamp(), 
        'kafkaConsumerInTS', kafkaConsumerInTS, 
        'kafkaConsumerOutTS', kafkaConsumerOutTS, 
        'correlationId', 'Test123')) as data
    insert into ToInMemoryOutput;

  • To read output messages from the in-memory-output topic and publish them to the HTTP client:

    @App:name('ws-publisher')
    @App:description('Reads from in-memory-output topic and publishes messages to client')
    
    @source(type = 'inMemory', topic = "in-memory-output", @map(type = 'passThrough'))
    define stream fromInMemoryOutput (sid string, connectionId string, headers string, data string);
    
    @sink(type = 'http',
      method = "POST",
      publisher.url = "http://172.31.39.177:8280/services/TestProxy",
      headers = "{{headers}}",
      on.error = "LOG",
      max.pool.active.connections="1000",
      ssl.verification.disabled = "true",
      @map(type = 'json',
        @payload("""{"data":{{data}} }""")))
    
    define stream ToWsClient (data string, wsPublisherOutTS long, headers string, connectionId string, sid string);
    
    @sink(type = 'log', prefix = '----------------------WS Publisher Throughput per second: ',
      @map(type = 'json'))
    define stream LogSink (totalEventsPerSec long);
    
    @info(name = 'Add Filtered Message Timestamp')
    from fromInMemoryOutput
    select json:toString(json:setElement(json:setElement(json:toObject(data), '$', eventTimestamp(), 'wsPublisherInTS'), '$', time:timestampInMilliseconds(), 'wsPublisherOutTS')) as data, time:timestampInMilliseconds() as wsPublisherOutTS, headers, connectionId, sid
    insert into ToWsClient;
    
    from ToWsClient#window.timeBatch(1 sec)
    select count() as totalEventsPerSec
    insert into LogSink;

Results

  • Memory consumed: 1g

  • TPS: 10,000

Consuming messages from a CSV file and publishing to a MySQL table

Specifications of EC2 Instances

Docker resource allocation

  • Memory: 8GB
  • CPU: 4 cores

Server memory allocation

  • Xms: 2g
  • Xmx: 4g

Siddhi application

@App:name("FileToRdbms")

@App:description("Description of the plan")

@store(type='rdbms' , jdbc.url='jdbc:mysql://172.31.18.173:3306/purchesOrder?useSSL=false',username='root',password='root',jdbc.driver.name='com.mysql.jdbc.Driver') 
define table  PurchesOrderTable (orderID string, numberOfItems int, totalValue double, paymentStatus string, deliveryAddress string );


@source(type='file', mode='line',
file.uri='file:/home/ubuntu/csv/productTable.csv',
tailing='false',
action.after.process='MOVE',
move.after.process='file:/home/ubuntu/csv/moved',
@map(type='csv', delimiter=','))
define stream InventoryUpdate (orderID string, numberOfItems int, totalValue double, paymentStatus string, deliveryAddress string);

@async(buffer.size='4096', workers='2', batch.size.max='5000') 
define stream IntrimEventStream(orderID string, numberOfItems int, totalValue double, paymentStatus string, deliveryAddress string);


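-- Buffer records read from the CSV file in the async intermediate stream and insert them into the MySQL table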
from InventoryUpdate
select *
insert into IntrimEventStream;


from IntrimEventStream
select *
insert into PurchesOrderTable;


from InventoryUpdate#window.timeBatch(1 sec)
select count() as throughput
insert into OutputStream;

from OutputStream#log('TPS: ')
insert into TempStream;

Results

  • Memory consumed: 2.56g

  • TPS: 9,000

Monitoring a database table in MySQL and publishing data to a Kafka topic

Specifications of EC2 Instances

Docker resource allocation

  • Memory: 8GB
  • CPU: 4 cores

Server memory allocation

  • Xms: 2g
  • Xmx: 4g

Siddhi applications

@App:name("PurchaseOrderSiddhiApp")

@App:description("Description of the plan")

--@sink(type='log')
@source(type = 'cdc', url = "jdbc:mysql://172.31.18.173:3306/order?useSSL=false", username = "root", password = "root", table.name = "PurchesOrders", operation = "insert", 
    @map(type = 'keyvalue', fail.on.missing.attribute = "false"))
define stream PurchesOrderStream (orderID string, numberOfItems int, totalValue double, paymentStatus string, deliveryAddress string );

@sink(type='kafka',
      topic='delivery_items_topic',
      bootstrap.servers='172.31.3.169:9092',
      partition.no='0',
      @map(type='xml'))
define stream kafkaPublisherStream(orderID string, numberOfItems int, totalPayable double, deliveryAddress string);

@sink(type='log') 
define stream kafkPubTps(pubCount long);

@sink(type='log') 
define stream publishTps(recCount long);


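-- Forward captured inserts with payment status 'cod' or 'paid' to the Kafka sink; the payable amount is the total value for 'cod' orders and 0.0 for 'paid' orders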
from PurchesOrderStream[paymentStatus =='cod' or paymentStatus=='paid']
select orderID, numberOfItems, ifThenElse(paymentStatus=='cod', totalValue, 0.0) as totalPayable, deliveryAddress
insert into kafkaPublisherStream;


from PurchesOrderStream#window.timeBatch(1 sec)
select count() as recCount
insert into publishTps;


from kafkaPublisherStream#window.timeBatch(1 sec)
select count() as pubCount
insert into kafkPubTps;

Results

  • Memory Consumption: 1.5g

  • Time taken: 46 minutes

  • Data set size: 34,330,327 rows

Reading an XML file and mapping it to a stream

Specifications of EC2 Instances

Docker resource allocation

  • Memory: 8GB
  • CPU: 4 cores

Server memory allocation

  • Xms: 2g
  • Xmx: 4g

Siddhi applications


@App:name("NodesConvertor")


@App:description("Description of the plan")

@source(
    type = 'file', 
    file.uri = "file:/home/ubuntu/csv/input.xml", 
    mode = "line",
    tailing = "false", 
    action.after.process='keep',
    @map(type='xml', 
        enclosing.element="/osm/node",
        enclosingElementAsEvent="true",
        enable.streaming.xml.content="true",
        fail.on.missing.attribute="false",
        @attributes(id = "/node/@id", lat = "/node/@lat", lon = "/node/@lon", version = "/node/@version", timestamp = "/node/@timestamp", changeset = "/node/@changeset")))
define stream FooStream (id string, lat string, lon string, version string, timestamp string, changeset string);


@info(name = 'totalQuery')
from FooStream#window.timeBatch(1 sec)
select count() as throughput
insert into OutputStream;

from OutputStream#log('TPS: ')
insert into TempStream;

Results

  • Memory consumption: 1.2g

  • TPS: 40,000

Reading an XML file and publishing to a Kafka topic

Specifications of EC2 Instances

Docker resource allocation

  • Memory: 8GB
  • CPU: 4 cores

Server memory allocation

  • Xms: 2g
  • Xmx: 4g

Siddhi applications

@App:name("NodesConvertor")
@App:description("Description of the plan")

@source(
    type = 'file', 
    file.uri = "file:/home/ubuntu/csv/input.xml", 
    mode = "line",
    tailing = "false", 
    action.after.process='keep',
    @map(type='xml', 
        enclosing.element="/osm/node",
        enclosingElementAsEvent="true",
        enable.streaming.xml.content="true",
        fail.on.missing.attribute="false",
        @attributes(id = "/node/@id", lat = "/node/@lat", lon = "/node/@lon", version = "/node/@version", timestamp = "/node/@timestamp", changeset = "/node/@changeset")))
define stream FooStream (id string, lat string, lon string, version string, timestamp string, changeset string);

@sink(type='kafka',
      topic='kafka_result_topic',
      bootstrap.servers='172.31.3.169:9092',
      partition.no='0',
      @map(type='xml'))
define stream kafkaStream(id string, lat string, lon string, version string, timestamp string, changeset string);


@info(name = 'totalQuery')
from FooStream#window.timeBatch(1 sec)
select count() as throughput
insert into OutputStream;

from OutputStream#log('TPS: ')
insert into TempStream;

from FooStream 
select * 
insert into kafkaStream;

Results

  • Memory consumption: 1.7g

  • TPS: 38,000
