Taiwan Hadoop Forum

Taiwan Hadoop technical discussion board
Current time: 2018-10-22, 13:17

All times are UTC + 8 hours




Subject: Cassandra on a free trial box?
Posted: 2016-05-31, 17:05

Joined: 2016-05-31, 17:03
Posts: 2
Hi,
I'm exploring the Databricks trial version on a 3-node r3.2xlarge cluster (Spark 1.6). I tried a Cassandra insert (the Cassandra DB is external, on the same AWS cluster) using spark-cassandra-connector 1.5.0-M3, and it took an average of 28 seconds to insert the data (3 rows, 2 columns) via a scheduled job. Am I missing an optimization here? I used the standard Cassandra-Spark code snippet:

import com.datastax.spark.connector._
val sampleRdd = sc.parallelize(Array(("Databricks", 101), ("Spark", 110), ("Big Data", 10)))
// This function saves to an existing table. To save to a new table, use saveAsCassandraTable.
sampleRdd.saveToCassandra("test_keyspace", "words", SomeColumns("word", "count"))

Thanks,


Subject: Re: Cassandra on a free trial box?
Posted: 2016-06-03, 21:51

Joined: 2009-11-09, 19:52
Posts: 2897
kokilasoral wrote: (quoted in full above)

It's hard to understand your question, but I guess you are asking about the latency (the 28 seconds).
Since the sampleRdd dataset is so small, based on this description my guesses are:
(1) the network latency between the Spark 1.6 cluster and the Cassandra DB may be high
(2) Spark may be running on YARN, so it needs to "schedule" a job to run your insert. If that is the case, most of the 28 seconds are spent on scheduling, not on the insert itself.
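One way to tell the two apart is to time the write itself inside the job, so scheduling overhead is excluded. A minimal sketch (the `timed` helper below is hypothetical, not part of the connector):

```scala
// Hypothetical helper: measure how long a block of code takes.
// Wrapping only the saveToCassandra call separates the actual
// write time from any job-scheduling overhead around it.
def timed[T](label: String)(body: => T): T = {
  val t0 = System.nanoTime()
  val result = body // run the wrapped block
  val elapsedMs = (System.nanoTime() - t0) / 1e6
  println(f"$label took $elapsedMs%.1f ms")
  result
}

// Usage (assumes sampleRdd from the snippet above):
// timed("saveToCassandra") {
//   sampleRdd.saveToCassandra("test_keyspace", "words",
//     SomeColumns("word", "count"))
// }
```

If the printed time is small while the job still takes ~28 seconds end to end, the cost is in scheduling, not in Cassandra.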

- Jazz


Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group
Traditional Chinese locale maintained by 竹貓星球