
  1. Besides transaction scoring (credit card transactions, online ad delivery systems), what other applications need true real time?
  2. What types of applications work better with near real time, rather than true real time?
  3. How do you boost the performance of true real time scoring? For example, by keeping small, pre-loaded, permanent "in-memory" lookup tables (updated hourly), or by other mechanisms? Please explain.
  4. How do you handle server load at peak times, as it can be 10x higher than (say) at 2am? And at 2am, do you use idle servers to run hourly or daily algorithms to refine / adjust real time scores?
  5. Are real time algorithms selected and integrated into production by data scientists, mostly based on their ability to be easily deployed in a distributed environment?
  6. What are examples of hard-to-solve problems, and how are they solved? Example: 3-D streaming video processing in real time from a moving observation point (to automatically fly a large plane at low elevations in crowded skies).
  7. Do you think end users (e.g. decision makers) should have access to dashboards updated in true real time, or is it better to offer them 5-minute delayed statistics? In which applications is real time better for end users?
  8. Is real time limited only to machine generated data?
  9. What is machine generated data? What about a real-time trading system that is based on recent or even extremely recent tweets or Facebook posts? Do you call this real time, big data, machine data, machine talking to machines, etc.?
  10. What is the benefit of "true real time" over (say) "5-minute delayed" signals in question 9? Does the benefit (increased accuracy) outweigh the extra costs? (On Wall Street, the answer is usually yes. But what about keyword bidding algorithms? Is a delayed reaction OK there?)
  11. Are there any rules of thumb regarding optimum latency (when not implementing true real time), depending on the type of application? For instance, for Internet traffic monitoring, 15 minutes is good because it covers most user sessions.
  12. What kind of programming environments are well suited for big data in real time? (SQL is not, C++ is better; what about Hadoop? What technology do you use?)
  13. What kind of applications are well suited for big data in real-time?
  14. What are examples of metrics heavily or easily used in real time (e.g. time to last transaction)?
  15. To deliver clicks evenly and catch fraud right away, do you think that the only solution for Google is to monitor ad delivery at the advertiser level, in true real time?
  16. Do you think Facebook uses true real time for ad targeting? How could they improve their very low impression-to-click ratio (10x below Google, I think)? Why is this ratio so low despite the fact that they know so many things about their users? Could technology help?
  17. What is the future of real time over the next 10 years? What will become real time? What will stay hourly or end-of-day systems?
  18. Are all real-time systems actually hybrid, relying also on hourly and daily or even yearly (with seasonality) components to boost accuracy? How are real-time predictions performed for very sparse highly granular data, such as predicting the yield of any advertising keyword in real time for any advertiser? [answer: group sparse data into bigger buckets, make forecasts for the entire bucket]
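
The bracketed answer to question 18 (group sparse data into bigger buckets, then forecast for the entire bucket) could be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the function names, the sample events, and the bucketing rule (first word of the keyword) are all hypothetical, and "yield" is taken here to mean revenue per click, which the original post does not specify.

```python
from collections import defaultdict


def bucket_yields(events, bucket_of):
    """Pool sparse per-keyword counts into per-bucket yield estimates.

    events: iterable of (keyword, clicks, revenue) tuples.
    bucket_of: function mapping a keyword to its bucket label.
    Returns a dict: bucket label -> revenue per click for the whole bucket.
    """
    clicks = defaultdict(int)
    revenue = defaultdict(float)
    for keyword, n_clicks, rev in events:
        bucket = bucket_of(keyword)
        clicks[bucket] += n_clicks
        revenue[bucket] += rev
    # Skip buckets with no clicks to avoid dividing by zero.
    return {b: revenue[b] / clicks[b] for b in clicks if clicks[b] > 0}


def predict_yield(keyword, yields, bucket_of, default=0.0):
    """Use the bucket-level estimate as the forecast for a single keyword."""
    return yields.get(bucket_of(keyword), default)


def first_word(keyword):
    """Hypothetical bucketing rule: group keywords by their first word."""
    return keyword.split()[0]


# Made-up sample data, for illustration only.
events = [
    ("mortgage rates ohio", 2, 3.0),
    ("mortgage refinance", 6, 9.0),
    ("shoes red", 1, 0.25),
    ("shoes running", 3, 0.75),
]
yields = bucket_yields(events, first_word)
```

With this setup, a keyword never seen before, such as "mortgage calculator", would inherit the pooled estimate of its bucket via `predict_yield("mortgage calculator", yields, first_word)`. In practice the bucketing rule would more likely come from a keyword taxonomy or a clustering step than from the first word.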

Read related article: Seven questions about real time analytics.
