27 Worst Mistakes at Data Science Job Interviews
These mistakes apply to many tech job interviews, but here we provide specific advice for data scientists and other professionals with a similar background. More advice is being added regularly.
- Calling a data set with 100,000 observations 'big data'.
- Thinking that the techniques you learned at school can be applied to any kind of problem or data with little, if any, adjustment. Not being aware of modern, robust, scalable techniques that are not taught at school. A solution is to get some data (there are tons of free data sets) and use a modern tool, such as a cataloguer algorithm, to automatically process a few gigabytes of data. Now you have something interesting to talk about during your job interview, especially if you can describe the benefits that it offers (automated, fast indexation of big unstructured data; creation of search engines or taxonomies such as Amazon's big product listing).
- Being unable to say much about the speed (computational complexity) of various algorithms, offering slow or inefficient solutions when asked to solve a problem, or not knowing where the complexities and bottlenecks are in modern platforms.
- Believing that data is king. Not being able to guess where sources of bias and variance might come from. No experience working with messy data. Not knowing how data is produced, and how metrics are identified.
- Not being able to tell the pros and cons of two popular platforms, products, architectures, programming languages, or algorithms. You need to read the literature to become familiar with this. For instance, R versus Python, the 8 worst predictive techniques, or 10 types of regressions, which one to choose, or Hadoop versus Spark.
For the full list, click here.
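To make the point about computational complexity concrete, here is a minimal sketch of the kind of comparison an interviewer might expect. It is an illustration only and does not come from the original list: the task (checking a list for duplicates) and both function names are assumptions chosen for the example.

```python
def has_duplicates_quadratic(items):
    """Compare every pair of items: O(n^2) comparisons in the worst case."""
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """Track seen items in a hash set: O(n) expected time, O(n) extra memory.

    Assumes the items are hashable.
    """
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


if __name__ == "__main__":
    sample = [4, 8, 15, 16, 23, 42, 15]
    print(has_duplicates_quadratic(sample), has_duplicates_linear(sample))
```

Both functions return the same answer. The difference is that the first one roughly quadruples its work when the input doubles, while the second one roughly doubles it, at the cost of extra memory for the set.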
4 Easy Steps to Structure Highly Unstructured Big Data
You have gathered gigabytes or terabytes of unstructured text, for instance by scraping the Internet, or from pieces of email from your employees or users, or tweets, or millions of products that you want to categorize (only the product description and product name are available, sometimes with typos). Now you want to make sense of it and extract value, and possibly design a nice search engine so that your customers can easily find your products. The core algorithm that you need is an ...
Data Scientist Breaks State Monopoly on Lotteries
While the winning numbers generated look extremely random, just as random as traditional lottery winning numbers, they are actually produced by extremely rudimentary, short mathematical formulas. Think of the decimals of the number Pi
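As an illustration of how a very short formula can produce digits that look random, here is a minimal sketch that streams the decimal digits of Pi using the unbounded spigot algorithm published by Jeremy Gibbons. This is not the generator described in the article, which is not specified here; the function name `pi_digits` is an assumption made for the example.

```python
from itertools import islice


def pi_digits():
    """Yield the decimal digits of Pi one at a time (3, 1, 4, 1, 5, ...).

    Uses only integer arithmetic, so the digits are exact; the integers
    grow as more digits are produced.
    """
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is now certain: emit it and rescale the state.
            yield n
            q, r, t, k, n, l = (
                10 * q,
                10 * (r - n * t),
                t,
                k,
                (10 * (3 * q + r)) // t - 10 * n,
                l,
            )
        else:
            # Not enough precision yet: absorb one more term of the series.
            q, r, t, k, n, l = (
                q * k,
                (2 * q + r) * l,
                t * l,
                k + 1,
                (q * (7 * k + 2) + r * l) // (t * l),
                l + 2,
            )


if __name__ == "__main__":
    print(list(islice(pi_digits(), 10)))  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
```

The whole generator fits in a few lines, yet the digit sequence it produces shows no obvious pattern to the eye, even though it is entirely deterministic.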
New IoT Articles