Subscribe to DSC Newsletter

I work in online marketing analytics, and I do most of my data analysis using excel. However, recently I've come across some job postings listing SQL as one of the qualifications. I've used SQL in the past for things like retrieving, inserting, updating and deleting data from a database. However, I've never really used it as part of a data analysis process, and I'm wondering if it could be, or should be. I feel like you would use SQL to manage your data, but then use another tool (such as R or SPSS?) to analyze the data. Any thoughts would be appreciated.

Views: 3588

Reply to This

Replies to This Discussion

It is a mandatory requirement for almost any kind of analytic job.You can learn SQL by creating your own MySQL database: it's free (open source). At this time, very few people use in-database analytics (such as SQL server data mining -- which still has many drawbacks compared to open source solutions).
In theory you are correct. You shouldn't need SQL for anything else other than managing data. However since most shops are SQL centric, you probably should learn it only so that you would be able to extract your own data when the need arised. In some cases a SQL query tool is the only tool that you will have at your disposal, and it will be the common language for speaking with IT.

-Ralph Winters
I agree with Vincent and Ralph. Although your question is about using SQL during the analysis rather than using SQL to extract your data, here are some situations where i found SQL to be of great help :

1) When looking for outliers and marking them in a DB containing many records.
2) When doing transformations on Big Data.
3) Added bonus : Not relying all the time on -always- busy DBAs to give you what you need.
Thank you. That's all very helpful.
I agree with everyone.

Being able to write your own SQL, is valuable because of "downsized" IT departments with limited resources. And since I am writing my own SQL, I will sometimes clean data/generate new variables with the SQL. Maybe I want to extract just the domain part of an email address.

SELECT RIGHT(Email_Address, CHARINDEX('@', REVERSE(Email_Address))-1) AS [Domain Name]
Thank you for the feedback. I understand much better now. I'm also wondering what some programs are to analyze the data that should be on my radar. I'm starting a masters program in statistics in the spring and am interested in what the most common programs are for analyzing data. R? SPSS? SAS? And do many people know just one program, or multiple ones?

Got one more question. Let's say I have some historic stock data including Date, Opening Price, High, Low, Closing Price and Volume. I've written a script (using Ruby) that goes row by row and performs different operations depending on the numbers in each row, meanwhile updating a set of variables as it does this. The values of the set of variables once the script has run is what I'm interested in. I know how to do this programically, obviously, but I'm wondering if this kind of task can be done (or be done better) with statistical programs.
All 3 of the programs are good choices. Both R & SAS have modules which integrate SQL calls within the environment.
Knowing SQL is a huge plus in Analytics. Pulling data for my projects based on the business problem/requirement, processing the data, etc. using SQL is something I do everyday. Once I create the analysis or modeling dataset, I then import it into SAS or Clementine.

Don't think there are many differences amongst the data analysis programs. SAS is good in handling large datasets and they have been able to create this brand/image so successfully that SAS skills are very much in demand. Clementine is very good and easy to use, helps a lot when you don't have time and you don't want to write codes. But it's quite slow when you have large datasets. R requires a lot of coding skills and it's bigggest drawback is data handling capacity - it just cannot work on large datasets!


On Data Science Central

© 2021   TechTarget, Inc.   Powered by

Badges  |  Report an Issue  |  Privacy Policy  |  Terms of Service