SAS in a Pythonista’s workflow

Most who know me will probably know that I’m a major fan of Python in particular and open source software in general. Still, in my day job I use SAS quite a lot. Because I will be at the SAS Global Forum in Denver next week, I thought it would be a good time to write about the place of SAS in my work.


First of all, I can obviously not pay for the massive license fees of software like SAS, which also is my biggest objection to using it. So when I do anything on a freelance basis, it will be purely Python, in almost all assignments. In my day job I arrived, now 4,5 years ago, in a team that even was called the “SAS team”. Their main, and in fact for a long time only, tooling was SAS Base. After fighting for a good two years we do now have a decent Anaconda installation with a Jupyter Hub which makes for much faster data exploration, easier analytics and easier and much prettier visualization (no, SAS Visual Analytics is *not* a visualization tool, it’s a dashboarding tool, enormous difference).

Alright, so I can use Python and R and I still use SAS? Indeed I do. I think every tool is good for something. Part of our team is mainly occupied with ETL (extract, transform, load) work, building a good, stable, clean and well structured data warehouse that others (like me) can use at ease. For such work, SAS has some great tools. Moving data from a data base that is designed to make a particular application work smoothly into a data warehouse, that keeps track of the history of that data base as well is not a trivial task and keeping track of all relevant input data base changes is quite some work. This is done with SAS (by others) and results in a data warehouse stored in SAS data sets that I consequently use for data analysis, machine learning, you name it.

From a data warehouse in SAS tables, I find it easiest to do my data collection, merging, aggregation and all that prerequisite work one always needs for analytics in SAS as well. Most of my projects start with data wrangling in SAS, and when I get to a small number of tables that are ready for analysis, analytics or visualization, I move over to Python (where the fun starts).

Yes, I do have my strong opinions about many SAS tools, about SAS as a company, about how they work and about how the world (e.g. auditors, authorities) look at SAS versus how they look at software from the open source realms. Still, I use a tool that is good at what I need from it, if I have access to it. More and better interfacing between Python, R and SAS is well underway and I hope the integration between the two will only become smoother. Hoping that SAS would vanish from my world is just an utterly unrealistic view of the world.