An Objective World - Part 1 (Blog 01)

Before we get into any hands-on programming, I thought it would be a good idea to start with some background that will help us later on.

 

Python is one of several programming languages that can be described as an "object-oriented" language.  While the subject of object-oriented programming is a broad one about which many books have been written, we'll keep it short here, concentrating only on what you need to know to get started.

 

SAS and Python both have constructs that map keys to values: in SAS we call them formats, and in Python we call them dictionaries.  Both also have two-dimensional data constructs: in SAS it's a data set, and in Python it's a DataFrame.  These naming differences, however, are only the tip of the iceberg with regard to how these structures differ across the languages.

 

In SAS, the data set is the fundamental unit of data.  It is a SAS proprietary file format you can find in Windows Explorer or attach to an email, just as you can with any other file format.  In Python, everything is an object, a term that's just as descriptive in programming languages as it is in the real world.  Look around you.  All you see are objects.  Many are fundamentally different from each other: the picture frame on your desk has almost nothing in common with the monitor on which you are reading this.  Chances are they were not made in the same factory, or in the same way with the same equipment.  But what they do have in common is that they both have properties, and we can do stuff with both of them.  While SAS programmers are content with calling a data set a data set, in Python it helps to think of a dictionary, a DataFrame, even an integer and a Boolean, as different "kinds" of objects, each of which has its own set of properties, and each of which has its own methods, the things we can do to it or with it.

 

In a SAS data step, the statement x = 4 tells the SAS processor to create a numeric variable inside the data set.  This variable automatically has certain attributes associated with it, such as Type, whose value is in this case "Num", but other times can be "Char".  The value of this attribute determines how we can use the variable.  Python has no data steps.  Like SAS macro variables, Python variables can be created outside of any data context.  The statement x = 4 in Python creates an object and gives it the name x.  More specifically, x here refers to an integer object (Python calls this class int), which means that it has all of the properties and methods associated with Python integers.  Every "kind" of object is created from a blueprint called a class, and each class defines its own properties and methods.
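If you'd like to see this for yourself, you can ask Python what kind of object x is.  The following is only a small sketch using built-in functions (type and isinstance) and one integer method (bit_length); the names kind, is_int and bits are just ones I picked for illustration.

```python
x = 4

# Every object knows which class it was built from.
kind = type(x)               # the class int
is_int = isinstance(x, int)  # True

# Integers carry their own methods, just like any other object.
bits = x.bit_length()        # 4 is 100 in binary, so this is 3

print(kind, is_int, bits)
```

Nothing here depends on a data set or any other container: x lives on its own, much like a macro variable would in SAS.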

 

Python has several built-in classes.  One that may look like something you might find in a SAS data set is the string.  A string is a sequence of characters: letters, digits, spaces, punctuation and so on.  Illustrated below are examples of different string methods.  Look familiar?

 

   myString = 'hello world'

   ex1 = myString.strip()

   ex2 = myString.upper()

   ex3 = myString.find('world')

 

While the syntax is a little different, what we're doing to a string is no different from what we do to data set Char variables in the data step: trimming leading and trailing spaces, uppercasing, and finding text within a string.
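To make the results visible, here is a small sketch of the same three methods with the values they return shown in comments.  One assumption on my part: I've padded the string with two spaces on each side so that strip() actually has something to remove.

```python
myString = '  hello world  '      # padded so strip() has work to do

ex1 = myString.strip()            # 'hello world'
ex2 = myString.upper()            # '  HELLO WORLD  '
ex3 = myString.find('world')      # 8 (positions start at 0)
missing = myString.find('xyz')    # -1 means "not found"

# String methods return new strings; the original is left untouched.
print(repr(myString))
```

Two things may surprise a SAS programmer here.  Positions are counted from 0 rather than 1, and find() signals "not found" with -1 rather than the 0 you would get from a SAS function such as FIND.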

 

At this point, we see a similarity between data set variable types and built-in Python classes (which, by the way, are sometimes referred to as data types).  But Python has other classes or object types too.  One important one is the list. 

 

A list is simply an ordered collection of values.  These values can be integers, strings, dictionaries (we'll get to those in a minute), or any other types of objects.  We use square brackets to represent lists.

 

list1 = ['hello', 'world']

list2 = [1, 4, 9, 16, 25]

 

We can also use an expression, known as a list comprehension, to define the contents of a list:

list2 = [x*x for x in [1,2,3,4,5]]
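If that one-liner looks cryptic, it may help to see it next to the loop it replaces.  This is just a sketch; the names squares_loop, squares_comp and even_squares are mine, and the filtered version at the end goes slightly beyond what we've covered so far.

```python
# The long way: start with an empty list and append one square at a time.
squares_loop = []
for x in [1, 2, 3, 4, 5]:
    squares_loop.append(x * x)

# The short way: the same logic written inside the square brackets.
squares_comp = [x * x for x in [1, 2, 3, 4, 5]]

# An optional "if" keeps only the values that pass a test.
even_squares = [x * x for x in [1, 2, 3, 4, 5] if x % 2 == 0]

print(squares_loop, squares_comp, even_squares)
```

Both approaches build the list [1, 4, 9, 16, 25]; the filtered version keeps only [4, 16].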

 

Lists are iterable:

for x in [1,2,3]:
    print('My favorite number is ' + str(x))

 

Note here that since we are iterating through a list of integers, x has to be converted to a string in order to concatenate with another string ("My favorite number is").

 

We can access elements of lists by their position in the list, enclosed in square brackets (0 is the first element).

 

print ('The first element of list2 is ' + str(list2[0]))
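Here is a slightly fuller sketch of positional access.  The negative index and the f-string (available in Python 3.6 and later) are extras I'm adding for illustration, and the variable names are my own.

```python
list2 = [1, 4, 9, 16, 25]

first = list2[0]      # 1, because positions start at 0
third = list2[2]      # 9
last = list2[-1]      # 25, negative positions count from the end
count = len(list2)    # 5

# list2[0] is an integer, so it needs str() before concatenation.
print('The first element of list2 is ' + str(first))

# An f-string does the conversion for us.
print(f'The last of {count} elements is {last}')
```

Asking for a position that doesn't exist, such as list2[5] here, raises an IndexError rather than quietly returning a missing value.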

 

List methods allow us to do things with elements of a list.  Can you guess what the append method does?

 

list1.append('how are you?')

 

list1 now has the value ['hello', 'world', 'how are you?'].
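One detail worth noticing is that list methods like append change the list itself rather than handing back a new one.  The sketch below shows this, along with a few other list methods (insert, pop and remove) that I'm including only as examples.

```python
list1 = ['hello', 'world']

# append modifies list1 in place and returns None.
result = list1.append('how are you?')
# list1 is now ['hello', 'world', 'how are you?'] and result is None

list1.insert(0, 'well,')   # put a new element at position 0
removed = list1.pop()      # remove and return the last element
list1.remove('well,')      # remove the first element equal to 'well,'

print(list1, removed)
```

This is the opposite of the string methods we saw earlier, which leave the original string alone and return a new value.  After the last three calls, list1 is back to ['hello', 'world'].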

 

Another common object is a dictionary.  Although not a perfect comparison, I tend to equate these to SAS formats for their ability to serve as lookups.  Note the similarities below.

 

   proc format ; /* SAS */

   value $mymap 'M' = 'Male' 'F' = 'Female' ;

   run ;

 

   mymap = {'M':'Male' , 'F':'Female'} # Python

 

In SAS, the PUT function is used to find a value associated with a particular key.  In Python we enclose the key value in square brackets alongside a reference to the name of the dictionary.

 

   put('M', $mymap.) /* SAS */

   mymap['M'] # Python
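A few more things we can do with a dictionary are sketched below.  The get method, the in operator and the items method are all standard; the key 'X' and the label 'Unknown' are made up for the example.

```python
mymap = {'M': 'Male', 'F': 'Female'}

label = mymap['M']                      # 'Male'
has_f = 'F' in mymap                    # True: is 'F' one of the keys?

# Square brackets raise a KeyError for a missing key;
# get() lets us supply a fallback value instead.
fallback = mymap.get('X', 'Unknown')    # 'Unknown'

# Dictionaries can be changed after they are created.
mymap['U'] = 'Unknown'

# Loop over every key/value pair.
for key, value in mymap.items():
    print(key, '=', value)
```

Unlike a format, which has to be defined in a PROC FORMAT step before it can be used, a dictionary is an ordinary object that we can build and update anywhere in a program.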

 

Python has several built-in classes like those illustrated above, but users can also create their own classes, complete with their own properties and methods.  DataFrames, however, are not built from a built-in class; they come from a class supplied by a third-party library (pandas).  In the next installment, we'll go over basic Python installation as well as the installation of these third-party classes.  After that, we'll be ready to start looking at data!
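As a parting sketch, here is roughly what a user-defined class might look like.  The class Patient, its properties name and sex, and its method describe are all hypothetical, invented here only to show the shape of a class.

```python
class Patient:
    """A hypothetical class, made up purely for illustration."""

    def __init__(self, name, sex):
        # Properties: data that each Patient object carries around.
        self.name = name
        self.sex = sex

    def describe(self):
        # A method: something we can do with a Patient object.
        labels = {'M': 'Male', 'F': 'Female'}
        return self.name + ' (' + labels[self.sex] + ')'


p = Patient('Alex', 'F')     # build an object from the blueprint
summary = p.describe()       # 'Alex (Female)'
print(summary)
```

The class is the blueprint and p is one object built from it, in the same way that 4 is one object built from the integer class.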

 
