Sunday, 8 May 2016

An introduction to Data Analysis with Python

In these great 10.25 minutes you will be introduced to doing data analysis using Python. Very nice. A must-see!

Friday, 6 May 2016

Great summary of Python and R - a short note

I found this post, R Vs Python For Datascience, and it is very interesting. It compares R and Python, and you get to learn the pros and cons of the two languages. I personally like Python because it is a general-purpose language. However, there are so many more (really, there are so many more) packages for statistics in R.
You should really read the post, though. It is really helpful. The image above is linked to the site. The conclusion is that there is no winner. You have to choose a language yourself. I chose both. What will you choose?

Wednesday, 4 May 2016

Three great Python books to get you going (they are free)

In this short post I am going to share three great, and free, Python books. I think that you could easily go from the first one I list (Think Python) to the last (Think Bayes), and you will have started your journey into Python and data science really nicely. You will have a lot of knowledge to go further with! All three books are written by Allen Downey.

Think Python

Think Python: How to Think Like a Computer Scientist is a great starter for someone who wants to learn Python. The first edition uses Python 2 and the second edition uses Python 3. If you are completely new to Python I suggest that you go for the second edition. This book introduces beginners to the Python language.

Think Stats

You are of course interested in statistics and Python, right? Then you should go on to Think Stats. As with Think Python, there is a first edition and a second edition, for Python 2 and 3 respectively. Think Stats introduces you to exploratory statistics using Python. These are really handy books and you will learn a lot, both about computation and about statistical programming (i.e., in Python).

Think Bayes

Think Bayes introduces you to Bayesian statistics. Bayesian statistics is really up and coming in the cognitive sciences. It offers a very intuitive interpretation (p-values are not intuitive!). For now you have to read a book written for Python 2. However, you will find updated code on Allen Downey's GitHub page: updated code. I would suggest that you read the book but look at the new code and learn how to do it in Python 3.

There are probably a bunch more free books out there for learning Python and statistics. I stick with these three in this post because they are short and you will learn so much from them.

That is it for now, take care. 

Tuesday, 3 May 2016

More on learning how to code Python - a cognitive scientist's journey to coding

In this post I will continue the discussion on programming in Python for cognitive scientists. I will go from data collection, to analysis, and finally to writing up your results (yes, you can basically use Python for all these tasks!).

Collecting data

There are several ways you may collect data as a cognitive scientist. It all depends on your research question(s). In this post I will only discuss two: collecting online data using social media and/or questionnaires, and collecting data using laboratory experiments. In fact, I will barely mention the first one, but you can scrape a lot of behavior-ish data off, for instance, Twitter and maybe throw questionnaires into the mix.

Creating experiments to use in data collection

Programming, or building, experiments has long been carried out with crappy and expensive tools (e.g., E-Prime). I do understand the attraction of simpler experiment-building tools where you drag and drop objects: when you are finished building your experiment, you generate a script by pressing a button. All fine. However, at times you may need to do more advanced stuff, and then you will need to add inline code (e.g., write some scripts and add them to the "timeline" in the builder). Recently, a couple of free and open-source Python tools for creating experiments have appeared. Two of them, PsychoPy and OpenSesame, offer builders and inline scripting (much like E-Prime).
OpenSesame builder GUI
Some of the others just give you an API to ease some of the coding of your experiment (PsychoPy can also be used as a library). That is, you import it as if it were any other Python library (after you have installed it, of course). For instance, if you use the Python library Expyriment you will import what you need from the library:

from expyriment import design, control, stimuli, io, misc

On Expyriment's website you can find some beginner's tutorials.

If you are interested in using PsychoPy's builder mode you can watch the following YouTube tutorial:

In this tutorial you will learn how to create a classical psychology experiment: the Stroop task (of course, in its original form pen and paper were used...). For a psycholinguistic researcher the following tutorial may be more suitable:
 
More resources on PsychoPy can be found on the software's resources page. You will find that coding with the PsychoPy library (e.g., importing the stuff you need for your experiment from it) is much like the short Expyriment example above.
When you have learned how to create and code your own experiment in Python you will be able to collect a lot of data, and you will probably want to analyze it. Although MATLAB and, more recently, R have had the majority of the cognitive science crowd when it comes to analysis (you can also create experiments in MATLAB using Psychtoolbox and such), you can OF COURSE do your analysis in Python.

Data analysis

Common statistical methods in Psychology, and related fields, are linear regression, the t-test, and analysis of variance (ANOVA). That is especially true when it comes to experiments. In more subjective survey studies, other techniques such as factor analysis (FA) and structural equation modelling (SEM) are carried out. Of course, an experimental design may also need such multivariate analyses. If you are interested in FA and SEM in Python I must disappoint you here, however. As far as I know you can only carry out principal component analysis (which is not 'real' factor analysis according to my old stats teacher!).

Enough of my rambling, you say? What CAN I do in Python?! Well, you CAN do t-tests, linear regression (non-linear also), ANOVA, etc. For instance, using the package Statsmodels we can carry out all of these methods (except for repeated measures ANOVA, however). scikit-learn, a machine learning library, can also do a lot of statistics. Of course, SciPy can do some basic parametric tests, and Pandas (and SciPy and NumPy) can carry out most descriptive statistics you'd want to have. Repeated measures ANOVA can be carried out using the package Pyvttbl which, sadly, seems to be unmaintained. No more updates of that...
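To make the t-test a bit more concrete, here is a minimal sketch of an independent-samples (Welch) t statistic written with only the standard library, so the arithmetic is visible. The function name welch_t and the reaction times are made up for illustration; in real work you would more likely call a library routine such as scipy.stats.ttest_ind, which also gives you the p-value.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)

    # Squared standard error of each group mean.
    se2_a, se2_b = var_a / n_a, var_b / n_b

    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical reaction times (ms) for two conditions.
congruent = [512, 498, 530, 475, 501, 489]
incongruent = [560, 590, 545, 601, 577, 566]

t, df = welch_t(congruent, incongruent)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A negative t here simply means that the first group has the lower mean.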

That is it. Most of the stuff I list here I found via this excellent site: Python and R as tools for data analysis and creating Psychology experiments. If you follow the link you will find discussions on Python IDEs for Psychology researchers (or any other scientist), how to do ANOVA for repeated and dependent measures, and some descriptive statistics. All in Python.

That is it for me now. 

Please leave a comment if you have any suggestions!

Monday, 2 May 2016

Bokeh Tutorial

In this great tutorial video you get to learn how to use Bokeh to create interactive visualisations.

Saturday, 30 April 2016

Python for Cognitive Scientists - Choosing your IDE

In my two previous posts I have shared some good learning resources for Python (here) and a short guide on why you should learn Python as a cognitive scientist (here). This post will touch on what I think is a very important tool when it comes to programming in Python (and in any other language): the Integrated Development Environment (IDE).

Although Python, like most programming languages, can be written using any text editor (e.g., Notepad in Windows or Gedit, Vim, Emacs, etc. in Linux), an IDE offers a little bit more (yes, both Vim and Emacs are very powerful but may have a steeper learning curve...). Below you can see a quote from the above linked Wikipedia article:
An integrated development environment (IDE) is a software application that provides comprehensive facilities to computer programmers for software development. An IDE normally consists of a source code editor, build automation tools and a debugger. Most modern IDEs have intelligent code completion.
I think the above quote illustrates pretty nicely why an IDE is a powerful and necessary tool for any programmer.

In this post I will consider three Python IDEs: Spyder, PyCharm, and Rodeo (there are more; see this link, for example).

Spyder

Spyder is the IDE I have used most of the time, and there are some aspects of it that I prefer over PyCharm and Rodeo.
[Spyder is] a powerful interactive development environment for the Python language with advanced editing, interactive testing, debugging and introspection features

Spyder uses IPython as its default command line environment (of course, you can choose to use the regular Python interpreter also). IPython comes with a lot of perks, such as built-in support for Matplotlib. It also alleviates some of the issues with editing modules, since IPython supports auto-reloading modules.

In Spyder you can run a selected part of the script (F9) and a complete script file (F5) from within the editor.

It also offers deep introspection, highlights errors and warnings, and opens up the docstring information when calling a function (i.e., IPython functionality). Errors and warnings are displayed to the left of the line number. Inspection is carried out by clicking and holding over the error and warning icons. Highlighting is extremely helpful because when you highlight a variable all other instances of it are highlighted too; easily tracked, that is!

Furthermore, it has a feature very similar to one in MATLAB (CTRL+D, the Open definition feature): Spyder can find the file or line where a function was defined when you hold CTRL and click the function name.
One really great feature of Spyder's interface is the object inspector. All objects that were created in the IPython console can be examined in the inspector. Note that you can also inspect any variable in the console by typing the variable name and getting the output. Nice feature! It has been a while since I worked with MATLAB, but as far as I remember MATLAB has a similar feature. I do, however, work quite a lot in RStudio, which has a similar feature for sure. I like Spyder very much and it is free.

PyCharm

The next IDE is PyCharm. I have not worked that much in it, and if you are not a student (or an academic, I think) you may have to pay for the professional version of the IDE (Spyder is FREE!). I got myself a student license for the professional version some weeks ago and I am starting to like it!

PyCharm is one of the most popular Python IDEs. It has so many features, for instance incredible code completion, code analysis, and code navigation. It also has very good Django, JavaScript, HTML, and CSS support and a great debugger, to name a few!

PyCharm provides smart code completion, code inspections, on-the-fly error highlighting and quick-fixes, along with automated code refactorings and rich navigation capabilities.
The interface of PyCharm can be customized. There are some dark themes (I need dark themes on my software) and you can make it look great. As far as I understand, when you execute your code (SHIFT+F10) your variables are not saved in the console. You can, however, select your code and right-click to choose "Execute selection in Console" (the console also seems to be IPython out of the box). I also like that it has good version control support (e.g., GitHub). That is great. I think I will continue using it and make it my number one IDE. I just need to find a way to have an object inspector or something similar. PyCharm supports plugins and there seem to be many (I use Markdown a lot, so I installed a Markdown plugin, for instance). Maybe I will update this section when I have played around more with PyCharm.

Rodeo

I have not used Rodeo that much but I am going to mention it. Why? Well, I really like RStudio, and Rodeo is basically RStudio for Python.
Rodeo is an installable app that runs natively (as a standalone application) on your desktop. It’s built with Electron, a cross-platform framework for building desktop apps with good ‘ole javascript and HTML.

Rodeo is lightweight. That is, there are no fancy features such as those in Spyder or PyCharm. The interface is nice and clean. You have a script part, an IPython console, an Environment (the object inspector in Spyder), and areas for plots, directories, and so on. Rodeo may be for RStudio users, but if you are looking for a lot of features such as those in RStudio, go for Spyder or PyCharm. Give it a try!

To summarize, I would like something that is a mix between Spyder and PyCharm. I like these two IDEs and I think you can use either one of them. If you are coming from MATLAB, Spyder may feel more familiar and the learning curve will be less steep compared to PyCharm. I think, however, that in the end PyCharm is the most powerful one.

Monday, 25 April 2016

Five excellent Python video lectures!

In this post you will find 5 great videos containing lectures on how to carry out data analysis using Python. These seem to be part of a course called "Programming for Psychology in Python". It seems pretty awesome.

The array data type

The first video covers NumPy and how to use it to create and handle arrays.

Creating figures

I think the heading is quite self-explanatory; the second video covers the creation of figures. It uses a library called Veusz, which I had never heard of. Will test it later!

Descriptive statistics   

Third, we get to know how to do descriptive stats using Python. Also more figures... But very interestingly a bootstrapping method in Python for getting confidence intervals! Great!
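Bootstrapping a confidence interval is easy to sketch in a few lines. The snippet below is my own minimal illustration and not the code from the lecture: it uses the simple percentile method, only the standard library, and made-up scores, and the function name bootstrap_ci is just something I chose.

```python
import random
import statistics


def bootstrap_ci(data, n_resamples=5000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    # Resample the data with replacement many times and store each mean.
    means = sorted(
        statistics.mean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    )
    # Cut off alpha/2 of the resampled means in each tail.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical test scores.
scores = [12, 15, 9, 14, 11, 18, 13, 10, 16, 12]

low, high = bootstrap_ci(scores)
print(f"mean = {statistics.mean(scores):.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

With real data you would probably want more resamples and perhaps a bias-corrected interval, but the idea is the same.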

Inferential statistics

Fourth, we get to know how to do t-tests, and also a simulation-based approach for understanding false positives and multiple comparisons. Cool! Correlations and scatterplots are covered too.
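The multiple comparisons problem can be simulated without any real data. The sketch below is my own illustration (not the lecture's code) and rests on one assumption: when the null hypothesis is true and the tests are independent, each p-value is uniformly distributed between 0 and 1. The function name familywise_error_rate is hypothetical.

```python
import random


def familywise_error_rate(n_tests, alpha=0.05, n_experiments=20000, seed=1):
    """Share of simulated experiments with at least one false positive."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Assumption: under a true null, p-values are uniform on [0, 1].
        p_values = [rng.random() for _ in range(n_tests)]
        if min(p_values) < alpha:
            hits += 1
    return hits / n_experiments


for m in (1, 5, 20):
    simulated = familywise_error_rate(m)
    expected = 1 - (1 - 0.05) ** m  # analytic rate for independent tests
    print(f"{m:2d} tests: simulated = {simulated:.3f}, expected = {expected:.3f}")
```

The more tests you run without correction, the more likely it is that at least one of them comes out "significant" by chance alone.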

Power Analysis

Again, self-explaining title. We get to learn how to simulate data to calculate power. Yes! We want to determine sample size! 
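Here is a rough sketch of what simulating power can look like. It is my own simplified version and not the code from the lecture: I assume two independent groups with normally distributed scores and a known standard deviation of 1, so a z-test can be used and only the standard library is needed. The function name simulated_power and all the numbers are made up.

```python
import math
import random
import statistics
from statistics import NormalDist


def simulated_power(n_per_group, effect_size, alpha=0.05, n_sims=2000, seed=1):
    """Estimate power of a two-sided, two-group z-test by simulation."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the mean difference (SD assumed known and equal to 1).
    se = math.sqrt(2 / n_per_group)
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        treatment = [rng.gauss(effect_size, 1) for _ in range(n_per_group)]
        z = (statistics.fmean(treatment) - statistics.fmean(control)) / se
        if abs(z) > z_crit:
            rejections += 1
    # Power is the share of simulated experiments that reject the null.
    return rejections / n_sims


for n in (20, 50, 100):
    print(f"n = {n:3d} per group: power = {simulated_power(n, effect_size=0.5):.2f}")
```

To determine a sample size, you would increase n until the estimated power reaches the level you are after (0.80 is a common choice).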

All of the above videos have a homepage where you can read more and see the Python code. There are also some exercises. Great if you want to learn stuff!

Monday, 18 April 2016

The aim of this guide is both to give an introduction to the Python programming language and to motivate its use in research in the field of cognitive science. Over the past 10 years, we have seen a rapid development of scientific and numerical libraries in Python. In fact, Python can now easily be used as a scientific and numerical computing environment and is a contender to proprietary products such as MATLAB and Mathematica. The goal of this guide is to put forward the areas of application and to highlight the advantages and appeals of using Python as the number one programming language in cognitive science. Given the generality of the tools being discussed, it is hoped that this guide will have widespread appeal and relevance. That is, researchers in other fields may also find this guide informative.


Python can, as previously mentioned, be a strong contender to its alternatives (e.g., MATLAB). MATLAB has long been one of the favourite programming environments in cognitive science. Striking points of similarity between Python and MATLAB are that both offer an interactive array-processing and visualization environment using high-level dynamic programming languages. Both are intended for rapid prototyping and development. Both allow seamless extension using external modules written in compiled languages like C/C++ and Fortran.

Python, however, has the advantage of being a general-purpose language whose application goes far beyond numerical array processing. Python is one of the top five programming languages currently in use throughout the world. Python is a well-designed object-oriented language whose standard library is vast and extensive. Furthermore, Python is free open-source software distributed under an unrestrictive software license. Similarly, its large set of third-party modules and libraries are also, typically, released under open-source software licenses.

Numerical and Scientific Python


The core Python language as presented in the previous section lacks n-dimensional numerical arrays and the ability to easily plot and visualize data. These capabilities, in addition to a large number of special-purpose scientific libraries, are provided by the SciPy/NumPy suite of modules. These libraries are seamlessly integrated with Jupyter to make a rich interactive array-processing and visualization environment, comparable in functionality to MATLAB and Mathematica.

Jupyter has a lot of really nice features: interactive high-performance parallel computing for clusters and multicore machines, a web-based interactive notebook comparable to that used in Mathematica, SQL-based searchable command histories, inline graphics, and symbolic mathematics with TeX-based output. Markdown can be used to create reports.

Computer-based Experiments

Computer-based psychology and psychophysics experiments are now almost ubiquitous in cognitive science. While these tasks have traditionally been handled by GUI-based programs like E-Prime, Presentation, and SuperLab, these programs do not allow for the flexibility and control that is often required by researchers. While programming environments like MATLAB are being used as alternatives to GUI-based programs, MATLAB's special-purpose nature is not well suited to the non-numerical programming necessary for experimental stimuli presentation and recording. Because of the generality of its language, Python has a broad pool of libraries for creating graphical interfaces (e.g., wxPython, PyGTK, PyQt) and computer game libraries (Pygame, pyglet), and so it allows for considerable flexibility and sophistication in the design of experiment software. At present, there are at least five Python-based stimulus presentation programs: PsychoPy, OpenSesame, Expyriment, Vision Egg, and PyEPL. Note that both PsychoPy and OpenSesame offer GUI-based builders.

To conclude this post, Python can be used for many things. It is a general-purpose language so you can, basically, do whatever you want. Although R may be more common when it comes to statistics, you can, of course, also analyze your data with Python. My last post covers some Jupyter notebooks that teach you analysis using Python: http://pythondataanalysis.blogspot.se/2016/04/great-resources-for-learning-how-to.html.
I will, however, return with more Python and data analysis-related stuff. Later!


Sunday, 17 April 2016

Great resources for learning how to program in Python

In this post I am going to list a couple of great resources for learning how to write code in Python. I start with a couple of IPython notebooks. If you are not familiar with IPython notebooks (called Jupyter these days):
"Notebook documents (or “notebooks”, all lower case) are documents produced by the Jupyter Notebook App which contain both computer code (e.g. python) and rich text elements (paragraph, equations, figures, links, etc...)."
  • Poll aggregation, web scraping, plotting, model evaluation, and forecasting (Homework 1) (solutions)
I follow these up with more notebooks, which are more about general Python learning.
I end this post with this 3+ hour video on data analysis with Pandas. This is a tutorial that introduces you to manipulating and analyzing large and small structured data sets. Hope these resources are enough for now! I may update the list when I find more!
The first Jupyter/IPython notebooks come from a Harvard course.

It contains homework assignments and solutions to them. By doing the homework you will be guided through a number of data analysis, mining, scraping, and manipulation problems with Python and the IPython/Jupyter notebook!

Monday, 7 March 2016

Introducing Python Data Analysis for Cognitive Science

Hey,
In this short post I just want to introduce myself. My name is Fredrik and I enjoy programming in Python. I have a background in cognitive science and graduated with an MSc in Cognitive Science. Nowadays I am mainly doing statistical computing using Python (sometimes in R). This blog will mainly be about my interests in cognitive science and in programming data analysis. Python tutorials, link collections and other fun stuff!