As you know, there have been a lot of contradictory statements from Google about how Penguin and Panda are run: is it real time, is it manual, how long does it take, when can you recover, and so on.
In Friday's Google Hangout on Google+, John Mueller answered a question that may shed some light on why the answers Google has been giving sometimes seem contradictory. Let me share the exact transcription of the question and answer, which starts at 25:22:
Question:
How is the links disavow file submitted to Webmaster Tools being handled? Processed immediately, soon after or just on the next Penguin update?
Answer:
I guess the correct answer will be yes to all these variations. So first of all we have to process the file immediately when you upload, so we do that to double check that it doesn't have any glaring technical issues. That's something we have to process immediately.
Then when we actually kind of use those links in the disavow file is when we crawl those pages that are linking to your site. So we'll crawl a page that is linking to your site, we will see that there's a link to your site there, but then we will check the disavow file and say, oh well, this website actually doesn't want that specific link counted, and then we will take that link out.
So that kind of happens ongoing as we crawl the rest of the web and this is something we will see the effects in various algorithms that do look at links. So that's not only Penguin.
In Penguin however, it's a web spam algorithm, it does also look at the links as well and if you're specifically kinda working on a problem that you see or that you suspect is related to Penguin then you do have to wait for that update as well.
So first you have to upload the disavow file and then we have to go and crawl those pages which might take a couple months depending on the pages and what we have to crawl there and then the Penguin algorithm has to do that.
So depending on which part you're looking at, those questions can be yes or no.
As you can see, Google's John Mueller is basically saying that parts are real time and parts are not. But with Penguin (and also currently with Panda), for you to recover, Google does need to run something manually, and you need to wait for a refresh. The other aspects around that (the disavow file, your content updates, structural changes, how long it takes Google to pick up on those changes via its crawl, and so on) are factors not just for Penguin or Panda but for many real-time algorithms.
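Since the first step John describes is an immediate check for "glaring technical issues," it may help to run a similar sanity check on your own file before uploading it. Below is a minimal Python sketch of that idea. It is not Google's actual validation, which John does not describe; it only assumes the published disavow file format (one URL or domain: entry per line, with # for comments). The function name check_disavow_lines and the sample entries are made up for illustration.

```python
from urllib.parse import urlparse


def check_disavow_lines(text):
    """Return (entries, problems) for the contents of a disavow file.

    Hypothetical helper: it only checks the basic line format and does
    not reproduce whatever checks Google runs on upload.
    """
    entries, problems = [], []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no entries
        if line.startswith("domain:"):
            domain = line[len("domain:"):].strip()
            # A domain entry should be a bare host name, not a path.
            if domain and "/" not in domain and " " not in domain:
                entries.append(("domain", domain))
            else:
                problems.append((number, raw))
            continue
        parsed = urlparse(line)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            entries.append(("url", line))
        else:
            problems.append((number, raw))
    return entries, problems


# Made-up sample file with one deliberately broken line.
sample = """# spammy directory, owner did not respond
domain:spam.example
http://links.example/page.html
not a url
"""
entries, problems = check_disavow_lines(sample)
print(entries)   # lines that look like valid entries
print(problems)  # (line number, text) pairs worth fixing before upload
```

Passing this kind of check only covers the upload step. As John explains, the links are actually discounted later, as the linking pages get recrawled, and a Penguin-related recovery still has to wait for the next Penguin update.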
Here is the video:
Forum discussion at Google+.