Thursday, June 18, 2009 at 11:27 PM
Webmaster Level: All
We just added external resource loading to our Flash indexing capabilities. This means that when a SWF file loads content from some other file—whether it's text, HTML, XML, another SWF, etc.—we can index this external content too, and associate it with the parent SWF file and any documents that embed it.
This new capability improves search quality by allowing relevant content contained in external resources to appear in response to users' queries. For example, this result currently comes up in response to the query [2002 VW Transporter 888]:

Prior to this launch, this result did not appear, because all of the relevant content is contained in an XML file loaded by a SWF file.
To date, when Google encounters SWF files on the web, we can:
- Index textual content displayed as a user interacts with the file. We click buttons and enter input, just like a user would.
- Discover links within Flash files.
- Load external resources and associate the content with the parent file.
- Support common JavaScript techniques for embedding Flash, such as SWFObject and SWFObject2.
- Index sites scripted with AS1 and AS2, even if the ActionScript is obfuscated. Update on June 19, 2009: We index sites with AS3 as well. The ActionScript version isn't particularly relevant in our indexing process, so we support older versions of AS in addition to the latest.


40 comments:
Great !
Thanks for sharing this valuable piece of information.
That's great news!!
Hi and great !
Could you send us an example of robots.txt text to block links?
Stephane.
I notice the clickable links in flash videos such as YouTube embeds don't get crawled. Will this ever be changed? It seems like if someone feels a video is important enough to post the link to the source of that video would be completely relevant and should count as a link.
How does it work with dynamically loaded XML assets? i.e. Flash loads config.xml file (of which the URL is built into SWF movie) then depending on user interaction it dynamically loads other XML assets whose URL paths are stored in config.xml file.
Would they be indexed as well?
That's good news. I was not using Flash on my cricket website, http://www.thecricfanclub.com, for this reason.
But I can use it now :)
I think this is huge for flash.
Great news!
But what about ActionScript 3?
Is it not correctly interpreted and indexed yet?
Hmmm, will the indexing result in the external file being the search result or the swf that is using the external file?
Good question from Tomek about indexing db connection XML or other undisclosed information.
Cheers, 1CallService.com
"Index sites scripted with AS1 and AS2"-So what about AS3?
Does this work with all versions of Flash or just more recent ones?
Not much use if AS3 is not supported!
An example of a robots.txt file blocking some .swf and Flash related assets would be as follows:
User-Agent: *
Disallow: /website.swf
Disallow: /data/myData.xml*
Allow: /
In fact, I had to use a similar one a couple of months ago when my XML files gained more PageRank than my own Flash site.
On the other hand, I didn't like users accessing the SWF files directly. My SWF files need the embedding parameters of SWFObject in order to work properly. So nobody should go straight to the .swf, or she would think that it's a broken site.
That's why filtering swfs and external data files in robots.txt becomes so important.
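A rule set like the one above can be sanity-checked locally before deploying it. The following is a minimal sketch using Python's standard `urllib.robotparser`; the host `example.com` and the `build_parser` helper are assumptions for illustration. The trailing `*` from the XML rule is dropped here because this parser does plain prefix matching and does not expand wildcards, so its verdicts may differ from how a real crawler reads the original file.

```python
from urllib.robotparser import RobotFileParser

# Rules modeled on the comment's example. The trailing "*" is omitted:
# urllib.robotparser matches path prefixes and does not expand wildcards.
ROBOTS_TXT = """\
User-agent: *
Disallow: /website.swf
Disallow: /data/myData.xml
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text (hypothetical helper, not part of any crawler)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# example.com is a placeholder host used only for this check.
for path in ("/website.swf", "/data/myData.xml", "/index.html"):
    url = "http://example.com" + path
    print(path, "allowed" if parser.can_fetch("Googlebot", url) else "blocked")
```

This only tells you how one parser reads the rules; it says nothing about whether content that is already indexed will be dropped.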
Finally! Thanks for the feedback.
Thanks for the example E.S.V - this is good news! My flash site http://benrandall.co.uk/bridging.aspx might actually start appearing in search results now!!
Google is already indexing AS3 content using the Ichabod player. I think the article was implying that it is also indexing AS1 and AS2 content.
This is great news, as all the websites I build use an external preloader to load the main content. Now that content can be indexed.
In general this is a positive step for Adobe and Google, but it's still a long way from having a Flash RIA be as penetrable by Google's crawler as HTML, CSS and JavaScript are. We at Socrata previously built our social data discovery application entirely in Flex/Flash, but in part due to the ongoing SEO challenges we rebuilt the site in HTML, CSS and JavaScript. Socrata is a site for people to find public data and datasets, so it was important that the underlying content be indexed by Google and other search engines. Since relaunching the site the SEO improvement has been substantial and was almost immediate.
Does it index deep content? Let's say a user clicks on a Flash link/button which in turn loads some external XML content. Does that get indexed?
Also, what happens if my external XML content contains a lot of content that is not meant for user display? Would it get indexed?
@Tomek: Yes, the XML assets in your example could be indexed as well.
This is great! Thank you! :D
Great news. Now the matter is to hide unwanted contents from google...
I'm interested to know more about how it indexes URLs found in an XML file. For example, if an XML file contained URLs for external images, would the paths have to be absolute? Or does the indexing have the capability to follow relative paths using the parent document as the root?
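The post does not say which base the indexer uses for relative paths, so the following is only a sketch of why the question matters. Using Python's standard `urllib.parse.urljoin`, it shows the same relative path resolving to different URLs depending on the base document, while an absolute URL is unaffected. All URLs and the `resolve` helper are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical locations, for illustration only.
PAGE_URL = "http://example.com/gallery/index.html"    # HTML page embedding the SWF
SWF_URL = "http://example.com/assets/flash/site.swf"  # the SWF file itself
XML_URL = "http://example.com/assets/data/images.xml" # XML loaded by the SWF


def resolve(base: str, ref: str) -> str:
    """Resolve a reference against a base URL using standard URL joining."""
    return urljoin(base, ref)


relative = "img/photo1.jpg"
absolute = "http://example.com/img/photo1.jpg"

# The relative path lands somewhere different for each candidate base.
for name, base in (("page", PAGE_URL), ("swf", SWF_URL), ("xml", XML_URL)):
    print(name, "->", resolve(base, relative))

# The absolute URL resolves to itself regardless of the base.
print("absolute ->", resolve(PAGE_URL, absolute))
```

Absolute URLs in the XML sidestep the ambiguity entirely, whichever base a given crawler or player picks.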
Yes, the robots are crawling as3 now - I can verify that. Here's what I've experienced with Google indexing our flash stuff:
1. don't break apart your text - but it doesn't seem to matter if it's HTML text format or not.
2. keep all your "accessible" text in Movie clips - NOT graphic clips.
3. Google seems to have reduced some of the significance of flash text on home pages - previously Google was returning search results for the content of our swf file - even though that content was minimal on the page, and not directly related to the rest of the site content (client testimonials). Looks like that's been fixed.
4. I would ALWAYS make external links within a flash file absolute, including the http:// - unless you're pulling data back, like an mp3 file, or image. Those I would block with robots.txt
5. I'm sure they're filtering out the config.xml files, since that's included in every swf. But I'll test that out to be sure.
6. links in videos probably won't be crawled if it's in .flv format, like youtube. However, if you have a .swf loading flv's into it - any links in the swf should be recognized. I'm assuming you can "nofollow" those as well.
7. I still have NO intentions on building sites entirely in flash - since you can go ahead and say goodbye to any mobile visitors, which happens to be the biggest growth right now.
8. still, this IS great.
You can prevent google from indexing sensitive strings in your AS code by encrypting the strings either manually, or by using secureSWF's literal strings encryption feature.
Disclosure: I work for Kindisoft.
Wow! I can't wait to see how it changes searches on my photography flash site. I'm so tired of being on page 3-15 on google search
http://www.BorkgrenPhoto.net
There is a site, g2000, that loads some content from XML data, but Google search doesn't seem to return the respective pages. E.g. for [g2000 ang mo kio hub], the following result was not given: http://www.g2000.com.sg/ss09/#/store/2/1...
Does that mean that there is a syntax / structure I would need to follow?
thanks for sharing
What about AMF content?
Will deep linking with SWFAddress be incorporated?
I knew it was coming but now I can tell my clients it's here, thanks!
OK, so Google has indexed text in our Flash photography .swf, which is neat, BUT the Google link to the indexed content only brings up the .swf file rather than loading the HTML file the .swf is embedded in. Is it just me, or is this a bad way to do it? Now the .swf file loads oversized in the browser, which means it scales up over 100%, making the JPEGs look awful. Is there something I can do to prevent this?
Do you make webservice calls and index the returning XML as well?
When you say "We click buttons and enter input, just like a user would," is that limited to Flash components? If I create my own drop down menu from scratch with AS and drawing objects, can your robots really access that?
And how do you decide what type of text gets entered? You guys haven't come up with some kind of AI that's eventually going to subjugate us, have you?
Really cute and all but what if you have a complete site in flash and it can "read" all the external files. What good does that do when one flash-file contains 80 pages and the content is put in an xml-file.
Does it then link to the flash-page? Because that does really no good for searches, then I still don't know which of the 80 pages contains the information I am looking for.
Or am I seeing this all wrong? Anyone who knows?
Great! Relevant! important! :)
Thanx!
Let's index it all.
Index flash as3 developers too ;)
dsaliberti
Nice one chiefs!
Let me ask this: this discussion is about indexing files within a SWF, and I'm looking for an answer to a similar question. If I embed my SWF content inside an XHTML file and add the site title and regular keywords to the header, wouldn't Google see it?
I'm also interested in AMF content. Is it indexed too?