Get a Full RSS Feed - Trick
Tuesday, March 29, 2011
I noticed there are many people who are clueless about Yahoo Pipes or can't figure it out. First off, let me explain what Pipes is: "Pipes is a powerful composition tool to aggregate, manipulate, and mash-up content from around the web." Some people use Pipes to take several RSS feeds and combine them into one feed for use on their website. Others use Pipes to take an RSS feed that only displays a summary and convert it into a feed that displays the entire article or news story. Or you can use both techniques to combine multiple summary feeds into a "content super feed" (I'm trademarking that LOL). In this tutorial, I will explain the second technique.
Step 1
Sign up if you haven't already: http://pipes.yahoo.com
Step 2
Click the "Create a pipe" button on the main page.
Step 3
From the left menu, drag the "Fetch Feed" module onto the grid. Now enter the RSS feed you want to convert into the "URL" text box of this module. Use the XML version of your feed. For this example, I will use a health feed from the Washington Post:
Code:
feed://feeds.washingtonpost.com/wp-dyn/rss/health/index_xml
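The purpose of this module is to tell Pipes which feed we are going to be working with. Simple! (The feed:// prefix is how some browsers display feed addresses; if Pipes won't accept it, try the same address starting with http:// instead.)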
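If you want to see what "Fetch Feed" is doing behind the scenes, here is a rough Python sketch. It assumes a plain RSS 2.0 feed (stories inside <item> elements) and parses it from a string; the sample feed and the example.com links are made up, and Atom feeds would need different element names.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Turn an RSS 2.0 document into a list of item dicts (title, link, description)."""
    root = ET.fromstring(xml_text)
    items = []
    # RSS 2.0 puts each story in an <item> element inside <channel>
    for node in root.iter("item"):
        items.append({
            "title": node.findtext("title", default=""),
            "link": node.findtext("link", default=""),
            "description": node.findtext("description", default=""),
        })
    return items


sample = """<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>Story one</title><link>http://example.com/1</link>
<description>Short summary...</description></item>
</channel></rss>"""

print(parse_rss_items(sample))
```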
Step 4
Drag the "Loop" module onto the grid. The "Loop" module is located in the "Operators" submenu of the left menu; click the little arrow to the left of "Operators" to display the submenu. Once the module is on the grid, you will notice there is another grid inside this "Loop" module. We will drag another module there in the next step.
The purpose of this module is to loop through each RSS item from the feed we specified in the previous module. For example, if our feed has 100 stories in it, this module will go through each story, one at a time, and do whatever we tell it to do.
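In code terms, the "Loop" module is just a for loop. This hypothetical Python sketch runs a sub-module (here a made-up shout_title function standing in for whatever you drop into the Loop grid) once per item, one at a time, and collects the results:

```python
def loop(items, sub_module):
    """Run sub_module once per item, one at a time, like the Loop module."""
    results = []
    for item in items:
        # Work on a copy so the original feed items are left untouched
        results.append(sub_module(dict(item)))
    return results


def shout_title(item):
    # Hypothetical sub-module used only for this demo
    item["title"] = item["title"].upper()
    return item


stories = [{"title": "story one"}, {"title": "story two"}]
print(loop(stories, shout_title))
# [{'title': 'STORY ONE'}, {'title': 'STORY TWO'}]
```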
Step 5
Drag the "Fetch Page" module into the grid inside the "Loop" module from the previous step. When you are hovering in the right spot, you should see a red box outlining the "Loop" module's grid. Now open the dropdown next to "URL" and select item.link, or type item.link in exactly.
This module fetches the source page that each RSS item links to and cuts the full story out of it. Since we placed this module inside the "Loop" module, it runs once for each RSS item (story) and grabs the full story from that story's source page.
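Here is a sketch of what "Fetch Page" inside "Loop" amounts to: download the page at each item's link (item.link in Pipes). The fetch function is passed in so you can swap in a fake one for testing; the "page" key, the 30-second timeout, and the UTF-8 assumption are my own choices for this sketch, not Pipes settings.

```python
from urllib.request import urlopen


def fetch_page(url):
    """Download a page and return its HTML as text (assumes UTF-8)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def attach_source_pages(items, fetch=fetch_page):
    """For each item, fetch the page at item['link'] and store it under 'page'."""
    for item in items:
        # "page" is a hypothetical field name used only in this sketch
        item["page"] = fetch(item["link"])
    return items
```

In a test you can pass something like `fetch=lambda url: "<html>...</html>"` so nothing touches the network.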
Step 6
Drag the "Regex" module onto the main grid. This module is located in the "Operators" submenu.
The purpose of this module is to manipulate the story, link, or even title. This module is optional. Some examples I've used: stripping all links from a story, removing the season and episode number from the title on a Hulu feed, or changing every instance of the word Blackhat to Whitehat. There are many regex examples out there on the web.
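To show what a Regex rule does, here is a small Python sketch using re.sub. The regex_module helper and the sample item are hypothetical, and the "(S2 E5)" title format is only an assumed example, not necessarily what Hulu's feed actually uses.

```python
import re


def regex_module(item, field, pattern, replacement):
    """Apply one 'replace pattern with replacement' rule to one field of an item."""
    item[field] = re.sub(pattern, replacement, item[field])
    return item


item = {"title": "Some Show: Pilot (S2 E5)", "description": "Blackhat tips"}
# Change every instance of the word Blackhat to Whitehat
regex_module(item, "description", r"\bBlackhat\b", "Whitehat")
# Remove an assumed "(S2 E5)"-style season/episode tag from the title
regex_module(item, "title", r"\s*\(S\d+\s*E\d+\)", "")
print(item)
# {'title': 'Some Show: Pilot', 'description': 'Whitehat tips'}
```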
Step 7
Connect these all up. Right now the modules are separate, so no data flows from one module to the next. Data enters at the top of a module, passes through it, and exits at the bottom. So click and hold the little circle at the bottom of the first module ("Fetch Feed"), drag it to the top circle of the "Loop" module, and release. You should see a connection, or pipe, between the two modules. Now connect the bottom of the "Loop" module to the top of the "Regex" module. Finally, connect the bottom of the "Regex" module to the top of the "Pipe Output" module.
Now that the design is set up, you should test your connections. If you click the "Pipe Output" module on the grid, it should turn orange, and the pane at the bottom of the page should start generating the feed. When it finishes, you should see a list of your RSS stories in the bottom pane. If not, or if there is an error, click the Refresh button in the bottom pane a few times. If you still do not see any items, then you either forgot to enter your feed URL in Step 3 or you messed up your connections in Step 7. Try again, or try another feed to test. Once you see your items in the bottom pane, you may continue.
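If the wiring feels abstract, think of it as function composition: each module's output becomes the next module's input, top to bottom. Here is a hypothetical Python sketch where each stage is a made-up stand-in function for one of the modules above:

```python
from functools import reduce


def run_pipe(source, stages):
    """Feed the source's output through each stage in order, top to bottom."""
    return reduce(lambda data, stage: stage(data), stages, source())


def fetch_feed():
    # Stand-in for "Fetch Feed": hypothetical hard-coded items
    return [{"title": "Story  one", "description": "summary"}]


def loop_stage(items):
    # Stand-in for "Loop" + "Fetch Page": pretend we found the full story
    return [dict(item, description="full story") for item in items]


def regex_stage(items):
    # Stand-in for "Regex": collapse repeated spaces in each title
    return [dict(item, title=" ".join(item["title"].split())) for item in items]


output = run_pipe(fetch_feed, [loop_stage, regex_stage])
print(output)  # roughly what "Pipe Output" shows in the bottom pane
# [{'title': 'Story one', 'description': 'full story'}]
```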
Step 8
We need to find markers on the source pages that let us pull out the full story. Open your original RSS feed in a new browser window or tab. Click the first story so you are on the source website reading the original story. Now view the source of that webpage in your browser. We need to find the story in this source, plus a unique marker just before and just after the story. Here's a snippet of an article from the feed:
Code:
<!-- End New Comments Box: Common -->
<div class="sidebarhack"><b></b></div>
<div class="sidebar">
<div class="seo-header"><div style="float:left;padding-left:7px;">Who's Blogging</div><div style="float:right;padding-right:5px;"><a href="http://www.sphere.com/" style="padding:0;"><img src="http://media3.washingtonpost.com/wp-srv/images/logo_sphere_powered101x13.gif" border="0" width="101" height="13"/></a></div><div style="clear:both;"></div></div>
<div class="sidebarcontent">
» <a class="iconsphere" title="Related Blogs & Articles" onclick="return Sphere.Widget.search();" href="http://www.sphere.com/search?q=sphereit:http://www.washingtonpost.com/wp-dyn/content/article/2010/01/04/AR2010010402752.html" rel="nofollow">Links to this article</a>
</div>
</div>
</div>
<div id="ad_links_inner" style="display:none"><script type="text/javascript" src="http://media.washingtonpost.com/wp-srv/ad/quigo/article_inner.js"></script></div>
</td></tr></table>
<FONT SIZE="2">
<div id="byline">By <a href="http://projects.washingtonpost.com/staff/articles/rachel+saslow/" title="Send an e-mail to Rachel Saslow">Rachel Saslow</a></div>
Washington Post Staff Writer
<br/>
Tuesday, January 5, 2010
</FONT><P>
</div>
<div id="article_body" style="padding-left:10px;">
<span id="aptureStartContent"></span>
<p>
Scientists may have created a vaccine against cocaine addiction: a series of shots that changes the body's chemistry so that the drug can't enter the brain and provide a high.
</p>
<div id="body_after_content_column">
<p>
The vaccine, called TA-CD, shows promise but could also be dangerous; some of the addicts participating in a study of the vaccine started doing massive amounts of cocaine in hopes of overcoming its effects, according to Thomas R. Kosten, the lead researcher on the study, which was published in the Archives of General Psychiatry in October.
</p>
<p>
"After the vaccine, doing cocaine was a very disappointing experience for them," said Kosten, a professor of psychiatry and neuroscience at Baylor College of Medicine in Houston.
</p>
<p>
Nobody overdosed, but some of them had 10 times more cocaine coursing through their systems than researchers had encountered before, according to Kosten. He said some of the addicts reported to researchers that they had gone broke buying cocaine from multiple drug dealers, hoping to find a variety that would get them high.
</p>
<p>
Of the 115 addicts in the study, 58 were given the vaccine, administered in a series of five shots over 12 weeks, while 57 received placebo injections. Six people dropped out before the end of the study. The researchers recruited the participants from a methadone-treatment program in West Haven, Conn., which made it possible to track them for the full 24 weeks of the study. The patients were addicted to cocaine and heroin; TA-CD is designed to work only on cocaine, including the crack form of the drug.
</p>
<p>
Like disease vaccines, TA-CD stimulates a person's immune system to produce antibodies. Of those who received all five vaccine injections, 38 percent reached antibody levels that were high enough to dull the effects of the drug. The antibodies stayed active for eight to 10 weeks after the last shot.
</p>
<p>
In the high-antibodies group, 53 percent stayed off cocaine more than half the time once they had built up immunity. That compares with 23 percent of those who produced fewer antibodies. The researchers monitored cocaine use through regular urinalysis.
</p>
<p>
"In this study, immunization did not achieve complete abstinence from cocaine use," Kosten said. "Previous research has shown, however, that a reduction in use is associated with a significant improvement in cocaine abusers' social functioning and thus is therapeutically meaningful."
</p>
<p>
About a quarter of those who received the vaccine did not make sufficient antibodies at all; Kosten isn't sure why.
</p>
<p>
"That's the million-dollar question," said Margaret Haney, a professor of clinical neuroscience at Columbia University Medical Center, who is also researching the cocaine vaccine though she was not involved in Kosten's study.
</p>
<p>
In October, the journal Biological Psychiatry published online an article by Haney that also tested the effects of TA-CD.
</p>
<p>
Through newspaper ads, Haney had recruited 15 cocaine-dependent men to participate in her study. (Only 10 stayed to the end.)
</p>
</div>
<span id="aptureEndContent"></span>
Near the beginning of the story, you should see the text "<span id="aptureStartContent"></span>". This text is unique: there is only one instance of it in the page source, AND it appears on every news story in this feed. This will be our beginning marker. Hey, look at the end: the Washington Post is handing us their shit on a silver platter with "<span id="aptureEndContent"></span>". This will be our end marker. Now back to Yahoo Pipes.
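Before trusting a pair of markers, it is worth checking them against a few stories from the feed. This Python sketch (the helper names are made up) checks that each marker appears exactly once per page and that the start marker comes before the end marker; the sample page is a trimmed-down stand-in, not a real Washington Post page.

```python
def is_good_marker(page_source, marker):
    """A usable marker appears exactly once in the page source."""
    return page_source.count(marker) == 1


def markers_usable(pages, start, end):
    """Check the start/end markers on several story pages from the same feed."""
    for html in pages:
        if not (is_good_marker(html, start) and is_good_marker(html, end)):
            return False
        # The start marker must come before the end marker
        if html.index(start) > html.index(end):
            return False
    return True


START = '<span id="aptureStartContent"></span>'
END = '<span id="aptureEndContent"></span>'
page = "<div>" + START + "<p>Story text</p>" + END + "</div>"
print(markers_usable([page], START, END))  # True
```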
Step 9
Within the "Fetch Page" module, you will see an area that says "Cut content from:". The first box is for the beginning marker (<span id="aptureStartContent"></span>) and the box to the right of it is for the end marker (<span id="aptureEndContent"></span>).
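Conceptually, "Cut content from" keeps whatever sits between the two markers. This Python sketch does the same with str.find; it leaves the markers themselves out of the result and returns None if either one is missing. I'm not certain Pipes behaves identically in those edge cases, so treat this as an approximation.

```python
def cut_content(html, start_marker, end_marker):
    """Return the text between start_marker and the next end_marker, or None."""
    start = html.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)          # skip past the start marker itself
    end = html.find(end_marker, start)  # only look for the end marker after it
    if end == -1:
        return None
    return html[start:end]


START = '<span id="aptureStartContent"></span>'
END = '<span id="aptureEndContent"></span>'
page = "header junk " + START + "<p>Full story here.</p>" + END + " footer junk"
print(cut_content(page, START, END))  # <p>Full story here.</p>
```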
Step 10
Within the "Fetch Page" module, make sure "assign" is selected and NOT "emit", and that the dropdown says "first" and NOT "all". To the right of that, change the "results to" dropdown to item.description. This is what swaps the summary in your original RSS feed for the full content.
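To make the assign/emit and first/all choices concrete, here is a hypothetical Python sketch. It assumes the sub-module returns a list of results for each item. "emit" outputs those results on their own (dropping the item's title and link), while "assign" tucks them into the original item under item.description. Keeping the old summary when nothing was found is my own choice for this sketch, not documented Pipes behavior.

```python
def loop_with_options(items, sub_module, mode="assign", which="first",
                      target="description"):
    """Rough model of the assign/emit and first/all options."""
    output = []
    for item in items:
        results = sub_module(item)  # assumed to return a list
        if mode == "emit":
            # emit: the results themselves become the output items
            output.extend(results)
            continue
        new_item = dict(item)  # copy so the input feed is not modified
        if which == "first":
            # assign first: keep the old value if the sub-module found nothing
            new_item[target] = results[0] if results else item.get(target)
        else:
            # assign all: store the whole list of results
            new_item[target] = results
        output.append(new_item)
    return output


def cut(item):
    # Hypothetical sub-module that "found" two chunks on the page
    return ["full story", "extra chunk"]


feed = [{"title": "A", "description": "summary"}]
print(loop_with_options(feed, cut))
# [{'title': 'A', 'description': 'full story'}]
```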
Step 11
Almost there! This is an optional step; if you are happy with your output, skip it. But you MAY want to strip out links that have been inserted into your story, such as ads or reference links. You don't want these on your blog, do you? Within the "Regex" module, add a rule by clicking the plus sign in the module. Select item.description.content in the first box; this is the item we are editing. Paste the following into the "replace" box and leave the "with" box empty:
Code:
</?a\b[^>]*>
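This is meant to match both opening <a ...> tags and closing </a> tags, so the links disappear but their text stays. Don't ask how to read regex, because that's a whole tutorial on its own.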
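You can try the idea behind this rule (remove <a ...> and </a> tags but keep the link text) outside Pipes with Python's re module. This sketch uses the pattern </?a\b[^>]*>; the \b stops it from eating tags that merely start with "a", such as <abbr>. The sample HTML is made up.

```python
import re

# Matches opening <a ...> tags and closing </a> tags, case-insensitively
LINK_TAG = re.compile(r"</?a\b[^>]*>", re.IGNORECASE)


def strip_links(html):
    """Remove link tags but keep the text that was inside them."""
    return LINK_TAG.sub("", html)


sample = 'Read <a href="http://example.com/ad">this offer</a> now. <abbr>OK</abbr>'
print(strip_links(sample))
# Read this offer now. <abbr>OK</abbr>
```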
Step 12
Save your Pipe by clicking the Save button at the top right of the window, and give it a name.
Step 13
Let's get the NEW and IMPROVED feed URL. Click "Run Pipe..." at the top of the page. Now click the "Get as RSS" link and you should see your new RSS feed. Copy that URL into your favorite autoblogging plugin and you will now be ripping full news stories instead of excerpts.
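Pipes builds the RSS for you when you click "Get as RSS", but for the curious, here is a minimal Python sketch of turning items back into an RSS 2.0 document. The channel title and example.com link are placeholders. ElementTree escapes the story's HTML (&lt;p&gt; and so on), which is the usual way full-text HTML is carried inside an RSS description.

```python
import xml.etree.ElementTree as ET


def items_to_rss(items, title="My full-text feed", link="http://example.com/"):
    """Serialize item dicts back into a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for item in items:
        node = ET.SubElement(channel, "item")
        for field in ("title", "link", "description"):
            # ElementTree escapes <, > and & so the HTML story survives as text
            ET.SubElement(node, field).text = item.get(field, "")
    return ET.tostring(rss, encoding="unicode")


print(items_to_rss([{"title": "Story one",
                     "link": "http://example.com/1",
                     "description": "<p>Full story</p>"}]))
```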