UPDATE: Method below now correctly deals with contact db updates! -- JohnDudaAccount - 28 Jan 2006

This page briefly outlines how to set up pull syndication via RSS feeds in Mir.

This method works with Mir_1_1 from CVS; if your installation is not up to date, it is not guaranteed to work.

The first thing you need to do is establish a source of feeds. While it is possible to enter feeds manually into producers.xml, this is not the most sustainable solution. A better one is to store the feeds in Mir's database as articles. This means that admins can add or change feeds, and that feeds can be temporarily disabled or annotated with internal comments.

So go to "advanced functions" in the Mir admin interface and open the article type page. Add an article type "localfeaturefeed", which will hold the feeds. Now, in the grand tradition of retasking fields so prevalent in Mir, let's establish some semantics for the way a feed will be represented in the db:

title => name of the site where the feed comes from
source/location => the url of the feed 
web address => the url of the site where the feed comes from
is_published => is an active feed
abstract => rss version
author => "manual" if we want to override whatever comes out of the contact db, blank otherwise

You can also repurpose other fields to store things like latitude and longitude, but that would make the example unnecessarily complicated.

Now we can go ahead and just add feeds manually to the db, but let's say we want to pull stories from (a subset of) the indymedia network. We can populate the feed list automatically from the contact db:

Add the following to producers.xml. Note the If that selects only IMCs in the United States; if you don't want this filter, take out that If:
 <producer name="pullfeedoffeeds" >
    <verbs>
      <verb name="doit" />
    </verbs>
    <body>
     <Log message="  Checking feeds from contact.indymedia.org" type="info" />
      <RSS key="feeditems" url="http://contact.indymedia.org/feature_feeds_rss.php"  />
        <Enumerate key="item" list="feeditems['rss:item']" >
       <Log message="  Checking feed from: ${item['rss:title']}" type="info" />
        <If condition="item['dc:coverage']=='united states'">
        <then>
          <Set key="present" value="0" />
          <Set key="keep" value="0" />
            <Enumerate
              key="article"
              table="content"
              selection="title='${utility.escapeJDBCString(item['rss:title'])}' and to_article_type in (${articletype.localfeaturefeed})" limit="1">
              <Set key="present" value="1" />
              <If condition="article.creator=='manual'"><then><Set key="keep" value="1" /></then></If>
            </Enumerate>
          <If condition="present==0">
          <then>
            <CreateEntity
               key="article"
               table="content"
               date="config.now.formatted.yyyymmdd"
               publish_path="' '"
               to_publisher="'0'"
               is_produced="'0'"
               is_published="'1'"
               is_html="'0'"
               title = "item['rss:title']"
               source="item['rss:link']++' '"
               to_article_type="articletype.localfeaturefeed"
               content_data="' '"
               creator_main_url="item['dc:source']++' '"
               description="item['dc:format']"
               webdb_create="config.now"
               />
          </then>
          <else>
            <If condition="keep==0">
              <then>
                <UpdateEntity
                  key="article"
                  source="item['rss:link']++' '"
                  title = "item['rss:title']"
                  creator_main_url="item['dc:source']++' '"
                  description="item['dc:format']"
                  />
              </then>
            </If>
          </else>
        </If>
      </then>
     </If>
    </Enumerate>
  </body>
</producer>
This way of setting things up means you can periodically update your list of feeds, but override feed locations in specific instances, and maintain your list even if the contact db is offline.
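
The create / update / keep decision described above can be sketched outside of Mir. The following is only an illustration in plain Java, not Mir code: the class FeedListSync, its Feed record and the example URLs are all made up for this sketch, and the in-memory map stands in for the content table.

```java
// Hypothetical sketch of the decision the "pullfeedoffeeds" producer makes.
// None of these names exist in Mir; the map stands in for the content table.
import java.util.LinkedHashMap;
import java.util.Map;

final class FeedListSync {
    /** One stored feed; "author" mirrors the retasked author field. */
    static final class Feed {
        String url;
        final String author;
        Feed(String url, String author) { this.url = url; this.author = author; }
    }

    enum Action { CREATED, UPDATED, KEPT }

    /** Local feed list keyed by title, since the producer matches feeds on title. */
    final Map<String, Feed> feeds = new LinkedHashMap<>();

    Action sync(String title, String urlFromContactDb) {
        Feed existing = feeds.get(title);
        if (existing == null) {
            // Not present yet: create it with a blank author.
            feeds.put(title, new Feed(urlFromContactDb, ""));
            return Action.CREATED;
        }
        if ("manual".equals(existing.author)) {
            // Manually overridden feed: leave it alone.
            return Action.KEPT;
        }
        // Present and not overridden: refresh from the contact db.
        existing.url = urlFromContactDb;
        return Action.UPDATED;
    }

    public static void main(String[] args) {
        FeedListSync list = new FeedListSync();
        list.feeds.put("IMC Example", new Feed("http://override.example/feed.rdf", "manual"));
        System.out.println(list.sync("IMC Example", "http://contact.example/a.rdf")); // KEPT
        System.out.println(list.sync("IMC Other", "http://contact.example/b.rdf"));   // CREATED
        System.out.println(list.sync("IMC Other", "http://contact.example/c.rdf"));   // UPDATED
    }
}
```

Because nothing is ever deleted in this flow, the local list survives even when the contact db cannot be reached.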

The next thing to do is set up an article type to receive your syndicated features. Make an article type called "localfeature" for this. The bulk of the work will be done in a node called "Pull", which you need to add to producers.xml:


<nodedefinition name="Pull">
    <parameters>
      <string name="url"/>
      <string name="imcname"/>
      <string name="imcurl"/>
      <string name="articletype"/>
    </parameters>

    <definition>
      <Log message="Pulling 1.0 feed from ${imcname} at ${url}" type="info" />
      <RSS key="feeditems" url="${url}" encoding="UTF-8" />
      <Enumerate key="item" list="feeditems['rss:item']" >
        <Define key="languagecode" value="ot" />
        <If condition="item['dc:language']">
          <then>
            <Set key="languagecode" value="item['dc:language']"/>
          </then>
        </If>

        <If condition="item['dc:source']">
          <then>
            <Set key="origin" value="item['dc:source']"/>
          </then>
          <else>
            <Set key="origin" value="item.identifier"/>
          </else>
        </If>

        <If condition="item['dcterms:hasPart']">
          <then>
            <Set key="haspartimg" value="'&lt;img align=&quot;right&quot; src=&quot;'++item['dcterms:hasPart']++'&quot; /&gt;'" />
          </then>
          <else>
            <Set key="haspartimg" value="' '" />
          </else>
        </If>

        <Set key="present" value="0" />
        <Enumerate
          key="article"
          table="content"
          selection="source='${utility.escapeJDBCString(origin)}'" limit="1">
          <Set key="present" value="1" />
        </Enumerate>
        <If condition="present==0">
          <then>
            <Log message="  new entry from ${imcname}: ${item['rss:title']}" type="info" />
             <CreateEntity
               key="article"
               table="content"

               date="config.now.formatted.yyyymmdd"
               publish_path="' '"
               to_publisher="'0'"
               is_produced="'0'"
               is_published="'1'"
               is_html="'1'"

               source="origin"
               to_article_type="articletype"

               content_data="item['content:encoded']++' '"
               creator_main_url="imcurl"
               description="haspartimg++item['rss:description']++' '"
               to_language="languageCodeToId(languagecode)"
               title="item['rss:title']++''"
               webdb_create="config.now"
               creator="imcname"
               comment="'Taken from ' ++ imcname"
             />
          </then>
        </If>
      </Enumerate>
    </definition>
  </nodedefinition>

Notice here that we use dc:language, if present, to set the language. You should have a language "ot" set up with the language name "other" to receive articles without a language specified. Also notice the use of dc:source to identify not only where an article is coming from but also its identity (useful for preventing loops!). We assume that articles are HTML, that the feed is in UTF-8, and that it is an RSS 1.0 feed. We could make the example more complicated by using different RSS nodes for different formats/charsets, but these assumptions hold pretty widely across the IMC network. YMMV. Also, we could use an else when articles are already present to update the content, but again, that would complicate the example.
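
To make the fallbacks in the Pull node easier to follow, here is a rough sketch of the same three decisions in plain Java: default the language to "ot", use dc:source as the origin with the item identifier as a fallback, and import an item only if its origin has not been seen. This is an illustration only; PullSketch and its helper item(...) are invented names, and a Set stands in for the lookup against the content table.

```java
// Hypothetical sketch of the per-item decisions in the Pull node.
// Not Mir code: a HashSet replaces the "source=..." lookup in the db.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class PullSketch {
    /** Origins of articles already imported. */
    final Set<String> knownSources = new HashSet<>();

    /** Builds a feed item from key/value pairs (helper for this sketch only). */
    static Map<String, String> item(String... keyValues) {
        Map<String, String> m = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            m.put(keyValues[i], keyValues[i + 1]);
        }
        return m;
    }

    /** dc:language if present, otherwise the catch-all code "ot". */
    static String languageCode(Map<String, String> item) {
        String lang = item.get("dc:language");
        return (lang == null || lang.isEmpty()) ? "ot" : lang;
    }

    /** dc:source if present, otherwise the item's own identifier. */
    static String origin(Map<String, String> item) {
        String source = item.get("dc:source");
        return (source == null || source.isEmpty()) ? item.get("identifier") : source;
    }

    /** Returns true only the first time a given origin is seen. */
    boolean importIfNew(Map<String, String> item) {
        return knownSources.add(origin(item));
    }

    public static void main(String[] args) {
        PullSketch pull = new PullSketch();
        Map<String, String> first = item("dc:source", "http://a.example/1", "dc:language", "de");
        // Same story re-syndicated by another site: different identifier, same dc:source.
        Map<String, String> echo = item("dc:source", "http://a.example/1", "identifier", "http://b.example/9");
        System.out.println(languageCode(first) + " " + pull.importIfNew(first)); // de true
        System.out.println(languageCode(echo) + " " + pull.importIfNew(echo));   // ot false
    }
}
```

The second item is rejected because it carries the same dc:source as the first, which is how a story that has travelled through several sites is kept from being imported twice.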

Finally, we tie together Pull and our list of feeds with the following producer:
 <producer name="getlocalfeatures">
   <verbs>
     <verb name="doit" />
   </verbs>
   <body>
      <Enumerate key="article" table="content" selection="to_article_type in(${articletype.localfeaturefeed}) and is_published=true"  order="webdb_create desc">
        <Log message="Pulling 1.0 feed from ${article.title} at ${article.source}" type="info" />
        <Pull
          url="${article.source}"
          articletype="${articletype.localfeature}"
          imcurl="${article.creator_main_url}"
          imcname="${article.title}"
        />
      </Enumerate>
    </body>
  </producer>

We're almost done! Now you need to do the obvious stuff like add these new producers to cron jobs, and modify your (for example) newswire producer to enumerate articles of type localfeature.

A final step is the following bit of trickery: Since we are pulling in HTML abstracts/content, we will also get IMG tags, which may be bigger than the size we want on our page. The solution assumes you have a knowledge of localizers in Mir. Start by localizing MirBasicProducerAssistantLocalizer.java:
public class MyProducerAssistantLocalizer extends MirBasicProducerAssistantLocalizer {

You'll need to copy the entire method that prints nodes for acceptable html and put it in your new localizer, possibly adding the other methods it calls if it fails to compile:
  private void print(Node node,StringWriter out) throws IOException{
You'll want to modify the case for ELEMENT_NODE as follows:

      case Node.ELEMENT_NODE:
            if (canOutput){
                out.write('<');
                String nodeName=node.getNodeName();
                out.write(nodeName);
                NamedNodeMap attrs = node.getAttributes();
                String src="";
                for ( int i = 0; i < attrs.getLength(); i++ ) {
                    String attrName=attrs.item(i).getNodeName();
                    if (checkAttr(attrName)){
                        if (attrName.equals("src") && nodeName.equals("img")){
                            src=attrs.item(i).getNodeValue();
                        }
                        out.write(' ');
                        out.write(attrName);
                        out.write("=\"");
                        out.write(attrs.item(i).getNodeValue());
                        out.write('"');
                    }
                }
                if (nodeName.equals("img") && (! mir.util.StringRoutines.performRegularExpressionSearch(src,".*(maillink|extlink|icon|thumb|small).*"))){

                    out.write(" class=\"unsafeimage\" ");
                }
                out.write('>');

            }
            NodeList children = node.getChildNodes();
            if ( children != null ) {
                int len = children.getLength();
                for ( int i = 0; i < len; i++ ) {
                    print(children.item(i),out);
                }
            }
            break;

What this basically does is look through any HTML in articles (not just localfeatures, but user-supplied content as well) and add the attribute class="unsafeimage" to every image whose src does not match certain icon names commonly used across the IMC network; those images tend to be small enough to be ignored. Every other img will get resized by CSS, where you add something like the following to your CSS file:
.unsafeimage {
  height: 100px;
}
This does a pretty good job of making sure all images included with img tags in localfeature html don't break your page layout.
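
If you want to try the image test on its own before touching the localizer, the check can be sketched as a standalone class. This is only an approximation: it uses java.util.regex in place of mir.util.StringRoutines.performRegularExpressionSearch, on the assumption that both treat the pattern the same way, and the class name UnsafeImageCheck and the sample URLs are made up.

```java
// Standalone sketch of the "unsafeimage" decision, using java.util.regex
// as a stand-in for Mir's own regular expression helper (an assumption).
import java.util.regex.Pattern;

final class UnsafeImageCheck {
    /** Same pattern as in the localizer: names that usually mean a small image. */
    private static final Pattern SMALL_IMAGE_NAME =
        Pattern.compile(".*(maillink|extlink|icon|thumb|small).*");

    /** True when the element is an img whose src does not look like an icon or thumbnail. */
    static boolean needsUnsafeClass(String nodeName, String src) {
        return nodeName.equals("img") && !SMALL_IMAGE_NAME.matcher(src).matches();
    }

    public static void main(String[] args) {
        System.out.println(needsUnsafeClass("img", "http://site.example/photo.jpg"));       // true
        System.out.println(needsUnsafeClass("img", "http://site.example/thumb_photo.jpg")); // false
        System.out.println(needsUnsafeClass("a", "http://site.example/photo.jpg"));         // false
    }
}
```

Note that an img without a src attribute is compared against the empty string, so it is also marked as unsafe.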

Finally, some dada imc installs leave their media placeholder text tags in their feeds. You'll want to strip them out, for example with a regex in the templates when you call the abstract of an article:
${utility.regexpreplace(anArticle.description_parsed,"#media[^#]+#","")}
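
To see what that pattern removes, here is the same regex applied with plain Java. This is a sketch only: it assumes utility.regexpreplace behaves like String.replaceAll, and the placeholder "#media_123#" is an invented example, not a confirmed dada format.

```java
// Sketch of the placeholder stripping, assuming String.replaceAll semantics.
final class MediaPlaceholders {
    /** Removes anything of the form #media...# from the given text. */
    static String strip(String abstractHtml) {
        return abstractHtml.replaceAll("#media[^#]+#", "");
    }

    public static void main(String[] args) {
        // "#media_123#" is a made-up placeholder for illustration.
        System.out.println(strip("Before #media_123# after"));
    }
}
```

The text on either side of the placeholder is left untouched, including the spaces around it, so you may still end up with a double space where the tag used to be.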

-- JohnDudaAccount - 26 Feb 2005
Topic revision: r3 - 28 Jan 2006, JohnDudaAccount