Posts Tagged Calibre

Calibre Recipe: il Sole24ore

Today’s Calibre Recipe is: ‘il Sole24ore’

il Sole, printed on orangey paper similar to the FT’s, is the most popular daily newspaper in Italy devoted mainly to economics and finance, and it is the third most widely circulated paper in Italy after il Corriere della sera and la Repubblica.


#!/usr/bin/env  python
__license__   = 'GPL v3'
__author__    = 'Lorenzo Vigentini & Edwin van Maastrigt'
__copyright__ = '2009, Lorenzo Vigentini  and Edwin van Maastrigt '
__description__ = 'Financial news daily paper - v1.02 (30, January 2010)'

'''

http://www.ilsole24ore.com/

'''

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile
import mechanize

temp_files = []

class ilsole(BasicNewsRecipe):
    author        = 'Lorenzo Vigentini & Edwin van Maastrigt'
    description   = 'Financial news daily paper - v1.02 (30, January 2010)'

    cover_url      = 'http://www.ilsole24ore.com/img2009/header/t_logosole.gif'
    title          = u'il Sole 24 Ore '
    publisher      = 'italiaNews'
    category       = 'News, finance, economy, politics'         

    language       = 'it'
    timefmt        = '[%a, %d %b, %Y]'

    oldest_article = 2
    max_articles_per_feed = 50
    use_embedded_content  = False

    remove_javascript     = True
    no_stylesheets        = True

    def get_article_url(self, article):
        return article.get('id', article.get('guid', None))

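    def print_version(self, url):
        # Drop any query string, then point at the printer-friendly page.
        # partition() leaves the URL intact when it contains no '?',
        # whereas rpartition() would return an empty link in that case.
        link = url.partition('?')[0]
        return link.replace('.shtml', '_PRN.shtml')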

    keep_only_tags     = [
                            dict(name='div', attrs={'class':'txt'})
                        ]
    remove_tags = [dict(name='br')]

    feeds          = [
                       (u'Prima pagina', u'http://www.ilsole24ore.com/rss/primapagina.xml'),
                       (u'Norme e tributi', u'http://www.ilsole24ore.com/rss/norme-tributi.xml'),
                       (u'Finanza e mercati', u'http://www.ilsole24ore.com/rss/finanza-mercati.xml'),
                       (u'Economia e lavoro', u'http://www.ilsole24ore.com/rss/economia-lavoro.xml'),
                       (u'Italia', u'http://www.ilsole24ore.com/rss/italia.xml'),
                       (u'Mondo', u'http://www.ilsole24ore.com/rss/mondo.xml'),
                       (u'Tecnologia e business', u'http://www.ilsole24ore.com/rss/tecnologia-business.xml'),
                       (u'Cultura e tempo libero', u'http://www.ilsole24ore.com/rss/tempolibero-cultura.xml'),
                       (u'Sport', u'http://www.ilsole24ore.com/rss/sport.xml'),
                       (u'Professionisti 24', u'http://www.ilsole24ore.com/rss/prof_home.xml')
                     ]

    extra_css = '''
                html, body, table, tr, td, h1, h2, h3, h4, h5, h6, p, a, span, br, img {margin:0;padding:0;border:0;font-size:12px;font-family:Arial;}
                .linkHighlight {color:#0292c6;}
                .txt {border-bottom:1px solid #7c7c7c;padding-bottom:20px;text-align:justify;}
                .txt p {line-height:18px;}
                .txt span {line-height:22px;}
                .title h3 {color:#7b7b7b;}
                .title h4 {color:#08526e;font-size:26px;font-family:"Times New Roman";font-weight:normal;}
                '''

Download the file here: Calibre recipe: il sole24ore
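
The print_version step of this recipe can be tried outside Calibre. What follows is only a minimal sketch: the function name sole_print_version and the sample URL are made up for illustration, and it assumes the article address ends in '.shtml', optionally followed by a query string.

```python
def sole_print_version(url):
    """Strip the query string (if any) and swap '.shtml' for '_PRN.shtml'."""
    # partition() returns the whole URL as the first item when there is no '?'
    link = url.partition('?')[0]
    return link.replace('.shtml', '_PRN.shtml')


# Hypothetical article URL, for illustration only
sample = 'http://www.ilsole24ore.com/art/esempio/articolo.shtml?uuid=abc123'
print(sole_print_version(sample))
```

Running it on a handful of links taken from the feeds is a quick way to check that the printer-friendly pages still exist before building the whole e-book.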



Calibre Recipe: quotidiano.net

Today’s Calibre Recipe is: ‘il Quotidiano.net’

Quotidiano.net is the product of a consortium of local newspapers (La Nazione, il Resto del Carlino, il Giorno) under the il Sole24ore banner; it feeds the italianews network and is part of the Monrif publishing group.


#!/usr/bin/env  python
__license__   = 'GPL v3'
__author__    = 'Lorenzo Vigentini'
__copyright__ = '2009, Lorenzo Vigentini '
__version__     = 'v1.01'
__date__        = '10, January 2010'
__description__ = 'Italian News Agency'

'''

http://www.quotidianonet.ilsole24ore.com/

'''

from calibre.web.feeds.news import BasicNewsRecipe

class panorama(BasicNewsRecipe):
    author        = 'Lorenzo Vigentini, based on Darko Miletic'
    description   = 'Italian News Agency'

    cover_url      = 'http://quotidianonet.ilsole24ore.com/file_generali/img/logo_quotidianonet-top.gif'
    title          = u'Quotidiano Net '
    publisher      = 'italiaNews'
    category       = 'News, politics, culture, economy, general interest'         

    language       = 'it'
    timefmt        = '[%a, %d %b, %Y]'

    oldest_article = 7
    max_articles_per_feed = 100
    use_embedded_content  = False
    recursion             = 10    

    remove_javascript = True

    keep_only_tags     = [dict(name='div', attrs={'class':'box_contenuto articolo'})]

    remove_tags        = [
                            dict(name=['object','link']),
                            dict(name='div',attrs={'class':['post-meta','sharing-tools','related','comments','prev-next','box_contenuto adsense']}),
                            dict(name='div',attrs={'id':['strumenti','related-posts','footer','inline_boxes','inline_boxes_header','inline_boxes_body','bottom']}),
                            dict(name='span',attrs={'class':'titolosezione default'})
                         ]

    feeds          = [
                       (u'Prima pagina', u'http://quotidianonet.ilsole24ore.com/rss/home.xml'),
                       (u'Cronaca', u'http://quotidianonet.ilsole24ore.com/rss/cronaca.xml'),
                       (u'Economia', u'http://quotidianonet.ilsole24ore.com/rss/economia.xml'),
                       (u'Esteri', u'http://quotidianonet.ilsole24ore.com/rss/esteri.xml'),
                       (u'Politica', u'http://quotidianonet.ilsole24ore.com/rss/politica.xml'),
                       (u'Salute', u'http://quotidianonet.ilsole24ore.com/rss/salute.xml'),
                       (u'Tecnologia', u'http://quotidianonet.ilsole24ore.com/rss/tecnologia.xml'),

                     ]

Download the file here: Calibre recipe – quotidiano.net




Calibre Recipe: il Corriere della sera

Today’s Calibre Recipe is: ‘il Corriere della sera’

il Corriere is the most popular daily newspaper in Italy, with an average of over 620,000 copies sold every day and a long history (the first issue was published in 1876). Its main office is in Milan, and it is published by the RCS group.


#!/usr/bin/env  python
__license__     = 'GPL v3'
__author__      = 'Lorenzo Vigentini, based on Darko Miletic'
__copyright__   = '2009, Darko Miletic , Lorenzo Vigentini '
__version__     = 'v1.01'
__date__        = '10, January 2010'
__description__ = 'Italian daily newspaper'

'''

http://www.corriere.it/

'''

from calibre.web.feeds.news import BasicNewsRecipe

class ilCorriere(BasicNewsRecipe):
    author        = 'Lorenzo Vigentini, based on Darko Miletic'
    description   = 'Italian daily newspaper'

    cover_url      = 'http://images.corriereobjects.it/images/static/common/logo_home.gif?v=200709121520'
    title          = u'Il Corriere della sera '
    publisher      = 'RCS Digital'
    category       = 'News, politics, culture, economy, general interest'         

    language       = 'it'
    timefmt        = '[%a, %d %b, %Y]'

    oldest_article = 1
    max_articles_per_feed = 100
    use_embedded_content  = False
    recursion             = 10    

    remove_javascript = True
    no_stylesheets = True

    html2lrf_options = [
                          '--comment', description
                        , '--category', category
                        , '--publisher', publisher
                        , '--ignore-tables'
                        ]

    html2epub_options = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"\nlinearize_tables=True' 

    keep_only_tags = [dict(name='div', attrs={'class':['news-dettaglio article','article']})]

    remove_tags = [
                   dict(name=['base','object','link','embed']),
                   dict(name='div', attrs={'class':'news-goback'}),
                   dict(name='ul', attrs={'class':'toolbar'})
                  ]

    remove_tags_after = dict(name='p', attrs={'class':'footnotes'})

    feeds = [
             (u'Ultimora'  , u'http://www.corriere.it/rss/ultimora.xml'  ),
             (u'Editoriali', u'http://www.corriere.it/rss/editoriali.xml'),
             (u'Cronache'  , u'http://www.corriere.it/rss/cronache.xml'  ),
             (u'Politica'  , u'http://www.corriere.it/rss/politica.xml'  ),
             (u'Esteri'    , u'http://www.corriere.it/rss/esteri.xml'    ),
             (u'Economia'  , u'http://www.corriere.it/rss/economia.xml'  ),
             (u'Cultura'    , u'http://www.corriere.it/rss/cultura.xml'  ),
             (u'Scienze'   , u'http://www.corriere.it/rss/scienze.xml'   ),
             (u'Salute'    , u'http://www.corriere.it/rss/salute.xml'    ),
             (u'Spettacolo', u'http://www.corriere.it/rss/spettacoli.xml'),
             (u'Cinema e TV', u'http://www.corriere.it/rss/cinema.xml'   ),
             (u'Sport'     , u'http://www.corriere.it/rss/sport.xml'     )
            ]

Download the file here: Calibre recipe – ilCorriere




Writing ‘in the cloud’

The line between e-books and other Internet writing has been diminishing over time, with commercial e-self-publishing sites such as Smartbooks or Scribd, independent story hosts such as Shifti.org, and fanfic downloaders and converters that turn stories posted on fan-fiction hosts into formatted e-books.

One of today’s big buzzwords is “the cloud”, referring to the practice of storing data on remote Internet servers so that it can be accessed from anywhere.

The e-book sites I mentioned above are places where reading material sits in the cloud—but writing is moving to the cloud, too, to let people work on documents remotely without the risk of losing or forgetting a USB stick or other media. And reading and writing are just two sides of the same textual coin.

So, in this post I am going to look at four cloud-based tools that can be used for remote writing. We have mentioned or covered some of these in different ways already, as some of them can be used for reading as well.

I am sure there are other such tools out there, too. If I miss mentioning your favorite, please suggest it in the comments!

Google Docs

Google Docs was inarguably the earliest writing-in-the-cloud service. This service allows you to create or upload Word documents into the cloud and write on them yourself, or share them with collaborators so that you can write on them together (though not in real-time the way you can with EtherPad, below).

There are even third-party iPhone and desktop apps for Docs now (though both require payment to use), as well as an extension for Google Chrome.

The benefit of Docs is that you can access and use it from practically any computer that has Internet access, and it uses the Word Doc format that has more or less become a standard. The downside is that you need that Internet access to write—a problem if you are on a train or bus or otherwise bereft of signal.

(Of course, you could download the document from Docs to your computer, then upload it again when you are finished, but if you were going to do that you could simply have carried it on a USB stick instead.)

EtherPad

We have mentioned this cloud collaborative writing service, which was itself recently bought by Google, a number of times before. Although the main EtherPad service will likely be going away by the end of March, a number of offshoots have sprung up since AppJet and Google opened the EtherPad source.

In addition to being a powerful tool for collaboration, EtherPad can also be used as a simple tool for solo cloud writing, like Google Docs. It even does italic and bold emphasis, and can import or export to a variety of formats including plain text, HTML, and Word Doc.

It does have the same drawback as Docs, however, that one needs Internet access to use it—and if net access hiccups while writing, you may lose the last sentence or so of text you wrote when you reconnect.

Also, unlike most of the other services, there is no access protection method apart from obscurity. Anyone who knows the URL you use to access a given pad can read and change it with impunity.

Evernote

The cross-platform note-taking, web-clipping database Evernote has been around for a while, and I have been using it since it was in beta. It was recently mentioned by technology review maven Walter S. Mossberg, who called it a “digital file cabinet you can bring with you anywhere.”

Evernote is quite useful for snapping photos of things you want to remember (such as business cards) and uploading or emailing them into the database. It can be used for clipping interesting things from webpages—a great tool for writers who do a lot of research. But as Jeff Kirvin notes, it is also great for writing drafts, because it syncs everything into the cloud.

Unlike Docs and EtherPad, Evernote maintains a local copy of the database on each computer on which the application is installed—synchronizing periodically or manually with the database in the cloud. It has applications for Windows and Macintosh operating systems, as well as a number of smartphones. (Smartphone versions may or may not keep the database locally as well; the iPhone version recently did add this capability.)

Computers for which a version is not available (such as those running Linux) may still access the database remotely via the web—so if you write on a Linux computer, you still have the same net-required problem as the above choices.

Dropbox

I mentioned the Dropbox file-syncing service a couple of months ago in reference to setting up a cloud-based Calibre catalog that could be accessed from anywhere with Stanza. However, it only recently occurred to me that Dropbox could be used as a replacement for a USB stick in terms of keeping writing documents synchronized between my desktop and my laptop.

Dropbox works by setting up a shared folder on each computer (Windows, OS X, or Linux) where it is installed. Any changes made to the file system in this folder are automatically synchronized to the cloud as they are made, and then synchronized from the cloud on other devices.

Thus, Dropbox can be used to share files between work and home computers and between desktop and laptop. It can also be used to share files publicly, or to specific other people. There is even a Dropbox client for the iPhone.

With Dropbox, it does not matter what software you use to create the files; I am using it right now for some OpenOffice Writer documents. As the file is saved, it is synced, and I only need Internet access on my laptop long enough to download the changes. It will sync back up as soon as I have access again.

Dropbox’s one major drawback is that file uploads can be incredibly slow: 5 KB per second or even slower. Dropbox stores its files on Amazon’s S3 system, which seems to be perpetually short on inbound bandwidth. So, if you are planning to shift around large mp3 files, synchronization could take several hours.
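
To put that upload rate in perspective, here is a rough back-of-the-envelope sketch. The file sizes are illustrative guesses rather than measurements, and the function name upload_hours is my own.

```python
def upload_hours(size_mb, rate_kb_per_s):
    """Estimated upload time in hours for size_mb megabytes
    at a sustained rate of rate_kb_per_s kilobytes per second."""
    seconds = size_mb * 1024 / rate_kb_per_s
    return seconds / 3600


# Illustrative sizes: a short text draft, a single mp3, a folder of mp3s
examples = [('50 KB draft', 50 / 1024), ('5 MB mp3', 5), ('100 MB of mp3s', 100)]
for label, size_mb in examples:
    print('%-15s %.2f hours' % (label, upload_hours(size_mb, 5)))
```

At 5 KB per second, 100 MB works out to a little under six hours, while a text draft of a few dozen kilobytes syncs in seconds.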

Dropbox provides 2 gigabytes of on-line storage for free (2.25 gigabytes if you use my referral link), and this can be expanded to 5 gigabytes at a rate of 0.25 gigabytes per referral. Additional storage can be purchased on a monthly basis, but for storing text or Word documents in progress, 2 gigs should be more than sufficient.
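
The referral arithmetic is easy to check. This small sketch uses only the figures quoted above (a 5 gigabyte cap and a 0.25 gigabyte bonus per referral); the function name is made up for illustration.

```python
import math


def referrals_needed(start_gb, cap_gb=5.0, bonus_gb=0.25):
    """Number of referrals needed to grow free storage from start_gb to cap_gb."""
    return max(0, math.ceil((cap_gb - start_gb) / bonus_gb))


print(referrals_needed(2.0))   # plain free account
print(referrals_needed(2.25))  # account opened through a referral link
```

That comes to 12 referrals from a plain free account, or 11 if you started with the referral bonus.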

(Cartoon by Andrew Weldon taken from this interview.)


Calibre Recipe: la Gazzetta dello Sport

Today’s Calibre Recipe is: ‘la Gazzetta dello Sport’

la Gazzetta is the most popular Italian daily newspaper dedicated to sports. It is highly recognizable, as it is printed on pink paper and can be found in most bars and cafes in Italy. It is published by the RCS MediaGroup.


#!/usr/bin/env  python
__license__     = 'GPL v3'
__author__      = 'Lorenzo Vigentini'
__copyright__   = '2009, Lorenzo Vigentini '
__version__     = 'v1.02'
__date__        = '10, January 2010'
__description__ = 'Sport news from the most read sport newspaper in Italy'

'''www.gazzetta.it'''

from calibre.web.feeds.news import BasicNewsRecipe

class laGazzetta(BasicNewsRecipe):
    author        = 'Lorenzo Vigentini'
    description   = 'Sport news from the most read sport newspaper in Italy'

    cover_url      = 'http://www.gazzetta.it/primapagina/images/prima_pagina_grande.png'
    title          = 'La Gazzetta dello Sport '
    publisher      = 'RCS Digital'
    category       = 'Sport News'         

    language       = 'it'
    encoding       = 'cp1252'
    timefmt        = '[%a, %d %b, %Y]'

    oldest_article = 2
    max_articles_per_feed = 20
    use_embedded_content  = False
    recursion             = 10   

    remove_javascript = True
    no_stylesheets = True

    keep_only_tags = [ dict(name='div', attrs={'id':'articolo'})]

    remove_tags = [
                dict(name='ul',attrs={'id':['service-toolbar','sections-menu']}),
                dict(name='div',attrs={'id':['header','rightcol','sponsored','vxFlashPlayer','footer','print-box']}),
                dict(name='iframe',attrs={'id':'mirago-feed'}),
                dict(name='a',attrs={'id':'commenta-up'}),
                dict(name='cite',attrs={'class':['signature','parag-title']}),
                dict(name='a',attrs={'class':['last-comment','button-bold2']}),
                dict(name=['base','object','link','a','script','noscript'])
            ]

    extra_css      = '''
                        h1 {font: sans-serif large;}
                        h2 {font: sans-serif medium;}
                        h3 {font: sans-serif small;}
                        h4 {font: sans-serif bold small;}
                        p  {font:10pt helvetica}
                        dd {font:8pt helvetica}
                      '''

    feeds       = [
                   (u'Calcio',u'http://www.gazzetta.it/rss/Calcio.xml'),
                   (u'Formula 1',u'http://www.gazzetta.it/rss/Formula1.xml'),
                   (u'Motomondiale',u'http://www.gazzetta.it/rss/Motomondiale.xml'),
                   (u'Motori',u'http://www.gazzetta.it/rss/Motori.xml'),
                   (u'Ciclismo',u'http://www.gazzetta.it/rss/Ciclismo.xml'),
                   (u'Basket',u'http://www.gazzetta.it/rss/Basket.xml'),
                   (u'Tennis',u'http://www.gazzetta.it/rss/Tennis.xml'),
                   (u'Pallavolo',u'http://www.gazzetta.it/rss/Pallavolo.xml'),
                   (u'Vela',u'http://www.gazzetta.it/rss/Vela.xml'),
                   (u'Atletica',u'http://www.gazzetta.it/rss/Atletica.xml'),
                   (u'Altri Sport',u'http://www.gazzetta.it/rss/Sport_Vari.xml')
                 ]

    def print_version(self, url):
        # Rebuild the article address as
        # <scheme>://<host>/<four path segments>/<article>_print.html
        segments = url.split('/')
        basename = '/'.join(segments[:3]) + '/'
        subPath = '/'.join(segments[3:7]) + '/'
        # Drop the last six characters of the final segment (the file extension)
        articleURL = segments[-1][:-6]
        myArticle = articleURL.split('.')[0]
        myURL = basename + subPath + myArticle + '_print.html'
        # Use the recipe logger instead of a bare print statement
        self.log('this is the url: ' + myURL)
        return myURL

Download the file here: Calibre recipe – LaGazzetta
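
The URL rewriting in print_version can also be tested on its own. The sketch below mirrors the same slicing in a standalone function; the name gazzetta_print_version and the sample address are hypothetical, and it assumes article URLs have exactly four path segments before a file name with a six-character extension such as '.shtml'.

```python
def gazzetta_print_version(url):
    """Turn an article URL into its '_print.html' counterpart."""
    segments = url.split('/')
    basename = '/'.join(segments[:3]) + '/'   # scheme and host
    sub_path = '/'.join(segments[3:7]) + '/'  # the four path segments
    article = segments[-1][:-6].split('.')[0]  # file name without extension
    return basename + sub_path + article + '_print.html'


# Hypothetical article URL, for illustration only
sample = 'http://www.gazzetta.it/Calcio/SerieA/Squadre/Inter/esempio_articolo.shtml'
print(gazzetta_print_version(sample))
```

If the site changes its URL layout (more or fewer path segments, or a different extension), the fixed slice positions will produce a wrong address, so this is the first place to look when the recipe stops fetching articles.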


