Archive for category Daily Calibre Recipe

Calibre Recipe: il Sole24ore

Today’s Calibre Recipe is: ‘il Sole24ore’

il Sole, printed on orangey paper similar to the FT's, is the most popular daily newspaper in Italy dedicated mainly to economics and finance, and it is the third most widely circulated paper in the country after il Corriere della sera and la Repubblica.

#!/usr/bin/env  python
__license__   = 'GPL v3'
__author__    = 'Lorenzo Vigentini & Edwin van Maastrigt'
__copyright__ = '2009, Lorenzo Vigentini  and Edwin van Maastrigt '
__description__ = 'Financial news daily paper - v1.02 (30, January 2010)'

'''

http://www.ilsole24ore.com/

'''

from calibre.web.feeds.news import BasicNewsRecipe

class ilsole(BasicNewsRecipe):
    author        = 'Lorenzo Vigentini & Edwin van Maastrigt'
    description   = 'Financial news daily paper - v1.02 (30, January 2010)'

    cover_url      = 'http://www.ilsole24ore.com/img2009/header/t_logosole.gif'
    title          = u'il Sole 24 Ore '
    publisher      = 'italiaNews'
    category       = 'News, finance, economy, politics'         

    language       = 'it'
    timefmt        = '[%a, %d %b, %Y]'

    oldest_article = 2
    max_articles_per_feed = 50
    use_embedded_content  = False

    remove_javascript     = True
    no_stylesheets        = True

    def get_article_url(self, article):
        return article.get('id', article.get('guid', None))

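    def print_version(self, url):
        # Drop any query string, then switch to the '_PRN' printer-friendly page.
        # partition() keeps the whole URL when it contains no '?'.
        link, sep, params = url.partition('?')
        return link.replace('.shtml', '_PRN.shtml')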

    keep_only_tags     = [
                            dict(name='div', attrs={'class':'txt'})
                        ]
    remove_tags = [dict(name='br')]

    feeds          = [
                       (u'Prima pagina', u'http://www.ilsole24ore.com/rss/primapagina.xml'),
                       (u'Norme e tributi', u'http://www.ilsole24ore.com/rss/norme-tributi.xml'),
                       (u'Finanza e mercati', u'http://www.ilsole24ore.com/rss/finanza-mercati.xml'),
                       (u'Economia e lavoro', u'http://www.ilsole24ore.com/rss/economia-lavoro.xml'),
                       (u'Italia', u'http://www.ilsole24ore.com/rss/italia.xml'),
                       (u'Mondo', u'http://www.ilsole24ore.com/rss/mondo.xml'),
                       (u'Tecnologia e business', u'http://www.ilsole24ore.com/rss/tecnologia-business.xml'),
                       (u'Cultura e tempo libero', u'http://www.ilsole24ore.com/rss/tempolibero-cultura.xml'),
                       (u'Sport', u'http://www.ilsole24ore.com/rss/sport.xml'),
                       (u'Professionisti 24', u'http://www.ilsole24ore.com/rss/prof_home.xml')
                     ]

    extra_css = '''
                html, body, table, tr, td, h1, h2, h3, h4, h5, h6, p, a, span, br, img {margin:0;padding:0;border:0;font-size:12px;font-family:Arial;}
                .linkHighlight {color:#0292c6;}
                .txt {border-bottom:1px solid #7c7c7c;padding-bottom:20px;text-align:justify;}
                .txt p {line-height:18px;}
                .txt span {line-height:22px;}
                .title h3 {color:#7b7b7b;}
                .title h4 {color:#08526e;font-size:26px;font-family:"Times New Roman";font-weight:normal;}
                '''
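
As a quick illustration of the two URL helpers in this recipe, here is a minimal standalone sketch that can be run outside Calibre. The function names `pick_article_url` and `to_print_url`, the sample feed entry, and the sample URL are all hypothetical; the sketch uses `str.partition`, so a URL without a query string passes through with only the file name changed.

```python
def pick_article_url(article):
    # Prefer the entry's 'id' field and fall back to 'guid' (None if both are missing).
    return article.get('id', article.get('guid', None))


def to_print_url(url):
    # Drop the query string, then point at the '_PRN' printer-friendly page.
    link, _sep, _params = url.partition('?')
    return link.replace('.shtml', '_PRN.shtml')


# Hypothetical feed entry; real entries come from the RSS feeds listed in the recipe.
entry = {'guid': 'http://www.ilsole24ore.com/art/example.shtml?uuid=abc123'}
print_url = to_print_url(pick_article_url(entry))
print(print_url)
```

With this sample entry the result is the same address with `example_PRN.shtml` as the file name and no query string.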

Download the file here: Calibre recipe: il sole24ore


Calibre Recipe: quotidiano.net

Today’s Calibre Recipe is: ‘il Quotidiano.net’

Quotidiano.net is the product of a consortium of local newspapers (La Nazione, il Resto del Carlino, il Giorno) under the banner of il Sole24ore. It feeds the italianews network and is part of the Monrif publishing group.

#!/usr/bin/env  python
__license__   = 'GPL v3'
__author__    = 'Lorenzo Vigentini'
__copyright__ = '2009, Lorenzo Vigentini '
__version__     = 'v1.01'
__date__        = '10, January 2010'
__description__ = 'Italian News Agency'

'''

http://www.quotidianonet.ilsole24ore.com/

'''

from calibre.web.feeds.news import BasicNewsRecipe

class quotidianoNet(BasicNewsRecipe):
    author        = 'Lorenzo Vigentini, based on Darko Miletic'
    description   = 'Italian News Agency'

    cover_url      = 'http://quotidianonet.ilsole24ore.com/file_generali/img/logo_quotidianonet-top.gif'
    title          = u'Quotidiano Net '
    publisher      = 'italiaNews'
    category       = 'News, politics, culture, economy, general interest'         

    language       = 'it'
    timefmt        = '[%a, %d %b, %Y]'

    oldest_article = 7
    max_articles_per_feed = 100
    use_embedded_content  = False
    recursion             = 10    

    remove_javascript = True

    keep_only_tags     = [dict(name='div', attrs={'class':'box_contenuto articolo'})]

    remove_tags        = [
                            dict(name=['object','link']),
                            dict(name='div',attrs={'class':['post-meta','sharing-tools','related','comments','prev-next','box_contenuto adsense']}),
                            dict(name='div',attrs={'id':['strumenti','related-posts','footer','inline_boxes','inline_boxes_header','inline_boxes_body','bottom']}),
                            dict(name='span',attrs={'class':'titolosezione default'})
                         ]

    feeds          = [
                       (u'Prima pagina', u'http://quotidianonet.ilsole24ore.com/rss/home.xml'),
                       (u'Cronaca', u'http://quotidianonet.ilsole24ore.com/rss/cronaca.xml'),
                       (u'Economia', u'http://quotidianonet.ilsole24ore.com/rss/economia.xml'),
                       (u'Esteri', u'http://quotidianonet.ilsole24ore.com/rss/esteri.xml'),
                       (u'Politica', u'http://quotidianonet.ilsole24ore.com/rss/politica.xml'),
                       (u'Salute', u'http://quotidianonet.ilsole24ore.com/rss/salute.xml'),
                       (u'Tecnologia', u'http://quotidianonet.ilsole24ore.com/rss/tecnologia.xml'),

                     ]
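
The `timefmt` value used in these recipes is a `strftime` pattern. A minimal sketch of what it renders to for a fixed date, using only the standard library (the date is just an example, and the day and month abbreviations assume the default C locale):

```python
from datetime import date

timefmt = '[%a, %d %b, %Y]'

# Format an example date with the same pattern the recipes use.
stamp = date(2010, 1, 10).strftime(timefmt)
print(stamp)
```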

Download the file here: Calibre recipe – quotidiano.net



Calibre Recipe: il Corriere della sera

Today’s Calibre Recipe is: ‘il Corriere della sera’

il Corriere is the most popular daily newspaper in Italy, with an average of over 620,000 copies sold every day and a long history (the first issue was published in 1876). Its main office is in Milan and it is published by the RCS group.

#!/usr/bin/env  python
__license__     = 'GPL v3'
__author__      = 'Lorenzo Vigentini, based on Darko Miletic'
__copyright__   = '2009, Darko Miletic , Lorenzo Vigentini '
__version__     = 'v1.01'
__date__        = '10, January 2010'
__description__ = 'Italian daily newspaper'

'''

http://www.corriere.it/

'''

from calibre.web.feeds.news import BasicNewsRecipe

class ilCorriere(BasicNewsRecipe):
    author        = 'Lorenzo Vigentini, based on Darko Miletic'
    description   = 'Italian daily newspaper'

    cover_url      = 'http://images.corriereobjects.it/images/static/common/logo_home.gif?v=200709121520'
    title          = u'Il Corriere della sera '
    publisher      = 'RCS Digital'
    category       = 'News, politics, culture, economy, general interest'         

    language       = 'it'
    timefmt        = '[%a, %d %b, %Y]'

    oldest_article = 1
    max_articles_per_feed = 100
    use_embedded_content  = False
    recursion             = 10    

    remove_javascript = True
    no_stylesheets = True

    html2lrf_options = [
                          '--comment', description
                        , '--category', category
                        , '--publisher', publisher
                        , '--ignore-tables'
                        ]

    html2epub_options = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"\nlinearize_tables=True' 

    keep_only_tags = [dict(name='div', attrs={'class':['news-dettaglio article','article']})]

    remove_tags = [
                   dict(name=['base','object','link','embed']),
                   dict(name='div', attrs={'class':'news-goback'}),
                   dict(name='ul', attrs={'class':'toolbar'})
                  ]

    remove_tags_after = dict(name='p', attrs={'class':'footnotes'})

    feeds = [
             (u'Ultimora'  , u'http://www.corriere.it/rss/ultimora.xml'  ),
             (u'Editoriali', u'http://www.corriere.it/rss/editoriali.xml'),
             (u'Cronache'  , u'http://www.corriere.it/rss/cronache.xml'  ),
             (u'Politica'  , u'http://www.corriere.it/rss/politica.xml'  ),
             (u'Esteri'    , u'http://www.corriere.it/rss/esteri.xml'    ),
             (u'Economia'  , u'http://www.corriere.it/rss/economia.xml'  ),
             (u'Cultura'    , u'http://www.corriere.it/rss/cultura.xml'  ),
             (u'Scienze'   , u'http://www.corriere.it/rss/scienze.xml'   ),
             (u'Salute'    , u'http://www.corriere.it/rss/salute.xml'    ),
             (u'Spettacolo', u'http://www.corriere.it/rss/spettacoli.xml'),
             (u'Cinema e TV', u'http://www.corriere.it/rss/cinema.xml'   ),
             (u'Sport'     , u'http://www.corriere.it/rss/sport.xml'     )
            ]
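
The `html2epub_options` value above is built by plain string concatenation. A small sketch, runnable outside Calibre, that rebuilds the same string from the recipe's values and splits it into lines to show what it contains (the variable `option_lines` is introduced here only for illustration):

```python
publisher = 'RCS Digital'
description = 'Italian daily newspaper'
category = 'News, politics, culture, economy, general interest'

# Same concatenation as in the recipe, wrapped in parentheses for readability.
html2epub_options = ('publisher="' + publisher + '"\ncomments="' + description
                     + '"\ntags="' + category + '"\nlinearize_tables=True')

# One option per line: key="value" pairs followed by a boolean flag.
option_lines = html2epub_options.split('\n')
for line in option_lines:
    print(line)
```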

Download the file here: Calibre recipe – ilCorriere



Calibre Recipe: la Gazzetta dello Sport

Today’s Calibre Recipe is: ‘la Gazzetta dello Sport’

la Gazzetta is the most popular Italian daily newspaper dedicated to sports. It is highly recognizable because it is printed on pink paper, and it can be found in most bars and cafes in Italy. It is published by RCS MediaGroup.

#!/usr/bin/env  python
__license__     = 'GPL v3'
__author__      = 'Lorenzo Vigentini'
__copyright__   = '2009, Lorenzo Vigentini '
__version__     = 'v1.02'
__date__        = '10, January 2010'
__description__ = 'Sport news from the most read sport newspaper in Italy'

'''www.gazzetta.it'''

from calibre.web.feeds.news import BasicNewsRecipe

class laGazzetta(BasicNewsRecipe):
    author        = 'Lorenzo Vigentini'
    description   = 'Sport news from the most read sport newspaper in Italy'

    cover_url      = 'http://www.gazzetta.it/primapagina/images/prima_pagina_grande.png'
    title          = 'La Gazzetta dello Sport '
    publisher      = 'RCS Digital'
    category       = 'Sport News'         

    language       = 'it'
    encoding       = 'cp1252'
    timefmt        = '[%a, %d %b, %Y]'

    oldest_article = 2
    max_articles_per_feed = 20
    use_embedded_content  = False
    recursion             = 10   

    remove_javascript = True
    no_stylesheets = True

    keep_only_tags = [ dict(name='div', attrs={'id':'articolo'})]

    remove_tags = [
                dict(name='ul',attrs={'id':['service-toolbar','sections-menu']}),
                dict(name='div',attrs={'id':['header','rightcol','sponsored','vxFlashPlayer','footer','print-box']}),
                dict(name='iframe',attrs={'id':'mirago-feed'}),
                dict(name='a',attrs={'id':'commenta-up'}),
                dict(name='cite',attrs={'class':['signature','parag-title']}),
                dict(name='a',attrs={'class':['last-comment','button-bold2']}),
                dict(name=['base','object','link','a','script','noscript'])
            ]

    extra_css      = '''
                        h1 {font: sans-serif large;}
                        h2 {font: sans-serif medium;}
                        h3 {font: sans-serif small;}
                        h4 {font: sans-serif bold small;}
                        p  {font:10pt helvetica}
                        dd {font:8pt helvetica}
                      '''

    feeds       = [
                   (u'Calcio',u'http://www.gazzetta.it/rss/Calcio.xml'),
                   (u'Formula 1',u'http://www.gazzetta.it/rss/Formula1.xml'),
                   (u'Motomondiale',u'http://www.gazzetta.it/rss/Motomondiale.xml'),
                   (u'Motori',u'http://www.gazzetta.it/rss/Motori.xml'),
                   (u'Ciclismo',u'http://www.gazzetta.it/rss/Ciclismo.xml'),
                   (u'Basket',u'http://www.gazzetta.it/rss/Basket.xml'),
                   (u'Tennis',u'http://www.gazzetta.it/rss/Tennis.xml'),
                   (u'Pallavolo',u'http://www.gazzetta.it/rss/Pallavolo.xml'),
                   (u'Vela',u'http://www.gazzetta.it/rss/Vela.xml'),
                   (u'Atletica',u'http://www.gazzetta.it/rss/Atletica.xml'),
                   (u'Altri Sport',u'http://www.gazzetta.it/rss/Sport_Vari.xml')
                 ]

    def print_version(self,url):
        # Rebuild the article URL so that it points at the '_print.html' page.
        segments = url.split('/')
        basename = '/'.join(segments[:3])+'/'    # scheme and host
        subPath= '/'.join(segments[3:7])+'/'     # the next four path segments
        articleURL=segments[-1][:-6]             # last segment minus its final 6 characters
        myArticle=articleURL.split('.')[0]
        myURL = basename + subPath + myArticle + '_print.html'
        # self.log replaces the Python 2 print statement.
        self.log('this is the url: ' + myURL)
        return myURL
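
To make the URL rewriting in `print_version` easier to follow, here is a minimal standalone sketch of the same steps. The function name `to_print_url` and the sample URL are hypothetical; the sketch assumes, as the recipe does, that the article file sits exactly four directories below the host and ends in a six-character suffix such as `.shtml`.

```python
def to_print_url(url):
    segments = url.split('/')
    base = '/'.join(segments[:3]) + '/'        # scheme and host
    sub_path = '/'.join(segments[3:7]) + '/'   # the next four path segments
    # Strip the final 6 characters (e.g. '.shtml'), then keep the part before any dot.
    article = segments[-1][:-6].split('.')[0]
    return base + sub_path + article + '_print.html'


# Hypothetical article URL with four directories between host and file name.
sample = 'http://www.gazzetta.it/Calcio/SerieA/Inter/10-01-2010/example-article.shtml'
print_url = to_print_url(sample)
print(print_url)
```

A URL with fewer or more than four directories would not be rebuilt correctly by this approach, so it is worth checking a few real feed links before relying on it.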

Download the file here: Calibre recipe – LaGazzetta



Calibre Recipe: la Repubblica

Today’s Calibre Recipe is: ‘la Repubblica’

la Repubblica is an Italian daily newspaper owned by the publisher Gruppo Editoriale L’Espresso, with its main office in Rome. It is the second most popular paper after il Corriere della Sera.

#!/usr/bin/env  python
__license__   = 'GPL v3'
__author__    = 'Lorenzo Vigentini, based on Darko Miletic'
__copyright__ = '2009, Darko Miletic , Lorenzo Vigentini '
__description__ = 'Italian daily newspaper - v1.01 (04, January 2010)'

'''

http://www.repubblica.it/

'''

from calibre.web.feeds.news import BasicNewsRecipe

class LaRepublica(BasicNewsRecipe):
    author        = 'Lorenzo Vigentini, based on Darko Miletic'
    description   = 'Italian daily newspaper - v1.01 (04, January 2010)'

    cover_url      = 'http://www.repubblica.it/images/homepage/la_repubblica_logo.gif'
    title          = u'la Repubblica v1.01 '
    publisher      = "Gruppo editoriale L'Espresso"
    category       = 'News, politics, culture, economy, general interest'         

    language       = 'it'
    timefmt        = '[%a, %d %b, %Y]'

    oldest_article = 1
    max_articles_per_feed = 100
    use_embedded_content  = False
    recursion             = 10    

    remove_javascript = True

    keep_only_tags     = [dict(name='div', attrs={'class':'articolo'})]

    remove_tags        = [
                            dict(name=['object','link']),
                            dict(name='span',attrs={'class':'linkindice'}),
                            dict(name='div',attrs={'class':'bottom-mobile'}),
                            dict(name='div',attrs={'id':['rssdiv','blocco']})
                         ]

    feeds          = [
                       (u'Repubblica Rilievo', u'http://www.repubblica.it/rss/homepage/rss2.0.xml'),
                       (u'Repubblica Cronaca', u'http://www.repubblica.it/rss/cronaca/rss2.0.xml'),
                       (u'Repubblica Esteri', u'http://www.repubblica.it/rss/esteri/rss2.0.xml'),
                       (u'Repubblica Economia', u'http://www.repubblica.it/rss/economia/rss2.0.xml'),
                       (u'Repubblica Politica', u'http://www.repubblica.it/rss/politica/rss2.0.xml'),
                       (u'Repubblica Scienze', u'http://www.repubblica.it/rss/scienze/rss2.0.xml'),
                       (u'Repubblica Tecnologia', u'http://www.repubblica.it/rss/tecnologia/rss2.0.xml'),
                       (u'Repubblica Scuola e Universita', u'http://www.repubblica.it/rss/scuola_e_universita/rss2.0.xml'),
                       (u'Repubblica Ambiente', u'http://www.repubblica.it/rss/ambiente/rss2.0.xml'),
                       (u'Repubblica Cultura', u'http://www.repubblica.it/rss/spettacoli_e_cultura/rss2.0.xml'),
                       (u'Repubblica Persone', u'http://www.repubblica.it/rss/persone/rss2.0.xml'),
                       (u'Repubblica Sport', u'http://www.repubblica.it/rss/sport/rss2.0.xml'),
                       (u'Repubblica Calcio', u'http://www.repubblica.it/rss/sport/calcio/rss2.0.xml')
                     ]
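
One Python detail worth keeping in mind when a value such as the publisher name contains an apostrophe: two adjacent string literals are concatenated, so a doubled single quote does not produce an apostrophe. A small sketch (the variable names are only for illustration):

```python
# Two adjacent literals, 'Gruppo editoriale L' and 'Espresso', are joined together.
doubled = 'Gruppo editoriale L''Espresso'

# Double quotes around the value let the apostrophe through unchanged.
quoted = "Gruppo editoriale L'Espresso"

print(doubled)
print(quoted)
```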

Download the file here: Calibre recipe – la Repubblica


