Newsticker: Use libxml instead of `xml-parse-region'. Fix some glitches. Clean up.

* lisp/net/newst-backend.el: Remove Time-stamp.  Rename variable
`newsticker--download-logos' to `newsticker-download-logos' and
make it customizable.
(newsticker--sentinel-work): Move xml-workarounds to function
`newsticker--do-xml-workarounds', call unless libxml-parser is
used.  Allow single quote in regexp for encoding.  Use
libxml-parser if available, else fall back to `xml-parse-region'.
Take care of possibly missing namespace prefixes (like "RDF"
instead of "rdf:RDF") when checking xml nodes and attributes (as
libxml correctly removes the prefixes).  Always use Atom 1.0 as
fallback feed type.  Rename `newsticker--download-logos' to
`newsticker-download-logos'
(newsticker--unxml, newsticker--unxml-node)
(newsticker--unxml-attribute): New.
(newsticker--parse-atom-1.0): Call `unxml' in case that embedded
HTML code has become part of the xml parse tree.
(newsticker--parse-rss-1.0, newsticker--parse-rss-2.0): Take care
of possibly missing namespace prefixes.
(newsticker--parse-generic-items): Code formatting. Typo.
(newsticker--images-dir): Add trailing slash.
(newsticker--image-get): Fix error message.

* lisp/net/newst-plainview.el: Remove Time-stamp.

* lisp/net/newst-reader.el: Remove Time-stamp.
(newsticker-download-logos): Rename variable
`newsticker--download-logos' to `newsticker-download-logos' and
make it customizable.
(newsticker--print-extra-elements): Add optional parameter
'htmlish for using html markup.  Amend list of ignored elements.
(newsticker--do-print-extra-element): Add parameter 'htmlish for
using html markup.

* lisp/net/newst-ticker.el: Remove Time-stamp.

* lisp/net/newst-treeview.el (newsticker--treeview-item-show): Use html
for formatting extra elements.

* lisp/net/newsticker.el:  Remove Time-stamp, Version.
(newsticker-version): Make obsolete.
This commit is contained in:
Ulf Jasper 2014-10-01 19:20:00 +02:00
parent f94ec2f3c8
commit 34912c0a2b
7 changed files with 214 additions and 98 deletions

View file

@ -1,3 +1,46 @@
2014-10-01 Ulf Jasper <ulf.jasper@web.de>
* net/newst-backend.el: Remove Time-stamp. Rename variable
`newsticker--download-logos' to `newsticker-download-logos' and
make it customizable.
(newsticker--sentinel-work): Move xml-workarounds to function
`newsticker--do-xml-workarounds', call unless libxml-parser is
used. Allow single quote in regexp for encoding. Use
libxml-parser if available, else fall back to `xml-parse-region'.
Take care of possibly missing namespace prefixes (like "RDF"
instead of "rdf:RDF") when checking xml nodes and attributes (as
libxml correctly removes the prefixes). Always use Atom 1.0 as
fallback feed type. Rename `newsticker--download-logos' to
`newsticker-download-logos'
(newsticker--unxml, newsticker--unxml-node)
(newsticker--unxml-attribute): New.
(newsticker--parse-atom-1.0): Call `unxml' in case that embedded
HTML code has become part of the xml parse tree.
(newsticker--parse-rss-1.0, newsticker--parse-rss-2.0): Take care
of possibly missing namespace prefixes.
(newsticker--parse-generic-items): Code formatting. Typo.
(newsticker--images-dir): Add trailing slash.
(newsticker--image-get): Fix error message.
* net/newst-plainview.el: Remove Time-stamp.
* net/newst-reader.el: Remove Time-stamp.
(newsticker-download-logos): Rename variable
`newsticker--download-logos' to `newsticker-download-logos' and
make it customizable.
(newsticker--print-extra-elements): Add optional parameter
'htmlish for using html markup. Amend list of ignored elements.
(newsticker--do-print-extra-element): Add parameter 'htmlish for
using html markup.
* net/newst-ticker.el: Remove Time-stamp.
* net/newst-treeview.el (newsticker--treeview-item-show): Use html
for formatting extra elements.
* net/newsticker.el: Remove Time-stamp, Version.
(newsticker-version): Make obsolete.
2014-09-30 Leonardo Nobrega <leonobr@gmail.com> (tiny change)
* progmodes/python.el (python-fill-paren): Don't inf-loop at EOB

View file

@ -6,7 +6,6 @@
;; Filename: newst-backend.el
;; URL: http://www.nongnu.org/newsticker
;; Keywords: News, RSS, Atom
;; Time-stamp: "23. September 2014, 19:51:10 (ulf)"
;; Package: newsticker
;; ======================================================================
@ -47,9 +46,6 @@
"List of timers for news retrieval.
This is an alist, each element consisting of (feed-name . timer).")
(defvar newsticker--download-logos nil
"If non-nil download feed logos if available.")
(defvar newsticker--sentinel-callback nil
"Function called at end of `newsticker--sentinel'.")
@ -861,59 +857,11 @@ Argument BUFFER is the buffer of the retrieval process."
(node-list
(save-current-buffer
(set-buffer buffer)
;; a very very dirty workaround to overcome the
;; problems with the newest (20030621) xml.el:
;; remove all unnecessary whitespace
(goto-char (point-min))
(while (re-search-forward ">[ \t\r\n]+<" nil t)
(replace-match "><" nil t))
;; and another brutal workaround (20031105)! For some
;; reason the xml parser does not like the colon in the
;; doctype name "rdf:RDF"
(goto-char (point-min))
(if (re-search-forward "<!DOCTYPE[ \t\n]+rdf:RDF" nil t)
(replace-match "<!DOCTYPE rdfColonRDF" nil t))
;; finally.... ~##^°!!!!!
(goto-char (point-min))
(while (search-forward "\r\n" nil t)
(replace-match "\n" nil t))
;; still more brutal workarounds (20040309)! The xml
;; parser does not like doctype rss
(goto-char (point-min))
(if (re-search-forward "<!DOCTYPE[ \t\n]+rss[ \t\n]*>" nil t)
(replace-match "" nil t))
;; And another one (20050618)! (Fixed in GNU Emacs 22.0.50.18)
;; Remove comments to avoid this xml-parsing bug:
;; "XML files can have only one toplevel tag"
(goto-char (point-min))
(while (search-forward "<!--" nil t)
(let ((start (match-beginning 0)))
(unless (search-forward "-->" nil t)
(error "Can't find end of comment"))
(delete-region start (point))))
;; And another one (20050702)! If description is HTML
;; encoded and starts with a `<', wrap the whole
;; description in a CDATA expression. This happened for
;; http://www.thefreedictionary.com/_/WoD/rss.aspx?type=quote
(goto-char (point-min))
(while (re-search-forward
"<description>\\(<img.*?\\)</description>" nil t)
(replace-match
"<description><![CDATA[ \\1 ]]></description>"))
;; And another one (20051123)! XML parser does not
;; like this: <yweather:location city="Frankfurt/Main"
;; region="" country="GM" />
;; try to "fix" empty attributes
;; This happened for
;; http://xml.weather.yahoo.com/forecastrss?p=GMXX0040&u=f
(goto-char (point-min))
(while (re-search-forward "\\(<[^>]*\\)=\"\"" nil t)
(replace-match "\\1=\" \""))
;;
(set-buffer-modified-p nil)
(unless (fboundp 'libxml-parse-xml-region)
(newsticker--do-xml-workarounds))
;; check coding system
(goto-char (point-min))
(if (re-search-forward "encoding=\"\\([^\"]+\\)\""
(if (re-search-forward "encoding=['\"]\\([^\"]+?\\)['\"]"
nil t)
(setq coding-system (intern (downcase (match-string 1))))
(setq coding-system
@ -931,12 +879,13 @@ Argument BUFFER is the buffer of the retrieval process."
(condition-case errordata
;; The xml parser might fail
;; or the xml might be bugged
(xml-parse-region (point-min) (point-max))
(if (fboundp 'libxml-parse-xml-region)
(list (libxml-parse-xml-region (point-min) (point-max)))
(xml-parse-region (point-min) (point-max)))
(error (message "Could not parse %s: %s"
(buffer-name) (cadr errordata))
(throw 'oops nil)))))
(topnode (car node-list))
(channelnode (car (xml-get-children topnode 'channel)))
(imageurl nil))
;; mark all items as obsolete
(newsticker--cache-replace-age newsticker--cache
@ -963,7 +912,8 @@ Argument BUFFER is the buffer of the retrieval process."
(setq imageurl (newsticker--get-logo-url-rss-0.92 topnode))
(newsticker--parse-rss-0.92 name time topnode))
;; RSS 1.0
((eq 'rdf:RDF (xml-node-name topnode))
((or (eq 'RDF (xml-node-name topnode))
(eq 'rdf:RDF (xml-node-name topnode)))
(setq imageurl (newsticker--get-logo-url-rss-1.0 topnode))
(newsticker--parse-rss-1.0 name time topnode))
;; RSS 2.0
@ -978,16 +928,24 @@ Argument BUFFER is the buffer of the retrieval process."
(setq imageurl (newsticker--get-logo-url-atom-0.3 topnode))
(newsticker--parse-atom-0.3 name time topnode))
;; Atom 1.0
((and (eq 'feed (xml-node-name topnode))
(string= "http://www.w3.org/2005/Atom"
(xml-get-attribute topnode 'xmlns)))
(t
;; The test for Atom 1.0 does not work when using
;; libxml, as with libxml the namespace attribute is
;; not in the xml tree. For the time being we skip
;; the check and assume that we are dealing with an
;; Atom 1.0 feed.
;; (and (eq 'feed (xml-node-name topnode))
;; (string= "http://www.w3.org/2005/Atom"
;; (xml-get-attribute topnode 'xmlns)))
(setq imageurl (newsticker--get-logo-url-atom-1.0 topnode))
(newsticker--parse-atom-1.0 name time topnode))
;; unknown feed type
(t
(newsticker--debug-msg "Feed type unknown: %s: %s"
(xml-node-name topnode) name)
nil))
;; (t
;; (newsticker--debug-msg "Feed type unknown: %s: %s"
;; (xml-node-name topnode) name)
;; nil)
)
(setq something-was-added t))
(error (message "sentinelerror in %s: %s" name error-data)))
@ -1031,11 +989,67 @@ Argument BUFFER is the buffer of the retrieval process."
(unless newsticker-debug
(kill-buffer buffer))
;; launch retrieval of image
(when (and imageurl newsticker--download-logos)
(when (and imageurl (boundp 'newsticker-download-logos)
newsticker-download-logos)
(newsticker--image-get name imageurl)))))
(when newsticker--sentinel-callback
(funcall newsticker--sentinel-callback)))
(defun newsticker--do-xml-workarounds ()
"Fix all issues which `xml-parse-region' could be choking on."
;; a very very dirty workaround to overcome the
;; problems with the newest (20030621) xml.el:
;; remove all unnecessary whitespace
(goto-char (point-min))
(while (re-search-forward ">[ \t\r\n]+<" nil t)
(replace-match "><" nil t))
;; and another brutal workaround (20031105)! For some
;; reason the xml parser does not like the colon in the
;; doctype name "rdf:RDF"
(goto-char (point-min))
(if (re-search-forward "<!DOCTYPE[ \t\n]+rdf:RDF" nil t)
(replace-match "<!DOCTYPE rdfColonRDF" nil t))
;; finally.... ~##^°!!!!!
(goto-char (point-min))
(while (search-forward "\r\n" nil t)
(replace-match "\n" nil t))
;; still more brutal workarounds (20040309)! The xml
;; parser does not like doctype rss
(goto-char (point-min))
(if (re-search-forward "<!DOCTYPE[ \t\n]+rss[ \t\n]*>" nil t)
(replace-match "" nil t))
;; And another one (20050618)! (Fixed in GNU Emacs 22.0.50.18)
;; Remove comments to avoid this xml-parsing bug:
;; "XML files can have only one toplevel tag"
(goto-char (point-min))
(while (search-forward "<!--" nil t)
(let ((start (match-beginning 0)))
(unless (search-forward "-->" nil t)
(error "Can't find end of comment"))
(delete-region start (point))))
;; And another one (20050702)! If description is HTML
;; encoded and starts with a `<', wrap the whole
;; description in a CDATA expression. This happened for
;; http://www.thefreedictionary.com/_/WoD/rss.aspx?type=quote
(goto-char (point-min))
(while (re-search-forward
"<description>\\(<img.*?\\)</description>" nil t)
(replace-match
"<description><![CDATA[ \\1 ]]></description>"))
;; And another one (20051123)! XML parser does not
;; like this: <yweather:location city="Frankfurt/Main"
;; region="" country="GM" />
;; try to "fix" empty attributes
;; This happened for
;; http://xml.weather.yahoo.com/forecastrss?p=GMXX0040&u=f
(goto-char (point-min))
(while (re-search-forward "\\(<[^>]*\\)=\"\"" nil t)
(replace-match "\\1=\" \""))
;;
(set-buffer-modified-p nil))
(defun newsticker--get-logo-url-atom-1.0 (node)
"Return logo URL from atom 1.0 data in NODE."
(car (xml-node-children
@ -1117,6 +1131,30 @@ same as in `newsticker--parse-atom-1.0'."
(xml-node-children node))))
(or new-item new-feed)))
(defun newsticker--unxml (node)
"Reverse parsing of an xml string.
Restore an xml-string from a an xml-node that was returned by xml-parse..."
(if (or (not node) (stringp node))
node
(newsticker--unxml-node node)))
(defun newsticker--unxml-node (node)
"Actually restore xml-string of an xml node."
(let ((qname (symbol-name (car node)))
(att-list (cadr node))
(children (cddr node)))
(concat "<" qname
(when att-list " ")
(mapconcat 'newsticker--unxml-attribute att-list " ")
">"
(mapconcat 'newsticker--unxml children "") "</" qname ">")))
(defun newsticker--unxml-attribute (att)
"Actually restore xml-string of an attribute of an xml node."
(let ((name (symbol-name (car att)))
(value (cdr att)))
(concat name "=\"" value "\"")))
(defun newsticker--parse-atom-1.0 (name time topnode)
"Parse Atom 1.0 data.
Argument NAME gives the name of a news feed. TIME gives the
@ -1149,8 +1187,13 @@ URL `http://www.atompub.org/2005/08/17/draft-ietf-atompub-format-11.html'"
(car (xml-get-children node 'title)))))
;; desc-fn
(lambda (node)
(or (car (xml-node-children
(car (xml-get-children node 'content))))
;; unxml the content node. Atom allows for
;; integrating (x)html into the atom structure
;; but we need the raw html string.
;; e.g. http://www.heise.de/open/news/news-atom.xml
(or (newsticker--unxml
(car (xml-node-children
(car (xml-get-children node 'content)))))
(car (xml-node-children
(car (xml-get-children node 'summary))))))
;; link-fn
@ -1295,9 +1338,15 @@ For the RSS 1.0 specification see URL `http://web.resource.org/rss/1.0/spec'."
(car (xml-node-children
(car (xml-get-children channelnode 'title))))
;; desc
(car (xml-node-children
(car (xml-get-children channelnode
'description))))
(or (car (xml-node-children
(car (xml-get-children channelnode
'encoded))))
(car (xml-node-children
(car (xml-get-children channelnode
'content:encoded))))
(car (xml-node-children
(car (xml-get-children channelnode
'description)))))
;; link
(car (xml-node-children
(car (xml-get-children channelnode 'link))))
@ -1321,8 +1370,10 @@ For the RSS 1.0 specification see URL `http://web.resource.org/rss/1.0/spec'."
;; time-fn
(lambda (node)
(newsticker--decode-iso8601-date
(car (xml-node-children
(car (xml-get-children node 'dc:date))))))
(or (car (xml-node-children
(car (xml-get-children node 'dc:date))))
(car (xml-node-children
(car (xml-get-children node 'date)))))))
;; guid-fn
(lambda (node)
nil)
@ -1346,9 +1397,15 @@ For the RSS 2.0 specification see URL `http://blogs.law.harvard.edu/tech/rss'."
(car (xml-node-children
(car (xml-get-children channelnode 'title))))
;; desc
(car (xml-node-children
(car (xml-get-children channelnode
'description))))
(or (car (xml-node-children
(car (xml-get-children channelnode
'encoded))))
(car (xml-node-children
(car (xml-get-children channelnode
'content:encoded))))
(car (xml-node-children
(car (xml-get-children channelnode
'description)))))
;; link
(car (xml-node-children
(car (xml-get-children channelnode 'link))))
@ -1363,6 +1420,9 @@ For the RSS 2.0 specification see URL `http://blogs.law.harvard.edu/tech/rss'."
;; desc-fn
(lambda (node)
(or (car (xml-node-children
(car (xml-get-children node
'encoded))))
(car (xml-node-children
(car (xml-get-children node
'content:encoded))))
(car (xml-node-children
@ -1456,7 +1516,7 @@ argument, which is one of the items in ITEMLIST."
;; decode numeric entities
(setq title (xml-substitute-numeric-entities title))
(when desc
(setq desc (xml-substitute-numeric-entities desc)))
(setq desc (xml-substitute-numeric-entities desc)))
(setq link (xml-substitute-numeric-entities link))
;; remove whitespace from title, desc, and link
(setq title (newsticker--remove-whitespace title))
@ -1478,9 +1538,9 @@ argument, which is one of the items in ITEMLIST."
(let ((prev-age (newsticker--age old-item)))
(unless newsticker-automatically-mark-items-as-old
;; Some feeds deliver items multiply, the
;; first time we find an 'obsolete-old one the
;; cache, the following times we find an 'old
;; one
;; first time we find an 'obsolete-old one in
;; the cache, the following times we find an
;; 'old one
(if (memq prev-age '(obsolete-old old))
(setq age2 'old)
(setq age2 'new)))
@ -1704,7 +1764,7 @@ Checks list of active processes against list of newsticker processes."
;; ======================================================================
(defun newsticker--images-dir ()
"Return directory where feed images are saved."
(concat newsticker-dir "/images"))
(concat newsticker-dir "/images/"))
(defun newsticker--image-get (feed-name url)
"Get image of the news site FEED-NAME from URL.
@ -1725,7 +1785,7 @@ If the image has been downloaded in the last 24h do nothing."
(item (or (assoc feed-name newsticker-url-list)
(assoc feed-name newsticker-url-list-defaults)
(error
"Cannot get news for %s: Check newsticker-url-list"
"Cannot get image for %s: Check newsticker-url-list"
feed-name)))
(wget-arguments (or (car (cdr (cdr (cdr (cdr item)))))
newsticker-wget-arguments)))

View file

@ -5,7 +5,6 @@
;; Author: Ulf Jasper <ulf.jasper@web.de>
;; Filename: newst-plainview.el
;; URL: http://www.nongnu.org/newsticker
;; Time-stamp: "Mon 11-Feb-2013 20:27:11 gm on skiddaw"
;; Package: newsticker
;; ======================================================================

View file

@ -5,7 +5,6 @@
;; Author: Ulf Jasper <ulf.jasper@web.de>
;; Filename: newst-reader.el
;; URL: http://www.nongnu.org/newsticker
;; Time-stamp: "24. September 2011, 15:47:49 (ulf)"
;; Package: newsticker
;; ======================================================================
@ -67,6 +66,12 @@ This must be one of the functions `newsticker-plainview' or
:group 'newsticker-reader)
;; image related things
(defcustom newsticker-download-logos
t
"If non-nil newsticker downloads logo images of subscribed feeds."
:type 'boolean
:group 'newsticker-reader)
(defcustom newsticker-enable-logo-manipulations
t
"If non-nil newsticker manipulates logo images.
@ -186,15 +191,17 @@ KEYMAP will be applied."
'nt-type 'desc))
(insert "\n")))))
(defun newsticker--print-extra-elements (item keymap)
(defun newsticker--print-extra-elements (item keymap &optional htmlish)
"Insert extra-elements of ITEM in a pretty form into the current buffer.
KEYMAP is applied."
(let ((ignored-elements '(items link title description content
content:encoded dc:subject
dc:date entry item guid pubDate
content:encoded encoded
dc:subject subject
dc:date date entry item guid pubDate
published updated
enclosure))
(left-column-width 1))
(if htmlish (insert "<ul>"))
(mapc (lambda (extra-element)
(when (listp extra-element) ;; take care of broken xml
;; data, 2007-05-25
@ -209,15 +216,19 @@ KEYMAP is applied."
(unless (memq (car extra-element) ignored-elements)
(newsticker--do-print-extra-element extra-element
left-column-width
keymap))))
(newsticker--extra item))))
keymap
htmlish))))
(newsticker--extra item))
(if htmlish (insert "</ul>"))))
(defun newsticker--do-print-extra-element (extra-element width keymap)
(defun newsticker--do-print-extra-element (extra-element width keymap htmlish)
"Actually print an EXTRA-ELEMENT using the given WIDTH.
KEYMAP is applied."
(let ((name (symbol-name (car extra-element))))
(insert (format "%s: " name))
(insert (make-string (- width (length name)) ? )))
(if htmlish
(insert (format "<li>%s: " name))
(insert (format "%s: " name))
(insert (make-string (- width (length name)) ? ))))
(let (;;(attributes (cadr extra-element)) ;FIXME!!!!
(contents (cddr extra-element)))
(cond ((listp contents)
@ -238,7 +249,9 @@ KEYMAP is applied."
contents))
(t
(insert (format "%s" contents))))
(insert "\n")))
(if htmlish
(insert "</li>")
(insert "\n"))))
(defun newsticker--image-read (feed-name-symbol disabled)
"Read the cached image for FEED-NAME-SYMBOL from disk.

View file

@ -6,7 +6,6 @@
;; Filename: newst-ticker.el
;; URL: http://www.nongnu.org/newsticker
;; Keywords: News, RSS, Atom
;; Time-stamp: "24. September 2014, 19:07:25 (ulf)"
;; Package: newsticker
;; ======================================================================

View file

@ -781,8 +781,11 @@ for the button."
(put-text-property pos (point) 'face 'newsticker-enclosure-face)
(setq pos (point))
(insert "\n")
(newsticker--print-extra-elements item newsticker--treeview-url-keymap)
(put-text-property pos (point) 'face 'newsticker-extra-face)
(set-marker marker1 pos)
(newsticker--print-extra-elements item newsticker--treeview-url-keymap t)
(set-marker marker2 (point))
(newsticker--treeview-render-text marker1 marker2)
(put-text-property marker1 marker2 'face 'newsticker-extra-face)
(goto-char (point-min)))))
(if (and newsticker-treeview-automatically-mark-displayed-items-as-old
item

View file

@ -7,8 +7,6 @@
;; URL: http://www.nongnu.org/newsticker
;; Created: 17. June 2003
;; Keywords: News, RSS, Atom
;; Time-stamp: "6. Dezember 2009, 19:15:18 (ulf)"
;; Version: 1.99
;; ======================================================================
@ -28,6 +26,7 @@
;; along with GNU Emacs. If not, see <http://www.gnu.org/licenses/>.
(defconst newsticker-version "1.99" "Version number of newsticker.el.")
(make-obsolete-variable 'newsticker-version 'emacs-version "25.1")
;; ======================================================================
;;; Commentary: