How can I easily convert HTML special entities from a standard input stream in Linux?

CentOS

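Is there an easy way to convert HTML special entities in a data stream? I am passing data to a bash script, and sometimes that data contains special entities. For example: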

"testing" &amp; testing $testing ! test @ # $ % ^ &amp; *

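I don't know why some characters come through fine and others don't, but unfortunately I have no control over the incoming data.

I figure I could use sed here, but that seems cumbersome and likely to produce false positives. Is there a Linux command I can pipe to that is specifically designed to decode this type of data?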

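PHP is well suited to this. This example requires PHP 5. Note that `-R` runs the code once per input line and hands over each line in $argn without its trailing newline, so the newline is printed explicitly:

cat file.html | php -R 'echo html_entity_decode($argn), "\n";'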

Perl (as always) is your friend. I think this will do it:

 perl -n -mHTML::Entities -e ' ; print HTML::Entities::decode_entities($_) ;'

For example:

 echo '"test" &amp; test $test ! test @ # $ % ^ &amp; *' |perl -n -mHTML::Entities -e ' ; print HTML::Entities::decode_entities($_) ;'

Output:

 [someguy@somehost ~]$ echo '"test" &amp; test $test ! test @ # $ % ^ &amp; *' |perl -n -mHTML::Entities -e ' ; print HTML::Entities::decode_entities($_) ;'
 "test" & test $test ! test @ # $ % ^ & *

recode seems to be available in the default package repositories of the major GNU/Linux distributions. For example, to decode HTML entities to UTF-8:

 …|recode html..utf8

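Reading the text from standard input in plain bash:

 #!/bin/bash
 #
 # IFS= and -r keep leading whitespace and backslashes intact
 while IFS= read -r lin; do
     newl=${lin//&gt;/>}
     newl=${newl//&lt;/<}
     # decode &amp; last so that "&amp;lt;" is not decoded twice;
     # the backslash keeps the replacement & literal on bash 5.2+
     newl=${newl//&amp;/\&}
     # ...other entities
     printf '%s\n' "$newl"
 done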

It may require bash >= version 4.
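If you go the pure-bash route, chained substitutions can get hard to maintain as the entity list grows. Here is a rough sketch of a table-driven variant that scans each line left to right, so text that has already been decoded is never rescanned. The function name `decode_entities` and the `ENTITIES` table are my own hypothetical names, the table covers only a handful of named entities (numeric references like `&#36;` are not handled), and the associative array does need bash >= 4.

```shell
#!/bin/bash
# Hypothetical lookup table of named entities; extend as needed.
# Associative arrays require bash >= 4.
declare -A ENTITIES=( [amp]='&' [lt]='<' [gt]='>' [quot]='"' [apos]="'" )

# decode_entities: read lines from stdin, write decoded lines to stdout.
decode_entities() {
    local line out name
    local re='^([A-Za-z]+);'   # an entity name followed by ';'
    # The || test also handles a final line without a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        out=
        while [[ $line == *'&'* ]]; do
            out+=${line%%&*}   # copy the text before the first '&'
            line=${line#*&}    # keep the text after that '&'
            if [[ $line =~ $re ]]; then name=${BASH_REMATCH[1]}; else name=; fi
            if [[ -n $name ]] && [[ -n ${ENTITIES[$name]+set} ]]; then
                out+=${ENTITIES[$name]}   # known entity: emit its character
                line=${line#*;}           # and drop "name;" from the input
            else
                out+='&'                  # not a known entity: keep the '&'
            fi
        done
        printf '%s\n' "$out$line"
    done
}
```

To use it as a filter, add a bare `decode_entities` call as the last line of the script and pipe your data into it. Unknown entities and stray ampersands (as in `R&D`) are passed through unchanged, which is the false-positive case I would worry about with ad hoc sed rules.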