pax_global_header 0000666 0000000 0000000 00000000064 14137461156 0014522 g ustar 00root root 0000000 0000000 52 comment=8da4375c758a30d174ce9a44147a1b5f36e5029b reverse_markdown-2.1.1/ 0000775 0000000 0000000 00000000000 14137461156 0015100 5 ustar 00root root 0000000 0000000 reverse_markdown-2.1.1/.gitignore 0000664 0000000 0000000 00000000140 14137461156 0017063 0 ustar 00root root 0000000 0000000 *.gem .bundle .rvmrc .ruby-version .ruby-gemset .codeclimate Gemfile.lock pkg/* coverage/* TODO reverse_markdown-2.1.1/.rspec 0000664 0000000 0000000 00000000010 14137461156 0016204 0 ustar 00root root 0000000 0000000 --color reverse_markdown-2.1.1/.travis.yml 0000664 0000000 0000000 00000000267 14137461156 0017216 0 ustar 00root root 0000000 0000000 language: ruby cache: bundler rvm: - 2.0 - 2.1 - 2.2 - 2.3 - 2.4 - 2.5 - 2.6 - 2.7 - jruby-9.2.8.0 notifications: disabled: false recipients: - xijo@pm.me reverse_markdown-2.1.1/CHANGELOG.md 0000664 0000000 0000000 00000006654 14137461156 0016724 0 ustar 00root root 0000000 0000000 # Change Log All notable changes to this project will be documented in this file. ## 2.1.1 - October 2021 - Fixes unintentional newline characters within lists with paragraphs, thanks @diogoosorio, see #93 - Lets \n to be present in
tag. solves #77 #78, thanks @shivabhusal
## 2.1.0 - May 2020
- Add support for `figure` tags, see #86, thanks @anshul78
## 2.0.0 - March 2020
- BREAKING: Dropped support for ruby 1.9.3
- Add support for `details` and `summary` tags, see #85
## 1.4.0 - January 2020
- BREAKING: jump links will no longer be ignored but treated as links, see #82
## 1.3.0 - September 2019
- Add support for `s` HTML tag, thanks @fauno
## 1.2.0 - August 2019
- Handle Windows `\r\n` line endings within text blocks, thanks for reporting @krisdigital
- Handle paragraphs in `li` tags, thanks @gstamp
## 1.1.0 - April 2018
- Support JRuby, thanks @grddev (#71)
- Bypass `` tags, thanks @mu-is-too-short (#70)
## 1.0.5 - February 2018
- Fix newline handling within pre tags, thanks @niallcolfer (#69)
## 1.0.4 - November 2017
- Make blockquote behave as true block, thanks for reporting @kanedo (#67)
## 1.0.3 - Apr 2016
### Changes
- Use tag_border option while cleaning up, thanks @AlexanderPruss (#66)
## 1.0.2 - Apr 2016
### Changes
- Handle edge case: exclamation mark before links, thanks @Easy-D (#57)
## 1.0.1 - Jan 2016
### Changes
- Prevent double escaping of * and _, thanks @craig-day (#61)
## 1.0.0 - Nov 2015
### Changes
- BREAKING: Parsing was significantly improved, thanks @craig-day (#60)
Please update your custom converters to accept and use the state hash; for
examples, look at the existing standard converters.
- Use OptionParser for command line options, thanks @grmartin (#55)
- Tag border behavior is now configurable with the `tag_border` option, thanks @faheemmughal (#59)
- Preserve > and < from original markup, thanks @willglynn (#58)
## 0.8.2 - May 2015
### Changes
- Don't add whitespaces in links and images if they contain underscores
## 0.8.1 - April 2015
### Changes
- Don't add newlines after nested lists
## 0.8.0 - April 2015
### Added
- `article` tag is now supported and treated like a div
### Changed
- Special characters are treated correctly inside of backticks, see (#47)
## 0.7.0 - February 2015
### Added
- pre-tags now support GitHub and Confluence syntax highlighting
## 0.6.1 - January 2015
### Changed
- Setting config options in block style will last for all following `convert` calls.
- Inline config options are only applied to this particular operation
### Removed
- `config.reset` is removed
## 0.6.0 - September 2014
### Added
- Ignore `col` and `colgroup` tags
- Bypass `thead` and `tbody` tags to show the tables correctly
### Changed
- Eliminate ruby warnings on load (thx @vsipuli)
- Treat newlines within text nodes as space
- Remove whitespace between inline tags and punctuation characters
## 0.5.1 - April 2014
### Added
- Adds support for Ruby 1.9.3 back in
- More options for handling of unknown tags
### Changed
- Bugfixes in `li` indentation behavior
## 0.5.0 - March 2014
**There were some breaking changes, please make sure you don't miss them:**
1. Only ruby versions 2.0.0 or above are supported
2. There is no `Mapper` class any more. Just use `ReverseMarkdown.convert(input, options)`
3. Config option `github_style_code_blocks` changed its name to `github_flavored`
Please open an issue and let me know about it if you have any trouble with the new version.
reverse_markdown-2.1.1/Gemfile 0000664 0000000 0000000 00000000144 14137461156 0016372 0 ustar 00root root 0000000 0000000 source "http://rubygems.org"
# Specify your gem's dependencies in reverse_markdown.gemspec
gemspec
reverse_markdown-2.1.1/LICENSE 0000664 0000000 0000000 00000000742 14137461156 0016110 0 ustar 00root root 0000000 0000000 DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
Version 2, December 2004
Copyright (C) 2014 Johannes Opper
Everyone is permitted to copy and distribute verbatim or modified
copies of this license document, and changing it is allowed as long
as the name is changed.
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. You just DO WHAT THE FUCK YOU WANT TO.
reverse_markdown-2.1.1/README.md 0000664 0000000 0000000 00000010356 14137461156 0016364 0 ustar 00root root 0000000 0000000 # Summary
Transform html into markdown. Useful for example if you want to import html into your markdown based application.
## Changelog
See [Change Log](CHANGELOG.md)
## Requirements
1. [Nokogiri](http://nokogiri.org/)
2. Ruby 2.0.0 or higher
## Installation
Install the gem
```sh
[sudo] gem install reverse_markdown
```
or add it to your Gemfile
```ruby
gem 'reverse_markdown'
```
## Features
- Supports all the established html tags like `h1`, `h2`, `h3`, `h4`, `h5`, `h6`, `p`, `em`, `strong`, `i`, `b`, `blockquote`, `code`, `img`, `a`, `hr`, `li`, `ol`, `ul`, `table`, `tr`, `th`, `td`, `br`, `figure`
- Module based - if you miss a tag, just add it
- Can deal with nested lists
- Inline and block code is supported
- Supports blockquote
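
The "module based" point can be pictured with a small, self-contained sketch. It is not the gem's code: `MiniConverters` and the lambdas are hypothetical stand-ins that only illustrate the idea of a registry mapping tag names to converters, with a fallback for tags nobody registered.

```ruby
# Miniature stand-in for a tag-to-converter registry (hypothetical names,
# not the gem's own classes).
module MiniConverters
  REGISTRY = {}

  # Store a converter under the tag name, normalized to a symbol.
  def self.register(tag_name, converter)
    REGISTRY[tag_name.to_sym] = converter
  end

  # Return the registered converter, or a pass-through lambda for
  # tags that were never registered.
  def self.lookup(tag_name)
    REGISTRY.fetch(tag_name.to_sym) { ->(content) { content } }
  end
end

MiniConverters.register(:strong, ->(content) { "**#{content}**" })
MiniConverters.register(:em,     ->(content) { "_#{content}_" })
# Adding a missing tag: this hypothetical converter keeps <u> as inline HTML.
MiniConverters.register(:u, ->(content) { "<u>#{content}</u>" })

puts MiniConverters.lookup(:strong).call('feelings')
puts MiniConverters.lookup('u').call('note')
```

For real converters, the wiki entry linked under "Related stuff" describes the interface the gem expects.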
# Usage
## Ruby
You can convert html content as string or Nokogiri document:
```ruby
input = '<strong>feelings</strong>'
result = ReverseMarkdown.convert input
result.inspect # " **feelings** "
```
## Commandline
It's also possible to convert html files to markdown using the binary:
```sh
$ reverse_markdown file.html > file.md
$ cat file.html | reverse_markdown > file.md
```
## Configuration
The following options are available:
- `unknown_tags` (default `pass_through`) - how to handle unknown tags. Valid options are:
  - `pass_through` - Include the unknown tag completely in the result
  - `drop` - Drop the unknown tag and its content
  - `bypass` - Ignore the unknown tag but try to convert its content
  - `raise` - Raise an error to let you know
- `github_flavored` (default `false`) - use [github flavored markdown](https://help.github.com/articles/github-flavored-markdown) (so far only code blocks are supported)
- `tag_border` (default `' '`) - how to handle tag borders. Valid options are:
  - `' '` - Add whitespace if there is none at tag borders.
  - `''` - Do not add whitespace.
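
The four `unknown_tags` strategies can be sketched as a small dispatcher. This is a simplified model under stated assumptions, not the gem's implementation: `handle_unknown` is a hypothetical helper, and it receives the tag name and its already-converted inner content as plain strings.

```ruby
# Hypothetical helper, not part of the gem: models what each `unknown_tags`
# strategy does with an unrecognized tag and its inner content.
def handle_unknown(tag, inner, strategy)
  case strategy.to_sym
  when :pass_through then "<#{tag}>#{inner}</#{tag}>" # keep the markup as is
  when :drop         then ''                          # discard tag and content
  when :bypass       then inner                       # keep only the content
  when :raise        then raise ArgumentError, "unknown tag: #{tag}"
  else raise ArgumentError, "unknown strategy: #{strategy.inspect}"
  end
end

%i[pass_through drop bypass].each do |strategy|
  puts "#{strategy}: #{handle_unknown('bar', 'foo', strategy).inspect}"
end
```
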
### As options
Just pass your chosen configuration options in after the input. The given options will last for this operation only.
```ruby
ReverseMarkdown.convert(input, unknown_tags: :raise, github_flavored: true)
```
### Preconfigure
Or configure it block style on an initializer level. These configurations will last for all conversions until they are set to something different.
```ruby
ReverseMarkdown.config do |config|
config.unknown_tags = :bypass
config.github_flavored = true
config.tag_border = ''
end
```
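
How inline options and preconfigured options interact can be sketched with a miniature config object. `MiniConfig` is a hypothetical class modelled on `lib/reverse_markdown/config.rb`; it uses `ensure` to reset the inline options, which is a choice of this sketch, and it covers only two of the options.

```ruby
# Miniature model of the option lookup (hypothetical class name): inline
# options are consulted first and fall back to the block-style setting.
class MiniConfig
  attr_writer :unknown_tags, :github_flavored

  def initialize
    @unknown_tags = :pass_through
    @github_flavored = false
    @inline_options = {}
  end

  # Apply inline options for the duration of the block only.
  def with(options = {})
    @inline_options = options
    yield
  ensure
    @inline_options = {}
  end

  def unknown_tags
    @inline_options[:unknown_tags] || @unknown_tags
  end

  def github_flavored
    @inline_options[:github_flavored] || @github_flavored
  end
end

config = MiniConfig.new
config.unknown_tags = :bypass                          # block-style setting
inside = config.with(unknown_tags: :raise) { config.unknown_tags }
after  = config.unknown_tags                           # inline option is gone again

config.github_flavored = true
still_on = config.with(github_flavored: false) { config.github_flavored }

puts inside.inspect, after.inspect, still_on.inspect
```

Because the lookup is written with `||`, an inline `false` falls through to the preconfigured value, so in this model a boolean option that was switched on block style cannot be switched off for a single call.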
# Related stuff
- [Write custom converters](https://github.com/xijo/reverse_markdown/wiki/Write-your-own-converter) - Wiki entry about how to write your own converter
- [html_massage](https://github.com/harlantwood/html_massage) - A gem by Harlan T. Wood to convert regular sites into markdown using reverse_markdown
- [word-to-markdown](https://github.com/benbalter/word-to-markdown) - Convert word docs into markdown while using reverse_markdown, by Ben Balter
- [markdown syntax](http://daringfireball.net/projects/markdown) - The markdown syntax specification
- [github flavored markdown](https://help.github.com/articles/github-flavored-markdown) - GitHub's extension to markdown
- [wmd-editor](http://wmd-editor.com) - Markdown flavored text editor
# Thanks
Thanks to all [contributors](https://github.com/xijo/reverse_markdown/graphs/contributors) and all other helpers:
- [Empact](https://github.com/Empact) Ben Woosley
- [harlantwood](https://github.com/harlantwood) Harlan T. Wood
- [aprescott](https://github.com/aprescott) Adam Prescott
- [danschultzer](https://github.com/danschultzer) Dan Schultzer
- [Benjamin-Dobell](https://github.com/Benjamin-Dobell) Benjamin Dobell
- [schkovich](https://github.com/schkovich) Goran Miskovic
- [craig-day](https://github.com/craig-day) Craig Day
- [grmartin](https://github.com/grmartin) Glenn R. Martin
- [willglynn](https://github.com/willglynn) Will Glynn
reverse_markdown-2.1.1/Rakefile 0000664 0000000 0000000 00000000520 14137461156 0016542 0 ustar 00root root 0000000 0000000 require 'bundler/gem_tasks'
if File.exist?('.codeclimate')
ENV["CODECLIMATE_REPO_TOKEN"] = File.read('.codeclimate').strip
end
require 'rspec/core/rake_task'
RSpec::Core::RakeTask.new(:spec)
task :default => :spec
desc 'Open an irb session preloaded with this library'
task :console do
sh 'irb -I lib -r reverse_markdown.rb'
end
reverse_markdown-2.1.1/bin/ 0000775 0000000 0000000 00000000000 14137461156 0015650 5 ustar 00root root 0000000 0000000 reverse_markdown-2.1.1/bin/reverse_markdown 0000775 0000000 0000000 00000001130 14137461156 0021146 0 ustar 00root root 0000000 0000000 #!/usr/bin/env ruby
# Usage: reverse_markdown [FILE]...
# Usage: cat FILE | reverse_markdown
require 'reverse_markdown'
require 'optparse'
options = {}
OptionParser.new do |opts|
opts.banner = "Usage: reverse_markdown [options] [FILE]..."
opts.on('-u', '--unknown_tags [pass_through, drop, bypass, raise]', 'Unknown tag handling (default: pass_through)') { |v| ReverseMarkdown.config.unknown_tags = v }
opts.on('-g', '--github_flavored bool', 'use github flavored markdown (default: false)') { |v| ReverseMarkdown.config.github_flavored = v }
end.parse!
puts ReverseMarkdown.convert(ARGF.read)
reverse_markdown-2.1.1/lib/ 0000775 0000000 0000000 00000000000 14137461156 0015646 5 ustar 00root root 0000000 0000000 reverse_markdown-2.1.1/lib/reverse_markdown.rb 0000664 0000000 0000000 00000003730 14137461156 0021553 0 ustar 00root root 0000000 0000000 require 'nokogiri'
require 'reverse_markdown/version'
require 'reverse_markdown/errors'
require 'reverse_markdown/cleaner'
require 'reverse_markdown/config'
require 'reverse_markdown/converters'
require 'reverse_markdown/converters/base'
require 'reverse_markdown/converters/a'
require 'reverse_markdown/converters/blockquote'
require 'reverse_markdown/converters/br'
require 'reverse_markdown/converters/bypass'
require 'reverse_markdown/converters/code'
require 'reverse_markdown/converters/del'
require 'reverse_markdown/converters/div'
require 'reverse_markdown/converters/drop'
require 'reverse_markdown/converters/details'
require 'reverse_markdown/converters/em'
require 'reverse_markdown/converters/figcaption'
require 'reverse_markdown/converters/figure'
require 'reverse_markdown/converters/h'
require 'reverse_markdown/converters/hr'
require 'reverse_markdown/converters/ignore'
require 'reverse_markdown/converters/img'
require 'reverse_markdown/converters/li'
require 'reverse_markdown/converters/ol'
require 'reverse_markdown/converters/p'
require 'reverse_markdown/converters/pass_through'
require 'reverse_markdown/converters/pre'
require 'reverse_markdown/converters/strong'
require 'reverse_markdown/converters/table'
require 'reverse_markdown/converters/td'
require 'reverse_markdown/converters/text'
require 'reverse_markdown/converters/tr'
module ReverseMarkdown
def self.convert(input, options = {})
config.with(options) do
input = cleaner.force_encoding(input.to_s)
root = case input
when String then Nokogiri::HTML(input).root
when Nokogiri::XML::Document then input.root
when Nokogiri::XML::Node then input
end
root or return ''
result = ReverseMarkdown::Converters.lookup(root.name).convert(root)
cleaner.tidy(result)
end
end
def self.config
@config ||= Config.new
yield @config if block_given?
@config
end
def self.cleaner
@cleaner ||= Cleaner.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/ 0000775 0000000 0000000 00000000000 14137461156 0021223 5 ustar 00root root 0000000 0000000 reverse_markdown-2.1.1/lib/reverse_markdown/cleaner.rb 0000664 0000000 0000000 00000005373 14137461156 0023171 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
class Cleaner
def tidy(string)
result = remove_inner_whitespaces(string)
result = remove_newlines(result)
result = remove_leading_newlines(result)
result = clean_tag_borders(result)
clean_punctuation_characters(result)
end
def remove_newlines(string)
string.gsub(/\n{3,}/, "\n\n")
end
def remove_leading_newlines(string)
string.gsub(/\A\n+/, '')
end
def remove_inner_whitespaces(string)
string.each_line.inject("") do |memo, line|
memo + preserve_border_whitespaces(line) do
line.strip.gsub(/[ \t]{2,}/, ' ')
end
end
end
# Find non-asterisk content that is enclosed by two or
# more asterisks. Ensure that only one whitespace occurs
# in the border area.
# Same for underscores and brackets.
def clean_tag_borders(string)
result = string.gsub(/\s?\*{2,}.*?\*{2,}\s?/) do |match|
preserve_border_whitespaces(match, default_border: ReverseMarkdown.config.tag_border) do
match.strip.sub('** ', '**').sub(' **', '**')
end
end
result = result.gsub(/\s?\_{2,}.*?\_{2,}\s?/) do |match|
preserve_border_whitespaces(match, default_border: ReverseMarkdown.config.tag_border) do
match.strip.sub('__ ', '__').sub(' __', '__')
end
end
result = result.gsub(/\s?~{2,}.*?~{2,}\s?/) do |match|
preserve_border_whitespaces(match, default_border: ReverseMarkdown.config.tag_border) do
match.strip.sub('~~ ', '~~').sub(' ~~', '~~')
end
end
result.gsub(/\s?\[.*?\]\s?/) do |match|
preserve_border_whitespaces(match) do
match.strip.sub('[ ', '[').sub(' ]', ']')
end
end
end
def clean_punctuation_characters(string)
string.gsub(/(\*\*|~~|__)\s([\.!\?'"])/, "\\1".strip + "\\2")
end
def force_encoding(string)
ReverseMarkdown.config.force_encoding or return string
string.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end
private
def preserve_border_whitespaces(string, options = {}, &block)
return string if string =~ /\A\s*\Z/
default_border = options.fetch(:default_border, '')
# If the string contains part of a link so the characters [,],(,)
# then don't add any extra spaces
default_border = '' if string =~ /[\[\(\]\)]/
string_start = present_or_default(string[/\A\s*/], default_border)
string_end = present_or_default(string[/\s*\Z/], default_border)
result = yield
string_start + result + string_end
end
def present_or_default(string, default)
if string.nil? || string.empty?
default
else
string
end
end
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/config.rb 0000664 0000000 0000000 00000001551 14137461156 0023017 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
class Config
attr_writer :unknown_tags, :github_flavored, :tag_border, :force_encoding
def initialize
@unknown_tags = :pass_through
@github_flavored = false
@force_encoding = false
@em_delimiter = '_'.freeze
@strong_delimiter = '**'.freeze
@inline_options = {}
@tag_border = ' '.freeze
end
def with(options = {})
@inline_options = options
result = yield
@inline_options = {}
result
end
def unknown_tags
@inline_options[:unknown_tags] || @unknown_tags
end
def github_flavored
@inline_options[:github_flavored] || @github_flavored
end
def tag_border
@inline_options[:tag_border] || @tag_border
end
def force_encoding
@inline_options[:force_encoding] || @force_encoding
end
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters.rb 0000664 0000000 0000000 00000001650 14137461156 0023744 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
def self.register(tag_name, converter)
@@converters ||= {}
@@converters[tag_name.to_sym] = converter
end
def self.unregister(tag_name)
@@converters.delete(tag_name.to_sym)
end
def self.lookup(tag_name)
@@converters[tag_name.to_sym] or default_converter(tag_name)
end
private
def self.default_converter(tag_name)
case ReverseMarkdown.config.unknown_tags.to_sym
when :pass_through
ReverseMarkdown::Converters::PassThrough.new
when :drop
ReverseMarkdown::Converters::Drop.new
when :bypass
ReverseMarkdown::Converters::Bypass.new
when :raise
raise UnknownTagError, "unknown tag: #{tag_name}"
else
raise InvalidConfigurationError, "unknown value #{ReverseMarkdown.config.unknown_tags.inspect} for ReverseMarkdown.config.unknown_tags"
end
end
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/ 0000775 0000000 0000000 00000000000 14137461156 0023415 5 ustar 00root root 0000000 0000000 reverse_markdown-2.1.1/lib/reverse_markdown/converters/a.rb 0000664 0000000 0000000 00000001103 14137461156 0024155 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class A < Base
def convert(node, state = {})
name = treat_children(node, state)
href = node['href']
title = extract_title(node)
if href.to_s.empty? || name.empty?
name
else
link = "[#{name}](#{href}#{title})"
link.prepend(' ') if prepend_space?(node)
link
end
end
private
def prepend_space?(node)
node.at_xpath("preceding::text()[1]").to_s.end_with?('!')
end
end
register :a, A.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/base.rb 0000664 0000000 0000000 00000001105 14137461156 0024651 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class Base
def treat_children(node, state)
node.children.inject('') do |memo, child|
memo << treat(child, state)
end
end
def treat(node, state)
ReverseMarkdown::Converters.lookup(node.name).convert(node, state)
end
def escape_keychars(string)
string.gsub(/(?<!\\)[*_]/, '*' => '\*', '_' => '\_')
end
def extract_title(node)
title = escape_keychars(node['title'].to_s)
title.empty? ? '' : %[ "#{title}"]
end
end
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/blockquote.rb 0000664 0000000 0000000 00000000544 14137461156 0026115 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class Blockquote < Base
def convert(node, state = {})
content = treat_children(node, state).strip
content = ReverseMarkdown.cleaner.remove_newlines(content)
"\n\n> " << content.lines.to_a.join('> ') << "\n\n"
end
end
register :blockquote, Blockquote.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/br.rb 0000664 0000000 0000000 00000000250 14137461156 0024342 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class Br < Base
def convert(node, state = {})
" \n"
end
end
register :br, Br.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/bypass.rb 0000664 0000000 0000000 00000000635 14137461156 0025247 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class Bypass < Base
def convert(node, state = {})
treat_children(node, state)
end
end
register :document, Bypass.new
register :html, Bypass.new
register :body, Bypass.new
register :span, Bypass.new
register :thead, Bypass.new
register :tbody, Bypass.new
register :tfoot, Bypass.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/code.rb 0000664 0000000 0000000 00000000270 14137461156 0024653 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class Code < Base
def convert(node, state = {})
"`#{node.text}`"
end
end
register :code, Code.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/del.rb 0000664 0000000 0000000 00000001072 14137461156 0024506 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class Del < Base
def convert(node, state = {})
content = treat_children(node, state.merge(already_crossed_out: true))
if disabled? || content.strip.empty? || state[:already_crossed_out]
content
else
"~~#{content}~~"
end
end
def enabled?
ReverseMarkdown.config.github_flavored
end
def disabled?
!enabled?
end
end
register :strike, Del.new
register :s, Del.new
register :del, Del.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/details.rb 0000664 0000000 0000000 00000001113 14137461156 0025363 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class Details < Base
def convert(node, state = {})
content = treat_children(node, state.merge(already_processed: true))
if disabled? || content.strip.empty? || state[:already_processed]
content
else
"##{content}"
end
end
def enabled?
ReverseMarkdown.config.github_flavored
end
def disabled?
!enabled?
end
end
register :details, Details.new
register :summary, Details.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/div.rb 0000664 0000000 0000000 00000000363 14137461156 0024526 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class Div < Base
def convert(node, state = {})
"\n" << treat_children(node, state) << "\n"
end
end
register :div, Div.new
register :article, Div.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/drop.rb 0000664 0000000 0000000 00000000214 14137461156 0024703 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class Drop < Base
def convert(node, state = {})
''
end
end
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/em.rb 0000664 0000000 0000000 00000000644 14137461156 0024347 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class Em < Base
def convert(node, state = {})
content = treat_children(node, state.merge(already_italic: true))
if content.strip.empty? || state[:already_italic]
content
else
"#{content[/^\s*/]}_#{content.strip}_#{content[/\s*$/]}"
end
end
end
register :em, Em.new
register :i, Em.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/figcaption.rb 0000664 0000000 0000000 00000000452 14137461156 0026066 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class FigCaption < Base
def convert(node, state = {})
if node.text.strip.empty?
""
else
"\n" << "_#{node.text.strip}_" << "\n"
end
end
end
register :figcaption, FigCaption.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/figure.rb 0000664 0000000 0000000 00000000362 14137461156 0025224 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class Figure < Base
def convert(node, state = {})
content = treat_children(node, state)
"\n#{content.strip}\n"
end
end
register :figure, Figure.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/h.rb 0000664 0000000 0000000 00000000577 14137461156 0024202 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class H < Base
def convert(node, state = {})
prefix = '#' * node.name[/\d/].to_i
["\n", prefix, ' ', treat_children(node, state), "\n"].join
end
end
register :h1, H.new
register :h2, H.new
register :h3, H.new
register :h4, H.new
register :h5, H.new
register :h6, H.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/hr.rb 0000664 0000000 0000000 00000000255 14137461156 0024355 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class Hr < Base
def convert(node, state = {})
"\n* * *\n"
end
end
register :hr, Hr.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/ignore.rb 0000664 0000000 0000000 00000000377 14137461156 0025234 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class Ignore < Base
def convert(node, state = {})
'' # noop
end
end
register :colgroup, Ignore.new
register :col, Ignore.new
register :head, Ignore.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/img.rb 0000664 0000000 0000000 00000000435 14137461156 0024520 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class Img < Base
def convert(node, state = {})
alt = node['alt']
src = node['src']
title = extract_title(node)
" ![#{alt}](#{src}#{title})"
end
end
register :img, Img.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/li.rb 0000664 0000000 0000000 00000001703 14137461156 0024347 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class Li < Base
def convert(node, state = {})
contains_child_paragraph = node.first_element_child ? node.first_element_child.name == 'p' : false
content_node = contains_child_paragraph ? node.first_element_child : node
content = treat_children(content_node, state)
indentation = indentation_from(state)
prefix = prefix_for(node)
"#{indentation}#{prefix}#{content.chomp}\n" +
(contains_child_paragraph ? "\n" : '')
end
def prefix_for(node)
if node.parent.name == 'ol'
index = node.parent.xpath('li').index(node)
"#{index.to_i + 1}. "
else
'- '
end
end
def indentation_from(state)
length = state.fetch(:ol_count, 0)
' ' * [length - 1, 0].max
end
end
register :li, Li.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/ol.rb 0000664 0000000 0000000 00000000451 14137461156 0024354 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class Ol < Base
def convert(node, state = {})
ol_count = state.fetch(:ol_count, 0) + 1
"\n" << treat_children(node, state.merge(ol_count: ol_count))
end
end
register :ol, Ol.new
register :ul, Ol.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/p.rb 0000664 0000000 0000000 00000000324 14137461156 0024200 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class P < Base
def convert(node, state = {})
"\n\n" << treat_children(node, state).strip << "\n\n"
end
end
register :p, P.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/pass_through.rb 0000664 0000000 0000000 00000000232 14137461156 0026445 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class PassThrough < Base
def convert(node, state = {})
node.to_s
end
end
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/pre.rb 0000664 0000000 0000000 00000002062 14137461156 0024530 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class Pre < Base
def convert(node, state = {})
content = treat_children(node, state)
if ReverseMarkdown.config.github_flavored
"\n```#{language(node)}\n" << content.strip << "\n```\n"
else
"\n\n " << content.lines.to_a.join(" ") << "\n\n"
end
end
private
# Override #treat as proposed in https://github.com/xijo/reverse_markdown/pull/69
def treat(node, state)
case node.name
when 'code', 'text'
node.text.strip
when 'br'
"\n"
else
super
end
end
def language(node)
lang = language_from_highlight_class(node)
lang || language_from_confluence_class(node)
end
def language_from_highlight_class(node)
node.parent['class'].to_s[/highlight-([a-zA-Z0-9]+)/, 1]
end
def language_from_confluence_class(node)
node['class'].to_s[/brush:\s?(:?.*);/, 1]
end
end
register :pre, Pre.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/strong.rb 0000664 0000000 0000000 00000000672 14137461156 0025263 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class Strong < Base
def convert(node, state = {})
content = treat_children(node, state.merge(already_strong: true))
if content.strip.empty? || state[:already_strong]
content
else
"#{content[/^\s*/]}**#{content.strip}**#{content[/\s*$/]}"
end
end
end
register :strong, Strong.new
register :b, Strong.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/table.rb 0000664 0000000 0000000 00000000330 14137461156 0025025 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class Table < Base
def convert(node, state = {})
"\n\n" << treat_children(node, state) << "\n"
end
end
register :table, Table.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/td.rb 0000664 0000000 0000000 00000000370 14137461156 0024351 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class Td < Base
def convert(node, state = {})
content = treat_children(node, state)
" #{content} |"
end
end
register :td, Td.new
register :th, Td.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/text.rb 0000664 0000000 0000000 00000002633 14137461156 0024732 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class Text < Base
def convert(node, options = {})
if node.text.strip.empty?
treat_empty(node)
else
treat_text(node)
end
end
private
def treat_empty(node)
parent = node.parent.name.to_sym
if [:ol, :ul].include?(parent) # Otherwise the indentation is broken
''
elsif node.text == ' ' # Regular whitespace text node
' '
else
''
end
end
def treat_text(node)
text = node.text
text = preserve_nbsp(text)
text = remove_border_newlines(text)
text = remove_inner_newlines(text)
text = escape_keychars(text)
text = preserve_keychars_within_backticks(text)
text = preserve_tags(text)
text
end
def preserve_nbsp(text)
text.gsub(/\u00A0/, "&nbsp;")
end
def preserve_tags(text)
text.gsub(/[<>]/, '>' => '\>', '<' => '\<')
end
def remove_border_newlines(text)
text.gsub(/\A\n+/, '').gsub(/\n+\z/, '')
end
def remove_inner_newlines(text)
text.tr("\r\n\t", ' ').squeeze(' ')
end
def preserve_keychars_within_backticks(text)
text.gsub(/`.*?`/) do |match|
match.gsub('\_', '_').gsub('\*', '*')
end
end
end
register :text, Text.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/converters/tr.rb 0000664 0000000 0000000 00000001037 14137461156 0024370 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
module Converters
class Tr < Base
def convert(node, state = {})
content = treat_children(node, state).rstrip
result = "|#{content}\n"
table_header_row?(node) ? result + underline_for(node) : result
end
def table_header_row?(node)
node.element_children.all? {|child| child.name.to_sym == :th}
end
def underline_for(node)
"| " + (['---'] * node.element_children.size).join(' | ') + " |\n"
end
end
register :tr, Tr.new
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/errors.rb 0000664 0000000 0000000 00000000227 14137461156 0023065 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
class Error < StandardError
end
class UnknownTagError < Error
end
class InvalidConfigurationError < Error
end
end
reverse_markdown-2.1.1/lib/reverse_markdown/version.rb 0000664 0000000 0000000 00000000057 14137461156 0023237 0 ustar 00root root 0000000 0000000 module ReverseMarkdown
VERSION = '2.1.1'
end
reverse_markdown-2.1.1/reverse_markdown.gemspec 0000664 0000000 0000000 00000002172 14137461156 0022024 0 ustar 00root root 0000000 0000000 # -*- encoding: utf-8 -*-
$:.push File.expand_path("../lib", __FILE__)
require "reverse_markdown/version"
Gem::Specification.new do |s|
s.name = "reverse_markdown"
s.version = ReverseMarkdown::VERSION
s.authors = ["Johannes Opper"]
s.email = ["johannes.opper@gmail.com"]
s.homepage = "http://github.com/xijo/reverse_markdown"
s.summary = %q{Convert html code into markdown.}
s.description = %q{Map simple html back into markdown, e.g. if you want to import existing html data in your application.}
s.licenses = ["WTFPL"]
s.files = `git ls-files`.split("\n")
s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
s.require_paths = ["lib"]
# specify any dependencies here; for example:
s.add_dependency 'nokogiri'
s.add_development_dependency 'rspec'
s.add_development_dependency 'simplecov'
s.add_development_dependency 'rake'
s.add_development_dependency 'kramdown'
s.add_development_dependency 'byebug'
s.add_development_dependency 'codeclimate-test-reporter'
end
reverse_markdown-2.1.1/spec/ 0000775 0000000 0000000 00000000000 14137461156 0016032 5 ustar 00root root 0000000 0000000 reverse_markdown-2.1.1/spec/assets/ 0000775 0000000 0000000 00000000000 14137461156 0017334 5 ustar 00root root 0000000 0000000 reverse_markdown-2.1.1/spec/assets/anchors.html 0000664 0000000 0000000 00000002306 14137461156 0021660 0 ustar 00root root 0000000 0000000
some text...
Foobar
Fubar
Strong foobar
There should be no extra space before and after the anchor (stripped).
Exception: after an !there should be an extra space.
Even with stripped elements inbetween: !there should be an extra space.
ignore anchor tags with no link text
not ignore
anchor tags with images
pass through the text of internal jumplinks without treating them as links
pass through the text of anchor tags with no href without treating them as links
some text...
some text...
reverse_markdown-2.1.1/spec/assets/basic.html 0000664 0000000 0000000 00000002720 14137461156 0021304 0 ustar 00root root 0000000 0000000
plain text
h1
h2
h3
h4
h5
h6
em tag content
before and after empty em tags
before and after em tags containing whitespace
before
and after em tags containing whitespace
double em tags
double em tags in p tag
a em with leading and trailing whitespace
a
em with extra leading and trailing
whitespace
strong tag content
before and after empty strong tags
before and after strong tags containing whitespace
before
and after strong tags containing whitespace
double strong tags
double strong tags in p tag
before
double strong tags containing whitespace
after
a strong with leading and trailing whitespace
a
strong with extra leading and trailing
whitespace
b tag content
i tag content
br tags become double space followed by newline
before hr
after hr
section 1
section 2
reverse_markdown-2.1.1/spec/assets/code.html 0000664 0000000 0000000 00000000562 14137461156 0021137 0 ustar 00root root 0000000 0000000
pre block
code block
pre code block
Paragraph with inline code block
var this;
this.is("A multi line code block")
console.log("Yup, it is")
Code with indentation:
tell application "Foo"
beep
end tell
reverse_markdown-2.1.1/spec/assets/escapables.html 0000664 0000000 0000000 00000000351 14137461156 0022323 0 ustar 00root root 0000000 0000000
some text...
**two asterisks**
***three asterisks***
__two underscores__
___three underscores___
some text...
var theoretical_max_infin = 1.0;
reverse_markdown-2.1.1/spec/assets/from_the_wild.html 0000664 0000000 0000000 00000000477 14137461156 0023054 0 ustar 00root root 0000000 0000000
.
*** intentcast
: logo design
.
I\_AM\_HELPFUL
reverse_markdown-2.1.1/spec/assets/full_example.html 0000664 0000000 0000000 00000001456 14137461156 0022705 0 ustar 00root root 0000000 0000000
- li 1
- li 2
- li 3
- li 1
- li 2
- li 3
- li 1
-
- eins
- eins
- eins
- li 1
- li 2
h1
h2
h3
h4
Hallo em Text
strong
Block of code
First quoted paragraph
Second quoted paragraph
link
reverse_markdown-2.1.1/spec/assets/html_fragment.html 0000664 0000000 0000000 00000000057 14137461156 0023053 0 ustar 00root root 0000000 0000000 naked text 1
paragraph text
naked text 2 reverse_markdown-2.1.1/spec/assets/lists.html 0000664 0000000 0000000 00000004454 14137461156 0021367 0 ustar 00root root 0000000 0000000
some text...
- unordered list entry
- unordered list entry 2
- ordered list entry
- ordered list entry 2
- list entry 1st hierarchy
-
- nested unsorted list entry
-
- deep nested list entry
a nested list with no whitespace:
- item a
- item b
- item bb
- item bc
a nested list with lots of whitespace:
- item wa
- item wb
- item wbb
- item wbc
-
I want to have a party at my house!
-
I don't want to cleanup after the party!
-
li 1, p 1
li 1, p 2
li 2, p 1
-
one
- one one
- one two
-
two
-
two one
- two one one
- two one two
- two two
- three
a nested list between adjacent list items
- alpha
- bravo
- bravo alpha
- bravo bravo
- bravo bravo alpha
- charlie
- delta
reverse_markdown-2.1.1/spec/assets/minimum.html 0000664 0000000 0000000 00000000042 14137461156 0021671 0 ustar 00root root 0000000 0000000
reverse_markdown-2.1.1/spec/assets/paragraphs.html 0000664 0000000 0000000 00000000621 14137461156 0022351 0 ustar 00root root 0000000 0000000
First content
Second
content
Complex
Content
Trailing whitespace:
Trailing non-breaking space:
Combination: