Last active
February 28, 2017 07:14
-
-
Save gogocurtis/a5da2f0dff623513ade3d338ac5bbfe4 to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
def html5elements | |
html5 = %% | |
<!--...--> Defines a comment | |
<!DOCTYPE> Defines the document type | |
<a> Defines a hyperlink | |
<abbr> Defines an abbreviation or an acronym | |
<acronym> Not supported in HTML5. Use <abbr> instead. | |
Defines an acronym | |
<address> Defines contact information for the author/owner of a document | |
<applet> Not supported in HTML5. Use <embed> or <object> instead. | |
Defines an embedded applet | |
<area> Defines an area inside an image-map | |
<article> Defines an article | |
<aside> Defines content aside from the page content | |
<audio> Defines sound content | |
<b> Defines bold text | |
<base> Specifies the base URL/target for all relative URLs in a document | |
<basefont> Not supported in HTML5. Use CSS instead. | |
Specifies a default color, size, and font for all text in a document | |
<bdi> Isolates a part of text that might be formatted in a different direction from other text outside it | |
<bdo> Overrides the current text direction | |
<big> Not supported in HTML5. Use CSS instead. | |
Defines big text | |
<blockquote> Defines a section that is quoted from another source | |
<body> Defines the document's body | |
<br> Defines a single line break | |
<button> Defines a clickable button | |
<canvas> Used to draw graphics, on the fly, via scripting (usually JavaScript) | |
<caption> Defines a table caption | |
<center> Not supported in HTML5. Use CSS instead. | |
Defines centered text | |
<cite> Defines the title of a work | |
<code> Defines a piece of computer code | |
<col> Specifies column properties for each column within a <colgroup> element | |
<colgroup> Specifies a group of one or more columns in a table for formatting | |
<datalist> Specifies a list of pre-defined options for input controls | |
<dd> Defines a description/value of a term in a description list | |
<del> Defines text that has been deleted from a document | |
<details> Defines additional details that the user can view or hide | |
<dfn> Represents the defining instance of a term | |
<dialog> Defines a dialog box or window | |
<dir> Not supported in HTML5. Use <ul> instead. | |
Defines a directory list | |
<div> Defines a section in a document | |
<dl> Defines a description list | |
<dt> Defines a term/name in a description list | |
<em> Defines emphasized text | |
<embed> Defines a container for an external (non-HTML) application | |
<fieldset> Groups related elements in a form | |
<figcaption> Defines a caption for a <figure> element | |
<figure> Specifies self-contained content | |
<font> Not supported in HTML5. Use CSS instead. | |
Defines font, color, and size for text | |
<footer> Defines a footer for a document or section | |
<form> Defines an HTML form for user input | |
<frame> Not supported in HTML5. | |
Defines a window (a frame) in a frameset | |
<frameset> Not supported in HTML5. | |
Defines a set of frames | |
<h1> to <h6> Defines HTML headings | |
<head> Defines information about the document | |
<header> Defines a header for a document or section | |
<hr> Defines a thematic change in the content | |
<html> Defines the root of an HTML document | |
<i> Defines a part of text in an alternate voice or mood | |
<iframe> Defines an inline frame | |
<img> Defines an image | |
<input> Defines an input control | |
<ins> Defines a text that has been inserted into a document | |
<kbd> Defines keyboard input | |
<keygen> Defines a key-pair generator field (for forms) | |
<label> Defines a label for an <input> element | |
<legend> Defines a caption for a <fieldset> element | |
<li> Defines a list item | |
<link> Defines the relationship between a document and an external resource (most used to link to style sheets) | |
<main> Specifies the main content of a document | |
<map> Defines a client-side image-map | |
<mark> Defines marked/highlighted text | |
<menu> Defines a list/menu of commands | |
<menuitem> Defines a command/menu item that the user can invoke from a popup menu | |
<meta> Defines metadata about an HTML document | |
<meter> Defines a scalar measurement within a known range (a gauge) | |
<nav> Defines navigation links | |
<noframes> Not supported in HTML5. | |
Defines an alternate content for users that do not support frames | |
<noscript> Defines an alternate content for users that do not support client-side scripts | |
<object> Defines an embedded object | |
<ol> Defines an ordered list | |
<optgroup> Defines a group of related options in a drop-down list | |
<option> Defines an option in a drop-down list | |
<output> Defines the result of a calculation | |
<p> Defines a paragraph | |
<param> Defines a parameter for an object | |
<picture> Defines a container for multiple image resources | |
<pre> Defines preformatted text | |
<progress> Represents the progress of a task | |
<q> Defines a short quotation | |
<rp> Defines what to show in browsers that do not support ruby annotations | |
<rt> Defines an explanation/pronunciation of characters (for East Asian typography) | |
<ruby> Defines a ruby annotation (for East Asian typography) | |
<s> Defines text that is no longer correct | |
<samp> Defines sample output from a computer program | |
<script> Defines a client-side script | |
<section> Defines a section in a document | |
<select> Defines a drop-down list | |
<small> Defines smaller text | |
<source> Defines multiple media resources for media elements (<video> and <audio>) | |
<span> Defines a section in a document | |
<strike> Not supported in HTML5. Use <del> or <s> instead. | |
Defines strikethrough text | |
<strong> Defines important text | |
<style> Defines style information for a document | |
<sub> Defines subscripted text | |
<summary> Defines a visible heading for a <details> element | |
<sup> Defines superscripted text | |
<table> Defines a table | |
<tbody> Groups the body content in a table | |
<td> Defines a cell in a table | |
<textarea> Defines a multiline input control (text area) | |
<tfoot> Groups the footer content in a table | |
<th> Defines a header cell in a table | |
<thead> Groups the header content in a table | |
<time> Defines a date/time | |
<title> Defines a title for the document | |
<tr> Defines a row in a table | |
<track> Defines text tracks for media elements (<video> and <audio>) | |
<tt> Not supported in HTML5. Use CSS instead. | |
Defines teletype text | |
<u> Defines text that should be stylistically different from normal text | |
<ul> Defines an unordered list | |
<var> Defines a variable | |
<video> Defines a video or movie | |
<wbr> Defines a possible line-break | |
% | |
extract_elements_expression = / | |
(?: # non-match pattern group begin | |
< # match < start html element | |
( # begin match group one | |
[!-]? # character class one of ! or - 0 or 1 time | |
.+? # any characters non-greedy 1 or more times | |
) # end match group one | |
> # match > end of html element | |
)+ # non-match pattern group end 1 or more times | |
.*? # match any other possible characters | |
\n # match the end of the line | |
/mx | |
p elements = html5.scan(extract_elements_expression).flatten | |
duplicates = elements.group_by{ |e| e }.select { |k, v| v.size > 1 }.map(&:first) | |
if duplicates.size > 0 | |
puts %%text contains #{elements.size} elements%, | |
%%#{elements.size - elements.uniq.size} are duplicates%, | |
%%Your regular expression is not correct for the text format% | |
%%The duplicates are #{duplicates}% | |
elements.uniq! | |
end | |
return elements | |
end ; html5elements() |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment