Webscript: Tutorial

Summary
Webscript: Tutorial
General tipsGeneral tips and recommendations.
Search pathsTypical libquvi LUA script search paths.
Webscript (or Website script)What does a webscript do?
Getting startedChoose a website if you have not already.
Generating network traffic logsLogs may be helpful sometimes.
Once a website is chosenRead Webscript: Guidelines.
ProtocolsHistorically quvi only worked with HTTP.
Additional readingFor technical specifications.
Writing a webscriptLet us assume you know the media data inside out.
Working with dev codeClone the current development code.
Working with binariesWorking with the precompiled binaries.
Customizing the foo.luaYou will be using LUA patterns.
Testing webscriptTest.
ChecklistBefore you submit your webscript.
Generate the patchWe prefer new webscripts in the patch format.
See alsoRelated pages.

General tips

General tips and recommendations.

# Make libquvi print the LUA script search paths.
env QUVI_SHOW_SCANDIR=1 quvi
# Make libquvi print the found LUA scripts.
env QUVI_SHOW_SCRIPT=1 quvi
# Make libcurl print the messages.
quvi --verbose-libcurl
  • Isolate problems, disect data, write smaller LUA scripts before you put everything together

Search paths

Typical libquvi LUA script search paths.  Listed below in search order.

  • $QUVI_BASEDIR (see quvi manual for notes)
  • Current working directory
  • $HOME/.quvi/
  • $HOME/.config/quvi/
  • $HOME/.local/share/quvi/
  • configure’s --datadir=DIR (and pkgdatadir), usually these are $prefix/share/ and $prefix/share/quvi/

If you define QUVI_BASEDIR, make sure it points to the directory with the lua/ subdir.  For example:

tar xf quvi-$release.tar.gz && cd quvi-$release
mkdir tmp ; cd tmp
../configure ; make
QUVI_BASEDIR=../share ./src/quvi/quvi --support

Webscript (or Website script)

What does a webscript do?

  • Identifies itself to the library (`ident’ function)
  • Queries for available formats (`query_formats’ function)
  • Parses and resolves the media details and returns the results

Getting started

Choose a website if you have not already.

  • Visit our <Trac at url target=”http://sourceforge” name=”http://sourceforge”.net/apps/trac/quvi/report/3> for open requests
  • Or pick some other website

There are typically two kinds of websites

  • Those that use complex systems to access the media (see youtube.lua)
  • Those that do not (see gaskrank.lua)

Generating network traffic logs

Logs may be helpful sometimes.  Further analyzation of the log data may hint how the media could be accessed if you have not found any other way to do that.  The downside of this is that you typically need a system that can execute Adobe Flash object code which may not always be a choice.

If you choose to take this path, find youself a system capable of executing Adobe Flash object code and capture the generated network traffic.  You could use, for example, <Wireshark at url target=”http://www.wireshark” name=”http://www.wireshark”.org/> and/or <Live HTTP Headers at url target=”https://addons.mozilla” name=”https://addons.mozilla”.org/en-US/firefox/addon/live-http-headers/>.

Once a website is chosen

Read Webscript: Guidelines.  Go over the existing scripts for examples.  Write down how your website provides the required data and how you could parse it in your webscript.

Protocols

Historically quvi only worked with HTTP.  Support for other protocols, such as RTMP, RTSP and MMS were later.  This means that if your website uses, say RTMP, then you must define this in the `ident’ function of your webscript.  See francetelevisions.lua for an example of this.

For historical reasons, quvi defaults to HTTP category scripts.  This is likely changed in the 0.2.20.  Until then, make sure you use the --category-all (-a) or the --category-$scheme option when you run quvi.  Otherwise quvi ignores the non-HTTP category scripts.

For example

quvi --category-rtmp URL
quvi -a URL

Additional reading

For technical specifications.

Please read

  • $top_srcdir/share/lua/website/README

This README covers the webscript functions.  Be sure to read the Webscript: Guidelines as well.

Writing a webscript

Let us assume you know the media data inside out.  What left is writing the webscript.

Working with dev code

Clone the current development code.  Or, see Working with binaries.

git clone git://repo.or.cz/quvi.git quvi.git
cd quvi.git ; sh autogen.sh
mkdir tmp ; cd tmp
../configure ; make

Let’s use buzzhumor.lua as a template webscript.

cp ../share/lua/website/buzzhumor.lua ../share/lua/website/foo.lua
 # open foo.lua in an editor

Jump to Customizing the foo.lua to continue from here.

Working with binaries

Working with the precompiled binaries.  Make sure you have at least 0.2.17 installed to your system.  Earlier version may work, although not confirmed.  This HOWTO was written for 0.2.17.

quvi --version
quvi version 0.2.17 built on 2011-06-02 for i686-pc-linux-gnu

Let’s use buzzhumor.lua as a template webscript.

mkdir -p foo/lua/website/ ; cd foo
cp -r $prefix/share/lua/util/ lua/
cp -r $prefix/share/lua/website/quvi/ lua/website/
cp $prefix/share/lua/website/buzzhumor.lua lua/website/foo.lua
# foo/ dir should now look like:
find .
.
./lua
./lua/website
./lua/website/foo.lua
./lua/website/quvi
./lua/website/quvi/const.lua
./lua/website/quvi/url.lua
./lua/website/quvi/util.lua
./lua/website/quvi/bit.lua
./lua/util
./lua/util/content_type.lua
./lua/util/charset.lua
./lua/util/trim.lua
# open foo.lua in an editor

Customizing the foo.lua

You will be using LUA patterns.  If you are new to LUA or the patterns, read more about them from the <Programming in Lua at url target=”http://www.lua” name=”http://www.lua”.org/pil/> book before you continue.

Update the copyright notice

-- Copyright (C) $year  $your_name <$your_email>

Alternatively use the project default

-- Copyright (C) $year  quvi project <http://quvi.sourceforge.net/>

Modify the `ident’ function

-   r.domain  = 'buzzhumor.com"
+   r.domain  = 'foo.bar'
- r.handles    = U.handles(self.page_url, {r.domain}, {"/videos/"})
+ r.handles    = U.handles(self.page_url, {r.domain}, {"/watch/"})

r.handles tells the library whether your webscript can handle the self.page_url.  Notice how we use r.domain and additional “signatures” to check whether this URL is for this webscript.

Consider the following URLs.

The `handles’ function imported from quvi/util.lua simply confirms that the domain and path pattern(s) match the ones in the URL.

r.formats

Leave it as it is.  Let’s assume that foo.bar only supports one format.  The formats are a bit more complex topic to cover.  You can read more about them in the Overview: Formats.

r.formats = 'default'

r.categories

Modify this only if you know that the website uses some other protocol for streaming the media.  Check francetelevisions.lua for an example of this.  Let’s assume that foo.bar uses only HTTP.

r.categories = C.proto_http

query_formats function

Leave the dummy function as it is like we did with r.formats.  If the webscript supported >1 formats, we’d add the code here to that would produce a dynamically created list of available format strings.

function query_formats(self)
    self.formats = 'default'
    return self
end

parse function

To keep things simple, let’s go ahead and assume that our ‘foo.bar’ website is nearly identical to ‘buzzhumor’.

-   self.host_id = 'buzzhumor'
+   self.host_id = 'foo'

Fetch the page from the page URL so we have data to parse.

local page = quvi.fetch(self.page_url)

Now that we have the page, it’s time to parse the media details from it.  Let’s begin with the page title.

local _,_,s = page:find('<title>(.-)</title>')
self.title  = s or error('no match: media title')

And the media ID.

local _,_,s = page:find('vid_id="(.-)"')
self.id     = s or error('no match: media id')

This leaves the media URL.

local _,_,s = page:find('vid_url="(.-)"')
self.url    = {s or error('no match: media url')}

Note the use of ‘{}’ here, the media URLs must be returned in an array.

Testing webscript

Test.  Test.  Test.

If you are Working with dev code, run

# Still in $top_srcdir/tmp
env QUVI_BASEDIR=../share ./src/quvi/quvi TEST_URL

Or, if you are Working with binaries, run

# Still in foo/
quvi TEST_URL

You will probably spend most of the time tweaking the patterns in your webscript.  It often helps to “isolate the problem”, e.g. copy page data to a local file and write an additional script to perfect the LUA patterns.

Read also $top_srcdir/tests/README for tips on how you can use the existing test suite files to test your script.  This isn’t necessary but recommended reading.

Checklist

Before you submit your webscript.

Does your script set and parse everything as expected

  • Protocol category
  • Media URL
  • Media ID
  • Host ID
  • Title

Does the website support >1 formats

If yes, see if you can add support for them, see youtube.lua, cbsnews.lua, dailymotion.lua for examples.  The support may always be added later, too.

Does the parsed title contain extra characters

  • We want the media title only
  • Anything else, e.g. domain name, should be left out

Finally

If you are unsure about something, don’t hesitate to ask.

Generate the patch

We prefer new webscripts in the patch format.  We appreciate especially the git produced patches since they preserve the additional info from the original contribution.

If you are Working with dev code, run

# Still in $top_srcdir/tmp
git add ../share/lua/website/foo.lua

Or, if you are Working with binaries

# Still in foo/
git init ; git add lua/website/foo.lua

Once added, produce the patch with

git commit -am 'Add foo support'
git format-patch -M -1
quvi project uses astyle at url target=”http://astyle.sourceforge” name=”http://astyle.sourceforge”.net/ to reformat the C source code.
Working with the precompiled binaries.
You will be using LUA patterns.
Clone the current development code.
Close