modular parser framework

Hadrian Burkhardt
2026-05-21 08:21:49 +00:00
parent 1223791074
commit cf5348a0c8
35 changed files with 452 additions and 166 deletions
+39 -7
@@ -25,19 +25,53 @@ Main Module Entry Points
The following explains the main workflow of this parser.
Generating a new OpenMensa feed starts by reading the configured canteens. Some canteen data, such as ID, name, and location, is currently not scraped: doing so would be brittle and would require a multistep process. Refer to :ref:`cache_hash` for deeper insight into the obstacles.
.. autofunction:: stw_potsdam.config.read_canteen_config
.. autofunction:: openmensa_parsers.config.read_canteen_config
.. autoclass:: stw_potsdam.config.Canteen
.. autoclass:: openmensa_parsers.config.Canteen
Use the canteen data to select matching upstream outlets, download the required menu JSON, and render the OpenMensa XML.
.. autoclass:: stw_potsdam.swp_webspeiseplan_api.SWPWebspeiseplanAPI
Parser Providers
~~~~~~~~~~~~~~~~
.. autoclass:: stw_potsdam.swp_webspeiseplan_parser.SWPWebspeiseplanParser
The application is structured around parser providers. A provider owns the
source-specific work: fetching raw upstream data and converting it into the
shared OpenMensa XML structures. The ``Builder`` only asks a provider for
canteens, attaches feed metadata, and renders XML.
New cities or data sources should add a parser under ``openmensa_parsers.parsers``.
The parser should implement three methods:
``fetch()``
Download or load the raw source data.
``parse(config, raw_data)``
Convert raw data into a ``dict[str, CanteenXML]`` keyed by the configured
canteen key.
``create_feed(canteen, url)``
Return the feed metadata for one canteen. In most cases, subclass
``BaseOpenMensaParser`` and configure ``feed`` instead of overriding this
method.
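The three methods above can be sketched as follows. This is a minimal illustration, not the project's implementation: ``ExampleCityParser``, the stand-in ``CanteenXML`` dataclass, the shape of ``config`` (a plain dict from canteen key to name), and the feed dictionary are all assumptions made so the example runs on its own. A real provider would most likely subclass ``BaseOpenMensaParser`` and use the actual XML types.

```python
from dataclasses import dataclass, field


@dataclass
class CanteenXML:
    """Hypothetical stand-in for the shared OpenMensa XML structure."""
    name: str
    meals: list = field(default_factory=list)


class ExampleCityParser:
    """Hypothetical provider showing the fetch/parse/create_feed split."""

    def fetch(self):
        # A real provider would download or load upstream data here.
        # Canned data keeps this sketch free of network access.
        return {"mensa-1": [{"title": "Lentil soup", "price": 2.5}]}

    def parse(self, config, raw_data):
        # Build one CanteenXML per configured canteen, keyed by canteen key.
        # `config` is assumed to map canteen key -> display name.
        result = {}
        for key, canteen_name in config.items():
            meals = [meal["title"] for meal in raw_data.get(key, [])]
            result[key] = CanteenXML(name=canteen_name, meals=meals)
        return result

    def create_feed(self, canteen, url):
        # Return feed metadata for a single canteen (assumed shape).
        return {"name": "full", "url": url, "canteen": canteen.name}


parser = ExampleCityParser()
config = {"mensa-1": "Example Mensa"}
canteens = parser.parse(config, parser.fetch())
feed = parser.create_feed(canteens["mensa-1"], "https://example.org/mensa-1/xml")
```

Because ``parse()`` receives the raw data as an argument instead of calling ``fetch()`` itself, it can be exercised with stored fixtures.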
Register the parser in ``openmensa_parsers.parsers.registry``. At runtime, select a
parser with ``OM_PARSER_ID``. The default is ``potsdam``.
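Registration and runtime selection might look roughly like the sketch below. The ``PARSERS`` dictionary, the stub classes, and ``select_parser`` are hypothetical names chosen for illustration; the actual layout of ``openmensa_parsers.parsers.registry`` may differ. Only the ``OM_PARSER_ID`` variable and its ``potsdam`` default come from the text above.

```python
import os


class PotsdamStub:
    """Hypothetical placeholder for the Potsdam provider."""


class ExampleCityStub:
    """Hypothetical placeholder for a newly added provider."""


# Hypothetical registry: parser ID -> provider class.
PARSERS = {
    "potsdam": PotsdamStub,
    "example": ExampleCityStub,
}


def select_parser(environ=None):
    """Instantiate the provider named by OM_PARSER_ID (default: potsdam)."""
    environ = os.environ if environ is None else environ
    parser_id = environ.get("OM_PARSER_ID", "potsdam")
    if parser_id not in PARSERS:
        known = ", ".join(sorted(PARSERS))
        raise ValueError(f"unknown parser id {parser_id!r}; known ids: {known}")
    return PARSERS[parser_id]()


default_parser = select_parser({})
example_parser = select_parser({"OM_PARSER_ID": "example"})
```

Passing the environment as a parameter is a design choice of this sketch: it lets tests select a parser without mutating ``os.environ``.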
Parser tests should keep network access separate from parsing. Store raw
fixtures in the test suite, pass them directly into ``parse()``, and reserve
live source checks for opt-in tests.
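A test module following this advice could be laid out as in the sketch below. The inline fixture, the stand-in ``parse`` function, and the test class names are hypothetical. Gating the live test on ``ENABLE_API_QUERY`` with ``unittest.skipUnless`` is an assumption about how the opt-in could be wired, not a description of the existing test suite.

```python
import json
import os
import unittest

# Raw fixture as it would be stored in the test suite (inlined for brevity).
RAW_FIXTURE = json.loads('{"mensa-1": [{"title": "Lentil soup"}]}')


def parse(config, raw_data):
    """Hypothetical stand-in for a provider's parse() method."""
    return {
        key: [meal["title"] for meal in raw_data.get(key, [])]
        for key in config
    }


class ParseFromFixtureTest(unittest.TestCase):
    """Runs offline: the raw data is passed straight into parse()."""

    def test_parse_uses_fixture_only(self):
        result = parse({"mensa-1": "Example Mensa"}, RAW_FIXTURE)
        self.assertEqual(result, {"mensa-1": ["Lentil soup"]})


@unittest.skipUnless(
    os.environ.get("ENABLE_API_QUERY"),
    "live source checks are opt-in",
)
class LiveSourceTest(unittest.TestCase):
    """Skipped unless ENABLE_API_QUERY is set in the environment."""

    def test_fetch_returns_data(self):
        # Placeholder: a real test would call the provider's fetch() here.
        self.skipTest("no live provider in this sketch")
```

The offline class is safe for CI, while the live class only runs when a developer opts in explicitly.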
.. autoclass:: openmensa_parsers.webspeiseplan_api.WebspeiseplanAPI
.. autoclass:: openmensa_parsers.webspeiseplan_parser.WebspeiseplanParser
.. autoclass:: openmensa_parsers.parsers.base.BaseOpenMensaParser
.. autoclass:: openmensa_parsers.parsers.potsdam.PotsdamParser
The XML type modules contain the OpenMensa rendering objects:
.. autoclass:: stw_potsdam.xml_types.builder.Builder
.. autoclass:: openmensa_parsers.xml_types.builder.Builder
Tests
~~~~~
@@ -55,5 +89,3 @@ Test execution works as follows: ::
The first invocation runs tests whose outcome is determined solely by the test suite, which makes them suitable for frequent execution and CI systems.
Setting the environment variable ``ENABLE_API_QUERY`` enables tests that query the canteen API. Because these tests depend on third-party services, they are better suited to manual execution, where developers can quickly check whether their change works with today's menu.