[PDF Generator] Add Crossmark button to article PDFs #1115
Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Pull request overview
This PR adds Crossmark button support to PDFs generated by packtools. Crossmark is a Crossref service that lets readers verify whether a document is the latest version; the implementation overlays a clickable logo on the first page with a URI annotation and embeds XMP metadata with DOI and versioning fields.
Changes:
- New module packtools/sps/formats/pdf/crossmark.py with add_crossmark() function, XMP helpers, logo overlay logic, and a main() CLI entry point
- New crossmark_pdf console-script entry point and pypdf/reportlab dependencies added to setup.py
- Bundled Crossmark logo PNG asset and comprehensive test suite covering unit and integration cases
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| packtools/sps/formats/pdf/crossmark.py | Core module: logo overlay via reportlab, URI annotation via pypdf, XMP metadata merge, and CLI |
| packtools/sps/formats/pdf/assets/CROSSMARK_Color_horizontal.png | Bundled Crossmark logo for use as default logo |
| setup.py | Adds pypdf>=3.0.0, reportlab>=3.6.0 to install_requires and crossmark_pdf console script |
| tests/sps/formats/pdf/test_crossmark.py | Unit and integration tests for all public and helper functions |

    import unittest
    from unittest.mock import patch, MagicMock

    from pypdf import PdfReader, PdfWriter

PdfWriter is imported but never directly used in the test file (it is only used indirectly via the add_crossmark function under test). This unused import should be removed.
Suggested change:

    - from pypdf import PdfReader, PdfWriter
    + from pypdf import PdfReader

    for prefix, uri in _XMP_NAMESPACES.items():
        ElementTree.register_namespace(prefix, uri)
    ElementTree.register_namespace("x", "adobe:ns:meta/")
    ElementTree.register_namespace(
        "rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    )

ElementTree.register_namespace() modifies a global dictionary in the Python standard library. In a multi-threaded environment (e.g., when a large batch CSV is processed concurrently), concurrent calls to _merge_xmp_packet could race on this shared state, potentially leading to namespace prefix mangling in serialized XML output; separate processes each hold their own copy of the dictionary and are not affected. Consider either using lxml (which supports per-element namespace maps) or protecting this block with a module-level lock if concurrent use is expected.

        bytes: UTF-8 encoded XMP packet.
    """
    fields_xml = "\n".join(
        f" <{field}>{value.format(doi=doi, date_stamp=date_stamp)}</{field}>"

The DOI is interpolated directly into an XML string without escaping XML special characters. If a DOI contains &, <, or > (e.g., a DOI with an ampersand, which is technically valid), this will produce malformed XML that could fail to parse or embed corrupt XMP into the PDF.
The DOI and date_stamp values should be XML-escaped (e.g., using xml.sax.saxutils.escape()) before being inserted into the manually-built XML string.
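
The escaping suggested above could look roughly like the sketch below. The FIELD_TEMPLATES mapping, the build_fields_xml helper, and the placeholder namespace URIs are hypothetical; the module under review defines its own fields, which are not shown in this excerpt:

```python
from xml.etree import ElementTree
from xml.sax.saxutils import escape

# Hypothetical field templates for illustration only.
FIELD_TEMPLATES = {
    "prism:doi": "{doi}",
    "crossmark:MajorVersionDate": "{date_stamp}",
}


def build_fields_xml(doi, date_stamp):
    """Return the field elements with the DOI and date XML-escaped."""
    safe_doi = escape(doi)
    safe_date = escape(date_stamp)
    return "\n".join(
        f"<{field}>{template.format(doi=safe_doi, date_stamp=safe_date)}</{field}>"
        for field, template in FIELD_TEMPLATES.items()
    )


# Round trip: wrap the fields in a root element that declares placeholder
# namespace URIs (assumptions), then check the parser recovers the raw DOI.
tricky_doi = "10.1000/a&b<c>"
wrapped = (
    '<root xmlns:prism="urn:example:prism" xmlns:crossmark="urn:example:crossmark">'
    + build_fields_xml(tricky_doi, "2024-01-01")
    + "</root>"
)
parsed_doi = ElementTree.fromstring(wrapped).find("{urn:example:prism}doi").text
```

Escaping before formatting keeps the hand-built string approach intact while ensuring the packet stays well-formed for DOIs that contain markup characters.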

    first_page.merge_page(overlay_page)

    # --- Add URI annotation (clickable hyperlink) on first page ---
    crossmark_url = _CROSSMARK_URL.format(doi=doi, date_stamp=date_stamp)

The DOI is interpolated directly into the URL query string without percent-encoding. DOIs can contain characters such as #, +, or spaces, which are significant in a URL query string and would produce an incorrect or broken Crossmark link (e.g., a # would be interpreted as the start of a URL fragment, truncating the query string entirely).
The DOI should be URL-encoded before being placed in the query string, e.g., using urllib.parse.quote(doi, safe="") or urllib.parse.urlencode({"doi": doi, "domain": "pdf", "date_stamp": date_stamp}).
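
As an illustration of the urlencode option, the sketch below builds the dialog URL from the base address quoted in the PR description. The constant and function names are hypothetical, and the sample DOI is made up to exercise the problematic characters:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base address as quoted in the PR description; the constant name is an assumption.
CROSSMARK_DIALOG = "https://crossmark.crossref.org/dialog"


def build_crossmark_url(doi, date_stamp):
    """Build the Crossmark dialog URL with every query value percent-encoded."""
    query = urlencode({"doi": doi, "domain": "pdf", "date_stamp": date_stamp})
    return f"{CROSSMARK_DIALOG}?{query}"


# Round trip with a made-up DOI containing '#' and '+'.
url = build_crossmark_url("10.1000/abc#1+2", "2024-01-01")
recovered = parse_qs(urlsplit(url).query)["doi"][0]
```

urlencode also escapes the slash in the DOI and encodes spaces as '+', which is acceptable for a query string; urllib.parse.quote(doi, safe="") is the alternative if '%20' is preferred.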

        generic.NameObject("/Type"): generic.NameObject("/Metadata"),
        generic.NameObject("/Subtype"): generic.NameObject("/XML"),
    })
    xmp_ref = writer._add_object(xmp_stream)

writer._add_object() is a private/internal pypdf API (denoted by the leading underscore). Private APIs are not part of the public contract and may be renamed, removed, or have their behavior changed in any minor or patch version of pypdf without notice. The public equivalent writer.add_object() was introduced in pypdf 3.x and should be used instead to ensure forward compatibility.
Suggested change:

    - xmp_ref = writer._add_object(xmp_stream)
    + xmp_ref = writer.add_object(xmp_stream)

    import csv
    import tempfile
    import unittest
    from unittest.mock import patch, MagicMock

MagicMock is imported but never used in the test file. This is a dead import that should be removed to keep the test file clean.
Suggested change:

    - from unittest.mock import patch, MagicMock
    + from unittest.mock import patch

Adds Crossmark button support to PDFs generated by packtools: a clickable logo on the first page that links to the Crossref dialog, and complete XMP metadata embedded in the PDF.
What does this PR do?
- packtools/sps/formats/pdf/crossmark.py with the main function add_crossmark():
  - Logo overlay on the first page via reportlab; remaining pages untouched
  - URI annotation (https://crossmark.crossref.org/dialog?doi=...&domain=pdf&date_stamp=...) via pypdf
  - XMP metadata (dc, prism, crossmark, and pdfx); existing XMP is preserved via merge
- crossmark_pdf (new entry point) with single-file mode and batch mode via CSV
- Bundled logo asset: packtools/sps/formats/pdf/assets/CROSSMARK_Color_horizontal.png
- New dependencies: pypdf>=3.0.0, reportlab>=3.6.0
Where should the review start?
packtools/sps/formats/pdf/crossmark.py: the complete module with all insertion, XMP, and CLI logic.
How can this be tested manually?
Any background context you want to provide?
Crossmark is required by indexers (Crossref, PubMed) to indicate that a PDF is the canonical, up-to-date version. The overlay approach ensures the original PDF content is not altered: only the first page receives the logo and the annotation, and the content streams of the following pages are preserved in full. The XMP merge preserves pre-existing metadata.
Screenshots
N/A: the output is a PDF file. Validate with exiftool or by inspecting the annotations with pypdf.
What are the relevant tickets?
Related to the issue about adding a Crossmark button to article PDFs.
References